*HTML_PROCESS Function
Syntax
Arguments
- html
The HTML text to process.
- format
Character. Indicates which type of data to search for.
- "K"
Kind of HTML tag (e.g. TABLE, INPUT, %A5)
- "N"
Name (Type + Name or id)
- "H"
Hierarchy (Parents + Type + Name or id)
- "U"
Guaranteed Unique name (HIERARCHY + optional counter)
- "T"
Contents of Tag (including <> characters).
- "X"
Embedded Xbasic code (for a5w tags).
- "E"
Trimmed embedded Xbasic code (useful for expressions, where leading and trailing CR-LFs are unwanted)
- "I"
inner HTML (inside a block tag)
- "O"
Outer HTML (include outside tags of block, or just the tag itself if the tag is not a block)
- options
Indicates how to process the data that is found.
- "A"
process A5W tags only
- "E"
process end tags as well as start tags
- "M"
output merge: include all code that does not match the patterns
- "I"
make regular expression searches case-insensitive
- "T"
merge resumes after the current matching tag
- "B"
merge resumes after the current tag block
- patterns
A CR-LF delimited list of search options required to generate a match:
- "<Format>=<value>"
Match an exact string.
- "<Format>!=<value>"
Find strings that don't match.
- "<Format>$<value>"
Find strings that match a regular expression.
- "<Format>!$<value>"
Find strings that don't match a regular expression.
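As a minimal sketch of a multi-pattern search, the snippet below builds a CR-LF delimited list with two patterns. The HTML string and the particular pattern combination are illustrative assumptions, and it assumes that every pattern in the list must match for a tag to be processed, as the description of the patterns argument suggests.

```xbasic
' Sketch only: the HTML string and both patterns are illustrative assumptions.
html = "<html><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
' Pattern 1: the tag kind is INPUT.
' Pattern 2: the name (type + id) matches a regular expression.
patterns = "K=INPUT" + crlf() + "N$.*first.*"
? *html_process(html, "N" + crlf(), "", patterns)
```

Building the list with crlf() keeps each pattern on its own line, which is the delimiter the patterns argument expects.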
Description
Searches HTML text for tags that match a set of patterns and returns the requested data about them.
Discussion
The *HTML_PROCESS() function returns information about an HTML file, such as its title or a list of the input controls on the page. It can return:
- the contents of a tag
- the IDs of all the controls of a type
- an outline list of all HTML tags
- a named tag
- Xbasic code that matches a regular expression
- a version of the HTML without a tag
- a version of the HTML without a tag block
- a version of the HTML without Xbasic expressions that match a pattern
Example search strings:
- K = INPUT
Process all html tags of type INPUT.
- E = ?firstname
Process all html a5w tags where trimmed expression is ?firstname.
- E$\?x_customer\.body.?
Process all html a5w tags whose trimmed expression matches the regular expression \?x_customer\.body.? (the $ is the match operator, not part of the expression).
- U=HTML|BODY|TABLE#1
Process object that matches unique name HTML|BODY|TABLE#1.
Example
Retrieve the contents of a tag
Example 1 uses *html_process() to extract information from HTML. In the interactive window create an HTML string and extract the inner text of the title tag.
html = "<html><title>The html pages title</title><body></body></html>"
? *html_process(html, "I", "", "K=TITLE")
= "The html pages title"
The result is the inner HTML (Format: "I") of tags of type TITLE (Search: "K=TITLE").
Retrieve the IDs of all the controls of a type
Example 2 retrieves the IDs or names of all the input controls on an html page.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "N" + crlf(), "", "K=INPUT")
= input:firstname
input:lastname
Retrieve an outline list of all HTML tags
Example 3 generates an outline list of tags in the HTML text, which could be used to build a simple DOM browser.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "U" + crlf())
= html
html|title
html|body
html|body|input:firstname
html|body|input:lastname
This example visits all tags (except end tags) and returns a hierarchical unique name for each.
Retrieve a named tag
Given a unique name, Example 4 extracts a tag.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "T", "", "U=html|body|input:firstname")
= <input id="firstname">
Combined with example 3, this example could be used to browse tags.
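A rough sketch of that combination follows. It assumes the unique names returned by the "U" format can be passed back unchanged in a "U=" pattern, and it uses Xbasic's FOR EACH loop over a CR-LF list and ui_msg_box() to display the result. The variable names (names, item, tag, report) are hypothetical.

```xbasic
' Sketch only: assumes each "U" name from the outline works as a "U=" pattern.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
' Step 1: get the outline of unique names, one per line.
names = *html_process(html, "U" + crlf())
report = ""
for each item in names
    ' item.value holds one unique name from the CR-LF list.
    tag = *html_process(html, "T", "", "U=" + item.value)
    report = report + item.value + " -> " + tag + crlf()
next
' Step 2: show each unique name next to the tag it identifies.
ui_msg_box("Tags", report)
```

The exact match operator (=) is used instead of the regular expression operator ($) so that characters such as | in the unique names are not interpreted as regular expression syntax.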
Retrieve Xbasic code that matches a regular expression
Example 5 extracts embedded Xbasic that matches a regular expression.
html = "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_customer.body.detail %><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
? *html_process(html, "E" + crlf(), "A", "E$.+customer.+")
= ? x_customer.body.title
? x_customer.body.detail
The "A" option indicates that only embedded Xbasic tags should be processed. The "E$" search denotes a regular expression to match against expressions. This makes use of the regular expression merge functionality.
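The "I" option from the options list can be combined with "A" in the same call. The following is a hedged variation on Example 5, assuming that options are combined by concatenating their letters (as "MB" and "AM" are in later examples) and that "I" lets an upper-case pattern match lower-case expressions. The shortened HTML string is illustrative.

```xbasic
' Sketch only: adds the "I" option to Example 5 so the upper-case pattern can still match.
html = "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_product.body.title %></body></html>"
? *html_process(html, "E" + crlf(), "AI", "E$.+CUSTOMER.+")
```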
Remove a tag
Example 6 removes a tag from HTML.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "", "M", "U=html|body|input:firstname")
= <html><title>The html pages title</title><body><input id="lastname"></body></html>
When the merge flag is used (Option: "M") and the format is empty (""), *html_process() deletes the tag.
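The same approach should extend to patterns that match more than one tag. The sketch below pairs the "K=INPUT" pattern from Example 2 with the merge option. That every matching tag is deleted, rather than only the first, is an assumption and not documented output.

```xbasic
' Sketch only: combines the "K=INPUT" pattern from Example 2 with the "M" option from Example 6.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
' Empty format plus merge: matching tags are dropped, everything else is kept.
? *html_process(html, "", "M", "K=INPUT")
```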
Remove a tag block
Example 7 removes a tag block from HTML.
html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "", "MB", "U=html|title")
= <html><body><input id="firstname"><input id="lastname"></body></html>
When the merge and block flags are used (Option: "MB") and the format is empty (""), *html_process() deletes the block.
Remove Xbasic expressions that match a pattern
Example 8 removes expressions that match a pattern.
html = "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_customer.body.detail %><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
? *html_process(html, "", "AM", "E$.+customer.+")
= "<html><body><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
Since the format is empty, all matching expression tags get eliminated from the html.
See Also