Xbasic

*HTML_PROCESS Function

Syntax

Processed_HTML as C = *HTML_PROCESS(C html,C format[,C options[,C patterns]])

Arguments

html

The HTML text to process.

format

Character. Indicates which type of data to search for.

"K"

Kind of HTML tag (e.g. TABLE, INPUT, %A5)

"N"

Name (Type + Name or id)

"H"

Hierarchy (Parents + Type + Name or id)

"U"

Guaranteed Unique name (HIERARCHY + optional counter)

"T"

Contents of Tag (including <> characters).

"X"

Embedded Xbasic code (for a5w tags).

"E"

Trimmed embedded Xbasic code (useful for expressions, where leading and trailing CR-LFs are unwanted).

"I"

Inner HTML (inside a block tag)

"O"

Outer HTML (include outside tags of block, or just the tag itself if the tag is not a block)
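
To make the difference between the "I", "O" and "T" formats concrete, the following sketch requests the same TITLE tag three times, changing only the format. The html value is invented for illustration, and no results are shown because they are not taken from a real run; the comments restate the definitions above.

```xbasic
' Sample input, invented for illustration.
html = "<html><title>Sample page</title><body></body></html>"

' "I": inner HTML, the text inside the TITLE block.
? *html_process(html, "I", "", "K=TITLE")

' "O": outer HTML, the block together with its enclosing tags.
? *html_process(html, "O", "", "K=TITLE")

' "T": the contents of the tag itself, including the <> characters.
? *html_process(html, "T", "", "K=TITLE")
```

Running the three calls side by side in the Interactive Window is a quick way to confirm which format a given task needs.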

options

Indicates how to process the data that is found.

"A"

process A5W tags only

"E"

process end tags as well as start tags

"M"

output merge - include all code that is not matched

"I"

make regular-expression searches case-insensitive

"T"

merge resumes after the current matching tag

"B"

merge resumes after the current tag block

patterns

A CR-LF delimited list of search options required to generate a match:

"<Format>=<value>"

Match an exact string.

"<Format>!=<value>"

Find strings that don't match.

"<Format>$<value>"

Find strings that match a regular expression.

"<Format>!$<value>"

Find strings that don't match a regular expression.
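
As a sketch of how several patterns can be combined, the code below builds a CR-LF delimited list with crlf(). It assumes that every pattern in the list must hold for a tag to match, which is how "required to generate a match" reads here; that behavior should be confirmed before relying on it. The html value is sample data, and the name value input:lastname follows the form shown in the output of Example 2 on this page.

```xbasic
' Sample input, invented for illustration.
html = "<html><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"

' Two patterns, one per line:
'   K=INPUT             the tag kind must be INPUT
'   N!=input:lastname   the name must not be input:lastname
' Assumption: both patterns must hold for a tag to match.
patterns = "K=INPUT" + crlf() + "N!=input:lastname"

' Return the name of each matching tag, one per line.
? *html_process(html, "N" + crlf(), "", patterns)
```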

Description

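Searches HTML text for tags that match a set of patterns and returns, or removes, the requested part of each match.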

Discussion

The *HTML_PROCESS() function returns information about an HTML file, such as its title or a list of the input controls on the page. It can be used to retrieve or produce:

the contents of a tag

the IDs of all the controls of a type

an outline list of all HTML tags

a named tag

Xbasic code that matches a regular expression

a version of the HTML without a tag

a version of the HTML without a tag block

a version of the HTML without Xbasic expressions that match a pattern

Example Search Strings

K = INPUT

Process all html tags of type INPUT.

E = ?firstname

Process all html a5w tags where trimmed expression is ?firstname.

E$\?x_customer\.body.?

Process all html a5w tags where the trimmed expression matches the regular expression \?x_customer\.body.?.

U=HTML|BODY|TABLE#1

Process object that matches unique name HTML|BODY|TABLE#1.

Example

Retrieve the contents of a tag

Example 1 uses *html_process() to extract information from HTML. In the interactive window create an HTML string and extract the inner text of the title tag.

html = "<html><title>The html pages title</title><body></body></html>"
? *html_process(html, "I", "", "K=TITLE")
= "The html pages title"

The result is the inner HTML (Format: "I") of tags of type TITLE (Search: "K=TITLE").

Retrieve the IDs of all the controls of a type

Example 2 retrieves the IDs or names of all the input controls on an html page.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "N" + crlf() , "", "K=INPUT")
= input:firstname
input:lastname

Retrieve an outline list of all HTML tags

Example 3 generates an outline list of tags in the HTML text, which could be used to build a simple DOM browser.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html,"U"+crlf() )
= html
html|title
html|body
html|body|input:firstname
html|body|input:lastname

This example visits all tags (except end tags) and returns a hierarchical unique name for each.

Retrieve a named tag

Given a unique name, Example 4 extracts a tag.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "T", "", "U=html|body|input:firstname")
= <input id="firstname">

Combined with example 3, this example could be used to browse tags.
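
A minimal sketch of that combination is shown below, written as a script rather than for the Interactive Window. It assumes that Xbasic's FOR EACH ... NEXT statement can walk a CR-LF delimited string and expose each line as entry.value, and that every unique name returned by the "U" format can be passed back unchanged in a "U=" pattern. The variable names (names, report, entry) and the sample html are illustrative only.

```xbasic
dim html as c
dim names as c
dim report as c

' Sample input, invented for illustration.
html = "<html><title>Sample page</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"

' Step 1 (as in Example 3): one unique name per line.
names = *html_process(html, "U" + crlf())

' Step 2 (as in Example 4): look up each tag by its unique name
' and collect "name -> tag" lines.
report = ""
for each entry in names
    report = report + entry.value + " -> " + *html_process(html, "T", "", "U=" + entry.value) + crlf()
next

' Display the collected lines.
ui_msg_box("Tags", report)
```

Building the report as a string keeps the lookup logic separate from how the result is displayed, so the same loop could feed a list control instead of a message box.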

Retrieve Xbasic code that matches a regular expression

Example 5 extracts embedded Xbasic that matches a regular expression.

html= "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_customer.body.detail %><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
? *html_process(html,"E"+crlf() ,"A","E$.+customer.+")
= ? x_customer.body.title
? x_customer.body.detail

The "A" option indicates that only embedded Xbasic tags should be processed. The "E$" search denotes a regular expression to match against expressions. This makes use of the regular expression merge functionality.
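
The sketch below is a variant of Example 5 that adds the "I" option so the regular expression is matched without regard to case. It assumes option letters can be combined in one string in any order, as "MB" and "AM" are elsewhere on this page; the result is not shown because it is not taken from a real run.

```xbasic
' Sample input, shortened from Example 5.
html = "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_product.body.title %></body></html>"

' "A": process A5W tags only.
' "I": make the regular expression case-insensitive, so the
'      upper-case CUSTOMER is intended to match x_customer.
? *html_process(html, "E" + crlf(), "AI", "E$.+CUSTOMER.+")
```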

Remove a tag

Example 6 removes a tag from HTML.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html,"","M","U=html|body|input:firstname")
= <html><title>The html pages title</title><body><input id="lastname"></body></html>

When the merge flag is used (Option: "M") and the format is an empty string, *html_process() deletes the matching tag and returns the rest of the HTML.

Remove a tag block

Example 7 removes a tag block from HTML.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html,"","MB","U=html|title")
= <html><body><input id="firstname"><input id="lastname"></body></html>

When the merge and block flags are used (Option: "MB") and the format is an empty string, *html_process() deletes the whole matching block, including its contents and end tag.

Remove Xbasic expressions that match a pattern

Example 8 removes expressions that match a pattern.

html = "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_customer.body.detail %><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
? *html_process(html, "", "AM", "E$.+customer.+")
= "<html><body><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"

Since the format is empty, all matching expression tags get eliminated from the html.

See Also