Xbasic

*HTML_PROCESS Function

Syntax

Processed_HTML as C = *HTML_PROCESS(C html,C format[,C options[,C patterns]])

Arguments

html

The HTML text to process.

format

Character. Indicates which type of data to return for each match. The same letters serve as the <Format> prefix in patterns.

"K": Kind of HTML tag (e.g. TABLE, INPUT, %A5)
"N": Name (Type + Name or id)
"H": Hierarchy (Parents + Type + Name or id)
"U": Guaranteed unique name (Hierarchy + optional counter)
"T": Contents of tag (including <> characters)
"X": Embedded Xbasic code (for A5W tags)
"E": Trimmed embedded Xbasic code (useful for expressions, where leading and trailing CR-LFs are unwanted)
"I": Inner HTML (inside a block tag)
"O": Outer HTML (includes the outside tags of a block, or just the tag itself if the tag is not a block)

options

Indicates how to process the data that is found.

"A": process A5W tags only
"E": process end tags as well as start tags
"M": output merge - include all code not matched by the patterns
"I": make regular expressions in the search case-insensitive
"T": merge resumes after the current matching tag
"B": merge resumes after the current tag block

patterns

A CR-LF delimited list of search options required to generate a match:

"<Format>=<value>": Match an exact string.
"<Format>!=<value>": Find strings that don't match.
"<Format>$<value>": Find strings that match a regular expression.
"<Format>!$<value>": Find strings that don't match a regular expression.

Description

HTML processor: extracts or removes tags and embedded Xbasic code that match a set of patterns.

Discussion

The *HTML_PROCESS() function extracts information from HTML text or returns a modified copy of it. For example, it can return:

the contents of a tag (such as the page title)
the IDs of all the controls of a type (such as the input controls on the page)
an outline list of all HTML tags
a named tag
Xbasic code that matches a regular expression
a version of the HTML without a tag
a version of the HTML without a tag block
a version of the HTML without Xbasic expressions that match a pattern

Example: Search Strings
K=INPUT: Process all HTML tags of type INPUT.
E=?firstname: Process all A5W tags whose trimmed expression is ?firstname.
E$\?x_customer\.body.?: Process all A5W tags whose trimmed expression matches the regular expression \?x_customer\.body.?
U=HTML|BODY|TABLE#1: Process the object whose unique name is HTML|BODY|TABLE#1.
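
Because patterns is a CR-LF delimited list, several search strings can be passed in one call. The following is an untested sketch, not a confirmed behavior: it assumes that every line in the list must match for a tag to be selected (as "required to generate a match" suggests) and that "N" values take the type:id form seen in the output of Example 2 below. The variable name patterns is only illustrative.

```xbasic
html = "<html><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
' Two search strings, one per line (assumption: both must match):
' tags of kind INPUT whose name is not input:lastname.
patterns = "K=INPUT" + crlf() + "N!=input:lastname"
' Return the contents of each matching tag, one per line.
? *html_process(html, "T" + crlf(), "", patterns)
```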

Example

Retrieve the contents of a tag

Example 1 uses *html_process() to extract information from HTML. In the Interactive Window, create an HTML string and extract the inner text of the title tag.

html = "<html><title>The html pages title</title><body></body></html>"
? *html_process(html, "I", "", "K=TITLE")
= "The html pages title"

The result is the inner HTML (Format: "I") of tags of type TITLE (Search: "K=TITLE").
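
As a variation, the same search can request the outer HTML instead. This is an untested sketch that assumes the "O" format behaves as described under Arguments, in which case the result would also include the enclosing title tags rather than only the text between them.

```xbasic
html = "<html><title>The html pages title</title><body></body></html>"
' Same search as Example 1, but format "O" (outer HTML) instead of "I" (inner HTML).
? *html_process(html, "O", "", "K=TITLE")
```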

Retrieve the IDs of all the controls of a type

Example 2 retrieves the IDs or names of all the input controls on an html page.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "N" + crlf(), "", "K=INPUT")
= input:firstname
input:lastname

Retrieve an outline list of all HTML tags

Example 3 generates an outline list of tags in the HTML text, which could be used to build a simple DOM browser.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "U" + crlf())
= html
html|title
html|body
html|body|input:firstname
html|body|input:lastname

This example visits all tags (except end tags) and returns a hierarchical unique name for each.

Retrieve a named tag

Given a unique name, Example 4 extracts a tag.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "T", "", "U=html|body|input:firstname")
= <input id="firstname">

Combined with example 3, this example could be used to browse tags.
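
One way the two examples might be combined is sketched below. It is untested and rests on assumptions: list_tags is a hypothetical helper name, the outline from Example 3 is taken to be a CR-LF delimited list that FOR EACH can iterate, and each unique name is assumed to be usable unchanged in a "U=" search string.

```xbasic
' Hypothetical helper: pairs each unique tag name with the tag it identifies.
function list_tags as c (html as c)
    dim names as c
    dim report as c
    report = ""
    ' Step 1 (Example 3): one unique hierarchical name per line.
    names = *html_process(html, "U" + crlf())
    for each entry in names
        ' Skip a blank entry, in case the list ends with a trailing CR-LF.
        if entry.value <> "" then
            ' Step 2 (Example 4): fetch the tag that has this unique name.
            report = report + entry.value + " -> " + *html_process(html, "T", "", "U=" + entry.value) + crlf()
        end if
    next
    list_tags = report
end function
```

With the function defined, ? list_tags(html) in the Interactive Window would print one line per tag, which is the kind of listing a simple DOM browser could start from.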

Retrieve Xbasic code that matches a regular expression

Example 5 extracts embedded Xbasic that matches a regular expression.

html= "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_customer.body.detail %><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
? *html_process(html,"E"+crlf() ,"A","E$.+customer.+")
= ? x_customer.body.title
? x_customer.body.detail

The "A" option indicates that only embedded Xbasic tags should be processed. The "E$" search denotes a regular expression to match against expressions. This makes use of the regular expression merge functionality.

Remove a tag

Example 6 removes a tag from HTML.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html, "", "M", "U=html|body|input:firstname")
= <html><title>The html pages title</title><body><input id="lastname"></body></html>

When the merge flag is used (Option: "M") and the format is an empty string, *html_process() deletes the matching tag and returns the rest of the HTML.

Remove a tag block

Example 7 removes a tag block from HTML.

html = "<html><title>The html pages title</title><body><input id=\"firstname\"><input id=\"lastname\"></body></html>"
? *html_process(html,"","MB","U=html|title")
= <html><body><input id="firstname"><input id="lastname"></body></html>

When the merge and block flags are used (Option: "MB") and the format is an empty string, *html_process() deletes the whole matching block, including its contents and end tag.

Remove Xbasic expressions that match a pattern

Example 8 removes expressions that match a pattern.

html = "<html><body><%a5 ? x_customer.body.title %><%a5 ? x_customer.body.detail %><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"
? *html_process(html, "", "AM", "E$.+customer.+")
= "<html><body><%a5 ? x_product.body.title %><%a5 ? x_product.body.detail %></body></html>"

Since the format is empty, all matching expression tags get eliminated from the html.