Parsing XML Documents with Xbasic
- Parsing XML Documents
- Using the Xbasic XML Parser - A Tutorial
- Running Queries against the XML Data
- Element Queries
- Attribute Queries
- XML Parser Expression Reserved Words
- *elementId
- *value
- *xml
- *istop
- *tagnumber
- *tagcount
- *isleaf
- *depth
- *marked
- *fullname
- *tag
- Marking Elements in the XML Tree
- .MarkElements()
- .UnmarkAllElements()
- .UnmarkElements()
- .DeleteMarked()
- .MoveMarkedAfter()
- .MoveMarkedBefore()
- .MoveMarkedInside()
- XML Helper Functions
Description
Alpha Anywhere has a powerful XML parser built-in that can be used as an alternative to the Microsoft XML parser. The advantage of the Xbasic XML parser is that it can use all of the powerful string functions in Xbasic. The Microsoft XML parsers are more complex to use because you have to use OLE Automation.
Parsing XML Documents
With the Alpha Anywhere XML parser you can:
Extract information from XML data
Transform XML data (much like an XSLT)
Add elements and attributes to XML data
Remove elements and attributes from XML data
Change attribute values in XML data
Reorder elements in XML data
Using the Xbasic XML Parser - A Tutorial
Assume that you have an XML file with the following data in it:
<employee>
    <name city="Boston">
        <firstname>Frank</firstname>
        <lastname>Smith</lastname>
    </name>
    <name city="Ithaca">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>
The following sample Interactive window session shows some of the features of the XML parser:
'Get a new instance of the XML parser
sm = xmlSchemaManager.Get()
'Load the XML file into a variable
xml = get_from_file("c:\data\testxml.xml")
'Load the XML data into the XML parser.
'The .LoadXML() method or the .LoadUnBalancedXML() method can be used.
'If you want to parse HTML (where unbalanced tags are allowed), then the .LoadUnBalancedXML() method
'should be used
dom = sm.LoadXML(xml)
TIP: Xbasic also provides a simple high-level function, *XML_Document(), that returns a parsed XML document in a single step, without first instantiating the xmlSchemaManager object. The following single Xbasic command is equivalent to the commands above that parse the XML document:
dom = *XML_Document(xml)
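In a script, it can be useful to confirm that the file exists before parsing it. The following is a minimal sketch, not a definitive pattern: it assumes the same sample path used above, and the variable name `filename` and the message text are illustrative.

```xbasic
'Sketch: load and parse the sample file only if it is present.
'The path is the sample path from this tutorial.
dim filename as C
filename = "c:\data\testxml.xml"
if file.exists(filename) then
    'Read the file and parse it in one step
    xml = get_from_file(filename)
    dom = *XML_Document(xml)
else
    'Report the problem instead of parsing an empty string
    ui_msg_box("XML load", "File not found: " + filename)
end if
```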
Once you have loaded the XML data into the schema manager (i.e. you have parsed the XML - it is loaded into a 'parse tree'), you can start examining the properties and working with the methods of the schema manager. For example:
'The .top property references the outermost element.
'All elements have an .OuterXML property (among many other properties)
'Notice that the bubble help in the Interactive window will show you all of the properties.
?dom.top.OuterXML
= <employee>
    <name city="Boston">
        <firstname>Frank</firstname>
        <lastname>Smith</lastname>
    </name>
    <name city="Ithaca">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>
When the XML is parsed, an array of all of the elements is automatically created.
'The XML data has 7 elements
?dom.all.size()
= 7

'The .Output() method can be used to dump information from the XML parse tree.
'The .Output() method can use specially named symbols in the output expression.
'For example, *elementId is the index into the 'all' array (seen above), and
'*tag is the name of the element. The output expression is a standard Xbasic expression.
?dom.Output("*elementId +' ' +*tag +crlf()")
= 1 employee
2 name
3 firstname
4 lastname
5 name
6 firstname
7 lastname

'Note that in this output expression, 'city' is an attribute value (we can tell because it does not start with *)
'Any 'fields' in the output expression that don't start with * are attribute names.
'Notice that only elements 2 and 5 have values for the 'city' attribute because the 'city'
'attribute is only defined for the 'name' element.
?dom.Output("*elementId +' ' +*tag + ' - ' + city + crlf()")
= 1 employee -
2 name - Boston
3 firstname -
4 lastname -
5 name - Ithaca
6 firstname -
7 lastname -

'The 'all' array contains all of the XML elements
'The outerXML property is the entire XML string for that element
?dom.all[5].outerxml
= <name city="Ithaca">
    <firstname>Milton</firstname>
    <lastname>Jones</lastname>
</name>

'The innerXML property is the XML that is contained by that element.
?dom.all[5].innerxml
= <firstname>Milton</firstname>
<lastname>Jones</lastname>
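The 'all' array can also be walked directly from a script rather than through .Output(). The following is a minimal sketch that assumes `dom` was loaded as shown above. It relies only on the .all.size() method and the .OuterXML property demonstrated in this section; the variable names `i` and `report` are illustrative.

```xbasic
'Sketch: build a report with the index and outer XML of every element.
'Assumes 'dom' already holds the parsed document from the steps above.
dim i as N
dim report as C
report = ""
for i = 1 to dom.all.size()
    'alltrim(str(i)) converts the loop counter to text without padding
    report = report + "Element " + alltrim(str(i)) + ":" + crlf()
    report = report + dom.all[i].OuterXML + crlf()
next i
```

After the loop, `report` holds one entry per element, in the same order as the *elementId values shown above.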
Once you have parsed the XML, it is easy to modify it. For example, here is how we can delete elements in the XML data. This section shows how nodes in the XML tree can be 'marked' and then deleted.
'Make sure all elements are initially unmarked
'If .T. is returned, then there were some marked elements.
'If .F. is returned, then there were no marked elements.
'In this case since we have not previously marked any elements, .F. is returned.
?dom.UnmarkAllElements()
= .F.

'Now mark all elements that have a 'city' attribute equal to 'Boston'
'Note that the query expression is a standard Xbasic filter expression
'No need for cryptic XPath syntax!!
'The second argument is set to .T.. This causes the child elements of each found
'element to also be marked.
'The method returns .T. in this case, indicating that at least one match was found.
?dom.MarkElements("city='boston'",.t.)
= .T.

'Now delete the marked elements
dom.DeleteMarked()

'Examine the resulting XML
'It looks as expected, but it contains blank rows where the deleted elements were.
?dom.top.OuterXML
= <employee>
    <name city="Ithaca">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>

'Call the .reformat() method and examine the XML again
dom.Reformat()
?dom.top.OuterXML
= <employee>
    <name city="Ithaca">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>
Xbasic has several ways to navigate the XML DOM after it has been parsed. The examples that follow assume the original XML has been reloaded, so that both 'name' elements are present again. First, let's find out how many nodes exist at the top level of the XML tree.
?dom.top.children.size()
= 2
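The unmark, mark, delete and reformat steps above can be wrapped in a small helper. The following is a sketch only: the function name `xml_delete_matching` and its parameter names are hypothetical, and it uses just the methods shown in this section.

```xbasic
'Sketch: return a copy of the XML with all elements that match
'a filter expression (and their children) removed.
'xml_delete_matching, xml and filter_expr are hypothetical names.
function xml_delete_matching as C (xml as C, filter_expr as C)
    dim dom as P
    dom = *XML_Document(xml)
    'Start from a clean state so that only new matches are deleted
    dom.UnmarkAllElements()
    'The second argument (.t.) also marks the children of each match
    if dom.MarkElements(filter_expr, .t.) then
        dom.DeleteMarked()
        'Remove the blank rows left behind by the deleted elements
        dom.Reformat()
    end if
    xml_delete_matching = dom.top.OuterXML
end function
```

For example, `xml_delete_matching(xml, "city='boston'")` would be expected to return the sample document without the Boston 'name' element.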
There are two nodes. Let's get a pointer to the first node.
c1 = dom.top.children[1]
This object now has several properties, one of which is 'OuterXML'. This property can be read and set. Let's first examine it:
?c1.OuterXML
= <name city="Boston">
    <firstname>Frank</firstname>
    <lastname>Smith</lastname>
</name>
Now, let's set it to a new value:
c1.OuterXML = "<name city=\"Atlanta\"></name>"
And now let's examine the entire XML tree to see our change:
?dom.top.OuterXML
= <employee>
    <name city="Atlanta"/>
    <name city="Ithaca">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>
Note that the 'InnerXML' property is similar to the 'OuterXML' property, but it does not include the enclosing tags:
?dom.top.children[2].innerXML
= <firstname>Milton</firstname>
<lastname>Jones</lastname>
Each element in the XML tree can have an arbitrary number of attributes. You can read and set these attributes, and you can create new attributes. In our example, 'city' is an attribute of the 'name' element. To find out how many attributes a particular element has, get a pointer to the element and then call the .size() method of its 'attribute' array. For example, let's examine the second element in our XML tree. First, get a pointer to the element:
c2 = dom.top.children[2]
See how many attributes this element has:
?c2.attribute.size()
= 1
Check to see if a particular attribute of this element exists:
?c2.AttributeExists("city")
= .T.
Now, get the attribute's value:
?c2.AttributeGet("city")
= "Ithaca"
Now, set the attribute to a new value:
c2.AttributeSet("city","Binghamton")
Inspect the element's 'OuterXML' to confirm that the change was made:
?c2.OuterXML
= <name city="Binghamton">
    <firstname>Milton</firstname>
    <lastname>Jones</lastname>
</name>
Now create a new attribute for the element:
c2.AttributeSet("population","123000")
And again, inspect the element's 'OuterXML':
?c2.OuterXML
= <name city="Binghamton" population="123000">
    <firstname>Milton</firstname>
    <lastname>Jones</lastname>
</name>
If we check the size of the 'attribute' array, we see that it now has two entries:
?c2.attribute.size()
= 2
We can read the name or the value of any entry in the 'attribute' array:
?c2.attribute[1].name
= "city"
?c2.attribute[1].value
= "Binghamton"
Attributes can be dropped:
c2.AttributeDrop("population")
?c2.OuterXML
= <name city="Binghamton">
    <firstname>Milton</firstname>
    <lastname>Jones</lastname>
</name>
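Because the 'attribute' array exposes .size(), .name and .value, all attributes of an element can be listed in a loop. The following is a minimal sketch; the function name `element_attribute_list` and its parameter `elem` are hypothetical, and only the properties shown above are used.

```xbasic
'Sketch: list the attributes of one element as name=value lines.
'element_attribute_list and elem are hypothetical names.
function element_attribute_list as C (elem as P)
    dim i as N
    dim result as C
    result = ""
    for i = 1 to elem.attribute.size()
        'Each entry in the attribute array has a name and a value
        result = result + elem.attribute[i].name + "=" + elem.attribute[i].value + crlf()
    next i
    element_attribute_list = result
end function
```

It could be called from the Interactive window as `?element_attribute_list(dom.top.children[2])`.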
Running Queries against the XML Data
Element Queries
The Xbasic XML parser lets you run queries against your XML file using familiar Xbasic query syntax. There is no need to learn complicated XPath syntax, which is normally used to query XML files. In the following example, we run element queries. You can also run attribute queries, which we show later. The .QueryElement() method is used to run element queries. This method creates an 'element array' - an array of all elements that match the query expression. (You can think of an element query as returning a 'pruned' version of the DOM - without a top node.) In this example, we find all elements that have a tag name of 'firstname':
q2 = dom.QueryElement("*tag = 'firstname'")
We can find out how many elements were found by calling the .size() method of the 'all' array:
?q2.all.size()
= 2
We can dump the values of the elements:
?q2.Output("*value+crlf()")
= Frank
Milton
Now, let's do a more complex search. Notice that we are using familiar Xbasic filter syntax:
q3 = dom.QueryElement("*tag = 'name' .and. city = 'boston'")
?q3.all.size()
= 1
?q3.all[1].outerXML
= <name city="Boston">
    <firstname>Frank</firstname>
    <lastname>Smith</lastname>
</name>
Note that the query object is still linked to the XML parser. Any changes made to the XML when working with the results of a query will be reflected in the full XML tree (i.e. the XML shown by dom.top.OuterXML in these examples).
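The link between a query result and the full tree can be illustrated with a short script. The following is a sketch under two assumptions: that `dom` holds the original, unmodified sample document, and that elements in the query's 'all' array expose the same .AttributeSet() method as elements reached through dom.top.children. The variable names are illustrative.

```xbasic
'Sketch: add a 'state' attribute to every Boston 'name' element
'by working through the query result.
dim i as N
q = dom.QueryElement("*tag = 'name' .and. city = 'boston'")
for i = 1 to q.all.size()
    'Assumption: query result elements support AttributeSet()
    q.all[i].AttributeSet("state","MA")
next i
'Because the query result is linked to the parser, the change
'should also be visible in the full tree.
full_xml = dom.top.OuterXML
```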
Attribute Queries
Attribute queries are typically less useful. An attribute query returns an 'attribute array' - an array of all attributes that match the 'filter' expression. (In the case of an attribute query, you do not enter a logical filter expression. Instead, you specify a CR-LF delimited list of attribute names). Let's add a new attribute to our XML and then do an attribute query:
c1 = dom.top.children[1]
c1.AttributeSet("state","MA")
qa1 = dom.queryAttributes("state")
?qa1.all.size()
= 1
?qa1.all[1].value
= "MA"
Now, let's search for multiple attributes:
qa1 = dom.queryAttributes("state" + crlf() + "city")
?qa1.all.size()
= 3
?qa1.all[1].value
= "MA"
?qa1.all[1].name
= "state"
The .DumpFormat(), .GetValues() and .SetValues() methods can be used with the result of an attribute query. Here we dump out the element name, attribute name and value of each item in the array returned by the attribute query:
?qa1.DumpFormat("E.N=V" + crlf() )
= name.state=MA
name.city=Boston
name.city=Ithaca
Here we get a CR-LF delimited list of attribute values:
?qa1.GetValues()
= MA
Boston
Ithaca
Now, let's do some transformations on these values:
list = qa1.GetValues()
list = alltrim( upper(list) )
qa1.SetValues(list)
?dom.top.OuterXML
= <employee>
    <name city="BOSTON" state="MA">
        <firstname>Frank</firstname>
        <lastname>Smith</lastname>
    </name>
    <name city="ITHACA">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>
Important: If your XML tree does not have a top element, the XML parser will automatically insert one.
XML Parser Expression Reserved Words
The following table shows the reserved words that can be used wherever an expression is accepted (i.e. with the .Output(), .QueryElement(), .MarkElements(), .UnmarkElements(), .Resolve(), and .FindElement() methods):
*elementId
Index into the all[] array. This is the order in which the elements appear in the XML data.
*value
Contents of the XML element as raw text
*xml
Inner XML of an element
*istop
Returns .T. if the element is the top most element
*tagnumber
Child number of the element within the current parent element
*tagcount
Number of siblings for this element
*isleaf
Returns .T. if an element has no children
*depth
How many nodes deep is the current element (if *istop is .T., then *depth is 1)
*marked
Returns .T. if the current element has been marked. Elements are marked by setting the element's .Marked property, or by calling the .MarkElements() method.
*fullname
The fully qualified tag name. A dot separated list of this tag name and all of the parents. Assume you have an <employees> tag with a child <name> tag, with a child <firstname> tag. The *fullname of the 'firstname' tag is 'employees.name.firstname'.
*tag
The current element name. The *tag reserved word can be used with the following 'navigation' directives. Navigation directives are delimited with periods:
*parent - the parent tag
*prev - the previous sibling
*next - the next sibling
*first - the first sibling on the current branch
*last - the last sibling on the current branch.
Navigation directives can be nested to an arbitrary depth. For example:
This syntax can also be used to get attribute values. For example, the following syntax will get the value of the 'city' attribute from the current element's parent element.
*tag.*parent.city
The following example shows how the navigation directives can be used to create complex output:
q1 = dom.QueryElement("*tag = 'firstname'")
?q1.Output("*value + ' ' + *tag.*next.*value + ' from ' + *tag.*parent.city + crlf()")
= Frank Smith from BOSTON
Milton Jones from ITHACA
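The other reserved words can be combined in the same way. The following is a minimal sketch that lists every leaf element with its fully qualified name and value. It assumes that *isleaf can be compared with .t. in a query expression and that *fullname and *value are character values in an output expression. The variable names `leaves` and `listing` are illustrative.

```xbasic
'Sketch: list every leaf element as fullname = value.
'Assumes 'dom' holds a parsed document.
leaves = dom.QueryElement("*isleaf = .t.")
'Build one line per leaf element from the reserved words
listing = leaves.Output("*fullname + ' = ' + *value + crlf()")
```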
Marking Elements in the XML Tree
Marking elements is useful because it allows you to move and delete elements from the XML tree. You can use an Xbasic filter expression to select the elements that you want to mark. Having marked elements, you can then use methods to move and delete the marked elements. For example:
?dom.top.OuterXML
= <employee>
    <name city="BOSTON" state="MA">
        <firstname>Frank</firstname>
        <lastname>Smith</lastname>
    </name>
    <name city="ITHACA">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
</employee>
Now, let's mark the element that has a city attribute equal to 'Boston':
?dom.MarkElements("city='boston'")
= .T.
Note that .T. indicates that at least one element was selected and marked. The following shows that the first child of the top element was marked, and the second was not:
?dom.top.children[1].marked
= .T.
?dom.top.children[2].marked
= .F.
Get a pointer to the second element of the top parent:
c2 = dom.top.children[2]
Now move all of the marked elements after this element:
c2.MoveMarkedAfter()
And here is how the XML tree has been transformed:
?dom.top.OuterXML
= <employee>
    <name city="ITHACA">
        <firstname>Milton</firstname>
        <lastname>Jones</lastname>
    </name>
    <name city="BOSTON" state="MA">
        <firstname>Frank</firstname>
        <lastname>Smith</lastname>
    </name>
</employee>
The following methods are useful for working with marked elements:
<document>.MarkElements()
Marks elements that satisfy a query expression. NOTE: Remember to unmark all elements before doing a new query to mark elements!
<document>.UnmarkAllElements()
Unmark all elements.
<document>.UnmarkElements()
Unmark elements selected by a query expression.
<document>.DeleteMarked()
Remove marked elements from the XML tree.
<element>.MoveMarkedAfter()
Move all marked elements after this element.
<element>.MoveMarkedBefore()
Move all marked elements before this element.
<element>.MoveMarkedInside()
Move all marked elements to be children of the current element.
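The other move methods follow the same mark-then-move pattern. The following is a sketch that assumes .MoveMarkedBefore() is called without arguments, like .MoveMarkedAfter() in the example above, and that `dom` is in the state shown after that example (Ithaca first, Boston second). The variable name `first_child` is illustrative.

```xbasic
'Sketch: move the Boston 'name' element back in front of the first child.
'Unmark first, as recommended in the note for .MarkElements()
dom.UnmarkAllElements()
if dom.MarkElements("city='boston'") then
    first_child = dom.top.children[1]
    'Assumption: same calling convention as MoveMarkedAfter()
    first_child.MoveMarkedBefore()
end if
'Tidy up any blank rows left by the move
dom.Reformat()
```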
XML Helper Functions
A common requirement when working with XML data is to quickly extract some attribute values from an XML element. Xbasic provides a function to do this. The *XML_PEEK_ATTRIBUTE() function can be used to extract attribute values from the top-level element of an XML string. The following examples demonstrate the function.
xml = "<data city='Boston' firstname='Fred' lastname='Smith'/>"
?*XML_PEEK_ATTRIBUTE(xml,"city")
= "Boston"
?*XML_PEEK_ATTRIBUTE(xml,"firstname")
= "Fred"
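Several peeked values can be combined in an ordinary Xbasic expression. The following is a minimal sketch using the same sample string; the variable names `fullname` and `caption` are illustrative.

```xbasic
'Sketch: build a display string from several attributes of one element.
xml = "<data city='Boston' firstname='Fred' lastname='Smith'/>"
'Join the first and last name with a space
fullname = *XML_PEEK_ATTRIBUTE(xml,"firstname") + " " + *XML_PEEK_ATTRIBUTE(xml,"lastname")
'Append the city in parentheses
caption = fullname + " (" + *XML_PEEK_ATTRIBUTE(xml,"city") + ")"
```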