Mirror of https://github.com/darlinghq/darling-libxml2.git (synced 2025-01-10 22:46:34 +00:00)
doc/xmlreader.html minor cleanups
Tue Nov 4 21:16:47 MST 2003 John Fleck <jfleck@inkstain.net>

* doc/xmlreader.html
minor cleanups
parent 30ce0dd0f8
commit dbf6ae87aa
@@ -1,3 +1,8 @@
+Tue Nov 4 21:16:47 MST 2003 John Fleck <jfleck@inkstain.net>
+
+* doc/xmlreader.html
+minor cleanups
+
 Tue Nov 4 15:52:28 PST 2003 William Brack <wbrack@mmm.com.hk>
 
 * include/libxml/xmlversion.h.in: changed macro ATTRIBUTE_UNUSED
@@ -67,19 +67,19 @@ namespace and base processing relatively hard.</p>
 
 <p>The <a
 href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">XmlTextReader
-API from C#</a> provides a far simpler programming model, the API act as a
+API from C#</a> provides a far simpler programming model. The API acts as a
 cursor going forward on the document stream and stopping at each node in the
-way. The user code keep the control of the progresses and simply call a
+way. The user's code keeps control of the progress and simply calls a
 Read() function repeatedly to progress to each node in sequence in document
 order. There is direct support for namespaces, xml:base, entity handling and
 adding DTD validation on top of it was relatively simple. This API is really
 close to the <a href="http://www.w3.org/TR/DOM-Level-2-Core/">DOM Core
 specification</a> This provides a far more standard, easy to use and powerful
-API than the existing SAX. Moreover integrating extension feature based on
+API than the existing SAX. Moreover integrating extension features based on
 the tree seems relatively easy.</p>
 
 <p>In a nutshell the XmlTextReader API provides a simpler, more standard and
-more extensible interface to handle large document than the existing SAX
+more extensible interface to handle large documents than the existing SAX
 version.</p>
 
 <h2><a name="Walking">Walking a simple tree</a></h2>
@@ -125,12 +125,12 @@ int streamFile(char *filename) {
 <li>the creation of the reader using a filename</li>
 <li>the repeated call to xmlTextReaderRead() and how any return value
 different from 1 should stop the loop</li>
-<li>that a negative return mean a parsing error</li>
+<li>that a negative return means a parsing error</li>
 <li>how xmlFreeTextReader() should be used to free up the resources used by
 the reader.</li>
 </ul>
 
-<p>Here is a similar code in python for exactly the same processing:</p>
+<p>Here is similar code in python for exactly the same processing:</p>
 <pre>import libxml2
 
 def processNode(reader):
@@ -155,13 +155,13 @@ def streamFile(filename):
 href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">xmlTextReader
 is abstracted as a class like in C#</a> with the same method names (but the
 properties are currently accessed with methods) and that one doesn't need to
-free the reader at the end of the processing, it will get garbage collected
-once all references have disapeared</p>
+free the reader at the end of the processing. It will get garbage collected
+once all references have disapeared.</p>
 
-<h2><a name="Extracting">Extracting informations for the current node</a></h2>
+<h2><a name="Extracting">Extracting information for the current node</a></h2>
 
-<p>So far the example code did not indicate how informations were extracted
-from the reader, it was abstrated as a call to the processNode() routine,
+<p>So far the example code did not indicate how information was extracted
+from the reader. It was abstrated as a call to the processNode() routine,
 with the reader as the argument. At each invocation, the parser is stopped on
 a given node and the reader can be used to query those node properties. Each
 <em>Property</em> is available at the C level as a function taking a single
@@ -223,7 +223,7 @@ content of the XML test file.</p>
 <p>For the minimal document "<code><doc/></code>" we get:</p>
 <pre>0 1 doc 1</pre>
 
-<p>Only one node is found, its depth is 0, type 1 indocate an element start,
+<p>Only one node is found, its depth is 0, type 1 indicate an element start,
 of name "doc" and it is empty. Trying now with
 "<code><doc></doc></code>" instead leads to:</p>
 <pre>0 1 doc 0
@@ -252,7 +252,7 @@ character data are reported:</p>
 1 1 c 1 None
 0 15 doc 0 None</pre>
 
-<p>There is a few things to note:</p>
+<p>There are a few things to note:</p>
 <ul>
 <li>the increase of the depth value (first row) as children nodes are
 explored</li>
@@ -286,16 +286,16 @@ the xmllint.c module in the source distribution:</p>
 }
 }</pre>
 
-<h2><a name="Extracting1">Extracting informations for the attributes</a></h2>
+<h2><a name="Extracting1">Extracting information for the attributes</a></h2>
 
 <p>The previous examples don't indicate how attributes are processed. The
 simple test "<code><doc a="b"/></code>" provides the following
 result:</p>
 <pre>0 1 doc 1 None</pre>
 
-<p>This prove that attributes nodes are not traversed by default. The
+<p>This proves that attribute nodes are not traversed by default. The
 <em>HasAttributes</em> property allow to detect their presence. To check
-their content the API has special instructions basically 2 kind of operations
+their content the API has special instructions. Basically two kinds of operations
 are possible:</p>
 <ol>
 <li>to move the reader to the attribute nodes of the current element, in
@@ -339,20 +339,20 @@ by their name (and namespace):</p>
 print "-- %d %d (%s) [%s]" % (reader.Depth(), reader.NodeType(),
 reader.Name(),reader.Value())</pre>
 
-<p>the output for the same input document reflects the attribute:</p>
+<p>The output for the same input document reflects the attribute:</p>
 <pre>0 1 doc 1 None
 -- 1 2 (a) [b]</pre>
 
-<p>There is a couple of things to note on the attribute processing:</p>
+<p>There are a couple of things to note on the attribute processing:</p>
 <ul>
-<li>their depth is the one of the carrying element plus one</li>
-<li>namespace declarations are seen as attributes like in DOM</li>
+<li>Their depth is the one of the carrying element plus one.</li>
+<li>Namespace declarations are seen as attributes, as in DOM.</li>
 </ul>
 
 <h2><a name="Validating">Validating a document</a></h2>
 
-<p>Libxml2 implementation adds some extra feature on top of the XmlTextReader
-API, the main one is the ability to DTD validate the parsed document
+<p>Libxml2 implementation adds some extra features on top of the XmlTextReader
+API. The main one is the ability to DTD validate the parsed document
 progressively. This is simply the activation of the associated feature of the
 parser used by the reader structure. There are a few options available
 defined as the enum xmlParserProperties in the libxml/xmlreader.h header
@@ -381,7 +381,7 @@ and set the values of those parser properties of the reader. For example</p>
 if ret != 0:
 print "Error parsing and validating %s" % (file)</pre>
 
-<p>This routine will parse and validate the file. Errors message can be
+<p>This routine will parse and validate the file. Error messages can be
 captured by registering an error handler. See python/tests/reader2.py for
 more complete Python examples. At the C level the equivalent call to cativate
 the validation feature is just:</p>
@@ -460,7 +460,7 @@ while reader.Read():
 if reader.Next() != 1: # skip the subtree
 break;</pre>
 
-<p>Note however that the node instance returned by the Expand() call is only
+<p>Note, however that the node instance returned by the Expand() call is only
 valid until the next Read() operation. The Expand() operation does not
 affects the Read() ones, however usually once processed the full subtree is
 not useful anymore, and the Next() operation allows to skip it completely and
|