diff --git a/ChangeLog b/ChangeLog index 75d7082b..62f609f6 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +Wed Oct 24 14:34:25 CEST 2001 Daniel Veillard + + * doc/site.xsl doc/*.html doc/Makefile.am: now autogenerate + the web site from the main HTML document. + Tue Oct 23 14:32:04 CEST 2001 Daniel Veillard * parser.c: fixed an erroneous validation bug when PE refs diff --git a/doc/DOM.html b/doc/DOM.html new file mode 100644 index 00000000..b4269174 --- /dev/null +++ b/doc/DOM.html @@ -0,0 +1,71 @@ + + + + + +DOM Principles + + + + + +
+Gnome LogoW3C LogoRed Hat Logo +
+

The XML C library for Gnome

+

DOM Principles

+
+
+ + +
+ + +
Main Menu
+

+DOM stands for the Document +Object Model; this is an API for accessing XML or HTML structured +documents. Native support for DOM in Gnome is on the way (module gnome-dom), +and will be based on gnome-xml. This will be a far cleaner interface to +manipulate XML files within Gnome since it won't expose the internal +structure.

+

The current DOM implementation on top of libxml is the gdome2 Gnome module; this +is a full DOM interface, thanks to Paolo Casarini. Check the Gdome2 homepage for more +information.

+

Daniel Veillard

+
+ + diff --git a/doc/Makefile.am b/doc/Makefile.am index b995b0ee..799a2a76 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -12,11 +12,18 @@ DOC_SOURCE_DIR=.. HTML_DIR=@HTML_DIR@ TARGET_DIR=$(HTML_DIR)/$(DOC_MODULE)/html +PAGES= architecture.html bugs.html contribs.html docs.html DOM.html \ + downloads.html entities.html example.html help.html index.html \ + interface.html intro.html library.html namespaces.html news.html \ + tree.html valid.html XML.html XSLT.html man_MANS = xmlcatalog.1 -# htmldir = $(prefix)/html -# html_DATA = gnome-dev-info.html +all: $(PAGES) + +$(PAGES): xml.html site.xsl + @(if [ -x /usr/bin/xsltproc ] ; then \ + /usr/bin/xsltproc --html site.xsl xml.html > index.html ; fi ); scan: gtkdoc-scan --module=libxml --source-dir=$(DOC_SOURCE_DIR) --ignore-headers="acconfig.h config.h xmlwin32version.h win32config.h trio.h strio.h triop.h" @@ -52,6 +59,6 @@ install-data-local: -(cd $(DESTDIR); gtkdoc-fixxref --module=libxml --html-dir=$(HTML_DIR)) dist-hook: - (cd $(srcdir) ; tar cvf - *.1 *.html *.gif html/*.html html/*.sgml) | (cd $(distdir); tar xf -) + (cd $(srcdir) ; tar cvf - *.1 site.xsl *.html *.gif html/*.html html/*.sgml) | (cd $(distdir); tar xf -) .PHONY : html sgml templates scan diff --git a/doc/XML.html b/doc/XML.html new file mode 100644 index 00000000..97fc381b --- /dev/null +++ b/doc/XML.html @@ -0,0 +1,90 @@ + + + + + +XML + + + + + +
+

The XML C library for Gnome

+

XML

+
+
+ + +
+ + +
+

+XML is a standard for +markup-based structured documents. Here is an example XML +document:

+
<?xml version="1.0"?>
+<EXAMPLE prop1="gnome is great" prop2="&amp; linux too">
+  <head>
+   <title>Welcome to Gnome</title>
+  </head>
+  <chapter>
+   <title>The Linux adventure</title>
+   <p>bla bla bla ...</p>
+   <image href="linus.gif"/>
+   <p>...</p>
+  </chapter>
+</EXAMPLE>
+

The first line specifies that it's an XML document and can give useful +information about its encoding. The rest of the document is text whose +structure is specified by tags between angle brackets. Each tag opened has +to be closed; XML is pedantic about this. However, if an element is empty +(no content), a single tag can serve as both the opening and closing tag if +it ends with /> rather than with >. Note +that, for example, the image tag has no content (just an attribute) and is +closed by ending the tag with />.
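To make the "every opened tag must be closed" rule concrete, here is a deliberately simplified, self-contained C sketch. It does not use libxml, and `tags_balanced` is a hypothetical name. It checks nesting with a stack and accepts empty-element tags ending in `/>`. It skips `<?...?>`, but it ignores comments, CDATA sections, DOCTYPE declarations and any `>` inside attribute values, so treat it as an illustration only. A real application should let the libxml parser report well-formedness errors.

```c
#include <string.h>

#define MAX_DEPTH 64   /* deepest nesting this sketch tracks */
#define MAX_NAME  64   /* longest tag name it accepts */

/*
 * Return 1 if every start tag in `doc` is closed by a matching end tag
 * (or is an empty-element tag ending in "/>"), 0 otherwise.
 * Simplified: "<?...?>" is skipped, but comments, CDATA sections,
 * DOCTYPE and a '>' inside attribute values are not handled.
 */
int tags_balanced(const char *doc)
{
    char stack[MAX_DEPTH][MAX_NAME];   /* names of the currently open elements */
    int depth = 0;
    const char *p = doc;

    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        if (end == NULL)
            return 0;                          /* unterminated tag */
        if (p[1] == '?') {                     /* XML declaration or PI */
            p = end + 1;
            continue;
        }
        int closing = (p[1] == '/');
        const char *name = p + 1 + closing;
        size_t len = strcspn(name, " \t\r\n/>");
        if (len == 0 || len >= MAX_NAME)
            return 0;
        if (closing) {
            /* an end tag must match the innermost open element */
            if (depth == 0 || strlen(stack[depth - 1]) != len ||
                strncmp(stack[depth - 1], name, len) != 0)
                return 0;
            depth--;
        } else if (end[-1] != '/') {           /* "<x/>" opens and closes at once */
            if (depth == MAX_DEPTH)
                return 0;
            memcpy(stack[depth], name, len);
            stack[depth][len] = '\0';
            depth++;
        }
        p = end + 1;
    }
    return depth == 0;
}
```

For example, `tags_balanced("<a><b></a>")` returns 0 because `</a>` arrives while `b` is still open.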

+

XML can be applied successfully to a wide range of uses, from long-term +structured document maintenance (where it follows in the footsteps of SGML) to +simple data encoding mechanisms like configuration file formatting (glade), +spreadsheets (gnumeric), or even shorter-lived documents such as WebDAV, where +it is used to encode remote calls between a client and a server.

+

Daniel Veillard

+
+ + diff --git a/doc/XSLT.html b/doc/XSLT.html new file mode 100644 index 00000000..294384aa --- /dev/null +++ b/doc/XSLT.html @@ -0,0 +1,72 @@ + + + + + +XSLT + + + + + +
+

The XML C library for Gnome

+

XSLT

+
+
+ + +
+ + +
+

Check the separate libxslt page +

+

+XSL Transformations is a +language for transforming XML documents into other XML documents (or +HTML/textual output).

+

A separate library called libxslt is being built on top of libxml2. This +module "libxslt" can be found in the Gnome CVS base too.

+

You can check the features +supported and the progress made in the Changelog. +

+

Daniel Veillard

+
+ + diff --git a/doc/architecture.html b/doc/architecture.html new file mode 100644 index 00000000..0a31a4ec --- /dev/null +++ b/doc/architecture.html @@ -0,0 +1,80 @@ + + + + + +An overview of libxml architecture + + + + + +
+

The XML C library for Gnome

+

An overview of libxml architecture

+
+
+ + +
+ + +
+

Libxml is made of multiple components; some of them are optional, and most +of the block interfaces are public. The main components are:

+
    +
  • an Input/Output layer
  • +
  • FTP and HTTP client layers (optional)
  • +
  • an Internationalization layer managing encoding support
  • +
  • a URI module
  • +
  • the XML parser and its basic SAX interface
  • +
  • an HTML parser using the same SAX interface (optional)
  • +
  • a SAX tree module to build an in-memory DOM representation
  • +
  • a tree module to manipulate the DOM representation
  • +
  • a validation module using the DOM representation (optional)
  • +
  • an XPath module for global lookup in a DOM representation + (optional)
  • +
  • a debug module (optional)
  • +
+

Graphically this gives the following:

+

[Figure: a graphical view of the various libxml components]

+

+

Daniel Veillard

+
+ + diff --git a/doc/bugs.html b/doc/bugs.html new file mode 100644 index 00000000..ca7fe707 --- /dev/null +++ b/doc/bugs.html @@ -0,0 +1,101 @@ + + + + + +Reporting bugs and getting help + + + + + +
+

The XML C library for Gnome

+

Reporting bugs and getting help

+
+
+ + +
+ + +
+

Well, bugs or missing features are always possible, and I will make a +point of fixing them in a timely fashion. The best way to report a bug is to +use the Gnome +bug tracking database (make sure to use the "libxml" module name). I look +at reports there regularly and it's good to have a reminder when a bug is +still open. Check the instructions on +reporting bugs and be sure to specify that the bug is for the package +libxml.

+

There is also a mailing-list xml@gnome.org for libxml, with an on-line archive (old). To subscribe to this list, +please visit the associated Web page and +follow the instructions. Do not send code, I won't debug it +(but patches are really appreciated!).

+

Check the following before +posting:

+
    +
  • read the FAQ +
  • +
  • make sure you are using a recent + version, and that the problem still shows up in those
  • +
  • check the list + archives to see if the problem was reported already; if so, + there is probably a fix available. Similarly, check the registered + open bugs +
  • +
  • make sure you can reproduce the bug with xmllint or one of the test + programs found in source in the distribution
  • +
  • please send the command showing the error as well as the input (as an + attachment)
  • +
+

Then send the bug, with the information needed to reproduce it, to the xml@gnome.org list; if it's really libxml +related I will approve it. Please do not send me mail directly: it makes +things much harder to track, and in some cases I'm not the best person to +answer a given question, so ask the list instead.

+

Of course, bugs reported with a suggested patch for fixing them will +probably be processed faster.

+

If you're looking for help, a quick look at the list archive may actually +provide the answer; I usually send source samples when answering libxml usage +questions. The auto-generated +documentation is not as polished as I would like (I need to learn more +about DocBook), but it's a good starting point.

+

Daniel Veillard

+
+ + diff --git a/doc/contribs.html b/doc/contribs.html new file mode 100644 index 00000000..bace2108 --- /dev/null +++ b/doc/contribs.html @@ -0,0 +1,107 @@ + + + + + +Contributions + + + + + +
+

The XML C library for Gnome

+

Contributions

+
+
+ + +
+ + +
+ +

+

Daniel Veillard

+

$Id$

+

Daniel Veillard

+
+ + diff --git a/doc/docs.html b/doc/docs.html new file mode 100644 index 00000000..085cfacd --- /dev/null +++ b/doc/docs.html @@ -0,0 +1,88 @@ + + + + + +Documentation + + + + + +
+

The XML C library for Gnome

+

Documentation

+
+
+ + +
+ + +
+

There are some on-line resources about using libxml:

+
    +
  1. Check the FAQ +
  2. +
  3. Check the extensive + documentation automatically extracted from code comments (using gtk + doc).
  4. +
  5. Look at the documentation about libxml + internationalization support +
  6. +
  7. This page provides a global overview and some + examples on how to use libxml.
  8. +
  9. +James Henstridge + wrote some nice + documentation explaining how to use the libxml SAX interface.
  10. +
  11. George Lebl wrote an article + for IBM developerWorks about using libxml.
  12. +
  13. Check the TODO + file +
  14. +
  15. Read the 1.x to 2.x upgrade path. If you are + starting a new project using libxml you should really use the 2.x + version.
  16. +
  17. And don't forget to look at the mailing-list + archive.
  18. +
+

Daniel Veillard

+
+ + diff --git a/doc/downloads.html b/doc/downloads.html new file mode 100644 index 00000000..98e8d8c7 --- /dev/null +++ b/doc/downloads.html @@ -0,0 +1,88 @@ + + + + + +Downloads + + + + + +
+

The XML C library for Gnome

+

Downloads

+
+
+ + +
+ + +
+

The latest versions of libxml can be found on xmlsoft.org (Seattle, France) or on the Gnome FTP server, either +as a source +archive or as RPM +packages. (NOTE that you need both the libxml(2) and libxml(2)-devel +packages installed to compile applications using libxml.) Igor Zlatkovic is now the maintainer +of the Windows port; he +provides binaries. +

+

Snapshot:

+ +

Contribs:

+

I do accept external contributions, especially for compiling on another +platform; get in touch with me to upload the package. I will keep them in the +contrib directory. +

+

Libxml is also available from CVS:

+ +

Daniel Veillard

+
+ + diff --git a/doc/entities.html b/doc/entities.html new file mode 100644 index 00000000..f5ee99d6 --- /dev/null +++ b/doc/entities.html @@ -0,0 +1,127 @@ + + + + + +Entities or no entities + + + + + +
+

The XML C library for Gnome

+

Entities or no entities

+
+
+ + +
+ + +
+

Entities in principle are similar to simple C macros. An entity defines an +abbreviation for a given string that you can reuse many times throughout the +content of your document. Entities are especially useful when a given string +may occur frequently within a document, or to confine the change needed to a +document to a restricted area in the internal subset of the document (at the +beginning). Example:

+
1 <?xml version="1.0"?>
+2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
+3 <!ENTITY xml "Extensible Markup Language">
+4 ]>
+5 <EXAMPLE>
+6    &xml;
+7 </EXAMPLE>
+

Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing +its name with '&' and following it with ';' without any spaces added. There +are 5 predefined entities in XML, allowing you to escape characters with +predefined meaning in some parts of the XML document content: +&lt; for the character '<', &gt; +for the character '>', &apos; for the character ''', +&quot; for the character '"', and +&amp; for the character '&'.
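To illustrate the mapping above, the following standalone sketch replaces each of the five characters with its predefined entity. It is not libxml code, and `escape_predefined` is a hypothetical helper. It escapes all five characters unconditionally, whereas a serializer usually needs to escape only what is ambiguous in a given context, such as `<` and `&` in text content.

```c
#include <string.h>

/*
 * Copy `in` to `out`, replacing the five characters that have a
 * predefined entity with the corresponding reference.  `out` must hold
 * at least 6 * strlen(in) + 1 bytes (the longest replacements, "&apos;"
 * and "&quot;", are 6 bytes).  Returns `out`.
 */
char *escape_predefined(const char *in, char *out)
{
    char *o = out;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '\'': rep = "&apos;"; break;
        case '"':  rep = "&quot;"; break;
        }
        if (rep != NULL) {              /* write the entity reference */
            size_t n = strlen(rep);
            memcpy(o, rep, n);
            o += n;
        } else {                        /* ordinary character: copy it */
            *o++ = *in;
        }
    }
    *o = '\0';
    return out;
}
```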

+

One of the problems related to entities is that you may want the parser to +substitute an entity's content so that you can see the replacement text in +your application. Or you may prefer to keep entity references as such in the +content to be able to save the document back without losing this usually +precious information (if the user went through the pain of explicitly +defining entities, he may have a rather negative attitude if you blindly +substitute them at save time). The xmlSubstituteEntitiesDefault() +function allows you to check and change the behaviour, which is to not +substitute entities by default.
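The difference between the two behaviours can be sketched in plain C. This is a toy model, not libxml's implementation. `struct entity` and `expand_entities` are hypothetical names, and the `substitute` flag plays roughly the role that xmlSubstituteEntitiesDefault() plays for the parser. References to undeclared names are copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* A declared internal entity: <!ENTITY name "value"> */
struct entity {
    const char *name;
    const char *value;
};

/* Return the value of entity `name` (of length `len`), or NULL if undeclared. */
static const char *lookup(const struct entity *ents, size_t nents,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < nents; i++)
        if (strlen(ents[i].name) == len && strncmp(ents[i].name, name, len) == 0)
            return ents[i].value;
    return NULL;
}

/*
 * Copy `in` to `out` (capacity `cap`).  When `substitute` is nonzero,
 * each reference "&name;" to a declared entity is replaced by its value;
 * otherwise, and for undeclared names, the reference is kept as-is.
 * Returns 0 on success, -1 if `out` is too small.
 */
int expand_entities(const char *in, char *out, size_t cap,
                    const struct entity *ents, size_t nents, int substitute)
{
    size_t o = 0;
    if (cap == 0)
        return -1;
    while (*in != '\0') {
        const char *piece = in;     /* bytes to copy this round */
        size_t n = 1, skip = 1;     /* bytes copied, input bytes consumed */
        const char *semi = (*in == '&') ? strchr(in + 1, ';') : NULL;
        if (semi != NULL) {
            size_t len = (size_t)(semi - in - 1);
            const char *val = substitute ? lookup(ents, nents, in + 1, len) : NULL;
            skip = len + 2;
            if (val != NULL) {
                piece = val;        /* replacement text */
                n = strlen(val);
            } else {
                n = skip;           /* keep the reference "&name;" */
            }
        }
        if (o + n >= cap)           /* leave room for the final '\0' */
            return -1;
        memcpy(out + o, piece, n);
        o += n;
        in += skip;
    }
    out[o] = '\0';
    return 0;
}
```

With `substitute` set to 0 the reference `&xml;` survives verbatim, which is what lets a document be saved back without losing its entity references.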

+

Here is the DOM tree built by libxml for the previous document in the +default case:

+
/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
+DOCUMENT
+version=1.0
+   ELEMENT EXAMPLE
+     TEXT
+     content=
+     ENTITY_REF
+       INTERNAL_GENERAL_ENTITY xml
+       content=Extensible Markup Language
+     TEXT
+     content=
+

And here is the result when substituting entities:

+
/gnome/src/gnome-xml -> ./xmllint --debug --noent test/ent1
+DOCUMENT
+version=1.0
+   ELEMENT EXAMPLE
+     TEXT
+     content=     Extensible Markup Language
+

So, entities or no entities? Basically, it depends on your use case. I +suggest that you keep the non-substituting default behaviour and avoid using +entities in your XML document or data if you are not willing to handle the +entity reference elements in the DOM tree.

+

Note that at save time libxml enforces the conversion of the predefined +entities where necessary to prevent well-formedness problems, and when parsing it will also +transparently replace them with the corresponding characters (i.e. it will not generate entity +reference elements in the DOM tree or call the reference() SAX callback when +finding them in the input).

+

+WARNING: handling entities +on top of the libxml SAX interface is difficult!!! If you plan to use +non-predefined entities in your documents, then the learning curve to handle +them using the SAX API may be long. If you plan to use complex documents, I +strongly suggest you consider using the DOM interface instead and let libxml +deal with the complexity rather than trying to do it yourself.

+

Daniel Veillard

+
+ + diff --git a/doc/example.html b/doc/example.html new file mode 100644 index 00000000..db361109 --- /dev/null +++ b/doc/example.html @@ -0,0 +1,249 @@ + + + + + +A real example + + + + + +
+

The XML C library for Gnome

+

A real example

+
+
+ + +
+ + +
+

Here is a full-size example, where the actual content of the application +data is not kept in the DOM tree but in internal structures. It is based on +a proposal to keep a database of jobs related to Gnome, with an XML-based +storage structure. Here is an XML-encoded jobs +base:

+
<?xml version="1.0"?>
+<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
+  <gjob:Jobs>
+
+    <gjob:Job>
+      <gjob:Project ID="3"/>
+      <gjob:Application>GBackup</gjob:Application>
+      <gjob:Category>Development</gjob:Category>
+
+      <gjob:Update>
+        <gjob:Status>Open</gjob:Status>
+        <gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST</gjob:Modified>
+        <gjob:Salary>USD 0.00</gjob:Salary>
+      </gjob:Update>
+
+      <gjob:Developers>
+        <gjob:Developer>
+        </gjob:Developer>
+      </gjob:Developers>
+
+      <gjob:Contact>
+        <gjob:Person>Nathan Clemons</gjob:Person>
+        <gjob:Email>nathan@windsofstorm.net</gjob:Email>
+        <gjob:Company>
+        </gjob:Company>
+        <gjob:Organisation>
+        </gjob:Organisation>
+        <gjob:Webpage>
+        </gjob:Webpage>
+        <gjob:Snailmail>
+        </gjob:Snailmail>
+        <gjob:Phone>
+        </gjob:Phone>
+      </gjob:Contact>
+
+      <gjob:Requirements>
+      The program should be released as free software, under the GPL.
+      </gjob:Requirements>
+
+      <gjob:Skills>
+      </gjob:Skills>
+
+      <gjob:Details>
+      A GNOME based system that will allow a superuser to configure 
+      compressed and uncompressed files and/or file systems to be backed 
+      up with a supported media in the system.  This should be able to 
+      perform via find commands generating a list of files that are passed 
+      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine 
+      or via operations performed on the filesystem itself. Email 
+      notification and GUI status display very important.
+      </gjob:Details>
+
+    </gjob:Job>
+
+  </gjob:Jobs>
+</gjob:Helping>
+

While loading the XML file into an internal DOM tree is a matter of +calling only a couple of functions, browsing the tree to gather the data and +generate the internal structures is harder and more error-prone.

+

The suggested principle is to be tolerant with respect to the input +structure. For example, the ordering of the attributes is not significant, +the XML specification is clear about it. It's also usually a good idea not to +depend on the order of the children of a given node, unless it really makes +things harder. Here is some code to parse the information for a person:

+
/*
+ * A person record
+ */
+typedef struct person {
+    char *name;
+    char *email;
+    char *company;
+    char *organisation;
+    char *smail;
+    char *webPage;
+    char *phone;
+} person, *personPtr;
+
+/*
+ * And the code needed to parse it
+ */
+personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
+    personPtr ret = NULL;
+
+DEBUG("parsePerson\n");
+    /*
+     * allocate the struct
+     */
+    ret = (personPtr) malloc(sizeof(person));
+    if (ret == NULL) {
+        fprintf(stderr,"out of memory\n");
+        return(NULL);
+    }
+    memset(ret, 0, sizeof(person));
+
+    /* We don't care what the top level element name is */
+    cur = cur->xmlChildrenNode;
+    while (cur != NULL) {
+        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Person")) && (cur->ns == ns))
+            ret->name = (char *) xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
+        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Email")) && (cur->ns == ns))
+            ret->email = (char *) xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
+        cur = cur->next;
+    }
+
+    return(ret);
+}
+

Here are a couple of things to notice:

+
    +
  • Usually a recursive parsing style is the more convenient one: XML data + is by nature subject to repetitive constructs and usually exhibits highly + structured patterns.
  • +
  • The two arguments of type xmlDocPtr and xmlNsPtr, + i.e. the pointer to the global XML document and the namespace reserved to + the application. Document-wide information is needed, for example, to + decode entities, and it's good coding practice to define a namespace for + your application's set of data and test that the elements and attributes + you're analyzing actually pertain to your application space. This is + done by a simple equality test (cur->ns == ns).
  • +
  • To retrieve text and attribute values, you can use the function + xmlNodeListGetString to gather all the text and entity reference + nodes generated by the DOM output and produce a single text string.
  • +
+

Here is another piece of code used to parse another level of the +structure:

+
#include <libxml/tree.h>
+/*
+ * a Description for a Job
+ */
+typedef struct job {
+    char *projectID;
+    char *application;
+    char *category;
+    personPtr contact;
+    int nbDevelopers;
+    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
+} job, *jobPtr;
+
+/*
+ * And the code needed to parse it
+ */
+jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
+    jobPtr ret = NULL;
+
+DEBUG("parseJob\n");
+    /*
+     * allocate the struct
+     */
+    ret = (jobPtr) malloc(sizeof(job));
+    if (ret == NULL) {
+        fprintf(stderr,"out of memory\n");
+        return(NULL);
+    }
+    memset(ret, 0, sizeof(job));
+
+    /* We don't care what the top level element name is */
+    cur = cur->xmlChildrenNode;
+    while (cur != NULL) {
+        
+        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Project")) && (cur->ns == ns)) {
+            ret->projectID = (char *) xmlGetProp(cur, (const xmlChar *) "ID");
+            if (ret->projectID == NULL) {
+                fprintf(stderr, "Project has no ID\n");
+            }
+        }
+        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Application")) && (cur->ns == ns))
+            ret->application = (char *) xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
+        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Category")) && (cur->ns == ns))
+            ret->category = (char *) xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
+        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Contact")) && (cur->ns == ns))
+            ret->contact = parsePerson(doc, ns, cur);
+        cur = cur->next;
+    }
+
+    return(ret);
+}
+

Once you are used to it, writing this kind of code is quite simple, but +boring. Ultimately, it could be possible to write stub generators taking either C +data structure definitions, a set of XML examples or an XML DTD and producing +the code needed to import and export the content between C data and XML +storage. This is left as an exercise to the reader :-)
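A first step toward such a generator is to make the import code table-driven, so that adding a field means adding a table row rather than another `if` branch. The sketch below is a self-contained illustration that rests on a few assumptions. `struct child` stands in for an element child and its text; with libxml you would walk `cur->xmlChildrenNode` and call xmlNodeListGetString() instead. `person_fields` and `fill_person` are hypothetical names, and the record is a const-qualified variant of the person structure above. Like parsePerson(), it does not depend on the order of the children.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for one child element: its name and its text content. */
struct child {
    const char *name;
    const char *text;
};

/* A const-qualified variant of the person record above. */
typedef struct person {
    const char *name;
    const char *email;
    const char *company;
    const char *organisation;
    const char *smail;
    const char *webPage;
    const char *phone;
} person;

/* Element name -> offset of the field it fills. */
static const struct {
    const char *element;
    size_t offset;
} person_fields[] = {
    { "Person",       offsetof(person, name) },
    { "Email",        offsetof(person, email) },
    { "Company",      offsetof(person, company) },
    { "Organisation", offsetof(person, organisation) },
    { "Snailmail",    offsetof(person, smail) },
    { "Webpage",      offsetof(person, webPage) },
    { "Phone",        offsetof(person, phone) },
};

#define N_PERSON_FIELDS (sizeof person_fields / sizeof person_fields[0])

/* Fill `p` from `kids` in whatever order they appear; unknown names are ignored. */
void fill_person(person *p, const struct child *kids, size_t nkids)
{
    *p = (person){ 0 };                        /* all fields start as NULL */
    for (size_t i = 0; i < nkids; i++) {
        for (size_t f = 0; f < N_PERSON_FIELDS; f++) {
            if (strcmp(kids[i].name, person_fields[f].element) == 0) {
                const char **slot =
                    (const char **)((char *)p + person_fields[f].offset);
                *slot = kids[i].text;
                break;
            }
        }
    }
}
```

Generating the `person_fields` table from a DTD or from the C structure definition would then be the job of the stub generator.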

+

Feel free to use the code for the full C +parsing example as a template; it is also available, with a Makefile, in the +Gnome CVS base under gnome-xml/example.

+

Daniel Veillard

+
+ + diff --git a/doc/help.html b/doc/help.html new file mode 100644 index 00000000..61b98696 --- /dev/null +++ b/doc/help.html @@ -0,0 +1,78 @@ + + + + + +How to help + + + + + +
+

The XML C library for Gnome

+

How to help

+
+
+ + +
+ + +
+

You can help the project in various ways; the best thing to do first is to +subscribe to the mailing-list as explained before, and check the archives and the Gnome bug +database:

+
    +
  1. provide patches when you find problems
  2. +
  3. provide the diffs when you port libxml to a new platform. They may not + be integrated in all cases, but they help pinpoint portability + problems.
  4. +
  5. provide documentation fixes (either as patches to the code comments or + as HTML diffs).
  6. +
  7. provide new pieces of documentation (translations, examples, etc.)
  8. +
  9. Check the TODO file and try to close one of the items
  10. +
  11. take one of the points raised in the archive or the bug database and + provide a fix. Get in touch with me + beforehand to avoid synchronization problems and to check that the suggested + fix will fit in nicely :-)
  12. +
+

Daniel Veillard

+
+ + diff --git a/doc/index.html b/doc/index.html new file mode 100644 index 00000000..c4c230a2 --- /dev/null +++ b/doc/index.html @@ -0,0 +1,95 @@ + + + + + +The XML C library for Gnome + + + + + +
+

The XML C library for Gnome

+

libxml

+
+
+ + +
+ + +
+

+

+

Separate documents:

+ +

Daniel Veillard

+
+ + diff --git a/doc/interface.html b/doc/interface.html new file mode 100644 index 00000000..6d1a0e66 --- /dev/null +++ b/doc/interface.html @@ -0,0 +1,115 @@ + + + + + +The SAX interface + + + + + +
+

The XML C library for Gnome

+

The SAX interface

+
+
+ + +
+ + +
+

Sometimes the DOM tree output is just too large to fit reasonably into +memory. In that case (and if you don't expect to save back the XML document +loaded using libxml), it's better to use the SAX interface of libxml. SAX is +a callback-based interface to the parser. Before parsing, +the application layer registers a customized set of callbacks which are +called by the library as it progresses through the XML input.
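The callback principle can be illustrated without libxml. In the sketch below, `struct sax_handler`, `sax_parse` and the counting callbacks are hypothetical names. The scanner is much simpler than a real XML parser: it reports no attributes and does not handle entities, comments or declarations. libxml's real equivalent is the xmlSAXHandler structure declared in parser.h. What matters here is the control flow: the application registers function pointers, and the parser calls them as it reads the input, without building a tree.

```c
#include <stddef.h>
#include <string.h>

/* Callbacks registered by the application before parsing. */
struct sax_handler {
    void (*start_element)(void *user, const char *name, size_t len);
    void (*end_element)(void *user, const char *name, size_t len);
    void (*characters)(void *user, const char *text, size_t len);
};

/*
 * Scan `doc` and report events to `h` instead of building a tree.
 * Simplified: attributes are not reported, and declarations, comments
 * and entity references are not handled.
 */
void sax_parse(const char *doc, const struct sax_handler *h, void *user)
{
    const char *p = doc;
    while (*p != '\0') {
        if (*p != '<') {                      /* character data up to the next tag */
            size_t len = strcspn(p, "<");
            if (h->characters != NULL)
                h->characters(user, p, len);
            p += len;
            continue;
        }
        const char *end = strchr(p, '>');
        if (end == NULL)
            return;                           /* truncated input: stop */
        int closing = (p[1] == '/');
        const char *name = p + 1 + closing;
        size_t len = strcspn(name, " \t\r\n/>");
        if (closing) {
            if (h->end_element != NULL)
                h->end_element(user, name, len);
        } else {
            if (h->start_element != NULL)
                h->start_element(user, name, len);
            if (end[-1] == '/' && h->end_element != NULL)   /* "<x/>" */
                h->end_element(user, name, len);
        }
        p = end + 1;
    }
}

/* Example application state: element count and nesting depth. */
struct counts {
    int elements, depth, max_depth;
};

static void count_start(void *user, const char *name, size_t len)
{
    struct counts *c = user;
    (void)name;
    (void)len;
    c->elements++;
    if (++c->depth > c->max_depth)
        c->max_depth = c->depth;
}

static void count_end(void *user, const char *name, size_t len)
{
    struct counts *c = user;
    (void)name;
    (void)len;
    c->depth--;
}
```

An empty element such as `<image href="linus.gif"/>` produces a start event immediately followed by an end event, which matches the startElement/endElement pair libxml reports for `image` in the testSAX trace.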

+

To get more detailed step-by-step guidance on using the SAX interface of +libxml, see the nice +documentation written by James +Henstridge.

+

You can debug the SAX behaviour by using the testSAX +program located in the gnome-xml module (it's usually not shipped in the +binary packages of libxml, but you can find it in the tar source +distribution). Here is the sequence of callbacks that would be reported by +testSAX when parsing the example XML document shown earlier:

+
SAX.setDocumentLocator()
+SAX.startDocument()
+SAX.getEntity(amp)
+SAX.startElement(EXAMPLE, prop1='gnome is great', prop2='&amp; linux too')
+SAX.characters(   , 3)
+SAX.startElement(head)
+SAX.characters(    , 4)
+SAX.startElement(title)
+SAX.characters(Welcome to Gnome, 16)
+SAX.endElement(title)
+SAX.characters(   , 3)
+SAX.endElement(head)
+SAX.characters(   , 3)
+SAX.startElement(chapter)
+SAX.characters(    , 4)
+SAX.startElement(title)
+SAX.characters(The Linux adventure, 19)
+SAX.endElement(title)
+SAX.characters(    , 4)
+SAX.startElement(p)
+SAX.characters(bla bla bla ..., 15)
+SAX.endElement(p)
+SAX.characters(    , 4)
+SAX.startElement(image, href='linus.gif')
+SAX.endElement(image)
+SAX.characters(    , 4)
+SAX.startElement(p)
+SAX.characters(..., 3)
+SAX.endElement(p)
+SAX.characters(   , 3)
+SAX.endElement(chapter)
+SAX.characters( , 1)
+SAX.endElement(EXAMPLE)
+SAX.endDocument()
+

Most of the other interfaces of libxml are based on the DOM tree-building +facility, so nearly everything up to the end of this document presupposes the +use of the standard DOM tree build. Note that the DOM tree itself is built by +a set of registered default SAX callbacks, not through a separate internal +interface.

+

Daniel Veillard

+
+ + diff --git a/doc/intro.html b/doc/intro.html new file mode 100644 index 00000000..2a343d3b --- /dev/null +++ b/doc/intro.html @@ -0,0 +1,87 @@ + + + + + +Introduction + + + + + +
+

The XML C library for Gnome

+

Introduction

+
+
+ + +
+ + +
+

This document describes libxml, the XML C library developed for the Gnome project. XML is a standard for building tag-based +structured documents/data.

+

Here are some key points about libxml:

+
    +
  • Libxml exports Push and Pull type parser interfaces for both XML and + HTML.
  • +
  • Libxml can do DTD validation at parse time, using a parsed document + instance, or with an arbitrary DTD.
  • +
  • Libxml now includes nearly complete XPath, XPointer and XInclude implementations.
  • +
  • It is written in plain C, making as few assumptions as possible, and + sticking closely to ANSI C/POSIX for easy embedding. Works on + Linux/Unix/Windows, ported to a number of other platforms.
  • +
  • Basic HTTP and FTP client support, allowing applications to fetch + remote resources
  • +
  • The design is modular, most of the extensions can be compiled out.
  • +
  • The internal document representation is as close as possible to the DOM interfaces.
  • +
  • Libxml also has a SAX + like interface; the interface is designed to be compatible with Expat.
  • +
  • This library is released both under the W3C + IPR and the GNU + LGPL. Use either at your convenience; basically this should make + everybody happy. If not, drop me a mail.
  • +
+

Warning: unless you are forced to because your application links with a +Gnome library requiring it, Do Not Use libxml1, use +libxml2

+

Daniel Veillard

+
+ + diff --git a/doc/library.html b/doc/library.html new file mode 100644 index 00000000..289c7f51 --- /dev/null +++ b/doc/library.html @@ -0,0 +1,241 @@ + + + + + +The XML library interfaces + + + + + +
+

The XML C library for Gnome

+

The XML library interfaces

+
+
+ + +
+ + +
+

This section is intended to help programmers get bootstrapped +using the XML library from the C language. It is not intended to be +exhaustive. I hope the automatically generated documents will provide the +completeness required, but as a separate set of documents. The interfaces of +the XML library are low level by design; there is nearly zero abstraction. +Those interested in a higher-level API should look at +DOM.

+

The parser interfaces for XML are +separated from the HTML parser +interfaces. Let's have a look at how the XML parser can be called:

+

Invoking the parser : the pull method

+

Usually, the first thing to do is to read an XML input. The parser accepts +documents either from in-memory strings or from files. The functions are +defined in "parser.h":

+
+
xmlDocPtr xmlParseMemory(char *buffer, int size);
+

Parse an in-memory buffer of size bytes containing the document.

+
+
+
xmlDocPtr xmlParseFile(const char *filename);
+

Parse an XML document contained in a (possibly compressed) + file.

+
+

The parser returns a pointer to the document structure (or NULL in case of +failure).

+

Invoking the parser: the push method

+

To let the application keep control while the document is +being fetched (which is common for GUI-based programs), libxml also provides a push +interface, as of version 1.8.3. Here are the interface functions:

+
xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
+                                         void *user_data,
+                                         const char *chunk,
+                                         int size,
+                                         const char *filename);
+int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
+                                         const char *chunk,
+                                         int size,
+                                         int terminate);
+

and here is a simple example showing how to use the interface:

+
            FILE *f;
+
+            f = fopen(filename, "r");
+            if (f != NULL) {
+                int res, size = 1024;
+                char chars[1024];
+                xmlParserCtxtPtr ctxt;
+
+                res = fread(chars, 1, 4, f);
+                if (res > 0) {
+                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
+                                chars, res, filename);
+                    while ((res = fread(chars, 1, size, f)) > 0) {
+                        xmlParseChunk(ctxt, chars, res, 0);
+                    }
+                    xmlParseChunk(ctxt, chars, 0, 1);
+                    doc = ctxt->myDoc;
+                    xmlFreeParserCtxt(ctxt);
+                }
+                fclose(f);
+            }
+

The HTML parser embedded into libxml also has a push interface; the +functions are just prefixed by "html" rather than "xml".
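A push parser works because all parsing state lives in the context, so a construct can be split across any chunk boundary. The following toy is not libxml and only counts elements, but it mirrors the shape of xmlCreatePushParserCtxt()/xmlParseChunk(). `push_init` and `push_chunk` are hypothetical names. The `terminate` argument marks the last chunk, at which point an unfinished tag or an unclosed element is reported as an error.

```c
#include <stddef.h>

/* Parsing state kept between chunks. */
struct push_ctxt {
    int in_tag;      /* between '<' and '>' */
    char kind;       /* first char after '<': '/' end tag, '?' declaration */
    char prev;       /* previous character inside the current tag */
    int elements;    /* elements opened so far */
    int depth;       /* current nesting depth */
};

void push_init(struct push_ctxt *c)
{
    *c = (struct push_ctxt){ 0 };
}

/*
 * Feed `size` bytes of the document; set `terminate` on the last call.
 * Returns 0 on success, or -1 if the input ended inside a tag or with
 * elements still open.
 */
int push_chunk(struct push_ctxt *c, const char *chunk, size_t size, int terminate)
{
    for (size_t i = 0; i < size; i++) {
        char ch = chunk[i];
        if (!c->in_tag) {
            if (ch == '<') {                  /* a tag starts; it may end in a later chunk */
                c->in_tag = 1;
                c->kind = '\0';
                c->prev = '\0';
            }
            continue;
        }
        if (c->kind == '\0')
            c->kind = ch;                     /* remember how the tag began */
        if (ch == '>') {
            c->in_tag = 0;
            if (c->kind == '/') {
                c->depth--;
            } else if (c->kind != '?') {
                c->elements++;
                if (c->prev != '/')           /* "<x/>" does not stay open */
                    c->depth++;
            }
        }
        c->prev = ch;
    }
    if (terminate && (c->in_tag || c->depth != 0))
        return -1;
    return 0;
}
```

Feeding a document one byte at a time gives the same element count as feeding it in a single call.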

+

Invoking the parser: the SAX interface

+

The tree-building interface makes the parser memory-hungry, first loading +the document in memory and then building the tree itself. Reading a document +without building the tree is possible using the SAX interfaces (see SAX.h and +James +Henstridge's documentation). Note also that the push interface can be +limited to SAX: just use the first two arguments of +xmlCreatePushParserCtxt().

+

Building a tree from scratch

+

The other way to get an XML tree in memory is by building it. Basically +there is a set of functions dedicated to building new elements. (These are +also described in <libxml/tree.h>.) For example, here is a piece of +code that produces the XML document used in the previous examples:

+
    #include <libxml/tree.h>
+    xmlDocPtr doc;
+    xmlNodePtr tree, subtree;
+
+    doc = xmlNewDoc("1.0");
+    doc->children = xmlNewDocNode(doc, NULL, "EXAMPLE", NULL);
+    xmlSetProp(doc->children, "prop1", "gnome is great");
+    xmlSetProp(doc->children, "prop2", "& linux too");
+    tree = xmlNewChild(doc->children, NULL, "head", NULL);
+    subtree = xmlNewChild(tree, NULL, "title", "Welcome to Gnome");
+    tree = xmlNewChild(doc->children, NULL, "chapter", NULL);
+    subtree = xmlNewChild(tree, NULL, "title", "The Linux adventure");
+    subtree = xmlNewChild(tree, NULL, "p", "bla bla bla ...");
+    subtree = xmlNewChild(tree, NULL, "image", NULL);
+    xmlSetProp(subtree, "href", "linus.gif");
+

Not really rocket science ...

+

Traversing the tree

+

Basically, by including "tree.h" your +code has access to the internal structure of all the elements of the tree. +The field names should be self-explanatory: parent, +children, next, prev, +properties, etc. For example, still with the previous +example:

+
doc->children->children->children
+

points to the title element,

+
doc->children->children->next->children->children
+

points to the text node containing the chapter title "The Linux +adventure".

+

+NOTE: XML allows PIs and comments to be +present before the document root, so doc->children may point +to a node which is not the document's root element; the function +xmlDocGetRootElement() was added to retrieve the root element reliably.

+

Modifying the tree

+

Functions are provided for reading and writing the document content. Here +is an excerpt from the tree API:

+
+
xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const + xmlChar *value);
+

This sets (or changes) an attribute carried by an ELEMENT node. + The value can be NULL.

+
+
+
xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar + *name);
+

This function returns a pointer to a new copy of the property + content. Note that the caller must free the result with xmlFree().

+
+

Two functions are provided for reading and writing the text associated +with elements:

+
+
xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar + *value);
+

This function takes an "external" string and converts it to one + text node or possibly to a list of entity and text nodes. All + non-predefined entity references like &Gnome; will be stored + internally as entity nodes, hence the result of the function may not be + a single node.

+
+
+
xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int + inLine);
+

This function is the inverse of + xmlStringGetNodeList(). It generates a new string + containing the content of the text and entity nodes. Note the extra + argument inLine. If this argument is set to 1, the function will expand + entity references. For example, instead of returning the &Gnome; + entity reference in the string, it will substitute it with its value (say, + "GNU Network Object Model Environment").

+
+

Saving a tree

+

Basically 3 options are possible:

+
+
void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int + *size);
+

Saves the document into a newly allocated buffer, returned through mem (with its length in size); the caller must free it with xmlFree().

+
+
+
extern void xmlDocDump(FILE *f, xmlDocPtr doc);
+

Dumps a document to an open stdio FILE stream.

+
+
+
int xmlSaveFile(const char *filename, xmlDocPtr cur);
+

Saves the document to a file. In this case, the compression + interface is triggered if it has been turned on.

+
+

Compression

+

The library transparently handles compression when doing file-based +accesses. The level of compression on saves can be turned on either globally +or individually for one file:

+
+
int xmlGetDocCompressMode (xmlDocPtr doc);
+

Gets the document compression level (0-9).

+
+
+
void xmlSetDocCompressMode (xmlDocPtr doc, int mode);
+

Sets the document compression level.

+
+
+
int xmlGetCompressMode(void);
+

Gets the default compression level.

+
+
+
void xmlSetCompressMode(int mode);
+

Sets the default compression level.

+
+

Daniel Veillard

+
+ + diff --git a/doc/namespaces.html b/doc/namespaces.html new file mode 100644 index 00000000..d975f3fe --- /dev/null +++ b/doc/namespaces.html @@ -0,0 +1,104 @@ + + + + + +Namespaces + + + + + +
+Gnome LogoW3C LogoRed Hat Logo +
+

The XML C library for Gnome

+

Namespaces

+
+
+ + +
+ + +
Main Menu
+

The libxml library implements XML namespace support by +recognizing namespace constructs in the input, and does namespace lookup +automatically when building the DOM tree. A namespace declaration is +associated with an in-memory structure and all elements or attributes within +that namespace point to it. Hence testing the namespace is a simple and fast +equality operation at the user level.

+

I suggest that people using libxml use a namespace, and declare it in the +root element of their document as the default namespace. Then they don't need +to use the prefix in the content but we will have a basis for future semantic +refinement and merging of data from different sources. This doesn't increase +the size of the XML output significantly, but significantly increases its +value in the long-term. Example:

+
<mydoc xmlns="http://mydoc.example.org/schemas/">
+   <elem1>...</elem1>
+   <elem2>...</elem2>
+</mydoc>
+

The namespace value has to be an absolute URL, but the URL doesn't have to +point to any existing resource on the Web. It will bind all the elements and +attributes to that URL. I suggest using a URL within a domain you control, +and including some kind of version information in the URL if possible. +For example, "http://www.gnome.org/gnumeric/1.0/" is a good +namespace scheme.

+

Then when you load a file, make sure that a namespace carrying the +version-independent prefix is installed on the root element of your document, +and if the version information doesn't match something you know, warn the user +and be liberal in what you accept as the input. Also do *not* try to base +namespace checking on the prefix value. <foo:text> may be exactly the +same as <bar:text> in another document. What really matters is the URI +associated with the element or the attribute, not the prefix string (which is +just a shortcut for the full URI). In libxml, elements and attributes have an +ns field pointing to an xmlNs structure detailing the namespace +prefix and its URI.

+

@@Interfaces@@

+

@@Examples@@

+

Usually people object to using namespaces together with validity checking. +I will try to make sure that using namespaces won't break validity checking, +so even if you plan to use or currently are using validation I strongly +suggest adding namespaces to your document. A default namespace scheme +xmlns="http://...." should not break validity even on less +flexible parsers. Using namespaces to mix and differentiate content coming +from multiple DTDs will certainly break current validation schemes. I will +try to provide ways to do this, but this may not be portable or +standardized.

+

Daniel Veillard

+
+ + diff --git a/doc/news.html b/doc/news.html new file mode 100644 index 00000000..a18b8ef1 --- /dev/null +++ b/doc/news.html @@ -0,0 +1,608 @@ + + + + + +News + + + + + +
+Gnome LogoW3C LogoRed Hat Logo +
+

The XML C library for Gnome

+

News

+
+
+ + +
+ + +
Main Menu
+

CVS only : check the Changelog file +for a really accurate description

+

Items floating around but not actively worked on, get in touch with me if +you want to test those

+
    +
  • Implementing XSLT, this is done + as a separate C library on top of libxml called libxslt
  • +
  • Finishing up XPointer and XInclude +
  • +
  • (seems working but delayed from release) parsing/import of Docbook + SGML docs
  • +
+

2.4.6: Oct 10 2001

+
    +
  • added and updated man pages by John Fleck
  • +
  • portability and configure fixes
  • +
  • an infinite loop on the HTML parser was removed (William)
  • +
  • Windows makefile patches from Igor
  • +
  • fixed half a dozen bugs reported for libxml or libxslt
  • +
  • updated xmlcatalog to be able to modify SGML super catalogs
  • +
+

2.4.5: Sep 14 2001

+
    +
  • Removed a few annoying bugs in 2.4.4
  • +
  • forces the HTML serializer to output decimal charrefs since some + versions of Netscape can't handle hexadecimal ones
  • +
+

1.8.16: Sep 14 2001

+
  • maintenance release of the old libxml1 branch, couple of bug and + portability fixes
+

2.4.4: Sep 12 2001

+
    +
  • added --convert to xmlcatalog, bug fixes and cleanups of XML + Catalog
  • +
  • a few bug fixes and some portability changes
  • +
  • some documentation cleanups
  • +
+

2.4.3: Aug 23 2001

+
    +
  • XML Catalog support see the doc
  • +
  • New NaN/Infinity floating point code
  • +
  • A few bug fixes
  • +
+

2.4.2: Aug 15 2001

+
    +
  • adds xmlLineNumbersDefault() to control line number generation
  • +
  • lots of bug fixes
  • +
  • the Microsoft MSC project files should now be up to date
  • +
  • inheritance of namespaces from DTD defaulted attributes
  • +
  • fixes a serious potential security bug
  • +
  • added a --format option to xmllint
  • +
+

2.4.1: July 24 2001

+
    +
  • possibility to keep line numbers in the tree
  • +
  • some computation NaN fixes
  • +
  • extension of the XPath API
  • +
  • cleanup for alpha and ia64 targets
  • +
  • patch to allow saving through HTTP PUT or POST
  • +
+

2.4.0: July 10 2001

+
    +
  • Fixed a few bugs in XPath, validation, and tree handling.
  • +
  • Fixed XML Base implementation, added a couple of examples to the + regression tests
  • +
  • A bit of cleanup
  • +
+

2.3.14: July 5 2001

+
    +
  • fixed some entity problems and reduced the memory requirement when + substituting them
  • +
  • lots of improvements in the XPath query interpreter, which can be + substantially faster
  • +
  • Makefiles and configure cleanups
  • +
  • Fixes to XPath variable eval, and compare on empty node set
  • +
  • HTML tag closing bug fixed
  • +
  • Fixed a URI reference computation problem when validating
  • +
+

2.3.13: June 28 2001

+
    +
  • 2.3.12 configure.in was broken as well as the push mode XML parser
  • +
  • a few more fixes for compilation on Windows MSC by Yon Derek
  • +
+

1.8.14: June 28 2001

+
    +
  • Zbigniew Chyla gave a patch to use the old XML parser in push mode
  • +
  • Small Makefile fix
  • +
+

2.3.12: June 26 2001

+
    +
  • lots of cleanup
  • +
  • a couple of validation fixes
  • +
  • fixed line number counting
  • +
  • fixed serious problems in the XInclude processing
  • +
  • added support for UTF8 BOM at beginning of entities
  • +
  • fixed a strange gcc optimizer bug in xpath handling of floats, gcc-3.0 + miscompiles uri.c (William), Thomas Leitner provided a fix for the + optimizer on Tru64
  • +
  • incorporated Yon Derek and Igor Zlatkovic fixes and improvements for + compilation on Windows MSC
  • +
  • update of libxml-doc.el (Felix Natter)
  • +
  • fixed 2 bugs in URI normalization code
  • +
+

2.3.11: June 17 2001

+
    +
  • updates to trio, Makefiles and configure should fix some portability + problems (alpha)
  • +
  • fixed some HTML serialization problems (pre, script, and block/inline + handling), added encoding aware APIs, cleanup of this code
  • +
  • added xmlHasNsProp()
  • +
  • implemented a specific PI for encoding support in the DocBook SGML + parser
  • +
  • some XPath fixes (-Infinity, / as a function parameter and namespaces + node selection)
  • +
  • fixed a performance problem and an error in the validation code
  • +
  • fixed XInclude routine to implement the recursive behaviour
  • +
  • fixed xmlFreeNode problem when libxml is included statically twice
  • +
  • added --version to xmllint for bug reports
  • +
+

2.3.10: June 1 2001

+
    +
  • fixed the SGML catalog support
  • +
  • a number of reported bugs got fixed, in XPath, iconv detection, + XInclude processing
  • +
  • XPath string function should now handle unicode correctly
  • +
+

2.3.9: May 19 2001

+

Lots of bugfixes, and added a basic SGML catalog support:

+
    +
  • HTML push bugfix #54891 and another patch from Jonas Borgström
  • +
  • some serious speed optimisation again
  • +
  • some documentation cleanups
  • +
  • trying to get better linking on solaris (-R)
  • +
  • XPath API cleanup from Thomas Broyer
  • +
  • Validation bug fixed #54631, added a patch from Gary Pennington, fixed + xmlValidGetValidElements()
  • +
  • Added an INSTALL file
  • +
  • Attribute removal added to API: #54433
  • +
  • added a basic support for SGML catalogs
  • +
  • fixed xmlKeepBlanksDefault(0) API
  • +
  • bugfix in xmlNodeGetLang()
  • +
  • fixed a small configure portability problem
  • +
  • fixed an inversion of SYSTEM and PUBLIC identifier in HTML document
  • +
+

1.8.13: May 14 2001

+
  • bugfixes release of the old libxml1 branch used by Gnome
+

2.3.8: May 3 2001

+
    +
  • Integrated an SGML DocBook parser for the Gnome project
  • +
  • Fixed a few things in the HTML parser
  • +
  • Fixed some XPath bugs raised by XSLT use, tried to fix the floating + point portability issue
  • +
  • Speed improvement (8M/s for SAX, 3M/s for DOM, 1.5M/s for + DOM+validation using the XML REC as input and a 700MHz celeron).
  • +
  • incorporated more Windows cleanup
  • +
  • added xmlSaveFormatFile()
  • +
  • fixed problems in copying nodes with entities references (gdome)
  • +
  • removed some troubles surrounding the new validation module
  • +
+

2.3.7: April 22 2001

+
    +
  • lots of small bug fixes, corrected XPointer
  • +
  • Non-deterministic content model validation support
  • +
  • added xmlDocCopyNode for gdome2
  • +
  • revamped the way the HTML parser handles end of tags
  • +
  • XPath: corrections of namespace support and number formatting
  • +
  • Windows: Igor Zlatkovic patches for MSC compilation
  • +
  • HTML output fixes from P C Chow and William M. Brack
  • +
  • Improved validation speed, noticeably for DocBook
  • +
  • fixed a big bug with ID declared in external parsed entities
  • +
  • portability fixes, update of Trio from Bjorn Reese
  • +
+

2.3.6: April 8 2001

+
    +
  • Code cleanup using extreme gcc compiler warning options, found and + cleared half a dozen potential problems
  • +
  • the Eazel team found an XML parser bug
  • +
  • cleaned up the use of some of the string formatting functions. Used the + trio library code to provide the ones needed when the platform is missing + them
  • +
  • xpath: removed a memory leak and fixed the predicate evaluation + problem, extended the testsuite and cleaned up the result. XPointer seems + broken ...
  • +
+

2.3.5: Mar 23 2001

+
    +
  • The biggest change is separate parsing and evaluation of XPath expressions; + there are some new APIs for this too
  • +
  • included a number of bug fixes (XML push parser, 51876, notations, + 52299)
  • +
  • Fixed some portability issues
  • +
+

2.3.4: Mar 10 2001

+
    +
  • Fixed bugs #51860 and #51861
  • +
  • Added a global variable xmlDefaultBufferSize to allow default buffer + size to be application tunable.
  • +
  • Some cleanup in the validation code, still a bug left and this part + should probably be rewritten to support ambiguous content model :-\
  • +
  • Fix a couple of serious bugs introduced or raised by changes in 2.3.3 + parser
  • +
  • Fixed another bug in xmlNodeGetContent()
  • +
  • Bjorn fixed XPath node collection and Number formatting
  • +
  • Fixed a loop reported in the HTML parsing
  • +
  • blank spaces are reported even if the DTD content model proves that they + are formatting spaces; this is for XML conformance
  • +
+

2.3.3: Mar 1 2001

+
    +
  • small change in XPath for XSLT
  • +
  • documentation cleanups
  • +
  • fix in validation by Gary Pennington
  • +
  • serious parsing performances improvements
  • +
+

2.3.2: Feb 24 2001

+
    +
  • chasing XPath bugs, found a bunch, completed some TODO
  • +
  • fixed a Dtd parsing bug
  • +
  • fixed a bug in xmlNodeGetContent
  • +
  • ID/IDREF support partly rewritten by Gary Pennington
  • +
+

2.3.1: Feb 15 2001

+
    +
  • some XPath and HTML bug fixes for XSLT
  • +
  • small extension of the hash table interfaces for DOM gdome2 + implementation
  • +
  • A few bug fixes
  • +
+

2.3.0: Feb 8 2001 (2.2.12 was on 25 Jan but I didn't keep track)

+
    +
  • Lots of XPath bug fixes
  • +
  • Add a mode with Dtd lookup but without validation error reporting for + XSLT
  • +
  • Add support for text node without escaping (XSLT)
  • +
  • bug fixes for xmlCheckFilename
  • +
  • validation code bug fixes from Gary Pennington
  • +
  • Patch from Paul D. Smith correcting URI path normalization
  • +
  • Patch to allow simultaneous install of libxml-devel and + libxml2-devel
  • +
  • the example Makefile is now fixed
  • +
  • added HTML to the RPM packages
  • +
  • tree copying bugfixes
  • +
  • updates to Windows makefiles
  • +
  • optimisation patch from Bjorn Reese
  • +
+

2.2.11: Jan 4 2001

+
    +
  • bunch of bug fixes (memory I/O, xpath, ftp/http, ...)
  • +
  • added htmlHandleOmittedElem()
  • +
  • Applied Bjorn Reese's IPV6 first patch
  • +
  • Applied Paul D. Smith patches for validation of XInclude results
  • +
  • added XPointer xmlns() new scheme support
  • +
+

2.2.10: Nov 25 2000

+
    +
  • Fix the Windows problems of 2.2.8
  • +
  • integrate OpenVMS patches
  • +
  • better handling of some nasty HTML input
  • +
  • Improved the XPointer implementation
  • +
  • integrate a number of provided patches
  • +
+

2.2.9: Nov 25 2000

+
  • erroneous release :-(
+

2.2.8: Nov 13 2000

+
    +
  • First version of XInclude + support
  • +
  • Patch in conditional section handling
  • +
  • updated MS compiler project
  • +
  • fixed some XPath problems
  • +
  • added a URI escaping function
  • +
  • some other bug fixes
  • +
+

2.2.7: Oct 31 2000

+
    +
  • added message redirection
  • +
  • XPath improvements (thanks TOM !)
  • +
  • xmlIOParseDTD() added
  • +
  • various small fixes in the HTML, URI, HTTP and XPointer support
  • +
  • some cleanup of the Makefile, autoconf and the distribution content
  • +
+

2.2.6: Oct 25 2000:

+
    +
  • Added a hash table module, migrated a number of internal structures to + it
  • +
  • Fixed a posteriori validation problems
  • +
  • HTTP module cleanups
  • +
  • HTML parser improvements (tag errors, script/style handling, attribute + normalization)
  • +
  • coalescing of adjacent text nodes
  • +
  • couple of XPath bug fixes, exported the internal API
  • +
+

2.2.5: Oct 15 2000:

+
    +
  • XPointer implementation and testsuite
  • +
  • Lot of XPath fixes, added variable and functions registration, more + tests
  • +
  • Portability fixes, lots of enhancements toward an easy Windows build + and release
  • +
  • Late validation fixes
  • +
  • Integrated a lot of contributed patches
  • +
  • added memory management docs
  • +
  • a performance problem when using large buffer seems fixed
  • +
+

2.2.4: Oct 1 2000:

+
    +
  • main XPath problem fixed
  • +
  • Integrated portability patches for Windows
  • +
  • Serious bug fixes on the URI and HTML code
  • +
+

2.2.3: Sep 17 2000

+
    +
  • bug fixes
  • +
  • cleanup of entity handling code
  • +
  • overall review of all loops in the parsers, all sprintf usage has been + checked too
  • +
  • Far better handling of large DTDs. Validating against the DocBook XML DTD + works smoothly now.
  • +
+

1.8.10: Sep 6 2000

+
  • bug fix release for some Gnome projects
+

2.2.2: August 12 2000

+
    +
  • mostly bug fixes
  • +
  • started adding routines to access xml parser context options
  • +
+

2.2.1: July 21 2000

+
    +
  • a pure bug fix release
  • +
  • fixed an encoding support problem when parsing from a memory block
  • +
  • fixed a DOCTYPE parsing problem
  • +
  • removed a bug in the function allowing to override the memory + allocation routines
  • +
+

2.2.0: July 14 2000

+
    +
  • applied a lot of portability fixes
  • +
  • better encoding support/cleanup and saving (content is now always + encoded in UTF-8)
  • +
  • the HTML parser now correctly handles encodings
  • +
  • added xmlHasProp()
  • +
  • fixed a serious problem with &#38;
  • +
  • propagated the fix to FTP client
  • +
  • cleanup, bugfixes, etc ...
  • +
  • Added a page about libxml Internationalization + support +
  • +
+

1.8.9: July 9 2000

+
    +
  • fixed the spec file, so the RPMs should be better
  • +
  • fixed a serious bug in the FTP implementation, released 1.8.9 to solve + rpmfind users problem
  • +
+

2.1.1: July 1 2000

+
    +
  • fixes a couple of bugs in the 2.1.0 packaging
  • +
  • improvements on the HTML parser
  • +
+

2.1.0 and 1.8.8: June 29 2000

+
    +
  • 1.8.8 is mostly a commodity package for upgrading to libxml2 according to + new instructions. It fixes a nasty problem + about &#38; charref parsing
  • +
  • 2.1.0 also eases the upgrade from libxml v1 to the recent version. It + also contains numerous fixes and enhancements: +
      +
    • added xmlStopParser() to stop parsing
    • +
    • greatly improved parsing speed when there are large CDATA blocks
    • +
    • includes XPath patches provided by Picdar Technology
    • +
    • tried to fix as many DTD validation and namespace + related problems as possible
    • +
    • output to a given encoding has been added/tested
    • +
    • lot of various fixes
    • +
    +
  • +
+

2.0.0: Apr 12 2000

+
    +
  • First public release of libxml2. If you are using libxml, it's a good + idea to check the 1.x to 2.x upgrade instructions. NOTE: while initially + scheduled for Apr 3, the release occurred only on Apr 12 due to massive + workload.
  • +
  • The include files are now located under $prefix/include/libxml (instead of + $prefix/include/gnome-xml), and they are referenced by +
    #include <libxml/xxx.h>
    +

    instead of

    +
    #include "xxx.h"
    +
  • +
  • a new URI module for parsing URIs and following strictly RFC 2396
  • +
  • the memory allocation routines used by libxml can now be overloaded + dynamically by using xmlMemSetup()
  • +
  • The previously CVS only tool tester has been renamed + xmllint and is now installed as part of the libxml2 + package
  • +
  • The I/O interface has been revamped. There are now ways to plug in + specific I/O modules, either at the URI scheme detection level using + xmlRegisterInputCallbacks() or by passing I/O functions when creating a + parser context using xmlCreateIOParserCtxt()
  • +
  • there is a C preprocessor macro LIBXML_VERSION providing the version + number of the libxml module in use
  • +
  • a number of optional features of libxml can now be excluded at + configure time (FTP/HTTP/HTML/XPath/Debug)
  • +
+

2.0.0beta: Mar 14 2000

+
    +
  • This is a first Beta release of libxml version 2
  • +
  • It's available only from xmlsoft.org + FTP, it's packaged as libxml2-2.0.0beta and available as tar and + RPMs
  • +
  • This version is now the head in the Gnome CVS base, the old one is + available under the tag LIB_XML_1_X
  • +
  • This includes a very large set of changes. From a programmatic point of + view, applications should not have to be modified too much; check the upgrade page +
  • +
  • Some interfaces may change (especially a bit about encoding).
  • +
  • the updates include: +
      +
    • fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly + handled now
    • +
    • Better handling of entities, especially well formedness checking + and proper PEref extensions in external subsets
    • +
    • DTD conditional sections
    • +
    • Validation now correctly handles entity content
    • +
    • change + structures to accommodate DOM
    • +
    +
  • +
  • Serious progress was made toward compliance; here are the results of the test against the + OASIS testsuite (except the Japanese tests since I don't support that + encoding yet). This URL is rebuilt every couple of hours using the CVS + head version.
  • +
+

1.8.7: Mar 6 2000

+
    +
  • This is a bug fix release:
  • +
  • It is possible to disable the ignorable blanks heuristic used by + libxml-1.x; a new function xmlKeepBlanksDefault(0) will allow this. Note + that for adherence to the XML spec, this behaviour will be disabled by + default in 2.x. The same function will allow keeping compatibility for + old code.
  • +
  • Blanks in <a> </a> constructs are not ignored anymore, + avoiding heuristic is really the Right Way :-\
  • +
  • The unchecked use of snprintf which was breaking libxml-1.8.6 + compilation on some platforms has been fixed
  • +
  • nanoftp.c nanohttp.c: Fixed '#' and '?' stripping when processing + URIs
  • +
+

1.8.6: Jan 31 2000

+
  • added a nanoFTP transport module, debugged until the new version of rpmfind can use + it without troubles
+

1.8.5: Jan 21 2000

+
    +
  • adding APIs to parse a well balanced chunk of XML (production [43] content of the + XML spec)
  • +
  • fixed a hideous bug in xmlGetProp pointed out by Rune.Djurhuus@fast.no
  • +
  • Jody Goldberg <jgoldberg@home.com> provided another patch trying + to solve the zlib checks problems
  • +
  • The current state in gnome CVS base is expected to ship as 1.8.5 with + gnumeric soon
  • +
+

1.8.4: Jan 13 2000

+
    +
  • bug fixes, reintroduced xmlNewGlobalNs(), fixed xmlNewNs()
  • +
  • all exit() calls should have been removed from libxml
  • +
  • fixed a problem with INCLUDE_WINSOCK on WIN32 platform
  • +
  • added newDocFragment()
  • +
+

1.8.3: Jan 5 2000

+
    +
  • a Push interface for the XML and HTML parsers
  • +
  • a shell-like interface to the document tree (try tester --shell :-)
  • +
  • lots of bug fixes and improvements added over the Xmas holidays
  • +
  • fixed the DTD parsing code to work with the xhtml DTD
  • +
  • added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()
  • +
  • Fixed bugs in xmlNewNs()
  • +
  • External entity loading code has been revamped, now it uses + xmlLoadExternalEntity(), some fix on entities processing were added
  • +
  • cleaned up WIN32 includes of socket stuff
  • +
+

1.8.2: Dec 21 1999

+
    +
  • I got another problem with includes and C++, I hope this issue is fixed + for good this time
  • +
  • Added a few tree modification functions: xmlReplaceNode, + xmlAddPrevSibling, xmlAddNextSibling, xmlNodeSetName and + xmlDocSetRootElement
  • +
  • Tried to improve the HTML output with help from Chris Lahey +
  • +
+

1.8.1: Dec 18 1999

+
    +
  • various patches to avoid troubles when using libxml with C++ compilers + the "namespace" keyword and C escaping in include files
  • +
  • a problem in one of the core macros IS_CHAR was corrected
  • +
  • fixed a bug introduced in 1.8.0 breaking default namespace processing, + and more specifically the Dia application
  • +
  • fixed a posteriori validation (validation after parsing, or by using a + Dtd not specified in the original document)
  • +
  • fixed a bug in
  • +
+

1.8.0: Dec 12 1999

+
    +
  • cleanup, especially memory wise
  • +
  • the parser should be more reliable, especially the HTML one, it should + not crash, whatever the input !
  • +
  • Integrated various patches, especially a speedup improvement for large + dataset from Carl Nygard, + configure with --with-buffers to enable them.
  • +
  • attribute normalization, oops should have been added long ago !
  • +
  • attributes defaulted from DTDs should be available, xmlSetProp() now + does entity escaping by default.
  • +
+

1.7.4: Oct 25 1999

+
    +
  • Lots of HTML improvements
  • +
  • Fixed some errors when saving both XML and HTML
  • +
  • More examples, the regression tests should now look clean
  • +
  • Fixed a bug with contiguous charref
  • +
+

1.7.3: Sep 29 1999

+
    +
  • portability problems fixed
  • +
  • snprintf was used unconditionally, leading to link problems on systems + where it's not available, fixed
  • +
+

1.7.1: Sep 24 1999

+
    +
  • The basic type for strings manipulated by libxml has been renamed in + 1.7.1 from CHAR to xmlChar. The reason + is that CHAR was conflicting with a predefined type on Windows. However + on non WIN32 environment, compatibility is provided by the way of a + #define .
  • +
  • Fixed another error: the use of a structure field called errno, + leading to trouble on platforms where it's a macro
  • +
+

1.7.0: sep 23 1999

+
    +
  • Added the ability to fetch remote DTD or parsed entities, see the nanohttp module.
  • +
  • Added an errno to report errors by other means than a simple printf + like callback
  • +
  • Finished ID/IDREF support and checking when validating
  • +
  • Serious memory leaks fixed (there is now a memory wrapper module)
  • +
  • Improvement of XPath + implementation
  • +
  • Added an HTML parser front-end
  • +
+

Daniel Veillard

+
+ + diff --git a/doc/site.xsl b/doc/site.xsl new file mode 100644 index 00000000..faedf8b0 --- /dev/null +++ b/doc/site.xsl @@ -0,0 +1,354 @@ + + + + + + + + + + + + + + +
+ + + + + + + +
+
+ Main Menu +
+
+ +
+
+
+ + + <xsl:apply-templates/> + + + + + + + + + + + + + + + + +
+ Gnome Logo + W3C Logo + Red Hat Logo + + + + + +
+ + + + +
+ + + + + + +
+
+
+
+ + + + + + intro.html + + + docs.html + + + bugs.html + + + help.html + + + help.html + + + downloads.html + + + news.html + + + contribs.html + + + xsltproc2.html + + + API.html + + + XSLT.html + + + XML.html + + + valid.html + + + tree.html + + + library.html + + + interface.html + + + example.html + + + entities.html + + + architecture.html + + + namespaces.html + + + DOM.html + + + unknown.html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + +
+ + + + + + +
+ + + + +
+ + + + +
+ +

Daniel Veillard

+
+
+
+
+
+ + +
+
+ + + + + + + + + + + + + + + + + +
+ + + + + +
+ + + + + + +
+ + + + +
+ + + + +
+ + + + + + +

Daniel Veillard

+
+
+
+
+
+ +
+ + + + + + + + + + + +
diff --git a/doc/tree.html b/doc/tree.html new file mode 100644 index 00000000..d3f942a8 --- /dev/null +++ b/doc/tree.html @@ -0,0 +1,110 @@ + + + + + +The tree output + + + + + +
+Gnome LogoW3C LogoRed Hat Logo +
+

The XML C library for Gnome

+

The tree output

+
+
+ + +
+ + +
Main Menu
+

The parser returns a tree built during the document analysis. The value +returned is an xmlDocPtr (i.e., a pointer to an +xmlDoc structure). This structure contains information such +as the file name, the document type, and a children pointer +which is the root of the document (or more exactly the first child under the +root which is the document). The tree is made of xmlNodes, +chained in doubly linked lists of siblings and with a children<->parent +relationship. An xmlNode can also carry properties (a chain of xmlAttr +structures). An attribute may have a value which is a list of TEXT or +ENTITY_REF nodes.

+

Here is an example (erroneous with respect to the XML spec since there +should be only one ELEMENT under the root):

+

 structure.gif

+

In the source package there is a small program (not installed by default) +called xmllint which parses XML files given as argument and +prints them back as parsed. This is useful for detecting errors both in XML +code and in the XML parser itself. It has an option --debug +which prints the actual in-memory structure of the document; here is the +result with the example given before:

+
DOCUMENT
+version=1.0
+standalone=true
+  ELEMENT EXAMPLE
+    ATTRIBUTE prop1
+      TEXT
+      content=gnome is great
+    ATTRIBUTE prop2
+      ENTITY_REF
+      TEXT
+      content= linux too 
+    ELEMENT head
+      ELEMENT title
+        TEXT
+        content=Welcome to Gnome
+    ELEMENT chapter
+      ELEMENT title
+        TEXT
+        content=The Linux adventure
+      ELEMENT p
+        TEXT
+        content=bla bla bla ...
+      ELEMENT image
+        ATTRIBUTE href
+          TEXT
+          content=linus.gif
+      ELEMENT p
+        TEXT
+        content=...
+

This should be useful for learning the internal representation model.

+

Daniel Veillard

+
+ + diff --git a/doc/valid.html b/doc/valid.html new file mode 100644 index 00000000..fc24a4b8 --- /dev/null +++ b/doc/valid.html @@ -0,0 +1,93 @@ + + + + + +Validation, or are you afraid of DTDs ? + + + + + +
+Gnome LogoW3C LogoRed Hat Logo +
+

The XML C library for Gnome

+

Validation, or are you afraid of DTDs ?

+
+
+ + +
+ + +
Main Menu
+

Well what is validation and what is a DTD ?

+

Validation is the process of checking a document against a set of +construction rules; a DTD (Document Type Definition) is such +a set of rules.

+

The validation process and building DTDs are the two most difficult parts +of the XML life cycle. Briefly, a DTD defines all the possible elements to be +found within your document and the formal shape of your document tree +(by defining the allowed content of an element, either text, a regular +expression for the allowed list of children, or mixed content i.e. both text +and children). The DTD also defines the allowed attributes for all elements +and the types of the attributes. For more detailed information, I suggest +that you read the related parts of the XML specification, the examples found +under gnome-xml/test/valid/dtd and any of the large number of books available +on XML. The dia example in gnome-xml/test/valid should be both simple and +complete enough to allow you to build your own.

+

A word of warning: building a good DTD which will fit the needs of your +application in the long-term is far from trivial; however, the extra level of +quality it can ensure is well worth the price for some sets of applications +or if you already have a DTD defined for your application field.

+

The validation is not completely finished but in a (very IMHO) usable +state. Until a real validation interface is defined the way to do it is to +define and set the xmlDoValidityCheckingDefaultValue +external variable to 1, this will of course be changed at some point:

+

extern int xmlDoValidityCheckingDefaultValue;

+

...

+

xmlDoValidityCheckingDefaultValue = 1;

+

+

To handle external entities, use the function +xmlSetExternalEntityLoader(xmlExternalEntityLoader f); to +link in your HTTP/FTP/Entities database library to the standard libxml +core.

+

@@interfaces@@

+

Daniel Veillard

+
+ + diff --git a/doc/xml.html b/doc/xml.html index e927025c..19f3de84 100644 --- a/doc/xml.html +++ b/doc/xml.html @@ -8,14 +8,9 @@ -

-

The XML C library for Gnome

-

libxml, a.k.a. gnome-xml

+

libxml, a.k.a. gnome-xml