Minimal XML - DOM-oriented library

By Giancarlo Niccolai

For the impatient: download it!

NEWS!

2004-04-11: Mxmlplus 0.9.2 is out. This version adds several bug fixes, an MSVC build environment and a path-oriented iterator. A little more bug cleanup and version 1.0 will be ready!
Also, the MXMLPlus online documentation has been updated and improved.

2004-01-11: Mxmlplus for Windows is out. MXMLPlus is now compiled as a .LIB file with the free Borland C tools. The project now has a makefile.bc, used with Borland's make, and a make_b32 batch file that helps automate some compilation tasks. Check it out on our project page, or have a look at the MXMLPlus online documentation.

2003-08-15: Mxmlplus is out. MXMLPlus is a C++ library based on the MXML engine (but integrated with the STL classes) that merges the power and the DOM concepts of MXML with a truly object-oriented environment. Check it out on our project page, or have a look at the MXMLPlus online documentation.

Abstract

MXML is a pure C library (although it has an object-oriented layout) meant to help developers implement XML file interpretation in their projects. Its compact design allows it to be included in any project, however small (an average program grows by 15 to 30 KB when including it).

The self-contained API has everything an average program needs to import the content of an XML file into a DOM-oriented tree, and to search it for the relevant data. Some constructs (such as the ability to load the single data sub-item of a node directly into the parent node itself) make it useful for program configuration or database-oriented XML. XML files can then be modified directly or through iterator objects and written back, or they can be generated on the fly by the program.

MXML also has a set of small utilities that programmers may find useful independently of the XML parsing ability: the self-growing string, a string to which data can be written at will and which handles memory reallocation in a smart and efficient way, plus the refiller and the output, two ready-to-use I/O abstractors for C that only require a very small callback function to handle the underlying stream. In this way MXML can use user-defined functions that read or write data from/to any kind of streamable medium. Support for serialization on ANSI C stdio, the Unix open* family of functions and self-growing string/char * streams is already provided in the library.
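To make the self-growing string and the output abstractor concrete, here is a minimal C sketch. All names (grow_string, output, output_to_string, output_to_file) are hypothetical and do not reproduce MXML's real structures or signatures; the sketch only assumes the two ideas described above: a buffer that doubles its capacity when it runs out of room, and a writer that knows nothing about the stream except a small callback.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical self-growing string (not MXML's actual structure). */
typedef struct {
    char  *buf;   /* NUL-terminated once something has been written */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated */
} grow_string;

/* Make room for `extra` more bytes plus the terminator, doubling the
 * capacity so that many small appends trigger few reallocations. */
int grow_string_reserve(grow_string *s, size_t extra)
{
    size_t need = s->len + extra + 1;
    size_t cap;
    char *p;

    if (need <= s->cap)
        return 0;
    cap = s->cap ? s->cap : 16;
    while (cap < need)
        cap *= 2;
    p = realloc(s->buf, cap);
    if (p == NULL)
        return -1;          /* the old buffer is still valid */
    s->buf = p;
    s->cap = cap;
    return 0;
}

int grow_string_append(grow_string *s, const char *data, size_t n)
{
    assert(s != NULL && data != NULL);
    if (grow_string_reserve(s, n) != 0)
        return -1;
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    s->buf[s->len] = '\0';
    return 0;
}

void grow_string_free(grow_string *s)
{
    free(s->buf);
    s->buf = NULL;
    s->len = s->cap = 0;
}

/* Hypothetical output abstractor: a callback plus an opaque target. */
typedef int (*output_func)(void *target, const char *data, size_t n);

typedef struct {
    output_func func;
    void       *target;
} output;

/* Callback writing into a self-growing string. */
int output_to_string(void *target, const char *data, size_t n)
{
    return grow_string_append((grow_string *)target, data, n);
}

/* Callback writing into an ANSI C stdio stream. */
int output_to_file(void *target, const char *data, size_t n)
{
    return fwrite(data, 1, n, (FILE *)target) == n ? 0 : -1;
}

/* The writer only ever calls the callback, whatever the stream is. */
int output_write(const output *out, const char *str)
{
    return out->func(out->target, str, strlen(str));
}
```

Pointing `func` at `output_to_file` and `target` at `stdout` sends the same text to stdio instead; only the small callback changes, which is the point of the abstraction.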

MXML also provides callbacks for progress checking and, optionally, for on-line parsing of the XML stream as it builds up (as, e.g., expat does).

MXML does not provide XML validation (DTD or schema); consult the TODO list in the sources for some features that are almost ready.

Data model

MXML has three kinds of objects that you will deal with: documents, nodes and iterators.

A document mainly holds a root node, which contains all the top-level document nodes. Even if only one root tag node is allowed in a valid XML file, there can be more non-tag nodes at top level, such as comments, processing instructions and directives. In a document object, MXML also stores any errors and the line where an error was encountered. The document is also used to carry "formatting style" requests for input and output.

A node is the minimal unit of information; MXML distinguishes among six kinds of nodes: tag, comment, processing instruction, directive, data and document nodes.

Iterators provide easy access to the structure of the document nodes, allowing the tree to be partitioned into subtrees.
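One way to picture an iterator restricted to a subtree is the following C sketch. It assumes a hypothetical node with name, next, child and parent fields, and walks the subtree in document order without recursion, never stepping outside the node it was started on. It illustrates the concept only and is not MXML's iterator API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout, reduced to what the iterator needs. */
typedef struct node {
    const char  *name;
    struct node *next, *child, *parent;
} node;

typedef struct {
    node *top;      /* root of the subtree being iterated */
    node *current;  /* node returned by the next call, NULL when done */
} iterator;

void iterator_init(iterator *it, node *top)
{
    assert(it != NULL);
    it->top = top;
    it->current = top;
}

/* Returns the subtree's nodes in document (pre-)order, then NULL. */
node *iterator_next(iterator *it)
{
    node *n = it->current;
    node *up;

    if (n == NULL)
        return NULL;
    if (n->child != NULL) {
        it->current = n->child;          /* descend first */
    } else {
        /* climb until some ancestor has a next sibling,
         * but never above the top of the subtree */
        up = n;
        while (up != it->top && up->next == NULL)
            up = up->parent;
        it->current = (up == it->top) ? NULL : up->next;
    }
    return n;
}
```

Because the climb stops at `top`, siblings of the starting node are never visited, which is what makes the iterator usable on an isolated subtree.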

Nodes in-depth

Each node has a name, a set of attributes (each of which is a pair of strings, the name and the value), a data element and links to the neighboring nodes in the tree. Some node types leave some of these values unset. For example, both processing instructions and directives have a name (the string immediately following the opening tag) and data (the content of the tag), but they have no attribute set. Data and comment nodes have only data, while tag nodes have pretty much everything. Document nodes are empty, "transparent".
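The following C sketch models the six node kinds and the fields described above. The type and function names are hypothetical, and MXML's real MXML_NODE layout may differ; the helper functions simply encode which fields each kind uses according to this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names modeled on the text, not MXML's definitions. */
typedef enum {
    NODE_DOCUMENT,   /* empty, "transparent" container */
    NODE_TAG,        /* name, attributes, data and children */
    NODE_COMMENT,    /* data only */
    NODE_PI,         /* processing instruction: name + data */
    NODE_DIRECTIVE,  /* e.g. <!DOCTYPE ...> (assumed example): name + data */
    NODE_DATA        /* character data: data only */
} node_type;

typedef struct attribute {
    char             *name;
    char             *value;
    struct attribute *next;
} attribute;

typedef struct node {
    node_type    type;
    char        *name;
    char        *data;
    attribute   *attributes;
    struct node *next, *prev, *child, *parent;  /* tree links */
} node;

bool node_type_has_name(node_type t)
{
    return t == NODE_TAG || t == NODE_PI || t == NODE_DIRECTIVE;
}

bool node_type_has_data(node_type t)
{
    return t != NODE_DOCUMENT;
}

bool node_type_has_attributes(node_type t)
{
    return t == NODE_TAG;
}
```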

To simplify the work of programmers who have to scan configuration-oriented XML documents, if a tag node has only one data node among its children, its data element is set to the data element of that child, and the data node is removed (thus "flattening" the structure). It is also possible to create a new tag node with its data element set to a string. If, later on, a new data node needs to be added to the item, it does not matter whether its data element is already set: on output, both the data element and all the child data nodes will be correctly written. The data element is always written after all the child nodes.
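A minimal C sketch of the flattening rule, assuming the simplest reading of it: a tag is flattened only when its single child is a data node and the tag has no data of its own yet. The struct and function names are hypothetical, and MXML may apply the rule differently (for example, while parsing rather than afterwards).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of node kinds and fields. */
typedef enum { NODE_TAG, NODE_DATA, NODE_COMMENT } node_type;

typedef struct node {
    node_type    type;
    char        *data;
    struct node *next, *prev, *child, *parent;
} node;

node *node_new(node_type type, const char *data)
{
    node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->type = type;
    if (data != NULL) {
        n->data = malloc(strlen(data) + 1);
        if (n->data == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->data, data);
    }
    return n;
}

/* Append `child` as the last child of `parent`. */
void node_add_child(node *parent, node *child)
{
    node *last;

    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next = NULL;
    child->prev = NULL;
    if (parent->child == NULL) {
        parent->child = child;
        return;
    }
    for (last = parent->child; last->next != NULL; last = last->next)
        ;
    last->next = child;
    child->prev = last;
}

/* Move the data of a tag's only (data) child into the tag itself and
 * drop the child. Returns 1 if flattened, 0 if the rule does not apply. */
int node_flatten(node *tag)
{
    node *only;

    assert(tag != NULL);
    only = tag->child;
    if (tag->type != NODE_TAG || tag->data != NULL ||
        only == NULL || only->next != NULL || only->type != NODE_DATA)
        return 0;
    tag->data = only->data;   /* the tag takes ownership of the string */
    tag->child = NULL;
    free(only);
    return 1;
}
```

With this rule, `<port>8080</port>` ends up as a single tag node whose data is "8080", which is what makes configuration files easy to read back.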

Navigation in the node tree is provided by four node fields, pointing respectively to the next node at the same level of the hierarchy (with the same parent), to the previous one, to the first of the node's children, and finally to the parent. So, to traverse the whole tree, one starts from the root node and descends recursively into the first child node; then all the "brothers" of that node are scanned, up to the last node in the tree.
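The traversal just described (descend into the first child, then scan the brothers) fits in a few lines of C. The node layout and the visitor type below are assumptions made for the sketch, not MXML definitions; `collect_name` is a hypothetical visitor that records names in visit order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node with the four navigation links. */
typedef struct node {
    const char  *name;
    struct node *next, *prev, *child, *parent;
} node;

typedef void (*visit_func)(node *n, int depth, void *ctx);

/* Visit `n` and all its brothers; recurse into each one's first child. */
void node_walk(node *n, int depth, visit_func visit, void *ctx)
{
    for (; n != NULL; n = n->next) {
        visit(n, depth, ctx);
        node_walk(n->child, depth + 1, visit, ctx);
    }
}

typedef struct {
    char   buf[256];
    size_t len;
} name_list;

/* Visitor appending "name;" to a fixed buffer (silently stops when full). */
void collect_name(node *n, int depth, void *ctx)
{
    name_list *out = ctx;
    size_t room = sizeof out->buf - out->len;
    int w;

    (void)depth;
    w = snprintf(out->buf + out->len, room, "%s;", n->name ? n->name : "?");
    if (w > 0 && (size_t)w < room)
        out->len += (size_t)w;
}
```

Recursion is used only to go down a level; brothers are scanned with a loop, so the stack depth grows with the tree's depth rather than with the number of nodes.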

Utilities for node management

The API is discussed in detail in the documentation; what needs to be brought to the reader's attention here is that some utilities are provided to access and modify the attribute list of the nodes, and to retrieve the depth and, where one exists, the path of a node.
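As an illustration of what attribute-list utilities typically do, here is a hypothetical C sketch that stores each attribute as a name/value pair in a singly linked list. It is not MXML's mxml_attribute_* API, whose real signatures are described in the API documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute list: one name/value pair per element. */
typedef struct attribute {
    char             *name;
    char             *value;
    struct attribute *next;
} attribute;

static char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Return the value of `name`, or NULL if the attribute is not present. */
const char *attribute_get(const attribute *list, const char *name)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            return list->value;
    return NULL;
}

/* Replace the value of an existing attribute, or append a new pair.
 * Returns 0 on success, -1 on allocation failure. */
int attribute_set(attribute **list, const char *name, const char *value)
{
    attribute **link;
    attribute *a;
    char *v;

    assert(list != NULL && name != NULL && value != NULL);
    for (link = list; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->name, name) == 0) {
            v = dup_string(value);
            if (v == NULL)
                return -1;
            free((*link)->value);
            (*link)->value = v;
            return 0;
        }
    }
    /* not found: `link` now points at the terminating NULL slot */
    a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->name = dup_string(name);
    a->value = dup_string(value);
    if (a->name == NULL || a->value == NULL) {
        free(a->name);
        free(a->value);
        free(a);
        return -1;
    }
    a->next = NULL;
    *link = a;
    return 0;
}
```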

The depth of a node is its distance from the root node, counted as the number of steps needed to reach it or its first brother.
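Assuming every node keeps a parent link and that only the document root has none, the depth can be computed by counting parent steps, as in this hypothetical C sketch. With this convention the root has depth 0 and top-level nodes have depth 1; whether MXML counts from 0 or 1 is an assumption here. Brothers automatically share the same depth, since they share the same parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node, reduced to the parent link. */
typedef struct node {
    const char  *name;
    struct node *parent;   /* NULL only for the document root */
} node;

/* Number of parent steps from `n` up to the root. */
int node_depth(const node *n)
{
    int depth = 0;

    assert(n != NULL);
    for (; n->parent != NULL; n = n->parent)
        depth++;
    return depth;
}
```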

The path of a node is the list of the names of all its ancestors, plus the node's own name, separated by "/" characters: /main/item is a path indicating that the node named "item" can be found immediately below the node named "main", which is at top level. Node paths are not unique; moreover, only a node that has a name, and whose ancestors are all named, can have a path.
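A hypothetical C sketch of path construction under the rules above: it collects the node and its ancestors up to (but excluding) the unnamed document root, fails if any of them is unnamed, and then writes the names root-first, separated by "/". The depth limit and the function name are assumptions of the sketch, not MXML's mxml_path_* functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATH_DEPTH 64   /* arbitrary limit of this sketch */

/* Hypothetical node: a name (NULL if unnamed) and a parent link. */
typedef struct node {
    const char  *name;
    struct node *parent;   /* NULL only for the document root */
} node;

/* Write "/a/b/c" into buf. Returns the path length, or -1 if the node
 * or one of its ancestors is unnamed, or if buf is too small. */
int node_path(const node *n, char *buf, size_t size)
{
    const node *chain[MAX_PATH_DEPTH];
    int count = 0, i;
    size_t len = 0;

    assert(n != NULL && buf != NULL && size > 0);
    /* collect the node and its ancestors, stopping at the root */
    for (; n->parent != NULL; n = n->parent) {
        if (n->name == NULL || count == MAX_PATH_DEPTH)
            return -1;
        chain[count++] = n;
    }
    if (count == 0)
        return -1;          /* the unnamed root has no path */
    buf[0] = '\0';
    for (i = count - 1; i >= 0; i--) {   /* outermost ancestor first */
        size_t nl = strlen(chain[i]->name);
        if (len + 1 + nl + 1 > size)
            return -1;
        buf[len++] = '/';
        memcpy(buf + len, chain[i]->name, nl);
        len += nl;
        buf[len] = '\0';
    }
    return (int)len;
}
```

Two sibling tags that share a name produce the same string, which is why a path does not identify a single node.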

Naming conventions

MXML naming conventions rigidly determine how each function, structure and data item is named. Each function in the library begins with "mxml_", followed by the name of the object the function refers to and the operation on that object. For example, the function to create a new node is "mxml_node_new()". Function names are not necessarily restricted to structures, but can also refer to abstract objects, like "mxml_path_*" (which operate on strings representing node paths), or "mxml_attribute_*" (which handle the attribute lists of nodes). Every symbol defined in the .h file begins with "MXML_". So, a document is typedef'd as MXML_DOCUMENT, and a node is a structure named MXML_NODE.

API documentation

Here is the API documentation. Currently it is roughly generated with Doxygen, but a complete documentation effort is under way. Have a look at test/mxml_test.c in the source package for a fairly complete reference of what you can do with mxml, and how.

Write to the author

Comments and help are always welcome. Write to Giancarlo Niccolai at antispam /at/ niccolai.ws for comments, help, code slices, poetry, anything ;-)
Mxml is developed with the help of SourceForge