5.2. The Haskell XML Toolbox in comparison to HaXml and HXML

In this section the Haskell XML Toolbox is compared with HaXml [WWW21] and HXML [WWW25]. The Haskell XML Toolbox has adopted many valuable ideas from both projects. This section does not intend to disparage them; it shows how their ideas have been extended and generalized.

The Haskell XML Toolbox, HaXml and HXML differ in how they represent XML documents in Haskell. The approaches of HXML and HaXml are introduced first; the data model of the Haskell XML Toolbox is then compared with them.

In HXML, XML document subsets are represented as a Tree of XMLNodes. The hierarchical structure of XML documents is modeled by the generic tree data type Tree. This type does not distinguish between inner nodes and leaves. Leaves are just nodes with an empty list of children.

Example 5-1. Document subset in HXML


data Tree a  = Tree a [Tree a]

type XML     = Tree XMLNode
data XMLNode =
      RTNode                 -- root node
    | ELNode GI AttList      -- element node: GI, attributes
    | TXNode String          -- text node
    | PINode Name String     -- processing instruction (target,value)
    | CXNode String          -- comment node
    | ENNode Name            -- general entity reference
      deriving Show
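
To make the leaf convention concrete, the following minimal sketch builds a small document subset with the types of Example 5-1. It is only an illustration: GI, Name and AttList are assumed here to be plain strings and association lists, and the helpers isLeaf and countLeaves are hypothetical names that are not part of HXML.

```haskell
-- Simplified stand-ins for HXML's auxiliary types (assumptions).
type Name    = String
type GI      = String
type AttList = [(Name, String)]

data Tree a = Tree a [Tree a]
    deriving Show

type XML = Tree XMLNode

data XMLNode
    = RTNode                 -- root node
    | ELNode GI AttList      -- element node: GI, attributes
    | TXNode String          -- text node
    | PINode Name String     -- processing instruction (target, value)
    | CXNode String          -- comment node
    | ENNode Name            -- general entity reference
    deriving Show

-- Tree for: <p class="intro">Hello<br/></p>
sample :: XML
sample =
    Tree RTNode
        [ Tree (ELNode "p" [("class", "intro")])
            [ Tree (TXNode "Hello") []
            , Tree (ELNode "br" []) []
            ]
        ]

-- A leaf is simply a node whose child list is empty (hypothetical helper).
isLeaf :: Tree a -> Bool
isLeaf (Tree _ cs) = null cs

-- Count the leaves of a tree (hypothetical helper).
countLeaves :: Tree a -> Int
countLeaves t@(Tree _ cs)
    | isLeaf t  = 1
    | otherwise = sum (map countLeaves cs)
```

Loaded into GHCi, countLeaves sample evaluates to 2: the text node and the empty br element are both leaves, although one is a text node and the other an element node.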
			

DTDs are modeled completely differently in HXML. They are represented by a record with named fields and are not stored in the tree model that holds the document subset.

Example 5-2. DTD subset in HXML


data DTD = DTD {
    elements :: FM.FM Name ELEMTYPE,    -- elemtps / element types
    attlists :: FM.FM Name [ATTDEF],    -- elemtype.attdefs
    genents  :: FM.FM Name EntityText,  -- general entities
    parments :: FM.FM Name EntityText,  -- parameter entities
    notations:: [DCN],                  -- nots/notations
    dtdname  :: Name                    -- name (document type name)
} deriving Show
			

HaXml's representation of XML documents differs fundamentally from the approach used by HXML and the Haskell XML Toolbox. Instead of modeling XML documents with a generic tree type, HaXml takes a more data-centric approach: the whole structure of an XML document is modeled by distinct algebraic data types. There is a special type for almost every production of the XML 1.0 specification [WWW01]. XML documents are modeled by the data type Document, which consists of a Prolog, a symbol table of entity definitions and the document subset, an Element. This data model distinguishes clearly between leaves and inner nodes. Leaves are types whose constructors do not take any arguments.

Example 5-3. XML documents in HaXml


data Document = Document Prolog (SymTab EntityDef) Element
data Prolog   = Prolog (Maybe XMLDecl) (Maybe DocTypeDecl)
data XMLDecl  = XMLDecl VersionInfo (Maybe EncodingDecl) (Maybe SDDecl)
...
			

The document subset is modeled in HaXml by the algebraic types Element and Content. An Element has a name, an attribute list and a list of Content values. If the content list is empty, the element is a leaf. Together these types define a mutually recursive, multi-branch tree.

Example 5-4. Document subset in HaXml


data Element   = Elem Name [Attribute] [Content]

type Attribute = (Name, AttValue)
data Content   = CElem Element
               | CString Bool CharData
               | CRef Reference
               | CMisc Misc
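
The mutual recursion between Element and Content can be illustrated with a short traversal. The sketch below is not HaXml code: AttValue, CharData, Reference and Misc are replaced by simplified stand-ins, and the functions elementNames and contentNames are hypothetical helpers introduced only for this example.

```haskell
-- Simplified stand-ins for HaXml's auxiliary types (assumptions).
type Name      = String
type AttValue  = String
type CharData  = String
data Reference = RefEntity Name  deriving Show
data Misc      = Comment String  deriving Show

data Element   = Elem Name [Attribute] [Content]
    deriving Show

type Attribute = (Name, AttValue)

data Content   = CElem Element
               | CString Bool CharData
               | CRef Reference
               | CMisc Misc
    deriving Show

-- Value for: <p class="intro">Hello<br/></p>
sample :: Element
sample =
    Elem "p" [("class", "intro")]
        [ CString False "Hello"
        , CElem (Elem "br" [] [])
        ]

-- The traversal mirrors the data model: one function per type,
-- each calling the other (hypothetical helpers).
elementNames :: Element -> [Name]
elementNames (Elem n _ cs) = n : concatMap contentNames cs

contentNames :: Content -> [Name]
contentNames (CElem e) = elementNames e
contentNames _         = []
```

In GHCi, elementNames sample yields ["p","br"]. Two functions are needed because the tree alternates between two types, whereas a generic tree type needs only one.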
			

HaXml introduced the idea of using filter functions and combinators for processing parts of the XML data model. The examples from the previous chapters show that this approach is very powerful and flexible. The whole XML parser of the Haskell XML Toolbox is based on filters. The filters of HaXml work on nodes of the type Content.

Example 5-5. The filter type of HaXml


type CFilter   = Content -> [Content]
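
How such filters are combined can be sketched in a few lines. The names children, tag, txt and o follow HaXml's combinator vocabulary, but they are re-implemented here in simplified form over a reduced Content type (an assumption made to keep the example self-contained), so the code should be read as an illustration of the idea rather than as HaXml's actual definitions.

```haskell
type Name = String

-- Reduced versions of the types from Example 5-4 (assumption).
data Element = Elem Name [(Name, String)] [Content]
    deriving (Eq, Show)

data Content = CElem Element
             | CString Bool String
    deriving (Eq, Show)

type CFilter = Content -> [Content]

-- Select the children of an element; text has no children.
children :: CFilter
children (CElem (Elem _ _ cs)) = cs
children _                     = []

-- Keep only elements with the given name.
tag :: Name -> CFilter
tag n c@(CElem (Elem m _ _)) | n == m = [c]
tag _ _                               = []

-- Keep only text content.
txt :: CFilter
txt c@(CString _ _) = [c]
txt _               = []

-- Composition: apply g first, then f to every result of g.
o :: CFilter -> CFilter -> CFilter
f `o` g = concatMap f . g

-- Value for: <ul><li>a</li> <li>b</li></ul>
doc :: Content
doc =
    CElem (Elem "ul" []
        [ CElem (Elem "li" [] [CString False "a"])
        , CString False " "
        , CElem (Elem "li" [] [CString False "b"])
        ])

-- The text of all li children, read from right to left.
itemTexts :: CFilter
itemTexts = txt `o` children `o` tag "li" `o` children
```

Applied to doc, itemTexts returns the two text nodes "a" and "b"; the whitespace between the items is dropped by tag "li". Because a filter returns a list, "no result", "one result" and "many results" are all handled by the same composition operator.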
			

Compared with HaXml and HXML, the Haskell XML Toolbox uses the most generic data model. Its data model is a generalization of the data models discussed above.

The generic tree data model NTree of the Haskell XML Toolbox forms the basis for representing XML documents in Haskell. This type does not distinguish between inner nodes and leaves. Leaves are just nodes with an empty child list. The most important aspect is that this generic tree data model represents a whole XML document, including the DTD subset, the document subset and all other logical units of XML. Two algebraic data types, XNode and DTDElem, are used to represent all logical units of XML.

Example 5-6. XML documents represented in the Haskell XML Toolbox


data NTree  node = NTree node (NTrees node)
type NTrees node = [NTree node]

type XmlTree  = NTree  XNode
type XmlTrees = NTrees XNode

data XNode =
      XTag TagName TagAttrl
    | XDTD DTDElem TagAttrl
    | ...

data DTDElem =
      DOCTYPE
    | ELEMENT
    | ATTLIST
    | ...
			

HXML uses the same generic data model as the Haskell XML Toolbox for representing the document subset, but DTDs are represented by a completely different model: a record with named fields. HaXml uses special types for XML's logical units. As a consequence, the DTD subset is modeled completely differently from the document subset.

The advantage of representing the whole XML document by one generic data type is that a single, uniform design can be used for processing the whole document. Because all logical parts of XML are modeled by one generic data model, filters (see Section 2.3) can be used to process the whole XML document and not only parts of it, as in HaXml and HXML.
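
The following sketch illustrates this uniformity under several assumptions. TagName and TagAttrl are taken to be a string and an association list, XText stands in for the constructors elided in Example 5-6, and the root node is encoded as an XTag named "/". The filters getChildren, isXTag, isXDTD and multi are defined locally for this example and are not claimed to match the Toolbox's API.

```haskell
data NTree  node = NTree node (NTrees node)
    deriving (Eq, Show)
type NTrees node = [NTree node]

type TagName  = String              -- assumption
type TagAttrl = [(String, String)]  -- assumption

data XNode
    = XText String                  -- stand-in for the elided constructors
    | XTag TagName TagAttrl
    | XDTD DTDElem TagAttrl
    deriving (Eq, Show)

data DTDElem = DOCTYPE | ELEMENT | ATTLIST
    deriving (Eq, Show)

type XmlTree   = NTree  XNode
type XmlTrees  = NTrees XNode
type XmlFilter = XmlTree -> XmlTrees

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

isXTag, isXDTD :: XmlFilter
isXTag t@(NTree (XTag _ _) _) = [t]
isXTag _                      = []
isXDTD t@(NTree (XDTD _ _) _) = [t]
isXDTD _                      = []

-- Apply a filter to every node of the tree, in document order.
multi :: XmlFilter -> XmlFilter
multi f t = f t ++ concatMap (multi f) (getChildren t)

-- One tree holds both the DTD subset and the document subset.
sample :: XmlTree
sample =
    NTree (XTag "/" [])
        [ NTree (XDTD DOCTYPE [("name", "note")])
            [ NTree (XDTD ELEMENT [("name", "note")]) []
            , NTree (XDTD ATTLIST [("name", "note"), ("value", "lang")]) []
            ]
        , NTree (XTag "note" [("lang", "en")])
            [ NTree (XText "Hello") [] ]
        ]

-- The kinds of all DTD nodes, found with the same traversal
-- that is used for element nodes.
dtdKinds :: [DTDElem]
dtdKinds = [ k | NTree (XDTD k _) _ <- multi isXDTD sample ]
```

Here multi isXDTD sample selects the three declaration nodes and multi isXTag sample selects the two element nodes (the root and note). Both queries use the same traversal; only the predicate differs.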

HaXml's filters work only on the type Content, which represents just a small part of an XML document: the document subset. To process other parts of an XML document, one can no longer use filters but has to implement special functions. The same applies to HXML.

The generalization used in the Haskell XML Toolbox makes the design of applications that process whole XML documents very uniform. In effect, the design of the whole XML parser of the Haskell XML Toolbox is based on filters: merging the internal and external DTD parts, checking the validity constraints of DTDs and document subsets, and transforming the whole XmlTree back to XML are all done by filters.
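
How processing stages of this kind can be chained is sketched below. This is a deliberately small model and not the Toolbox's implementation: the node type is reduced to three constructors, attributes are assumed to be association lists, and the operator (.>) as well as the stages checkDeclared and removeDTD are hypothetical names introduced for this example. The check covers only one simplified constraint, namely that every element name is declared in the DTD.

```haskell
data NTree node = NTree node [NTree node]
    deriving (Eq, Show)

data DTDElem = DOCTYPE | ELEMENT
    deriving (Eq, Show)

type Attrs = [(String, String)]     -- assumption

data XNode
    = XText String
    | XTag String Attrs
    | XDTD DTDElem Attrs
    deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

-- Left-to-right composition: run f, then g on every result of f.
(.>) :: XmlFilter -> XmlFilter -> XmlFilter
f .> g = concatMap g . f

-- All nodes of a tree in document order.
nodes :: XmlTree -> [XNode]
nodes (NTree n cs) = n : concatMap nodes cs

-- Stage 1: pass the document on only if every element below the
-- root is declared by an ELEMENT node; otherwise return no result.
checkDeclared :: XmlFilter
checkDeclared t
    | all (`elem` declared) used = [t]
    | otherwise                  = []
  where
    NTree _ cs = t
    declared   = [ n | XDTD ELEMENT al <- nodes t
                     , Just n <- [lookup "name" al] ]
    used       = [ n | c <- cs, XTag n _ <- nodes c ]

-- Stage 2: remove all DTD nodes from the tree.
removeDTD :: XmlFilter
removeDTD (NTree (XDTD _ _) _) = []
removeDTD (NTree n cs)         = [NTree n (concatMap removeDTD cs)]

-- The whole pipeline is again a filter.
process :: XmlFilter
process = checkDeclared .> removeDTD

-- Build a document whose DTD declares only the element "note".
mkDoc :: [XmlTree] -> XmlTree
mkDoc body =
    NTree (XTag "/" [])
        [ NTree (XDTD DOCTYPE [("name", "note")])
            [ NTree (XDTD ELEMENT [("name", "note")]) [] ]
        , NTree (XTag "note" []) body
        ]

validDoc, invalidDoc :: XmlTree
validDoc   = mkDoc [NTree (XText "Hi") []]
invalidDoc = mkDoc [NTree (XTag "extra" []) []]
```

For validDoc, process returns a single tree without DTD nodes; for invalidDoc, which uses the undeclared element extra, it returns the empty list. A failed check therefore needs no separate error channel in this sketch, since the empty result stops all later stages.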