Haskell XML Toolbox 9.2.0



Introduction

The Haskell XML Toolbox is a collection of tools for processing XML with Haskell. It is written purely in Haskell. The Haskell XML Toolbox is a project of the University of Applied Sciences Wedel.

The main design goal of the Haskell XML Toolbox is support for various XML standards, including Extensible Markup Language (XML) 1.0 (Second Edition) with DTD processing and validation, Namespaces in XML 1.0 (Second Edition), XML Path Language (XPath), XSL Transformations (XSLT) and the RELAX NG Specification, as well as HTML/XHTML processing.

Description

The Haskell XML Toolbox is based on the ideas of HaXml and HXML, but introduces a more general and flexible approach to processing XML with Haskell. It uses a generic data model for representing XML documents in Haskell, covering both the DTD subset and the document subset. This data model makes it possible to use filter functions as a uniform design for XML processing applications. The processing filters are implemented as arrows. This is more flexible than the filter approach of HXML and HaXml, and every filter application can easily be transformed into an arrow.
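
As a rough illustration of the arrow style, here is a minimal sketch assuming the hxt-9 core package and its Text.XML.HXT.Core module. The helper names itemText, arrowToFilter, filterToArrow and itemsOf are made up for this example. Processing steps are composed with (>>>); runLA turns an arrow back into a list-valued filter function, and arrL lifts such a function into an arrow.

```haskell
import Text.XML.HXT.Core

-- | Select the text content of every <item> element.
--   (>>>) composes arrows sequentially, much like composing filters.
itemText :: LA XmlTree String
itemText = deep (isElem >>> hasName "item") >>> getChildren >>> getText

-- | Any list arrow can be run as a plain filter function of type a -> [b].
arrowToFilter :: LA XmlTree String -> XmlTree -> [String]
arrowToFilter = runLA

-- | Conversely, a list-valued filter function can be lifted into an arrow.
filterToArrow :: (XmlTree -> [XmlTree]) -> LA XmlTree XmlTree
filterToArrow = arrL

-- | Parse a string as XML content (xread does no DTD processing)
--   and collect the item texts.
itemsOf :: String -> [String]
itemsOf = runLA (xread >>> itemText)

main :: IO ()
main = print (itemsOf "<list><item>a</item><item>b</item></list>")
```

Running main prints the texts of both item elements. Because runLA and arrL convert in both directions, an existing filter function can be reused inside an arrow pipeline unchanged.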

Since version 5.2, HXT works with arrows instead of filters. The filter part has been separated from this library and is available as an extra package (see HXT with Filters). There is a cookbook on using this arrow interface to build (nontrivial) applications: Manuel Ohlendorf has developed examples for processing RDF and documented the development in his master's thesis, A Cookbook for the Haskell XML Toolbox with Examples for Processing RDF Documents (the thesis as PDF).

New Features in hxt-9.2:

New Features in hxt-9.1:

Features:

Documentation

The API documentation of the HXT packages is available on Hackage and can be searched with Hayoo!.

A (somewhat) gentle introduction to HXT is available in the Haskell Wiki. HXT-9 is not backward compatible with older versions: when upgrading sources to HXT-9, the code dealing with configuration options in particular has to be changed. The HXT Wiki page contains hints for upgrading old sources. Conversion between XML and native Haskell data is described in another Haskell Wiki page, HXT: Conversion of Haskell data from/to XML with picklers. A description of the XML Schema regex package, including various examples, can be found in Regular expressions for XML Schema.
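
To give an idea of the HXT-9 configuration style, here is a minimal sketch assuming hxt 9.x. checkXml is a hypothetical helper. Options are passed as a list of SysConfig values such as withValidate and withWarnings, and the resulting error level is read back with getErrStatus.

```haskell
import Text.XML.HXT.Core

-- | Parse an XML string with HXT-9 style options and return the highest
--   error level reported (0 = ok, 1 = warning, 2 = error, 3 = fatal).
checkXml :: String -> IO Int
checkXml xml = do
  levels <- runX $
    readString [ withValidate yes   -- DTD validation switched on explicitly
               , withWarnings no    -- suppress warnings, keep errors
               ] xml
    >>> getErrStatus
  return (maximum (0 : levels))

main :: IO ()
main = do
  ok  <- checkXml "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]><a>hi</a>"
  -- element a is declared EMPTY but has text content: a validation error
  bad <- checkXml "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a>hi</a>"
  print (ok < c_err, bad >= c_err)
```

Old sources that configured the parser differently need to be rewritten to this list-of-options form when moving to HXT-9.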

The XSLT transformer was developed by Tim Walkenhorst in his master's thesis, Implementing an XSLT processor for the Haskell XML Toolbox. It is a rather complete implementation, but of course it is not a substitute for Xalan or other advanced XSLT systems. The XSLT module consists of fewer than 2000 lines of code. Compared with the more than 300,000 lines of Java in Xalan, this Haskell code can be viewed as one of the first formal specifications of XSLT.

Manuel Ohlendorf's master's thesis describes the arrow interface of the toolbox: A Cookbook for the Haskell XML Toolbox with Examples for Processing RDF Documents (the thesis as PDF). The source code of the example application is included in the doc/cookbook directory of the distribution.

The master's thesis "Design and Implementation of a validating XML parser in Haskell" by Martin Schmidt describes the design and motivation of the Haskell XML Toolbox (the thesis as HTML or PDF) and the development of the DTD validator module. The documentation in the thesis is somewhat out of date: modules, module names and some function names have changed since then. For details, consult the online Haddock documentation.

The development of the XPath modules is described (in German) in Konzeption und Implementierung eines XPath-Moduls für die Haskell XML Toolbox (PDF document).

The internals of the Relax NG validator modules are described (in German) in Design und Entwicklung eines Relax NG Schema Validators auf Basis der Haskell XML Toolbox (PDF document).

Requirements

It is recommended to install the HXT packages available from Hackage.

HXT Downloads

With HXT-9 the toolbox has been split into smaller packages. The core package includes the validating XML parser, an error-tolerant HTML parser, all the XML/HTML processing arrows and the picklers for conversion from/to native Haskell data. This package does not depend on any HTTP library; for HTTP access, the packages hxt-http and hxt-curl are available. The core package and the extensions require some basic functionality that could also be useful for other (non-XML/HTML) projects. These basic packages are installed automatically by cabal when the main packages are installed.
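
A minimal pickler sketch, assuming the hxt-9 core package. The Person type and the names xpPerson and roundTrip are hypothetical. A pickler built from combinators like xpElem, xpAttr, xpPair and xpWrap describes both directions of the conversion at once.

```haskell
import Text.XML.HXT.Core

-- | A hypothetical record type used only for illustration.
data Person = Person { personName :: String, personAge :: Int }
  deriving (Show, Eq)

-- | Maps a Person to and from <person name="..." age="..."/>.
xpPerson :: PU Person
xpPerson =
  xpElem "person" $
  xpWrap (uncurry Person, \p -> (personName p, personAge p)) $
  xpPair (xpAttr "name" xpText) (xpAttr "age" xpInt)

-- | Registering the pickler as the default one enables showPickled.
instance XmlPickler Person where
  xpickle = xpPerson

-- | Pickle into an XML document tree, then unpickle it again.
roundTrip :: Person -> Maybe Person
roundTrip = unpickleDoc xpPerson . pickleDoc xpPerson

main :: IO ()
main = do
  let alice = Person "Alice" 42
  putStrLn (showPickled [] alice)  -- serialized XML
  print (roundTrip alice)
```

Because the same value of type PU Person is used for pickling and unpickling, the two directions cannot drift apart, and roundTrip should return Just the original value.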

  • hxt: The HXT core
  • hxt-http: Binding to HTTP package
  • hxt-curl: Binding to LibCurl package
  • hxt-xpath: XPath extension
  • hxt-xslt: XSLT transformer
  • hxt-relaxng: RelaxNG validator
  • hxt-tagsoup: Lazy HTML parser based on tagsoup
  • hxt-expat: Binding to expat parser via hexpat package
  • hxt-charproperties: Basic package for XML and Unicode character properties
  • hxt-unicode: Basic package for decoding various encodings into Unicode
  • hxt-regex-xmlschema: A lightweight and stand alone regex package for XML Schema regular expressions
  • hxt-cache: A cache for parsed XML/HTML pages, stored as DOM in binary format
  • Various examples are included in the .tar archives, usually in subdirectories named example.
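
A small sketch of the hxt-regex-xmlschema package, assuming its Text.Regex.XMLSchema.String module; the helper names isFiveDigits and letterRuns are illustrative only. XML Schema regular expressions are implicitly anchored, so match succeeds only if the whole input matches, while tokenize collects the longest non-overlapping matches from left to right.

```haskell
import Text.Regex.XMLSchema.String (match, tokenize)

-- | True only if the whole string consists of exactly five digits.
isFiveDigits :: String -> Bool
isFiveDigits = match "[0-9]{5}"

-- | All maximal runs of ASCII letters; other characters are skipped.
letterRuns :: String -> [String]
letterRuns = tokenize "[a-zA-Z]+"

main :: IO ()
main = do
  print (isFiveDigits "22880", isFiveDigits "2288a")
  print (letterRuns "hxt 9.2 regex")
```

The package is stand-alone and does not depend on the HXT core, so it can also be used in projects that do not process XML.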

A git repository is available on GitHub. This repository contains all HXT packages.

Installation

Installation is done as usual via cabal: cabal install hxt

Known problems and limitations

The parser has been tested with the XML Validation Suite from the W3C. The following problems have been encountered:

Portability

Portability to Windows-based systems has not been tested very intensively, but HXT did work on a Windows XP system with the Cygwin tools installed. Development was done under Linux with GHC 6.12 and the -Wall flag; no warnings were issued when compiling the sources.

HXT with Filters

The old filter variant of HXT, hxt-filter, is no longer supported by HXT-9.

Related work

Feedback

We are interested in hearing your feedback on the Haskell XML Toolbox: suggestions for improvements, comments and criticism.

The mail address is hxmltoolbox@fh-wedel.de


The Haskell XML Toolbox is distributed under the MIT License.
Last modified: 2012-01-24