2.4. Filter combinators

2.4.1. Introduction

Functional programming languages like Haskell allow the use of higher-order functions, which take some functions as input, return a function as a result, or both. Filter combinators are higher-order functions, or operators, for combining several filters (see Section 2.3). By combining simple filters, more complex functions can be created. Because all filters share the same type, it is possible that any filter can be composed with any other. All filters can be mixed and matched freely, because all functions in Haskell are side-effect free.

The idea of filter combinators was adapted and extended from HaXml [WWW21]. In their paper "Haskell and XML: Generic Combinators or Type-Based Translation?" [WWW22] Malcom Wallace and Colin Runciman describe various algebraic laws for their combinators.

The use of combinators makes it possible to hide details of programming over data structures. All details of data structure manipulation are hidden in combinator functions. In effect, the combinators define problem specific control structures. With this approach it is possible to reach a form of expression that is natural for the problem itself.

Conceptually, combinators are much like Unix pipes in that they allow building up more complex computational sequences by flexibly arranging highly specialized tools [WWW24]. The equivalent to Unix pipes it the "Irish composition" combinator, represented by the infix operator "o". This combinator applies one filter to the results of another one. This is similar to a pipe, which passes the output of one program as the input to another one.

The combinator library provides all functions that are necessary for traversing and processing XML documents represented as an XmlTree. Haskell allows the definition of own infix operator symbols. Some combinators are defined as infix operators where it seemed more natural.

2.4.2. List of combinators

Basic filter combinators

o :: (a -> [b]) -> (c -> [a]) -> c -> [b]

Irish composition. Sequential composition of two filters. The left filter is applied to the result of the right filter. (conjunction)

(+++) :: (a -> [b]) -> (a -> [b]) -> (a -> [b])

Binary parallel composition, the function unifies the results of two filters sequential. Each filter uses a copy of state. (union)

cat :: [a -> [b]] -> (a -> [b])

Concatenates the results of all filters in a list, a list version of union +++. The combinator sticks lots of filters together.

($$) :: (a -> [b]) -> [a] -> [b]

Applies a filter to a list.

processChildren :: TFilter node -> TFilter node

Applies a filter to the child list of a node.

Choice combinators

orElse :: (a -> [b]) -> (a -> [b]) -> (a -> [b])

Directional choice. Second filter is only applied if the first one produces an empty list.

(?>) :: (a -> [b]) -> ThenElse (a -> [b]) -> (a -> [b])

In combination with the type ThenElse a = a :> a this combinator models an expression that resembles the conditional expression "p ? f : g" of C: "p ?> f :> g". If the predicate filter p is true, the filter f is applied otherwise g is applied.

when :: TFilter node -> TFilter node -> TFilter nod

First filter is only applied to the passed node, if the second filter produces a list with contents, otherwise a list with the passed node is returned. The second filter is typically a predicate filter.

Definition: f `when` g = g ?> f :> this

whenNot :: TFilter node -> TFilter node -> TFilter node

First filter is only applied to the passed node, if the second filter produces an empty list, otherwise a list of the passed node is returned. The second filter is typically a predicate filter.

Definition: f `whenNot` g = g ?> this :> f

guards :: TFilter node -> TFilter node -> TFilter node

First filter is only applied to the passed node, if the second filter produces a list with contents, otherwise an empty list is returned. The second filter is typically a predicate filter.

Definition: g `guards` f = g ?> f :> none

containing :: (a -> [b]) -> (b -> [c]) -> a -> [b]

Pruning: Keep only those nodes for which the second filter produces a list with contents and apply the first filter to these nodes.

Definition: f `containing` g = filter (not . null . g) . f

notContaining :: (a -> [b]) -> (b -> [c]) -> a -> [b]

Pruning: Keep only those nodes for which the second filter produces an empty list and apply the first filter to these nodes.

Definition: f `notContaining` g = filter (null . g) . f

(/>) :: TFilter node -> TFilter node -> TFilter node

Interior search, pronounced "slash". First filter is applied to a node, the children of the result are taken and the second filter is applied to them, these children are returned. (meaning g inside f)

Definition: f /> g = g `o` getChildren `o` f

(</) :: TFilter node -> TFilter node -> TFilter node

Exterior search, pronounced "outside". First filter is applied to a node, the second to its children. If both filters return a result, the node is returned. (meaning f containing g)

Definition: f </ g = f `containing` (g `o` getChildren)

Recursive search combinators

deep :: TFilter node -> TFilter node

Filter is applied to each node of the tree. If the filter returns a result, processing is stopped. If not, the filter is applied to the next node of the tree. (top down traversal)

deepest :: TFilter node -> TFilter node

See deep, but bottom up traversal.

multi :: TFilter node -> TFilter node

Returns all successful applications of the filter to the whole tree.

Recursive transformation combinators

applyBottomUp :: TFilter node -> TFilter node

Constructs a new tree by applying the passed filter to each node of the tree. The tree is traversed bottom-up.

applyTopDown :: TFilter node -> TFilter node

Constructs a new tree by applying the passed filter to each node of the tree. The tree is traversed top-down.

applyBottomUpIfNot :: TFilter node -> TFilter node -> TFilter node

Constructs a new tree with a guarded top down transformation. The first filter is only applied to those nodes for which the second filter produces a result.

Monadic composition

liftM :: Monad m => (a -> [b]) -> a -> m [b]

Lift a normal filter to a monadic filter so that it can be used in monads.

(.<) :: Monad m => (b -> m [c]) -> (a -> m[b]) -> (a -> m[c])

Monadic Irish composition. Sequential composition of two filters. The first filter is applied to the result of the second filter (conjunction). Monadic version of the basic filter o.

($$<) :: Monad m => (a -> m [b]) -> [a] -> m [b]

Apply a monadic filter to a list. Monadic version of the basic filter combinator cat.

Special Filter combinators for XmlTree

whenOk :: XmlFilter -> XmlFilter ->XmlFilter

Applies the first filter to a node and checks the result for errors. In case of an error it stops processing and returns the list with the XError node. Otherwise processing is continued and the second filter is applied to the result.

whenOkM :: Monad m => (a -> XmlTrees) -> (XmlTrees -> m XmlTrees) -> a -> m XmlTrees

Monadic version of whenOk.

2.4.3. Binding power and associativity

Some combinators like the basic filter combinators and choice combinators can be expressed more naturally by operators. In Haskell completely new operators can be invented by using infix versions of functions and defining precedences and associativities for them.

The combinator operators are listed in following tables in decreasing order of their binding power. Combinators defined as prefix functions have to be written in back-quotes to be used as operators.

Table 2-1. Binding power and associativity of functional combinators

Precedence levelOperatorAssociativity
6 (high) `containing`, `notContaining` left
5 `o`, +++ right
5 />, </, `orElse` left
4 `when`, `whenNot`, `guards`, `whenOk` right
3 ?>, :> right
0 (low) $$ right

Table 2-2. Binding power and associativity of monadic combinators

Precedence levelOperatorAssociativity
5 (high) .< right
4 `whenOkM` right
0 (low) $$< right