Tue 30 Dec 2014

## Fast Circular Substitution

Posted by Edward Kmett under Algorithms , Haskell , Type Theory[3] Comments

Emil Axelsson and Koen Claessen wrote a functional pearl last year about Using Circular Programs for Higher-Order Syntax.

About 6 months ago I had an opportunity to play with this approach in earnest, and realized we can speed it up a great deal. This has kept coming up in conversation ever since, so I've decided to write up an article here.

In my bound library I exploit the fact that monads are about substitution to make a monad transformer that manages substitution for me.

Here I'm going to take a more coupled approach.

To have a type system with enough complexity to be worth examining, I'll adapt Dan Doel's UPTS, which is a pure type system with universe polymorphism. I won't finish the implementation here, but from where we get it should be obvious how to finish the job.

Unlike Axelsson and Claessen I'm not going to bother to abstract over my name representation.

To avoid losing the original name from the source, we'll just track names as strings with an integer counting the number of times it has been 'primed'. The name is purely for expository purposes, the real variable identifier is the number. We'll follow the Axelsson and Claessen convention of having the identifier assigned to each binder be larger than any one bound inside of it. If you don't need he original source names you can cull them from the representation, but they can be useful if you are representing a syntax tree for something you parsed and/or that you plan to pretty print later.

data Name = Name String Int deriving (Show,Read) hint :: Name -> String hint (Name n _) = n nameId :: Name -> Int nameId (Name _ i) = i instance Eq Name where (==) = (==) `on` nameId instance Ord Name where compare = compare `on` nameId prime :: String -> Int -> Name prime n i = Name n (i + 1)

So what is the language I want to work with?

type Level = Int data Constant = Level | LevelLiteral {-# UNPACK #-} !Level | Omega deriving (Eq,Ord,Show,Read,Typeable) data Term a = Free a | Bound {-# UNPACK #-} !Name | Constant !Constant | Term a :+ {-# UNPACK #-} !Level | Max [Term a] | Type !(Term a) | Lam {-# UNPACK #-} !Name !(Term a) !(Term a) | Pi {-# UNPACK #-} !Name !(Term a) !(Term a) | Sigma {-# UNPACK #-} !Name !(Term a) !(Term a) | App !(Term a) !(Term a) | Fst !(Term a) | Snd !(Term a) | Pair !(Term a) !(Term a) !(Term a) deriving (Show,Read,Eq,Ord,Functor,Foldable,Traversable,Typeable)

That is perhaps a bit paranoid about remaining strict, but it seemed like a good idea at the time.

We can define capture avoiding substitution on terms:

subst :: Eq a => a -> Term a -> Term a -> Term a subst a x y = y >>= \a' -> if a == a' then x else return a'

Now we finally need to implement Axelsson and Claessen's circular programming trick. Here we'll abstract over terms that allow us to find the highest bound value within them:

class Bindable t where bound :: t -> Int

and instantiate it for our `Term`

type

instance Bindable (Term a) where bound Free{} = 0 bound Bound{} = 0 -- intentional! bound Constant{} = 0 bound (a :+ _) = bound a bound (Max xs) = foldr (\a r -> bound a `max` r) 0 xs bound (Type t) = bound t bound (Lam b t _) = nameId b `max` bound t bound (Pi b t _) = nameId b `max` bound t bound (Sigma b t _) = nameId b `max` bound t bound (App x y) = bound x `max` bound y bound (Fst t) = bound t bound (Snd t) = bound t bound (Pair t x y) = bound t `max` bound x `max` bound y

As in the original pearl we avoid traversing into the body of the binders, hence the _'s in the code above.

Now we can abstract over the pattern used to create a binder in the functional pearl, since we have multiple binder types in this syntax tree, and the code would get repetitive.

binder :: Bindable t => (Name -> t) -> (Name -> t -> r) -> String -> (t -> t) -> r binder bd c n e = c b body where body = e (bd b) b = prime n (bound body) lam, pi, sigma :: String -> Term a -> (Term a -> Term a) -> Term a lam s t = binder Bound (`Lam` t) s pi s t = binder Bound (`Pi` t) s sigma s t = binder Bound (`Sigma` t) s

We may not always want to give names to the variables we capture, so let's define:

lam_, pi_, sigma_ :: Term a -> (Term a -> Term a) -> Term a lam_ = lam "_" pi_ = pi "_" sigma_ = sigma "_"

Now, here's the interesting part. The problem with Axelsson and Claessen's original trick is that every substitution is being handled separately. This means that if you were to write a monad for doing substitution with it, it'd actually be quite slow. You have to walk the syntax tree over and over and over.

We can fuse these together by making a single pass:

instantiate :: Name -> t -> IntMap t -> IntMap t instantiate = IntMap.insert . nameId rebind :: IntMap (Term b) -> Term a -> (a -> Term b) -> Term b rebind env xs0 f = go xs0 where go = \case Free a -> f a Bound b -> env IntMap.! nameId b Constant c -> Constant c m :+ n -> go m :+ n Type t -> Type (go t) Max xs -> Max (fmap go xs) Lam b t e -> lam (hint b) (go t) $ \v -> rebind (instantiate b v env) e f Pi b t e -> pi (hint b) (go t) $ \v -> rebind (instantiate b v env) e f Sigma b t e -> sigma (hint b) (go t) $ \v -> rebind (instantiate b v env) e f App x y -> App (go x) (go y) Fst x -> Fst (go x) Snd x -> Snd (go x) Pair t x y -> Pair (go t) (go x) (go y)

Note that the `Lam`

, `Pi`

and `Sigma`

cases just extend the current environment.

With that now we can upgrade the pearl's encoding to allow for an actual Monad in the same sense as `bound`

.

instance Applicative Term where pure = Free (< *>) = ap instance Monad Term where return = Free (>>=) = rebind IntMap.empty

To show that we can work with this syntax tree representation, let's write an evaluator from it to weak head normal form:

First we'll need some helpers:

apply :: Term a -> [Term a] -> Term a apply = foldl App rwhnf :: IntMap (Term a) -> [Term a] -> Term a -> Term a rwhnf env stk (App f x) = rwhnf env (rebind env x Free:stk) f rwhnf env (x:stk) (Lam b _ e) = rwhnf (instantiate b x env) stk e rwhnf env stk (Fst e) = case rwhnf env [] e of Pair _ e' _ -> rwhnf env stk e' e' -> Fst e' rwhnf env stk (Snd e) = case rwhnf env [] e of Pair _ _ e' -> rwhnf env stk e' e' -> Snd e' rwhnf env stk e = apply (rebind env e Free) stk

Then we can start off the `whnf`

by calling our helper with an initial starting environment:

whnf :: Term a -> Term a whnf = rwhnf IntMap.empty []

So what have we given up? Well, `bound`

automatically lets you compare terms for alpha equivalence by quotienting out the placement of "F" terms in the syntax tree. Here we have a problem in that the identifiers we get assigned aren't necessarily canonical.

But we can get the same identifiers out by just using the monad above:

alphaEq :: Eq a => Term a -> Term a -> Bool alphaEq = (==) `on` liftM id

It makes me a bit uncomfortable that our monad is only up to alpha equivalence and that `liftM`

swaps out the identifiers used throughout the entire syntax tree, and we've also lost the ironclad protection against exotic terms.

But overall, this is a much faster version of Axelsson and Claessen's trick and it can be used as a drop-in replacement for something like `bound`

in many cases, and unlike bound, it lets you use HOAS-style syntax for constructing `lam`

, `pi`

and `sigma`

terms.

With pattern synonyms you can prevent the user from doing bad things as well. Once 7.10 ships you'd be able to use a bidirectional pattern synonym for `Pi`

, `Sigma`

and `Lam`

to hide the real constructors behind. I'm not yet sure of the "best practices" in this area.

Here's the code all in one place:

Happy Holidays,

-Edward

December 30th, 2014 at 3:13 pm

One small optimization missing that was in Axelsson/Claessen, ยง3.3, is the bound Lam case may be written as

bound (Lam b t _) = nameId b

Also applies to Pi and Sigma. Does this property no longer hold?

December 30th, 2014 at 3:23 pm

That is there for the body of Lam, hence the _ that is there, but the type of the lambda doesn’t have the bound variable occurring inside of it, and so isn’t included in that guarantee and has to be added separately.

January 1st, 2015 at 6:02 pm

Thanks a lot, that’s the kind of Bound

example that I was waiting for: bigger than

the ones I have seen so far, but still digestible. I will take a closer look soon.