The Comonad.Reader

Internalized Guarded Recursion for Equational Reasoning

2022-10-24T17:47:24Z

I recently presented a paper on infinite traversals at the Haskell Symposium: A totally predictable outcome: an investigation of traversals of infinite structures. The main result there is a characterization of when a call to traverse on an infinite Traversable functor (like an infinite lazy list) yields a non-bottom result. It turns out this is a condition on the Applicative one traverses with that loosely amounts to it having only a single data constructor. What I want to talk about here is how the technique introduced in that paper, which I call "internal guarded recursion" can be used not only in a lightweight formal way to prove characterization theorems or the like, but just in everyday programming as a "back of the envelope" or "streetfighting" hack to quickly figure out when recursive functional programs terminate and when they go into infinite loops.

Let's talk about the basic trick that makes the whole thing work. First, we introduce an abstract newtype for identity, which we will disallow pattern matching against, and instead only allow access to through the structure of an applicative functor.

 
newtype Later a = Later a deriving Functor
instance Applicative Later where
    pure = Later
    Later f < *> Later x = Later (f x)

Next, we introduce the only function allowed to perform recursion:

 
lfix :: (Later a -> a) -> a
lfix f = fix (f . pure)

This function has almost the same type signature as the typical fixpoint operator, but it "guards" the argument to the function it is taking the fixedpoint of by our abstract Later type constructor.

Now, if you write code that only has recursion via `lfix` and no other function can implicitly or explicitly invoke itself (which the paper refers to as "working in the guarded fragment), your code will never produce a bottom. You can have whatever sorts of recursive Haskell '98 data definitions you like, it doesn't matter! (However, if you have "impredicative" datatypes that pack polymorphic functions into them, I think it would matter... but let's leave that aside). Try, for example, using only this form of recursion, to write a function that produces an infinite list. You'll realize that each recursive step requires using up one Later constructor as "fuel". And since there's no way to get an infinite amount of Later constructors to begin with, you'll only be able to produce lists of finite depth.

However, we can create related data structures to our existing ones, which "guard" their own recurrence behind a Later type constructor as well -- and we can create, consume and manipulate those also, and also do so without risk of writing an expression that produces a bottom. For example, here is the type of possibly infinite lists:

 
data Stream a =
    Nil
    | Cons a (Later (Stream a)

And here is a function that interleaves two such lists:

 
sinterleave :: Stream a -> Stream a -> Stream a
sinterleave = lfix $ \f s1 s2 -> case s1 of
    (Cons x xs) -> Cons x (f < *> pure s2 < *> xs)
    _ -> s2

Now, I'm going to diverge from the paper and pose a sort of general problem, based on some discussions I had at ICFP. Suppose you have some tricky recursion, possibly involving "tying the knot" and want to show that it terminates, or to figure out under which conditions it terminates -- how can you do that? It turns out that internal guarded recursion can help! Here's the recipe:

1. Write your function using only explicit recursion (via fix).
2. Change fix to lfix
3. Figure out what work you have to do adding applicative operations involving Later to fix the types.

The paper has in it a general theorem that says, loosely speaking, that if you have code involving lfix and Later, and change that back to fix and erase all the mucking around with Later you get "essentially the same" function, and you still have a guarantee it won't produce bottoms. So this just turns that around -- start with your normal code, and show you can write it even in the guarded fragment, and then that tells you the properties of your original code!

I'll present this approach to reasoning about two tricky but well known problems in functional programming. First, as suggested by Tom Schrijvers as a question at the talk, is the famous "repmin" function introduced by Bird in 1984. This is a program that makes essential use of laziness to traverse a tree only once, but replacing each element in the tree by the minimum element anywhere in the tree. Here's a quick one-liner version, making use of traversal in the writer monad -- it works over any finite traversable structure, including typical trees. But it is perhaps easiest to test it over lists. For now, we'll ignore the issue of what happens with traversals of infinite structures, as that will complicate the example.

 
repMin1 :: (Traversable t, Ord a) => t a -> t a
repMin1 xs =
     let (ans,m) = fmap minimum . runWriter $
                    traverse (\x -> tell [x] >> pure m) xs in ans

Note that this above definition makes use of a recursive definition -- the body of the definition of (ans,m) makes use of the m being defined. This works because the definition does not pattern match on the m to compute -- otherwise we would bottom out. Using internal guarded recursion, we can let the type system guide us into rewriting our code into a form where it is directly evident that this does not bottom, rather than relying on careful reasoning about semantics. The first step is to mechanically transform the initial definition into one that is exactly the same, but where the implicit recursion has been rendered explicit by use of fix:

 
repMin2 :: (Traversable t, Ord a) => t a -> t a
repMin2 xs =
  let res = fix go in fst res
   where
    go res = fmap minimum . runWriter $
               traverse (\x -> tell [x] >> pure (snd res)) xs

The next step is to now replace fix by lfix. When we do so, the type of go will no longer be correct. In particular, its argument, res will now be guarded by a Later. So we can no longer apply snd directly to it, but instead have to fmap. The compiler will notice this and yell at us, at which point we make that small tweak as well. In turn, this forces a change to the type signature of the overall function. With that done, everything still checks!

 
repMin3 :: (Traversable t, Ord a) => t a -> t (Later a)
repMin3 xs =
  let res = lfix go in fst res
   where
    go res = fmap minimum . runWriter $
                traverse (\x -> tell [x] >> pure (snd < $> res)) xs

We have now verified that the original repMin1 function does not bottom out on finite structures. Further, the "one layer" of Later in the type of repMin3 tells us that there was exactly one recursive step invoked in computing the final result!

The astute reader may have noticed a further complication -- to genuinely be in the guarded recursive fragment, we need to make sure all functions in sight have not been written using standard recursion, but only with guarded recursion. But in fact, both minimum and traverse are going to be written recursively! We limited ourselves to considering finite trees to avoid worrying about this for our example. But let's now briefly consider what happens otherwise. By the results in the paper, we can still use a guarded recursive traverse in the writer monad, which will produce a potentially productive stream of results -- one where there may be arbitrarily many Later steps between each result. Further, a guarded recursive minimum on such a stream, or even on a necessarily productive Stream as given above, will necessarily produce a value that is potentially infinitely delayed. So without grinding out the detailed equational substitution, we can conclude that the type signature we would have to produce in the case of a potentially infinite tree would in fact be: (Traversable t, Ord a) => t a -> t (Partial a) -- where a partial value is one that may be delayed behind an arbitrary (including infinite) sequence of Later. This in turns tells us that repMin on a potentially infinite structure would still produce safely the skeleton of the structure we started with. However, at each individual leaf, the value would potentially be bottom. And, in fact, by standard reasoning (it takes an infinite amount of time to find the minimum of an infinite stream), we can conclude that when repMin is run on an infinite structure, then indeed each leaf would be bottom!

We'll now consider one further example, arising from work by Kenneth Foner on fixed points of comonads. In their paper, Foner provides an efficient fixed point operator for comonads with an "apply" operator, but also makes reference to an inefficient version which they believe has the same semantics, and was introduced by Dominic Orchard. This latter operator is extremely simple to define, and so an easy candidate for an example. We'll first recall the methods of comonads, and then introduce Orchard's fixed-point:

 
class Functor w => Comonad w where
    extract :: w a -> a
    duplicate :: w a -> w (w a)
    extend :: (w a -> b) -> w a -> w b
 
cfix f :: Comonad w => (w a -> a) -> w a
cfix f = fix (extend f)

So the question is -- when does cfix not bottom out? To answer this, we again just change fix to lfix and let the typechecker tells us what goes wrong. We quickly discover that our code no longer typechecks, because lfix enforces we are given a Later (w a) but the argument to extend f needs to be a plain old w a. We ask ghc for the type of the intermediate conversion function necessary, and arrive at the following:

 
lcfix :: Comonad w => (Later (w b) -> w a) -> (w a -> b) -> w b
lcfix conv f = lfix (extend f . conv)

So we discover that comonad fix will not bottom when we can provide some conv function that is "like the identity" (so it erases away when we strip out the mucking about with Later) but can send Later (w a) -> w b. If we choose to unify a and b, then this property (of some type to be equipped with an "almost identity" between it and it delayed by a Later) is examined in the paper at some length under the name "stability" -- and our conclusion is that cfix will terminate when the type w a is stable (which is to say that it in one way or another represents a potentially partial value). Also from the paper, we know that one easy way to get stability is when the type w is Predictable -- i.e. when it has an "almost identity" map Later (w a) -> w (Later a) and when a itself is stable. This handles most uses of comonad fix -- since functors of "fixed shape" (otherwise known as representable, or iso to r -> a for a fixed r) are all stable. And the stability condition on the underlying a tells us that even though we'll get out a perfectly good spine, whether or not there will be a bottom value at any given location in the resultant w a depends on the precise function being passed in.

In fact, if we simply start with the idea of predictability in hand, we can specialize the above code in a different way, by taking predict itself to be our conversion function, and unifying b with Later a, which yields the following:

 
lcfix2 :: (Comonad w, Predict w) => (w (Later a) -> a) -> w a
lcfix2 f = lfix (extend f . predict)

This signature is nice because it does not require stability -- i.e. there is no possibility of partial results. Further, it is particularly suggestive -- it looks almost like that of lfix but lifts both the input to the argument and the output of the fixed-point up under a w. This warns us how hard it is to get useful values out of fixing a comonad -- in particular, just as with our lfix itself, we can't directly pattern match on the values we are taking fixed points of, but instead only use them in constructing larger structures.

These examples illustrate both the power of the internal guarded recursion approach, and also some of its limits. It can tell us a lot of high level information about what does and doesn't produce bottoms, and it can produce conditions under which bottoms will never occur. However, there are also cases where we have code that sometimes bottoms, depending on specific functions it is passed -- the fact that it potentially bottoms is represented in the type, but the exact conditions under which bottoms will or will not occur aren't able to be directly "read off". In fact, in the references to the paper, there are much richer variants of guarded recursion that allow more precision in typing various sorts of recursive functions, and of course there is are general metamathematical barriers to going sufficiently far -- a typing system rich enough to say if any integer function terminates is also rich enough to say if e.g. the collatz conjecture is true or not! But with all those caveats in mind, I think this is still a useful tool that doesn't only have theoretical properties, but also practical use. The next time you have a tricky recursive function that you're pretty sure terminates, try these simple steps: 1) rewrite to use explicit fixed points; 2) change those to guarded recursive fixed points; 3) let ghc guide you in fixing the types; 4) see what you learn!

Computational Quadrinitarianism (Curious Correspondences go Cubical)

2022-10-24T17:47:24Z

Back in 2011, in an influential blog post [1], Robert Harper coined the term "computational trinitarianism" to describe an idea that had been around a long time — that the connection between programming languages, logic, and categories, most famously expressed in the Curry-Howard-Lambek correspondence — should guide the practice of researchers into computation. In particular "any concept arising in one aspect should have meaning from the perspective of the other two". This was especially satisfying to those of us trying learning categorical semantics and often finding it disappointing how little it is appreciated in the computer science community writ large.

1. Categories

Over the years I've thought about trinitarianism a lot, and learned from where it fails to work as much as where it succeeds. One difficulty is that learning to read a logic like a type theory, or vice versa, is almost a definitional trick, because it occurs at the level of reinterpretation of syntax. With categories it is typically not so easy. (There is a straightforward version of categorical semantics like this — yielding "syntactic categories" — but it is difficult to connect it to the broader world of categorical semantics, and often it is sidestepped in favor of deeper models.)

One thing I came to realize is that there is no one notion of categorical semantics — the way in which the simply typed lambda calculus takes models in cartesian closed categories is fundamentally unlike the way in which linear logics take models in symmetric monoidal categories. If you want to study models of dependent type theories, you have a range of approaches, only some of which have been partially unified by Ahrens, Lumsdaine and Voevodsky in particular [2]. And then there are the LCCC models pioneered by Seely for extensional type theory, not to mention the approach that takes semantics directly in toposes, or in triposes (the latter having been invented to unify a variety of structures, and in the process giving rise to still more questions). And then there is the approach that doesn't use categories at all, but multicategories.

Going the other way, we also run into obstacles: there is a general notion, opposite to the "syntactic category" of a type theory, which is the "internal logic" of a category. But depending on the form of category, "internal logic" can take many forms. If you are in a topos, there is a straightforward internal logic called the Mitchell–Bénabou language. In this setting, most "logical" operations factor through the truth-lattice of the subobject classifier. This is very convenient, but if you don't have a proper subobject classifier, then you are forced to reach for other interpretations. As such, it is not infrequently the case that we have a procedure for deriving a category from some logical theory, and a procedure for constructing a logical theory from some category, but there is no particular reason to expect that where we arrive, when we take the round-trip, is close to, much less precisely, where we began.

2. Spaces, Logics

Over the past few years I've been in a topos theory reading group. In the course of this, I've realized at least one problem with all the above (by no means the only one) — Harper's holy trinity is fundamentally incomplete. There is another structure of interest — of equal weight to categories, logics, and languages — which it is necessary to understand to see how everything fits. This structure is spaces. I had thought that it was a unique innovation of homotopy type theory to consider logics (resp. type theories) that took semantics in spaces. But it turns out that I just didn't know the history of constructive logic very well. In fact, in roughly the same period that Curry was exploring the relationship of combinatory algebras to logic, Alfred Tarski and Marshall Stone were developing topological models for intuitionistic logic, in terms of what we call Heyting Algebras [3] [4]. And just as, as Harper explained, logic, programming and category theory give us insights into implication in the form of entailment, typing judgments, and morphisms, so to, as we will see, do spaces.

A Heyting algebra is a special type of distributive lattice (partially ordered set, equipped with meet and join operations, such that meet and join distribute over one another) which has an implication operation that satisfies curry/uncurry adjointness — i.e. such that c ∧ a ≤ b < -> c ≤ a → b. (Replace meet here by "and" (spelled "*"), and ≤ by ⊢ and we have the familiar type-theoretic statement that c * a ⊢ b < -> c ⊢ a → b).

If you haven't encountered this before, it is worth unpacking. Given a set, we equip it with a partial order by specifying a "≤" operation, such that a ≤ a, if a ≤ b and b ≤ a, then a = b, and finally that if a ≤ b and b ≤ c, then a ≤ c. We can think of such things as Hasse diagrams — a bunch of nodes with some lines between them that only go upwards. If a node b is reachable from a node a by following these upwards lines, then a ≤ b. This "only upwards" condition is enough to enforce all three conditions. We can define ∨ (join) as a binary operation that takes two nodes, and gives a node a ∨ b that is greater than either node, and furthermore is the uniquely least node greater than both of them. (Note: A general partial order may have many pairs of nodes that do not have any node greater than both of them, or may that may have more than one incomparable node greater than them.) We can define ∧ (meet) dually, as the uniquely greatest node less than both of them. If all elements of a partially ordered set have a join and meet, we have a lattice.

It is tempting to read meet and join as "and" and "or" in logic. But these logical connectives satisfy an additional important property — distributivity: a & (b | c) = (a & b) | (a & c). (By the lattice laws, the dual property with and swapped with or is also implied). Translated for lattices this reads: a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c). Rather than thinking just about boolean logic, we can think about lattices built from sets — with meets as union, join as intersection, and ≤ given by inclusion. It is easy to verify that such lattices are distributive. Furthermore, every distributive lattice can be given (up to isomorphism) as one built out of sets in this way. While a partially ordered set can have a Hasse diagram of pretty arbitrary shape, a lattice is more restrictive — I imagine it as sort of the tiled diamonds of an actual lattice like one might use in a garden, but with some nodes and edges possibly removed.

Furthermore, there's an amazing result that you can tell if a lattice is distributive by looking for just two prototypical non-distributive lattices as sublattices. If neither is contained in the original lattice, then the lattice is distributed. These tell us how distribution can fail in two canonical ways. The first is three incomparable elements, all of which share a common join (the top) and meet (the bottom). The join of anything but their bottom element with them is therefore the top. Hence if we take the meet of two joins, we still get the top. But the meet of any two non-top elements is the bottom and so, if we take the join of any element with the meet of any other two, we get back to the first element, not all the way to the top, and the equality fails. The second taboo lattice is constructed by having two elements in an ordered relationship, and another incomparable to them — again augmented with a bottom and top. A similar argument shows that if you go one way across the desired entity, you pick out the topmost of the two ordered elements, and the other way yields the bottommost. (The wikipedia article on distributive lattices has some very good diagrams to visualize all this). So a distributive lattice has even more structure than before — incomparable elements must have enough meets and joins to prevent these sublattices from appearing, and this forces even more the appearance of a tiled-diamond like structure.

To get us to a Heyting algebra, we need more structure still — we need implication, which is like an internal function arrow, or an internal ≤ relation. Recall that the equation we want to satisfy is "c ∧ a ≤ b < -> c ≤ a → b". The idea is that we should be able to read ≤ itself as an "external implication" and so if c and a taken together imply b, "a implies b" is the portion of that implication if we "only have" c. We can see it as a partial application of the external implication. If we have a lattice that permits infinite joins (or just a finite lattice such that we don't need them), then it is straightforward to see how to construct this. To build a → b, we just look at every possible choice of c that satisfies c ∧ a ≤ b, and then take the join of all of them to be our object a → b. Then, by construction, a → b is necessarily greater than or equal to any c that satisfies the left hand side of the equation. And conversely, any element that a → b is greater than is necessarily one that satisfies the left hand side, and the bi-implication is complete. (This, by the way, gives a good intuition for the definition of an exponential in a category of presheaves). Another way to think of a → b is as the greatest element of the lattice such that a → b ∧ a ≤ b (exercise: relate this to the first definition). It is also a good exercise to explore what happens in certain simple cases — what if a is 0 (false)? What if it is 1? The same as b? Now ask the same questions of b.

So why is a Heyting algebra a topological construct? Consider any topological space as given by a collection of open sets, satisfying the usual principles (including the empty set and the total set, and closed under union and finite intersection). These covers have a partial ordering, given by containment. They have unions and intersections (all joins and meets), a top and bottom element (the total space, and the null space). Furthermore, they have an implication operation as described above. As an open set, a → b is given by the meet of all opens c for which a ∧ c ≤ b. (We can think of this as "the biggest context, for which a ⊢ b"). In fact, the axioms for open sets feel almost exactly like the rules we've described for Heyting algebras. It turns out this is only half true — open sets always give Heyting algebras, and we can turn every Heyting algebra into a space. However, in both directions the round trip may take us to somewhere slightly different than where we started. Nonetheless it turns out that if we take complete Heyting algebras where finite meets distribute over infinite joins, we get something called "frames." And the opposite category of frames yields "locales" — a suitable generalization of topological spaces, first named by John Isbell in 1972 [5]. Spaces that correspond precisely to locales are called sober, and locales that correspond precisely to spaces are said to have "enough points" or be "spatial locales".

In fact, we don't need to fast-forward to 1972 to get some movement in the opposite direction. In 1944, McKinsey and Tarski embarked on a program of "The Algebra of Topology" which sought to describe topological spaces in purely algebraic (axiomatic) terms [6]. The resultant closure algebras (these days often discussed as their duals, interior algebras) provided a semantics for S4 modal logic. [7] A further development in this regard came with Kripke models for logic [8] (though arguably they're really Beth models [9]).

Here's an easy way to think about Kripke models. Start with any partial ordered set. Now, for each object, instead consider instead all morphisms into it. Since each morphism from any object a to any object b exists only if a ≤ b, and we consider such paths unique (if there are two "routes" showing a ≤ b, we consider them the same in this setting) this amounts to replacing each element a with the set of all elements ≤ a. (The linked pdf does this upside down, but it doesn't really matter). Even though the initial setting may not have been Heyting algebra, this transformed setting is a Heyting algebra. (In fact, by a special case of the Yoneda lemma!). This yields Kripke models.

Now consider "collapsings" of elements in the initial partial order — monotone downwards maps taken by sending some elements to other elements less than them in a way that doesn't distort orderings. (I.e. if f(a) ≤ f(b) in the collapsed order, then that means that a ≤ b in the original order). Just as we can lift elements from the initial partial order into their downsets (sets of elements less than them) in the kripkified Heyting Algebra, we can lift our collapsing functions into collapsing functions in our generated Heyting Algebra. With a little work we can see that collapsings in the partial order also yield collapsings in the Heyting Algebra.

Furthermore, it turns out, more or less, that you can generate every closure algebra in this way. Now if we consider closure algebras a bit (and this shouldn't surprise us if we know about S4), we see that we can always take a to Ca, that if we send a → b, then we can send Ca → Cb, and furthermore that CCa → Ca in a natural way (in fact, they're equal!). So closure algebras have the structure of an idempotent monad. (Note: the arrows here should not be seen as representing internal implication — as above they represent the logical turnstile ⊢ or perhaps, if you're really in a Kripke setting, the forcing turnstile ⊩).

Now we have a correspondence between logic and computation (Curry-Howard), logic and categories (Lambek-Scott), and logic and spaces (Tarski-Stone). So maybe, instead of Curry-Howard-Lambek, we should speak of Curry-Howard-Lambek-Scott-Tarski-Stone! (Or, if we want to actually bother to say it, just Curry-Howard-Lambek-Stone. Sorry, Tarski and Scott!) Where do the remaining correspondences arise from? A cubical Kan operation, naturally! But let us try to sketch in a few more details.

3. Spaces, Categories

All this about monads and Yoneda suggests that there's something categorical going on. And indeed, there is. A poset is, in essence, a "decategorified category" — that is to say, a category where any two objects have at most one morphism between them. I think of it as if it were a balloon animal that somebody let all the air out of. We can pick up the end of our poset and blow into it, inflating the structure back up, and allowing multiple morphisms between each object. If we do so, something miraculous occurs — our arbitrary posets turn into arbitrary categories, and the induced Heyting algebra from their opens turns into the induced category of set-valued presheaves of that category. The resultant structure is a presheaf topos. If we "inflate up" an appropriate notion of a closure operator we arrive at a Grothendieck topos! And indeed, the internal language of a topos is higher-order intuitionistic type theory [10].

4. Spaces, Programming Languages

All of this suggests a compelling story: logic describes theories via algebraic syntax. Equipping these theories with various forms of structural operations produces categories of one sort or another, in the form of fibrations. The intuition is that types are spaces, and contexts are also spaces. And furthermore, types are covered by the contexts in which their terms may be derived. This is one sense in which we it seems possible to interpret the Meillies/Zeilberger notion of a type refinement system as a functor [11].

But where do programming languages fit in? Programming languages, difficult as it is to sometimes remember, are more than their type theories. They have a semantic of computation as well. For example, a general topos does not have partial functions, or a fixed point combinator. But computations, often, do. This led to one of the first applications of topology to programming languages — the introduction of domain theory, in which terms are special kinds of spaces — directed complete partial orders — and functions obey a special kind of continuity (preservation of directed suprema) that allows us to take their fixed points. But while the category of dcpos is cartesian closed, the category of dcpos with only appropriately continuous morphisms is not. Trying to resolve this gap, one way or another, seems to have been a theme of research in domain theory throughout the 80s and 90s [12].

Computations can also be concurrent. Topological and topos-theoretic notions again can play an important role. In particular, to consider two execution paths to be "the same" one needs a notion of equivalence. This equivalence can be seen, stepwise, as a topological "two-cell" tracing out at each step an equivalence between the two execution paths. One approach to this is in Joyal, Nielson and Winskel's treatment of open maps [13]. I've also just seen Patrick Schultz and David I. Spivak's "Temporal Type Theory" which seems very promising in this regard [14].

What is the general theme? Computation starts somewhere, and then goes somewhere else. If it stayed in the same place, it would not "compute". A computation is necessarily a path in some sense. Computational settings describe ways to take maps between spaces, under a suitable notion of topology. To describe the spaces themselves, we need a language — that language is a logic, or a type theory. Toposes are a canonical place (though not the only one) where logics and spaces meet (and where, to a degree, we can even distinguish their "logical" and "spatial" content). That leaves categories as the ambient language in which all this interplay can be described and generalized.

5. Spaces, Categories

All the above only sketches the state of affairs up to roughly the mid '90s. The connection to spaces starts in the late 30s, going through logic, and then computation. But the categorical notion of spaces we have is in some sense impoverished. A topos-theoretic generalization of a space still only describes, albeit in generalized terms, open sets and their lattice of subobject relations. Spaces have a whole other structure built on top of that. From their topology we can extract algebraic structures that describe their shape — this is the subject of algebraic topology. In fact, it was in axiomatizing a branch of algebraic topology (homology) that category theory was first compelled to be invented. And the "standard construction" of a monad was first constructed in the study of homology groups (as the Godement resolution).

What happens if we turn the tools of categorical generalization of algebraic topology on categories themselves? This corresponds to another step in the "categorification" process described above. Where to go from "0" to "1" we took a partially ordered set and allowed there to be multiple maps between objects, to go from "1" to "2" we can now take a category, where such multiple maps exist, and allow there to be multiple maps between maps. Now two morphisms, say "f . g" and "h" need not merely be equal or not, but they may be "almost equal" with their equality given by a 2-cell. This is just as two homotopies between spaces may themselves be homotopic. And to go from "2" to "3" we can continue the process again. This yields n-categories. An n-category with all morphisms at every level invertible is an (oo,0)-category, or an infinity groupoid. And in many setups this is the same thing as a topological space (and the question of which setup is appropriate falls under the name "homotopy hypothesis" [15]). When morphisms at the first level (the category level) can have direction (just as in normal categories) then those are (oo,1)-categories, and the correspondence between groupoids and spaces is constructed as an equivalence of such categories. These too have direct topological content, and one setting in which this is especially apparent is that of quasi-categories, which are (oo,1)-categories that are built directly from simplicial sets — an especially nice categorical model of spaces (the simplicial sets at play here are those that satisfy a "weak" Kan condition, which is a way of asking that composition behave correctly).

It is in these generalized (oo,1)-toposes that homotopy type theory takes its models. And, it is hypothesized that a suitable version of HoTT should in fact be the initial model (or "internal logic") of an "elementary infinity topos" when we finally figure out how to describe what such a thing is.

So perhaps it is not that we should be computational trinitarians, or quadrinitarians. Rather, it is that the different aspects which we examine — logic, languages, categories, spaces — only appear as distinct manifestations when viewed at a low dimensionality. In the untruncated view of the world, the modern perspective is, perhaps, topological pantheism — spaces are in all things, and through spaces, all things are made as one.

Thanks to James Deikun and Dan Doel for helpful technical and editorial comments

[1] https://existentialtype.wordpress.com/2011/03/27/the-holy-trinity/
[2] https://arxiv.org/abs/1705.04310
[3] http://www.sciacchitano.it/Tempo/Aussagenkalk%C3%BCl%20Topologie.pdf
[4] https://eudml.org/doc/27235
[5] http://www.mscand.dk/article/view/11409
[6] https://www.dimap.ufrn.br/~jmarcos/papers/AoT-McKinsey_Tarski.pdf
[7] https://logic.berkeley.edu/colloquium/BezhanishviliSlides.pdf
[8] www2.math.uu.se/~palmgren/tillog/heyting3.pdf
[9] http://festschriften.illc.uva.nl/j50/contribs/troelstra/troelstra.pdf
[10] http://www.sciencedirect.com/science/article/pii/0022404980901024
[11] https://arxiv.org/abs/1310.0263
[12] https://www.dpmms.cam.ac.uk/~martin/Research/Oldpapers/synthetic91.pdf
[13] http://www.brics.dk/RS/94/7/index.html
[14] https://arxiv.org/abs/1710.10258
[15] https://ncatlab.org/nlab/show/homotopy+hypothesis

The State Comonad

2022-10-24T17:47:24Z

Is State a Comonad?

Not Costate or rather, Store as we tend to call it today, but actually State s itself?

Let's see!

Recently there was a post to reddit in which the author King_of_the_Homeless suggested that he might have a Monad for Store. Moreover, it is one that is compatible with the existing Applicative and ComonadApply instances. My knee-jerk reaction was to disbelieve the result, but I'm glad I stuck with playing with it over the last day or so.

In a much older post, I showed how to use the Co comonad-to-monad-transformer to convert Store s into State s, but this is a different beast, it is a monad directly on Store s.

 
{-# language DeriveFunctor #-}
 
import Control.Comonad
import Data.Semigroup
 
data Store s a = Store
  { peek :: s -> a, pos :: s }
  deriving Functor
 
instance Comonad (Store s) where
  extract (Store f s) = f s
  duplicate (Store f s) = Store (Store f) s
 
instance
 (Semigroup s, Monoid s) =>
 Applicative (Store s) where
  pure a = Store (const a) mempty
  Store f s < *> Store g t = Store (\m -> f m (g m)) (mappend s t)
 
instance
 Semigroup s =>
 ComonadApply (Store s) where
  Store f s < @> Store g t = Store (\m -> f m (g m)) (s <> t)
 
instance
 (Semigroup s, Monoid s) =>
 Monad (Store s) where
  return = pure
  m >>= k = Store
    (\s -> peek (k (peek m s)) s)
    (pos m `mappend` pos (k (peek m mempty)))

My apologies for the Semigroup vs. Monoid, noise, as I'm still using GHC 8.2 locally. This will get a bit cleaner in a couple of months.

Also, peek here is flipped relative to the version in Control.Comonad.Store.Class so that I can use it directly as a field accessor.

As I noted, at first I was hesitant to believe it could work, but then I realized I'd already implemented something like this for a special case of Store, in the 'streams' library, which got me curious. Upon reflection, this feels like the usual Store comonad is using the ability to distribute (->) e out or (,) e in using a "comonoid", which is always present in Haskell, just like how the State monad does. But the type above seems to indicate we can go the opposite direction with a monoid.

So, in the interest of exploring duality, let's see if we can build a comonad instance for `State s`!

Writing down the definition for state:

 
newtype State s a = State
  { runState :: s -> (a, s) }
  deriving Functor
 
instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State mf < *> State ma = State $ \s -> case mf s of
    (f, s') -> case ma s' of
      (a, s'') -> (f a, s'')
 
instance Monad (State s) where
  return = pure
  State m >>= k = State $ \s -> case m s of
    (a, s') -> runState (k a) s'

Given a monoid for out state, extraction is pretty obvious:

instance
 Monoid s =>
 Comonad (State s) where
  extract m = fst $ runState m mempty

But the first stab we might take at how to duplicate, doesn't work.

 
  duplicate m = State $ \s ->
   (State $ \t -> runState m (mappend s t)
   , s
   )

It passes the `extract . duplicate = id` law easily:

 
extract (duplicate m)
= extract $ State $ \s -> (State $ \t -> runState m (mappend s t), s)
= State $ \t -> runState m (mappend s mempty)
= State $ \t -> runState m t
= State $ runState m
= m

But fails the second law:

 
fmap extract (duplicate m)
= fmap extract $ State $ \s -> (State $ \t -> runState m (mappend s t), s)
= State $ \s -> (extract $ State $ \t -> runState m (mappend s t), s)
= State $ \s -> (fst $ runState m (mappend s mempty), s)
= State $ \s -> (evalState m s, s)

because it discards the changes in the state.

But the King_of_the_Homeless's trick from that post (and the Store code above) can be modified to this case. All we need to do is ensure that we modify the output state 's' as if we'd performed the action unmolested by the inner monoidal state that we can't see.

 
  duplicate m = State $ \s ->
    ( State $ \t -> runState m (mappend s t)
    , snd $ runState m s
    )

Now:

 
extract (duplicate m)
= extract $ State $ \s -> (State $ \t -> runState m (mappend s t), snd $ runState m s)
= State $ \t -> runState m (mappend mempty t)
= State $ \t -> runState m t
= State $ runState m
= m

just like before, but the inner extraction case now works out and performs the state modification as expected:

 
fmap extract (duplicate m)
= fmap extract $ State $ \s -> (State $ \t -> runState m (mappend s t), snd $ runState m s)
= State $ \s -> (extract $ State $ \t -> runState m (mappend s t), snd $ runState m s)
= State $ \s -> (fst $ runState m (mappend s mempty), snd $ runState m s)
= State $ \s -> (fst $ runState m s, snd $ runState m s)
= State $ \s -> runState m s
= State $ runState m
= m

This is still kind of a weird beast as it performs the state action twice with different states, but it does pass at least the left and right unit laws.

Some questions:

1. Proving associativity is left as an exercise. It passes visual inspection and my gut feeling, but I haven't bothered to do all the plumbing to check it out. I've been wrong enough before, it'd be nice to check!

[Edit: Simon Marechal (bartavelle) has a coq proof of the associativity and other axioms.]

2. Does this pass the ComonadApply laws?

 
instance Monoid s => ComonadApply (State s) where
  (< @>) = (< *>)

[Edit: No.]

3. The streams code above suggests at least one kind of use-case, something like merging together changes of position in a stream, analogous to the "zipping monad" you have on infinite streams. But now the positions aren't just Integers, they are arbitrary values taken from any monoid you want. What other kind of spaces might we want to "zip" in this manner?

4. Is there an analogous construction possible for an "update monad" or "coupdate comonad"? Does it require a monoid that acts on a monoid rather than arbitrary state like the semi-direct product of monoids or a "twisted functor"? Is the result if it exists a twisted functor?

5. Does `Co` translate from this comonad to the monad on `Store`?

6. Is there a (co)monad transformer version of these?

Link: Gist

Adjoint Triples

2022-10-24T17:47:24Z

A common occurrence in category theory is the adjoint triple. This is a pair of adjunctions relating three functors:

F ⊣ G ⊣ H
F ⊣ G, G ⊣ H

Perhaps part of the reason they are so common is that (co)limits form one:

colim ⊣ Δ ⊣ lim

where Δ : C -> C^J is the diagonal functor, which takes objects in C to the constant functor returning that object. A version of this shows up in Haskell (with some extensions) and dependent type theories, as:

∃ ⊣ Const ⊣ ∀
Σ ⊣ Const ⊣ Π

where, if we only care about quantifying over a single variable, existential and sigma types can be seen as a left adjoint to a diagonal functor that maps types into constant type families (either over * for the first triple in Haskell, or some other type for the second in a dependently typed language), while universal and pi types can be seen as a right adjoint to the same.

It's not uncommon to see the above information in type theory discussion forums. But, there are a few cute properties and examples of adjoint triples that I haven't really seen come up in such contexts.

To begin, we can compose the two adjunctions involved, since the common functor ensures things match up. By calculating on the hom definition, we can see:

Hom(FGA, B)     Hom(GFA, B)
    ~=              ~=
Hom(GA, GB)     Hom(FA, HB)
    ~=              ~=
Hom(A, HGB)     Hom(A, GHB)

So there are two ways to compose the adjunctions, giving two induced adjunctions:

FG ⊣ HG,  GF ⊣ GH

And there is something special about these adjunctions. Note that FG is the comonad for the F ⊣ G adjunction, while HG is the monad for the G ⊣ H adjunction. Similarly, GF is the F ⊣ G monad, and GH is the G ⊣ H comonad. So each adjoint triple gives rise to two adjunctions between monads and comonads.

The second of these has another interesting property. We often want to consider the algebras of a monad, and coalgebras of a comonad. The (co)algebra operations with carrier A have type:

alg   : GFA -> A
coalg : A -> GHA

but these types are isomorphic according to the GF ⊣ GH adjunction. Thus, one might guess that GF monad algebras are also GH comonad coalgebras, and that in such a situation, we actually have some structure that can be characterized both ways. In fact this is true for any monad left adjoint to a comonad; [0] but all adjoint triples give rise to these.

The first adjunction actually turns out to be more familiar for the triple examples above, though. (Edit: [2]) If we consider the Σ ⊣ Const ⊣ Π adjunction, where:

Σ Π : (A -> Type) -> Type
Const : Type -> (A -> Type)

we get:

ΣConst : Type -> Type
ΣConst B = A × B
ΠConst : Type -> Type
ΠConst B = A -> B

So this is the familiar adjunction:

A × - ⊣ A -> -

But, there happens to be a triple that is a bit more interesting for both cases. It refers back to categories of functors vs. bare type constructors mentioned in previous posts. So, suppose we have a category called Con whose objects are (partially applied) type constructors (f, g) with kind * -> *, and arrows are polymorphic functions with types like:

 
forall x. f x -> g x

And let us further imagine that there is a similar category, called Func, except its objects are the things with Functor instances. Now, there is a functor:

U : Func -> Con

that 'forgets' the functor instance requirement. This functor is in the middle of an adjoint triple:

F ⊣ U ⊣ C
F, C : Con -> Func

where F creates the free functor over a type constructor, and C creates the cofree functor over a type constructor. These can be written using the types:

 
data F f a = forall e. F (e -> a) (f e)
newtype C f a = C (forall r. (a -> r) -> f r)

and these types will also serve as the types involved in the composite adjunctions:

FU ⊣ CU : Func -> Func
UF ⊣ UC : Con -> Con

Now, CU is a monad on functors, and the Yoneda lemma tells us that it is actually the identity monad. Similarly, FU is a comonad, and the co-Yoneda lemma tells us that it is the identity comonad (which makes sense, because identity is self-adjoint; and the above is why F and C are often named (Co)Yoneda in Haskell examples).

On the other hand, UF is a monad on type constructors (note, U isn't represented in the Haskell types; F and C just play triple duty, and the constraints on f control what's going on):

 
eta :: f a -> F f a
eta = F id
 
transform :: (forall x. f x -> g x) -> F f a -> F g a
transform tr (F g x) = F g (tr x)
 
mu :: F (F f) a -> F f a
mu (F g (F h x)) = F (g . h) x

and UC is a comonad:

 
epsilon :: C f a -> f a
epsilon (C e) = e id
 
transform' :: (forall x. f x -> g x) -> C f a -> C g a
transform' tr (C e) = C (tr . e)
 
delta :: C f a -> C (C f) a
delta (C e) = C $ \h -> C $ \g -> e (g . h)

These are not the identity (co)monad, but this is the case where we have algebras and coalgebras that are equivalent. So, what are the (co)algebras? If we consider UF (and unpack the definitions somewhat):

 
alg :: forall e. (e -> a, f e) -> f a
alg (id, x) = x
alg (g . h, x) = alg (g, alg (h, x))

and for UC:

 
coalg :: f a -> forall r. (a -> r) -> f r
coalg x id = x
coalg x (g . h) = coalg (coalg x h) g

in other words, (co)algebra actions of these (co)monads are (mangled) fmap implementations, and the commutativity requirements are exactly what is required to be a law abiding instance. So the (co)algebras are exactly the Functors. [1]

There are, of course, many other examples of adjoint triples. And further, there are even adjoint quadruples, which in turn give rise to adjoint triples of (co)monads. Hopefully this has sparked some folks' interest in finding and studying more interesting examples.

[0]: Another exmaple is A × - ⊣ A -> - where the A in question is a monoid. (Co)monad (co)algebras of these correspond to actions of the monoid on the carrier set.

[1]: This shouldn't be too surprising, because having a category of (co)algebraic structures that is equivalent to the category of (co)algebras of the (co)monad that comes from the (co)free-forgetful adjunction is the basis for doing algebra in category theory (with (co)monads, at least). However, it is somewhat unusual for a forgetful functor to have both a left and right adjoint. In many cases, something is either algebraic or coalgebraic, and not both.

[2]: Urs Schreiber informed me of an interesting interpretation of the ConstΣ ⊣ ConstΠ adjunction. If you are familiar with modal logic and the possible worlds semantics thereof, you can probably imagine that we could model it using something like P : W -> Type, where W is the type of possible worlds, and propositions are types. Then values of type Σ P demonstrate that P holds in particular worlds, while values of type Π P demonstrate that it holds in all worlds. Const turns these types back into world-indexed 'propositions,' so ConstΣ is the possibility modality and ConstΠ is the necessity modality.

Some Rough Notes on Univalent Foundations and B-Systems, Part I

2022-10-24T17:47:24Z

I recently attended RDP in Warsaw, where there was quite a bit of work on Homotopy Type Theory, including a special workshop organized to present recent and ongoing work. The organizers of all the events did a fantastic job and there was a great deal of exciting work. I should add that I will not be able to go to RDP next year, as the two constituent central conferences (RTA — Rewriting Techniques and Applications and TLCA — Typed Lambda Calculus and Applications) have merged and changed names. Next year it will now be called FSCD — Formal Structures for Computation and Deduction. So I very much look forward to attending FSCD instead.

In any case, one of the invited speakers was Vladimir Voevodsky, who gave an invited talk on his recent work relating to univalent foundations titled "From Syntax to Semantics of Dependent Type Theories — Formalized”. This was a very clear talk that helped me understand his current research direction and the motivations for it. I also had the benefit of some very useful conversations with others involved in collaboration with some of this work, who patiently answered my questions. The notes below are complimentary to the slides from his talk.

I had sort of understood what the motivation for studying “C-Systems” was, but I had not taken it on myself to look at Voevodsky’s “B-Systems” before, nor had I grasped how his research programme fit together. Since I found this experience enlightening, I figured I might as well write up what I think I understand, with all the usual caveats. Also note, in all the below, by “type theory” I invariably mean the intensional sort. So all the following is in reference to the B-systems paper that Voevodsky has posted on arXiv (arXiv:1410.5389).

That said, if anything I describe here strikes you as funny, it is more likely that I am not describing things right than that the source material is troublesome — i.e. take this with a grain of salt. And bear in mind that I am not attempting to directly paraphrase Voevodsky himself or others I spoke to, but rather I am giving an account of where what they described resonated with me, and filtered through my own examples, etc. Also, if all of the “why and wherefore” is already familiar to you, feel free to skip directly to the “B-Systems” section where I will just discuss Voevodsky’s paper on this topic, and my attempts to understand portions of it. And if you already understand B-Systems, please do reply and explain all the things I’m sure I’m missing!

Some Review

We have a model of type theory in simiplicial sets that validates the univalence axiom (and now a few other models that validate this axiom as well). This is to say, it is a model with not only higher dimensional structure, but higher structure of a very “coherent” sort. The heart of this relates to our construction of a “universe”. In our categorical model, all our types translate into objects of various sorts. The “universe,” aka the type-of-types, translates into a very special object, one which “indexes” all other objects. A more categorical way of saying this is that all other types are “fibered over” the universe — i.e. that from every other type there is a map back to a specific point within the universe. The univalence axiom can be read as saying that all equivalent types are fibered over points in the universe that are connected (i.e. there is a path between those points).

Even in a relatively simple dependent type theory, equivalence of types quickly becomes undecidable in general, as it is a superset of the problem of deciding type inhabitation, which in turn corresponds to the decidability of propositions in the logic corresponding to a type theory, and then by any number of well-known results cannot be resolved in general for most interesting theories. This in turn means that the structure of a univalent universe is “describable” but it is not fully enumerable, and is very complex.

We also have a line of work dating back to before the introduction of univalence, which investigated the higher groupoid structure (or, if you prefer, higher topological structure or quillen model structure) induced by identity types. But without either univalence or higher-inductive types, this higher groupoid structure is unobservable internally. This is to say, models were possible that would send types to things with higher structure, but no particular use would be made of this higher structure. So, such models could potentially be used to demonstrate that certain new axioms were not conservative over the existing theory, but on their own they did not provide ideas about how to extend the theory.

How to relate this higher groupoid structure to universes? Well, in a universe, one has paths. Without univalence, these are just identity paths. But regardless, we now get a funny “completion” as our identity paths must themselves be objects in our universe, and so too the paths between them, etc. In models without higher structure, we might say “there is only one path from each object to itself” and then we need not worry too much about this potential explosion of paths at each level. But by enforcing the higher groupoid structure, this means that our universe now blossoms with all the potentially distinct paths at each level. However, with the only way in our syntax to create such “extra paths” as reflexivity, any such path structure in our model remains “latent”, and can be added or removed without any effect.

The univalence axiom relies on these higher groupoid structures, but it cannot be reduced to them. Rather, in the model, we must have a fibration over the universe with identity lifting along this fibration to reach the next step — to then modify the universe by forcing paths other than identity paths — those between equivalent types. This is in a sense a further “higher completion” of our universe, adding in first all the possible paths between types, but then the paths between those paths, and so on up. Because, by univalence, we can state such paths, then in our model we must include all
of them.

The Problem

All along I have been saying “models of type theory”. And it is true enough. We do know how to model type theories of various sorts categorically (i.e. representing the translation from their syntax into their semantics as functorial). But we do not have full models of "fully-featured" type theories; i.e. if we view type theories as pizzas we have models of cheese slices, and perhaps slices with olives and slices with pepperoni, etc. But we do not have models of pizzas with "all the toppings". Here, by "toppings" I mean things such as the addition of "all inductive types," "some coinductive types," "certain higher-inductive types," "pattern matching," "induction-induction," "induction-recursion," "excluded middle as an axiom," "choice as an axiom," "propositional resizing as an axiom," etc.

Rather, we have a grab bag of tricks, as well as a few slightly different approaches — Categories with Attributes, Categories with Families, and so forth. One peculiar feature of these sorts of models, as opposed to the models of extensional type theory, is that these models are not indexed by types, but by “lists of types” that directly correspond to the contexts in which we make typing judgments in intensional theories.

In any case, these models are usually used in an ad-hoc fashion. If you want to examine a particular feature of a language, you first pick from one of these different but related sorts of models. Then you go on to build a version with the minimal set of what you need — so maybe identity types, maybe sigma types, maybe natural numbers, and then you introduce your new construction or generate your result or the like.

So people may say “we know how to work with these things, and we know the tricks, so given a theory, we can throw together the facts about it pretty quickly.” Now of course there are maybe only a hundred people on the planet (myself not among them) who can really just throw together a categorical model of some one or another dependent type theory at the drop of a hat.

But there’s a broader problem. How can we speak about the mutual compatibility of different extensions and features if each one is validated independently in a different way? This is a problem very familiar to us in the field of programming languages — you have a lot of “improvements” to your language, all of a similar form. But then you put such “improvements” together and now something goes wrong. In fact, the famous “newtype deriving bug” in GHC some years back, which opened a big hole in the type system, was of exactly that form — two extensions (in that case, newtype deriving and type families) that are on their own safe and useful, together have an unexpected bad effect. It is also possible to imagine bad interactions occuring only when three extensions exist together, and soforth. So as the number of extensions increases, the number of interactions to check spirals upwards in a very difficult fashion.

So the correct way to have confidence in the coexistence of these various extensions is to have a general model that contains the sort of theory we actually want to work in, rather than these toy theories that let us look at portions in isolation. And this certainly involves having a theory that lets us validate all inductive types at once in the model, rather than extending it over and over for each new type we add. Additionally, people tend to model things with at most one universe. And when we are not looking at universes, it is often omitted altogether, or done “incorrectly” as an inhabitant of itself, purely for the sake of convenience.

So now, if I tell someone with mathematical experience what my theory “means” and they say “is this actually proven” I’m in the embarrassing position of saying “no, it is not. but the important bits all are and we know how to put them together.” So here I am, trying to advocate the idea of fully formal verification, but without a fully top-to-bottom formally verified system myself — not even codewise, but in even the basic mathematical sense.

Univalence makes this problem more urgent. Without univalence, we can often get away with more hand-wavy arguments, because things are “obvious”. Furthermore, they relate to the way things are done elsewhere. So logic can be believed by analogy to how people usually think about logic, numbers by analogy to the peano system, which people already “believe in,” and soforth. Furthermore, without univalence, most operations are “directly constructive” in the sense that you can pick your favorite “obvious” and non-categorical model, and they will tend to hold in that as well — so you can think of your types as sets, and terms as elements of sets. Or you can think of your types as classifying computer programs and your terms as runnable code, etc. In each case, the behavior leads to basically what you would expect.

But in none of these “obvious” models does univalence hold. And furthermore, it is “obviously” wrong in them.

And that is just on the “propaganda” side as people say. For the same reasons, univalence tends to be incompatible with many “obvious” extensions — for example, not only “uniqueness of identity proofs” has to go, but pattern matching had to be rethought so as not to imply it, and furthermore it is not known if it is sound in concert with many other extensions such as general coinductive types, etc. (In fact, the newtype deriving bug itself can be seen as a "very special case" of the incompatibility of univalence with Uniqueness of Identity Proofs, as I have been discussing with people informally for quite some time).

Hence, because univalence interacts with so many other extensions, it feels even more urgent to have a full account. Unlike prior research, which really focused on developing and understanding type systems, this is more of an engineering problem, although a proof-engineering problem to be sure.

The Approach

Rather than just giving a full account of “one important type system,” Voevodsky seems to be aiming for a generally smooth way to develop such full accounts even as type systems change. So he is interested in reusable technology, so to speak. One analogy may be that he is interested in building the categorical semantics version of a logical framework. His tool for doing this is what he calls a “C-system”, which is a slight variant of Cartmell’s Categories with Attributes mentioned above. One important aspect of C-systems seems to be that that they stratify types and terms in some fashion, and that you can see them as generated by some “data” about a ground set of types, terms, and relations. To be honest, I haven’t looked at them more closely than that, since I saw at least some of the “point” of them and know that to really understand the details I'll have to study categorical semantics more generally, which is ongoing.

But the plan isn’t just to have a suitable categorical model of type theories. Rather it is to give a description of how one goes from the “raw terms” as syntax trees all the way through to how typing judgments are passed on them and then to their full elaborations in contexts and finally to their eventual “meaning” as categorically presented.

Of course, most of these elements are well studied already, as are their interactions. But they do not live in a particularly compatible formulation with categorical semantics. This then makes it difficult to prove that “all the pieces line up” and in particular, a pain to prove that a given categorical semantics for a given syntax is the “initial” one — i.e. that if there is any other semantics for that syntax, it can be arrived at by first “factoring through” the morphism from syntax to the initial semantics. Such proofs can be executed, but again it would be good to have “reusable technology” to carry them out in general.

pre-B-Systems

Now we move into the proper notes on Voevodsky's "B-Systems" paper.

If C-systems are at the end-point of the conveyor belt, we need the pieces in the middle. And that is what a B-system is. Continuing the analogy with the conveyor belt, what we get out at the end is a “finished piece” — so an object in a C-system is a categorified version of a “type in-context” capturing the “indexing” or “fibration” of that type over its “base space” of types it may depend on, and also capturing the families of terms that may be formed in various different contexts of other terms with other types.

B-systems, which are a more “syntactic” presentation, have a very direct notion of “dynamics” built in — they describe how objects and contexts may be combined and put together, and directly give, by their laws, which slots fit into which tabs, etc. Furthermore, B-systems are to be built by equipping simpler systems with successively more structure. This gives us a certain sort of notion of how to talk about the distinction between things closer to "raw syntax" (not imbued with any particular meaning) and that subset of raw syntactic structures which have certain specified actions.

So enough prelude. What precisely is a B-system? We start with a pre-B-system, as described below (corresponding to Definition 2.1 in the paper).

First there is a family of sets, indexed by the natural numbers. We call it B_n. B_0 is to be thought of as the empty context. B_1 as the set of typing contexts with one element, B_2 as the set with two elements, where the second may be indexed over the first, etc. Elements of B_3 thus can be thought of as looking like "x_1 : T_1, x_2 : T_2(x_1), x_3 : T_3(x_1,x_2)" where T_2 is a type family over one type, T_3 a type family over two types, etc.

For all typing contexts of at least one element, we can also interpret them as simply the _type_ of their last element, but as indexed by the types of all their prior elements. Conceptually, B_n is the set of "types in context, with no more than n-1 dependencies".

Now, we introduce another family of sets, indexed by the natural numbers starting at 1. We call this set ˜B_n. ˜B_1 is to be thought of as the set of all values that may be drawn from any type in the set B_1, and soforth. Thus, each set ˜B_n is to be thought of as fibered over B_n. We think of this as "terms in context, whose types have no more than n-1 dependencies". Elements of ˜B_3 can be though of as looking like "x_1 : T_1, x_2 : T_2(x_1), x_3 : T_3(x_1,x_2) ⊢ y : x". That is to say, elements of B_n for some n look like "everything to the left of the turnstile" and elements of ˜B_n for some n look like "the left and right hand sides of the turnstile together."

We now, for each n, give a map:

∂ : ˜B_n+1 -> B_n+1.

This map is the witness to this fibration. Conceptually, it says "give me an element of some type of dependency level n, and I will pick out which type this is an element of". We can call ∂ the "type of" operator.

We add a second basic map:

ft : B_n+1 -> B_n

This is a witness to the fact that all our higher B_n are built as extensions of smaller ones. It says "Give me a context, and I will give you the smaller context that has every element except the final one". Alternately, it reads "Give me a type indexed over a context, and I will throw away the type and give back just the context." Or, "Give me a type that may depend on n+1 things, and I will give the type it depends on that may only depend on n things. We can call ft the "context of" operator.

Finally, we add a number of maps to correspond to weakening and substitution -- four in all. In each case, we take m >= n. we denote the i-fold application of ft by ft_i.

1) T (type weakening).

T : (Y : B_n+1) -> (X : B_m+1) -> ft(Y) = ft_(m+1-n)(X) -> B_m+2
This reads: Give me two types-in-context, X and Y. Now, if the context for Y agrees with the context for X in the initial segment (i.e. discarding the elements of the context of X which are "longer" than the context for Y), then I can give you back X again, but now in an extended context that includes Y as well.

2) ˜T (term weakening).

˜T : (Y : B_n+1) -> (r : ˜B_m+1) -> ft(Y)=ft_(m+1-n)(∂(r)) -> ˜B_m+2

This reads: Give me a type-in-context Y, and a term-in-context r. Now, if the context of Y agrees with the context for the type of r as above, then I can give you back r again, but now as a term-in-context whose type has an extended context that includes Y as well.

3) S (type substitution).

S : (s : ˜B_n+1) -> (X : B_m+2) -> ∂(s) = ft_(m+1-n)(X) -> B_m+1

This reads: give me a term-in-context s, and a type-in-context X. Now, if the context of the type of s agrees with the context of the X in the initial segment, we may then produce a new type, which is X with one less element in its context (because we have substituted the explicit term s for where the prior dependency data was recorded).

4) ˜S (term substitution).

˜S : (s : ˜B_n+1) -> (r : ˜B_m+2) -> ∂(s) = ft_(m+1-n)(∂(r)) -> ˜B_m+1

This reads: give me two terms-in-context, r and s. Now given the usual compatibility condition on contexts, we can produce a new term, which is like r, but where the context has one less dependency (because we have substituted the explicit term s for everywhere where there was dependency data prior).

Let us now review what we have: We have dependent terms and types, related by explicit maps between them. For every term we have its type, and for every type we have its context. Furthermore, we have weakening by types of types and terms -- so we record where "extra types" may be introduced into contexts without harm. We also have substitution of terms into type and terms -- so we record where reductions may take place, and the resulting effect on the dependency structure.

Unital pre-B-systems

We now introduce a further piece of data, which renders a pre-B-system a _unital_ pre-B-system, corresponding to Definition 2.2 in the paper. For each n we add an operation:

δ : B_n+1 -> ˜B_n+2

This map "turns a context into a term". Conceptually it is the step that equips a pre-B-system with a universe, as it is what allows types to transform into terms. I find the general definition a bit confusing, but I believe it can be rendered syntactically for for B_2, it can be as something like the following: "x_1 : T_1, x_2 : T_2(x_1) -> x_1 : T_1, x_2 : T_2(x_1), x_3 : U ⊢ x_2 : x_3". That is to say, given any context, we now have a universe U that gives the type of "universes of types", and we say that the type itself is a term that is an element of a universe. Informally, one can think of δ as the "term of" operator.

But this specific structure is not indicated by any laws yet on δ. Indeed, the next thing we do is to introduce a "B0-system" which adds some further coherence conditions to restrict this generality.

B0-systems

The following are my attempt to "verbalize" the B0 system conditions (as restrictions on non-unital pre-B-systems) as covered in definition 2.5. I do not reproduce here the actual formal statements of these conditions, for which one should refer to the paper and just reason through very carefully.

1. The context of a weakening of a type is the same as the weakening of the context of a type.

2. The type of the weakening of a term is the same as the weakening of the type of a term.

3. The context of a substitution into a type is the same as the substitution into a context of a type

4. The type of a substitution into a term is the same as a substitution into the type of a term.

Finally, we "upgrade" a non-unital B0-system to a unital B0-system with one further condition:

5. ∂(δ(X)) =T(X,X).

I read this to say that, "for any type-in-context X, the type of the term of X is the same as the weakening of the context of X by the assumption of X itself." This is to say, if I create a term for some type X, and then discard that term, this is the same as extending the context of X by X again.

Here, I have not even gotten to "full" B-systems yet, and am only on page 5 of a seventeen page paper. But I have been poking at these notes for long enough without posting them, so I'll leave off for now, and hopefully, possibly, when time permits, return to at least the second half of section 2.

On the unsafety of interleaved I/O

2022-10-24T17:47:25Z

One area where I'm at odds with the prevailing winds in Haskell is lazy I/O. It's often said that lazy I/O is evil, scary and confusing, and it breaks things like referential transparency. Having a soft spot for it, and not liking most of the alternatives, I end up on the opposite side when the topic comes up, if I choose to pick the fight. I usually don't feel like I come away from such arguments having done much good at giving lazy I/O its justice. So, I thought perhaps it would be good to spell out my whole position, so that I can give the best defense I can give, and people can continue to ignore it, without costing me as much time in the future. :)

So, what's the argument that lazy I/O, or unsafeInterleaveIO on which it's based, breaks referential transparency? It usually looks something like this:

 
swap (x, y) = (y, x)
 
setup = do
  r1 < - newIORef True
  r2 <- newIORef True
  v1 <- unsafeInterleaveIO $ do writeIORef r2 False ; readIORef r1
  v2 <- unsafeInterleaveIO $ do writeIORef r1 False ; readIORef r2
  return (v1, v2)
 
main = do
  p1 <- setup
  p2 <- setup
  print p1
  print . swap $ p2

I ran this, and got:

(True, False)
(True, False)

So this is supposed to demonstrate that the pure values depend on evaluation order, and we have broken a desirable property of Haskell.

First a digression. Personally I distinguish the terms, "referential transparency," and, "purity," and use them to identify two desirable properties of Haskell. The first I use for the property that allows you to factor your program by introducing (or eliminating) named subexpressions. So, instead of:

 
f e e

we are free to write:

 
let x = e in f x x

or some variation. I have no argument for this meaning, other than it's what I thought it meant when I first heard the term used with respect to Haskell, it's a useful property, and it's the best name I can think of for the property. I also (of course) think it's better than some of the other explanations you'll find for what people mean when they say Haskell has referential transparency, since it doesn't mention functions or "values". It's just about equivalence of expressions.

Anyhow, for me, the above example is in no danger of violating referential transparency. There is no factoring operation that will change the meaning of the program. I can even factor out setup (or inline it, since it's already named):

 
main = let m = setup
        in do p1 < - m
              p2 <- m
              print p1
              print . swap $ p2

This is the way in which IO preserves referential transparency, unlike side effects, in my view (note: the embedded language represented by IO does not have this property, since otherwise p1 could be used in lieu of p2; this is why you shouldn't spend much time writing IO stuff, because it's a bad language embedded in a good one).

The other property, "purity," I pull from Amr Sabry's paper, What is a Purely Functional Language? There he argues that a functional language should be considered "pure" if it is an extension of the lambda calculus in which there are no contexts which observe differences in evaluation order. Effectively, evaluation order must only determine whether or not you get an answer, not change the answer you get.

This is slightly different from my definition of referential transparency earlier, but it's also a useful property to have. Referential transparency tells us that we can freely refactor, and purity tells us that we can change the order things are evaluated, both without changing the meaning of our programs.

Now, it would seem that the original interleaving example violates purity. Depending on the order that the values are evaluated, opponents of lazy I/O say, the values change. However, this argument doesn't impress me, because I think the proper way to think about unsafeInterleaveIO is as concurrency, and in that case, it isn't very strange that the results of running it would be non-deterministic. And in that case, there's not much you can do to prove that the evaluation order is affecting results, and that you aren't simply very unlucky and always observing results that happen to correspond to evaluation order.

In fact, there's something I didn't tell you. I didn't use the unsafeInterleaveIO from base. I wrote my own. It looks like this:

 
unsafeInterleaveIO :: IO a -> IO a
unsafeInterleaveIO action = do
  iv < - new
  forkIO $
    randomRIO (1,5) >>= threadDelay . (*1000) >>
    action >>= write iv
  return . read $ iv

iv is an IVar (I used ivar-simple). The pertinent operations on them are:

 
new :: IO (IVar a)
write :: IVar a -> a -> IO ()
read :: IVar a -> a

new creates an empty IVar, and we can write to one only once; trying to write a second time will throw an exception. But this is no problem for me, because I obviously only attempt to write once. read will block until its argument is actually is set, and since that can only happen once, it is considered safe for read to not require IO. [1]

Using this and forkIO, one can easily write something like unsafeInterleaveIO, which accepts an IO a argument and yields an IO a whose result is guaranteed to be the result of running the argument at some time in the future. The only difference is that the real unsafeInterleaveIO schedules things just in time, whereas mine schedules them in a relatively random order (I'll admit I had to try a few times before I got the 'expected' lazy IO answer).

But, we could even take this to be the specification of interleaving. It runs IO actions concurrently, and you will be fine as long as you aren't attempting to depend on the exact scheduling order (or whether things get scheduled at all in some cases).

In fact, thinking of lazy I/O as concurrency turns most spooky examples into threading problems that I would expect most people to consider rather basic. For instance:

Don't pass a handle to another thread and close it in the original.
Don't fork more file-reading threads than you have file descriptors.
Don't fork threads to handle files if you're concerned about the files being closed deterministically.
Don't read from the same handle in multiple threads (unless you don't care about each thread seeing a random subsequence of the stream).

And of course, the original example in this article is just non-determinism introduced by concurrency, but not of a sort that requires fundamentally different explanation than fork. The main pitfall, in my biased opinion, is that the scheduling for interleaving is explained in a way that encourages people to try to guess exactly what it will do. But the presumption of purity (and the reordering GHC actually does based on it) actually means that you cannot assume that much more about the scheduling than you can about my scheduler, at least in general.

This isn't to suggest that lazy I/O is appropriate for every situation. Sometimes the above advice means that it is not appropriate to use concurrency. However, in my opinion, people are over eager to ban lazy I/O even for simple uses where it is the nicest solution, and justify it based on the 'evil' and 'confusing' ascriptions. But, personally, I don't think this is justified, unless one does the same for pretty much all concurrency.

I suppose the only (leading) question left to ask is which should be declared unsafe, fork or ivars, since together they allow you to construct a(n even less deterministic) unsafeInterleaveIO?

[1] Note that there are other implementations of IVar. I'd expect the most popular to be in monad-par by Simon Marlow. That allows one to construct an operation like read, but it is actually less deterministic in my construction, because it seems that it will not block unless perhaps you write and read within a single 'transaction,' so to speak.

In fact, this actually breaks referential transparency in conjunction with forkIO:

 
deref = runPar . get
 
randomDelay = randomRIO (1,10) >>= threadDelay . (1000*)
 
myHandle m = m `catch` \(_ :: SomeExpression) -> putStrLn "Bombed"
 
mySpawn :: IO a -> IO (IVar a)
mySpawn action = do
  iv < - runParIO new
  forkIO $ randomDelay >> action >>= runParIO . put_ iv
  return iv
 
main = do
  iv < - mySpawn (return True)
  myHandle . print $ deref iv
  randomDelay
  myHandle . print $ deref iv

Sometimes this will print "Bombed" twice, and sometimes it will print "Bombed" followed by "True". The latter will never happen if we factor out the deref iv however. The blocking behavior is essential to deref maintaining referential transparency, and it seems like monad-par only blocks within a single runPar, not across multiples. Using ivar-simple in this example always results in "True" being printed twice.

It is also actually possible for unsafeInterleaveIO to break referential transparency if it is implemented incorrectly (or if the optimizer mucks with the internals in some bad way). But I haven't seen an example that couldn't be considered a bug in the implementation rather than some fundamental misbehavior. And my reference implementation here (with a suboptimal scheduler) suggests that there is no break that isn't just a bug.

Categories of Structures in Haskell

2022-10-24T17:47:25Z

In the last couple posts I've used some 'free' constructions, and not remarked too much on how they arise. In this post, I'd like to explore them more. This is going to be something of a departure from the previous posts, though, since I'm not going to worry about thinking precisely about bottom/domains. This is more an exercise in applying some category theory to Haskell, "fast and loose".

(Advance note: for some continuous code to look at see this file.)

First, it'll help to talk about how some categories can work in Haskell. For any kind k made of * and (->), [0] we can define a category of type constructors. Objects of the category will be first-class [1] types of that kind, and arrows will be defined by the following type family:

 
newtype Transformer f g = Transform { ($$) :: forall i. f i ~> g i }
 
type family (~>) :: k -> k -> * where
  (~>) = (->)
  (~>) = Transformer
 
type a < -> b = (a -> b, b -> a)
type a < ~> b = (a ~> b, b ~> a)

So, for a base case, * has monomorphic functions as arrows, and categories for higher kinds have polymorphic functions that saturate the constructor:

 
  Int ~> Char = Int -> Char
  Maybe ~> [] = forall a. Maybe a -> [a]
  Either ~> (,) = forall a b. Either a b -> (a, b)
  StateT ~> ReaderT = forall s m a. StateT s m a -> ReaderT s m a

We can of course define identity and composition for these, and it will be handy to do so:

 
class Morph (p :: k -> k -> *) where
  id :: p a a
  (.) :: p b c -> p a b -> p a c
 
instance Morph (->) where
  id x = x
  (g . f) x = g (f x)
 
instance Morph ((~>) :: k -> k -> *)
      => Morph (Transformer :: (i -> k) -> (i -> k) -> *) where
  id = Transform id
  Transform f . Transform g = Transform $ f . g

These categories can be looked upon as the most basic substrates in Haskell. For instance, every type of kind * -> * is an object of the relevant category, even if it's a GADT or has other structure that prevents it from being nicely functorial.

The category for * is of course just the normal category of types and functions we usually call Hask, and it is fairly analogous to the category of sets. One common activity in category theory is to study categories of sets equipped with extra structure, and it turns out we can do this in Haskell, as well. And it even makes some sense to study categories of structures over any of these type categories.

When we equip our types with structure, we often use type classes, so that's how I'll do things here. Classes have a special status socially in that we expect people to only define instances that adhere to certain equational rules. This will take the place of equations that we are not able to state in the Haskell type system, because it doesn't have dependent types. So using classes will allow us to define more structures that we normally would, if only by convention.

So, if we have a kind k, then a corresponding structure will be σ :: k -> Constraint. We can then define the category (k,σ) as having objects t :: k such that there is an instance σ t. Arrows are then taken to be f :: t ~> u such that f "respects" the operations of σ.

As a simple example, we have:

 
  k = *
  σ = Monoid :: * -> Constraint
 
  Sum Integer, Product Integer, [Integer] :: (*, Monoid)
 
  f :: (Monoid m, Monoid n) => m -> n
    if f mempty = mempty
       f (m <> n) = f m <> f n

This is just the category of monoids in Haskell.

As a side note, we will sometimes be wanting to quantify over these "categories of structures". There isn't really a good way to package together a kind and a structure such that they work as a unit, but we can just add a constraint to the quantification. So, to quantify over all Monoids, we'll use 'forall m. Monoid m => ...'.

Now, once we have these categories of structures, there is an obvious forgetful functor back into the unadorned category. We can then look for free and cofree functors as adjoints to this. More symbolically:

 
  Forget σ :: (k,σ) -> k
  Free   σ :: k -> (k,σ)
  Cofree σ :: k -> (k,σ)
 
  Free σ ⊣ Forget σ ⊣ Cofree σ

However, what would be nicer (for some purposes) than having to look for these is being able to construct them all systematically, without having to think much about the structure σ.

Category theory gives a hint at this, too, in the form of Kan extensions. In category terms they look like:

  p : C -> C'
  f : C -> D
  Ran p f : C' -> D
  Lan p f : C' -> D

  Ran p f c' = end (c : C). Hom_C'(c', p c) ⇒ f c
  Lan p f c' = coend (c : c). Hom_C'(p c, c') ⊗ f c

where ⇒ is a "power" and ⊗ is a copower, which are like being able to take exponentials and products by sets (or whatever the objects of the hom category are), instead of other objects within the category. Ends and coends are like universal and existential quantifiers (as are limits and colimits, but ends and coends involve mixed-variance).

Some handy theorems relate Kan extensions and adjoint functors:

  if L ⊣ R
  then L = Ran R Id and R = Lan L Id

  if Ran R Id exists and is absolute
  then Ran R Id ⊣ R

  if Lan L Id exists and is absolute
  then L ⊣ Lan L Id

  Kan P F is absolute iff forall G. (G . Kan P F) ~= Kan P (G . F)

It turns out we can write down Kan extensions fairly generally in Haskell. Our restricted case is:

 
  p = Forget σ :: (k,σ) -> k
  f = Id :: (k,σ) -> (k,σ)
 
  Free   σ = Ran (Forget σ) Id :: k -> (k,σ)
  Cofree σ = Lan (Forget σ) Id :: k -> (k,σ)
 
  g :: (k,σ) -> j
  g . Free   σ = Ran (Forget σ) g
  g . Cofree σ = Lan (Forget σ) g

As long as the final category is like one of our type constructor categories, ends are universal quantifiers, powers are function types, coends are existential quantifiers and copowers are product spaces. This only breaks down for our purposes when g is contravariant, in which case they are flipped. For higher kinds, these constructions occur point-wise. So, we can break things down into four general cases, each with cases for each arity:

 
newtype Ran0 σ p (f :: k -> *) a =
  Ran0 { ran0 :: forall r. σ r => (a ~> p r) -> f r }
 
newtype Ran1 σ p (f :: k -> j -> *) a b =
  Ran1 { ran1 :: forall r. σ r => (a ~> p r) -> f r b }
 
-- ...
 
data RanOp0 σ p (f :: k -> *) a =
  forall e. σ e => RanOp0 (a ~> p e) (f e)
 
-- ...
 
data Lan0 σ p (f :: k -> *) a =
  forall e. σ e => Lan0 (p e ~> a) (f e)
 
data Lan1 σ p (f :: k -> j -> *) a b =
  forall e. σ e => Lan1 (p e ~> a) (f e b)
 
-- ...
 
data LanOp0 σ p (f :: k -> *) a =
  LanOp0 { lan0 :: forall r. σ r => (p r -> a) -> f r }
 
-- ...

The more specific proposed (co)free definitions are:

 
type family Free   :: (k -> Constraint) -> k -> k
type family Cofree :: (k -> Constraint) -> k -> k
 
newtype Free0 σ a = Free0 { gratis0 :: forall r. σ r => (a ~> r) -> r }
type instance Free = Free0
 
newtype Free1 σ f a = Free1 { gratis1 :: forall g. σ g => (f ~> g) -> g a }
type instance Free = Free1
 
-- ...
 
data Cofree0 σ a = forall e. σ e => Cofree0 (e ~> a) e
type instance Cofree = Cofree0
 
data Cofree1 σ f a = forall g. σ g => Cofree1 (g ~> f) (g a)
type instance Cofree = Cofree1
 
-- ...

We can define some handly classes and instances for working with these types, several of which generalize existing Haskell concepts:

 
class Covariant (f :: i -> j) where
  comap :: (a ~> b) -> (f a ~> f b)
 
class Contravariant f where
  contramap :: (b ~> a) -> (f a ~> f b)
 
class Covariant m => Monad (m :: i -> i) where
  pure :: a ~> m a
  join :: m (m a) ~> m a
 
class Covariant w => Comonad (w :: i -> i) where
  extract :: w a ~> a
  split :: w a ~> w (w a)
 
class Couniversal σ f | f -> σ where
  couniversal :: σ r => (a ~> r) -> (f a ~> r)
 
class Universal σ f | f -> σ where
  universal :: σ e => (e ~> a) -> (e ~> f a)
 
instance Covariant (Free0 σ) where
  comap f (Free0 e) = Free0 (e . (.f))
 
instance Monad (Free0 σ) where
  pure x = Free0 $ \k -> k x
  join (Free0 e) = Free0 $ \k -> e $ \(Free0 e) -> e k
 
instance Couniversal σ (Free0 σ) where
  couniversal h (Free0 e) = e h
 
-- ...

The only unfamiliar classes here should be (Co)Universal. They are for witnessing the adjunctions that make Free σ the initial σ and Cofree σ the final σ in the relevant way. Only one direction is given, since the opposite is very easy to construct with the (co)monad structure.

Free σ is a monad and couniversal, Cofree σ is a comonad and universal.

We can now try to convince ourselves that Free σ and Cofree σ are absolute Here are some examples:

 
free0Absolute0 :: forall g σ a. (Covariant g, σ (Free σ a))
               => g (Free0 σ a) < -> Ran σ Forget g a
free0Absolute0 = (l, r)
 where
 l :: g (Free σ a) -> Ran σ Forget g a
 l g = Ran0 $ \k -> comap (couniversal $ remember0 . k) g
 
 r :: Ran σ Forget g a -> g (Free σ a)
 r (Ran0 e) = e $ Forget0 . pure
 
free0Absolute1 :: forall (g :: * -> * -> *) σ a x. (Covariant g, σ (Free σ a))
               => g (Free0 σ a) x < -> Ran σ Forget g a x
free0Absolute1 = (l, r)
 where
 l :: g (Free σ a) x -> Ran σ Forget g a x
 l g = Ran1 $ \k -> comap (couniversal $ remember0 . k) $$ g
 
 r :: Ran σ Forget g a x -> g (Free σ a) x
 r (Ran1 e) = e $ Forget0 . pure
 
free0Absolute0Op :: forall g σ a. (Contravariant g, σ (Free σ a))
                 => g (Free0 σ a) < -> RanOp σ Forget g a
free0Absolute0Op = (l, r)
 where
 l :: g (Free σ a) -> RanOp σ Forget g a
 l = RanOp0 $ Forget0 . pure
 
 r :: RanOp σ Forget g a -> g (Free σ a)
 r (RanOp0 h g) = contramap (couniversal $ remember0 . h) g
 
-- ...

As can be seen, the definitions share a lot of structure. I'm quite confident that with the right building blocks these could be defined once for each of the four types of Kan extensions, with types like:

 
freeAbsolute
  :: forall g σ a. (Covariant g, σ (Free σ a))
  => g (Free σ a) < ~> Ran σ Forget g a
 
cofreeAbsolute
  :: forall g σ a. (Covariant g, σ (Cofree σ a))
  => g (Cofree σ a) < ~> Lan σ Forget g a
 
freeAbsoluteOp
  :: forall g σ a. (Contravariant g, σ (Free σ a))
  => g (Free σ a) < ~> RanOp σ Forget g a
 
cofreeAbsoluteOp
  :: forall g σ a. (Contravariant g, σ (Cofree σ a))
  => g (Cofree σ a) < ~> LanOp σ Forget g a

However, it seems quite difficult to structure things in a way such that GHC will accept the definitions. I've successfully written freeAbsolute using some axioms, but turning those axioms into class definitions and the like seems impossible.

Anyhow, the punchline is that we can prove absoluteness using only the premise that there is a valid σ instance for Free σ and Cofree σ. This tends to be quite easy; we just borrow the structure of the type we are quantifying over. This means that in all these cases, we are justified in saying that Free σ ⊣ Forget σ ⊣ Cofree σ, and we have a very generic presentations of (co)free structures in Haskell. So let's look at some.

We've already seen Free Monoid, and last time we talked about Free Applicative, and its relation to traversals. But, Applicative is to traversal as Functor is to lens, so it may be interesting to consider constructions on that. Both Free Functor and Cofree Functor make Functors:

 
instance Functor (Free1 Functor f) where
  fmap f (Free1 e) = Free1 $ fmap f . e
 
instance Functor (Cofree1 Functor f) where
  fmap f (Cofree1 h e) = Cofree1 h (fmap f e)

And of course, they are (co)monads, covariant functors and (co)universal among Functors. But, it happens that I know some other types with these properties:

 
data CoYo f a = forall e. CoYo (e -> a) (f e)
 
instance Covariant CoYo where
  comap f = Transform $ \(CoYo h e) -> CoYo h (f $$ e)
 
instance Monad CoYo where
  pure = Transform $ CoYo id
  join = Transform $ \(CoYo h (CoYo h' e)) -> CoYo (h . h') e
 
instance Functor (CoYo f) where
  fmap f (CoYo h e) = CoYo (f . h) e
 
instance Couniversal Functor CoYo where
  couniversal tr = Transform $ \(CoYo h e) -> fmap h (tr $$ e)
 
newtype Yo f a = Yo { oy :: forall r. (a -> r) -> f r }
 
instance Covariant Yo where
  comap f = Transform $ \(Yo e) -> Yo $ (f $$) . e
 
instance Comonad Yo where
  extract = Transform $ \(Yo e) -> e id
  split = Transform $ \(Yo e) -> Yo $ \k -> Yo $ \k' -> e $ k' . k
 
instance Functor (Yo f) where
  fmap f (Yo e) = Yo $ \k -> e (k . f)
 
instance Universal Functor Yo where
  universal tr = Transform $ \e -> Yo $ \k -> tr $$ fmap k e

These are the types involved in the (co-)Yoneda lemma. CoYo is a monad, couniversal among functors, and CoYo f is a Functor. Yo is a comonad, universal among functors, and is always a Functor. So, are these equivalent types?

 
coyoIso :: CoYo < ~> Free Functor
coyoIso = (Transform $ couniversal pure, Transform $ couniversal pure)
 
yoIso :: Yo < ~> Cofree Functor
yoIso = (Transform $ universal extract, Transform $ universal extract)

Indeed they are. And similar identities hold for the contravariant versions of these constructions.

I don't have much of a use for this last example. I suppose to be perfectly precise, I should point out that these uses of (Co)Yo are not actually part of the (co-)Yoneda lemma. They are two different constructions. The (co-)Yoneda lemma can be given in terms of Kan extensions as:

 
yoneda :: Ran Id f < ~> f
 
coyoneda :: Lan Id f < ~> f

But, the use of (Co)Yo to make Functors out of things that aren't necessarily is properly thought of in other terms. In short, we have some kind of category of Haskell types with only identity arrows---it is discrete. Then any type constructor, even non-functorial ones, is certainly a functor from said category (call it Haskrete) into the normal one (Hask). And there is an inclusion functor from Haskrete into Hask:

             F
 Haskrete -----> Hask
      |        /|
      |       /
      |      /
Incl  |     /
      |    /  Ran/Lan Incl F
      |   /
      |  /
      v /
    Hask

So, (Co)Free Functor can also be thought of in terms of these Kan extensions involving the discrete category.

To see more fleshed out, loadable versions of the code in this post, see this file. I may also try a similar Agda development at a later date, as it may admit the more general absoluteness constructions easier.

[0]: The reason for restricting ourselves to kinds involving only * and (->) is that they work much more simply than data kinds. Haskell values can't depend on type-level entities without using type classes. For *, this is natural, but for something like Bool -> *, it is more natural for transformations to be able to inspect the booleans, and so should be something more like forall b. InspectBool b => f b -> g b.

[1]: First-class types are what you get by removing type families and synonyms from consideration. The reason for doing so is that these can't be used properly as parameters and the like, except in cases where they reduce to some other type that is first-class. For example, if we define:

 
type I a = a

even though GHC will report I :: * -> *, it is not legal to write Transform I I.

Domains, Sets, Traversals and Applicatives

2022-10-24T17:47:25Z

Last time I looked at free monoids, and noticed that in Haskell lists don't really cut it. This is a consequence of laziness and general recursion. To model a language with those properties, one needs to use domains and monotone, continuous maps, rather than sets and total functions (a call-by-value language with general recursion would use domains and strict maps instead).

This time I'd like to talk about some other examples of this, and point out how doing so can (perhaps) resolve some disagreements that people have about the specific cases.

The first example is not one that I came up with: induction. It's sometimes said that Haskell does not have inductive types at all, or that we cannot reason about functions on its data types by induction. However, I think this is (techincally) inaccurate. What's true is that we cannot simply pretend that that our types are sets and use the induction principles for sets to reason about Haskell programs. Instead, one has to figure out what inductive domains would be, and what their proof principles are.

Fortunately, there are some papers about doing this. The most recent (that I'm aware of) is Generic Fibrational Induction. I won't get too into the details, but it shows how one can talk about induction in a general setting, where one has a category that roughly corresponds to the type theory/programming language, and a second category of proofs that is 'indexed' by the first category's objects. Importantly, it is not required that the second category is somehow 'part of' the type theory being reasoned about, as is often the case with dependent types, although that is also a special case of their construction.

One of the results of the paper is that this framework can be used to talk about induction principles for types that don't make sense as sets. Specifically:

 
newtype Hyp = Hyp ((Hyp -> Int) -> Int)

the type of "hyperfunctions". Instead of interpreting this type as a set, where it would effectively require a set that is isomorphic to the power set of its power set, they interpret it in the category of domains and strict functions mentioned earlier. They then construct the proof category in a similar way as one would for sets, except instead of talking about predicates as subsets, we talk about sub-domains instead. Once this is done, their framework gives a notion of induction for this type.

This example is suitable for ML (and suchlike), due to the strict functions, and sort of breaks the idea that we can really get away with only thinking about sets, even there. Sets are good enough for some simple examples (like flat domains where we don't care about ⊥), but in general we have to generalize induction itself to apply to all types in the 'good' language.

While I haven't worked out how the generic induction would work out for Haskell, I have little doubt that it would, because ML actually contains all of Haskell's data types (and vice versa). So the fact that the framework gives meaning to induction for ML implies that it does so for Haskell. If one wants to know what induction for Haskell's 'lazy naturals' looks like, they can study the ML analogue of:

 
data LNat = Zero | Succ (() -> LNat)

because function spaces lift their codomain, and make things 'lazy'.

----

The other example I'd like to talk about hearkens back to the previous article. I explained how foldMap is the proper fundamental method of the Foldable class, because it can be massaged to look like:

 
foldMap :: Foldable f => f a -> FreeMonoid a

and lists are not the free monoid, because they do not work properly for various infinite cases.

I also mentioned that foldMap looks a lot like traverse:

 
foldMap  :: (Foldable t   , Monoid m)      => (a -> m)   -> t a -> m
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)

And of course, we have Monoid m => Applicative (Const m), and the functions are expected to agree in this way when applicable.

Now, people like to get in arguments about whether traversals are allowed to be infinite. I know Ed Kmett likes to argue that they can be, because he has lots of examples. But, not everyone agrees, and especially people who have papers proving things about traversals tend to side with the finite-only side. I've heard this includes one of the inventors of Traversable, Conor McBride.

In my opinion, the above disagreement is just another example of a situation where we have a generic notion instantiated in two different ways, and intuition about one does not quite transfer to the other. If you are working in a language like Agda or Coq (for proving), you will be thinking about traversals in the context of sets and total functions. And there, traversals are finite. But in Haskell, there are infinitary cases to consider, and they should work out all right when thinking about domains instead of sets. But I should probably put forward some argument for this position (and even if I don't need to, it leads somewhere else interesting).

One example that people like to give about finitary traversals is that they can be done via lists. Given a finite traversal, we can traverse to get the elements (using Const [a]), traverse the list, then put them back where we got them by traversing again (using State [a]). Usually when you see this, though, there's some subtle cheating in relying on the list to be exactly the right length for the second traversal. It will be, because we got it from a traversal of the same structure, but I would expect that proving the function is actually total to be a lot of work. Thus, I'll use this as an excuse to do my own cheating later.

Now, the above uses lists, but why are we using lists when we're in Haskell? We know they're deficient in certain ways. It turns out that we can give a lot of the same relevant structure to the better free monoid type:

 
newtype FM a = FM (forall m. Monoid m => (a -> m) -> m) deriving (Functor)
 
instance Applicative FM where
  pure x = FM ($ x)
  FM ef < *> FM ex = FM $ \k -> ef $ \f -> ex $ \x -> k (f x)
 
instance Monoid (FM a) where
  mempty = FM $ \_ -> mempty
  mappend (FM l) (FM r) = FM $ \k -> l k <> r k
 
instance Foldable FM where
  foldMap f (FM e) = e f
 
newtype Ap f b = Ap { unAp :: f b }
 
instance (Applicative f, Monoid b) => Monoid (Ap f b) where
  mempty = Ap $ pure mempty
  mappend (Ap l) (Ap r) = Ap $ (<>) < $> l < *> r
 
instance Traversable FM where
  traverse f (FM e) = unAp . e $ Ap . fmap pure . f

So, free monoids are Monoids (of course), Foldable, and even Traversable. At least, we can define something with the right type that wouldn't bother anyone if it were written in a total language with the right features, but in Haskell it happens to allow various infinite things that people don't like.

Now it's time to cheat. First, let's define a function that can take any Traversable to our free monoid:

 
toFreeMonoid :: Traversable t => t a -> FM a
toFreeMonoid f = FM $ \k -> getConst $ traverse (Const . k) f

Now let's define a Monoid that's not a monoid:

 
data Cheat a = Empty | Single a | Append (Cheat a) (Cheat a)
 
instance Monoid (Cheat a) where
  mempty = Empty
  mappend = Append

You may recognize this as the data version of the free monoid from the previous article, where we get the real free monoid by taking a quotient. using this, we can define an Applicative that's not valid:

 
newtype Cheating b a =
  Cheating { prosper :: Cheat b -> a } deriving (Functor)
 
instance Applicative (Cheating b) where
  pure x = Cheating $ \_ -> x
 
  Cheating f < *> Cheating x = Cheating $ \c -> case c of
    Append l r -> f l (x r)

Given these building blocks, we can define a function to relabel a traversable using a free monoid:

 
relabel :: Traversable t => t a -> FM b -> t b
relabel t (FM m) = propser (traverse (const hope) t) (m Single)
 where
 hope = Cheating $ \c -> case c of
   Single x -> x

And we can implement any traversal by taking a trip through the free monoid:

 
slowTraverse
  :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
slowTraverse f t = fmap (relabel t) . traverse f . toFreeMonoid $ t

And since we got our free monoid via traversing, all the partiality I hid in the above won't blow up in practice, rather like the case with lists and finite traversals.

Arguably, this is worse cheating. It relies on the exact association structure to work out, rather than just number of elements. The reason is that for infinitary cases, you cannot flatten things out, and there's really no way to detect when you have something infinitary. The finitary traversals have the luxury of being able to reassociate everything to a canonical form, while the infinite cases force us to not do any reassociating at all. So this might be somewhat unsatisfying.

But, what if we didn't have to cheat at all? We can get the free monoid by tweaking foldMap, and it looks like traverse, so what happens if we do the same manipulation to the latter?

It turns out that lens has a type for this purpose, a slight specialization of which is:

 
newtype Bazaar a b t =
  Bazaar { runBazaar :: forall f. Applicative f => (a -> f b) -> f t }

Using this type, we can reorder traverse to get:

 
howBizarre :: Traversable t => t a -> Bazaar a b (t b)
howBizarre t = Bazaar $ \k -> traverse k t

But now, what do we do with this? And what even is it? [1]

If we continue drawing on intuition from Foldable, we know that foldMap is related to the free monoid. Traversable has more indexing, and instead of Monoid uses Applicative. But the latter are actually related to the former; Applicatives are monoidal (closed) functors. And it turns out, Bazaar has to do with free Applicatives.

If we want to construct free Applicatives, we can use our universal property encoding trick:

 
newtype Free p f a =
  Free { gratis :: forall g. p g => (forall x. f x -> g x) -> g a }

This is a higher-order version of the free p, where we parameterize over the constraint we want to use to represent structures. So Free Applicative f is the free Applicative over a type constructor f. I'll leave the instances as an exercise.

Since free monoid is a monad, we'd expect Free p to be a monad, too. In this case, it is a McBride style indexed monad, as seen in The Kleisli Arrows of Outrageous Fortune.

 
type f ~> g = forall x. f x -> g x
 
embed :: f ~> Free p f
embed fx = Free $ \k -> k fx
 
translate :: (f ~> g) -> Free p f ~> Free p g
translate tr (Free e) = Free $ \k -> e (k . tr)
 
collapse :: Free p (Free p f) ~> Free p f
collapse (Free e) = Free $ \k -> e $ \(Free e') -> e' k

That paper explains how these are related to Atkey style indexed monads:

 
data At key i j where
  At :: key -> At key i i
 
type Atkey m i j a = m (At a j) i
 
ireturn :: IMonad m => a -> Atkey m i i a
ireturn = ...
 
ibind :: IMonad m => Atkey m i j a -> (a -> Atkey m j k b) -> Atkey m i k b
ibind = ...

It turns out, Bazaar is exactly the Atkey indexed monad derived from the Free Applicative indexed monad (with some arguments shuffled) [2]:

 
hence :: Bazaar a b t -> Atkey (Free Applicative) t b a
hence bz = Free $ \tr -> runBazaar bz $ tr . At
 
forth :: Atkey (Free Applicative) t b a -> Bazaar a b t
forth fa = Bazaar $ \g -> gratis fa $ \(At a) -> g a
 
imap :: (a -> b) -> Bazaar a i j -> Bazaar b i j
imap f (Bazaar e) = Bazaar $ \k -> e (k . f)
 
ipure :: a -> Bazaar a i i
ipure x = Bazaar ($ x)
 
(>>>=) :: Bazaar a j i -> (a -> Bazaar b k j) -> Bazaar b k i
Bazaar e >>>= f = Bazaar $ \k -> e $ \x -> runBazaar (f x) k
 
(>==>) :: (s -> Bazaar i o t) -> (i -> Bazaar a b o) -> s -> Bazaar a b t
(f >==> g) x = f x >>>= g

As an aside, Bazaar is also an (Atkey) indexed comonad, and the one that characterizes traversals, similar to how indexed store characterizes lenses. A Lens s t a b is equivalent to a coalgebra s -> Store a b t. A traversal is a similar Bazaar coalgebra:

 
  s -> Bazaar a b t
    ~
  s -> forall f. Applicative f => (a -> f b) -> f t
    ~
  forall f. Applicative f => (a -> f b) -> s -> f t

It so happens that Kleisli composition of the Atkey indexed monad above (>==>) is traversal composition.

Anyhow, Bazaar also inherits Applicative structure from Free Applicative:

 
instance Functor (Bazaar a b) where
  fmap f (Bazaar e) = Bazaar $ \k -> fmap f (e k)
 
instance Applicative (Bazaar a b) where
  pure x = Bazaar $ \_ -> pure x
  Bazaar ef < *> Bazaar ex = Bazaar $ \k -> ef k < *> ex k

This is actually analogous to the Monoid instance for the free monoid; we just delegate to the underlying structure.

The more exciting thing is that we can fold and traverse over the first argument of Bazaar, just like we can with the free monoid:

 
bfoldMap :: Monoid m => (a -> m) -> Bazaar a b t -> m
bfoldMap f (Bazaar e) = getConst $ e (Const . f)
 
newtype Comp g f a = Comp { getComp :: g (f a) } deriving (Functor)
 
instance (Applicative f, Applicative g) => Applicative (Comp g f) where
  pure = Comp . pure . pure
  Comp f < *> Comp x = Comp $ liftA2 (< *>) f x
 
btraverse
  :: (Applicative f) => (a -> f a') -> Bazaar a b t -> Bazaar a' b t
btraverse f (Bazaar e) = getComp $ e (c . fmap ipure . f)

This is again analogous to the free monoid code. Comp is the analogue of Ap, and we use ipure in traverse. I mentioned that Bazaar is a comonad:

 
extract :: Bazaar b b t -> t
extract (Bazaar e) = runIdentity $ e Identity

And now we are finally prepared to not cheat:

 
honestTraverse
  :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
honestTraverse f = fmap extract . btraverse f . howBizarre

So, we can traverse by first turning out Traversable into some structure that's kind of like the free monoid, except having to do with Applicative, traverse that, and then pull a result back out. Bazaar retains the information that we're eventually building back the same type of structure, so we don't need any cheating.

To pull this back around to domains, there's nothing about this code to object to if done in a total language. But, if we think about our free Applicative-ish structure, in Haskell, it will naturally allow infinitary expressions composed of the Applicative operations, just like the free monoid will allow infinitary monoid expressions. And this is okay, because some Applicatives can make sense of those, so throwing them away would make the type not free, in the same way that even finite lists are not the free monoid in Haskell. And this, I think, is compelling enough to say that infinite traversals are right for Haskell, just as they are wrong for Agda.

For those who wish to see executable code for all this, I've put a files here and here. The latter also contains some extra goodies at the end that I may talk about in further installments.

[1] Truth be told, I'm not exactly sure.

[2] It turns out, you can generalize Bazaar to have a correspondence for every choice of p

 
newtype Bizarre p a b t =
  Bizarre { bizarre :: forall f. p f => (a -> f b) -> f t }

hence and forth above go through with the more general types. This can be seen here.

Free Monoids in Haskell

2022-10-24T17:47:25Z

It is often stated that Foldable is effectively the toList class. However, this turns out to be wrong. The real fundamental member of Foldable is foldMap (which should look suspiciously like traverse, incidentally). To understand exactly why this is, it helps to understand another surprising fact: lists are not free monoids in Haskell.

This latter fact can be seen relatively easily by considering another list-like type:

 
data SL a = Empty | SL a :> a
 
instance Monoid (SL a) where
  mempty = Empty
  mappend ys Empty = ys
  mappend ys (xs :> x) = (mappend ys xs) :> x
 
single :: a -> SL a
single x = Empty :> x

So, we have a type SL a of snoc lists, which are a monoid, and a function that embeds a into SL a. If (ordinary) lists were the free monoid, there would be a unique monoid homomorphism from lists to snoc lists. Such a homomorphism (call it h) would have the following properties:

 
h [] = Empty
h (xs <> ys) = h xs <> h ys
h [x] = single x

And in fact, this (together with some general facts about Haskell functions) should be enough to define h for our purposes (or any purposes, really). So, let's consider its behavior on two values:

 
h [1] = single 1
 
h [1,1..] = h ([1] <> [1,1..]) -- [1,1..] is an infinite list of 1s
          = h [1] <> h [1,1..]

This second equation can tell us what the value of h is at this infinite value, since we can consider it the definition of a possibly infinite value:

 
x = h [1] <> x = fix (single 1 <>)
h [1,1..] = x

(single 1 <>) is a strict function, so the fixed point theorem tells us that x = ⊥.

This is a problem, though. Considering some additional equations:

 
[1,1..] <> [n] = [1,1..] -- true for all n
h [1,1..] = ⊥
h ([1,1..] <> [1]) = h [1,1..] <> h [1]
                   = ⊥ <> single 1
                   = ⊥ :> 1
                   ≠ ⊥

So, our requirements for h are contradictory, and no such homomorphism can exist.

The issue is that Haskell types are domains. They contain these extra partially defined values and infinite values. The monoid structure on (cons) lists has infinite lists absorbing all right-hand sides, while the snoc lists are just the opposite.

This also means that finite lists (or any method of implementing finite sequences) are not free monoids in Haskell. They, as domains, still contain the additional bottom element, and it absorbs all other elements, which is incorrect behavior for the free monoid:

 
pure x <> ⊥ = ⊥
h ⊥ = ⊥
h (pure x <> ⊥) = [x] <> h ⊥
                = [x] ++ ⊥
                = x:⊥
                ≠ ⊥

So, what is the free monoid? In a sense, it can't be written down at all in Haskell, because we cannot enforce value-level equations, and because we don't have quotients. But, if conventions are good enough, there is a way. First, suppose we have a free monoid type FM a. Then for any other monoid m and embedding a -> m, there must be a monoid homomorphism from FM a to m. We can model this as a Haskell type:

 
forall a m. Monoid m => (a -> m) -> FM a -> m

Where we consider the Monoid m constraint to be enforcing that m actually has valid monoid structure. Now, a trick is to recognize that this sort of universal property can be used to define types in Haskell (or, GHC at least), due to polymorphic types being first class; we just rearrange the arguments and quantifiers, and take FM a to be the polymorphic type:

 
newtype FM a = FM { unFM :: forall m. Monoid m => (a -> m) -> m }

Types defined like this are automatically universal in the right sense. [1] The only thing we have to check is that FM a is actually a monoid over a. But that turns out to be easily witnessed:

 
embed :: a -> FM a
embed x = FM $ \k -> k x
 
instance Monoid (FM a) where
  mempty = FM $ \_ -> mempty
  mappend (FM e1) (FM e2) = FM $ \k -> e1 k <> e2 k

Demonstrating that the above is a proper monoid delegates to instances of Monoid being proper monoids. So as long as we trust that convention, we have a free monoid.

However, one might wonder what a free monoid would look like as something closer to a traditional data type. To construct that, first ignore the required equations, and consider only the generators; we get:

 
data FMG a = None | Single a | FMG a :<> FMG a

Now, the proper FM a is the quotient of this by the equations:

 
None :<> x = x = x :<> None
x :<> (y :<> z) = (x :<> y) :<> z

One way of mimicking this in Haskell is to hide the implementation in a module, and only allow elimination into Monoids (again, using the convention that Monoid ensures actual monoid structure) using the function:

 
unFMG :: forall a m. Monoid m => FMG a -> (a -> m) -> m
unFMG None _ = mempty
unFMG (Single x) k = k x
unFMG (x :<> y) k = unFMG x k <> unFMG y k

This is actually how quotients can be thought of in richer languages; the quotient does not eliminate any of the generated structure internally, it just restricts the way in which the values can be consumed. Those richer languages just allow us to prove equations, and enforce properties by proof obligations, rather than conventions and structure hiding. Also, one should note that the above should look pretty similar to our encoding of FM a using universal quantification earlier.

Now, one might look at the above and have some objections. For one, we'd normally think that the quotient of the above type is just [a]. Second, it seems like the type is revealing something about the associativity of the operations, because defining recursive values via left nesting is different from right nesting, and this difference is observable by extracting into different monoids. But aren't monoids supposed to remove associativity as a concern? For instance:

 
ones1 = embed 1 <> ones1
ones2 = ones2 <> embed 1

Shouldn't we be able to prove these are the same, becuase of an argument like:

 
ones1 = embed 1 <> (embed 1 <> ...)
      ... reassociate ...
      = (... <> embed 1) <> embed 1
      = ones2

The answer is that the equation we have only specifies the behavior of associating three values:

 
x <> (y <> z) = (x <> y) <> z

And while this is sufficient to nail down the behavior of finite values, and finitary reassociating, it does not tell us that infinitary reassociating yields the same value back. And the "... reassociate ..." step in the argument above was decidedly infinitary. And while the rules tell us that we can peel any finite number of copies of embed 1 to the front of ones1 or the end of ones2, it does not tell us that ones1 = ones2. And in fact it is vital for FM a to have distinct values for these two things; it is what makes it the free monoid when we're dealing with domains of lazy values.

Finally, we can come back to Foldable. If we look at foldMap:

 
foldMap :: (Foldable f, Monoid m) => (a -> m) -> f a -> m

we can rearrange things a bit, and get the type:

 
Foldable f => f a -> (forall m. Monoid m => (a -> m) -> m)

And thus, the most fundamental operation of Foldable is not toList, but toFreeMonoid, and lists are not free monoids in Haskell.

[1]: What we are doing here is noting that (co)limits are objects that internalize natural transformations, but the natural transformations expressible by quantification in GHC are already automatically internalized using quantifiers. However, one has to be careful that the quantifiers are actually enforcing the relevant naturality conditions. In many simple cases they are.

Fast Circular Substitution

2022-10-24T17:47:25Z

Emil Axelsson and Koen Claessen wrote a functional pearl last year about Using Circular Programs for Higher-Order Syntax.

About 6 months ago I had an opportunity to play with this approach in earnest, and realized we can speed it up a great deal. This has kept coming up in conversation ever since, so I've decided to write up an article here.

In my bound library I exploit the fact that monads are about substitution to make a monad transformer that manages substitution for me.

Here I'm going to take a more coupled approach.

To have a type system with enough complexity to be worth examining, I'll adapt Dan Doel's UPTS, which is a pure type system with universe polymorphism. I won't finish the implementation here, but from where we get it should be obvious how to finish the job.

Unlike Axelsson and Claessen I'm not going to bother to abstract over my name representation.

To avoid losing the original name from the source, we'll just track names as strings with an integer counting the number of times it has been 'primed'. The name is purely for expository purposes, the real variable identifier is the number. We'll follow the Axelsson and Claessen convention of having the identifier assigned to each binder be larger than any one bound inside of it. If you don't need he original source names you can cull them from the representation, but they can be useful if you are representing a syntax tree for something you parsed and/or that you plan to pretty print later.

 
data Name = Name String Int
   deriving (Show,Read)
 
hint :: Name -> String
hint (Name n _) = n
 
nameId :: Name -> Int
nameId (Name _ i) = i
 
instance Eq Name where
  (==) = (==) `on` nameId
 
instance Ord Name where
  compare = compare `on` nameId
 
prime :: String -> Int -> Name
prime n i = Name n (i + 1)

So what is the language I want to work with?

 
type Level = Int
 
data Constant
  = Level
  | LevelLiteral {-# UNPACK #-} !Level
  | Omega
  deriving (Eq,Ord,Show,Read,Typeable)
 
data Term a
  = Free a
  | Bound {-# UNPACK #-} !Name
  | Constant !Constant
  | Term a :+ {-# UNPACK #-} !Level
  | Max  [Term a]
  | Type !(Term a)
  | Lam   {-# UNPACK #-} !Name !(Term a) !(Term a)
  | Pi    {-# UNPACK #-} !Name !(Term a) !(Term a)
  | Sigma {-# UNPACK #-} !Name !(Term a) !(Term a)
  | App !(Term a) !(Term a)
  | Fst !(Term a)
  | Snd !(Term a)
  | Pair !(Term a) !(Term a) !(Term a)
  deriving (Show,Read,Eq,Ord,Functor,Foldable,Traversable,Typeable)

That is perhaps a bit paranoid about remaining strict, but it seemed like a good idea at the time.

We can define capture avoiding substitution on terms:

 
subst :: Eq a => a -> Term a -> Term a -> Term a
subst a x y = y >>= \a' ->
  if a == a'
    then x
    else return a'

Now we finally need to implement Axelsson and Claessen's circular programming trick. Here we'll abstract over terms that allow us to find the highest bound value within them:

 
class Bindable t where
  bound :: t -> Int

and instantiate it for our Term type

 
instance Bindable (Term a) where
  bound Free{}        = 0
  bound Bound{}       = 0 -- intentional!
  bound Constant{}    = 0
  bound (a :+ _)      = bound a
  bound (Max xs)      = foldr (\a r -> bound a `max` r) 0 xs
  bound (Type t)      = bound t
  bound (Lam b t _)   = nameId b `max` bound t
  bound (Pi b t _)    = nameId b `max` bound t
  bound (Sigma b t _) = nameId b `max` bound t
  bound (App x y)     = bound x `max`  bound y
  bound (Fst t)       = bound t
  bound (Snd t)       = bound t
  bound (Pair t x y)  = bound t `max` bound x `max` bound y

As in the original pearl we avoid traversing into the body of the binders, hence the _'s in the code above.

Now we can abstract over the pattern used to create a binder in the functional pearl, since we have multiple binder types in this syntax tree, and the code would get repetitive.

 
binder :: Bindable t =>
  (Name -> t) ->
  (Name -> t -> r) ->
  String -> (t -> t) -> r
binder bd c n e = c b body where
  body = e (bd b)
  b = prime n (bound body)
 
lam, pi, sigma :: String -> Term a -> (Term a -> Term a) -> Term a
lam s t   = binder Bound (`Lam` t) s
pi s t    = binder Bound (`Pi` t) s
sigma s t = binder Bound (`Sigma` t) s

We may not always want to give names to the variables we capture, so let's define:

lam_, pi_, sigma_ :: Term a -> (Term a -> Term a) -> Term a
lam_   = lam "_"
pi_    = pi "_"
sigma_ = sigma "_"

Now, here's the interesting part. The problem with Axelsson and Claessen's original trick is that every substitution is being handled separately. This means that if you were to write a monad for doing substitution with it, it'd actually be quite slow. You have to walk the syntax tree over and over and over.

We can fuse these together by making a single pass:

 
instantiate :: Name -> t -> IntMap t -> IntMap t
instantiate = IntMap.insert . nameId
 
rebind :: IntMap (Term b) -> Term a -> (a -> Term b) -> Term b
rebind env xs0 f = go xs0 where
  go = \case
    Free a       -> f a
    Bound b      -> env IntMap.! nameId b
    Constant c   -> Constant c
    m :+ n       -> go m :+ n
    Type t       -> Type (go t)
    Max xs       -> Max (fmap go xs)
    Lam b t e    -> lam   (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    Pi b t e     -> pi    (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    Sigma b t e  -> sigma (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    App x y      -> App (go x) (go y)
    Fst x        -> Fst (go x)
    Snd x        -> Snd (go x)
    Pair t x y   -> Pair (go t) (go x) (go y)

Note that the Lam, Pi and Sigma cases just extend the current environment.

With that now we can upgrade the pearl's encoding to allow for an actual Monad in the same sense as bound.

 
instance Applicative Term where
  pure = Free
  (< *>) = ap
 
instance Monad Term where
  return = Free
  (>>=) = rebind IntMap.empty

To show that we can work with this syntax tree representation, let's write an evaluator from it to weak head normal form:

First we'll need some helpers:

 
apply :: Term a -> [Term a] -> Term a
apply = foldl App
 
rwhnf :: IntMap (Term a) ->
  [Term a] -> Term a -> Term a
rwhnf env stk     (App f x)
  = rwhnf env (rebind env x Free:stk) f
rwhnf env (x:stk) (Lam b _ e)
  = rwhnf (instantiate b x env) stk e
rwhnf env stk (Fst e)
  = case rwhnf env [] e of
  Pair _ e' _ -> rwhnf env stk e'
  e'          -> Fst e'
rwhnf env stk (Snd e)
  = case rwhnf env [] e of
  Pair _ _ e' -> rwhnf env stk e'
  e'          -> Snd e'
rwhnf env stk e
  = apply (rebind env e Free) stk

Then we can start off the whnf by calling our helper with an initial starting environment:

 
whnf :: Term a -> Term a
whnf = rwhnf IntMap.empty []

So what have we given up? Well, bound automatically lets you compare terms for alpha equivalence by quotienting out the placement of "F" terms in the syntax tree. Here we have a problem in that the identifiers we get assigned aren't necessarily canonical.

But we can get the same identifiers out by just using the monad above:

 
alphaEq :: Eq a => Term a -> Term a -> Bool
alphaEq = (==) `on` liftM id

It makes me a bit uncomfortable that our monad is only up to alpha equivalence and that liftM swaps out the identifiers used throughout the entire syntax tree, and we've also lost the ironclad protection against exotic terms.

But overall, this is a much faster version of Axelsson and Claessen's trick and it can be used as a drop-in replacement for something like bound in many cases, and unlike bound, it lets you use HOAS-style syntax for constructing lam, pi and sigma terms.

With pattern synonyms you can prevent the user from doing bad things as well. Once 7.10 ships you'd be able to use a bidirectional pattern synonym for Pi, Sigma and Lam to hide the real constructors behind. I'm not yet sure of the "best practices" in this area.

Here's the code all in one place:

[Download Circular.hs]

Happy Holidays,
-Edward