In any case, one of the invited speakers was Vladimir Voevodsky, who gave a talk on his recent work relating to univalent foundations titled “From Syntax to Semantics of Dependent Type Theories — Formalized”. This was a very clear talk that helped me understand his current research direction and the motivations for it. I also had the benefit of some very useful conversations with others collaborating on some of this work, who patiently answered my questions. The notes below are complementary to the slides from his talk.
I had sort of understood what the motivation for studying “C-Systems” was, but I had not taken it on myself to look at Voevodsky’s “B-Systems” before, nor had I grasped how his research programme fit together. Since I found this experience enlightening, I figured I might as well write up what I think I understand, with all the usual caveats. Also note, in all the below, by “type theory” I invariably mean the intensional sort. So all the following is in reference to the B-systems paper that Voevodsky has posted on arXiv (arXiv:1410.5389).
That said, if anything I describe here strikes you as funny, it is more likely that I am not describing things right than that the source material is troublesome — i.e. take this with a grain of salt. And bear in mind that I am not attempting to directly paraphrase Voevodsky himself or others I spoke to, but rather I am giving an account of where what they described resonated with me, and filtered through my own examples, etc. Also, if all of the “why and wherefore” is already familiar to you, feel free to skip directly to the “B-Systems” section where I will just discuss Voevodsky’s paper on this topic, and my attempts to understand portions of it. And if you already understand B-Systems, please do reply and explain all the things I’m sure I’m missing!
Some Review
We have a model of type theory in simplicial sets that validates the univalence axiom (and now a few other models that validate this axiom as well). This is to say, it is a model with not only higher dimensional structure, but higher structure of a very “coherent” sort. The heart of this relates to our construction of a “universe”. In our categorical model, all our types translate into objects of various sorts. The “universe,” aka the type-of-types, translates into a very special object, one which “indexes” all other objects. A more categorical way of saying this is that all other types are “fibered over” the universe, i.e. that from every other type there is a map back to a specific point within the universe. The univalence axiom can be read as saying that all equivalent types are fibered over points in the universe that are connected (i.e. there is a path between those points).
Even in a relatively simple dependent type theory, equivalence of types quickly becomes undecidable in general, as it is a superset of the problem of deciding type inhabitation, which in turn corresponds to the decidability of propositions in the logic corresponding to a type theory, and then by any number of well-known results cannot be resolved in general for most interesting theories. This in turn means that the structure of a univalent universe is “describable” but it is not fully enumerable, and is very complex.
We also have a line of work dating back to before the introduction of univalence, which investigated the higher groupoid structure (or, if you prefer, higher topological structure or Quillen model structure) induced by identity types. But without either univalence or higher-inductive types, this higher groupoid structure is unobservable internally. This is to say, models were possible that would send types to things with higher structure, but no particular use would be made of this higher structure. So, such models could potentially be used to demonstrate that certain new axioms were not conservative over the existing theory, but on their own they did not provide ideas about how to extend the theory.
How to relate this higher groupoid structure to universes? Well, in a universe, one has paths. Without univalence, these are just identity paths. But regardless, we now get a funny “completion” as our identity paths must themselves be objects in our universe, and so too the paths between them, etc. In models without higher structure, we might say “there is only one path from each object to itself” and then we need not worry too much about this potential explosion of paths at each level. But by enforcing the higher groupoid structure, this means that our universe now blossoms with all the potentially distinct paths at each level. However, with the only way in our syntax to create such “extra paths” as reflexivity, any such path structure in our model remains “latent”, and can be added or removed without any effect.
The univalence axiom relies on these higher groupoid structures, but it cannot be reduced to them. Rather, in the model, we must have a fibration over the universe with identity lifting along this fibration to reach the next step: modifying the universe by forcing paths other than identity paths, namely those between equivalent types. This is in a sense a further “higher completion” of our universe, adding in first all the possible paths between types, but then the paths between those paths, and so on up. Because, by univalence, we can state such paths, then in our model we must include all of them.
The Problem
All along I have been saying “models of type theory”. And it is true enough. We do know how to model type theories of various sorts categorically (i.e. representing the translation from their syntax into their semantics as functorial). But we do not have full models of "fully-featured" type theories; i.e. if we view type theories as pizzas we have models of cheese slices, and perhaps slices with olives and slices with pepperoni, etc. But we do not have models of pizzas with "all the toppings". Here, by "toppings" I mean things such as the addition of "all inductive types," "some coinductive types," "certain higher-inductive types," "pattern matching," "induction-induction," "induction-recursion," "excluded middle as an axiom," "choice as an axiom," "propositional resizing as an axiom," etc.
Rather, we have a grab bag of tricks, as well as a few slightly different approaches — Categories with Attributes, Categories with Families, and so forth. One peculiar feature of these sorts of models, as opposed to the models of extensional type theory, is that these models are not indexed by types, but by “lists of types” that directly correspond to the contexts in which we make typing judgments in intensional theories.
In any case, these models are usually used in an ad-hoc fashion. If you want to examine a particular feature of a language, you first pick from one of these different but related sorts of models. Then you go on to build a version with the minimal set of what you need — so maybe identity types, maybe sigma types, maybe natural numbers, and then you introduce your new construction or generate your result or the like.
So people may say “we know how to work with these things, and we know the tricks, so given a theory, we can throw together the facts about it pretty quickly.” Now of course there are maybe only a hundred people on the planet (myself not among them) who can really just throw together a categorical model of some one or another dependent type theory at the drop of a hat.
But there’s a broader problem. How can we speak about the mutual compatibility of different extensions and features if each one is validated independently in a different way? This is a problem very familiar to us in the field of programming languages: you have a lot of “improvements” to your language, all of a similar form. But then you put such “improvements” together and now something goes wrong. In fact, the famous “newtype deriving bug” in GHC some years back, which opened a big hole in the type system, was of exactly that form: two extensions (in that case, newtype deriving and type families) that are on their own safe and useful, together have an unexpected bad effect. It is also possible to imagine bad interactions occurring only when three extensions exist together, and so forth. So as the number of extensions increases, the number of interactions to check spirals upwards in a very difficult fashion.
So the correct way to have confidence in the coexistence of these various extensions is to have a general model that contains the sort of theory we actually want to work in, rather than these toy theories that let us look at portions in isolation. And this certainly involves having a theory that lets us validate all inductive types at once in the model, rather than extending it over and over for each new type we add. Additionally, people tend to model things with at most one universe. And when we are not looking at universes, it is often omitted altogether, or done “incorrectly” as an inhabitant of itself, purely for the sake of convenience.
So now, if I tell someone with mathematical experience what my theory “means” and they say “is this actually proven” I’m in the embarrassing position of saying “no, it is not. but the important bits all are and we know how to put them together.” So here I am, trying to advocate the idea of fully formal verification, but without a fully top-to-bottom formally verified system myself — not even codewise, but in even the basic mathematical sense.
Univalence makes this problem more urgent. Without univalence, we can often get away with more hand-wavy arguments, because things are “obvious”. Furthermore, they relate to the way things are done elsewhere. So logic can be believed by analogy to how people usually think about logic, numbers by analogy to the Peano system, which people already “believe in,” and so forth. Furthermore, without univalence, most operations are “directly constructive” in the sense that you can pick your favorite “obvious” and non-categorical model, and they will tend to hold in that as well. So you can think of your types as sets, and terms as elements of sets. Or you can think of your types as classifying computer programs and your terms as runnable code, etc. In each case, the behavior leads to basically what you would expect.
But in none of these “obvious” models does univalence hold. And furthermore, it is “obviously” wrong in them.
And that is just on the “propaganda” side as people say. For the same reasons, univalence tends to be incompatible with many “obvious” extensions — for example, not only “uniqueness of identity proofs” has to go, but pattern matching had to be rethought so as not to imply it, and furthermore it is not known if it is sound in concert with many other extensions such as general coinductive types, etc. (In fact, the newtype deriving bug itself can be seen as a "very special case" of the incompatibility of univalence with Uniqueness of Identity Proofs, as I have been discussing with people informally for quite some time).
Hence, because univalence interacts with so many other extensions, it feels even more urgent to have a full account. Unlike prior research, which really focused on developing and understanding type systems, this is more of an engineering problem, although a proof-engineering problem to be sure.
The Approach
Rather than just giving a full account of “one important type system,” Voevodsky seems to be aiming for a generally smooth way to develop such full accounts even as type systems change. So he is interested in reusable technology, so to speak. One analogy may be that he is interested in building the categorical semantics version of a logical framework. His tool for doing this is what he calls a “C-system”, which is a slight variant of Cartmell’s Categories with Attributes mentioned above. One important aspect of C-systems seems to be that they stratify types and terms in some fashion, and that you can see them as generated by some “data” about a ground set of types, terms, and relations. To be honest, I haven’t looked at them more closely than that, since I saw at least some of the “point” of them and know that to really understand the details I'll have to study categorical semantics more generally, which is ongoing.
But the plan isn’t just to have a suitable categorical model of type theories. Rather it is to give a description of how one goes from the “raw terms” as syntax trees all the way through to how typing judgments are passed on them and then to their full elaborations in contexts and finally to their eventual “meaning” as categorically presented.
Of course, most of these elements are well studied already, as are their interactions. But they do not live in a particularly compatible formulation with categorical semantics. This then makes it difficult to prove that “all the pieces line up” and in particular, a pain to prove that a given categorical semantics for a given syntax is the “initial” one — i.e. that if there is any other semantics for that syntax, it can be arrived at by first “factoring through” the morphism from syntax to the initial semantics. Such proofs can be executed, but again it would be good to have “reusable technology” to carry them out in general.
pre-B-Systems
Now we move into the proper notes on Voevodsky's "B-Systems" paper.
If C-systems are at the end-point of the conveyor belt, we need the pieces in the middle. And that is what a B-system is. Continuing the analogy with the conveyor belt, what we get out at the end is a “finished piece” — so an object in a C-system is a categorified version of a “type in-context” capturing the “indexing” or “fibration” of that type over its “base space” of types it may depend on, and also capturing the families of terms that may be formed in various different contexts of other terms with other types.
B-systems, which are a more “syntactic” presentation, have a very direct notion of “dynamics” built in — they describe how objects and contexts may be combined and put together, and directly give, by their laws, which slots fit into which tabs, etc. Furthermore, B-systems are to be built by equipping simpler systems with successively more structure. This gives us a certain sort of notion of how to talk about the distinction between things closer to "raw syntax" (not imbued with any particular meaning) and that subset of raw syntactic structures which have certain specified actions.
So enough prelude. What precisely is a B-system? We start with a pre-B-system, as described below (corresponding to Definition 2.1 in the paper).
First there is a family of sets, indexed by the natural numbers. We call it B_n. B_0 is to be thought of as the empty context, B_1 as the set of typing contexts with one element, B_2 as the set with two elements, where the second may be indexed over the first, etc. Elements of B_3 thus can be thought of as looking like "x_1 : T_1, x_2 : T_2(x_1), x_3 : T_3(x_1,x_2)" where T_2 is a type family over one type, T_3 a type family over two types, etc.
For all typing contexts of at least one element, we can also interpret them as simply the _type_ of their last element, but as indexed by the types of all their prior elements. Conceptually, B_n is the set of "types in context, with no more than n-1 dependencies".
Now, we introduce another family of sets, indexed by the natural numbers starting at 1. We call this set ˜B_n. ˜B_1 is to be thought of as the set of all values that may be drawn from any type in the set B_1, and so forth. Thus, each set ˜B_n is to be thought of as fibered over B_n. We think of this as "terms in context, whose types have no more than n-1 dependencies". Elements of ˜B_3 can be thought of as looking like "x_1 : T_1, x_2 : T_2(x_1), x_3 : T_3(x_1,x_2) ⊢ y : x". That is to say, elements of B_n for some n look like "everything to the left of the turnstile" and elements of ˜B_n for some n look like "the left and right hand sides of the turnstile together."
We now, for each n, give a map:
∂ : ˜B_n+1 -> B_n+1.
This map is the witness to this fibration. Conceptually, it says "give me an element of some type of dependency level n, and I will pick out which type this is an element of". We can call ∂ the "type of" operator.
We add a second basic map:
ft : B_n+1 -> B_n
This is a witness to the fact that all our higher B_n are built as extensions of smaller ones. It says "Give me a context, and I will give you the smaller context that has every element except the final one". Alternately, it reads "Give me a type indexed over a context, and I will throw away the type and give back just the context." Or, "Give me a type that may depend on n+1 things, and I will give the type it depends on that may only depend on n things." We can call ft the "context of" operator.
Finally, we add a number of maps to correspond to weakening and substitution -- four in all. In each case, we take m >= n. We denote the i-fold application of ft by ft_i.
1) T (type weakening).
T : (Y : B_n+1) -> (X : B_m+1) -> ft(Y) = ft_(m+1-n)(X) -> B_m+2
This reads: Give me two types-in-context, X and Y. Now, if the context for Y agrees with the context for X in the initial segment (i.e. discarding the elements of the context of X which are "longer" than the context for Y), then I can give you back X again, but now in an extended context that includes Y as well.
2) ˜T (term weakening).
˜T : (Y : B_n+1) -> (r : ˜B_m+1) -> ft(Y)=ft_(m+1-n)(∂(r)) -> ˜B_m+2
This reads: Give me a type-in-context Y, and a term-in-context r. Now, if the context of Y agrees with the context for the type of r as above, then I can give you back r again, but now as a term-in-context whose type has an extended context that includes Y as well.
3) S (type substitution).
S : (s : ˜B_n+1) -> (X : B_m+2) -> ∂(s) = ft_(m+1-n)(X) -> B_m+1
This reads: give me a term-in-context s, and a type-in-context X. Now, if the context of the type of s agrees with the context of the X in the initial segment, we may then produce a new type, which is X with one less element in its context (because we have substituted the explicit term s for where the prior dependency data was recorded).
4) ˜S (term substitution).
˜S : (s : ˜B_n+1) -> (r : ˜B_m+2) -> ∂(s) = ft_(m+1-n)(∂(r)) -> ˜B_m+1
This reads: give me two terms-in-context, r and s. Now given the usual compatibility condition on contexts, we can produce a new term, which is like r, but where the context has one less dependency (because we have substituted the explicit term s for everywhere where there was dependency data prior).
Let us now review what we have: We have dependent terms and types, related by explicit maps between them. For every term we have its type, and for every type we have its context. Furthermore, we have weakening by types of types and terms -- so we record where "extra types" may be introduced into contexts without harm. We also have substitution of terms into type and terms -- so we record where reductions may take place, and the resulting effect on the dependency structure.
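To check my own reading of the side conditions, here is a minimal sketch of T and S in a toy model of my own devising, not taken from the paper. It assumes simple (non-dependent) types, so that a context is just a list of type names and substituting a term never changes the types that follow it. All the names (Ctx, Term, ftN, weakenTy, substTy) are hypothetical, and terms are recorded only by their boundary.

```haskell
-- Toy model (an assumption for illustration, not from the paper):
-- types are simple, so a context is just a list of type names.
type Ctx = [String]

-- A term is recorded only by its boundary (its context extended by its type).
data Term = Term { boundary :: Ctx, label :: String }
  deriving (Eq, Show)

-- ftN i is the i-fold application of ft; ft drops the last entry.
ftN :: Int -> Ctx -> Ctx
ftN i xs = take (length xs - i) xs

ft :: Ctx -> Ctx
ft = ftN 1

-- T: weaken X by Y. Defined only when ft(Y) = ft_(m+1-n)(X) with m >= n,
-- where Y has length n+1 and X has length m+1.
weakenTy :: Ctx -> Ctx -> Maybe Ctx
weakenTy y x
  | n >= 0 && n <= m && ft y == ftN (m + 1 - n) x = Just (y ++ drop n x)
  | otherwise = Nothing
  where
    n = length y - 1
    m = length x - 1

-- S: substitute s into X. Defined only when the boundary of s equals
-- ft_(m+1-n)(X) with m >= n, where X has length m+2.
substTy :: Term -> Ctx -> Maybe Ctx
substTy s x
  | n >= 0 && n <= m && boundary s == ftN (m + 1 - n) x =
      Just (take n x ++ drop (n + 1) x)
  | otherwise = Nothing
  where
    n = length (boundary s) - 1
    m = length x - 2
```

In GHCi, weakenTy ["A","B"] ["A","C","D"] gives Just ["A","B","C","D"] (the context grows by one), and substTy (Term ["A","B"] "b") ["A","B","C"] gives Just ["A","C"] (the context shrinks by one). With genuinely dependent types the entries after the insertion or substitution point would also have to be rewritten, which is exactly what this toy model leaves out.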
Unital pre-B-systems
We now introduce a further piece of data, which renders a pre-B-system a _unital_ pre-B-system, corresponding to Definition 2.2 in the paper. For each n we add an operation:
δ : B_n+1 -> ˜B_n+2
This map "turns a context into a term". Conceptually it is the step that equips a pre-B-system with a universe, as it is what allows types to transform into terms. I find the general definition a bit confusing, but I believe that, for B_2, it can be rendered syntactically as something like the following: "x_1 : T_1, x_2 : T_2(x_1) -> x_1 : T_1, x_2 : T_2(x_1), x_3 : U ⊢ x_2 : x_3". That is to say, given any context, we now have a universe U that gives the type of "universes of types", and we say that the type itself is a term that is an element of a universe. Informally, one can think of δ as the "term of" operator.
But this specific structure is not indicated by any laws yet on δ. Indeed, the next thing we do is to introduce a "B0-system" which adds some further coherence conditions to restrict this generality.
B0-systems
The following are my attempt to "verbalize" the B0 system conditions (as restrictions on non-unital pre-B-systems) as covered in definition 2.5. I do not reproduce here the actual formal statements of these conditions, for which one should refer to the paper and just reason through very carefully.
1. The context of a weakening of a type is the same as the weakening of the context of a type.
2. The type of the weakening of a term is the same as the weakening of the type of a term.
3. The context of a substitution into a type is the same as the substitution into a context of a type.
4. The type of a substitution into a term is the same as a substitution into the type of a term.
Finally, we "upgrade" a non-unital B0-system to a unital B0-system with one further condition:
5. ∂(δ(X)) = T(X,X).
I read this to say that, "for any type-in-context X, the type of the term of X is the same as the weakening of the context of X by the assumption of X itself." This is to say, if I create a term for some type X, and then discard that term, this is the same as extending the context of X by X again.
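For reference, the five verbal conditions can be transcribed into the notation used above roughly as follows. This is my own symbolic rendering of the sentences in this post, not a copy of Definition 2.5: the side conditions on lengths (and any boundary cases) under which each side is defined are deliberately omitted, so the paper remains the authority.

```latex
\begin{align*}
ft(T(Y, X)) &= T(Y, ft(X)) \\
\partial(\tilde{T}(Y, r)) &= T(Y, \partial(r)) \\
ft(S(s, X)) &= S(s, ft(X)) \\
\partial(\tilde{S}(s, r)) &= S(s, \partial(r)) \\
\partial(\delta(X)) &= T(X, X)
\end{align*}
```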
Here, I have not even gotten to "full" B-systems yet, and am only on page 5 of a seventeen page paper. But I have been poking at these notes for long enough without posting them, so I'll leave off for now, and hopefully, possibly, when time permits, return to at least the second half of section 2.
]]>
So, what's the argument that lazy I/O, or unsafeInterleaveIO on which it's based, breaks referential transparency? It usually looks something like this:
    swap (x, y) = (y, x)

    setup = do
      r1 <- newIORef True
      r2 <- newIORef True
      v1 <- unsafeInterleaveIO $ do writeIORef r2 False ; readIORef r1
      v2 <- unsafeInterleaveIO $ do writeIORef r1 False ; readIORef r2
      return (v1, v2)

    main = do
      p1 <- setup
      p2 <- setup
      print p1
      print . swap $ p2
I ran this, and got:
    (True, False)
    (True, False)
So this is supposed to demonstrate that the pure values depend on evaluation order, and we have broken a desirable property of Haskell.
First a digression. Personally I distinguish the terms, "referential transparency," and, "purity," and use them to identify two desirable properties of Haskell. The first I use for the property that allows you to factor your program by introducing (or eliminating) named subexpressions. So, instead of:
f e e
we are free to write:
let x = e in f x x
or some variation. I have no argument for this meaning, other than it's what I thought it meant when I first heard the term used with respect to Haskell, it's a useful property, and it's the best name I can think of for the property. I also (of course) think it's better than some of the other explanations you'll find for what people mean when they say Haskell has referential transparency, since it doesn't mention functions or "values". It's just about equivalence of expressions.
Anyhow, for me, the above example is in no danger of violating referential transparency. There is no factoring operation that will change the meaning of the program. I can even factor out setup (or inline it, since it's already named):

    main = let m = setup in do
      p1 <- m
      p2 <- m
      print p1
      print . swap $ p2

This is the way in which IO preserves referential transparency, unlike side effects, in my view (note: the embedded language represented by IO does not have this property, since otherwise p1 could be used in lieu of p2; this is why you shouldn't spend much time writing IO stuff, because it's a bad language embedded in a good one).
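The same point can be seen in a smaller, deterministic setting. The following is a minimal sketch using only Data.IORef from base; the names tick, inlined and factored are hypothetical and exist only for this illustration. Naming the action and reusing it gives the same program as writing it out twice, even though the two results bound from it differ.

```haskell
import Data.IORef

-- Increment a counter and return its new value.
tick :: IORef Int -> IO Int
tick ref = modifyIORef ref (+ 1) >> readIORef ref

-- The action written out twice.
inlined :: IO (Int, Int)
inlined = do
  ref <- newIORef 0
  a <- tick ref
  b <- tick ref
  return (a, b)

-- The same action factored out under a name and run twice.
factored :: IO (Int, Int)
factored = do
  ref <- newIORef 0
  let m = tick ref
  a <- m
  b <- m
  return (a, b)
```

Both inlined and factored produce (1, 2): factoring the expression tick ref changes nothing, while a and b (bound inside the embedded IO language) are not interchangeable.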
The other property, "purity," I pull from Amr Sabry's paper, What is a Purely Functional Language? There he argues that a functional language should be considered "pure" if it is an extension of the lambda calculus in which there are no contexts which observe differences in evaluation order. Effectively, evaluation order must only determine whether or not you get an answer, not change the answer you get.
This is slightly different from my definition of referential transparency earlier, but it's also a useful property to have. Referential transparency tells us that we can freely refactor, and purity tells us that we can change the order things are evaluated, both without changing the meaning of our programs.
Now, it would seem that the original interleaving example violates purity. Depending on the order that the values are evaluated, opponents of lazy I/O say, the values change. However, this argument doesn't impress me, because I think the proper way to think about unsafeInterleaveIO is as concurrency, and in that case, it isn't very strange that the results of running it would be non-deterministic. And in that case, there's not much you can do to prove that the evaluation order is affecting results, and that you aren't simply very unlucky and always observing results that happen to correspond to evaluation order.
In fact, there's something I didn't tell you. I didn't use the unsafeInterleaveIO from base. I wrote my own. It looks like this:

    unsafeInterleaveIO :: IO a -> IO a
    unsafeInterleaveIO action = do
      iv <- new
      forkIO $ randomRIO (1,5) >>= threadDelay . (*1000) >> action >>= write iv
      return . read $ iv

iv is an IVar (I used ivar-simple). The pertinent operations on them are:

    new :: IO (IVar a)
    write :: IVar a -> a -> IO ()
    read :: IVar a -> a
new creates an empty IVar, and we can write to one only once; trying to write a second time will throw an exception. But this is no problem for me, because I obviously only attempt to write once. read will block until its argument is actually set, and since that can only happen once, it is considered safe for read to not require IO. [1]
Using this and forkIO, one can easily write something like unsafeInterleaveIO, which accepts an IO a argument and yields an IO a whose result is guaranteed to be the result of running the argument at some time in the future. The only difference is that the real unsafeInterleaveIO schedules things just in time, whereas mine schedules them in a relatively random order (I'll admit I had to try a few times before I got the 'expected' lazy IO answer).
But, we could even take this to be the specification of interleaving. It runs IO actions concurrently, and you will be fine as long as you aren't attempting to depend on the exact scheduling order (or whether things get scheduled at all in some cases).
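For readers without ivar-simple at hand, roughly the same construction can be sketched with base alone. This is a stand-in of my own and not the code I ran above: an MVar that is written exactly once plays the role of the IVar, and unsafePerformIO around readMVar plays the role of the pure read. The name interleaveViaFork is hypothetical, the random delay is left out, and if the action throws, the reader simply blocks.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import System.IO.Unsafe (unsafePerformIO)

-- Run the action on another thread and hand back a lazy result that
-- blocks, when forced, until the action has finished.
interleaveViaFork :: IO a -> IO a
interleaveViaFork action = do
  box <- newEmptyMVar
  -- The forked thread is the only writer, so the box is filled at most once.
  _ <- forkIO (action >>= putMVar box)
  -- Stand-in for the pure IVar read: readMVar does not empty the box.
  return (unsafePerformIO (readMVar box))
```

Forcing the value returned by interleaveViaFork (return 42) yields 42 once the forked thread has run; when that happens relative to the rest of the program is up to the scheduler, which is the point.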
In fact, thinking of lazy I/O as concurrency turns most spooky examples into threading problems that I would expect most people to consider rather basic. For instance:
And of course, the original example in this article is just non-determinism introduced by concurrency, but not of a sort that requires fundamentally different explanation than fork. The main pitfall, in my biased opinion, is that the scheduling for interleaving is explained in a way that encourages people to try to guess exactly what it will do. But the presumption of purity (and the reordering GHC actually does based on it) actually means that you cannot assume that much more about the scheduling than you can about my scheduler, at least in general.
This isn't to suggest that lazy I/O is appropriate for every situation. Sometimes the above advice means that it is not appropriate to use concurrency. However, in my opinion, people are overeager to ban lazy I/O even for simple uses where it is the nicest solution, and justify it based on the 'evil' and 'confusing' ascriptions. But, personally, I don't think this is justified, unless one does the same for pretty much all concurrency.
I suppose the only (leading) question left to ask is which should be declared unsafe, fork or ivars, since together they allow you to construct a(n even less deterministic) unsafeInterleaveIO?
[1] Note that there are other implementations of IVar. I'd expect the most popular to be in monad-par by Simon Marlow. That allows one to construct an operation like read, but it is actually less deterministic in my construction, because it seems that it will not block unless perhaps you write and read within a single 'transaction,' so to speak.
In fact, this actually breaks referential transparency in conjunction with forkIO:
    deref = runPar . get

    randomDelay = randomRIO (1,10) >>= threadDelay . (1000*)

    myHandle m = m `catch` \(_ :: SomeException) -> putStrLn "Bombed"

    mySpawn :: IO a -> IO (IVar a)
    mySpawn action = do
      iv <- runParIO new
      forkIO $ randomDelay >> action >>= runParIO . put_ iv
      return iv

    main = do
      iv <- mySpawn (return True)
      myHandle . print $ deref iv
      randomDelay
      myHandle . print $ deref iv
Sometimes this will print "Bombed" twice, and sometimes it will print "Bombed" followed by "True". The latter will never happen if we factor out the deref iv however. The blocking behavior is essential to deref maintaining referential transparency, and it seems like monad-par only blocks within a single runPar, not across multiples. Using ivar-simple in this example always results in "True" being printed twice.
It is also actually possible for unsafeInterleaveIO to break referential transparency if it is implemented incorrectly (or if the optimizer mucks with the internals in some bad way). But I haven't seen an example that couldn't be considered a bug in the implementation rather than some fundamental misbehavior. And my reference implementation here (with a suboptimal scheduler) suggests that there is no break that isn't just a bug.
]]>
(Advance note: for some continuous code to look at see this file.)
First, it'll help to talk about how some categories can work in Haskell. For any kind k made of * and (->), [0] we can define a category of type constructors. Objects of the category will be first-class [1] types of that kind, and arrows will be defined by the following type family:
    newtype Transformer f g = Transform { ($$) :: forall i. f i ~> g i }

    type family (~>) :: k -> k -> * where
      (~>) = (->)
      (~>) = Transformer

    type a <-> b = (a -> b, b -> a)
    type a <~> b = (a ~> b, b ~> a)
So, for a base case, * has monomorphic functions as arrows, and categories for higher kinds have polymorphic functions that saturate the constructor:
    Int ~> Char = Int -> Char
    Maybe ~> [] = forall a. Maybe a -> [a]
    Either ~> (,) = forall a b. Either a b -> (a, b)
    StateT ~> ReaderT = forall s m a. StateT s m a -> ReaderT s m a
We can of course define identity and composition for these, and it will be handy to do so:
    class Morph (p :: k -> k -> *) where
      id :: p a a
      (.) :: p b c -> p a b -> p a c

    instance Morph (->) where
      id x = x
      (g . f) x = g (f x)

    instance Morph ((~>) :: k -> k -> *) => Morph (Transformer :: (i -> k) -> (i -> k) -> *) where
      id = Transform id
      Transform f . Transform g = Transform $ f . g
These categories can be looked upon as the most basic substrates in Haskell. For instance, every type of kind * -> * is an object of the relevant category, even if it's a GADT or has other structure that prevents it from being nicely functorial.
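To make the kind * -> * case concrete without the kind-polymorphic machinery, here is a simplified sketch that needs only RankNTypes. It is a specialization for illustration, not the Transformer type above: the names Nat, idNat, composeNat, maybeToListN and headN are all hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Arrows between type constructors of kind * -> *:
-- polymorphic functions that saturate the constructor.
newtype Nat f g = Nat { runNat :: forall a. f a -> g a }

-- Identity arrow on any constructor.
idNat :: Nat f f
idNat = Nat id

-- Composition of arrows, in the same order as (.).
composeNat :: Nat g h -> Nat f g -> Nat f h
composeNat (Nat k) (Nat j) = Nat (k . j)

-- An arrow Maybe ~> [].
maybeToListN :: Nat Maybe []
maybeToListN = Nat (maybe [] (\x -> [x]))

-- An arrow [] ~> Maybe, keeping only the first element if there is one.
headN :: Nat [] Maybe
headN = Nat safeHead
  where
    safeHead []      = Nothing
    safeHead (x : _) = Just x
```

For example, runNat (composeNat headN maybeToListN) (Just 3) evaluates to Just 3. Nothing here requires Maybe or [] to be a Functor, which matches the remark that any type of the right kind is an object.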
The category for * is of course just the normal category of types and functions we usually call Hask, and it is fairly analogous to the category of sets. One common activity in category theory is to study categories of sets equipped with extra structure, and it turns out we can do this in Haskell, as well. And it even makes some sense to study categories of structures over any of these type categories.
When we equip our types with structure, we often use type classes, so that's how I'll do things here. Classes have a special status socially in that we expect people to only define instances that adhere to certain equational rules. This will take the place of equations that we are not able to state in the Haskell type system, because it doesn't have dependent types. So using classes will allow us to define more structures than we normally would, if only by convention.
So, if we have a kind k, then a corresponding structure will be σ :: k -> Constraint. We can then define the category (k,σ) as having objects t :: k such that there is an instance σ t. Arrows are then taken to be f :: t ~> u such that f "respects" the operations of σ.
As a simple example, we have:
    k = *
    σ = Monoid :: * -> Constraint

    Sum Integer, Product Integer, [Integer] :: (*, Monoid)

    f :: (Monoid m, Monoid n) => m -> n
      if f mempty = mempty
         f (m <> n) = f m <> f n
This is just the category of monoids in Haskell.
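To make the arrow condition concrete, here is a small sketch of one arrow in this category: list length, viewed as a map from the list monoid to Sum Int. The two checks are only spot tests of the homomorphism equations on sample inputs, not a proof; the names len, preservesUnit and preservesAppend are invented for the illustration.

```haskell
module Main where

import Data.Monoid (Sum (..))

-- A candidate arrow in (*, Monoid): list length into the additive monoid.
len :: [a] -> Sum Int
len = Sum . length

-- f mempty = mempty, checked at one type.
preservesUnit :: Bool
preservesUnit = len ([] :: [Char]) == mempty

-- f (m <> n) = f m <> f n, checked on given sample inputs.
preservesAppend :: String -> String -> Bool
preservesAppend xs ys = len (xs <> ys) == len xs <> len ys

main :: IO ()
main = print (preservesUnit, preservesAppend "ab" "cde")
```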
As a side note, we will sometimes be wanting to quantify over these "categories of structures". There isn't really a good way to package together a kind and a structure such that they work as a unit, but we can just add a constraint to the quantification. So, to quantify over all Monoids, we'll use 'forall m. Monoid m => ...'.
Now, once we have these categories of structures, there is an obvious forgetful functor back into the unadorned category. We can then look for free and cofree functors as adjoints to this. More symbolically:
    Forget σ :: (k,σ) -> k
    Free σ :: k -> (k,σ)
    Cofree σ :: k -> (k,σ)

    Free σ ⊣ Forget σ ⊣ Cofree σ
However, what would be nicer (for some purposes) than having to look for these is being able to construct them all systematically, without having to think much about the structure σ.
Category theory gives a hint at this, too, in the form of Kan extensions. In category terms they look like:
    p : C -> C'
    f : C -> D
    Ran p f : C' -> D
    Lan p f : C' -> D

    Ran p f c' = end (c : C). Hom_C'(c', p c) ⇒ f c
    Lan p f c' = coend (c : C). Hom_C'(p c, c') ⊗ f c
where ⇒ is a "power" and ⊗ is a copower, which are like being able to take exponentials and products by sets (or whatever the objects of the hom category are), instead of other objects within the category. Ends and coends are like universal and existential quantifiers (as are limits and colimits, but ends and coends involve mixed-variance).
Some handy theorems relate Kan extensions and adjoint functors:
    if L ⊣ R
    then L = Ran R Id and R = Lan L Id

    if Ran R Id exists and is absolute
    then Ran R Id ⊣ R

    if Lan L Id exists and is absolute
    then L ⊣ Lan L Id

    Kan P F is absolute iff forall G. (G . Kan P F) ~= Kan P (G . F)
It turns out we can write down Kan extensions fairly generally in Haskell. Our restricted case is:
    p = Forget σ :: (k,σ) -> k
    f = Id :: (k,σ) -> (k,σ)

    Free σ = Ran (Forget σ) Id :: k -> (k,σ)
    Cofree σ = Lan (Forget σ) Id :: k -> (k,σ)

    g :: (k,σ) -> j
    g . Free σ = Ran (Forget σ) g
    g . Cofree σ = Lan (Forget σ) g
As long as the final category is like one of our type constructor categories, ends are universal quantifiers, powers are function types, coends are existential quantifiers and copowers are product spaces. This only breaks down for our purposes when g is contravariant, in which case they are flipped. For higher kinds, these constructions occur point-wise. So, we can break things down into four general cases, each with cases for each arity:
    newtype Ran0 σ p (f :: k -> *) a =
      Ran0 { ran0 :: forall r. σ r => (a ~> p r) -> f r }

    newtype Ran1 σ p (f :: k -> j -> *) a b =
      Ran1 { ran1 :: forall r. σ r => (a ~> p r) -> f r b }

    -- ...

    data RanOp0 σ p (f :: k -> *) a =
      forall e. σ e => RanOp0 (a ~> p e) (f e)

    -- ...

    data Lan0 σ p (f :: k -> *) a =
      forall e. σ e => Lan0 (p e ~> a) (f e)

    data Lan1 σ p (f :: k -> j -> *) a b =
      forall e. σ e => Lan1 (p e ~> a) (f e b)

    -- ...

    data LanOp0 σ p (f :: k -> *) a =
      LanOp0 { lan0 :: forall r. σ r => (p r -> a) -> f r }

    -- ...
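For comparison, here is a sketch of the same pattern with the σ constraint dropped and everything fixed at kind * -> *, which may be easier to load and experiment with. The definitions mirror the usual encodings of Kan extensions between endofunctors on Hask; the helper names (toRan, fromRan, toLan, fromLan) are made up for this example. Extending along the identity functor should give back the original functor, which the round trips below exercise.

```haskell
{-# LANGUAGE RankNTypes, ExistentialQuantification #-}
module Main where

import Data.Functor.Identity (Identity (..))

-- Right Kan extension: the end becomes forall, the power a function type.
newtype Ran p f a = Ran { runRan :: forall c. (a -> p c) -> f c }

-- Left Kan extension: the coend becomes an existential, the copower a pair.
data Lan p f a = forall c. Lan (p c -> a) (f c)

-- Extending along Identity gives back f (the Yoneda-style round trips).
toRan :: Functor f => f a -> Ran Identity f a
toRan fa = Ran (\k -> fmap (runIdentity . k) fa)

fromRan :: Ran Identity f a -> f a
fromRan r = runRan r Identity

toLan :: f a -> Lan Identity f a
toLan = Lan runIdentity

fromLan :: Functor f => Lan Identity f a -> f a
fromLan (Lan k fc) = fmap (k . Identity) fc

main :: IO ()
main = do
  print (fromRan (toRan [1, 2, 3 :: Int]))
  print (fromLan (toLan (Just "x")))
```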
The more specific proposed (co)free definitions are:
    type family Free :: (k -> Constraint) -> k -> k
    type family Cofree :: (k -> Constraint) -> k -> k

    newtype Free0 σ a = Free0 { gratis0 :: forall r. σ r => (a ~> r) -> r }
    type instance Free = Free0

    newtype Free1 σ f a = Free1 { gratis1 :: forall g. σ g => (f ~> g) -> g a }
    type instance Free = Free1

    -- ...

    data Cofree0 σ a = forall e. σ e => Cofree0 (e ~> a) e
    type instance Cofree = Cofree0

    data Cofree1 σ f a = forall g. σ g => Cofree1 (g ~> f) (g a)
    type instance Cofree = Cofree1

    -- ...
We can define some handy classes and instances for working with these types, several of which generalize existing Haskell concepts:
    class Covariant (f :: i -> j) where
      comap :: (a ~> b) -> (f a ~> f b)

    class Contravariant f where
      contramap :: (b ~> a) -> (f a ~> f b)

    class Covariant m => Monad (m :: i -> i) where
      pure :: a ~> m a
      join :: m (m a) ~> m a

    class Covariant w => Comonad (w :: i -> i) where
      extract :: w a ~> a
      split :: w a ~> w (w a)

    class Couniversal σ f | f -> σ where
      couniversal :: σ r => (a ~> r) -> (f a ~> r)

    class Universal σ f | f -> σ where
      universal :: σ e => (e ~> a) -> (e ~> f a)

    instance Covariant (Free0 σ) where
      comap f (Free0 e) = Free0 (e . (.f))

    instance Monad (Free0 σ) where
      pure x = Free0 $ \k -> k x
      join (Free0 e) = Free0 $ \k -> e $ \(Free0 e) -> e k

    instance Couniversal σ (Free0 σ) where
      couniversal h (Free0 e) = e h

    -- ...
The only unfamiliar classes here should be (Co)Universal. They are for witnessing the adjunctions that make Free σ the initial σ and Cofree σ the final σ in the relevant way. Only one direction is given, since the opposite is very easy to construct with the (co)monad structure.
Free σ is a monad and couniversal, Cofree σ is a comonad and universal.
We can now try to convince ourselves that Free σ and Cofree σ are absolute. Here are some examples:
    free0Absolute0 :: forall g σ a. (Covariant g, σ (Free σ a)) => g (Free0 σ a) <-> Ran σ Forget g a
    free0Absolute0 = (l, r)
     where
     l :: g (Free σ a) -> Ran σ Forget g a
     l g = Ran0 $ \k -> comap (couniversal $ remember0 . k) g

     r :: Ran σ Forget g a -> g (Free σ a)
     r (Ran0 e) = e $ Forget0 . pure

    free0Absolute1 :: forall (g :: * -> * -> *) σ a x. (Covariant g, σ (Free σ a)) => g (Free0 σ a) x <-> Ran σ Forget g a x
    free0Absolute1 = (l, r)
     where
     l :: g (Free σ a) x -> Ran σ Forget g a x
     l g = Ran1 $ \k -> comap (couniversal $ remember0 . k) $$ g

     r :: Ran σ Forget g a x -> g (Free σ a) x
     r (Ran1 e) = e $ Forget0 . pure

    free0Absolute0Op :: forall g σ a. (Contravariant g, σ (Free σ a)) => g (Free0 σ a) <-> RanOp σ Forget g a
    free0Absolute0Op = (l, r)
     where
     l :: g (Free σ a) -> RanOp σ Forget g a
     l = RanOp0 $ Forget0 . pure

     r :: RanOp σ Forget g a -> g (Free σ a)
     r (RanOp0 h g) = contramap (couniversal $ remember0 . h) g

    -- ...
As can be seen, the definitions share a lot of structure. I'm quite confident that with the right building blocks these could be defined once for each of the four types of Kan extensions, with types like:
    freeAbsolute :: forall g σ a. (Covariant g, σ (Free σ a)) => g (Free σ a) <~> Ran σ Forget g a

    cofreeAbsolute :: forall g σ a. (Covariant g, σ (Cofree σ a)) => g (Cofree σ a) <~> Lan σ Forget g a

    freeAbsoluteOp :: forall g σ a. (Contravariant g, σ (Free σ a)) => g (Free σ a) <~> RanOp σ Forget g a

    cofreeAbsoluteOp :: forall g σ a. (Contravariant g, σ (Cofree σ a)) => g (Cofree σ a) <~> LanOp σ Forget g a
However, it seems quite difficult to structure things in a way such that GHC will accept the definitions. I've successfully written freeAbsolute using some axioms, but turning those axioms into class definitions and the like seems impossible.
Anyhow, the punchline is that we can prove absoluteness using only the premise that there is a valid σ instance for Free σ and Cofree σ. This tends to be quite easy; we just borrow the structure of the type we are quantifying over. This means that in all these cases, we are justified in saying that Free σ ⊣ Forget σ ⊣ Cofree σ, and we have very generic presentations of (co)free structures in Haskell. So let's look at some.
We've already seen Free Monoid, and last time we talked about Free Applicative, and its relation to traversals. But, Applicative is to traversal as Functor is to lens, so it may be interesting to consider constructions on that. Both Free Functor and Cofree Functor make Functors:
    instance Functor (Free1 Functor f) where
      fmap f (Free1 e) = Free1 $ fmap f . e

    instance Functor (Cofree1 Functor f) where
      fmap f (Cofree1 h e) = Cofree1 h (fmap f e)
And of course, they are (co)monads, covariant functors and (co)universal among Functors. But, it happens that I know some other types with these properties:
    data CoYo f a = forall e. CoYo (e -> a) (f e)

    instance Covariant CoYo where
      comap f = Transform $ \(CoYo h e) -> CoYo h (f $$ e)

    instance Monad CoYo where
      pure = Transform $ CoYo id
      join = Transform $ \(CoYo h (CoYo h' e)) -> CoYo (h . h') e

    instance Functor (CoYo f) where
      fmap f (CoYo h e) = CoYo (f . h) e

    instance Couniversal Functor CoYo where
      couniversal tr = Transform $ \(CoYo h e) -> fmap h (tr $$ e)

    newtype Yo f a = Yo { oy :: forall r. (a -> r) -> f r }

    instance Covariant Yo where
      comap f = Transform $ \(Yo e) -> Yo $ (f $$) . e

    instance Comonad Yo where
      extract = Transform $ \(Yo e) -> e id
      split = Transform $ \(Yo e) -> Yo $ \k -> Yo $ \k' -> e $ k' . k

    instance Functor (Yo f) where
      fmap f (Yo e) = Yo $ \k -> e (k . f)

    instance Universal Functor Yo where
      universal tr = Transform $ \e -> Yo $ \k -> tr $$ fmap k e
These are the types involved in the (co-)Yoneda lemma. CoYo is a monad, couniversal among functors, and CoYo f is a Functor. Yo is a comonad, universal among functors, and is always a Functor. So, are these equivalent types?
    coyoIso :: CoYo <~> Free Functor
    coyoIso = (Transform $ couniversal pure, Transform $ couniversal pure)

    yoIso :: Yo <~> Cofree Functor
    yoIso = (Transform $ universal extract, Transform $ universal extract)
Indeed they are. And similar identities hold for the contravariant versions of these constructions.
I don't have much of a use for this last example. I suppose to be perfectly precise, I should point out that these uses of (Co)Yo are not actually part of the (co-)Yoneda lemma. They are two different constructions. The (co-)Yoneda lemma can be given in terms of Kan extensions as:
    yoneda :: Ran Id f <~> f

    coyoneda :: Lan Id f <~> f
But, the use of (Co)Yo to make Functors out of things that aren't necessarily functors is properly thought of in other terms. In short, we have some kind of category of Haskell types with only identity arrows; that is, it is discrete. Then any type constructor, even a non-functorial one, is certainly a functor from said category (call it Haskrete) into the normal one (Hask). And there is an inclusion functor from Haskrete into Hask:
                    F
         Haskrete -----> Hask
            |        /|
            |       /
            |      /
       Incl |     /
            |    /  Ran/Lan Incl F
            |   /
            |  /
            v /
          Hask
So, (Co)Free Functor can also be thought of in terms of these Kan extensions involving the discrete category.
To see more fleshed out, loadable versions of the code in this post, see this file. I may also try a similar Agda development at a later date, as it may admit the more general absoluteness constructions more easily.
[0]: The reason for restricting ourselves to kinds involving only * and (->) is that they work much more simply than data kinds. Haskell values can't depend on type-level entities without using type classes. For *, this is natural, but for something like Bool -> *, it is more natural for transformations to be able to inspect the booleans, and so should be something more like forall b. InspectBool b => f b -> g b.
[1]: First-class types are what you get by removing type families and synonyms from consideration. The reason for doing so is that these can't be used properly as parameters and the like, except in cases where they reduce to some other type that is first-class. For example, if we define:
type I a = a
even though GHC will report I :: * -> *, it is not legal to write Transform I I.
This time I'd like to talk about some other examples of this, and point out how doing so can (perhaps) resolve some disagreements that people have about the specific cases.
The first example is not one that I came up with: induction. It's sometimes said that Haskell does not have inductive types at all, or that we cannot reason about functions on its data types by induction. However, I think this is (technically) inaccurate. What's true is that we cannot simply pretend that our types are sets and use the induction principles for sets to reason about Haskell programs. Instead, one has to figure out what inductive domains would be, and what their proof principles are.
Fortunately, there are some papers about doing this. The most recent (that I'm aware of) is Generic Fibrational Induction. I won't get too into the details, but it shows how one can talk about induction in a general setting, where one has a category that roughly corresponds to the type theory/programming language, and a second category of proofs that is 'indexed' by the first category's objects. Importantly, it is not required that the second category is somehow 'part of' the type theory being reasoned about, as is often the case with dependent types, although that is also a special case of their construction.
One of the results of the paper is that this framework can be used to talk about induction principles for types that don't make sense as sets. Specifically:
newtype Hyp = Hyp ((Hyp -> Int) -> Int)
the type of "hyperfunctions". Instead of interpreting this type as a set, where it would effectively require a set that is isomorphic to the power set of its power set, they interpret it in the category of domains and strict functions mentioned earlier. They then construct the proof category in a similar way as one would for sets, except instead of talking about predicates as subsets, we talk about sub-domains instead. Once this is done, their framework gives a notion of induction for this type.
This example is suitable for ML (and suchlike), due to the strict functions, and sort of breaks the idea that we can really get away with only thinking about sets, even there. Sets are good enough for some simple examples (like flat domains where we don't care about ⊥), but in general we have to generalize induction itself to apply to all types in the 'good' language.
While I haven't worked out how the generic induction would work out for Haskell, I have little doubt that it would, because ML actually contains all of Haskell's data types (and vice versa). So the fact that the framework gives meaning to induction for ML implies that it does so for Haskell. If one wants to know what induction for Haskell's 'lazy naturals' looks like, they can study the ML analogue of:
data LNat = Zero | Succ (() -> LNat)
because function spaces lift their codomain, and make things 'lazy'.
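On the Haskell side, the laziness is built in, so the plain data declaration already contains the extra values. A minimal sketch (the names Nat, infinity, fromInt and lessThan are made up here) showing a value that a purely set-theoretic reading of the type would not include, and a function that still behaves sensibly on it:

```haskell
module Main where

data Nat = Zero | Succ Nat

-- An infinite value; there is no such element in the set of naturals.
infinity :: Nat
infinity = Succ infinity

fromInt :: Int -> Nat
fromInt n = if n <= 0 then Zero else Succ (fromInt (n - 1))

-- Lazy comparison: inspects only as many constructors as it needs.
lessThan :: Nat -> Nat -> Bool
lessThan _        Zero     = False
lessThan Zero     (Succ _) = True
lessThan (Succ m) (Succ n) = lessThan m n

main :: IO ()
main = print (fromInt 3 `lessThan` infinity)
```

Any induction principle for this type has to account for infinity (and for partial values), which is the kind of thing the domain-based reading is for.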
----
The other example I'd like to talk about hearkens back to the previous article. I explained how foldMap is the proper fundamental method of the Foldable class, because it can be massaged to look like:
foldMap :: Foldable f => f a -> FreeMonoid a
and lists are not the free monoid, because they do not work properly for various infinite cases.
I also mentioned that foldMap looks a lot like traverse:
    foldMap  :: (Foldable t   , Monoid m)      => (a -> m)   -> t a -> m
    traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
And of course, we have Monoid m => Applicative (Const m), and the functions are expected to agree in this way when applicable.
Now, people like to get into arguments about whether traversals are allowed to be infinite. I know Ed Kmett likes to argue that they can be, because he has lots of examples. But, not everyone agrees, and especially people who have papers proving things about traversals tend to side with the finite-only side. I've heard this includes one of the inventors of Traversable, Conor McBride.
In my opinion, the above disagreement is just another example of a situation where we have a generic notion instantiated in two different ways, and intuition about one does not quite transfer to the other. If you are working in a language like Agda or Coq (for proving), you will be thinking about traversals in the context of sets and total functions. And there, traversals are finite. But in Haskell, there are infinitary cases to consider, and they should work out all right when thinking about domains instead of sets. But I should probably put forward some argument for this position (and even if I don't need to, it leads somewhere else interesting).
One example that people like to give about finitary traversals is that they can be done via lists. Given a finite traversal, we can traverse to get the elements (using Const [a]), traverse the list, then put them back where we got them by traversing again (using State [a]). Usually when you see this, though, there's some subtle cheating in relying on the list to be exactly the right length for the second traversal. It will be, because we got it from a traversal of the same structure, but I would expect proving that the function is actually total to be a lot of work. Thus, I'll use this as an excuse to do my own cheating later.
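The list-based scheme can be sketched roughly as follows. This is an illustration under some assumptions: toList stands in for the Const [a] traversal, mapAccumL stands in for the State [a] traversal, and the names Tree, refill and listTraverse are invented for the example. The partial case in refill is exactly the length-matching cheat described above.

```haskell
{-# LANGUAGE DeriveTraversable #-}
module Main where

import Data.Foldable (toList)
import Data.Traversable (mapAccumL)

data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- Put new elements back into the shape of t, left to right.
-- Partial: fails if the list is shorter than the structure.
refill :: Traversable t => t a -> [b] -> t b
refill t xs = snd (mapAccumL step xs t)
  where
    step (y : ys) _ = (ys, y)
    step []       _ = error "refill: not enough elements"

-- Traverse by way of the list of elements.
listTraverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
listTraverse f t = fmap (refill t) (traverse f (toList t))

main :: IO ()
main = print (listTraverse (\x -> Just (x * 2)) tree)
  where
    tree = Node (Node Leaf 1 Leaf) (2 :: Int) (Node Leaf 3 Leaf)
```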
Now, the above uses lists, but why are we using lists when we're in Haskell? We know they're deficient in certain ways. It turns out that we can give a lot of the same relevant structure to the better free monoid type:
    newtype FM a = FM (forall m. Monoid m => (a -> m) -> m) deriving (Functor)

    instance Applicative FM where
      pure x = FM ($ x)
      FM ef <*> FM ex = FM $ \k -> ef $ \f -> ex $ \x -> k (f x)

    instance Monoid (FM a) where
      mempty = FM $ \_ -> mempty
      mappend (FM l) (FM r) = FM $ \k -> l k <> r k

    instance Foldable FM where
      foldMap f (FM e) = e f

    newtype Ap f b = Ap { unAp :: f b }

    instance (Applicative f, Monoid b) => Monoid (Ap f b) where
      mempty = Ap $ pure mempty
      mappend (Ap l) (Ap r) = Ap $ (<>) <$> l <*> r

    instance Traversable FM where
      traverse f (FM e) = unAp . e $ Ap . fmap pure . f
So, free monoids are Monoids (of course), Foldable, and even Traversable. At least, we can define something with the right type that wouldn't bother anyone if it were written in a total language with the right features, but in Haskell it happens to allow various infinite things that people don't like.
Now it's time to cheat. First, let's define a function that can take any Traversable to our free monoid:
    toFreeMonoid :: Traversable t => t a -> FM a
    toFreeMonoid f = FM $ \k -> getConst $ traverse (Const . k) f
Now let's define a Monoid that's not a monoid:
    data Cheat a = Empty | Single a | Append (Cheat a) (Cheat a)

    instance Monoid (Cheat a) where
      mempty = Empty
      mappend = Append
You may recognize this as the data version of the free monoid from the previous article, where we get the real free monoid by taking a quotient. Using this, we can define an Applicative that's not valid:
    newtype Cheating b a = Cheating { prosper :: Cheat b -> a } deriving (Functor)

    instance Applicative (Cheating b) where
      pure x = Cheating $ \_ -> x

      Cheating f <*> Cheating x = Cheating $ \c -> case c of
        Append l r -> f l (x r)
Given these building blocks, we can define a function to relabel a traversable using a free monoid:
    relabel :: Traversable t => t a -> FM b -> t b
    relabel t (FM m) = prosper (traverse (const hope) t) (m Single)
     where
     hope = Cheating $ \c -> case c of
       Single x -> x
And we can implement any traversal by taking a trip through the free monoid:
    slowTraverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
    slowTraverse f t = fmap (relabel t) . traverse f . toFreeMonoid $ t
And since we got our free monoid via traversing, all the partiality I hid in the above won't blow up in practice, rather like the case with lists and finite traversals.
Arguably, this is worse cheating. It relies on the exact association structure to work out, rather than just number of elements. The reason is that for infinitary cases, you cannot flatten things out, and there's really no way to detect when you have something infinitary. The finitary traversals have the luxury of being able to reassociate everything to a canonical form, while the infinite cases force us to not do any reassociating at all. So this might be somewhat unsatisfying.
But, what if we didn't have to cheat at all? We can get the free monoid by tweaking foldMap, and it looks like traverse, so what happens if we do the same manipulation to the latter?
It turns out that lens has a type for this purpose, a slight specialization of which is:
    newtype Bazaar a b t =
      Bazaar { runBazaar :: forall f. Applicative f => (a -> f b) -> f t }
Using this type, we can reorder traverse to get:
    howBizarre :: Traversable t => t a -> Bazaar a b (t b)
    howBizarre t = Bazaar $ \k -> traverse k t
But now, what do we do with this? And what even is it? [1]
If we continue drawing on intuition from Foldable, we know that foldMap is related to the free monoid. Traversable has more indexing, and instead of Monoid uses Applicative. But the latter are actually related to the former; Applicatives are monoidal (closed) functors. And it turns out, Bazaar has to do with free Applicatives.
If we want to construct free Applicatives, we can use our universal property encoding trick:
    newtype Free p f a =
      Free { gratis :: forall g. p g => (forall x. f x -> g x) -> g a }
This is a higher-order version of the free p, where we parameterize over the constraint we want to use to represent structures. So Free Applicative f is the free Applicative over a type constructor f. I'll leave the instances as an exercise.
Since free monoid is a monad, we'd expect Free p
to be a monad, too. In this case, it is a McBride style indexed monad, as seen in The Kleisli Arrows of Outrageous Fortune.
    type f ~> g = forall x. f x -> g x

    embed :: f ~> Free p f
    embed fx = Free $ \k -> k fx

    translate :: (f ~> g) -> Free p f ~> Free p g
    translate tr (Free e) = Free $ \k -> e (k . tr)

    collapse :: Free p (Free p f) ~> Free p f
    collapse (Free e) = Free $ \k -> e $ \(Free e') -> e' k
That paper explains how these are related to Atkey style indexed monads:
    data At key i j where
      At :: key -> At key i i

    type Atkey m i j a = m (At a j) i

    ireturn :: IMonad m => a -> Atkey m i i a
    ireturn = ...

    ibind :: IMonad m => Atkey m i j a -> (a -> Atkey m j k b) -> Atkey m i k b
    ibind = ...
It turns out, Bazaar is exactly the Atkey indexed monad derived from the Free Applicative indexed monad (with some arguments shuffled) [2]:
    hence :: Bazaar a b t -> Atkey (Free Applicative) t b a
    hence bz = Free $ \tr -> runBazaar bz $ tr . At

    forth :: Atkey (Free Applicative) t b a -> Bazaar a b t
    forth fa = Bazaar $ \g -> gratis fa $ \(At a) -> g a

    imap :: (a -> b) -> Bazaar a i j -> Bazaar b i j
    imap f (Bazaar e) = Bazaar $ \k -> e (k . f)

    ipure :: a -> Bazaar a i i
    ipure x = Bazaar ($ x)

    (>>>=) :: Bazaar a j i -> (a -> Bazaar b k j) -> Bazaar b k i
    Bazaar e >>>= f = Bazaar $ \k -> e $ \x -> runBazaar (f x) k

    (>==>) :: (s -> Bazaar i o t) -> (i -> Bazaar a b o) -> s -> Bazaar a b t
    (f >==> g) x = f x >>>= g
As an aside, Bazaar is also an (Atkey) indexed comonad, and the one that characterizes traversals, similar to how indexed store characterizes lenses. A Lens s t a b is equivalent to a coalgebra s -> Store a b t. A traversal is a similar Bazaar coalgebra:
      s -> Bazaar a b t
    ~
      s -> forall f. Applicative f => (a -> f b) -> f t
    ~
      forall f. Applicative f => (a -> f b) -> s -> f t
It so happens that Kleisli composition of the Atkey indexed monad above (>==>) is traversal composition.
Anyhow, Bazaar also inherits Applicative structure from Free Applicative:
    instance Functor (Bazaar a b) where
      fmap f (Bazaar e) = Bazaar $ \k -> fmap f (e k)

    instance Applicative (Bazaar a b) where
      pure x = Bazaar $ \_ -> pure x
      Bazaar ef <*> Bazaar ex = Bazaar $ \k -> ef k <*> ex k
This is actually analogous to the Monoid instance for the free monoid; we just delegate to the underlying structure.
The more exciting thing is that we can fold and traverse over the first argument of Bazaar, just like we can with the free monoid:
    bfoldMap :: Monoid m => (a -> m) -> Bazaar a b t -> m
    bfoldMap f (Bazaar e) = getConst $ e (Const . f)

    newtype Comp g f a = Comp { getComp :: g (f a) } deriving (Functor)

    instance (Applicative f, Applicative g) => Applicative (Comp g f) where
      pure = Comp . pure . pure
      Comp f <*> Comp x = Comp $ liftA2 (<*>) f x

    btraverse :: (Applicative f) => (a -> f a') -> Bazaar a b t -> f (Bazaar a' b t)
    btraverse f (Bazaar e) = getComp $ e (Comp . fmap ipure . f)
This is again analogous to the free monoid code. Comp is the analogue of Ap, and we use ipure in traverse. I mentioned that Bazaar is a comonad:
    extract :: Bazaar b b t -> t
    extract (Bazaar e) = runIdentity $ e Identity
And now we are finally prepared to not cheat:
    honestTraverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
    honestTraverse f = fmap extract . btraverse f . howBizarre
So, we can traverse by first turning our Traversable into some structure that's kind of like the free monoid, except having to do with Applicative, traverse that, and then pull a result back out. Bazaar retains the information that we're eventually building back the same type of structure, so we don't need any cheating.
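For readers who want to try this without the rest of the development, here is a condensed sketch of the pieces needed for honestTraverse in one module. It assumes only base, and uses Compose from Data.Functor.Compose in place of the Comp type above (a substitution made for this example, not something the article does); everything else follows the definitions already given.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))

newtype Bazaar a b t = Bazaar
  { runBazaar :: forall f. Applicative f => (a -> f b) -> f t }

howBizarre :: Traversable t => t a -> Bazaar a b (t b)
howBizarre t = Bazaar (\k -> traverse k t)

ipure :: a -> Bazaar a i i
ipure x = Bazaar (\k -> k x)

instance Functor (Bazaar a b) where
  fmap f (Bazaar e) = Bazaar (\k -> fmap f (e k))

instance Applicative (Bazaar a b) where
  pure x = Bazaar (\_ -> pure x)
  Bazaar ef <*> Bazaar ex = Bazaar (\k -> ef k <*> ex k)

-- Compose from base stands in for the article's Comp.
btraverse :: Applicative f => (a -> f a') -> Bazaar a b t -> f (Bazaar a' b t)
btraverse f (Bazaar e) = getCompose (e (Compose . fmap ipure . f))

extract :: Bazaar b b t -> t
extract (Bazaar e) = runIdentity (e Identity)

honestTraverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
honestTraverse f = fmap extract . btraverse f . howBizarre

main :: IO ()
main = do
  print (honestTraverse (\x -> if x > 0 then Just (x * 2) else Nothing) [1, 2, 3 :: Int])
  print (honestTraverse (\x -> [x, x + 10]) (Just (1 :: Int)))
```

On these inputs it should agree with the ordinary traverse, which is the point: no partial pattern matches are involved anywhere.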
To pull this back around to domains, there's nothing about this code to object to if done in a total language. But, if we think about our free Applicative-ish structure, in Haskell, it will naturally allow infinitary expressions composed of the Applicative operations, just like the free monoid will allow infinitary monoid expressions. And this is okay, because some Applicatives can make sense of those, so throwing them away would make the type not free, in the same way that even finite lists are not the free monoid in Haskell. And this, I think, is compelling enough to say that infinite traversals are right for Haskell, just as they are wrong for Agda.
For those who wish to see executable code for all this, I've put files here and here. The latter also contains some extra goodies at the end that I may talk about in further installments.
[1] Truth be told, I'm not exactly sure.
[2] It turns out, you can generalize Bazaar to have a correspondence for every choice of p:

    newtype Bizarre p a b t =
      Bizarre { bizarre :: forall f. p f => (a -> f b) -> f t }

hence and forth above go through with the more general types. This can be seen here.
Foldable is effectively the toList class. However, this turns out to be wrong. The real fundamental member of Foldable is foldMap (which should look suspiciously like traverse, incidentally). To understand exactly why this is, it helps to understand another surprising fact: lists are not free monoids in Haskell.
This latter fact can be seen relatively easily by considering another list-like type:
    data SL a = Empty | SL a :> a

    instance Monoid (SL a) where
      mempty = Empty
      mappend ys Empty = ys
      mappend ys (xs :> x) = (mappend ys xs) :> x

    single :: a -> SL a
    single x = Empty :> x
So, we have a type SL a of snoc lists, which are a monoid, and a function that embeds a into SL a. If (ordinary) lists were the free monoid, there would be a unique monoid homomorphism from lists to snoc lists. Such a homomorphism (call it h) would have the following properties:
    h [] = Empty
    h (xs <> ys) = h xs <> h ys
    h [x] = single x
And in fact, this (together with some general facts about Haskell functions) should be enough to define h for our purposes (or any purposes, really). So, let's consider its behavior on two values:
    h [1] = single 1

    h [1,1..] = h ([1] <> [1,1..]) -- [1,1..] is an infinite list of 1s
              = h [1] <> h [1,1..]
This second equation can tell us what the value of h is at this infinite value, since we can consider it the definition of a possibly infinite value:
    x = h [1] <> x
      = fix (single 1 <>)
    h [1,1..] = x
(single 1 <>) is a strict function, so the fixed point theorem tells us that x = ⊥.
This is a problem, though. Considering some additional equations:
    [1,1..] <> [n] = [1,1..] -- true for all n
    h [1,1..] = ⊥
    h ([1,1..] <> [1]) = h [1,1..] <> h [1]
                       = ⊥ <> single 1
                       = ⊥ :> 1
                       ≠ ⊥
So, our requirements for h are contradictory, and no such homomorphism can exist.
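The two absorption behaviors used in this argument can be observed directly. The sketch below restates SL with a Semigroup instance, since recent versions of base require one alongside Monoid; lastSL is a helper invented for the example, and undefined stands in for ⊥.

```haskell
module Main where

data SL a = Empty | SL a :> a

instance Semigroup (SL a) where
  ys <> Empty     = ys
  ys <> (xs :> x) = (ys <> xs) :> x

instance Monoid (SL a) where
  mempty = Empty

single :: a -> SL a
single x = Empty :> x

-- Last element, looking only at the outermost constructor.
lastSL :: SL a -> Maybe a
lastSL Empty    = Nothing
lastSL (_ :> x) = Just x

main :: IO ()
main = do
  -- Cons lists: an infinite left operand absorbs whatever is appended.
  print (take 3 (repeat (1 :: Int) <> [2]))
  -- Snoc lists: the right operand stays visible even when the left is ⊥.
  print (lastSL (undefined <> single (1 :: Int)))
```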
The issue is that Haskell types are domains. They contain these extra partially defined values and infinite values. The monoid structure on (cons) lists has infinite lists absorbing all right-hand sides, while the snoc lists are just the opposite.
This also means that finite lists (or any method of implementing finite sequences) are not free monoids in Haskell. They, as domains, still contain the additional bottom element, and it absorbs all other elements, which is incorrect behavior for the free monoid:
    pure x <> ⊥ = ⊥
    h ⊥ = ⊥
    h (pure x <> ⊥) = [x] <> h ⊥
                    = [x] ++ ⊥
                    = x:⊥
                    ≠ ⊥
So, what is the free monoid? In a sense, it can't be written down at all in Haskell, because we cannot enforce value-level equations, and because we don't have quotients. But, if conventions are good enough, there is a way. First, suppose we have a free monoid type FM a. Then for any other monoid m and embedding a -> m, there must be a monoid homomorphism from FM a to m. We can model this as a Haskell type:
forall a m. Monoid m => (a -> m) -> FM a -> m
Where we consider the Monoid m constraint to be enforcing that m actually has valid monoid structure. Now, a trick is to recognize that this sort of universal property can be used to define types in Haskell (or, GHC at least), due to polymorphic types being first class; we just rearrange the arguments and quantifiers, and take FM a to be the polymorphic type:
newtype FM a = FM { unFM :: forall m. Monoid m => (a -> m) -> m }
Types defined like this are automatically universal in the right sense. [1] The only thing we have to check is that FM a is actually a monoid over a. But that turns out to be easily witnessed:
    embed :: a -> FM a
    embed x = FM $ \k -> k x

    instance Monoid (FM a) where
      mempty = FM $ \_ -> mempty
      mappend (FM e1) (FM e2) = FM $ \k -> e1 k <> e2 k
Demonstrating that the above is a proper monoid delegates to instances of Monoid being proper monoids. So as long as we trust that convention, we have a free monoid.
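Here is a version of the above that should load on a recent GHC, as a sketch. Two things differ from the text and are assumptions of this example: a Semigroup instance is added because current versions of base require it as a superclass of Monoid, and extend is a name invented here for the universal property with its arguments in the original order.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Monoid (Sum (..))

newtype FM a = FM { unFM :: forall m. Monoid m => (a -> m) -> m }

embed :: a -> FM a
embed x = FM (\k -> k x)

-- Both instances just delegate to the monoid being eliminated into.
instance Semigroup (FM a) where
  FM e1 <> FM e2 = FM (\k -> e1 k <> e2 k)

instance Monoid (FM a) where
  mempty = FM (\_ -> mempty)

-- Any a -> m into a monoid extends to a map FM a -> m.
extend :: Monoid m => (a -> m) -> FM a -> m
extend k fm = unFM fm k

main :: IO ()
main = do
  let xs = embed 1 <> embed 2 <> mempty <> embed (3 :: Int)
  print (extend (\x -> [x]) xs)
  print (getSum (extend Sum xs))
```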
However, one might wonder what a free monoid would look like as something closer to a traditional data type. To construct that, first ignore the required equations, and consider only the generators; we get:
data FMG a = None | Single a | FMG a :<> FMG a
Now, the proper FM a is the quotient of this by the equations:
    None :<> x = x = x :<> None
    x :<> (y :<> z) = (x :<> y) :<> z
One way of mimicking this in Haskell is to hide the implementation in a module, and only allow elimination into Monoids (again, using the convention that Monoid ensures actual monoid structure) using the function:
    unFMG :: forall a m. Monoid m => FMG a -> (a -> m) -> m
    unFMG None _ = mempty
    unFMG (Single x) k = k x
    unFMG (x :<> y) k = unFMG x k <> unFMG y k
This is actually how quotients can be thought of in richer languages; the quotient does not eliminate any of the generated structure internally, it just restricts the way in which the values can be consumed. Those richer languages just allow us to prove equations, and enforce properties by proof obligations, rather than conventions and structure hiding. Also, one should note that the above should look pretty similar to our encoding of FM a using universal quantification earlier.
Now, one might look at the above and have some objections. For one, we'd normally think that the quotient of the above type is just [a]. Second, it seems like the type is revealing something about the associativity of the operations, because defining recursive values via left nesting is different from right nesting, and this difference is observable by extracting into different monoids. But aren't monoids supposed to remove associativity as a concern? For instance:
    ones1 = embed 1 <> ones1

    ones2 = ones2 <> embed 1
Shouldn't we be able to prove these are the same, because of an argument like:
    ones1 = embed 1 <> (embed 1 <> ...)
          ... reassociate ...
          = (... <> embed 1) <> embed 1
          = ones2
The answer is that the equation we have only specifies the behavior of associating three values:
x <> (y <> z) = (x <> y) <> z
And while this is sufficient to nail down the behavior of finite values, and finitary reassociating, it does not tell us that infinitary reassociating yields the same value back. And the "... reassociate ..." step in the argument above was decidedly infinitary. And while the rules tell us that we can peel any finite number of copies of embed 1 to the front of ones1 or the end of ones2, it does not tell us that ones1 = ones2. And in fact it is vital for FM a to have distinct values for these two things; it is what makes it the free monoid when we're dealing with domains of lazy values.
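The difference between the two values can be observed directly. The following is a minimal sketch, assuming the universally quantified encoding of FM a is wrapped in a newtype with a field called runFM (the wrapper, the field name, and this definition of embed are assumptions made for illustration). First only ever needs the leftmost element and Last only the rightmost, so each monoid can see into exactly one of the two values.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Monoid (First(..), Last(..))

-- Assumed packaging of the quantified encoding of the free monoid.
newtype FM a = FM { runFM :: forall m. Monoid m => (a -> m) -> m }

instance Semigroup (FM a) where
  FM f <> FM g = FM (\k -> f k <> g k)

instance Monoid (FM a) where
  mempty = FM (\_ -> mempty)

-- Hypothetical injection of a generator into the free monoid.
embed :: a -> FM a
embed x = FM (\k -> k x)

ones1, ones2 :: FM Int
ones1 = embed 1 <> ones1   -- nests to the right
ones2 = ones2 <> embed 1   -- nests to the left

-- First is lazy in its right argument, so the right-nested value is productive.
firstOfOnes1 :: Maybe Int
firstOfOnes1 = getFirst (runFM ones1 (First . Just))

-- Last is lazy in its left argument, so the left-nested value is productive.
lastOfOnes2 :: Maybe Int
lastOfOnes2 = getLast (runFM ones2 (Last . Just))
```

Swapping the monoids (Last on ones1, or First on ones2) should not terminate, which is the observable difference that no finite amount of reassociation can remove.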
Finally, we can come back to Foldable. If we look at foldMap:
foldMap :: (Foldable f, Monoid m) => (a -> m) -> f a -> m
we can rearrange things a bit, and get the type:
Foldable f => f a -> (forall m. Monoid m => (a -> m) -> m)
And thus, the most fundamental operation of Foldable is not toList, but toFreeMonoid, and lists are not free monoids in Haskell.
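As a small sketch of that rearrangement, toFreeMonoid can be written as foldMap with its arguments flipped and the result wrapped up. The newtype FM and the helper names toListVia and sumVia are assumptions introduced here for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Monoid (Sum(..))

-- Assumed packaging of the quantified encoding of the free monoid.
newtype FM a = FM { runFM :: forall m. Monoid m => (a -> m) -> m }

-- foldMap with its arguments flipped, wrapped in the newtype.
toFreeMonoid :: Foldable f => f a -> FM a
toFreeMonoid t = FM (\k -> foldMap k t)

-- toList is recovered by interpreting into the list monoid.
toListVia :: Foldable f => f a -> [a]
toListVia t = runFM (toFreeMonoid t) (\x -> [x])

-- Any other monoid works the same way; here, addition.
sumVia :: (Foldable f, Num a) => f a -> a
sumVia t = getSum (runFM (toFreeMonoid t) Sum)
```

Every other Foldable query is then an interpretation of the same FM value into a different monoid.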
[1]: What we are doing here is noting that (co)limits are objects that internalize natural transformations, but the natural transformations expressible by quantification in GHC are already automatically internalized using quantifiers. However, one has to be careful that the quantifiers are actually enforcing the relevant naturality conditions. In many simple cases they are.
About 6 months ago I had an opportunity to play with this approach in earnest, and realized we can speed it up a great deal. This has kept coming up in conversation ever since, so I've decided to write up an article here.
In my bound library I exploit the fact that monads are about substitution to make a monad transformer that manages substitution for me.
Here I'm going to take a more coupled approach.
To have a type system with enough complexity to be worth examining, I'll adapt Dan Doel's UPTS, which is a pure type system with universe polymorphism. I won't finish the implementation here, but from where we get it should be obvious how to finish the job.
Unlike Axelsson and Claessen I'm not going to bother to abstract over my name representation.
To avoid losing the original name from the source, we'll just track names as strings with an integer counting the number of times it has been 'primed'. The name is purely for expository purposes, the real variable identifier is the number. We'll follow the Axelsson and Claessen convention of having the identifier assigned to each binder be larger than any one bound inside of it. If you don't need the original source names you can cull them from the representation, but they can be useful if you are representing a syntax tree for something you parsed and/or that you plan to pretty print later.
data Name = Name String Int
  deriving (Show,Read)

hint :: Name -> String
hint (Name n _) = n

nameId :: Name -> Int
nameId (Name _ i) = i

instance Eq Name where
  (==) = (==) `on` nameId

instance Ord Name where
  compare = compare `on` nameId

prime :: String -> Int -> Name
prime n i = Name n (i + 1)
So what is the language I want to work with?
type Level = Int

data Constant
  = Level
  | LevelLiteral {-# UNPACK #-} !Level
  | Omega
  deriving (Eq,Ord,Show,Read,Typeable)

data Term a
  = Free a
  | Bound {-# UNPACK #-} !Name
  | Constant !Constant
  | Term a :+ {-# UNPACK #-} !Level
  | Max [Term a]
  | Type !(Term a)
  | Lam {-# UNPACK #-} !Name !(Term a) !(Term a)
  | Pi {-# UNPACK #-} !Name !(Term a) !(Term a)
  | Sigma {-# UNPACK #-} !Name !(Term a) !(Term a)
  | App !(Term a) !(Term a)
  | Fst !(Term a)
  | Snd !(Term a)
  | Pair !(Term a) !(Term a) !(Term a)
  deriving (Show,Read,Eq,Ord,Functor,Foldable,Traversable,Typeable)
That is perhaps a bit paranoid about remaining strict, but it seemed like a good idea at the time.
We can define capture avoiding substitution on terms:
subst :: Eq a => a -> Term a -> Term a -> Term a
subst a x y = y >>= \a' ->
  if a == a'
    then x
    else return a'
Now we finally need to implement Axelsson and Claessen's circular programming trick. Here we'll abstract over terms that allow us to find the highest bound value within them:
class Bindable t where
  bound :: t -> Int
and instantiate it for our Term type
instance Bindable (Term a) where
  bound Free{}        = 0
  bound Bound{}       = 0 -- intentional!
  bound Constant{}    = 0
  bound (a :+ _)      = bound a
  bound (Max xs)      = foldr (\a r -> bound a `max` r) 0 xs
  bound (Type t)      = bound t
  bound (Lam b t _)   = nameId b `max` bound t
  bound (Pi b t _)    = nameId b `max` bound t
  bound (Sigma b t _) = nameId b `max` bound t
  bound (App x y)     = bound x `max` bound y
  bound (Fst t)       = bound t
  bound (Snd t)       = bound t
  bound (Pair t x y)  = bound t `max` bound x `max` bound y
As in the original pearl we avoid traversing into the body of the binders, hence the _'s in the code above.
Now we can abstract over the pattern used to create a binder in the functional pearl, since we have multiple binder types in this syntax tree, and the code would get repetitive.
binder :: Bindable t => (Name -> t) -> (Name -> t -> r) -> String -> (t -> t) -> r
binder bd c n e = c b body where
  body = e (bd b)
  b = prime n (bound body)

lam, pi, sigma :: String -> Term a -> (Term a -> Term a) -> Term a
lam s t   = binder Bound (`Lam` t) s
pi s t    = binder Bound (`Pi` t) s
sigma s t = binder Bound (`Sigma` t) s
We may not always want to give names to the variables we capture, so let's define:
lam_, pi_, sigma_ :: Term a -> (Term a -> Term a) -> Term a
lam_   = lam "_"
pi_    = pi "_"
sigma_ = sigma "_"
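The circular definition inside binder is easier to see on a much smaller language. The following is a minimal sketch of the same trick on a bare untyped lambda calculus, with no types, names, or universe levels; Exp, maxBV, and this lam are stand-ins invented for the illustration, not part of the code above.

```haskell
-- A stripped-down syntax tree: binders carry only a numeric identifier.
data Exp
  = Var Int
  | App Exp Exp
  | Lam Int Exp
  deriving (Eq, Show)

-- Largest identifier introduced by a binder in the term. Variables count
-- as 0, and we never look inside the body of a binder.
maxBV :: Exp -> Int
maxBV (Var _)   = 0
maxBV (App f a) = maxBV f `max` maxBV a
maxBV (Lam n _) = n

-- The body mentions n, and n is computed from the body. Laziness makes
-- this well defined because maxBV never inspects the number inside a Var.
lam :: (Exp -> Exp) -> Exp
lam f = Lam n body
  where
    body = f (Var n)
    n    = maxBV body + 1

example :: Exp
example = lam (\x -> lam (\y -> App x y))
-- evaluates to Lam 2 (Lam 1 (App (Var 2) (Var 1)))
```

The outer binder receives the larger identifier, which is the convention the Bindable instance relies on.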
Now, here's the interesting part. The problem with Axelsson and Claessen's original trick is that every substitution is being handled separately. This means that if you were to write a monad for doing substitution with it, it'd actually be quite slow. You have to walk the syntax tree over and over and over.
We can fuse these together by making a single pass:
instantiate :: Name -> t -> IntMap t -> IntMap t
instantiate = IntMap.insert . nameId

rebind :: IntMap (Term b) -> Term a -> (a -> Term b) -> Term b
rebind env xs0 f = go xs0 where
  go = \case
    Free a      -> f a
    Bound b     -> env IntMap.! nameId b
    Constant c  -> Constant c
    m :+ n      -> go m :+ n
    Type t      -> Type (go t)
    Max xs      -> Max (fmap go xs)
    Lam b t e   -> lam   (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    Pi b t e    -> pi    (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    Sigma b t e -> sigma (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    App x y     -> App (go x) (go y)
    Fst x       -> Fst (go x)
    Snd x       -> Snd (go x)
    Pair t x y  -> Pair (go t) (go x) (go y)
Note that the Lam, Pi and Sigma cases just extend the current environment.
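To make the single-pass idea concrete, here is a hedged sketch of the same environment-passing traversal on a tiny untyped calculus. Exp, maxBV, lam, and substAll are names invented for this illustration; it assumes the lazy Data.IntMap from the containers package, since the environment must not force the variables stored in it.

```haskell
import qualified Data.IntMap as IntMap
import Data.IntMap (IntMap)

data Exp a
  = Free a
  | Bound Int
  | App (Exp a) (Exp a)
  | Lam Int (Exp a)
  deriving (Eq, Show)

-- Largest binder identifier, without descending into binder bodies.
maxBV :: Exp a -> Int
maxBV (Free _)  = 0
maxBV (Bound _) = 0
maxBV (App f x) = maxBV f `max` maxBV x
maxBV (Lam n _) = n

-- Circular smart constructor, as in the pearl.
lam :: (Exp a -> Exp a) -> Exp a
lam f = Lam n body
  where
    body = f (Bound n)
    n    = maxBV body + 1

-- One traversal: free variables are replaced via f, bound variables are
-- looked up in env, and each binder is rebuilt with a fresh identifier.
substAll :: IntMap (Exp b) -> Exp a -> (a -> Exp b) -> Exp b
substAll env e f = go e
  where
    go (Free a)  = f a
    go (Bound i) = env IntMap.! i
    go (App x y) = App (go x) (go y)
    go (Lam i b) = lam (\v -> substAll (IntMap.insert i v env) b f)

-- Substitute the identity function for the free variable "y".
example :: Exp String
example = substAll IntMap.empty term (\_ -> lam id)
  where
    term = lam (\x -> App x (Free "y"))
-- evaluates to Lam 2 (App (Bound 2) (Lam 1 (Bound 1)))
```

The outer binder is renumbered from 1 to 2 because the substituted term brought a binder of its own, so the ordering invariant on identifiers is restored in the same pass that performs the substitution.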
With that now we can upgrade the pearl's encoding to allow for an actual Monad in the same sense as bound.
instance Applicative Term where
  pure = Free
  (<*>) = ap

instance Monad Term where
  return = Free
  (>>=) = rebind IntMap.empty
To show that we can work with this syntax tree representation, let's write an evaluator from it to weak head normal form:
First we'll need some helpers:
apply :: Term a -> [Term a] -> Term a
apply = foldl App

rwhnf :: IntMap (Term a) -> [Term a] -> Term a -> Term a
rwhnf env stk     (App f x)   = rwhnf env (rebind env x Free:stk) f
rwhnf env (x:stk) (Lam b _ e) = rwhnf (instantiate b x env) stk e
rwhnf env stk     (Fst e)     = case rwhnf env [] e of
  Pair _ e' _ -> rwhnf env stk e'
  e'          -> Fst e'
rwhnf env stk     (Snd e)     = case rwhnf env [] e of
  Pair _ _ e' -> rwhnf env stk e'
  e'          -> Snd e'
rwhnf env stk     e           = apply (rebind env e Free) stk
Then we can start off the whnf by calling our helper with an initial starting environment:
whnf :: Term a -> Term a whnf = rwhnf IntMap.empty []
So what have we given up? Well, bound automatically lets you compare terms for alpha equivalence by quotienting out the placement of "F" terms in the syntax tree. Here we have a problem in that the identifiers we get assigned aren't necessarily canonical.
But we can get the same identifiers out by just using the monad above:
alphaEq :: Eq a => Term a -> Term a -> Bool
alphaEq = (==) `on` liftM id
It makes me a bit uncomfortable that our monad is only up to alpha equivalence and that liftM swaps out the identifiers used throughout the entire syntax tree, and we've also lost the ironclad protection against exotic terms.
But overall, this is a much faster version of Axelsson and Claessen's trick and it can be used as a drop-in replacement for something like bound in many cases, and unlike bound, it lets you use HOAS-style syntax for constructing lam, pi and sigma terms.
With pattern synonyms you can prevent the user from doing bad things as well. Once 7.10 ships you'd be able to use a bidirectional pattern synonym for Pi, Sigma and Lam to hide the real constructors behind. I'm not yet sure of the "best practices" in this area.
Here's the code all in one place:
Happy Holidays,
-Edward
You’ve recently entered the world of strongly typed functional programming, and you’ve decided it is great. You’ve written a program or two or a library or two, and you’re getting the hang of it. You hop on IRC and hear new words and ideas every day. There are always new concepts to learn, new libraries to explore, new ways to refactor your code, new typeclasses to make instances of.
Now, you’re a social person, and you want to go forth and share all the great things you’ve learned. And you have learned enough to distinguish some true statements from some false statements, and you want to go and slay all the false statements in the world.
Is this really what you want to do? Do you want to help people, do you want to teach people new wonderful things? Do you want to share the things that excite you? Or do you want to feel better about yourself, confirm that you are programming better, confirm that you are smarter and know more, reassure yourself that your adherence to a niche language is ok by striking out against the mainstream? Of course, you want to do the former. But a part of you probably secretly wants to do the latter, because in my experience that part is in all of us. It is our ego, and it drives us to great things, but it also can hold us back, make us act like jerks, and, worst of all, stand in the way of communicating with others about what we truly care about.
Haskell wasn’t built on great ideas, although it has those. It was built on a culture of how ideas are treated. It was not built on slaying others’ dragons, but on finding our own way; not tearing down rotten ideas (no matter how rotten) but showing by example how we didn’t need those ideas after all.
In functional programming, our proofs are not by contradiction, but by construction. If you want to teach functional programming, or preach functional programming, or just to even have productive discussions as we all build libraries and projects together, it will serve you well to learn that ethic.
You know better than the next developer, or so you think. This is because of something you have learned. So how do you help them want to learn it too? You do not tell them this is a language for smart people. You do not tell them you are smart because you use this language. You tell them that types are for fallible people, like we all are. They help us reason and catch our mistakes, because while software has grown more complex, we’re still stuck with the same old brains. If they tell you they don’t need types to catch errors, tell them that they must be much smarter than you, because you sure do. But even more, tell them that all the brainpower they use to not need types could turn into even greater, bigger, and more creative ideas if they let the compiler help them.
This is not a language for clever people, although there are clever things that can be done in this language. It is a language for simple things and clever things alike, and sometimes we want to be simple, and sometimes we want to be clever. But we don’t give bonus points for being clever. Sometimes, it’s just fun, like solving a crossword puzzle or playing a tricky Bach prelude, or learning a tango. We want to keep simple things simple so that tricky things are possible.
It is not a language that is “more mathematical” or “for math” or “about math”. Yes, in a deep formal sense, programming is math. But when someone objects to this, this is not because they are a dumb person, a bad person, or a malicious person. They object because they have had a bad notion of math foisted on them. “Math” is the thing that people wield over them to tell them they are not good enough, that they cannot learn things, that they don’t have the mindset for it. That’s a dirty lie. Math is not calculation — that’s what computers are for. Nor is math just abstract symbols. Nor is math a prerequisite for Haskell. If anything, Haskell might be what makes somebody find math interesting at all. Our equation should not be that math is hard, and so programming is hard. Rather, it should be that programming can be fun, and this means that math can be fun too. Some may object that programming is not only math, because it is engineering as well, and creativity, and practical tradeoffs. But, surprisingly, these are also elements of the practice of math, if not the textbooks we are given.
I have known great Haskell programmers, and even great computer scientists who know only a little linear algebra maybe, or never bothered to pick up category theory. You don’t need that stuff to be a great Haskell programmer. It might be one way. The only thing you need category theory for is to take great categorical and mathematical concepts from the world and import them back to programming, and translate them along the way so that others don’t need to make the same journey you did. And you don’t even need to do that, if you have patience, because somebody else will come along and do it for you, eventually.
The most important thing, though not hardest, about teaching and spreading knowledge is to emphasize that this is for everyone. Nobody is too young, too inexperienced, too old, too set in their ways, too excitable, insufficiently mathematical, etc. Believe in everyone, attack nobody, even the trolliest.* Attacking somebody builds a culture of sniping and argumentativeness. It spreads to the second trolliest, and so forth, and then eventually to an innocent bystander who just says the wrong thing to spark bad memories of the last big argument.
The hardest thing, and the second most important, is to put aside your pride. If you want to teach people, you have to empathize with how they think, and also with how they feel. If your primary goal is to spread knowledge, then you must be relentlessly self-critical of anything you do or say that gets in the way of that. And you don’t get to judge that — others do. And you must just believe them. I told you this was hard. So if somebody finds you offputting, that’s your fault. If you say something and somebody is hurt or takes offense, it is not their fault for being upset, or feeling bad. This is not about what is abstractly hurtful in a cosmic sense; it is about the fact that you have failed, concretely, to communicate as you desired. So accept the criticism, apologize for giving offense (not just for having upset someone but also for what you did to hurt them), and attempt to learn why they feel how they feel, for next time.
Note that if you have made somebody feel crummy, they may not be in a mood to explain why or how, because their opinion of you has already plummeted. So don’t declare that they must or should explain themselves to you, although you may politely ask. Remember that knowledge does not stand above human behavior. Often, you don't need to know exactly why a person feels the way they do, only that they do, so you can respect that. If you find yourself demanding explanations, ask yourself, if you knew this thing, would that change your behavior? How? If not, then learn to let it go.
Remember also that they were put off by your actions, not by your existence. It is easy to miss this distinction and react defensively. "Fight-or-flight" stands in the way of clear thinking and your ability to empathize; try taking a breath and maybe a walk until the adrenaline isn't derailing your true intentions.
Will this leave you satisfied? That depends. If your goal is to understand everything and have everybody agree with regards to everything that is in some sense objectively true, it will not. If your goal is to have the widest, nicest, most diverse, and most fun Haskell community possible, and to interact in an atmosphere of mutual respect and consideration, then it is the only thing that will leave you satisfied.
If you make even the most modest (to your mind) mistake, be it in social interaction or technical detail, be quick to apologize and retract, and do so freely. What is there to lose? Only your pride. Who keeps track? Only you. What is there to gain? Integrity, and ultimately that integrity will feel far more fulfilling than the cheap passing thrills of cutting somebody else down or deflecting their concerns.
Sometimes it may be, for whatever reason, that somebody doesn’t want to talk to you, because at some point your conversation turned into an argument. Maybe they did it, maybe you did it, maybe you did it together. It doesn’t matter, learn to walk away. Learn from the experience how to communicate better, how to avoid that pattern, how to always be the more positive, more friendly, more forward-looking. Take satisfaction in the effort in that. Don’t talk about them behind their back, because that will only fuel your own bad impulses. Instead, think about how you can change.
Your self-esteem doesn’t need your help. You may feel you need to prove yourself, but you don't. Other people, in general, have better things to do with their time than judge you, even when you may sometimes feel otherwise. You know you’re talented, that you have learned things, and built things, and that this will be recognized in time. Nobody else wants to hear it from you, and the more they hear it, the less they will believe it, and the more it will distract from what you really want, which is not to feed your ego, not to be great, but to accomplish something great, or even just to find others to share something great with. In fact, if anyone's self-esteem should be cared for, it is that of the people you are talking to. The more confident they are in their capacity and their worth, the more willing they will be to learn new things, and to acknowledge that their knowledge, like all of ours, is limited and partial. You must believe in yourself to be willing to learn new things, and if you want to cultivate more learners, you must cultivate that self-belief in others.
Knowledge is not imposing. Knowledge is fun. Anyone, given time and inclination, can acquire it. Don’t only lecture, but continue to learn, because there is always much more than you know. (And if there wasn’t, wow, that would be depressing, because what would there be to learn next?) Learn to value all opinions, because they all come from experiences, and all those experiences have something to teach us. Dynamic typing advocates have brought us great leaps in JIT techniques. If you’re interested in certain numerical optimizations, you need to turn to work pioneered in C++ or Fortran. Like you, I would rather write in Haskell. But it is not just the tools that matter but the ideas, and you will find they come from everywhere.
In fact, we have so much to learn that we direct our learning by setting up barriers — declaring certain tools, fields, languages, or communities not worth our time. This isn’t because they have nothing to offer, but it is a crutch for us to shortcut evaluating too many options all at once. It is fine, and in fact necessary, to narrow the scope of your knowledge to increase its depth. But be glad that others are charting other paths! Who knows what they will bring back from those explorations.
If somebody is chatting about programming on the internet, they’re already ahead of the pack, already interested in craft and knowledge. You may not share their opinions, but you have things to learn from one another, always. Maybe the time and place aren’t right to share ideas and go over disputes. That’s ok. There will be another time and place, or maybe there won’t be. There is a big internet full of people, and you don’t need to be everybody’s friend or everybody’s mentor. You should just avoid being anybody’s enemy, because your time and theirs is too precious to waste it on hard feelings instead of learning new cool stuff.
This advice is not a one-time proposition. Every time we learn something new and want to share it, we face these issues all over again -- the desire to proclaim, to overturn received wisdom all at once -- and the worse the received wisdom, the more vehemently we want to strike out. But if we are generous listeners and attentive teachers, we not only teach better and spread more knowledge, but also learn more, and enjoy ourselves more in the process. To paraphrase Rilke’s “Letters to a Young Poet”: Knowledge is good if it has sprung from necessity. In this nature of its origin lies the judgement of it: there is no other.
Thanks to the various folks in and around the Haskell world who have helped me refine this article. I don't name you only because I don't want to imply your endorsement, or give what is still, at base, a very personal take, any particular sort of imprimatur of a broader group of people, all of whom I suspect will disagree among themselves and with me about various specifics.
*: It has been pointed out to me that this advice is not universal. Clearly there are some things that deserve more pointed responses. Bigotry, outright harassment and poisonous behavior, etc. So please read this paragraph only as it applies to talking about technical issues, not as regards to many other things, where there are people better equipped than me to give advice.
The annual CUFP workshop is a place where people can see how others are using functional programming to solve real world problems; where practitioners meet and collaborate; where language designers and users can share ideas about the future of their favorite language; and where one can learn practical techniques and approaches for putting functional programming to work.
If you have experience using functional languages in a practical setting, we invite you to submit a proposal to give a talk at the workshop. We're looking for two kinds of talks:
Experience reports are typically 25 minutes long, and aim to inform participants about how functional programming plays out in real-world applications, focusing especially on lessons learned and insights gained. Experience reports don't need to be highly technical; reflections on the commercial, management, or software engineering aspects are, if anything, more important.
Technical talks are also 25 minutes long, and should focus on teaching the audience something about a particular technique or methodology, from the point of view of someone who has seen it play out in practice. These talks could cover anything from techniques for building functional concurrent applications, to managing dynamic reconfigurations, to design recipes for using types effectively in large-scale applications. While these talks will often be based on a particular language, they should be accessible to a broad range of programmers.
We strongly encourage submissions from people in communities that are underrepresented in functional programming, including but not limited to women; people of color; people in gender, sexual and romantic minorities; people with disabilities; people residing in Asia, Africa, or Latin America; and people who have never presented at a conference before. We recognize that inclusion is an important part of our mission to promote functional programming. So that CUFP can be a safe environment in which participants openly exchange ideas, we abide by the SIGPLAN Conference Anti-Harassment Policy.
If you are interested in offering a talk, or nominating someone to do so, please submit your presentation before 27 June 2014 via the CUFP 2014 Presentation Submission Form.
You do not need to submit a paper, just a short proposal for your talk! There will be a short scribe's report of the presentations and discussions but not of the details of individual talks, as the meeting is intended to be more a discussion forum than a technical interchange.
Nevertheless, presentations will be video taped and presenters will be expected to sign an ACM copyright release form.
Note that we will need all presenters to register for the CUFP workshop and travel to Gothenburg at their own expense.
For more information on CUFP, including videos of presentations from previous years, take a look at the CUFP website at cufp.org. Note that presenters, like other attendees, will need to register for the event. Acceptance and rejection letters will be sent out by July 16th.
Focus on the interesting bits: Think about what will distinguish your talk, and what will engage the audience, and focus there. There are a number of places to look for those interesting bits.
Let's do some counting exercises. Product Identity Identity holds exactly two things. It is therefore isomorphic to ((->) Bool), or if we prefer, ((->) (Either () ())). That is to say that a pair that holds two values of type a is the same as a function that takes a two-valued type and yields a value of type a. A product of more functors in turn is isomorphic to the reader of the sum of each of the datatypes that "represent" them. E.g. Product (Product Identity Identity) (Product (Const ()) Identity) is iso to ((->) (Either (Either () ()) ())), i.e. the reader of a data type with three possible inhabitants. In making this move we took Product to Either -- multiplication to sum. We can pull a similar trick with Compose. Compose (Product Identity Identity) (Product Identity Identity) goes to ((->) (Either () (),Either () ())). So again we took Product to a sum type, but now we took Compose to a pair -- a product type! The intuition is that composition multiplies the possibilities of spaces in each nested functor.
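The first of these isomorphisms can be written down directly. This is a small sketch using Product and Identity from base; the type synonym Two and the names toFn and fromFn are made up for the illustration.

```haskell
import Data.Functor.Identity (Identity(..))
import Data.Functor.Product (Product(..))

-- A functor holding exactly two values.
type Two = Product Identity Identity

-- Index into the pair with a Bool: False picks the left slot, True the right.
toFn :: Two a -> (Bool -> a)
toFn (Pair (Identity x) (Identity y)) b = if b then y else x

-- Tabulate a function on Bool into a pair.
fromFn :: (Bool -> a) -> Two a
fromFn f = Pair (Identity (f False)) (Identity (f True))
```

The two functions are mutually inverse: fromFn . toFn rebuilds the same pair, and toFn . fromFn agrees with the original function on both inputs.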
Hmm.. products go to sums, composition goes to multiplication, etc. This should remind us of something -- these rules are exactly the rules for working with exponentials. x^n * x^m = x^(n + m). (x^n)^m = x^(n*m). x^0 = 1, x^1 = x.
Seen from the right standpoint, this isn't surprising at all, but almost inevitable. The functors we're describing are known as "representable," a term which derives from category theory. (See appendix on representable functors below).
In Haskell-land, a "representable functor" is just any functor isomorphic to the reader functor ((->) a) for some appropriate a. Now if we think back to our algebraic representations of data types, we call the arrow type constructor an exponential. We can "count" a -> x as x^a, since e.g. there are 3^2 distinct functions that inhabit the type 2 -> 3. The intuition for this is that for each input we pick one of the possible results, so as the number of inputs goes up by one, the number of functions goes up by multiplying through by the set of possible results. 1 -> 3 = 3, 2 -> 3 = 3 * 3, (n + 1) -> 3 = 3 * (n -> 3).
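The count for 2 -> 3 can be checked by brute force. In this sketch Bool stands in for the two-element type and Three is a three-element type made up for the purpose; a function is identified with its table of results.

```haskell
import Data.List (nub)

-- A type with exactly three inhabitants.
data Three = A | B | C
  deriving (Eq, Show, Enum, Bounded)

-- Every function from Bool to Three: choose a result for False and one for True.
allFns :: [Bool -> Three]
allFns = [ \b -> if b then t else f | f <- [minBound ..], t <- [minBound ..] ]

-- Functions cannot be compared directly, so compare their tables instead.
table :: (Bool -> Three) -> (Three, Three)
table g = (g False, g True)

-- Number of distinct functions: 3 * 3 = 3^2 = 9.
countFns :: Int
countFns = length (nub (map table allFns))
```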
Hence, if we "represent" our functors by exponentials, then we can work with them directly as exponentials as well, with all the usual rules. Edward Kmett has a library encoding representable functors in Haskell.
Meanwhile, Peter Hancock prefers to call such functors "Naperian" after John Napier, inventor of the logarithm (See also here). Why Naperian? Because if our functors are isomorphic to exponentials, then we can take their logs! And that brings us back to the initial discussion of type mathematics. We have some functor F, and claim that it is isomorphic to -^R for some concrete data type R. Well, this means that R is the logarithm of F. E.g. (R -> a, S -> a) =~ Either R S -> a, which is to say that if log F = R and log G = S, then log (F * G) = log F + log G. Similarly, for any other data type n, again with log F = R, we have n -> F a =~ n -> R -> a =~ (n * R) -> a, which is to say that log (F^n) = n * log F.
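Both log laws are witnessed by familiar Prelude functions. The sketch below only gives them names (splitSum, joinSum, powerTo, and powerFrom are invented for the illustration).

```haskell
-- log (F * G) = log F + log G:
-- a pair of readers is a reader of the sum of their arguments.
joinSum :: (r -> a, s -> a) -> (Either r s -> a)
joinSum (f, g) = either f g

splitSum :: (Either r s -> a) -> (r -> a, s -> a)
splitSum h = (h . Left, h . Right)

-- log (F^n) = n * log F:
-- a reader of readers is a reader of the product of their arguments.
powerTo :: (n -> r -> a) -> ((n, r) -> a)
powerTo = uncurry

powerFrom :: ((n, r) -> a) -> (n -> r -> a)
powerFrom = curry
```

Each pair of functions is an isomorphism, so the type-level equations hold up to =~ rather than on the nose.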
This gives us one intuition for why the sum functor is not generally representable -- it is very difficult to decompose log (F + G) into some simpler compound expression of logs.
So what functors are Representable? Anything that can be seen as a fixed shape with some index. Pairs, fixed-size vectors, fixed-size matrices, any nesting of fixed vectors and matrices. But also infinite structures of regular shape! However, not things whose shape can vary -- not lists, not sums. Trees of fixed depth or infinite binary trees therefore, but not trees of arbitrary depth or with ragged structure, etc.
Representable functors turn out to be extremely powerful tools. Once we know a functor is representable, we know exactly what its applicative instance must be, and that its applicative instance will be "zippy" -- i.e. acting pointwise across the structure. We also know that it has a monad instance! And, unfortunately, that this monad instance is typically fairly useless (in that it is also "zippy" -- i.e. the monad instance on a pair just acts on the two elements pointwise, without ever allowing anything in the first slot to affect anything in the second slot, etc.). But we know more than that. We know that a representable functor, by virtue of being a reader in disguise, cannot have effects that migrate outwards. So any two actions in a representable functor are commutative. And more than that, they are entirely independent.
This means that all representable functors are "distributive"! Given any functor f, and any data type r, then we have
distributeReader :: Functor f => f (r -> a) -> (r -> f a)
distributeReader fra = \r -> fmap ($r) fra
That is to say, given an arrow "inside" a functor, we can always pull the arrow out, and "distribute" application across the contents of the functor. A list of functions from Int -> Int becomes a single function from Int to a list of Int, etc. More generally, since all representable functors are isomorphic to reader, given g representable, and f any functor, then we have: distribute :: (Functor f, Representable g) => f (g a) -> g (f a).
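Here is a hedged sketch of how distribute falls out of the isomorphism with reader. The Representable class below is a simplified, local stand-in written for this illustration (a tabulate/index pair with an associated index type); it is not meant to reproduce the API of any particular library, and the Pair type is likewise made up.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A functor g is representable when it is isomorphic to ((->) (Rep g)).
class Functor g => Representable g where
  type Rep g
  tabulate :: (Rep g -> a) -> g a
  index    :: g a -> Rep g -> a

-- A fixed shape with two slots, indexed by Bool.
data Pair a = Pair a a
  deriving (Eq, Show)

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

instance Representable Pair where
  type Rep Pair = Bool
  tabulate f = Pair (f False) (f True)
  index (Pair x _) False = x
  index (Pair _ y) True  = y

-- View each inner structure as a function, pull the argument outside
-- (as distributeReader does), then tabulate the result.
distribute :: (Functor f, Representable g) => f (g a) -> g (f a)
distribute fga = tabulate (\r -> fmap (\ga -> index ga r) fga)
```

For example, distributing a list of pairs collects all the first components into one list and all the second components into another.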
This is pretty powerful sauce! And if f and g are both representable, then we get the transposition isomorphism, witnessed by flip! That's just the beginning of the good stuff. If we take functions and "unrepresent" them back to functors (i.e. take their logs), then we can do things like move from ((->) Bool) to pairs, etc. Since we're in a pervasively lazy language, we've just created a library for memoization! This is because we've gone from a function to a data structure we can index into, representing each possible argument to this function as a "slot" in the structure. And the laziness pays off because we only need to evaluate the contents of each slot on demand (otherwise we'd have a precomputed lookup table rather than a dynamically-evaluated memo table).
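A minimal sketch of memoization by this route, assuming the argument type is the non-negative Ints and using an infinite lazy list as a stand-in for a stream (an infinite structure of regular shape). The names memoNat and fib are chosen for the example; list indexing is linear, so this shows the idea rather than an efficient memo table.

```haskell
-- Turn a function on non-negative Ints into a lazily filled table,
-- then turn the table back into a function by indexing.
-- Negative arguments are outside the assumed domain and will error.
memoNat :: (Int -> a) -> (Int -> a)
memoNat f = (table !!)
  where
    table = map f [0 ..]

-- Fibonacci with its recursive calls routed through the shared table.
fib :: Int -> Integer
fib = memoNat go
  where
    go 0 = 0
    go 1 = 1
    go n = fib (n - 1) + fib (n - 2)
```

Each slot of the table is a thunk that is evaluated at most once, on first demand, so fib 50 completes quickly where the naive recursion would take exponentially many steps.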
And now suppose we take our representable functor in the form s -> a and pair it with an "index" into that function, in the form of a concrete s. Then we'd be able to step that s forward or backwards and navigate around our structure of a values. And this is precisely the Store Comonad! And this in turn gives a characterization of the lens laws.
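A small self-contained sketch of that pairing follows. The definitions are written out locally as plain functions rather than taken from a comonad library, and neighbourSum and squares are made-up examples.

```haskell
-- A function s -> a together with a current position s.
data Store s a = Store (s -> a) s

instance Functor (Store s) where
  fmap g (Store f s) = Store (g . f) s

-- Read the value at the current position.
extract :: Store s a -> a
extract (Store f s) = f s

-- At every position, the store refocused on that position.
duplicate :: Store s a -> Store s (Store s a)
duplicate (Store f s) = Store (Store f) s

-- Apply a context-dependent computation at every position.
extend :: (Store s a -> b) -> Store s a -> Store s b
extend k = fmap k . duplicate

-- Read the value at some other position.
peek :: s -> Store s a -> a
peek s (Store f _) = f s

-- Step the position forward or backwards.
seeks :: (s -> s) -> Store s a -> Store s a
seeks g (Store f s) = Store f (g s)

-- Sum of the focused cell and its two neighbours.
neighbourSum :: Store Int Int -> Int
neighbourSum w = sum [ extract (seeks (+ d) w) | d <- [-1, 0, 1] ]

-- The squares, focused at position 0.
squares :: Store Int Int
squares = Store (\i -> i * i) 0
```

With these definitions, extract (extend neighbourSum squares) adds the squares of -1, 0 and 1, and peek 2 on the same store adds the squares of 1, 2 and 3.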
What this all gives us a tiny taste of, in fact, is the tremendous power of the Yoneda lemma, which, in Haskell, is all about going between values and functions, and in fact captures the important universality and uniqueness properties that make working with representable functors tractable. A further tiny taste of Yoneda comes from a nice blog post by Conal Elliott on memoization.
Extra Credit on Sum Functors
There in fact is a log identity on sums. It goes like this:
log(a + c) = log a + log (1 + c/a)
Do you have a useful computational interpretation of this? I've got the inklings of one, but not much else.
Appendix: Notes on Representable Functors in Hask.
The way to think about this is to take some arbitrary category C, and some category that's basically Set (in our case, Hask; in fact, in our case C is Hask too, and we're just talking about endofunctors on Hask). Now, we take some functor F : C -> Set, and some A which is an object of C. The set of morphisms originating at A (denoted by Hom(A,-)) constitutes a functor called the "hom functor." For any object X in C, we can "plug it in" to Hom(A,-), to then get the set of all arrows from A to X. And for any morphism X -> Y in C, we can derive a morphism from Hom(A,X) to Hom(A,Y), by composition. This is equivalent to, in Haskell-land, using a function f :: x -> y to send g :: a -> x to a -> y by writing "functionAToY = f . g".
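Spelled out as code, the hom functor is just a wrapper around functions out of a fixed type, and its fmap is post-composition. The newtype Hom and the helper runHom below are names I've made up for this sketch.

```haskell
-- Hom a x plays the role of Hom(A,X): the arrows from a fixed a into x.
newtype Hom a x = Hom (a -> x)

-- Unwrap a Hom back into an ordinary function.
runHom :: Hom a x -> a -> x
runHom (Hom g) = g

-- A morphism f :: x -> y sends g :: a -> x to f . g :: a -> y.
instance Functor (Hom a) where
  fmap f (Hom g) = Hom (f . g)
```

This is the same thing as the Functor instance that already exists for plain functions, just with the wrapper made visible.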
So, for any A in C, we have a hom functor on C, which is C -> Set, where the elements of the resultant Set are morphisms in C. Now, we have this other arbitrary functor F, which is also C -> Set. Now, if there is an isomorphism of functors between F and Hom(A,-), then we say F is "representable". A representable functor is thus one that can be worked with entirely as an appropriate hom-functor.
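As a tiny worked instance of such an isomorphism, here is a sketch assuming a hypothetical Pair functor, which is represented by Bool: a pair of a's is the same information as a function Bool -> a. The names toHom and fromHom are mine.

```haskell
-- A functor with exactly two slots.
data Pair a = Pair a a
  deriving (Eq, Show)

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

-- One direction of the isomorphism: look up a slot by its Bool index.
toHom :: Pair a -> (Bool -> a)
toHom (Pair x y) b = if b then y else x

-- The other direction: fill each slot by asking the function.
fromHom :: (Bool -> a) -> Pair a
fromHom h = Pair (h False) (h True)
```

The two directions are mutually inverse, and they commute with fmap (mapping over the Pair agrees with post-composing on the function side), which is what makes this an isomorphism of functors and not just of sets.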
I was incredibly honored and I figured that if that many people (they had 30 or so registered attendees and 10 presentations) were going to spend that much time going over software that I had written, I should at least offer to show up!
I'd like to apologize for any errors in the romanization of people's names or misunderstandings I may have in the following text. My grasp of Japanese is very poor! Please feel free to send me corrections or additions!
Sadly, my boss's immediate reaction to hearing that there was a workshop in Japan about my work was to quip that "You're saying you're huge in Japan?" With him conspicuously not offering to fly me out here, I had to settle for surprising the organizers and attending via Google Hangout.
@nushio was very helpful in getting me connected, and while the speakers gave their talks I sat on the irc.freenode.net #haskell-lens channel and Google Hangout and answered questions and provided a running commentary with more details and references. Per freenode policy the fact that we were logging the channel was announced -- well, at least before things got too far underway.
Here is the IRC session log as a gist. IKEGAMI Daisuke @ikegami__ (ikeg in the IRC log) tried to keep up a high-level running commentary in the log about what was happening in the video, which may be helpful if you are trying to follow along through each talk retroactively.
Other background chatter and material is strewn across twitter under the #ekmett_conf hash tag and on a Japanese twitter aggregator named togetter.
The 1PM start time in Shibuya, Tokyo, Japan translates to midnight at the start of Easter here in Boston, which meant ~6 hours later when we reached the Q&A session, I was a bit loopy from lack of sleep, but they were incredibly polite and didn't seem to mind my long rambling responses.
Thanks to the organizers, we have video of the vast majority of the event! There was no audio for the first couple of minutes, and the recording machine lost power for the last talk and the Q&A session at the end as we ran somewhat longer than they had originally scheduled! -- And since I was attending remotely and a number of others flitted in and out over the course of the night, they were nice enough to put most of the slides and background material online.
Liyang Hu (@liyanghu) started the session off with a nicely self-contained crash course on my profunctors package, since profunctors are used fairly heavily inside the implementation of lens and machines, with a couple of detours into contravariant and bifunctors.
His presentation materials are available interactively from the new FP Complete School of Haskell. You can also watch the video recording of his talk on ustream.
This talk was followed by a much more condensed version of very similar content in Japanese by Hibino Kei (@khibino). His talk was more focused on the relationship between arrows and profunctors, and the slides are available through slideshare.
Once the necessary background material was out of the way, the talk on lens -- arguably the presentation that most of the people were there for -- came early.
@its_out_of_tune gave an incredibly dense overview of how to use the main parts of the lens package in Japanese. His slides are available online and here is a recording of his talk.
Over the course of a half hour, he was able to cram in a great cross-section of the library, including material that I hadn't been able to get to during my New York talk even with 4x the amount of time available: how to use the lens template-haskell code to automatically generate lenses for user data types, and how to use the lens Action machinery.
Next up was my free package and the neat free-game engine that Kinoshita Fumiaki (@fumieval) built on top.
The slides were in English, though the talk and humor were very Japanese. ^_^
That said, he had some amazingly nice demos, including a live demo of his tetris clone, Monaris, which is visible about 10 minutes into the video!
@nebutalab, like me, joined the session remotely through Google Hangout, and proceeded to give a tutorial on how forward mode automatic differentiation works through my AD package.
His slides were made available before the talk and the video is available in two parts due to a technical hiccup in the middle of the recording.
I'm currently working to drastically simplify the API for ad with Alex Lang. Fortunately almost all of the material in this presentation will still be relevant to the new design.
Next up, Murayama Shohei (@yuga) gave an introduction to tables, which is a small in-memory data store that I wrote a few months back to sit on top of lens.
Video of @yuga's talk and his slides are available, which I think makes this the first public talk about this project. -_^
Yoshida Sanshiro (@halcat0x15a) gave a nice overview of the currently released version of machines including a lot of examples! I think he may have actually written more code using machines just for demonstrations than I have written using it myself.
Video of his talk is available along with his slide deck -- just tap left or right to move through the slides. He has also written a blog post documenting his early explorations of the library, and some thoughts about using it with attoparsec.
I've recently been trying to redesign machines with coworker Paul CHIUSANO @pchiusano and we've begun greatly simplifying the design of machines based on some work he has been doing in Scala, so unfortunately many of the particulars of this talk will be soon outdated, but the overall 'feel' of working with machines should be preserved across the change-over. Some of these changes can be seen in the master branch on github now.
There were 4 more sessions, but alas, I'm out of time for the moment! I'll continue this write-up with more links to the source material and my thoughts as soon as I can tomorrow!