Mon 11 Jul 2011
Free Modules and Functional Linear Functionals
Posted by Edward Kmett under Algorithms, Category Theory, Data Structures, Haskell, Linear Algebra, Monads, Monoids, Type Hackery
Today I hope to start a new series of posts exploring constructive abstract algebra in Haskell.
In particular, I want to talk about a novel encoding of linear functionals, polynomials and linear maps in Haskell, but first we're going to have to build up some common terminology.
Having obtained the blessing of Wolfgang Jeltsch, I replaced the algebra package on hackage with something... bigger, although still very much a work in progress.
(Infinite) Modules over Semirings
Recall that a vector space V over a field F is given by an additive Abelian group on V, and a scalar multiplication operator
(.*) :: F -> V -> V
subject to distributivity laws
s .* (u + v) = s .* u + s .* v
(s + t) .* v = s .* v + t .* v
and associativity laws
(s * t) .* v = s .* (t .* v)
and respect for the unit of the field.
1 .* v = v
Since multiplication on a field is commutative, we can also add
(*.) :: V -> F -> V
v *. f = f .* v
with analogous rules.
But when F is only a Ring, we call the analogous structure a module. In a ring we can't rely on the commutativity of multiplication, so we may have to deal with left-modules and right-modules, where only one of those products is available.
We can weaken the structure still further. If we lose the negation in our Ring, we get a Rig (often called a Semiring), and now our module is an additive monoid.
If we get rid of the additive and multiplicative unit on our Rig we get down to what some authors call a Ringoid, but which we'll call a Semiring here, because it makes the connection between semiring and semigroup clearer, and the -oid suffix is dangerously overloaded due to category theory.
First we'll define additive semigroups, because I'm going to need both additive and multiplicative monoids over the same types, and Data.Monoid has simultaneously too much and too little structure.
-- (a + b) + c = a + (b + c)
class Additive m where
  (+) :: m -> m -> m
  replicate1p :: Whole n => n -> m -> m -- (ignore this for now)
  -- ...
their Abelian cousins
-- a + b = b + a
class Additive m => Abelian m
and Multiplicative semigroups
-- (a * b) * c = a * (b * c)
class Multiplicative m where
  (*) :: m -> m -> m
  pow1p :: Whole n => m -> n -> m
  -- ...
Then we can define semirings
-- a*(b + c) = a*b + a*c
-- (a + b)*c = a*c + b*c
class (Additive m, Abelian m, Multiplicative m) => Semiring m
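To make this concrete, Bool forms a semiring with (||) as addition and (&&) as multiplication. A minimal sketch of what the instances could look like (assuming the class methods above replace the Prelude operators):

instance Additive Bool where
  a + b = a || b
  replicate1p _ b = b -- (||) is idempotent, so n-fold repetition is a no-op
instance Abelian Bool
instance Multiplicative Bool where
  a * b = a && b
  pow1p b _ = b -- (&&) is idempotent as well
instance Semiring Bool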
With that we can define modules over a semiring:
-- r .* (x + y) = r .* x + r .* y
-- (r + s) .* x = r .* x + s .* x
-- (r * s) .* x = r .* (s .* x)
class (Semiring r, Additive m) => LeftModule r m where
  (.*) :: r -> m -> m
and analogously:
class (Semiring r, Additive m) => RightModule r m where
  (*.) :: m -> r -> m
For instance, every additive semigroup forms a semiring module over the positive natural numbers (1, 2, ...) using replicate1p, as sketched below.
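Here is a stand-alone sketch of that n-fold repetition, computed by doubling, written against the ordinary Semigroup class with (<>) standing in for (+); rep1p is a name made up for this example:

import Data.Semigroup (Semigroup(..))

-- n-fold repetition in O(log n) additions; assumes n >= 1,
-- mirroring what replicate1p provides for Additive
rep1p :: Semigroup m => Integer -> m -> m
rep1p n m
  | n <= 1    = m
  | even n    = rep1p (n `div` 2) (m <> m)
  | otherwise = m <> rep1p (n `div` 2) (m <> m)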
If we know that our addition forms a monoid, then we can form a module over the naturals as well
-- | zero + a = a = a + zero
class (LeftModule Natural m, RightModule Natural m) => AdditiveMonoid m where
  zero :: m
  replicate :: Whole n => n -> m -> m
and if our addition forms a group, then we can form a module over the integers
-- | a + negate a = zero = negate a + a
class (LeftModule Integer m, RightModule Integer m) => AdditiveGroup m where
  negate :: m -> m
  times :: Integral n => n -> m -> m
  -- ...
Free Modules over Semirings
A free module on a set E is a module where the basis vectors are elements of E. Basically it is |E| copies of some (semi)ring.
In Haskell we can represent the free module of a ring directly by defining the action of the (semi)group pointwise.
instance Additive m => Additive (e -> m) where
  f + g = \x -> f x + g x
instance Abelian m => Abelian (e -> m)
instance AdditiveMonoid m => AdditiveMonoid (e -> m) where
  zero = const zero
instance AdditiveGroup m => AdditiveGroup (e -> m) where
  f - g = \x -> f x - g x
We could define the following
instance Semiring r => LeftModule r (e -> r) where
  r .* f = \x -> r * f x
but then we'd have trouble dealing with the Natural and Integer constraints above, so instead we lift modules
instance LeftModule r m => LeftModule r (e -> m) where
  (.*) m f e = m .* f e
instance RightModule r m => RightModule r (e -> m) where
  (*.) f m e = f e *. m
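Here's a small stand-alone example of vectors-as-functions, using plain Double arithmetic in place of the classes above (all names here are invented for the illustration):

type Basis = Char            -- a three-element basis {'x','y','z'}, say
type V = Basis -> Double     -- the free module on Basis

vadd :: V -> V -> V          -- pointwise addition, as in Additive (e -> m)
vadd f g = \x -> f x + g x

scale :: Double -> V -> V    -- the left module action, lifted pointwise
scale r f = \x -> r * f x

v1, v2 :: V
v1 b = if b == 'x' then 1 else 0
v2 b = if b == 'y' then 2 else 0

-- scale 3 (v1 `vadd` v2) 'y' evaluates to 6.0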
We could go one step further and define multiplication pointwise, but while the direct product of |E| copies of a ring _does_ define a ring, and this ring is the one provided by Conal Elliott's vector-space package, it isn't the most general ring we could construct. But we'll need to take a detour first.
Linear Functionals
A linear functional f on a module M is a linear function from M to its scalars R.
That is to say, f : M -> R such that
f (a .* x + y) = a * f x + f y
Consequently linear functionals also form a module over R. We call this module the dual module M*.
Dan Piponi has blogged about these dual vectors (or covectors) in the context of trace diagrams.
If we limit our discussion to free modules, then M = E -> R, so a linear functional on M looks like (E -> R) -> R
subject to additional linearity constraints on the result arrow.
The main thing we're not allowed to do in our function is apply our function from E -> R to two different E's and then multiply the results together. Our pointwise definitions above satisfy those linearity constraints, but for example:
bad f = f 0 * f 0
does not.
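whereas a functional that only samples coefficients and rescales them stays linear, as in this sketch (with Num standing in for the scalar structure, and good a name invented here):

good :: Num r => (Int -> r) -> r
good f = 2 * f 0 + f 1 -- only scales and adds samples of f: linear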
We could capture this invariant in the type by saying that instead we want
newtype LinearM r e = LinearM { runLinearM :: forall m. LeftModule r m => (e -> m) -> m }
but we'd have to make a new such type every time we subclassed Semiring. I'll leave further exploration of this more exotic type to another time. (Using some technically illegal module instances we can recover more structure than you'd expect.)
Now we can package up the type of covectors/linear functionals:
infixr 0 $*
newtype Linear r a = Linear { ($*) :: (a -> r) -> r }
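As a quick illustration (again with Num standing in for our scalar classes), the covector 2·e1 + 3·e5 can be written directly; example is a name made up here:

example :: Num r => Linear r Int
example = Linear $ \k -> 2 * k 1 + 3 * k 5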
The sufficiently observant may have already noticed that this type is the same as the Cont monad (subject to the linearity restriction on the result arrow).
In fact the Functor, Monad, and Applicative instances for Cont all carry over, and preserve linearity. (We lose callCC, but that is at least partially due to the fact that callCC has a less than ideal type signature.)
In addition we get a number of instances for Alternative and MonadPlus, by exploiting the knowledge that r is ring-like:
instance AdditiveMonoid r => Alternative (Linear r) where
  Linear f <|> Linear g = Linear (f + g)
  empty = Linear zero
Note that the (+) and zero there are the ones defined on functions from our earlier free module construction!
Linear Maps
Since Linear r is a monad, Kleisli (Linear r) forms an Arrow:

b -> ((a -> r) ~> r)

where the ~> denotes the arrow that is constrained to be linear.
If we swap the order of the arguments so that
(a -> r) ~> (b -> r)
this arrow has a very nice meaning! (See Numeric.Map.Linear)
infixr 0 $#
newtype Map r b a = Map { ($#) :: (a -> r) -> (b -> r) }
Map r b a represents the type of linear maps from a to b. Unfortunately, due to contravariance, the arguments wind up in the "wrong" order.
instance Category (Map r) where
  Map f . Map g = Map (g . f)
  id = Map id
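As a quick sketch of a concrete arrow here: any plain function between basis sets induces a Map by precomposition (mapBasis is a name invented for this example; mind the direction conventions discussed above):

-- the Map induced by reindexing along a function of basis elements
mapBasis :: (b -> a) -> Map r b a
mapBasis g = Map $ \k -> k . g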
So we can see that a linear map from a module A with basis a to a vector space with basis b effectively consists of |b| linear functionals on A.
Map r b a provides a lot of structure. It is a valid instance of an insanely large number of classes.
Vectors and Covectors
In physics, we sometimes call linear functionals covectors or covariant vectors, and if we're feeling particularly loquacious, we'll refer to vectors as contravariant vectors.
This has to do with the fact that when you change basis, you map the change over covariant vectors covariantly, and map the change over vectors contravariantly. (This distinction is beautifully captured by Einstein's summation notation.)
We also have a notion of covariance and contravariance in computer science!
Functions vary covariantly in their result, and contravariantly in their argument. So E -> R is contravariant in E. But we chose this representation for our free modules, so the vectors in our free vector space (or module) are contravariant in E.
class Contravariant f where
  contramap :: (a -> b) -> f b -> f a

-- | Dual function arrows.
newtype Op a b = Op { getOp :: b -> a }

instance Contravariant (Op a) where
  contramap f g = Op (getOp g . f)
On the other hand, (E -> R) ~> R varies covariantly with the change of E, as witnessed by the fact that it is a Functor.
instance Functor (Linear r) where
  fmap f m = Linear $ \k -> m $* k . f
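Continuing the hypothetical example covector from earlier, fmap just reindexes the basis:

shifted :: Num r => Linear r Int
shifted = fmap (+1) example
-- shifted $* k evaluates to 2 * k 2 + 3 * k 6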
We have lots of classes for manipulating covariant structures, and most of them apply to both (Linear r) and (Map r b).
Other Representations and Design Trade-offs
One common representation of vectors in a free vector space is as some kind of normalized list of scalars and basis vectors. In particular, David Amos's wonderful HaskellForMaths uses
newtype Vect r a = Vect { runVect :: [(r,a)] }
for free vector spaces, only considering them up to linearity, paying for normalization as it goes.
Given the insight above we can see that Vect isn't a representation of vectors in the free vector space, but instead represents the covectors of that space, quite simply because Vect r a varies covariantly with change of basis!
Now the price of using the Monad on Vect r is that the monad denormalizes the representation. In particular, you can have multiple copies of the same basis vector, so any function that uses Vect r a has to merge them together.
On the other hand with the directly encoded linear functionals we've described here, we've placed no obligations on the consumer of a linear functional. They can feed the directly encoded linear functional any vector they want!
In fact, it'll even be quite a bit more efficient to compute with. To see this, just consider:
instance MultiplicativeMonoid r => Monad (Vect r) where
  return a = Vect [(1,a)]
  Vect as >>= f = Vect [ (p*q, b) | (p,a) <- as, (q,b) <- runVect (f a) ]
Every >>= must pay for multiplication. Every return will multiply the element by one. On the other hand, the price of return and bind in Linear r is function application.
instance Monad (Linear r) where
  return a = Linear $ \k -> k a
  m >>= f = Linear $ \k -> m $* \a -> f a $* k
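As a sketch of the distribution at work, consider a substitution that sends each basis element n to a two-term covector (again with Num standing in for the scalars, continuing the hypothetical example from above):

split :: Num r => Int -> Linear r Int
split n = Linear $ \k -> k (2*n) + k (2*n + 1)

-- (example >>= split) $* k evaluates to 2*(k 2 + k 3) + 3*(k 10 + k 11),
-- probing k at four points with no normalization step in between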
A Digression on Free Linear Functionals
To wax categorical for a moment, we can construct a forgetful functor U : Vect_F -> Set that takes a vector space over F to just its set of covectors, and a free functor F : Set -> Vect_F going the other way,

F E = (E -> F, F, \f g x -> f x + g x, \r f x -> r * f x)
using the pointwise constructions we built earlier.
Then in a classical setting, you can show that F is left adjoint to U.
In particular the witnesses of this adjunction provide the linear map from (E -> F) to V and the function E -> (V ~> F) giving a linear functional on V for each element of E.
In a classical setting you can go a lot farther, and show that all vector spaces (but not all modules) are free.
But in a constructive setting, such as Haskell, we need a fair bit to go back and forth; in particular we wind up needing E to be finitely enumerable to go one way, and for it to have decidable equality to go in the other. The latter is fairly easy to see, because even going from E -> (E -> F) requires that we can define and partially apply something like Kronecker's delta:
delta :: (Rig r, Eq e) => e -> e -> r
delta i j
  | i == j    = one
  | otherwise = zero
The Price of Power
The price we pay is that, given a Rig, we can go from Vect r a to Linear r a, but going back requires a to be finitely enumerable (or for our functional to satisfy other exotic side-conditions).
vectMap :: Rig r => Vect r a -> Linear r a
vectMap (Vect as) = Linear $ \k -> sum [ r * k a | (r, a) <- as ]
You can still probe Linear r a for individual coefficients, or pass it a vector for polynomial evaluation very easily, but, for instance, determining the degree of a polynomial efficiently requires attaching more structure to your semiring, because the only value you can get out of Linear r a is an r.
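Probing for a single coefficient is just an application of delta from above (coefficient is a name invented here):

coefficient :: (Rig r, Eq a) => a -> Linear r a -> r
coefficient e m = m $* delta e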
Optimizing Linear Functionals
In both the Vect r and Linear r cases, excessive use of (>>=) without somehow normalizing or tabulating your data will cause a lot of repeated work.
This is perhaps easiest to see from the fact that Vect r never used the addition of r, so it distributed everything into a kind of disjunctive normal form. Linear r does the same thing.
If you look at the Kleisli arrows of Vect r or Linear r as linear mappings, then you can see that Kleisli composition is going to explode the number of terms.
So how can we collapse back down?
In the Kleisli (Vect r) case we usually build up a map as we walk through the list, then spit the list back out in order, having added up like terms; a sketch follows.
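A minimal sketch of that normalization pass (the names are made up here, and HaskellForMaths' actual code differs in detail):

import qualified Data.Map as M

-- merge like terms by keying on the basis element and summing coefficients
normalize :: (Ord a, Num r) => [(r, a)] -> [(r, a)]
normalize = map swap . M.toList . M.fromListWith (+) . map swap
  where swap (x, y) = (y, x)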
In the Map r case, we can do better. My representable-tries package provides a readily instantiable HasTrie class, and the method:

memo :: HasTrie a => (a -> r) -> a -> r
which is responsible for providing a memoized version of the function from a -> r in a purely functional way. This is obviously a linear map!
memoMap :: HasTrie a => Map r a a
memoMap = Map memo
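So inserting memoMap into a composition tabulates the intermediate covector rather than recomputing it on every probe. A sketch, with fuse a name invented for the example:

import Prelude hiding ((.), id)
import Control.Category

-- the intermediate (b -> r) covector is tabulated once, via the HasTrie
-- instance, before the second map consumes it
fuse :: HasTrie b => Map r b c -> Map r a b -> Map r a c
fuse f g = f . memoMap . g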
We can also flip memo around and memoize linear functionals.
memoLinear :: HasTrie a => a -> Linear r a
memoLinear = Linear . flip memo
Next time, (co)associative (co)algebras and the myriad means of multiplying (co)vectors!
July 11th, 2011 at 9:48 pm
Your Multiplicative class seems to ignore the existence of noncommutative (semi)rings; surely this important area of mathematics shouldn’t be left out?
July 11th, 2011 at 10:19 pm
Without commutative addition, you don’t get many normalization opportunities. So, in the interest of drawing the line somewhere I left that off. ;) I also avoided talking about left-seminearrings and right-seminearrings. =) I do enjoy playing with the individual grains of sand in my sandbox though.
July 12th, 2011 at 3:07 am
You say “If we get rid of the additive and multiplicative unit on our Rig we get down to what some authors call a Ringoid, but which we’ll call a Semiring”, but earlier you “…go to a Rig (often called a Semiring)”. Are both of them Semirings?
July 12th, 2011 at 3:46 am
This is what I get for trying to nod to the fact that different authors use the same words for different things.
The vocabulary I’m working with here is that a semiring is a pair of semigroups with a pair of distributive laws and that a rig adds a 0 and 1 to a semiring.
Some authors call what I am calling a rig above a semiring, which is somewhat annoying because the semi- just means ‘not-quite’ in this setting, and has no connection to the relationship between group and semigroup. Moreover it leaves the question of what to call the pair-of-semigroups construction that I’m calling semiring above. The name those authors often use is “ringoid”, but that conflicts with the much more useful use of ringoid for referring to Ab-enriched categories, which is likely to wind up in my semigroupoid package.
If a group is a groupoid with one object, then a ring is a ringoid with one object.
So in the vocabulary I’m using here, we get:
Semiring <= Rig <= Ring
which parallels
Semigroup <= Monoid <= Group
by adding unit(s) and additive inverses respectively.
I’ll see if I can clear up the verbiage to make it clearer.
July 12th, 2011 at 5:48 am
I suppose that “replicate1p” and “pow1p” both define repetition. IMHO you can name them with “+” and “*”, adding some symbol to denote repetition, maybe “…”? (And with the right order of arguments in “pow1p”. :) )
July 12th, 2011 at 6:40 am
If we limit our discussion to free modules, then M = E -> R
And every element of M should be 0 on all but a finite set of elements of E. This is the definition of free modules.
July 12th, 2011 at 6:49 am
We could capture this invariant in the type by saying that instead we want
Can you please explain what “invariant” you are talking about?
July 12th, 2011 at 8:57 am
Oops, sorry. I meant nonassociative rings. But I guess you do have to draw the line somewhere.
July 12th, 2011 at 9:21 am
@beroal: I didn’t make them symbolic because of the problem of what to name the three different operators for ((N+1)*), (N*) and (Z*)?
whereas the precedent of log1p at least gives some intuition to the name replicate1p, and the action of replicate on lists is the repetition of a monoidal value. I had, however, originally started using (#*) and (*#) for times and may revert. They are mostly provided as reasonable default implementations for the natural number multiplication and integer multiplication requirements placed on you by the module superclasses.
As for the free module, technically I’m permitting the infinite free module. Haskell provides me a largely coinductive universe, I’d be remiss in not taking advantage of it. I’ll throw an (infinite) in there. ;) But unlike the classical case I can’t prove the duality in most cases to begin with and I’m not able to enumerate the set of all e in E such that E -> R is non-zero except when the set E is finitely enumerable itself, because I can’t intensionally inspect the function I’m given. Effectively the constraint moves from requiring the function to have a finite number of non-zeros to requiring that any linear functional can only inspect a finite number of vectors, which being constructive is all it can do anyways.
The invariant in question is linearity. One (potentially conservative) way to enforce that would be to never apply our continuation vector twice and take the product of the results (or add it to some value in the ring), but instead only multiply it by values we have lying around in the ring or add and subtract them. If you think about it, quantifying over the choice of a module then prevents violations of linearity. \k -> r .* f k is well typed, we can plumb in zeroes, etc., but \k -> f k * f k requires our module to be strengthened to a semiring, and the unmentioned but also non-linear \k -> f k + r would also fail to type check, since f k :: m, and under quantification over the module all we’re permitted to do is use the (.*) and the (+) from our module. The reason this is a problem is that if you want to use the zero from a semigroup-with-zero module, you’d need a new type.
There is also a performance impact from the fact that we effectively use the LeftModule dictionary as an interpreter, and most damningly it costs us the ability to use representable-tries for cheap memoization.