<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Comonad.Reader &#187; Algorithms</title>
	<atom:link href="http://comonad.com/reader/category/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://comonad.com/reader</link>
	<description>types, (co)monads, substructural logic</description>
	<lastBuildDate>Thu, 22 Jul 2010 19:53:03 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Introducing Speculation</title>
		<link>http://comonad.com/reader/2010/introducing-speculation/</link>
		<comments>http://comonad.com/reader/2010/introducing-speculation/#comments</comments>
		<pubDate>Thu, 22 Jul 2010 19:53:03 +0000</pubDate>
		<dc:creator>Edward Kmett</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Boston Haskell]]></category>
		<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://comonad.com/reader/?p=205</guid>
		<description><![CDATA[A couple of days ago, I gave a talk at Boston Haskell about a shiny new speculative evaluation library, speculation on hackage, that I have implemented in Haskell. The implementation is based on the material presented as "Safe Programmable Speculative Parallelism" by Prakash Prabhu, G Ramalingam, and Kapil Vaswani at last month's PLDI.
I've uploaded a [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of days ago, I gave a talk at Boston Haskell about a shiny new speculative evaluation library, <a href="http://hackage.haskell.org/package/speculation">speculation</a> on hackage, that I have implemented in Haskell. The implementation is based on the material presented as <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=118795">"Safe Programmable Speculative Parallelism"</a> by Prakash Prabhu, G Ramalingam, and Kapil Vaswani at last month's PLDI.</p>
<p>I've uploaded a copy of my slides here:</p>
<p>* Introducing Speculation [<a href='http://comonad.com/reader/wp-content/uploads/2010/07/Speculation.pptx'>PowerPoint</a> | <a href='http://comonad.com/reader/wp-content/uploads/2010/07/Speculation.pdf'>PDF</a>]</p>
<p>This package provides speculative function application and speculative folds. Speculative STM transactions take the place of the transactional rollback machinery from the paper, but transactions are not always required in pure code. To get a feel for the shape of the library, here is an excerpt from the <a href="http://hackage.haskell.org/package/speculation">documentation</a> for one of the combinators:</p>
<blockquote><p>
<code><br />
     spec :: Eq a => a -> (a -> b) -> a -> b<br />
</code></p>
<p><code>spec g f a</code> evaluates <code>f g</code> while forcing <code>a</code>, if <code>g == a</code> then <code>f g</code> is returned, otherwise <code>f a</code> is evaluated and returned. Furthermore, if the argument has already been evaluated, we skip the <code>f g</code> computation entirely. If a good guess at the value of <code>a</code> is available, this is one way to induce parallelism in an otherwise sequential task. However, if the guess isn't available more cheaply than the actual answer, then this saves no work and if the guess is wrong, you risk evaluating the function twice. Under high load, since <code>f g</code> is computed via the spark queue, the speculation will be skipped and you will obtain the same answer as <code>f $! a</code>.
</p></blockquote>
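<p>A minimal sketch of the behaviour described in that excerpt, assuming only <code>par</code> and <code>pseq</code> from <code>GHC.Conc</code> in base. The name <code>specSketch</code> is made up for illustration and this is not the library's source; in particular it omits the check that skips speculation when the argument has already been evaluated.</p>
<pre>
```haskell
import GHC.Conc (par, pseq)

-- Illustrative only: a re-creation of the documented semantics of spec.
specSketch :: Eq a => a -> (a -> b) -> a -> b
specSketch g f a =
  speculation `par`              -- spark f g so an idle core may start on it
    (a `pseq`                    -- meanwhile force the real argument
      (if g == a
         then speculation        -- good guess: reuse the speculative result
         else f a))              -- bad guess: fall back to the real computation
  where
    speculation = f g
```
</pre>
<p>Both <code>specSketch 3 (* 2) 3</code> and <code>specSketch 0 (* 2) 3</code> evaluate to 6; the quality of the guess only changes how much work overlaps or is wasted. Sparks only run in parallel when the program is linked with the threaded runtime.</p>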
<p>ASCII art time-lines of how this can speed up evaluation are available in both the slides and the documentation linked to above, but assuming an otherwise serial problem, you effectively wager otherwise idle CPU time and the time to generate your guess on the quality of your guess.</p>
<p>Note that the <a href="http://hackage.haskell.org/trac/ghc/ticket/4167">numSparks# feature request</a> mentioned in the slides has already been implemented in GHC HEAD, and I will add support for it to improve the performance of the speculative STM transactions under high load, as described in the slides.</p>
]]></content:encoded>
			<wfw:commentRss>http://comonad.com/reader/2010/introducing-speculation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Brodal-Okasaki Heaps in Haskell</title>
		<link>http://comonad.com/reader/2010/brodal-okasaki-heaps-in-haskell/</link>
		<comments>http://comonad.com/reader/2010/brodal-okasaki-heaps-in-haskell/#comments</comments>
		<pubDate>Sun, 16 May 2010 04:38:11 +0000</pubDate>
		<dc:creator>Edward Kmett</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Data Structures]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Monoids]]></category>

		<guid isPermaLink="false">http://comonad.com/reader/?p=187</guid>
		<description><![CDATA[I've uploaded a package named heaps to Hackage that provides Brodal-Okasaki bootstrapped skew-binomial heaps in Haskell.

The main features of the library are that it provides a nice containers-like API with provably asymptotically optimal functional heap operations including O(1) insert and O(1) union, and that the library design jumps through a number of hoops to provide [...]]]></description>
			<content:encoded><![CDATA[<p>I've uploaded a package named <a href="http://hackage.haskell.org/packages/archive/heaps/0.2/doc/html/Data-Heap.html">heaps</a> to Hackage that provides <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.973">Brodal-Okasaki bootstrapped skew-binomial heaps</a> in Haskell.<br />
<span id="more-187"></span></p>
<p>The main features of the library are that it provides a nice <a href="http://hackage.haskell.org/package/containers">containers</a>-like API with provably asymptotically optimal functional heap operations including O(1) insert and O(1) union, and that the library design jumps through a number of hoops to provide implementations of common Haskell typeclasses such as <a href="http://www.haskell.org/ghc/docs/6.12.1/html/libraries/base/Data-Foldable.html">Foldable</a>, Data and Typeable.</p>
]]></content:encoded>
			<wfw:commentRss>http://comonad.com/reader/2010/brodal-okasaki-heaps-in-haskell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reverse-Mode Automatic Differentiation in Haskell</title>
		<link>http://comonad.com/reader/2010/reverse-mode-automatic-differentiation-in-haskell/</link>
		<comments>http://comonad.com/reader/2010/reverse-mode-automatic-differentiation-in-haskell/#comments</comments>
		<pubDate>Sun, 16 May 2010 04:27:12 +0000</pubDate>
		<dc:creator>Edward Kmett</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Mathematics]]></category>

		<guid isPermaLink="false">http://comonad.com/reader/?p=183</guid>
		<description><![CDATA[I've uploaded a package named rad to Hackage for handling reverse-mode automatic differentiation in Haskell.

Internally, it leverages a trick from Andy Gill's Kansas Lava to observe sharing in the tape it records for back propagation purposes, and uses type level branding to avoid confusing sensitivities.
I've tried to keep the API relatively close to that of [...]]]></description>
			<content:encoded><![CDATA[<p>I've uploaded a package named <a href="http://hackage.haskell.org/package/rad">rad</a> to Hackage for handling reverse-mode <a href="http://en.wikipedia.org/wiki/Automatic_differentiation">automatic differentiation</a> in Haskell.<br />
<span id="more-183"></span></p>
<p>Internally, it leverages a <a href="http://www.ittc.ku.edu/~andygill/papers/reifyGraph.pdf">trick</a> from Andy Gill's Kansas Lava to observe sharing in the tape it records for back propagation purposes, and uses type level branding to avoid confusing sensitivities.</p>
<p>I've tried to keep the API relatively close to that of Barak Pearlmutter and Jeffrey Mark Siskind's <a href="http://hackage.haskell.org/package/fad">fad</a> package, but I couldn't resist making a couple of minor tweaks here and there for generality.</p>
<p>I still need to go through and finish up the remaining unimplemented fad combinators, figure out a nice way to build a reverse-mode AD tower, validate that I didn't screw up my recollection of basic calculus, and provide a nice API for using this approach to get local reverse mode checkpoints in an otherwise forward mode AD program, but I am quite happy with how things have progressed thus far.</p>
<p>[Edit: I've uploaded minor bug fixes for exp and (**)]</p>
]]></content:encoded>
			<wfw:commentRss>http://comonad.com/reader/2010/reverse-mode-automatic-differentiation-in-haskell/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Iteratees, Parsec and Monoids (Slides)</title>
		<link>http://comonad.com/reader/2009/iteratees-parsec-and-monoid/</link>
		<comments>http://comonad.com/reader/2009/iteratees-parsec-and-monoid/#comments</comments>
		<pubDate>Thu, 20 Aug 2009 16:55:03 +0000</pubDate>
		<dc:creator>Edward Kmett</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Data Structures]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Monoids]]></category>
		<category><![CDATA[Parsing]]></category>

		<guid isPermaLink="false">http://comonad.com/reader/?p=122</guid>
		<description><![CDATA[Two talks from the Boston Area Haskell User Group:
<ol>	
       <li><a href='http://comonad.com/reader/wp-content/uploads/2009/08/IntroductionToMonoids.pdf'>Introduction To Monoids (PDF)</a></li>
	<li><a href='http://comonad.com/reader/wp-content/uploads/2009/08/A-Parsing-Trifecta.pdf'>Iteratees, Parsec and Monoids: A Parsing Trifecta (PDF)</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I was asked to give two talks at the <a href="http://groups.google.com/group/bostonhaskell">Boston Area Haskell User Group</a> this past Tuesday. The first was pitched at a more introductory level and the second was to go deeper into what I have been using monoids for lately.</p>
<p>The first talk covers an introduction to the mathematical notion of a monoid, introduces some of the features of my Haskell monoids library on hackage, and starts to motivate the use of monoidal parallel/incremental parsing, and the modified use of compression algorithms to recycle monoidal results.</p>
<p>The second talk covers a way to generate a locally-context sensitive parallel/incremental parser by modifying <a href="http://okmij.org/ftp/Haskell/Iteratee/Iteratee.hs">Iteratees</a> to enable them to drive a <a href="http://hackage.haskell.org/package/parsec-3.0.0">Parsec 3</a> lexer, and then wrapping that in a monoid based on <a href="http://dragonbook.stanford.edu/lecture-notes/Columbia-COMS-W4115/08-03-05.html">error productions</a> in the grammar before recycling these techniques at a higher level to deal with parsing seemingly stateful structures, such as Haskell layout.</p>
<ol>
<li><a href='http://comonad.com/reader/wp-content/uploads/2009/08/IntroductionToMonoids.pdf'>Introduction To Monoids (PDF)</a></li>
<li><a href='http://comonad.com/reader/wp-content/uploads/2009/08/A-Parsing-Trifecta.pdf'>Iteratees, Parsec and Monoids: A Parsing Trifecta (PDF)</a></li>
</ol>
<p>Due to a late start, I was unable to give the second talk. However, I did give a quick run through to a few die-hards who stayed late and came to the <a href="http://www.cambrew.com/">Cambridge Brewing Company</a> afterwards. As I promised some people that I would post the slides after the talk, here they are. </p>
<p>The current plan is to possibly give the second talk in full at either the September or October Boston Haskell User Group sessions, depending on scheduling and availability.</p>
<p>[ <a href='http://comonad.com/reader/wp-content/uploads/2009/08/Iteratee.hs'>Iteratee.hs</a> ]</p>
]]></content:encoded>
			<wfw:commentRss>http://comonad.com/reader/2009/iteratees-parsec-and-monoid/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Linear Bloom Filters</title>
		<link>http://comonad.com/reader/2008/linear-bloom-filters/</link>
		<comments>http://comonad.com/reader/2008/linear-bloom-filters/#comments</comments>
		<pubDate>Wed, 04 Jun 2008 03:26:55 +0000</pubDate>
		<dc:creator>Edward Kmett</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Data Structures]]></category>

		<guid isPermaLink="false">http://comonad.com/reader/2008/linear-bloom-filters/</guid>
		<description><![CDATA[This post is a bit of a departure from my recent norm. It contains no category theory whatsoever. None. I promise.
Now that I've bored away the math folks, I'll point out that this also isn't a guide to better horticulture. Great, there goes the rest of you.
Instead, I want to talk about Bloom filters, Bloom [...]]]></description>
			<content:encoded><![CDATA[<p>This post is a bit of a departure from my recent norm. It contains no category theory whatsoever. None. I promise.</p>
<p>Now that I've bored away the math folks, I'll point out that this also isn't a guide to better horticulture. Great, there goes the rest of you.</p>
<p>Instead, I want to talk about <a href="http://citeseer.ist.psu.edu/bloom70spacetime.html">Bloom filters</a>, Bloom joins for distributed databases, and some novel extensions to them that let you trade resources we have in abundance for ones that are scarce. I've been using these extensions for the last few months and have never seen them in print, primarily, I guess, because they have little to do with the strengths of Bloom filters.</p>
<p><span id="more-66"></span></p>
<p>In practice you will need a <a href="http://en.wikipedia.org/wiki/Bloom_filter#Counting_filters">counted</a> or <a href="http://theory.stanford.edu/~matias/papers/sbf-sigmod-03.pdf">spectral</a> Bloom filter for the structure described below. However, as these introduce nothing novel and simply muddle the exposition, I'll ignore counting and spectral Blooms for now.</p>
<p><b>Bloom Filters</b></p>
<p>Ok, so what is a Bloom filter? Bloom filters date back to 1970. A simple Bloom filter is a data structure for approximating membership in a set; it can err only by yielding false positives, never false negatives. A filter consists of an m-bit array and k distinct hash functions. To add an element to the filter you run it through each of the k hash functions and set the corresponding bits. A value is considered to be a member of the set if you hash it through each of the k functions and each of the target bits is set. It is easy to see that this can only result in false positives, but it's also easy to see that you need to set the size of the array before you start adding elements to it, and that you need to balance the number of hash functions against the overall desired precision of your filter. In general you want to have about half of the bits set in the resulting array to maximize your information density, a fact which can be derived with elementary calculus. From this you can work out that, for a filter holding n elements, you get optimal results when <img src='http://comonad.com/latex/2926fa9fe0e15aa4f8ef10438cd6822e.png' title='$k = \frac{m}{n} \ln 2$' alt='$k = \frac{m}{n} \ln 2$' align=absmiddle>.</p>
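<p>The add and membership operations above can be sketched in a few lines of Haskell. This is an illustration rather than production code: the m-bit array is stored in an <code>Integer</code>, and <code>hashWith</code> is a stand-in hash family (FNV-1a seeded with the function index) chosen only to keep the example self-contained. All names here are hypothetical.</p>
<pre>
```haskell
import Data.Bits (setBit, testBit, xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- An m-bit array (held in an Integer) together with the hash count k.
data Bloom = Bloom { bloomM :: Int, bloomK :: Int, bloomBits :: Integer }

emptyBloom :: Int -> Int -> Bloom
emptyBloom m k = Bloom m k 0

-- Assumed hash family: FNV-1a over the characters, seeded with the index i.
hashWith :: Int -> String -> Word64
hashWith i = foldl' step (14695981039346656037 `xor` fromIntegral i)
  where step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- The k bit positions that an element maps to.
positions :: Bloom -> String -> [Int]
positions (Bloom m k _) x =
  [ fromIntegral (hashWith i x `mod` fromIntegral m) | i <- [0 .. k - 1] ]

-- Adding an element sets each of its k bits.
insert :: String -> Bloom -> Bloom
insert x b = b { bloomBits = foldl' setBit (bloomBits b) (positions b x) }

-- An element may be present only if all k of its bits are set.
member :: String -> Bloom -> Bool
member x b = all (testBit (bloomBits b)) (positions b x)
```
</pre>
<p>Anything that has been inserted always tests as a member; a value that was never inserted tests as a member only when all of its bits happen to have been set by other elements.</p>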
<p>We can readily approximate k distinct hashing functions by using a single one-way hashing function and carving it up into a number of hashing functions that consist of the right number of bits each. A simpler approach due to <a href="http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf">Kirsch and Mitzenmacher</a> is to sacrifice the independence of the hash functions without particularly adversely affecting the properties of the filter.</p>
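<p>Both ideas fit in a short sketch. <code>splitHash</code> carves one 64-bit hash into two 32-bit halves, and <code>kmPositions</code> uses the construction from the Kirsch and Mitzenmacher paper, where the i-th probe is h1 + i * h2 modulo m. The function names and the choice of a 64-bit hash are assumptions made for illustration.</p>
<pre>
```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word64)

-- Carve a single 64-bit hash into two 32-bit pieces.
splitHash :: Word64 -> (Word64, Word64)
splitHash h = (h .&. 0xffffffff, h `shiftR` 32)

-- Derive k probe positions in an m-bit filter from two base hashes.
kmPositions :: Int -> Int -> Word64 -> Word64 -> [Int]
kmPositions m k h1 h2 =
  [ fromIntegral ((h1 + fromIntegral i * h2) `mod` fromIntegral m)
  | i <- [0 .. k - 1] ]
```
</pre>
<p>For example, <code>kmPositions 16 4 3 5</code> yields the positions 3, 8, 13 and 2.</p>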
<p>The nice thing about a Bloom filter is that the parameters m and k can be varied to tune space requirements and precision.</p>
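<p>To make that trade-off concrete, here is a small sketch of the sizing arithmetic. <code>optimalK</code> is the formula given above; <code>falsePositiveRate</code> uses the standard approximation (1 - e^(-kn/m))^k, which does not appear elsewhere in this post and is included as an assumption. The function names are hypothetical.</p>
<pre>
```haskell
-- Optimal number of hash functions for m bits and n expected elements.
optimalK :: Int -> Int -> Int
optimalK m n = max 1 (round (fromIntegral m / fromIntegral n * log 2 :: Double))

-- Approximate false positive rate for m bits, n elements and k hashes.
falsePositiveRate :: Int -> Int -> Int -> Double
falsePositiveRate m n k =
  (1 - exp (negate (fromIntegral (k * n)) / fromIntegral m)) ^ k
```
</pre>
<p>With 1000 bits and 100 elements, <code>optimalK 1000 100</code> is 7, and the estimated false positive rate at that setting is a little under 1%.</p>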
<p><b>Improving Locality</b></p>
<p>One common way to improve the locality of reference for excessively large Bloom filters is to break up the structure into two tiers. You have an upper tier in which you use a single hash function to bin the data; then, within the bin where you placed the data, you run the remaining k-1 hash functions. This can result in a 'lumpier' distribution of data, but generally improves performance because if you exceed working memory, this model can typically page in a single page from disk to handle the k-1 writes. When you figure that it is common to use several hashing functions with a Bloom filter, this can result in a several-fold performance improvement as the data set grows and you become IO bound. Since the goal is primarily to optimize IO, you typically want a bin size that corresponds to your block or page size.</p>
<p>As an admittedly <em>completely</em> unintelligible aside, I am particularly fond of 8k bins for a simple Bloom filter, because they nicely consume 16 bits of hash evenly, and 4k bins, when used with 4 bit counting Blooms, page in and out efficiently and compress nicely with an arithmetic/exponentiated Huffman encoding into near even multiples of the ethernet packet MTU when you tune the ratio of set bits carefully. I've found this to be beneficial for tweaking real world performance.</p>
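<p>A rough sketch of the two-tier addressing, assuming the k hashes for an element have already been computed: the first one selects the bin and the remaining k-1 select bits inside that bin. The function name and the list-of-hashes interface are made up for illustration.</p>
<pre>
```haskell
import Data.Word (Word64)

-- Returns (bin index, bit offsets within that bin) for one element.
blockedPositions :: Int -> Int -> [Word64] -> (Int, [Int])
blockedPositions _ _ [] = (0, [])
blockedPositions bins binBits (h : hs) =
  ( fromIntegral (h `mod` fromIntegral bins)                    -- upper tier: pick the bin
  , [ fromIntegral (x `mod` fromIntegral binBits) | x <- hs ]   -- lower tier: k-1 bits in it
  )
```
</pre>
<p>For example, <code>blockedPositions 4 8 [6, 9, 17]</code> gives <code>(2, [1, 1])</code>: every write for that element lands in bin 2, so at most one page has to be touched.</p>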
<p><b>Bloom Joins</b></p>
<p>Given a pair of Bloom filters that share a given size m and use the same k hash functions, you can take their intersection (or union) quite efficiently with bitwise and (or or). This is a well known technique for dealing with distributed database joins when you have data distributed across multiple servers joining against data distributed across other servers. In general, you are only interested in transmitting the data that exists on both sides of the join.</p>
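<p>With each filter's bit array held in an <code>Integer</code> (a representation assumed here purely for brevity), the join is a one-liner. The function names are hypothetical.</p>
<pre>
```haskell
import Data.Bits ((.&.), (.|.))

-- Both filters must share the same size m and the same k hash functions.
bloomIntersect, bloomUnion :: Integer -> Integer -> Integer
bloomIntersect = (.&.)   -- bits set on both sides of the join
bloomUnion     = (.|.)   -- bits set on either side
```
</pre>
<p>For instance, <code>bloomIntersect 12 10</code> is 8 and <code>bloomUnion 12 10</code> is 14. Note that the intersection may have more bits set than a filter built directly from the true intersection of the two sets, so it is still only an approximation.</p>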
<p>(You can technically free yourself from the requirement that both sides agree on the number of hash functions if you are willing to accept more false positives and you test for membership in the result set using just the hash functions contained in both Blooms. The easiest way to do this is to just agree on an order in which hash functions will be used, which comes for free from the Kirsch/Mitzenmacher approach mentioned above.)</p>
<p><b>The good</b><br />
The nice thing about a standard Bloom join is that you can send the Bloom filter over the network quite cheaply in comparison to the data, and with the addition of counting Bloom filter tricks it can be used to calculate approximately the size of the result set. This allows you to use it to load level <a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a> style workloads effectively by estimating the size of intermediate results quite accurately before you send everything over the network to be aggregated.</p>
<p><b>The bad</b><br />
One problem with this model is that you have to know the size of your data set up front in order to calculate an ideal m for a desired precision level. Moreover both sides of the join have to agree on this figure m before calculating the join.</p>
<p>Now, the main goal of a Bloom join is to conserve a scarce resource (network bandwidth) by exchanging cheaper, more plentiful resources (local CPU utilization and disk IO). In that respect it serves adequately, but we can do better if our goal is more or less purely to optimize network bandwidth. Let's carry that a bit further.</p>
<p><b>Linear Hash Tables</b></p>
<p>To address the limitation that you have to know the size of the bloom a priori, we'll turn to another data structure, the <a href="http://en.wikipedia.org/wiki/Linear_hash">linear hash table</a>. Linear hash tables were designed by Witold Litwin back in 1980 to provide an expandable hash table without a huge stairstep in the cost function whenever you hit a power of two in size. The basic idea of a linear hash table is that you grow the table gradually, by splitting one bucket at a time and using the least significant bits of your hash function.</p>
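<p>The addressing rule can be sketched as follows, using the usual textbook formulation of linear hashing: a level, a split pointer, and a modulus that doubles once every bucket in the current round has been split. The type and function names are invented for this sketch, and hashes are assumed to be non-negative.</p>
<pre>
```haskell
-- Initial bucket count, current level, and the next bucket to be split.
data LinearState = LinearState
  { initialBuckets :: Int, level :: Int, splitPtr :: Int }
  deriving (Eq, Show)

-- Which bucket a hash lands in right now.
bucketFor :: LinearState -> Int -> Int
bucketFor (LinearState n0 lvl p) h
  | b < p     = h `mod` (2 * size)   -- that bucket was already split this round
  | otherwise = b
  where
    size = n0 * 2 ^ lvl
    b    = h `mod` size

-- Advance the state after splitting the bucket at splitPtr.
afterSplit :: LinearState -> LinearState
afterSplit (LinearState n0 lvl p)
  | p + 1 == n0 * 2 ^ lvl = LinearState n0 (lvl + 1) 0   -- round complete: double
  | otherwise             = LinearState n0 lvl (p + 1)
```
</pre>
<p>Splitting bucket b at round size s moves some of its entries to bucket b + s, so the table grows one bucket at a time. When the initial bucket count is a power of two, the modulus amounts to taking the least significant bits of the hash.</p>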
<p>For sake of variety, I've included a C# 3.5 implementation here:</p>
<p>[<a href="http://comonad.com/csharp/SortedLinearHashTable.cs">SortedLinearHashTable.cs</a>]<br />
[<a href="http://comonad.com/csharp/SinglyLinkedList.cs">SinglyLinkedList.cs</a>]<br />
[<a href="http://comonad.com/csharp/PreparedEqualityComparer.cs">PreparedEqualityComparer.cs</a>]<br />
[<a href="http://comonad.com/csharp/PreparedEqualityComparerTypeProxyAttribute.cs">PreparedEqualityComparerTypeProxyAttribute.cs</a>]</p>
<p>For my regular audience, an implementation in Haskell using STM, which incidentally was the first piece of Haskell I ever wrote, designed for read-mostly use, can be found here:</p>
<p>[<a href="http://comonad.com/haskell/thash/dist/doc/html/">haddock</a>]<br />
[<a href="http://comonad.com/haskell/thash/">darcs</a>]</p>
<p><b>A Linear Bloom Filter</b></p>
<p>Now, we can look at the bi-level structure we introduced above for dealing with improved cache locality and note that we could go in a different direction and treat the upper level as a linear hash table, instead of a simple hash function! This requires that we keep not only the Bloom but also the member list (or at least their hashes). We can optimize this slightly by computing the Bloom of the member list for each page lazily. This costs us quite a bit of storage relative to a traditional Bloom filter, but we can transmit the Bloom of the resulting set over the network more cheaply than we can transmit a linear hash table and it isn't appreciably more expensive locally than a linear hash table due to only lazily constructing the Blooms.</p>
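<p>Haskell's laziness makes the "compute the Bloom of each page lazily" part almost free to express. In this sketch a page keeps its member hashes, and its filter is an ordinary lazy field that is only built if something inspects it. The names and the probe scheme (h + i * the upper half of h) are assumptions for illustration, not the code I use in production.</p>
<pre>
```haskell
import Data.Bits (setBit, shiftR)
import Data.List (foldl')
import Data.Word (Word64)

data Page = Page
  { pageMembers :: [Word64]  -- exact member hashes kept alongside the filter
  , pageBloom   :: Integer   -- lazy field: built only when first demanded
  }

-- Build a page with an m-bit filter using k probes per member.
mkPage :: Int -> Int -> [Word64] -> Page
mkPage m k hs = Page hs (foldl' setBit 0 (concatMap probes hs))
  where
    probes h =
      [ fromIntegral ((h + fromIntegral i * (h `shiftR` 32)) `mod` fromIntegral m)
      | i <- [0 .. k - 1] ]

-- Adding a member just records the hash; the filter is rebuilt on demand.
addMember :: Int -> Int -> Word64 -> Page -> Page
addMember m k h page = mkPage m k (h : pageMembers page)
```
</pre>
<p>A page that is only ever used as a hash bucket never pays for its filter; the cost is incurred when the page is about to be shipped over the network.</p>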
<p>This mechanism gives rise to an actual tree of pages based on the unfolding of the linear hash table in the resulting hierarchical bloom if you choose to represent the interior of the tree.</p>
<p>Again, this isn't a win for all scenarios, but if you are intending to transmit the resulting set over the network, and don't know its size a priori, the combination of properties from the linear hash table and the bloomed pages leads to some interesting options.</p>
<p><b>Linear Bloom Origami</b></p>
<p>Now that we have an expandable hash in our top level, we finally have the machinery to deal with how to perform a join between two linear bloom filters of different size. The model is actually quite simple. We can fold the larger bloom up by <em>or</em>ing together the leaves that were split by the linear hash table in the larger bloom until we have the same number of pages and then perform a standard Bloom join. This frees us from the tyranny of having to have both sides of the join guess in advance a shared number of buckets to use to perform the join. </p>
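<p>One round of that folding can be sketched like this, assuming the pages are stored in bucket order as <code>Integer</code> bit arrays and that page i + s was split off page i when the table had s buckets at the start of the round. The function name is hypothetical.</p>
<pre>
```haskell
import Data.Bits ((.|.))

-- Undo the current round of splits by or-ing each split-off page back
-- into the page it came from. A partially completed round is handled by
-- padding the missing upper pages with empty filters.
foldRound :: Int -> [Integer] -> [Integer]
foldRound s pages = zipWith (.|.) low (high ++ repeat 0)
  where (low, high) = splitAt s pages
```
</pre>
<p>For example, <code>foldRound 2 [1, 2, 4, 8]</code> gives <code>[5, 10]</code> and <code>foldRound 2 [1, 2, 4]</code> gives <code>[5, 2]</code>. Repeating this until both sides have the same number of pages leaves two lists that can be joined page by page with a bitwise and.</p>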
<p>As an aside, an interesting thought experiment is to go one step further and use a full-fledged sorted linear hash table for the extra cost of sorting the chains, but this doesn't seem to be useful in practice.</p>
<p><b>Mipmapping Blooms</b></p>
<p>If we are willing to pay an <img src='http://comonad.com/latex/01a17e3e6b652247d2115619a119c921.png' title='$\mathcal{O}(n \log n)$' alt='$\mathcal{O}(n \log n)$' align=absmiddle> cost (in terms of the data set size) in CPU utilization and memory bandwidth, we can gain some further performance in terms of network utilization by encoding a set of "<a href="http://en.wikipedia.org/wiki/Mipmap">mipmaps</a>" for our filters. </p>
<p>Basically the idea is to fold up the tree by <em>or</em>ing together the pages into an admissible Bloom of the dataset. Then you encode the splitting of each bit that was set in the Bloom using conditional probabilities. This can be transmitted near optimally using arithmetic encoding or exponentiated Huffman.</p>
<p>If a bit is set in the parent Bloom, then at least one of the two bits will be set in the child Blooms; if no bit is set in the parent Bloom, then no bit can be set in the child Blooms. The probability of each bit being set in each child is for all practical intents and purposes independent and can reasonably be modeled as a function of the expected number of set bits. (This is ever-so-slightly suboptimal if the overall number of values is known.) You can determine exact values for the weights of each of the three cases using conditional probabilities and then use an arithmetic compressor, or exponentiate the alphabet for a Huffman compressor. The raw three-symbol alphabet is otherwise near worst-case for Huffman, since you have two possibilities both just shy of 50% and one much smaller probability. Nicely, the structure of the exponentiated alphabet is very regular and can be represented efficiently. With careful choice of page size (or bit density within a page) you can transmit the initial page cheaply, and then pack multiple pages into subsequent packets. </p>
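<p>Under the independence assumption just described, the three weights follow directly from the chance p that a given child bit is set. The sketch below conditions on the parent bit being set; the function name is made up for illustration.</p>
<pre>
```haskell
-- Given p, the chance that a child bit is set, return the conditional
-- weights of (left only, right only, both) given a set parent bit.
splitWeights :: Double -> (Double, Double, Double)
splitWeights p = (one / total, one / total, both / total)
  where
    one   = p * (1 - p)       -- exactly one particular child bit set
    both  = p * p             -- both child bits set
    total = 2 * one + both    -- equals 1 - (1 - p)^2, the parent bit being set
```
</pre>
<p>For a sparse child with p = 0.1 the weights come out at roughly 47%, 47% and 5%, which is the "two just shy of 50% and one much smaller" shape that makes plain Huffman coding a poor fit.</p>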
<p>Since we can determine the relevance of portions of the tree based on partial information this may allow you to avoid transmitting some branches of the tree. More interestingly we can use it to figure out approximately the size of the join set from the first few pages transmitted and to gain gradual refinements as both sides of the join supply more information. </p>
<p>If you wanted to optimize strictly for network bandwidth and were willing to accept additional latency you could prune branches of the tree after it was clear that the intersection was empty and so no further resolution was required, but in my experience this optimization doesn't seem to be worth the effort.</p>
<p><b>Incremental Update</b></p>
<p>Interestingly if you have already shared a Linear Bloom and need to update your copy it admits a cheap network representation using the same arithmetic/exponentiated Huffman encoding trick mentioned earlier. You lose the ability to ignore all unset bits in the dataset because extending the set of known values will in all likelihood set new bits, but as you add members you can transmit splits using the same mechanism used above, and you have the actual member set needed to populate the child pages accurately.</p>
<p>Interestingly, it is the ability to mipmap the intermediate results that sometimes makes it worth dealing with a suboptimal choice of density for the overall Bloom filter. That choice only affects the cost on either end of the network and doesn't affect the network transmission costs all that adversely, and a sparser population early in the tree can give you a less oversaturated tree near the root, allowing earlier pruning of branches. I have yet to take this from an art to a science.</p>
<p><b>Conclusion</b></p>
<p>I had intended to explain things in more detail and delve into the asymptotic behavior of hierarchical and linear Blooms, but various people have been hammering me to just post this already, so here it is.</p>
<p>So to recap, we took a normal (or counted or spectral) Bloom filter, crossbred it with Litwin's linear hash table, and found that the mutant offspring is an approximation of a set that is better suited to sharing over the network than either structure alone, with a memory usage profile similar to that of a linear hash table. Interestingly, as a side effect you can go one step further and allow for transmission of a requested subset of the exact hashes present, in which case we've really only used the Blooms to provide partial information about the underlying linear hash table, which can aid in the subsequent join process.</p>
<p>And yes, they are probably better named something like Bloomed linear hash tables, but that doesn't roll off the tongue.</p>
<p>If there is enough interest and I don't get dragged into other things, I might see about packaging up and genericizing some code that I had lying around intended for production use into a more general purpose library for Linear Bloom Filters. </p>
]]></content:encoded>
			<wfw:commentRss>http://comonad.com/reader/2008/linear-bloom-filters/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
