Fri 16 May 2008
Does anyone know of any work on "forgetful laziness?"
The basic idea being that for each thunk instead of overwriting it with the answer as usual in call-by-need, you'd just write a forwarding pointer, and allow GC to collect the answers over time.
This results in recalculation and may be subject to thrashing, so the obvious fix would be either
- a 'forget at most once' policy, which would only mitigate the kind of memory leaks you get from laziness under limited conditions, but which has a worst case payout of doubling the workload or
- an exponential backoff on how often you'll try to recollect a given value, which should preserve for practical purposes the asymptotic behavior of any algorithm, but with a much larger constant for pathological access patterns. [Edit: it may affect asymptotic behavior, because you could lose sharing]
This would allow the recollection of large CAFs, etc. eventually once they had bitrotted long enough.
Not sure if its worth the cost of recalculating and of storing any backoff counter, but most of the horror stories you hear about Haskell center around its occasional horrific memory usage profile.
Tuning points might include studying average thunk lifetimes to construct thunk access profiles rather than use a naive exponential backoff.
It also may exascerbate the opposite problem where naive code often builds up a tower of thunks it needs to evaluate all at once in order to provide an answer (i.e. when working with a lazy accumulating parameter).

May 16th, 2008 at 6:37 pm
This is basically weak references, yes? I looked once upon a time at combining automatic memoization with weak references (useful for dynamic programming), but I never got around to completing it.
May 16th, 2008 at 9:37 pm
Its a little bit different. The goal is much the same, but with a thunk you have a delayed computation that you may need the result of along with a closure of the values you need to actually generate the result.
When you force the thunk, you usually (in call-by-need) update it in place to be the result, unless you know that you only need the result once.
Here the idea is to allow the thunk to ‘unforce’ itself over time, but to remember that it was unforced by tagging the thunk (or ticking a counter in it similar to a mark cost) to cause it to become harder to ‘unforce’ in the future.
Weak references don’t keep their values alive at all. Here a thunk would slowly bitrot. Once you ‘forget’ a value, the weak reference to it never becomes valid again. Here if you look at the same thunk again and you regenerate the value it becomes valid again, but of course thunks themselves are subject to garbage collection.
In some senses there is conceptual overlap between memoization and thunking, but a memo-table gives you some control over bounding the amount of stuff you hold on to, while the only time you get benefit from a thunk is when two computations reference the same thunk.
If you are interested in automatic memoization you might find some of Umut Acar’s papers interesting starting with something like: http://ttic.uchicago.edu/~umut/papers/ml05.pdf