24

I have some experience writing small tools in Haskell and I find it very intuitive to use, especially for writing filters (using interact) that process their standard input and pipe it to standard output.

Recently I tried to use one such filter on a file that was about 10 times larger than usual and I got a Stack space overflow error.

After doing some reading (e.g. here and here) I have identified two guidelines to save stack space (experienced Haskellers, please correct me if I write something that is not correct):

  1. Avoid recursive function calls that are not tail-recursive (this is valid for all functional languages that support tail-call optimization).
  2. Introduce seq to force early evaluation of sub-expressions so that expressions do not grow too large before they are reduced (this is specific to Haskell, or at least to languages using lazy evaluation).

After introducing five or six seq calls in my code my tool runs smoothly again (also on the larger data). However, I find the original code was a bit more readable.

Since I am not an experienced Haskell programmer I wanted to ask if introducing seq in this way is a common practice, and how often one will normally see seq in Haskell production code. Or are there any techniques that allow to avoid using seq too often and still use little stack space?

Giorgio
  • 19,486
  • 16
  • 84
  • 135
  • 1
    Optimizations like the kind you described are almost always going to make the code a bit less elegant. – Robert Harvey Oct 06 '12 at 16:25
  • @Robert Harvey: Are there any alternative techniques to keep stack usage low? I mean I imagine I have to rewrite my functions differently but I have no clue whether there are well-established techniques. My first attempt was to use tail-recursive functions, which helped but did not allow me to solve my problem completely. – Giorgio Oct 06 '12 at 19:17

1 Answers1

18

Unfortunately there are cases when one has to use seq in order to get a efficient/well working program for large data. So in many cases, you cannot do without it in production code. You can find more information in Real World Haskell, Chapter 25. Profiling and optimization.

However, there are possibilities how to avoid using seq directly. This can make code cleaner and more robust. Some ideas:

  1. Use conduit, pipes or iteratees instead of interact. Lazy IO is known to have problems with managing resources (not just memory) and iteratees are designed exactly to overcome this. (I'd suggest to avoid lazy IO alltogether no matter how large your data is - see The problem with lazy I/O.)
  2. Instead of using seq directly use (or design your own) combinators such as foldl' or foldr' or strict versions of libraries (such as Data.Map.Strict or Control.Monad.State.Strict) that are designed for strict computations.
  3. Use BangPatterns extension. It allows to replace seq with strict pattern matching. Declaring strict constructor fields could be also useful in some cases.
  4. It's also possible to use Strategies for forcing evaluation. Strategies library is mostly aimed at parallel computations, but has methods for forcing a value to WHNF (rseq) or full NF (rdeepseq) as well. There are many utility methods for working with collections, combining strategies etc.
Petr
  • 5,507
  • 3
  • 29
  • 46
  • +1: Thanks for the useful hints and links. Point 3 seems quite interesting (and the easiest solution for me to use right now). Regarding suggestion 1, I do not see how avoiding lazy IO can improve things: As far as I understand lazy IO should be better for a filter that is supposed to process a (possibly very long) stream of data. – Giorgio Oct 15 '12 at 21:05
  • 2
    @Giorgio I added a link to Haskell Wiki about problems with Lazy IO. With lazy IO you can have very hard time managing resources. For example, if you don't fully read the input (like due to lazy evaluation), the [file handle remains open](http://www.haskell.org/pipermail/beginners/2011-November/008965.html). And if you go and close the file handle manually, it often happens that due to lazy evaluation reading it is postponed and you close the handle before reading the whole input. And, it's often quite hard to avoid memory problems with lazy IO. – Petr Oct 16 '12 at 05:07
  • I recently had this problem and my program was running out of file descriptors. So I replaced lazy IO with strict IO using strict `ByteString`. – Giorgio Jan 10 '15 at 09:03