12

Here is a simple programming problem from SPOJ: http://www.spoj.com/problems/PROBTRES/.

Basically, you are asked to output the biggest Collatz cycle for numbers between i and j. (Collatz cycle of a number $n$ is the number of steps to eventually get from $n$ to 1.)

I have been looking for a Haskell way to solve the problem with comparative performance than that of Java or C++ (so as to fits in the allowed run-time limit). Although a simple Java solution that memoizes the cycle length of any already computed cycles will work, I haven't been successful at applying the idea to obtain a Haskell solution.

I have tried the Data.Function.Memoize, as well as home-brewed log time memoization technique using the idea from this post: https://stackoverflow.com/questions/3208258/memoization-in-haskell. Unfortunately, memoization actually makes the computation of cycle(n) even slower. I believe the slow down comes from the overhead of the Haskell way. (I tried running with the compiled binary code, instead of interpreting.)

I also suspect that simply iterating numbers from i to j can be costly ($i,j\le10^6$). So I even tried precompute everything for the range query, using idea from http://blog.openendings.net/2013/10/range-trees-and-profiling-in-haskell.html. However, this still gives "Time Limit Exceeding" error.

Can you help to inform a neat competitive Haskell program for this?

  • 10
    This post seems fine to me. It's an algorithmic problem that needs a proper design to achieve adequate performance. What we really don't want here is "how do I fix my broken code" questions. – Robert Harvey Feb 16 '16 at 21:13

2 Answers2

7

I'll answer in Scala, because my Haskell isn't as fresh, and so people will believe this is a general functional programming algorithm question. I'll stick to data structures and concepts that are readily transferable.

We can start with a function that generates a collatz sequence, which is relatively straightforward, except for needing to pass the result as an argument to make it tail recursive:

def collatz(n: Int, result: List[Int] = List()): List[Int] = {
   if (n == 1) {
     1 :: result
   } else if ((n & 1) == 1) {
     collatz(3 * n + 1, n :: result)
   } else {
     collatz(n / 2, n :: result)
   }
 }

This actually puts the sequence in reverse order, but that's perfect for our next step, which is to store the lengths in a map:

def calculateLengths(sequence: List[Int], length: Int,
  lengths: Map[Int, Int]): Map[Int, Int] = sequence match {
    case Nil     => lengths
    case x :: xs => calculateLengths(xs, length + 1, lengths + ((x, length)))
}

You would call this with the answer from the first step, the initial length, and an empty map, like calculateLengths(collatz(22), 1, Map.empty)). This is how you memoize the result. Now we need to modify collatz to be able to use this:

def collatz(n: Int, lengths: Map[Int, Int], result: List[Int] = List()): (List[Int], Int) = {
  if (lengths contains n) {
     (result, lengths(n))
  } else if ((n & 1) == 1) {
    collatz(3 * n + 1, lengths, n :: result)
  } else {
    collatz(n / 2, lengths, n :: result)
  }
}

We eliminate the n == 1 check because we can just initialize the map with 1 -> 1, but we need to add 1 to the lengths we put in the map inside calculateLengths. It now also returns the memoized length where it stopped recursing, which we can use to initialize calculateLengths, like:

val initialMap = Map(1 -> 1)
val (result, length) = collatz(22, initialMap)
val newMap = calculateLengths(result, lengths, initialMap)

Now we have relatively efficient implementations of the pieces, we need to find a way to feed the results of the previous calculation into the input of the next calculation. This is called a fold, and looks like:

def iteration(lengths: Map[Int, Int], n: Int): Map[Int, Int] = {
  val (result, length) = collatz(n, lengths)
  calculateLengths(result, length, lengths)
}

val lengths = (1 to 10).foldLeft(Map(1 -> 1))(iteration)

Now to find the actual answer, we just need to filter the keys in the map between the given range, and find the max value, giving a final result of:

def answer(start: Int, finish: Int): Int = {
  val lengths = (start to finish).foldLeft(Map(1 -> 1))(iteration)
  lengths.filterKeys(x => x >= start && x <= finish).values.max
}

In my REPL for ranges of size 1000 or so, like the example input, the answer returns pretty much instantaneously.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
3

Karl Bielefeld has already answered the question well, I'll just add a Haskell version.

First a simple, non-memoizing version of the basic algorithm to show off the efficient recursion:

simpleCollatz :: Int -> Int -> Int
simpleCollatz count 1 = count + 1
simpleCollatz count n | odd n     = simpleCollatz (count + 1) (3 * n + 1)
                      | otherwise = simpleCollatz (count + 1) (n `div` 2)

That should be almost self-explaining.

I, too, will be using a simple Map to store the results.

-- double imports to make the namespace pretty
import           Data.Map  ( Map )
import qualified Data.Map as Map

-- a new name for the memoizer
type Store = Map Int Int

We can always lookup our final results in the store, so for a single value the signature is

memoCollatz :: Int -> Store -> Store

Let's start with the end case

memoCollatz 1 store = Map.insert 1 1 store

Yes we could add that beforehand, but I don't care. Next simple case please.

memoCollatz n store | Just _ <- Map.lookup n store = store

If the value is there, then it is. Still doing nothing.

                    | odd n     = processNext store (3 * n + 1)
                    | otherwise = processNext store (n `div` 2)

If the value is not there we have to do something. Let's put the in a local function. Notice how this part looks very close to the "simple" solution, only the recursion is a bit more complex.

  where processNext store'' next | Just count <- Map.lookup next store''
                                 = Map.insert n (count + 1) store''

Now we finally do something. If we find the computed value in the store'' (sidenote: there are two haskell syntax highlighters, but one is ugly, the other one gets confused by the prime symbol. That's the only reason for the double-prime.), we just add the new value. But now it gets interesting. If we don't find the value, we have to both compute it and do the update. But we already have functions for both! So

                                | otherwise
                                = processNext (memoCollatz next store'') next

And now we can compute a single value efficiently. If we want to compute several, we just pass on the store via a fold.

collatzRange :: Int -> Int -> Store
collatzRange lower higher = foldr memoCollatz Map.empty [lower..higher]

(It's here that you could initialize the 1/1 case.)

Now all we have to do is to extract the maximum. For now there can't be a value in the store that is higher than one in the range, so it's enough to say

collatzRangeMax :: Int -> Int -> Int
collatzRangeMax lower higher = maximum $ collatzRange lower higher

Of course if you want to compute several ranges and share the store between those computations as well (folds are your friend) you would need a filter, but that's not the main focus here.

MarLinn
  • 161
  • 4
  • 1
    For added speed, [`Data.IntMap.Strict`](https://hackage.haskell.org/package/containers-0.5.7.1/docs/Data-IntMap-Strict.html) should be used. – Olathe Jul 20 '16 at 03:46