Simplifying Some Probabilistic If-Then Spaghetti Code

Question

I've got a fairly substantial code base that I'm trying to simplify. One section in particular deals with probabilistically creating objects. It currently uses hundreds of random number generator calls in if-then statements to achieve the desired result.

Example:

if rand(20) >= 2 then
    //Make Object A
if rand(20) >= 5 then
    //Make Object A
if rand(20) >= 14 then
    //Make Object A
if rand(20) >= 18 then
    //Make Object A

if rand(100) >= 25 then
    if rand(10) < 8 then
        //Make Object B
    elseif rand(10) < 8 then
        //Make Object C
    else
        //Make Object D

My first idea is to use some kind of weighted statistical distribution to choose how many objects I need to make. However, I don't actually know if that would be an improvement in readability and maintainability. I probably wouldn't even know how to begin doing that with some of the bigger nested if-then trees, either.

The if-then trees fall into a few broad categories:

Make 0 to n of the same object
Make a single object, picked from several choices.
The above two combined (make 0 to n objects, with each one chosen from several possibilities)

Are there any easier ways that I can simplify code like this?

I just need to approximate the behavior in an easily readable and editable form.

For example, if it is a game, some approaches might be able to achieve a same level of "playability" even if it did not preserve the same statistical behavior of the same code. If it is a statistical simulation (i.e. a scientific-minded project), such changes in behavior may not be acceptable. — rwong, Aug 13 '15 at 01:28
[Monte Carlo method (wikipedia.org)](https://en.wikipedia.org/wiki/Monte_Carlo_method) — rwong, Aug 13 '15 at 01:29
It's not statistically rigorous, I just need to roughly approximate the behavior in an easily readable and editable form. — Christopher Sheaf, Aug 13 '15 at 01:37
Related reading: **[Generate random numbers with certain spikes?](http://programmers.stackexchange.com/q/274424/22815)** and its marked duplicates. — , Aug 13 '15 at 01:42
What language? I would approach this differently if I had access to functional programming than I would if I was limited to pure procedural code. — Cort Ammon, Aug 13 '15 at 01:53
Should ou really be calling for a different random each time (as in the four first calls to `random(20)`)? Or should you actually create a _single_ random number, and sequentially test if it is larger than 2, then 5, 14 and 18? — heltonbiker, Aug 13 '15 at 02:49
Your first block of code is either not accurately commented or not shown as actually written. As written here, you have a 4/5 chance of creating an object A on the second try *regardless* of the result of the first try. Either the code or the comment is wrong. Which is it? — itsbruce, Aug 13 '15 at 07:59
@itsbruce The first four 'if' statements are all independent and are written correctly. I probably shouldn't have used the words "another", "third" and "last". I was trying to convey that multiple object A's could be created. — Christopher Sheaf, Aug 13 '15 at 08:22
The ambiguity caused by the comments could complicate answers. @CortAmmon , in his answer, saw the ambiguity (he mentions it) assumed the comments were wrong, called it right. If his assumption had been wrong, wasted answer. Better to make the comments more accurate... and you just did that. Thanks. — itsbruce, Aug 13 '15 at 08:27

score 8 · Accepted Answer · answered Aug 13 '15 at 04:58

First things first, I would get a handle on the probabilities of any event occurring. Get out a pen and paper, we'll do this by hand.

Make everything independent when you can. For example, when you see the series of if's at the start:

if rand(20) >= 2 then
    //Make Object A
if rand(20) >= 5 then
    //Make another Object A
if rand(20) >= 14 then
    //Make a third Object A
if rand(20) >= 18 then
    //Make one last Object A

We should recognize that these are 4 independent draws. Under each one, we have two possibilities: created an object A, or "did nothing." Assign each a probability using ratios, so the first if statement's probability of making an object is 2/20.

Other sections will not allow you to treat them as independent draws because of the 'elseif' patterns.

if rand(100) >= 25 then
    if rand(10) < 8 then
        //Make Object B
    elseif rand(10) < 8 then
        //Make Object C
    else
        //Make Object D

will require a tree to describe the probabilities

           X
          / \
p=25/100 /   \ p=75/100
   Do nothing X
             / \
   p=8/10   /   \  p=2/10
        Make B   X
                / \
        p=8/10 /   \ p=2/10
           Make C Make D

Now so far this is just trying to get a handle on what has been written. The next step is to try to simplify it. The idea is that you can shuffle the rolls around, so long as the probabilities are the same. For example, just with some multiplication I can see the probabilities of creating a B C or D.

           X
          / \
p=25/100 /   \ p=75/100
   Do nothing X
   (p=.25)   / \
   p=8/10   /   \  p=2/10
        Make B   X
  (p=.75*.8=.6) / \
        p=8/10 /   \ p=2/10
           Make C Make D
(p=.75*.2*.8=.12)  (p=.75*.2*.2=.03)

Now, from looking at all of these probabilities, they can all be expressed as a rational ratio with 100 on the bottom (.12 = 12/100; .03 = 3/100). Accordingly, we can handle all of this with a single roll, which makes the logic a whole ton simpler.

roll = rand(100) -- a random roll from [0,100) is sufficient for the whole block
if roll < 60 then -- we'll hit this with a probability of .6
    // make object B
elseif roll < 72 then -- 72-60 = 12, so we'll hit this with a probability of .12
    // make object C
elseif roll < 75 -- 75-72 = 3, so we'll hit this with a probability of .03
    // make object D
else -- 100-75=25, so we'll hit this with a probability of .25
    -- do nothing.  This is the equivalent of not entering the
    -- outermost if statement, with a probability of .25.
end

The other pattern I see here is that there are many independent draws which all create the same object. You're right: it would be nice to simplify these. You can use a joint probability table to join them. For example, we can take the first two "make object A" cases.

                   Nothing(p=.3)              Make A (p=.7)  <-- second if
v--- first 'if'  +-------------------------+--------------------------+
Nothing(p=.15)   | Nothing (p=.3*.15=.045) | Make A (p=.7*.15=.105)   |
                 +-------------------------+--------------------------+
Make A (p=.85)   | Make A (p=.3*.85=.255)  | Make 2x A (p=.7*.85=.595 |
                 +-------------------------+--------------------------+

You can apply this process as many times as you like, until you get a table containing the probability of getting 1A, 2A, 3A, or 4A.

When you are done with all of this, what you should see is a small number of draws, followed by a lookup from a few tables to determine the outcome based on those draws. That should be enough to get a firm grasp on the probability spaghetti code.

From there, you could use some functional programming to make this cleaner. You could build a table object which looks up a draw, and returns a function which will do the correct task for that slot in the table. This has the advantage of being a very standard way to do probabilistic outcomes, so any future developer who looks at it will immediately recognize the random draw table and be comfortable with it.

This is probably the most readable way to simplify the code, in my opinion. It almost seems obvious in hindsight. Thank you for the step-by-step breakdown with nice ASCII art. — Christopher Sheaf, Aug 14 '15 at 07:53

score 0 · Answer 2 · answered Aug 13 '15 at 03:45

When I was faced with something like this a while back my solution:

All this information was stored as an external text file, loaded at program startup. It was a collection of tables (all the symbolic resolution was done upon loading, no searching when it was called) with a list of the odds of that entry and either a list of entries or a pointer to another table. Having everything in a human-editable text file made life much easier. The loader also summed all the probabilities, if you extended a table in the future you didn't have to try to rebalance everything in a table to make it total to the same total (like your rand(20)).

Simplifying Some Probabilistic If-Then Spaghetti Code

2 Answers2