How do I manage dependant values without running the same computation twice?

Question

I am working on an application that is essentially a calculator, not the handheld calculator but more of a spreadsheet. It will take many inputs across different views and show the outputs all over the place.

The application will be running many calculations. These calculations depend mostly on user given inputs but it can also depend on the results of other calculations.

Another requirement is that the calculations offered to the user can be edited in a CMS style. This means that when the application starts, it will load the calculations and its necessary inputs from a file called calculations content.

The outputs should always be up to date, meaning that if a user updates a value, then the calculations that depend on this input should run again, sending the output to its dependent calculations and so on.

So far, I've conceived a directed graph of calculations in which parent-children relationship represent the input and its dependent calculations. This way, the process that executes a calculation will be able to check if it has any dependants and run them.

The problem with this pattern is that it can lead to duplicated calculations. If an input A has two dependants B and C and a calculation D depends on both B and C, then when A is updated, D and its dependants will run two times.

It might be worth knowing that the application is built with a Redux architecture on Javascript.

@RobertHarvey: seems the link to the paper mentioned in that article is broken. — Doc Brown, Jan 11 '17 at 21:12
The two times that D updates, does it necessarily calculate the same value? Or, does the initial update temporarily hold an incorrect value? — svidgen, Jan 12 '17 at 05:38

Doc Brown · Answer 1 · 2017-01-11T21:44:49.880

Robert Harvey is right, your problem can be solved in an elegant manner by memoization. However, since the paper of the article he mentioned seems not to be available any more, I try to give you a short explanation how this should work.

Lets say A is an input variable, and B, C, and D calculations with memoized values mB, mC and mD. These values can either hold the calculation result of B/C/D, or they can be "invalid", which means if the result of B is requested, the corresponding calculation is performed once, not more, and the result is stored in mB. So when the result of B is requested again, mB will be returned directly without a second calculation.

Whenever a cell like A is is updated, your program needs to make a standard graph search through the dependency graph starting at A, and mark all nodes it can reach from there as invalid (in the case you described it will reach B, C and D). And that is all - there is no need to do any calculation in advance for B, C, or D as long as their value is not requested. And if those values are requested, the memoization mechanism described above will prevent any duplicate calculation, independently from the order in which the values are needed.

This kind of memoization is not very friendly with our current architecture. We are using Redux, which is a state manager that stores all of the data in an object called state and a know-it-all function processes the whole state once a tiny bit of the state has been changed, because it reacts to changes, I can't hide data (mark it invalid) and only show it when it's requested. Your answer has still helped me though. I hadn't considered making a subset graph of the dependency graph. I will use that to construct an ordered set of nodes to execute in which the repeated nodes go last. — Matias Faure, Jan 11 '17 at 23:15
@FaureHu: yes, constructing the subgraph and determining the right order to process the nodes will surely work. That was actually my first idea, too, how to approach the problem. However, when Robert suggested memoization, I thought it would be the easier way to implement (and to describe) this, because it gives one the correct order for the calculations almost automatically. However, I don't know anything about this "Redux" architecture you mentioned and which restrictions it imposes on your case. — Doc Brown, Jan 12 '17 at 06:24

score 1 · Answer 2 · answered Jan 12 '17 at 04:28

1

I think you're on the right track with the tree of calculations. Perhaps you could also cache the calculated values in the tree (or elsewhere, doesn't really matter). So when you try to perform a calculation, you check if it has a valid cached value, if so: use it, otherwise: execute the calculation.

When the user edits a value, trigger a recalculation of the tree, with the new values. Checking for cache hits (i.e. same calculation and same values) as you calculate may also help speed things up.

answered Jan 12 '17 at 04:28

Maybe_Factor

1,381
11
12

1

I fail to see any difference in your answer to what I already wrote 7 hours ago. – Doc Brown Jan 12 '17 at 06:28
The main point here is that memorization cache could be volatile and be dropped after calculation is done. – Basilevs Jan 12 '17 at 11:05

score 1 · Answer 3 · answered Feb 25 '17 at 18:42

This should really be a comment under Doc Brown's answer, but I don't have enough reputation to write comments yet.

There is an add-on library for Redux called Reselect which is designed to tackle precisely this sort of problem. It provides helper methods for creating a tree of memoized functions, which the authors refer to as selectors.

The key part is: you don't put the calculated values into the Redux state, only the inputs. Whenever your view layer needs to use some derived value foo, it calls fooSelector(state).

Because the state is immutable in Redux, the selectors can easily tell whether their inputs have changed just by checking for reference equality with the previous set of inputs. Some of those inputs can be the results of other selectors.

There is therefore no need to "mark data as invalid". Just call the selector whenever you need its result, and it will automatically do the minimum amount of recalculation necessary to return the correct answer.

This way, the view is guaranteed to stay in sync with the input data and you don't even have to think about the circumstances which will cause the results to change.

For more details, see my answer to a similar question: Functional model vs data model and React / Redux.

How do I manage dependant values without running the same computation twice?

3 Answers3