Optimisations should not be kept in a permanent, long-term branch.
They can be developed in a branch and merged back into the release branch once complete, but it should always be possible to test both versions side by side, and to switch easily between the optimised and unoptimised versions of any given algorithm, ideally at run time.
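As a rough sketch (in Python, with entirely hypothetical names) of what "switchable at run time" can look like, keeping both variants callable behind a single entry point:

```python
# Hypothetical sketch: keep both implementations and select one at run time.

def sort_points_naive(points):
    """Straightforward reference implementation, easy to verify."""
    return sorted(points, key=lambda p: (p[0], p[1]))

def sort_points_optimised(points):
    """Optimised variant; it must produce exactly the same results as the naive one."""
    # ... the optimised implementation would go here ...
    return sorted(points, key=lambda p: (p[0], p[1]))

IMPLEMENTATIONS = {
    "naive": sort_points_naive,
    "optimised": sort_points_optimised,
}

def sort_points(points, variant="optimised"):
    """Dispatch to the chosen variant; a config flag or environment
    variable could drive 'variant' instead of a parameter."""
    return IMPLEMENTATIONS[variant](points)
```

Because both versions stay in the release branch, they can be benchmarked and regression-tested against each other at any time.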
Full answer
Remember that there is more than one form of optimisation.
Algorithmic optimisation (replacing an O(n^2) algorithm with an O(n log n) algorithm, for instance) almost always has the potential to provide more significant savings than implementation optimisation (hand-unrolling a loop or ensuring your inner loop's data fits in a certain level of cache).
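To make the distinction concrete, here is a small illustrative example (not from the original answer) of an algorithmic optimisation, answering the same question in O(n^2) and in O(n log n):

```python
# Illustrative only: the same question answered in O(n^2) and in O(n log n).

def has_duplicates_quadratic(values):
    """O(n^2): compare every pair of elements."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates_n_log_n(values):
    """O(n log n): sort once, then any duplicates are adjacent."""
    ordered = sorted(values)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```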
Unfortunately, when people think about optimisation they tend to think of implementation optimisation, which specialises software so that it runs more efficiently on one architecture, possibly at the expense of execution speed on another.
I have already answered many other aspects related to your question in my answer to the question How to document and teach others “optimized beyond recognition” computationally intensive code?
Essentially though, it boils down to the golden rules of optimisation:
The First Rule of Program Optimisation: Don't do it.
The Second Rule of Program Optimisation (for experts only!): Don't do it yet.
— Michael A. Jackson
Knowing whether now is the time to optimise requires benchmarking and testing. You need to know where your code is inefficient, so that you can target your optimisations.
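In Python, for example, the standard-library profiler is often enough to find the hot spots (the main() here is just a stand-in workload):

```python
import cProfile
import pstats

def main():
    # Stand-in for the real workload you want to profile.
    return sum(i * i for i in range(1_000_000))

# Profile the entry point and report the ten most expensive calls.
cProfile.run("main()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```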
To determine whether the optimised version of the code is actually better than the naive implementation at any given time, you need to benchmark them side by side on the same data.
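A minimal benchmarking sketch, reusing the hypothetical has_duplicates_* functions from above, might look like this:

```python
import random
import timeit

# Same input for both variants, generated once.
data = [random.randrange(1_000_000) for _ in range(5_000)]

# The two versions must agree before their timings mean anything.
assert has_duplicates_quadratic(data) == has_duplicates_n_log_n(data)

for name, func in [("naive", has_duplicates_quadratic),
                   ("optimised", has_duplicates_n_log_n)]:
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{name}: {seconds:.3f} s for 5 runs")
```

Real measurements should use representative data sizes and be repeated, but the point is that both variants are driven by exactly the same harness and the same input.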
Also, remember that just because a given implementation is more efficient on the current generation of CPUs doesn't mean it will always be so. My answer to the question Is micro-optimisation important when coding? details an example from personal experience where an obsolete optimisation resulted in an order-of-magnitude slowdown.