Documenting mathematical logic in code

Question

Sometimes, although not often, I have to include math logic in my code. The concepts used are mostly very simple, but the resulting code is not: a lot of variables with unclear purpose, and some operations with not-so-obvious intent. I don't mean that the code is unreadable or unmaintainable, just that it's waaaay harder to understand than the actual math problem. I try to comment the parts that are hardest to understand, but that runs into the same problem as the code itself: text does not have the expressive power of math.

I am looking for a more efficient and easy-to-understand way of explaining the logic behind some of the complex code, preferably in the code itself. I have considered TeX: writing the documentation and generating it separately from the code. But then I'd have to learn TeX, and the documentation would not be in the code itself. Another option I thought of is taking a picture of the mathematical notation, equations and diagrams written on paper or a whiteboard, and including it in the Javadoc.

Is there a simpler and clearer way?

P.S. Giving descriptive names (timeOfFirstEvent instead of t1) to the variables actually makes the code more verbose and even harder to read.

Learning TeX is not actually that difficult. If you have your code online anywhere, MathJax will pretty-print it in no time. Please remember there are languages such as [HAL/S](https://en.wikipedia.org/wiki/HAL/S) where your concerns were echoed a long time ago. — Deer Hunter, Jun 27 '13 at 13:15
Not to toot my own horn, but here is one example: http://meta.stackexchange.com/a/49787/141513 The idea is to write it so that someone who looks at it can understand what it does, even if they don't understand the math behind it. Good function and variable names and a simple comment or two are usually enough to do that. — BlueRaja - Danny Pflughoeft, Jun 27 '13 at 18:35
see also: [“Comments are a code smell”](http://softwareengineering.stackexchange.com/q/1/31260) — gnat, Nov 08 '16 at 07:31

score 33 · Accepted Answer · answered Jun 27 '13 at 12:06

The right thing to do in such circumstances is to implement the algorithm, formula or whatever with exactly the same variable names as in the primary real-world source (as far as the programming language allows this), and have a succinct comment above it saying something like "Levenshtein distance computation as described in [Knuth1968]", where the citation links to a readily accessible description of the math.
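A minimal Java sketch of that advice, using the Levenshtein example from the answer. The names m, n, i, j and d follow the usual dynamic-programming statement of the recurrence, so the loop body can be checked line by line against a printed formula. The class name and the `[REF]` citation tag are placeholders, not taken from any specific source.

```java
final class EditDistance {
    private EditDistance() {}

    /**
     * Levenshtein distance computation as described in [REF]
     * (placeholder: cite the source you actually followed).
     *
     * Notation (assumed to match that source): m = |a|, n = |b|, and
     * d[i][j] is the distance between the first i characters of a
     * and the first j characters of b.
     */
    static int levenshtein(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i; // i deletions
        for (int j = 0; j <= n; j++) d[0][j] = j; // j insertions
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,  // deletion
                                            d[i][j - 1] + 1), // insertion
                                   d[i - 1][j - 1] + cost);   // substitution
            }
        }
        return d[m][n];
    }
}
```

For example, `EditDistance.levenshtein("kitten", "sitting")` returns 3.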

(If you don't have such a reference, but your math is sound and useful, maybe you should consider publishing it yourself. Just sayin'.)

Kilian Foth


By _exactly the same variable names_ I suspect you intend for it to be described using the names of the coefficients (`StandardEarthGravityMPerSecSqr` instead of 9.80665) and a proper name for component techniques (`EulerLinearMomentum`), instead of relying simply on the constants and raw operators. – JustinC Jun 27 '13 at 13:36

@JustinC no, I think he means the same variable names, i.e. if it says `y = m*x + c` you use m, x and c as variables – jk. Jun 27 '13 at 13:57

@JustinC I meant: use only those variable and constant names that are in the publication, usually one-letter names like n, f, q or maybe n_i. I agree with the OP that `EulerLinearMomentum` is actually less readable than `m`. The point is that source code is not the preferred medium for expressing formulas, so the emphasis should be on making it easy to verify that the code does the same thing as the printed formula, not that the code satisfies the program requirements. – Kilian Foth Jun 27 '13 at 14:09

I would agree with that strategy; however, the text we are talking about is code, and code has underlying constraints, including a specific precision/scale and behavior (given a known host or target). You aren't spec'ing or designing a mathematical model; you are implementing it in code (in most cases). Without _proper_ names that describe what is represented, it's much harder to verify intent. – JustinC Jun 27 '13 at 14:35

+1. If the reference is to a recent publication, give the [DOI hyperlink](http://en.wikipedia.org/wiki/Digital_Object_Identifier) to the paper. Example http://dx.doi.org/10.1000/182. This is exactly what DOI was designed for - a short, standard URL for a publication, guaranteed never to change. – MarkJ Jun 27 '13 at 16:23

@jk, JustinC - yuk. I've had to go back in and maintain that kind of code, and I don't care how clear it would be to a pure mathematician; single-letter vars are *bad*. Mathematicians' symbols are not coding symbols (with the understandable exception of arithmetic). We don't get superscript and subscript, we don't get Greek letters with special (and often context-sensitive) meaning, and we don't multiply by just shoving two variables together. So, forget E=mc^2; the equation as it should be coded is `var energy = mass * Math.Pow(SPEED_OF_LIGHT, 2)`. – KeithS Jun 27 '13 at 17:06

@KeithS - Nonono. Never use `Math.Pow` for integer exponentiation... `SPEED_OF_LIGHT_SQUARED = SPEED_OF_LIGHT * SPEED_OF_LIGHT` Yet this is way too verbose: the reader has to know what `c_vacuum` is. – Deer Hunter Jun 27 '13 at 18:32

@KeithS It totally depends. For a small equation where every variable has a physical meaning, fine; but what if you are implementing, say, an FFT algorithm where there will be several partial results with no physical meaning? In these situations you absolutely should match the mathematical literature, because it *is* the domain language – jk. Jun 27 '13 at 18:48

I grudgingly agree with using the original, short variable names. @KeithS in my experience, the people maintaining such code are probably more comfortable with the short names. But, when they are declared, add a comment for others. e.g. `double m; // slope of Foo` – user949300 Jun 27 '13 at 18:56
I settled on explaining the purpose of the algorithm in a comment, and giving a link to the source. In my case the source will just be an internal document. – jmruc Jun 28 '13 at 06:11
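A minimal Java sketch of user949300's suggestion applied to the E=mc^2 example from the thread: keep the publication's short names (E, m, c) but comment each declaration with its meaning and unit. It also squares c by plain multiplication, as Deer Hunter recommends, rather than calling `Math.pow`. The class and method names are hypothetical.

```java
/** Mass-energy equivalence, E = m c^2 (short names as in the formula). */
final class MassEnergy {
    private MassEnergy() {}

    static final double c = 299_792_458.0; // speed of light in vacuum, m/s

    static double restEnergy(double m /* rest mass, kg */) {
        double E; // rest energy, J
        E = m * c * c; // c^2 written as c * c instead of Math.pow(c, 2)
        return E;
    }
}
```

The cost of the lowercase constant `c` (against Java naming conventions) is deliberate here: it keeps the line visually identical to the formula.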

score 8 · Answer 2 · answered Jun 27 '13 at 13:37

When I have had to implement algorithms like that, there are a couple of things I do.

As much as possible, isolate the algorithm in its own method or, preferably, its own class. My current project has its own equivalent of the Math class to which complex algorithms are added.
Provide a summary of what the algorithm is supposed to do in lay terms including any common acronyms or shorthand references to the term. I do this in the method itself, so it lives with the code.
Provide a summary of the algorithm in technical / mathematical terms and include any external references that I know of. Again, I do this with the method itself so it has a better chance of staying relevant. Plain text isn't great in this case, so I'll cite the mathematical term as best I can and clarify in a parenthetical comment beside it. For example, x^y (x raised to the power y)
Document how I'm breaking the algorithm apart into components and indicate what each variable represents in the algorithm, e.g. t1 is the time of the first event.
Code up the algorithm and comment the complex parts. Essentially, I'll add a comment anywhere I take a step that wasn't obvious or straightforward within the algorithm itself. In particular, I make sure to comment any non-obvious shortcuts I take in the implementation and explain why they are okay.
Write up some unit tests that will validate the operation of the algorithm.
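A compact Java sketch of those steps, using log-sum-exp as a stand-in algorithm (my choice of example, not the answerer's). `MathExtras` plays the role of the project's own Math-like class and is a hypothetical name. The Javadoc carries the lay summary, the technical summary with a parenthetical reading of the notation, and the variable list; the inline comment explains why the max-shift shortcut is safe.

```java
/**
 * Log-sum-exp (LSE).
 *
 * Lay summary: adds up numbers that are stored as logarithms without first
 * converting them back to ordinary numbers (which could overflow).
 *
 * Technical summary: LSE(x_1..x_n) = log(sum_i exp(x_i))
 *   (the natural log of the sum of e raised to the power x_i).
 *
 * Variables:
 *   x  - input values x_i
 *   mx - max_i x_i, used as a shift
 *   s  - sum_i exp(x_i - mx)
 */
final class MathExtras {
    private MathExtras() {}

    static double logSumExp(double[] x) {
        if (x.length == 0) throw new IllegalArgumentException("x must be non-empty");
        double mx = Double.NEGATIVE_INFINITY;
        for (double xi : x) mx = Math.max(mx, xi);
        if (Double.isInfinite(mx)) return mx; // all -inf, or some +inf: LSE equals mx
        // Shortcut: subtract mx before exponentiating. This is exact algebra,
        //   log(sum exp(x_i)) = mx + log(sum exp(x_i - mx)),
        // but every exp() argument is now <= 0, so exp() cannot overflow.
        double s = 0.0;
        for (double xi : x) s += Math.exp(xi - mx);
        return mx + Math.log(s);
    }
}
```

Unit tests for this would check, for instance, that `logSumExp(new double[] {0.0, 0.0})` equals `Math.log(2.0)` and that `{1000.0, 1000.0}` gives a finite result, where the naive `Math.exp(1000.0)` would overflow to infinity.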

Finally, if it's really, really, really complex then I resign myself to the fact that I own that code for the remainder of my time on that project.

I don't like relying upon an external document for someone else to understand the code. Yes, it can be necessary sometimes especially when getting into arcane details. But whenever possible, I try to keep everything within the code itself so it has a chance of staying updated and easily located. In this case, I value accessibility to information over expressiveness of the documentation.

score 6 · Answer 3 · answered Jun 27 '13 at 14:57

In our projects, which revolve around research in quantitative financial economics, we use a LOT of math, and we follow a combination of what has already been posted:

Provide a link to the main source you're using. For us, the easiest way of doing that is the BibTeX handle, which is basically an ID for a paper that everybody involved can look up. Depending on the specific source, we regularly add the equation reference as well.
Provide explanations for all variables. Again, we use TeX for that if the original paper uses Greek or other letters. The reason is that papers and books often use different notations. If someone needs to rework the math, this makes it a lot easier.
Attempt to code the equation in one piece; it is much easier to recognize that way. DO NOT paste the TeX code of the full equation into the code: either the equation is very short, and pasting TeX is messy and superfluous, or the equation is huge, and the TeX code is useless unless you compile it (use a reference instead). Disassembling an equation into small pieces makes it really hard to understand what's going on (if you're good at math, at least).
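A Java sketch of these three points, using the Black-Scholes d_1 and d_2 terms as an illustrative finance formula (my example, not the answerer's). The BibTeX key is a hypothetical placeholder, the equation number is left for you to fill in from your own source, and the variable list gives TeX names for the Greek symbols. Each formula is coded as a single expression so it can be compared directly with the printed version.

```java
/**
 * Black-Scholes d_1 and d_2 terms.
 *
 * Source: BibTeX key smith2013 (hypothetical placeholder);
 *         equation number: take it from the source you cite.
 *
 * Variables (TeX names for the printed symbols):
 *   S     ($S$)       spot price of the underlying
 *   K     ($K$)       strike price
 *   r     ($r$)       continuously compounded risk-free rate
 *   sigma ($\sigma$)  volatility of the underlying
 *   T     ($T$)       time to maturity, in years
 */
final class BlackScholes {
    private BlackScholes() {}

    // d_1 = (ln(S/K) + (r + sigma^2 / 2) T) / (sigma sqrt(T)), coded in one piece
    static double d1(double S, double K, double r, double sigma, double T) {
        return (Math.log(S / K) + (r + sigma * sigma / 2.0) * T) / (sigma * Math.sqrt(T));
    }

    // d_2 = d_1 - sigma sqrt(T)
    static double d2(double S, double K, double r, double sigma, double T) {
        return d1(S, K, r, sigma, T) - sigma * Math.sqrt(T);
    }
}
```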

IMHO, the most important realization is that formulas often depend on context. Every math paper I know takes its time to set up the environment of the model; you should do the same.

Explaining the context in detail is a great idea, focusing on the 'why' before the 'how' could really be helpful. — jmruc, Jun 28 '13 at 06:05

score 3 · Answer 4 · answered Jun 27 '13 at 12:53

text does not have the expressive power of math

You are right. Since you are already looking for a way to do it outside the code, and TeX is overkill besides having a steep learning curve, my recommendation is as follows:

Use OpenOffice.org/LibreOffice Math Equation Editor.

It's free. It's open.

You can use it either visually or you can write the equations in a special language.

You don't have to learn the language right away because when you use the GUI, the "code" is generated in a panel for you to see.

In the upper panel you can "draw" the equations using a palette. In the lower panel the equivalent notation is generated. Once you've got a grasp of the notation, you can do it the other way around: write the notation in the lower panel and see the graphical output in the upper panel.

edited Jun 27 '13 at 14:09

answered Jun 27 '13 at 12:53

Tulains Córdova


Then what? Include the plain-text code for the math notation in the original code as comments, or take a screenshot and use Javadoc like the OP said he might do with TeX? – dodgethesteamroller Jun 28 '13 at 17:19
@dodgethesteamroller Yes, my answer says "Since you already are looking for a way to do it outside code, and Tex is an overkill.." – Tulains Córdova Jun 28 '13 at 18:02
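One way to combine the two, as the first comment suggests, is to paste the editor's plain-text markup into a comment next to the code, where it can be re-rendered in LibreOffice Math when needed. A small Java sketch with the quadratic formula; the class name is hypothetical, and the markup line is my own transcription of the formula into LibreOffice Math's notation.

```java
final class Quadratic {
    private Quadratic() {}

    /**
     * Real roots of a x^2 + b x + c = 0.
     *
     * LibreOffice Math markup (paste into the formula editor to render):
     *   x = {-b +- sqrt{b^2 - 4 a c}} over {2 a}
     *
     * @return the "+" root first, then the "-" root
     * @throws IllegalArgumentException if a == 0 or the roots are complex
     */
    static double[] roots(double a, double b, double c) {
        double disc = b * b - 4 * a * c; // discriminant b^2 - 4ac
        if (a == 0 || disc < 0) {
            throw new IllegalArgumentException("need a != 0 and b^2 - 4ac >= 0");
        }
        double sq = Math.sqrt(disc);
        return new double[] { (-b + sq) / (2 * a), (-b - sq) / (2 * a) };
    }
}
```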
