Literate programming vs reasonably documenting your code

Question

In a project that aspires from the outset to be maintainable across a revolving team of developers, what difference would it make to use literate programming instead of thorough commenting guidelines?

The latter would imply: classes with explicit statements of what they do and why they are there, with examples, non-cryptic error codes, variables with inline explanations, and a style guide that forces developers to use plain English and full sentences, eschew abbreviations, and so on. Add to that the fact that an IDE could collapse the details, or you could just extract the docs.

Could it be that literate programming was a solution to a problem that was tackled meanwhile through other means? Could it be that back then, when literate programming was created, some languages/tools wouldn't allow for simple mechanisms like these?

Does this answer your question? [How would you know if you've written readable and easily maintainable code?](https://softwareengineering.stackexchange.com/questions/141005/how-would-you-know-if-youve-written-readable-and-easily-maintainable-code) — gnat, Jul 22 '20 at 10:08
see also: [“Comments are a code smell”](https://softwareengineering.stackexchange.com/q/1/31260) — gnat, Jul 22 '20 at 10:09
@gnat: your first comment is completely off-track. The second one is OK (the OP should read it to understand that tons of comments are not a solution to maintainability), though it does not provide an answer to the question. — Doc Brown, Jul 22 '20 at 10:09
You mean [literate programming by Knuth](https://en.wikipedia.org/wiki/Literate_programming), I guess? — Doc Brown, Jul 22 '20 at 10:14
It’s extraordinarily difficult to get both management and devs to buy-in and then truly follow through on the latter. I cannot imagine what level of organizational commitment it would take to make the former actually happen. I suspect that this is the real reason that it is so rarely seen. — RBarryYoung, Jul 23 '20 at 00:32
Seems the problem is twofold. First, many programmers simply can't (not won't) write reasonably literate & comprehensible blocks of text, so programming careers would be open only to people who can both code and write. This severely restricts the pool of potential employees. Second, judging from the examples I've seen of it, "literate programming" is much more difficult to read & understand than plain old C. — jamesqf, Jul 23 '20 at 03:15
There is still a problem that both methods share: they need strong discipline from all developers so that all comments are updated or deleted when the code changes. Trust me, even with dedication, it's easy to forget something when in a hurry. And some developers have the habit of not reading any comments at all. Even with that discipline, there is no guarantee that someone updating something will (or can) think of all the consequences for related code once the project becomes really complex. They are a good start, but it takes more than that to be fully maintainable — Kaddath, Jul 23 '20 at 10:46
The reason you cannot do it in code alone is that code only answers "how", not "why". Code-near documentation helps with this. This is also why code review is so important! — Thorbjørn Ravn Andersen, Jul 23 '20 at 13:40
I'm having a little bit of trouble following the exact question as worded. Perhaps it can be reworded just a little bit. — Panzercrisis, Jul 23 '20 at 20:05
*"... a style guide that forces developers to use plain English, full sentences, eschew abbreviations and so on ..."* How are you going to automate that? While those kinds of errors can be caught by humans in a code review, they cannot be automated. And there's nothing wrong, per se, with acronyms. Example: Everyone who works even remotely with NASA knows what NASA is. There's no need to spell it out every time if NASA is the target audience. — David Hammen, Jul 23 '20 at 20:49
The way you described it, whichever way you go, it's a disaster waiting to happen; it's too detailed, too granular to be useful, or usable, or maintainable. You're going to end up with a lot of clutter that is going to obscure what the code is actually doing, is going to get in the way of changes, will result in a lot of extraneous rework, and on top of that, it's going to *inevitably* get out of sync with the actual code so it's eventually going to become misleading. And your team is going to see it as a form of micromanagement. 1/2 — Filip Milovanović, Jul 24 '20 at 02:58
IMO, you shouldn't be the one to set these standards, because at the moment, your idea of how it should all look is somewhat skewed and isn't going to work in practice. Instead, let your programming team set their own standards. If they lack experience, provide training, and/or look into how other companies are doing it, maybe bring in a consultant. It's not about being thorough "out-of-band", it's about readable code, about documenting the main concepts, the important APIs or integration points, and about human-to-human communication within the team, and beyond, with other stakeholders. 2/2 — Filip Milovanović, Jul 24 '20 at 02:59
I think the present-day use of Jupyter Notebooks is what became (without intent) of literate programming (at least in science) — lalala, Jul 24 '20 at 09:27

amon · Answer 1 · 2020-07-23T18:46:45.920

Literate programming is the nice idea that you can write your code together with an explanation or walkthrough of that code. Importantly, you are not constrained by the syntax of the underlying programming language but can structure your literate program in any way you want. (Literate programming involves chunks of code embedded into text, not comments into code.)

There are three huge problems with literate programming: it takes a lot of effort, there is little tooling, and changes become more difficult.

1. Documentation always requires effort. Literate programming requires less effort than maintaining separate documentation of comparable quality. However, this amount of effort is still unwarranted for most kinds of code. A lot of code is not interesting and requires little discussion; it's mostly just delegating stuff to some framework. The kind of tricky logic that benefits from literate programming is comparatively rare.
2. While there are various tools for literate programming (including Knuth's original WEB, and decent support in the Haskell ecosystem), they all suck. The next-best thing I've come across is org-mode, but that requires the use of Emacs. The problem is that programming is more than typing letters; it's also debugging and navigating code, which benefits greatly from an IDE-style experience. Auto-complete is non-negotiable! Literate programming tools also tend to require non-standard build processes, or mess up line numbers in error messages – not acceptable. If a tool makes your code easier to understand but harder to debug, that's not necessarily a good choice.
3. Related to this is the issue that changes to literately programmed software become more difficult. When you refactor code, you also have to restructure the document. But while you have a compiler or linter to ensure that your code continues to make sense, there's no guarantee that you haven't disrupted the structure of the document. Literate programming is writing and programming in equal parts.

So while full-blown literate programming does not seem to have a place in modern software development, it is still possible to reap some of the benefits. Consider in particular that literate programming is now over 35 years old, so a lot has happened in the meantime.

- Extracting a function with a useful name has many of the same benefits as a chunk of code in literate programming. It's arguably even better because variable names get their own separate scope. Most programming languages allow functions to be defined in an arbitrary order, which also allows you to structure the source code within a file in a sensible manner.
- Literate programming can be used to describe the “why” of code in a human-readable manner. A somewhat related idea is to express requirements for your program in a format that is both human- and machine-readable, e.g. as suggested by BDD. This forms a kind of executable specification.
- Some markup languages have the ability to pull code snippets from your source code. This lets the code be code and lets you construct a narrative around these snippets, without having to duplicate, copy, or update the code. Unfortunately, the popular Markdown has no built-in mechanism for that (but RST, AsciiDoc, and LaTeX+listings do). This is possibly the best current alternative for creating literate programming-style documents.
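
As a rough illustration of the snippet-pulling idea in the last point, the following Python sketch extracts a named region from a source file so that a document generator could paste it into the surrounding prose. The marker comments (`# tag::name[]` and `# end::name[]`) are modeled on AsciiDoc-style tagged regions, but the `extract_snippet` helper and the sample source are hypothetical and not part of any real tool.

```python
import re


def extract_snippet(source: str, tag: str) -> str:
    """Return the lines between '# tag::<tag>[]' and '# end::<tag>[]'.

    Hypothetical helper: the marker format is an assumption modeled on
    AsciiDoc-style tagged regions.
    """
    start = re.compile(rf"^\s*#\s*tag::{re.escape(tag)}\[\]\s*$")
    end = re.compile(rf"^\s*#\s*end::{re.escape(tag)}\[\]\s*$")
    collected = []
    inside = False
    for line in source.splitlines():
        if start.match(line):
            inside = True
        elif end.match(line):
            if inside:
                return "\n".join(collected)
        elif inside:
            collected.append(line)
    raise KeyError(f"no complete region tagged {tag!r}")


# Sample source with one tagged region (invented for this example).
SAMPLE_SOURCE = """\
import math

# tag::area[]
def circle_area(radius):
    return math.pi * radius ** 2
# end::area[]

print(circle_area(2.0))
"""

# The documentation build would embed this text instead of a hand-made copy.
snippet = extract_snippet(SAMPLE_SOURCE, "area")
print(snippet)
```

Because the documentation pulls the region from the real source at build time, the narrative cannot silently drift away from the code it quotes, although the prose around the snippet can still become outdated.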

"*Literate programming involves chunks of code embedded into text, not comments into code.*" - This immediately makes me think of [Rust's doc tests](https://doc.rust-lang.org/book/ch14-02-publishing-to-crates-io.html#documentation-comments-as-tests), i.e. small sample bits of code embedded in documentation comments that are actually runnable tests. Not quite the same, as I don't think there's any straightforward way of different samples to interact with each other, but seems somewhat related to the literate programming concept. — 8bittree, Jul 22 '20 at 16:24
@8bittree yes, doctests (borrowed from Python) are a play on the “executable documentation” idea and are brilliant, but don't allow larger programs to be assembled in this style (in Rust, every doctest is its own program). Jupyter is another example that allows code and text to be interleaved, but Jupyter's execution model precludes presenting a program in arbitrary order – code blocks are executed top to bottom. — amon, Jul 22 '20 at 16:34
To more explicitly state it: Code is checked by a compiler. Comments are not. The two are destined to diverge, and you easily end up with comments that deceive. I like executable documentation like in Python, but remember that only the code is validated to work. The comments explaining the valid code snippet could be *completely* wrong. — Alexander, Jul 22 '20 at 20:25
I agree with the modern IDEs not handling a file being more than one thing at the same time well, and that it is a problem for writing documentation and code in the same file. Markdown works reasonably well. I found that if the file could be open in multiple windows in the same editor - one for writing code, and the other for documentation - that worked reasonably well for me. — Thorbjørn Ravn Andersen, Jul 23 '20 at 09:16
There's a fourth problem with literate programming: redundancy. Saying what you're going to do, then doing it. This is part of the reason why changes are so difficult - in this case it mirrors the effect in traditional programming where stale comments are left after the code has been changed. That's why the advice is so often to write code to be readable, so that comments aren't needed. — user7761803, Jul 23 '20 at 11:49
@user7761803: You still need to somehow capture the "why" and intent. It can take *years* to get into a code base that is only a naked implementation (even with reasonable naming of functions and variables). The original developers may not be available for questions for whatever reason. For example, part of the current implementation may be half-baked or incorrect (not what was intended). — Peter Mortensen, Jul 23 '20 at 13:18
@Alexander I would say it's less about "validated to work" (just because code compiles, it is far from guaranteed to work; compiling is a very low bar to clear in most languages) and more about the possible divergence. — Voo, Jul 24 '20 at 09:53
Literate programming is still used by a few people. The [Inform 7 compiler/tools](http://inform7.com/talks/2020/06/07/narrascope-ii.html) by Graham Nelson are written in literate programming, and total over 300K LOC. Whether it's worthwhile is hard to answer; it's certainly contributed to the many year delay of the open sourcing of Inform 7. — curiousdannii, Jul 25 '20 at 05:45
@user7761803 when you mention redundancy, I think of unit-tests, where one of the benefits is literally to make you consider each choice more than once (add the if-condition, and test it's making the right choice etc) - which is redundancy but in a good way! Especially when the test is called RecordNameTest.shouldRetainOracle8Compatibility() or something, so when you make a change and it fails, you have a clue why the code was there. A pipe-dream, I know, but still ... — SusanW, Jul 25 '20 at 10:28
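
The doctest idea discussed in this thread can be sketched in Python as follows. The `slugify` function and its examples are invented for illustration; the sketch only uses the standard `doctest` module to run the examples embedded in the docstring.

```python
import doctest


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug.

    The examples below are documentation and tests at the same time:

    >>> slugify("Literate Programming")
    'literate-programming'
    >>> slugify("  Hello,   World!  ")
    'hello-world'
    """
    # Replace every non-alphanumeric character with a space, then split.
    words = "".join(ch if ch.isalnum() else " " for ch in title).split()
    return "-".join(word.lower() for word in words)


# Parse the docstring into a test and run only those examples.
parser = doctest.DocTestParser()
test = parser.get_doctest(slugify.__doc__, {"slugify": slugify}, "slugify", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)
```

Only the `>>>` examples are executed and checked. The sentence "Turn a title into a lowercase, hyphen-separated slug" is never validated, which is exactly the divergence risk raised in the comments.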

score 27 · Answer 2 · answered Jul 22 '20 at 13:02

Literate programming is great in situations where the code is mostly there in support of the prose. That's why Jupyter notebooks and similar are common for scientific programming. I also use it when I am teaching a programming workshop.

In other situations, people often mistakenly think of the comments as being for humans and the code as being for the computer. If that were so, we may as well write in machine code, because the computer doesn't care. Instead we write in high-level programming languages because it's easier for humans to read and write.

Maintainability isn't achieved by "saving" code with copious comments. Messy code is really difficult to write clean documentation for, especially if you're asking the same person to write both. Clean code mostly stands on its own, with comments and other documentation playing a supporting role.

answered Jul 22 '20 at 13:02

Karl Bielefeldt

Would jupyter notebooks (haven't used them) be useful for implementing a large multi-module program, or is it better to document interactions with an existing program? – Thorbjørn Ravn Andersen Jul 23 '20 at 09:04
@ThorbjørnRavnAndersen No, they’re generally recognised to be terrible at that. Some people (me included) even question their usefulness for their purported main purpose, which is exploratory data analysis (but also teaching). Joel Grus gave an influential talk on the subject, [*I don’t like notebooks*](https://www.youtube.com/watch?v=7jiPeIFXb6U). – Konrad Rudolph Jul 23 '20 at 10:23
@ThorbjørnRavnAndersen, in my mind, jupyter notebooks are in the same category of tools as UML diagrams. Very useful when you're talking about code. Not so much in the day to day of building code. – Karl Bielefeldt Jul 23 '20 at 14:23
@ThorbjørnRavnAndersen Although it was designed to create documents with embedded code, in my experience very few people I know use it for that purpose. Instead, they use it as a terminal that speaks Python and has access to their database. Jupyter notebook is most commonly used as a command line (the fact that it saves everything feels more like a history feature than a way of constructing documentation) – slebetman Jul 24 '20 at 03:11
@slebetman It's great as a command-line replacement; I can also imagine a more general scripting engine built around the concept - really, anything where you want a freeform organization of code, with a bit of context. I've been working on a PoC of literate programming for game scripting - most of that is built around mostly isolated snippets of code that benefit greatly from context. I expect the smaller the pieces of code you need to understand the code, the more value you get from literate programming. – Luaan Jul 24 '20 at 10:21

Lie Ryan · Answer 3 · 2020-07-22T12:41:29.377

The modern incarnation of the core idea of literate programming seems to be Jupyter notebooks.

In Knuth's own words: "The main idea [of literate programming] is to treat a program as a piece of literature, addressed to human beings rather than to a computer". That's pretty much what a Jupyter notebook is: a piece of literature for humans to read and share ideas, which just happens to contain interactive, executable code.

Generally, it seems like literate programming only makes sense if you're writing an academic paper or an article, and you want to include executable code in that paper/article.

"literate programming was a solution to a problem that was tackled meanwhile through other means"

Yes, modern programming tends toward improving the readability of the code itself rather than adding comments. If code isn't readable without comments, then the code should be refactored so that it is readable without comments. This is mainly done through judicious use of intent-revealing names and structures.
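
A small, hypothetical Python sketch of that kind of refactoring: the first version relies on a comment to explain a condition, while the second moves the explanation into names. The domain (order discounts), the thresholds, and all identifiers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    total: float
    customer_years: int


# Before: the comment carries the meaning, the code does not.
def price_before(o: Order) -> float:
    # loyal customers with big orders get 10% off
    if o.customer_years >= 3 and o.total > 100:
        return o.total * 0.9
    return o.total


# After: names reveal the intent, so the comment is no longer needed.
LOYALTY_YEARS = 3
BULK_ORDER_THRESHOLD = 100.0
LOYALTY_DISCOUNT = 0.10


def is_loyal_customer(order: Order) -> bool:
    return order.customer_years >= LOYALTY_YEARS


def is_bulk_order(order: Order) -> bool:
    return order.total > BULK_ORDER_THRESHOLD


def price_after(order: Order) -> float:
    if is_loyal_customer(order) and is_bulk_order(order):
        return order.total * (1 - LOYALTY_DISCOUNT)
    return order.total
```

Both versions compute the same price. The refactored one states what the rule is without a comment, but it still cannot say why the business chose three years or a 10% discount; that kind of rationale has to live somewhere other than the names.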

There is a vast difference between "readable without comments" and "understandable without comments". Programs tell you _how_, not _why_, and you need both to maintain them. — Thorbjørn Ravn Andersen, Jul 23 '20 at 09:03

score 2 · Answer 4 · answered Jul 22 '20 at 10:47

Literate programming was a great idea at the time (and in fact I used it back then to write my only piece of software with a public release and some user base, which never received a single error report).

But: there are some "buts":

- IDEs nowadays often use some kind of real-time-while-you-type compilation to show errors immediately. This probably would not work.
- Software used to be much smaller in the 80s. The complete codebase of TeX would be less than a single module in any piece of enterprise software today. Therefore, it is very difficult to structure software as a single train of thought now, which is where literate programming really shines.
- I am not aware of any source-level-debugger for literate programming. (Enlighten me via a comment if there is one.) This is a tool which I'd never go without.

answered Jul 22 '20 at 10:47

mtj

It would be great today, but nobody creates the tools. – gnasher729 Jul 22 '20 at 10:51
I did some experiments with writing code using nuweb back in the day, and added the necessary directives to have the compiler understand the `.w` file was the source, so the debugger (gdb iirc) used that instead. – Thorbjørn Ravn Andersen Jul 23 '20 at 12:34
The popularity of languages that do code generation (see the multitude of languages that compile down to JavaScript) means that there's actually pretty good debugger support in some stacks these days for distinguishing between the generated code corresponding with a given line, and the original source corresponding with that generated line. Building plumbing to generate such a map from the input to your literate-programming tool does not strike me as an insurmountable task. – Charles Duffy Jul 24 '20 at 22:12
Can you include some background info on your software which had no error reports (a link would be fine)? I'm interested in knowing more. – user3067860 Jul 30 '20 at 14:11
Too long ago. It was a program named "shark" made to gobble up the contents of "fish disks" which were the most popular freeware library on the Amiga in the 90s. I don't have the code any more. – mtj Jul 30 '20 at 17:49
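
The line-mapping idea raised in the comments above can be sketched in a few lines of Python. This toy `tangle` function is hypothetical: it assumes a literate source in which code lines are indented by four spaces and everything else is prose, which is not the format of WEB, CWEB, or nuweb. It emits the code together with a table from generated line numbers back to document line numbers, the kind of information a debugger or error reporter would need.

```python
import traceback


def tangle(literate_source: str):
    """Split a toy literate document into code plus a line map.

    Assumption: lines starting with four spaces are code; all other
    lines are prose. Returns (code, line_map), where line_map[i] is the
    1-based document line that produced generated line i + 1.
    """
    code_lines = []
    line_map = []
    for source_lineno, line in enumerate(literate_source.splitlines(), start=1):
        if line.startswith("    "):
            code_lines.append(line[4:])
            line_map.append(source_lineno)
    return "\n".join(code_lines), line_map


DOCUMENT = """\
We first define a helper that doubles a number.

    def double(x):
        return 2 * x

Then we use it on a value that is deliberately wrong,
so that the call fails at run time.

    result = double(None)
"""

code, line_map = tangle(DOCUMENT)

failing = original_line = None
try:
    exec(compile(code, "<tangled>", "exec"), {})
except TypeError as error:
    # The innermost traceback entry is the failing line in the generated code.
    failing = traceback.extract_tb(error.__traceback__)[-1].lineno
    # Translate it back to the line of the literate document.
    original_line = line_map[failing - 1]
    print(f"generated line {failing} came from document line {original_line}")
```

A real tool would have to handle chunks that are reordered or reused, but the principle is the same as the source maps used by languages that compile to JavaScript: keep a table while generating code, and consult it whenever a generated line number needs to be reported.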

score 2 · Answer 5 · answered Jul 22 '20 at 12:40

Wikipedia says:

"The literate programming paradigm, as conceived by Knuth, represents a move away from writing computer programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts."

I don't think modern programming languages impose many order constraints that matter, so I don't see any big difference from using appropriate comments in the source code.

Also per Wikipedia:

"The main intention behind this approach was to treat a program as literature understandable to human beings."

That seems a good aspiration. Good coding style (appropriate choices of names, etc.), combined with extra explanation in comments where appropriate, is the answer, but it's mostly just hard work. I don't think there are any silver bullets here.

What modern programming languages? Don't all imperative languages impose order? — Peter Mortensen, Jul 23 '20 at 13:29
@PeterMortensen: No, the order in which operations appear in the source code is up to the programmer. Consider: `y = f(g(x))` vs `t = g(x); y = f(t)` or even `y = g(x).f()` — Ben Voigt, Jul 24 '20 at 20:01

score 1 · Answer 6 · answered Jul 22 '20 at 10:50

What Knuth’s literate programming tools would allow you to do: Say you want a new feature. And for that feature, you need to create classes X and Y, and make changes to methods in classes A, B and C. “Literate programming” would let you put all that in one source file, whereas in, say, C++ you would have to add two header files and two source files for the classes, plus make changes in three different files.

This was very nice, but worked in Pascal (I think) only and I haven’t seen it implemented anywhere else.

Newer languages are getting closer. For example, Java and Swift don’t have separate header and source files. (Swift can extract the interface, i.e. what programmers need rather than what the compiler needs as in C++; I don’t know what Java has.) That’s a big step.

Other newer features are closures that pack up small bits of code that could be plugged into other classes. So the new classes you added for feature X might add bits of code to classes A, B and C like in Literate Programming, through language features and through having classes prepared for this. Not quite the same, but getting closer.

You still need documentation in literate programming, so this isn’t either / or.

There was a C-weave and -tangle back in the 90s (which I used.) — mtj, Jul 22 '20 at 11:29
But having separate header and source files makes things MUCH more readable/understandable. — jamesqf, Jul 23 '20 at 03:07
You missed the bit about expressiveness in the documentation while still being part of the code. That is to me the real advantage. — Thorbjørn Ravn Andersen, Jul 23 '20 at 09:01
Re *"...worked in Pascal (I think) only"*: No, it is also available for C / C++ ([CWEB](https://en.wikipedia.org/wiki/CWEB)). As an example, an entire embedded system has been written in it (still in use/active development at least until 2018). As part of the build process, a single large C file is generated and presented to the C compilers (several C cross compilers). — Peter Mortensen, Jul 23 '20 at 12:50

score 1 · Answer 7 · edited Jul 23 '20 at 13:49

tl;dr - README.md is the modern day heir to literate programming

First of all, Knuth invented literate programming because he needed it for typesetting his books digitally. This was around 1980, making it probably the oldest software package in common use today (not counting mainframes).

As he wanted to teach a subject, elaborate explanation of the actual code was paramount. You most likely do not need this today. Also, a lot of the features it provided (because the implementation language he used, Standard Pascal, didn't) are now implemented in the languages themselves.

What do we need?

- Documentation close to the code.
- Documentation contains live code, not a dead copy.
- Documentation in a form that is a first class citizen on the web (i.e. usable with a browser)
- Documentation written in a language that is version control friendly, that is plain text.

GitHub is probably the mainstream provider of what we have today that, in practical use, is the direct successor of literate programming, namely README.md files, which are written in the Markdown language and rendered when navigating the source (this is the really important bit). This allows you to easily document and describe your program, and Markdown is easy to learn. The ability to have the Git repository be both code and documentation is a very important milestone!

I did an experiment to see if I could explain how my "Hello, World!" in Dagger 2 (a Java dependency injection framework) was put together at https://github.com/ravn/dagger2-hello-world as a single file being both Java and Markdown (in the spirit of literate programming) and it came out pretty well. I then learned that the AsciiDoc language can refer to snippets in other files (to get live code in the documentation), but I have not yet tested it fully.

Re *README.md is the modern day heir to literate programming*. No, it is not. Literate programming mixes code and comment in a way that documentation and code are intimately intertwined. A README file, in whatever form, separates documentation from code. — David Hammen, Jul 23 '20 at 20:33
*README.md is the modern day heir to literate programming* is a bit different than *it is possible to partially emulate literate programming by symlinking README.md to a source code file* (i.e. having a polyglot of e.g. markdown and Java). — Paŭlo Ebermann, Jul 25 '20 at 00:58

score 0 · Answer 8 · answered Jul 24 '20 at 11:18

Neither

I have a frame challenge for you, you write:

"Could it be that literate programming was a solution to problem that was tackled meanwhile through other means?"

I posit that this problem hasn't been tackled at all, at least not in practice. All the "solutions" here work in theory, and maybe in settings where everyone from the CEO to the junior programmer is convinced of its usefulness, skilled enough to execute it, and has the discipline to never stray from the path.

The only practical example of this I'm aware of (though I'm sure there are more) is the software for the space shuttle.

In my personal experience, even the best documented code has a lot of open questions and places where the documentation diverges from the code (no matter whether it is commented, literate, wiki-ed or otherwise documented).

The most successful projects I've encountered, however, would, bar one, not even document their code at all but instead focus on good naming, good structure, code reviews and taking time to get new devs up to speed.

So in my case, the answer to the question in the title would be: Neither.
