60

When you use a tool like JSDoc, it generates static HTML files and their stylesheets in your codebase, based on the comments in your code.

Should these files be checked into the Git repository or should they be ignored with .gitignore?

asked by garbage collection (edited by Peter Mortensen)
  • There may be an argument to store them in a GitHub repository, as you can publish the static HTML using [pages](https://pages.github.com/). Although then an entirely separate set of arguments arises as to how you ensure they're up to date etc... – Boris the Spider May 13 '19 at 07:51
  • If files are generated, then by definition they aren't *source*. – chrylis -cautiouslyoptimistic- May 13 '19 at 09:29
  • You publish what you want published. Especially on GitHub. If you want everyone to see a generated PDF or image, you should include it instead of expecting everyone to install LaTeX and compile it themselves. For example, [this](https://github.com/kamranahmedse/developer-roadmap) repository wouldn't be very good if it didn't include the produced images, only the project files... – Džuris May 13 '19 at 09:57
  • Rather blatant duplicate of [Should generated documentation go in version control history?](https://softwareengineering.stackexchange.com/questions/175740/should-generated-documentation-go-in-version-control-history) – gnat May 13 '19 at 15:35
  • Whether they should be part of the repository, and whether they should be ignored, are two different questions. – mkrieger1 May 13 '19 at 21:10
  • As a consumer of third party libraries, out of the 10 times that I see a library with no online documentation (whether in a subfolder of the repository, or linked from the readme), I will click away and skip those libraries, all 10 times. I'm not going to mess around with Doxygen for half an hour just to see if a library meets my needs. – Alexander May 14 '19 at 00:50
  • @Alexander: Sure, a build of the documentation should be accessible *somewhere* online. Checked in to the git alongside the source is basically always the wrong place, though. You could have a separate repo for (some) build outputs if you don't have anywhere else to keep it. – Peter Cordes May 14 '19 at 03:04
  • @PeterCordes What's driving you to want to keep it in a separate repo? Is it a concern for repo size or something? – Alexander May 14 '19 at 03:55
  • That, and noise in commits / diffs from a change in a source file also producing changes in build artifacts. Assuming you remember to rebuild docs before a commit... – Peter Cordes May 14 '19 at 04:03
  • @PeterCordes Fair point! I'm not sure if it's enough to make me want to give up the convenience of automatic-repo-hosting features (like [GitHub pages](https://pages.github.com/), [GitLab Pages](https://about.gitlab.com/product/pages/), [Bitbucket Cloud](https://confluence.atlassian.com/bitbucket/publishing-a-website-on-bitbucket-cloud-221449776.html)), but that's pretty compelling. I always build/run/test before commits (except for "scratch" temp commits and such), so forgetting to rebuild isn't an issue in my case. – Alexander May 14 '19 at 04:07
  • Possible duplicate of [Should generated documentation go in version control history?](https://softwareengineering.stackexchange.com/questions/175740/should-generated-documentation-go-in-version-control-history) – MSalters May 14 '19 at 07:27
  • At my work, we have generated documentation that's committed to a separate branch, to use GitHub Pages. We use a CI tool to automatically build the docs on commits to master, and commit those docs to the `gh-pages` branch. That way, the docs aren't in master, but we still have them available for Pages. And we can see version history for the docs, if we want, although it's not very useful for us. – Dan Jones May 14 '19 at 20:18

8 Answers

138

Absent any specific need, any file that can be built, recreated, constructed, or generated by the build tools from other files checked into version control should not be checked in. When the file is needed, it can be (re)built from the other sources (and normally would be, as part of the build process).

So those files should be ignored with .gitignore.
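For instance, a minimal .gitignore sketch for JSDoc output could look like this; out/ is JSDoc's default destination directory, and docs/api/ stands in for whatever directory your configuration writes to:

```gitignore
# Generated API documentation: rebuilt from source comments, so not tracked
out/
docs/api/
```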

1201ProgramAlarm
  • But this may depend on versions of build tools or even the availability of build tools (e.g. to generate some files some old version of a build tool is required). How do you handle that? Can you address it in your answer? – Peter Mortensen May 13 '19 at 10:20
  • @PeterMortensen If you need an artifact built with a special version of build tools, you build it with the version of build tools that you need. Such a need is either a) discovered by yourself, in which case you're on your own; b) documented in the README ("You'll need to have 2 specific versions of doxygen installed..."); c) dealt with by the build scripts (they check the available build tool versions and act appropriately). In any case, source control is for sources, not for build artifacts. – Joker_vD May 13 '19 at 10:32
  • And I'd add: artifact stores are to store artifacts ;) E.g. if you really want to make sure you have the documentation ready for a certain version of the project artifact, store both a compiled project artifact and the documentation artifact in some artifact store. – Frank Hopkins May 14 '19 at 00:18
  • I think this answer is only viable iff a continuous deployment server builds and publishes the documentation in an easily accessible way. Otherwise, there's a great value in "caching" the docs in the repo, to improve accessibility. No user should have to muck with your build scripts just to see your software's documentation. – Alexander May 14 '19 at 00:53
  • @Alexander Would you also put the built binary into the repo? The documentation is built. You take the built documentation and make it accessible somewhere. – 1201ProgramAlarm May 14 '19 at 01:06
  • @1201ProgramAlarm "Would you also put the built binary into the repo?" Nope, because a built binary has low up-front value to people browsing around GitHub, as compared to the documentation. "You take the built documentation and make it accessible somewhere." As long as that's publicly hosted, visibly linked, then yea that's great. It's probably the best case. – Alexander May 14 '19 at 01:31
  • @1201ProgramAlarm Plus, lots of online git providers offer hosting of static HTML content directly from repositories (e.g. [GitHub Pages](https://pages.github.com), [GitLab pages](https://about.gitlab.com/product/pages/), [Bitbucket Cloud](https://confluence.atlassian.com/bitbucket/publishing-a-website-on-bitbucket-cloud-221449776.html)). It's really convenient, it works well, and it doesn't require setting up something like a CI server with a script to push build artifacts to dedicated web servers. – Alexander May 14 '19 at 01:51
25

My rule is that when I clone a repository and press a “build” button, then, after a while, everything is built. To achieve this for your generated documentation, you have two choices: either someone is responsible for creating these docs and putting them into git, or you document exactly what software I need on my development machine, and you make sure that pressing the “build” button builds all the documentation on my machine.

In the case of generated documentation, where any single change that I make to a header file should change the documentation, doing this on each developer’s machine is better, because I want correct documentation all the time, not only when someone has updated it. There are other situations where generating something might be time consuming, complicated, require software for which you have only one license, etc. In that case, giving one person the responsibility to put things into git is better.
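For a JavaScript project using JSDoc, a minimal sketch of that "build button" might be a checked-in build script like the one below; the tool choice, config file name (jsdoc.json), and output directory (docs/api) are illustrative assumptions, not something from this thread:

```sh
#!/bin/sh
# build.sh -- one command that rebuilds both the code and the docs,
# so the generated documentation stays in sync on every machine.
# jsdoc.json and docs/api are example names (assumptions).
set -e
npm ci                                  # install the exact toolchain pinned in package-lock.json
npx tsc                                 # compile the sources
npx jsdoc -c jsdoc.json -d docs/api     # regenerate the docs from source comments
```

With docs/api/ listed in .gitignore, nobody has to remember to update the docs; they simply fall out of the build.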

@Curt Simpson: Having all the software requirements documented is a lot better than what I have seen in many places.

gnasher729
  • Don't document what software someone needs to do the build (or at least don't _just_ document): make the build script tell the user what he's missing, or even install it itself if that's reasonable. In most of my repos any half-way competent developer can just run `./Test` and get a build, or get good information about what he needs to do to get a build. – cjs May 13 '19 at 15:14
  • I don't really agree that putting generated documentation into git can be good in the case you specify. That's the reason we have artifactories and archives. – Sulthan May 13 '19 at 17:13
  • That is your rule and it is a good rule and I like it. But others can make their own rules. – emory May 13 '19 at 20:41
  • I think you mean "run a build command," as there would be no build button on your machine. ...Unless you're expecting the entire build to be integrated with an IDE, which is wholly unreasonable. – jpmc26 May 14 '19 at 15:14
  • @jpmc26 I find it totally reasonable to have the entire build integrated in an IDE. The build button on my machine is Command-B. – gnasher729 May 15 '19 at 21:35
  • @gnasher729 Integrating documentation rendering into MSBuild csproj files sounds like a nightmare. VS doesn't like it when you manually modify those files. It handles many use cases that MSBuild supports very poorly, and if errors occur, then you can no longer debug your application. I don't know what ecosystem you work in, but it's certainly not reasonable in all of them. – jpmc26 May 15 '19 at 21:37
15

These files should not be checked in, because the data to generate them is already present; you do not want to store the same data twice (DRY).

If you have a CI system, you could have it build the docs and store them with that build, or publish them to a web server.
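As a rough sketch of that idea on GitHub, using GitHub Actions and the third-party peaceiris/actions-gh-pages action (the workflow name, config file, output directory, and branch are assumptions for illustration):

```yaml
# .github/workflows/docs.yml -- build the JSDoc output on every push to
# master and publish it to the gh-pages branch, so master itself stays
# free of generated files. Paths and names below are examples.
name: docs
on:
  push:
    branches: [master]
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx jsdoc -c jsdoc.json -d out   # out/ is an example output dir
      - uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./out
```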

Tvde1
4

One advantage of having them in some repository (either the same one or a different one, preferably populated automatically) is that you can then see all the changes to the documentation. Sometimes those diffs are easier to read than the diffs of the source code (specifically if you only care about specification changes, not implementation ones).
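As a tiny sketch of how that reads in practice, assuming the generated docs are committed to their own repository and tagged per release (the repo name, tags, and path here are hypothetical):

```sh
# Show only the specification-level changes between two releases
cd my-project-docs
git diff v1.4 v1.5 -- api/
```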

But in most cases having them in source control is not needed, as the other answers explained.

Paŭlo Ebermann
  • That would pretty much require a pre-commit hook in each and every repo that is used to create commits. Because if the documentation generation process is not fully automated, you will get commits that have the documentation out-of-sync with the code. And those broken commits will hurt understandability more than uncommitted documentation. – cmaster - reinstate monica May 14 '19 at 11:10
  • This doesn't have to be at the commit stage. It could easily be a downstream/CI/Jenkins job to publish them every time they are deemed worthy of storage. This may well be each commit, but the decision should be decoupled in the absence of a good reason. Or at least that's the way I see it. – ANone May 14 '19 at 13:57
3

Ignored. You'll want the repo's users to be able to rebuild them anyway, and ignoring them removes the complexity of making sure the docs are always in sync. There's no reason not to bundle the built artifacts up in one place if you want everything in one place without having to build anything; however, source repos are not a good place for that, as complexity there hurts more than in most places.

ANone
2

It depends on your deployment process, but committing generated files into a repository is the exception and should be avoided if possible. If you can answer both of the following questions with yes, checking in your docs might be a valid option:

  • Are the docs a requirement for production?
  • Does your deployment system lack the necessary tools to build the docs?

If these conditions are true, you are probably deploying with a legacy system or a system with special security constraints. As an alternative, you could commit the generated files into a release branch and keep the master branch clean.
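A hedged sketch of that release-branch approach, using a git worktree so the generated files never touch the master checkout (the branch name, config file, and paths are examples):

```sh
# Build the docs, then commit them to a dedicated release branch.
npx jsdoc -c jsdoc.json -d out                     # generate the docs (example paths)
git worktree add /tmp/docs-release release-docs    # second checkout of the existing release branch
cp -R out/. /tmp/docs-release/                     # copy the generated files in
git -C /tmp/docs-release add -A
git -C /tmp/docs-release commit -m "Update generated docs"
git -C /tmp/docs-release push origin release-docs
git worktree remove /tmp/docs-release              # clean up the extra checkout
```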

Trendfischer (edited by TRiG)
  • Committing generated files into a release branch doesn't work in every situation, but there are a number, especially with things like static web sites built from markdown, where this is an excellent solution. I do it often enough that I [built a special tool](https://github.com/cynic-net/git-commit-filetree) to easily generate such commits as part of the build process. – cjs May 13 '19 at 15:16
2

It depends. Keeping the docs in the git repo is justifiable if they:

  • Need to be part of the repository, like the readme.md; handling such cases in an automated way can be tricky.

  • Are intended to be seen by a general audience and you have no automated way, like a CI system, to build and update them.

  • Take a LOT of time to build.

  • Are intended to be seen by a general audience (like the user manual), take considerable time to build, and the previous docs become inaccessible (offline) while building.

  • Are intended to be seen by a general audience and need to show a history of their changes/evolution; it can be easier to keep previous doc versions committed and to build/commit each new one linked to the previous.

  • Have a specific reason, accepted by the whole team, to be committed. (We don't know your context; you and your team do.)

In any other scenario, they can be safely ignored.

However, if it is justifiable to keep them in the git repo, that could be a sign of another, bigger issue your team is facing (not having a CI system or similar, horrible performance issues, facing downtime while building, etc.).

2

As a principle of version control, only "primary objects" should be stored in a repository, not "derived objects".

There are exceptions to the rule: namely, when there are consumers of the repository who require the derived objects and cannot reasonably be expected to have the tools required to generate them. Other considerations weigh in, such as whether the amount of material is unwieldy. (Would it be better for the project to just get all the users to have the tools?)

An extreme example of this is a project that implements a rare programming language whose compiler is written in that language itself (well-known examples include OCaml and Haskell). If only the compiler source code is in the repository, nobody can build it; they don't have a compiled version of the compiler that they can run on the virtual machine, so that they can compile that compiler's source code. Moreover, the latest features of the language are immediately used in the compiler source itself, so that close to the latest version of the compiler is always required to build it: a month-old compiler executable obtained separately will not compile the current code, because the code uses language features that didn't exist a month ago. In this situation, the compiled version of the compiler almost certainly has to be checked into the repository and kept up to date.

Kaz