
I'm starting up a Git repository for a group project. Does it make sense to store documents in the same Git repository as code? It seems like this conflicts with the nature of the Git revision flow.

Here is a summary of my question(s):

  • Is the Git revisioning style going to be confusing if both code and documents are checked into the same repository? Experiences with this?

  • Is Git a good fit for documentation revision control?

  • I am NOT asking if a Revision Control System in general should or shouldn't be used for documentation - it should.

Thanks for the feedback so far!

EmpireJones
  • Ah, okay... thanks for the clarification. I don't see why it would be a problem, but I don't have any personal experience with Git (just a theoretical understanding), so I'll let someone with more direct experience answer that question. – Flimzy Jun 17 '11 at 07:24
  • I don't quite see how this is on topic. You're talking about software documentation and committing with a DVCS – Tim Post Jun 17 '11 at 09:36
  • Probably depends on the documentation and your needs. Do you need diffs, and is it in a format that can handle them? If Git provides the services you need, sure. Beats having a separate document management system... – Rig Mar 15 '12 at 11:33
  • If your documentation is in plain text - fine. If it is in a binary format, you essentially need a version control system that understands the binary format - this is vendor lock-in in its purest form. –  Sep 08 '12 at 09:20
  • related: [What Part of Your Project Should be in Source Code Control?](http://programmers.stackexchange.com/questions/120477/what-part-of-your-project-should-be-in-source-code-control) – gnat Mar 04 '14 at 15:40

9 Answers

56

We store documentation in SVN all the time. In fact, our entire user manual is written in LaTeX, and stored in SVN. We chose LaTeX specifically because it is a text-based language, and easy to show line-by-line diffs.
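As a rough illustration of why a text format like LaTeX diffs well, the Python sketch below runs the standard-library `difflib.unified_diff` over two made-up revisions of a manual section. Only the one changed line is reported, with the unchanged lines shown as context. The file name and contents are hypothetical, and this is Python's differ rather than SVN's, but the line-by-line idea is the same.

```python
import difflib

def line_diff(old: str, new: str, name: str = "manual.tex") -> str:
    """Return a unified, line-by-line diff between two revisions of a text file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"{name} (old)",
            tofile=f"{name} (new)",
        )
    )

# Two hypothetical revisions of a LaTeX section; only one line changes.
old_rev = r"""\section{Installation}
Run the installer.
Restart the server.
"""
new_rev = r"""\section{Installation}
Run the installer.
Restart the application server.
"""

print(line_diff(old_rev, new_rev), end="")
```

A binary .doc file offers no such lines to compare, which is why the incremental view is lost for those formats.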

We also store some non-text files, like Microsoft Office .doc files, spreadsheets, .zip files, etc., when necessary... but some of the benefit of an RCS is lost when you can't see the incremental diffs.

The key is really to make sure your documentation is well organized, so that people can find (and update) the documentation (and the source) when they need it.

Flimzy
  • If you're a Microsoft shop, TortoiseSVN supports MS Office line-by-line diffs. – Phil Jun 17 '11 at 13:46
  • Dropping binary doc formats would make the world a better place. So, provided the docs are plain text, there should be no real problem with a DVCS either. – Kai Inkinen Jun 17 '11 at 18:28
  • Oh, and this is the first time I've heard about TortoiseSVN diffing doc files, so +1 for that. I wonder if that'll end up in Tortoise[AnyDVCS] anytime in the future. – Kai Inkinen Jun 17 '11 at 18:28
  • @Phil: How does TortoiseSVN accomplish this? Is the doc-diff viewer integrated with the SVN client, or can it be used independently? – Flimzy Jun 19 '11 at 06:13
  • @Flimzy It's integrated with TortoiseSVN, but you don't need to have your docs in a repository to use it. You can select any two doc (or excel) files on your PC, right-click, then select TortoiseSVN/Diff – Phil Jun 20 '11 at 14:05
  • A cool option would be to use Pandoc so that *most* of your documentation is in Markdown, but the crucial bits can still use TeX. Since it compiles the Markdown to LaTeX, the results look the same. However, this would also let you export it to different formats and would make the source easier to read. – Tikhon Jelvis Sep 08 '12 at 17:23
  • Doc-diff is actually a feature of Word, so it should be relatively easy to set up for any VCS. – Joeri Sebrechts Nov 07 '15 at 21:02
  • I presume you store your documentation in its own SVN repository? – Max Barraclough Sep 26 '18 at 12:24
  • @MaxBarraclough: Now 7 years, and 3 companies later, I don't recall specifically what we did. But I can imagine many advantages to both approaches. – Flimzy Sep 26 '18 at 12:26
  • @KaiInkinen rename a .docx to .zip and unpack it. You'll find it is text: ugly, overcomplicated XML text, but text. – candied_orange Aug 05 '22 at 15:05
22

Well, it depends on what format you use for the documentation. If it is something text-based, it is all good.

Git can also store binary content and track its revisions, but the diff output will not make sense: by default, `git diff` only reports that the binary files differ.
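For the curious, Git's default decision about whether a file is "binary" is a simple heuristic: it looks for a NUL byte within roughly the first 8000 bytes. The Python sketch below mimics that check. It is an approximation for illustration, not Git's code, and attributes in `.gitattributes` (for example `binary` or a custom `diff` driver) can override the result. The byte strings are made-up file prefixes.

```python
# Git inspects only the start of a file when guessing whether it is binary.
FIRST_FEW_BYTES = 8000

def looks_binary(data: bytes) -> bool:
    """Approximate Git's default heuristic: a NUL byte near the start means 'binary'."""
    return b"\x00" in data[:FIRST_FEW_BYTES]

# Hypothetical file prefixes for illustration.
latex_source = b"\\section{Intro}\nPlain text diffs line by line.\n"
docx_prefix = b"PK\x03\x04\x14\x00\x06\x00"  # a .docx is a ZIP archive; ZIP headers contain NUL bytes

for name, data in [("manual.tex", latex_source), ("report.docx", docx_prefix)]:
    print(name, "binary" if looks_binary(data) else "text")
```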

It is also possible to store documentation in the code itself, like Perl's POD (read with perldoc); Java has Javadoc comments for the same purpose.
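Python's counterpart to POD and Javadoc is the docstring. The documentation sits next to the code it describes, so both are versioned in the same commits, and tools can read it back (here via the standard-library `inspect.getdoc`). The `word_count` function is a made-up example.

```python
import inspect

def word_count(text: str) -> int:
    """Return the number of whitespace-separated words in text.

    Because this documentation lives in the source file, a commit that
    changes the function can update its description in the same diff.
    """
    return len(text.split())

# inspect.getdoc returns the docstring with its indentation cleaned up.
print(inspect.getdoc(word_count).splitlines()[0])
```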

cstamas
  • I agree: while it's possible to store non-text documentation, Git will do a lot better if you store text instead. There's been talk of a diff driver that knows how to diff Word (or similar) documents, but I'm not sure if it was implemented or not – Sverre Rabbelier Jun 17 '11 at 09:27
  • I thought Word moved its format away from binary to XML. – cledoux Jun 17 '11 at 13:48
  • @karategeek6 Word's 'XML' format is not human readable. And one line of text does not correspond to one line of Word's XML, even in approximation. So it might as well be binary. –  Jun 17 '11 at 13:56
  • You can instruct Word to save your output in uncompressed XML. Choose `Save As`, then select `Word XML Document (*.xml)` instead of the default `Word Document (*.docx)`. The XML is pretty complex, so this is no guarantee the changes will be easily readable, but at least it won't be binary. – Kyralessa Aug 17 '11 at 17:06
  • > but the diff output will not make sense. In case of diff, we could open two revisions of a document side by side and compare them by eye :) – Luke Jun 07 '18 at 02:44
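Following up on the comments about Word's XML: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, so a small script can turn it into plain text that diffs sensibly. Git's textconv mechanism can run such a script before diffing; the wiring would be `*.docx diff=docx` in `.gitattributes` plus a `[diff "docx"]` section containing `textconv = python3 docx2txt.py` in the Git config. The driver name `docx` and the script name `docx2txt.py` are hypothetical. This sketch extracts only paragraph text and makes no attempt to handle formatting, headers, footnotes, or nested content such as text boxes.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the body of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraphs(path) -> list[str]:
    """Return the text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across <w:t> runs; join them back together.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs

if __name__ == "__main__":
    # Git's textconv passes the file to convert as the argument;
    # print one paragraph per line so `git diff` can compare them.
    for docx_path in sys.argv[1:]:
        for line in docx_paragraphs(docx_path):
            print(line)
```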
18

It is clear that using some kind of version control system for storing docs is a no-brainer. The more interesting part of the question is whether it is a good idea to store documents in the SAME location as the source code. The possible problem here is that it might be hard to set different access privileges for code and documentation in that case. In many businesses, people will need access to the docs but not to the source code, like the marketing or BA departments.

Ma99uS
  • Yes, the "same location" aspect is one of the key parts of this question! –  Jun 17 '11 at 18:13
  • Same location is good if you can manage it, because it avoids the need to either have tribal knowledge (knowing where to look), or the need to go searching for where the stuff is. – quickly_now Jun 22 '11 at 08:06
  • They may not need access to the code but it shouldn't hurt for them to have that access. They don't have to look at it. Secrets generally shouldn't be in version control anyway. – bdsl Nov 08 '16 at 21:17
  • Maybe the best balance here would be storing docs in a version control system but using some kind of website to pull them in for viewing by non-technical users. – Goahnary Dec 11 '20 at 15:45
14

Just like source code, documentation should have a full history and the ability to revert to an earlier version if that becomes necessary. A version control system is perfect for this.

  • Only if the documentation is in a text form. Binary blobs do not fully benefit from version control. –  Jun 22 '11 at 08:00
  • @ThorbjørnRavnAndersen: Even so, unless you have a binary-specific versioning system, it's probably better to keep even binary files in Git rather than on their own. – Tikhon Jelvis Sep 08 '12 at 17:24
  • @TikhonJelvis I did not question whether it is a good idea to put binary files in git - if they are the original artifacts, it is. Try, however, to run "git diff" on Word documents. –  Sep 08 '12 at 17:46
  • @user1249: you could "export" two revisions to the desktop, say my_docs_rev15.docx and my_docs_rev14.docx, then open them side by side and compare them with your eyes and brain; it's not that hard :) – Luke Jun 07 '18 at 02:48
13
  • Having more than just source code in a repository is a very good thing.

    It groups all of your resources together and turns the project into a cohesive, centralized entity rather than a scattered collection of files. Contributors/employees know where to find everything, rather than sending "Where do I change the documentation for feature x?" emails.

    You'll want to keep things organized. Have a system for separating the src from the images from the docs. You can always add a .gitignore to a directory to keep the repository and history clean. Because Git commits are file-based,* you can decouple source changes from documentation changes as strongly as you like.

  • As others have said, Git is great for documentation versioning as long as it's text-based.

  • I completely agree; documentation should be versioned right alongside the code.
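If a team decides it wants documentation and code changes in separate commits, one possible (hypothetical) way to nudge people is a small pre-commit hook that inspects the staged paths. The Python sketch below assumes docs live under a top-level `docs/` directory. `git diff --cached --name-only` is the real Git command for listing staged paths, while the `classify` helper and the warn-only policy are illustrative choices.

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Assumption: documentation lives under a top-level docs/ directory.
DOC_ROOTS = {"docs"}

def classify(paths: list[str]) -> dict[str, list[str]]:
    """Split repository paths into documentation and code by their top-level directory."""
    groups: dict[str, list[str]] = {"docs": [], "code": []}
    for path in paths:
        top = PurePosixPath(path).parts[0]
        groups["docs" if top in DOC_ROOTS else "code"].append(path)
    return groups

def staged_paths() -> list[str]:
    """List the paths staged for the next commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]

def main() -> None:
    try:
        paths = staged_paths()
    except (OSError, subprocess.CalledProcessError):
        return  # not in a Git repository, or Git is missing: nothing to check
    groups = classify(paths)
    if groups["docs"] and groups["code"]:
        print("note: this commit mixes docs/ and code changes; consider splitting it.",
              file=sys.stderr)

if __name__ == "__main__":
    main()  # returning normally gives exit status 0, so the commit is never blocked
```

Saved as an executable `.git/hooks/pre-commit`, it would only print a reminder. A team that wants a hard rule could exit non-zero instead, though others in this thread prefer mixing code and docs freely.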

My credibility comes from being a GitHub user and contributing to one project and exploring many others. In my experience, a complete, unified project is easy to tell from a half-missing one. I try to contain all of my projects within single directories whenever possible.


* This isn't quite accurate, because there are ways to specify parts of a file to be committed (here's one example).
Tyler Mumford
9

At the company where I work, we put documentation in SVN. However, after a few conflicts and the need to share it, we decided to move it to MediaWiki.

At first it was Trac; after that we moved to MediaWiki because it was easier to use.

The main problem with SVN was sharing, because access to SVN went through our authorization system.

confiq
4

I came here with a similar question. We come from an SVN environment, where it basically is a no-brainer to keep all materials related to a project in the same repository. Due to SVN's nature, you can easily check out parts of the repository, so if you just need the source code (for example, for a website deployment), that's no problem.

With Git, things are different. A checkout is always at the root level, so if you want to put everything in the same repository, you will always end up with the same directory structure. One approach I have come across is to put everything in separate branches, i.e. you have code branches (typically your normal master, develop, etc. branches) and a doc branch, which has its own, separate directory structure. I'm not certain yet that's the best idea, but it is a suggestion that circumvents the problem I imagine is at the root of your question.

Eelke Blok
  • Different branches with radically different directory structures have a very bad code smell to me. I would leave it all in one repo, making it easy for contributors to add a mix of code and documentation. In fact, literate programming (Google that!) demands it. – tbc0 Nov 07 '15 at 21:07
  • When distributing packages, I'm partial to the .deb style that allows me to download executables to all servers, while my development box also has the documentation packages. – tbc0 Nov 07 '15 at 21:07
1

I use a wiki for internal docs: you get revision history PLUS prominent access and easy editing. When documentation is out of sync, update it right then and there. For end-user documentation, consider a professional tool like MadCap Flare. It uses an XML dialect for sharing, composing, and transforming documentation.

Michael Brown
-1

In code, thoughts are typically separated line by line. I tend to write documentation with soft line wraps. When I commit those files, lines are a whole paragraph long. That's not very useful to read in `git diff`. That's the problem I was trying to solve when I Googled and found this page. Thanks to Arne Hartherz for introducing me to `git diff --word-diff`. You might like `git diff --color-words` even better.
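To show why word-level diffing helps with one-paragraph-per-line files, here is a rough Python sketch of the idea: the standard-library `difflib.SequenceMatcher` compares whitespace-separated words instead of lines. This is not Git's implementation. The `[-removed-]` / `{+added+}` markers are borrowed from the plain output of `git diff --word-diff`, and the sample sentences are made up.

```python
import difflib

def word_diff(old: str, new: str) -> str:
    """Mark word-level changes between two paragraphs as [-removed-] and {+added+}."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words deleted (or the old side of a replacement)
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words inserted (or the new side of a replacement)
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)

old = "Documentation written with soft wraps puts a whole paragraph on one line."
new = "Documentation written with soft line wraps puts an entire paragraph on one line."
print(word_diff(old, new))
```

A plain line diff would mark the whole paragraph as removed and re-added; the word-level version pinpoints the two small edits.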

tbc0