39

We're trying to move our project documentation process from Google Documents to a set of self-hosted Git repositories.

Text documents are Git-friendly enough: since we usually don't need any fancy formatting, we'll just convert everything to, say, MultiMarkdown, with an option to embed LaTeX for complex cases.

But spreadsheets are quite a different story... Is there a spreadsheet(-like) format that is friendly to version control systems (and, preferably, as human-readable as Markdown)?

"Friendly format": Git works well with the format (it doesn't with XML) and it generates human-readable diffs (extra configuration involving external tools is OK).

Obviously, Markdown flavors allow one to build static tables, but I'd like to be able to use stuff like SUM() etc... (Note that CSV has the same problem.) No WYSIWYG is OK, but decent editor/tool support would be nice.

Update: Linux-friendly answers only, please. No MS Office stuff.

Alexander Gladysh
  • 2
    Exactly what do you mean by "git-friendly"? I haven't used git a whole lot, but it handles binary files just fine and they can be versioned and tagged just like any text file. They just can't be diffed, but that may not be necessary. – Thomas Owens Jan 04 '13 at 12:38
  • Friendly: I can view diff and easily figure out what was changed. Updated the question to reflect that. BTW, AFAIR, git, when properly configured, can show diffs for some binary formats (with the help of external tools, of course). – Alexander Gladysh Jan 04 '13 at 15:03
  • I can't believe no one has asked you this but why do you need to store spreadsheets in the project repository? what are the spreadsheets for? usually they're complex enough that you need them in a different location and they're usually used by business people... –  Jul 17 '13 at 18:30
  • 2
    This question appears to be off-topic because it is not related to programming. –  Jul 17 '13 at 21:17
  • An alternative to trying to find or create a whole new format suitable for regular diffs is to find or create a tool that diffs regular spreadsheets and produces text output. That is what the open source ExcelCompare software does, for Excel, OpenDocument etc. And that way of viewing the question is even suitable for a software development Q&A site :) See [version control - How do I diff two spreadsheets? - Stack Overflow](http://stackoverflow.com/questions/114698/how-do-i-diff-two-spreadsheets) and the software itself is at [na-ka-na/ExcelCompare](https://github.com/na-ka-na/ExcelCompare) – nealmcb Oct 14 '14 at 02:16
  • Another similar tool to help do text diffs of spreadsheets is xls2txt. And at [git - How to perform better document version control on Excel files and SQL schema files - Stack Overflow](http://stackoverflow.com/questions/17083502/how-to-perform-better-document-version-control-on-excel-files-and-sql-schema-fil) they describe how to configure git to automatically use it and similar tools like pdf2txt. – nealmcb Oct 14 '14 at 02:27
  • https://github.com/andmarti1424/scim is a command line spreadsheet based on SC and Xspread... I am pretty sure it uses an extended character-delimited format. Gnumeric can import from it and scim can import Excel files (unlike its older relatives). – cb88 Jul 10 '15 at 14:54

6 Answers

13

You can also use the LibreOffice/OpenOffice flat (non-zipped) XML spreadsheet format, "*.fods", which is plain XML. @glenatron's comment applies to this format, too.

The standard OpenOffice spreadsheet format, "*.ods", is zipped XML and not so suitable for Git (similar to @Egryan's/@emuddudley's answer).
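If you are stuck with zipped "*.ods" files, one way to still get line-based diffs is to extract the XML and pretty-print it before comparing. The following is only a minimal sketch of that idea: the function name `ods_to_text` is made up for illustration, and it assumes the sheet data lives in the archive member `content.xml`, as it does in OpenDocument files.

```python
import xml.dom.minidom
import zipfile


def ods_to_text(path):
    """Return the pretty-printed content.xml stored inside an .ods archive."""
    with zipfile.ZipFile(path) as archive:
        raw = archive.read("content.xml")
    # Re-indent so that elements end up on separate lines; the archive
    # usually stores the XML as one very long line, which diffs poorly.
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
```

A small wrapper script that prints `ods_to_text(sys.argv[1])` could then be registered as a Git `textconv` diff driver for `*.ods`. This only helps with viewing diffs; it does nothing for merging.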

k3b
  • I would like to avoid XML. Updated the question to reflect that. – Alexander Gladysh Jan 04 '13 at 15:10
  • @AlexanderGladysh - 1. LibreOffice **works** on Linux. 2. XML **is not** "MS-bullshit" – Lazy Badger Jan 04 '13 at 17:39
  • 2
    1. LibreOffice does work on Linux, indeed. 2. No, XML is not *MS* bullshit. However, XML and Git do not work well together (see @glenatron's comment above). – Alexander Gladysh Jan 04 '13 at 18:27
  • @AlexanderGladysh - just use The Right Tool (tm). For XML it's [Altova DiffDog](http://www.altova.com/diffdog.html). It's not a Git problem per se, but a matter of the diff/merge tool used, which *can be* **extension-specific** – Lazy Badger Jan 04 '13 at 18:50
  • 4
    @LazyBadger: DiffDog: no Linux support, closed-source, 500$/user. Sorry, but I'll pass. – Alexander Gladysh Jan 04 '13 at 19:39
  • 2
    @AlexanderGladysh - Meld, xmldiff or [How can I diff two XML files?](http://superuser.com/questions/79920/how-can-i-diff-two-xml-files) topic on SU – Lazy Badger Jan 04 '13 at 19:49
  • 1
    @LazyBadger: Note that 3-way merge is more important than diffing. (But Google finds several suitable Linux command-line 3-way merge tools for XML.) I'll try these against LibreOffice spreadsheets, thanks. – Alexander Gladysh Jan 04 '13 at 20:30
9

This may not fit your needs, but may fit another's. Org-mode for Emacs includes table.el, which, along with Org-mode's particular enhancements, provides an extremely robust solution for spreadsheets, all in plain text. More information (much more than the scope of this site) is available at Org-mode's website and manual, particularly its spreadsheet tutorial.


Sean Allred
4

What about pyspread? It's powerful and comes with a nice GUI.

According to the First Steps page:

The pys file format has changed in version 0.2.0. It now is a bzip2-ed Text file with the following structure:

    [Pyspread save file version]
    0.1
    [shape]
    1000 100 3
    [grid]
    7 22 0 'Testcode1'
    8 9 0 'Testcode2'
    [attributes]
    [] [] [] [] [(0, 0)] 0 'textfont' u'URW Chancery L'
    [] [] [] [] [(0, 0)] 0 'pointsize' 20
    [row_heights]
    0 0 56.0
    7 0 25.0
    [col_widths]
    0 0 80.0
    [macros]
    Macro text

The fact that it is bzip2-ed does not help with Git, but once decompressed you at least get quite readable text.
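To give an idea of how that text can be reached from a script (for instance to feed it to a diff tool), here is a rough sketch using only the Python standard library. It is not part of pyspread: `read_pys_sections` is a hypothetical helper, the UTF-8 encoding is an assumption, and the header pattern is guessed from the sample shown above, so a macro line that happens to look like `[name]` would be misread as a section header.

```python
import bz2
import re

# Section headers as they appear in the quoted sample, e.g. "[grid]".
SECTION_HEADER = re.compile(r"\[([A-Za-z_ ]+)\]")


def read_pys_sections(path):
    """Split a bzip2-compressed .pys file into {section name: [lines]}."""
    sections = {}
    current = None
    # Assumption: the decompressed text is UTF-8.
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.rstrip("\n")
            match = SECTION_HEADER.fullmatch(line)
            if match:
                current = match.group(1)
                sections[current] = []
            elif current is not None and line:
                # Blank lines are dropped; everything else is kept verbatim.
                sections[current].append(line)
    return sections
```

For the sample above, `read_pys_sections(path)["shape"]` would hold the single line `1000 100 3`.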

Licence is GPLv3.

Clement J.
4

CSV (Comma Separated Values)

If you're just working with data it's probably the simplest and most commonly supported format.

Should make life easy if you want to diff between versions.

Oh, and Google Docs fully supports CSV import/export.

Update:

Then just write a Google Apps Script to stringify the formulas on export and do the reverse on import. You'll need to use some ingenuity because the format you're looking for doesn't exist.
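To make the "formulas as strings" idea more concrete, here is a small sketch in Python (not Google Apps Script, and not an existing format): formulas are kept as plain text cells in the CSV, so they diff like any other text, and a script evaluates them on demand. Everything here is an assumption for illustration: the `evaluate` name, support for `=SUM(...)` only, single-letter columns, and ranges that point at plain numbers rather than other formulas.

```python
import csv
import io
import re

# Matches cells such as "=SUM(B2:B3)"; single-letter columns only.
SUM_FORMULA = re.compile(r"=SUM\(([A-Z])(\d+):([A-Z])(\d+)\)")


def evaluate(csv_text):
    """Return the rows of csv_text with =SUM(range) cells replaced by totals."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    result = []
    for row in rows:
        out_row = []
        for value in row:
            match = SUM_FORMULA.fullmatch(value.strip())
            if match is None:
                out_row.append(value)
                continue
            first_col = ord(match.group(1)) - ord("A")
            last_col = ord(match.group(3)) - ord("A")
            first_row = int(match.group(2)) - 1  # spreadsheet rows are 1-based
            last_row = int(match.group(4)) - 1
            total = sum(
                float(rows[r][c])
                for r in range(first_row, last_row + 1)
                for c in range(first_col, last_col + 1)
            )
            out_row.append(total)
        result.append(out_row)
    return result
```

With a file containing the rows `item,cost`, `apples,3`, `pears,4.5` and `total,=SUM(B2:B3)`, the last cell evaluates to `7.5`, while the version stored in Git still shows the formula itself.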

Evan Plaice
1

I know that Microsoft Office 2007 and higher default to a proprietary XML format when saving, so that should be friendly for Git. OpenOffice also saves to an XML format if you want a more open-source solution. Since XML is a text format, Git should be able to handle it fairly well.

Since you are moving from Google Documents, you can download the files as OpenDocument files, which are XML-based.

Edit

Since you want a non-Microsoft, non-XML solution, you could always save as CSV in OpenOffice, though I am not sure how much functionality you lose by saving to this format.

Egryan
  • 3
    I have seen some problems with Git disagreeing with XML formats or merging them in ways that aren't compliant with the format of the document. I believe this can be worked around by using an XML-specific merge tool, but I haven't seen this in use. – glenatron Jan 04 '13 at 13:06
  • 4
    The Excel Workbook (\*.xlsx) format is a collection of XML files in a ZIP container. You can choose XML Spreadsheet 2003 (\*.xml) to save into a single XML file, but it only supports a subset of Excel's features. – sourcenouveau Jan 04 '13 at 14:00
  • 1
    XML wouldn't do, as per @glenatron's comment above (I have had such problems myself as well). Also: XML diffs are not quite human-readable IMO. Updated the question to reflect that. – Alexander Gladysh Jan 04 '13 at 15:09
  • Well, CSV does not support any formula stuff. I can just use Markdown's tables then. Updated the question to reflect that. – Alexander Gladysh Jan 04 '13 at 17:06
0

This might not be exactly what you want, but I believe LibreOffice lets you reference outside files. You could have a spreadsheet that you treat like a database and a static LibreOffice file that serves as your interface. You would lose easy access to summing in your versioned files unless you call them back, but it would work.

Another rather big issue with this would be that it is one directional.

Jpatrick