
When I browse open-source projects that are developed primarily for Linux systems and download the latest packages, the source code is always stored in a .tar.gz or .tar.bz2 file.

Is there any reason for using .tar.gz or .tar.bz2 rather than something like .zip or .rar or some other compression format (or even leaving the source uncompressed, if the project is small enough)?

edited by gnat · asked by Joe Z.

  • Where are you browsing? Most projects these days distribute their source code in the form of a VCS repository URL, and when they give archives it's usually in .ZIP form. TGZ has been obsolete for decades (despite a few obnoxious people's stubborn refusal to get the message) and I haven't seen any project using it in a long time... – Mason Wheeler Dec 27 '12 at 00:34
  • I sometimes come across random Sourceforge projects. You're right that almost every project has a version control tree nowadays, but I was talking about source code releases. – Joe Z. Dec 27 '12 at 00:35
  • However, across those random projects, they almost uniformly use .tar.gz or some form of .tar file, in my experience. – Joe Z. Dec 27 '12 at 00:37
  • Linux, zip, and rar did not even exist when `tar` (i.e., the *Tape* ARchiver) was already a de facto standard. – SK-logic Dec 27 '12 at 09:31
  • @Mason Wheeler: Define "obsolete". A format is not obsolete as long as people find it useful and keep using it. I think tar + gz does the job, and switching to another format is in many cases just a matter of taste. Projects like Eclipse (http://www.eclipse.org) still use it. – Giorgio Dec 27 '12 at 11:37
  • @SK-logic: What's wrong with Delphi? – Giorgio Dec 27 '12 at 11:53
  • @Giorgio, nothing wrong with it, it's a brilliant tool. But its users are, well, a bit *isolated*, and their perception of the world is distorted in very funny ways. It is not Delphi's fault. – SK-logic Dec 27 '12 at 11:56
  • @SK-logic: Hm, I could say the same about users of other languages too. Some are even religiously convinced that their favourite language is the best available. Anyway, I would be careful about this kind of categorization. – Giorgio Dec 27 '12 at 12:04
  • @Giorgio, whereas the whole world is still using `tar` as the main (and often only) medium, Delphi and Java users download their "components" and "frameworks" in .zip form. It creates a distorted perception. Others are exposed to a much broader and more diverse world, so other languages do not produce such a narrow-minded ethos as these two. – SK-logic Dec 27 '12 at 12:10
  • @SK-logic: OK, have it your way. We can discuss this in chat some time if you like (this discussion does not belong here anyway). – Giorgio Dec 27 '12 at 12:16
  • @Giorgio: TAR (the Tape ARchiver) became obsolete once two things happened: Better, easier to use storage devices than tape showed up and got adopted by pretty much everyone, and better, easier to use OSes than *nix showed up and got adopted by pretty much everyone, and they didn't have support for TAR built in. (Maybe because they didn't have support for tape drives built in?) – Mason Wheeler Dec 27 '12 at 17:54
  • @SK-logic: What "whole world" are you referring to anyway? When I'm downloading stuff not related to programming, and it's not some sort of installer, it's invariably in ZIP form. The only time I *ever* see a TGZ for any reason at all is when I have to download the source to some C library maintained by coders who are still stuck in the 1980s. – Mason Wheeler Dec 27 '12 at 18:04
  • @MasonWheeler, browse around `ftp.gnu.org`, for example. This is the stuff the whole Internet is built upon. And, I hope, everyone will agree that ZIP is technologically inferior anyway. – SK-logic Dec 27 '12 at 18:20
  • @SK-logic: Yeah. A bunch of C libraries, like I said. But the rest of the "whole world" doesn't do that, and they don't tend to care about a few percentage points of technological superiority anyway, with download speeds and drive sizes being what they are. – Mason Wheeler Dec 27 '12 at 18:22
  • @MasonWheeler, there is a lot more than just C stuff. What people do care about is preservation of Unix file attributes, sparse files, etc. Have you ever seen, say, a virtual machine image distributed in `zip`? – SK-logic Dec 27 '12 at 18:29
  • @SK-logic, no, I generally see those as .7z archives. :P – Mason Wheeler Dec 27 '12 at 18:31
  • @Mason Wheeler: A better, easier-to-use OS than *nix? If you mean MacOS, don't forget that it has a Unix core. – Giorgio Dec 28 '12 at 00:53
  • @MasonWheeler and others: Just because tar has "tape archiver" in its name does not make it obsolete. EVERYONE I know uses tar in the *nix/BSD world, and using .zip is relatively rare. In fact, when I see a zip file I almost always question whether it's a Windows-only archive. – Rob Jan 01 '13 at 04:54
  • Ohh, is it tar.gz? I always thought it was just .tgz – Ingo Mar 14 '13 at 18:20
  • .tar.gz and .tgz are the same thing, just different file extensions. – Joe Z. Mar 14 '13 at 18:52
  • Also, sorry for edit-bumping, guys. – Joe Z. Mar 14 '13 at 18:52

5 Answers


To answer the question in the heading: tar.gz/tar.bz2 became the standard for distributing Unix (and later Linux) source code a very, very long time ago, as in well over two decades, and probably closer to three. That puts it significantly before Linux even came into existence.

In fact, tar stands for (t)ape (ar)chive. Think reel hard, and you'll get an idea how old it is. ba-dum-bump.

Before people had CD burners, software distributions were put out on 1.44 MB floppy disks. The compressed tar file was chopped into floppy-sized pieces with the split command, and these pieces were called tarballs. You'd join them back together with cat and extract the archive.
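A sketch of that workflow with modern GNU tools (file and directory names here are illustrative):

```sh
# Create a compressed tar archive and chop it into floppy-sized pieces
tar cf - project/ | gzip > project.tar.gz
split -b 1440k project.tar.gz project.tar.gz.part-

# Later: join the pieces back together and extract
cat project.tar.gz.part-* > project.tar.gz
tar xzf project.tar.gz
```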

To answer the other question of why not Zip or Rar, that's an easy one: the tar archiver comes from Unix, while the other two come from MS-DOS/Windows. Tar handles Unix file metadata (permissions, ownership, timestamps, links), while zip and rar did not until relatively recently; they stored MS-DOS file attributes instead. In fact, it took a while before zip started storing NTFS metadata (alternate streams, security descriptors, etc.) properly.
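A minimal illustration of the metadata point (the directory name is made up); tar records and restores Unix permissions, ownership, and timestamps:

```sh
tar czf myproject.tar.gz myproject/   # mode, owner, and mtime go into the archive
tar tvzf myproject.tar.gz             # list entries with their Unix metadata
tar xzpf myproject.tar.gz             # extract; -p restores permissions exactly
```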

Many of the compression algorithms in PKZip were proprietary to the original maker. The final one added to the DOS/Windows versions was Deflate (RFC 1951), which performed a little better than Implode, the proprietary algorithm that had produced the best general compression.

The RAR compression algorithm is proprietary; RARlab publishes the source of a gratis decompressor, but under a license that forbids using it to recreate the compressor, so it is not open source in the usual sense. Official releases of RAR and WinRAR from RARlab are not gratis.

Gzip uses the Deflate algorithm, and so is no worse than PKZip. Bzip2 gets slightly better compression ratios, at the cost of speed.

TL;DR version:

tar.gz and tar.bz2 are from Unix, so Unix people use them. Zip and Rar are from the DOS/Windows world, so DOS/Windows people use them. tar has been the standard for bundling archives of stuff in *nix for several decades.

ikmac
  • Some clarification: open-source RAR implementations are based on RARlab's own [decompressor](http://www.rarlab.com/rar_add.htm), whose source is published. It's also significantly newer than most other compressors, appearing first on Windows long after the previously more popular ACE, ARJ, and ARC, each of which in turn displaced the others, as I recall. None of them ever really appeared on Unix until relatively recently. – greyfade Dec 27 '12 at 22:31
  • Small correction: The RAR algorithm is *not* open: https://fedoraproject.org/wiki/Licensing:Unrar?rd=Licensing/Unrar – Sven Slootweg Jun 26 '15 at 03:53

I don't know about when, but I imagine the reason it's used is a combination of factors: tar is traditional (it's very old); it's easy to manage from a command line; tar preserves file-system information that ZIP or RAR may not; and the two-pass process makes compression more efficient, since one big stream compresses better than many little files compressed separately (the compressor can exploit redundancy across file boundaries).

bzip2 (.bz2) seems to be displacing gzip (.gz) as it provides better compression, in much the same way that gzip itself displaced the earlier compress (.Z).
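A rough illustration of that two-pass idea, with paths made up for the example; the compressor sees one continuous stream, so repeated content across files is squeezed out:

```sh
tar cf - src/ | gzip -9  > src.tar.gz    # archive, then compress the whole stream
tar cf - src/ | bzip2 -9 > src.tar.bz2   # same archive, better (slower) compressor

zip -r -9 src.zip src/                   # zip instead compresses each file separately
```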

John Bickers
  • And xz (LZMA) seems to be displacing bzip2 where compression ratio matters ([.xz files are 30% smaller](https://www.archlinux.org/news/switching-to-xz-compression-for-new-packages/) than gzip). Gzip is probably the fastest of them all. – sastanin Dec 27 '12 at 09:52

In essence, archiving and compressing are two different operations, and the name tar.gz shows the intention very clearly: it is an archive (.tar) that has been compressed (.gz), whereas a .zip or .rar just shows it's some compressed stuff.
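As a small sketch of that separation (the archive name is invented), the same .tar can be paired with whichever compressor you like:

```sh
tar cf sources.tar src/    # archive only; nothing is compressed yet
gzip sources.tar           # -> sources.tar.gz
# or, starting from the same sources.tar:
# bzip2 sources.tar        # -> sources.tar.bz2
# xz sources.tar           # -> sources.tar.xz
```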

Pieter B

tar is traditional in Unix; it combines files but doesn't necessarily compress them. Compressing the archive with gzip (.gz) or bzip2 (.bz2) is just as easy.

Zip and rar are proprietary and more common in the Windows world.
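For example (a small sketch; `project/` is a placeholder), tar will happily emit an uncompressed archive, or invoke the compressor for you:

```sh
tar cf  project.tar     project/   # plain archive, no compression
tar czf project.tar.gz  project/   # same archive piped through gzip (-z)
tar cjf project.tar.bz2 project/   # same archive piped through bzip2 (-j)
```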

Martin Beckett

It's traditional, ubiquitous, and it works. Plus, I thought it was somewhat self-evident.

Update

My apologies; I forget that most people don't know what I know, or don't have experience as an administrator in heterogeneous environments.

Tradition as in a custom or practice ingrained over time. We know it has a basis in history because tar derives from Tape ARchive, referencing the old tape backup technology. It has a long history in the various Unix operating systems, dating back to 1979 in 7th Edition Unix, where it replaced tp. Linux systems are usually an amalgamation of the Linux kernel and GNU software, of which GNU tar is a part. All this history means a majority of experienced technical people know how to use tar without having to refer to documentation, because it's been ingrained. For newer users, there is plenty of documentation because the software has been around for so long.

Ubiquitous as in appearing or found everywhere. (A somewhat accepted loose usage: the appearance isn't literally universal, just common in a large enough percentage of the population to be accepted as ubiquitous.) 7th Edition Unix is the ancestor of the biggest versions of Unix, including SunOS/Solaris, AIX, HP-UX, BSD, etc. There is also a high degree of cross-compatibility among the different implementations of tar on Unix. Since MacOS (as of OS X) is based on BSD, it also has tar. Linux uses GNU software, which includes GNU tar, so tar is available on all flavors of Linux. And, while not available as a builtin, there are many implementations of tar on Windows, including GNU tar through Cygwin and as native ports. GNU tar in particular is available on most Unices and on Windows, making it a good choice for file migrations across OSes.

Works as in it's been functioning for a long time without major modifications. It's available on all major platforms out of the box (except for Windows, where it's available as additional software). The format is also supported on all major platforms, which facilitates interchange between them. Not only is it still used as a way to make easily portable archives, but a tar-pipe is a standard Unix idiom for copying directory trees, especially across heterogeneous environments, as sketched below. In short, it's been around for a long time and is still in heavy use because it does what it does well.
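A minimal sketch of that tar-pipe idiom (directory names and the remote host are placeholders):

```sh
# Copy a directory tree locally, preserving permissions and timestamps
(cd /srcdir && tar cf - .) | (cd /destdir && tar xpf -)

# The same idea across machines over ssh; GNU tar strips the leading /,
# so this recreates srcdir/ under /destdir on the remote host
tar cf - /srcdir | ssh user@remotehost 'cd /destdir && tar xpf -'
```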

dietbuddha
  • With all due respect, this answer is short, simplistic, and it... somehow doesn't work for me. – gnat Dec 27 '12 at 21:51
  • To counter this brief answer, I will explain what I don't like about it in really lengthy comments. For the record, I generally believe that [one-liners can sometimes make great answers](http://meta.programmers.stackexchange.com/a/3477/31260 "as explained here"); it's just that this one is not such a case. Okay, let's see... – gnat Dec 28 '12 at 10:23
  • ... 1) **"traditional"** carries less than zero weight in software development; otherwise we'd all be coding COBOL on an IBM 360 using punch cards; saying "it's traditional" explains nothing at all... – gnat Dec 28 '12 at 10:24
  • ... 2) **"ubiquitous"**... really? One thing I noticed when I switched from Unix to Windows is that no one was using tar and everything was going just fine without it. When, after several years of happy coding, I once found myself in need of tar, it took me quite a while to find a Windows version. That's ubiquitous? Give me a break... – gnat Dec 28 '12 at 10:24
  • ... 3) **"it works"**: well, without explaining _what kind of work, and why,_ tar does, this is just hand-waving. It was a long time ago, but I still remember the shock I felt when I first learned about tar. A utility that doesn't compress felt simply useless to me. Of course I was mistaken, but if back then someone had tried to "educate" me with a one-word statement like _it works_, I'd have thought "no, it masturbates"... – gnat Dec 28 '12 at 10:25
  • ...Summing up: this zero-effort, purely opinion-based answer lacks explanation and context. The statements made are neither explained nor backed up with anything. The overly generalized wording seems to merely reiterate what was stated in the [question asked](http://programmers.stackexchange.com/questions/180695/when-did-the-standard-for-packaging-linux-source-code-become-tar-gz) instead of answering it. – gnat Dec 28 '12 at 10:26
  • Your answer now explains why, but not when. I've removed the downvote I put on it. – Joe Z. Jan 03 '13 at 03:18
  • Great answer. Little software lives for 30 years or more; this is only possible because, well, it just works. There is, to my knowledge, nothing you can do to make tar better. And the UNIX philosophy (do one thing right) pays off: you can combine tar with any compression algorithm you want. – Ingo Mar 14 '13 at 18:24