Compressed Files

What if somebody sends you a file called myCoolStuff.tar.gz?

It's a tarball!

A file with "tar.gz" on the end is an archive in which one or more files have been "tarred" together and then compressed with the GNU compression program called "gzip".

When I receive a file like that, I first inspect the contents like so:

$ tar tzvf myCoolStuff.tar.gz

And if I want to extract the contents:

$ tar xzvf myCoolStuff.tar.gz

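Read "man tar" to understand the option letters:

t: list the contents (without extracting anything)

x: extract

z: filter the archive through gzip

v: verbose (print each file name)

f: the next argument is the archive file name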
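To try these letters without risk, here is a minimal sketch that builds a throwaway tarball in a scratch directory and then lists and extracts it. The "stuff" and "out" names are made up for the illustration, and the c (create) and -C (change to a directory before extracting) options are additions that the text above does not cover.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so your real files are not touched.
# (Remove it by hand when you are done experimenting.)
work=$(mktemp -d)
cd "$work"

# Make two small files, then tar and gzip them (c: create).
mkdir stuff
echo "hello" > stuff/a.txt
echo "world" > stuff/b.txt
tar czf myCoolStuff.tar.gz stuff

# t: list the contents without extracting anything.
listing=$(tar tzf myCoolStuff.tar.gz)
echo "$listing"

# x: extract, here into a separate directory chosen with -C.
mkdir out
tar xzf myCoolStuff.tar.gz -C out
extracted=$(cat out/stuff/a.txt)
echo "$extracted"
```

Listing first is a cheap habit: it shows whether the archive will unpack into its own directory or scatter files all over the current one.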

What if you get myCoolStuff.tar.bz2?

bz2 is just the suffix for another compression format. Such files are created by a program called "bzip2".

Review the contents:

$ tar tjvf myCoolStuff.tar.bz2

Extract the contents:

$ tar xjvf myCoolStuff.tar.bz2

Read "man tar".

j: filter the archive through bzip2
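Since the only difference between the two cases is one letter, the choice can be sketched as a small shell function. The function name tar_flag_for is hypothetical, invented for this illustration; it only looks at the file name and does not inspect the file itself.

```shell
#!/bin/sh
# Hypothetical helper: print the tar compression letter that matches
# a file name's suffix. Fails (returns 1) for names it does not know.
tar_flag_for() {
    case "$1" in
        *.tar.gz)   echo z ;;
        *.tar.bz2)  echo j ;;
        *.tar)      echo "" ;;   # plain tar: no compression letter
        *)          return 1 ;;
    esac
}

# Build the listing command for a bzip2-compressed tarball.
flag=$(tar_flag_for myCoolStuff.tar.bz2)
echo "tar t${flag}vf myCoolStuff.tar.bz2"
```

Recent versions of GNU tar can usually work out the compression on their own when reading an archive, so treat this as a way to remember the letters rather than something you must do.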

Can you find a more complicated & tedious way to do that?

Yes, of course I can.

If you are on a Unix/Linux system that does not have the GNU version of tar, you have to do this in two steps. First, you decompress the file

$ gzip -d myCoolStuff.tar.gz

which is the same thing as

$ gunzip myCoolStuff.tar.gz

That decompresses the package, replacing "myCoolStuff.tar.gz" with "myCoolStuff.tar". It is as if the ".gz" suffix has been stripped off.

In step two, the tar program is used to extract the individual parts.

$ tar xvf myCoolStuff.tar

A pipe can be used to do that in one step, as in

$ gzip -dc myCoolStuff.tar.gz | tar xvf -

Run "man gzip" and "man tar" in case you wonder what -dc and - are for here.
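The difference between the piped form and the two-step form can be seen in a scratch directory. This is only a sketch: the "stuff" directory and its file are invented for the example. With -c, gzip writes the decompressed data to standard output and leaves the .tar.gz alone, and "f -" tells tar to read the archive from standard input.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Build a sample tarball in two steps: tar first, then gzip.
mkdir stuff
echo "hello" > stuff/a.txt
tar cf myCoolStuff.tar stuff
gzip myCoolStuff.tar            # replaces it with myCoolStuff.tar.gz
rm -r stuff

# One step: decompress to stdout, let tar read the stream from stdin.
gzip -dc myCoolStuff.tar.gz | tar xf -
ls                              # the .tar.gz is still here, next to stuff/

# Two steps: gzip -d replaces the .tar.gz with a plain .tar.
rm -r stuff
gzip -d myCoolStuff.tar.gz
tar xf myCoolStuff.tar
ls                              # now myCoolStuff.tar and stuff/
```

The piped form never writes the intermediate .tar file to disk, which matters when the archive is large.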

What if somebody sends you a zip file?

Most Unix/Linux systems have an unzip program that can handle zip files.

$ unzip -t myCoolStuff.zip

tests the archive's integrity, naming each file as it goes (use "unzip -l" if you only want a listing), and

$ unzip myCoolStuff.zip

extracts the files into the current working directory.

The Windows "zip" Philosophy and the Unix "tar.gz" Philosophy

In Windows, zip is a popular archiving format. It originated with PKWARE's PKZip program and, at least in the early days of the Internet, most people needed a copy of PKZip to open zip packages, although the format specification itself has been published openly. zip will group together files and compress them in (what appears to the user as) a single step.

In Unix, there is a different tradition. One tries to do separate jobs with separate programs, so that evolution can select the "best of breed" program for each job.

As far as I know, there have not been many changes in the basic "tar" approach to grouping files together. There have been several advances in compression, however. The long-standing standard, which is still considered good enough for most purposes, is gzip (the gz file suffix). bzip2 (the bz2 suffix), the successor to an earlier program called bzip, usually achieves a higher compression ratio, but at the expense of slower compression.

Diatribe About --, -, and Such.

The options for "tar" reveal one of the frustrating Unix things.

The options for some programs follow a single dash, as in

$ ls -l

Some programs ask for options with two dashes, as in

$ rpmbuild whatever.spec --define "dist Centos"

Some programs have decided that they are so special that they will use neither the - nor the --, but nothing. Both "tar" and "ps" accept their option letters with no dash at all. In fact, if you use the dash, such programs may behave differently or print out some abuse. Tar allows either

$ tar -tzvf whatever.tar.gz

or

$ tar tzvf whatever.tar.gz

But some versions of ps print a warning if one runs

$ ps -aux

Warning: bad ps syntax, perhaps a bogus '-'? See procps FAQ

Instead, they would have you type

$ ps aux

Frankly, I find it aggravating, and wish that all programs would adhere to the standard that the GNU Project has tried to establish, which is that we use a single dash for short options that do not take an argument, as in

$ ls -l

or

$ ls -a

and these can be combined as

$ ls -la

But we use two dashes and an equal sign when a long option takes an argument

$ ls --color=auto

Many programs are also willing to accept either a long option, --anoption=2, or an abbreviated short option without the equal sign, -a2.

$ myprogram --anoption=2

or

$ myprogram -a2

In many cases, it is important that there is no space between the a and the 2.
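The single-dash conventions are easy to try out with the shell's built-in getopts, which handles short options only. This is a sketch, not how any particular program does it: parse_opts is a hypothetical stand-in for "myprogram", with -l as a plain flag and -a as an option that takes an argument.

```shell
#!/bin/sh
# Hypothetical stand-in for "myprogram": -l is a flag, -a needs a value.
parse_opts() {
    OPTIND=1              # reset so the function can be called repeatedly
    long=no
    level=unset
    # In "la:", the colon after a means that -a takes an argument.
    while getopts "la:" opt; do
        case "$opt" in
            l) long=yes ;;
            a) level=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    echo "long=$long level=$level"
}

parse_opts -l -a2     # separate short options, value attached to -a
parse_opts -la2       # the same options combined after one dash
parse_opts -a 2       # getopts also accepts a space before the value
```

The first two calls print "long=yes level=2" and the third prints "long=no level=2". The space is accepted here because the argument to -a is mandatory. Programs that make an option's argument optional generally cannot tell a separated value from the next file name, which is where the attached form becomes important.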

