I am confused about archiving and compression and I am going to write what I think is correct.

Here are the commands that I wish to get a better understanding of:

tar -c file > file.tar
tar -cf file.tar file // this command and the previous one are about the same

I think that the previous commands merely create a sort of common folder (an archive) containing exactly the files we had before running the archiving command (the size is not reduced at all). The only difference is that the third one will produce an archive with .gz instead of .tar.

To reduce the archive’s size (compress it) we have to use:

tar -cjf file.tar.bz2 file
tar -cJf file.tar.xz file

gzip file.tar // it’ll create a compressed file called file.tar.gz
tar -cz file > file.tar.gz

The way I see it, if the extension is .gz, .bz2, or .xz the file is compressed, and if it is .tar it is only archived. Is that correct?

2 Answers

5

I think this one answers your question: https://unix.stackexchange.com/questions/127169/does-tar-actually-compress-files-or-just-group-them-together

To confirm what you think: tar by itself just puts the files together in an archive without any compression. Compression happens only when you pass one of the compression flags (such as -z, -j, or -J) when creating the archive.
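To see the difference for yourself, here is a minimal sketch, assuming GNU tar with gzip installed. The file names (sample.txt, plain.tar, gzipped.tar.gz) are made up for the demonstration. It compares the byte counts of a plain archive and a gzip-compressed one:

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is a made-up, highly repetitive (so highly compressible) file.
yes "the same line again and again" | head -n 20000 > sample.txt

tar -cf plain.tar sample.txt        # archive only, no compression
tar -czf gzipped.tar.gz sample.txt  # archive plus gzip compression

# Byte counts: the plain archive comes out slightly larger than the input
# (tar adds headers and padding); the gzip one comes out much smaller.
orig_size=$(wc -c < sample.txt)
tar_size=$(wc -c < plain.tar)
gz_size=$(wc -c < gzipped.tar.gz)
echo "original: $orig_size  tar: $tar_size  tar.gz: $gz_size"
```

The exact numbers depend on the input, but the ordering (tar.gz smaller than the original, plain tar slightly larger) should hold for any compressible file.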

Also, the file extension is whatever you type, so you could fool yourself like this: tar -cz file > file.tar.bz2 (this uses gzip compression, so the name should have been file.tar.gz).

If you want to extract this file later, you will fool yourself into thinking you used bzip2 compression, so know what you are doing or document somewhere what you did.
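If you are ever unsure what a mislabeled archive really is, one way to check is to look at its first two bytes. This is only a sketch, assuming GNU tar with gzip installed; notes.txt and mislabeled.tar.bz2 are hypothetical names, and -f - is used to write the archive to standard output explicitly:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "hello" > notes.txt   # hypothetical input file

# gzip compression, deliberately saved under a misleading .bz2 name
tar -czf - notes.txt > mislabeled.tar.bz2

# The first two bytes identify the real format: gzip data starts with
# 1f 8b, whereas bzip2 data would start with 42 5a ("BZ").
magic=$(od -An -tx1 -N2 mislabeled.tar.bz2 | tr -d ' \n')
echo "magic bytes: $magic"

# tar looks at the contents when reading, so listing works despite the name.
listing=$(tar -tf mislabeled.tar.bz2)
echo "contents: $listing"
```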

2

'tar' literally means tape archive: it stores files in, and extracts them from, a tape (or disk) archive. tar supports a large number of compression programs, such as gzip, bzip2, lzip, lzma, lzop, xz and traditional compress.

A tar command should begin with a function, such as:

[-] A --catenate --concatenate | c --create | d --diff --compare |
         --delete | r --append | t --list | --test-label | u --update | x
         --extract --get [options] [pathname ...]
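A few of those functions in action, as a rough sketch that assumes GNU tar (--delete in particular is a GNU extension). The file names are invented, and appending and deleting are done on an uncompressed archive, since tar cannot modify a compressed one in place:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "one" > a.txt
echo "two" > b.txt

tar -cf demo.tar a.txt             # c: create a new archive
tar -rf demo.tar b.txt             # r: append a file to it
after_append=$(tar -tf demo.tar)   # t: list the members
echo "after append:"
echo "$after_append"

tar --delete -f demo.tar a.txt     # --delete: remove a member
after_delete=$(tar -tf demo.tar)
echo "after delete:"
echo "$after_delete"

mkdir out
tar -xf demo.tar -C out            # x: extract into ./out
```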

The -a option can be used to choose the compression from the suffix (extension) of the archive being created. On extraction, tar should detect the compression type by itself and act accordingly, without needing a compression flag; it only falls back on the suffix if the signature check fails. For much more on tar, issue the command man tar, and to learn more about tar compression specifically, see: https://www.gnu.org/software/tar/manual/html_section/tar_69.html
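As a small illustration of both behaviours, here is a sketch assuming GNU tar with gzip installed (the file and archive names are hypothetical). The same -caf invocation yields a gzip archive or a plain one depending only on the name, and extraction needs no compression flag:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "payload" > data.txt

# -a picks the compressor from the suffix of the archive name
tar -caf auto.tar.gz data.txt   # .gz suffix: gzip is used
tar -caf auto.tar data.txt      # plain .tar suffix: no compression

# gzip data starts with the bytes 1f 8b; an uncompressed tar archive
# starts with the first member's name instead.
gz_magic=$(od -An -tx1 -N2 auto.tar.gz | tr -d ' \n')
plain_magic=$(od -An -tx1 -N2 auto.tar | tr -d ' \n')
echo "auto.tar.gz: $gz_magic  auto.tar: $plain_magic"

# No -z here: tar detects the gzip compression when reading.
mkdir out
tar -xf auto.tar.gz -C out
extracted=$(cat out/data.txt)
echo "extracted: $extracted"
```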

There are a number of options that can modify the results. tar also allows for addition to, and removal from, an existing archive (see --append and --delete above).

The extension in no way indicates the file type. For instance, I've seen image files from a Windows system with an extension of .jpg that were actually .gif. To find out the file type, open a terminal with Ctrl+Alt+T, navigate to the directory with the file in question, and issue the command file filename. Here's an example of output from a gzip compressed file:

$ file wireless-info.tar.gz 
wireless-info.tar.gz: gzip compressed data, from Unix, last modified: Thu Apr 23 07:45:20 2015

Changing the extension in no way changes the output of file, making it a trustworthy way of determining the file type even for files you don't own and have no notes on. Here's an example after renaming the previous file and removing the .gz:

$ file wireless-info.tar
wireless-info.tar: gzip compressed data, from Unix, last modified: Thu Apr 23 07:45:20 2015

For more on file and how it works, issue the command man file.

Sources: Experience & https://www.gnu.org/software/tar/manual/html_section/tar_69.html

Elder Geek