What compression tools are available in Ubuntu that can benefit from a multi-core CPU?
9 Answers
Well, the keyword was parallel. After looking for all compression tools that were also parallel I found the following:
PXZ - Parallel XZ is a compression utility that takes advantage of running LZMA compression of different parts of an input file on multiple cores and processors simultaneously. Its primary goal is to utilize all resources to speed up compression time with minimal possible influence on compression ratio.
sudo apt-get install pxz
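A minimal usage sketch (the -T thread flag is taken from pxz's man page as I recall it; pxz otherwise mirrors xz's interface, so double-check your version):
# compress with 4 threads; like xz, this replaces bigfile with bigfile.xz
pxz -T4 bigfile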
PLZIP - Lzip is a lossless data compressor based on the LZMA algorithm, with very safe integrity checking and a user interface similar to the one of gzip or bzip2. Lzip decompresses almost as fast as gzip and compresses better than bzip2, which makes it well suited for software distribution and data archiving.
Plzip is a massively parallel (multi-threaded) version of lzip using the lzip file format; the files produced by plzip are fully compatible with lzip.
Plzip is intended for faster compression/decompression of big files on multiprocessor machines, which makes it specially well suited for distribution of big software files and large scale data archiving. On files big enough, plzip can use hundreds of processors.
sudo apt-get install plzip
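A quick sketch, assuming plzip's gzip-like interface (-n sets the number of worker threads):
# maximum compression with 4 threads; produces bigfile.lz
plzip -9 -n 4 bigfile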
PIGZ - pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that takes advantage of multiple processors and multiple cores when compressing data.
sudo apt-get install pigz
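For example (pigz follows gzip's conventions; -p caps the number of processes):
pigz -p 8 bigfile                            # produces bigfile.gz
tar -cf - somedir | pigz > somedir.tar.gz    # typical tar pipeline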
PBZIP2 - pbzip2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 (ie: anything compressed with pbzip2 can be decompressed with bzip2).
sudo apt-get install pbzip2
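For example (note that -p takes the core count with no space, e.g. -p4):
# bzip2-compatible output using 4 cores; produces bigfile.bz2
pbzip2 -p4 bigfile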
LRZIP - A multithreaded compression program that can achieve very high compression ratios and speed when used with large files. It uses the combined compression algorithms of zpaq and lzma for maximum compression, lzo for maximum speed, and the long range redundancy reduction of rzip. It is designed to scale with increases in RAM size, improving compression further. A choice of either size or speed optimizations allows for either better compression than even lzma can provide, or better speed than gzip, but with bzip2-sized compression levels.
sudo apt-get install lrzip
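A rough sketch of the backend choices described above (the -l and -z flags are taken from lrzip's documentation as I remember it, so check your version's man page); pick one:
lrzip bigfile        # default LZMA backend, produces bigfile.lrz
lrzip -l bigfile     # LZO backend for maximum speed
lrzip -z bigfile     # ZPAQ backend for maximum compression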
A small Compression Benchmark (Using the test Oli created):
ORIGINAL FILE SIZE - 100 MB
PBZIP2 - 101 MB (1% Bigger)
PXZ - 101 MB (1% Bigger)
PLZIP - 102 MB (1% Bigger)
LRZIP - 101 MB (1% Bigger)
PIGZ - 101 MB (1% Bigger)
A small Compression Benchmark (Using a Text file):
ORIGINAL FILE SIZE - 70 KB Text File
PBZIP2 - 16.1 KB (23%)
PXZ - 15.4 KB (22%)
PLZIP - 15.5 KB (22.1%)
LRZIP - 15.3 KB (21.8%)
PIGZ - 17.4 KB (24.8%)
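For reference, a 100 MB random test file like the one in the first benchmark can be generated with dd; the exact commands Oli used aren't reproduced here, so treat this only as a sketch:
dd if=/dev/urandom of=testfile bs=1M count=100
# then run each compressor on testfile and compare the output sizes
ls -l testfile*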
There are two main tools: lbzip2 and pbzip2. They're essentially different implementations of bzip2 compressors. I've compared them (the output below is a tidied-up version, but you should be able to run the commands):
cd /dev/shm # we do all of this in RAM!
dd if=/dev/urandom of=bigfile bs=1024 count=102400
$ lbzip2 -zk bigfile
Time: 0m3.596s
Size: 105335428
$ pbzip2 -zk bigfile
Time: 0m5.738s
Size: 10532460
lbzip2 appears to be the winner on random data. It's slightly less compressed but much quicker. YMMV.
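If you want to reproduce the numbers, here is a sketch of the timing loop (both tools write bigfile.bz2, so remove it between runs):
time lbzip2 -zk bigfile; ls -l bigfile.bz2; rm bigfile.bz2
time pbzip2 -zk bigfile; ls -l bigfile.bz2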
Update:
XZ Utils has supported multi-threaded compression since v5.2.0 (the feature was originally, and mistakenly, documented as multi-threaded decompression).
For example: tar -cf - source | xz --threads=0 > destination.tar.xz
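The same flag also works when compressing a single file directly, for example:
# -T0 (equivalent to --threads=0) uses all available cores, -k keeps the input
xz -T0 -k bigfile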
In addition to the nice summary above (thanks, Luis), these days folks might also want to consider PIXZ, which according to its README (source: https://github.com/vasi/pixz -- I haven't verified the claims myself) has some advantages over PXZ.
[Compared to PIXZ, PXZ has these advantages and disadvantages:]
* Simpler code
* Uses OpenMP instead of pthreads
* Uses streams instead of blocks, not indexable
* Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage
In other words, PIXZ is supposedly more memory and disk efficient, and has an optional indexing feature that speeds up decompression of individual components of compressed tar files.
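The README's own examples look roughly like this (again, not verified by me):
pixz foo.tar foo.tpxz                  # compress a tarball with an index
pixz -x dir/file < foo.tpxz | tar x    # extract a single member quickly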
Zstandard has supported multi-threading since v1.2.0. It is a very fast compressor and decompressor intended to replace gzip, and at its highest levels it can also compress as efficiently as (if not better than) LZMA2/XZ.
You have to use one of those releases, or compile the latest version from source, to get these benefits. Luckily it doesn't pull in a lot of dependencies.
A third-party pzstd tool was also included with zstd v1.1.0.
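For example, with a new enough zstd (flags follow the zstd manual; --threads=0 lets it pick the thread count):
zstd --threads=0 -19 bigfile                            # produces bigfile.zst
tar -cf - somedir | zstd --threads=0 > somedir.tar.zst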
lzop may also be a viable option, although it's single-threaded.
It uses the very fast Lempel-Ziv-Oberhumer compression algorithm, which in my observation is 5-6 times faster than gzip.
Note: although it's not multi-threaded yet, it will probably outperform pigz on 1-4 core systems. That's why I decided to post this even though it doesn't directly answer your question. Try it; it may solve your CPU bottleneck problem while using only one CPU and compressing a little worse. I often found it to be a better solution than, e.g., pigz.
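Usage is gzip-like, for example:
lzop -1 bigfile                              # fastest level, produces bigfile.lzo
tar -cf - somedir | lzop > somedir.tar.lzo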
This is not really an answer, but I think it is relevant enough to share my benchmarks comparing the speed of gzip and pigz on real hardware in a real-life scenario. As pigz is the multithreaded evolution, it is what I have personally chosen to use from now on.
Metadata:
- Hardware used: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (4c/8t) + NVMe SSD
- GNU/Linux distribution: Xubuntu 17.10 (artful)
- gzip version: 1.6
- pigz version: 2.4
- The file being compressed is a 9.25 GiB SQL dump
gzip quick
time gzip -1kN ./db_dump.sql
real 1m22,271s
user 1m17,738s
sys 0m3,330s
gzip best
time gzip -9kN ./db_dump.sql
real 10m6,709s
user 10m2,710s
sys 0m3,828s
pigz quick
time pigz -1kMN ./db_dump.sql
real 0m26,610s
user 1m55,389s
sys 0m6,175s
pigz best (no zopfli)
time pigz -9kMN ./db_dump.sql
real 1m54,383s
user 14m30,435s
sys 0m5,562s
pigz + zopfli algorithm
time pigz -11kMN ./db_dump.sql
real 171m33,501s
user 1321m36,144s
sys 0m29,780s
As a bottom line, I would not recommend the zopfli algorithm, since the compression took a tremendous amount of time for a not-that-significant amount of disk space saved.
Resulting file sizes:
- bests: 1309M
- quicks: 1680M
- zopfli: 1180M
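To put those sizes in perspective, a rough calculation (taking the 9.25 GiB input as roughly 9472 MiB):
echo "scale=3; 100*1309/9472" | bc   # best:   ~13.8% of the original
echo "scale=3; 100*1680/9472" | bc   # quick:  ~17.7%
echo "scale=3; 100*1180/9472" | bc   # zopfli: ~12.5%
So zopfli's roughly 90x longer runtime buys only about 130 MiB over plain -9 here.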
Relevant Arch Wiki entry: https://wiki.archlinux.org/index.php/Makepkg#Utilizing_multiple_cores_on_compression
# lzma compression
xz --threads=0

# drop-in parallel gzip replacement
# (the -p/--processes flag can be used to employ fewer cores)
pigz

# drop-in parallel bzip2 replacement
# (the -p# flag can be used to employ fewer cores;
# note: no space between -p and the number of cores)
pbzip2

# modern zstd compression, used to build Arch packages by default since sometime in 2020
zstd --threads=0
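For the makepkg use case that wiki page covers, the idea is to put the threaded variants into makepkg.conf's COMPRESS* arrays; a sketch (the exact defaults vary by release, so treat these values as illustrative and check the wiki):
COMPRESSXZ=(xz -c -z --threads=0 -)
COMPRESSZST=(zstd -c -z -q --threads=0 -)
COMPRESSGZ=(pigz -c -f -n)
COMPRESSBZ2=(pbzip2 -c -f)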
