How to compress and uncompress data in Java using Snappy or bzip2

Introduction

This post demonstrates how to compress and uncompress data in Java using Snappy or bzip2.

Environments

  • Java 1.8

1. The Snappy method

Snappy is a high-performance compression library, described by the project as follows:

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

1.1 the pom.xml

<dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>1.1.7.1</version>
</dependency>

1.2 the example code

// Read the whole input file into memory; unlike a single channel.read(),
// this reads every byte and closes the file afterwards.
// Needs: import java.nio.file.Files; import java.nio.file.Paths;
byte[] beforeBytes = Files.readAllBytes(Paths.get("/tmp/textfile.txt"));

//compress
System.out.println("Before snappy compress size:" + beforeBytes.length + " bytes");
long startTime1 = System.currentTimeMillis();
byte[] afterBytes = compress(beforeBytes);
long endTime1 = System.currentTimeMillis();
System.out.println("after snappy compress size:" + afterBytes.length + " bytes");
System.out.println("snappy compress time elapsed:" + (endTime1 - startTime1)
        + "ms");

//uncompress
long startTime2 = System.currentTimeMillis();
byte[] resultBytes = uncompress(afterBytes);
long endTime2 = System.currentTimeMillis(); // stop the clock before printing
System.out.println("snappy uncompress size:" + resultBytes.length + " bytes");
System.out.println("uncompress time elapsed:" + (endTime2 - startTime2)
        + "ms");
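
The snippet above calls compress and uncompress without showing them. Here is a minimal sketch of what they might look like, assuming they simply delegate to the snappy-java methods Snappy.compress(byte[]) and Snappy.uncompress(byte[]). The class name SnappyHelper is hypothetical and does not come from the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.xerial.snappy.Snappy;

// Hypothetical helper class: the original post does not show these methods.
public class SnappyHelper {

    // Compress a byte array with Snappy (assumed to be what compress() does).
    public static byte[] compress(byte[] input) throws IOException {
        return Snappy.compress(input);
    }

    // Restore the original bytes from a Snappy-compressed byte array.
    public static byte[] uncompress(byte[] compressed) throws IOException {
        return Snappy.uncompress(compressed);
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hello hello hello hello hello"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = uncompress(packed);
        // A correct round trip returns exactly the bytes that went in.
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Both methods work on whole byte arrays, so the entire input has to fit in memory, which matches how the example reads the file.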

The key points are as follows:

  • Prepare a big file to compress. I used Python to generate a large random file; see this article for how to prepare one
  • The compression code compresses the bytes and measures the elapsed time
  • The decompression code uncompresses the compressed bytes and checks the resulting size and the elapsed time
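
If you would rather prepare the test file from Java instead of Python, the sketch below is one way to do it. It is not the generator used for the results in this post: the class name RandomTextFile, the alphabet and the seed are all assumptions chosen for illustration, so the compressed sizes you get may differ from the ones reported here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical generator for a large file of random ASCII letters and digits.
public class RandomTextFile {

    // Assumed alphabet; the original post does not say which characters it used.
    private static final byte[] ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
                    .getBytes(StandardCharsets.US_ASCII);

    // Writes exactly sizeInBytes random characters to target, one chunk at a time.
    public static void write(Path target, long sizeInBytes, long seed) throws IOException {
        Random random = new Random(seed);
        byte[] chunk = new byte[64 * 1024];
        try (OutputStream out = Files.newOutputStream(target)) {
            long remaining = sizeInBytes;
            while (remaining > 0) {
                int n = (int) Math.min(chunk.length, remaining);
                for (int i = 0; i < n; i++) {
                    chunk[i] = ALPHABET[random.nextInt(ALPHABET.length)];
                }
                out.write(chunk, 0, n);
                remaining -= n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 100 MiB, the same size as the file used in this post.
        write(Paths.get("/tmp/textfile.txt"), 100L * 1024 * 1024, 42L);
    }
}
```

Writing in 64 KiB chunks keeps memory use small even for a 100 MiB file.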

We got this result:

Before snappy compress size:104857600 bytes
after snappy compress size:104862415 bytes
snappy compress time elapsed:229ms
snappy uncompress size:104857600 bytes
uncompress time elapsed:82ms
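
These timings come from a single run measured with System.currentTimeMillis(), so they can include JVM warm-up effects. As a rough sketch of a steadier measurement, the hypothetical helper below (TimedRuns is not part of the original code) runs a task a few times untimed and then reports the best of several timed runs using System.nanoTime().

```java
// Hypothetical timing helper; the names are illustrative only.
public class TimedRuns {

    // A task that is allowed to throw, e.g. a compress call that throws IOException.
    @FunctionalInterface
    public interface Task {
        void run() throws Exception;
    }

    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the shortest timed run in milliseconds.
    public static double bestMillis(Task task, int warmups, int runs) throws Exception {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long bestNanos = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            bestNanos = Math.min(bestNanos, System.nanoTime() - start);
        }
        return bestNanos / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; replace the lambda body with a compress call.
        double ms = bestMillis(() -> Thread.sleep(5), 2, 5);
        System.out.println("best run: " + ms + " ms");
    }
}
```

With the code from this post, the task could be something like () -> compress(beforeBytes).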

1.3 the snappy summary

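As we can see, Snappy is very fast, but it has a low compression ratio. For my randomized big file, the compressed output (104862415 bytes) is even slightly bigger than the original (104857600 bytes)! This is not surprising for random data: Snappy mainly saves space by replacing repeated byte sequences, and random content has almost none.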

2. The bzip2 method

bzip2 is a compression library with a high compression ratio, described by the project as follows:

bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

The Apache Commons Compress library includes a bzip2 implementation; here is the example.

2.1 the pom.xml

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.16</version>
</dependency>

2.2 the example code

// Read the whole input file into memory; unlike a single channel.read(),
// this reads every byte and closes the file afterwards.
// Needs: import java.nio.file.Files; import java.nio.file.Paths;
byte[] beforeBytes = Files.readAllBytes(Paths.get("/tmp/textfile.txt"));

//compress
System.out.println("before bzip2 compress size:" + beforeBytes.length + " bytes");
long startTime1 = System.currentTimeMillis();
byte[] afterBytes = compress(beforeBytes);
long endTime1 = System.currentTimeMillis();
System.out.println("after bzip2 compress size:" + afterBytes.length + " bytes");
System.out.println("bzip2 compress time elapsed:" + (endTime1 - startTime1)
        + "ms");

//uncompress
long startTime2 = System.currentTimeMillis();
byte[] resultBytes = uncompress(afterBytes);
long endTime2 = System.currentTimeMillis(); // stop the clock before printing
System.out.println("after bzip2 uncompress size:" + resultBytes.length + " bytes");
System.out.println("bzip2 uncompress time elapsed:" + (endTime2 - startTime2)
        + "ms");
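
As in the Snappy example, compress and uncompress are not shown above. Here is a minimal sketch of how they might be written, assuming they wrap the Commons Compress stream classes BZip2CompressorOutputStream and BZip2CompressorInputStream around in-memory buffers. The class name Bzip2Helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

// Hypothetical helper class: the original post does not show these methods.
public class Bzip2Helper {

    // Compress a byte array into a bzip2 stream held in memory.
    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the compressor stream flushes the final bzip2 block.
        try (OutputStream bz = new BZip2CompressorOutputStream(sink)) {
            bz.write(input);
        }
        return sink.toByteArray();
    }

    // Decompress a bzip2 stream back into the original bytes.
    public static byte[] uncompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream bz =
                     new BZip2CompressorInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            // Copy until the stream signals end of data with -1.
            while ((n = bz.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }
}
```

The copy loop is written by hand so the sketch also compiles on Java 1.8, which has no InputStream.readAllBytes().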

2.3 the result

Running the code, we got this result:

before bzip2 compress size:104857600 bytes
after bzip2 compress size:75620256 bytes
bzip2 compress time elapsed:16794ms
after bzip2 uncompress size:104857600 bytes
bzip2 uncompress time elapsed:9973ms

2.4 bzip2 summary

As we can see, bzip2 is much slower than Snappy (16794 ms versus 229 ms to compress the same file, more than 70 times slower), but it has a much better compression ratio: the compressed file is about 72% of the original size. It’s awesome!
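
To make the comparison concrete, the sketch below derives the compression ratio and throughput from the sizes and timings reported in this post. The class CompressionStats and its method names are hypothetical; only the input numbers come from the results above.

```java
import java.util.Locale;

// Hypothetical helper for turning the raw results into comparable figures.
public class CompressionStats {

    // Compressed size as a fraction of the original size (smaller is better).
    public static double ratio(long originalBytes, long compressedBytes) {
        return (double) compressedBytes / originalBytes;
    }

    // Throughput in MiB per second, based on the uncompressed size.
    public static double mibPerSecond(long originalBytes, long elapsedMillis) {
        double mib = originalBytes / (1024.0 * 1024.0);
        return mib / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        long original = 104857600L; // size of the test file in bytes

        // Sizes and timings copied from the results in this post.
        System.out.printf(Locale.ROOT, "snappy: ratio %.4f, compress %.1f MiB/s%n",
                ratio(original, 104862415L), mibPerSecond(original, 229));
        System.out.printf(Locale.ROOT, "bzip2:  ratio %.4f, compress %.1f MiB/s%n",
                ratio(original, 75620256L), mibPerSecond(original, 16794));
    }
}
```

For these inputs the ratio comes out just above 1.0 for Snappy and about 0.72 for bzip2, which is where the 72% figure comes from.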

Summary

I recommend using Snappy when speed is the key requirement; if you care more about the compression ratio, choose bzip2.

You can find detailed documentation about Snappy and bzip2 here: