hbase_value_compress

Background

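Our company stores business data in HBase as serialized object instances. To reduce storage usage, we decided to compress these values before writing them.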

Steps

First, add the dependency. Here we use Apache Commons Compress directly:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.21</version>
</dependency>

Next, add an encode method:

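// Imports used by encode and decode
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.commons.compress.compressors.CompressorException;
import org.apache.commons.compress.compressors.CompressorInputStream;
import org.apache.commons.compress.compressors.CompressorOutputStream;
import org.apache.commons.compress.compressors.CompressorStreamFactory;
import org.apache.commons.compress.utils.IOUtils;

// LOG is assumed to be a logger field of the enclosing class.
public byte[] encode(byte[] content, String compressType) {
    // Nothing to compress for null or empty values
    if (content == null || content.length == 0) {
        return content;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    CompressorOutputStream encoder = null;
    try {
        // The factory selects the compressor by name, e.g. CompressorStreamFactory.GZIP
        encoder = new CompressorStreamFactory().createCompressorOutputStream(compressType, out);
        encoder.write(content);
    } catch (IOException | CompressorException e) {
        LOG.error("ApacheCodec encode error", e);
        // An empty array is returned on failure; callers should check before writing it to HBase
        return new byte[0];
    } finally {
        // Closing the encoder finishes the compressed stream (final block and trailer)
        IOUtils.closeQuietly(encoder);
        IOUtils.closeQuietly(out);
    }
    // Read the bytes only after the encoder has been closed
    return out.toByteArray();
}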
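Note that `out.toByteArray()` runs only after the `finally` block has closed `encoder`. A compressor stream generally writes its last compressed block and any trailer only when it is finished or closed, so reading the buffer earlier would give truncated data. The sketch below illustrates this with the JDK's `java.util.zip.GZIPOutputStream` standing in for the commons-compress gzip encoder (both produce the standard gzip format); the class `GzipFinishDemo` and its method are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: shows why the compressed bytes must be read
// only after the compressor stream has been closed.
final class GzipFinishDemo {

    /** Returns {bytes in the buffer before close, bytes in the buffer after close}. */
    static int[] sizesBeforeAndAfterClose(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // The constructor immediately writes the 10-byte gzip header to out
            GZIPOutputStream encoder = new GZIPOutputStream(out);
            encoder.write(content);
            // Most of the deflate output may still be buffered inside the encoder here
            int before = out.size();
            // close() finishes the deflate stream and appends the CRC-32 and length trailer
            encoder.close();
            int after = out.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = String.join("", Collections.nCopies(100, "same-row-value;"))
                .getBytes(StandardCharsets.UTF_8);
        int[] sizes = sizesBeforeAndAfterClose(payload);
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

The `finally` block in the original encode achieves the same ordering: the encoder is closed first, and only then is the buffer copied out.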

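Here, compressType is one of the compressor names defined as constants in CompressorStreamFactory, such as CompressorStreamFactory.GZIP, BZIP2, XZ, LZ4_FRAMED or ZSTANDARD. Some of them need additional dependencies at runtime (XZ and LZMA require org.tukaani:xz, ZSTANDARD requires com.github.luben:zstd-jni), and a few, such as BROTLI and Z, are supported for decompression only.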

When writing a Put to HBase, only a small change is needed:

// compressType must be the same value that will later be passed to decode
final byte[] encoded = encode(columnBytes, compressType);
put.addColumn(Bytes.toBytes(cfAndQualifier[0]),
        Bytes.toBytes(cfAndQualifier[1]),
        encoded);

The corresponding decode method is:

public byte[] decode(byte[] content, String compressType) {
    // Nothing to decompress for null or empty values
    if (content == null || content.length == 0) {
        return content;
    }
    ByteArrayInputStream in = new ByteArrayInputStream(content);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    CompressorInputStream decoder = null;
    try {
        // compressType must match the type that was used by encode
        decoder = new CompressorStreamFactory().createCompressorInputStream(compressType, in);
        // Stream all decompressed bytes into out
        IOUtils.copy(decoder, out);
    } catch (IOException | CompressorException e) {
        LOG.error("ApacheCodec decode error", e);
        return new byte[0];
    } finally {
        IOUtils.closeQuietly(decoder);
        IOUtils.closeQuietly(out);
        IOUtils.closeQuietly(in);
    }
    return out.toByteArray();
}
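To try the encode/decode round trip without the extra dependency, the gzip case can be sketched with JDK classes alone. This is a substitute under stated assumptions, not the commons-compress code above: `GZIPOutputStream` and `GZIPInputStream` replace the factory-created streams, only gzip is supported, and `JdkGzipCodec` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical JDK-only counterpart of the encode/decode pair (gzip only).
final class JdkGzipCodec {

    static byte[] encode(byte[] content) {
        if (content == null || content.length == 0) {
            return content; // same empty-input convention as the original encode
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes (and therefore finishes) the encoder before toByteArray()
        try (GZIPOutputStream encoder = new GZIPOutputStream(out)) {
            encoder.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException("gzip encode failed", e);
        }
        return out.toByteArray();
    }

    static byte[] decode(byte[] content) {
        if (content == null || content.length == 0) {
            return content;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream decoder = new GZIPInputStream(new ByteArrayInputStream(content))) {
            // Copy decompressed bytes until the end of the gzip stream
            byte[] buffer = new byte[8192];
            int n;
            while ((n = decoder.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("gzip decode failed", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] value = "{\"id\":1,\"name\":\"demo\"}".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decode(encode(value));
        System.out.println(Arrays.equals(value, restored)); // prints "true"
    }
}
```

Unlike the original methods, which log and return an empty array, this sketch rethrows failures as `UncheckedIOException`, so a failed compression cannot silently end up as an empty cell in HBase. Which behavior is preferable depends on how the callers handle errors.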

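Note that compression and decompression must use the same compressType. Otherwise decompression typically fails with an exception, which decode logs before returning an empty array.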
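One way to avoid such a mismatch is to store the algorithm together with the value, so that decoding never depends on the caller passing the right type. A minimal sketch, assuming a hypothetical layout of one tag byte followed by the compressed payload; the tag values and `TaggedCodec` are made up for illustration, and JDK gzip and zlib streams (`DeflaterOutputStream`/`InflaterInputStream`) stand in for the commons-compress types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stored format: [1-byte codec tag][compressed payload].
final class TaggedCodec {
    static final byte GZIP = 1;    // hypothetical tag values, not a standard
    static final byte DEFLATE = 2;

    static byte[] encode(byte[] content, byte tag) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag); // record which algorithm produced the payload
        try (OutputStream encoder = openEncoder(tag, out)) {
            encoder.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] decode(byte[] stored) {
        if (stored == null || stored.length == 0) {
            throw new IllegalArgumentException("missing codec tag");
        }
        // The stored tag, not the caller, decides which decompressor is used
        InputStream in = new ByteArrayInputStream(stored, 1, stored.length - 1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream decoder = openDecoder(stored[0], in)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = decoder.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static OutputStream openEncoder(byte tag, OutputStream out) throws IOException {
        switch (tag) {
            case GZIP:
                return new GZIPOutputStream(out);
            case DEFLATE:
                return new DeflaterOutputStream(out); // zlib format
            default:
                throw new IllegalArgumentException("unknown codec tag: " + tag);
        }
    }

    private static InputStream openDecoder(byte tag, InputStream in) throws IOException {
        switch (tag) {
            case GZIP:
                return new GZIPInputStream(in);
            case DEFLATE:
                return new InflaterInputStream(in);
            default:
                throw new IllegalArgumentException("unknown codec tag: " + tag);
        }
    }

    public static void main(String[] args) {
        byte[] value = "row-value row-value row-value".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(value, decode(encode(value, GZIP))));    // prints "true"
        System.out.println(Arrays.equals(value, decode(encode(value, DEFLATE)))); // prints "true"
    }
}
```

Values written before such a tag was introduced carry no tag byte, so existing data would need a migration or a separate read path.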

Summary

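By building on existing libraries, a small change like this can noticeably improve storage utilization. How large the gain is depends on how compressible the serialized values are; once the data reaches a certain scale, the resulting cost savings are considerable.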
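Because the saving depends on how redundant the serialized values are, it is worth measuring on representative data before rolling the change out. A minimal sketch, assuming the objects are serialized with Java's built-in `ObjectOutputStream` and compressed with JDK gzip; the `Order` class, its sample values and `CompressionRatioDemo` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.zip.GZIPOutputStream;

// Hypothetical measurement helper: how much does gzip shrink a serialized value?
final class CompressionRatioDemo {

    // Hypothetical business object standing in for a real HBase cell value
    static final class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String status;
        final String region;

        Order(long id, String status, String region) {
            this.id = id;
            this.status = status;
            this.region = region;
        }
    }

    static ArrayList<Order> sampleOrders(int n) {
        ArrayList<Order> orders = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            orders.add(new Order(i, i % 2 == 0 ? "PAID" : "SHIPPED", "cn-north"));
        }
        return orders;
    }

    // Java built-in serialization, assumed here as the format of the stored objects
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream encoder = new GZIPOutputStream(out)) {
            encoder.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Compressed size divided by raw size; smaller means a bigger saving. */
    static double ratio(Serializable value) {
        byte[] raw = serialize(value);
        return (double) gzip(raw).length / raw.length;
    }

    public static void main(String[] args) {
        ArrayList<Order> orders = sampleOrders(1000);
        byte[] raw = serialize(orders);
        byte[] compressed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.2f%n",
                raw.length, compressed.length, (double) compressed.length / raw.length);
    }
}
```

Real cells are usually much smaller than this 1000-object sample, and gzip adds a fixed header and trailer to every value, so very small values may compress poorly or even grow; measuring with values of realistic size gives a more honest estimate.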