java - What data format will be considered fastest to write to Kafka?
We have various options for the format in which data can be written to Kafka, e.g. string format or byte array. Which data format is considered fastest when writing to Kafka?
Moreover, Kafka provides a utility to compress the whole data at once and write it.
We also need to consider that when consuming the same message it has to be de-compressed, which increases the cost of reading the data.
Kafka 0.8.2 serialises data as a byte array to the commit log. The org.apache.kafka.common.serialization.Serializer
class has the following interface:

byte[] serialize(String var1, T var2);
It requires a byte array to be returned as the data written to the Kafka topic. The org.apache.kafka.common.serialization.StringSerializer
class therefore has to extract a byte array from the string:

public byte[] serialize(String topic, String data) {
    try {
        return data == null ? null : data.getBytes(this.encoding);
    } catch (UnsupportedEncodingException e) {
        throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + this.encoding);
    }
}
So in performance terms: if you have binary data, write it as a byte array using the default serializer. Creating strings in Java can potentially be expensive, and Kafka will convert the string to a byte array anyway.
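For illustration, here is a minimal sketch of a producer that hands Kafka the raw bytes directly via ByteArraySerializer. The broker address localhost:9092 and the topic name "events" are placeholder assumptions for the example, not something from the original question:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BinaryProducerExample {
    public static void main(String[] args) {
        // Placeholder broker address and topic name for illustration only.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Hand Kafka the bytes directly, avoiding an intermediate String copy.
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = new byte[]{0x01, 0x02, 0x03}; // already-binary data
            producer.send(new ProducerRecord<>("events", payload));
        }
    }
}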
Regarding compression, Kafka offers the following compression option on the producer out of the box:
- compression.codec
- This parameter allows you to specify the compression codec for all data generated by this producer. Valid values are "none", "gzip" and "snappy".
See the following article by one of the Kafka co-creators. To summarise: gzip offers the best compression but requires more CPU cycles. Snappy is a nice compromise that still compresses the data and in many cases allows higher throughput. Gzip is better for cross data center replication as it requires less bandwidth.
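As a hedged sketch of how compression is enabled on the producer: with the newer Java producer the property is compression.type (the older Scala producer uses compression.codec as quoted above). The broker address and topic name below are again placeholder assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressedProducerExample {
    public static void main(String[] args) {
        // Placeholder broker address and topic name for illustration only.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // "snappy" trades a little compression ratio for lower CPU cost;
        // "gzip" compresses better, which helps when bandwidth matters (e.g. cross-DC replication).
        props.put("compression.type", "snappy");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "payload".getBytes()));
        }
    }
}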