java - What data format is considered fastest to write to Kafka? -


We have various options for writing data to Kafka, e.g. String format or byte array. Which data format is considered fastest when writing to Kafka?

Moreover, Kafka provides a utility to compress the whole batch of data at once before writing it.

We also need to consider that, while consuming the same message, it has to be decompressed, which increases the cost of reading the data.

Kafka 0.8.2 serialises data as byte arrays in its commit log. The org.apache.kafka.common.serialization.Serializer interface looks like this:

byte[] serialize(String var1, T var2);

It requires a byte array to be returned as the data that gets written to the Kafka topic. The org.apache.kafka.common.serialization.StringSerializer class therefore has to extract the byte array from the String:

public byte[] serialize(String topic, String data) {
    try {
        return data == null ? null : data.getBytes(this.encoding);
    } catch (UnsupportedEncodingException e) {
        throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + this.encoding);
    }
}
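To put the interface in context, a custom serializer can write fields of your own type straight into a byte array and skip any intermediate String entirely. The Measurement class and MeasurementSerializer below are purely illustrative sketches, not part of Kafka:

import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical domain object, used only for illustration.
class Measurement {
    final long timestamp;
    final double value;
    Measurement(long timestamp, double value) { this.timestamp = timestamp; this.value = value; }
}

// Minimal custom Serializer: writes the fields directly into a byte array,
// avoiding any intermediate String representation or character encoding.
class MeasurementSerializer implements Serializer<Measurement> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, Measurement data) {
        if (data == null) return null;
        ByteBuffer buffer = ByteBuffer.allocate(16); // 8 bytes for the long + 8 for the double
        buffer.putLong(data.timestamp);
        buffer.putDouble(data.value);
        return buffer.array();
    }

    @Override
    public void close() { }
}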

So in performance terms, if you already have binary data, write it as a byte array using the default serializer: creating Strings in Java can potentially be expensive, and Kafka will convert the String back to a byte array anyway.
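For example, a producer configured with the stock ByteArraySerializer hands your bytes to Kafka unchanged. A minimal sketch, assuming the new Java producer API from 0.8.2; the broker address and topic name are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ByteArrayProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        // ByteArraySerializer passes the payload through as-is:
        // no String object is created and no character encoding step is needed.
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = {0x01, 0x02, 0x03}; // sample binary data
            producer.send(new ProducerRecord<>("my-topic", payload));
        }
    }
}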

Regarding compression, Kafka offers the following compression option on the producer out of the box:

  • compression.codec
  • This parameter allows you to specify the compression codec for all data generated by the producer. Valid values are "none", "gzip" and "snappy".

See the following article by one of the Kafka co-creators. To summarise: gzip offers the best compression but requires more CPU cycles. Snappy is a nice compromise that compresses data reasonably well and in many cases allows higher throughput. Gzip is better for cross-data-center replication since it requires less bandwidth.
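If you are on the new Java producer API, the corresponding setting is compression.type. A minimal sketch enabling snappy compression; the broker address and topic name are again placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Batches are compressed on the producer; the consumer decompresses them transparently.
        props.put("compression.type", "snappy"); // or "gzip" / "none"

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", new byte[]{0x01, 0x02, 0x03}));
        }
    }
}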

