java - What data format will be considered fastest to write to Kafka?
We have various options for the format in which data can be written to Kafka, e.g. string format or byte array. Which data format is considered fastest when writing to Kafka?
Moreover, Kafka provides a utility to compress the whole data at once and write it.
We also need to consider that when consuming the same message it has to be de-compressed, which increases the cost of reading the data.
Kafka 0.8.2 serialises data as a byte array to the commit log. The org.apache.kafka.common.serialization.Serializer
class has the following interface:

byte[] serialize(String var1, T var2);
It requires a byte array to be returned as the data written to the Kafka topic. The org.apache.kafka.common.serialization.StringSerializer
class therefore has to extract a byte array from the string:

public byte[] serialize(String topic, String data) {
    try {
        return data == null ? null : data.getBytes(this.encoding);
    } catch (UnsupportedEncodingException e) {
        throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + this.encoding);
    }
}
So in performance terms: if you have binary data, write it as a byte array using the default serializer. Creating strings in Java can potentially be expensive, and Kafka will convert the string to a byte array anyway.
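For illustration, here is a minimal sketch of a producer that hands Kafka the raw bytes directly via ByteArraySerializer. The broker address localhost:9092 and the topic name "events" are placeholder assumptions for the example, not something from the original question:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BinaryProducerExample {
    public static void main(String[] args) {
        // Placeholder broker address and topic name for illustration only.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Hand Kafka the bytes directly, avoiding an intermediate String copy.
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = new byte[]{0x01, 0x02, 0x03}; // already-binary data
            producer.send(new ProducerRecord<>("events", payload));
        }
    }
}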
Regarding compression, Kafka offers the following compression option on the producer out of the box:
- compression.codec
- This parameter allows you to specify the compression codec for all data generated by this producer. Valid values are "none", "gzip" and "snappy".
See the following article by one of the Kafka co-creators. To summarise: gzip offers the best compression but requires more CPU cycles. Snappy is a nice compromise that still compresses the data and in many cases allows higher throughput. Gzip is better for cross data center replication as it requires less bandwidth.
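As a hedged sketch of how compression is enabled on the producer: with the newer Java producer the property is compression.type (the older Scala producer uses compression.codec as quoted above). The broker address and topic name below are again placeholder assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressedProducerExample {
    public static void main(String[] args) {
        // Placeholder broker address and topic name for illustration only.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // "snappy" trades a little compression ratio for lower CPU cost;
        // "gzip" compresses better, which helps when bandwidth matters (e.g. cross-DC replication).
        props.put("compression.type", "snappy");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "payload".getBytes()));
        }
    }
}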