hadoop - Convert date with milliseconds using Pig
I am really stuck on this! Assume I have the following data set:
A      | B
------------------
1/2/12 | 13:3.8
04:4.1 | 12:1.4
15:4.3 | 1/3/13
The observations A and B are, in general, in the format minutes:seconds.milliseconds, where A is a click and B is the response. The time format takes the form month/day/year if one of the events happens at the beginning of a new day.
What do I want? To calculate the average difference between B and A. I can handle m:s.ms by splitting A and B into two parts each, casting them to double, and performing the needed operations, but this fails when m/d/yy is introduced. The easiest way would be to omit those rows, but that is not good practice. Is there a clean way to handle such exceptions using Pig?
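A minimal sketch of the split-and-cast approach described in the question, assuming the same pipe-delimited input.csv shown in the answer. The aliases and field names (clicks, click, response, secs, and so on) are hypothetical. Rows where either value contains a '/' are dropped with FILTER, which is exactly the "omit them" workaround the question would like to avoid; the rest are split at the ':' into minutes and seconds and converted to seconds as doubles.

```pig
-- Hypothetical sketch of the question's split-and-cast idea (not the answer's method).
clicks = LOAD 'input.csv' USING PigStorage('|') AS (click:chararray, response:chararray);

-- Keep only rows where neither value contains '/', i.e. both are m:s.ms values.
clock_only = FILTER clicks BY INDEXOF(click, '/', 0) == -1 AND INDEXOF(response, '/', 0) == -1;

-- Minutes are the text before ':'; seconds (with their fraction) are the text after it.
secs = FOREACH clock_only GENERATE
    (double)SUBSTRING(click, 0, INDEXOF(click, ':', 0)) * 60.0
        + (double)SUBSTRING(click, INDEXOF(click, ':', 0) + 1, (int)SIZE(click)) AS click_sec,
    (double)SUBSTRING(response, 0, INDEXOF(response, ':', 0)) * 60.0
        + (double)SUBSTRING(response, INDEXOF(response, ':', 0) + 1, (int)SIZE(response)) AS response_sec;

-- Per-row difference B - A in seconds, then the average over all kept rows.
diffs = FOREACH secs GENERATE response_sec - click_sec AS diff_sec;
avg_diff = FOREACH (GROUP diffs ALL) GENERATE AVG(diffs.diff_sec);
DUMP avg_diff;
```

With the sample data only the middle row survives the FILTER, which shows why this approach alone is not enough once m/d/yy values appear.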
A thought worth contemplating:
Ref: http://pig.apache.org/docs/r0.12.0/func.html for the string and date functions used.
Input:
1/2/12|13:3.8
04:4.1|12:1.4
15:4.3|1/3/13
Pig script:
A = LOAD 'input.csv' USING PigStorage('|') AS (start_time:chararray, end_time:chararray);
-- A value with at least two '/' characters (the first not at position 0) is parsed as MM/dd/yy; anything else as mm:ss.S.
B = FOREACH A GENERATE
    (INDEXOF(end_time, '/', 0) > 0 AND LAST_INDEX_OF(end_time, '/') > 0 AND (INDEXOF(end_time, '/', 0) != LAST_INDEX_OF(end_time, '/'))
        ? ToUnixTime(ToDate(end_time, 'MM/dd/yy'))
        : ToUnixTime(ToDate(end_time, 'mm:ss.S')))
    -
    (INDEXOF(start_time, '/', 0) > 0 AND LAST_INDEX_OF(start_time, '/') > 0 AND (INDEXOF(start_time, '/', 0) != LAST_INDEX_OF(start_time, '/'))
        ? ToUnixTime(ToDate(start_time, 'MM/dd/yy'))
        : ToUnixTime(ToDate(start_time, 'mm:ss.S')))
    AS diff_time;
-- Average the per-row differences over the whole relation.
C = FOREACH (GROUP B ALL) GENERATE AVG(B.diff_time);
DUMP C;
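N.B. In place of ToUnixTime you can use the ToMilliSeconds() function. ToUnixTime returns whole seconds since the epoch, so the milliseconds are truncated; ToMilliSeconds keeps them.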
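A sketch of that millisecond variant, assuming the Pig 0.12 built-ins referenced above. It converts each column once in an intermediate relation before subtracting; the alias T and the fields start_ms, end_ms and diff_ms are illustrative names. The slash test is also shortened: a value is treated as MM/dd/yy when its first and last '/' sit at different positions, which only happens when it contains at least two slashes (unlike the script above, it does not reject a value that begins with '/').

```pig
A = LOAD 'input.csv' USING PigStorage('|') AS (start_time:chararray, end_time:chararray);

-- Parse each column once. With no '/' both INDEXOF and LAST_INDEX_OF return -1,
-- and with one '/' they are equal, so only values with two or more slashes
-- take the MM/dd/yy branch. ToMilliSeconds keeps the fractional second.
T = FOREACH A GENERATE
    (INDEXOF(start_time, '/', 0) != LAST_INDEX_OF(start_time, '/')
        ? ToMilliSeconds(ToDate(start_time, 'MM/dd/yy'))
        : ToMilliSeconds(ToDate(start_time, 'mm:ss.S'))) AS start_ms,
    (INDEXOF(end_time, '/', 0) != LAST_INDEX_OF(end_time, '/')
        ? ToMilliSeconds(ToDate(end_time, 'MM/dd/yy'))
        : ToMilliSeconds(ToDate(end_time, 'mm:ss.S'))) AS end_ms;

-- Difference in milliseconds per row, then the average over all rows.
B = FOREACH T GENERATE end_ms - start_ms AS diff_ms;
C = FOREACH (GROUP B ALL) GENERATE AVG(B.diff_ms);
DUMP C;
```

The average is then expressed in milliseconds rather than seconds.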
Output:
(1.0569718666666666e7)