hadoop - Convert date with milliseconds using Pig


I'm really stuck on this! Assume I have the following data set:

a      |   b
------------------
1/2/12 | 13:3.8
04:4.1 | 12:1.4
15:4.3 | 1/3/13

Observations a and b are generally in the format minutes:seconds.milliseconds, where a is a click and b is a response. The time instead has the form month/day/year if either event happens at the beginning of a new day.

What do I want? To calculate the average difference between b and a. I can handle m:s.ms by splitting each of a and b into 2 parts, casting them to double, and performing the needed operations, but this fails when m/d/yy is introduced. The easiest way is to omit those rows, but that is not good practice. Is there a clear way to handle such exceptions using Pig?

A thought worth contemplating ....

Ref: http://pig.apache.org/docs/r0.12.0/func.html (the string and date functions are used).

input :

1/2/12|13:3.8
04:4.1|12:1.4
15:4.3|1/3/13

Pig script :

A = LOAD 'input.csv' USING PigStorage('|') AS (start_time:chararray, end_time:chararray);

-- A value with two distinct '/' characters is parsed as a date (MM/dd/yy);
-- anything else is parsed as minutes:seconds.fraction (mm:ss.S).
B = FOREACH A GENERATE
    (INDEXOF(end_time,'/',0) > 0 AND LAST_INDEX_OF(end_time,'/') > 0 AND (INDEXOF(end_time,'/',0) != LAST_INDEX_OF(end_time,'/'))
        ? (ToUnixTime(ToDate(end_time,'MM/dd/yy'))) : (ToUnixTime(ToDate(end_time,'mm:ss.S'))))
    -
    (INDEXOF(start_time,'/',0) > 0 AND LAST_INDEX_OF(start_time,'/') > 0 AND (INDEXOF(start_time,'/',0) != LAST_INDEX_OF(start_time,'/'))
        ? (ToUnixTime(ToDate(start_time,'MM/dd/yy'))) : (ToUnixTime(ToDate(start_time,'mm:ss.S')))) AS diff_time;

-- Average the differences over all rows.
C = FOREACH (GROUP B ALL) GENERATE AVG(B.diff_time);
DUMP C;

N.B. In place of ToUnixTime you can use the ToMilliSeconds() function.

output :

(1.0569718666666666e7) 
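
As a rough cross-check of that number, the same branching can be sketched outside Pig. The Python below is only an illustration, not part of the Pig solution: the names `to_unix_seconds` and `average_diff` are made up here, and it assumes the dates are interpreted in UTC and that the fractional part of mm:ss.S values is dropped when converting to whole seconds. Under those assumptions it arrives at about 1.0569718666666666e7, in line with the output above.

```python
from datetime import datetime, timezone

# The three (start_time, end_time) rows from the sample input.
ROWS = [("1/2/12", "13:3.8"), ("04:4.1", "12:1.4"), ("15:4.3", "1/3/13")]


def to_unix_seconds(value: str) -> int:
    """Hypothetical helper mirroring the script's branch.

    Two distinct '/' characters mean a month/day/year date;
    anything else is treated as minutes:seconds.fraction.
    """
    first, last = value.find("/"), value.rfind("/")
    if first > 0 and last > 0 and first != last:
        # Assumption: the date is midnight UTC.
        parsed = datetime.strptime(value, "%m/%d/%y").replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    minutes, seconds = value.split(":")
    # Assumption: the sub-second part is truncated, leaving whole seconds.
    return int(minutes) * 60 + int(float(seconds))


def average_diff(rows) -> float:
    """Average of end_time - start_time over all rows, in seconds."""
    diffs = [to_unix_seconds(end) - to_unix_seconds(start) for start, end in rows]
    return sum(diffs) / len(diffs)


print(average_diff(ROWS))
```

Note that the average is dominated by the two rows that mix a full date with a bare mm:ss.S value, since the latter is measured from 1970 rather than from the day in question.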
