hadoop - How to process large file with one record dependent on another in MapReduce -
I have a scenario with a large file where the record on line 1 might depend on data at line 1000, and lines 1 and 1000 can end up in separate splits. My understanding of the framework is that the record reader returns one key/value pair to the mapper, and each pair is independent of the others. Since the file has to be divided into splits (i.e. making it non-splittable is not an option), can I handle this somehow, perhaps by writing my own record reader, mapper, or reducer?
The dependency looks like this:
row1: a,b,c,d,e,f
row2: x,y,z,p,q,r
Now, x in row2 needs to be used with d in row1 to produce the desired output.
thanks.
I think you need to implement a reduce-side join. Here you can find a better explanation of it: http://hadooped.blogspot.mx/2013/09/reduce-side-joins-in-java-map-reduce.html.
Both related values have to end up in the same reducer (determined by the key and the partitioner), and they should be grouped together (GroupingComparator); you may use secondary sort to order the grouped values.
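To illustrate the idea, here is a minimal, Hadoop-free sketch of what the shuffle phase of a reduce-side join achieves. The `Tagged` record type, the `"k1"` join key, and the tag values are hypothetical: the tag orders row1 (tag 0) before row2 (tag 1) within a group, the way a partitioner plus grouping comparator plus secondary sort would in a real job, so row1's field d is already cached when row2's field x is processed.

```java
import java.util.*;

public class ReduceSideJoinSketch {
    // Hypothetical tagged record: joinKey links row1 and row2; the tag
    // orders row1 (0) before row2 (1) within a reducer group.
    record Tagged(String joinKey, int tag, String payload) {}

    // Simulates shuffle + reduce of a reduce-side join: sort by the
    // composite key (joinKey, tag) -- what the partitioner, grouping
    // comparator, and secondary sort achieve in a real job -- then walk
    // each group, caching field d from row1 for use with row2's field x.
    static List<String> join(List<Tagged> mapped) {
        List<Tagged> sorted = new ArrayList<>(mapped);
        sorted.sort(Comparator.comparing(Tagged::joinKey)
                              .thenComparingInt(Tagged::tag));
        List<String> out = new ArrayList<>();
        String d = null;
        for (Tagged t : sorted) {
            if (t.tag() == 0) {
                d = t.payload().split(",")[3];        // field d of row1
            } else {
                String x = t.payload().split(",")[0]; // field x of row2
                out.add(x + " uses " + d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Records as they might arrive from two different input splits,
        // row2 first, to show that sorting restores the required order.
        List<Tagged> mapped = List.of(
            new Tagged("k1", 1, "x,y,z,p,q,r"),
            new Tagged("k1", 0, "a,b,c,d,e,f"));
        System.out.println(join(mapped)); // prints [x uses d]
    }
}
```

In a real job the tag would be carried in a composite `WritableComparable` key, the partitioner would hash only on the natural join key, and the grouping comparator would also compare only the natural key so both tagged records reach one `reduce()` call.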