hadoop - How to process large file with one record dependent on another in MapReduce -
I have a scenario with a large file where the record on line 1 might depend on data at line 1000, and lines 1 and 1000 can end up in separate splits. My understanding of the framework is that the record reader returns one key/value pair to the mapper, and each pair is independent of the others. Since the file has to be divided into splits (i.e. making it non-splittable is not an option), can I handle this somehow, perhaps by writing my own record reader, mapper, or reducer?
The dependency looks like this:
row1: a,b,c,d,e,f
row2: x,y,z,p,q,r
Now, x in row2 needs to be used with d in row1 to produce the desired output.
thanks.
I think you need to implement a reduce-side join. Here you can find a better explanation of it: http://hadooped.blogspot.mx/2013/09/reduce-side-joins-in-java-map-reduce.html.
Both related values have to end up in the same reducer (determined by the key and the partitioner), and they should be grouped together (GroupingComparator); you may use secondary sort to order the grouped values.
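To illustrate the idea, here is a minimal, Hadoop-free sketch of what the shuffle phase of a reduce-side join achieves. The `Tagged` record type, the `"k1"` join key, and the tag values are hypothetical: the tag orders row1 (tag 0) before row2 (tag 1) within a group, the way a partitioner plus grouping comparator plus secondary sort would in a real job, so row1's field d is already cached when row2's field x is processed.

```java
import java.util.*;

public class ReduceSideJoinSketch {
    // Hypothetical tagged record: joinKey links row1 and row2; the tag
    // orders row1 (0) before row2 (1) within a reducer group.
    record Tagged(String joinKey, int tag, String payload) {}

    // Simulates shuffle + reduce of a reduce-side join: sort by the
    // composite key (joinKey, tag) -- what the partitioner, grouping
    // comparator, and secondary sort achieve in a real job -- then walk
    // each group, caching field d from row1 for use with row2's field x.
    static List<String> join(List<Tagged> mapped) {
        List<Tagged> sorted = new ArrayList<>(mapped);
        sorted.sort(Comparator.comparing(Tagged::joinKey)
                              .thenComparingInt(Tagged::tag));
        List<String> out = new ArrayList<>();
        String d = null;
        for (Tagged t : sorted) {
            if (t.tag() == 0) {
                d = t.payload().split(",")[3];        // field d of row1
            } else {
                String x = t.payload().split(",")[0]; // field x of row2
                out.add(x + " uses " + d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Records as they might arrive from two different input splits,
        // row2 first, to show that sorting restores the required order.
        List<Tagged> mapped = List.of(
            new Tagged("k1", 1, "x,y,z,p,q,r"),
            new Tagged("k1", 0, "a,b,c,d,e,f"));
        System.out.println(join(mapped)); // prints [x uses d]
    }
}
```

In a real job the tag would be carried in a composite `WritableComparable` key, the partitioner would hash only on the natural join key, and the grouping comparator would also compare only the natural key so both tagged records reach one `reduce()` call.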