r - How to pick the first comma-separated value in a column in data.table?
Courtesy of @jaap's answer to this Stack Overflow question, the data looks like this:
         name                           text idx             c_org
    1:   john              text contains mit   1               mit
    2: sussan       text stanford university   2          stanford
    3:   bill graduated yale, mit, stanford.   3 mit,yale,stanford
    4:   bill                           text   4

For column c_org, if there are multiple values, as in observation 3 (mit,yale,stanford), I'd like to make the first value, mit, the column value. The result should look like this:
         name                           text idx   neworg
    1:   john              text contains mit   1      mit
    2: sussan       text stanford university   2 stanford
    3:   bill graduated yale, mit, stanford.   3      mit
    4:   bill                           text   4

(Please note that in the c_org column, a field may have more than one value or be empty. In the expected output: if there is one value, keep it; if there is more than one, keep the first one; if it is empty, keep it empty.)
I tried (but failed):

    dt[ , str_split(c_org, ",")[[1]][1]]

I guess it is quite common to meet data where there is more than one value in one field. How can I do this in data.table? (Or in some other way, if that solution is better than data.table.)
We can use sub to match the pattern: a comma (,) followed by zero or more characters (.*) until the end ($) of the string in the 'c_org' column, and replace it with ''. The output can be assigned (:=) to create the column 'neworg', and then we assign 'c_org' to NULL to remove it.
    dt[, neworg := sub(',.*$', '', c_org)][, c_org := NULL]
    dt
    #     name                           text idx   neworg
    #1:   john              text contains mit   1      mit
    #2: sussan       text stanford university   2 stanford
    #3:   bill graduated yale, mit, stanford.   3      mit
    #4:   bill                           text   4

Another option, with data.table v1.9.6+, is tstrsplit:
    dt[, neworg := tstrsplit(c_org, ',', fill='')[[1]]][, c_org := NULL]
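For anyone who wants to try this out, here is a minimal reproducible sketch. The data is a hypothetical reconstruction of the question's example (the text column is left out for brevity, and the variable names first_by_sub and first_by_split are made up for illustration). It runs both approaches on the same column and checks that they agree before modifying the table.

```r
library(data.table)

# Hypothetical reconstruction of the example data (text column omitted).
dt <- data.table(
  name  = c("john", "sussan", "bill", "bill"),
  idx   = 1:4,
  c_org = c("mit", "stanford", "mit,yale,stanford", "")
)

# Approach 1: delete the first comma and everything after it.
# Strings without a comma (including "") are returned unchanged.
first_by_sub <- sub(",.*$", "", dt$c_org)

# Approach 2: split on commas and keep the first piece of each row.
# fill = "" pads rows that have fewer pieces, including the empty string.
first_by_split <- tstrsplit(dt$c_org, ",", fill = "")[[1]]

# Both approaches should give the same result row by row.
stopifnot(identical(first_by_sub, first_by_split))

# Add the new column and drop the old one, both by reference.
dt[, neworg := first_by_sub][, c_org := NULL]
print(dt)
```

The sub version is probably the simpler choice when only the first value is needed; tstrsplit becomes more useful if the second or third value is also wanted, since it returns every piece as a separate vector.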