r - How to pick the first comma-separated value in a column in data.table?


Courtesy of @jaap's answer to this Stack Overflow question.

The data looks like this:

     name                           text idx             c_org
1:   john              text contains mit   1               mit
2: sussan       text stanford university   2          stanford
3:   bill graduated yale, mit, stanford.   3 mit,yale,stanford
4:   bill                           text   4

For column c_org, if there are multiple values, as in observation 3 (mit,yale,stanford), I'd like to make the first value, mit, the column value. The result should look like this:

     name                           text idx   neworg
1:   john              text contains mit   1      mit
2: sussan       text stanford university   2 stanford
3:   bill graduated yale, mit, stanford.   3      mit
4:   bill                           text   4

(Please note that in the c_org column a field may hold one value, more than one value, or be empty. In the expected output: if there is one value, keep it; if there is more than one, keep the first one; if it is empty, keep it empty.)

I tried this (but it failed):

dt[ , str_split(c_org, ",")[[1]][1]] 

I guess it is quite common to meet data where there is more than one value in a single field. How can this be done in data.table? (Or in some other way, if there is a better solution than data.table.)

We can use sub to match the pattern: a comma (,) followed by zero or more characters (.*) until the end ($) of the string in the 'c_org' column, and replace it with ''. The output can be assigned (:=) to create the column 'neworg', and then 'c_org' is assigned NULL to remove it.

dt[, neworg := sub(',.*$', '', c_org)][, c_org := NULL]
dt
#     name                           text idx   neworg
#1:   john              text contains mit   1      mit
#2: sussan       text stanford university   2 stanford
#3:   bill graduated yale, mit, stanford.   3      mit
#4:   bill                           text   4
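
For a quick check, the sub approach can be reproduced on a toy table. This is only a minimal sketch: the table below rebuilds just the name, idx and c_org columns from the question (the text column is left out for brevity), and it assumes the data.table package is installed.

```r
library(data.table)

# Toy data mirroring the question (text column omitted for brevity)
dt <- data.table(
  name  = c("john", "sussan", "bill", "bill"),
  idx   = 1:4,
  c_org = c("mit", "stanford", "mit,yale,stanford", "")
)

# Remove the first comma and everything after it, then drop the old column
dt[, neworg := sub(",.*$", "", c_org)][, c_org := NULL]

dt$neworg
```

Rows without a comma, including the empty one, do not match the pattern, so sub returns them unchanged.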

Another option, with data.table v1.9.6+, is tstrsplit:

dt[, neworg := tstrsplit(c_org, ',', fill='')[[1]]][, c_org := NULL]
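
As a side note, the attempt in the question most likely fails because [[1]] selects the split of the first row only, so a single value comes back instead of one per row. The sketch below illustrates this with base R's strsplit (swapped in for stringr's str_split to avoid an extra dependency) and shows a per-row alternative; the helper first_piece and the toy table are hypothetical names introduced here for illustration.

```r
library(data.table)

# Toy data mirroring the c_org column from the question
dt <- data.table(idx = 1:4,
                 c_org = c("mit", "stanford", "mit,yale,stanford", ""))

# [[1]] picks the split of the FIRST row only, so this is a single value
first_row_only <- strsplit(dt$c_org, ",", fixed = TRUE)[[1]][1]

# Hypothetical helper: first piece of one row's split, "" when nothing is left
first_piece <- function(pieces) if (length(pieces) > 0) pieces[1] else ""

# Per-row version: apply the helper to every element of the split list
dt[, neworg := vapply(strsplit(c_org, ",", fixed = TRUE),
                      first_piece, character(1))]
```

The length check keeps empty fields as '' rather than turning them into NA.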
