r - How to pick the first comma-separated value in a column in data.table?
Courtesy of @jaap's answer to this Stack Overflow question, the data looks like this:
         name                           text idx             c_org
    1:   john              text contains mit   1               mit
    2: sussan       text stanford university   2          stanford
    3:   bill graduated yale, mit, stanford.   3 mit,yale,stanford
    4:   bill                           text   4
For column c_org, if there are multiple values, as in observation 3 (mit,yale,stanford), I'd like to make the first value, mit, the column value. The result should look like this:
         name                           text idx   neworg
    1:   john              text contains mit   1      mit
    2: sussan       text stanford university   2 stanford
    3:   bill graduated yale, mit, stanford.   3      mit
    4:   bill                           text   4
(Please note that in the c_org column, a field may hold more than one value or be empty. In the expected output: if there is one value, keep it; if there is more than one, keep the first one; if it is empty, keep it empty.)
I tried (but failed):
dt[ , str_split(c_org, ",")[[1]][1]]
I guess it is quite common to meet data where there is more than one value in a single field. How can I do this in data.table? (Or in some other way, if there is a better solution than data.table.)
We can use sub to match the pattern: a comma (,) followed by zero or more characters (.*) until the end ($) of the string in the 'c_org' column, and replace the match with ''. The output can be assigned (:=) to create the column 'neworg', and then we assign 'c_org' to NULL to drop it.
    dt[, neworg := sub(',.*$', '', c_org)][, c_org := NULL]
    dt
    #      name                           text idx   neworg
    #1:   john              text contains mit   1      mit
    #2: sussan       text stanford university   2 stanford
    #3:   bill graduated yale, mit, stanford.   3      mit
    #4:   bill                           text   4
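To try the sub approach end to end, here is a minimal reproducible sketch. The sample table is rebuilt by hand from the printout in the question, so two things are assumptions rather than facts from the original post: the empty c_org field is treated as an empty string (not NA), and the text column is left out for brevity.

```r
library(data.table)

# Sample data rebuilt by hand from the question.
# Assumption: the empty c_org field in row 4 is "" rather than NA.
dt <- data.table(
  name  = c("john", "sussan", "bill", "bill"),
  idx   = 1:4,
  c_org = c("mit", "stanford", "mit,yale,stanford", "")
)

# Remove the first comma and everything after it.
# Strings without a comma (including "") do not match and are left unchanged.
dt[, neworg := sub(",.*$", "", c_org)][, c_org := NULL]

print(dt)
```

Because sub is vectorised over the whole column, no per-row loop or grouping is needed. If the empty field were NA instead of "", sub would return NA for that row.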
Or another option, with data.table v1.9.6+, is tstrsplit:
    dt[, neworg := tstrsplit(c_org, ',', fill='')[[1]]][, c_org := NULL]
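The sketch below contrasts the attempt from the question with tstrsplit, to show why the former yields only one value. It makes two assumptions that are not in the original post: base R's strsplit stands in for str_split to avoid an extra dependency, and the one-column sample table is a hypothetical reduction of the question's data with the empty field stored as "".

```r
library(data.table)

# Hypothetical one-column reduction of the question's data (empty field assumed "")
dt <- data.table(c_org = c("mit", "stanford", "mit,yale,stanford", ""))

# Same shape as the attempt in the question: [[1]] selects the pieces of
# row 1 only, so the result is a single string instead of one value per row.
first_row_only <- dt[, strsplit(c_org, ",", fixed = TRUE)[[1]][1]]

# tstrsplit splits every row and transposes the pieces, giving one vector
# per position; fill = "" pads rows that have fewer pieces than the longest.
dt[, neworg := tstrsplit(c_org, ",", fixed = TRUE, fill = "")[[1]]]

print(first_row_only)
print(dt)
```

Taking [[1]] from the tstrsplit result picks the vector of first pieces across all rows, which is the per-row behaviour the original attempt was after. The fill = '' argument matters for the empty row: splitting an empty string yields no pieces, and without a fill value that position would come back as NA.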