R: How to pick the first comma-separated value in a column in data.table?
Courtesy of @jaap's answer to this Stack Overflow question, the data looks like this:
         name                           text idx             c_org
    1:   john              text contains mit   1               mit
    2: sussan       text stanford university   2          stanford
    3:   bill graduated yale, mit, stanford.   3 mit,yale,stanford
    4:   bill                           text   4

For column c_org, some observations hold multiple values, such as observation 3 (mit,yale,stanford). In that case I'd like to keep only the first value, mit, as the column value. The result should look like this:
         name                           text idx   neworg
    1:   john              text contains mit   1      mit
    2: sussan       text stanford university   2 stanford
    3:   bill graduated yale, mit, stanford.   3      mit
    4:   bill                           text   4

(Please note that in the c_org column, some fields have more than one value and some are empty. In the expected output, if there is one value, keep it; if there is more than one, keep the first one; if the field is empty, keep it empty.)
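The question shows only printed output, so here is a sketch of how the sample data could be rebuilt for experimenting. It assumes the blank c_org in row 4 is an empty string ("") rather than NA, and it copies the text values exactly as they appear in the printed table.

```r
library(data.table)

# Rebuild the sample data shown above.
# Assumption: the blank c_org in row 4 is "" (an empty string), not NA.
dt <- data.table(
  name  = c("john", "sussan", "bill", "bill"),
  text  = c("text contains mit", "text stanford university",
            "graduated yale, mit, stanford.", "text"),
  idx   = 1:4,
  c_org = c("mit", "stanford", "mit,yale,stanford", "")
)
print(dt)
```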
I tried this (but it failed):
    dt[ , str_split(c_org, ",")[[1]][1]]

I guess it's quite common to meet data where a single field holds more than one value. How can I do this in data.table? (Or in some other way, if that solution is better than data.table.)
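The attempt splits every row, but `[[1]]` keeps only the pieces of the first row and `[1]` keeps the first of those, so the whole expression yields a single value. The sketch below illustrates this with base R's `strsplit()` swapped in for stringr's `str_split()` (only to avoid the extra dependency), on a plain character vector `c_org` standing in for the column. It then takes the first piece of every row instead.

```r
c_org <- c("mit", "stanford", "mit,yale,stanford", "")

# What the attempt does: split all rows, then keep only row 1's first piece.
single <- strsplit(c_org, ",")[[1]][1]
single  # "mit" (one value, not one value per row)

# Per-row version: take the first piece of each list element.
# strsplit("") yields character(0), whose [1] would be NA, so map it to "".
first_value <- vapply(
  strsplit(c_org, ","),
  function(parts) if (length(parts) > 0) parts[1] else "",
  character(1)
)
first_value  # c("mit", "stanford", "mit", "")
```

Inside a data.table, the same vector expression could be assigned with `:=`, but the regex and `tstrsplit` approaches in the answer avoid the per-row function call.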
We can use sub to match the pattern , (a comma) followed by any characters (.*) until the end ($) of the string in the 'c_org' column, and replace the match with ''. The output can be assigned (:=) to create the column 'neworg', and 'c_org' can then be removed by assigning it NULL.
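The regex step can be checked on its own, outside data.table. In this sketch, `c_org` is a plain character vector holding the column's values. `sub()` replaces only the leftmost match, which starts at the first comma, so everything from that comma onward is dropped. Strings without a comma, including "", do not match and come back unchanged.

```r
c_org <- c("mit", "stanford", "mit,yale,stanford", "")

# ",.*$": the first comma, then everything after it up to the end of the string.
# Replacing that match with "" leaves only the first comma-separated value.
neworg <- sub(",.*$", "", c_org)
neworg  # c("mit", "stanford", "mit", "")

# NA input stays NA, so missing values are preserved as well.
sub(",.*$", "", NA_character_)  # NA
```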
    dt[, neworg := sub(',.*$', '', c_org)][, c_org := NULL]
    dt
    #     name                           text idx   neworg
    #1:   john              text contains mit   1      mit
    #2: sussan       text stanford university   2 stanford
    #3:   bill graduated yale, mit, stanford.   3      mit
    #4:   bill                           text   4

Or, another option with data.table v1.9.6+ is tstrsplit:
    dt[, neworg := tstrsplit(c_org, ',', fill = '')[[1]]][, c_org := NULL]
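To confirm that both answers produce the same column, here is a sketch that runs them side by side. It assumes the same sample values as the question (row 4 holding "") and drops the text column for brevity. Because `:=` modifies a data.table by reference, each approach works on its own `copy()` so the original `dt` stays intact. `tstrsplit()` transposes the `strsplit()` result, so `[[1]]` is the column of first pieces; `fill = ""` supplies a value for the empty string, whose split has no pieces at all.

```r
library(data.table)

# Sample data; assumption: row 4's c_org is "" (text column omitted for brevity).
dt <- data.table(
  name  = c("john", "sussan", "bill", "bill"),
  idx   = 1:4,
  c_org = c("mit", "stanford", "mit,yale,stanford", "")
)

# Approach 1: regex, drop the first comma and everything after it.
via_sub <- copy(dt)[, neworg := sub(",.*$", "", c_org)][, c_org := NULL]

# Approach 2: split on commas and keep the first piece of each row.
via_tstrsplit <- copy(dt)[, neworg := tstrsplit(c_org, ",", fill = "")[[1]]][, c_org := NULL]

print(via_sub)
print(via_tstrsplit)
```

The regex version never builds the intermediate list of pieces, while `tstrsplit` is handy when you later want the second or third value too, since the other list elements hold those columns.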