r - How to pick the first comma-separated value in a column in data.table?

- February 15, 2014

Courtesy of @jaap's answer to this Stack Overflow question.

The data look like this:

         name                           text idx             c_org
    1:   john              text contains mit   1               mit
    2: sussan       text stanford university   2          stanford
    3:   bill graduated yale, mit, stanford.   3 mit,yale,stanford
    4:   bill                           text   4

For column c_org, if there are multiple values, as in observation 3 (mit,yale,stanford), I'd like to make the first value, mit, the column value. The result should look like this:

         name                           text idx   neworg
    1:   john              text contains mit   1      mit
    2: sussan       text stanford university   2 stanford
    3:   bill graduated yale, mit, stanford.   3      mit
    4:   bill                           text   4

(Please note that in the c_org column, a field may hold one value, more than one value, or be empty. In the expected output, if there's one value, keep it; if there's more than one, keep the first one; if it's empty, keep it empty.)

I tried this (but it failed):

    dt[ , str_split(c_org, ",")[[1]][1]]

I guess it's quite common to meet data where there is more than one value in one field. How can I do this in data.table? (Or in another way, if that solution is better than data.table.)
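A likely reason the attempt fails: `[[1]]` picks out only the split result of the *first* row, so the expression returns a single value instead of one value per row. A minimal base-R sketch of the difference follows. It swaps in base `strsplit()` for stringr's `str_split()` only to avoid the extra package, and uses a hypothetical vector `x` and helper `first_or_empty()` standing in for the c_org column and a per-row extractor:

```r
# hypothetical stand-in for the c_org column
x <- c("mit", "stanford", "mit,yale,stanford", "")

parts <- strsplit(x, ",")   # a list: one character vector per row

# what the attempt does: the first row's pieces only, then its first piece
parts[[1]][1]               # a single value, not one per row

# one value per row: take the first piece of every element.
# strsplit() turns an empty string into character(0), whose [1] is NA,
# so the hypothetical helper falls back to "" in that case
first_or_empty <- function(p) if (length(p) == 0) "" else p[1]
vapply(parts, first_or_empty, character(1))
```

The `vapply()` call returns `"mit" "stanford" "mit" ""`, which matches the expected `neworg` column, whereas `parts[[1]][1]` is just `"mit"`.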

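We can use sub to match the pattern `,` followed by zero or more characters (.*) until the end ($) of the string in the 'c_org' column, and replace it with ''. Because the regex match starts at the first comma, everything from that comma onward is dropped, and strings without a comma (including empty ones) are left unchanged. The output can be assigned (:=) to create the column 'neworg', and then we assign 'c_org' to NULL to remove it.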
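To see what the pattern does on its own, here is a small sketch applying `sub(',.*$', '', ...)` to a plain character vector holding the three cases from the question (one value, several values, empty). The vector name `x` is only for illustration:

```r
# the three cases from the question: one value, several values, empty
x <- c("mit", "stanford", "mit,yale,stanford", "")

# ',.*$' matches the first comma and everything after it up to the end
# of the string; sub() replaces that match with '', and strings with no
# comma (including "") have no match, so they come back unchanged
first <- sub(",.*$", "", x)
first
```

This gives `"mit" "stanford" "mit" ""`: the empty string stays empty without any special handling.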

    dt[, neworg := sub(',.*$', '', c_org)][, c_org := NULL]
    dt
    #     name                           text idx   neworg
    #1:   john              text contains mit   1      mit
    #2: sussan       text stanford university   2 stanford
    #3:   bill graduated yale, mit, stanford.   3      mit
    #4:   bill                           text   4

Or, another option with data.table v1.9.6+ is tstrsplit:

    dt[, neworg := tstrsplit(c_org, ',', fill='')[[1]]][, c_org := NULL]
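Putting the two approaches side by side, here is a self-contained sketch that rebuilds the question's table and checks that both give the same `neworg`. The `text` values are copied from the printout above, so they may not match the original data exactly; `dt1` and `dt2` are illustrative names, and data.table v1.9.6 or later is assumed to be installed. It also shows why `fill=''` is passed: without it, the empty row comes back as `NA` rather than `""`.

```r
library(data.table)

# rebuild of the question's table, values as they appear in the printout
dt <- data.table(
  name  = c("john", "sussan", "bill", "bill"),
  text  = c("text contains mit", "text stanford university",
            "graduated yale, mit, stanford.", "text"),
  idx   = 1:4,
  c_org = c("mit", "stanford", "mit,yale,stanford", "")
)

# approach 1: sub() on a copy, so dt keeps c_org for the next approach
dt1 <- copy(dt)
dt1[, neworg := sub(",.*$", "", c_org)][, c_org := NULL]

# approach 2: tstrsplit(); fill = "" pads rows that split into nothing
dt2 <- copy(dt)
dt2[, neworg := tstrsplit(c_org, ",", fill = "")[[1]]][, c_org := NULL]

# with the default fill (NA), the empty fourth row becomes NA instead of ""
tstrsplit(dt$c_org, ",")[[1]]

# both approaches produce the same column
identical(dt1$neworg, dt2$neworg)
```

The columns are compared directly rather than the whole tables, since `identical()` on two separately modified data.tables can be thrown off by their internal attributes.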
