json - Python and Pandas: UnicodeDecodeError: 'ascii' codec can't decode byte


After using pandas to read a JSON object into a pandas.DataFrame, I want to print the first year in each pandas row. E.g., if I have 2013-2014(2015), I want to print 2013.

Full code:

x = '{"0":"1985\\u2013present","1":"1985\\u2013present",......}'
a = pd.read_json(x, typ='series')
for i, row in a.iteritems():
    print row.split('-')[0].split('—')[0].split('(')[0]

The following error occurs:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-1333-d8ef23860c53> in <module>()
      1 for i, row in a.iteritems():
----> 2     print row.split('-')[0].split('—')[0].split('(')[0]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Why is this happening? How can I fix the problem?

Your JSON data strings are Unicode strings, which you can see, for example, by printing one of the values:

In: a[0]
Out: u'1985\u2013present'

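Now you try to split the string at the Unicode character \u2013 (en dash), but the string you pass to split is not a Unicode string. Python 2 therefore tries to decode it as ASCII first, which fails with 'ascii' codec can't decode byte 0xe2, because the en dash is not an ASCII character.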
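The failure can be reproduced without pandas. The sketch below makes the implicit conversion explicit; the variable names are illustrative, and it assumes the dash literal was stored as UTF-8 bytes (as it would be in a UTF-8 source file or terminal).

```python
# Sketch of the implicit step Python 2 performs when a unicode string is
# split on a plain (byte) string. Assumption: the separator is UTF-8 encoded.

value = u'1985\u2013present'                  # the kind of value read_json returns
separator_bytes = u'\u2013'.encode('utf-8')   # bytes behind a non-u'' en dash literal

# The first byte is 0xe2, the byte named in the error message.
first_byte = bytearray(separator_bytes)[0]

try:
    # Python 2 attempts this decode silently before it can split.
    separator_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the right codec (or using a u'' literal) gives a usable separator.
first_part = value.split(separator_bytes.decode('utf-8'))[0]
```

Here `error_message` carries the same "'ascii' codec can't decode byte 0xe2 in position 0" text as the traceback, while `first_part` is the year before the dash.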

To make your example work, you could use:

for i, row in a.iteritems():
    print row.split('-')[0].split(u'—')[0].split('(')[0]

Notice the u in front of the Unicode dash. You can also write u'\u2013' to split the string.
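As an alternative to chaining several split calls, a single regular expression can cover all the separators at once. This is only a sketch: `first_year` is a hypothetical helper, not part of pandas or of the original code, and it assumes the year is always the text before the first hyphen, en dash, em dash, or opening parenthesis.

```python
import re

# Hypothetical helper (an assumption, not from the original post).
# The character class matches a hyphen, en dash (\u2013), em dash (\u2014),
# or an opening parenthesis. The u'' prefix keeps the pattern a Unicode string.
_SEPARATORS = re.compile(u'[-\u2013\u2014(]')


def first_year(text):
    """Return the part of text before the first separator, stripped."""
    return _SEPARATORS.split(text, maxsplit=1)[0].strip()


samples = [u'1985\u2013present', u'2013-2014(2015)', u'1999(2000)']
years = [first_year(s) for s in samples]
```

With the Series from the question, something like `a.map(first_year)` should apply the helper to every row without an explicit loop.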

For more details on Unicode in Python, see https://docs.python.org/2/howto/unicode.html

