Reading a file multiple ways in Python
I'm trying to set up a system for running various statistics on a text file. To do this, I need to open a file in Python (v2.7.10) and read it both as lines and as a string, so that the statistical functions work.
So far I have this:
    import csv, json, re
    from textstat.textstat import textstat

    file = "data/test.txt"
    data = open(file, "r")
    string = data.read().replace('\n', '')
    lines = 0
    blanklines = 0
    word_list = []
    cf_dict = {}
    word_dict = {}
    punctuations = [",", ".", "!", "?", ";", ":"]
    sentences = 0
This sets up the file and the preliminary variables. At this point,

    print textstat.syllable_count(string)

returns a number. Further, I have:
    for line in data:
        lines += 1
        if line.startswith('\n'):
            blanklines += 1
        word_list.extend(line.split())
        for char in line.lower():
            cf_dict[char] = cf_dict.get(char, 0) + 1

    for word in word_list:
        lastchar = word[-1]
        if lastchar in punctuations:
            word = word.rstrip(lastchar)
        word = word.lower()
        word_dict[word] = word_dict.get(word, 0) + 1

    for key in cf_dict.keys():
        if key in '.!?':
            sentences += cf_dict[key]

    number_words = len(word_list)
    num = float(number_words)
    avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num
    mcw = sorted([(v, k) for k, v in word_dict.items()], reverse=True)

    print( "total lines: %d" % lines )
    print( "blank lines: %d" % blanklines )
    print( "sentences: %d" % sentences )
    print( "words: %d" % number_words )
    print('-' * 30)
    print( "average word length: %0.2f" % avg_wordsize )
    print( "30 common words: %s" % mcw[:30] )
But it fails at line 22:

    avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num

which raises ZeroDivisionError: float division by zero. However, if I comment out

    string = data.read().replace('\n', '')

in the first piece of code, I can run the second piece without any problem and get the expected output.
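To see why the division fails, here is a minimal reproduction. It assumes a small temporary file with made-up contents in place of data/test.txt and leaves out textstat so it runs on its own. After read(), iterating over the same file object yields no lines, so word_list stays empty and num ends up as 0.0.

```python
import os
import tempfile

# Hypothetical stand-in for data/test.txt: a small temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("Hello world.\nSecond line!\n")

data = open(path, "r")
string = data.read().replace('\n', '')  # consumes the whole file

# The file object is now positioned at end-of-file, so this loop body never runs.
lines_after_read = [line for line in data]

word_list = []
for line in lines_after_read:
    word_list.extend(line.split())
num = float(len(word_list))  # 0.0, which is what makes avg_wordsize divide by zero

print(string)            # Hello world.Second line!
print(lines_after_read)  # []
print(num)               # 0.0

data.close()
os.remove(path)
```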
Basically, how do I set this up so that I can run the second piece of code on data, and textstat on string?
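The call data.read() places the file pointer at the end of the file, so there is nothing more to read at that point. As a result, the for line in data loop never runs, word_list stays empty, and num is 0.0, which is what triggers the ZeroDivisionError. You either have to close and reopen the file or, more simply, reset the pointer to the beginning using data.seek(0).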
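A sketch of the seek(0) fix, plus one alternative, written so it should run on both Python 2.7 and 3. The function names read_string_and_lines and read_once are hypothetical, the temporary file stands in for data/test.txt, and textstat is left out: string is what you would pass to textstat, and lines is what the counting loop would iterate over. read_once is not the approach from the answer; it reads the file a single time and splits the text itself.

```python
import os
import tempfile


def read_string_and_lines(path):
    """Return (string, lines) from one open file, rewinding in between."""
    with open(path, "r") as data:
        string = data.read().replace('\n', '')  # file pointer is now at EOF
        data.seek(0)                            # rewind to the beginning
        lines = list(data)                      # iterate over the lines again
    return string, lines


def read_once(path):
    """Alternative: read the file once and derive the lines from the text."""
    with open(path, "r") as data:
        text = data.read()
    # keepends=True keeps each trailing '\n', like iterating over the file does.
    return text.replace('\n', ''), text.splitlines(True)


# Hypothetical sample file standing in for data/test.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("First line.\n\nThird line!\n")

string, lines = read_string_and_lines(path)
once_result = read_once(path)
print(string)  # First line.Third line!
print(lines)   # ['First line.\n', '\n', 'Third line!\n']
os.remove(path)
```

Because the line endings are kept, the existing line.startswith('\n') check still counts blank lines. Note that splitlines also breaks on a few rarer separators (in Python 3 this includes form feed), so for unusual input its result can differ from iterating over the file directly.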