Reading a file multiple ways in Python


I am trying to set up a system for running various statistics on a text file. In this endeavor I need to open a file in Python (v2.7.10) and read it both as lines and as a string, for the statistical functions to work.

So far I have this:

    import csv, json, re
    from textstat.textstat import textstat

    file = "data/test.txt"
    data = open(file, "r")
    string = data.read().replace('\n', '')

    lines = 0
    blanklines = 0
    word_list = []
    cf_dict = {}
    word_dict = {}
    punctuations = [",", ".", "!", "?", ";", ":"]
    sentences = 0

This sets up the file and the preliminary variables. At this point, print textstat.syllable_count(string) returns a number. Further, I have:

    for line in data:
        lines += 1
        if line.startswith('\n'):
            blanklines += 1
        word_list.extend(line.split())
        for char in line.lower():
            cf_dict[char] = cf_dict.get(char, 0) + 1

    for word in word_list:
        lastchar = word[-1]
        if lastchar in punctuations:
            word = word.rstrip(lastchar)
        word = word.lower()
        word_dict[word] = word_dict.get(word, 0) + 1

    for key in cf_dict.keys():
        if key in '.!?':
            sentences += cf_dict[key]

    number_words = len(word_list)
    num = float(number_words)
    avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num
    mcw = sorted([(v, k) for k, v in word_dict.items()], reverse=True)

    print( "total lines: %d" % lines )
    print( "blank lines: %d" % blanklines )
    print( "sentences: %d" % sentences )
    print( "words: %d" % number_words )

    print('-' * 30)
    print( "average word length: %0.2f" % avg_wordsize )
    print( "30 common words: %s" % mcw[:30] )

But it fails on line 22, avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num, which raises ZeroDivisionError: float division by zero. However, if I comment out string = data.read().replace('\n', '') in the first piece of code, I can run the second piece without a problem and get the expected output.

Basically, how do I set this up so that I can run the second piece of code on data, and textstat on string?

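The call to data.read() leaves the file pointer at the end of the file, so there is nothing more to read at that point. The for loop over data therefore never runs, word_list stays empty, and num is 0.0, which is what triggers the ZeroDivisionError. You either have to close and reopen the file or, more simply, reset the pointer to the beginning using data.seek(0).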
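
As a minimal sketch of the seek(0) approach: the temporary file and its sample text below are assumptions standing in for data/test.txt, and only the line and blank-line counters from the question are reproduced.

```python
import os
import tempfile

# Hypothetical sample file standing in for data/test.txt.
sample = "First line. Second sentence!\n\nThird line?\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write(sample)

data = open(path, "r")

# Reading the whole file moves the file pointer to the end.
string = data.read().replace("\n", "")

# A second read at this point finds nothing left.
exhausted = data.readlines()

# Rewind to the beginning so the file can be iterated line by line.
data.seek(0)

lines = 0
blanklines = 0
for line in data:
    lines += 1
    if line.startswith("\n"):
        blanklines += 1

data.close()
os.remove(path)

print("read after read(): %r" % exhausted)
print("total lines: %d" % lines)
print("blank lines: %d" % blanklines)
```

With the rewind in place, string is still available for textstat and the loop sees every line. Because read() already pulled the content into memory, another option would be to keep the unmodified text and iterate over its splitlines(True) result instead of touching the file a second time.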

