Script: word frequency in a text
From Mondothèque
Revision as of 06:46, 28 October 2015 by Ana
#!/usr/bin/env python
from collections import Counter
import string
'''
Prepare your book in plain text format. The script builds a frequency dictionary of the words in the book,
sorts the words by frequency, and writes the result to a text file called frequencies.txt.
The program ignores capitalization and punctuation.
'''
# functions

# remove caps + line breaks + punctuation
def remove_punct(f):
    # join all lines into one lowercase string
    tokens = ' '.join(line.replace('\n', '') for line in f).lower()
    # delete every punctuation character
    for c in string.punctuation:
        tokens = tokens.replace(c, "")
    return tokens
# create frequency dictionary
def freq_dict(tokens):
    frequency_d = {}
    # split() without arguments skips the empty strings left by repeated spaces
    for word in tokens.split():
        try:
            frequency_d[word] += 1
        except KeyError:
            frequency_d[word] = 1
    return frequency_d
# sort words by frequency (uses Counter from the collections module)
def sort_dict(frequency_d):
    c = Counter(frequency_d)
    # most_common() returns (word, count) pairs, highest count first
    frequency = c.most_common()
    return frequency
# write words to text file
def write_to_file(frequency):
    # each output line reads "count : word"
    with open('frequencies.txt', 'wt') as g:
        for word, count in frequency:
            g.write("{} : {} \n".format(count, word))
# open the source text as f // specify your source text here
with open('0_plus_petit_document.txt', 'rt') as f:
    tokens = remove_punct(f)
print(tokens)
frequency_d = freq_dict(tokens)
print(frequency_d)
frequency = sort_dict(frequency_d)
write_to_file(frequency)
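
The same pipeline can probably be written more compactly. The sketch below is an alternative, not the script's own method: it assumes Python 3, lets `Counter` do the counting directly, and strips punctuation with `str.translate`. The function name `word_frequencies` and the sample sentence are hypothetical, chosen only for illustration.

```python
from collections import Counter
import string


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first."""
    # Build a translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    # split() with no argument splits on any run of whitespace
    # (including line breaks) and never yields empty strings.
    words = text.lower().translate(table).split()
    return Counter(words).most_common()


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"  # hypothetical sample text
    for word, count in word_frequencies(sample):
        print("{} : {}".format(count, word))
```

To run it on a book, read the file's contents into a string and pass that string to `word_frequencies` in place of the sample sentence.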