Python > Exercises #4: Counting words
Task 1: Count all the words
Read a text from the keyboard, then display the number of words in it.
Viable solution:
line = input() # read from keyboard
words = line.split() # split into words
count = len(words) # count the words
print(count)
Or in one line:
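A minimal sketch of the one-line variant: the three steps collapse into a single expression. The helper name count_words is hypothetical and is used only so the expression can be tried on a fixed string instead of keyboard input.

```python
def count_words(text):
    # split() without arguments splits on any run of whitespace
    return len(text.split())

# With keyboard input this would be: print(len(input().split()))
print(count_words("the quick  brown fox"))  # → 4
```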
Task 2: Count distinct words
Display only the number of distinct words. Count words that differ only in letter case, such as 'example', 'Example' and 'exAMPLE', as one word.
One possible way is to build a list of unique (lower-cased) words, consulting it for each word of the input and extending it whenever a new word appears.
words = input().split()
seen = []
for x in words:
    if x.lower() not in seen:
        seen.append(x.lower())
print(len(seen))
(The convenient in operator could also be emulated by list.count(value) or by a nested for loop with an if condition; both variants are decidedly less straightforward.)
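As a rough illustration of the first alternative, the membership check can be written with list.count(value), which returns how many times the value occurs in the list. The function name count_distinct_with_count and the sample sentence are made up for this sketch.

```python
def count_distinct_with_count(text):
    seen = []
    for x in text.split():
        low = x.lower()
        # count() returns 0 when the word has not been stored yet
        if seen.count(low) == 0:
            seen.append(low)
    return len(seen)

print(count_distinct_with_count("example Example exAMPLE word"))  # → 2
```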
The in operator has to scan a list element by element, so it gets slow as the list grows; it would be better to use a set. As a bonus, a set’s .add() method never creates duplicates, so we can skip the membership check altogether:
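A minimal sketch of the set-based variant, assuming the same lower-casing as in the list version. The helper name count_distinct is hypothetical; it takes the text as a parameter so it can be run without keyboard input.

```python
def count_distinct(text):
    seen = set()
    for x in text.split():
        # adding a word that is already present leaves the set unchanged
        seen.add(x.lower())
    return len(seen)

print(count_distinct("example Example exAMPLE word"))  # → 2
```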
Or in one line:
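One plausible one-line form uses a set comprehension, so the loop and the set construction merge into a single expression. The helper name count_distinct_short is an assumption of this sketch.

```python
def count_distinct_short(text):
    # set comprehension: duplicates collapse automatically
    return len({x.lower() for x in text.split()})

# With keyboard input this would be:
# print(len({x.lower() for x in input().split()}))
print(count_distinct_short("To be or not to be"))  # → 4
```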
Task 3: Count word occurrences
For each word in the input, output the number of occurrences, sorted alphabetically.
We’ll maintain a dict(), mapping each word to the number of its occurrences. Processing the input word by word, in each iteration we’ll increment the corresponding entry in the dictionary. There’s only one problem: at the beginning, the number of occurrences for each word is technically not zero, but undefined. Accessing dict[word] for a word that is not yet in the dictionary would raise a KeyError.
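The problem can be reproduced in a few lines. This is only a sketch with a made-up key; the flag variable raised exists purely to make the outcome visible.

```python
d = dict()
try:
    # += first reads d["word"], which does not exist yet
    d["word"] += 1
    raised = False
except KeyError:
    raised = True

print(raised)  # → True
print(d)       # → {}
```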
The most straightforward solution is to explicitly test (e.g. with the in operator) whether the word is already in the dictionary (and was therefore encountered before) or not (and this is its first occurrence), and then act accordingly:
words = input().split()
words = [x.lower() for x in words]
d = dict()
for x in words:
    if x not in d:
        d[x] = 1
    else:
        d[x] += 1
for k, v in sorted(d.items()):
    print(k + ': ' + str(v))
Alternatively, we could replace dict[key] access with the dict.get(key, default) method, which returns the specified default value when the key is not in the dictionary:
words = input().split()
words = [x.lower() for x in words]
d = dict()
for x in words:
    d[x] = d.get(x, 0) + 1
for k, v in sorted(d.items()):
    print(k + ': ' + str(v))
Furthermore, the collections module contains a drop-in dict() replacement, defaultdict(type), which on d[key] access to a missing key creates the entry with the default value of the given type and returns it.
(Defaults are 0 for int, 0.0 for float, False for bool, and an empty container for types such as list, set or dict.)
import collections
words = input().split()
words = [x.lower() for x in words]
d = collections.defaultdict(int)
for x in words:
    d[x] += 1
for k, v in sorted(d.items()):
    print(k + ': ' + str(v))
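The default values mentioned above can be checked directly, since calling a type without arguments returns its default. The sketch below also shows the empty-container case with defaultdict(list); grouping words by first letter is just an illustrative use, not part of the exercise.

```python
import collections

# Calling a type with no arguments yields its default value
print(int(), float(), bool(), list())  # → 0 0.0 False []

groups = collections.defaultdict(list)
for word in "apple avocado banana".split():
    # a missing key is created with an empty list on first access
    groups[word[0]].append(word)

print(dict(groups))  # → {'a': ['apple', 'avocado'], 'b': ['banana']}
```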
Task 4: Order by most used
Print the words ordered by frequency, from the most common to the least common words.
In the previous examples, sorted(d.items()) sorts the items alphabetically. (Actually, it sorts the key-value pairs from d.items() lexicographically, that is, first by the key and, in case of a tie (which won’t happen in a dictionary), by the value.)
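Lexicographic ordering of pairs is easiest to see on a plain list, where ties on the first element are possible. The sample pairs are made up for this sketch.

```python
pairs = [("b", 2), ("a", 5), ("b", 1)]

# Compared by the first element; the second one only breaks ties
ordered = sorted(pairs)
print(ordered)  # → [('a', 5), ('b', 1), ('b', 2)]
```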
To sort by the value first (and then by the key), we can construct a list of reversed value-key pairs, and then sort and reverse it:
import collections
words = input().split()
words = [x.lower() for x in words]
d = collections.defaultdict(int)
for x in words:
    d[x] += 1
a = [(v, k) for k, v in d.items()]
a.sort()
a.reverse()
for v, k in a:
    print(k + ': ' + str(v))
Sneak peek: using key functions
The list.sort(…) method and the sorted(…) function also accept two interesting parameters:
reverse - change the sort order (default: low to high; reverse=True: high to low)
key - sort the items using a value derived from each item by a key function
One pre-made key function comes from operator.itemgetter(index), which returns a function that grabs the subitem at the given index:
import collections
import operator
words = input().split()
words = [x.lower() for x in words]
d = collections.defaultdict(int)
for x in words:
    d[x] += 1
for k, v in sorted(d.items(), key=operator.itemgetter(1), reverse=True):
    print(k + ': ' + str(v))
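To see the two parameters in isolation, here is a small sketch on a hand-written list of (word, count) pairs. The sample data and the name get_count are assumptions made for illustration.

```python
import operator

get_count = operator.itemgetter(1)
print(get_count(("apple", 3)))  # → 3

pairs = [("pear", 2), ("apple", 3), ("fig", 1)]

ascending = sorted(pairs, key=get_count)
descending = sorted(pairs, key=get_count, reverse=True)

print(ascending)   # → [('fig', 1), ('pear', 2), ('apple', 3)]
print(descending)  # → [('apple', 3), ('pear', 2), ('fig', 1)]
```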
And more arcanely, without importing operator, using a lambda function:
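A minimal sketch of the lambda variant, assuming it mirrors the itemgetter version above. The helper name by_frequency is hypothetical; it takes the text as a parameter so the example runs without keyboard input.

```python
import collections

def by_frequency(text):
    d = collections.defaultdict(int)
    for x in text.lower().split():
        d[x] += 1
    # the lambda receives a (word, count) pair and returns the count
    return sorted(d.items(), key=lambda kv: kv[1], reverse=True)

for k, v in by_frequency("a b a c a b"):
    print(k + ': ' + str(v))
# prints:
# a: 3
# b: 2
# c: 1
```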