One of the most common ways to implement the BoW model in Python is as a dictionary with each key set to a word and each value set to the number of times that word appears. Take the example below:
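The lesson's original example is not reproduced in this text. As a minimal stand-in, the sketch below uses an invented sentence and the dictionary that a bag-of-words would hold for it. The sentence and the variable names are assumptions for illustration only.

```python
# An invented example sentence (not from the lesson), already lowercased.
sentence = "the cat chased the other cat"

# The bag-of-words for that sentence: each key is a word,
# each value is how many times the word appears.
bag_of_words = {"the": 2, "cat": 2, "chased": 1, "other": 1}

# Sanity check: every count matches the number of occurrences in the sentence.
for word, count in bag_of_words.items():
    assert sentence.split().count(word) == count
```

Note that word order is discarded: only the counts survive.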

The words from the sentence go into the bag-of-words and come out as a dictionary of words with their corresponding counts. For statistical models, we call the text that we use to build the model our training data. Usually, we need to prepare our text data by breaking it up into documents (shorter strings of text, generally sentences).
Let’s build a function that converts a given training text into a bag-of-words!
Instructions
Define a function text_to_bow() that accepts some_text as a variable. Inside the function, set bow_dictionary equal to an empty dictionary and return it from the function. This is where we’ll be collecting the words and their counts.
Above the return statement, call the preprocess_text() function we created for you on some_text and assign the result to the variable tokens.
Text preprocessing allows us to count words like “game” and “Games” as the same word token.
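To make that concrete, here is a rough sketch of the kind of normalization a preprocessing step might perform. It is not the lesson's preprocess_text(), whose implementation is not shown here. The normalize() function and the BASE_FORMS lookup table are hypothetical, with the table standing in for a real lemmatizer.

```python
import re

# Hypothetical lookup table standing in for a real lemmatizer:
# it maps a few inflected forms to their base form.
BASE_FORMS = {"games": "game"}

def normalize(text):
    """Lowercase the text, extract word tokens, and map known forms to a base form."""
    words = re.findall(r"[a-z]+", text.lower())
    return [BASE_FORMS.get(word, word) for word in words]

# Both spellings end up as the same word token.
print(normalize("Games"))  # ['game']
print(normalize("game"))   # ['game']
```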
Still above the return, iterate over each token in tokens and check if token is already in the bow_dictionary.
- If it is, increment that token’s count by 1. (Remember that each token’s count is its corresponding value within the bow_dictionary.)
- Otherwise, set the count equal to 1 because this is the first time the model has seen that word token.
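Put together, the steps above might look like the sketch below. The preprocess_text() defined here is only a hypothetical stand-in (lowercasing plus simple word extraction) so that the example runs on its own. The function provided in the lesson may do more, and the example sentence is invented.

```python
import re

def preprocess_text(text):
    # Hypothetical stand-in for the lesson's provided preprocess_text():
    # lowercases the text and extracts alphabetic word tokens.
    return re.findall(r"[a-z]+", text.lower())

def text_to_bow(some_text):
    # Collect each word token and its count here.
    bow_dictionary = {}
    tokens = preprocess_text(some_text)
    for token in tokens:
        if token in bow_dictionary:
            # Seen before: increment the existing count.
            bow_dictionary[token] += 1
        else:
            # First time the model has seen this token.
            bow_dictionary[token] = 1
    return bow_dictionary

print(text_to_bow("The cat chased the other cat."))
# {'the': 2, 'cat': 2, 'chased': 1, 'other': 1}
```

The standard library's collections.Counter produces the same counts in a single call, but the explicit loop mirrors the instructions step by step.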
Uncomment the print statement and run the code to see your bag-of-words function in action!