Nice work! Time to put that dictionary of vocabulary to good use and build a bag-of-words vector from a new document.
In Python, we can use a list to represent a vector. Each index in the list will correspond to a word and be set to its count.
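As a small illustration of a list used as a vector, here is a sketch with a made-up three-word vocabulary. The names vocabulary and counts are hypothetical and are not part of the exercise.

```python
# Hypothetical vocabulary: each word maps to one index of the vector.
vocabulary = {"five": 0, "fantastic": 1, "fish": 2}

# Counts for the document "five fantastic fish fish":
# index 0 -> "five", index 1 -> "fantastic", index 2 -> "fish"
counts = [1, 1, 2]

# Look up the count for "fish" through its index.
print(counts[vocabulary["fish"]])  # 2
```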
Define a function text_to_bow_vector() with two parameters:
- some_text (the document we pass in to vectorize)
- features_dictionary (the dictionary of vocabulary we generated in the previous exercise)
Create a list of 0s the length of features_dictionary and assign it to the variable bow_vector. Each 0 represents a word’s count within the vector. Return bow_vector from the function.
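A list of zeros the length of a dictionary can be built by repeating [0] once per entry. This is a minimal sketch that assumes a small made-up features_dictionary. The real one comes from the previous exercise.

```python
# Made-up stand-in for the dictionary from the previous exercise.
features_dictionary = {"five": 0, "fantastic": 1, "fish": 2, "fly": 3}

# One zero per vocabulary word.
bow_vector = [0] * len(features_dictionary)

print(bow_vector)  # [0, 0, 0, 0]
```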
Above the return statement, preprocess the some_text document using the preprocess_text() function we built for you and assign the result to the variable tokens. Return tokens as a second return value for the function.
Still above the return statement, loop through each token in tokens:
- Determine which index the token corresponds to in features_dictionary and assign the value to a new variable feature_index. (Take a look at the gif. If token is the word fish, then we would want feature_index to be the index that features_dictionary stores for fish.)
- Now that you have the word’s index, access the word count at that index within bow_vector and increment that count by 1.
Uncomment the print statement to test out the function!
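Putting the steps together, the function might look like the sketch below. It is one possible solution, not the course's reference answer. The preprocess_text() here is a simplified, hypothetical stand-in for the one the course provides (it only lowercases and splits into words), and the vocabulary and sample sentence are made up for illustration.

```python
import re

def preprocess_text(text):
    # Simplified stand-in for the course's preprocess_text():
    # lowercases the text and splits it into word tokens.
    return re.findall(r"[a-z']+", text.lower())

def text_to_bow_vector(some_text, features_dictionary):
    # Start with a count of 0 for every word in the vocabulary.
    bow_vector = [0] * len(features_dictionary)
    tokens = preprocess_text(some_text)
    for token in tokens:
        # Assumes every token is in the vocabulary; an unknown
        # token would raise a KeyError on this lookup.
        feature_index = features_dictionary[token]
        bow_vector[feature_index] += 1
    return bow_vector, tokens

# Made-up vocabulary and document, for illustration only.
features_dictionary = {"five": 0, "fantastic": 1, "fish": 2, "fly": 3}
vector, tokens = text_to_bow_vector("Five fantastic fish fly. Fish fly!", features_dictionary)

print(vector)  # [1, 1, 2, 2]
print(tokens)  # ['five', 'fantastic', 'fish', 'fly', 'fish', 'fly']
```

Because the function returns two values, the call unpacks them into vector and tokens. If new documents may contain words outside the vocabulary, the lookup would need a guard such as checking `token in features_dictionary` first.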