In addition to directly calculating the tf-idf scores for a set of terms across a corpus, you can also convert a bag-of-words model you have already created into tf-idf scores.
scikit-learn's TfidfTransformer handles this conversion. You begin by initializing a TfidfTransformer object:
tfidf_transformer = TfidfTransformer(norm=None)
Given a bag-of-words matrix count_matrix, you can now multiply the term frequencies by their inverse document frequencies to get the tf-idf scores as follows:
tf_idf_scores = tfidf_transformer.fit_transform(count_matrix)
This is very similar to how we calculated inverse document frequency, except that this time we both fit the TfidfTransformer to the term frequencies (the bag-of-words vectors) and transform them, rather than only fitting it.
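The conversion described above can be sketched end to end as follows. The toy corpus and the name docs are assumptions made only for illustration; they are not part of the lesson's data. The sketch also recomputes the scores by hand, assuming the transformer's default settings (smoothed idf, no sublinear scaling) and norm=None, so that each score is simply a term count multiplied by that term's idf.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the bird sang",
    "the bird flew away",
    "hope is the thing with feathers",
]

# Bag-of-words counts: scikit-learn puts documents in rows and terms in columns.
count_matrix = CountVectorizer().fit_transform(docs)

# norm=None keeps the raw tf * idf products instead of rescaling each row.
tfidf_transformer = TfidfTransformer(norm=None)
tf_idf_scores = tfidf_transformer.fit_transform(count_matrix)

# Recompute by hand using the smoothed idf that scikit-learn documents
# as its default: idf(t) = ln((1 + n) / (1 + df(t))) + 1
counts = count_matrix.toarray()
n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)  # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1
manual_scores = counts * idf  # broadcast idf across every document row

# Check that the manual computation matches the transformer's output.
print(np.allclose(tf_idf_scores.toarray(), manual_scores))
```

Because fit_transform learns the idf weights and applies them in one call, the fitted weights remain available afterwards in tfidf_transformer.idf_.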
Consider one last time the same selection of 6 Emily Dickinson poems given in poems.py. The term frequencies of each term-document pair are calculated in term_frequency.py and stored in bow_matrix as a matrix and df_bag_of_words as a Pandas DataFrame.
In script.py, print df_bag_of_words to view the bag-of-words matrix (term-document matrix of term frequencies).
Create a TfidfTransformer object named transformer with keyword argument
Use transformer to fit and transform the bag-of-words matrix bow_matrix into tf-idf scores. Save your result to