Bag-of-Words Language Model

Whet your language-model appetite with the widely used bag-of-words model. Develop the underlying functionality in Python, then use scikit-learn.

Bag-of-Words Language Model
Lesson 1 of 1
  1. "A bag-of-words is all you need," some NLPers have decreed. The bag-of-words language model is a simple-yet-powerful tool to have up your sleeve when working on natural language processing (NLP)....

  2. Bag-of-words (BoW) is a statistical language model based on word count. Say what? Let's start with that first part: a statistical language model is a way for computers to make sense o...

  3. One of the most common ways to implement the BoW model in Python is as a dictionary with each key set to a word and each value set to the number of times that word appears. Take the example below:...

  4. Sometimes a dictionary just won't fit the bill. Topic modelling applications, for example, require an implementation of bag-of-words that is a bit more mathematical: feature vectors. A feat...

  5. Now that you know what a bag-of-words vector looks like, you can create a function that builds them! First, we need a way of generating a features dictionary from a list of training documents. We...

  6. Nice work! Time to put that dictionary of vocabulary to good use and build a bag-of-words vector from a new document. In Python, we can use a list to represent a vector. Each index in the list wil...

  7. Phew! That was a lot of work. It's time to put [...] and [...] together and use them in a spam filter we created that uses a Naive Bayes classifier. We've slightly modified the two functions f...

  8. Amazing work! As is the case with many tasks in Python, there's already a library that can do all of that work for you. For [...], you can approximate the functionality with the [...] module'...

  9. As you can see, bag-of-words is pretty useful! BoW also has several advantages over other language models. For one, it's an easier model to get started with and a few Python libraries already have ...

  10. Alas, there is a trade-off for all the brilliance BoW brings to the table. Unless you want sentences that look like "the a but for the", BoW is NOT a great primary model for text prediction. If t...

  11. You made it! And you've learned plenty about the bag-of-words language model along the way:
      - Bag-of-words (BoW), also referred to as the unigram model, is a statistical language model based on w...
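
The dictionary implementation described in step 3 can be sketched with `collections.Counter`, a `dict` subclass that maps each key to a count. This is a minimal sketch under a stated assumption: preprocessing is deliberately simple (lowercasing plus a regular-expression tokenizer). The helper names `tokenize` and `text_to_word_counts` are hypothetical and are not the course's own functions.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple preprocessing: lowercase the text
    # and keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def text_to_word_counts(text):
    # Bag-of-words as a dictionary: each key is a word, each value is the
    # number of times that word appears. Word order is discarded.
    return dict(Counter(tokenize(text)))


counts = text_to_word_counts("The squids jumped out of the suitcases.")
print(counts)
# {'the': 2, 'squids': 1, 'jumped': 1, 'out': 1, 'of': 1, 'suitcases': 1}
```

Only the counts survive: "the squids jumped" and "jumped the squids" would produce the same dictionary.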
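
Steps 4 to 6 describe the feature-vector variant: first map each vocabulary word from the training documents to an index, then turn a new document into a list in which each position holds the count of the corresponding word. The sketch below assumes the same simple tokenizer as a plain dictionary approach and ignores words that never appeared in training. The names `build_vocabulary_index` and `to_count_vector` are hypothetical and not taken from the course.

```python
import re


def tokenize(text):
    # Hypothetical, deliberately simple preprocessing: lowercase, keep word-like runs.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary_index(documents):
    # Map every distinct word in the training documents to a vector index,
    # assigning indices in order of first appearance.
    vocabulary_index = {}
    for document in documents:
        for token in tokenize(document):
            if token not in vocabulary_index:
                vocabulary_index[token] = len(vocabulary_index)
    return vocabulary_index


def to_count_vector(document, vocabulary_index):
    # One slot per vocabulary word; each slot holds that word's count.
    vector = [0] * len(vocabulary_index)
    for token in tokenize(document):
        index = vocabulary_index.get(token)
        if index is not None:  # words unseen in training are ignored
            vector[index] += 1
    return vector


training_docs = ["Five fantastic fish flew off.", "The fish flew off to Fiji."]
features = build_vocabulary_index(training_docs)
print(features)
# {'five': 0, 'fantastic': 1, 'fish': 2, 'flew': 3, 'off': 4, 'the': 5, 'to': 6, 'fiji': 7}
print(to_count_vector("Fantastic fish! Fish everywhere.", features))
# [0, 1, 2, 0, 0, 0, 0, 0]
```

Every vector has the same length as the vocabulary. That fixed length is what lets downstream models such as classifiers or topic models treat documents as points in the same space.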
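
Step 7 feeds bag-of-words features into a Naive Bayes spam filter. The course's own filter is not shown here, so the following is a separate, from-scratch illustration of a multinomial Naive Bayes classifier over word counts with add-one (Laplace) smoothing. The training sentences, labels, and the helpers `train_naive_bayes` and `predict` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, simple preprocessing: lowercase and keep word-like runs.
    return re.findall(r"[a-z0-9']+", text.lower())


def train_naive_bayes(documents, labels, alpha=1.0):
    # Multinomial Naive Bayes over bag-of-words counts, with add-alpha
    # (Laplace) smoothing so unseen word/class pairs never get probability 0.
    vocabulary = {token for doc in documents for token in tokenize(doc)}
    word_counts = {label: Counter() for label in set(labels)}
    doc_counts = Counter(labels)
    for doc, label in zip(documents, labels):
        word_counts[label].update(tokenize(doc))
    model = {}
    for label, counts in word_counts.items():
        denominator = sum(counts.values()) + alpha * len(vocabulary)
        model[label] = {
            "log_prior": math.log(doc_counts[label] / len(documents)),
            "log_likelihood": {
                word: math.log((counts[word] + alpha) / denominator)
                for word in vocabulary
            },
        }
    return model


def predict(model, document):
    # Score each class by its log prior plus the log likelihood of every
    # known token (repeated words count repeatedly); unknown words are skipped.
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for token in tokenize(document):
            if token in params["log_likelihood"]:
                score += params["log_likelihood"][token]
        scores[label] = score
    return max(scores, key=scores.get)


train_docs = [
    "win a free prize now",
    "free money click now",
    "lunch meeting moved to noon",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "claim your free prize"))   # spam
print(predict(model, "is the meeting at noon"))  # ham
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on longer messages.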
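
Step 8 notes that a library can do this work for you. One widely used option is scikit-learn's `CountVectorizer` from `sklearn.feature_extraction.text`. Whether it is the exact tool the lesson uses is not stated in the excerpt, so treat that as an assumption. The training sentences below are hypothetical. By default `CountVectorizer` lowercases text, drops single-character tokens, and orders its vocabulary columns alphabetically.

```python
from sklearn.feature_extraction.text import CountVectorizer

training_docs = ["Five fantastic fish flew off.", "The fish flew off to Fiji."]

# fit() learns the vocabulary; transform() turns documents into count vectors.
vectorizer = CountVectorizer()
vectorizer.fit(training_docs)

print(vectorizer.vocabulary_)  # word -> column index (columns sorted alphabetically)
bow = vectorizer.transform(["Fantastic fish! Fish everywhere."])
print(bow.toarray())           # dense view of the sparse count matrix
# [[1 0 2 0 0 0 0 0]]
```

`transform` returns a SciPy sparse matrix because most entries of a bag-of-words vector are zero. Calling `toarray()` is convenient for small examples but can use a lot of memory on large corpora.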
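
Step 10's warning about text prediction shows up even in a toy example. A bag-of-words (unigram) model keeps only word counts, so its most likely "next word" is the same no matter what came before. The corpus and the `predict_next_word` helper below are hypothetical illustrations.

```python
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the rug".split()
unigram_counts = Counter(corpus)


def predict_next_word(context):
    # A unigram (bag-of-words) model has no notion of context, so this
    # "prediction" ignores its argument and returns the most frequent word.
    return unigram_counts.most_common(1)[0][0]


print(predict_next_word("the cat"))  # the
print(predict_next_word("sat on"))   # the
```

Generating several words this way produces strings like "the the the". Models that condition on preceding words, such as n-gram models, address this limitation.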

What you'll create

Portfolio projects that showcase your new skills

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory