Natural Language Parsing with Regular Expressions
Chunk Filtering

Another way to find chunks in your text is chunk filtering. With chunk filtering, you define which parts of speech you do not want in a chunk, and those parts of speech are removed.

A popular method for performing chunk filtering is to chunk an entire sentence together and then indicate which parts of speech are to be filtered out. If the filtered parts of speech fall in the middle of a chunk, the chunk is split into two separate chunks! The chunk grammar you can use to perform chunk filtering is given below:

chunk_grammar = """NP: {<.*>+}
                       }<VB.?|IN>+{"""
  • NP is the user-defined name of the chunk you are searching for. In this case, NP stands for noun phrase.
  • The curly braces {} indicate which parts of speech you are chunking. <.*>+ matches every part of speech in the sentence.
  • The inverted curly braces }{ indicate which parts of speech you want to filter from the chunk. <VB.?|IN>+ will filter out any verbs or prepositions.
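
To see the grammar in action, here is a minimal sketch that applies it to a short sentence. The sentence and its part-of-speech tags are made up for illustration and tagged by hand, so they are assumptions rather than data from the lesson. Note that the chunk rule and the filter rule each sit on their own line inside the grammar string.

```python
from nltk import RegexpParser

# Hypothetical hand-tagged sentence (an assumption, not from the lesson's data)
tagged_sentence = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"),
    ("the", "DT"), ("cat", "NN"),
]

# First rule chunks the whole sentence; second rule filters out verbs and prepositions
chunk_grammar = """NP: {<.*>+}
                       }<VB.?|IN>+{"""

chunk_parser = RegexpParser(chunk_grammar)
filtered_tree = chunk_parser.parse(tagged_sentence)

# Collect the words of every NP chunk that survives the filter
np_chunks = [
    [word for word, tag in subtree.leaves()]
    for subtree in filtered_tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(np_chunks)  # [['the', 'little', 'dog'], ['the', 'cat']]
```

Because barked/VBD and at/IN sit in the middle of the sentence-wide chunk, filtering them out leaves two separate NP chunks.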

Chunk filtering provides an alternate way for you to search through a text and find the chunks of information useful for your analysis!

Instructions

1.

The code in the workspace chunks an entire sentence together using the chunk grammar "Chunk: {<.*>+}". Run the code and view the output to see how the sentence is chunked into one big chunk named Chunk!

2.

Define a piece of chunk grammar named chunk_grammar that will chunk a noun phrase using chunk filtering. Name the chunk NP.

3.

Create a RegexpParser object called chunk_parser using chunk_grammar as an argument.

4.

Now you can find the NP-chunks in a sentence from The Wonderful Wizard of Oz using chunk filtering! Chunk and filter the part-of-speech tagged sentence stored at index 230 in pos_tagged_oz using chunk_parser's .parse() method. Save the result to filtered_dancers, and print filtered_dancers.

What parts of speech are removed from the chunk? What chunks remain?

5.

The last line in the workspace calls .pretty_print() on the chunked and filtered sentence. Uncomment the line and run the code to view the chunked and filtered sentence. Expand the output terminal all the way to the left to get a better view!
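
If you want to check which parts of speech were removed and which chunks remain, one possible approach is to walk the top level of the parsed tree. The sketch below uses a hypothetical hand-tagged sentence in place of pos_tagged_oz[230], which is not reproduced here, and the names kept_chunks and removed_tokens are made up for illustration.

```python
from nltk import RegexpParser, Tree

# Hypothetical stand-in for one entry of pos_tagged_oz (tags assigned by hand)
tagged_sentence = [
    ("the", "DT"), ("scarecrow", "NN"), ("walked", "VBD"),
    ("through", "IN"), ("the", "DT"), ("forest", "NN"),
    ("with", "IN"), ("dorothy", "NNP"),
]

chunk_grammar = """NP: {<.*>+}
                       }<VB.?|IN>+{"""
chunk_parser = RegexpParser(chunk_grammar)
filtered_sentence = chunk_parser.parse(tagged_sentence)

kept_chunks = []     # text of each chunk that remains after filtering
removed_tokens = []  # (word, tag) pairs that were filtered out of the chunk

for child in filtered_sentence:
    if isinstance(child, Tree):
        # Subtrees at the top level are the remaining NP chunks
        kept_chunks.append(" ".join(word for word, tag in child.leaves()))
    else:
        # Bare (word, tag) tuples at the top level were filtered out
        removed_tokens.append(child)

print("Chunks kept:", kept_chunks)
print("Tokens filtered out:", removed_tokens)

# Draw the chunked and filtered sentence as a text tree
filtered_sentence.pretty_print()
```

Anything that appears directly under the root as a (word, tag) pair was filtered out, while each nested NP subtree is a chunk that remains.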
