Natural Language Parsing with Regular Expressions
Chunking Verb Phrases

Another popular type of chunking is VP-chunking, or verb phrase chunking. A verb phrase is a phrase that contains a verb and its complements, objects, or modifiers.

Verb phrases can take a variety of structures, and here you will consider two. The first structure begins with a verb VB of any tense, followed by a noun phrase, and ends with an optional adverb RB of any form. The second structure switches the order of the verb and the noun phrase, but also ends with an optional adverb.

Both structures are considered because verb phrases of each form are essentially the same in meaning. For example, consider the part-of-speech tagged verb phrases given below:

  • (('said', 'VBD'), ('the', 'DT'), ('cowardly', 'JJ'), ('lion', 'NN'))
  • (('the', 'DT'), ('cowardly', 'JJ'), ('lion', 'NN'), ('said', 'VBD'))

The chunk grammar to find the first form of verb phrase is given below:

chunk_grammar = "VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}"
  • VP is the user-defined name of the chunk you are searching for. In this case, VP stands for verb phrase.
  • <VB.*> matches any verb tag, using . as a wildcard for any character and the * quantifier to match 0 or more occurrences of it. This ensures matching verbs of any form (e.g. VB for base form, VBD for past tense, or VBN for past participle)
  • <DT>?<JJ>*<NN> matches a noun phrase: an optional determiner DT, any number of adjectives JJ, and a noun NN
  • <RB.?> matches any adverb using the . as a wildcard and the optional quantifier to match 0 or 1 occurrence of any character. This ensures matching any form of adverb (regular RB, comparative RBR, or superlative RBS)
  • The final ? (after <RB.?>) is an optional quantifier, matching either 0 or 1 adverbs
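To see which tags the two wildcard patterns accept, the sketch below checks them against a handful of Penn Treebank tags with Python's built-in re module. It is only an approximation: NLTK rewrites tag patterns internally before matching, so re.fullmatch stands in here for "the pattern must match the whole tag". The tag list and variable names are illustrative.

```python
import re

# Stand-ins for the tag patterns inside <VB.*> and <RB.?>.
verb_pattern = re.compile(r"VB.*")   # VB followed by any characters
adverb_pattern = re.compile(r"RB.?") # RB followed by at most one character

# A small, illustrative sample of part-of-speech tags.
tags = ["VB", "VBD", "VBN", "VBZ", "RB", "RBR", "RBS", "NN", "JJ"]

# Keep only the tags that each pattern matches in full.
verb_tags = [tag for tag in tags if verb_pattern.fullmatch(tag)]
adverb_tags = [tag for tag in tags if adverb_pattern.fullmatch(tag)]

print(verb_tags)
print(adverb_tags)
```

Noun and adjective tags such as NN and JJ are rejected by both patterns, which is why the grammar names them separately.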

The chunk grammar for the second form of verb phrase is given below:

chunk_grammar = "VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}"
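As a minimal sketch, assuming NLTK is installed, both grammars can be tried on the two example phrases from earlier in this lesson. The helper vp_chunks and the variable names are illustrative and not part of the lesson's workspace.

```python
from nltk import RegexpParser

# Hand-tagged versions of the two example verb phrases.
verb_first = [('said', 'VBD'), ('the', 'DT'), ('cowardly', 'JJ'), ('lion', 'NN')]
noun_first = [('the', 'DT'), ('cowardly', 'JJ'), ('lion', 'NN'), ('said', 'VBD')]

# One parser per verb phrase structure.
verb_first_parser = RegexpParser("VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}")
noun_first_parser = RegexpParser("VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}")

def vp_chunks(tree):
    """Return the tagged words of every VP subtree in a chunked sentence."""
    return [subtree.leaves()
            for subtree in tree.subtrees(filter=lambda t: t.label() == "VP")]

verb_first_chunks = vp_chunks(verb_first_parser.parse(verb_first))
noun_first_chunks = vp_chunks(noun_first_parser.parse(noun_first))

# The first grammar finds nothing in the noun-first phrase,
# because no noun follows the verb there.
mismatched_chunks = vp_chunks(verb_first_parser.parse(noun_first))

print(verb_first_chunks)
print(noun_first_chunks)
print(mismatched_chunks)
```

Each grammar chunks only its own word order, so covering both forms means running both grammars.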

Just like with NP-chunks, you can find all the VP-chunks in a text and perform a frequency analysis to identify important, recurring verb phrases. These verb phrases can give insight into what kind of action different characters take or how the actions that characters take are described by the author.
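The frequency analysis described above could look roughly like the sketch below. The function count_vp_chunks is a hypothetical stand-in, not the lesson's own vp_chunk_counter (whose implementation is not shown here), and the three tagged sentences are made-up sample data rather than text from the novel.

```python
from collections import Counter
from nltk import RegexpParser

def count_vp_chunks(chunked_sentences, top_n=30):
    """Count VP chunks across chunked sentences (hypothetical helper)."""
    counts = Counter()
    for tree in chunked_sentences:
        for subtree in tree.subtrees(filter=lambda t: t.label() == "VP"):
            # Tuples are hashable, so each chunk can serve as a Counter key.
            counts[tuple(subtree.leaves())] += 1
    return counts.most_common(top_n)

# Made-up, hand-tagged sample sentences.
sample_sentences = [
    [('said', 'VBD'), ('the', 'DT'), ('scarecrow', 'NN')],
    [('said', 'VBD'), ('the', 'DT'), ('scarecrow', 'NN'), ('slowly', 'RB')],
    [('said', 'VBD'), ('the', 'DT'), ('scarecrow', 'NN')],
]

chunk_parser = RegexpParser("VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}")

# Chunk every sentence, then count the resulting VP chunks.
vp_chunked = [chunk_parser.parse(sentence) for sentence in sample_sentences]
most_common = count_vp_chunks(vp_chunked)

for chunk, count in most_common:
    print(count, chunk)
```

Chunks that differ only by a trailing adverb are counted separately here; whether to merge them is an analysis choice.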

Once again, this is the part of the analysis where you get to be creative and use your own knowledge about the text you are working with to find interesting insights!

Instructions

1.

Define a piece of chunk grammar named chunk_grammar that will chunk a verb phrase of the following form: verb VB, followed by a noun phrase, followed by an optional adverb RB. Name the chunk VP.

2.

Create a RegexpParser object called chunk_parser using chunk_grammar as an argument.

3.

The part-of-speech tagged novel pos_tagged_oz that you previously created has been imported for you in the workspace.

Create a for loop through each part-of-speech tagged sentence in pos_tagged_oz. Within the for loop, VP-chunk each part-of-speech tagged sentence using chunk_parser's .parse() method and append the result to vp_chunked_oz. Each item in vp_chunked_oz will now be a verb phrase chunked sentence from The Wonderful Wizard of Oz!

4.

A customized function vp_chunk_counter that returns the 30 most common VP-chunks from a list of chunked sentences has been imported to the workspace for you. Call vp_chunk_counter with vp_chunked_oz as an argument and save the result to a variable named most_common_vp_chunks.

Print most_common_vp_chunks. What sticks out to you about the most common verb phrase chunks? Does the action provided by the verbs give insights that simple noun phrases did not? Open the hint to see our analysis.

Want to see how vp_chunk_counter works? Use the file navigator to open vp_chunk_counter.py and inspect the function.

5.

Go back to the chunk grammar you defined earlier and update the grammar to find a verb phrase of the following form: noun phrase, followed by a verb VB, followed by an optional adverb RB. Rerun your code and look at the most common chunks. What do you find?
