Another popular type of chunking is VP-chunking, or verb phrase chunking. A verb phrase is a phrase that contains a verb and its complements, objects, or modifiers.
Verb phrases can take a variety of structures, and here you will consider two. The first structure begins with a verb VB of any tense, followed by a noun phrase, and ends with an optional adverb RB of any form. The second structure switches the order of the verb and the noun phrase, but also ends with an optional adverb.
Both structures are considered because verb phrases of each form are essentially the same in meaning. For example, consider the part-of-speech tagged verb phrases given below:
(('said', 'VBD'), ('the', 'DT'), ('cowardly', 'JJ'), ('lion', 'NN'))
(('the', 'DT'), ('cowardly', 'JJ'), ('lion', 'NN'), ('said', 'VBD'))
The chunk grammar to find the first form of verb phrase is given below:
chunk_grammar = "VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}"
VP is the user-defined name of the chunk you are searching for. In this case VP stands for verb phrase.
<VB.*> matches any verb, using the . as a wildcard and the * quantifier to match 0 or more occurrences of any character. This ensures matching verbs of any tense (e.g. VB for the base form, VBD for past tense, or VBN for past participle).
<DT>?<JJ>*<NN> matches a noun phrase: an optional determiner, any number of adjectives, and a noun.
<RB.?> matches any adverb, using the . as a wildcard and the optional quantifier to match 0 or 1 occurrence of any character. This ensures matching any form of adverb (regular RB, comparative RBR, or superlative RBS).
? is an optional quantifier, matching either 0 or 1 adverbs.
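As a quick check of the verb-first grammar, the sketch below runs it on a hand-tagged phrase. It assumes NLTK is installed; the sample phrase and the names tagged_phrase and vp_chunks are made up for illustration and are not taken from the novel.

```python
from nltk import RegexpParser

# Verb-first grammar from the lesson: verb, noun phrase, optional adverb.
chunk_grammar = "VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}"
chunk_parser = RegexpParser(chunk_grammar)

# Hypothetical hand-tagged phrase (an assumption, not text from the novel).
tagged_phrase = [
    ("Then", "RB"),
    ("said", "VBD"),
    ("the", "DT"),
    ("cowardly", "JJ"),
    ("lion", "NN"),
    ("sadly", "RB"),
]

# parse() returns a tree; VP-chunks are the subtrees labeled "VP".
chunked = chunk_parser.parse(tagged_phrase)
vp_chunks = [
    subtree.leaves()
    for subtree in chunked.subtrees(filter=lambda t: t.label() == "VP")
]

for chunk in vp_chunks:
    print(chunk)
```

The leading adverb stays outside the chunk because the pattern must start with a verb, while the trailing adverb is absorbed by the optional <RB.?>? at the end.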
The chunk grammar for the second form of verb phrase is given below:
chunk_grammar = "VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}"
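To see how the noun-first grammar behaves, the sketch below applies it to one noun-first phrase and one verb-first phrase. It assumes NLTK is installed; the helper find_vp_chunks and both sample phrases are hypothetical, added only for illustration.

```python
from nltk import RegexpParser

# Noun-first grammar from the lesson: noun phrase, verb, optional adverb.
chunk_grammar = "VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}"
chunk_parser = RegexpParser(chunk_grammar)


def find_vp_chunks(parser, tagged_sentence):
    """Hypothetical helper: return the leaves of every VP subtree."""
    tree = parser.parse(tagged_sentence)
    return [
        subtree.leaves()
        for subtree in tree.subtrees(filter=lambda t: t.label() == "VP")
    ]


# Made-up sample phrases (assumptions, not text from the novel).
noun_first = [
    ("the", "DT"), ("cowardly", "JJ"), ("lion", "NN"),
    ("said", "VBD"), ("quietly", "RB"),
]
verb_first = [
    ("said", "VBD"), ("the", "DT"), ("cowardly", "JJ"), ("lion", "NN"),
]

noun_first_chunks = find_vp_chunks(chunk_parser, noun_first)
verb_first_chunks = find_vp_chunks(chunk_parser, verb_first)

print(noun_first_chunks)
print(verb_first_chunks)
```

With this grammar the verb-first phrase yields no VP-chunk, since the pattern requires the noun to come directly before the verb.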
Just like with NP-chunks, you can find all the VP-chunks in a text and perform a frequency analysis to identify important, recurring verb phrases. These verb phrases can give insight into what kind of action different characters take or how the actions that characters take are described by the author.
Once again, this is the part of the analysis where you get to be creative and use your own knowledge about the text you are working with to find interesting insights!
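The frequency analysis described above could be sketched roughly as follows. This is a minimal illustration, assuming NLTK is installed: the function count_vp_chunks is hypothetical and is not the lesson's own counter, and the three tagged sentences are invented sample data.

```python
from collections import Counter

from nltk import RegexpParser


def count_vp_chunks(chunked_sentences, top_n=30):
    """Hypothetical counter: tally VP subtrees and return the most common."""
    counts = Counter()
    for sentence in chunked_sentences:
        for subtree in sentence.subtrees(filter=lambda t: t.label() == "VP"):
            # Tuples are hashable, so they can serve as Counter keys.
            counts[tuple(subtree.leaves())] += 1
    return counts.most_common(top_n)


chunk_parser = RegexpParser("VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}")

# Invented sample data standing in for a tagged text.
sample_sentences = [
    [("said", "VBD"), ("the", "DT"), ("scarecrow", "NN")],
    [("asked", "VBD"), ("the", "DT"), ("girl", "NN")],
    [("said", "VBD"), ("the", "DT"), ("scarecrow", "NN")],
]

chunked_sentences = [chunk_parser.parse(s) for s in sample_sentences]
most_common = count_vp_chunks(chunked_sentences, top_n=2)

for chunk, frequency in most_common:
    print(frequency, chunk)
```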
Instructions
Define a piece of chunk grammar named chunk_grammar that will chunk a verb phrase of the following form: verb VB, followed by a noun phrase, followed by an optional adverb RB. Name the chunk VP.
Create a RegexpParser object called chunk_parser using chunk_grammar as an argument.
The part-of-speech tagged novel pos_tagged_oz you previously created has been imported for you in the workspace.
Create a for loop through each part-of-speech tagged sentence in pos_tagged_oz. Within the for loop, VP-chunk each part-of-speech tagged sentence using chunk_parser's .parse() method and append the result to vp_chunked_oz. Each item in vp_chunked_oz will now be a verb phrase chunked sentence from The Wonderful Wizard of Oz!
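The chunking loop might look something like the sketch below. It assumes NLTK is installed and uses a tiny hand-written stand-in for pos_tagged_oz, since the real tagged novel lives in the lesson workspace; the two sentences here are invented.

```python
from nltk import RegexpParser

chunk_grammar = "VP: {<VB.*><DT>?<JJ>*<NN><RB.?>?}"
chunk_parser = RegexpParser(chunk_grammar)

# Stand-in for the workspace's pos_tagged_oz (an assumption): a list of
# sentences, each a list of (word, tag) pairs.
pos_tagged_oz = [
    [("said", "VBD"), ("the", "DT"), ("lion", "NN")],
    [
        ("Dorothy", "NNP"), ("carried", "VBD"), ("the", "DT"),
        ("little", "JJ"), ("dog", "NN"), ("gently", "RB"),
    ],
]

# Chunk each tagged sentence and collect the resulting trees.
vp_chunked_oz = []
for pos_tagged_sentence in pos_tagged_oz:
    vp_chunked_oz.append(chunk_parser.parse(pos_tagged_sentence))

for chunked_sentence in vp_chunked_oz:
    print(chunked_sentence)
```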
A customized function vp_chunk_counter that returns the 30 most common VP-chunks from a list of chunked sentences has been imported to the workspace for you. Call vp_chunk_counter with vp_chunked_oz as an argument and save the result to a variable named most_common_vp_chunks.
Print most_common_vp_chunks. What sticks out to you about the most common verb phrase chunks? Does the action provided by the verbs give insights that simple noun phrases did not? Open the hint to see our analysis.
Want to see how vp_chunk_counter works? Use the file navigator to open vp_chunk_counter.py and inspect the function.
Go back to the chunk grammar you defined earlier and update the grammar to find a verb phrase of the following form: noun phrase, followed by a verb VB, followed by an optional adverb RB. Rerun your code and look at the most common chunks. What do you find?