Discovering new code words in declassified CIA documents may seem like a mission for a foreign intelligence service, and detecting gender biases in the Harry Potter novels a task for a literature professor. Yet with natural language parsing and regular expressions, the power to perform such analyses is in your own hands!
While you may not put much explicit thought into the structure of your sentences as you write, the syntax choices you make are critical to ensuring your writing has meaning. Analyzing sentence structure as well as word choice can not only provide insights into the connotation of a piece of text, but can also highlight the biases of its author or uncover insights that even a deep, rigorous reading of the text might not reveal.
By using Python’s regular expression module re and the Natural Language Toolkit, known as NLTK, you can find keywords of interest, discover where and how often they are used, and discern the parts-of-speech patterns in which they appear to understand the sometimes hidden meaning in a piece of writing. Let’s get started!
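As a first taste of finding keywords and seeing where and how often they are used, here is a minimal sketch that relies only on Python's re module. The sample text and the keyword list are invented for illustration and do not come from any document mentioned in this lesson.

```python
import re
from collections import Counter

# Illustrative sample text (an assumption, not a quote from any real document).
text = (
    "The wizard greeted the girl. The girl asked the wizard for help, "
    "and the Wizard promised the lion courage."
)

# Hypothetical keywords of interest; \b stops "wizard" matching inside longer words.
keyword_pattern = re.compile(r"\b(wizard|girl|lion)\b", re.IGNORECASE)

# How often each keyword is used, ignoring case.
counts = Counter(m.group(1).lower() for m in keyword_pattern.finditer(text))

# Where each keyword is used, as character offsets into the text.
positions = {}
for m in keyword_pattern.finditer(text):
    positions.setdefault(m.group(1).lower(), []).append(m.start())

print(counts)
print(positions)
```

Compiling the pattern once and iterating with finditer gives both the match text and its position in a single pass, which is why it is used here instead of findall.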
The code in the workspace performs natural language parsing with regular expressions on L. Frank Baum’s classic novel The Wonderful Wizard of Oz. Run the code to view the output, which gives the frequency of different phrases that appear in the text.
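The workspace code itself is not reproduced here. To give a rough idea of how phrase frequencies can be counted from part-of-speech patterns, the sketch below is a simplified stand-in that uses only the re module: it matches a regular expression against a sequence of part-of-speech tags, in the spirit of the tag-pattern chunking that NLTK provides. The hand-tagged tokens and the noun-phrase pattern are assumptions made for illustration; a real pipeline would tokenize and tag the novel's text first.

```python
import re
from collections import Counter

# Hand-tagged (word, part-of-speech) pairs; assumed here, normally produced by a tagger.
tagged = [
    ("the", "DT"), ("wicked", "JJ"), ("witch", "NN"), ("saw", "VBD"),
    ("the", "DT"), ("little", "JJ"), ("girl", "NN"), ("and", "CC"),
    ("the", "DT"), ("wicked", "JJ"), ("witch", "NN"), ("frowned", "VBD"),
]

# Encode the tag sequence as one string, e.g. "<DT><JJ><NN><VBD>...".
tag_string = "".join(f"<{tag}>" for _, tag in tagged)

# Noun phrase: an optional determiner, any number of adjectives, then a noun.
np_pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")

phrase_counts = Counter()
for match in np_pattern.finditer(tag_string):
    # Counting "<" characters before an offset converts it to a token index.
    start = tag_string.count("<", 0, match.start())
    end = tag_string.count("<", 0, match.end())
    phrase = " ".join(word for word, _ in tagged[start:end])
    phrase_counts[phrase] += 1

print(phrase_counts.most_common())
```

Because every tag is wrapped in angle brackets, the pattern can only match whole tags, so a tag such as NNS would not be mistaken for NN.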
Proceed to the next exercise when you are ready to learn how to perform such parsing yourself!