Natural Language Parsing with Regular Expressions
Searching and Finding

You can make your regular expression matches even more flexible with the .search() method. Unlike .match(), which only finds matches at the start of a string, .search() scans the entire text from left to right and returns a match object for the first match of the given regular expression. If no match is found, .search() returns None. For example, to search for a sequence of 8 word characters in the string "Are you a Munchkin?":

import re
result = re.search(r"\w{8}", "Are you a Munchkin?")

Using .search() on the string above will find a match of "Munchkin", while using .match() on the same string would return None!
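The short sketch below runs both calls side by side so the difference is visible. It also shows the usual guard before reading a match: a match object is truthy and None is falsy, so checking the result first avoids calling .group() on None. The variable names are illustrative only.

```python
import re

sentence = "Are you a Munchkin?"

# re.search() scans left to right and returns the first match anywhere in the string
search_result = re.search(r"\w{8}", sentence)
# re.match() only tries position 0, where "Are" has just 3 word characters
match_result = re.match(r"\w{8}", sentence)

if search_result:  # a match object is truthy; None is falsy
    print(search_result.group())  # Munchkin
    print(search_result.span())   # (10, 18)
print(match_result)               # None
```

.span() returns the start and end indices of the match, with the end index exclusive.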

So far you have used methods that return only one piece of matching text. What if you want to find all the occurrences of a word or keyword in a piece of text to determine a frequency count? Enter the .findall() method!

Given a regular expression as its first argument and a string as its second argument, .findall() returns a list of all non-overlapping matches of the regular expression in the string. Consider the following piece of text:

text = "Everything is green here, while in the country of the Munchkins blue was the favorite color. But the people do not seem to be as friendly as the Munchkins, and I'm afraid we shall be unable to find a place to pass the night."

To find all non-overlapping sequences of 8 word characters in the text, you can do the following:

list_of_matches = re.findall(r"\w{8}", text)

.findall() will thus return the list ['Everythi', 'Munchkin', 'favorite', 'friendly', 'Munchkin'].
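Notice that \w{8} matches any 8 word characters, so longer words are cut off ("Everything" becomes "Everythi") and the leftover characters are too short to match again. The sketch below reproduces the result and, as an extra illustration not in the lesson, adds \b word boundaries so that only words of exactly 8 characters are returned. The last call shows what "non-overlapping" means on a tiny string.

```python
import re

text = ("Everything is green here, while in the country of the Munchkins blue was "
        "the favorite color. But the people do not seem to be as friendly as the "
        "Munchkins, and I'm afraid we shall be unable to find a place to pass the night.")

# \w{8} grabs any 8 word characters, so longer words are truncated
truncated = re.findall(r"\w{8}", text)
print(truncated)    # ['Everythi', 'Munchkin', 'favorite', 'friendly', 'Munchkin']

# \b on both sides keeps only whole words that are exactly 8 characters long
exact_eight = re.findall(r"\b\w{8}\b", text)
print(exact_eight)  # ['favorite', 'friendly']

# Matches never share characters: after "aa" is consumed at indices 0-1,
# scanning resumes at index 2, so "aaaa" yields two matches, not three
pairs = re.findall(r"aa", "aaaa")
print(pairs)        # ['aa', 'aa']
```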



The entire text of L. Frank Baum’s The Wonderful Wizard of Oz has been stored in oz_text. Use .search() to find the first occurrence of "wizard" in oz_text, store the result in found_wizard, and print it.


Use .findall() to find all occurrences of "lion" in oz_text, store the resulting list in all_lions, and print all_lions.


Save the length of all_lions to number_lions and print it. Given the number of occurrences, is the word “lion” important to the text?
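Here is a minimal sketch of the three steps. In the exercise, oz_text already holds the full novel; since that text is not reproduced here, the sketch assumes a short hypothetical placeholder string instead, so the printed values differ from what the real exercise produces.

```python
import re

# Hypothetical placeholder: in the exercise, oz_text holds the whole novel
oz_text = ("A lion sat by the road. The traveler asked the wizard for help, "
           "and the lion asked the Wizard for courage.")

# Step 1: first occurrence of "wizard" (case-sensitive, so "Wizard" is not matched)
found_wizard = re.search("wizard", oz_text)
print(found_wizard)

# Step 2: every non-overlapping occurrence of "lion"
all_lions = re.findall("lion", oz_text)
print(all_lions)

# Step 3: the number of occurrences is simply the length of the list
number_lions = len(all_lions)
print(number_lions)
```

Two things to keep in mind with the real text: these patterns are case-sensitive (pass flags=re.IGNORECASE to also catch "Wizard" or "Lion"), and "lion" also matches inside longer words such as "lions", so a pattern like r"\blion\b" is needed if you want whole words only.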

It’s important to note that how much a word’s frequency tells you depends on the total number of words in the text: the same count means far more in a short passage than in an entire novel.
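One way to account for text length is to divide a word’s count by the total number of words, giving a relative frequency. The sketch below is an illustration, not part of the lesson: the helper relative_frequency and both sample texts are hypothetical, and splitting words with r"\w+" is a simplifying assumption (it splits contractions such as "I'm" into two tokens).

```python
import re

def relative_frequency(word, text):
    """Hypothetical helper: share of the text's words equal to `word`, ignoring case."""
    words = re.findall(r"\w+", text.lower())  # crude tokenization (assumption)
    if not words:
        return 0.0
    return words.count(word.lower()) / len(words)

# Hypothetical texts: the same raw count of "lion" (2) in texts of very different lengths
short_text = "The lion roared. The lion slept."
long_text = short_text + " The road went on." * 50

print(relative_frequency("lion", short_text))
print(relative_frequency("lion", long_text))
```

Both texts contain "lion" twice, but it makes up a third of the 6-word text and under 1% of the 206-word one.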
