Key Concepts

Review core concepts you need to learn to master this subject

Regular Expressions

Regular expressions are sequence of characters defining a pattern of text that needs to be found. They can be used for parsing the text files for specific pattern, verifying test results, and finding keywords in emails or webpages.

Literals in Regular Expressions

In Regular expression, the literals are the simplest characters that will match the exact text of the literals. For example, the regex monkey will completely match the text monkey but will also match monkey in text The monkeys like to eat bananas.

Alternation in Regular Expressions

Alternation indicated by the pipe symbol |, allows for the matching of either of two subexpressions. For example, the regex baboons|gorillas will match the text baboons as well as the text gorillas.

Character Sets in Regular Expressions

Regular expression character sets denoted by a pair of brackets [] will match any of the characters included within the brackets. For example, the regular expression con[sc]en[sc]us will match any of the spellings consensus, concensus, consencus, and concencus.

Wildcards in Regular expressions

In Regular expression, wildcards are denoted with the period . and it can match any single character (letter, number, symbol or whitespace) in a piece of text. For example, the regular expression ......... will match the text orangutan, marsupial, or any other 9-character text.

Regular Expression Ranges

Regular expression ranges are used to specify a range of characters that can be matched. Common regular expression ranges include: [A-Z]. : match any uppercase letter [a-z]. : match any lowercase letter [0-9]. : match any digit [A-Za-z] : match any uppercase or lowercase letter.

Shorthand Character Classes in Regular Expressions

Shorthand character classes simplify writing regular expressions. For example, \w represents the regex range [A-Za-z0-9_], \d represents [0-9], \W represents [^A-Za-z0-9_] matching any character not included by \w, \D represents [^0-9] matching any character not included by \d.

Grouping in Regular Expressions

In Regular expressions, grouping is accomplished by open ( and close parenthesis ). Thus the regular expression I love (baboons|gorillas) will match the text I love baboons as well as I love gorillas, as the grouping limits the reach of the | to the text within the parentheses.

Fixed Quantifiers in Regular Expressions

In Regular expressions, fixed quantifiers are denoted by curly braces {}. It contains either the exact quantity or the quantity range of characters to be matched. For example, the regular expression roa{3}r will match the text roaaar, while the regular expression roa{3,6}r will match roaaar, roaaaar, roaaaaar, or roaaaaaar.

Optional Quantifiers in Regular Expressions

In Regular expressions, optional quantifiers are denoted by a question mark ?. It indicates that a character can appear either 0 or 1 time. For example, the regular expression humou?r will match the text humour as well as the text humor.

Kleene Star & Kleene Plus in Regular Expressions

In Regular expressions, the Kleene star(*) indicates that the preceding character can occur 0 or more times. For example, meo*w will match mew, meow, meooow, and meoooooooooooow. The Kleene plus(+) indicates that the preceding character can occur 1 or more times. For example, meo+w will match meow, meooow, and meoooooooooooow, but not match mew.

Anchors in Regular Expressions

Anchors (hat ^ and dollar sign $) are used in regular expressions to match text at the start and end of a string, respectively. For example, the regex ^Monkeys: my mortal enemy$ will completely match the text Monkeys: my mortal enemy but not match Spider Monkeys: my mortal enemy or Monkeys: my mortal enemy in the wild. The ^ ensures that the matched text begins with Monkeys, and the $ ensures the matched text ends with enemy.

Arrow Chevron Left Icon
Introduction to Regular Expressions
Lesson 1 of 2
Arrow Chevron Right Icon
  1. 1
    When registering an account for a new social media app or completing an order for a gift online, nearly every piece of information you enter into a web form is validated. Did you enter a properly f…
  2. 2
    The simplest text we can match with regular expressions are literals. This is where our regular expression contains the exact text that we want to match. The regex a, for example, will match …
  3. 3
    Do you love baboons and gorillas? You can find either of them with the same regular expression using alternation! Alternation, performed in regular expressions with the pipe symbol, |, allows…
  4. 4
    Spelling tests may seem like a distant memory from grade school, but we ultimately take them every day while typing. It’s easy to make mistakes on commonly misspelled words like consensus, and on t…
  5. 5
    Sometimes we don’t care exactly WHAT characters are in a text, just that there are SOME characters. Enter the wildcard .! Wildcards will match any single character (letter, number, symbol or …
  6. 6
    Character sets are great, but their true power isn’t realized without ranges. Ranges allow us to specify a range of characters in which we can make a match without having to type out each ind…
  7. 7
    While character ranges are extremely useful, they can be cumbersome to write out every single time you want to match common ranges such as those that designate alphabetical characters or digits. To…
  8. 8
    Remember when we were in love with baboons and gorillas a few exercises ago? We were able to match either baboons or gorillas using the regex baboons|gorillas, taking advantage of the | symbol. Bu…
  9. 9
    Here’s where things start to get really interesting. So far we have only matched text on a character by character basis. But instead of writing the regex \w\w\w\w\w\w\s\w\w\w\w\w\w, which would mat…
  10. 10
    You are working on a research project that summarizes the findings of primate behavioral scientists from around the world. Of particular interest to you are the scientists’ observations of humor in…
  11. 11
    In 1951, mathematician Stephen Cole Kleene developed a system to match patterns in written language with mathematical notation. This notation is now known as regular expressions! In his honor, the…
  12. 12
    When writing regular expressions, it’s useful to make the expression as specific as possible in order to ensure that we do not match unintended text. To aid in this mission of specificity, we can u…
  13. 13
    Do you feel those regular expression superpowers coursing through your body? Do you just want to scream ah+ really loud? Awesome! You are now ready to take these skills and use them out in the wild…
  1. 1
    Discovering new code words in declassified CIA documents may seem like a mission for a foreign intelligence service, and detecting gender biases in the Harry Potter novels a task for a literature p…
  2. 2
    Before you dive into more complex syntax parsing, you’ll begin with basic regular expressions in Python using the re module as a regex refresher. The first method you will explore is _**.compi…
  3. 3
    You can make your regular expression matches even more dynamic with the help of the .search() method. Unlike .match() which will only find matches at the start of a string, .search() will loo…
  4. 4
    While it is useful to match and search for patterns of individual characters in a text, you can often find more meaning by analyzing text on a word-by-word basis, focusing on the part of speech of …
  5. 5
    You have made it to the juicy stuff! Given your part-of-speech tagged text, you can now use regular expressions to find patterns in sentence structure that give insight into the meaning of a text. …
  6. 6
    While you are able to chunk any sequence of parts of speech that you like, there are certain types of chunking that are linguistically helpful for determining meaning and bias in a piece of text. O…
  7. 7
    Another popular type of chunking is VP-chunking, or verb phrase chunking. A verb phrase is a phrase that contains a verb and its complements, objects, or modifiers. Verb phrases can take…
  8. 8
    Another option you have to find chunks in your text is chunk filtering. Chunk filtering lets you define what parts of speech you do not want in a chunk and remove them. A popular method …
  9. 9
    And there you go! Now you have the toolkit to dig into any piece of text data and perform natural language parsing with regular expressions. What insights will you gain, or what bias may you uncove…

What you'll create

Portfolio projects that showcase your new skills

Pro Logo

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory

Pro Logo