We talked about character classes as one way to group characters in regex. The other is the capture group. Unlike a character class, a capture group preserves the order of its tokens. It lets you section off part of the pattern, which opens the door to substring extraction and backreferencing.
The following is known as a capture group (note the “aab” here is user-defined, so anything can be used):
|(aab)||Groups tokens together and creates a capture group|
Grouped tokens can be used for extracting a substring or using a back reference. Let’s go over an example.
Pattern pattern = Pattern.compile("(co)+");
The above regex pattern matches the character combination “co” repeated one or more times in a row. It matches “co,” and it also matches “coco,” “cococo,” “cocococo,” and so on.
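As a rough illustration of what the `(co)+` pattern finds and what the group itself holds, here is a minimal sketch. The class name `CaptureGroupDemo`, its helper methods, and the sample strings are made up for this example and are not part of the lesson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are assumptions for illustration.
class CaptureGroupDemo {
    // Returns the first run of repeated "co" found in text, or null if there is none.
    static String firstRun(String text) {
        Matcher matcher = Pattern.compile("(co)+").matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    // Returns what capture group 1 holds after the first match, or null if no match.
    static String lastCaptured(String text) {
        Matcher matcher = Pattern.compile("(co)+").matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRun("hot cocoa"));     // prints "coco"
        System.out.println(lastCaptured("hot cocoa")); // prints "co"
        System.out.println(firstRun("tea"));           // prints "null"
    }
}
```

Note that `group()` returns the whole match, while `group(1)` returns only the text captured by the last repetition of the group.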
Backreferencing is something that won’t be covered here, but note that it’s a useful tool that provides some good flexibility when performing regex text searching.
In any case, character classes and capture groups may look a little similar but just remember:
[] denotes character class,
() denotes capture group.
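To make the difference concrete, the sketch below runs the same two letters through both constructs. The class name `ClassVersusGroupDemo` and the sample inputs are assumptions chosen for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are assumptions for illustration.
class ClassVersusGroupDemo {
    // True when the whole input is one or more characters drawn from the set {c, o}.
    static boolean matchesClass(String input) {
        return Pattern.matches("[co]+", input);
    }

    // True when the whole input is the exact sequence "co" repeated one or more times.
    static boolean matchesGroup(String input) {
        return Pattern.matches("(co)+", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("ooc"));  // prints "true"
        System.out.println(matchesGroup("ooc"));  // prints "false"
        System.out.println(matchesGroup("coco")); // prints "true"
    }
}
```

The character class accepts its characters in any order, while the capture group insists on the sequence exactly as written.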
Now let’s get into some unique character specifications in regex.
\ is what’s called an escape character. Used in this manner, it gives certain characters special functions. As you saw from the character class examples, it was used there to signify predefined character classes.
But how do we search for special operators without triggering their predefined special properties? For instance, how do we search for the . that marks the end of a sentence without regex trying to interpret it as “any character except newline”?
The \ character handles this as well. It both denotes special characters with predefined functionality in regex and allows you to “escape” certain special predefined characters, so that \. matches a literal period.
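Here is a small sketch of the period case described above. One Java-specific detail: inside a Java string literal the backslash itself must be doubled, so the regex `\.` is written `"\\."`. The class name `EscapeDotDemo`, the helper `countMatches`, and the sample sentence are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are assumptions for illustration.
class EscapeDotDemo {
    // Counts how many times the regex finds a match in text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hi. Bye.";
        // Unescaped: . matches every character in the text.
        System.out.println(countMatches(".", text));   // prints "8"
        // Escaped: \. matches only the two literal periods.
        System.out.println(countMatches("\\.", text)); // prints "2"
    }
}
```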
Let’s look at some escaped character examples:
|\n||Newline or linefeed|
Escaped characters insert reserved, special, and Unicode characters into the regex pattern. All escaped characters begin with the \ character, which also allows the pattern to search for special characters without enacting their special properties by treating the special character as plaintext.
Note: Some escape character examples could also be found in the character class examples listed in the previous exercise.
Let’s see if we can take a given list of numbers and convert it into an array format using regex.
Declare a pattern called pattern that searches for newline characters.
Declare a matcher called matcher with the text to match as the following:
Let’s see if we can count how many newlines are in this string.
Use find() and a while loop on the matcher object and store the result in an int. Print the result.
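One way the counting step might look is sketched below. The exercise's input text is not shown here, so the string `"1\n2\n3\n4"` is a stand-in chosen for illustration, and the class name `NewlineCountDemo` and method `countNewlines` are likewise hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the input text is an assumed stand-in for the exercise's string.
class NewlineCountDemo {
    static int countNewlines(String text) {
        // The Java literal "\\n" reaches the regex engine as \n, the newline escape.
        Pattern pattern = Pattern.compile("\\n");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        // Each successful find() moves the matcher on to the next newline.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNewlines("1\n2\n3\n4")); // prints "3"
    }
}
```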
Using the matcher object, replace all newline characters with
Additionally, as per the standard array declaration, we also need to insert an [ at the beginning and an ] at the end of the string.
Thankfully, Java makes this easy. Because the replace method returns the result as a string, we can concatenate a "[" onto the front of the string and an "]" onto the end using Java’s string concatenation.
Print the final string after performing the above concatenation.
Run your code and check out the newly formatted string.
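The replacement and bracket steps could be sketched as follows. Two things here are assumptions rather than part of the exercise text: the separator `", "` used as the replacement, and the sample input `"1\n2\n3\n4"`. The class name `ArrayFormatDemo` and method `toArrayFormat` are also hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the separator and input text are assumptions for illustration.
class ArrayFormatDemo {
    static String toArrayFormat(String text) {
        Matcher matcher = Pattern.compile("\\n").matcher(text);
        // replaceAll returns a new String with every newline swapped for the separator.
        String joined = matcher.replaceAll(", ");
        // Wrap the result in brackets with plain string concatenation.
        return "[" + joined + "]";
    }

    public static void main(String[] args) {
        System.out.println(toArrayFormat("1\n2\n3\n4")); // prints "[1, 2, 3, 4]"
    }
}
```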