Regular Expressions
Regular expressions are a language used for pattern-matching text content, and they are implemented in Java through the `Pattern` and `Matcher` classes. The `Pattern` class represents a compiled regular expression, while the `Matcher` class uses a `Pattern` to perform operations on text. Multiple `Matcher` instances can use the same `Pattern` instance. Both classes are part of the `java.util.regex` package.
Details on the syntax of regular expressions can be found in the Java API documentation for the `Pattern` class.
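As a rough illustration of one compiled `Pattern` serving several `Matcher` instances, the sketch below compiles a pattern once and creates a fresh `Matcher` for each input. The class name `SharedPatternSketch`, the helper `firstNumber`, and the digit pattern are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedPatternSketch {
    // Compiled once and reused; the pattern itself is only an illustration.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the first run of digits in the text, or null if there is none.
    static String firstNumber(String text) {
        Matcher m = DIGITS.matcher(text); // each call gets its own Matcher
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Order 66 shipped"));  // 66
        System.out.println(firstNumber("Room 101, floor 3")); // 101
        System.out.println(firstNumber("no digits here"));    // null
    }
}
```

Compiling once avoids repeating the compilation work for every input, while each `Matcher` keeps its own search state.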
Using the Pattern Class
An instance of the `Pattern` class is used to hold a compiled version of a regular expression pattern. The syntax for creating a pattern instance is:
Pattern p = Pattern.compile(re, flags)
Where `re` is a regular expression pattern and `flags` is an optional `int` bit mask specifying the flags for the pattern. Multiple flags can be combined with the bitwise OR operator (`|`).
The `flags` parameter can include the following:
- `Pattern.CASE_INSENSITIVE`: Enables case-insensitive matching.
- `Pattern.MULTILINE`: Enables multiline mode, where `^` and `$` match the start and end of a line rather than the start and end of the whole text.
- `Pattern.DOTALL`: Allows `.` to match any character, including a line terminator.
- `Pattern.UNICODE_CASE`: Allows `CASE_INSENSITIVE` to follow the Unicode standard, rather than restricting it to the US-ASCII character set.
- `Pattern.CANON_EQ`: Forces matching to take canonical equivalence into account.
- `Pattern.UNIX_LINES`: Forces `\n` to be the only line terminator recognized by `.`, `^`, and `$`.
- `Pattern.LITERAL`: Forces all metacharacters in the pattern to be interpreted as literal characters.
- `Pattern.UNICODE_CHARACTER_CLASS`: Enables the Unicode version of character classes.
- `Pattern.COMMENTS`: Allows whitespace and comments in the pattern.
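To make the effect of a few of these flags concrete, here is a minimal sketch that counts matches of the same pattern under different flag combinations. The class `FlagsSketch`, the helper `countMatches`, and the sample text are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsSketch {
    // Hypothetical helper: counts how many times regex matches in input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Sea\nsea\nSEA";

        // No flags: ^ and $ anchor to the whole text, which is not just "sea".
        System.out.println(countMatches("^sea$", 0, text)); // 0

        // MULTILINE: ^ and $ anchor to each line, so the middle line matches.
        System.out.println(countMatches("^sea$", Pattern.MULTILINE, text)); // 1

        // Flags are combined with bitwise OR; now all three lines match.
        int flags = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;
        System.out.println(countMatches("^sea$", flags, text)); // 3

        // DOTALL: "." also matches the line terminator between "a" and "b".
        System.out.println(countMatches("a.b", 0, "a\nb"));              // 0
        System.out.println(countMatches("a.b", Pattern.DOTALL, "a\nb")); // 1
    }
}
```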
The `Pattern` class includes the following methods:
- `.compile(pattern, flags)`: Static method that returns a `Pattern` instance based on the given `pattern` and optional `flags`.
- `.pattern()`: Returns the string pattern with which the instance was compiled.
- `.flags()`: Returns the flags bit mask with which the instance was compiled.
- `.matcher(input)`: Returns a `Matcher` instance that applies the `Pattern` instance against the supplied `input` text.
- `.matches(pattern, input)`: Static method that compiles the given `pattern` and returns `true` if the entire `input` text matches it.
- `.split(input, limit)`: Returns an array that splits the `input` around the matches found by the compiled pattern. The optional `int` `limit`, when positive, specifies the maximum number of strings to return in the array.
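The methods above can be sketched together as follows. The comma-splitting pattern, the class `PatternMethodsSketch`, and the helper `splitFields` are illustrative assumptions, not part of the API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsSketch {
    // A comma with optional whitespace on either side.
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Hypothetical helper: splits a line into fields; a limit of 0 behaves like split(input).
    static String[] splitFields(String line, int limit) {
        return COMMA.split(line, limit);
    }

    public static void main(String[] args) {
        System.out.println(COMMA.pattern()); // \s*,\s*
        System.out.println(COMMA.flags());   // 0 (no flags were supplied)

        // Without a limit, every separator is used.
        System.out.println(Arrays.toString(COMMA.split("a, b ,c,d"))); // [a, b, c, d]

        // With limit 2, at most two strings are returned; the rest stays unsplit.
        System.out.println(Arrays.toString(splitFields("a, b ,c,d", 2))); // [a, b ,c,d]

        // The static matches() requires the whole input to match.
        System.out.println(Pattern.matches("\\d+", "12345"));  // true
        System.out.println(Pattern.matches("\\d+", "123abc")); // false
    }
}
```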
Using the Matcher Class
An instance of the `Matcher` class is used to perform operations against input text using a compiled `Pattern` instance. A `Matcher` instance is created from a `Pattern` instance using the following syntax:
Matcher m = pattern.matcher(input)
Where `pattern` is a compiled `Pattern` instance and `input` is the input text to be matched against it. The `Matcher` can be used to search the whole input, or a region of the input, finding each match, sub-matches, and their locations in the input text.
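As a small sketch of finding each match, its sub-matches, and their locations, the example below walks through `key=value` pairs in a string. The pattern, the class `GroupOffsetsSketch`, and the helper `describe` are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsetsSketch {
    // Group 1 captures the key, group 2 captures the value.
    static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Hypothetical helper: describes every key=value pair and where it was found.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // start() is the index of the first matched character,
            // end() is the index just past the last one.
            out.add(m.group(1) + " -> " + m.group(2)
                    + " at [" + m.start() + ", " + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("x=1, yy=22")) {
            System.out.println(line);
        }
        // x -> 1 at [0, 3)
        // yy -> 22 at [5, 10)
    }
}
```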
The `Matcher` class includes the following methods:
- `.end(group)`: Returns the offset after the last character matched. If the optional `int` `group` is included, returns the offset after the last character captured by the given subgroup during the last match operation. (Subgroups are defined by enclosing parentheses `(...)`.)
- `.find(start)`: Attempts to find the next match in the input. If the optional `int` `start` is included, resets the `Matcher` instance and finds the next match starting at the specified index in the input.
- `.group(group)`: Returns the section of input last matched. If the optional `int` `group` is specified, returns the section captured by the numbered subgroup during the last match operation. (Subgroups are defined by enclosing parentheses `(...)`.)
- `.hitEnd()`: Returns `true` if the last match operation hit the end of input.
- `.lookingAt()`: Attempts to match the pattern beginning at the start of the region, without requiring the whole region to match. Returns `true` if a match is found.
- `.matches()`: Attempts to match the entire region against the pattern. Returns `true` only if the whole region matches.
- `.pattern()`: Returns the `Pattern` instance used by this `Matcher` instance.
- `.region(start, end)`: Sets the region of input used by this `Matcher` instance.
- `.regionEnd()`: Returns the end of the region for this `Matcher` instance.
- `.regionStart()`: Returns the start of the region for this `Matcher` instance.
- `.replaceAll(replacement)`: Replaces every match in the input with the given `replacement` string. Returns the modified string.
- `.replaceFirst(replacement)`: Replaces the first match in the input with the given `replacement` string. Returns the modified string.
- `.reset(input)`: Resets this `Matcher` instance. If the optional `input` is specified, resets with the new `input` text.
- `.start(group)`: Returns the offset of the first character matched. If the optional `int` `group` is included, returns the offset of the first character captured by the given subgroup during the last match operation. (Subgroups are defined by enclosing parentheses `(...)`.)
- `.usePattern(pattern)`: Sets this `Matcher` instance to use the new `Pattern` instance `pattern`.
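Several of the methods listed above can be tried out with a sketch like the one below. The class `MatcherMethodsSketch`, the helper `firstInRegion`, and the sample strings are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsSketch {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: first digit run inside [start, end) of the input, or null.
    static String firstInRegion(String input, int start, int end) {
        Matcher m = DIGITS.matcher(input);
        m.region(start, end); // restrict the search to part of the input
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "2024 was year 12";
        Matcher m = DIGITS.matcher(input);

        System.out.println(m.matches());   // false: the whole input is not digits
        System.out.println(m.lookingAt()); // true: "2024" matches at the start

        System.out.println(m.replaceAll("#"));   // # was year #
        System.out.println(m.replaceFirst("#")); // # was year 12

        // Searching only the part after "2024 " skips the first number.
        System.out.println(firstInRegion(input, 5, input.length())); // 12

        // reset(input) points the same Matcher at new text.
        m.reset("abc 7");
        System.out.println(m.find() ? m.group() : null); // 7

        // usePattern() swaps the pattern but keeps the current position,
        // so reset() is called before searching from the start again.
        m.usePattern(Pattern.compile("[a-z]+"));
        m.reset();
        System.out.println(m.find() ? m.group() : null); // abc
    }
}
```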
Example
The following example finds all the words that start with a lowercase “s”, have an “e” as the second or third character, and continue with at least one more lowercase letter.
    import java.util.regex.*;

    public class Example {
      public static void main(String[] args) {
        // "s", an optional character, "e", then one or more lowercase letters
        Pattern p = Pattern.compile("s.?e[a-z]+");
        Matcher m = p.matcher("Susie sells sea shells by the sea shore.");
        // Each call to find() advances to the next match, if any
        while (m.find()) {
          System.out.println(m.group());
        }
      }
    }
This produces the following output:
    sells
    sea
    shells
    sea