
Regular Expression Basic Usage Examples

What?
A quick reminder on basic regular expressions.


Match Any Character — Dot
The dot operator '.' matches any single character in the current character set. For example, to find 'a', followed by any character, followed by 'c', use the expression:
a.c
This expression matches all of the following sequences:
abc
adc
a1c
a&c
The expression does not match:
abb
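The examples in this article can be checked with a few lines of code. The sketch below assumes Python's standard `re` module rather than Oracle Database, so behavior may differ in details. `re.findall` returns every non-overlapping match, and an empty list means no match.

```python
import re

PATTERN = r"a.c"  # 'a', then any single character, then 'c'

texts = ["abc", "adc", "a1c", "a&c", "abb"]
# re.findall returns all non-overlapping matches; [] means no match.
results = {text: re.findall(PATTERN, text) for text in texts}

for text, found in results.items():
    print(f"{text!r}: {found}")

# In Python, '.' skips newlines unless the re.DOTALL flag is set.
print(re.findall(PATTERN, "a\nc"))             # []
print(re.findall(PATTERN, "a\nc", re.DOTALL))  # ['a\nc']
```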


One or More — Plus
The one or more operator '+' matches one or more occurrences of the preceding expression. For example, to find one or more occurrences of the character 'a', you use the regular expression:
a+
This expression matches all of the following:
a
aa
aaa
The expression does not match:
bbb
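A minimal Python sketch of this example, assuming the standard `re` module. `re.search` returns the first match anywhere in the string, or `None`; the last line shows that '+' is greedy and takes the longest run available.

```python
import re

PATTERN = r"a+"  # one or more consecutive 'a' characters

results = {}
for text in ["a", "aa", "aaa", "bbb"]:
    m = re.search(PATTERN, text)  # first match anywhere, or None
    results[text] = m.group() if m else None
    print(f"{text!r}: {results[text]!r}")

# Greedy: each match consumes the whole run of 'a' characters.
print(re.findall(PATTERN, "caaab a"))  # ['aaa', 'a']
```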


Zero or One — Question Mark
The question mark operator '?' matches zero or one (and only one) occurrence of the preceding character or subexpression. You can think of this operator as marking the preceding item as optional in the source text.

For example, to find 'a', optionally followed by 'b', then followed by 'c', use the following regular expression:
ab?c
This expression matches:
abc
ac
The expression does not match:
adc
abbc
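The same strings can be tested in Python, assuming the standard `re` module. Because '?' allows at most one 'b', the string abbc produces no match.

```python
import re

PATTERN = r"ab?c"  # 'a', an optional 'b', then 'c'

texts = ["abc", "ac", "adc", "abbc"]
# re.findall returns all non-overlapping matches; [] means no match.
results = {text: re.findall(PATTERN, text) for text in texts}

for text, found in results.items():
    print(f"{text!r}: {found}")
```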


Zero or More — Star
The zero or more operator '*' matches zero or more occurrences of the preceding character or subexpression. For example, to find 'a', followed by zero or more occurrences of 'b', then followed by 'c', use the regular expression:
ab*c
This expression matches all of the following sequences:
ac
abc
abbc
abbbbc
The expression does not match:
adc
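To contrast '*' with '+', the Python sketch below (assuming the standard `re` module) runs both ab*c and ab+c over the same strings. Only the star version accepts ac, because '+' requires at least one 'b'.

```python
import re

STAR = r"ab*c"  # 'a', zero or more 'b', then 'c'
PLUS = r"ab+c"  # same, but at least one 'b' is required

texts = ["ac", "abc", "abbc", "abbbbc", "adc"]
star = {t: re.findall(STAR, t) for t in texts}
plus = {t: re.findall(PLUS, t) for t in texts}

for t in texts:
    print(f"{t!r}: ab*c -> {star[t]}, ab+c -> {plus[t]}")
```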


Interval — Exact Count
The exact-count interval operator is written as a non-negative integer enclosed in braces, such as {5}. You use this operator to search for an exact number of occurrences of the preceding character or subexpression.

For example, to find where 'a' occurs exactly 5 times, you specify the regular expression:
a{5}
This expression matches:
aaaaa
The expression does not match:
aaaa
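The sketch below assumes Python's standard `re` module. It shows that "exactly 5" applies to the match itself, not to the whole string: `re.fullmatch` requires the entire string to match, while `re.search` still finds five 'a' characters inside a longer run.

```python
import re

PATTERN = r"a{5}"  # exactly five consecutive 'a' characters

def check(text):
    """Return (does the whole string match?, first matching substring or None)."""
    part = re.search(PATTERN, text)
    whole = re.fullmatch(PATTERN, text) is not None
    return whole, (part.group() if part else None)

for text in ["aaaaa", "aaaa", "aaaaaa"]:
    print(f"{text!r}: {check(text)}")
```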


Interval — At Least Count
You use the at-least-count interval operator to search for a specified number of occurrences, or more, of the preceding character or subexpression. For example, to find where 'a' occurs at least 3 times, you use the regular expression:
a{3,}
This expression matches all of the following:
aaa
aaaaa
The expression does not match:
aa
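A short Python check of the at-least-count operator, assuming the standard `re` module. With no upper bound, the quantifier is greedy and consumes the entire run of 'a' characters.

```python
import re

PATTERN = r"a{3,}"  # three or more consecutive 'a' characters

texts = ["aaa", "aaaaa", "aa"]
results = {text: re.findall(PATTERN, text) for text in texts}

for text, found in results.items():
    print(f"{text!r}: {found}")

# The trailing "aa" is too short, so only the long run matches.
print(re.findall(PATTERN, "aaaaaaa b aa"))  # ['aaaaaaa']
```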


Interval — Between Count
You use the between-count interval operator to search for a number of occurrences within a specified range. For example, to find where 'a' occurs at least 3 times and no more than 5 times, you use the following regular expression:
a{3,5}
This expression matches all of the following sequences:
aaa
aaaa
aaaaa
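The expression does not match the string:
aa
An unanchored search does still find a match in aaaaaa (its first five characters). To reject that string, anchor the pattern as ^a{3,5}$.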
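The difference between matching a whole string and matching part of it can be seen in Python, assuming the standard `re` module. `re.fullmatch` (or anchoring with ^ and $) rejects aaaaaa, while `re.search` finds five 'a' characters inside it.

```python
import re

PATTERN = r"a{3,5}"  # between three and five consecutive 'a' characters

def check(text):
    """Return (does the whole string match?, first matching substring or None)."""
    part = re.search(PATTERN, text)
    whole = re.fullmatch(PATTERN, text) is not None
    return whole, (part.group() if part else None)

for text in ["aaa", "aaaa", "aaaaa", "aa", "aaaaaa"]:
    print(f"{text!r}: {check(text)}")

# Anchoring with ^ and $ has the same effect as fullmatch here.
print(re.search(r"^a{3,5}$", "aaaaaa"))  # None
```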


Matching Character List
You use the matching character list to search for an occurrence of any character in a list. For example, to find 'a', 'b', or 'c', use the following regular expression:
[abc]
This expression matches the first character in each of the following strings:
at
bet
cot
The expression does not match:
def
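A Python sketch of the matching character list, assuming the standard `re` module. A bracket expression always matches exactly one character, so adjacent members of the list produce separate matches.

```python
import re

PATTERN = r"[abc]"  # exactly one character: 'a', 'b', or 'c'

texts = ["at", "bet", "cot", "def"]
results = {text: re.findall(PATTERN, text) for text in texts}

for text, found in results.items():
    print(f"{text!r}: {found}")

# One character per match, even when list members are adjacent.
print(re.findall(PATTERN, "cab"))  # ['c', 'a', 'b']
```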


Non-Matching Character List
Use the non-matching character list to specify characters that you do not want to match. Any character that is not in the list counts as a match. For example, to exclude the characters 'a', 'b', and 'c' from your search results, use the following regular expression:
[^abc]
In the following strings, the first characters this expression matches are 'd' and 'g', respectively:
abcdef
ghi
The expression does not match:
abc
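In Python (assuming the standard `re` module), `re.search` reports the first character outside the list, while `re.findall` shows that every remaining character also matches.

```python
import re

PATTERN = r"[^abc]"  # any single character except 'a', 'b', or 'c'

firsts = {}
for text in ["abcdef", "ghi", "abc"]:
    m = re.search(PATTERN, text)  # first match, or None
    firsts[text] = m.group() if m else None
    print(f"{text!r}: first={firsts[text]!r}, all={re.findall(PATTERN, text)}")
```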


Character Range
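Within a character list, a hyphen specifies a range of characters; for example, [a-c] is equivalent to [abc]. The following regular expression combines a range with '^' to exclude any character from 'a' through 'i' from the search result: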
[^a-i]
In the following strings, the first characters this expression matches are 'j' and 'l', respectively:
hijk
lmn
The expression does not match any character in the string:
abcdefghi
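A Python sketch of a negated range, assuming the standard `re` module. Python orders range endpoints by Unicode code point; Oracle may order them according to the session's sort settings, so results for non-ASCII text can differ.

```python
import re

PATTERN = r"[^a-i]"  # any single character outside 'a' through 'i'

firsts = {}
for text in ["hijk", "lmn", "abcdefghi"]:
    m = re.search(PATTERN, text)  # first match, or None
    firsts[text] = m.group() if m else None
    print(f"{text!r}: {firsts[text]!r}")

# A range is shorthand for listing every character in it.
print(re.findall(r"[a-c]", "xaybzc") == re.findall(r"[abc]", "xaybzc"))  # True
```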


Or
Use the Or operator '|' to specify an alternate expression. For example, to match 'a' or 'b', use the following regular expression:
a|b
This expression matches the characters 'a' or 'b' in the following strings:
a
b
The expression does not match the string:
c
It also does not match ab as a single unit; a search over ab finds 'a' and 'b' as two separate one-character matches.
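The behavior of '|' on the string ab can be confirmed in Python, assuming the standard `re` module. Each match is one alternative, so `re.findall` returns two matches and `re.fullmatch` rejects the string as a whole.

```python
import re

PATTERN = r"a|b"  # either 'a' or 'b'

results = {text: re.findall(PATTERN, text) for text in ["a", "b", "ab", "c"]}

for text, found in results.items():
    print(f"{text!r}: {found}")

# Each match is a single alternative, so "ab" is never matched as one unit.
print(re.fullmatch(PATTERN, "ab"))  # None
```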


Subexpression
The subexpression operator, a pair of parentheses, groups characters that you want to find as a unit or that form part of a more complex expression. For example, to find the optional string 'abc' followed by 'def', use the following regular expression:
(abc)?def
This expression matches 'abcdef' and 'def', respectively, in the following strings:
abcdefghi
defghi
The expression does not match the string:
ghi
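A Python sketch of the subexpression example, assuming the standard `re` module. It uses `re.search` rather than `re.findall` because `findall` returns the captured group instead of the whole match when a pattern contains parentheses.

```python
import re

PATTERN = r"(abc)?def"  # optional group 'abc', then 'def'

for text in ["abcdefghi", "defghi", "ghi"]:
    m = re.search(PATTERN, text)
    if m:
        # group(0) is the whole match; group(1) is what the parentheses captured.
        print(f"{text!r}: match={m.group(0)!r}, group 1={m.group(1)!r}")
    else:
        print(f"{text!r}: no match")
```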


Backreference
The backreference lets you search for a repeated expression. You specify a backreference with '\n', where n is an integer from 1 to 9 indicating the nth preceding subexpression in your regular expression.

For example, to find a repeated occurrence of either string 'abc' or 'def', use the following regular expression:
(abc|def)\1
This expression matches the following strings:
abcabc
defdef
The expression does not match the following strings:
abcdef
abc
Subexpressions are numbered from left to right in the order in which their opening parentheses appear.

The backreference lets you search for a repeated string without knowing the actual string ahead of time. For example, the regular expression:
^(.*)\1$
matches a line consisting of two adjacent appearances of the same string.
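The backreference examples can be reproduced in Python, assuming the standard `re` module, which also writes backreferences as \1 through \9. The strings byebye and hello are illustrative inputs of our own for the ^(.*)\1$ pattern.

```python
import re

REPEAT = r"(abc|def)\1"  # group 1, then the same text again via \1

found = {}
for text in ["abcabc", "defdef", "abcdef", "abc"]:
    m = re.search(REPEAT, text)
    found[text] = m.group() if m else None
    print(f"{text!r}: {found[text]!r}")

# Groups are numbered by the order of their opening parentheses.
print(re.fullmatch(r"(a)(b)\2\1", "abba") is not None)  # True

# ^(.*)\1$ accepts a string made of two copies of the same text.
m = re.search(r"^(.*)\1$", "byebye")
print(m.group(1))                       # bye
print(re.search(r"^(.*)\1$", "hello"))  # None
```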

Escape Character
Use the escape character '\' to search for a character that would otherwise be treated as a metacharacter. For example, to search for the '+' character, use the following regular expression:
\+
This expression matches the plus character '+' in the following string:
abc+def
The expression does not match any characters in the string:
abcdef
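A Python sketch of escaping, assuming the standard `re` module. Besides writing the backslash by hand, Python provides `re.escape` to escape metacharacters in literal text; an unescaped leading '+' has nothing to repeat and fails to compile.

```python
import re

PATTERN = r"\+"  # a literal plus sign, not the one-or-more operator

for text in ["abc+def", "abcdef"]:
    print(f"{text!r}: {re.findall(PATTERN, text)}")

# re.escape backslash-escapes metacharacters in literal text.
literal = re.escape("a+b")
print(re.findall(literal, "a+b aab"))  # ['a+b']

# Unescaped, '+' has nothing to repeat, so the pattern does not compile.
try:
    re.compile("+")
except re.error as exc:
    print("re.error:", exc)
```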


Beginning of Line Anchor
Use the beginning of line anchor ^ to search for an expression that occurs only at the beginning of a line. For example, to find an occurrence of the string def at the beginning of a line, use the expression:
^def
This expression matches def in the string:
defghi
The expression does not match def in the following string:
abcdef
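A Python sketch of the start anchor, assuming the standard `re` module. In Python, ^ matches only at the start of the whole string unless the `re.MULTILINE` flag is given, in which case it also matches after each newline.

```python
import re

PATTERN = r"^def"  # 'def' only at the start

for text in ["defghi", "abcdef"]:
    print(f"{text!r}: {re.findall(PATTERN, text)}")

# Only with re.MULTILINE does ^ match at the start of each line.
text = "abc\ndefghi"
print(re.findall(PATTERN, text))                # []
print(re.findall(PATTERN, text, re.MULTILINE))  # ['def']
```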


End of Line Anchor
The end of line anchor metacharacter '$' lets you search for an expression that occurs only at the end of a line. For example, to find an occurrence of def that occurs at the end of a line, use the following expression:
def$
This expression matches def in the string:
abcdef
The expression does not match def in the following string:
defghi
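The end anchor behaves similarly in Python, assuming the standard `re` module. Without `re.MULTILINE`, $ matches only at the end of the string (or just before a final newline); with it, $ also matches before each newline.

```python
import re

PATTERN = r"def$"  # 'def' only at the end

for text in ["abcdef", "defghi"]:
    print(f"{text!r}: {re.findall(PATTERN, text)}")

# Only with re.MULTILINE does $ match at the end of each line.
text = "abcdef\nghi"
print(re.findall(PATTERN, text))                # []
print(re.findall(PATTERN, text, re.MULTILINE))  # ['def']
```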


POSIX Character Class
The POSIX character class operator lets you search for any character that belongs to a specific POSIX character class. You can use it to search for characters with particular properties, such as uppercase letters, or for special characters such as digits or punctuation. Oracle Database supports the full set of POSIX character classes.

To use this operator, specify the expression using the syntax [:class:] where class is the name of the POSIX character class to search for. For example, to search for one or more consecutive uppercase characters, use the following regular expression:
[[:upper:]]+
This expression matches 'DEF' in the string:
abcDEFghi
The expression does not return a match for the following string:
abcdefghi
Note that the character class must occur within a character list, so the character class is always nested within the brackets for the character list in the regular expression.


POSIX Collating Sequence
The POSIX collating sequence element operator [. .] lets you use a collating sequence in your regular expression. The element you specify must be a defined collating sequence in the current locale.

This operator lets you use a multicharacter collating sequence in your regular expression where only one character would otherwise be allowed. For example, you can use this operator to ensure that the collating sequence 'ch', when defined in a locale such as Spanish, is treated as one character in operations that depend on the ordering of characters.

To use the collating sequence operator, specify [.element.] where element is the collating sequence you want to find. You can use any collating sequence that is defined in the current locale including single-character elements as well as multicharacter elements.

For example, to find the collating sequence 'ch', use the following regular expression:
[[.ch.]]
This expression matches the sequence 'ch' in the following string:
chabc
The expression does not match the following string:
cdefg
You can use the collating sequence operator in any regular expression where collation is needed. For example, to specify the range from 'a' to 'ch', you can use the following expression:
[a-[.ch.]]


POSIX Character Equivalence Class
Use the POSIX character equivalence class operator to search for characters that are equivalent in the current locale. For example, you can use it to find both the Spanish character 'ñ' and 'n'.

To use this operator, specify [=character=] to find all characters that are members of the same equivalence class as the specified character.

For example, the following regular expression could be used to search for characters equivalent to 'n' in a Spanish locale:
[[=n=]]
This expression matches both 'N' and 'ñ' in the following string:
El Niño


Source(s): Using Regular Expressions With Oracle Database