Informatica Reference

REG_EXTRACT

Posted on: 12/14/2011

Extracts subpatterns of a regular expression within an input value. For example, from a regular expression pattern for a full name, you can extract the first name or last name.
Note: Use the REG_REPLACE function to replace a character pattern in a string with another character pattern.

Syntax
REG_EXTRACT( subject, ‘pattern’, subPatternNum )

Using perl Compatible Regular Expression Syntax
You must use perl compatible regular expression syntax with REG_EXTRACT, REG_MATCH and REG_REPLACE functions.

The following table provides perl compatible regular expression syntax guidelines:

Syntax
Description
(a period)
Matches any one character.
[a-z]
Measurement is the amount determined by observation.Matches one instance of a character in lower case. For example, [a-z] matches ab. Use [A-Z] to match characters in upper case.
\d
Matches one instance of any digit from 0-9.
\s
Matches a whitespace character.
\w
Matches one alphanumeric character, including underscore (_)
()
Groups an expression.For example, the parentheses in (\d-\d-\d\d) groups the expression \d\d-\d\d,which finds any two numbers followed by a hyphen and any two numbers, as in 12-34.
{}
Matches the number of characters. For example, \d{3} matches any three numbers, such as 650 or 510. Or, [a-z]{2} matches any two letters, such as CA or NY.
?
Matches the preceding character or group of characters zero or one time. For example, \d{3}(-{d{4})? matches any three numbers, which can be followed by a hyphen and any four numbers.
* (an asterisk)
Matches zero or more instances of the values that follow the asterisk. For example, *0 is any value that precedes a 0.
+
Matches one or more instances of the values that follow the plus sign. For example, \w+ is any value that follows an alphanumeric character.

For example, the following regular expression finds 5-digit U.S.A. zip codes, such as 93930, and 9-digit zip codes, such as 93930-5407:

\d{5}(-\d{4})?
\d{5} refers to any five numbers, such as 93930. The parentheses surrounding -\d{4} group this segment of the expression. The hyphen represents the hyphen of a 9-digit zip code, as in 93930-5407. \d{4} refers to any four numbers, such as 5407. The question mark states that the hyphen and last four digits are optional or can appear one time.

Converting COBOL Syntax to perl Compatible Regular Expression Syntax

If you are familiar with COBOL syntax, you can use the following information to write perl compatible regular expressions.

The following table shows examples of COBOL syntax and their perl equivalents:

Cobol Syntax
Cobol Syntax
Description
9
\d
Matches one instance of any digit from 0-9.
9999
\d\d\d\d
or
\d{4}
Matches any four digits from 0-9, as in 1234 or 5936.
x
[a-z] Matches one instance of a letter.
9xx9
\d[a-z][a-z]\d
Matches any number followed by two letters and another number, as in 1ab2.

Converting SQL Syntax to perl Compatible Regular Expression Syntax

If you are familiar with SQL syntax, you can use the following information to write perl compatible regular expressions.

The following table shows examples of SQL syntax and their perl equivalents:

SQL Syntax
perl Syntax
Description
%
. *
Matches any string.
A%
A.*
Matches the letter “A” followed by any string, as in Area.
_
. (a period) Matches any one character.
A_
A.
Matches “A” followed by any one character, such as AZ.

Return Value
Returns the value of the nth subpattern that is part of the input value. The nth subpattern is based on the value you specify for subPatternNum.
NULL if the input is a null value or if the pattern is null.

Example
You might use REG_EXTRACT in an expression to extract middle names from a regular expression that matches first name, middle name, and last name. For example, the following expression returns the middle name of a regular expression:

REG_EXTRACT( Employee_Name, ‘(\w+)\s+(\w+)\s+(\w+)’,2)

Employee_Name                      Return Value
Stephen Graham Smith                 Graham
Juan Carlos Fernando                   Carlos

Advertisements
%d bloggers like this: