Regular Expressions

Function: Regular Expressions

There are four regex functions:

regex_match: Compares a string with a regular expression, returning true if the entire string matches, otherwise false.
regex_search: Searches a string for a regex pattern and returns the position of the found string or -1 if not found.
regex_extract: Returns an array with one entry per match. If the expression contains groups, each entry is a delimited list: the full match comes first, followed by each group in order. The default delimiter is a line-feed character, but another can be specified.
regex_replace: Replaces a match with specified content. The replace string may contain \1, \2, etc., to refer to groups in the match. If the pattern is not found, the original subject is returned.

Prototypes:

bool = regex_match ( <regex string>, <subject> );
int = regex_search ( <regex string>, <subject> );
array string = regex_extract ( <regex string>, <subject>{, <group delimiter>} );
string = regex_replace ( <regex string>, <subject>, <replace> );

Parameters:

Parameter Name	Type	Description
regex string	string	A string containing a regular expression pattern.
subject	string	The text to search in.
replace	string	Only for `regex_replace`: The replacement string, which can contain references to the groups (\1, \2, etc.).
group delimiter	string	When using `regex_extract`, the return value is an array with matches. If the pattern contains groups, the groups are delimited with this delimiter. This parameter is optional, and the default value is "\n" (line-feed character).
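
The effect of the optional group delimiter can be sketched as follows. This is a minimal illustration based on the description above: the function name demo_delimiter and the sample subject are made up for the example, and the arrays are assumed to be 1-based, as in the loop used in the regex_extract example on this page.

```
// Hypothetical demo function, not part of ACF.
function demo_delimiter()
    string subject = "width=120";
    string regex = "([a-z]+)=([0-9]+)";

    // Default delimiter: the full match and the groups are separated by line feeds.
    array string byLine = regex_extract(regex, subject);
    // byLine[1] holds three lines: width=120, width, 120

    // Custom delimiter: the same parts are separated by "|".
    array string byPipe = regex_extract(regex, subject, "|");
    // byPipe[1] holds: width=120|width|120

    return byPipe[1];
end
```

Pick a delimiter that cannot occur in the matched text, otherwise the parts cannot be separated reliably afterwards.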

Return values:

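regex_match: Returns true if the entire subject matches the pattern, otherwise false.
regex_search: Returns the position of the match, or -1 if the pattern is not found.
regex_extract: Returns an array with one entry per match.
regex_replace: Returns the modified string, or the original subject if the pattern is not found.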
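
The difference between regex_match and regex_search can be sketched with two small wrappers. The function names is_all_digits and find_digits are hypothetical and exist only for this illustration. The results in the comments follow from the descriptions above. The exact position returned for a successful search is not shown, since it depends on how ACF counts positions.

```
// Hypothetical helper: true only when the whole subject consists of digits.
function is_all_digits(string subject)
    return regex_match("[0-9]+", subject);
end

// Hypothetical helper: position of the first run of digits, or -1 if there is none.
function find_digits(string subject)
    return regex_search("[0-9]+", subject);
end

// is_all_digits("12345")         => true
// is_all_digits("Order 12345")   => false, the match does not cover the entire string
// find_digits("Order 12345")     => the position of "12345" (not -1)
// find_digits("no digits here")  => -1
```

Use regex_match for validation, where the whole value must conform, and regex_search when the pattern only needs to occur somewhere in the text.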

Example:

// Replace a path with a standardized new path, keeping the filename.
string fn = "/users/ole/Documents/mytestfile.txt";
string newfn = regex_replace("^(.+)/(.+)\..+$", fn, "/users/ole/result/\2.restype");

// The newfn now contains: /users/ole/result/mytestfile.restype

Example 2: Check an email address (pattern commonly cited as the RFC 5322 standard regex):

function check_email(string email)
    string regex = "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])";
    return regex_match(regex, email);
end

Normalize Name Using regex_replace:

In this example, we normalize a name. If it's written as last name, first name(s), we reverse it and remove any commas. If there is no comma, we don't modify it.

The regex contains two capture groups: the first word (the last name) and everything after the comma. The comma between them is required for the regex to match. If it matches, we replace the whole match with \2 (the second group), a space, and \1 (the first group).

function NormaliseName(string Name)
    Name = regex_replace("(\w+),\s?(.+)", Name, "\2 \1");
    return Name;
end

// Hornnes, Ole Kristian => Ole Kristian Hornnes
// Hornnes Ole Kristian => Hornnes Ole Kristian

Example regex_extract:

In this example, we extract some emails and names from a list. Each entry in the resulting array has a delimited list of groups. The first is the whole match, followed by the groups. We have chosen "::" as the group delimiter in this example, but it can be anything. If you are looping through the array, you can use the explode function to create a new array for each iteration in the loop. This way, you can work on each match and put the data where it belongs.

Usually, a two-dimensional array would fit here, but that is not yet available in ACF.

Sample text:

Hornnes, Ole Kristian ole@example.no
Bjarne Hansen bjarne@b.com
Borre Josefson borre@j.com

Extract Function:

function testExtract(string source)
    source = substitute(source, "\r", "\n");
    string regex = "([A-Za-z, ]+)\s([a-z]+@[a-z]+\.[a-z]{2,3})";
    array string results = regex_extract(regex, source, "::");
    int l = sizeof(results);
    int i;
    for (i = 1, l)
        print "Line " + i + ":" + results[i];
        print "\n";
    end for
    return implode("||", results);
end

Console output:

Line 1:Hornnes, Ole Kristian ole@example.no::Hornnes, Ole Kristian::ole@example.no
Line 2:Bjarne Hansen bjarne@b.com::Bjarne Hansen::bjarne@b.com
Line 3:Borre Josefson borre@j.com::Borre Josefson::borre@j.com
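
Building on testExtract, the sketch below splits each entry with explode so that the name and the email can be handled separately. It makes several assumptions: printExtracted is a made-up function name, explode is assumed to take the delimiter first and the string second (mirroring the implode call above), and the array returned by explode is assumed to be 1-based like the results array. Check the explode documentation before relying on it.

```
// Hypothetical function; the explode argument order is an assumption.
function printExtracted(string source)
    source = substitute(source, "\r", "\n");
    string regex = "([A-Za-z, ]+)\s([a-z]+@[a-z]+\.[a-z]{2,3})";
    array string results = regex_extract(regex, source, "::");
    array string parts;
    int l = sizeof(results);
    int i;
    for (i = 1, l)
        // Assumed layout: parts[1] = whole match, parts[2] = name, parts[3] = email
        parts = explode("::", results[i]);
        print "Name: " + parts[2] + ", email: " + parts[3] + "\n";
    end for
    return l;
end
```

With the sample text above, and if those assumptions hold, the first printed line would pair "Hornnes, Ole Kristian" with "ole@example.no".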

New to Regular Expressions?

Regular expressions, often abbreviated as regex, are powerful tools for text manipulation and error checking. While they can be challenging to read for those unfamiliar with them, gaining knowledge about regular expressions can save you hours of work and improve code quality.

A regex pattern is a set of rules for interpreting string data, with each character in the pattern having a specific meaning. For those looking to learn more about regular expressions, consider the following resources:
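
As an illustration, the pattern from the regex_extract example on this page can be read piece by piece. The comments are an informal reading of that one pattern, not a full description of regex syntax.

```
string regex = "([A-Za-z, ]+)\s([a-z]+@[a-z]+\.[a-z]{2,3})";

// ( ... )        a capture group; its content can be referred to as \1, \2, etc.
// [A-Za-z, ]+    one or more letters, commas, or spaces (the name)
// \s             a single whitespace character between name and email
// [a-z]+@        one or more lowercase letters followed by a literal @
// [a-z]+\.       one or more lowercase letters followed by a literal dot
// [a-z]{2,3}     two or three lowercase letters (the top-level domain)
```

The unescaped dot matches any character, which is why the literal dot in the email address is written as \. in the pattern.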

Lynda.com (now LinkedIn Learning) offers a comprehensive regex course that is well worth a few hours of your time.
regex101.com provides an online regex editor and testing tool, complete with excellent documentation and explanations for each character in your regex pattern.


References:

The Regex101 website contains regex documentation and a testing tool.