

The syntax and semantics of the regular expressions supported by PCRE are described below. Regular expressions are also described in the Perl documentation and in a number of books, some of which have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly, covers regular expressions in great detail. This description of PCRE's regular expressions is intended as reference material.

The original operation of PCRE was on strings of one-byte characters. However, there is now also support for UTF-8 character strings. To use this, you must build PCRE to include UTF-8 support, and then call pcre_compile() with the PCRE_UTF8 option. How this affects pattern matching is mentioned in several places below. There is also a summary of UTF-8 features in the section on UTF-8 support in the main PCRE page.

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. The following sections describe the use of each of the metacharacters.

The backslash character has several uses. Firstly, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have. This use of backslash as an escape character applies both inside and outside character classes.

For example, if you want to match a * character, you write \* in the pattern. This escaping action applies whether or not the following character would otherwise be interpreted as a metacharacter, so it is always safe to precede a non-alphanumeric with backslash to specify that it stands for itself. In particular, if you want to match a backslash, you write \\.

If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the pattern (other than in a character class) and characters between a # outside a character class and the next newline character are ignored. An escaping backslash can be used to include a whitespace or # character as part of the pattern.

If you want to remove the special meaning from a sequence of characters, you can do so by putting them between \Q and \E. This is different from Perl in that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in Perl, $ and @ cause variable interpolation.

The precise effect of \cx is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus \cz becomes hex 1A, but \c{ becomes hex 3B, while \c; becomes hex 7B.

After \x, up to two hexadecimal digits are read. After \0 up to two further octal digits are read. In both cases, if there are fewer than two digits, just those that are present are used. Thus the sequence \0\x\07 specifies two binary zeros followed by a BEL character (code value 7). Make sure you supply two digits after the initial zero if the pattern character that follows is itself an octal digit.

The handling of a backslash followed by a digit other than 0 is complicated. Outside a character class, PCRE reads it and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses in the expression, the entire sequence is taken as a back reference. A description of how this works is given later, following the discussion of parenthesized subpatterns.
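
The rule that a backslash before any non-alphanumeric character makes it stand for itself can be used to turn arbitrary literal text into a pattern fragment. The following is a minimal sketch in Python, not part of PCRE; the function name quote_literal is hypothetical, and it assumes ASCII input (non-ASCII characters are passed through unchanged).

```python
import string

# ASCII letters and digits are never escaped: a backslash followed by an
# alphanumeric character can have a special meaning of its own.
_ALNUM = set(string.ascii_letters + string.digits)


def quote_literal(text: str) -> str:
    """Return text with every ASCII non-alphanumeric character preceded
    by a backslash, so each character stands for itself in a pattern.

    Hypothetical helper for illustration; assumes ASCII input.
    """
    out = []
    for ch in text:
        if ch in _ALNUM or ord(ch) > 127:
            out.append(ch)          # left as-is
        else:
            out.append("\\" + ch)   # e.g. * becomes \*, \ becomes \\
    return "".join(out)


if __name__ == "__main__":
    print(quote_literal("a*b"))    # a\*b
    print(quote_literal("a b#c"))  # a\ b\#c
```

Because whitespace and # are escaped as well, the result keeps those characters as part of the pattern even when the pattern is compiled with PCRE_EXTENDED.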

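The \cx rule described above (convert a lower case letter to upper case, then invert bit 6) is simple arithmetic on the character code. As a rough illustration, the sketch below reproduces that arithmetic in Python; the function name control_escape_value is hypothetical, and the sketch assumes x is a single ASCII character.

```python
def control_escape_value(x: str) -> int:
    """Return the character code that \\cx stands for, following the
    rule stated in the text. Hypothetical helper; ASCII input assumed.
    """
    if len(x) != 1 or ord(x) > 127:
        raise ValueError("expected a single ASCII character")
    code = ord(x)
    if "a" <= x <= "z":
        code = ord(x.upper())   # lower case letters are upper-cased first
    return code ^ 0x40          # invert bit 6 (hex 40)


if __name__ == "__main__":
    print(hex(control_escape_value("z")))  # 0x1a
    print(hex(control_escape_value("{")))  # 0x3b
    print(hex(control_escape_value(";")))  # 0x7b
```

The three printed values correspond to the \cz, \c{ and \c; cases that follow from the rule.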