regexp − regular expression notation

A regular expression specifies a set of strings of
characters.  A member of this set of strings is said to be
matched by the regular expression.  In many applications a
delimiter character, commonly /, bounds a regular expression.
In the following specification for regular expressions the
word ‘character’ means any character (rune) but newline.

     The syntax for a regular expression e0 is

     e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'

     e2:  e3
       |  e2 REP
     REP: '*' | '+' | '?'

     e1:  e2
       |  e1 e2

     e0:  e1
       |  e0 '|' e1
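
     The grammar can be read directly as a recursive-descent parser, with each left-recursive rule turned into a loop. The following is an illustrative Python sketch, not part of this notation: the Parser class, the tuple node names ('alt', 'cat', 'rep', 'lit', 'class'), and the choice to keep a charclass as its raw text are all assumptions, and delimiters are not handled.

```python
# Sketch: recursive-descent parser for the e0..e3 grammar.
# Hypothetical AST: ("alt", l, r), ("cat", l, r), ("rep", op, e),
# ("lit", c), ("class", raw_text), and (".",), ("^",), ("$",).

META = set(".*+?[]()|\\^$")


class Parser:
    def __init__(self, text: str):
        self.s = text
        self.i = 0

    def peek(self):
        return self.s[self.i] if self.i < len(self.s) else None

    def take(self) -> str:
        c = self.peek()
        if c is None:
            raise SyntaxError("unexpected end of expression")
        self.i += 1
        return c

    def parse(self):
        node = self.e0()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at {self.i}")
        return node

    def e0(self):  # e0: e1 | e0 '|' e1  (left recursion becomes a loop)
        node = self.e1()
        while self.peek() == "|":
            self.take()
            node = ("alt", node, self.e1())
        return node

    def e1(self):  # e1: e2 | e1 e2  (concatenation)
        node = self.e2()
        while self.peek() not in (None, "|", ")"):
            node = ("cat", node, self.e2())
        return node

    def e2(self):  # e2: e3 | e2 REP
        node = self.e3()
        while self.peek() in ("*", "+", "?"):
            node = ("rep", self.take(), node)
        return node

    def e3(self):  # e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')'
        c = self.take()
        if c == "(":
            node = self.e0()
            if self.take() != ")":
                raise SyntaxError("missing ')'")
            return node
        if c in ".^$":
            return (c,)
        if c == "[":
            start = self.i
            while self.peek() != "]":
                if self.take() == "\\":
                    self.take()  # keep the escaped character in the body
            body = self.s[start:self.i]
            self.take()  # consume the closing ']'
            if body in ("", "^"):
                raise SyntaxError("empty character class")
            return ("class", body)
        if c == "\\":
            return ("lit", self.take())
        if c in META:
            raise SyntaxError(f"unescaped metacharacter {c!r}")
        return ("lit", c)
```

For instance, Parser("ab|c").parse() returns ('alt', ('cat', ('lit', 'a'), ('lit', 'b')), ('lit', 'c')): REP binds tightest, then concatenation, then alternation.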

     A literal is either any non-metacharacter, or a metacharacter
(one of .*+?[]()|\^$) or the delimiter preceded by \.
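
     Quoting arbitrary text so that it matches only itself follows directly from this rule: precede every metacharacter and the delimiter with \. A minimal sketch, assuming the hypothetical helper name quote_literal and a default delimiter of /:

```python
# Sketch: build a regular expression that matches `text` literally.
# quote_literal and the default delimiter are illustrative assumptions.

METACHARS = ".*+?[]()|\\^$"


def quote_literal(text: str, delim: str = "/") -> str:
    out = []
    for c in text:
        if c in METACHARS or c == delim:
            out.append("\\")  # metacharacters and the delimiter need a \
        out.append(c)
    return "".join(out)


print(quote_literal("a+b/c"))  # prints a\+b\/c
```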

     A charclass is a nonempty string s bracketed [s] (or
[^s]); it matches any character in (or not in) s.  A negated
character class never matches newline.  A substring a-b, with
a and b in ascending order, stands for the inclusive range of
characters between a and b.  In s, the metacharacters -, ], an
initial ^, and the regular expression delimiter must be
preceded by a \; other metacharacters have no special meaning
and may appear unescaped.
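
     The charclass rules can be sketched as a membership test. This Python version is illustrative only: the name class_matches and the convention of passing the text between the brackets (with a leading ^ for [^s]) are assumptions. Ranges compare characters by code point, i.e. by rune value.

```python
# Sketch: does character c belong to the charclass whose bracketed text
# is s?  A leading ^ in s means the negated form [^s].

def class_matches(s: str, c: str) -> bool:
    negated = s.startswith("^")
    if negated:
        s = s[1:]
    # Decode \x escapes into (char, escaped) pairs so that an escaped '-'
    # is never taken as a range operator.
    items = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            items.append((s[i + 1], True))
            i += 2
        else:
            items.append((s[i], False))
            i += 1
    found = False
    j = 0
    while j < len(items):
        first = items[j][0]
        if j + 2 < len(items) and items[j + 1] == ("-", False):
            # a-b: inclusive range, compared by code point
            if first <= c <= items[j + 2][0]:
                found = True
            j += 3
        else:
            if c == first:
                found = True
            j += 1
    if negated:
        return c != "\n" and not found  # [^s] never matches newline
    return found


print(class_matches("a-z", "m"), class_matches("^a-z", "\n"))  # True False
```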

     A . matches any character.
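
     Because 'character' excludes newline, . never matches one. Python's re module is used here only as a stand-in: its syntax is a superset of this notation and differs in places, but without re.DOTALL its . behaves the same way.

```python
import re

# Python's re as a stand-in: without re.DOTALL, '.' matches any
# character except newline, as in this notation.
dot = re.compile("a.c")

print(bool(dot.fullmatch("abc")))   # True
print(bool(dot.fullmatch("aλc")))   # True: any rune, not only ASCII
print(bool(dot.fullmatch("a\nc")))  # False: newline is excluded
```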

     A ^ matches the beginning of a line; $ matches the end
of the line.
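
     ^ and $ are line-oriented. Using Python's re as a stand-in, the same behaviour needs the re.MULTILINE flag, since by default Python anchors ^ and $ to the whole string:

```python
import re

text = "one\ntwo\nthree"
# With re.MULTILINE, Python's ^ and $ match at line boundaries,
# like ^ and $ in this notation.
starts = re.findall("^t[a-z]*", text, re.MULTILINE)
ends = re.findall("[a-z]*e$", text, re.MULTILINE)
print(starts)  # ['two', 'three']
print(ends)    # ['one', 'three']
```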

     The REP operators match zero or more (*), one or more
(+), or zero or one (?) instances, respectively, of the
preceding regular expression e2.
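
     The three REP operators can be compared by testing which strings each pattern matches in full. Python's re serves as a stand-in (its *, + and ? have the same meaning here); the last two lines show that REP applies only to the preceding e2, so repeating a sequence needs parentheses.

```python
import re

# Python's re as a stand-in; fullmatch asks whether the whole string is
# in the set described by the pattern.
strings = ["ac", "abc", "abbc"]
results = {
    pattern: [s for s in strings if re.fullmatch(pattern, s)]
    for pattern in ("ab*c", "ab+c", "ab?c")
}
print(results)
# {'ab*c': ['ac', 'abc', 'abbc'], 'ab+c': ['abc', 'abbc'], 'ab?c': ['ac', 'abc']}

# REP applies to the preceding e2 only: a single b, or a parenthesized group.
print(bool(re.fullmatch("ab+", "abab")))    # False
print(bool(re.fullmatch("(ab)+", "abab")))  # True
```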

     A concatenated regular expression, e1e2, matches a match
to e1 followed by a match to e2.
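
     Viewed as sets of strings, e1e2 matches exactly the strings x+y where x matches e1 and y matches e2. The Python sketch below checks this over a small sample of strings, again with re as a stand-in. It also shows why, per the e3 rule, an alternation must be parenthesized before it is concatenated.

```python
import re
from itertools import product

# Python's re as a stand-in. Over a small sample of strings, the set
# matched by e1e2 equals {x + y : x matched by e1, y matched by e2}.
e1, e2 = "a|b", "c|d"
concat = f"({e1})({e2})"  # an alternation must be parenthesized (e3 rule)
words = ["".join(p) for n in range(3) for p in product("abcd", repeat=n)]

L1 = {w for w in words if re.fullmatch(e1, w)}
L2 = {w for w in words if re.fullmatch(e2, w)}
L12 = {w for w in words if re.fullmatch(concat, w)}

assert L12 == {x + y for x in L1 for y in L2}
print(sorted(L12))  # ['ac', 'ad', 'bc', 'bd']
# Without parentheses, e1 + e2 reads as a|bc|d, a different set:
print(bool(re.fullmatch(e1 + e2, "ad")))  # False
```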

     An alternative regular expression, e0|e1, matches either
a match to e0 or a match to e1.
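
     As sets, e0|e1 matches the union of what e0 and e1 match. Since | has the lowest precedence, no parentheses are needed around the operands. A small Python check, with re as a stand-in and every string of length at most 3 over {a, b} as the sample:

```python
import re
from itertools import product

# Python's re as a stand-in. Over every string of length <= 3 on {a, b},
# the set matched by e0|e1 is the union of the two sets.
e0, e1 = "ab*", "b+"
words = ["".join(p) for n in range(4) for p in product("ab", repeat=n)]

L0 = {w for w in words if re.fullmatch(e0, w)}
L1 = {w for w in words if re.fullmatch(e1, w)}
L_alt = {w for w in words if re.fullmatch(f"{e0}|{e1}", w)}

assert L_alt == L0 | L1
print(sorted(L_alt))  # ['a', 'ab', 'abb', 'b', 'bb', 'bbb']
```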

     A match to any part of a regular expression extends as
far as possible without preventing a match to the remainder
of the regular expression.
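
     Python's re module does not follow this rule: it backtracks and keeps the first alternative that succeeds, so a|ab applied to "xab" stops after the a. The sketch below recovers the overall effect by brute force, trying at each start position every end position from the longest down. It assumes patterns without ^ or $ (Python treats those specially when pos and endpos are given), checks only the extent of the whole match rather than where each part ends, and makes a quadratic number of fullmatch calls in the worst case. longest_match is a hypothetical name.

```python
import re


def longest_match(pattern: str, text: str):
    """Return (start, end) of the leftmost-longest match, or None.

    Brute force: for each start, try end positions from the end of text
    downwards and keep the first span that the pattern matches in full.
    """
    prog = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if prog.fullmatch(text, start, end):
                return start, end
    return None


print(re.search("a|ab", "xab").span())  # (1, 2): first alternative wins
print(longest_match("a|ab", "xab"))     # (1, 3): longest at leftmost start
```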