mirror of
https://github.com/mozilla/gecko-dev.git
synced 2024-11-30 00:01:50 +00:00
f09cad0452
r=zach
258 lines
8.7 KiB
HTML
258 lines
8.7 KiB
HTML
<HTML>
|
|
<HEAD>
|
|
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
|
|
<META NAME="Author" CONTENT="lloyd tabb">
|
|
<META NAME="GENERATOR" CONTENT="Mozilla/4.0 [en] (WinNT; I) [Netscape]">
|
|
<TITLE>Regular expressions in the cvs query tool</TITLE>
|
|
</HEAD>
|
|
<BODY>
|
|
|
|
<H1>
|
|
Description of MySQL regular expression syntax.</H1>
|
|
Regular expressions are a powerful way of specifying complex searches.
|
|
|
|
<P><B>MySQL</B> uses Henry Spencer's implementation of regular expressions.
|
|
And that is aimed to conform to POSIX 1003.2. <B>MySQL</B> uses the extended
|
|
version.
|
|
|
|
<P>To get more exact information see Henry Spencer's regex.7 manual.
|
|
|
|
<P>This is a simplistic reference that skips the details. From here on
|
|
a regular expression is called a regexp.
|
|
|
|
<P>A regular expression describes a set of strings. The simplest case is
|
|
one that has no special characters in it. For example the regexp <TT>hello</TT>
|
|
matches <TT>hello</TT> and nothing else.
|
|
|
|
<P>Nontrivial regular expressions use certain special constructs so that
|
|
they can match more than one string. For example, the regexp <TT>hello|word</TT>
|
|
matches either the string <TT>hello</TT> or the string <TT>word</TT>.
|
|
|
|
<P>And a more complex example regexp <TT>B[an]*s</TT> matches any of the
|
|
strings <TT>Bananas</TT>, <TT>Baaaaas</TT>, <TT>Bs</TT> and all other string
|
|
starting with a <TT>B</TT> and continuing with any number of <TT>a</TT>
|
|
<TT>n</TT> and ending with a <TT>s</TT>.
|
|
|
|
<P>The following special characters/constructs are known.
|
|
<DL COMPACT>
|
|
<DT>
|
|
<TT>^</TT></DT>
|
|
|
|
<DD>
|
|
Start of whole string.</DD>
|
|
|
|
<PRE>mysql> select "fo\nfo" regexp "^fo$"; -> 0
|
|
mysql> select "fofo" regexp "^fo"; -> 1</PRE>
|
|
|
|
<DT>
|
|
<TT>$</TT></DT>
|
|
|
|
<DD>
|
|
End of whole string.</DD>
|
|
|
|
<PRE>mysql> select "fo\no" regexp "^fo\no$"; -> 1
|
|
mysql> select "fo\no" regexp "^fo$"; -> 0</PRE>
|
|
|
|
<DT>
|
|
<TT>.</TT></DT>
|
|
|
|
<DD>
|
|
Any character (including newline).</DD>
|
|
|
|
<PRE>mysql> select "fofo" regexp "^f.*"; -> 1
|
|
mysql> select "fo\nfo" regexp "^f.*"; -> 1</PRE>
|
|
|
|
<DT>
|
|
<TT>a*</TT></DT>
|
|
|
|
<DD>
|
|
Any sequence of zero or more a's.</DD>
|
|
|
|
<PRE>mysql> select "Ban" regexp "^Ba*n"; -> 1
|
|
mysql> select "Baaan" regexp "^Ba*n"; -> 1
|
|
mysql> select "Bn" regexp "^Ba*n"; -> 1</PRE>
|
|
|
|
<DT>
|
|
<TT>a+</TT></DT>
|
|
|
|
<DD>
|
|
Any sequence of one or more a's.</DD>
|
|
|
|
<PRE>mysql> select "Ban" regexp "^Ba+n"; -> 1
|
|
mysql> select "Bn" regexp "^Ba+n"; -> 0</PRE>
|
|
|
|
<DT>
|
|
<TT>a?</TT></DT>
|
|
|
|
<DD>
|
|
Either zero or one a.</DD>
|
|
|
|
<PRE>mysql> select "Bn" regexp "^Ba?n"; -> 1
|
|
mysql> select "Ban" regexp "^Ba?n"; -> 1
|
|
mysql> select "Baan" regexp "^Ba?n"; -> 0</PRE>
|
|
|
|
<DT>
|
|
<TT>de|abc</TT></DT>
|
|
|
|
<DD>
|
|
Either the sequence <TT>de</TT> or <TT>abc</TT>.</DD>
|
|
|
|
<PRE>mysql> select "pi" regexp "pi|apa"; -> 1
|
|
mysql> select "axe" regexp "pi|apa"; -> 0
|
|
mysql> select "apa" regexp "pi|apa"; -> 1
|
|
mysql> select "apa" regexp "^(pi|apa)$"; -> 1
|
|
mysql> select "pi" regexp "^(pi|apa)$"; -> 1
|
|
mysql> select "pix" regexp "^(pi|apa)$"; -> 0</PRE>
|
|
|
|
<DT>
|
|
<TT>(abc)*</TT></DT>
|
|
|
|
<DD>
|
|
Zero or more times the sequence <TT>abc</TT>.</DD>
|
|
|
|
<PRE>mysql> select "pi" regexp "^(pi)+$"; -> 1
|
|
mysql> select "pip" regexp "^(pi)+$"; -> 0
|
|
mysql> select "pipi" regexp "^(pi)+$"; -> 1</PRE>
|
|
|
|
<DT>
|
|
<TT>{1}</TT></DT>
|
|
|
|
<DT>
|
|
<TT>{2,3}</TT></DT>
|
|
|
|
<DD>
|
|
There is a more general way of writing regexps that match many occurrences.</DD>
|
|
|
|
<DL COMPACT>
|
|
<DT>
|
|
<TT>a*</TT></DT>
|
|
|
|
<DD>
|
|
Can be written as <TT>a{0,}</TT>.</DD>
|
|
|
|
<DT>
|
|
<TT>+</TT></DT>
|
|
|
|
<DD>
|
|
Can be written as <TT>a{1,}</TT>.</DD>
|
|
|
|
<DT>
|
|
<TT>?</TT></DT>
|
|
|
|
<DD>
|
|
Can be written as <TT>a{0,1}</TT>.</DD>
|
|
</DL>
|
|
To be more precise, an atom followed by a bound containing one integer <TT>i</TT>
|
|
and no comma matches a sequence of exactly <TT>i</TT> matches of the atom.
|
|
An atom followed by a bound containing one integer <TT>i</TT> and a comma
|
|
matches a sequence of <TT>i</TT> or more matches of the atom. An atom followed
|
|
by a bound containing two integers <TT>i</TT> and <TT>j</TT> matches a
|
|
sequence of <TT>i</TT> through <TT>j</TT> (inclusive) matches of the atom.
|
|
Both arguments must <TT>0 >= value <= RE_DUP_MAX (default 255)</TT>,
|
|
and if there are two of them, the second must be bigger or equal to the
|
|
first.
|
|
<DT>
|
|
<TT>[a-dX]</TT></DT>
|
|
|
|
<DT>
|
|
<TT>[^a-dX]</TT></DT>
|
|
|
|
<DD>
|
|
Any character which is (not if ^ is used) either <TT>a</TT>, <TT>b</TT>,
|
|
<TT>c</TT>, <TT>d</TT> or <TT>X</TT>. To include <TT>]</TT> it has to be
|
|
written first. To include <TT>-</TT> it has to be written first or last.
|
|
So <TT>[0-9]</TT> matches any decimal digit. All character that does not
|
|
have a defined meaning inside a <TT>[]</TT> pair has no special meaning
|
|
and matches only itself.</DD>
|
|
|
|
<PRE>mysql> select "aXbc" regexp "[a-dXYZ]"; -> 1
|
|
mysql> select "aXbc" regexp "^[a-dXYZ]$"; -> 0
|
|
mysql> select "aXbc" regexp "^[a-dXYZ]+$"; -> 1
|
|
mysql> select "aXbc" regexp "^[^a-dXYZ]+$"; -> 0
|
|
mysql> select "gheis" regexp "^[^a-dXYZ]+$"; -> 1
|
|
mysql> select "gheisa" regexp "^[^a-dXYZ]+$"; -> 0</PRE>
|
|
|
|
<DT>
|
|
<TT>[[.characters.]]</TT></DT>
|
|
|
|
<DD>
|
|
The sequence of characters of that collating element. The sequence is a
|
|
single element of the bracket expression's list. A bracket expression containing
|
|
a multi-character collating element can thus match more than one character,
|
|
e.g. if the collating sequence includes a <TT>ch</TT> collating element,
|
|
then the RE <TT>[[.ch.]]*c</TT> matches the first five characters of <TT>chchcc</TT>.</DD>
|
|
|
|
<DT>
|
|
<TT>[=character-class=]</TT></DT>
|
|
|
|
<DD>
|
|
An equivalence class, standing for the sequences of characters of all collating
|
|
elements equivalent to that one, including itself. For example, if <TT>o</TT>
|
|
and <TT>(+)</TT> are the members of an equivalence class, then <TT>[[=o=]]</TT>,
|
|
<TT>[[=(+)=]]</TT>, and <TT>[o(+)]</TT> are all synonymous. An equivalence
|
|
class may not be an endpoint of a range.</DD>
|
|
|
|
<DT>
|
|
<TT>[:character_class:]</TT></DT>
|
|
|
|
<DD>
|
|
Within a bracket expression, the name of a character class enclosed in
|
|
<TT>[:</TT> and <TT>:]</TT> stands for the list of all characters belonging
|
|
to that class. Standard character class names are:</DD>
|
|
|
|
<TABLE BORDER WIDTH="100%" NOSAVE >
|
|
<TR>
|
|
<TD>alnum </TD>
|
|
|
|
<TD>digit </TD>
|
|
|
|
<TD>punct </TD>
|
|
</TR>
|
|
|
|
<TR>
|
|
<TD>alpha </TD>
|
|
|
|
<TD>graph </TD>
|
|
|
|
<TD>space </TD>
|
|
</TR>
|
|
|
|
<TR>
|
|
<TD>blank </TD>
|
|
|
|
<TD>lower </TD>
|
|
|
|
<TD>upper </TD>
|
|
</TR>
|
|
|
|
<TR>
|
|
<TD>cntrl </TD>
|
|
|
|
<TD>print </TD>
|
|
|
|
<TD>xdigit </TD>
|
|
</TR>
|
|
</TABLE>
|
|
These stand for the character classes defined in ctype(3). A locale may
|
|
provide others. A character class may not be used as an endpoint of a range.
|
|
<PRE>mysql> select "justalnums" regexp "[[:alnum:]]+"; -> 1
|
|
mysql> select "!!" regexp "[[:alnum:]]+"; -> 0</PRE>
|
|
|
|
<LI>
|
|
[[:<:]]</LI>
|
|
|
|
<LI>
|
|
[[:>:]] These match the null string at the beginning and end of a word
|
|
respectively. A word is defined as a sequence of word characters which
|
|
is neither preceded nor followed by word characters. A word character is
|
|
an alnum character (as defined by ctype(3)) or an underscore.</LI>
|
|
|
|
<PRE>mysql> select "a word a" regexp "[[:<:]]word[[:>:]]"; -> 1
|
|
mysql> select "a xword a" regexp "[[:<:]]word[[:>:]]"; -> 0</PRE>
|
|
</DL>
|
|
|
|
<PRE>mysql> select "weeknights" regexp "^(wee|week)(knights|nights)$"; -> 1</PRE>
|
|
|
|
</BODY>
|
|
</HTML>
|