mirror of https://github.com/mozilla/gecko-dev.git
synced 2025-02-05 05:30:29 +00:00
linked in dawn's search help doc
parent ee5db31b04
commit 6a9fd84fda
@@ -1,5 +1,5 @@
 #!/usr/bonsaitools/bin/perl
-# $Id: search,v 1.1 1998/06/11 23:56:17 jwz Exp $
+# $Id: search,v 1.2 1998/06/12 21:42:37 jwz Exp $
 
 # search -- Freetext search
 #
@@ -51,7 +51,7 @@ sub search {
 # $Query->popup_menu("fuzz",[0,1,2,3,4,5,6,7],0),
 "</form>\n",
 
-"<a href=\"search_help.html\">Hints</a> ",
+"<a href=\"search-help.html\">Hints</a> ",
 "making queries.\n");
 
 $| = 1; print('');
@@ -61,7 +61,7 @@ sub search {
 unless (open(GLIMPSE, "-|")) {
 open(STDERR, ">&STDOUT");
 $!='';
-exec($Conf->glimpsebin,"-H".$Conf->dbdir,'-y','-n',$searchtext);
+exec($Conf->glimpsebin,"-i","-H".$Conf->dbdir,'-y','-n',$searchtext);
 print("Glimpse subprocess died unexpextedly: $!\n");
 exit;
 }
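The third hunk changes only the argument list handed to glimpse. As a rough illustration, the Python sketch below rebuilds that list. The function name `build_glimpse_argv` and the example paths are hypothetical; only the flags (`-i`, `-H`, `-y`, `-n`) and their order come from the diff. In glimpse, `-i` requests case-insensitive matching, which appears to be the purpose of the change.

```python
from typing import List


def build_glimpse_argv(glimpse_bin: str, db_dir: str, search_text: str) -> List[str]:
    """Rebuild the argument vector used by the patched exec() call.

    glimpse_bin and db_dir stand in for $Conf->glimpsebin and $Conf->dbdir.
    """
    return [
        glimpse_bin,
        "-i",           # added by this commit: ignore case
        "-H" + db_dir,  # flag and directory joined, as in the Perl source
        "-y",
        "-n",
        search_text,    # one list element, so the shell never re-splits it
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_glimpse_argv("/usr/local/bin/glimpse", "/var/lxr/db", "nsIFoo"))
```

Because the search text travels as a single list element (as it does in the list form of Perl's `exec`), spaces and quotes in a query reach glimpse untouched. Glimpse's own metacharacters are still interpreted.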
168
webtools/lxr/search-help.html
Normal file
@@ -0,0 +1,168 @@
+<HTML>
+<HEAD>
+<TITLE>mozilla cross-reference: search help</TITLE>
+</HEAD>
+<BODY BGCOLOR="#FFFFFF" TEXT="#000000"
+LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
+
+<TABLE BGCOLOR="#000000" WIDTH="100%" BORDER=0 CELLPADDING=0 CELLSPACING=0>
+<TR><TD><A HREF="http://www.mozilla.org/"><IMG
+SRC="http://www.mozilla.org/images/mozilla-banner.gif" ALT=""
+BORDER=0 WIDTH=600 HEIGHT=58></A></TD></TR></TABLE>
+
+<H1 ALIGN=CENTER>search help<BR>
+<FONT SIZE=3>
+for the<br>
+<A HREF="./"><I>mozilla cross-reference</I></A>
+</FONT></H1>
+
+<P><BR>
+<BLOCKQUOTE><BLOCKQUOTE>
+<I>
+This text is copied from the Glimpse manual page. I have tried to remove
+things that do not apply to the lxr searcher, but beware, some things might
+have slipped through. I'll try to put together something better when I get
+the time. For more information on glimpse go to the
+<A HREF="http://glimpse.cs.arizona.edu">Glimpse homepage</A>.
+<DIV ALIGN=RIGHT>-- <A HREF="mailto:dawn@cannibal.mi.org">Dawn Endico</A>
+</DIV>
+</I>
+</BLOCKQUOTE></BLOCKQUOTE>
+
+<A NAME="Patterns"></A><H2>Patterns</H2>
+<UL>
+glimpse supports a large variety of patterns, including simple
+strings, strings with classes of characters, sets of strings,
+wild cards, and regular expressions (see <A HREF="#Limitations">Limitations</A>).
+
+</UL>
+<P> <H3>Strings</H3>
+<UL>
+
+Strings are any sequence of characters, including the special symbols
+`^' for beginning of line and `$' for end of line. The following
+special characters (`$', `^', `*', `[', `^', `|', `(', `)', `!', and
+`\' ) as well as the following meta characters special to glimpse (and
+agrep): `;', `,', `#', `>', `<', `-', and `.', should be preceded by
+`\\' if they are to be matched as regular characters. For example,
+\\^abc\\\\ corresponds to the string ^abc\\, whereas ^abc corresponds
+to the string abc at the beginning of a line.
+
+</UL>
+<P> <H3>Classes of characters</H3>
+<UL>
+
+A list of characters inside [] (in order) corresponds to any character
+from the list. For example, [a-ho-z] is any character between a and h
+or between o and z. The symbol `^' inside [] complements the list.
+For example, [^i-n] denotes any character in the character set except
+the characters 'i' to 'n'.
+The symbol `^' thus has two meanings, but this is consistent with
+egrep.
+The symbol `.' (don't care) stands for any symbol (except for the
+newline symbol).
+
+</UL>
+<P> <H3>Boolean operations</H3>
+<UL>
+
+Glimpse
+supports an `AND' operation denoted by the symbol `;',
+an `OR' operation denoted by the symbol `,',
+a limited version of a `NOT' operation (starting at version 4.0B1)
+denoted by the symbol `~',
+or any combination.
+For example, 'pizza;cheeseburger' will output all lines containing
+both patterns.
+'{political,computer};science' will match 'political science'
+or 'science of computers'.
+
+</UL>
+<P><H3>Wild cards</H3>
+<UL>
+
+The symbol '#' is used to denote a sequence
+of any number (including 0)
+of arbitrary characters (see <A HREF="#Limitations">Limitations</A>).
+The symbol # is equivalent to .* in egrep.
+In fact, .* will work too, because it is a valid regular expression
+(see below), but unless this is part of an actual regular expression,
+# will work faster.
+(Currently glimpse is experiencing some problems with #.)
+
+</UL>
+<P><H3>Combination of exact and approximate matching</H3>
+<UL>
+
+Any pattern inside angle brackets <> must match the text exactly even
+if the match is with errors. For example, <mathemat>ics matches
+mathematical with one error (replacing the last s with an a), but
+mathe<matics> does not match mathematical no matter how many errors are
+allowed. (This option is buggy at the moment.)
+
+</UL>
+<H3>Regular expressions</H3>
+<UL>
+
+Since the index is word based, a regular expression must match words
+that appear in the index for glimpse to find it. Glimpse first strips
+all non-alphabetic characters from the regular expression, and
+searches the index for all remaining words. It then applies the
+regular expression matching algorithm to the files found in the index.
+For example, glimpse 'abc.*xyz' will search the index for all files
+that contain both 'abc' and 'xyz', and then search directly for
+'abc.*xyz' in those files. (If you use glimpse -w 'abc.*xyz', then
+'abcxyz' will not be found, because glimpse will think that abc and
+xyz need to match whole words.) The syntax of regular
+expressions in glimpse is in general the same as that for agrep. The
+union operation `|', Kleene closure `*', and parentheses () are all
+supported. Currently '+' is not supported. Regular expressions are
+currently limited to approximately 30 characters (generally excluding
+meta characters). The maximal number of errors
+for regular expressions that use '*' or '|' is 4.
+
+</UL>
+<A NAME="Limitations"></A><H2>Limitations</H2>
+<UL>
+
+The index of glimpse is word based. A pattern that contains more than
+one word cannot be found in the index. The way glimpse overcomes this
+weakness is by splitting any multi-word pattern into its set of words
+and looking for all of them in the index.
+For example, <I>'linear programming'</I> will first consult the index
+to find all files containing both <I>linear</I> and <I>programming</I>,
+and then apply agrep to find the combined pattern.
+This is usually an effective solution, but it can be slow for
+cases where both words are very common, but their combination is not.
+
+<P>
+As was mentioned in the section on <A HREF="#Patterns">Patterns</A> above, some characters
+serve as meta characters for glimpse and need to be
+preceded by '\\' to search for them. The most common
+examples are the characters '.' (which stands for a wild card),
+and '*' (the Kleene closure).
+So, "glimpse ab.de" will match abcde, but "glimpse ab\\.de"
+will not, and "glimpse ab*de" will not match ab*de, but
+"glimpse ab\\*de" will.
+The meta character - is translated automatically to a hyphen
+unless it appears between [] (in which case it denotes a range of
+characters).
+
+<P>
+There is no size limit for simple patterns and simple patterns
+within Boolean expressions.
+More complicated patterns, such as regular expressions,
+are currently limited to approximately 30 characters.
+Lines are limited to 1024 characters.
+
+</UL>
+<P>
+<HR>
+
+<ADDRESS>
+<A HREF="mailto:lxr@linux.no">
+Arne Georg Gleditsch and Per Kristian Gjermshus</A>
+</ADDRESS>
+
+</BODY>
+</HTML>
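The help page says that glimpse's special characters must be escaped to be matched literally, and the `search` script passes the user's text to glimpse unchanged. The Python sketch below shows one way a caller might escape a literal query first. It rests on two assumptions: the helper name `escape_glimpse_literal` is hypothetical, and the escape character is taken to be a single backslash (the doubled `\\` in the help text looks like man-page source escaping carried over into the HTML). The character set is exactly the one listed under "Strings".

```python
# Characters the help text lists as needing an escape to be matched literally:
# the general specials, then the ones specific to glimpse and agrep.
GLIMPSE_SPECIAL = frozenset("$^*[|()!\\" ";,#><-.")


def escape_glimpse_literal(text: str) -> str:
    """Prefix every listed special character with one backslash (assumed escape)."""
    return "".join("\\" + ch if ch in GLIMPSE_SPECIAL else ch for ch in text)


if __name__ == "__main__":
    # A file name containing a dot, searched for as a literal string.
    print(escape_glimpse_literal("ab.de"))
    # A C-style pointer expression, where '-', '>' and '*' are all special.
    print(escape_glimpse_literal("foo->bar*"))
```

Escaping is only appropriate when the user intends a literal string. A search form that also wants to allow wild cards or Boolean operators would have to leave the text as typed.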