gecko-dev/webtools/web-sniffer/READ-ME.txt
erik%netscape.com 2ca57e0bf5 Apache does something a bit strange with files called "README", so I've
created this file called "READ-ME.txt", which gets displayed normally
in the list of files.
2000-02-01 20:40:05 +00:00

91 lines
2.8 KiB
Plaintext

Web Sniffer
by Erik van der Poel <erik@netscape.com>
originally created in 1998
Introduction
This is a set of tools to work with the protocols underlying the Web.
Description of Tools
view.cgi
This is an HTML form that allows the user to enter a URL. The CGI then
fetches the object associated with that URL, and presents it to the
user in a colorful way. For example, HTTP headers are shown, HTML
documents are parsed and colored, and non-ASCII characters are shown in
hex. Links are turned into live links, that can be clicked to see the
source of that URL, allowing the user to "browse" source.
robot
Originally written to see how many documents actually include the HTTP
and HTML charsets, this tool has developed into a more general robot
that collects various statistics, including HTML tag statistics, DNS
lookup timing, etc. This robot does not adhere to the standard robot
rules, so please exercise caution if you use this.
proxy
This is an HTTP proxy that sits between the user's browser and another
HTTP proxy. It captures all of the HTTP traffic between the browser and
the Internet, and presents it to the user in the same colorful way as
the above-mentioned view.cgi.
grab
Allows the user to "grab" a whole Web site, or everything under a
particular directory. This is useful if you want to grab a bunch of
related HTML files, e.g. the whole CSS2 spec.
link
Allows the user to recursively check for bad links in a Web site or
under a particular directory.
Description of Files
addurl.c, addurl.h: adds URLs to a list
cgiview.c, cgiview.html: the view.cgi tool
dns.c: experimental DNS toy
doRun: used with robot
file.c, file.h: the file: URL
ftp.c: experimental FTP toy
grab.c: the "grab" tool
hash.c, hash.h: incomplete hash table routines
html.c, html.h: HTML parser
http.c, http.h: simple HTTP implementation
io.c, io.h: I/O routines
link.c: the "link" tool
main.h: very simple callbacks, could be more object-oriented
Makefile: the Solaris Makefile
mime.c, mime.h: MIME Content-Type parser
mutex.h: for threading in the robot
net.c, net.h: low-level Internet APIs
pop.c: experimental POP toy
proxy.c: the "proxy" tool
robot.c: the "robot" tool
run: used with robot
TODO: notes to myself
url.c, url.h: implementation of absolute and relative URLs
utils.c, utils.h: some little utility routines
view.c, view.h: presents stuff to the user
Description of Code
The code is extremely quick-and-dirty. It could be a lot more elegant,
e.g. C++, object-oriented, extensible, etc.
The point of this exercise was not to design and write a program well,
but to create some useful tools and to learn about Internet protocols.