unofficial git repo -- report bugs/issues/pull requests on https://gitlab.torproject.org/ --
Karsten Loesing 57dae5651a Be more precise about using the database together with Tor Status.
Tor Status will need extra-info descriptors, so we should tell Tor to
download them.
2011-08-05 14:03:20 +02:00

Tor Metrics Database and Website
================================

The metrics database stores publicly available data about the Tor network
which are visualized by the metrics website.

This software package, metrics-web, contains (1) the code to import Tor
network data into a database, (2) the code to generate graphs and .CSV
output, and (3) the code for a dynamic web application.  metrics-web is
based on Java, Ant, PostgreSQL, R, Apache HTTP Server, and Apache Tomcat.

This README explains all necessary steps to install metrics-web including
the database (Section 1), the graphing engine (Section 2), and the web
application (Section 3).  It is possible to install only the database part
or only the database and the graphing engine, if desired.


1. Installing the metrics database
==================================

The metrics database contains data about the Tor network coming from
different sources, including the Tor directory authorities, Torperf
performance measurement installations, the GetTor software package
delivery service, and others.


1.1. Preparing the operating system
===================================

This README describes the steps for installing metrics-web on a Debian
GNU/Linux Squeeze server.  Instructions for other operating systems may
vary.

In the following it is assumed that root privileges are available.
Commands requiring root privileges will be prefixed with # below.

Start by adding a metrics user that will be used to execute all commands
that do not require root privileges.  These commands will be prefixed with
$ below.

# adduser metrics

The database importer and website sources will be installed in
/srv/metrics-web/, which is created as follows:

# mkdir /srv/metrics-web/
# chmod g+ws /srv/metrics-web/
# chown metrics:metrics /srv/metrics-web/

Either extract the metrics-web source tarball...

$ tar xf metrics-web-x.y.z.tar -C /srv/metrics-web/

... or clone the metrics-web Git repository:

$ git clone git://git.torproject.org/metrics-web /srv/metrics-web/

Install Sun Java 6, Ant 1.8, and PostgreSQL 8.4, which are needed to set
up the metrics database (be sure to include Debian's non-free repository
in /etc/apt/sources.list).

# apt-get install sun-java6-jdk ant postgresql-8.4

Make Sun's Java the default.

# update-java-alternatives -s java-6-sun

Check the versions of the newly installed tools.

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

$ ant -version
Apache Ant version 1.8.0 compiled on March 11 2010

$ psql --version
psql (PostgreSQL) 8.4.7
contains support for command-line editing


1.2. Configuring the database
=============================

The first step in setting up the metrics database is to configure the
PostgreSQL database and import a database schema.

Start by creating a new metrics database user.  There is no need to give
the metrics user superuser privileges or allow it to create databases or
new roles.

# sudo -u postgres createuser -P metrics

Create a new database tordir owned by user metrics.

# sudo -u postgres createdb -O metrics tordir

Import the metrics database schema.

$ psql -f /srv/metrics-web/db/tordir.sql tordir

Confirm that the database now contains tables to hold metrics data.  In
the following, => will be used as the database prompt.

$ psql tordir
=> \dt+
=> \q


1.3. Importing relay descriptor tarballs
========================================

In most cases it makes sense to populate the metrics database with
archived relay descriptors from the official metrics website.

Download the relay descriptor tarballs from the metrics website at
https://metrics.torproject.org/data.html#relaydesc and extract them to
/srv/metrics-web/archives/ .  The database importer can process v3 votes,
v3 consensuses, server descriptors, and extra-infos.

Edit the config file /srv/metrics-web/config (or create it if it's not
there) to contain the following five lines (be sure to remove the
linebreak in the line defining the JDBC string and insert the real
password there):

ImportDirectoryArchives 1
DirectoryArchivesDirectory archives/
KeepDirectoryArchiveImportHistory 1
WriteRelayDescriptorDatabase 1
RelayDescriptorDatabaseJDBC
    jdbc:postgresql://localhost/tordir?user=metrics&password=password
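
One way to avoid the linebreak problem is to generate the file from a
script.  This is a sketch, not part of metrics-web: the function name is
hypothetical, and it assumes that the option name and the JDBC string are
separated by a single space like the other options.  Note that it
overwrites an existing config file.

```shell
#!/bin/sh
# Hypothetical helper: writes the five import options from this section to
# DIR/config, keeping the JDBC option on a single line.
write_metrics_config() {
  dir=$1
  password=$2
  cat > "$dir/config" <<EOF
ImportDirectoryArchives 1
DirectoryArchivesDirectory archives/
KeepDirectoryArchiveImportHistory 1
WriteRelayDescriptorDatabase 1
RelayDescriptorDatabaseJDBC jdbc:postgresql://localhost/tordir?user=metrics&password=$password
EOF
}

# Usage (replaces any existing /srv/metrics-web/config):
#   write_metrics_config /srv/metrics-web 'the-real-password'
```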

Compile and run the Java database importer.

$ cd /srv/metrics-web/
$ ./run.sh

The database import will take a while.  Once it's complete, check that the
database tables now contain metrics data:

$ psql tordir
=> \dt+
=> \q

It's safe to delete the relay descriptor files in
/srv/metrics-web/archives/ once they are imported.

An alternative to importing relay descriptor tarballs directly into the
database is to convert them into a data format that psql's \copy command
can process.  Look for the config option WriteRelayDescriptorsRawFiles in
/srv/metrics-web/config.template for more information on this experimental
feature.

In a future version of metrics-web it may also be possible to update local
relay descriptor tarballs from the official metrics server via rsync and
import only the changes into the metrics database.  The idea is to simply
rsync the data/ directory from the metrics server and have all information
available.  However, this feature is not implemented yet.


1.4. Importing relay descriptors from a local Tor data directory
================================================================

In order to keep the data in the metrics database up-to-date, the metrics
database importer can import the cached descriptors from a local Tor data
directory.

Configure a local Tor client to fetch all known descriptors as early as
possible by adding these config options to its torrc file:

DownloadExtraInfo 1
FetchUselessDescriptors 1
FetchDirInfoEarly 1
FetchDirInfoExtraEarly 1

Tell the metrics database importer where to find the cached descriptor
files.  One way to achieve this is to add symbolic links to
/srv/metrics-web/archives/ like this.  Tor's data directory is assumed to
be /srv/tor/ here.

$ cd /srv/metrics-web/archives/
$ ln -s /srv/tor/cached-* .
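
The plain ln -s command fails if the links already exist.  A repeatable
variant might look like the following sketch; the function name is
hypothetical, and the Tor data directory should be given as an absolute
path so that the links resolve from inside the archives directory.

```shell
#!/bin/sh
# Hypothetical helper: links all cached-* files from a Tor data directory
# into the archives directory, replacing links that already exist.
link_cached_descriptors() {
  tor_dir=$1        # absolute path, e.g. /srv/tor
  archives_dir=$2
  mkdir -p "$archives_dir" || return 1
  for f in "$tor_dir"/cached-*; do
    [ -e "$f" ] || continue   # no cached descriptors yet
    ln -sf "$f" "$archives_dir/" || return 1
  done
}

# Usage:
#   link_cached_descriptors /srv/tor /srv/metrics-web/archives
```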

Add a crontab entry for the database importer to run once per hour:

15 * * * * cd /srv/metrics-web/ && ./run.sh


1.5. Importing GeoIP information
================================

Some of the graphs require GeoIP information to resolve IP addresses to
country codes.  This information is provided in MaxMind's GeoLite City
database available at http://www.maxmind.com/app/geolitecity.

Download and extract the two files GeoLiteCity-Location.csv and
GeoLiteCity-Blocks.csv to /srv/metrics-web/.

Import the two files into the metrics database.

$ ant geoipdb

Note that there is no easy way to update the GeoIP information in the
metrics database yet.  The only way to do so is to manually delete and
recreate the database table and import the new GeoIP database.


1.6. Pre-calculating relay statistics
=====================================

The relay graphs on the metrics website rely on pre-calculated statistics
in the metrics database.  These statistics are not calculated after every
completed import, which would usually be once per hour.  In general it's
sufficient to pre-calculate statistics 2 or 4 times a day.

Calculate statistics manually after large imports (this may take a while):

$ psql tordir -c 'SELECT * FROM refresh_all();'

If the metrics database gets updated automatically, write a script and add
a crontab entry for pre-calculating statistics every 6 or 12 hours.
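
As a starting point, the crontab entry could be generated as sketched
below.  The function name is hypothetical, and running at minute 30 is an
assumption chosen only to avoid starting together with the hourly import
at minute 15; adjust it to the actual import duration.

```shell
#!/bin/sh
# Hypothetical helper: prints a crontab line that runs the refresh_all()
# query from this section every HOURS hours (6 or 12).
refresh_cron_line() {
  hours=$1
  case $hours in
    6|12) ;;
    *) echo "hours must be 6 or 12" >&2; return 1 ;;
  esac
  printf '%s\n' "30 */$hours * * * psql tordir -c 'SELECT * FROM refresh_all();'"
}

# Usage: print the line, then add it to the metrics user's crontab using
# crontab -e:
#   refresh_cron_line 6
```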


1.7. Generating network status information
==========================================

The metrics database importer can analyze the most recently parsed network
status consensus for irregularities indicating problems with the directory
authorities.  There are two possible outputs: the consensus-health page
that can be found at https://metrics.torproject.org/consensus-health.html
and a local file that can be parsed by Nagios that will be written to
/srv/metrics-web/website/consensus-health .

Edit /srv/metrics-web/config to contain either or both of the following
options:

WriteConsensusHealth 1
WriteNagiosStatusFile 1


1.8. Importing sanitized bridge descriptors
===========================================

The metrics database can store aggregate statistics about running bridges
and bridge usage.  These statistics are added by parsing sanitized bridge
descriptors available on the official metrics website.

Download a sanitized bridge descriptor tarball from the metrics website at
https://metrics.torproject.org/data.html#bridgedesc and extract it to,
e.g., /srv/metrics-web/bridges/bridge-descriptors-2011-05/ .

Edit /srv/metrics-web/config to contain the following options:

ImportSanitizedBridges 1
SanitizedBridgesDirectory bridges/
KeepSanitizedBridgesImportHistory 1
WriteBridgeStats 1

Note that the bridge usage statistics require parsing relay descriptors of
the same time period in order to filter bridges that have been running as
relays from the results.  When parsing sanitized bridge descriptors for
the first time it may be necessary to delete the relay descriptor import
history in /srv/metrics-web/stats/archives-import-history and import all
relay descriptors once again.
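
Deleting the import history can be done as in the following sketch.  The
function name is hypothetical; the file path is the one given above,
relative to the metrics-web directory.

```shell
#!/bin/sh
# Hypothetical helper: removes the relay descriptor import history below
# BASE so that the next import parses all relay descriptors again.
# Succeeds even if the history file does not exist.
reset_relay_import_history() {
  base=$1
  rm -f "$base/stats/archives-import-history"
}

# Usage:
#   reset_relay_import_history /srv/metrics-web
```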

Run the database import:

$ ./run.sh


1.9. Importing Torperf performance data
=======================================

Torperf measures the performance of the Tor network as users experience
it.  Torperf's measurement data are available on the metrics website and
can be imported into the metrics database, too.

Download the Torperf measurement files from the metrics website at
https://metrics.torproject.org/data.html#performance and put them in a
subdirectory, e.g., /srv/metrics-web/torperf/ .

Edit /srv/metrics-web/config to contain the following options:

ImportWriteTorperfStats 1
TorperfDirectory torperf/

Run the database import:

$ ./run.sh


1.10. Importing GetTor statistics
=================================

WARNING: The GetTor statistics are not available for download yet, so this
section applies only to the official metrics website.

GetTor is a software distribution service that allows users to fetch the
Tor software via email.  GetTor produces daily statistics of requested
packages that can be imported into the metrics database.

Put the GetTor statistics file into /srv/metrics-web/gettor/ .

Edit /srv/metrics-web/config to contain the following options:

ProcessGetTorStats 1
GetTorDirectory gettor/

Run the database import:

$ ./run.sh


2. Installing the graphing engine
=================================

The metrics graphing engine generates custom graphs of Tor network data
based on user-provided parameters.  The graphing engine requires the
metrics database to be installed as described in the previous section.

The graphing engine uses R and Rserve to generate its graphs.  Rserve is a
TCP/IP server that makes it easy for other tools to use R without spawning
their own R process.  Rserve also pre-loads R code and R libraries which
saves time when processing user requests.

In this configuration, Rserve will run in the context of the metrics user.

Setting up the graphing engine requires installing PostgreSQL's header
files and R 2.8 or higher, the minimum version required by the ggplot2
library.

# apt-get install libpq-dev r-base-dev

Run R as user metrics and install required packages to ~/R/.  In the
following, R commands will be prefixed with >.

$ R
> install.packages("Rserve")
> install.packages("ggplot2")
> install.packages("RPostgreSQL")
> q()

Start the Rserve daemon (the exact path of Rserve-bin.so may vary), check
that it's working by connecting via telnet, and shut it down:

$ R CMD ~/R/x86_64-pc-linux-gnu-library/2.11/Rserve/libs/Rserve-bin.so
$ telnet 127.0.0.1 6311
$ echo "library(Rserve); RSshutdown(RSconnect())" | R --slave

Also check that a database connection can be established from within R
(using the actual password instead of "password"):

$ R
> library(RPostgreSQL)
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, user = "metrics", password = "password",
    dbname = "tordir")
> dbDisconnect(con)
> dbUnloadDriver(drv)
> q()

Insert the database password in the Rserve initialization script in
/srv/metrics-web/rserve/rserve-init.R.

Update the workdir path in /srv/metrics-web/rserve/Rserv.conf .

Start Rserve, this time with the metrics-web-specific configuration that
includes pre-loading the graph code:

$ cd /srv/metrics-web/rserve/ && ./start.sh

Add a crontab entry to start Rserve on reboot:

@reboot cd /srv/metrics-web/rserve/ && ./start.sh

Rserve will pre-load the graph code at startup.  If changes are made to
the graph code, Rserve must be restarted:

$ cd /srv/metrics-web/rserve/
$ ./shutdown.sh && ./start.sh


3. Installing the metrics website
=================================

The metrics website lets web users search parts of the metrics database
and visualizes custom graphs.  Both the metrics database and the graphing
engine are required to set up the metrics website as described in this
section.

Note that the description here has a few specific parts that only apply to
the official metrics website.  These parts should be changed when setting
up a non-official metrics website.


3.1. Configuring Apache HTTP Server
===================================

The Apache HTTP Server is used as the front-end web server that serves
static resources itself and forwards requests for dynamic resources to
Apache Tomcat.

Start by installing Apache 2:

# apt-get install apache2

Disable Apache's default site.

# a2dissite default

Enable mod_rewrite to tell Apache where to find static resources on disk.
Also enable mod_proxy to forward requests to Tomcat.

# a2enmod rewrite proxy_http

Create a new virtual host configuration and store it in a new file
/etc/apache2/sites-available/metrics.torproject.org with the following
content:

<VirtualHost *:80>
  ServerName metrics.torproject.org
  ServerAdmin torproject-admin@torproject.org
  ErrorLog /var/log/apache2/error.log
  CustomLog /var/log/apache2/access.log combined
  ServerSignature On
  <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteRule /(data|dist|papers)/(.*) /srv/metrics-web/$1/$2 [L]
    RewriteRule /(consensus-health.html) /srv/metrics-web/website/$1 [L]
  </IfModule>
  <IfModule mod_proxy.c>
    <Proxy *>
      Order deny,allow
      Allow from all
    </Proxy>
    ProxyPass / http://127.0.0.1:8080/ernie/ retry=15
    ProxyPassReverse / http://127.0.0.1:8080/ernie/
    ProxyPreserveHost on
  </IfModule>
</VirtualHost>

Create the directories containing static resources: /srv/metrics-web/data/
contains the tarballs and other metrics data linked from data.html.
/srv/metrics-web/dist/ contains the software packages linked from
tools.html.  /srv/metrics-web/papers/ contains the papers and technical
reports linked from papers.html.  Note that there is no option not to
serve these files other than manually removing the links from the .html
pages.
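
The three directories can be created as user metrics, for example with
the sketch below.  The function name is hypothetical; the directory names
are the ones matched by the RewriteRule above.

```shell
#!/bin/sh
# Hypothetical helper: creates the static resource directories data/,
# dist/, and papers/ below BASE.  Existing directories are left untouched.
create_static_dirs() {
  base=$1
  mkdir -p "$base/data" "$base/dist" "$base/papers"
}

# Usage:
#   create_static_dirs /srv/metrics-web
```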

Enable the new virtual host.

# a2ensite metrics.torproject.org

Restart Apache just to be sure that all changes are effective.

# /etc/init.d/apache2 restart


3.2. Configuring Apache Tomcat
==============================

Apache Tomcat will process requests for dynamic resources, including web
pages and graphs.

Install Tomcat 6:

# apt-get install tomcat6

Replace Tomcat's default configuration in /etc/tomcat6/server.xml with the
following configuration:

<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true"
               compression="off" compressionMinSize="2048"
               noCompressionUserAgents="gozilla, traviata"
               compressableMimeType="text/html,text/xml,text/plain" />
    <Engine name="Catalina" defaultHost="yatei.torproject.org">
      <Host name="metrics.torproject.org" appBase="webapps"
            unpackWARs="true" autoDeploy="true"
            xmlValidation="false" xmlNamespaceAware="false">
        <Alias>yatei.torproject.org</Alias>
        <Valve className="org.apache.catalina.valves.AccessLogValve"
               directory="logs" prefix="metrics_access_log." suffix=".txt"
               pattern="%l %u %t %r %s %b" resolveHosts="false"/>
      </Host>
    </Engine>
  </Service>
</Server>

Be sure to replace *.torproject.org with something else, unless this is
a re-installation of the official metrics website.

Update the database password in /srv/metrics-web/etc/context.xml.

Update the paths starting with /srv/metrics.torproject.org/ in
/srv/metrics-web/etc/web.xml to the correct paths in /srv/metrics-web/.
The default paths in that file are correct for the official metrics
website setup, which is slightly different from the one described here.
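
A first pass over the paths could be made with sed, as sketched below.
The function name is hypothetical, and the sketch assumes that swapping
the path prefix is enough.  Because the official setup differs from this
one, review the result before replacing the original file.

```shell
#!/bin/sh
# Hypothetical helper: prints FILE with the official path prefix replaced
# by /srv/metrics-web/.  FILE itself is not modified.
rewrite_web_xml_paths() {
  sed 's|/srv/metrics\.torproject\.org/|/srv/metrics-web/|g' "$1"
}

# Usage: write the result to a new file and compare it to the original.
#   cd /srv/metrics-web/
#   rewrite_web_xml_paths etc/web.xml > /tmp/web.xml
#   diff etc/web.xml /tmp/web.xml
```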

Now generate the web application.

$ ant make-war

Create a symbolic link to the ernie.war file:

# ln -s /srv/metrics-web/ernie.war /var/lib/tomcat6/webapps/

Tomcat will now attempt to deploy the web application automatically.

Whenever the metrics website needs to be redeployed, generate a new .war
file and Tomcat will reload the web application automatically.

Restart Tomcat to make all configuration changes effective:

# /etc/init.d/tomcat6 restart

The metrics website should now work.