
.. _healthreport_identifiers:

===========
Identifiers
===========

Firefox Health Report records some identifiers to keep track of clients
and uploaded documents.

Identifier Types
================

Document/Upload IDs
-------------------

A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
client creates or uploads a new document.

When clients generate a new *Document ID*, they persist this ID to disk
**before** the upload attempt.

As part of the upload, the client sends all old *Document IDs* to the server
and asks the server to delete them. For well-behaved clients, the server
therefore holds a single record per client, and that record's *Document ID*
changes to a new random value with each new document.
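
The upload sequence described above can be sketched as follows. This is a
minimal illustration, not the FHR client's actual code: the
``DocumentIdStore`` class, the ``upload_document`` function, the JSON file
format, and the ``send`` callback are all hypothetical names introduced here.
The sketch assumes that the point of persisting the ID first is that a failed
or interrupted upload still leaves a record of the ID for later deletion.

```python
import json
import uuid
from pathlib import Path


class DocumentIdStore:
    """Hypothetical on-disk list of Document IDs the server may still hold."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        # No file yet means no document has ever been uploaded.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def save(self, ids):
        self.path.write_text(json.dumps(ids))


def upload_document(store, payload, send):
    """Upload ``payload`` under a fresh Document ID and retire the old IDs.

    ``send(new_id, payload, delete_ids)`` is a hypothetical transport
    callback that returns True when the server accepted the upload and
    deleted the documents listed in ``delete_ids``.
    """
    old_ids = store.load()
    new_id = str(uuid.uuid4())  # random UUID

    # Persist the new ID *before* the upload attempt.
    store.save(old_ids + [new_id])

    if send(new_id, payload, old_ids):
        # The server now holds only the new document.
        store.save([new_id])
    # On failure every ID stays on disk and is sent for deletion next time.
    return new_id
```

In this sketch a failed upload leaves both the old and the new ID on disk, so
the next successful upload asks the server to delete both.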

Client IDs
----------

A *Client ID* is an identifier that **attempts** to uniquely identify an
individual FHR client. Please note the emphasis on *attempts* in that last
sentence: *Client IDs* do not guarantee uniqueness.

The *Client ID* is generated when the client first runs or as needed.

The *Client ID* is transferred to the server as part of every upload. The
server is thus able to associate multiple document uploads with a single
*Client ID*.
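
As an illustration of that association, the sketch below groups uploaded
documents by *Client ID*. It is not the server's actual implementation: the
function names and the ``(client_id, document_id)`` tuple layout are
assumptions made for this example.

```python
from collections import defaultdict


def group_by_client(uploads):
    """Map each Client ID to the Document IDs uploaded under it.

    ``uploads`` is an iterable of hypothetical (client_id, document_id)
    pairs, in the order the server received them.
    """
    by_client = defaultdict(list)
    for client_id, document_id in uploads:
        by_client[client_id].append(document_id)
    return dict(by_client)


def clients_with_multiple_documents(uploads):
    """Return only the clients that have more than one document on record."""
    return {
        client_id: document_ids
        for client_id, document_ids in group_by_client(uploads).items()
        if len(document_ids) > 1
    }
```

A client that appears in the second function's result has more than one
document on the server, which a lookup keyed on *Client ID* can find without
comparing payload contents.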

Client ID Versions
^^^^^^^^^^^^^^^^^^

The semantics for how a *Client ID* is generated are versioned.

Version 1
   The *Client ID* is a randomly-generated UUID.
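
A version 1 *Client ID* could be generated and recorded along the following
lines. This is a sketch under stated assumptions: the ``state`` dictionary
and the ``clientID`` and ``clientIDVersion`` keys are hypothetical names, and
how the state is persisted is left to the caller.

```python
import uuid

# Version of the Client ID generation semantics, not the UUID version.
CLIENT_ID_VERSION = 1


def ensure_client_id(state):
    """Return the Client ID held in ``state``, generating one if needed.

    ``state`` is a hypothetical dict of persisted client state. The
    caller is responsible for writing it back to disk.
    """
    if "clientID" not in state:
        # Version 1 semantics: a randomly-generated UUID.
        state["clientID"] = str(uuid.uuid4())
        state["clientIDVersion"] = CLIENT_ID_VERSION
    return state["clientID"]
```

Storing the version next to the ID lets a reader of the state tell which
generation semantics produced the value.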

History of Identifiers
======================

In the beginning, there were just *Document IDs*. The thinking was that
clients would clean up after themselves and leave at most one active document
on the server.

Unfortunately, this did not work out. Brute-force analysis to deduplicate
records on the server revealed a number of interesting patterns.

Orphaning
   Clients would upload a new payload while not deleting the old payload.

Divergent records
   Records would share data up to a certain date and then the data would
   almost completely diverge. This appears to be indicative of profile
   copying.

Rollback
   Records would share data up to a certain date. Beyond that date, each
   record in the set would contain only a day or two of data. This could be
   explained by filesystem rollback on the client.

A significant percentage of the records on the server belonged to
misbehaving clients. Identifying these records was extremely resource
intensive and error-prone. These records were undermining the ability
to use Firefox Health Report data.

Thus, the *Client ID* was born. The intent of the *Client ID* was to
uniquely identify clients so the extreme effort required and the
questionable reliability of deduplicating server data would become
problems of the past.

The *Client ID* was originally a randomly-generated UUID (version 1). This
allowed detection of orphaning and rollback. However, a version 1
*Client ID* could still end up in use on multiple profiles and machines
if the profile was copied.