mirror of
https://gitlab.winehq.org/wine/wine-gecko.git
synced 2024-09-13 09:24:08 -07:00
Bug 968419 - Store and submit a persistent health report identifier; r=rnewman, r=bsmedberg
Up to this point, Firefox Health Report has generated and submitted a random UUID with each upload. Generated UUIDs were stored on the client. During upload, the client asked the server to delete all old UUIDs. Well-behaving clients thus left at most one record/ID on the server. Unfortunately, clients in the wild have not been behaving properly. We are seeing multiple documents on the server that appear to come from the same client. Clients are uploading new records but failing to delete the old ones. These old, undeleted "orphan" records are severely impacting the ability to derive useful knowledge from FHR data because it is difficult, resource intensive, and error prone to filter the records on the server. This is undermining the ability for FHR data to be put to good use. This patch introduces a persistent client identifier. When the client is initialized, it generates a random UUID. That UUID is persisted to the profile and sent as part of every upload. For privacy reasons, if a client opts out of data submission, the client ID will be reset as soon as all remote data has been deleted. We still issue and send upload IDs. They exist mostly for forensics purposes so we may log client behavior and more accurately determine what exactly misbehaving, orphan-producing clients are doing. It is worth noting that this persistent client identifier will not solve all problems of branching and orphaned records. For example, profile copying will result in multiple clients sharing a client identifier. A "client ID version" field has been added to facilitate an upgrade path towards client IDs with different generation semantics. --HG-- extra : rebase_source : b761daab39fb07b6ab8883819d68bf53462314a0
This commit is contained in:
parent
5df4ee778c
commit
4daf05ec72
@ -387,6 +387,21 @@ thisPingDate
|
||||
version
|
||||
Integer version of this payload format. Currently only 1 is defined.
|
||||
|
||||
clientID
|
||||
An identifier that identifies the client that is submitting data.
|
||||
|
||||
This property may not be present in older clients.
|
||||
|
||||
See :ref:`healthreport_identifiers` for more info on identifiers.
|
||||
|
||||
clientIDVersion
|
||||
Integer version associated with the generation semantics for the
|
||||
``clientID``.
|
||||
|
||||
If the value is ``1``, ``clientID`` is a randomly-generated UUID.
|
||||
|
||||
This property may not be present in older clients.
|
||||
|
||||
data
|
||||
Object holding data constituting health report.
|
||||
|
||||
|
83
services/healthreport/docs/identifiers.rst
Normal file
83
services/healthreport/docs/identifiers.rst
Normal file
@ -0,0 +1,83 @@
|
||||
.. _healthreport_identifiers:
|
||||
|
||||
===========
|
||||
Identifiers
|
||||
===========
|
||||
|
||||
Firefox Health Report records some identifiers to keep track of clients
|
||||
and uploaded documents.
|
||||
|
||||
Identifier Types
|
||||
================
|
||||
|
||||
Document/Upload IDs
|
||||
-------------------
|
||||
|
||||
A random UUID called the *Document ID* or *Upload ID* is generated when the FHR
|
||||
client creates or uploads a new document.
|
||||
|
||||
When clients generate a new *Document ID*, they persist this ID to disk
|
||||
**before** the upload attempt.
|
||||
|
||||
As part of the upload, the client sends all old *Document IDs* to the server
|
||||
and asks the server to delete them. In well-behaving clients, the server
|
||||
has a single record for each client with a randomly-changing *Document ID*.
|
||||
|
||||
Client IDs
|
||||
----------
|
||||
|
||||
A *Client ID* is an identifier that **attempts** to uniquely identify an
|
||||
individual FHR client. Please note the emphasis on *attempts* in that last
|
||||
sentence: *Client IDs* do not guarantee uniqueness.
|
||||
|
||||
The *Client ID* is generated when the client first runs or as needed.
|
||||
|
||||
The *Client ID* is transferred to the server as part of every upload. The
|
||||
server is thus able to affiliate multiple document uploads with a single
|
||||
*Client ID*.
|
||||
|
||||
Client ID Versions
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The semantics for how a *Client ID* is generated are versioned.
|
||||
|
||||
Version 1
|
||||
The *Client ID* is a randomly-generated UUID.
|
||||
|
||||
History of Identifiers
|
||||
======================
|
||||
|
||||
In the beginning, there were just *Document IDs*. The thinking was clients
|
||||
would clean up after themselves and leave at most 1 active document on the
|
||||
server.
|
||||
|
||||
Unfortunately, this did not work out. Using brute force analysis to
|
||||
deduplicate records on the server, a number of interesting patterns emerged.
|
||||
|
||||
Orphaning
|
||||
Clients would upload a new payload while not deleting the old payload.
|
||||
|
||||
Divergent records
|
||||
Records would share data up to a certain date and then the data would
|
||||
almost completely diverge. This appears to be indicative of profile
|
||||
copying.
|
||||
|
||||
Rollback
|
||||
Records would share data up to a certain date. Each record in this set
|
||||
would contain data for a day or two but no extra data. This could be
|
||||
explained by filesystem rollback on the client.
|
||||
|
||||
A significant percentage of the records on the server belonged to
|
||||
misbehaving clients. Identifying these records was extremely resource
|
||||
intensive and error-prone. These records were undermining the ability
|
||||
to use Firefox Health Report data.
|
||||
|
||||
Thus, the *Client ID* was born. The intent of the *Client ID* was to
|
||||
uniquely identify clients so the extreme effort required and the
|
||||
questionable reliability of deduplicating server data would become
|
||||
problems of the past.
|
||||
|
||||
The *Client ID* was originally a randomly-generated UUID (version 1). This
|
||||
allowed detection of orphaning and rollback. However, these version 1
|
||||
*Client IDs* were still susceptible to use on multiple profiles and
|
||||
machines if the profile was copied.
|
@ -24,6 +24,7 @@ are very specific to what the Firefox Health Report feature requires.
|
||||
|
||||
architecture
|
||||
dataformat
|
||||
identifiers
|
||||
|
||||
Legal and Privacy Concerns
|
||||
==========================
|
||||
|
@ -60,11 +60,19 @@ const TELEMETRY_COLLECT_CHECKPOINT = "HEALTHREPORT_POST_COLLECT_CHECKPOINT_MS";
|
||||
*
|
||||
* Instances are not meant to be created outside of a HealthReporter instance.
|
||||
*
|
||||
* Please note that remote IDs are treated as a queue. When a new remote ID is
|
||||
* added, it goes at the back of the queue. When we look for the current ID, we
|
||||
* pop the ID at the front of the queue. This helps ensure that all IDs on the
|
||||
* server are eventually deleted. This should eventually become irrelevant once
|
||||
* the server supports multiple ID deletion.
|
||||
* There are two types of IDs associated with clients.
|
||||
*
|
||||
* Since the beginning of FHR, there has existed a per-upload ID: a UUID is
|
||||
* generated at upload time and associated with the state before upload starts.
|
||||
* That same upload includes a request to delete all other upload IDs known by
|
||||
* the client.
|
||||
*
|
||||
* Per-upload IDs had the unintended side-effect of creating "orphaned"
|
||||
* records/upload IDs on the server. So, a stable client identifer has been
|
||||
* introduced. This client identifier is generated when it's missing and sent
|
||||
* as part of every upload.
|
||||
*
|
||||
* There is a high chance we may remove upload IDs in the future.
|
||||
*/
|
||||
function HealthReporterState(reporter) {
|
||||
this._reporter = reporter;
|
||||
@ -89,6 +97,20 @@ function HealthReporterState(reporter) {
|
||||
}
|
||||
|
||||
HealthReporterState.prototype = Object.freeze({
|
||||
/**
|
||||
* Persistent string identifier associated with this client.
|
||||
*/
|
||||
get clientID() {
|
||||
return this._s.clientID;
|
||||
},
|
||||
|
||||
/**
|
||||
* The version associated with the client ID.
|
||||
*/
|
||||
get clientIDVersion() {
|
||||
return this._s.clientIDVersion;
|
||||
},
|
||||
|
||||
get lastPingDate() {
|
||||
return new Date(this._s.lastPingTime);
|
||||
},
|
||||
@ -117,9 +139,19 @@ HealthReporterState.prototype = Object.freeze({
|
||||
|
||||
let resetObjectState = function () {
|
||||
this._s = {
|
||||
// The payload version. This is bumped whenever there is a
|
||||
// backwards-incompatible change.
|
||||
v: 1,
|
||||
// The persistent client identifier.
|
||||
clientID: CommonUtils.generateUUID(),
|
||||
// Denotes the mechanism used to generate the client identifier.
|
||||
// 1: Random UUID.
|
||||
clientIDVersion: 1,
|
||||
// Upload IDs that might be on the server.
|
||||
remoteIDs: [],
|
||||
// When we last performed an uploaded.
|
||||
lastPingTime: 0,
|
||||
// Tracks whether we removed an outdated payload.
|
||||
removedOutdatedLastpayload: false,
|
||||
};
|
||||
}.bind(this);
|
||||
@ -154,6 +186,23 @@ HealthReporterState.prototype = Object.freeze({
|
||||
// comes along and fixes us.
|
||||
}
|
||||
|
||||
let regen = false;
|
||||
if (!this._s.clientID) {
|
||||
this._log.warn("No client ID stored. Generating random ID.");
|
||||
regen = true;
|
||||
}
|
||||
|
||||
if (typeof(this._s.clientID) != "string") {
|
||||
this._log.warn("Client ID is not a string. Regenerating.");
|
||||
regen = true;
|
||||
}
|
||||
|
||||
if (regen) {
|
||||
this._s.clientID = CommonUtils.generateUUID();
|
||||
this._s.clientIDVersion = 1;
|
||||
yield this.save();
|
||||
}
|
||||
|
||||
// Always look for preferences. This ensures that downgrades followed
|
||||
// by reupgrades don't result in excessive data loss.
|
||||
for (let promise of this._migratePrefs()) {
|
||||
@ -214,6 +263,24 @@ HealthReporterState.prototype = Object.freeze({
|
||||
return this.removeRemoteIDs(ids);
|
||||
},
|
||||
|
||||
/**
|
||||
* Reset the client ID to something else.
|
||||
*
|
||||
* This fails if remote IDs are stored because changing the client ID
|
||||
* while there is remote data will create orphaned records.
|
||||
*/
|
||||
resetClientID: function () {
|
||||
if (this.remoteIDs.length) {
|
||||
throw new Error("Cannot reset client ID while remote IDs are stored.");
|
||||
}
|
||||
|
||||
this._log.warn("Resetting client ID.");
|
||||
this._s.clientID = CommonUtils.generateUUID();
|
||||
this._s.clientIDVersion = 1;
|
||||
|
||||
return this.save();
|
||||
},
|
||||
|
||||
_migratePrefs: function () {
|
||||
let prefs = this._reporter._prefs;
|
||||
|
||||
@ -897,6 +964,8 @@ AbstractHealthReporter.prototype = Object.freeze({
|
||||
|
||||
let o = {
|
||||
version: 2,
|
||||
clientID: this._state.clientID,
|
||||
clientIDVersion: this._state.clientIDVersion,
|
||||
thisPingDate: pingDateString,
|
||||
geckoAppInfo: this.obtainAppInfo(this._log),
|
||||
data: {last: {}, days: {}},
|
||||
@ -1457,9 +1526,23 @@ this.HealthReporter.prototype = Object.freeze({
|
||||
this._log.warn("Deleting remote data.");
|
||||
let client = new BagheeraClient(this.serverURI);
|
||||
|
||||
return client.deleteDocument(this.serverNamespace, this.lastSubmitID)
|
||||
.then(this._onBagheeraResult.bind(this, request, true, this._now()),
|
||||
this._onSubmitDataRequestFailure.bind(this));
|
||||
return Task.spawn(function* doDelete() {
|
||||
try {
|
||||
let result = yield client.deleteDocument(this.serverNamespace,
|
||||
this.lastSubmitID);
|
||||
yield this._onBagheeraResult(request, true, this._now(), result);
|
||||
} catch (ex) {
|
||||
this._log.error("Error processing request to delete data: " +
|
||||
CommonUtils.exceptionStr(error));
|
||||
} finally {
|
||||
// If we don't have any remote documents left, nuke the ID.
|
||||
// This is done for privacy reasons. Why preserve the ID if we
|
||||
// don't need to?
|
||||
if (!this.haveRemoteData()) {
|
||||
yield this._state.resetClientID();
|
||||
}
|
||||
}
|
||||
}.bind(this));
|
||||
},
|
||||
});
|
||||
|
||||
|
@ -110,6 +110,8 @@ add_task(function test_constructor() {
|
||||
do_check_eq(typeof(reporter._state), "object");
|
||||
do_check_eq(reporter._state.lastPingDate.getTime(), 0);
|
||||
do_check_eq(reporter._state.remoteIDs.length, 0);
|
||||
do_check_eq(reporter._state.clientIDVersion, 1);
|
||||
do_check_neq(reporter._state.clientID, null);
|
||||
|
||||
let failed = false;
|
||||
try {
|
||||
@ -293,6 +295,9 @@ add_task(function test_remove_old_lastpayload() {
|
||||
add_task(function test_json_payload_simple() {
|
||||
let reporter = yield getReporter("json_payload_simple");
|
||||
|
||||
let clientID = reporter._state.clientID;
|
||||
do_check_neq(clientID, null);
|
||||
|
||||
try {
|
||||
let now = new Date();
|
||||
let payload = yield reporter.getJSONPayload();
|
||||
@ -301,6 +306,9 @@ add_task(function test_json_payload_simple() {
|
||||
|
||||
do_check_eq(original.version, 2);
|
||||
do_check_eq(original.thisPingDate, reporter._formatDate(now));
|
||||
do_check_eq(original.clientID, clientID);
|
||||
do_check_eq(original.clientIDVersion, reporter._state.clientIDVersion);
|
||||
do_check_eq(original.clientIDVersion, 1);
|
||||
do_check_eq(Object.keys(original.data.last).length, 0);
|
||||
do_check_eq(Object.keys(original.data.days).length, 0);
|
||||
do_check_false("notInitialized" in original);
|
||||
@ -310,6 +318,7 @@ add_task(function test_json_payload_simple() {
|
||||
|
||||
original = JSON.parse(yield reporter.getJSONPayload());
|
||||
do_check_eq(original.lastPingDate, reporter._formatDate(reporter.lastPingDate));
|
||||
do_check_eq(original.clientID, clientID);
|
||||
|
||||
// This could fail if we cross UTC day boundaries at the exact instance the
|
||||
// test is executed. Let's tempt fate.
|
||||
@ -636,6 +645,7 @@ add_task(function test_data_submission_success() {
|
||||
|
||||
let d = reporter.lastPingDate;
|
||||
let id = reporter.lastSubmitID;
|
||||
let clientID = reporter._state.clientID;
|
||||
|
||||
reporter._shutdown();
|
||||
|
||||
@ -643,6 +653,7 @@ add_task(function test_data_submission_success() {
|
||||
reporter = yield getReporter("data_submission_success");
|
||||
do_check_eq(reporter.lastSubmitID, id);
|
||||
do_check_eq(reporter.lastPingDate.getTime(), d.getTime());
|
||||
do_check_eq(reporter._state.clientID, clientID);
|
||||
|
||||
reporter._shutdown();
|
||||
} finally {
|
||||
@ -703,6 +714,9 @@ add_task(function test_request_remote_data_deletion() {
|
||||
do_check_neq(id, null);
|
||||
do_check_true(server.hasDocument(reporter.serverNamespace, id));
|
||||
|
||||
let clientID = reporter._state.clientID;
|
||||
do_check_neq(clientID, null);
|
||||
|
||||
defineNow(policy, policy._futureDate(10 * 1000));
|
||||
|
||||
let promise = reporter.requestDeleteRemoteData();
|
||||
@ -711,6 +725,16 @@ add_task(function test_request_remote_data_deletion() {
|
||||
do_check_null(reporter.lastSubmitID);
|
||||
do_check_false(reporter.haveRemoteData());
|
||||
do_check_false(server.hasDocument(reporter.serverNamespace, id));
|
||||
|
||||
// Client ID should be updated.
|
||||
do_check_neq(reporter._state.clientID, null);
|
||||
do_check_neq(reporter._state.clientID, clientID);
|
||||
do_check_eq(reporter._state.clientIDVersion, 1);
|
||||
|
||||
// And it should be persisted to disk.
|
||||
let o = yield CommonUtils.readJSON(reporter._state._filename);
|
||||
do_check_eq(o.clientID, reporter._state.clientID);
|
||||
do_check_eq(o.clientIDVersion, 1);
|
||||
} finally {
|
||||
reporter._shutdown();
|
||||
yield shutdownServer(server);
|
||||
@ -1096,3 +1120,78 @@ add_task(function test_state_downgrade_upgrade() {
|
||||
reporter._shutdown();
|
||||
}
|
||||
});
|
||||
|
||||
// Missing client ID in state should be created on state load.
|
||||
add_task(function* test_state_create_client_id() {
|
||||
let reporter = getHealthReporter("state_create_client_id");
|
||||
|
||||
yield CommonUtils.writeJSON({
|
||||
v: 1,
|
||||
remoteIDs: ["id1", "id2"],
|
||||
lastPingTime: Date.now(),
|
||||
removeOutdatedLastPayload: true,
|
||||
}, reporter._state._filename);
|
||||
|
||||
try {
|
||||
yield reporter.init();
|
||||
|
||||
do_check_eq(reporter.lastSubmitID, "id1");
|
||||
do_check_neq(reporter._state.clientID, null);
|
||||
do_check_eq(reporter._state.clientID.length, 36);
|
||||
do_check_eq(reporter._state.clientIDVersion, 1);
|
||||
|
||||
let clientID = reporter._state.clientID;
|
||||
|
||||
// The client ID should be persisted as soon as it is created.
|
||||
reporter._shutdown();
|
||||
|
||||
reporter = getHealthReporter("state_create_client_id");
|
||||
yield reporter.init();
|
||||
do_check_eq(reporter._state.clientID, clientID);
|
||||
} finally {
|
||||
reporter._shutdown();
|
||||
}
|
||||
});
|
||||
|
||||
// Invalid stored client ID is reset automatically.
|
||||
add_task(function* test_empty_client_id() {
|
||||
let reporter = getHealthReporter("state_empty_client_id");
|
||||
|
||||
yield CommonUtils.writeJSON({
|
||||
v: 1,
|
||||
clientID: "",
|
||||
remoteIDs: ["id1", "id2"],
|
||||
lastPingTime: Date.now(),
|
||||
removeOutdatedLastPayload: true,
|
||||
}, reporter._state._filename);
|
||||
|
||||
try {
|
||||
yield reporter.init();
|
||||
|
||||
do_check_neq(reporter._state.clientID, null);
|
||||
do_check_eq(reporter._state.clientID.length, 36);
|
||||
} finally {
|
||||
reporter._shutdown();
|
||||
}
|
||||
});
|
||||
|
||||
add_task(function* test_nonstring_client_id() {
|
||||
let reporter = getHealthReporter("state_nonstring_client_id");
|
||||
|
||||
yield CommonUtils.writeJSON({
|
||||
v: 1,
|
||||
clientID: 42,
|
||||
remoteIDs: ["id1", "id2"],
|
||||
lastPingTime: Date.now(),
|
||||
remoteOutdatedLastPayload: true,
|
||||
}, reporter._state._filename);
|
||||
|
||||
try {
|
||||
yield reporter.init();
|
||||
|
||||
do_check_neq(reporter._state.clientID, null);
|
||||
do_check_eq(reporter._state.clientID.length, 36);
|
||||
} finally {
|
||||
reporter._shutdown();
|
||||
}
|
||||
});
|
||||
|
Loading…
Reference in New Issue
Block a user