Stats and Feedback
We record usage in the `data_finder_loggedpostcode` table on the logger DB. Each record represents a search made via the front-end or the API.
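If you want a feel for what a logged search looks like, peek at the most recent rows. This is only a sketch; the column list is taken from the stats queries further down (check `\d data_finder_loggedpostcode` in psql for the full schema):

```sql
-- The columns named here are the ones the stats queries below filter and group by.
SELECT created, postcode, brand, api_user, utm_source, utm_medium, has_election, had_data
FROM data_finder_loggedpostcode
ORDER BY created DESC
LIMIT 10;
```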
After an election, I usually produce stats in the following way:
- Pick the interval we care about. I usually use midnight on Monday to close of polls at 10pm on polling day for WDIV, as almost all of the meaningful traffic WDIV sees is in this period. Remember that timestamps are UTC, so adjust depending on whether we're in BST or GMT. For example, for the Euro elections in May 2019 the interval was `2019-05-19 23:00:00` to `2019-05-23 21:00:00`, whereas for the general election in December 2019 it was `2019-12-09 00:00:00` to `2019-12-12 22:00:00`. Usually major scheduled elections are in May, so we're using BST (there's a conversion sketch after this list).
- Run a query like this on the prod logger DB:

  ```sql
  SELECT *
  FROM data_finder_loggedpostcode
  WHERE postcode NOT IN ('CF105AJ', 'BS44NN', 'GL12EQ', 'EXAMPLE')
    AND created > '2019-05-19 23:00:00'
    AND created <= '2019-05-23 21:00:00';
  ```

  `CF105AJ` and `BS44NN` are postcodes that get used by load balancer pings or updown, so I exclude them to remove junk data from the stats. I then export this to a local file `raw.csv` (see the export sketch after this list).
- Once I've got that, I pull the rest of the stats out locally instead of running queries against a production server. You could load `raw.csv` into a DB; I usually leave it as a CSV file and query it using q on the console.
- Run some queries. The queries I usually run are:

  ```sh
  q -H -d "," "SELECT COUNT(*) FROM ./raw.csv WHERE created>'2019-05-22 23:00:00';"
  q -H -d "," "SELECT brand, COUNT(*) FROM ./raw.csv GROUP BY brand ORDER BY COUNT(*);" > by_brand.csv
  q -H -d "," "SELECT utm_medium, COUNT(*) FROM ./raw.csv GROUP BY utm_medium ORDER BY COUNT(*);" > by_utm_medium.csv
  q -H -d "," "SELECT api_user, COUNT(*) FROM ./raw.csv GROUP BY api_user ORDER BY COUNT(*);" > by_api_user.csv
  q -H -d "," "SELECT utm_source, COUNT(*) FROM ./raw.csv GROUP BY utm_source ORDER BY COUNT(*);" > by_utm_source.csv
  q -H -d "," "SELECT has_election, COUNT(*) FROM ./raw.csv GROUP BY has_election ORDER BY COUNT(*);" > by_has_election.csv
  q -H -d "," "SELECT had_data, COUNT(*) FROM ./raw.csv GROUP BY had_data ORDER BY COUNT(*);" > by_had_data.csv
  q -H -d "," "SELECT had_data, COUNT(*) FROM ./raw.csv WHERE has_election='t' GROUP BY had_data ORDER BY COUNT(*);" > by_had_data_had_election.csv
  ```
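On the BST/GMT point: rather than working out the offset by hand, you can let the database do the conversion. This is just a sketch, assuming the logger DB is Postgres; the literal times are the poll-close examples from above:

```sql
-- Interpret a local (Europe/London) poll-close time and express it in UTC.
SELECT ('2019-05-23 22:00:00'::timestamp AT TIME ZONE 'Europe/London') AT TIME ZONE 'UTC' AS euro_2019_close_utc,
       ('2019-12-12 22:00:00'::timestamp AT TIME ZONE 'Europe/London') AT TIME ZONE 'UTC' AS ge_2019_close_utc;
-- Expect 2019-05-23 21:00:00 (BST, UTC+1) and 2019-12-12 22:00:00 (GMT, UTC+0).
```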
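For exporting `raw.csv`: any CSV export works, but if you're already in psql on the logger DB, one option is the `\copy` meta-command. A sketch (note `\copy` has to go on a single line; the interval is the May 2019 example again):

```sql
-- Client-side CSV export from psql; adjust the interval for the election you're reporting on.
\copy (SELECT * FROM data_finder_loggedpostcode WHERE postcode NOT IN ('CF105AJ', 'BS44NN', 'GL12EQ', 'EXAMPLE') AND created > '2019-05-19 23:00:00' AND created <= '2019-05-23 21:00:00') TO 'raw.csv' WITH CSV HEADER
```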
We also record feedback from users. This is stored in the `feedback_feedback` table on the logger DB. I usually export it for the same time range we look at stats for, e.g:

```sql
SELECT * FROM feedback_feedback
WHERE created > '2019-05-19 23:00:00'
  AND created <= '2019-05-23 21:00:00';
```
Generally the volume of feedback is sufficiently low that we can deal with this in a spreadsheet.
Athena logging needs a table mapping postcodes to local authorities. We get this from EE, and it is election-date and ONSPD specific, so name the file accordingly. The query below, run against the EE database, generates it:
```sql
WITH org_geoms AS (
    SELECT oo.id AS org_id,
           (SELECT og.gss
            FROM organisations_organisationgeography og
            WHERE og.organisation_id = oo.id
            ORDER BY og.start_date DESC
            LIMIT 1) AS gss,
           (SELECT og.geography
            FROM organisations_organisationgeography og
            WHERE og.organisation_id = oo.id
            ORDER BY og.start_date DESC
            LIMIT 1) AS geom
    FROM organisations_organisation oo
)
SELECT onspd.pcds,
       oo.official_identifier,
       oo.common_name,
       onspd.nuts,
       oo.organisation_type,
       oo.organisation_subtype,
       org_geoms.gss
FROM organisations_organisation oo
JOIN org_geoms ON oo.id = org_geoms.org_id
JOIN uk_geo_utils_onspd onspd ON ST_WITHIN(onspd.location, org_geoms.geom)
WHERE oo.organisation_type = 'local-authority'
  AND (oo.end_date IS NULL OR oo.end_date > '2022-05-05'::date)
```
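Since the file is election-date and ONSPD specific, it's easiest to write the query output straight to an appropriately named CSV. A sketch using psql's output redirection against the EE database; the filename is only an illustration of the naming convention (the 2022-05-05 date matches the `end_date` filter above, and the ONSPD release named here is made up):

```sql
-- Run from psql connected to the EE database.
\pset format csv
\o postcode_to_council_2022-05-05_onspd-2021-11.csv
-- paste and run the query above, then reset the output target:
\o
\pset format aligned
```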