Plan

UPRNs as a first-class Citizen in WhereDIV: 2 - Plan

Data requirements

UPRN does become a hard requirement. I can't find any areas that gave us no UPRNs at all for parl.2017 (even South Ribble, Hartlepool, Blaenau Gwent, Barnsley and Torfaen, which were all possibilities)
It will be fine for Xpress, Halarose or DCounts users
It will be fine for GIS data
Idox users may have problems
There are some areas with low UPRN coverage (e.g: Haringey - only 93% with UPRNs)
Even where there is low UPRN coverage, the number of rows with no UPRN will be roughly similar to the number of ambiguous rows we're discarding using the current processes

Where does AddressBase live?

Postgres can't join across databases
You can have multiple schemas in the same database and join across them
How does Django like this setup?
Is Django happy for multiple apps to live in separate schemas within the same database?
Deploy changes
How would this setup impact on your local dev copy?

You need to work out where AddressBase, ONSAD, ONSPD will live:

Same DB?
Different DB?
Same DB, different schema?

Factors to consider:

Do you want to put polling station data directly into the AddressBase table?
Do you want to have a 1-to-1 relationship?
Do you actually need to do JOINs or FKs?

Sort this out first - it is important and it defines constraints on your data model.

[Captain Hindsight]

https://stackoverflow.com/questions/35404442/django-orm-confusion-about-router-allow-relation

In WhereDIV we can only have a FK relationship between LoggedPostcode and Council because at some point we have run the councils importer on the logger DB, so all the IDs are in that table. It isn't FK-ing across connections.

Postgres explicitly doesn't support cross-db JOINs http://wiki.postgresql.org/wiki/FAQ#How_do_I_perform_queries_using_multiple_databases.3F

postgres_fdw - can't find ANY docs on using this with django :(

contrib/dblink - not supported in django

Different DB doesn't work

Same DB, different schema does work, but these are still fundamentally different DB connections

Either way, you'd have to use loads of raw SQL rather than the ORM
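
To make the router question from the Stack Overflow link concrete, here is a minimal sketch of a Django database router. The app label `addressbase` and the connection alias `addressbase` are assumptions for illustration, not names taken from the codebase. The four method names are Django's documented router hooks; the class itself is plain Python. Note that `allow_relation` only tells the ORM whether to permit a relation. It does not make Postgres join across connections.

```python
class AddressBaseRouter:
    """Route a hypothetical 'addressbase' app to its own connection alias."""

    app_label = "addressbase"  # assumption: the app's label
    db_alias = "addressbase"   # assumption: a key in settings.DATABASES

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return self.db_alias
        return None  # no opinion: defer to the next router or 'default'

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Permit relations only when both objects are on the same side.
        first = obj1._meta.app_label == self.app_label
        second = obj2._meta.app_label == self.app_label
        if first or second:
            return first and second
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label == self.app_label:
            return db == self.db_alias
        if db == self.db_alias:
            return False  # keep other apps' tables out of this connection
        return None
```

With a router like this, a FK from a polling station model to an AddressBase model would be refused by the ORM, which matches the "either way you need raw SQL" conclusion above.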

Database

You don't need blacklist anymore
In principle, you don't need PollingDistrict model, but it might be practical to store them in a DB either for performance or consistency checking (but don't query them interactively)
There is no reason for ResidentialAddress and Address to be different models - you only need one address model.
You should only need Address (addressbase) and PollingStation (again, in principle)
These data model changes impact on the API endpoints

AddressBase App

Most of the proposed work on the AddressBase app was done under

https://github.com/DemocracyClub/UK-Polling-Stations/pull/1063
https://github.com/DemocracyClub/UK-Polling-Stations/pull/1065
https://github.com/DemocracyClub/UK-Polling-Stations/pull/1068
https://github.com/DemocracyClub/UK-Polling-Stations/pull/1070
You need to account for a new selection of edge cases:
- UPRN doesn't match address
- Same UPRN in files from 2 local auths
- Duplicate UPRNs:
  - Actual dupes/conflicts
  - 1a + 1b
- UPRN not in our AddressBase
- Overlapping polygons --> race conditions
- UPRN in AddressBase but not in ONSUD

Polling Stations App

Data model changes - lots of work in models.py (see Database section)
get_polling_station() will need substantial changes

Address Pickers

Fundamental question: In the front-end, do we always show an address picker?

This has impact on

UX/accessibility
Licence considerations
API spec
API users: EC, WhoCIVF, Labour, widget (remember widget does not check EE)
Directions source point:
- If we always show an address picker, we can use doorstep grid refs for source point
- If not, we use centroid, or inconsistently use centroid/doorstep
EE
- If we always show a picker, we can call the EE API by grid ref and that abstracts a lot of issues - we can shift the issue to the client app
- If not, we need EE and WhereDIV to be 'in step'
- Polling Districts must be a strict subset of current electoral boundaries (i.e: the votes at a single station won't be split across multiple posts) but not historic. That only helps us in areas where we do hold data though.
- Sort this out early on
Design of data_finder app

[Captain Hindsight]

We decided we wanted to retain the "if everyone with your postcode votes at the same place, we don't show a picker" at the expense of:

providing a centroid to EE

using a centroid as the source point for directions
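
The rule we kept can be sketched roughly as follows. This is an illustration under assumptions, not the code in the data_finder app: the function name `resolve_postcode`, the `(uprn, station_id)` tuples and the three result labels are all hypothetical.

```python
def resolve_postcode(addresses):
    """Decide what to show for one postcode.

    addresses: list of (uprn, station_id) tuples for every address in
    the postcode; station_id is None where we hold no station.
    Returns ("station", id), ("picker", [uprns]) or ("unknown", None).
    """
    if not addresses:
        return ("unknown", None)

    station_ids = {station_id for _, station_id in addresses}
    if len(station_ids) > 1:
        # Split postcode: the user has to pick their address.
        return ("picker", sorted(uprn for uprn, _ in addresses))

    (station_id,) = station_ids
    if station_id is None:
        return ("unknown", None)
    # Everyone votes at the same place: no picker, so only a postcode
    # centroid is available for EE and for directions.
    return ("station", station_id)
```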

Stuff you don't need to change

Feedback app: No changes
NUS Wales app: No changes
Whitelabel app: No changes

Data Collection App

Large chunks of this do need to be rewritten and re-imagined.
A lot of code needs to be deleted, rather than anything else

Shouldn't need to make any changes to:

s3wrapper.py
geo_utils.py
filehelpers.py
loghelper.py
Data collection views (models need a bit of changing) - should we just bin this though?

Everything else needs substantial changes.

Should you:

Try to keep the top-level interface surface the same for import classes (i.e: try to maintain the current public method signatures/returns) and just modify the implementation details OR
Is this your opportunity to bin it off and start again?
Could you just re-write
- BaseStationsImporter
- BaseDistrictsImporter (do you even need this?)
- BaseAddressesImporter and then minimise changes at the next level of abstraction?

maybe you could do that for BaseStationsImporter, but not BaseAddressesImporter

Need to think about performance at import time as well as query time. Ensure you are not creating a crazy-slow process.
Think about tooling for checking importers (reports, logging, etc)
There are checks you do now (e.g: checking that the number of districts in the input file is the same as the number in the DB) that won't 'translate' to the new model. You will need to find new ways to check the data/debug import scripts.
Fortunately, there is a lot of test/sample data to play with.

[Captain Hindsight]

If you make queries like UPDATE address SET polling_station_id='foo' WHERE uprn IN (0001, 0002...) the performance is surprisingly good. The prototype implementation was able to attach station ids to ~7.2 million UPRNs in about 10 mins, running 4x scripts in parallel (~100 import scripts).
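
A minimal sketch of that batched update pattern follows. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere. The table and column names come from the query above, while the helper name `attach_station` and the `batch_size` default are assumptions. It illustrates the query shape only and says nothing about the timings quoted above.

```python
import sqlite3


def attach_station(conn, station_id, uprns, batch_size=500):
    """Set polling_station_id for the given UPRNs, one IN (...) per batch.

    Returns the number of rows actually updated.
    """
    uprns = list(uprns)
    updated = 0
    for start in range(0, len(uprns), batch_size):
        batch = uprns[start:start + batch_size]
        # Only the placeholders are interpolated; values are bound.
        placeholders = ", ".join("?" for _ in batch)
        cursor = conn.execute(
            "UPDATE address SET polling_station_id = ? "
            f"WHERE uprn IN ({placeholders})",
            [station_id, *batch],
        )
        updated += cursor.rowcount
    conn.commit()
    return updated


# Tiny in-memory demo; UPRNs are stored as text to keep leading zeros.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address (uprn TEXT PRIMARY KEY, polling_station_id TEXT)"
)
conn.executemany(
    "INSERT INTO address (uprn) VALUES (?)",
    [("0001",), ("0002",), ("0003",)],
)
attach_station(conn, "foo", ["0001", "0002"])
```

UPRNs that are not in the address table simply match no rows, so comparing the returned count with the number of UPRNs in the input file is one cheap check for the "UPRN not in our AddressBase" case.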

Data Finder App

views.py
- Try to keep BasePollingStationView fairly consistent
- PostcodeView and AddressView need major changes (or may both be replaced by UPRNView if everyone sees an address picker)
- AddressForm - major rewrite
- MultipleCouncilsView - you'd think this is not needed, but if you say "my address not in list" on a split postcode, it is still relevant.
- WeDontKnowView - changes needed
Helpers:
- LoggedPostcode --> LoggedUPRN (low priority)
- Geocoders need to go, but review the use-cases for geocode() and geocode_point_only() again. Review v. thoroughly before thinking about implementation. https://gist.github.com/chris48s/3fc6b354dec4de6ae7d85b029f7ef5d1
- get_council() - changes needed
- AddressSorter - probably keep it, but depends on how you are storing AddressBase.
- EE wrapper will need to reflect changes to EE, but until you've made them, leave it as-is.
- DirectionsHelper - Fine. Leave it as it is.
- RoutingHelper - heavily dependent on 'do we show all users an address picker'? but definitely needs some changes. If everyone sees an address picker, do we even need this?
- Directions clients - fine
Remember to account for Northern Ireland correctly
Need new tests to account for new behaviour

[Captain Hindsight]

Never made it this far

Templates

Data Quality list - changes
Address Select - text edits, but nothing major
Multiple Councils ??
Postcode view - likely to need some edits to account for changes to views.
Rest is prob. fine

[Captain Hindsight]

Never made it this far
