Plan
- UPRN does become a hard requirement. I can't find any areas that didn't give us UPRNs at all for parl.2017 (even South Ribble, Hartlepool, Blaenau Gwent, Barnsley and Torfaen, which were all possibilities)
- It will be fine for Xpress, Halarose or DCounts users
- It will be fine for GIS data
- Idox users may have problems
- There are some areas with low UPRN coverage (e.g: Haringey - only 93% with UPRNs)
- Even where there is low UPRN coverage, the number of rows with no UPRN will be roughly similar to the number of ambiguous rows we're discarding using the current processes
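To put a number on coverage for a council's file (like the Haringey figure above), something along these lines would do. It is a minimal sketch assuming each input row has already been parsed into a dict with a hypothetical "uprn" key; real import files differ by supplier (Xpress, Halarose, Idox, etc.).

```python
def uprn_coverage(rows):
    """Return (rows_with_uprn, total_rows, fraction_with_uprn).

    Assumes each row is a dict with a hypothetical "uprn" key; a missing key
    or a blank/whitespace-only value counts as "no UPRN".
    """
    total = 0
    with_uprn = 0
    for row in rows:
        total += 1
        if str(row.get("uprn") or "").strip():
            with_uprn += 1
    fraction = with_uprn / total if total else 0.0
    return with_uprn, total, fraction


rows = [{"uprn": "1001"}, {"uprn": ""}, {"uprn": " 1002 "}, {}]
print(uprn_coverage(rows))  # (2, 4, 0.5)
```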
- Postgres can't join across databases
- You can have multiple schemas in the same database and join across them
- How does Django like this setup?
- Is Django happy for multiple apps to live in separate schemas within the same database?
- Deploy changes
- How would this setup impact on your local dev copy?
You need to work out where AddressBase, ONSAD, ONSPD will live:
- Same DB?
- Different DB?
- Same DB, different schema?
Factors to consider:
- Do you want to put polling station data directly into the AddressBase table?
- Do you want to have a 1-to-1 relationship?
- Do you actually need to do JOINs or FKs?
Sort this out first - it is important and it defines constraints on your data model.
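For the "same DB, different schema" option, one way to express it in Django settings is two connection aliases pointing at the same Postgres database, each with its own search_path handed to libpq via the "options" connection parameter. This is a sketch: the alias names, the addressbase schema name, the PostGIS engine and the credentials are all assumptions. Django treats each alias as a separate connection, so the ORM still won't join across them.

```python
# Hypothetical settings.py fragment: one Postgres database, two schemas,
# exposed to Django as two aliases. Each alias is a separate connection.
_COMMON = {
    "ENGINE": "django.contrib.gis.db.backends.postgis",  # assumed GeoDjango
    "NAME": "polling_stations",
    "USER": "postgres",
    "PASSWORD": "",
    "HOST": "localhost",
    "PORT": "5432",
}

DATABASES = {
    # Everything else lives in the default "public" schema.
    "default": {**_COMMON},
    # "options" is passed through to libpq, so this connection resolves
    # unqualified table names in the addressbase schema first.
    "addressbase": {
        **_COMMON,
        "OPTIONS": {"options": "-c search_path=addressbase,public"},
    },
}
```

A DATABASE_ROUTERS entry would then decide which models go to which alias.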
[Captain Hindsight]
https://stackoverflow.com/questions/35404442/django-orm-confusion-about-router-allow-relation
In WhereDIV we can only have an FK relationship between LoggedPostcode and Council because at some point we ran the councils importer on the logger DB, so all the IDs are in that table. It isn't FK-ing across connections.
Postgres explicitly doesn't support cross-db JOINs http://wiki.postgresql.org/wiki/FAQ#How_do_I_perform_queries_using_multiple_databases.3F
- postgres_fdw - can't find ANY docs on using this with Django :(
- contrib/dblink - not supported in Django
- Different DB doesn't work
- Same DB, different schema does work, but they are still fundamentally different DB connections
- Either way you'd have to use loads of raw SQL, not the ORM
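The Stack Overflow question linked above is about Django's database router hooks. As a sketch (the router class, the addressbase app label and the alias are hypothetical), a router that sends one app to its own alias and only permits relations between objects loaded from the same alias could look like this. Django's own default, when every router returns None, is similar: it only allows relations within the same database.

```python
class AddressBaseRouter:
    """Hypothetical router: the addressbase app lives on its own alias."""

    route_app_labels = {"addressbase"}
    alias = "addressbase"

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.alias
        return None  # no opinion: fall through to other routers/default

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # _state.db is the alias an instance was loaded from (or saved to).
        # Relations (FKs) are only allowed between objects on the same
        # alias, because the ORM never joins across connections.
        return obj1._state.db == obj2._state.db

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Keep addressbase tables on their own alias and everything else off it.
        if app_label in self.route_app_labels:
            return db == self.alias
        return db != self.alias
```

It would be registered in settings via DATABASE_ROUTERS, with whatever dotted path the class ends up living at.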
- You don't need the blacklist any more
- In principle, you don't need the PollingDistrict model, but it might be practical to store districts in the DB, either for performance or for consistency checking (but don't query them interactively)
- There is no reason for ResidentialAddress and Address to be different models - you only need one address model.
- You should only need Address (AddressBase) and PollingStation (again, in principle)
- These data model changes affect the API endpoints
Most of the proposed work on the AddressBase app was done under:
- https://github.com/DemocracyClub/UK-Polling-Stations/pull/1063
- https://github.com/DemocracyClub/UK-Polling-Stations/pull/1065
- https://github.com/DemocracyClub/UK-Polling-Stations/pull/1068
- https://github.com/DemocracyClub/UK-Polling-Stations/pull/1070
You need to account for a new selection of edge cases:
- UPRN doesn't match address
- Same UPRN in files from 2 local authorities
- Duplicate UPRNs:
  - Actual dupes/conflicts
  - 1a + 1b
- UPRN not in our AddressBase
- Overlapping polygons --> race conditions
- UPRN in AddressBase but not in ONSUD
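Several of these edge cases can be detected mechanically before anything is written to the DB. A minimal sketch, assuming input rows have already been reduced to hypothetical (council_id, uprn, station_id) tuples and that the AddressBase and ONSUD UPRNs are available as sets. It does not attempt the fuzzier checks (UPRN doesn't match address, 1a + 1b, overlapping polygons).

```python
from collections import Counter, defaultdict


def find_uprn_edge_cases(rows, addressbase_uprns, onsud_uprns):
    """Bucket UPRNs from council import files by edge case.

    rows: iterable of (council_id, uprn, station_id) tuples (hypothetical shape).
    Returns a dict mapping edge-case name -> sorted list of UPRNs.
    """
    rows = list(rows)
    councils = defaultdict(set)  # uprn -> councils whose files mention it
    stations = defaultdict(set)  # uprn -> station ids it was assigned to
    for council_id, uprn, station_id in rows:
        councils[uprn].add(council_id)
        stations[uprn].add(station_id)
    row_counts = Counter(rows)  # identical rows repeated in the input

    return {
        "not_in_addressbase": sorted(u for u in councils if u not in addressbase_uprns),
        "not_in_onsud": sorted(
            u for u in councils if u in addressbase_uprns and u not in onsud_uprns
        ),
        "multiple_councils": sorted(u for u, c in councils.items() if len(c) > 1),
        "conflicting_stations": sorted(u for u, s in stations.items() if len(s) > 1),
        "exact_duplicates": sorted({row[1] for row, n in row_counts.items() if n > 1}),
    }
```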
- Data model changes - lots of work in models.py (see Database section)
- get_polling_station() will need substantial changes
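With a single address model keyed on UPRN, the core lookup in get_polling_station() could shrink to something like this. It is a sketch only: the dataclasses stand in for Django models, and the field names and the dict-based "tables" are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Address:
    """Stand-in for the single AddressBase-backed address model (hypothetical fields)."""
    uprn: str
    address: str
    postcode: str
    polling_station_id: Optional[str] = None  # attached at import time


@dataclass
class PollingStation:
    station_id: str
    address: str


def get_polling_station(
    uprn: str,
    addresses: Dict[str, Address],
    stations: Dict[str, PollingStation],
) -> Optional[PollingStation]:
    """Return the station for a UPRN, or None if we don't know it."""
    addr = addresses.get(uprn)
    if addr is None or addr.polling_station_id is None:
        return None
    return stations.get(addr.polling_station_id)
```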
Fundamental question: In the front-end, do we always show an address picker?
This has impact on
- UX/accessibility
- Licence considerations
- API spec
- API users: EC, WhoCIVF, Labour, widget (remember widget does not check EE)
- Directions source point:
- If we always show an address picker, we can use doorstep grid refs for source point
- If not, we use centroid, or inconsistently use centroid/doorstep
- EE
- If we always show a picker, we can call the EE API by grid ref and that abstracts a lot of issues - we can shift the issue to the client app
- If not, we need EE and WhereDIV to be 'in step'
- Polling districts must be a strict subset of current electoral boundaries (i.e: the votes at a single station won't be split across multiple posts), but not of historic ones. That only helps us in areas where we do hold data, though.
- Sort this out early on
- Design of data_finder app
[Captain Hindsight]
We decided we wanted to retain the "if everyone with your postcode votes at the same place, we don't show a picker" at the expense of:
- providing a centroid to EE
- using a centroid as the source point for directions
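The rule we kept ("if everyone with your postcode votes at the same place, we don't show a picker") boils down to counting distinct stations per postcode. A sketch, assuming the addresses for a postcode have been loaded as hypothetical (uprn, station_id) pairs; the returned labels are illustrative, not real view names.

```python
def resolve_postcode(addresses):
    """Decide what to show for one postcode.

    addresses: list of (uprn, station_id) pairs for the postcode.
    Returns ("station", station_id) when every address votes at the same
    place, ("picker", None) when they differ, and ("unknown", None) when
    we hold no addresses for the postcode.
    """
    station_ids = {station_id for _, station_id in addresses}
    if not station_ids:
        return ("unknown", None)
    if len(station_ids) == 1:
        # No picker: we only know the postcode, so this is the case where
        # a centroid is what goes to EE and to the directions source point.
        return ("station", next(iter(station_ids)))
    return ("picker", None)
```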
- Feedback app: No changes
- NUS Wales app: No changes
- Whitelabel app: No changes
- Large chunks of this do need to be rewritten and re-imagined.
- A lot of code just needs to be deleted, rather than anything else
Shouldn't need to make any changes to:
- s3wrapper.py
- geo_utils.py
- filehelpers.py
- loghelper.py
- Data collection views (models need a bit of changing) - should we just bin this though?
Everything else needs substantial changes.
Should you:
- Try to keep the top-level interface surface the same for import classes (i.e: try to maintain the current public method signatures/returns) and just modify the implementation details OR
- Is this your opportunity to bin it off and start again?
- Could you just re-write BaseStationsImporter, BaseDistrictsImporter (do you even need this?) and BaseAddressesImporter, and then minimise changes at the next level of abstraction? Maybe you could do that for BaseStationsImporter, but not BaseAddressesImporter
- Need to think about performance at import time as well as query time. Ensure you are not creating a crazy-slow process.
- Think about tooling for checking importers (reports, logging, etc)
- There are checks you do now (e.g: checking that the number of districts in the input file is the same as the number in the DB) that won't 'translate' to the new model. You will need to find new ways to check the data/debug import scripts.
- Fortunately there is a lot of test/sample data to play with.
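One possible replacement for the old district-count check is a per-import report of how many UPRNs from the council's file actually matched AddressBase, plus any stations that ended up with no addresses at all. A minimal sketch; the input shapes are hypothetical.

```python
def import_report(assignments, addressbase_uprns):
    """Summarise one council import.

    assignments: dict of station_id -> iterable of UPRNs from the council's file
                 (hypothetical shape).
    addressbase_uprns: set of UPRNs present in our AddressBase copy.
    """
    report = {"matched": 0, "unmatched": 0, "empty_stations": []}
    for station_id in sorted(assignments):
        uprns = list(assignments[station_id])
        matched = sum(1 for uprn in uprns if uprn in addressbase_uprns)
        report["matched"] += matched
        report["unmatched"] += len(uprns) - matched
        if matched == 0:
            # A station no address points at may be a sign that the
            # import script (or the council's file) is wrong.
            report["empty_stations"].append(station_id)
    return report
```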
[Captain Hindsight]
If you make queries like
UPDATE address SET polling_station_id='foo' WHERE uprn IN (0001, 0002...)
the performance is surprisingly good. The prototype implementation was able to attach station ids to ~7.2 million UPRNs in about 10 mins, running 4 scripts in parallel (~100 import scripts).
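The prototype's approach can be sketched as grouping UPRNs by station and emitting one parameterised UPDATE per batch. The address table and column names come from the query above; the default batch size is an arbitrary assumption, not the figure the prototype used. The statements use the %s placeholder style that both psycopg2 and Django's connection.cursor() accept.

```python
from collections import defaultdict


def build_update_statements(assignments, batch_size=1000):
    """Yield (sql, params) pairs that attach station ids to UPRNs.

    assignments: iterable of (uprn, station_id) pairs.
    UPRNs sharing a station are grouped so each statement sets one
    station id for up to batch_size UPRNs, mirroring
    UPDATE address SET polling_station_id='foo' WHERE uprn IN (...).
    """
    by_station = defaultdict(list)
    for uprn, station_id in assignments:
        by_station[station_id].append(uprn)

    for station_id in sorted(by_station):
        uprns = by_station[station_id]
        for start in range(0, len(uprns), batch_size):
            batch = uprns[start:start + batch_size]
            placeholders = ", ".join(["%s"] * len(batch))
            sql = (
                "UPDATE address SET polling_station_id = %s "
                f"WHERE uprn IN ({placeholders})"
            )
            yield sql, [station_id, *batch]
```

Each pair would be run with cursor.execute(sql, params) inside a "with connection.cursor() as cursor:" block.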
- views.py
  - Try to keep BasePollingStationView fairly consistent
  - PostcodeView and AddressView need major changes (or may both be replaced by UPRNView if everyone sees an address picker)
  - AddressForm - major rewrite
  - MultipleCouncilsView - you'd think this is not needed, but if you say "my address not in list" on a split postcode, it is still relevant.
  - WeDontKnowView - changes needed
- Helpers:
  - LoggedPostcode --> LoggedUPRN (low priority)
  - Geocoders need to go, but review the use-cases for geocode() and geocode_point_only() again. Review very thoroughly before thinking about implementation. https://gist.github.com/chris48s/3fc6b354dec4de6ae7d85b029f7ef5d1
  - get_council() - changes needed
  - AddressSorter - probably keep it, but it depends on how you are storing AddressBase.
  - EE wrapper will need to reflect changes to EE, but until you've made them, leave it as-is.
  - DirectionsHelper - Fine. Leave it as it is.
  - RoutingHelper - heavily dependent on 'do we show all users an address picker?', but definitely needs some changes. If everyone sees an address picker, do we even need this?
  - Directions clients - fine
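If AddressSorter is kept, the core of any address ordering is a "natural" sort, so that 2 High St comes before 10 High St. This is a hypothetical sketch of such a sort key, not a description of what the existing AddressSorter does.

```python
import re


def natural_key(address):
    """Sort key that compares runs of digits as numbers.

    re.split with a capturing group alternates text and digit chunks,
    so text is always compared with text and numbers with numbers.
    """
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", address)
    ]


addresses = ["10 High St", "2 High St", "1a High St", "1 High St"]
print(sorted(addresses, key=natural_key))
# ['1 High St', '1a High St', '2 High St', '10 High St']
```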
- Remember to account for Northern Ireland correctly
- Need new tests to account for new behaviour
[Captain Hindsight]
Never made it this far
- Data Quality list - changes
- Address Select - text edits, but nothing major
- Multiple Councils ??
- Postcode view - likely to need some edits to account for changes to views.
- Rest is prob. fine
[Captain Hindsight]
Never made it this far