UK - Constituency changes for next HoC general election #385

Merged
merged 9 commits into from
Jun 24, 2024

Conversation

sguenther85
Contributor

Following the 2023 Periodic Review of Westminster constituencies, the UK government introduced many changes for the next House of Commons (HoC) general election through the Parliamentary Constituencies Order 2023.

Including:
211 newly named constituencies (lots of redistricting)
Many abolished constituency names
Some disappearing and newly created seats

See Wikipedia for a summary, and the second link below for the legislation, which includes a table listing the new constituencies (PDF).
https://en.wikipedia.org/wiki/2023_Periodic_Review_of_Westminster_constituencies#New_and_abolished_constituencies
https://www.legislation.gov.uk/uksi/2023/1230/pdfs/uksi_20231230_en.pdf

More background:
https://commonslibrary.parliament.uk/constituency-boundary-review-data-for-new-constituencies/

A README will follow. Because the election date was unexpectedly early, I wanted to share the constituency changes first.

@sguenther85
Contributor Author

Readme added

@jloutsenhizer jloutsenhizer self-requested a review June 3, 2024 06:56
Contributor

@jloutsenhizer jloutsenhizer left a comment


I still need to double check the set of districts, but providing some early feedback

identifiers/country-gb.csv (outdated, resolved)
identifiers/country-gb/constituencies.csv (outdated, resolved)
identifiers/country-gb/constituencies.csv (outdated, resolved)
@jloutsenhizer
Contributor

I checked the count of OCD IDs after filtering out aliased IDs and districts that are being abolished. The total is 543, which matches the expected count after the 2023 redistricting.

@sguenther85
Contributor Author

@jloutsenhizer Thanks for checking.
I have already updated the files. Please take a look at the changes you requested.

@jpmckinney
Member

I think we typically set validThrough to be one day before the new validFrom, to avoid having both old and new districts being considered valid on that day.
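
As a minimal sketch of that convention (the helper name `valid_through_for` is hypothetical and not part of the repository), the end date of an old district can be derived from the start date of its successor:

```python
from datetime import date, timedelta


def valid_through_for(new_valid_from: str) -> str:
    """Return the ISO date one day before the given validFrom date."""
    start = date.fromisoformat(new_valid_from)
    return (start - timedelta(days=1)).isoformat()


# The new districts in this PR start on 2024-07-04,
# so the old ones would end the day before.
print(valid_through_for("2024-07-04"))  # 2024-07-03
```

With this, no date falls inside both the old and the new validity ranges.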

@sguenther85
Contributor Author

I think we typically set validThrough to be one day before the new validFrom, to avoid having both old and new districts being considered valid on that day.

@jpmckinney done

@jpmckinney
Member

I see a few dates like "The boundaries of X will..." or "X will have its area reduced...". Can we change these to pure dates? We can add an extra column if it is important to retain the sentences (not sure what to name the column).

@sguenther85
Contributor Author

What would it look like if we changed it to pure dates? Could you give an example?
Maybe we should just leave the column empty in that case?
I thought the official info would be helpful, but I didn't want to make it more complicated than it already is here in GB ;)

@jpmckinney
Member

jpmckinney commented Jun 7, 2024

You have some values like "2024-07-04: The boundaries of Berwick-upon-Tweed will change, and it will be renamed North Northumberland". I would just change them to "2024-07-04". We don't want the column empty if it is a new or abolished district.

@sguenther85
Contributor Author

You have some values like "2024-07-04: The boundaries of Berwick-upon-Tweed will change, and it will be renamed North Northumberland". I would just change them to "2024-07-04". We don't want the column empty if it is a new or abolished district.

done

@jpmckinney
Member

Oh, my apologies, I misread the CSV and thought those values were in the validFrom/validThrough columns. They are fine in the sameAsNote column. You can reset to the previous commit and force push.

How are you deciding which are "sameAs" and which are "boundaries changed and division renamed"?

I think @chris48s @showerst @symroe had opinions on GB divisions when they were first added.

@sguenther85
Contributor Author

Done. I reset to the previous version.

We decided this on the basis of the "Civics Common Standard Data Specification" here

@jpmckinney
Member

Hmm, reading that, I still don't know how you are deciding which to create aliases for, and which to make invalid / create new. Can you demonstrate with one example?

@sguenther85
Contributor Author

sguenther85 commented Jun 11, 2024

  • If a district's ID is based on its name, and the district is renamed after redistricting, create a new ID based on the new district name and add an alias where id = oldId and sameAs = newId. This canonicalizes the newId, as usage of oldId maps to newId.
    = Berwick-upon-Tweed is sameAs North Northumberland, per the official source description: "The boundaries of Berwick-upon-Tweed will change, and it will be renamed North Northumberland."
  • For districts that no longer exist, update the validThrough field with the date redistricting went into effect.
    = Bethnal Green and Bow gets a validThrough, per the official source description: "Bethnal Green and Bow will be split between 2 successor constituencies, with the bulk of the population moving to the new Bethnal Green and Stepney seat."
    = Bethnal Green and Stepney gets a validFrom, per the description above.
  • No changes
    = Birmingham, Edgbaston keeps its ID and name, per the official source description: "The boundaries of Birmingham, Edgbaston will change."

You will always have a sameAs if both the boundaries and the name change.
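
The three cases above could be sketched roughly as follows. This is only an illustration, not the script used for this PR: the function name `classify`, its parameters and the returned labels are hypothetical, and treating a split constituency whose successor keeps the same name as "unchanged" is an assumption.

```python
def classify(old_name: str, successor_name: str, summary: str) -> str:
    """Map one row of the official change summary to an action label.

    Labels (hypothetical): "alias" means old id gets sameAs the new id,
    "abolished" means the old id gets a validThrough (and the successor
    a validFrom), "unchanged" means the id and name are kept.
    """
    if "renamed" in summary:
        return "alias"
    # Assumption: a split only abolishes the district if the name changes.
    if "will be split between" in summary and old_name != successor_name:
        return "abolished"
    return "unchanged"


examples = [
    ("Berwick-upon-Tweed", "North Northumberland",
     "The boundaries of Berwick-upon-Tweed will change, "
     "and it will be renamed North Northumberland."),
    ("Bethnal Green and Bow", "Bethnal Green and Stepney",
     "Bethnal Green and Bow will be split between 2 successor constituencies, "
     "with the bulk of the population moving to the new "
     "Bethnal Green and Stepney seat."),
    ("Birmingham, Edgbaston", "Birmingham, Edgbaston",
     "The boundaries of Birmingham, Edgbaston will change."),
]

for old, new, text in examples:
    print(old, "->", classify(old, new, text))
```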

Here is an overview of all current and new OCD IDs from our colleague, with the official description of the changes:
Summary of how current constituencies will change, and their closest successors.xlsx

@jpmckinney
Member

Thanks! For the "Summary of change" column, where is that from? I checked the links in issue description, but it was not obvious.

@sguenther85
Contributor Author

Sure:
https://commonslibrary.parliament.uk/boundary-review-2023-which-seats-will-change/ -> button: Full Screen Version -> Download all data (xlsx, 684KB)

And inside the .xlsx the 2nd tab

@jpmckinney
Member

jpmckinney commented Jun 13, 2024

Ok, so looks like the logic is:

  • If the "Summary of change" contains "will be split between" then it's considered abolished, unless the name is unchanged (allowing for differences in punctuation). I count 108 with the same name out of 265 "will be split", leaving 157 to be abolished.
    • The CSV has 158 with an end date of 2024-07-03, instead of 157.
    • Delete newcastle_upon_tyne_north_2023
    • Remove validThrough from newcastle_upon_tyne_north (to get divisions with end dates down to 157)
    • The 108 with the same name persist in the CSV (except for newcastle_upon_tyne_north as mentioned).
    • The unimportant differences in punctuation were:
      • Birmingham, Ladywood
      • Birmingham, Perry Barr
      • Birmingham, Yardley
      • Liverpool, Riverside
      • Liverpool, Walton
      • Liverpool, Wavertree
      • Liverpool, West Derby
      • Sheffield, Heeley
  • 61 "will be unchanged", with some unimportant differences in punctuation:
    • Southampton, Itchen
    • Southampton, Test
  • Ayr, Carrick and Cumnock should keep the , (use CSV quoting). This probably applies to a few others of the form "X, Y and Z". I found:
    • Argyll, Bute and South Lochaber
    • Berwickshire, Roxburgh and Selkirk
    • Brecon, Radnor and Cwm Tawe
    • Caithness, Sutherland and Easter Ross
    • Dumfriesshire, Clydesdale and Tweeddale
    • Harborough, Oadby and Wigston
    • Inverness, Skye and West Ross-shire
    • Moray West, Nairn and Strathspey
    • Motherwell, Wishaw and Carluke
    • Oldham West, Chadderton and Royton
    • Pontefract, Castleford and Knottingley
    • Ruislip, Northwood and Pinner
    • Stone, Great Wyrley and Penkridge
  • 258 have the same name and "The boundaries of X will change.", or "X will have its area enlarged.", "X will have its area enlarged and have an additional small boundary change." "X will have its area reduced", "X will have a small change to its boundaries.", "X will be almost unchanged". 10 more have unimportant punctuation changes:
    • Birmingham, Edgbaston
    • Birmingham, Erdington
    • Birmingham, Northfield
    • Birmingham, Selly Oak
    • Brighton, Pavilion
    • Manchester, Withington
    • Plymouth, Sutton and Devonport
    • Sheffield, Brightside and Hillsborough
    • Sheffield, Hallam
    • Weston-Super-Mare
  • 53 have "renamed", excluding 2 with unimportant differences in punctuation below. Brighton Kemptown and Peacehaven was renamed, but this isn't noted in the summary of changes. Total: 56, including punctuation only.
    • Plymouth, Moor View
    • Ealing, Southall
    • CSV has 53 sameAs, so it is missing one. (See Montgomeryshire and Glyndwr below, as it's a bit more complicated.)
  • CSV has 160 new (2024-07-04\n). Unless I miscounted, seems like we have 3 more active than before (i.e. than were abolished), but the number of districts should be the same before and after.
    • Montgomeryshire should either be abolished, or, if we follow the rename logic, be sameAs Montgomeryshire and Glyndwr, without the latter being "new" (though Montgomeryshire and Glyndwr also succeeds Clwyd South in the sheet, but thankfully Clwyd South itself is abolished – I haven't checked if any others do this).
    • ocd-division/country:gb/part:eng/region:ukc/ed:berwick-upon-tweed should not have a validFrom, since it is sameAs over time.
    • newcastle_upon_tyne_north above is the last one? Anyway, double-check when done that ,2024-07-03, matches number of 2024-07-04\n
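
For the names that should keep their comma (the "X, Y and Z" form noted above), a minimal sketch of CSV quoting with Python's standard `csv` module is shown here. The two-column layout and the ID value are placeholders for illustration, not the repository's actual columns or identifiers.

```python
import csv
import io

rows = [
    ["id", "name"],
    # Placeholder ID; only the name is taken from the list above.
    ["ocd-division/country:gb/example:placeholder", "Ayr, Carrick and Cumnock"],
]

buffer = io.StringIO()
# The default quoting mode (csv.QUOTE_MINIMAL) wraps a field in double
# quotes only when it contains the delimiter, a quote or a newline.
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back yields the name as a single field, comma intact.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1][1])
```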

Regexes used:

  • "will be split" with the same name (note: doesn't match if punctuation changes): ^\S+\t([^\t]+)\t[^\t]+\t\1\t[^\t]+\t[^\t]+\t\1 will be split
  • "will be split" with a different name (note: matches even if only punctuation changes) ^\S+\t([^\t]+)\t[^\t]+\t(?!\1\t)[^\t]*\t[^\t]+\t[^\t]+\t\1 will be split.+
  • ^\S+\t([^\t]+)\t[^\t]+\t\1\t[^\t]+\t[^\t]+\tThe boundaries of \1 will change\.
  • ^\S+\t([^\t]+)\t[^\t]+\t\1\t[^\t]+\t[^\t]+\t\1 (?=(will have its area (enlarged|reduced)( and have an additional small boundary change)?|will have a small change to its boundaries|will be almost unchanged)\.\n)
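
A hedged sketch of how the first two regexes could be applied in Python to tab-separated rows pasted from the spreadsheet. The patterns imply seven columns, with the current name in column 2, the successor name in column 4 and the summary in column 7. What the other columns contain is not stated here, so the sample rows use placeholder values, and "Exampleton" is an invented name.

```python
import re

# Patterns copied from the list above.
SPLIT_SAME_NAME = re.compile(
    r"^\S+\t([^\t]+)\t[^\t]+\t\1\t[^\t]+\t[^\t]+\t\1 will be split"
)
SPLIT_NEW_NAME = re.compile(
    r"^\S+\t([^\t]+)\t[^\t]+\t(?!\1\t)[^\t]*\t[^\t]+\t[^\t]+\t\1 will be split.+"
)


def make_row(old_name: str, successor: str, summary: str) -> str:
    """Build a seven-column row; columns 1, 3, 5 and 6 are placeholders."""
    return "\t".join(["X000001", old_name, "-", successor, "-", "-", summary])


rows = [
    # Hypothetical constituency that keeps its name after a split.
    make_row("Exampleton", "Exampleton",
             "Exampleton will be split between 2 successor constituencies."),
    make_row("Bethnal Green and Bow", "Bethnal Green and Stepney",
             "Bethnal Green and Bow will be split between 2 successor "
             "constituencies, with the bulk of the population moving to the "
             "new Bethnal Green and Stepney seat."),
]

kept = [r.split("\t")[1] for r in rows if SPLIT_SAME_NAME.search(r)]
abolished = [r.split("\t")[1] for r in rows if SPLIT_NEW_NAME.search(r)]
print("same name:", kept)
print("different name:", abolished)
```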

@sguenther85
Contributor Author

@jpmckinney All done.
We now have 157 validThrough, but 159 validFrom.

@jpmckinney
Member

Is there any way we can track down the difference? We should have 650 valid divisions both before 2024-07-03 and after 2024-07-04 (not including sameAs).
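
One way to track this down would be to count the divisions that are valid on each side of the change. The sketch below assumes columns named id, name, sameAs, validThrough and validFrom (the order and the sample rows are hypothetical) and excludes sameAs aliases from the count.

```python
import csv
import io
from datetime import date


def valid_on(row: dict, day: date) -> bool:
    """Return True if the row is a non-alias division valid on the given day."""
    if row["sameAs"]:
        return False
    if row["validFrom"] and date.fromisoformat(row["validFrom"]) > day:
        return False
    if row["validThrough"] and date.fromisoformat(row["validThrough"]) < day:
        return False
    return True


# Hypothetical sample data standing in for the real constituencies CSV.
SAMPLE = """id,name,sameAs,validThrough,validFrom
a,Old Seat,,2024-07-03,
b,New Seat,,,2024-07-04
c,Kept Seat,,,
d,Renamed Old,e,,
e,Renamed New,,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
before = sum(valid_on(r, date(2024, 7, 3)) for r in rows)
after = sum(valid_on(r, date(2024, 7, 4)) for r in rows)
print(before, after)  # the two counts should be equal
```

Run against the real file, both counts would be expected to come out at 650. If they differ, listing the IDs valid on one date but not the other narrows down the rows to inspect.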

Also, I intended only the punctuation in the "Ayr, Carrick and Cumnock" list to be updated. I only noted the others to assist me when comparing names before (which had more commas) and after (which had fewer).

@jpmckinney
Member

FWIW, this is why I write Python or Ruby scripts for Canada. If we can download the XLSX and then write code to update constituencies, then it's much easier to verify and make changes as needed. Right now, it's all quite hard to verify.

@sguenther85
Contributor Author

@jpmckinney I made an update.
I have now generated the new OCD file via a script, and we now have:
157 validThrough and 157 validFrom
54 sameAs

And in the end we have 650 constituencies again.
So everything looks fine now.

FYI: through the script I found the two entries where validFrom was set but there was also a sameAs reference for the entry.
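
A check of that kind might look like the following sketch. It is not the script that was used (that one was written in JS). The column names are assumed, and the sameAs target and the second row are placeholders.

```python
import csv
import io


def conflicting_ids(rows) -> list:
    """Return IDs of rows that set both sameAs and validFrom."""
    return [row["id"] for row in rows if row["sameAs"] and row["validFrom"]]


# Hypothetical sample: the first row reproduces the kind of conflict
# discussed earlier in this thread; "new-id-placeholder" is not a real ID.
SAMPLE = """id,sameAs,validThrough,validFrom
ocd-division/country:gb/part:eng/region:ukc/ed:berwick-upon-tweed,new-id-placeholder,,2024-07-04
other-id-placeholder,,,2024-07-04
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
conflicts = conflicting_ids(rows)
print(conflicts)
```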

@sguenther85
Contributor Author

@jpmckinney friendly ping, as the election is not far away
@jloutsenhizer @HKSenior

@jpmckinney
Member

Thank you! I typically commit the script as well under scripts/.

@sguenther85
Contributor Author

@jpmckinney Hehe, maybe next time. I wrote the script in JS with HTML output, and it is really dirty, with the sources as .csv files.

But can anybody merge the pull request now?

@HKSenior HKSenior merged commit 0f22faf into opencivicdata:master Jun 24, 2024
1 check passed