Welsh language sorting office - research #107

Open
peterkwells opened this issue Apr 2, 2015 · 2 comments
Comments

@peterkwells

We have built a free-format text parsing API, Sorting Office (https://sorting-office.openaddressesuk.org), that takes free-format addresses and turns them into well-structured addresses. It can do this because we know how UK addresses are structured and we know the building blocks (towns, postcodes, etc.) that form UK addresses.

We know that the platform needs to support Welsh language addresses. The data model can support Welsh language addresses already.

But when we extend the thinking to services like Sorting Office, we need to understand whether and how Welsh language addresses differ.

What would be the algorithm for turning a free-format Welsh address into a structured address?

@peterkwells

Owen Blacker (@owenblacker) has stated on Twitter that there is a 1-1 match: https://twitter.com/peterkwells/status/583544915524771840

Investigating in OS Open Names shows examples of the 1-1 match (this sample is from SS68.csv):

[Image: screen shot 2015-04-02 at 12 12 03]

The gaps are interesting. Once we implement the capability to learn new building blocks, people using the services could help to fill in this missing information.

Waiting to hear from others to confirm whether the structure is the same and hence whether 1-1 matching always works. For example, we need to be careful of nuances such as building block one (road) being in English while building block two (town) is in Welsh.
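
To make the 1-1 matching idea concrete, here is a minimal Python sketch of a lookup over paired building blocks. It is not Sorting Office's code: the names `TOWN_NAMES`, `build_index` and `lookup_block` are hypothetical, the `"cy"`/`"en"` keys are just the ISO 639-1 codes for Welsh and English, and the table is a hand-written stand-in for data that would really come from a source such as OS Open Names. The last entry is an invented placeholder showing a gap where only one language's name is known.

```python
# Hypothetical building-block table: (Welsh name, English name).
# None marks a gap where the name in that language is not recorded.
TOWN_NAMES = [
    ("Caerdydd", "Cardiff"),
    ("Abertawe", "Swansea"),
    ("Casnewydd", "Newport"),
    (None, "Exampleton"),  # invented placeholder illustrating a gap
]


def build_index(pairs):
    """Index every known name, in either language, to its paired record."""
    index = {}
    for welsh, english in pairs:
        record = {"cy": welsh, "en": english}
        for name in (welsh, english):
            if name is not None:
                index[name.casefold()] = record
    return index


def lookup_block(token, index):
    """Return the paired record for one building block, or None if unknown."""
    return index.get(token.strip().casefold())


index = build_index(TOWN_NAMES)
print(lookup_block("Caerdydd", index))
print(lookup_block("Exampleton", index))
```

Because the lookup works on one building block at a time, the language of the input token does not matter to it. Whether a whole address may mix languages across blocks is a separate question that this sketch does not answer.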

@peterkwells

Update on mix/match of languages in building blocks for a single address: https://twitter.com/peterkwells/status/583591684975554561

TL;DR: that's an even edgier edge case. Not something to support immediately, if ever.
