hyphenated names #65

Open
missinglink opened this issue Jul 2, 2015 · 10 comments

@missinglink
Member

Names such as 51 Friedrich-Richter-Straße (address-osmnode-2967205513) should be searchable using the tokens ['friedrich','richter','strasse'] as well as ['friedrichrichterstrasse'] and ['friedrich-richter-strasse'].
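A minimal sketch of how those three token forms relate to one another, assuming simple lowercasing plus ß → ss folding (the 'strasse' tokens above suggest folding, but this is not Pelias's actual analysis chain). The function names `fold` and `expectedTokenForms` are hypothetical.

```javascript
// Hypothetical normalizer: lowercase and fold ß to "ss" (an assumption
// inferred from the 'strasse' tokens in the issue).
function fold(text) {
  return text.toLowerCase().replace(/ß/g, 'ss');
}

// Derive the three token sets the issue says a hyphenated name should match.
function expectedTokenForms(name) {
  const folded = fold(name);
  const parts = folded.split('-').filter(Boolean);
  return {
    split: parts,               // one token per hyphen-separated part
    compound: [parts.join('')], // all parts glued into one token
    hyphenated: [folded],       // original hyphenated form, folded
  };
}

console.log(expectedTokenForms('Friedrich-Richter-Straße'));
```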

@missinglink
Member Author

This is how peliasTwoEdgeGram currently tokenizes that address:
[ '51', 'fr', 'fri', 'frie', 'fried', 'friedr', 'friedri', 'friedric', 'friedrich', 'friedrich-' ]
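The list above is consistent with whitespace-only tokenization followed by an edge n-gram filter with a minimum gram length of 2 and a maximum of 10. Those two parameters are inferred from the output, not read from pelias/schema. The sketch below reproduces the list under that assumption and shows how also splitting on hyphens would give 'richter' and 'straße' their own prefixes. `edgeNGrams` and `analyze` are hypothetical names.

```javascript
// Edge n-grams of a single token: prefixes of length minGram..maxGram.
// minGram = 2 and maxGram = 10 are assumptions inferred from the issue output.
function edgeNGrams(token, minGram = 2, maxGram = 10) {
  const grams = [];
  const upper = Math.min(maxGram, token.length);
  for (let len = minGram; len <= upper; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Lowercase, split into tokens, then expand each token into edge n-grams.
function analyze(text, splitOnHyphens = false) {
  const separator = splitOnHyphens ? /[\s-]+/ : /\s+/;
  return text
    .toLowerCase()
    .split(separator)
    .filter(Boolean)
    .flatMap((token) => edgeNGrams(token));
}

// Whitespace only: grams stop at 'friedrich-', and none start at 'richter' or 'straße'.
console.log(analyze('51 Friedrich-Richter-Straße'));
// Splitting on hyphens: each part gets its own prefixes ('ri', 'ric', ..., 'st', 'str', ...).
console.log(analyze('51 Friedrich-Richter-Straße', true));
```

Splitting on hyphens fixes prefix matching on later parts but drops the 'friedrich-' style grams, which is why the thread later turns to alt-names for the remaining forms.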

@missinglink
Member Author

Leonardo da Vinci–Fiumicino Airport should be searchable by Fiumicino Airport.
http://pelias.mapzen.com/doc?id=geoname:6299619
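Note that this name joins its parts with an en dash (U+2013), not an ASCII hyphen, so splitting on '-' alone would not help. A minimal sketch, assuming a separator class of whitespace, hyphen-minus, and the Unicode dashes U+2010 to U+2014; which dash characters Pelias should treat as separators is an open assumption here, and `tokens` / `matchesAllTokens` are hypothetical helpers.

```javascript
// Whitespace, ASCII hyphen-minus, and Unicode hyphen/dash characters
// U+2010..U+2014 (hyphen, non-breaking hyphen, figure dash, en dash, em dash).
const SEPARATORS = /[\s\-\u2010-\u2014]+/;

function tokens(text) {
  return text.toLowerCase().split(SEPARATORS).filter(Boolean);
}

// True when every query token appears among the name's tokens.
function matchesAllTokens(name, query) {
  const nameTokens = new Set(tokens(name));
  return tokens(query).every((t) => nameTokens.has(t));
}

const airport = 'Leonardo da Vinci\u2013Fiumicino Airport';
console.log(tokens(airport)); // [ 'leonardo', 'da', 'vinci', 'fiumicino', 'airport' ]
console.log(matchesAllTokens(airport, 'Fiumicino Airport')); // true
```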

@dianashk
Contributor

Add acceptance-tests in order to gauge impact.

@orangejulius
Member

Just checked: this is still an area we could improve. Something to think about for the near-ish future.

@missinglink
Member Author

missinglink commented Aug 3, 2016

This feature will require alt-names, as the street name above can take 3 forms:

Friedrich-Richter-Straße
Friedrich Richter Straße
FriedrichRichterStraße

Moving to the alt-names milestone, as until then it can only be solved for at most 2 of these cases.

@amatissart

amatissart commented Dec 7, 2017

I am facing a similar (maybe simpler?) issue with French names.
A search for stade roland-garros should return similar results to stade roland garros.

Would it help to add a hyphen (-) to the tokenizer's pattern (see https://github.com/pelias/schema/blob/master/settings.js#L18)?
Or would that cause serious regressions with other languages?
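A rough way to reason about the proposed change, using two hypothetical split patterns as stand-ins for the tokenizer pattern in settings.js (neither is copied from that file). With the hyphen added, both spellings produce identical tokens; the last line shows one kind of side effect worth checking for regressions, since hyphenated ranges are split as well.

```javascript
// Hypothetical stand-ins for the schema's tokenizer pattern, not the real one.
const WITHOUT_HYPHEN = /[\s,]+/;
const WITH_HYPHEN = /[\s,-]+/;

function tokenize(text, pattern) {
  return text.toLowerCase().split(pattern).filter(Boolean);
}

console.log(tokenize('stade roland-garros', WITHOUT_HYPHEN)); // [ 'stade', 'roland-garros' ]
console.log(tokenize('stade roland-garros', WITH_HYPHEN));    // [ 'stade', 'roland', 'garros' ]
console.log(tokenize('stade roland garros', WITH_HYPHEN));    // [ 'stade', 'roland', 'garros' ]

// Possible side effect: hyphenated house-number ranges are split too.
console.log(tokenize('12-14 rue de la paix', WITH_HYPHEN)); // [ '12', '14', 'rue', 'de', 'la', 'paix' ]
```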

@orangejulius
Member

As of the last time we checked in, we were waiting for good alt-names support before tackling this feature. We now have that functionality, and it's worth looking at this again.

My guess is that we would want to parse any street names coming in with formats like "Friedrich-Richter-Straße" or "Friedrich Richter Straße" and store an alt-name of "FriedrichRichterStraße". This, combined with proper hyphen handling, would allow us to handle all 3 cases.

Some questions:
1.) Would we want to tokenize on hyphens, or handle them in some other way?
2.) Where would we put the code that always takes, say, street names and converts them to compound word form? My guess is pelias/model, so that it can be used by all importers. We probably want to start building up a common core of importer functionality anyway.
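A minimal sketch of the proposal: take a street name in either the hyphenated or the spaced form and store the other two forms as alt-names. The record shape `{ name, altNames }` and the helper names are hypothetical; this is not the pelias/model API.

```javascript
// Hypothetical: split on whitespace or hyphens and rebuild all three forms.
function nameVariants(name) {
  const parts = name.split(/[\s-]+/).filter(Boolean);
  if (parts.length < 2) return [];
  return [parts.join('-'), parts.join(' '), parts.join('')];
}

// Append any variant not already present as the name or an alt-name.
function addStreetAltNames(record) {
  const seen = new Set([record.name, ...record.altNames]);
  for (const variant of nameVariants(record.name)) {
    if (!seen.has(variant)) {
      seen.add(variant);
      record.altNames.push(variant);
    }
  }
  return record;
}

const rec = addStreetAltNames({ name: 'Friedrich-Richter-Straße', altNames: [] });
console.log(rec.altNames); // [ 'Friedrich Richter Straße', 'FriedrichRichterStraße' ]
```

Applied unconditionally, this would also turn "Main Street" into "MainStreet", which bears on the question of where such logic should live.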

@missinglink
Member Author

My guess is that we would want to parse any street names coming in with formats like "Friedrich-Richter-Straße" or "Friedrich Richter Straße" and store an alt-name of "FriedrichRichterStraße". This, combined with proper hyphen handling, would allow us to handle all 3 cases.

Yes, that sounds correct.

1.) Would we want to tokenize on hyphens, or handle them in some other way?

I think tokenizing on hyphens would work, so long as we can handle the issues that tokenizing brings with it (such as not matching "main st" with "main ave" while still matching "E main st" with "W main st").

2.) Where would we put the code that always takes, say, street names and converts them to compound word form? My guess is pelias/model, so that it can be used by all importers. We probably want to start building up a common core of importer functionality anyway.

I would be hesitant to put this logic in pelias/model. It's clearly super convenient, but it might be better to have the code closer to the data (in the importer), so the importer could make data-specific decisions about its data conventions and optionally apply locale-aware logic that is specific to certain languages or geographies.

The other option would be to pass the locale information down to the pelias/model code so that it was able to work with that metadata.
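A sketch of the second option, passing locale information down so that shared code only generates compound forms where that naming convention applies. The set of languages (German only) and the helper names are illustrative assumptions, not an agreed design.

```javascript
// Hypothetical: languages where hyphenated/spaced/compound street forms are interchangeable.
const COMPOUNDING_LANGUAGES = new Set(['de']);

function compoundForms(name) {
  const parts = name.split(/[\s-]+/).filter(Boolean);
  return parts.length < 2 ? [] : [parts.join('-'), parts.join(' '), parts.join('')];
}

// Only emit alt-names when the record's locale (e.g. 'de-DE') uses a compounding language.
function streetAltNames(name, locale) {
  const language = (locale || '').toLowerCase().split('-')[0];
  if (!COMPOUNDING_LANGUAGES.has(language)) return [];
  return compoundForms(name).filter((form) => form !== name);
}

console.log(streetAltNames('Friedrich-Richter-Straße', 'de-DE')); // [ 'Friedrich Richter Straße', 'FriedrichRichterStraße' ]
console.log(streetAltNames('Main Street', 'en-US')); // []
```

The same function could equally live in an importer; the difference is only whether the locale is known locally or has to be passed in.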

@orangejulius
Member

orangejulius commented Sep 9, 2019

Hi @Joxit,
Yes, it's long past time we merge this change or something like it. Let us run a quick full planet build with this branch and take a look. Pretty sure it will be something we can merge right away.

I'll let you know tomorrow :)

edit: oops, this was supposed to be a comment on #375

orangejulius added a commit to pelias/acceptance-tests that referenced this issue Sep 12, 2019
orangejulius added a commit to pelias/acceptance-tests that referenced this issue Sep 18, 2019
orangejulius added a commit to pelias/acceptance-tests that referenced this issue Sep 18, 2019