Detect altnames that are a substring of name.default #548
e775ef6 to 9a0b934
@@ -892395,8 +892389,7 @@
 "_type": "_doc",
 "data": {
 "name": {
- "default": "IPOH Asian House"
- },
+ "default": "IPOH Asian House" },
non-issue: this is weirdly indented
oh, oops! My fault! I had a try at editing the fixture manually since there were only a few changes.
Should have left it to the machines.
haha, it's really not an issue, I just noticed it a few times in the PR, like the trim( tags[key])) catches my eye.
the Pelias code styling used to be more hit-and-miss, I've been using autoformatting in my editor for a while now which hopefully fixes a bunch of the little ones.
at some point I'd still love to fully adopt standardJS.
9a0b934 to 481144e
This makes it easier to add custom logic by working through the tags in a specified order.
This handles the case where one alt name is a substring fully contained in another.
481144e to 9c364a5
I came across this PR today and wanted to see if it still made a difference, so I've rebased it and kicked off a planet build to test things out.
This change is an attempt to mitigate scoring penalties applied to documents with alternate names (#507).
It handles the case where an alt name is merely a substring of the main name, for example on the Union Square subway stop in OSM:
Alt names like this don't add much value: they don't allow searching on any new terms, but do throw off the scoring. Even when we fix the scoring issue, duplicate alt names that add no value still take up space, so this change should be useful for a while.
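To make the idea concrete, here is a minimal sketch of dropping alt names that are contained in the default name. It is not the importer's actual code: the helper name dropSubstringAltNames, the case-insensitive comparison, and the sample names are all assumptions made up for illustration.

```javascript
// Hypothetical helper, not taken from the Pelias importer.
// Assumption: comparison is case-insensitive and ignores surrounding whitespace.
function dropSubstringAltNames(defaultName, altNames) {
  const haystack = defaultName.toLowerCase();
  return altNames.filter((alt) => {
    const needle = alt.trim().toLowerCase();
    // Keep only alt names that add text not already present in the default name.
    return needle.length > 0 && !haystack.includes(needle);
  });
}

// Made-up sample data: the first alt name is fully contained in the default name.
const kept = dropSubstringAltNames('Example Square Station', [
  'Example Square',
  'Ex Sq',
  'Central Stop'
]);
console.log(kept); // → [ 'Ex Sq', 'Central Stop' ]
```

Note that this is a plain character-level substring check, so it would also drop an alt name that happens to match part of a longer word. Whether that is acceptable is a design choice this sketch does not settle.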
The change comes in 2 parts, each in their own commit:
Work through the tags in a specified order: name.default, then the other name.* fields, and finally the rest. This makes it easier to write logic that looks for duplicates.
I'd be happy to extend this in the future to cover other near-identical alt names, such as treating Mc Donalds and McDonalds as the same name, or ignoring quotes and other special characters as in pelias/api#1488.
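As a rough sketch of what that future extension might look like, the snippet below compares names after stripping everything except letters and digits. Nothing here exists in the PR: normalizeName and isNearDuplicate are hypothetical names, and the normalization rule (lowercase, then drop spaces, quotes and other punctuation) is just one possible assumption.

```javascript
// Hypothetical normalization, not part of this PR.
// Assumption: two names are "near-identical" if they match once lowercased
// and stripped of everything except Unicode letters and digits.
function normalizeName(name) {
  return name.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, '');
}

function isNearDuplicate(a, b) {
  return normalizeName(a) === normalizeName(b);
}

console.log(isNearDuplicate('Mc Donalds', 'McDonalds'));     // → true
console.log(isNearDuplicate('"Joe\'s" Cafe', 'Joes Cafe'));  // → true
console.log(isNearDuplicate('McDonalds', 'Burger King'));    // → false
```

Stripping all punctuation is deliberately aggressive. A real implementation would probably want to be more selective about which characters it ignores.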