RFC: optionally perform multiple PIP lookups per doc #300
I think from my perspective, I'm most interested in applying this to
points, since it's fully qualified street addresses where we generally see
postalcities/realestatecities issues. Two things I worry about for points
are 1) we might end up adding a nontrivial number of new tokens 2) tuning
the radius/sampling seems tough. But I'm open to helping out with this
approach.
For lines, it seems helpful for the long roads + interpolation use case and
may be the best solution, though I think some distance-based sampling
would probably help.
For polygons, I'm struggling to figure out what kinds of queries this would
help? It seems like the main use case there is searching for a neighborhood
near the interface between two cities?
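To make the "distance sampling" idea for lines concrete, here is a minimal sketch that emits a point every fixed number of meters along a street, always including both endpoints. The names sampleLine, haversine and stepMeters are hypothetical and not part of wof-admin-lookup or any importer; positions inside a segment are linearly interpolated, which is an approximation that should be acceptable for street-length segments.

```javascript
// Hypothetical helper for illustration only; not an existing Pelias API.
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius in meters
const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle distance in meters between two { lat, lon } points.
function haversine(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Walk the line and emit a point every `stepMeters`, plus the start and end.
function sampleLine(coords, stepMeters) {
  if (!(stepMeters > 0)) { throw new RangeError('stepMeters must be positive'); }
  if (coords.length === 0) { return []; }

  const samples = [coords[0]];
  let untilNext = stepMeters; // distance left before the next sample is due

  for (let i = 1; i < coords.length; i++) {
    const a = coords[i - 1];
    const b = coords[i];
    const segLen = haversine(a, b);
    let travelled = 0;

    while (segLen - travelled >= untilNext) {
      travelled += untilNext;
      const t = travelled / segLen; // fraction of the way from a to b
      samples.push({
        lat: a.lat + (b.lat - a.lat) * t,
        lon: a.lon + (b.lon - a.lon) * t
      });
      untilNext = stepMeters;
    }
    untilNext -= segLen - travelled; // carry the remainder into the next segment
  }

  const last = coords[coords.length - 1];
  const tail = samples[samples.length - 1];
  if (tail.lat !== last.lat || tail.lon !== last.lon) { samples.push(last); }
  return samples;
}
```

With a step larger than the street length this degrades to just the start and end points, which matches the "two additional points" starting position suggested for the polyline importer.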
On Wed, Sep 9, 2020 at 4:13 AM Peter Johnson ***@***.***> wrote:
Hi @orangejulius <https://github.com/orangejulius>, @Joxit
<https://github.com/Joxit>, @blackmad <https://github.com/blackmad>
I had a thought last night that we can fairly easily improve recall for
queries where the user enters the name of a nearby parent, instead of the
parent assigned by PIP, i.e. they get the neighbourhood wrong.
We already have the postal cities mapping, which works well when a
postcode is present.
The postal cities mapping adds aliases to the parent field, so a record
can have multiple 'neighbourhood' values, for instance.
We can extend on this further by performing *multiple* point-in-polygon
lookups per document and recording each of the additionally matched parents
as an alias.
I threw this PR together quickly, so it's not exactly what I would
recommend merging, but I wanted to solicit feedback on the general idea,
which is:
- use the doc.getCentroid() for the primary parent info
- if there is a 'meta' property specified with additional points, use
results from those lookups for aliases of the parent.
The wof-admin-lookup module would not be responsible for determining
*which* additional points to use, we can update the importers accordingly
to use this functionality as required, varying the amount of points based
on geometry type and layer.
- I think we can begin with adding *two additional points* to the
polyline importer, so we PIP the start and end points of a street
additionally to the midpoint
- We may also want to apply this logic to some of the "lower level"
WOF records, such as neighbourhoods. We could provide *four additional
points* at the corners of the bbox, or even extend this to the 8
compass directions.
- Finally we *may* want to also apply this to points using a similar
method, this would greatly improve the 'postal cities' and 'realestate
cities' issues at the cost of significantly more PIP work.
Below is a pretty picture I drew to illustrate how this might work for
point, linestring and polygon geometry types. In each case the poorly
drawn pin is the centroid we're currently using and the crosshairs
highlighted in yellow represent *additional* points we might look up for
aliases.
[image: IMG_20200909_095004_2]
<https://user-images.githubusercontent.com/738069/92572330-948b8d80-f284-11ea-808e-bfa53158c6fb.jpg>
Commit Summary
- feat(multi-pip): optionally perform multiple PIP lookups per doc
File Changes
- *M* src/lookupStream.js
<https://github.com/pelias/wof-admin-lookup/pull/300/files#diff-090cfd642c6bf770bd70183d6cfa86ef>
(130)
Patch Links:
- https://github.com/pelias/wof-admin-lookup/pull/300.patch
- https://github.com/pelias/wof-admin-lookup/pull/300.diff
--
*David Blackman*
|
Interesting. This definitely makes the most sense IMO for roads (lines), since we have the current problem that we can only assign a single neighbourhood, borough, city, etc. to the road where clearly it might equally belong to multiple. There would still be issues with display, but at least we can make search function better.

Somewhat related, and more complicated to implement, would be a PIP step at query time for interpolated results. This would prevent a street where the centroid is in City A from forcing all interpolated results to belong to that city, even if a large portion of the road (and therefore the addresses on the road) belong to nearby City B.
Do we currently have a way in the index to add on secondary admin tokens
that we won't accidentally use to construct labels/addreses?
--
*David Blackman*
|
Yes, that already works just fine. Code can call doc.addParent(...) multiple times and only the first will be used for display.
Interesting. A few years ago (whosonfirst-data/whosonfirst-data#1094 (comment)) I had some problems with addresses being assigned to the neighboring locality. I think this might fix that kind of issue (if it still exists); in my case we fixed it with a data update. What distance from the original point should we use, and should the delta be in degrees or meters? This sounds promising!
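On the degrees-versus-meters question, one consideration is that a fixed delta in degrees of longitude covers less ground the further you are from the equator, so a radius in meters has to be converted per point. The sketch below is only an illustration of that conversion using a simple equirectangular approximation; compassPoints, radiusMeters and directions are hypothetical names, and no particular radius is being recommended here.

```javascript
// Hypothetical helper for illustration only; not an existing Pelias API.
// Length of one degree of latitude in meters (~111.2 km), from the mean Earth radius.
const METERS_PER_DEGREE_LAT = (Math.PI / 180) * 6371008.8;

// Return `directions` points evenly spaced on a circle of `radiusMeters`
// around `center`, starting at north and going clockwise.
// Uses a flat-earth approximation, reasonable for radii of a few hundred meters
// away from the poles.
function compassPoints(center, radiusMeters, directions = 8) {
  const metersPerDegreeLon =
    METERS_PER_DEGREE_LAT * Math.cos((center.lat * Math.PI) / 180);
  const points = [];
  for (let i = 0; i < directions; i++) {
    const bearing = (2 * Math.PI * i) / directions; // 0 = north
    const north = radiusMeters * Math.cos(bearing); // meters towards north
    const east = radiusMeters * Math.sin(bearing);  // meters towards east
    points.push({
      lat: center.lat + north / METERS_PER_DEGREE_LAT,
      lon: center.lon + east / metersPerDegreeLon
    });
  }
  return points;
}

// At 60 degrees north, 100 m east is about twice as many degrees as 100 m north.
const around = compassPoints({ lat: 60, lon: 10 }, 100, 4);
```

Specifying the radius in meters keeps the sampled area roughly the same size everywhere, at the cost of this small per-point conversion.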
Hi @orangejulius, @Joxit, @blackmad
I had a thought last night that we can fairly easily improve recall for queries where the user enters the name of a nearby parent, instead of the parent assigned by PIP, i.e. they get the neighbourhood wrong.
We already have the postal cities mapping, which works well when a postcode is present.
The postal cities mapping adds aliases to the parent field, so a record can have multiple 'locality' values, for instance.
We can extend on this further by performing multiple point-in-polygon lookups per document and recording each of the additionally matched parents as an alias.
I threw this PR together quickly, so it's not exactly what I would recommend merging, but I wanted to solicit feedback on the general idea, which is:
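- use the doc.getCentroid() for the primary parent info
- if there is a 'meta' property specified with additional points, use results from those lookups for aliases of the parent.

The wof-admin-lookup module would not be responsible for determining which additional points to use, we can update the importers accordingly to use this functionality as required, varying the amount of points based on geometry type and layer.

- I think we can begin with adding two additional points to the polyline importer, so we PIP the start and end points of a street additionally to the midpoint
- We may also want to apply this logic to some of the "lower level" WOF records, such as neighbourhoods. We could provide four additional points at the corners of the bbox, or even extend this to the 8 compass directions.
- Finally we may want to also apply this to points using a similar method, this would greatly improve the 'postal cities' and 'realestate cities' issues at the cost of significantly more PIP work.

Below is a pretty picture I drew to illustrate how this might work for point, linestring and polygon geometry types. In each case the poorly drawn pin is the centroid we're currently using and the crosshairs highlighted in yellow represent additional points we might look up for aliases.

[image: IMG_20200909_095004_2]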
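For readers skimming the thread, the general idea can be sketched roughly as follows. This is not the code in src/lookupStream.js: assignParents, the pip callback and the plain-object result are all stand-ins invented for illustration, and the real change works on Pelias documents and the PIP service instead. The point is only the ordering: the centroid result goes first so it stays the primary parent, and results from the additional points are appended as aliases without duplicates.

```javascript
// Hypothetical sketch; `assignParents` and `pip` are stand-ins, not real
// wof-admin-lookup or pelias-model APIs.
// `pip(point)` is assumed to return an object mapping layer -> parent name.
function assignParents(centroid, additionalPoints, pip) {
  const parents = {}; // layer -> ordered names, first entry is the primary

  const record = (result) => {
    for (const [layer, name] of Object.entries(result)) {
      parents[layer] = parents[layer] || [];
      // skip names already recorded so aliases are not duplicated
      if (!parents[layer].includes(name)) { parents[layer].push(name); }
    }
  };

  record(pip(centroid));                               // primary parents
  additionalPoints.forEach((p) => record(pip(p)));     // aliases
  return parents;
}

// Toy lookup: everything west of the prime meridian is "City A", the rest "City B".
const toyPip = (p) => ({ locality: p.lon < 0 ? 'City A' : 'City B' });

// A street whose midpoint is in City A but whose end point is in City B.
const parents = assignParents(
  { lat: 51.5, lon: -0.1 },
  [{ lat: 51.5, lon: -0.2 }, { lat: 51.5, lon: 0.1 }],
  toyPip
);
```

In this toy case the street keeps City A as its primary locality and gains City B as an alias, so a query naming either city could match it.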