Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for polygon/linestring in results #823

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

red-fenix
Copy link

@red-fenix red-fenix commented Jul 11, 2024

At the moment, photon only returns the point of a location, and not the polygon (see #259). This PR will add the option to add the polygon (i.e. geometry) to the Elasticsearch Index and a API parameter polygon to return said polygon. If no polygon exists, the point is returned.

WARNING: This will increase the Elasticsearch Index size! (~575GB for a Planet import).

To enable: add the command line argument -use-geometry-column whilst importing and add &polygon=true to the API call.

@red-fenix red-fenix changed the title Add option for polygons in results Add option for polygon/linestring in results Jul 11, 2024
@lonvia
Copy link
Collaborator

lonvia commented Jul 14, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

  • This really needs to be implemented for the OpenSearch version because the ES version of Photon is on its way out. Note that the OpenSearch variant does not use mappings.json. It defines its mapping in https://github.com/komoot/photon/blob/master/app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java.
  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.
  • We already have a field extent which contains the bounding box. The geometry should replace this field, i.e. once full geometries are enabled, do not save extent and derive the extent from the geometry field when returning a result. We need to keep the centroid because it is not necessarily the geometric centroid of the geometry. (Note that extent was missing from the mapping specification so far. So likely it was in fact saved as a text field, which really is an oversight but can't really be fixed right now without creating an incompatible database version.)
  • This would be the second optional database feature after add support for structured queries (opensearch only) #815. Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).
  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behaviour as on import then.
  • This needs some tests for the import.

Two other considerations come in mind but they are easily deferred to follow-up PRs:

  • Once we have full geometries, we'd want to use them for reverse lookup. See Discussion for PR to improve accuracy of reverse geocoding #357.
  • It might be worth to slightly simplify the geometries before importing them, or at least make that an option. Nominatim always keeps the original OSM geometries which sometimes can have a lot more support points than necessary. Simplification might help to further reduce database size.

@red-fenix
Copy link
Author

red-fenix commented Jul 17, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Thanks. I will update the PR with the changes in this file soon.

One question though about the 'Elasticsearch is on it's way out': I've been planning to update the Elastic client to the Java API so you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is my new PR still a good idea?

  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.

Will take this along as well.

  • This would be the second optional database feature after add support for structured queries (opensearch only) #815. Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).

Agreed

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behavior as on import then.

OK. I will make the default to return the polygon when it's available in the index. The option 'polygon=false' will return the centroid instead.

  • This needs some tests for the import.

Will do.

Two other considerations come in mind but they are easily deferred to follow-up PRs:

I have to look into this issue.

  • It might be worth to slightly simplify the geometries before importing them, or at least make that an option. Nominatim always keeps the original OSM geometries which sometimes can have a lot more support points than necessary. Simplification might help to further reduce database size.

I will look into this.
Another thing related to this: when someone is searching for a street Nominatim (and thus Photon) returns a street in parts because they are separate OSM id's. I'm still looking for a method to merge multiple ways (i.e. linestrings) into 1 linestring to make sure the whole street is displayed instead of a part (example)

@lonvia
Copy link
Collaborator

lonvia commented Jul 21, 2024

One question though about the 'Elasticsearch is on it's way out': I've been planning to update the Elastic client to the Java API so you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is my new PR still a good idea?

We'll drop ES support completely and go with OpenSearch. Note that the OS version already supports an external OpenSearch cluster. The support is just somewhat rudimentary and HTTP-only.

@karussell
Copy link
Collaborator

Would this PR return a Multipolygon in case for queries like "hamburg"? See the nominatim response.

Currently photon returns a slightly confusing, but correct extent as only a single extent is allowed https://photon.komoot.io/api/?q=hamburg ... or maybe we add a new field extents in the response?

@red-fenix
Copy link
Author

Would this PR return a Multipolygon in case for queries like "hamburg"? See the nominatim response.

Currently photon returns a slightly confusing, but correct extent as only a single extent is allowed https://photon.komoot.io/api/?q=hamburg ... or maybe we add a new field extents in the response?

Yes, it would return a Polygon like Nominatim does.

@red-fenix
Copy link
Author

red-fenix commented Sep 5, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Done

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io, why the results return a point instead of polygon. You can save the state of the feature in the property table and load it from there on start, see add support for structured queries (opensearch only) #815 for an example on how to do it. This property also comes in handy during updates of the database. It would be useful to have exactly the same behaviour as on import then.

Done

  • This needs some tests for the import.

Done

I'm looking forward to your response :)

@lonvia
Copy link
Collaborator

lonvia commented Oct 29, 2024

Sorry for the long delay in responding.

I've decided to get a release out first before looking into this again. Because of this there is a minor merge conflict now regarding the dependencies. Also, the opensearch variant doesn't compile because setUpESWithPolygons() is only defined for the elasticsearch variant. Could you quickly fix both issues?

Copy link
Collaborator

@lonvia lonvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of more comments after trying this out. The OpenSearch version currently doesn't work at all. It already fails to import.

The output for the ElasticSearch version displays a wrong projection:

...
"geometry": {
  "coordinates":[9.5227103,47.1395576],
  "type":"Point",
  "crs":{"type":"name","properties":{"name":"EPSG:0"}}
}
...

Photon doesn't supply a CRS at all with the geometry. I'd leave it at that.

@red-fenix
Copy link
Author

Sorry for the long delay in responding.

I've decided to get a release out first before looking into this again. Because of this there is a minor merge conflict now regarding the dependencies. Also, the opensearch variant doesn't compile because setUpESWithPolygons() is only defined for the elasticsearch variant. Could you quickly fix both issues?

Will do this weekend.

@red-fenix
Copy link
Author

Couple of more comments after trying this out. The OpenSearch version currently doesn't work at all. It already fails to import.

I've made some adjustments and have been trying to get the OpenSearch version running however, it fails with an error:

Request failed: [mapper_parsing_exception] No handler for type [geo_shape] declared on field [geometry]

I've searching online, but can't seem to find the reason for OpenSearch clearly has geo_shape support.

In other news: I fixed the PR with your suggestions, but will perform some additional tests this weekend.

@lonvia
Copy link
Collaborator

lonvia commented Nov 18, 2024

Request failed: [mapper_parsing_exception] No handler for type [geo_shape] declared on field [geometry]

That's where I gave up my experiments with this branch as well. It's possible that the embedded version of OpenSearch that we are using is missing a dependency needed for geo_shape. Maybe try running against a stand-alone OpenSearch and see if it works then?

@otbutz
Copy link
Contributor

otbutz commented Nov 18, 2024

https://github.com/opensearch-project/geospatial ? Maybe it's enough to add the OpenSearch plugin as a dependency.

@red-fenix
Copy link
Author

red-fenix commented Dec 1, 2024

So, I've found out that there were multiple issues with my build:

  1. The Opensearch implementation is very different from the Elasticsearch implementation (ser/des vs parsing a Searchit), but I've made it work.

  2. The main problem is the opensearch-runner. It doesn't have all Opensearch plugins available and the geo plugin is disabled. I've been trying to add the Opensearch geospatial plugin as either a module or a plugin, but was unable to do so.

Therefore I switched to testcontainers and everything is working-ish. Unit tests for Opensearch work, but due to some small differences with Testcontainers, the Elasticsearch Unit tests don't work... yet! (It's mainly because testcontainer runs a Docker container with an Opensearch instance on random ports and the unit tests have to connect to said random port, although it's possible to expose the default ports).

Is there anyone who can help me add the Geo plugin to the opensearch-runner of would you be willing to switch to testcontainers and docker? If Photon switches to testcontainers, I can make some changes to the Elasticsearch unit tests.
Also, it's probably better to use the testcontainer docker with Photon when the local instance is used.

@otbutz
Copy link
Contributor

otbutz commented Dec 2, 2024

You could open an issue at https://github.com/codelibs/opensearch-runner. Either it's already working or it's a valid feature request.

@lonvia
Copy link
Collaborator

lonvia commented Dec 3, 2024

It needs to work with opensearch-runner because that is also what Photon uses in embedded mode.

From the code you cite it looks like you can supply your own module list by adding .moduleTypes(<comma-separated list of modules>) at

}).build(OpenSearchRunner.newConfigs().basePath(dataDirectory).clusterName(clusterName).numOfNode(1));

@lonvia
Copy link
Collaborator

lonvia commented Dec 4, 2024

With #855 merged you can now simply add the plugin to build.gradle and enable it by adding it to the list in

public static final String OPENSEARCH_MODULES =

@red-fenix
Copy link
Author

With #855 merged you can now simply add the plugin to build.gradle and enable it by adding it to the list in

public static final String OPENSEARCH_MODULES =

Thank you so far for your help. I've been trying to get it working, but failed so far. I've opened a issue with the Codelab Opensearch Runner.

@red-fenix
Copy link
Author

Opensearch has fixed it and I've updated the PR. Both Opensearch tests and Es Embedded are 100% working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants