-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding addendum namespaces config #140
base: master
Are you sure you want to change the base?
Adding addendum namespaces config #140
Conversation
Hi @mansoor-sajjad, It would save you a ton of time because unfortunately there's no way we can merge functionality like this. The You can probably achieve much of what you want with the category field and the If not let us know and perhaps we can come up with something. |
Hei @orangejulius, We would have opened the issue for discussing the problem first, but we have found that this problem has be discussed in multiple issues before, with either no solution or suggestion to use category field. But we thought to come up with the solution this time instead of creating yet another issue. All of the concerns you have mentioned are understandable and we had them in mind before implementing this feature. Concern # 1:
It will remain non-searchable additional metadata, if we don't configure anything. Concern # 2:
We have tried to handle exactly that mess in these PRs. Making sure that the backward compatibility will remain preserved. And no one get affected by this change without explicit configurations. Concern # 3:
This is the choice that the user of this feature have to make. Concern # 4:
Again, extra parameters will not be available by default. Concern # 5:
We are already using categories. |
In our current (v1) of our geocoder we have forked the api project and inject data manually to solve this. This has become difficult to maintain for obvious reasons. Our current effort is to try to solve this within the confines of the upstream projects, with contributions to those projects if necessary. Using "categories" as a dump-site for custom but indexable metadata is a suboptimal hack. I would be very interested to hear how this could be solved in other and cleaner ways. |
Hi @mansoor-sajjad, thanks for putting this together. As Julian mentioned the addendum field was originally designed in such a way that it would be non-searchable, so I'm trying to better understand your usecase in order to consider if reversing that decision is a good idea, or whether there may be an alternative solution to solve your problem.
Can you please explain this a little more so I understand correctly? My understanding is that adding |
Hi @missinglink, Thank for the response. I want to take this opportunity to introduce ourselves also: We are Entur. Entur is owned by the Norwegian Ministry of Transport and is intended to offer basic public transport services within travel planning and ticketing in Norway. Entur connects public transport in Norway, and collaborates with the actors in the public transport sector to achieve simple, sustainable travels. Among other open source services we provide, one of the most important service is Geocoding, which is backed by Pelias. (Thanks to you guys, for making it and have it available for use, and maintaining it like super heroes.) We are importing different kind of data in ES via Pelias. Like registered stop places in Norway, addresses in Norway, Point Of Interests etc As mentioned by @testower in his comment that, in our current (v1) of our geocoder we have forked the pelias-api project and inject data manually in elasticsearch in order to import and query based on the custom data. The category field:
We are using the category field extensively for storing different identifiers based on different type of data whether it be stop places, addresses or POI. And we can filter on it and get the data in the response in feature properties. The clients identify the identifiers in the category field based on the source and layers. Which is nice. We have some additional data, like tariffzone and farezone etc. We would like to filter the results on these in addition to the identifiers we are storing in the category field. We can store the tariffzones and farezones identifiers in the category field also, but we have two problems here: First Problem: If we query and filter on some tariffzone id, we will get the category array in the response in feature properties, but that will then contain the tariffzone identifiers, farezone identifiers and other data identifiers which we store in category field. Second Problem: For example, we are storing the
It will be interpreted as |
Third Problem: |
To give a little more detail to our use case - we have around 60k public transport stop places from Norway's National Stop Registry in our geocoder. These stop places have modality information (e.g. transport mode bus and submode localBus). In addition they belong to one or more tariff zones for ticketing purposes. A consumer may want to search within all bus and tram stops within a certain tariff zone. This will not work if collapsed into the categories filter: categories=bus,tram,RUT:TariffZone:1, because the filter is an OR filter, so we will get all stops which are a bus or tram stop or in the specified tariff zone. Instead what we actually need is all bus OR trams stops that are also in the specified tariff zones, in other words, the filter should be OR within each filter type, but AND across the filter types. We have other similar use cases within other types of entries in our geocoder. So as you can see, solving this using the categories filter is not possible, which is why we have proposed the possibility to add custom indexable data fields. Would you open to discussing a solution to this, that does not involve repurposing the addendum field, but instead adding a new concept that will allow adding custom indexable data? In essence a structured version of the existing categories. |
Isn't it possible to achieve this by changing the way you define your categories? Ie. if the bus stop and the tram stop are both given the category
This can also be resolved by changing the the way you define your categories. ie. rather than only specifying an integer, prefix the 'type', such as
It's not documented, but you can specify an empty list of categories, which will return the categories in the response: |
This can also be resolved by adding additional categories which are the concatenation of two disparate terms (ie. |
👋
Problem statement:
We want to import custom data into pelias index in elasticsearch and want to be able to filter the query results based on those custom data fields like we are able to filter the results based on sources and layers for example.
One solution is to fork the pelias-schema and extend the index mapping with whatever fields we want. But then we need to fork and other pelias services and update them to be able to use those custom data fields. But we really want to be connected with the open source community instead of start maintaining the forks of pelias services ourself. So we implemented the solution by adding some configuration options that can support our use case and probably help others.
Findings:
Pelias schema have the addendum field, where we are able to import whatever data we want. Which is fine but problem is that, firstly its not indexed:
and secondly we can only store JSON data strings into addendum fields if we create the data through
pelias-model
.Solution:
In order to filter the results based on addendum fields, we firstly need to make addendum field indexed and secondly we need to be able to store data in other possible searchable types like string or array and so on.
Following are the changes we have made in order implement the above mentioned solution. We have taken special consideration for backwards compatibility.
pelias-schema Link to pull request
First of all, we have added the configuration option
indexAddendumField
in thepelias-schema
to optionally be able index the addendum namespaces. The default value is set tofalse
, in order to keep it backwards compatible. We have not added this configuration option inpelias-config
, as it is only used inpelias-schema
.pelias-config (This pull request)
We have added the configuration option
addendum_namespaces
in thepelias-config
with the default value of empty object.We can configure the names for the namespaces we want to filter the data on, and type of data those namespaces will hold. The
type
configuration will be useful in sanitization and validation of data for the configuredaddendum_namespaces
while importing and querying.The configuration will look like the following:
We have added the configuration validation and the Joi schema in
pelias-config
to make sure that the configuration is correct.We have restricted the configurable types to only
string
,number
,boolean
andarray
. Because the addendum field in the elasticsearch mapping inpelias-schema
is configured to holdkeyword
type, which cannot store objects.pelias-model Link to pull request
We have updated the
setAddendum
function inDocument.js
to validate the configuredaddendum_namespaces
as configured type, and not always asobject
type, as it was setup before this PR.In addition to that all the configured
addendum_namespaces
will be stored in its configured data type.And all the non-configured namespaces will be validated as object and stored as json string, as it was before this PR. Which basically means that, if we don’t configured
addendum_namespaces
at all, we will not be affected by any change in this PR. So we have preserved the backwards compatibility.We have also added new unit tests for
setAddendum
function, to test the new usecases.pelias-api Link to pull request
The goal her is to be able to filter the results based on configured
addendum_namespaces
directly, without the need to explicitly mention the relationship with the addendum field. And get the result as separate entity i.e. outside the addendum field.For example:
If we have configured the following in the
pelias.json
then we will be able to filter the results based on the configured namespace, as follows
Or if we have multiple
addendum_namespaces
configured like followingthen we will be able to filter the results based on the multiple configured namespaces, as follows
Following are the changes we have made in the pelias-api:
Sanitization
In the
pelias-api
, we have updated thesanitizeAll.js
to find the configuredaddendum_namespaces
in the received parameters and qualify them as agoodParameters
.We have added the new
_addendum.js
sanitizer, which will sanitize values provided for those configuredgoodParameters
based on the configured types.We have updated the sanitizers
autocomplete
,search
,reverse
,structured_geocoding
andnearby
to use the new_addendum
sanitizer.Added the unit test to test the changes.
Query
We have added the new query filter
addendum_namespace_filter.js
, which creates thefilter
query based on the configured types inaddendum_namespaces
We have updated the quires
autocomplete
,search
,reverse
andstructured_geocoding
to use the newaddendum_namespace_filter
Search result
We have updated the
geojsonify.js
to include the configuredaddendum_namespaces
in the response as separate entity not under the addendum field. All non-configured addendum namespaces will be returned as the children of addendum in the result, as it was prior to this PR.For example:
Let's say, we have the following configuration
and we have imported the following addendum data in pelias index
We have the following request
pelias-api
will add the following filters in the ES query:The result will look something like the following: