
implemented decode->flatten->filter->unflatten->encode enhancement #8

Merged: 4 commits merged on Jul 12, 2021


@aianta (Collaborator) commented Jul 7, 2021

During filtering, the pkg_dict data structure now goes through the following process:

  1. Decode Stringified JSON Object/List values
  2. Flatten the decoded structure
  3. Filter the flattened structure
  4. Unflatten the filtered structure
  5. Stringify expected JSON Object/List values
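The five steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the plugin's code: the PR relies on the flatten_dict package's path reducer and splitter, whereas this sketch hand-rolls equivalent "/"-joined flattening so it runs with the standard library alone. The names JSON_FIELDS, flatten_path, unflatten_path and filter_pkg_dict are hypothetical, only "metadata-point-of-contact" is known to be a stringified JSON key, and the "remove the listed paths" filtering semantics are an assumption.

```python
import json

# Hypothetical: keys whose values arrive as stringified JSON. Only
# "metadata-point-of-contact" appears in the PR; the rest is assumed.
JSON_FIELDS = ["metadata-point-of-contact"]


def flatten_path(data, parent=""):
    """Flatten nested dicts into one level with '/'-joined keys.

    Lists and scalars are kept as leaf values.
    """
    flat = {}
    for key, value in data.items():
        path = f"{parent}/{key}" if parent else key
        if isinstance(value, dict) and value:
            flat.update(flatten_path(value, path))
        else:
            flat[path] = value
    return flat


def unflatten_path(flat):
    """Rebuild nested dicts from '/'-joined keys."""
    nested = {}
    for path, value in flat.items():
        parts = path.split("/")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


def filter_pkg_dict(pkg_dict, hidden_fields):
    """Return a copy of pkg_dict without the '/'-joined paths in hidden_fields."""
    # 1. Decode stringified JSON values
    decoded = dict(pkg_dict)
    for key in JSON_FIELDS:
        if isinstance(decoded.get(key), str):
            decoded[key] = json.loads(decoded[key])

    # 2. Flatten the decoded structure
    flat = flatten_path(decoded)

    # 3. Filter the flattened structure
    kept = {path: value for path, value in flat.items() if path not in hidden_fields}

    # 4. Unflatten the filtered structure
    result = unflatten_path(kept)

    # 5. Stringify the expected JSON values again
    for key in JSON_FIELDS:
        if key in result and not isinstance(result[key], str):
            result[key] = json.dumps(result[key])
    return result
```

In this sketch lists are treated as leaves, so fields nested inside a list of objects cannot be removed individually; only whole list values can.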

This process allows finer-grained filtering of metadata values. Before this update, the plugin could filter 66 values; with this enhancement, it can filter 89. The new roster of filterable metadata fields is as follows and can also be found in the vitality_docs repo:

"author",
"author_email",
"bbox-east-long",
"bbox-north-lat",
"bbox-south-lat",
"bbox-west-long",
"cited-responsible-party",
"creator_user_id",
"dataset-reference-date",
"eov",
"extras",
"frequency-of-update",
"groups",
"id",
"isopen",
"keywords/en",
"keywords/fr",
"license_id",
"license_title",
"license_url",
"maintainer",
"maintainer_email",
"metadata_created",
"metadata_modified",
"metadata-language",
"metadata-point-of-contact/contact-info_email",
"metadata-point-of-contact/contact-info_online-resource_application-profile",
"metadata-point-of-contact/contact-info_online-resource_description",
"metadata-point-of-contact/contact-info_online-resource_function",
"metadata-point-of-contact/contact-info_online-resource_name",
"metadata-point-of-contact/contact-info_online-resource_protocol",
"metadata-point-of-contact/contact-info_online-resource_protocol-request",
"metadata-point-of-contact/contact-info_online-resource_url",
"metadata-point-of-contact/individual-name",
"metadata-point-of-contact/organisation-name",
"metadata-point-of-contact/position-name",
"metadata-point-of-contact/role",
"metadata-reference-date",
"name",
"notes/en",
"notes/fr",
"notes_translated/en",
"notes_translated/fr",
"num_resources",
"num_tags",
"organization/approval_status",
"organization/created",
"organization/description",
"organization/description_translated/en",
"organization/description_translated/fr",
"organization/id",
"organization/image_url",
"organization/image_url_translated/en",
"organization/image_url_translated/fr",
"organization/is_organization",
"organization/name",
"organization/revision_id",
"organization/state",
"organization/title",
"organization/title_translated/en",
"organization/title_translated/fr",
"organization/type",
"owner_org",
"private",
"progress",
"relationships_as_object",
"relationships_as_subject",
"resources",
"resource-type",
"revision_id",
"spatial/coordinates",
"spatial/type",
"state",
"tags",
"temporal-extent/begin",
"temporal-extent/end",
"title",
"title_translated/en",
"title_translated/fr",
"tracking_summary/recent",
"tracking_summary/total",
"type",
"unique-resource-identifier-full/authority",
"unique-resource-identifier-full/code",
"unique-resource-identifier-full/code-space",
"unique-resource-identifier-full/version",
"url",
"vertical-extent",
"xml_location_url",

Care must still be taken when filtering some of the new fields to ensure that the cioos-theme handles their absence gracefully.
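Because a filtered field may simply be missing from pkg_dict, consumers should look values up with a fallback rather than indexing directly. The cioos-theme is template code, so the following Python helper is only a hypothetical illustration of that pattern; get_by_path is not part of the plugin or of CKAN.

```python
def get_by_path(data, path, default=None):
    """Follow a '/'-joined path such as 'organization/title_translated/en'.

    Returns `default` if any segment is missing (for example because it was
    filtered out) or if an intermediate value is not a dict.
    """
    node = data
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


pkg = {"organization": {"title_translated": {"en": "Example Org"}}}
get_by_path(pkg, "organization/title_translated/en")      # → 'Example Org'
get_by_path(pkg, "organization/title_translated/fr", "")  # → '' (field absent)
```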

Additionally, certain stringified lists of JSON objects still contain 'unfilterable' fields, as mentioned in issue 7.

CKAN's own flatten and unflatten functions, ckan.lib.navl.dictization_functions.flatten_dict(data_dict) and ckan.lib.navl.dictization_functions.unflatten(translated_flattened), reduce keys into tuples, which suit our filtering needs less well than the path reducer from the flatten_dict Python package used here.
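The difference between the two key styles can be shown with one hypothetical flatten function that takes a pluggable reducer. The tuple reducer produces keys in the style the PR attributes to CKAN's functions, while the path reducer produces "/"-joined strings that can be compared directly against the roster above. None of these functions are CKAN's or flatten_dict's actual code.

```python
def tuple_reducer(parent, key):
    """Accumulate keys into a tuple, e.g. ('organization', 'title')."""
    return (key,) if parent is None else parent + (key,)


def path_reducer(parent, key):
    """Join keys with '/', e.g. 'organization/title'."""
    return key if parent is None else f"{parent}/{key}"


def flatten(data, reducer, parent=None):
    """Flatten nested dicts, building each key with the given reducer."""
    flat = {}
    for key, value in data.items():
        new_key = reducer(parent, key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, reducer, new_key))
        else:
            flat[new_key] = value
    return flat


pkg = {"organization": {"title": "Example Org", "state": "active"}}
roster = {"organization/title"}

by_tuple = flatten(pkg, tuple_reducer)
by_path = flatten(pkg, path_reducer)

# Path keys match roster entries as-is; tuple keys must be joined first.
matches_path = [k for k in by_path if k in roster]
matches_tuple = [k for k in by_tuple if "/".join(k) in roster]
```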

It is worth noting, however, that the CKAN flatten and unflatten functions do appear to handle list values better, whereas the flatten_dict package has trouble unflattening them. If we ever need to fix issue 7, we could perhaps apply the flatten_dict flatten function first and then use the CKAN flatten function to handle the remaining list values.
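One way list values can be made filterable is to put the list index into the key, e.g. ("resources", 0, "name"), and turn integer-keyed levels back into lists when unflattening. The sketch below is a hypothetical standard-library illustration of that idea, not CKAN's implementation. As written, it closes gaps, so removing an entire list element yields a shorter list.

```python
def flatten_with_lists(data, prefix=()):
    """Flatten dicts and lists into tuple keys; list positions become int segments."""
    flat = {}
    items = data.items() if isinstance(data, dict) else enumerate(data)
    for key, value in items:
        path = prefix + (key,)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten_with_lists(value, path))
        else:
            flat[path] = value
    return flat


def _restore_lists(node):
    """Convert dicts whose keys are all ints back into lists, ordered by index."""
    if not isinstance(node, dict):
        return node
    restored = {k: _restore_lists(v) for k, v in node.items()}
    if restored and all(isinstance(k, int) for k in restored):
        return [restored[k] for k in sorted(restored)]
    return restored


def unflatten_with_lists(flat):
    """Inverse of flatten_with_lists (gaps left by removed list items are closed)."""
    root = {}
    for path, value in flat.items():
        node = root
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return _restore_lists(root)


pkg = {"title": "T", "resources": [{"name": "a", "url": "u1"}, {"name": "b", "url": "u2"}]}
flat = flatten_with_lists(pkg)
# Drop every resource URL while keeping the other resource fields.
kept = {k: v for k, v in flat.items() if k[-1] != "url"}
filtered = unflatten_with_lists(kept)
```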

An issue regarding unflattening list values exists on the flatten_dict GitHub repository (see issue-8).

From the diff:

```python
    Returns a list of keys whose values should be stringified json objects
    """
    return [
        "metadata-point-of-contact",
```
Review comment (Contributor):

Eventually it would be good to move these to constants.
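Following the review suggestion, the key list could live in a module-level constant. A minimal sketch, where STRINGIFIED_JSON_FIELDS and stringified_json_fields are hypothetical names and only "metadata-point-of-contact" is taken from the diff excerpt (the rest of the original list is not shown):

```python
# Hypothetical module-level constant; a tuple so it cannot be mutated in place.
STRINGIFIED_JSON_FIELDS = (
    "metadata-point-of-contact",
    # ... remaining keys from the original list
)


def stringified_json_fields():
    """Return a list of keys whose values should be stringified JSON objects."""
    # Return a fresh list so callers cannot alter the shared constant.
    return list(STRINGIFIED_JSON_FIELDS)
```

Keeping the function as a thin wrapper preserves its list return type for any existing callers.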

```python
# UNFLATTEN filtered dictionary
unflattened = unflatten(flattened, splitter='path')

# STRIGIFY required json fields
```
Review comment by @greebie (Contributor), Jul 12, 2021:

STRINGIFY

@aianta merged commit 2a48687 into development on Jul 12, 2021
@JaredMclellan deleted the improved-filtering branch on October 27, 2021, 13:32