The index procedure for Collection and Upload is very different from that of other entity types, though the two are similar to each other.
Collection.datasets and Upload.datasets are both generated by on_read_trigger. This can be time-consuming when a collection has many datasets. For instance, 3ae4ddfc175d768af5526a010bfe95aa has 211 datasets, and the GET request takes 8 seconds to generate a 3.6MB payload.
Collection:
Rename Collection.dataset_uuids (currently used by the POST and PUT methods) to Collection.member_uuids (Ingest Portal will need to use this new field for "Generic UI to create Collections", ingest-ui#1377). Also update the trigger method to use this new field.
Remove Collection.datasets, and add a new on_read_trigger and on_index_trigger (used by the specialized/documents/<id> endpoint) to the same Collection.member_uuids so that the GET call only returns a list of uuids (this requires updating the neo4j query and the corresponding search-api code). Also mark it as indexed: true.
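A minimal sketch of what a uuid-only trigger for Collection.member_uuids could look like. The function name, the trigger signature, the Cypher text (including the relationship name), and the injected `run_query` callable are all assumptions for illustration, not the actual entity-api code:

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical Cypher: return only the uuid of each member instead of full
# Dataset nodes. The relationship name is an assumption for this sketch.
MEMBER_UUIDS_QUERY = (
    "MATCH (d:Dataset)-[:IN_COLLECTION]->(c:Collection {uuid: $uuid}) "
    "RETURN d.uuid AS uuid"
)

QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def get_member_uuids(
    property_key: str, collection_uuid: str, run_query: QueryRunner
) -> Tuple[str, List[str]]:
    """Return (property_key, sorted member uuids) for one Collection.

    run_query is a hypothetical callable that executes a Cypher query with
    parameters and returns a list of row dicts.
    """
    rows = run_query(MEMBER_UUIDS_QUERY, {"uuid": collection_uuid})
    # Only uuid strings go into the response, not full dataset documents.
    return property_key, sorted(row["uuid"] for row in rows)
```

Because each member contributes only a short uuid string instead of a full dataset document, the payload for a collection with hundreds of datasets should shrink considerably.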
Upload:
Do NOT change or rename any of the existing fields.
Add a new field Upload.dataset_uuids with an on_read_trigger and on_index_trigger that only return a list of uuids. Also mark it as indexed: true. For now, we'll use this Upload.dataset_uuids to replace Upload.dataset_uuids_to_link and Upload.datasets in the search index procedure. Other teams will need to make this switch LATER.
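One possible reading of the Upload change, sketched below: the stored Upload keeps all of its existing fields, while the document built for the search index carries only Upload.dataset_uuids. The helper name `build_upload_index_doc` and the dict shapes are hypothetical:

```python
from typing import Any, Dict, List

# Fields the index document would no longer carry, per the proposal above.
REPLACED_IN_INDEX = ("datasets", "dataset_uuids_to_link")


def build_upload_index_doc(
    upload: Dict[str, Any], dataset_uuids: List[str]
) -> Dict[str, Any]:
    """Build a search-index document for an Upload (illustrative only).

    The input dict is left unmodified, so existing consumers of
    Upload.datasets and Upload.dataset_uuids_to_link are unaffected.
    """
    doc = {k: v for k, v in upload.items() if k not in REPLACED_IN_INDEX}
    doc["dataset_uuids"] = list(dataset_uuids)
    return doc
```

The original entity dict is copied rather than mutated, which matches the requirement not to change or rename any existing Upload fields.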
yuanzhou changed the title from "Additional efficiency improvement targeting Collection and Upload" to "Efficiency improvement targeting Collection and Upload" on Apr 15, 2024