Concept handling #241

Open
toddroper opened this issue Jun 9, 2023 · 4 comments
Assignees
Labels
backlog, enhancement (New feature or request)

Comments

@toddroper
Contributor

Currently we only pull concepts from mapped locations inside our data objects. We need to abstract this and search the object for any instance of a concept so it can be properly saved to the DB.
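
A rough sketch of what the abstracted search could look like, assuming concepts show up as CURIE-style strings (`prefix:local_id`) anywhere inside a nested dict/list payload. The name `find_curies`, the regex, and the path notation are illustrative assumptions, not taken from the existing TDS code, and the real objects may encode concepts differently.

```python
import re
from typing import Any, Iterator, Tuple

# Assumption: a concept reference is any string shaped like "prefix:local_id".
# The prefix must start with a letter and the local part may not contain "/",
# so values such as "12:30" or "http://..." are not picked up.
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:[A-Za-z0-9_.\-]+$")


def find_curies(obj: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, curie) for every CURIE-like string value found in obj."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_curies(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from find_curies(value, f"{path}[{index}]")
    elif isinstance(obj, str) and CURIE_RE.match(obj):
        yield path, obj
```

Because the walk does not depend on fixed locations, a payload whose layout changes would still be covered, at the cost of possible false positives from strings that merely look like CURIEs. Only values are inspected here; dictionary keys are not treated as concepts.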

@toddroper added the enhancement and technical debt labels on Jun 20, 2023
@toddroper self-assigned this on Jun 20, 2023
@brandomr
Collaborator

For faceted search, we should consider whether we can just use the objects in ES directly and get rid of the concepts table in Postgres.

Since the Terarium HMI may do the concept name lookups itself (not through TDS), it's less obvious to me that we need the concept name caching.

When the dust settles later this week we should discuss with them.

@brandomr changed the title from "Update Concept Extraction for all Entities" to "Concept handling" on Jul 28, 2023
@brandomr
Collaborator

brandomr commented Jul 28, 2023

Updating this post-hackathon: there's currently no concept-based faceted search in Terarium and I'm not sure if/when it will be part of the platform.

As far as I can tell concepts will only show up as metadata on model variables/parameters and on dataset features. They might come up as part of paper extractions but I assume we only care about them with respect to the models extracted from those papers.

For datasets, we get the concept CURIEs and concept names from the MIT extractions and format them into the columns aspect of the dataset metadata. The HMI already parses this correctly and AFAIK we are not storing these concepts in Postgres (@toddroper is that correct?).

[Screenshot 2023-07-27 at 8.20.16 PM]

Models: we pull the CURIEs out of models and look them up in the DKG; we then store them back to the concepts table. As we add new frameworks (e.g. PDEs or ABMs), we'll have to update our search over the model to find the concepts.

We should confirm how to treat both of these going forward, but can leave this in the parking lot for now.
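
For reference, the model flow described above (look up each CURIE, then store it back to a concepts table) could be sketched roughly as below. Everything here is a stand-in: `store_concepts`, the `concept` table layout, and the `lookup_name` callable are hypothetical, the DKG call is replaced by whatever function the caller passes in, and SQLite is used only so the sketch runs without a Postgres instance.

```python
import sqlite3
from typing import Callable, Iterable, Optional


def store_concepts(
    conn: sqlite3.Connection,
    curies: Iterable[str],
    lookup_name: Callable[[str], Optional[str]],
) -> int:
    """Resolve each CURIE to a name and save it; return how many were stored."""
    # Hypothetical table layout: one row per CURIE with its cached name.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept (curie TEXT PRIMARY KEY, name TEXT)"
    )
    stored = 0
    for curie in sorted(set(curies)):  # de-duplicate before looking anything up
        name = lookup_name(curie)  # stand-in for the DKG lookup
        if name is None:
            continue  # unknown CURIE: skip rather than store an empty name
        conn.execute(
            "INSERT OR REPLACE INTO concept (curie, name) VALUES (?, ?)",
            (curie, name),
        )
        stored += 1
    conn.commit()
    return stored
```

Passing the lookup in as a callable keeps the storage step independent of where names come from, so it could be called with a plain dict's `.get` in tests, or dropped entirely if the HMI ends up resolving names itself.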

@toddroper
Contributor Author

I don't believe we save the dataset concepts in Postgres, but I would need to double-check that. We might still need to update the model concept extraction separately from this issue.

@brandomr
Collaborator

Ok, that matches my understanding: we don't need to do lookups on the datasets since MIT does it for us. I agree the model concept extractions will have to be revisited, since the model extraction schema changed multiple times in the last few weeks. Good for discussion tomorrow.
