Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathfinder: Rework templating system #832

Closed
tokebe opened this issue Jul 16, 2024 · 9 comments
Closed

Pathfinder: Rework templating system #832

tokebe opened this issue Jul 16, 2024 · 9 comments
Assignees
Labels
On Test Related changes are deployed to Test server

Comments

@tokebe
Copy link
Member

tokebe commented Jul 16, 2024

Parent issue: #794

In order to better cover the space of possible pathfinder queries (in terms of pinned-node pairings), we need to revisit how Pathfinder acquires templates. Instead of manually-built TRAPI templates, we should instead generate templates on-the-fly based on the node categories presented in the pathfinder query.

Essentially, within the existing framework of the current prototype, we replace the "select templates" step as follows:

Assume two lookup tables:

  • One which takes two node-categories as coordinates, outputting an array of node-categories
  • One which takes two node-categories as coordinates, outputting an array of predicates

These tables would be defined as yaml files (categoryTable.yaml and predicateTable.yaml) in the query_graph_handler data folder, with the following structure:

category:
  category:
    - category/predicate
    - category/predicate
   category:
     - category/predicate
     - category/predicate
category:
  category:
    - category/predicate
# and so on

These tables would then be used to generate templates as specified in these slides

At which point behavior returns to the current implementation of the prototype, with one exception:

When generating final results from the existing template results (see parent issue, current prototype), skip constructing a result for any intermediate node which is not equal to or a descendant of the unpinned node category (as a computational shortcut, if the unpinned node category is biolink:NamedThing, then no need to check, just proceed with making the result).

@tokebe
Copy link
Member Author

tokebe commented Jul 16, 2024

Work on this will first require some example tables for testing, assigning @colleenXu to build these:

  • They don't need to be exhaustive, should only cover content of existing Pathfinder MVP queries
  • See above for where to place/what to name the files
  • Put in a branch based on pathfinder named pathfinder-newsystem

@colleenXu
Copy link
Collaborator

colleenXu commented Jul 23, 2024

Example tables are now in pathfinder-newsystem branch's data folder!

Note: I made these tables thinking of how to construct the existing Pathfinder MVP templates using templates 1 + 2, and not really 3.

Some things to account for @rjawesome @tokebe:

  • changed the predicateTable.yaml format: rather than a list of strings, we're using a list of dictionaries (only key used right now is predicate). This should allow us to add qualifier-constraint sets to a predicate entry later if we want to.
  • Will want Gene/Protein conflation: if Protein is a starting category (one of the user-provided IDs or categories), should add/use Gene.
    • And check: when running template-queries, Gene QNodes get Protein added to them AND vice versa (this should exist as previous conflation behavior)

@rjawesome
Copy link
Contributor

A few questions about the slides
1.
image
Does this mean the edge should have no predicates (ie. "predicates": []) which would automatically make the query fail, or does this mean the edge should not have the predicates key (ie. basically have a predicate of "biolink:related_to")?

2 . If two adjacent nodes (ie. n0 & A) have multiple categories, then should all predicates from all category pairs be merged, or should a new query graph be generated for each category pair (with the predicates in a particular query graph determined by its category pair)?

3 (similar to 2). If n0 & A have multiple categories, then should all possible categories for node B be merged, or should a new query graph be generated for each category pair of n0 & A (with the categories for B in a particular query graph determined by its category pair)?

@tokebe
Copy link
Member Author

tokebe commented Jul 24, 2024

  1. The latter, just remove the predicate key (ignore my earlier edit, I misread your question).
  2. Initially, we were thinking it would just be merged, so one query graph where an edge might have multiple predicates. I don't think this should cause problems with result transformation down the line -- if it does, we should review and possibly end up making separate query graphs as you say.
  3. Merged. If it causes problems, as with 2, we'll review before deciding what to do instead.

@colleenXu
Copy link
Collaborator

colleenXu commented Jul 25, 2024

@rjawesome

After today's discussion with Jackson, we decided to drop "potential qualifier support" for now (it would maybe involve separate QGraph for a predicate+qualifier-set combo).

So I pushed a commit to adjust the predicateTable.yaml format, making it simpler. Can you pull and adjust your code to account for this format change?

@rjawesome
Copy link
Contributor

Done

@colleenXu
Copy link
Collaborator

Updates

(1) Lookup tables were adjusted right before Sept Translator Relay biothings/bte_trapi_query_graph_handler#215.

  • based on testing the new set of 12 example queries without setting intermediate category (link may only be accessible to some Translator folks)
  • NOTES
  • Didn't optimize for setting intermediate category. Jackson did test Mon 9/9 and found that some queries that set intermediate category still completed (works on a basic level)

(2) Hotfixes (direct commits) made during Sept Translator Relay to fix TRAPI validation errors for Pathfinder responses

@colleenXu
Copy link
Collaborator

colleenXu commented Sep 17, 2024

Current discussion topics

  1. Rerun examples/reanalyze now that MyDisease disgenet data has been fixed?
    • During the testing/lookup-table adjustments, I discovered that MyDisease was missing disgenet data. This was fixed yesterday (Slack link).
    • I know that affected the imatinib-asthma example (although the use case may actually be using the allergic asthma ID only...)
  2. Currently testing/deploying Pathfinder dryrun mode: showing what queries are created from the "template-generation" #862
  3. May need some additions to lookup-tables for Cell starting-category support Ongoing Pathfinder improvements #864
  4. Pathfinder scoring isn't meaningful future: Pathfinder scoring #863

@colleenXu colleenXu added On CI -> Test and removed On CI Related changes are deployed to CI server labels Oct 24, 2024
@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI -> Test labels Oct 28, 2024
@tokebe
Copy link
Member Author

tokebe commented Dec 3, 2024

Related changes deployed to Prod as of 11/13

@tokebe tokebe closed this as completed Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
On Test Related changes are deployed to Test server
Projects
None yet
Development

No branches or pull requests

3 participants