Implementation of BETA BCO Ranking systems #329

Closed
HadleyKing opened this issue Jun 17, 2024 · 4 comments

@HadleyKing (Collaborator)

Implement the ideas from #328 into the BCO Scoring function:

def bco_score(bco_instance: Bco) -> Bco:
    """BCO Score
    Process and score BioCompute Objects (BCOs).
    """
    contents = bco_instance.contents
    if "usability_domain" not in contents:
        bco_instance.score = 0
        return bco_instance
    try:
        usability_domain_length = sum(len(s) for s in contents['usability_domain'])
        score = {"usability_domain_length": usability_domain_length}
    except TypeError:
        score = {"usability_domain_length": 0}
        usability_domain_length = 0
    bco_instance.score = usability_domain_length
    return bco_instance
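
For reference, a minimal sketch of the behavior, assuming the bco_score function above is in scope. FakeBco below is a hypothetical stand-in for the project's Bco model; it only duck-types against the two attributes the function touches:

from dataclasses import dataclass

@dataclass
class FakeBco:
    contents: dict   # BCO document contents
    score: int = 0   # populated by bco_score

bco = FakeBco(contents={
    "usability_domain": [
        "Pipeline for calling germline variants from WGS data.",
        "Intended for downstream clinical review.",
    ]
})

# score ends up as the summed character length of the usability_domain strings
print(bco_score(bco).score)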

@HadleyKing HadleyKing added the enhancement New feature or request label Jun 17, 2024
@HadleyKing HadleyKing added this to the 24.06.27 milestone Jun 17, 2024
@Kirans0615 (Collaborator)

@seankim658 (Member)

Spoke with Hadley and we had some ideas for the representation of the scores in the data model. I've been implementing scores in the biomarker project for scoring "trustworthy" biomarkers, and a few things we've done there have made the scores easier to track:

  1. Have some sort of internal versioning for the scores. Scoring is an iterative process that changes over time, and having a way to delineate which scores came from which version of the formula is very helpful.
  2. When calculating the scores, create an object with the formula breakdown. This can be used both internally when investigating scores and on the frontend to show users how the score was calculated and where the weights come from. The biomarker project keeps that information in our data schema and returns it on API requests. It looks like this:
{
  "score": 3.4,
    "score_info": {
      "contributions": [
        {
          "c": "first_pmid",
          "w": 1,
          "f": 1
        },
        {
          "c": "other_pmid",
          "w": 0.2,
          "f": 7
        },
        {
          "c": "first_source",
          "w": 1,
          "f": 1
        },
        {
          "c": "other_source",
          "w": 0.1,
          "f": 0
        },
        {
          "c": "generic_condition_pen",
          "w": -4,
          "f": 0
        },
        {
          "c": "loinc",
          "w": 1,
          "f": 0
        }
      ],
      "formula": "sum(w*f)",
      "variables": {
        "w": "weight",
        "c": "condition",
        "f": "frequency"
      }
    }
}

This shows that the score was calculated as the sum of the weights times the frequencies. For example, having one PMID associated with the biomarker carries a weight of 1, each additional PMID carries a weight of 0.2, and so on. So the calculation for this score was 1(1) + 0.2(7) + 1(1) = 3.4.
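
To make the formula concrete, here is a small sketch that recomputes the score from a contributions breakdown like the one above; the helper name and the score_version field are hypothetical (the version tag just illustrates where point 1 could live), not part of the biomarker codebase:

# Hypothetical helper: recompute a score from a score_info breakdown
# using the stated formula sum(w * f).
def score_from_contributions(score_info: dict) -> float:
    return round(sum(entry["w"] * entry["f"] for entry in score_info["contributions"]), 2)

score_info = {
    "score_version": "2024.06",  # hypothetical internal version tag (see point 1)
    "contributions": [
        {"c": "first_pmid", "w": 1, "f": 1},
        {"c": "other_pmid", "w": 0.2, "f": 7},
        {"c": "first_source", "w": 1, "f": 1},
        {"c": "other_source", "w": 0.1, "f": 0},
        {"c": "generic_condition_pen", "w": -4, "f": 0},
        {"c": "loinc", "w": 1, "f": 0},
    ],
    "formula": "sum(w*f)",
    "variables": {"w": "weight", "c": "condition", "f": "frequency"},
}

print(score_from_contributions(score_info))  # 3.4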

@tiwa1154 (Contributor)

Write a FAQ on how the ranking system works, criteria, etc?

@tiwa1154 (Contributor)

tiwa1154 commented Oct 2, 2024

FAQ created in #446

@tiwa1154 tiwa1154 closed this as completed Oct 2, 2024