feat: drop v2g and reimplement distance features #771
…us/L2GGoldStandard instance
…stance_from_tss`
The new features are definitely a step forward in making it more modular and easier to interpret.
src/gentropy/dataset/l2g_feature.py
        .groupBy("studyLocusId", "geneId")
        .agg(agg_expr.alias(feature_name))
    )
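The diff above groups rows per (studyLocusId, geneId) pair and reduces each group with a Spark aggregation expression. As a plain-Python sketch of that group-and-aggregate step (not the PR's PySpark code), the example below assumes, hypothetically, that the "weighted distance" is the sum of each credible-set variant's distance multiplied by its posterior probability; the column names posteriorProbability and distance, the helper names, and the feature name distanceWeighted are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical row shape: one credible-set variant paired with one gene.
Row = Dict[str, object]


def aggregate_feature(
    rows: List[Row],
    agg: Callable[[List[Row]], float],
    feature_name: str,
) -> List[Row]:
    """Group rows by (studyLocusId, geneId) and reduce each group with `agg`.

    Plain-Python analogue of
    df.groupBy("studyLocusId", "geneId").agg(agg_expr.alias(feature_name)).
    """
    groups: Dict[Tuple[object, object], List[Row]] = defaultdict(list)
    for row in rows:
        groups[(row["studyLocusId"], row["geneId"])].append(row)
    # Groups are emitted in first-seen order (dicts preserve insertion order).
    return [
        {"studyLocusId": study_locus_id, "geneId": gene_id, feature_name: agg(members)}
        for (study_locus_id, gene_id), members in groups.items()
    ]


def weighted_distance(members: List[Row]) -> float:
    """Assumed weighting: sum of distance x posterior probability over the group."""
    return sum(
        float(m["posteriorProbability"]) * float(m["distance"]) for m in members
    )


# Hypothetical example data: two variants near g1, one variant near g2.
EXAMPLE_ROWS: List[Row] = [
    {"studyLocusId": "sl1", "geneId": "g1", "posteriorProbability": 0.75, "distance": 1000},
    {"studyLocusId": "sl1", "geneId": "g1", "posteriorProbability": 0.25, "distance": 2000},
    {"studyLocusId": "sl1", "geneId": "g2", "posteriorProbability": 1.0, "distance": 50},
]

print(aggregate_feature(EXAMPLE_ROWS, weighted_distance, "distanceWeighted"))
```

Keeping the aggregation function as a parameter mirrors the diff's `agg_expr`: each distance feature can reuse the same grouping and only swap the reduction.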
Are we now using weighted distances as the feature, instead of the normalised distance score between 0 and 1?
We weighted them before as well; the feature itself is only normalised later.
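The reply says the raw weighted distance is normalised only at a later stage. The thread does not say how, so the sketch below is purely an assumption: it maps a raw distance onto a 0 to 1 score that is 1 at distance 0 and falls linearly to 0 at a hypothetical 500 kb window. The function name and the window size are illustrative, not taken from the PR.

```python
def normalise_distance(raw_distance: float, window: float = 500_000) -> float:
    """Map a raw distance to a 0-1 score: 1 at distance 0, 0 at or beyond `window`.

    Hypothetical scheme for illustration; the 500 kb default window is an assumption.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Clip into [0, window] so the score always stays within [0, 1].
    clipped = min(max(raw_distance, 0.0), window)
    return 1.0 - clipped / window


for d in (0, 250_000, 750_000):
    print(d, normalise_distance(d))
```

Deferring this step keeps the stored feature in its natural units (base pairs) and lets the scaling be changed without recomputing the aggregation.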
Looks really nice. Goodbye v2g! We just need to make sure we are happy with the weighted distance features.
@@ -74,32 +71,3 @@ def from_source(
        source_class = source_to_class[source_name]
        data = source_class.read(spark, source_path)  # type: ignore
        return source_class.parse(data, gene_index, lift)  # type: ignore
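The `from_source` hunk above dispatches on a source name: it looks up a class in `source_to_class`, calls its `read`, then its `parse`. A minimal, self-contained sketch of that registry pattern follows. To stay runnable without Spark, it drops the `spark`, `gene_index` and `lift` arguments, and `HypotheticalBedSource` is an invented stand-in whose `read` treats its input as inline text rather than a file path. None of these names come from gentropy.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]


class HypotheticalBedSource:
    """Invented stand-in for an interval source class (not a gentropy class)."""

    @classmethod
    def read(cls, source: str) -> List[str]:
        # Stand-in: `source` is inline text here, not a path to a file.
        return source.splitlines()

    @classmethod
    def parse(cls, data: List[str]) -> List[Interval]:
        intervals: List[Interval] = []
        for line in data:
            if not line.strip():
                continue  # skip blank lines
            chrom, start, end = line.split("\t")[:3]
            intervals.append((chrom, int(start), int(end)))
        return intervals


# Registry mapping source names to the classes that know how to read/parse them.
SOURCE_TO_CLASS: Dict[str, type] = {"bed": HypotheticalBedSource}


def from_source(source_name: str, source: str) -> List[Interval]:
    """Look up the source class by name, then read and parse the data."""
    try:
        source_class = SOURCE_TO_CLASS[source_name]
    except KeyError:
        raise ValueError(f"Unknown interval source: {source_name!r}") from None
    data = source_class.read(source)
    return source_class.parse(data)


print(from_source("bed", "1\t100\t200\n2\t5\t10"))
```

Raising a `ValueError` for an unknown name gives a clearer error than the bare `KeyError` a direct dictionary lookup would produce.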
Just wondering: what is the plan for the interval dataclass? Do we plan to have a separate DAG to process these and save them somewhere?
Everything looks good!
✨ Context
This PR closes issues 3434 and 3258.
The feature definitions have been implemented as agreed with @addramir and @xyg123. The spreadsheet that documents them has been updated accordingly: https://docs.google.com/spreadsheets/d/1wUs1AprRCCGItZmgDhc1fF5BtwCSosdzFv4NQ8V6Dtg/edit?gid=452826388#gid=452826388
🛠 What does this PR implement
🙈 Missing
🚦 Before submitting
Branch up to date with dev?
Tests pass (make test)?
Pre-commit hooks pass (poetry run pre-commit run --all-files)?