Test Olinda #3
I have managed to get the pipeline working for both the example in the notebook and for an Ersilia model.
After a lot of trial and error, I was able to work around it by overwriting the first model instantiation, but this may be something to ask Ank about, in case he has any insight.
I've implemented a few quality-of-life features:
This is fantastic, @JHlozek. @GemmaTuron, I am happy with merging the PR (if we haven't already). Perhaps let's first tag the current version (i.e., the one developed by @leoank) and then build on top of it with the new additions, mostly focused on ZairaChem integration.
I've named Ank's version v1 so we can move on with merging the new code when it is ready.
Let me know if you need any help with this!
I have updated Olinda so that the training set size can be specified (not just 100 or everything anymore), and I worked through a bug around the usage of the MorganFingerprints. I've also updated the demo notebook accordingly. As discussed, I'll update the README and create the PR before moving on to ZairaChem. Thanks, @leoank. I encountered a confusing issue during the final model training step, which I'm hoping you may have some insight into. I'll create a separate issue with more information and tag you.
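Making the training set size configurable, rather than a choice between 100 molecules and the full set, can be pictured as a seeded random subsample of the reference SMILES. The sketch below is a minimal illustration under that assumption; `subsample_reference_set` and its `size`/`seed` parameters are hypothetical names, not Olinda's actual interface.

```python
import random
from typing import List, Optional, Sequence


def subsample_reference_set(
    smiles: Sequence[str], size: Optional[int] = None, seed: int = 42
) -> List[str]:
    """Return a reproducible random subset of `size` SMILES (hypothetical helper).

    size=None keeps every molecule, like the old "everything" option; a size
    at or above the number of molecules also returns the full set.
    """
    if size is None:
        return list(smiles)
    if size <= 0:
        raise ValueError("size must be a positive integer")
    if size >= len(smiles):
        return list(smiles)
    # A local RNG keeps the subsample reproducible without touching global state.
    rng = random.Random(seed)
    return rng.sample(list(smiles), size)
```

For example, `subsample_reference_set(chembl_smiles, size=10_000)` would correspond to one of the 1k/10k/50k/100k ChEMBL reference settings discussed in this thread, with `chembl_smiles` standing in for whatever list of reference SMILES is loaded.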
Hi @miquelduranfrigola, I've created the pull request for Olinda: #5. I'm now going to move on to the ZairaChem model implementation with 1k pre-calculated descriptors.
Hi @JHlozek
Thanks, @GemmaTuron and @JHlozek 👌
Test this notebook: https://github.com/ersilia-os/olinda/tree/main/notebooks
Hi @JHlozek, I am unable to install Olinda. The problem seems to be the Lap package, but I've tried several options and have not succeeded. I've also reviewed the dependencies specified in pyproject.toml, and there are a few things that should probably be updated:
Hi, I am still running into issues with Keras and TensorFlow (see the attached log, olinda.log). Thanks!
Hi @GemmaTuron, here is the conda env list for my Olinda_only install and then for the one with ZairaChem (where I am now doing all the development):
To the previous points above:
Hi @JHlozek, that is very helpful. I have updated the pyproject.toml:
When I try to run your demo notebook, though, I run into issues with Keras (apparently, it needs Keras v3 to bypass this error):
Hi @JHlozek, good news: I have been able to sort out the dependency hell. The latest problem was being caused by keras-nlp, which was installing its latest version because no version was specified. v0.14 only works with Keras v3, hence the clash. I have pinned it in the pyproject.toml file as keras-nlp=0.12.0, and it now works fine. Regarding the actual testing of Olinda, I am able to run the demo notebook ONLY if I remove the
Thanks, Gemma
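A clash like this (an unpinned keras-nlp pulling in v0.14, which only works with Keras v3) can be caught before the notebook fails deep inside TensorFlow. Here is a minimal sketch of such a check using the standard library's `importlib.metadata`; the helper names are hypothetical, and the compatibility rule is assumed to be exactly "keras-nlp 0.14 or later requires Keras 3", as described above.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def major_minor(v: str) -> Tuple[int, int]:
    """Parse the leading MAJOR.MINOR of a version string, e.g. '0.12.0' -> (0, 12)."""
    match = re.match(r"(\d+)\.(\d+)", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return int(match.group(1)), int(match.group(2))


def keras_nlp_compatible(keras_version: str, keras_nlp_version: str) -> bool:
    """Assumed rule from this thread: keras-nlp >= 0.14 only works with Keras 3."""
    if major_minor(keras_nlp_version) >= (0, 14):
        return major_minor(keras_version)[0] >= 3
    return True


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment() -> None:
    """Raise early if the installed keras / keras-nlp pair is known to clash."""
    keras_v = installed_version("keras")
    nlp_v = installed_version("keras-nlp")
    if keras_v and nlp_v and not keras_nlp_compatible(keras_v, nlp_v):
        raise RuntimeError(
            f"keras-nlp {nlp_v} needs Keras 3 but Keras {keras_v} is installed; "
            "pin keras-nlp to 0.12.0 as in pyproject.toml"
        )
```

Calling `check_environment()` at the top of the demo notebook would turn the obscure Keras error into a one-line explanation; pinning the version in pyproject.toml, as done here, remains the actual fix.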
Hi @GemmaTuron, that is good news. I'm guessing keras-nlp has been updated in the last month, but my laptop had an older version cached that it was drawing on. The cleaning issue is a bug. I noticed it when I started implementing ZairaChem compatibility, so I've fixed the original code in a later commit. Maybe next week we could aim to test Olinda with ZairaChem on your side? I'll let you know when the code is more stable, after I've implemented the sample_weights. I'm hoping this might also help prevent the model from collapsing to the mean; otherwise, I'll open a separate issue for further input.
Hey @JHlozek, there were some issues with RDKit on Apple M1 chips when I started the project.
Thanks, @leoank, this is good to know. Generally, pip installation of RDKit works on all platforms now.
OK @JHlozek, sounds good! Maybe we need to figure out how to collaborate more closely: the changes you are making in your branch should probably be merged into the main code soon so that we all work from the same codebase. Let's see if we can discuss this next week.
Hello @JHlozek, what is the status of this? When will the code be merged into the main branch?
Olinda is almost ready. The next steps to close this out as a production-ready tool are:
I have updated the README file for Olinda. I think we should move the more advanced customization section somewhere else, perhaps the GitBook? For now, it is in olinda_customization.md.
To test the pipeline: install ZairaChem from the JHlozek fork, train a fresh ZairaChem model, then follow the 'Usage' steps in the README to distill it.
Ongoing work:
For each of these setups, I've varied the number of ChEMBL reference training points (1k, 10k, 50k, 100k) and how the training scores are weighted for Olinda (unweighted, weighting_by_predicted_class, weighting_by_zaira_vs_chembl_origin, combined_weighting_of_class_and_origin). The above testing is complete for the high-data scenario of the H3D plasmodium_NF54 model at a 0.5 µM cutoff. I'm working on condensing the metrics into a single plot, and I'm now testing the same configurations for a low-data caco model. This data will be used to inform the choice of the default Olinda setup, which will then be benchmarked on TDC datasets.
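The thread does not say how each weighting scheme is computed, so the sketch below assumes one plausible reading: inverse-frequency weights over the teacher's predicted class (score thresholded at 0.5), inverse-frequency weights over the data origin (ZairaChem training set vs ChEMBL reference), and their product for the combined scheme. The scheme names mirror the ones listed above; `compute_sample_weights` and `inverse_frequency` are hypothetical helpers, not Olinda code.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def inverse_frequency(labels: Sequence[Hashable]) -> List[float]:
    """Weight each sample by n / (k * count(label)) so every group contributes equally."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]


def compute_sample_weights(
    scores: Sequence[float],
    origins: Sequence[str],
    scheme: str = "unweighted",
    threshold: float = 0.5,
) -> List[float]:
    """Hypothetical per-sample weights for the four schemes compared in this thread.

    scores:  teacher (e.g. ZairaChem) probabilities for each training molecule
    origins: e.g. "zaira" for the model's own training set, "chembl" for reference molecules
    """
    if len(scores) != len(origins):
        raise ValueError("scores and origins must have the same length")
    predicted_class = [int(s >= threshold) for s in scores]
    by_class = inverse_frequency(predicted_class)
    by_origin = inverse_frequency(origins)
    schemes: Dict[str, List[float]] = {
        "unweighted": [1.0] * len(scores),
        "weighting_by_predicted_class": by_class,
        "weighting_by_zaira_vs_chembl_origin": by_origin,
        # Assumed combination: multiply the two single-factor weights.
        "combined_weighting_of_class_and_origin": [c * o for c, o in zip(by_class, by_origin)],
    }
    if scheme not in schemes:
        raise ValueError(f"unknown scheme: {scheme}")
    return schemes[scheme]
```

The resulting list could be converted with `numpy.asarray` and passed as the `sample_weight` argument of Keras `Model.fit`. The single-factor inverse-frequency weights sum to the number of samples, so the overall loss scale is unchanged while minority groups (for example, the few predicted actives) count for more, which is one way such weighting might counteract a student model that collapses to the mean.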
Thanks, @JHlozek!
Hi @JHlozek, I have tried installing from your forks, but I am basically stuck at installing ZairaChem. I recall you saying you made some edits and that it works for you, so could you please look at this issue? Thanks!
Hi @GemmaTuron, I've updated the ZairaChem issue you linked. I had the same problem with AutoGluon, but I managed to find a temporary workaround for now. I'm able to install my updated ZairaChem fork on both a Rocky Linux H3D workstation and my own Ubuntu laptop and run the fit/predict commands.
Hi @JHlozek, I get this message when I follow the instructions in your Olinda fork:
I am trying to distill a ZairaChem model I just trained, using the same ZairaChem environment, which in principle has Olinda installed, since they should work together:
Let me know what I am doing wrong if you have come across this before! Thx
Hi @GemmaTuron, unfortunately, it doesn't look familiar. :/ The only thing I can think of is that our requirements.txt files might be different if you're not installing directly from my ZairaChem fork?
Hi @JHlozek, when I try Olinda in an isolated environment, this happens: indeed, there seems to be a circular import between Olinda and ZairaChem, which needs to be fixed. Ideally, ZairaChem would install Olinda, but Olinda itself should remain independent of ZairaChem, as we want to distill all kinds of models, not only ZairaChem ones. Let's chat about this now.
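One common way to keep a package independent of an optional integration is to confine the integration-specific import to the function that needs it, so importing the core package never imports the other one. The sketch below illustrates that pattern, assuming Olinda's core only needs a teacher model exposed as a SMILES-to-scores callable; `optional_import`, `collect_teacher_scores` and `require_zairachem` are hypothetical names, not existing Olinda functions.

```python
import importlib
from types import ModuleType
from typing import Callable, List, Sequence

# Hypothetical interface: any model that maps a batch of SMILES strings to scores.
Teacher = Callable[[Sequence[str]], Sequence[float]]


def optional_import(module_name: str, hint: str) -> ModuleType:
    """Import `module_name` on demand, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"{module_name!r} is required here; {hint}") from exc


def collect_teacher_scores(smiles: Sequence[str], teacher: Teacher) -> List[float]:
    """Core distillation step: depends only on the callable, never on zairachem."""
    scores = list(teacher(smiles))
    if len(scores) != len(smiles):
        raise ValueError("teacher must return exactly one score per molecule")
    return scores


def require_zairachem() -> ModuleType:
    """ZairaChem-specific code imports zairachem lazily, at call time, so that
    importing the core module never imports zairachem (no import cycle)."""
    return optional_import("zairachem", "install ZairaChem to distill ZairaChem models")
```

With this layout, ZairaChem could depend on Olinda at install time while Olinda only touches `zairachem` when a ZairaChem model is actually distilled, which would avoid the cycle and leave the core usable for any other kind of model.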
Hi, as agreed, @GemmaTuron and I will look into it and report back.
OK, the circular import was my fault; I have solved it. I cannot run the pipeline because my models do not have Mordred descriptors (a whole other realm of reasons), and Olinda is not modular, which it should be (see #12). Meanwhile, on my end, I am working to solve the Mordred issue.
Update here: @JHlozek has improved Olinda's code, which now checks which descriptors are being calculated. We will merge @JHlozek's fork and take it from there to solve the dependency issues with ZairaChem and Ersilia. I will close this issue, which is too general, and open new issues as needed when I start working on the code.
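A check like the one described above can fail fast when a trained model lacks the descriptors the distillation step expects (for example, a model trained without Mordred descriptors). This is a minimal sketch assuming descriptors are identified by name strings; `missing_descriptors`, `check_descriptors` and the descriptor names in the usage line are hypothetical.

```python
from typing import Iterable, List


def missing_descriptors(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return, sorted, the required descriptor names that the model did not compute."""
    return sorted(set(required) - set(available))


def check_descriptors(required: Iterable[str], available: Iterable[str]) -> None:
    """Fail fast with a readable message instead of erroring mid-pipeline."""
    missing = missing_descriptors(required, available)
    if missing:
        raise ValueError("cannot distill: model lacks descriptors: " + ", ".join(missing))
```

For example, `missing_descriptors({"mordred", "morgan"}, {"morgan"})` returns `['mordred']`, which is the kind of situation described above for models trained without Mordred descriptors.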
Let's test Olinda as it was originally developed and see what modifications we might need.