🐈 Task: REINVENT models reformat output #1246

Open
miquelduranfrigola opened this issue Aug 28, 2024 · 3 comments
Labels
chemsampler on-hold Interesting issue that we deprioritize

Comments

@miquelduranfrigola
Member

Summary

Some (possibly all) of the REINVENT models in the Ersilia Model Hub return an unconventional JSON output, mainly because there is an outcome header in the service.py file. We need to return the output in tabular format and fill any missing values with None.
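As a rough illustration of the tabular format with None filling, here is a minimal sketch. It assumes each input yields a variable-length list of generated SMILES and that the table has a fixed number of columns. The function names and the `smiles_00`-style column names are hypothetical, not the actual Ersilia output schema.

```python
from typing import List, Optional, Tuple


def pad_to_columns(smiles: List[str], n_columns: int) -> List[Optional[str]]:
    """Return exactly n_columns values, truncating or padding with None."""
    trimmed = smiles[:n_columns]
    return trimmed + [None] * (n_columns - len(trimmed))


def to_table(
    results: List[List[str]], n_columns: int
) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Build a header and one fixed-width row per input molecule."""
    # Hypothetical column names; the real schema is not given in this issue.
    header = [f"smiles_{i:02d}" for i in range(n_columns)]
    rows = [pad_to_columns(r, n_columns) for r in results]
    return header, rows


# Three inputs that produced two, one and zero molecules respectively.
header, rows = to_table([["CCO", "CCN"], ["c1ccccc1"], []], n_columns=3)
```

Every row has the same width as the header, so the result can be written directly as CSV or loaded into a data frame.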

Also, importantly, some of the returned SMILES are labelled for some reason. We want to remove this labelling and, ideally, standardise the SMILES and return a unique set (perhaps ordered by Tanimoto similarity).

In summary, we need to work a little bit more on these models to have a more standard output.

Objective(s)

A more standard output (tabular format) for the REINVENT models.

Documentation

Here is how we can remove atom labels and standardise using RDKit and the standardiser library:

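from rdkit import Chem
from standardiser import standardise

def remove_atom_map_labels(smiles):
    # Strip atom-map numbers (e.g. "[CH3:1]") and return canonical SMILES
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit could not parse the input SMILES
        return None
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)
    return Chem.MolToSmiles(mol)

def standardise_mol(mol):
    # Return the standardised molecule, or None if standardisation fails
    try:
        return standardise.run(mol)
    except Exception:
        return None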
@GemmaTuron
Member

Hi @miquelduranfrigola

Did you work on this for the workshop in Ghana? If not, should we?

@miquelduranfrigola
Member Author

I worked on this partially and solved it well enough for the workshop. I did not close the issue because we still need to test it with every REINVENT model to be 100% sure. What priority should we give it?

@GemmaTuron
Member

I would do it in the next Chem Sampler sprint. I am marking it with the corresponding tags.

Projects
Status: On Hold