Update JCIM workflows #4

Open
wants to merge 5 commits into
base: master


@dominiquesydow dominiquesydow commented Apr 19, 2022

Description

Update JCIM workflows (https://doi.org/10.1021/acs.jcim.6b00686); using KNIME 4.5.0.

Todos

Start updates for full workflows, then copy-paste changes to small workflows (which are a collapsed version of the full workflows thanks to meta-nodes).

See track changes below.

  • GCRPDB_example (full/small)
  • GPCR-kinase --- NOTE: Uses old KRIPO fragment library dataset
  • Chemdb4VS (full/small) --- NOTE: Update small stand-alone node sequences if we can get some dummy scores.txt and decoy.smi files
  • KLIFS_example (full/small)

Not covered in this PR

  • KRIPO_bioisosteric_replacement_workflow (full/small) --- FIXME: Installation fails for KripoDB nodes ("Fragment information" and "Similar fragments") with "Cannot complete the request. See the error log for details."; cannot see logs in KNIME Console
  • SyGMa-example (full/small) --- FIXME: Using installed node "SyGMa Metabolites" fails with "The selected node could not be created due to the following reason: org/knime/python2/config/PythonCommandFlowVariableConfig"

Status

  • Ready to go

@dominiquesydow
Author

Track changes for "GCRPDB_example"

  • Replace “JSON to Table” node to fix problem with multiple ligands per structure
    • Problem: This node splits multiple ligands per structure in a single row by duplicating ligand-related columns (name, type, function).
      • This was not a problem at the last execution 4 years ago, since every structure had only one ligand; now 3 structures have 2 ligands each: 5D7D, 6N48, and 6OBA
      • This breaks the next “Joiner” node, which joins structural data from the “GPCRdb Structures of a protein” node with IFPs from the “GPCRdb Structure-ligand interactions” node on the PDB code and the ligand name (remember, we currently have multiple columns with ligand names)
    • Expected behaviour: We want to split multiple ligands per structure into multiple rows (thus, keeping only one column each for name, type, and function).
    • Solution: Replace the “JSON to Table” node with a sequence of
      • “JSON Path” - Set paths to the ligands' name, type, and function (columns are now called “Ligand name”, “Ligand type”, “Ligand function”)
      • “Ungroup” nodes - Split multiple ligands per structure into individual rows
      • “Column Filter” - Drop JSON column
      • Ideas taken from here and here (look for the posted json.knar file)
  • Update deprecated “Joiner” node (GUI changed; transfer previous settings); FYI: Current execution results in a 1174x16 table
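
For readers who do not have KNIME at hand, the intended reshaping can be sketched in plain Python. This is only an illustration of the "one row per ligand" idea followed by a join on PDB code and ligand name, not the workflow itself. The JSON field names (`pdb_code`, `ligands`, `name`, `type`, `function`), the `1ABC` entry, the ligand values, and the `ifp` column are assumptions made up for the example.

```python
import json

# Hypothetical input: field names and ligand values are illustrative assumptions.
raw = json.loads("""
[
  {"pdb_code": "1ABC",
   "ligands": [{"name": "ligand A", "type": "small molecule", "function": "agonist"}]},
  {"pdb_code": "5D7D",
   "ligands": [{"name": "ligand B", "type": "small molecule", "function": "agonist"},
               {"name": "ligand C", "type": "peptide", "function": "antagonist"}]}
]
""")


def ungroup_ligands(structures):
    """Return one flat row per (structure, ligand) pair."""
    rows = []
    for structure in structures:
        for ligand in structure.get("ligands", []):
            rows.append({
                "pdb_code": structure["pdb_code"],
                "Ligand name": ligand.get("name"),
                "Ligand type": ligand.get("type"),
                "Ligand function": ligand.get("function"),
            })
    return rows


def join_on_pdb_and_ligand(structure_rows, interaction_rows):
    """Inner join on (PDB code, ligand name)."""
    index = {}
    for row in interaction_rows:
        key = (row["pdb_code"], row["Ligand name"])
        index.setdefault(key, []).append(row)
    joined = []
    for row in structure_rows:
        key = (row["pdb_code"], row["Ligand name"])
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined


rows = ungroup_ligands(raw)

# Hypothetical interaction table with a made-up fingerprint column.
interactions = [{"pdb_code": "5D7D", "Ligand name": "ligand C", "ifp": "0101"}]
joined = join_on_pdb_and_ligand(rows, interactions)
```

With a single ligand-name column per row, the join key is unambiguous even when a structure carries two ligands, which is the property the duplicated-column layout lacked.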

@dominiquesydow
Author

dominiquesydow commented Apr 20, 2022

Track changes for "GPCR-kinase"

  • NOTE: KRIPO fragment library based on PDB dataset 2016-06-29 (?)
  • Update deprecated “Cell Splitter” and “Joiner” nodes
  • FYI: Final table has 2268 entries (before 1576 entries)

@dominiquesydow
Author

dominiquesydow commented Apr 20, 2022

Track changes for "Chemdb4VS"

  • Replace broken ChEMBL node with Albert’s “ChEMBL – map UniProt to ChEMBL” meta-node
  • Replace broken ChEMBL node with Albert’s “ChEMBL – bioactivities from target ID” meta-node
  • Replace “calculate pAct” meta-node with “Extract pChEMBL values”, which extracts pChEMBL values for exact measures in nM; sets inactive compounds to 0
  • Replace broken ChEMBL node with Albert’s “ChEMBL – compound information” meta-node
  • Add new node: “RDKit Salt Stripper” to strip salts off molecules
  • Split molecules with bioactivity data into active/inactive at “Numeric Row Splitter”: For both sets, rename column “canonical_smiles” to “SmilesValue” and continue to work with the latter (renaming useful for RDKit).
  • Added “Lipinski’s Rule-of-Five” node before Rule-of-5 filter (not sure if Ro5 was previously part of ChEMBL’s output)
  • Added “Molecular Properties” node to calculate MW before MW>100 (not sure if MW was previously part of ChEMBL’s output)
  • TODO: No dummy data available for stand-alone node sequences (thus not properly updated) that
    • reads docking scores (scores.txt) and plots ROC curves
    • reads in decoy SMILES (decoy.smi)
  • NOTE: MarvinView needs license (ChemAxon)
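
As a rough illustration of the pChEMBL extraction and the active/inactive split described above, here is a minimal Python sketch. It is one possible reading of those steps, not the contents of the meta-nodes. Assumptions: records carry ChEMBL-style fields (`standard_relation`, `standard_units`, `standard_value`, `activity_comment`, `canonical_smiles`), an "inactive" comment marks inactive compounds, non-exact measurements are dropped, and the cutoff `ACTIVITY_THRESHOLD` is hypothetical because the PR does not state one.

```python
import math

ACTIVITY_THRESHOLD = 6.0  # hypothetical cutoff; the PR does not state one


def extract_pchembl(record):
    """Return a pChEMBL-like value, 0.0 for inactives, None if not usable."""
    # Assumption: inactive compounds are flagged via the activity comment.
    if record.get("activity_comment") == "inactive":
        return 0.0
    # Keep only exact measurements reported in nM.
    if record.get("standard_relation") != "=" or record.get("standard_units") != "nM":
        return None
    value_nm = record.get("standard_value")
    if value_nm is None or value_nm <= 0:
        return None
    # -log10(molar concentration); for a value in nM this is 9 - log10(value).
    return round(9.0 - math.log10(value_nm), 2)


def split_by_activity(records, threshold=ACTIVITY_THRESHOLD):
    """Split usable records into actives and inactives, renaming the SMILES column."""
    actives, inactives = [], []
    for record in records:
        pchembl = extract_pchembl(record)
        if pchembl is None:
            continue
        row = {"SmilesValue": record["canonical_smiles"], "pChEMBL": pchembl}
        (actives if pchembl >= threshold else inactives).append(row)
    return actives, inactives


# Made-up example records.
records = [
    {"canonical_smiles": "CCO", "standard_relation": "=",
     "standard_units": "nM", "standard_value": 10.0},
    {"canonical_smiles": "CCN", "standard_relation": ">",
     "standard_units": "nM", "standard_value": 10000.0},
    {"canonical_smiles": "CCC", "activity_comment": "inactive"},
]
actives, inactives = split_by_activity(records)
```

Renaming `canonical_smiles` to `SmilesValue` inside the split mirrors the column rename applied to both branches after the “Numeric Row Splitter”.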

@dominiquesydow
Author

Track changes for "KLIFS_example"

  • Update deprecated “Joiner” nodes
  • TODO: Could not yet test (and, if needed, update) the bar chart functionality because of a missing R dependency; tried the installed “Bar Chart” node but could not reproduce the plot, so sticking with the R snippet

@dominiquesydow
Author

@AJK-dev could you please check whether this PR is OK to go from your side? There are two workflows that I could not cover so far; maybe they can be covered in another PR?

@blindpikachu

With regard to the KRIPO_bioisosteric_replacement_workflow: is there a script/workflow to create the kripodb from scratch? Otherwise, how would you suggest fixing this workflow?

@sverhoeven
Member

Steps to create the kripodb are at https://github.com/3D-e-Chem/kripodb/blob/master/docs/baseline-update.rst
