Unrecognized atom type #255
Comments
This may be related to rdkit/rdkit#6365, but I'm currently using the latest RDKit, so it should have been fixed. |
I screened 100,000 structures from a focused library (taken from a Panther/ShaEP VS) with Unimol Docking V2. I had a hard time rescoring the results, as 650 poses either had "nan" as coordinates or were outside the binding pocket, so I wrote a script to clean the results before rescoring. Maybe this comes from the problem I reported (UFFTYPER: Unrecognized atom type: S_5+6)? Do you know how to correct this problem? Thanks |
It looks like there is an issue with RDKit when loading the file. Could you provide a file that produces this error? We can test it further. |
Thank you very much for helping with this. The library is the top 1% of scores from a Panther/ShaEP VS. My cleaning script flagged 670 poses out of around 100k (minus the poses that were not generated because of the valence problem). For RDKit, I tried both the version suggested in your README file and the latest version. Updating to the latest version did not solve the problem. Best, |
Hi, |
Sorry for the delayed response. Regarding the bug in RDKit, it seems that the bug mentioned in the original issue still exists. I am using an almost up-to-date version (2024.3.1, installed via pip), but when I run the example code from the issue:
The output is:
I also ran the example file you provided. The command I used is as follows:
There was no
|
Thank you very much for running some tests with my files. Many docking poses are missing or rejected from the screen because of the "Unrecognized atom type" error, poses without coordinates, and molecules docked outside the binding pocket, so I'm really interested in resolving this problem. I'll try RDKit 2024.3.1 and investigate the "is tagged as 2D" message. Hopefully it will solve the "Unrecognized atom type: S_6+6 (0)" problem. Best, |
I encountered the same problem. |
I still have to solve that one. I'll try the problematic files with different versions of RDKit and let you know if one works better. If not, I could always try to sanitize the problematic files. I created a bash script to identify and remove the problematic poses/files after screening. To give you an idea for one of my screens: Ligands_Focused-library: 61235 (input files). The number of files generated during screening matches the number of files screened (missing=0), yet my script ends up removing 446 files. To select the files, I extract the 10th line of the SDF, which should contain the details of one atom. The file is removed from the Poses folder if this line contains "nan" instead of coordinates, if the atom is outside the binding pocket (plus a small buffer) as defined in the JSON file, or if the coordinates make no sense, i.e. the molecule is nowhere near the receptor (no-coordinates). |
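For readers who want to reproduce this kind of post-screening cleanup, here is a minimal Python sketch of the check described above. It is not the bash script from this thread. It assumes V2000-format SDF files (x, y, z in three 10-character columns) and a pocket dictionary with the hypothetical keys `center_x`/`size_x` (and the same for y and z); the key names in your JSON file may differ. The 2.0 Å buffer is also an arbitrary assumption.

```python
import math


def atom_xyz(sdf_text, line_number=10):
    """Parse x, y, z from one atom line of a V2000 SDF record (1-based line number)."""
    line = sdf_text.splitlines()[line_number - 1]
    # V2000 atom lines hold x, y, z in three consecutive 10-character columns.
    return tuple(float(line[i:i + 10]) for i in (0, 10, 20))


def pose_is_valid(sdf_text, box, buffer=2.0, line_number=10):
    """Return False for missing/NaN coordinates or an atom outside the padded box.

    `box` uses assumed keys: center_x, center_y, center_z, size_x, size_y, size_z.
    """
    try:
        xyz = atom_xyz(sdf_text, line_number)
    except (IndexError, ValueError):
        # File too short, or the columns do not contain numbers.
        return False
    if not all(math.isfinite(v) for v in xyz):
        # Catches "nan" (and "inf") written in place of coordinates.
        return False
    for value, axis in zip(xyz, "xyz"):
        half_width = box[f"size_{axis}"] / 2.0 + buffer
        if abs(value - box[f"center_{axis}"]) > half_width:
            return False
    return True
```

Like the script described above, this looks at a single atom line only, so a pose that merely sticks partly out of the box can still pass. Checking every atom, or the centroid, would be a stricter variant.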
Hi,
With some molecules I get (Unimol Docking V2):
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/CHEMBL-3740791-1.sdf-Cc1ccnc(N(CCC(=O)[O-])C(=O)c2ccc3c(c2)nc(CNc2ccc(C(N)=[NH2+])cc2F)n3C)c1-RMSD:173.775
[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/Enamine-Z3019139935-2.sdf-Cc1cc(N2CCC(O)(C[NH+]3CCOCC3)CC2)nc(N(C)c2ccccc2)[nH+]1-RMSD:171.117
3%|█▎ | 63/1959 [01:50<50:00, 1.58s/it][02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:57] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/ChemDiv-V014-0652-1.sdf-CC(C)CCN(CC(=O)Nc1cc(C(C)(C)C)nn1-c1ccc(Cl)cc1)C(=O)C(C)(C)CCl-RMSD:173.7905
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
3%|█▎ | 64/1959 [01:54<1:02:15, 1.97s/it][02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
This happens even when I use the latest version of RDKit.
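When attaching logs like the one above, it can help to summarize which atom-type labels appear and how often. The sketch below is a hypothetical helper, not part of Unimol Docking or RDKit. It assumes only that the warnings follow the text pattern visible in the log above.

```python
import re
from collections import Counter

# Matches the label that follows the warning text, e.g. "S_6+6".
UFF_WARNING = re.compile(r"UFFTYPER: Unrecognized atom type: (\S+)")


def count_unrecognized_types(log_text):
    """Count UFFTYPER warnings per atom-type label in captured log text."""
    return Counter(UFF_WARNING.findall(log_text))
```

Because the pattern is searched anywhere in a line, warnings that share a line with progress-bar output are counted too.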