pdb_tidy removes the TER record between chains and removes last ENDMDL in a multi-model PDB #155

Open
rvhonorato opened this issue Mar 27, 2023 · 8 comments

@rvhonorato
Member

Describe the bug
pdb_tidy removes the TER record between chains and removes last ENDMDL in a multi-model PDB.

To Reproduce

  1. test.pdb
MODEL        1
ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N  
TER       2      THR A   1
ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N  
TER       4      THR B   1
ENDMDL
MODEL        2
ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C  
TER       2      ARG A  10
ATOM      3   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C  
TER       4      TPO B 197
HETATM    5    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N  
TER       6      TPO B 197
ENDMDL
  2. pdb_tidy test.pdb > tidy.pdb
$ cat tidy.pdb
MODEL        1
ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
TER       4      THR B   1
ENDMDL
MODEL        2
ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
TER       2      ARG A  10
ATOM      4   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
TER       5      ARG B  10
HETATM    7    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
END
$ diff test.pdb tidy.pdb
1,14c1,12
< MODEL        1
< ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
< TER       2      THR A   1
< ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
< TER       4      THR B   1
< ENDMDL
< MODEL        2
< ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
< TER       2      ARG A  10
< ATOM      3   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
< TER       4      TPO B 197
< HETATM    5    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
< TER       6      TPO B 197
< ENDMDL
---
> MODEL        1
> ATOM      1    N THR A   1      17.047  14.099   3.625  1.00 13.79       N
> ATOM      3    N THR B   1      11.047  11.099  11.625  0.00  0.00       N
> TER       4      THR B   1
> ENDMDL
> MODEL        2
> ATOM      1   CA ARG A  10       8.496   4.609   8.837  1.00  3.38       C
> TER       2      ARG A  10
> ATOM      4   CA ARG B  10      22.496  22.609  22.837  1.00  3.38       C
> TER       5      ARG B  10
> HETATM    7    N TPO B 197      21.891   2.133 -14.748  1.00 38.81       N
> END

Expected behavior

The TER records between chains and the final ENDMDL should both be kept.
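
The expectation above can be checked mechanically. What follows is a minimal sketch, not part of pdb-tools: the function name `summarize_records` is hypothetical, and it assumes fixed-column PDB lines with the chain ID in column 22. It counts MODEL and ENDMDL records and reports ATOM chain changes that have no TER between them.

```python
def summarize_records(lines):
    """Count MODEL/ENDMDL records and list ATOM chain changes lacking a TER.

    Hypothetical helper for illustration; assumes fixed-column PDB lines.
    """
    n_model = n_endmdl = 0
    missing_ter = []      # (previous chain, next chain) pairs with no TER between
    prev_chain = None
    ter_seen = False
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            n_model += 1
            prev_chain, ter_seen = None, False   # chain tracking restarts per model
        elif record == "ENDMDL":
            n_endmdl += 1
        elif record == "TER":
            ter_seen = True
        elif record == "ATOM":
            chain = line[21]                     # chain ID, column 22
            if prev_chain is not None and chain != prev_chain and not ter_seen:
                missing_ter.append((prev_chain, chain))
            prev_chain, ter_seen = chain, False
    return n_model, n_endmdl, missing_ter
```

Run on the lines of tidy.pdb shown above, this returns two MODEL records, one ENDMDL record, and one chain change (A to B, in model 1) without a TER, which matches the two symptoms reported.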

Desktop (please complete the following information):

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy
$ python --version
Python 3.11.2
$ pip show pdb-tools
Name: pdb-tools
Version: 2.5.0
Summary: A swiss army knife for PDB files.
Home-page: http://bonvinlab.org/pdb-tools
Author: Joao Rodrigues
Author-email: [email protected]
License: Apache Software License, version 2
Location: /home/rodrigo/.pyenv/versions/3.11.2/lib/python3.11/site-packages
Requires:
Required-by:
@rvhonorato rvhonorato added the bug label Mar 27, 2023
@joaomcteixeira
Member

note:

If we repeat the first line to simulate having two atoms before the first TER, the TER is not removed.

The same does not happen with the HETATM entry.

@JoaoRodrigues JoaoRodrigues self-assigned this Mar 27, 2023
@JoaoRodrigues
Member

Thanks for the report @rvhonorato, we'll have a look.

@rvhonorato
Member Author

This is probably an edge case, since the test PDB is not realistic and pdb_tidy works for "real" structures. Still, it could be indicative of some underlying issue.

Let me know if there's any way I can help.

@JoaoRodrigues
Member

I had a look at the format specification and it seems to hint that TER statements do not apply after HETATM, only at the terminus of a (linked) chain. Checking a couple of random PDBs does reinforce that:

@rvhonorato
Member Author

rvhonorato commented Mar 28, 2023

It's indeed not very clear. Looking at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#TER

Every chain of ATOM/HETATM records presented on SEQRES records is terminated with a TER record.

and https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html

indicates the end of a chain of residues. For example, a hemoglobin molecule consists of four subunit chains that are not connected. TER indicates the end of a chain and prevents the display of a connection to the next chain.

And deeper into the SEQRES record: https://www.wwpdb.org/documentation/file-format-content/format33/sect3.html#SEQRES

SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. It may also include other residues that are linked to the standard backbone in the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.

So that seems to imply to me that there is some relation between TER and SEQRES. Since the PDBs might not have SEQRES records to pull the limits from, it's probably OK to follow the convention of always having a TER between chains of ATOM records and, additionally, a TER at chain breaks (non-continuous numbering in ATOM) when using the strict option, which I think already exists, right?
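
To make the proposed convention concrete, here is a minimal sketch of where TER records would go under it. It is an illustration only, not how pdb_tidy is implemented: the function name `ter_after_indices` and its `strict` parameter are hypothetical, the input is assumed to contain only ATOM lines from a single model, and a chain break is taken to mean a jump of more than one in residue number within the same chain.

```python
def ter_after_indices(atom_lines, strict=False):
    """Return indices of ATOM lines that should be followed by a TER record.

    Hypothetical sketch: a TER goes after the last atom of each chain and,
    when strict is True, also where residue numbering jumps within a chain.
    """
    indices = []
    for i, line in enumerate(atom_lines):
        chain = line[21]              # chain ID, column 22
        resnum = int(line[22:26])     # residue number, columns 23-26
        if i + 1 == len(atom_lines):
            indices.append(i)         # end of the last chain
            continue
        nxt = atom_lines[i + 1]
        next_chain, next_resnum = nxt[21], int(nxt[22:26])
        if next_chain != chain:
            indices.append(i)         # chain change
        elif strict and next_resnum - resnum > 1:
            indices.append(i)         # gap in numbering within one chain
    return indices
```

Keeping the chain-break rule behind a flag means the default output carries only one TER per chain, while the stricter placement stays available when it is wanted.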

@amjjbonvin
Member

amjjbonvin commented Mar 28, 2023 via email

@rvhonorato
Member Author

Software that interprets the PDB format should cross-reference the TER records and the SEQRES records to decide whether it's a true break or not, but it's unlikely that this behaviour covers PDBs obtained from non-experimental methods; in that case (older) tools might indeed just assume it's the OXT.

+1 for fewer TER records for the sake of compatibility, but the bug above is still relevant.
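
As a rough illustration of the cross-referencing idea, the sketch below lists chains that appear on SEQRES records but have no TER record. The function name `seqres_chains_without_ter` is hypothetical and this is not taken from any existing tool; it assumes fixed-column records with the chain ID in column 12 of SEQRES and column 22 of TER.

```python
def seqres_chains_without_ter(lines):
    """Return sorted chain IDs listed on SEQRES records that have no TER record.

    Hypothetical sketch; assumes fixed-column PDB records.
    """
    seqres_chains, ter_chains = set(), set()
    for line in lines:
        if line.startswith("SEQRES") and len(line) > 11:
            seqres_chains.add(line[11])      # chain ID, column 12
        elif line.startswith("TER") and len(line) > 21:
            ter_chains.add(line[21])         # chain ID, column 22
    return sorted(seqres_chains - ter_chains)
```

A file without SEQRES records, as is common for models from non-experimental methods, yields an empty list, so a check like this says nothing about such files.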

@amjjbonvin
Member

Any news on this? Is it still relevant or implemented already?
