wild type sequence for stability dataset #131

lzhangUT · 2022-08-15T03:21:47Z

Hi,
I am interested in using your TAPE data for my deep learning work, but we would like to generate a PDB file from the wild type sequence first. We worked out that the wild type sequence for the fluorescence data is the one with num_mutation=0, but we couldn't work out the wild type sequence for the stability data. When we looked into it, there were several sequences with stability_score=1.
Would you mind sharing the wild type sequence for the stability_score data?
Thank you.

agitter · 2023-01-17T17:39:22Z

I've been looking into the original data files from the Rocklin 2017 stability paper, and the wild type sequences in the saturation mutagenesis (ssm2) experiment are clearly indicated there. Their entries in the name column of the Rocklin file are

EEHEE_rd3_0037.pdb
EEHEE_rd3_1498.pdb
EEHEE_rd3_1702.pdb
EEHEE_rd3_1716.pdb
EHEE_0882.pdb
EHEE_rd2_0005.pdb
EHEE_rd3_0015.pdb
HEEH_rd2_0779.pdb
HEEH_rd3_0223.pdb
HEEH_rd3_0726.pdb
HEEH_rd3_0872.pdb
HHH_0142.pdb
HHH_rd2_0134.pdb
HHH_rd3_0138.pdb
Pin1
hYAP65
villin

If you want to find them in stability_test.json in the TAPE data, match those to the id entries or look for id entries that don't have an additional underscore specifying the mutation. For instance, EEHEE_rd3_0037.pdb is a wild type instance but EEHEE_rd3_0037.pdb_A19D is a mutation.
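As a rough sketch of that matching step in Python: the code below checks ids against the names listed above, and also shows a regex for the mutation suffix. It assumes each record is a dict with an `id` string field, and that a mutation suffix always looks like `_A19D` (underscore, letter, position, letter). Neither assumption has been checked against the actual file, and the function names are hypothetical.

```python
import re

# Wild type names from the Rocklin file, as listed above.
WILD_TYPE_NAMES = {
    "EEHEE_rd3_0037.pdb", "EEHEE_rd3_1498.pdb", "EEHEE_rd3_1702.pdb",
    "EEHEE_rd3_1716.pdb", "EHEE_0882.pdb", "EHEE_rd2_0005.pdb",
    "EHEE_rd3_0015.pdb", "HEEH_rd2_0779.pdb", "HEEH_rd3_0223.pdb",
    "HEEH_rd3_0726.pdb", "HEEH_rd3_0872.pdb", "HHH_0142.pdb",
    "HHH_rd2_0134.pdb", "HHH_rd3_0138.pdb", "Pin1", "hYAP65", "villin",
}

# Assumed suffix format: underscore, one letter, digits, one letter (e.g. _A19D).
MUTATION_SUFFIX = re.compile(r"_[A-Z]\d+[A-Z]$")


def has_mutation_suffix(entry_id: str) -> bool:
    """True if the id ends with something that looks like a point mutation."""
    return MUTATION_SUFFIX.search(entry_id) is not None


def wild_type_records(records):
    """Keep only records whose id exactly matches a listed wild type name.

    `records` is assumed to be an iterable of dicts with an "id" key.
    """
    return [r for r in records if r["id"] in WILD_TYPE_NAMES]
```

If stability_test.json parses to a list of such dicts, the result of `json.load` could be passed straight to `wild_type_records`. Exact name matching is the safer of the two checks, because the wild type names contain underscores themselves.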
