Experimental Phasing of Proteinase K

biochem_fan edited this page Jul 29, 2018 · 3 revisions

In this section, we reprocess a subset of images from the Proteinase K praseodymium phasing dataset. This study was published by Sugahara et al. as "Hydroxyethyl cellulose matrix applied to serial crystallography" in Scientific Reports (2017). Hit images from the Cheetah pipeline have been uploaded to CXIDB entry 48.

Dataset preparation

To save processing time, we use only runs 359465 to 359472. Since each run was split into three blocks, we have 24 files in total, which amount to about 23 GB. Please download them from CXIDB. If you have access to SACLA HPC, images are also available at /lfs01/2018A/8023/public/CXIDB-48-ProK-Pr/.

To mimic a real-world situation, we do not use the refined geometry but start from scratch with the initial geometry. Download cxidb_48_metadata.tar.gz and take sacla-15oct-10keV-orig.geom. As discussed in the README file, the adu_per_eV lines contain errors. Change q1/adu_per_eV = 0.000 to q1/adu_per_eV = 0.001, and repeat the same for q2/adu_per_eV, q3/adu_per_eV, ..., up to q8/adu_per_eV. We call the corrected file orig.geom.
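The adu_per_eV correction can be scripted instead of edited by hand. The following is a minimal sketch, assuming the eight lines appear exactly as qN/adu_per_eV = 0.000 at the start of a line; the function name fix_adu_per_ev is ours and not part of CrystFEL.

```shell
# Rewrite the q1..q8 adu_per_eV lines from 0.000 to 0.001.
# All other lines pass through unchanged.
# Assumption: each affected line starts with "qN/adu_per_eV" and holds only the value.
fix_adu_per_ev() {
    sed -E 's|^(q[1-8]/adu_per_eV)[[:space:]]*=[[:space:]]*0\.000[[:space:]]*$|\1 = 0.001|'
}

# Usage (run in the directory where the metadata archive was extracted):
#   fix_adu_per_ev < sacla-15oct-10keV-orig.geom > orig.geom
#   grep adu_per_eV orig.geom    # check that no 0.000 values remain
```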

Processing with CrystFEL 0.6.3

As discussed in this wiki, we processed the data in six steps: (1) optimizing the detector distance, (2) optimizing the beam center, (3) optimizing the spot finding parameters, (4) bulk indexing, (5) metrology refinement, and (6) re-integration.

1 & 2. optimizing the detector distance and the beam center

In the clen-test folder, first make a set of geometry files that differ only in the detector distance.

for len in `seq 490 5 520`; do sed 's/clen.*/clen = 0.0'$len'/' orig.geom > sacla-15oct-$len.geom; done
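The number in each file name is the detector distance in units of 0.1 mm: 495 becomes clen = 0.0495 (in metres), that is 49.5 mm. As a small sketch, the same substitution can be wrapped in a function (set_clen is a hypothetical name) to preview the result or to generate a single extra distance.

```shell
# Apply the same substitution as the loop above to text read from stdin.
# $1 is the distance label in units of 0.1 mm (495 -> clen = 0.0495).
set_clen() {
    sed 's/clen.*/clen = 0.0'"$1"'/'
}

# Usage: generate one extra geometry file between two grid points.
#   set_clen 498 < orig.geom > sacla-15oct-498.geom
#   grep -H clen sacla-15oct-*.geom    # confirm the values that were written
```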

Also prepare a job submission script index-dirax-geom.sh:

#!/bin/bash
#PBS -l nodes=1:ppn=14
#PBS -q serial

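# When run as a batch job, move to the directory the job was submitted from.
if [ -n "$PBS_O_WORKDIR" -a "$PBS_ENVIRONMENT" != "PBS_INTERACTIVE" ]; then
        cd "$PBS_O_WORKDIR"
fi

source ~sacla_sfx_app/setup.sh

# TARGET (image list) and GEOM (geometry file) are passed in with qsub -v.
if [ -z "$TARGET" -o -z "$GEOM" ]; then
        echo "please set TARGET and GEOM"
        exit 1
fi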

TARGET=${TARGET%.lst}

indexamajig -i $TARGET.lst -o dirax-$TARGET-$GEOM.stream -j 14 -g $GEOM -p ../sfx.cell --indexing=dirax --peaks=zaef --threshold=400 --min-gradient=40000 --min-snr=5 --int-radius=3,4,7
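The stream file name is built from TARGET (with any trailing .lst removed) and GEOM. The sketch below reproduces only that naming step, so you can predict which file a job will write; stream_name is a hypothetical helper, not part of the job script.

```shell
# Reproduce how the job script derives the stream name from TARGET and GEOM.
stream_name() {
    local target=${1%.lst}    # strip a trailing .lst, as the script does
    local geom=$2
    echo "dirax-$target-$geom.stream"
}

stream_name 349649-1.lst sacla-15oct-490.geom
# prints dirax-349649-1-sacla-15oct-490.geom.stream
```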

The initial unit cell file (passed to indexamajig as ../sfx.cell) is:

CrystFEL unit cell file version 1.0

lattice_type = tetragonal
centering = P
unique_axis = c 

a = 67 A
b = 67 A
c = 107 A

al = 90.0 deg
be = 90.0 deg
ga = 90.0 deg

We use only one HDF5 file for testing.

ls ../data/run359469-1.h5 |tee 349649-1.lst

Then submit the jobs.

for f in sacla*.geom; do qsub -v TARGET=349649-1.lst,GEOM=$f index-dirax-geom.sh; done

The indexing results from index_rate *.stream (columns: stream file, number of images, number of indexed images, indexing rate in %):

dirax-349649-1-sacla-15oct-490.geom.stream  164 138 84.1463
dirax-349649-1-sacla-15oct-493.geom.stream  164 138 84.1463
dirax-349649-1-sacla-15oct-495.geom.stream  164 140 85.3659
dirax-349649-1-sacla-15oct-498.geom.stream  164 139 84.7561
dirax-349649-1-sacla-15oct-500.geom.stream  164 139 84.7561
dirax-349649-1-sacla-15oct-502.geom.stream  164 138 84.1463
dirax-349649-1-sacla-15oct-505.geom.stream  164 134 81.7073

(Since 49.5 mm and 50.0 mm looked equally good, 49.3, 49.8 and 50.2 mm were also tested.)
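The index_rate script itself is not shown on this page. As a rough stand-in, the sketch below counts images and indexed images directly from a stream file. It assumes that every image opens a chunk with "----- Begin chunk -----" and that unindexed images carry the line "indexed_by = none"; stream_index_rate is a hypothetical name and may differ from what index_rate actually does.

```shell
# Print "<stream>  <images> <indexed> <percent>" for one stream file.
# Assumptions: one "----- Begin chunk -----" line per image, and
# "indexed_by = none" for images that could not be indexed.
stream_index_rate() {
    local stream=$1 total indexed
    total=$(grep -c '^----- Begin chunk -----' "$stream")
    indexed=$(grep '^indexed_by = ' "$stream" | grep -vc '^indexed_by = none$')
    awk -v f="$stream" -v t="$total" -v i="$indexed" \
        'BEGIN { printf "%s  %d %d %g\n", f, t, i, (t ? 100 * i / t : 0) }'
}

# Usage:
#   for f in *.stream; do stream_index_rate "$f"; done
```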

Use the detector-shift script to correct the beam center, then re-run indexing on the promising geometries.

dirax-349649-1-sacla-15oct-495-predrefine.geom.stream  164 139 84.7561
dirax-349649-1-sacla-15oct-498-predrefine.geom.stream  164 139 84.7561
dirax-349649-1-sacla-15oct-500-predrefine.geom.stream  164 139 84.7561
dirax-349649-1-sacla-15oct-502-predrefine.geom.stream  164 138 84.1463

Inspecting the unit cell distributions in cell_explorer, 49.8 mm looked the best because its distributions were the most symmetrical: the distribution of the a axis length had a tail to the left for 49.5 mm and a tail to the right for 50.0 mm.

3. optimizing the spot finding parameters

The unit cell parameters were updated to 68.4 68.4 108.5 90 90 90.
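Updating the cell file by hand is easy, but a scripted version avoids typos when the cell is revised more than once. This is a minimal sketch, assuming the axis lines have the form "a = 67 A" shown in the initial cell file; update_cell and the output file name are hypothetical.

```shell
# Replace the a, b and c axis lengths in a CrystFEL cell file read from stdin.
# The angle lines (al, be, ga) and all other lines are left untouched.
update_cell() {
    sed -e "s/^a = .*/a = $1 A/" \
        -e "s/^b = .*/b = $2 A/" \
        -e "s/^c = .*/c = $3 A/"
}

# Usage:
#   update_cell 68.4 68.4 108.5 < sfx.cell > sfx-updated.cell
```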

Both zaef and peakfinder8 were tested.

dirax-349649-1-p8-th0-snr4.0-minpix2-lbg3.stream  164 144 87.8049
dirax-349649-1-p8-th0-snr4.5-minpix2-lbg3.stream  164 138 84.1463
dirax-349649-1-p8-th0-snr5.5-minpix2-lbg3.stream  164 143 87.1951
dirax-349649-1-p8-th0-snr5-minpix2-lbg3.stream  164 140 85.3659
dirax-349649-1-p8-th400-snr4.0-minpix2-lbg3.stream  164 143 87.1951
dirax-349649-1-p8-th400-snr4.5-minpix2-lbg3.stream  164 145 88.4146
dirax-349649-1-p8-th400-snr5.5-minpix2-lbg3.stream  164 140 85.3659
dirax-349649-1-zaef-th0-gr100000-snr5.stream  164 138 84.1463
dirax-349649-1-zaef-th0-gr10000-snr3.stream  164 57 34.7561
dirax-349649-1-zaef-th0-gr10000-snr5.stream  164 141 85.9756
dirax-349649-1-zaef-th0-gr200000-snr5.stream  164 139 84.7561
dirax-349649-1-zaef-th0-gr50000-snr5.stream  164 137 83.5366
dirax-349649-1-zaef-th400-gr10000-snr3.stream  164 138 84.1463
dirax-349649-1-zaef-th400-gr10000-snr5.stream  164 138 84.1463
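With this many parameter sets, sorting the table by indexing rate makes the best candidates easier to spot. The sketch below assumes the four-column layout shown above; rank_by_rate is a hypothetical helper.

```shell
# Sort index_rate-style lines by the fourth column (indexing rate), best first.
rank_by_rate() {
    sort -k4,4nr
}

# Usage:
#   index_rate *.stream | rank_by_rate | head -n 3
```

In the table above, the top entry is peakfinder8 with threshold 400 and SNR 4.5 (145 of 164 images), which is the parameter set used for bulk indexing in the next step.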

4. bulk indexing

All images were indexed with --indexing=dirax --peaks=peakfinder8 --threshold=400 --min-snr=4.5 --min-pix-count=2 --local-bg-radius=3 --int-radius=3,4,7.
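Bulk indexing needs a list of all 24 files. The sketch below generates it, assuming every file follows the run359469-1.h5 naming seen earlier, with blocks numbered 1 to 3 for runs 359465 to 359472; list_all_runs and the list file name are hypothetical.

```shell
# Print the expected paths of all HDF5 files: 8 runs x 3 blocks = 24 files.
# Assumption: files are named ../data/run<RUN>-<BLOCK>.h5 with BLOCK = 1, 2, 3.
list_all_runs() {
    local run block
    for run in $(seq 359465 359472); do
        for block in 1 2 3; do
            echo "../data/run${run}-${block}.h5"
        done
    done
}

# Usage:
#   list_all_runs > all.lst
#   wc -l all.lst    # should report 24 entries
```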

5748 out of 6506 images were indexed. The unit cell parameters were updated to 68.4 68.4 108.4 90 90 90.

5. metrology refinement

The stream file was used for sensor metrology refinement.

geoptimiser -g sacla-15oct-498-predrefine.geom -i all.stream -o sacla-15oct-498-opt.geom -c connected -q independent 2>&1 | tee geoptimiser.log

Error for connected group q1: 724 pixels with more than 3 peaks: RMSD = 2.1457 pixels.
Error for connected group q2: 5062 pixels with more than 3 peaks: RMSD = 1.2015 pixels.
Error for connected group q3: 5919 pixels with more than 3 peaks: RMSD = 0.9634 pixels.
Error for connected group q4: 611 pixels with more than 3 peaks: RMSD = 1.0774 pixels.
Error for connected group q5: 605 pixels with more than 3 peaks: RMSD = 2.6680 pixels.
Error for connected group q6: 5789 pixels with more than 3 peaks: RMSD = 1.0072 pixels.
Error for connected group q7: 5677 pixels with more than 3 peaks: RMSD = 0.9425 pixels.
Error for connected group q8: 781 pixels with more than 3 peaks: RMSD = 1.3405 pixels.
Detector-wide error before correction: RMSD = 1.1532 pixels.

Error for connected group q1: 724 pixels with more than 3 peaks: RMSD = 1.0700 pixels.
Error for connected group q2: 5062 pixels with more than 3 peaks: RMSD = 0.8967 pixels.
Error for connected group q3: 5919 pixels with more than 3 peaks: RMSD = 0.8454 pixels.
Error for connected group q4: 611 pixels with more than 3 peaks: RMSD = 1.0447 pixels.
Error for connected group q5: 605 pixels with more than 3 peaks: RMSD = 1.0593 pixels.
Error for connected group q6: 5789 pixels with more than 3 peaks: RMSD = 0.8358 pixels.
Error for connected group q7: 5677 pixels with more than 3 peaks: RMSD = 0.8104 pixels.
Error for connected group q8: 781 pixels with more than 3 peaks: RMSD = 0.9987 pixels.
Detector-wide error after correction: RMSD = 0.8695 pixels.
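To compare the two blocks side by side, the per-group RMSD values can be paired up from geoptimiser.log. This sketch assumes the log lines have exactly the form shown above, with each group appearing first before and then after correction; rmsd_summary is a hypothetical helper.

```shell
# Pair the first (before) and second (after) RMSD reported for each group.
# Reads the named log files, or stdin if none are given.
rmsd_summary() {
    awk '/^Error for connected group/ {
             group = $5; sub(/:$/, "", group)   # "q1:" -> "q1"
             rmsd = $(NF - 1)                   # value just before "pixels."
             if (group in before)
                 printf "%s %s -> %s\n", group, before[group], rmsd
             else
                 before[group] = rmsd
         }' "$@"
}

# Usage:
#   rmsd_summary geoptimiser.log
```

In the log above, the largest gains are in q1 and q5 (2.1457 to 1.0700 and 2.6680 to 1.0593 pixels).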

6. re-integration

Using this refined geometry, all images were re-processed.

Now 5824 out of 6505 images were indexed. Rerunning geoptimiser did not improve the RMSD further.
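The effect of the refined geometry is easier to judge as a percentage. The sketch below simply recomputes the indexing rate from the counts quoted above; index_percent is a hypothetical helper.

```shell
# Indexing rate in percent from the number of indexed images and the total.
index_percent() {
    awk -v i="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * i / t }'
}

index_percent 5748 6506    # bulk indexing with the predrefine geometry
index_percent 5824 6505    # re-integration with the refined geometry
```

The rate rises from about 88.3% to about 89.5%.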

Thus we took this stream file for merging.

7. merging

TO BE WRITTEN

Processing with DIALS

TO BE WRITTEN