Optimizing the result

biochem_fan edited this page Sep 9, 2021 · 16 revisions

Merging with --push-res option

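Sometimes the --push-res option improves high-resolution statistics. It applies a per-image resolution cutoff: reflections are merged only up to the given value (in nm^-1) beyond the estimated resolution limit of each image.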

Save the following script as process-scale-pr.sh.

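#!/bin/bash
# Usage: bash process-scale-pr.sh input.stream push-res

inp=$1   # input stream file
pr=$2    # value passed to --push-res
out=${inp%.stream}-scale-pr$pr.hkl
pg="4/mmm" # Point group. Use 422 to treat Friedel pairs separately

source ~sacla_sfx_app/setup.sh

# Merge the odd and even half sets (for CC1/2) and the full dataset in parallel
process_hkl -y "$pg" --scale --push-res "$pr" --odd-only -i "$inp" -o "${out}1" &
process_hkl -y "$pg" --scale --push-res "$pr" --even-only -i "$inp" -o "${out}2" &
process_hkl -y "$pg" --scale --push-res "$pr" -i "$inp" -o "$out" &
wait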

Run this as

bash process-scale-pr.sh input.stream 1.0

where input.stream is your stream file and 1.0 is the --push-res value. Try 0.5, 1.0, 1.5, 2.0 and see if it helps. When the --push-res value is too small, the completeness of high-resolution shells becomes very low. Look at completeness as well as CC1/2!
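The scan over --push-res values is easy to script. The following is a minimal dry-run sketch, assuming process-scale-pr.sh sits in the current directory; input.stream is a placeholder name, and the function name is made up for illustration. It only prints the commands so that they can be checked before anything is run.

```bash
#!/bin/bash
# Dry-run sketch: print one merge command per --push-res value to try.
# "input.stream" is a placeholder, not a file from this tutorial.
print_push_res_scan() {
    stream=$1
    for pr in 0.5 1.0 1.5 2.0; do
        # process-scale-pr.sh takes the stream file first, then the --push-res value
        echo "bash process-scale-pr.sh $stream $pr"
    done
}

print_push_res_scan input.stream
```

Once the printed commands look right, pipe the output to bash to run them one after another. Each run writes its own output files because the --push-res value is part of the output name.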

Merging with partialator

partialator can apply B factor scaling, while process_hkl --scale can apply only multiplicative scale factors. In my experience, it rarely improved the data quality over process_hkl --scale --push-res, as it tended to underweight high-resolution reflections. However, it did help in a few datasets out of more than 100.

Typically, the command line is as follows. Try various --push-res values, as with process_hkl.

partialator -i input.stream -o merged.hkl -m unity -j16 -n3 --output-every-cycle -y 4/mmm --push-res 1.0

It outputs odd and even half sets as well as a full dataset. -m unity turns off partiality modelling (all partialities are set to 1), -j16 uses 16 threads, and -n3 specifies three iterations. --output-every-cycle helps to find the best number of iterations.
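To judge a merge, the two half sets can be compared with compare_hkl (CC1/2) and the full dataset checked with check_hkl (completeness). Below is a dry-run sketch that prints the two commands. It assumes the half sets are written next to the full dataset with 1 and 2 appended to the name, that a unit cell file called sfx.cell exists, and that 20 resolution shells are wanted; the function name and the shell-file names are made up for illustration.

```bash
#!/bin/bash
# Dry-run sketch: print the commands that evaluate a merged dataset.
# Assumptions: half sets are at <merged>1 and <merged>2; sfx.cell is the unit cell file.
print_eval_commands() {
    merged=$1   # full dataset, e.g. merged.hkl
    pg=$2       # point group, e.g. 4/mmm
    cell=$3     # unit cell file
    base=${merged%.hkl}
    # CC between the two half sets, per resolution shell
    echo "compare_hkl ${merged}1 ${merged}2 -y $pg -p $cell --fom=CC --nshells=20 --shell-file=${base}-cc.dat"
    # Completeness, redundancy and SNR of the full dataset, per resolution shell
    echo "check_hkl $merged -y $pg -p $cell --nshells=20 --shell-file=${base}-shells.dat"
}

print_eval_commands merged.hkl 4/mmm sfx.cell
```

Running the same two commands on every merge makes it easier to compare --push-res values and iteration counts on equal terms.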

Optimization of detector metrology

The MPCCD consists of eight sensor modules, each having 512 x 1024 pixels. The physical arrangement of the modules (detector metrology) should be optimized to get the most out of your datasets. geoptimiser in the CrystFEL suite refines the metrology by comparing the observed and predicted spot positions. See the [geoptimiser documentation](http://www.desy.de/~twhite/crystfel/geoptimiser-guide.html) at the CrystFEL website to learn how this is done at LCLS. Since the MPCCD has fewer sensors than the CSPAD, the refinement is easier.

While several hundred images are sufficient for optimizing the detector distance and spot finding parameters, you should use as many images as possible for geoptimiser. Here we use the result from "Running CrystFEL" to refine the geometry.

$ geoptimiser -i 266711-266721-dirax.stream -g sacla-15jan-505-predrefine.geom -o sacla-15jan-505-opt.geom -c connected -q independent
Maximum distance between peaks: 4.0 pixels.
Minimum number of measurements for a pixel to be included in the refinement: 3
Minimum number of measurements for connected group for accurate estimation of position/orientation: 100
Loaded 1000 indexed patterns from 1514 total patterns.
Loaded 2000 indexed patterns from 3017 total patterns.
Loaded 3000 indexed patterns from 4498 total patterns.
Loaded 4000 indexed patterns from 6007 total patterns.
Loaded 5000 indexed patterns from 7513 total patterns.
Loaded 6000 indexed patterns from 9027 total patterns.
Loaded 7000 indexed patterns from 10532 total patterns.
Loaded 8000 indexed patterns from 12032 total patterns.
Loaded 9000 indexed patterns from 13534 total patterns.
Loaded 10000 indexed patterns from 15035 total patterns.
Loaded 11000 indexed patterns from 16516 total patterns.
Loaded 12000 indexed patterns from 18026 total patterns.
Loaded 13000 indexed patterns from 19533 total patterns.
Loaded 14000 indexed patterns from 21074 total patterns.
Loaded 15000 indexed patterns from 22635 total patterns.
Loaded 16000 indexed patterns from 24158 total patterns.
Loaded 17000 indexed patterns from 25726 total patterns.
Loaded 18000 indexed patterns from 27203 total patterns.
Loaded 19000 indexed patterns from 28690 total patterns.
Loaded 20000 indexed patterns from 30289 total patterns.
Loaded 21000 indexed patterns from 31838 total patterns.
Loaded 22000 indexed patterns from 33711 total patterns.
Found 22298 indexed patterns in file 266711-266721-dirax.stream (from a total of 34272).
Average cell coordinates:
Average a, b, c (in nm):  7.887,  7.890,  3.787
Minimum -Maximum a, b, c:
	 7.545 -  8.151,
	 7.493 -  8.120,
	 3.657 -  3.955
Average alpha,beta,gamma in degrees: 90.025, 90.022, 90.020
Minimum - Maximum alpha,beta,gamma in degrees:
	88.26 - 92.42,
	87.59 - 92.00,
	88.04 - 91.59
All patterns have the same camera length: 0.050500 m.
Camera length 0.0505 m was found 22298 times.
Minimum inter-bragg peak distance (based on average cell parameters): 22.6 pixels.
Computing pixel displacements.
Adjusting the minimum number of measurements per pixel in order to have enough measurements for each connected group.
Minimum number of measurement per pixel for connected group q1 has been set to 3
Minimum number of measurement per pixel for connected group q2 has been set to 3
Minimum number of measurement per pixel for connected group q3 has been set to 3
Minimum number of measurement per pixel for connected group q4 has been set to 3
Minimum number of measurement per pixel for connected group q5 has been set to 3
Minimum number of measurement per pixel for connected group q6 has been set to 3
Minimum number of measurement per pixel for connected group q7 has been set to 3
Minimum number of measurement per pixel for connected group q8 has been set to 3
Computing error before correction.
Error for connected group q1: 2869 pixels with more than 3 peaks: RMSD = 2.4318 pixels.
Error for connected group q2: 11300 pixels with more than 3 peaks: RMSD = 1.3960 pixels.
Error for connected group q3: 12390 pixels with more than 3 peaks: RMSD = 1.2857 pixels.
Error for connected group q4: 2709 pixels with more than 3 peaks: RMSD = 1.5111 pixels.
Error for connected group q5: 2381 pixels with more than 3 peaks: RMSD = 2.1686 pixels.
Error for connected group q6: 12319 pixels with more than 3 peaks: RMSD = 1.4260 pixels.
Error for connected group q7: 13253 pixels with more than 3 peaks: RMSD = 1.2192 pixels.
Error for connected group q8: 3799 pixels with more than 3 peaks: RMSD = 1.4096 pixels.
Detector-wide error before correction: RMSD = 1.4548 pixels.
Saving error map before correction.
Computing rotation and stretch corrections.
Panel q1, num: 1531944, angle: -0.0013 deg, stretch coeff: 0.9998
Panel q2, num: 30152123, angle: -0.0011 deg, stretch coeff: 0.9991
Panel q3, num: 27663120, angle: -0.0008 deg, stretch coeff: 0.9978
Panel q4, num: 1227524, angle: -0.0030 deg, stretch coeff: 0.9974
Panel q5, num: 1006155, angle: 0.0001 deg, stretch coeff: 1.0002
Panel q6, num: 27600891, angle: 0.0013 deg, stretch coeff: 1.0001
Panel q7, num: 41470508, angle: -0.0015 deg, stretch coeff: 0.9995
Panel q8, num: 2607344, angle: -0.0025 deg, stretch coeff: 0.9985
Computing overall stretch coefficient.
(Using only connected groups for which the minimum number of measurements per pixel is 3).
The global stretch coefficient for the patterns is 0.9991
Computing rotation and elongation corrections for groups without the required number of measurements.
Applying rotation and stretch corrections.
Using a single offset distance for the whole detector: -0.000044 m. Stretch ceofficient: 0.9991
Computing shift corrections.
Panel q1, num pixels: 2869, shifts (in pixels) X,Y: 1.68711620, -1.52578556
Panel q2, num pixels: 11300, shifts (in pixels) X,Y: 0.72747886, -1.21585897
Panel q3, num pixels: 12390, shifts (in pixels) X,Y: 0.11680500, -1.18020567
Panel q4, num pixels: 2709, shifts (in pixels) X,Y: 0.23159705, -0.43663291
Panel q5, num pixels: 2381, shifts (in pixels) X,Y: -1.95957928, 0.47718251
Panel q6, num pixels: 12319, shifts (in pixels) X,Y: -0.78245869, 0.66825932
Panel q7, num pixels: 13253, shifts (in pixels) X,Y: -0.17912854, 0.20624590
Panel q8, num pixels: 3799, shifts (in pixels) X,Y: 0.51081474, 0.73901985
Computing shift corrections for groups without the required number of measurements.
Applying shift corrections.
Saving error map after correction.
Computing errors after correction.
Error for connected group q1: 2869 pixels with more than 3 peaks: RMSD = 1.1592 pixels.
Error for connected group q2: 11300 pixels with more than 3 peaks: RMSD = 0.9864 pixels.
Error for connected group q3: 12390 pixels with more than 3 peaks: RMSD = 1.0299 pixels.
Error for connected group q4: 2709 pixels with more than 3 peaks: RMSD = 1.2772 pixels.
Error for connected group q5: 2381 pixels with more than 3 peaks: RMSD = 1.1874 pixels.
Error for connected group q6: 12319 pixels with more than 3 peaks: RMSD = 1.0191 pixels.
Error for connected group q7: 13253 pixels with more than 3 peaks: RMSD = 0.9872 pixels.
Error for connected group q8: 3799 pixels with more than 3 peaks: RMSD = 1.2714 pixels.
Detector-wide error after correction: RMSD = 1.0527 pixels.
All done!
Be sure to inspect error_map_before.png and error_map_after.png !!

We can see that the RMSD has been reduced from 1.4548 pixels to 1.0527 pixels. Look at error_map_before.png and error_map_after.png to visually check the improvement. Use the display command or download the images to your local machine.
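When geoptimiser is run repeatedly, the before/after numbers can be pulled out of a saved log instead of being read by eye. The sketch below assumes the log contains the two "Detector-wide error" lines in the wording shown above; the function name rmsd_summary is made up for illustration.

```bash
#!/bin/bash
# Sketch: extract the detector-wide RMSD before and after correction from a
# geoptimiser log on standard input, and report the relative reduction.
rmsd_summary() {
    awk '
        /Detector-wide error before correction/ { before = $(NF-1) }
        /Detector-wide error after correction/  { after  = $(NF-1) }
        END {
            if (before > 0)
                printf "before=%.4f after=%.4f reduction=%.1f%%\n", before, after, 100 * (before - after) / before
        }'
}

# The two summary lines from the log above
rmsd_summary <<'EOF'
Detector-wide error before correction: RMSD = 1.4548 pixels.
Detector-wide error after correction: RMSD = 1.0527 pixels.
EOF
```

For the run shown here this reports a reduction of 27.6%. With a saved log file (the name geoptimiser.log is hypothetical), use rmsd_summary < geoptimiser.log.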

Before refinement: Error map before refinement

After refinement: Error map after refinement

Note that the dots became darker (white to dark blue), especially around the center. If we had used more images for the refinement, we could have obtained better results.

Re-process all images with the new geometry file sacla-15jan-505-opt.geom. This time 22687 images were indexed (22298 before optimization). The CC1/2 is:

  1/d centre       CC       nref      d / A   Min 1/nm    Max 1/nm
     1.097  0.9855871        580       9.12      0.253       1.940
     2.192  0.9884662        523       4.56      1.940       2.444
     2.620  0.9917031        517       3.82      2.444       2.797
     2.938  0.9879777        502       3.40      2.797       3.078
     3.197  0.9912425        495       3.13      3.078       3.316
     3.420  0.9920204        504       2.92      3.316       3.524
     3.616  0.9916279        487       2.77      3.524       3.709
     3.794  0.9921004        499       2.64      3.709       3.878
     3.956  0.9927059        490       2.53      3.878       4.033
     4.105  0.9909341        487       2.44      4.033       4.178
     4.245  0.9890665        485       2.36      4.178       4.312
     4.376  0.9868184        489       2.29      4.312       4.439
     4.499  0.9861799        479       2.22      4.439       4.559
     4.616  0.9772875        485       2.17      4.559       4.673
     4.728  0.9602735        491       2.12      4.673       4.782
     4.834  0.9402130        488       2.07      4.782       4.886
     4.936  0.9150620        470       2.03      4.886       4.986
     5.034  0.7954474        499       1.99      4.986       5.082
     5.128  0.6002460        475       1.95      5.082       5.174
     5.219  0.1496696        304       1.92      5.174       5.263

Compare this with the CC1/2 before optimization we got in Running CrystFEL. The highest resolution shells were:

     4.728  0.9478072        491       2.12      4.673       4.782
     4.834  0.8611027        488       2.07      4.782       4.886
     4.936  0.8654098        470       2.03      4.886       4.986
     5.034  0.6223890        499       1.99      4.986       5.082
     5.128  0.4044454        476       1.95      5.082       5.174
     5.219  0.1885259        306       1.92      5.174       5.263

Thus, the data quality has been improved.
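One way to put a number on the comparison is to find the last shell before CC1/2 first falls below a threshold. The sketch below assumes the column layout of the tables above (1/d centre, CC, nref, d in Å, min, max). The 0.5 threshold is only an illustrative choice, not a recommendation from this page, and the function name cc_cutoff is made up.

```bash
#!/bin/bash
# Sketch: print the d-spacing (in Angstrom) of the last shell before CC1/2
# first drops below a threshold. Reads a shell table on standard input.
cc_cutoff() {
    awk -v thr="${1:-0.5}" '
        $1 ~ /^[0-9.]+$/ {          # data rows only; the header line is skipped
            if ($2 < thr) exit      # stop at the first shell below the threshold
            d = $4
        }
        END { print d }'
}

echo "before optimization:"
cc_cutoff 0.5 <<'EOF'
     4.936  0.8654098        470       2.03      4.886       4.986
     5.034  0.6223890        499       1.99      4.986       5.082
     5.128  0.4044454        476       1.95      5.082       5.174
     5.219  0.1885259        306       1.92      5.174       5.263
EOF

echo "after optimization:"
cc_cutoff 0.5 <<'EOF'
     4.936  0.9150620        470       2.03      4.886       4.986
     5.034  0.7954474        499       1.99      4.986       5.082
     5.128  0.6002460        475       1.95      5.082       5.174
     5.219  0.1496696        304       1.92      5.174       5.263
EOF
```

With this threshold the shells above give 1.99 Å before and 1.95 Å after optimization. A single threshold hides a lot, so read the full table too.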

Using optimized parameters in the pipeline

If you collect many datasets from a single sample (e.g. in time-resolved studies), you can tell the pipeline to use optimized parameters in CrystFEL. Then you can just cat stream files in subdirectories and merge them without re-running indexamajig. This saves time.

Restart cheetah-dispatcher with the crystfel_args parameter. This argument is passed to indexamajig and overrides the pipeline's defaults. For example,

cheetah-dispatcher --crystfel_args="--indexing=dirax --threshold=1000 --min-snr=3.5 --peaks=peakfinder8 --int-radius=4,5,7 -g ../optimized.geom -p ../sfx.cell"

This tells the pipeline to use optimized.geom instead of the auto-generated geometry file. Note that you need to add ../ to the path; the path is relative to subdirectories for each run.
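Once the pipeline has processed the runs with these parameters, the per-run stream files can be concatenated for merging. The following is a minimal sketch, assuming each run directory holds its stream files directly inside it; the function name and the directory names in the example are hypothetical.

```bash
#!/bin/bash
# Sketch: concatenate the stream files found in per-run subdirectories.
# Usage: combine_streams OUT DIR [DIR...]
# Keep OUT outside the listed directories so it is not read back in.
combine_streams() {
    out=$1
    shift
    : > "$out"                              # start from an empty output file
    for dir in "$@"; do
        for f in "$dir"/*.stream; do
            [ -e "$f" ] || continue         # the glob matched nothing in this directory
            cat "$f" >> "$out"
        done
    done
}

# Example (hypothetical directory names):
# combine_streams all.stream run1 run2 run3
```

The combined file can then be passed to process_hkl or partialator as a single input stream.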

We do NOT recommend working this way unless you are certain about your samples and parameters. When a cell parameter is specified and the indexing method does not contain the raw flag, CrystFEL will reject indexing results that do not match the specified cell. So you risk missing unexpected cell parameter changes or precipitation of salts.