
Running the pipeline

biochem_fan edited this page Jun 9, 2016 · 18 revisions

Introduction

Here you will process a lysozyme dataset collected at 7 keV. It is part of the datasets used for the S-SAD phasing reported in Nakane et al., Acta Cryst. D, 2015. Feel free to play with it. The full datasets (processed by Cheetah) are available as CXIDB #33.

If you are not familiar with CrystFEL, we recommend that you go through the CrystFEL tutorial first. In this tutorial, we focus on SACLA-specific issues and omit the basics common to LCLS.

Data collection

At SACLA, raw images are grouped into runs. Typically, a run consists of 150 dark images taken without X-rays, followed by 5000 exposed images. The dark images are used to calculate the dark current of the detector. Runs are collected one after another for as long as the sample in the injector lasts. This differs from the standard practice at LCLS, where people tend to collect one long run with tens of thousands of images from a sample batch.

Run 1:  D1 D2 D3 ... D150 E1 E2 E3 ...................... E5000
Run 2:  D1 D2 D3 ... D150 E1 E2 E3 ...................... E5000
Run 3:  D1 D2 D3 ... D150 E1 E2 E3 ...................... E5000
...

(D: dark image, E: exposed image)

A run is identified by a run number, while an image is identified by a tag number.
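
The run layout above can be sketched as a small shell function. This is only an illustration: `image_kind` is a hypothetical helper, and its argument is the position of an image within a run (1-based), not a SACLA tag number. The defaults of 150 and 5000 are the typical values quoted above.

```shell
# Hypothetical helper: classify the N-th image of a run (1-based position,
# not a tag number) as dark or exposed.
# Defaults follow the typical layout: 150 dark images, then 5000 exposed.
image_kind() {
    index=$1
    n_dark=${2:-150}
    n_exposed=${3:-5000}
    if [ "$index" -lt 1 ] || [ "$index" -gt $((n_dark + n_exposed)) ]; then
        echo "out-of-range"
    elif [ "$index" -le "$n_dark" ]; then
        echo "dark"
    else
        echo "exposed"
    fi
}

image_kind 150    # last dark image of a default run
image_kind 151    # first exposed image of a default run
```

If you changed the run size, pass the new counts as the second and third arguments, e.g. `image_kind 60 50 5000`.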

You can change the number of dark and exposed images in a run with the "Run control GUI". However, we recommend that you stick to the defaults. Since images become available for processing only after the run has been completed, making a run bigger increases the latency of data processing. You need at least 50 dark images in each run to get a reliable estimate of the dark current.

Start the pipeline

First, establish a VPN connection to SACLA. Then log in to the fep node (front end processor). If you are on site, you can use hpc01-smp3, which is more powerful.

ssh -Y yourname@fep # VPN
ssh -Y yourname@hpc01-smp3 # on site

Next, create and go to your work directory.

mkdir /work/yourname/cheetah-test
cd /work/yourname/cheetah-test

WARNING! Files under /work are automatically deleted one month after the last access. Copy important files to /UserData/yourname (accessible from xfer2) for long-term storage. /home has no time limit, but its quota is smaller; this is where you install your scripts and programs. Details are discussed in SACLA HPC system.

Now you are ready to launch the "Cheetah dispatcher" GUI.
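
As a minimal sketch of the copy step recommended in the warning, the hypothetical helper below (`archive_results` is not a SACLA tool) copies a directory tree into a destination directory. It uses only plain `cp`; which node you run it on and the exact destination are up to you.

```shell
# Hypothetical helper: copy a work directory into long-term storage.
# Both paths are supplied by the caller; nothing here is SACLA-specific.
archive_results() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "archive_results: no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -R copies recursively, -p preserves timestamps and permissions.
    cp -Rp "$src" "$dest"/
}

# Example call (run it on a node that can see /UserData, e.g. xfer2):
# archive_results /work/yourname/cheetah-test /UserData/yourname
```

The example call is commented out so that pasting the block does not copy anything by accident.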

source ~sacla_sfx_app/setup.sh
cheetah-dispatcher

If this is the first time you launch Cheetah dispatcher in this directory, it asks you to set up a configuration file.

ERROR: Configuration file was not found!

You should copy /home/sacla_sfx_app/packages/tools/sacla-photon.ini into this directory
and confirm the settings.

Copy sacla-photon.ini as instructed in the console. Normally you do not have to edit it. If you want to change the spot-finding parameters for Cheetah, you can do it here, but as described in Nakane et al., J. Appl. Cryst., 2016, the default parameters work well in almost all cases.

cp /home/sacla_sfx_app/packages/tools/sacla-photon.ini .
cheetah-dispatcher
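
If you launch the dispatcher from many work directories, a guard such as the following avoids overwriting a configuration file you have already edited. This is a sketch: `ensure_config` is a hypothetical helper, and the template path is passed in as an argument rather than hard-coded.

```shell
# Hypothetical helper: copy the template configuration into the current
# directory only when no sacla-photon.ini exists there yet.
ensure_config() {
    template=$1
    if [ -f sacla-photon.ini ]; then
        echo "sacla-photon.ini already present; leaving it untouched"
    else
        cp "$template" sacla-photon.ini && echo "copied $template"
    fi
}

# Example call, using the template path printed by Cheetah dispatcher:
# ensure_config /home/sacla_sfx_app/packages/tools/sacla-photon.ini
```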

Submit jobs

Make sure "ST4" is selected; if you are working in experimental hutch 2 (EH2), choose "ST2" instead.

Type "266711-266721" into the "Run ID" text box and click the "Submit" button. During your own beamtime, you have to specify your own run numbers, of course.

You can omit the second number (e.g. "266711-"); Cheetah will then automatically detect and submit all runs after the first. But do NOT do it now: it would submit more than 150000 runs, from 266711 to the latest run!
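
Before submitting, it can help to check how many runs a range covers. The sketch below is a hypothetical helper, not part of Cheetah dispatcher: `expand_runs` prints one run number per line for a closed range and deliberately refuses an open-ended one such as "266711-". Treating a bare number as a one-run range is an assumption of this sketch.

```shell
# Hypothetical helper: expand a closed run range such as "266711-266721"
# into one run number per line. Open-ended ranges are rejected on purpose.
expand_runs() {
    range=$1
    first=${range%%-*}    # text before the first "-"
    last=${range#*-}      # text after the first "-" (whole string if no "-")
    if [ -z "$first" ] || [ -z "$last" ]; then
        echo "expand_runs: open-ended range refused: $range" >&2
        return 1
    fi
    case $first$last in
        *[!0-9]*)
            echo "expand_runs: not a run range: $range" >&2
            return 1
            ;;
    esac
    run=$first
    while [ "$run" -le "$last" ]; do
        echo "$run"
        run=$((run + 1))
    done
}

expand_runs 266711-266721 | wc -l    # number of runs in the range
```

For the range used in this tutorial the count is 11, since both end points are included.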

Low level filtering

The "MaxI threshold" text box controls the threshold for low level filtering (LLF). Since we do not use LLF nowadays, just leave it at 0 (disabled).

LLF calculates the maximal pixel value within the ROI (region of interest). Cheetah can skip images whose LLF value is less than the threshold. This is similar to the veto system at LCLS. It was useful for accelerating the processing in 2014 but is no longer necessary because the performance has since improved.
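
The idea behind LLF can be illustrated with a few lines of shell and awk. This is a conceptual sketch only, not Cheetah's implementation: the function name `llf_keep`, the plain-text image format (one detector row per line, space-separated pixel values) and the ROI arguments are all assumptions made for the example.

```shell
# Conceptual sketch of low level filtering; not how Cheetah implements it.
# Input: a plain-text image on stdin, one detector row per line,
# pixel values separated by spaces. ROI bounds are 1-based and inclusive.
llf_keep() {
    threshold=$1 row_min=$2 row_max=$3 col_min=$4 col_max=$5
    awk -v t="$threshold" -v r0="$row_min" -v r1="$row_max" \
        -v c0="$col_min" -v c1="$col_max" '
        NR >= r0 && NR <= r1 {
            for (c = c0; c <= c1 && c <= NF; c++) {
                v = $c + 0
                if (!seen || v > max) { max = v; seen = 1 }
            }
        }
        END {
            # A threshold of 0 disables the filter. Otherwise an image is
            # skipped when its ROI maximum is below the threshold.
            if (t + 0 == 0 || (seen && max >= t + 0)) print "keep"
            else print "skip"
        }'
}

# 3x3 toy image; the ROI is rows 2-3, columns 2-3, whose maximum is 50.
printf '1 2 3\n4 50 6\n7 8 9\n' | llf_keep 20 2 3 2 3
```

With a threshold of 20 the toy image is kept; raising the threshold above 50 would skip it.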

For details, read Nakane et al., J. Appl. Cryst., 2016.

Examine the output

In your work directory, you will find these files and directories:

  • 266702-0/
    • 266702.geom (CrystFEL geometry)
    • 209060-dark.h5 (Dark average)
    • 209060-geom.h5 (Cheetah geometry)
    • run266702-0.h5 (hit images)
    • ... (other log files)
  • 266702-1/
    • run266702-1.h5
    • ...
  • 266702-2/
    • run266702-2.h5
    • ...
  • 266703-0/
    • 266703.geom
    • run266703-0.h5
    • ...
  • 266703-1/
    • run266703-1.h5
    • ...
  • 266703-2/
    • run266703-2.h5
    • ...
  • 266704-0/...
  • 266704-1/...
  • 266704-2/...
  • ...

All hit images in a processing batch are packed into a big HDF5 file (runXXXXXX-Y.h5) to reduce file system overhead.
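
To see which hit files have been written so far, you can list them from the shell. The sketch below assumes only the directory layout shown above (one `runXXXXXX-Y.h5` per `XXXXXX-Y/` directory); `list_hit_files` is a hypothetical helper name.

```shell
# Hypothetical helper: list the packed hit-image files one level below a
# work directory, one path per line, following the XXXXXX-Y/runXXXXXX-Y.h5
# layout shown above.
list_hit_files() {
    workdir=${1:-.}
    for f in "$workdir"/*/run*-*.h5; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}

list_hit_files . | wc -l    # number of hit files found under the current directory
```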

The images have had (1) the dark current subtracted and (2) the detector gains normalized. After normalization, 1 photon corresponds to 10 counts; in other words, the detector gain is 10 by definition.
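
Because the gain is fixed at 10 counts per photon, converting a pixel value to photons is a single division. A minimal sketch, with `counts_to_photons` as a hypothetical helper name:

```shell
# Convert a dark-subtracted, gain-normalized pixel value to photons,
# using the 10 counts per photon stated above.
counts_to_photons() {
    awk -v counts="$1" 'BEGIN { printf "%.1f\n", counts / 10 }'
}

counts_to_photons 10    # prints 1.0
counts_to_photons 25    # prints 2.5
```

Slightly negative values can occur after dark subtraction; the function simply reports them as negative photon counts.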

Although geometry files are generated in each folder, they are all the same; you can use any of them. The detector geometry is constant during your beamtime.
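
If you want to convince yourself that the geometry files really are the same, a byte-wise comparison is one option. The sketch below is a hypothetical check, not part of the pipeline; it assumes the files are byte-identical, so it would also flag harmless differences such as a changed comment line.

```shell
# Hypothetical check: report whether every *.geom file one level below a
# work directory is byte-identical to the first one found.
geoms_identical() {
    workdir=${1:-.}
    reference=
    for g in "$workdir"/*/*.geom; do
        [ -f "$g" ] || continue
        if [ -z "$reference" ]; then
            reference=$g
        elif ! cmp -s "$reference" "$g"; then
            echo "differs: $g"
            return 1
        fi
    done
    if [ -z "$reference" ]; then
        echo "no geometry files found"
        return 1
    fi
    echo "all geometry files match"
}

# Example call from the work directory:
# geoms_identical .
```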