Abstracted system for doing seed-based correlation analysis with any given set
of groups and covariates. Uses AFNI 3dttest++
.
Below is a general guide to setting up this pipeline for any project. We assume that you will clone this repository and, in that directory, follow these instructions.
This script creates a number of recipes based on the number of groups and seeds. The number of recipes it has to create is:
(seeds * groups + seeds * contrasts) * 2
where contrasts
is:
seeds^2 - seeds
So, the time it takes make to prepare rises with the square of groups and
linearly with the number of seeds. As these increase, make
will take more time
to get ready to be invoked, and might look like it's hanging. Try make test-contrasts
to see how long it takes to compile.
The number of recipes is multiplied by two to account for the cluster correction recipes, which are separate from the base processing.
Important: Do not use a hyphen (-
) in group, seed, or covariate names.
Or anywhere, really. The system relies in many places on hyphens being used
exclusively to identify contrasts (e.g. patient-control
). I have built in a
few first-order checks against hyphens in bad places, but the system will fail
spectacularly if there are hyphens in unexpected places.
The makefile requires that the list of seeds and their locations be read in
from a file. That file must be named allseeds.txt
and be located in the
top-level analysis
directory and have the project seed directory, and all
the seeds used in the project listed in a single column like so:
/mnt/praxic/pdnetworksr01/lib/SVC_seeds/
DANRaIPS
DANLpIPS
DANRFEF
DANLFEF
...
The makefile reads from the directory listed in the first line and looks for
the seeds on the n>1
lines. These files can have any notes following the
seed name in the file name, but must have an .nii.gz
extension.
For example, DANRaIPS_sphereroi.nii.gz
is acceptable, but
ROI_DANRaIPS.nii
wouldn't be.
In other words, the DANRaIPS seed must match the file glob DANRaIPS*.nii.gz
.
Important: All seeds must be generated for all subjects (see next section).
If they aren't, the t-test for any seed where one or more subjects is missing
the file will fail (with an error from 3dttest++
).
The makefile identifies groups by looking for files that match the regex
group-[[:alpha:]]*.txt
. For example, these files would create the groups
"control" and "patient."
group-control.txt
group-patient.txt
Makefile
Please ensure that there are no other files matching this pattern in the
working directory. Reminder: Do not use "-
" in group names.
Group files must contain lines the following format:
140593 /mnt/praxic/pdnetworksr01/subjects/140593/session1/mcvsa/SVC_MEICA/
140605 /mnt/praxic/pdnetworksr01/subjects/140605/session1/mcvsa/SVC_MEICA/
Each line consists of a subject identifier, followed by a space, then followed
by the absolute path to the seed-based connectivity maps for that subject. It
is important to label each subject because the syntax of 3dttest++
requires
each map to be labeled with a subject identifier.
For info on the [[:alpha:]]
syntax, see
this page.
It is equivalent to [a-zA-Z]
.
Important: All of these directories must contain all of the required files.
3dttest++
will choke if it is given a non-existent file, and will not
continue without it. I suggest you create a script to manage the creation of
these files based on the existence of the map directories.
The first fifty or so lines of the makefile set a number of variables for
make
execution. However, all the project-specific variables are set in the
auxiliary file analysis/settings.conf
.
In general, the convention is ALL CAPS variables should be modified on a project basis and left alone. PascalCase variables can be given a default, but can be overridden on the command line (any variable can be so overridden, but these are likely to be.) All lowercase variables should not be modified.
To override the default value of any variable, simply set a new value for them on the command line, like so:
make SINGLEGROUP_PD ANALYSIS=-ETAC
PROJECT_DIR
: The top-level directory for your project (in IBIC standard convention, this containsbin/
,subjects/
, etc.)STANDARD_MASK
: The MNI-space mask used in3dttest++
. Using a mask reduces the number of comparisons and speeds up processing. Make sure the resolution (Xmm
) matches the space your files are registered to. Default: 2mmSVCSUFFIX
: Depending on howmeica
was set up, the maps aren't necessarily given consistent names. Set this to whatever your project chose as the output. We're looking for the Z-transformed correlation map.COVFILE
: The covariates file (see sectionXXX
) to read from. Leave blank if there are no covariates.ANALYSIS
: Which type of3dttest++
cluster correction to use. Available options are-ETAC
or-Clustsim
. Note that the number of cores can be specified as an argument to these flags, or left off to use all available cores. See section 7 for a discussion of cluster correction. Important: The leading dash must be included.
Paired
: By default, an unpaired t-test is executed. To do a paired t-test, changed the value ofPaired
to anything other than the empty string (for example, "yes").3dttest++
will fail if your subject lists aren't properly paired.seedsdir
: This variable is read in from the first lne ofallseeds.txt
and contains all the ROIs used in later analyses.allseeds
: By default, all the seeds are read in from the fileallseeds
(see section 1). This can be overriden here if you have another method to identify the names of the seeds.groups
: This pulls the group names out of thegroup-*.txt
files (see section 2). Don't override this setting, as later recipes assume the existence of the samegroup-*.txt
files.contrasts
: This variable automatically creates all the contrasts based on the given group, and filters out any 0 contrasts (A - A). Don't change this declaration! There should be N(N-1) total contrastscovariate
: This variable is set automatically based on whether there is aCOVFILE
. It is later read by3dttest++
, and adds the option-covariates
for the proper syntax. Don't change this logic statement.pairflag
: Adds the-paired
flag to theGROUPDIFF
t-test if thePaired
variable is set to true.
Check to make sure your setup is done correctly. You can run
make test-<variable>
to get the contents of a variable and how many "words"
it has; useful in checking that you have exported the correct number of
contrasts, etc.
Run bin/check-config.sh
, which will check that the appropriate files exist
in your analysis/
directory and that they are formatted correctly. It isn't
perfect, so do not count a successful result here as proof positive your project
is ready to go.
Additionally, check the number of degrees of freedom for your analysis:
make SINGLEGROUP_<group>_DoF
make GROUPDIFF_<contrast>_DoF
Note though, that due to the constraints of Make, the calculation has to be
written out twice. So it is possible that there is a discrepancy between the
test calculation and the value actually passed to cluster-correct.sh
. Please
let me know if you discover any such issues.
A covariate file takes the following format, including up to 32 whitespace-separated columns (including the subject identifier column).
idnum age gender ...
RRF01 18 0 ...
RRF02 20 1 ...
RRF04 21 1 ...
RRF05 22 0 ...
... ... ... ...
From the 3dttest++
manual:
* A maximum of 31 covariates are allowed. If you have more, then seriously consider the likelihood that you are completely deranged.
The column headers will be part of the output filenames, so ensure that they are
descriptive, but not too cumbersome.
For example, you might end up with files named
DMNLPCC_PDa-PDr_mean_age.nii.gz
.
You can only add one covariate file, so the covariates for all subjects in all groups must be present in this one file.
There is no required name for the covariate file (COVFILE
is empty by
default), but covariates.txt
works just fine.
Additionally from the 3dttest++
manual:
* There is no provision for missing values -- the entire table must be filled!
There are two primary ways of running this makefile: single-group and between-group analyses.
- Run
SINGLEGROUP_<g>
, whereg
is one of the groups set in step 2 to run for a single group. Note that this requires a minimum of 14 subjects (an AFNI3dttest++
requirement).
For example, you could type:
make SINGLEGROUP_patient
to create a directory called patient
with the single group analyses results
for the subjects in the file group_patient.txt
.
- Run
GROUPDIFF_<g1>-<g2>
, whereg1
andg2
are two different groups from step 2. This also requires 14 subjects, but between the two groups.
Analogously, this would be called using
make GROUPDIFF_control-patient
to create the contrast between patients and controls.
The basic function of cluster correction is to determine what areas of activation are interesting by grouping regions with enough contiguous voxels above a certain threshold, and discarding those areas that are too small.
While the math behind the methodologies is beyond the scope of this README, the basic effect is to remove noise.
Important: It is possible to have no voxels survive cluster correction. In this case, the cluster corrected maps will be empty, this isn't necessarily a symptom of program failure.
Cluster correction is a separate step because it takes a while, even when parallelized (see section 8), and this setup allows you to examine intermediate results before dedicating entire machines to cluster correction.
To run cluster correction, call either single or difference cluster correction:
make SINGLEGROUP_patient_clustcorr
make GROUPDIFF_control-patient_clustcorr
Either of these will use the cluster correction method identified in Section 3.
Note that if run asynchronously, the QA image creation program might try to take
pictures of temporary files that are deleted as the makefile progresses. You
might see such "unable to read file" message. These are harmless; only the
appropriate files are created, but you can avoid this messages by running
sequentially (e.g. make SINGLEGROUP_patient_clustcorr -k
.)
This will create a new directory at the same level as nifti
and headbrik
:
clustcorr
, which will contain six files for each seed:
- Two NIfTI images: one for positive-signed clusters and one for negative-signed clusters
- Two GIFs: one QA picture for each neuroimage
- Two text files: These contain the cluster size thresholds used
For each sign, there is the actual image, a QA picture for quick checking, and a text file with all the clusters in said image, along with their size in voxels and coordinates.
The ClustSim algorithm has its own AFNI program:
3dClustSim
but is also built into 3dttest++
as -Clustsim
.
From that page: The ClustSim method works by "simulates the noise volumes by randomizing and permuting input datasets, and sending those volumes into 3dClustSim directly. There is no built-in math model for the spatial [auto-correlation function]."
Importantly, the cluster size is the same for all brain regions with ClustSim. This is at the core of Eklund's 2016 [1] critique of fMRI false-positive rates.
Equitable Tresholding and Clustering (ETAC) is a response to the abovementioned
concerns about false-positive rates. Its 3dttest++
option is -ETAC
.
See Cox et al.'s response [2] for details about how ETAC works, but at its core, it uses different cluster thresholds for different regions of the brain based what we know about how large functional clusters are in one area of the brain compared to another.
You might want to (a) parallelize processing of a single ROI to speed it up (see 7.1); or (b) parallelize processing of multiple ROIs to decrease total competion time (see 7.2).
The easiest way to parallelize locally is to override Analysis
to be
-Clustsim
, without an integer argument. Then, 3dttest++
will use all
available cores.
Use this to test one ROI, as testing multiple will rapidly overwhelm your machine, but this is useful for rapid testing over waiting an hour+ for one-core execution to complete.
Check that the grid engine is set up properly for your user/machine. See here for instructions.
You can submit the jobs on the queue with qmake
:
qmake -cwd -V -- -j <n> -k <TARGET>
Where "n
" is the number of jobs (which you can get with make test-allseeds
to see how many you can parallelize), and TARGET
is the make
target you
want to build (like SINGLEGROUP_PD
). The -cwd
flag to qmake tells it to
work in the current directory, and -V
passes along all environment variables.
Or, you can submit the jobs using rmake
:
rmake -T <TARGET>
This project contains four scripts in bin/
that you may need to run. Three of
these scripts are automatically called by the Makefile, and check-config.sh
is
for your information.
Run check-config.sh
from the root directory (i.e. where the Makefile is) to
check your configuration.
cluster-correct.sh
takes the output of 3dttest++
and cluster corrects given
certain p and alpha values. Here are the mandatory and optional arguments:
Mandatory:
-i *
Input prefix, path to HEAD/BRIK files without+{orig,tlrc}.{HEAD,BRIK}
suffixes.-d *
Set the degrees of freedom. (Previously not a mandatory option.)-o *
Output directory.-D
Group mode. Use this when comparing two groups, not just one.
Optional:
-1/2
Use a 1- or 2-sided t-test (default 2)-a *
Set custom alpha value (default 0.5)-h
Display help menu-k
Keep mode, don't delete intermediate files for debugging-L
Creates a log file in the output directory with parameters from the run.-n [123]
Which neighbor mode to use (default 3)- 1: Faces only
- 2: Faces and edges
- 3: Faces, edges, and corners
-p *
Set custom p threshold (default 0.05), set value between 0.0-1.0. Not all p-values can be used, andcluster-correct.sh
will warn you if you try to use one.
This script extracts all the AFNI bricks to individual NIFTI files and names them appropriately. It takes 2+ arguments. The first is an output directory, and the second and beyond are BRIK files. All bricks from all BRIK files will be extracted to the given directory.
Each BRIK file must have a corresponding HEAD file, or else it will fail.
This script creates images using slicer
for every NIFTI file you give it
and outputs them to the same place as the input. It also annotates "EMPTY" onto
empty files so you can be sure they are actually empty and not just missing from
the given view.
[1] Eklund, A., Nichols, T. E., & Knutsson, H. (2016). "Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113(28), 7900–7905. https://doi.org/10.1073/pnas.1602413113
[2] Cox, R. W., Chen, G., Glen, D. R., Reynolds, R. C., & Taylor, P. A. (2017). FMRI Clustering in AFNI: False-Positive Rates Redux. Brain Connectivity, 7(3), 152–171. https://doi.org/10.1089/brain.2016.0475