GitHub - RussellJurek/S-MARVEL: Smoothen MAsk and ReVEaL (S-MARVEL) image analysis pipeline that removes background and finds objects in 2D images.

RussellJurek / S-MARVEL Public
Notifications You must be signed in to change notification settings
Fork 0
Star 0
Smoothen MAsk and ReVEaL (S-MARVEL) image analysis pipeline that removes background and finds objects in 2D images.
Notifications
Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
demo		demo
src		src
LICENSE		LICENSE
Makefile		Makefile
README		README
Repository files navigation

Programmer: Dr. Russell J. Jurek, [email protected]
Current version: 1.2
Version date: 16th Aug 2015

Previous versions:
1.0 1st April 2014
1.1 19th May 2014

History of changes:
1.1: Implemented vector classes for objects. Added shared libraries.
1.2: Modifed libraries to use size_t for array indexing. Identified cfitsio dependency of wcs shared library, and updated makefile.

===============================================================================================================================================================
The current version of my C++ object generating library consists of the following files:
===============================================================================================================================================================

RJJ_ObjGen.h --> The header file describing all of the functions in the library.

RJJ_ObjGen_DetectDefn.cpp --> The C++ object code that contains the definition of the `detections' C++ object type.

RJJ_ObjGen_CatPrint.cpp --> The functions that print a header and the detections to the output catalogue. Note that this header 
			    only labels the output catalogue columns. This header does not contain the input parameters used to 
			    call any of the library functions.

RJJ_ObjGen_PlotGlobal.cpp --> The functions that plot the global moment-0, RA PV and DEV PV images (including detection 
			      boundaries).

RJJ_ObjGen_CreateObjs.cpp --> The function that creates objects from an array containing a binary mask of `source' voxels.

RJJ_ObjGen_ThreshObjs.cpp --> The function that thresholds all objects.

RJJ_ObjGen_AddObjs.cpp --> The function that labels a flag array with an array of existing `detections', prior to running the 
		       	   function in RJJ_ObjGen_CreateObjs.cpp.

RJJ_ObjGen_MemManage.cpp --> The functions used to create/initialise and destroy the variables required to make use of this 
			     library: obj_ids, NO_obj_ids, check_obj_ids and detections as labelled internally in this library.

RJJ_ObjGen_MakeMask.cpp --> A single function that creates an output .fits file of integer type, which is 0 for non-source voxels
			    and labelled > 0 according to the object that the voxel belongs to.

RJJ_ObjGen_Dmetric.cpp --> A single function that calculates a metric for doing pointer arithmetic to retrieve the array value
		       	   corresponding to a given x,y,z position for arbitrary dimension size and order.

RJJ_ObjGen_CatPrint_WCS.cpp --> The functions that print a header and the detections to the output catalogue. Note that this header 
			    	only labels the output catalogue columns. This header does not contain the input parameters used to 
			    	call any of the library functions.

==============================================================================================================================================================
Ancillary files included with the current version.
==============================================================================================================================================================

create_catalog.cpp --> A command line program that generates a catalogue from two .fits files. One of these files is the data file, and the other is the
		     	 corresponding mask file. This mask file should consist of 0's for non-object pixels/voxels and negative values otherwise.

create_HUGE_catalog.cpp --> This is a variant of create_catalog.cpp that uses `long int' functions and double precision objects, obj_props_dbl.

create_catalog_NP.cpp --> This is a variant of create_catalog.cpp with all usage of the PGPLOT-based plotting removed.

ProcessSourceOnlyCube.cpp --> This is a variant of create_catalog.cpp that retains the PGPLOT-based plotting, but only requires a single .fits file. Instead 
			      of obtaining a mask from a .fits file, it generates one from the data .fits file using an intensity threshold. 

Process_HUGE_SourceOnlyCube.cpp --> This is a variant of ProcessSourceOnlyCube.cpp that uses `long int' functions and double precision objects, obj_props_dbl.

You can view instructions for using these ancillary files/programs by calling them at the terminal with the flag, -h. For example,

> ./create_catalog -h
 
Note that you will have to specify the current directory if you don't have `.', which is the Unix/Mac OSX symbol for the current directory, included in your
PATH variable. 

==============================================================================================================================================================
Installation instructions:
==============================================================================================================================================================

This library can utilise, but is not dependent upon, the following libraries:
1. cfitsio
2. pgplot
3. wcslib

This library can be installed without installing any of the dependent libraries, but this will result in *limited* functionality. Various portions of the 
extended functionality rely on these other libraries. The core functionality however is self-contained. To faciliate this, the library is broken into three
pieces: librjj_objgen.a, librjj_objgen_plots.a and librjj_objgen_wcs.a.

IMPORTANT: Note that the librjj_objgen_wcs.so shared library *may* depend upon the cfitsio library. When creating the shared library, the wcs shared library 
will be preferred over the static wcs library. The wcs shared library depends upon the cfitsio library.

Use the included makefile to install the 3 libraries that make up this distribution.

> make all
> make install [optional command to copy the compiled libaries and header files to another directory --- default position is /usr/local/lib and /usr/local/include]
> make clean [optional command to remove object files that are no longer required]

The command sudo might be required when using `make install' eg. > sudo make install. If installing on a Mac OS X system and something goes wrong, Libtool will sometimes prevent a library from being recompiled. Use the command

> make distclean

to remove *everything*. This should allow you to try again.

You can specify the following environmental flags that will be picked up by the makefile:
CXX ---- C++ compiler.
INSTALL_DIR --- Location to copy the compiled libraries.
INSTALL_INC --- Location to copy the compiled library headers.
FITS_DIR, FITS_LIB, FITS_INC --- The location of the FITSIO installation and the relative path to the library and include files.
PGPLOT_DIR --- The location of the PGPLOT installation.
WCS_DIR, WCS_LIB, WCS_INC --- The location of the WCS installation and the relative path to the library and include files.

============================================================================================================================================================
Using the library:
============================================================================================================================================================

This library relies on a C++ object type, called `object_props'. This type of object contains a range of scalar properties 
and arrays that are used to store postage stamps for the moment-0, RA position-velocity and DEC position-velocity diagrams, 
the integrated spectrum and the reference spectrum (integrated spectrum of data cube over the source bounding box).

The five main functions required to use this library are: InitObjGen(), FreeObjGen(), CreateMetric(), CreateObjects() and AddObjsToChunk(). 

This library assumes that the user has a copy of the data in one array and a second array of identical size storing 
the binary mask of this datacube (referred to as flag array). It is assumed that both of these arrays are 1-dimensional, 
and that higher dimensionality is achieved by using pointer arithmetic. For instance, pixel 2,3 in a 2-D image of size 
100x200 is array element (2 * 100) + 3. This design choice was made so that the code will work equally well for 1-D spectra, 
2-D images and 3-D spectral datacubes. 

This library also assumes that the binary mask uses a value of 0 for non-source pixels/voxels, and a consistent negative value 
source pixels/voxels eg. -1. Multiple negative vales can be used in this binary mask to reflect different sets of source 
pixels/voxels, such as might be produced using the Negative Detections algorithm (Serra, Jurek & De Floer, 2012). The library
will only process a pixels/voxels that contain the user-specified negative flag value eg. -99. Positive values are not allowed
because the library populates this flag_vals array with positive values as it creates objects.

If the data is sufficiently small that both the data and flag array fit into memory, then objects can be created by calling 
CreateObjects() just once. If the data is too large to do this, then the data can be processed in `chunks' using a two step 
process. First, AddObjsToChunk() can be called to pre-load a flag array with a list of existing/previously found objects. 
The command CreateObjects() can then be used to create objects for the current chunk, while taking into account the objects 
found in previous chunks.

When processing large datacubes in chunks, the CreateObjects() function assumes that if an axis chunk_start position isn't 0,
then this chunk overlaps previous chunks. In this situation it ignores the first X rows of this dimension, where X is the 
merging length in this dimension, because it assumes that this is the size and location of the overlap. If you want to 
process a chunk that doesn't overlap, then you should add the merging length to both the chunk size and chunk start position 
of this dimension.

In order to use this library, a user needs to make use of 2 working arrays. Both are 1-dimensional arrays of integers. These 
arrays are used to keep track of two types of object IDs. The first array, check_obj_ids, keeps track of the IDs of the 
objects that are still within the merging box of the voxel currently being linked to existing objects. The second array, 
obj_ids, keeps track of the available object IDs. This includes the object IDs that have been recycled during processing.

Both of these working arrays are initialised when the function InitObjGen() is called, provided the user passes in the required 
information and a vector of type int for each of the working arrays. InitObjGen() will also initialise a couple of variables 
at the same time, and setup the internally labelled `detections' array. The associated function if FreeObjGen(), which will 
properly clean up the memory allocated to the two working arrays and the `detections' array.

The function, CreateMetric(), sets up the final working arrays, data_metric. This array is used to carry out the pointer 
arithmetic that maps 1D/2D/3D positions to the internal memory layout of the data_vals & flag_vals array (among others).
The xyz_order array allows you to specify the order of the 'x/RA', 'y/Dec' and 'z/Frequency/Velocity' in the input arrays,
data_vals and flag_vals. For instance, a fits file with RA, Dec and Frequency in that order would use xyz_order: 1 2 3. A 
fits file with Frequency, RA and Dec axes would use xyz_order: 3 1 2. 

The function prototypes for AddObjsToChunk() and CreateObjects() are shown below. A description of the various parameters 
follows.

int AddObjsToChunk(int * flag_vals, vector<object_props *> & detections, int NOobj, int obj_limit, int chunk_x_start, int chunk_y_start, int chunk_z_start, int chunk_x_size, int chunk_y_size, int chunk_z_size, int data_x_size, int data_y_size, int data_z_size, vector<int> & check_obj_ids);

Returns: The number of objects added to the flag array of the current data chunk as an integer.

flag_vals: A pointer of type int, which is used to store the binary mask describing which pixels/voxels are source. Pointer 
	   arithmetic is used to index this array instead of defining an n-dimensional array. The fastest changing axis is 
	   labelled the x axis, the next fastest is labelled the y axis and the slowest changing is labelled the z axis. 
	   This is only an internal labelling of the flag_vals array, and does not have to match the labelling of the external 
	   program that calls the AddObjsToChunk() function.

detections: A vector of pointers of type object_props (the object type defined by this library). This is the array 
	    containing all of the objects that have been created by previous calls of CreateObjects(). The vector of  
	    pointers design allows groups of objects to be created and initialised at once. Creating the detections array in 
	    groups allows only the required memory to be created at any one time. The maximum number of detections that can be 
	    generated is equal to the multiplication of the number of groups by the number of objects in a group: 
	    internally labelled, this is --> obj_limit (objects per group) x obj_batch_limit (number of groups). The value of
	    obj_limit is set by your machines limit on vector size.

NOobj: An int corresponding to the total number of sources found so far (after all calls of CreateObjects()) and stored in the
       detections variable.

obj_limit: The detections variable is designed such that groups of objects of type object_props are created and initialised at 
	   once. The integer obj_limit specifies the size of these groups. A value of 1 can be used, which will effectively 
	   side-step this behaviour.

chunk_x_start: The starting x value in pixel/voxel co-ordinates of the chunk that is to be populated with existing sources. 
	       This is the internally labelled x axis. An integer value.

chunk_y_start: The starting y value in pixel/voxel co-ordinates of the chunk that is to be populated with existing sources.
	       This is the internally labelled y axis. An integer value.

chunk_z_start: The starting z value in pixel/voxel co-ordinates of the chunk that is to be populated with existing sources.
	       This is the internally labelled z axis. An integer value.

chunk_x_size: An integer corresponding to the size of the chunk to be populated, along the internally labelled x axis.

chunk_y_size: An integer corresponding to the size of the chunk to be populated, along the internally labelled y axis.

chunk_z_size: An integer corresponding to the size of the chunk to be populated, along the internally labelled z axis.

data_x_size: An integer corresponding to the size of the entire dataset along the internally labelled x axis.

data_y_size: An integer corresponding to the size of the entire dataset along the internally labelled y axis.

data_z_size: An integer corresponding to the size of the entire dataset along the internally labelled z axis.

check_obj_ids: A vector of integers that stores the numeric IDs of all the existing objects in the detections array, 
	       that have been added to the flag_vals array of the chunk that is currently being processed, during this call of
	       the AddObjsToChunk() function.



int CreateObjects(float * data_vals, int * flag_vals, int size_x, int size_y, int size_z, int chunk_x_start, int chunk_y_start, int chunk_z_start, int merge_x, int merge_y, int merge_z, int min_x_size, int min_y_size, int min_z_size, int min_v_size, float intensity_threshold, int flag_value, int start_obj, vector<object_props *> & detections, vector<int> & obj_ids, int& NO_obj_ids, vector<int> & check_obj_ids, int& NO_check_obj_ids, int obj_limit, int obj_batch_limit, int max_x_val, int max_y_val, int max_z_val);

Returns: The total number of objects created so far (across all chunks that have been processed) as an integer.

data_vals: A pointer of type float, which is used to store the actual data of the chunk currently being processed. Pointer 
	   arithmetic is used to index this array instead of defining an n-dimensional array. The fastest changing axis is 
	   labelled the x axis, the next fastest is labelled the y axis and the slowest changing is labelled the z axis. 
	   This is only an internal labelling of the data_vals array, and does not have to match the labelling of the external 
	   program that calls the CreateObjects() function.

flag_vals: A pointer of type int, which is used to store the binary mask describing which pixels/voxels are source. Pointer 
	   arithmetic is used to index this array instead of defining an n-dimensional array. The fastest changing axis is 
	   labelled the x axis, the next fastest is labelled the y axis and the slowest changing is labelled the z axis. 
	   This is only an internal labelling of the flag_vals array, and does not have to match the labellign of the external 
	   program that calls the AddObjsToChunk() function.

size_x: An integer corresponding to the size of the entire dataset along the internally labelled x axis.

size_y: An integer corresponding to the size of the entire dataset along the internally labelled y axis.

size_z: An integer corresponding to the size of the entire dataset along the internally labelled z axis.

chunk_x_start: The starting x value in pixel/voxel co-ordinates of the chunk that is to be populated with existing sources. 
	       This is the internally labelled x axis. An integer value.

chunk_y_start: The starting y value in pixel/voxel co-ordinates of the chunk that is to be populated with existing sources.
	       This is the internally labelled y axis. An integer value.

chunk_z_start: The starting z value in pixel/voxel co-ordinates of the chunk that is to be populated with existing sources.
	       This is the internally labelled z axis. An integer value.

merge_x: The maximum separation in pixels/voxels along the internally labelled x axis of sources that should be merger into 
	 a single source.

merge_y: The maximum separation in pixels/voxels along the internally labelled y axis of sources that should be merger into 
	 a single source.

merge_z: The maximum separation in pixels/voxels along the internally labelled z axis of sources that should be merger into 
	 a single source.

min_x: The minimum size of sources in pixels/voxels along the internally labelled x axis that are to be kept.

min_y: The minimum size of sources in pixels/voxels along the internally labelled y axis that are to be kept.

min_z: The minimum size of sources in pixels/voxels along the internally labelled z axis that are to be kept.

min_v: The minimum size of sources in pixels/voxels that are to be kept.

intensity_threshold: The minimum total intensity of sources that are to be kept.

flag_value: The integer value in the flag_vals array that corresponds to a source pixel/voxel.

start_obj: The starting value of the object IDs assigned to objects created in the current call of CreateObjects().

detections: A vector of pointers of type object_props (the object type defined by this library). This is the array 
	    containing all of the objects that have been created by previous calls of CreateObjects(). The vector of 
	    pointers design allows groups of objects to be created and initialised at once. Creating the detections array in 
	    groups allows only the required memory to be created at any one time. The maximum number of detections that can be 
	    generated is equal to the multiplication of the number of groups by the number of objects in a group: 
	    internally labelled, this is --> obj_limit (objects per group) x obj_batch_limit (number of groups). The value of
	    obj_limit is set by your machines limit on vector size.

obj_ids: A vector of integers, which is one of the two `working arrays' required by the CreateObjects() function. This array 
	 stores the available object IDs. 

NO_obj_ids: An integer value set to the number of available object IDs stored in the array, obj_ids.

check_obj_ids: A vector of integers that stores the numeric IDs of all the existing objects in the detections array, 
	       that have been added to the flag_vals array of the chunk that is currently being processed, during this call of
	       the AddObjsToChunk() function.

NO_check_obj_ids: An integer value set to the number of object IDs stored in the array, check_obj_ids.

obj_limit: The detections variable is designed such that groups of objects of type object_props are created and initialised at 
	   once. The integer obj_limit specifies the size of these groups. A value of 1 can be used, which will effectively 
	   side-step this behaviour. An integer value.

max_x_val: An integer corresponding to the size of the chunk to be populated, along the internally labelled x axis.

max_y_val: An integer corresponding to the size of the chunk to be populated, along the internally labelled y axis.

max_z_val: An integer corresponding to the size of the chunk to be populated, along the internally labelled z axis.