README

Travis Build Status Wercker Build Status Coverage Status Code Health License Documentation Anaconda Release Join the chat at https://gitter.im/nanshe-org/nanshe


Nanshe

Motivation

This package is designed to perform segmentation on datasets containing spatially-fixed components with time varying intensity. As the techniques used are a bit slow, the segmentation algorithm has been designed to run in batch. The segmentation algorithm has been further tailored to enable multiprocessing as well as support for clusters that support DRMAA ( http://www.drmaa.org ). In addition to the segmentation algorithm, it provides a converter for TIFF to HDF5 and a rough viewer for inspecting the results obtained.

Converter

Intro

The first item contained inside the package is nanshe_converter.py. This is designed to convert external formats to HDF5 (the native format used by Nanshe). Currently, only TIFF stacks are supported.

Sample Usage

To run the converter, simply run the command below. The first argument specifies the input format. Currently, this is just tiff. The configuration parameters are contained in the config.json file. An example configuration file can be found in the examples folder. The input images can be specified using a regex as can be seen below. The output file is an HDF5 with a dataset containing the result. Here, output.h5 is the output file. The data is stored in the specified dataset, which is /images within root.

./nanshe_converter.py tiff config.json input_dir/*.tiff output.h5/images

Learner

Intro

The next item contained inside the package is nanshe_learner.py. This runs the segmentation algorithm using a specified set of parameters on a provided dataset and stores the result.

Sample Usage

To run the learner, simply run the command below. The configuration parameters are contained in the config.json file. Sample configuration files are included in the examples folder. One example is for a single process and is called nanshe_learner.json. Another example uses multiprocessing and is called nanshe_learner_multiprocessing.json. The input data is stored within the HDF5 input.h5 in the top level group in the dataset images. The output file can be the same file or a different file. Here, output.h5 is the output file. The data is stored in the specified group, which is root as can be seen by the following /. The extracted neurons will be put in a compound data type (a.k.a. a structured array) called neurons. Various results including the masks (under the field mask) will be stored here. The first index on neurons will indicate which neuron is selected. All remaining indices for different types follow C-order convention (excepting gaussian_cov, which is the covariance matrix and slightly differs for obvious reasons).

./nanshe_learner.py config.json input.h5/images output.h5/

Viewer

Intro

The first item contained inside the package is nanshe_viewer.py. This runs the segmentation algorithm using a specified set of parameters on a provided dataset and stores the result.

Sample Usage

To run the viewer, simply run the command below. The configuration parameters are contained in the config.json file. These specify how the layers are ordered, grouped, and named. Also, they specify where to fetch the data for the layers. Lastly, they specify where this data is located in an HDF5 file. The input data is stored within the HDF5 input.h5.

./nanshe_viewer.py config.json input.h5

Prebuilt Binaries

There are some prebuilt binaries that are available. These are built using BuildEM ( https://github.com/janelia-flyem/buildem ). The binaries are designed to be largely independent of your own system. These contain all the dependencies needed to run everything used by Nanshe except for the ones explicitly required and excluded from BuildEM.

To run any of the above utilities, there is a run_*.sh command that will do this. For example, here is how you would call the nanshe_learner.

./nanshe-package/run_nanshe_learner.sh "config.json" "input.h5/images" "output.h5/"

It is also possible to submit jobs to cluster using the Open Grid Engine, which have DRMAA installed. This is done using qrun_*.sh commands as seen below. In the case of nanshe_learner, a special configure file is required, which can be found in the example directory. If DRMAA is not available, it is possible to run multiprocessing as mentioned above.

./nanshe-package/qrun_nanshe_learner.sh "config.json" "input.h5/images" "output.h5/"

Conventions

Throughout nanshe, there are certain conventions that are followed. Often it is not necessary to know them. However, they do occasionally come up. For instances, Nanshe uses the C-order indexing convention and does not use multiple channels. This means that the spatial ordering will be transposed of what is usually expected and time will come first. For a given 2D array, a_2D, indexing would be a_2D[t,y,x] where t is time and x, y are the usual spatial coordinates. Similarly, for a 3D array, a_3D, the indexing would be a_3D[t,z,y,x]. Also, the JSON configuration files permit commenting. This is done by using __comment__ as a prefix within a string. For objects (a.k.a. dictionaries), if the key or value has a comment both will be removed on parsing. For arrays (a.k.a. lists), if a value is a comment, it will be dropped from the list. Additionally, some steps are optional, in which case they can be removed or commented from the configuration file. For certain fields, like those in “accepted_region_shape_constraints” and “accepted_neuron_shape_constraints”, it is permissible to have a “min” and/or “max”. If both are specified, then the range must be met. If only one is specified, then only an upper or lower bound exists.