Registration, cell detection, spike extraction and visualization GUI.
Algorithmic details in http://biorxiv.org/content/early/2016/06/30/061507.
This code was written by Marius Pachitariu, Carsen Stringer and members of the cortexlab (Kenneth Harris and Matteo Carandini). It is provided here with no warranty. For support, please open an issue directly on github.
For the python version of the code, go here. We recommend using the python code - the documentation is better (see wiki), and the graphical interface has more functionality. We also are going to discontinue updating the matlab version.
An example dataset (with master_file and make_db) is provided here.
This is a complete, automated pipeline for processing two-photon Calcium imaging recordings. It is simple, fast and yields a large set of active ROIs. A GUI further provides point-and-click capabilities for refining the results in minutes. The pipeline includes the following steps
-
X-Y subpixel registration: using a modification of the phase correlation algorithm and subpixel translation in the FFT domain. If a GPU is available, this completes in 20 minutes per 1h of recordings at 30Hz and 512x512 resolution.
-
SVD decomposition: this provides the input to cell detection and accelerates the algorithm.
-
Cell detection: using clustering methods in a low-dimensional space. The clustering algorithm provides a positive mask for each ROI identified, and allows for overlaps between masks. There is also an option to perform automated red cell detection.
-
Signal extraction: by default, all overlapping pixels are discarded when computing the signal inside each ROI, to avoid using "demixing" approaches, which can be biased. The neuropil signal is also computed independently for each ROI, as a weighted pixel average, pooling from a large area around each ROI, but excluding all pixels assigned to ROIs during cell detection.
-
Spike deconvolution: cell and neuropil traces are further processed to obtain an estimate of spike times and spike "amplitudes". The amplitudes are proportional to the number of spikes in a burst/bin. Even under low SNR conditions, where transients might be hard to identify, the deconvolution is still useful for temporally-localizing responses. The cell traces are baselined using the minimum of the (overly) smoothed trace.
-
Neuropil subtraction: coefficient is estimated iteratively together with spike deconvolution to minimize the residual of spike deconvolution. The user is encouraged to also try varying this coefficient, to make sure that any scientific results do not depend crucially on it.
-
Automatic and manual curation: the output of the cell detection algorithm can be visualized and further refined using the included GUI. The GUI is designed to make cell sorting a fun and enjoyable experience. It also includes an automatic classifier that gradually refines itself based on the manual labelling provided by the user. This allows the automated classifier to adapt for different types of data, acquired under different conditions. (README FOR GUI AT https://github.com/cortex-lab/Suite2P/blob/master/gui2P/README.md)
The toolbox runs in Matlab and currently only supports tiff file inputs. To begin using the toolbox, you will need to make local copies (in a separate folder) of two included files: master_file and make_db. It is important that you make local copies of these files, otherwise updating the repository will overwrite them (and you can lose your files). The make_db file assembles a database of experiments that you would like to be processed in batch. It also adds per-session specific information that the algorithm requires such as the number of imaged planes and channels. The master_file sets general processing options that are applied to all sessions included in make_db, UNLESS the option is over-ridden in the make_db file.
Look into make_db_example for more detailed examples.
The following is a typical database entry in the local make_db file, which can be modelled after make_db_example. The folder structure assumed is RootStorage/mouse_name/date/expts(k) for all entries in expts(k).
i = i+1;
db(i).mouse_name = 'M150329_MP009';
db(i).date = '2015-04-27';
db(i).expts = [5 6]; % which experiments to process together
Other (hidden) options are described in make_db_example.m, and at the top of run_pipeline.m (set to reasonable defaults).
Change paths in master_file to the paths to your local toolbox and to your data. Then run this function. The master_file creates the ops0 variable and the db0 variable, and runs the main pipeline:
run_pipeline(db, ops);
For spike deconvolution, you need to download the OASIS github (https://github.com/zhoupc/OASIS_matlab) and add the path to this folder on your computer to the top of your master_file
addpath(genpath('pathtoOASIS')))
To run spike deconvolution (after running the pipeline), run
add_deconvolution(ops0, db);
For L0 spike deconvolution, you need to run mex -largeArrayDims SpikeDetection/deconvL0.c (or .cpp under Linux/Mac). If you're on Windows, you will need to install Visual Studio Community in order to mex files in matlab. To choose this deconvolution method, set
ops0.deconvType = 'L0';
See this paper comparing spike deconvolution methods for more information on choosing deconvolution methods/parameters: http://www.biorxiv.org/content/early/2017/06/27/156786
You can also run spike deconvolution without running the entire pipeline by calling wrapperDECONV(ops,F,N), where F and N are the fluorescence and neuropil traces respectively, while ops specifies some deconvolution parameters like sampling rate and sensory decay timescale. See the function help for more information.
Below we describe the outputs of the pipeline first, and then describe the options for setting it up, and customizing it. Importantly, almost all options have pre-specified defaults. Any options specified in master_file
in ops0 overrides these defaults. Furthermore, any option specified in the make_db
file (experiment specific) overrides both the defaults and the options set in master_file. This allows for flexibility in processing different experiments with different options. The only critical option that you'll need to set is ops0.diameter, or db(N).diameter. This gives the algorithm the scale of the recording, and the size of ROIs you are trying to extract. We recommend as a first run to try the pipeline after setting the diameter option. Depending on the results, you can come back and try changing some of the other options.
Note: some of the options are not specified in either the example master_file
or the example make_db file. These are usually more specialized features.
The output is a struct called dat which is saved into a mat file in ResultsSavePath using the same subfolder structure, under a name formatted like F_M150329_MP009_2015-04-29_plane1
. It contains all the information collected throughout the processing, and contains the ROI and neuropil traces in Fcell and FcellNeu, and whether each ROI j is a cell or not in stat(j).iscell. stat(j) contains information about each ROI j and can be used to recover the corresponding pixels for each ROI in stat.ipix. The centroid of the ROI is specified in stat as well. Here is a summary of where the important results are:
cell traces are in Fcell
neuropil traces are in FcellNeu
deconvolved traces are in sp
Each cell of the above structures is a different experiment from db.expts.
manual, GUI overwritten "iscell" labels are in stat.iscell
stat(icell) contains all other information:
- iscell: automated label, based on anatomy
- neuropilCoefficient: multiplicative coefficient on the neuropil signal
- xpix, ypix: x and y indices of pixels belonging to this max. These index into the valid part of the image (defined by ops.yrange, ops.xrange).
- ipix: linearized indices ((ypix, xpix) --> ypix + (xpix-1) x Ly) of pixels belonging to this mask.
- isoverlap: whether the pixels overlap with other masks.
- lam, lambda: mask coefficients for the corresponding pixels. lambda is the same as lam, but normalized to 1.
- med: median of y and x pixels in the ROI (indices for the valid part of the image, defined by ops.yrange, ops.xrange).
- blockstarts: the cumulative number of frames per block. Clould be useful for concatenating experiments correctly (some planes will have fewer frames/block).
- footprint, mrs, mrs0, cmpct, aspec_ratio, ellipse, mimgProj, skew, std, maxMinusMed, top5pcMinusMed: these are used by the automated classifier to label an ROI as cell or not. see section IX for details.
There are fields for red cell detection too (see the section on Identifying red cells)
The settings for the registration and the mean image are also output in the ops
structure:
- mimg: target mean image computed at the beginning of registration to which all frames are aligned
- mimg1: mean image computed from all the frames across all experiments
- DS: offsets computed in XY
- RootStorage --- the root location where the raw tiff files are stored.
- RegFileRoot --- location on local disk where to keep the registered movies in binary format. This will be loaded several times so it should ideally be an SSD drive. (to view registered movies use script "view_registered_binaries.m" in main folder)
- ResultsSavePath --- where to save the final results.
- DeleteBin --- deletes the binary file created to store the registered movies
- RegFileTiffLocation --- where to save registered tiffs (if empty, does not save) if you want to save red tiffs, then specify ops.RegFileTiffLocation and set ops.REDbinary = 1
ResultsSavePath is completed with separate subfolders per animal and experiment, specified in the make_db file. Your data should be stored under a file tree of the form
\RootStorage\mouse_name\session\block*.tif(f)
If you don't want to use this folder structure, see the make_db_example file for alternatives. The make_db_example file also shows how to group together tiffs from different experiments (i.e. different subfolders within this folder structure).
The output is a struct called dat which is saved into a mat file in ResultsSavePath using the same subfolder structure, under a name formatted like F_M150329_MP009_2015-04-29_plane1. It contains all the information collected throughout the processing, and contains the fluorescence traces in dat.Fcell and whether a given ROI is a cell or not in dat.stat(N).iscell. dat.stat contains information about each ROI and can be used to recover the corresponding pixels for each ROI N in dat.stat(N).ipix. The centroid of the ROI N is specified in dat.stat(N) as well.
- showTargetRegistration --- whether to show an image of the target frame immediately after it is computed.
- PhaseCorrelation --- whether to use phase correlation (default is phase-correlation, if 0 then cross-correlation used).
- SubPixel --- accuracy level of subpixel registration (default is 10 = 0.1 pixel accuracy)
- kriging --- compute shifts using kernel regression with a gaussian kernel of width 1 onto a grid of 1/SubPixel (default is kriging = 1)
- maxregshift --- maximum amount of movement allowed in FOV (default is 10% of max(y pixels, x pixels))
- maskSlope --- slope on the taper mask applied to image (default is 2 pixel exponential decay)
- NimgFirstRegistration --- number of randomly sampled images to compute target image from
- NiterPrealign --- number of iterations for the target computation (iterative re-alignment of subset of frames)
- smooth_time_space --- convolves raw movie with a Gaussian of specified size in specified dimensions; [t]: convolve in time with gauss. of std t, [t s]: convolve in time and space, [t x y]: convolve in time, and in space with an ellipse rather than circle
- nimgbegend --- number of frames over which to average at beginning and end of experiment (if worried about drift) (default is 0 frames)
Block Registration (for high zoom/npixels)
- nonrigid --- set to 1 for non-rigid registration (or set numBlocks > 1)
- numBlocks --- 1x2 array denoting the number of blocks to divide image in y and x (default is [8 1])
- blockFrac --- percent of image to use per block (default is 1/(numBlocks-1))
- quadBlocks --- interpolate block shifts to single line shifts (6 blocks -> 512 lines) by fitting a quadratic function (default is 1)
- smoothBlocks --- if quadBlocks = 0, then smoothBlocks is the standard deviation of the gaussian smoothing kernel
Bidirectional scanning issues (frilly cells - default is to correct)
- dobidi --- compute bidirectional phase offset from images (default 1)
- BiDiPhase --- value of bidirectional phase offset to use for computation (will not compute bidirectional phase offset)
Recordings with red channel
- AlignToRedChannel --- perform registration to red channel (non-functional channel) rather than green channel (this assumes that you have a red channel for all recordings in db.expts)
- redMeanImg --- compute mean image of red channel from experiments with red and green channel (you do not need to have a red channel for all db.expts, computes only from db.expred -- so make sure this isn't empty!!)
- REDbinary --- compute a binary file of the red channel (like the green channel binary) from db.expred
output ops.mimgRED will contain mean image (if AlignToRedChannel, redMeanImg or REDbinary = 1)
Splitting large tiffs for registration if running out of memory (e.g. 2048 x 2048 pixel images) currently only works with rigid registration, where each section of FOV is registered separately
- splitFOV --- 1x2 array specifying chunk size in y and x (default is [1 1])
- sig --- spatial smoothing constant: smooths out the SVDs spatially. Indirectly forces ROIs to be more round.
- nSVDforROI --- how many SVD components to keep for clustering. Usually ~ the number of expected cells.
- ShowCellMap --- whether to show the clustering results as an image every 10 iterations of the clustering
- getROIs --- whether to run the ROI detection algorithm after registration
- stopSourcery --- stop clustering if # of ROIs extracted < (ROIs extracted on iteration 1) x (stopSourcery) (default is 1/10)
- maxIterRoiDetection --- maximum number of clustering iterations (default is 100)
- refine --- whether or not to refine ROIs - suite2p smooths the PCs to find masks, refinement uses unsmoothed PCs to recompute masks from smoothed estimates
- NavgFramesSVD --- for SVD, data has to be temporally binned. This number specifies the final number of points to be obtained after binning. In other words, datasets with many timepoints are binned in higher windows while small datasets are binned less.
- getSVDcomps --- whether to obtain and save to disk SVD components of the registered movies. Useful for pixel-level analysis and for checking the quality of the registration (residual motion will show up as SVD components). This is a separate SVD decomposition from that done for cell clustering (does not remove a running baseline of each pixel).
- nSVDforROI --- how many SVD components to keep.
- getSVDcomps --- whether to obtain and save to disk SVD components of the registered movies. Useful for pixel-level analysis and for checking the quality of the registration (residual motion will show up as SVD components). This is a separate SVD decomposition from that done for cell clustering (does not remove a running baseline of each pixel).
- nSVD --- if getSVDcomps=1, then nSVD specifies how many components to save
- signalExtraction --- how should the fluorescence be extracted? The 'raw' option restricts cells to be non-overlapping, 'regression' option allows cell overlaps. The neuropil model is a set of spatial basis functions that tile the FOV. The 'surround' option means that the cell's activity is the weighted sum of the detected pixels (weighted by lambda). The neuropil is computed as the sum of activity of surrounding pixels (excluding other cells in the computation).
- ratioNeuropil --- used for both spatial basis functions and surround neuropil - the spatial extent of the neuropil as a factor times the radius of the cells (ops.ratioNeuropil * cell radius = neuropil radius)
if using surround neuropil (signalExtraction = 'surround')
- innerNeuropil --- padding in pixels around cell to exclude from neuropil
- outerNeuropil --- radius of neuropil surround (set to Inf to use ops.ratioNeuropil)
- minNeuropilPixels --- minimum number of pixels in neuropil surround (radius will expand until this number is reached)
- imageRate --- imaging rate per plane.
- sensorTau --- decay timescale (in seconds).
- maxNeurop --- neuropil contamination coef has to be less than this (sometimes good to impose a ceiling at 1, i.e. for interneurons)
- deconvType --- which type of deconvolution to use (either 'L0' or 'OASIS')
use function identify_redcells_sourcery(db, ops0)
to identify cells with red
Outputs are appended to stat in F.mat file
- redratio = red pixels inside / red pixels outside
- redcell = redratio > mean(redratio) + redthres x std(redratio)
- notred = redratio < mean(redratio) + redmax x std(redratio)
Options are
- redthres --- higher thres means fewer red cells (default 1.35)
- redmax --- the higher the max the more NON-red cells (default 1)
The Suite2p classifier uses a number of features of each ROI to assign cell labels to ROIs. The classifier uses a naive Bayes approach for each feature, and models the distribution of each feature with a non-parametric, adaptively binned empirical distribution. The classifier is initialized with some standard distributions for these features, but is updated continuously with new data samples as the user refines the output manually in the GUI.
The features used are the following (can see values for each ROI by selecting it in the GUI).
- std = standard deviation of the cell trace, normalized to the size of the neuropil trace
- skew = skewness of the neuropil-subtracted cell trace
- pct = mean distance of pixels from ROI center, normalized to the same measuree for a perfect disk
- footprint = spatial extent of correlation between ROI trace and nearby pixels
- mimgProjAbs = whether this ROI shape is correlated to the shape on the mean image
- aspect_ratio = of an ellipse fit to the ROI