Config Variables
Documenting the variables used in the options struct, ops. Each variable here is assigned using: ops.<variableName> (see StandardConfig_MOVEME.m).
Data is processed in batches, each NT samples long, and pre-processed via filtering, median subtraction (common average referencing) and data whitening across channels (which removes correlated noise, e.g. due to far away neurons). The whitened data is then scaled down through division by scaleproc.
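The pre-processing steps above can be sketched on synthetic data as follows. This is a minimal illustration, not KiloSort's own code: the filtering step is omitted, the whitening shown is a plain ZCA transform of the channel covariance, and the batch contents, the scaleproc value and the regulariser epsReg are assumptions made for the example.

```matlab
% Minimal sketch of per-batch pre-processing (filtering omitted).
rng(0);
NT = 1000;                 % samples per batch (illustrative value)
Nchan = 8;                 % number of channels (illustrative value)
scaleproc = 200;           % assumed value, for illustration only

% Synthetic batch: independent noise plus a component shared by all channels
batch = randn(NT, Nchan) + 5 * randn(NT, 1) * ones(1, Nchan);

% Common average referencing: subtract the across-channel median per sample
batch = bsxfun(@minus, batch, median(batch, 2));

% ZCA whitening across channels from the channel covariance
CC = (batch' * batch) / NT;            % Nchan x Nchan covariance
[E, D] = eig((CC + CC') / 2);          % symmetrise for numerical safety
epsReg = 1e-6;                         % hypothetical regulariser
Wrot = E * diag(1 ./ sqrt(diag(D) + epsReg)) * E';
white = batch * Wrot;                  % channels are now decorrelated

% Scale down, as described above
scaled = white / scaleproc;
```

After whitening, the channel covariance of white is close to the identity matrix, which is what "removes correlated noise" means in practice.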
KiloSort thresholds the scaled-down, pre-processed data with spkTh to identify initial spikes, extracting nt0 samples around the minimum of each spike. For each threshold crossing, +-loc_range(1) samples and +-loc_range(2) channels are checked to find the minimum. These spikes are clustered in the 7-dimensional PC space, wPCA, to identify potential templates. Nfilt templates are initially found; these templates are then run through your data in batches (think convolution). During a potential match, the degree of similarity between the current match and the template is compared with a threshold, Th. The match is compared to the mean of the waves: for lower lam values the current match is allowed to be scaled more to match the template; in other words, large lam values force waves to be closer to the mean of the current template's waveforms. A certain amount of noise/uncertainty is allowed: larger values of momentum(1) allow for more noise/variability in the waveforms for a given template.
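The threshold-and-local-minimum step can be illustrated with a short sketch. It is a plain CPU loop written for readability, not KiloSort's GPU implementation; the data, the two planted spikes and the values of spkTh and loc_range are assumptions chosen for the example.

```matlab
% Minimal sketch: find threshold crossings that are local minima
% within +-loc_range(1) samples and +-loc_range(2) channels.
rng(0);
NT = 2000; Nchan = 4;
spkTh = -6;                % assumed threshold, in units of the data s.d.
loc_range = [3 1];         % assumed: +-3 samples, +-1 channel

data = randn(NT, Nchan);   % synthetic whitened data
data(500, 2)  = -12;       % planted spike 1
data(1200, 3) = -10;       % planted spike 2

isSpike = false(NT, Nchan);
for t = 1:NT
    for c = 1:Nchan
        if data(t, c) < spkTh
            tr = max(1, t - loc_range(1)) : min(NT, t + loc_range(1));
            cr = max(1, c - loc_range(2)) : min(Nchan, c + loc_range(2));
            nb = data(tr, cr);
            % keep the crossing only if it is the minimum of its neighbourhood
            isSpike(t, c) = (data(t, c) == min(nb(:)));
        end
    end
end
[spikeTimes, spikeChans] = find(isSpike);
```

Widening loc_range merges nearby crossings into a single detection, while narrowing it lets neighbouring samples or channels register as separate spikes.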
After a set number of batches, 400, templates are re-evaluated. If the distance between clusters is less than mergeT, these clusters, and hence their templates, are averaged together; if the score of the split between clusters is greater than splitT, the cluster is marked for splitting. Splitting is performed after merging, and contains a hidden test for number of spikes to allow overwriting small clusters [?].
Parallel Matching Pursuit occurs during the final pass of the data. This approach looks for the best matching template and subtracts it from the waveform; the residual waveform is then compared with other templates in a similar fashion until the amount of explained variance falls below a threshold.
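The subtract-and-repeat idea can be sketched as a generic, sequential matching pursuit on a single waveform. This is not KiloSort's parallel GPU code: the templates are random orthonormal stand-ins, and minGain is a hypothetical stopping threshold on the fraction of variance explained per step.

```matlab
% Minimal sketch of greedy matching pursuit on one waveform.
rng(0);
nt0 = 61; nTemplates = 3;
T = orth(randn(nt0, nTemplates));      % hypothetical unit-norm templates (columns)

% Waveform built from template 1 and template 3 plus a little noise
w = 5 * T(:, 1) + 2 * T(:, 3) + 0.01 * randn(nt0, 1);

minGain = 0.05;                        % hypothetical stopping threshold
residual = w;
totalEnergy = sum(w .^ 2);
picked = [];                           % indices of matched templates
amps = [];                             % fitted amplitudes

for iter = 1:nTemplates
    proj = T' * residual;              % projection onto every template
    [~, k] = max(abs(proj));           % best matching template
    gain = proj(k)^2 / totalEnergy;    % fraction of variance it explains
    if gain < minGain
        break;                         % nothing left worth explaining
    end
    picked(end + 1) = k;
    amps(end + 1) = proj(k);
    residual = residual - proj(k) * T(:, k);   % subtract the match
end
```

Here the loop picks template 1, then template 3, and stops when the remaining residual is only noise.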
- Nfilt
- nNeighPC
- nNeigh
- whitening
- nSkipCov
- whiteningRange
- chanMap
- criterionNoiseChannels
- Nrank
- nfullpasses
- maxFR
- ntbuff
- scaleproc
- NT
- Th
- lam
- nannealpasses
- momentum
- shuffle_clusters
- mergeT
- splitT
- nt0
- nt0min
- initialize
- spkTh
- loc_range
- long_range
- maskMaxChannels
- crit
- nFiltMax
- dd
- wPCA
- fracse
- epu
- ForceMaxRAMforDat
Nfilt sets the target number of clusters to find. This means the output (before any auto-merging) will usually have this many clusters, but if shuffle_clusters = 1, you may find the final number of clusters deviates from this value.
Typically you want this variable to be 2-4 times the number of recording sites (i.e. channels, Nchan) you have. However, the lower the input impedance of your recording sites, the lower you can set this value. A low input impedance indicates that you will still receive large amplitude signals from relatively further away from the recording site; hence, if all your recording sites were low impedance, you might find that they essentially record the same signal. KiloSort will therefore not be able to cluster signals based on a waveform signature that spans multiple channels.
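As a rough illustration of the 2-4 times rule, Nfilt can be derived from the channel count. The channel count and the multiplier below are hypothetical, and rounding up to a multiple of 32 is an assumption taken from the comment next to Nfilt in the standard config file; check your own copy before relying on it.

```matlab
% Minimal sketch: choose Nfilt from the number of channels.
Nchan = 32;                 % hypothetical number of recording sites
clustersPerChan = 3;        % somewhere in the suggested 2-4 range

% Round up to a multiple of 32 (assumption, see the lead-in)
ops.Nfilt = 32 * ceil(clustersPerChan * Nchan / 32);
```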
If initialize='fromData', KiloSort uses the spkTh threshold to identify a set of sample waveforms; these are then projected onto the PCs and clustered using k-means. Each cluster is used to generate a 'template', and together these become the set of initialisation templates that KiloSort subsequently uses.
Reference: #122
KiloSort projects a candidate spike waveform onto each template to assess how much of the variance of that spike can be explained by the template. Th sets how much of the variance needs to be explained to consider the waveform part of the template.
In other words, the threshold controls how much variance is allowed around the template. A small value indicates that a large amount of variance is allowed, so this template's cluster can accumulate more waveforms that vary from the template.
There are 3 elements in Th. The first 2 elements are used to create a linspace() between anneal 1 and the anneal final (nannealpasses*NBatch), e.g. 1 and 5 for 10 anneals: linspace(1,5,10). This effectively creates an increasingly harder threshold to cross for each anneal pass. The final element is used during the final template matching pass, i.e. the pass that goes through each batch sequentially and performs parallel matching.
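The annealing schedule described above can be written out directly. The values of Th, nannealpasses and NBatch below are illustrative assumptions, and the variable names ThSchedule and ThFinal are hypothetical, introduced only for this sketch.

```matlab
% Minimal sketch of the threshold schedule built from the three Th elements.
ops.Th = [4 10 10];          % illustrative values
ops.nannealpasses = 4;       % illustrative value
NBatch = 50;                 % hypothetical number of batches

nSteps = ops.nannealpasses * NBatch;

% The first two elements span the annealing passes
ThSchedule = linspace(ops.Th(1), ops.Th(2), nSteps);

% The third element is used for the final template matching pass
ThFinal = ops.Th(3);
```

With these values the threshold rises in equal steps from 4 on the first annealing batch to 10 on the last one.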
Relevant References: #122, #146 (isolated_peaks implementation notes)
A large value of lam means that if the template needs to be scaled to match the candidate waveform, there is a large penalty associated with that. The penalty applies to the similarity value between the waveform and the template, hence a large penalty will cause a reduction in the similarity value. The threshold for similarity is set by Th.
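The effect of lam can be illustrated with a toy penalised least-squares fit. This is an assumed model chosen to show the behaviour, not KiloSort's actual cost function: the scale factor a minimises ||w - a*t||^2 + lam*(a - mu)^2, where t is a template, w a candidate waveform and mu the template's mean amplitude. All values are hypothetical.

```matlab
% Toy model: how a scaling penalty pulls the fitted amplitude towards the mean.
t  = [0 -1 -4 -1 0]';        % hypothetical template
mu = 1;                      % mean amplitude of the template's waveforms
w  = 1.5 * t;                % candidate waveform, 1.5x larger than the template

lams = [0.1 10 1000];        % small, medium and large penalty

% Closed-form minimiser of ||w - a*t||^2 + lam*(a - mu)^2 for each lam
a = (t' * w + lams * mu) ./ (t' * t + lams);

% Residual left unexplained once the amplitude has been constrained
residualEnergy = arrayfun(@(ai) sum((w - ai * t) .^ 2), a);
```

As lam grows, the fitted amplitude moves from about 1.5 towards mu = 1 and the unexplained residual increases, which is the sense in which a large lam reduces the similarity of a waveform that needs rescaling.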
nt0 sets the number of samples to use for templates and hence for the extracted waveform. The peak of the template / extracted waveform is located at sample nt0min. nt0 should always be an odd number. It also cannot exceed 80 (see #30), as there is a hardcoded maximum in the GPU code.
Excellent examples here: #177, #171
nt0min should be set to equal: (spike peak location used for the PCs - 1)
It informs the algorithm where the peak is located in your PCs. The -1 is required because MATLAB uses 1-indexing, e.g. a waveform centre at 21 is 1+20, instead of 0+21.
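The two settings can be checked together with a short sketch. The mean waveform below is synthetic, with its trough placed at sample 21 to match the example above; in practice you would use the mean of the waveforms that produced your PCs.

```matlab
% Minimal sketch: derive nt0min from the peak sample and sanity-check nt0.
ops.nt0 = 61;                                   % illustrative template length

% Hypothetical mean spike waveform with its trough at sample 21 (1-indexed)
t = (1:ops.nt0)';
meanWaveform = -exp(-((t - 21) .^ 2) / (2 * 3^2));

[~, peakSample] = min(meanWaveform);            % 1-indexed peak location
ops.nt0min = peakSample - 1;                    % the -1 described above

% Constraints stated in this section
assert(mod(ops.nt0, 2) == 1, 'nt0 should be an odd number');
assert(ops.nt0 <= 80, 'nt0 exceeds the hardcoded maximum in the GPU code');
```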
wPCA should contain the first 7 principal components from some sample data.
% waveFormArray is:
% rows x columns
% spikes x samples of the spike waveform
Wi = pca(waveFormArray); % each column of Wi is one principal component
imagesc(Wi) % visualise the output
wPCA = Wi(:,1:7); % KiloSort only uses the first 7
To compute the xth PC value for a waveform you multiply the xth column of wPCA with a spike waveform, e.g. a row of waveFormArray.
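The projection described above is a single vector product. In this sketch the waveforms are random and wPCA is a random orthonormal stand-in, so that the example runs without any real data; allFeatures is a hypothetical name for the full feature matrix.

```matlab
% Minimal sketch: project waveforms onto the columns of wPCA.
rng(0);
nt0 = 61;
wPCA = orth(randn(nt0, 7));          % stand-in for real PCs (nt0 x 7)
waveFormArray = randn(100, nt0);     % spikes x samples (synthetic)

x = 3;                               % which PC to evaluate
% xth PC value of the first waveform: row vector times the xth column
pcValue = waveFormArray(1, :) * wPCA(:, x);

% All 7 PC values for every waveform at once (spikes x 7)
allFeatures = waveFormArray * wPCA;
```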
For multi-channel data, use only the waveforms from the channels with the largest amplitude.