Review preprocessing. #135
base: reviewed_code
Conversation
paths to rawdata. The pp_steps attribute is set on
this class during execution of this function.

pp_steps: The name of a valid preprocessing .yaml file (without the yaml extension).
Actually, this can also take a path. In general this pattern is not generalisable, because when users pip install spikewrap, the spikewrap configs will be hidden wherever pip installed spikewrap. Better to set up a config path at the start, possibly just a function `spikewrap.set_config_path()` that will store the path to use for configs in `/user/.spikewrap`.
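A minimal sketch of how such a helper could work (neither `set_config_path` nor the storage file exist yet; all names and the storage location here are assumptions for illustration):

```python
from pathlib import Path

# Hypothetical sketch: function names and the storage location are assumptions.
_SPIKEWRAP_USER_DIR = Path.home() / ".spikewrap"
_CONFIG_PATH_FILE = _SPIKEWRAP_USER_DIR / "config_path.txt"

def set_config_path(config_path):
    """Store the directory spikewrap should search for config .yaml files."""
    _SPIKEWRAP_USER_DIR.mkdir(exist_ok=True)
    _CONFIG_PATH_FILE.write_text(str(Path(config_path).resolve()))

def get_config_path():
    """Return the stored config directory, or None if one was never set."""
    if _CONFIG_PATH_FILE.exists():
        return Path(_CONFIG_PATH_FILE.read_text())
    return None
```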
@@ -44,6 +49,19 @@ def __post_init__(self) -> None:
        self.update_two_layer_dict(self, ses_name, run_name, {"0-raw": None})
        self.update_two_layer_dict(self.sync, ses_name, run_name, None)

    def set_pp_steps(self, pp_steps: Dict) -> None:
Maybe a docstring is not even required for this function.
available_files = glob.glob((config_dir / "*.yaml").as_posix())
available_files = [Path(path_).stem for path_ in available_files]

if name not in available_files:  # then assume it is a full path
Change this, see comment below on setting a config directory.
recording object will be stored. The name of the dict entry will be
a concatenation of all preprocessing steps that were performed.

e.g. "0-raw", "0-raw_1-phase_shift_2-bandpass_filter"
e.g. "0-raw", "0-raw_1-phase_shift_2-bandpass_filter" | |
e.g. "0-raw", "2-raw-phase_shift-bandpass_filter" |
This PR contains the code for running preprocessing on the loaded spikeinterface recording objects.

On data load, the sessions / runs for a single subject are loaded into the `data_class.preprocessing.PreprocessingData` `UserDict` in a nested dictionary. In the preprocessing step, spikeinterface functions as defined in the config yaml are then applied to the data and stored on the object in a specific numbered format, e.g. if phase_shift and bandpass filter are applied, a "0-raw_1-phase_shift_2-bandpass_filter" entry is added. A rough before / after sketch is given below.
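A sketch of the nested structure (session / run names and the placeholder values are illustrative, inferred from the diff and key format above rather than copied from the codebase):

```python
# Illustrative sketch only: names and placeholders are made up.
# On data load, each session / run holds only the raw recording:
preprocess_data = {
    "ses-001": {
        "run-001": {"0-raw": "<raw spikeinterface recording>"},
    },
}

# After preprocessing with phase_shift and bandpass_filter, the
# preprocessed recording is stored under a numbered, concatenated key:
preprocess_data = {
    "ses-001": {
        "run-001": {
            "0-raw": "<raw spikeinterface recording>",
            "0-raw_1-phase_shift_2-bandpass_filter": "<preprocessed recording>",
        },
    },
}
```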
This is handled in the function `_fill_run_data_with_preprocessed_recording`. After this, the data is saved to disk by calling a method on the preprocessing class itself.
In `pipeline.preprocess.py`, the functions in the section "Helpers for preprocessing steps dictionary" check and run the actual preprocessing in spikeinterface.

In utils, there is a set of functions around handling the `PreprocessingData` userdict. Most of these are called later during sorting / postprocessing, but we may want to move some things around, so let me know if anything looks out of place; maybe some of these functions should be on `base.py`.
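For reference, a minimal reconstruction of the two-layer (session → run) update pattern visible in the diff above; the real `PreprocessingData` holds more state, so this is not the actual implementation:

```python
from collections import UserDict

class PreprocessingData(UserDict):
    # Sketch only: the real class also tracks sync data, pp_steps, etc.
    def update_two_layer_dict(self, dict_, first_key, second_key, value):
        """Set dict_[first_key][second_key] = value, creating the first
        layer if it does not exist yet."""
        if first_key not in dict_:
            dict_[first_key] = {}
        dict_[first_key][second_key] = value

data = PreprocessingData()
# Mirrors the call in the diff: the instance itself is passed as the target dict.
data.update_two_layer_dict(data, "ses-001", "run-001", {"0-raw": None})
```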
Running the script

The script `example_preprocessing.py` is added in `examples`. The path to the data used in the example preprocessing is on `/ceph` and should be accessible from your user account (let me know if not). To test, cloning the repository to `ceph`, installing spikewrap and running `example_script` should run the pipeline (assuming the data is accessible).
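For context, a guess at the shape of the example script; the import path and arguments below are assumptions (only `run_full_pipeline` is named elsewhere in this PR), so refer to `examples/example_preprocessing.py` for the real version:

```python
# Hypothetical sketch of examples/example_preprocessing.py.
# The import path and argument names are assumptions, not the real API.
from spikewrap.pipeline.full_pipeline import run_full_pipeline

run_full_pipeline(
    base_path="/ceph/...",   # path to the rawdata on /ceph (elided here)
    sub_name="sub-001",      # made-up subject name
    pp_steps="test",         # name of a preprocessing .yaml config
)
```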
Some General Notes / Questions

Currently the `preprocess_data` first contains raw data, then is updated in-place with the preprocessed data. So it is in fact unpreprocessed at first, and only afterwards filled with preprocessing. Currently it is just called `preprocess_data` at all stages, but maybe the naming should reflect the change. I'm not sure how confusing this currently is.

`PreprocessingData` is a `UserDict` (i.e. a dictionary with user-defined functions). In the docstrings I can't decide whether to refer to it as a dict, class, object, or userdict 🤷‍♂️

These two points are not specifically related to this PR:
I have been avoiding `locals()` because it is not robust, but this leads to very long function signatures, which also duplicate arguments, which is buggy in its own right (e.g. you may forget to update all instances when adding a new argument). Based on user feedback it is useful to print all passed arguments, in which case `locals()` also comes in handy for this. However, it might be worth just making `run_full_pipeline` a class and storing all its kwargs as attributes. I was thinking to avoid a class entry point if possible, but maybe now is the time for it, e.g.:
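A rough sketch of the idea (illustrative only; `run_full_pipeline` is currently a function, and the attribute and method names here are made up):

```python
class FullPipeline:
    """Sketch of a class-based entry point: all kwargs become attributes,
    so they can be printed or inspected without passing locals() around."""

    def __init__(self, base_path, sub_name, pp_steps, **kwargs):
        self.base_path = base_path
        self.sub_name = sub_name
        self.pp_steps = pp_steps
        self.kwargs = kwargs  # any remaining options

    def print_passed_arguments(self):
        # Replaces the locals()-based printing of all passed arguments.
        print(vars(self))

    def run(self):
        ...  # load data, preprocess, save (as run_full_pipeline does now)
```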
… `Dict`, so kept having to refer to the config anyway. Also, it is likely there will only be a few standard settings that people will set up once and reuse, e.g. `hpc="gpu-fast"`, `hpc="cpu-large-mem"`.

Thanks, and please let me know if you have any questions!