Rationale
We have an instance of the BWM dataset on DANDI here: https://dandiarchive.org/dandiset/000409?search=409&pos=1
And the conversion was done by Catalyst Neuro with this script: https://github.com/catalystneuro/IBL-to-nwb/tree/main/ibl_to_nwb
We have the following requirements:
- sessions from the late 2023 release of the BWM are uploaded
- new spike sorting revision is uploaded
- Allen coordinates and regions are accurate and match the ones in ONE
- the raw electrophysiology is synchronized to the behaviour in NWB format (the spiking data should already be)
- accessing the behaviour and spikes only takes a reasonable amount of time (on the order of tens of seconds) and does not require an inordinate amount of disk space (i.e. we do not want to be streaming a few trials from 120 GB blank files)
- the Cosyne tutorial works with the newer data
As the BWM paper response to reviewers is due at the end of September, we want to revise this Dandiset by then.
How
Loop over sessions, per session (see the sketch below):
1. fetch everything
2. write into .nwb
3. store
We will run this from the San Diego supercomputer SDSC or from AWS.
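A minimal sketch of that loop, assuming the ONE API for fetching and a per-session converter; `convert_session`, `build_nwb` and the search query are placeholders, not the actual ibl_to_nwb entry points:

```python
# Hypothetical per-session conversion loop; convert_session, build_nwb and the
# search query are placeholders, not the real ibl_to_nwb API.
from pathlib import Path
from one.api import ONE


def convert_session(one, eid, out_dir):
    """1. fetch everything, 2. write into .nwb, 3. store."""
    # 1. fetch everything for this session via the ONE API
    trials = one.load_object(eid, 'trials')  # groomed behaviour
    # ... spikes, clusters, brain regions, raw ephys/video would be loaded here ...

    # 2. write into .nwb (pynwb / the ibl_to_nwb converters would be called here)
    nwb_path = Path(out_dir) / f"{eid}.nwb"
    # build_nwb(trials, ..., nwb_path)  # placeholder for the actual converter call

    # 3. store: stage the file for upload to DANDI
    return nwb_path


if __name__ == "__main__":
    one = ONE()  # connects to the public IBL database
    out_dir = "/scratch/bwm_nwb"  # staging area on SDSC or AWS
    for eid in one.search(project='brainwide'):  # placeholder query for BWM sessions
        convert_session(one, eid, out_dir)
```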
Discussions about splitting files and metadata and decisions to be made
Briefly, what I was discussing with Ryan was centered around how to logically group data and metadata, and how the NWB format has some flexibility in this. I was arguing that, for me, the most intuitive structure would be one with three levels based on the needs of the person accessing the data: "acquisition related", "raw", and "processed".
The person that wants to analyze the data should not need to worry about anything acquisition related or not preprocessed, and should have easy access to spikes, aligned behavior, extracted info etc.
The person that wants to reanalyze the raw data, for example with a newer algorithm or so, needs to have access to one level deeper, but does not necessarily care about acquisition details such as amplifier settings and hardware information.
While the person that ultimately wants to replicate the experiment needs to have knowledge about the devices and all the contextual metadata that was part of creating the experiment.
I guess alternative logical groupings could be device-centered, where for example all raw data comes from devices (which carry all their metadata), and all processed data then sits in the hierarchy under the raw data, etc.
The data I was working with (the draft from DANDI) mostly follows the analysis-centric approach I was describing above, but sometimes not entirely. I was wondering what you think of this, what the discussion within the IBL regarding this was, and what I should aim for in the conversion process. Basically, where do I start? :)
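For illustration only, the three-level grouping described above could map onto standard pynwb containers roughly as follows (a sketch under my assumptions, not an agreed-upon layout; the names and dummy data are invented):

```python
# Sketch of where the three levels could live in an NWB file (illustrative only).
from datetime import datetime, timezone
import numpy as np
from pynwb import NWBFile, TimeSeries

nwbfile = NWBFile(
    session_description="example session",
    identifier="example-id",
    session_start_time=datetime.now(timezone.utc),
)

# "acquisition related": devices and contextual metadata needed to replicate the experiment
probe = nwbfile.create_device(name="NeuropixelsProbe", description="example device metadata")

# "raw": data as it came off the devices, for re-analysis with newer algorithms
raw = TimeSeries(name="raw_ephys", data=np.zeros((10, 2)), unit="V", rate=30000.0)
nwbfile.add_acquisition(raw)

# "processed": what the analyst needs (sorted spikes, aligned behaviour, extracted info)
behavior = nwbfile.create_processing_module(name="behavior", description="aligned behaviour")
behavior.add(TimeSeries(name="wheel_position", data=np.zeros(10), unit="rad", rate=1000.0))
nwbfile.add_unit(spike_times=[0.1, 0.2, 0.3])  # one row per sorted unit
```

In this view the analyst-facing content lives in processing modules and the units table, while raw data stays under acquisition with a link back to its device.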
Converting our data to DANDI is really an outreach effort, and the purpose is to reach users. As such, the user-centric way to organize the data you propose (analysts, methods engineers) makes sense.
In practice there are two main difficulties: data size and metadata complexity.
For data size, the post-processed data represents 4% of our current data footprint, and this is what most neuroscientists are interested in. The remaining method weirdos (of which I am a part) will have a much bigger data footprint, looking at raw videos and recordings. Here I suggest splitting the NWB files in three to address this:
- one neuroscientist package with groomed spikes, behaviour and brain regions
- raw electrophysiology data (AP + LF)
- raw video data
I am open to splitting the raw data more or less finely: we could have AP, LF and video separately, for instance; it is all a matter of whether it is easy to access only part of the files. For example, it would be frustrating to have to download all of the AP data only to look at the LF (which represents 1/13 of the raw ephys data size, and which many people will be interested in).
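To make this concrete, a per-session split along those lines could look something like the sketch below (the file names are purely illustrative, not an agreed naming convention; a finer split would put the AP and LF bands in separate files):

```
sub-XX/sub-XX_ses-YYYY_desc-processed.nwb    # groomed spikes, behaviour, brain regions (~4% of footprint)
sub-XX/sub-XX_ses-YYYY_desc-raw-ecephys.nwb  # raw AP + LF bands
sub-XX/sub-XX_ses-YYYY_desc-raw-video.nwb    # raw camera data
```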
For the metadata, it is small but very complex and time consuming. Here I would stick with what Catalyst did and make sure we link the protocols and documentation we have written and published.