Discussion on customisation #57
Some requests I've had (and some thoughts of my own) include:
I'm not sure if this is a NeuroBlueprint issue. Perhaps it's a datashuttle one? NeuroBlueprint could be a particular specification. datashuttle will support custom versions for folder creation/validation/transfer, but we won't support them in analysis software etc.
Hmm, yes, I think it's a bit of both. For #73 this is definitely a datashuttle thing; once we have a metadata standard implemented, I think this would be a natural extension to metadata validation.

For the datatypes, maybe this is a discussion for NeuroBlueprint first (maybe a new issue), as it is a tricky problem, but it should be possible to support existing needs with a combination of datatypes and BIDS-like suffixes. If we can handle these concerns within the spec it would be great; a BIDS-like solution would be to have

In general, I would rather put forward a workable standard solution based on BIDS and ask people to try it. If they try it for a project and after 2 weeks say "this is making my life more difficult and is not workable", then that's a problem. Or, if we have problems with adoption because people are immediately put off, despite us offering tooling for data management and analysis, then we should think about customisation. But for me, customisations are a last resort if there are no other workable alternatives and our carrots are not working, as customisation immediately dilutes standardisation. For example, if we allow customised datatypes, within a year we'd have "2p", "2photon", "2-PH0T0n", "tP" knocking around, and we've not really solved the problem.
I agree with all the above. I think the one thing we will need to do somehow is validate the existence of specific files within the existing NeuroBlueprint structure (e.g. metadata).
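A minimal sketch of what such a file-existence check could look like. The required file name `metadata.yaml` is an illustrative assumption, not part of the NeuroBlueprint spec:

```python
from pathlib import Path

# Hypothetical check: verify that a session folder contains the
# required metadata files. The file name "metadata.yaml" is an
# illustrative assumption, not part of the NeuroBlueprint spec.
REQUIRED_FILES = ["metadata.yaml"]


def missing_required_files(session_path: Path) -> list[str]:
    """Return the required files that are absent from a session folder."""
    return [name for name in REQUIRED_FILES if not (session_path / name).exists()]
```

A validator could then report the returned names to the user rather than failing outright.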
The more I think about the datatype issue, the more I wonder if the best approach is just to have a different datatype for every conceivable technique out there. This is a large divergence from BIDS, but it would be easy to write converters. The suffix approach makes a lot of sense in MRI, where you have lots of different sequences that are slightly different but produce basically the same kind of data. In systems neuroscience, lots of quite different techniques come under the same datatype. With the current setup we have something like:
So all of the datatype information needs to be promoted to the session name to work around the meaningless anat / funcimg datatype. I think the suffix approach is nice in theory, but in practice I don't think many researchers would want to mix datatypes in the same folder, where tracking them is entirely dependent on adding a suffix to all filenames. All of these problems would be solved by having more granular datatype names. The only downside I can think of is that sometimes these might be weird (e.g.
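To illustrate the granular-datatype idea, a session could keep each technique in its own datatype folder, with nothing promoted into the session name. The specific datatype names here (`2p`, `fusi`) are illustrative, not part of the current spec:

```
rawdata/
└── sub-001/
    └── ses-001/
        ├── 2p/      # granular datatype instead of generic funcimg
        ├── fusi/
        └── behav/
```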
I like this approach. I think diverging from BIDS is fine, as long as there is a strong reason for it. NeuroBlueprint was never meant to be exactly the same as BIDS, otherwise it would be BIDS!
Hey @niksirbi, what do you think of this? If in agreement, next steps could be to propose a list of datatypes and get feedback on this idea from SWC users. My guess is most would approve, but it's probably worth getting wider feedback before making any changes to the spec.
I've been pondering this for a while and I'm internally torn. I'll try to summarise my thoughts so far:
So to summarise, I'm fine with increasing the number of datatypes, instead of introducing modalities.

Now I come to my biggest concern, which is establishing a list of NeuroBlueprint "datatypes". I'd have no idea how to do that; no two people would agree about what warrants being put in the same "datatype" folder vs in different ones. Any decision we make on that will be largely arbitrary (similar to distinguishing between datatypes and modalities). The example Joe gave above already showcases that (I would probably have "split" the data differently).

Perhaps we should take the radically flexible approach and allow users/labs to pre-specify the list of desired datatypes per project. This could be in the form of a

On the other hand, radical flexibility is at odds with standardisation, which is the whole point of specifications, so as I said, I'm torn.
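One way the "radically flexible" idea could work in practice is a per-project config file that the tooling reads before creating or validating folders. Everything here is a hypothetical sketch: the file name `datatypes.json` and the default list are assumptions, not part of any spec:

```python
import json
from pathlib import Path

# Illustrative defaults; the real spec's datatype list may differ.
DEFAULT_DATATYPES = {"ephys", "behav", "funcimg", "anat"}


def load_allowed_datatypes(project_root: Path) -> set[str]:
    """Read the project's datatype list, falling back to the defaults.

    The config file name "datatypes.json" is a hypothetical convention.
    """
    config = project_root / "datatypes.json"
    if config.exists():
        return set(json.loads(config.read_text()))
    return set(DEFAULT_DATATYPES)
```

Folder creation and validation would then check candidate datatype names against the returned set, so each lab's customisation stays declared in one place.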
Another customisation-related issue, which was not mentioned above, is allowing projects to skip the
Thanks @niksirbi for that summary, I agree on all points. It is not straightforward, and even small annoyances (e.g. having to call ephys "ecephys") could block uptake.

One way around having to make many decisions on this could be to merge the concepts of BIDS datatype and modality. We can have 'high-level' datatypes that can be used in most cases, and 'low-level' datatypes that can be used if a user a) wants a more specific datatype name, or b) has two different modalities that fit into the same 'high-level' datatype. The high-level datatypes are what we have already (

An example:

The datatype folder is where data from different acquisition modalities is put. We define a number of high-level datatypes that should suffice for most use cases:
In some cases (

Refined datatype names

You can replace the high-level datatype name with one of the refined datatype names below. If a refined datatype name is used, the corresponding high-level datatype must not be used.

ephys

anat

funcimg
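The exclusivity rule above ("if a refined datatype name is used, the corresponding high-level datatype must not be used") could be validated with a sketch like the following. The refined-to-high-level mapping is an illustrative assumption:

```python
from pathlib import Path

# Hypothetical mapping from refined datatype names to their
# high-level equivalents; the real list would come from the spec.
REFINED_TO_HIGH_LEVEL = {"2p": "anat", "fusi": "funcimg"}


def conflicting_datatypes(session_path: Path) -> list[tuple[str, str]]:
    """Return (refined, high-level) pairs that coexist in a session folder."""
    present = {p.name for p in session_path.iterdir() if p.is_dir()}
    return [
        (refined, broad)
        for refined, broad in REFINED_TO_HIGH_LEVEL.items()
        if refined in present and broad in present
    ]
```

An empty return value means the session respects the rule; any pairs returned can be reported as validation errors.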
For reference, a list of BIDS datatypes / modalities can be found here. As far as I can tell, only MRI and microscopy really make use of them.

A downside of the above is that it does mean data could possibly be, at maximum, in two places. From the datashuttle side this is not a problem; a low-level datatype is just another datatype. For data discovery, I think it should not add too much complexity (if there is no "2pe" folder, check for an "anat" folder).

EDIT:
Hey Joe, I like your idea of 'high-level' vs 'low-level' datatypes, because it doesn't break the current schema and it allows for considerable flexibility. That said, I have some thoughts to share.

Broad vs Narrow datatypes

Let's not call them high-level vs low-level datatypes, because that implies a nested hierarchical structure, and that's not what we want (we don't want a low-level datatype sub-folder within a high-level datatype folder). I suggest using broad vs narrow datatypes instead, because it nicely captures that their main difference is the breadth of scope.

Potential narrow datatypes for each broad datatype
Thanks @niksirbi, I agree 'Broad' vs. 'Narrow' datatypes are much better names and we can use those going forward. I agree on all datatypes, although for the

I guess the two main aims for the datatype names are to be:
In terms of implementing this I think it only requires:
On the datashuttle/NWB roadmap we have this deliverable:
The level of customisability is a general consideration for specifications (e.g. BIDS 2.0). A benefit of allowing customisations (and automated conversion between them) is that it facilitates adoption and makes researchers' lives easier. A downside is that it can be complex / error-prone to implement and may dilute some of the benefits of standardisation.
It will be useful to discuss the specifics of the kinds of customisations that people want and which we could support. @adamltyson, what kind of requests have you had?