Following https://github.com/AllenNeuralDynamics/aind-file-standards, it should now be possible to write simple QA/QC code to validate individual data streams.

I have checked the pandera and GX (Great Expectations) frameworks, but they both seem tailored to work on top of data-frame data. In search of a more agnostic testing framework, I am leaning towards simply using pytest with fixtures (a minimal sketch follows the list below). This has a few advantages:
- Very easy to deploy
- Command-line interface (one could simply pass a definition of a dataset / a path to a dataset)
- Rendered reports
- Test organization that may allow mapping to aind-data-schemas (e.g. 1 test -> 1 QC field)
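As a rough illustration (not a commitment to a specific API), here is a minimal sketch of what the pytest-based approach could look like; the `--dataset-path` option, fixture name, and file layout are hypothetical:

```python
# conftest.py -- minimal sketch of a pytest-based "qc" module.
# The --dataset-path option, fixture name, and file layout are hypothetical.
from pathlib import Path

import pytest


def pytest_addoption(parser):
    # Expose the dataset location on the command line, e.g.:
    #   pytest qc/ --dataset-path /path/to/dataset
    parser.addoption("--dataset-path", action="store", default=None,
                     help="Path to the dataset to run QC against.")


@pytest.fixture(scope="session")
def dataset_path(request) -> Path:
    # Resolve the dataset root once per session and share it across all tests.
    raw = request.config.getoption("--dataset-path")
    if raw is None:
        pytest.skip("No --dataset-path provided; skipping dataset QC tests.")
    return Path(raw)


# test_harp_qc.py -- one test could then map onto one QC field, e.g.:
def test_harp_register_files_exist(dataset_path: Path):
    # Hypothetical layout: Harp register files live under <dataset>/behavior.
    assert any(dataset_path.glob("behavior/**/*.bin")), \
        "No Harp register files found in the dataset."
```

Rendered reports could then come from an off-the-shelf plugin such as pytest-html, rather than anything we maintain ourselves.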
For now, the obvious candidates for these tests are the Harp and video data, but this could potentially be used for other data streams.
This should give rise to a new "qc" module in this library.
In terms of API, there are two interesting ways to go about this:
1. Each datastream knows how to validate itself by implementing a collection of validators. This is potentially very interesting because it would fit neatly into the current inheritance pattern of datastreams: e.g. a given event stream can inherit from a `CsvStream` and run validators defined at the level of the parent as well as the child.
2. Go with a purely functional architecture and simply write validators that can be called on top of the datastream during the pytest routine. This greatly simplifies the life of new contributors but may make things harder to manage and to use in other contexts. (Both options are sketched below.)
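To make the trade-off more concrete, here is a rough sketch of both options; `CsvStream` follows the existing datastream naming, while the child class and validator names are hypothetical:

```python
# Minimal sketch of the two candidate APIs; the validator and class details
# are hypothetical and only meant to illustrate the trade-off.
from __future__ import annotations

import pandas as pd


# Option 1: each datastream owns its validators and inherits the parent's.
class CsvStream:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def validators(self):
        # Checks shared by every CSV-backed stream.
        yield self.validate_not_empty

    def validate_not_empty(self) -> None:
        assert not self.data.empty, "Stream has no rows."


class RewardEvents(CsvStream):  # hypothetical child stream
    REQUIRED_COLUMNS = {"time", "reward_size"}

    def validators(self):
        # Run the parent-level validators plus the child-specific ones.
        yield from super().validators()
        yield self.validate_columns

    def validate_columns(self) -> None:
        missing = self.REQUIRED_COLUMNS - set(self.data.columns)
        assert not missing, f"Missing required columns: {missing}"


# Option 2: purely functional validators, called directly from the pytest routine.
def validate_not_empty(data: pd.DataFrame) -> None:
    assert not data.empty, "Stream has no rows."


def validate_required_columns(data: pd.DataFrame, required: set[str]) -> None:
    missing = required - set(data.columns)
    assert not missing, f"Missing required columns: {missing}"
```

In the pytest routine, option 1 would iterate over `stream.validators()` while option 2 would call the free functions directly on the loaded data; the first keeps the checks discoverable from the datastream classes, the second keeps the qc module decoupled from them.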