Implement QC framework #10

bruno-f-cruz · 2024-11-26T17:11:42Z

Following https://github.com/AllenNeuralDynamics/aind-file-standards it should now be possible to write simple QA/QC code to validate individual data streams.

I have checked the pandera and Great Expectations (gx) frameworks, but both seem tailored to work on top of DataFrame data. In search of a more agnostic testing framework, I am leaning towards simply using unittest with fixtures. This has a few advantages:

- Very easy to deploy
- Command line interface (one could simply pass a definition of a dataset / path to a dataset)
- Rendered reports
- Test organization that may allow mapping to aind-data-schemas (e.g. 1 test -> 1 qc field)

For now, the obvious candidates for these tests are the harp and video data, but this could potentially be used for other datastreams.

This should give rise to a new "qc" module in this library.
In terms of API, there are two interesting ways to go about this:

1. Each datastream knows how to validate itself by implementing a collection of validators. This is appealing because it would fit very neatly into the current inheritance pattern of datastreams. E.g., a given event can inherit from a CsvStream and run validators at the level of the parent as well as the child.
2. Go with a purely functional architecture and simply write validators that can be called on the datastream during the pytest routine. This greatly simplifies life for new contributors but may make things hard to manage and reuse in other contexts.
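To make the two options concrete, here is a minimal sketch of both. All class and function names below (`DataStream`, `EventStream`, `validators`, `check_has_rows`, and this toy `CsvStream`) are hypothetical stand-ins and do not reflect the library's actual classes; rows are modelled as plain dictionaries for simplicity.

```python
from typing import Callable, Dict, List


# Option 1: each datastream carries its own validators, extended through inheritance.
class DataStream:
    def __init__(self, rows: List[dict]):
        self.rows = rows

    def validators(self) -> List[Callable[[], bool]]:
        return []

    def validate(self) -> Dict[str, bool]:
        # One entry per validator, keyed by the validator's method name.
        return {v.__name__: v() for v in self.validators()}


class CsvStream(DataStream):
    def has_rows(self) -> bool:
        return len(self.rows) > 0

    def validators(self):
        return super().validators() + [self.has_rows]


class EventStream(CsvStream):
    def timestamps_increase(self) -> bool:
        times = [row["time"] for row in self.rows]
        return all(a < b for a, b in zip(times, times[1:]))

    def validators(self):
        # Parent-level validators run first, then the child-specific ones.
        return super().validators() + [self.timestamps_increase]


# Option 2: a free function that a test routine calls on any datastream.
def check_has_rows(stream: DataStream) -> bool:
    return len(stream.rows) > 0
```

With option 1, `EventStream(rows).validate()` runs the parent and child checks in one call. With option 2, a test would simply assert `check_has_rows(stream)`, which keeps each validator trivial to write but leaves it to the test suite to decide which checks apply to which datastream.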
