Implement QC framework #10

Open
bruno-f-cruz opened this issue Nov 26, 2024 · 0 comments

Following https://github.com/AllenNeuralDynamics/aind-file-standards it should now be possible to write simple QA/QC code to validate individual data streams.

I have checked the pandera and Great Expectations (gx) frameworks, but both seem tailored to data-frame data. In search of a more agnostic testing framework, I am leaning towards simply using unittest with fixtures. This has a few advantages:

  • Very easy to deploy
  • Command line interface (one could simply pass a definition of a dataset / path to a dataset)
  • Rendered reports
  • Test organization that may allow mapping to aind-data-schemas (e.g. 1 test -> 1 qc field)

For now, the obvious candidates for these tests are the Harp and video data, but this could potentially be used for other datastreams.

This should give rise to a new "qc" module in this library.
In terms of API, there are two interesting ways to go about this:

  1. Each datastream knows how to validate itself by implementing a collection of validators. This is appealing because it would fit neatly into the current inheritance pattern of datastreams. For example, a given event stream can inherit from a CsvStream and run validators at the level of the parent as well as the child.

  2. Go with a purely functional architecture and simply write validators that can be called on the datastream during the pytest routine. This greatly simplifies life for new contributors, but may make the validators harder to manage and to reuse in other contexts.
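
The two options can be sketched side by side as follows. This is only an illustration under stated assumptions: `QCResult`, `DataStream`, `CsvStream`, and the individual checks are hypothetical stand-ins and do not reflect the library's actual classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QCResult:
    """Hypothetical outcome of a single QC check."""
    name: str
    passed: bool


# Option 1: each stream carries its own validators; children extend the parent's.
class DataStream:
    def __init__(self, data):
        self.data = data

    def validators(self) -> List[Callable[[], QCResult]]:
        return [self._check_not_empty]

    def _check_not_empty(self) -> QCResult:
        return QCResult("not_empty", len(self.data) > 0)

    def validate(self) -> List[QCResult]:
        return [validator() for validator in self.validators()]


class CsvStream(DataStream):
    def validators(self) -> List[Callable[[], QCResult]]:
        # Parent-level checks run first, then the CSV-specific ones.
        return super().validators() + [self._check_same_width]

    def _check_same_width(self) -> QCResult:
        widths = {len(row) for row in self.data}
        return QCResult("same_width", len(widths) <= 1)


# Option 2: plain functions applied to any stream, e.g. from inside a test.
def check_not_empty(stream) -> QCResult:
    return QCResult("not_empty", len(stream.data) > 0)


def check_same_width(stream) -> QCResult:
    widths = {len(row) for row in stream.data}
    return QCResult("same_width", len(widths) <= 1)


def run_checks(stream, checks) -> List[QCResult]:
    return [check(stream) for check in checks]
```

In option 1 a caller only needs `stream.validate()`, and the inheritance chain decides which checks run. In option 2 the caller chooses the checks explicitly, for example `run_checks(stream, [check_not_empty, check_same_width])`, which is easier to contribute to but leaves the pairing of checks and streams to whoever writes the test.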
