New `ingestion_schemas` + MANY minor fixes and improvements #438

Merged

jkbhagatio merged 49 commits into SainsburyWellcomeCentre:datajoint_pipeline from ttngu207:datajoint_pipeline

Nov 6, 2024

Contributor

ttngu207 commented Oct 18, 2024 •

This PR includes a few major new features and many minor fixes (mostly for BlockAnalysis):

- Use ingestion_schemas, i.e. a separate set of schemas for DJ ingestion (specialized Encoder reader and Video reader)
  - Encoder reader with default downsampling to 50Hz
  - Video reader with only the hw_timestamp column
- Update the streams.py DJ schema accordingly (mainly to drop extra columns when reading Video data)
- Add special handling to ingest fullpose data for social0.2
- Add a new tracking table, BlobPosition, to read and store blob position tracking when SLEAP data is not available
- Fixes & improvements for BlockAnalysis
  - Fix incorrect extraction of subject_in_patch times
  - fetch_stream function rounds times to microseconds (MySQL precision is to microseconds only)
  - BlockAnalysis uses BlobPosition when SLEAPTracking is not available
  - Improve logic to search for chunks in a given block
  - BlockDetection: when double 0s are found, use the first 0 (instead of the second one)
  - Add BlockForaging computed table

Fix #427
Replace #437

ttngu207 and others added 30 commits

August 8, 2024 08:52


          feat: add one-off logic to ingest fullpose data for social02

833f9e9


          Merge branch 'datajoint_pipeline' into dev_fullpose_for_social02

0558cbb


          Create reingest_fullpose_sleap_data.py

e9c7fa2


          Allow reading model metadata from local folder

38b44e0


          Avoid iterating over None

028ffc5


          Avoid iterating over the config file twice

25b7195


          Avoid mixing dtypes with conditional assignment

f77ac1d


          Remove whitespace on blank line

ac2aa13


          Use replace function instead of explicit loop

caf3ce1


          Improve error logic when model metadata not found

93428c8


          Test loading poses with local model metadata

00c1cca


          Use all components other than time and device name

6b32583


          Add regression test for poses with register prefix

010fdb9


          Infer base prefix from stream search pattern

0a88b79


          Use full identity likelihood vectors in test data

f925d75


          Merge pull request SainsburyWellcomeCentre#421 from SainsburyWellcome…

83cd905

…Centre/gl-issue-418

Allow reading pose model metadata from local folder


          Update worker.py

36ee97a


          new readers and schemas for reduced data storage in db

b54e1c3


          updated tests

d6cf52f


          cleaned up linting for ruff

f12e359


          updated pandas and changed S to s lmao

daf6224


          chore: code cleanup

6d798b8


          Merge remote-tracking branch 'upstream/ingestion_readers_schemas' int…

ea3c2ef

…o datajoint_pipeline


          chore: delete the obsolete dataset (replaced by schemas)

697c0a8


          chore: clean up load_metadata

2ef32c3


          feat(ingestion): use new ingestion_schemas

d5bd0fe


          feat(streams): update streams with new ingestion_schemas

8725e8f


          fix(ingestion_schemas): downsampling Encoder

0f210e1


          fix(ingestion_schemas): minor fix in _Encoder, calling super() init

d365bcd


          fix(harp reader): remove rows where the index is zero

cb90843

why? corrupted data in harp files? not sure

ttngu207 added 5 commits

October 21, 2024 16:08


          Update reingest_fullpose_sleap_data.py

3e59db8


          Update reingest_fullpose_sleap_data.py

64900ad


          feat(tracking): add BlobPositionTracking

538e4e5


          fix(block_analysis): various fixes and code improvements

290fe4e


          fix: improve logic to search for chunks in a given block

fb18016

ttngu207 changed the title ~~Datajoint pipeline~~ New ingestion_schemas + MANY minor fixes and improvements

ttngu207 marked this pull request as ready for review

October 22, 2024 18:56

ttngu207 added 3 commits

October 23, 2024 14:23


          feat(script): add script sync_ingested_and_raw_epochs

8f2fffc


          fix(sync_ingested_and_raw_epochs): minor code cleanup

8762fcf


          fix(BlockSubjectAnalysis): handle edge case where the encoder data ar…

…e inconsistent across patches

lochhh reviewed

aeon/dj_pipeline/__init__.py Outdated

+                  Args:
+                      query (datajoint.Query): A query object containing data from a Stream table
+                      drop_pk (bool, optional): Drop primary key columns. Defaults to True.
+                      round_microseconds (bool, optional): Round timestamps to microseconds. Defaults to False.

Contributor

lochhh Oct 30, 2024

Defaults to True?

Member

jkbhagatio Oct 30, 2024

For me this is ok?

Contributor

lochhh Oct 30, 2024

Mismatch is in the description of round_microseconds which says "Defaults to False"

Contributor Author

ttngu207 Oct 30, 2024

fixed
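For context on the `round_microseconds` option discussed in this thread: the PR description says the rounding exists because MySQL stores times only to microsecond precision, while pandas timestamps can carry nanoseconds. A minimal sketch of that rounding, assuming the stream data has a pandas `DatetimeIndex`; `round_index_to_microseconds` is a hypothetical helper for illustration, not the actual `fetch_stream` code.

```python
import pandas as pd


def round_index_to_microseconds(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose DatetimeIndex is rounded to microseconds.

    Hypothetical helper: the real fetch_stream implementation is not shown
    in this thread.
    """
    out = df.copy()
    out.index = out.index.round("us")
    return out


# Synthetic timestamps carrying nanosecond detail.
idx = pd.to_datetime(
    ["2024-10-18 12:00:00.123456789", "2024-10-18 12:00:00.123456211"]
)
df = pd.DataFrame({"angle": [1.0, 2.0]}, index=idx)
rounded = round_index_to_microseconds(df)
```

Rounding on both sides of a comparison keeps timestamps fetched from the database consistent with timestamps computed in pandas.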

jkbhagatio reviewed

aeon/dj_pipeline/analysis/block_analysis.py Outdated

                       # Patch data - TriggerPellet, DepletionState, Encoder (distancetravelled)
                       # For wheel data, downsample to 10Hz
-                      final_encoder_fs = 10
+                      final_encoder_hz = 10

Member

jkbhagatio Oct 30, 2024

I think we actually want this to be 50 hz, not 10 hz

Contributor Author

ttngu207 Oct 30, 2024

The Encoder data is ingested at 50Hz; this is at the BlockAnalysis step. Do we also want 50Hz here (or further downsample to 10Hz)?

Member

jkbhagatio Nov 5, 2024

yes we want 50 hz here too, thanks!
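To illustrate what a target rate of 50 Hz versus 10 Hz means for the wheel data, here is a rough sketch using pandas `resample`. The function name `downsample_encoder`, the synthetic 500 Hz input, and the choice of keeping the first sample in each bin are all assumptions for illustration; the actual downsampling in the Encoder reader and in BlockAnalysis is not shown in this thread.

```python
import numpy as np
import pandas as pd


def downsample_encoder(encoder_df: pd.DataFrame, final_encoder_hz: int = 50) -> pd.DataFrame:
    """Downsample a time-indexed frame to roughly final_encoder_hz.

    Hypothetical sketch: keeps the first sample of every time bin.
    Integer division means rates that do not divide 1000 are approximated.
    """
    period_ms = 1000 // final_encoder_hz
    return encoder_df.resample(f"{period_ms}ms").first().dropna(how="all")


# One second of synthetic data sampled every 2 ms (500 Hz, an assumed rate).
idx = pd.date_range("2024-10-30", periods=500, freq="2ms")
raw = pd.DataFrame({"angle": np.linspace(0.0, 1.0, 500)}, index=idx)

down_50 = downsample_encoder(raw, final_encoder_hz=50)
down_10 = downsample_encoder(raw, final_encoder_hz=10)
```

With this synthetic input, one second of data yields 50 rows at 50 Hz and 10 rows at 10 Hz, so the setting changes the stored volume by a factor of five.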

jkbhagatio reviewed

aeon/dj_pipeline/analysis/block_analysis.py

+                          if encoder_df.empty:
+                              encoder_df["distance_travelled"] = 0
+                          else:
+                              encoder_df["distance_travelled"] = -1 * analysis_utils.distancetravelled(encoder_df.angle)

Member

jkbhagatio Oct 30, 2024

maybe add a comment saying something like -1 is for placement of magnetic encoder, where wheel movement actually decreases encoder value?

Contributor Author

ttngu207 Oct 30, 2024

done
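A small sketch of why the `-1` factor matters, under the assumption stated in the review comment that wheel movement decreases the encoder value. `distance_travelled_sketch` and the wheel radius are hypothetical stand-ins; the real `analysis_utils.distancetravelled` is not shown here and may differ.

```python
import numpy as np


def distance_travelled_sketch(angle_rad: np.ndarray, radius_cm: float = 4.0) -> np.ndarray:
    """Cumulative arc length from a wheel angle trace (hypothetical stand-in).

    The radius is an assumed value. The input must be non-empty, which is why
    the diff above handles the empty case separately.
    """
    unwrapped = np.unwrap(angle_rad)  # remove 2*pi jumps in the angle trace
    return (unwrapped - unwrapped[0]) * radius_cm


# Encoder angle decreases as the wheel turns (per the review comment).
angle = np.array([0.0, -0.5, -1.0, -1.5])

# Flipping the sign makes the reported distance increase with wheel movement.
distance = -1 * distance_travelled_sketch(angle)
```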

jkbhagatio reviewed

aeon/dj_pipeline/analysis/block_analysis.py

+                              patch_rate = depletion_state_df.rate.iloc[0]
+                              patch_offset = depletion_state_df.offset.iloc[0]
+                              # handles patch rate value being INF
+                              patch_rate = 999999999 if np.isinf(patch_rate) else patch_rate

Member

jkbhagatio Oct 30, 2024

is it actually an issue if patch rate is inf? Does it cause some downstream issue? We do this as default when no env is loaded.

Contributor Author

ttngu207 Oct 30, 2024

Yes, it's because MySQL float doesn't handle INF well. We could convert to NaN, but that would lose the intended meaning of INF here.
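A minimal sketch of the sentinel approach described here, using only the standard library. The sentinel value comes from the diff above; `sanitize_patch_rate` and `restore_patch_rate` are hypothetical helper names, and the restore step is an assumption about how a reader might recover the original meaning, not something this PR adds.

```python
import math

INF_SENTINEL = 999999999  # value used in the diff above


def sanitize_patch_rate(patch_rate: float) -> float:
    """Replace an infinite rate with a large finite sentinel before insert.

    Note: math.isinf is also True for -inf, which maps to the same sentinel.
    """
    return INF_SENTINEL if math.isinf(patch_rate) else patch_rate


def restore_patch_rate(stored: float) -> float:
    """Map the sentinel back to infinity when reading (hypothetical).

    A tolerance is used because a single-precision column may not store
    the sentinel exactly.
    """
    if math.isclose(stored, INF_SENTINEL, rel_tol=1e-6):
        return math.inf
    return stored


stored_inf = sanitize_patch_rate(math.inf)
stored_finite = sanitize_patch_rate(0.01)
```

Unlike NaN, the sentinel still orders as "larger than any real rate", which preserves the intended meaning of INF.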

jkbhagatio reviewed

aeon/dj_pipeline/analysis/block_analysis.py

@@ -288,27 +299,50 @@ def make(self, key):
                           & f'chunk_start <= "{chunk_keys[-1]["chunk_start"]}"'
                       )[:block_start]
                       subject_visits_df = subject_visits_df[subject_visits_df.region == "Environment"]
+                      subject_visits_df = subject_visits_df[~subject_visits_df.id.str.contains("Test", case=False)]

Member

jkbhagatio Oct 30, 2024

Sometimes we use other, non-"Test" subjects as test subjects. Maybe the check should be: if the subject does not begin with 'baa' (can use str.lower to check regardless of case)?

Contributor Author

ttngu207 Oct 30, 2024

I see. Is startswith("baa") reliable and future-proof? Perhaps a better approach is to also cross-check against the subjects manually specified by users for a particular experiment.

Member

jkbhagatio Nov 5, 2024

Yes, your suggestion would be a better check.
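A sketch of the cross-check agreed on in this thread: instead of matching on a name pattern, keep only visits whose subject is in an explicit, user-specified set for the experiment. The function name, the column name `id` (taken from the diff above), and the subject names are illustrative assumptions; where the set of experiment subjects would actually come from is not settled in this thread.

```python
import pandas as pd


def keep_experiment_subjects(visits_df: pd.DataFrame, experiment_subjects: set) -> pd.DataFrame:
    """Keep only rows whose subject id is listed for the experiment.

    Hypothetical helper: experiment_subjects stands in for the subjects
    manually specified by users for a particular experiment.
    """
    return visits_df[visits_df["id"].isin(experiment_subjects)]


# Synthetic visits: two real subjects and two test entries (made-up names).
visits = pd.DataFrame(
    {
        "id": ["subjectA", "Test1", "subjectB", "dummy"],
        "region": ["Environment"] * 4,
    }
)
kept = keep_experiment_subjects(visits, {"subjectA", "subjectB"})
```

An allow-list like this does not depend on naming conventions, so a test subject with an ordinary-looking name is still excluded.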

jkbhagatio reviewed

aeon/dj_pipeline/analysis/block_analysis.py Outdated

+                              )
+                              pos_df = fetch_stream(pos_query)[block_start:block_end]
+                              pos_df["likelihood"] = np.nan
+                              # keep only rows with area between 0 and 1000

Member

jkbhagatio Oct 30, 2024

Is this because areas > 1000 are likely an experimenter, or some other artifact? Maybe specify that in the comment?
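For reference, the filter that the code comment describes could look roughly like the sketch below. The function name, the column name `area`, and the exclusive bounds are assumptions; the actual filtering lines are not visible in this diff excerpt, and the reason for the 1000 cutoff is exactly what the question above asks.

```python
import numpy as np
import pandas as pd


def filter_blob_area(pos_df: pd.DataFrame, min_area: float = 0, max_area: float = 1000) -> pd.DataFrame:
    """Keep rows whose blob area lies strictly between the two bounds.

    Hypothetical sketch: whether the real bounds are inclusive is unknown.
    Rows with a NaN area are dropped because NaN comparisons are False.
    """
    in_range = (pos_df["area"] > min_area) & (pos_df["area"] < max_area)
    return pos_df[in_range]


# Synthetic blob positions with a few out-of-range areas.
pos_df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [1.0, 2.0, 3.0, 4.0, 5.0],
        "area": [0.0, 250.0, 999.0, 1500.0, np.nan],
    }
)
pos_df["likelihood"] = np.nan  # blob tracking has no likelihood, as in the diff
filtered = filter_blob_area(pos_df)
```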

jkbhagatio reviewed

aeon/io/reader.py

+                          data = pd.DataFrame(payload, index=seconds, columns=self.columns)
+                      # remove rows where the index is zero (why? corrupted data in harp files?)
+                      data = data[data.index != 0]

Member

jkbhagatio Oct 30, 2024

This I guess will be fixed in the next PR targeting a new Encoder reader in ingestion_schemas.py ?
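A tiny, self-contained illustration of what the added line in the diff does to a frame indexed by seconds. The data here is synthetic, and the cause of the zero timestamps is left open, as it is in the commit message.

```python
import pandas as pd


def drop_zero_index_rows(data: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose index value is exactly zero (same test as the diff)."""
    return data[data.index != 0]


# Synthetic Harp-like frame: one row has a zero timestamp.
data = pd.DataFrame({"angle": [10, 11, 12]}, index=[0.0, 3.5, 3.502])
cleaned = drop_zero_index_rows(data)
```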

jkbhagatio reviewed

aeon/schema/social_02.py

		@@ -48,6 +48,12 @@ def __init__(self, path):
		super().__init__(_reader.Pose(f"{path}_test-node1*"))


		class Pose03(Stream):

Member

jkbhagatio Oct 30, 2024

Maybe add a comment that this is necessary due to changing registers for the pose streams for social02 in particular? And that the 03 reflects that this pattern is what we're going with for social03 and moving forward? Or call this class something else?

Member

jkbhagatio Oct 30, 2024

Or just import Pose from social_03.py as Pose03? I think I like something like this actually

Contributor Author

ttngu207 Oct 30, 2024

This is primarily because full-pose data for social0.2 has a different pattern than the non-fullpose SLEAP data.
I don't think you can do from social_03 import Pose as Pose03: the Devices/Streams are instantiated using the class name, so aliasing wouldn't work.

Member

jkbhagatio Nov 5, 2024

ah I see, then just some documentation is ok
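The aliasing point can be shown in plain Python. An import alias only binds a second name to the same class object, so the class still reports its original name. The classes below are stand-ins for illustration only, and the assumption that the pipeline reads `__name__` specifically is an interpretation of "it's using the class name" above.

```python
class Stream:
    """Stand-in for the real Stream base class (illustration only)."""


class Pose(Stream):
    """Stand-in for the Pose stream defined in social_03."""


# Equivalent in effect to `from social_03 import Pose as Pose03`:
# a new name for the same class object.
PoseAlias = Pose


class Pose03(Stream):
    """A separately defined class carries its own name."""


alias_name = PoseAlias.__name__
defined_name = Pose03.__name__
```

Here `alias_name` is still `"Pose"`, while `defined_name` is `"Pose03"`, which is why a distinct class definition (with documentation) is needed rather than an alias.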

jkbhagatio reviewed

Member

jkbhagatio left a comment

I've left a bunch of small comments here.

A general point - maybe add top-level module docstrings for all these modules? Unless this will be done by @MilagrosMarin in #443 ?

Contributor Author

ttngu207 commented Oct 30, 2024

> I've left a bunch of small comments here.
>
> A general point - maybe add top-level module docstrings for all these modules? Unless this will be done by @MilagrosMarin in #443 ?

Yes, this will be addressed when we apply Ruff checks and fixes.

ttngu207 added 2 commits

October 30, 2024 18:10


          chore: minor fixes to address PR review comments

ebecb00


          fix: address PR comments

b0952eb

jkbhagatio mentioned this pull request

Subject:experiment mapping should be specified in pyrat and added to Experiment.Subject table #445

Closed

jkbhagatio merged commit c2e90b6 into SainsburyWellcomeCentre:datajoint_pipeline

jkbhagatio mentioned this pull request

new readers and schemas for reduced data storage in db #437

Closed

This was referenced Nov 14, 2024

Round all wheel values down to nearest 0.1cm in block analysis tables on db #426

Closed

Load blob tracking data for animal position when missing SLEAP data #400

Closed

Optimization of database storage #396

Closed

Block Analysis Todos #395

Closed

Review block_ends in block_analysis.Block #368

Closed
