Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure df columns #66

Merged
merged 23 commits into from
Mar 29, 2023
Merged

Restructure df columns #66

merged 23 commits into from
Mar 29, 2023

Conversation

sfmig
Copy link
Collaborator

@sfmig sfmig commented Mar 20, 2023

Restructures the dataframe imported from DLC, to make a dataframe with a single level. Adds ROI per frame and bodypart (and individual if the data is from a multianimal project), and event tags per frame.

Columns in exported dataframe:

'model_str' 'video_file' 'individual'* 'frame' 'bodypart' 'x' 'y' 'likelihood' 'ROI_tag' 'event_tag'

*if multi-animal

Features

  • the user can define the hierarchy of ROIs to handle cases in which a point is inside two (or more) ROIs. If not defined, the ROI with smallest area prevails.
  • the user can define a buffer around the ROI boundaries if desired. This was inspired by this SO post, but not sure if it would be very used.
  • the event_tag column holds an empty string for the frames with no tag defined, and the event_tag string if the frame was labelled.
  • a point is inside an ROI if its x,y coordinates for one frame are inside or at the boundary of the ROI.

mypy errors

I had some errors with mypy and pandas, I fixed some but others I couldn't figure out why... I marked those with # type: ignore for now. I didn't have errors when running mypy directly, but they did show up with the precommit hooks

This PR addresses issues:

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2023

Codecov Report

Merging #66 (e8c14c9) into main (b835e75) will decrease coverage by 2.00%.
The diff coverage is 8.19%.

@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
- Coverage   35.27%   33.27%   -2.00%     
==========================================
  Files          12       12              
  Lines         533      574      +41     
==========================================
+ Hits          188      191       +3     
- Misses        345      383      +38     
Impacted Files Coverage Δ
wazp/callbacks/dashboard.py 27.77% <0.00%> (-0.61%) ⬇️
wazp/callbacks/roi.py 16.86% <0.00%> (ø)
wazp/utils.py 23.07% <8.47%> (-5.25%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sfmig sfmig marked this pull request as ready for review March 21, 2023 12:27
@sfmig sfmig requested review from samcunliffe and niksirbi March 21, 2023 12:27
@niksirbi
Copy link
Member

mypy errors

Probably, unrelated to your errors, but I get

wazp/utils.py:8: error: Skipping analyzing "shapely": module is installed, but missing library stubs or py.typed marker  [import]
wazp/utils.py:10: error: Skipping analyzing "shapely.geometry": module is installed, but missing library stubs or py.typed marker  [import]

when running mypy -p wazp locally. This goes away if I add the --ignore-missing-imports flag, which is by default enabled when mypy runs as part of the CI.

Anyhow, to be on the safe side, we should add shapely.* to [[tool.mypy.overrides]] in the pyproject.toml, as we have done for other libraries with missing type hints.

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sfmig, good job on untangling that mess of a dataframe into something more manageable. I like the long format.

Besides the small comments I left, I also stumbled on a more fundamental issue.
The dataframe export works, but the "ROI_tag" column consists entirely of empty strings, even though ROIs are defined for one of the exported videos (in jwaspE_nectar-open-close_control.metadata.yaml).

Based on my reading of the add_ROIs_to_video_dataframe function, the ROI_tags for this video should have been included in the exported dataframe. Do you also see the same on your end? If yes, something is wrong in the above function (though I cannot quite figure out what).

wazp/callbacks/dashboard.py Outdated Show resolved Hide resolved
wazp/callbacks/dashboard.py Outdated Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Show resolved Hide resolved
Copy link
Member

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool.

I had a play and it does what you want, I think.

In [4]: df.head()
Out[4]: 
                                           model_str                                video_file  frame  ... likelihood  ROI_tag         event_tag
0  DLC_resnet50_jwasp_femaleandmaleSep12shuffle1_...  jwaspI_nectar-open-close_wateronly_s.avi      0  ...   0.999943           experiment_start
1  DLC_resnet50_jwasp_femaleandmaleSep12shuffle1_...  jwaspI_nectar-open-close_wateronly_s.avi      1  ...   0.999911                           
2  DLC_resnet50_jwasp_femaleandmaleSep12shuffle1_...  jwaspI_nectar-open-close_wateronly_s.avi      2  ...   0.999832                           
3  DLC_resnet50_jwasp_femaleandmaleSep12shuffle1_...  jwaspI_nectar-open-close_wateronly_s.avi      3  ...   0.999816                           
4  DLC_resnet50_jwasp_femaleandmaleSep12shuffle1_...  jwaspI_nectar-open-close_wateronly_s.avi      4  ...   0.999826                           

[5 rows x 9 columns]

In [5]: df.columns
Out[5]: 
Index(['model_str', 'video_file', 'frame', 'bodypart', 'x', 'y', 'likelihood',
       'ROI_tag', 'event_tag'],
      dtype='object', name='')

.pre-commit-config.yaml Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
wazp/callbacks/dashboard.py Outdated Show resolved Hide resolved
wazp/callbacks/dashboard.py Outdated Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Show resolved Hide resolved
wazp/utils.py Outdated Show resolved Hide resolved
wazp/utils.py Show resolved Hide resolved
@sfmig
Copy link
Collaborator Author

sfmig commented Mar 28, 2023

hey @niksirbi!
I think the issue with the empty ROI tags may be because I didn't push the pose data for jwaspE_nectar-open-close_control.avi. This was to test the warning and unselect functionality when pose data is not available. If the pose data is missing, it is unselected before the export and is not actually be exported (so only the ROIs for the other videos will be present, which are all empty).

I pushed the pose data for jwaspE_nectar-open-close_control.avi now, hopefully it should fix it (let me know if it doesn't).

To quickly check it works as intended, you can run the following on the exported dataframe:
df.loc[df['video_file']=='jwaspE_nectar-open-close_control.avi','ROI_tag'].unique()
which returns:
>> array(['enclosure', '', 'tube', 'right_box'], dtype=object)

Maybe this is a symptom that we should clarify this a bit more....let me know any thoughts 🤔

@sfmig sfmig merged commit cf601c0 into main Mar 29, 2023
@niksirbi
Copy link
Member

I think the issue with the empty ROI tags may be because I didn't push the pose data for jwaspE_nectar-open-close_control.avi. This was to test the warning and unselect functionality when pose data is not available. If the pose data is missing, it is unselected before the export and is not actually be exported (so only the ROIs for the other videos will be present, which are all empty).

That makes complete sense. I tested it again now and it works!

Maybe this is a symptom that we should clarify this a bit more....let me know any thoughts

I think it's just a symptom of not having solved the data hosting problem properly, as per #33 .

@niksirbi niksirbi deleted the restructure-df-columns branch May 18, 2023 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants