Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute pairwise distances #278

Merged
merged 26 commits into from
Oct 29, 2024
Merged

Compute pairwise distances #278

merged 26 commits into from
Oct 29, 2024

Conversation

lochhh
Copy link
Collaborator

@lochhh lochhh commented Aug 23, 2024

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
Closes #277

What does this PR do?

  • Adds functions to compute pairwise distances between elements (e.g. individuals, keypoints)
    • between a single pair of elements (_cdist - open to rename suggestions!)
    • between a single, multiple, or all pair(s) of elements (compute_pairwise_distances)
  • Includes example usage in docstrings
  • Adds tests for the new functions
    • with poses dataset (with single and multiple individuals/keypoints) or da.sel data with coord_val.ndim == 0
    • with bboxes dataset (with single and multiple individuals, but missing keypoints dim)

TO-DO:

  • Handle poses dataset
    • with single individual/keypoint coord_val.ndim == 1
    • from da.sel with coord_val.ndim == 0
    • with multiple individuals/keypoints
  • Handle datasets without "individuals" or "keypoints" dim, e.g. bboxes dataset doesn't have "keypoints" dim
  • Revert de4a96b that works around failing tests due to sleap-io's incompatibility with NumPy 2.0.

References

#277

How has this PR been tested?

Tests passing locally and on CI

Does this PR require an update to the documentation?

Docstrings with examples are provided alongside the new functions.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@lochhh lochhh changed the title Distance metrics Compute pairwise distances Aug 23, 2024
Copy link

codecov bot commented Aug 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.77%. Comparing base (b10896f) to head (d1ca2cd).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #278   +/-   ##
=======================================
  Coverage   99.76%   99.77%           
=======================================
  Files          14       14           
  Lines         846      878   +32     
=======================================
+ Hits          844      876   +32     
  Misses          2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lochhh lochhh force-pushed the distance-metrics branch 5 times, most recently from 44b1e17 to b110485 Compare August 29, 2024 08:12
@lochhh lochhh force-pushed the distance-metrics branch 2 times, most recently from b5f4ca4 to a20d283 Compare September 9, 2024 16:06
@lochhh lochhh marked this pull request as ready for review September 10, 2024 19:49
@lochhh lochhh marked this pull request as draft September 10, 2024 19:50
@lochhh lochhh marked this pull request as ready for review September 10, 2024 22:36
@lochhh lochhh requested a review from a team September 10, 2024 22:36
@lochhh lochhh force-pushed the distance-metrics branch 3 times, most recently from 58583b2 to 4f305de Compare September 17, 2024 14:10
Copy link

sonarcloud bot commented Sep 17, 2024

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work @lochhh!

The implementation is completely functional as is, but I think it's worth discussing the API design here.

I was thinking about this all day today, and it's not an easy problem to wrap one's head around, so as usual I resorted to drawing sketches 😉 .

Let's consider the following example:
image

Our current implementation exposes 3 public functions, and multiple ways of solving the above problem (i.e. getting the 3 distances of interest):

image

It took me a while to figure things out. Your examples were of great help and absolutely necessary, but in general it's not easy to figure out which function you need to use and there are multiple equivalent ways of doing things.

Here's what Id' propose as alternative:

image

We can debate the need to make cdist private or not, but in my eyes it doesn't add any extra functionality or convenience to just having compute_pairwise_distances.
You could argue that one can use cdist to compute distances across different datasets, but I can't foresee a use-case for this (I may be overlooking something).

movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
movement/analysis/kinematics.py Outdated Show resolved Hide resolved
tests/test_unit/test_kinematics.py Outdated Show resolved Hide resolved
@lochhh
Copy link
Collaborator Author

lochhh commented Oct 25, 2024

Thanks again @niksirbi for the usual thorough review! I've adopted your suggestion to remove the interkeypoint/individual wrappers, exposing only compute_pairwise_distances , whilst hiding _cdist - this makes sense as it's unlikely one would want to compute distances using 2 single-keypoint/individual data arrays from different datasets (with _cdist). Other changes include:

  • requiring pairs as argument in compute_pairwise_distances and allowing the special keyword 'all'
  • renaming core_dimslabels_dims to (hopefully) make it clearer that this is the dim to get labels from
  • removing the full list of distance metrics, providing only a couple of examples in _cdist and compute_pairwise_distances docstrings
  • raise ValueError when no pairs are found
  • remove circularity while generating expected pairs in tests

@niksirbi
Copy link
Member

Thanks @lochhh, I will take another look at this next week, with a fresh mind.

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lochhh, my remaining comments are mostly about some choices of phrase in the public docstring.

Other than these, the PR is good to go from my point of view. I'm not sure if @sfmig would like to also have a look before merging.

movement/kinematics.py Outdated Show resolved Hide resolved
movement/kinematics.py Outdated Show resolved Hide resolved
movement/kinematics.py Outdated Show resolved Hide resolved
movement/kinematics.py Outdated Show resolved Hide resolved
movement/kinematics.py Outdated Show resolved Hide resolved
Copy link

sonarcloud bot commented Oct 29, 2024

@sfmig
Copy link
Contributor

sfmig commented Oct 29, 2024

sorry both for the delay and thanks for pinging me ❤️

I can have a look on Friday, but I know that would delay this work further, so feel free to go ahead! Maybe I can draft an example instead?

@lochhh
Copy link
Collaborator Author

lochhh commented Oct 29, 2024

sorry both for the delay and thanks for pinging me ❤️

I can have a look on Friday, but I know that would delay this work further, so feel free to go ahead! Maybe I can draft an example instead?

That would be perfect!

@lochhh lochhh added this pull request to the merge queue Oct 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 29, 2024
@lochhh lochhh added this pull request to the merge queue Oct 29, 2024
@niksirbi
Copy link
Member

Thanks for the initiative @sfmig.
It's good to keep things moving fast.
Plus, if you discover a flaw with the implementation while writing the example, we can always modify the implementation.

Perhaps the DLC dataset with the two mice would be a good candidate for the example?

Merged via the queue into main with commit 536de0e Oct 29, 2024
27 checks passed
@niksirbi niksirbi deleted the distance-metrics branch October 29, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compute pairwise distances
3 participants