Implement `compute_speed` and `compute_path_length` #280

niksirbi · 2024-08-23T16:14:14Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

What does this PR do?

It introduces two new functions in the kinematics module:

compute_speed: quite straightforward to implement, it just returns the norm of velocity, using existing movement functions
compute_path_length: in theory also straightforward, as it should just be the sum of the norms of displacement vectors, but I spent a lot of time debating how to handle missing values. See extensive discussion on zulip.

In the end I opted for providing two alternative nan policies, and emitting a warning if too many values are missing. Expand the dropdown below for more details.

compute_path_length() function signatures and docstring

def compute_path_length(
    data: xr.DataArray,
    start: float | None = None,
    stop: float | None = None,
    nan_policy: Literal["drop", "scale"] = "drop",
    nan_warn_threshold: float = 0.2,
) -> xr.DataArray:
    """Compute the length of a path travelled between two time points.

    The path length is defined as the sum of the norms (magnitudes) of the
    displacement vectors between two time points ``start`` and ``stop``,
    which should be provided in the time units of the data array.
    If not specified, the minimum and maximum time coordinates of the data
    array are used as start and stop times, respectively.

    Parameters
    ----------
    data : xarray.DataArray
        The input data containing position information in Cartesian
        coordinates, with ``time`` and ``space`` among the dimensions.
    start : float, optional
        The time to consider as the path's starting point. If None (default),
        the minimum time coordinate in the data is used.
    stop : float, optional
        The time to consider as the path's end point. If None (default),
        the maximum time coordinate in the data is used.
    nan_policy : Literal["drop", "scale"], optional
        Policy to handle NaN (missing) values. Can be one of the ``"drop"``
        or ``"scale"``. Defaults to ``"drop"``. See Notes for more details
        on the two policies.
    nan_warn_threshold : float, optional
        If more than this proportion of values are missing in any point track,
        a warning will be emitted. Defaults to 0.2 (20%).

    Returns
    -------
    xarray.DataArray
        An xarray DataArray containing the computed path length.
        Will have the same dimensions as the input data, except for ``time``
        and ``space`` which will be removed.

    Notes
    -----
    Choosing ``nan_policy="drop"`` will drop NaN values from each point track
    before computing path length. This equates to assuming that a track
    follows a straight line between two valid points flanking a missing
    segment. Missing segments at the beginning or end of the specified
    time range are not counted. This approach tends to underestimate
    the path length, and the error increases with the number of missing
    values.

    Choosing ``nan_policy="scale"`` will adjust the path length based on the
    the proportion of valid segments per point track. For example, if only
    80% of segments are present, the path length will be computed based on
    these and the result will be divided by 0.8. This approach assumes
    that motion dynamics are similar across present and missing time
    segments, which may not be the case.

    """

I'm in two minds as to whether we need the start and stop arguments, as the user could easily select a time range using ds.position.sel(time=slice(0, 10)), before passing the data array to compute_path_length(). I've chosen to include them for now, but I'm open to counter-arguments, CC @b-peri.

References

Closes #147

How has this PR been tested?

For speed, I used the existing kinematics tests (added "speed" as a parameter) with a bit of modification needed.

Testing for the path length was harder because:

this variable lacks both time and space dimensions, so modifying existing kinematics tests to account for it would b complicated
With the two aforementioned nan_policies, there are several tricky edge cases, and I tried to cover them all. The uniform linear motion fixture has been instrumental in helping me implement , debug and test these nan policies.
I also had to test that the nan warning is emitted in the right scenarios and contains specific messages.

As a result I ended up adding 3 new unit tests to account for all of the above, but suggestion to streamline the tests are welcome.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

API docs should be automatically updated. We might consider mentioning the new functions in some of the examples in the future, but not necessary right now.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

sonarcloud · 2024-08-23T16:14:40Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

codecov · 2024-08-23T16:17:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.78%. Comparing base (b399ce0) to head (e09f42d).
Report is 6 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #280   +/-   ##
=======================================
  Coverage   99.77%   99.78%           
=======================================
  Files          15       15           
  Lines         909      947   +38     
=======================================
+ Hits          907      945   +38     
  Misses          2        2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lochhh

Thanks @niksirbi !
To summarise:

a few suggestions for wordings, which you're free to take or leave as usual
start, stop checks: I'm wondering if we should explicitly check these for slicing. While the checks are much more restrictive than xarray, do we need to do so? xarray, for instance, automatically constraints the slicing to the min, max, regardless of whether the provided start, stop values are within or out of range. if we drop our own checks, the relevant tests can then be much simpler
simplify _compute_path_length_drop_nan: hacking with ffill and let xarray's groupby take care of stacking - needs further testing?
simplify tests (reduce parametrisation, test only id_1 values in nan test)
resolve merge conflicts with main branch

movement/analysis/kinematics.py

tests/conftest.py

tests/test_unit/test_kinematics.py

lochhh · 2024-10-28T17:41:26Z

tests/test_unit/test_kinematics.py

+@pytest.mark.parametrize(
+    "nan_policy",
+    ["drop", "scale"],  # results should be same for both here
+)


we shouldn't need to test the nan policy here because test_path_length_with_nans() sufficiently covers this.

lochhh · 2024-10-28T17:55:51Z

tests/test_unit/test_kinematics.py

+                # 9 segments - 1 missing on edge
+                "centroid": np.sqrt(2) * 8,
+                # missing mid frames should have no effect
+                "left": np.sqrt(2) * 9,
+                "right": np.nan,  # all frames missing


while this is very clear, we could just have np.array([np.sqrt(2) * 8, np.sqrt(2) * 9, np.nan])

lochhh · 2024-10-28T17:57:18Z

tests/test_unit/test_kinematics.py

+        expected_array = xr.DataArray(
+            np.ones((2, 3)) * np.sqrt(2) * 9,
+            dims=["individuals", "keypoints"],
+            coords={
+                "individuals": position.coords["individuals"],
+                "keypoints": position.coords["keypoints"],
+            },
+        )
+        # insert expected path lengths for individual id_1
+        for kpt, value in expected_path_lengths_id_1.items():
+            target_loc = {"individuals": "id_1", "keypoints": kpt}
+            expected_array.loc[target_loc] = value
+        xr.testing.assert_allclose(path_length, expected_array)


following the above suggestion for using numpy array, we can directly get the path length values of id_1 and compare the numpy arrays ,without comparing the full data array.

Co-authored-by: Chang Huan Lo <[email protected]>

sonarcloud · 2024-10-31T19:55:08Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

niksirbi force-pushed the speed-and-distance-traveled branch from ebae6af to 129e8c1 Compare October 8, 2024 13:09

niksirbi changed the title ~~WIP: implement compute_speed and compute_distance_traveled~~ Implement compute_speed and compute_path_length Oct 8, 2024

niksirbi marked this pull request as ready for review October 10, 2024 16:35

niksirbi requested a review from lochhh October 10, 2024 16:35

niksirbi mentioned this pull request Oct 18, 2024

Split kinematics module into multiple files. #327

Open

niksirbi added 6 commits October 22, 2024 12:36

implement compute_speed and compute_path_length functions

58da185

added speed to existing kinematics unit test

8e878e6

rewrote compute_path_length with various nan policies

74e39be

unit test compute_path_length across time ranges

8d623ed

fixed and refactor compute_path_length and its tests

9f65a96

fixed docstring for compute_path_length

d089a9c

niksirbi force-pushed the speed-and-distance-traveled branch from 92e91f0 to d089a9c Compare October 22, 2024 11:37

lochhh requested changes Oct 28, 2024

View reviewed changes

niksirbi and others added 7 commits October 31, 2024 09:55

Accept suggestion on docstring wording

55220f2

Co-authored-by: Chang Huan Lo <[email protected]>

Remove print statement from test

74c7853

Co-authored-by: Chang Huan Lo <[email protected]>

Ensure nan report is printed

4cb449b

Co-authored-by: Chang Huan Lo <[email protected]>

adapt warning message match in test

40b1d6c

change 'any' to 'all'

4d0f69f

uniform wording across path length docstrings

9fb0b26

(mostly) leave time range validation to xarray slice

e09f42d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `compute_speed` and `compute_path_length` #280

Implement `compute_speed` and `compute_path_length` #280

niksirbi commented Aug 23, 2024 •

edited

Loading

sonarcloud bot commented Aug 23, 2024

codecov bot commented Aug 23, 2024 •

edited

Loading

lochhh left a comment •

edited

Loading

lochhh Oct 28, 2024

niksirbi Oct 31, 2024

lochhh Oct 28, 2024

lochhh Oct 28, 2024

sonarcloud bot commented Oct 31, 2024

Implement compute_speed and compute_path_length #280

Are you sure you want to change the base?

Implement compute_speed and compute_path_length #280

Conversation

niksirbi commented Aug 23, 2024 • edited Loading

Description

References

How has this PR been tested?

Is this a breaking change?

Does this PR require an update to the documentation?

Checklist:

sonarcloud bot commented Aug 23, 2024

Quality Gate passed

codecov bot commented Aug 23, 2024 • edited Loading

Codecov Report

lochhh left a comment • edited Loading

Choose a reason for hiding this comment

lochhh Oct 28, 2024

Choose a reason for hiding this comment

niksirbi Oct 31, 2024

Choose a reason for hiding this comment

lochhh Oct 28, 2024

Choose a reason for hiding this comment

lochhh Oct 28, 2024

Choose a reason for hiding this comment

sonarcloud bot commented Oct 31, 2024

Quality Gate passed

Implement `compute_speed` and `compute_path_length` #280

Implement `compute_speed` and `compute_path_length` #280

niksirbi commented Aug 23, 2024 •

edited

Loading

codecov bot commented Aug 23, 2024 •

edited

Loading

lochhh left a comment •

edited

Loading