Avoid expensive processing operations when loading dense pose data #424

Open · glopesdev opened this issue Oct 2, 2024 · 1 comment
Labels: proposal (Request for a new feature)

glopesdev (Contributor) commented Oct 2, 2024

While working on #421 I noticed we are currently doing a lot of manipulation and reorganization of the data at the level of the pose reader, specifically running `apply` over the entire DataFrame to collapse columns into a single dictionary:

```python
if bonsai_sleap_v == BONSAI_SLEAP_V3:
    # combine all identity_likelihood cols into a single col as dict
    part_data["identity_likelihood"] = part_data.apply(
        lambda row: {identity: row[f"{identity}_likelihood"] for identity in identities}, axis=1
    )
    part_data.drop(columns=columns[1 : (len(identities) + 1)], inplace=True)
    part_data = part_data[  # reorder columns
        ["identity", "identity_likelihood", f"{part}_x", f"{part}_y", f"{part}_likelihood"]
    ]
    part_data.insert(2, "part", part)
    part_data.columns = new_columns
```

This unnecessarily slows down parsing of the raw data (both the use of `apply` and the allocation of many dictionaries), especially over long time intervals. Is this really necessary? I think the philosophy for readers should be, as much as possible, to simply load the raw data as-is. This code also scales poorly: performance degrades the faster we run our cameras and the more possible identities we have.
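
For context, even a vectorized replacement of the `apply` call (a rough sketch below, with `identities` and `part_data` taken from the snippet above) would still allocate one dict per row, so it only softens the problem rather than fixing it:

```python
# Rough sketch, not a proposal to keep this in the reader: replace the
# per-row apply with a single vectorized pass. Still allocates one dict
# per row, so the underlying cost remains.
likelihood_cols = [f"{identity}_likelihood" for identity in identities]
part_data["identity_likelihood"] = (
    part_data[likelihood_cols]
    .rename(columns=dict(zip(likelihood_cols, identities)))
    .to_dict(orient="records")
)
```

This is why moving the transformation out of the reader entirely seems like the cleaner fix.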

We could add a flag similar to `downsample` for encoder data if we really wanted to preserve backwards compatibility, but I feel we should instead do this dictionary transformation post-hoc in a utility function.

Doing this post-hoc has the added benefit that the coalescing code only runs over the final "cropped" data when we are reading a time range.
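
A minimal sketch of what such a utility could look like (the function name and signature are hypothetical, and the column names are assumed to follow the `{identity}_likelihood` pattern from the snippet above):

```python
import pandas as pd


def coalesce_identity_likelihood(data: pd.DataFrame, identities: list[str]) -> pd.DataFrame:
    """Collapse per-identity likelihood columns into a single dict column.

    Hypothetical post-hoc helper: it operates on the frame the reader
    returns, so when a time range is requested the cost scales with the
    cropped data rather than the full raw file.
    """
    likelihood_cols = [f"{identity}_likelihood" for identity in identities]
    out = data.copy()
    out["identity_likelihood"] = (
        out[likelihood_cols]
        .rename(columns=dict(zip(likelihood_cols, identities)))
        .to_dict(orient="records")
    )
    return out.drop(columns=likelihood_cols)
```

Callers that still want the current dict-based shape would then apply this to the cropped result, while the reader itself keeps returning the raw wide-format columns as-is.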
