Avoid expensive processing operations when loading dense pose data #424

Open · glopesdev opened this issue Oct 2, 2024 · 1 comment
Labels: proposal (Request for a new feature)

glopesdev (Contributor) commented Oct 2, 2024

While working on #421 I noticed we are currently doing a lot of manipulation and reorganization of the data at the level of the pose reader, specifically running `apply` over the entire DataFrame to collapse columns into a single dictionary:

```python
if bonsai_sleap_v == BONSAI_SLEAP_V3:
    # combine all identity_likelihood cols into a single col as dict
    part_data["identity_likelihood"] = part_data.apply(
        lambda row: {identity: row[f"{identity}_likelihood"] for identity in identities}, axis=1
    )
    part_data.drop(columns=columns[1 : (len(identities) + 1)], inplace=True)
    part_data = part_data[  # reorder columns
        ["identity", "identity_likelihood", f"{part}_x", f"{part}_y", f"{part}_likelihood"]
    ]
    part_data.insert(2, "part", part)
    part_data.columns = new_columns
```

This unnecessarily slows down parsing of the raw data (both the use of `apply` and the allocation of many dictionaries), especially over long time intervals. Is this really necessary? I think the philosophy for readers should be, as much as possible, to simply load the raw data as-is. This code also scales poorly: performance degrades the faster we run our cameras and the more possible identities we have.
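
For context, even a vectorized replacement of the `apply` call (a rough sketch below, with `identities` and `part_data` taken from the snippet above) would still allocate one dict per row, so it only softens the problem rather than fixing it:

```python
# Rough sketch, not a proposal to keep this in the reader: replace the
# per-row apply with a single vectorized pass. Still allocates one dict
# per row, so the underlying cost remains.
likelihood_cols = [f"{identity}_likelihood" for identity in identities]
part_data["identity_likelihood"] = (
    part_data[likelihood_cols]
    .rename(columns=dict(zip(likelihood_cols, identities)))
    .to_dict(orient="records")
)
```

This is why moving the transformation out of the reader entirely seems like the cleaner fix.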

We could add a flag similar to `downsample` for encoder data if we really wanted to preserve backwards compatibility, but I feel we should instead do this dictionary transformation post-hoc in a utility function.

Doing this post-hoc has the added benefit that the coalescing code only runs over the final "cropped" data when we are reading a time range.
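
A minimal sketch of what such a utility could look like (the function name and signature are hypothetical, and the column names are assumed to follow the `{identity}_likelihood` pattern from the snippet above):

```python
import pandas as pd


def coalesce_identity_likelihood(data: pd.DataFrame, identities: list[str]) -> pd.DataFrame:
    """Collapse per-identity likelihood columns into a single dict column.

    Hypothetical post-hoc helper: it operates on the frame the reader
    returns, so when a time range is requested the cost scales with the
    cropped data rather than the full raw file.
    """
    likelihood_cols = [f"{identity}_likelihood" for identity in identities]
    out = data.copy()
    out["identity_likelihood"] = (
        out[likelihood_cols]
        .rename(columns=dict(zip(likelihood_cols, identities)))
        .to_dict(orient="records")
    )
    return out.drop(columns=likelihood_cols)
```

Callers that still want the current dict-based shape would then apply this to the cropped result, while the reader itself keeps returning the raw wide-format columns as-is.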
