Is there an efficient implementation of this out there?
We could check out information retrieval / visual search techniques. I came across FAISS, a library for efficient similarity search and clustering of dense vectors, with support for running some of the algorithms on the GPU.
There is a very nice blogpost with more details about FAISS here
We could:
compute embeddings for regularly-spaced frames using DINOv2 or similar (in this blog post they suggest SIFT features)
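The frame sampling + embedding step could look roughly like the sketch below. The `embed` function here is a toy grid-average stand-in so the example is self-contained; in practice you would swap in DINOv2 embeddings or SIFT descriptors:

```python
import numpy as np

def sample_frame_indices(n_frames: int, stride: int) -> np.ndarray:
    """Indices of regularly-spaced frames."""
    return np.arange(0, n_frames, stride)

def embed(frame: np.ndarray, grid: int = 8) -> np.ndarray:
    """Toy embedding: mean intensity on a grid x grid block layout.
    Stand-in for a real feature extractor (DINOv2, SIFT, ...)."""
    gray = frame.mean(axis=-1) if frame.ndim == 3 else frame
    h, w = gray.shape
    gray = gray[: h // grid * grid, : w // grid * grid]
    return gray.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3)).ravel()

video = rng = np.random.default_rng(0).random((100, 64, 64, 3))  # fake RGB video
idx = sample_frame_indices(len(video), stride=10)
features = np.stack([embed(video[i]) for i in idx])
print(features.shape)  # (10, 64): 10 sampled frames, 64-dim embeddings
```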
I found a notebook that may be useful, doing something very similar to the above for image retrieval.
It would be nice to check:
if this approach is faster than the SLEAP / DLC methods (basically: is this worth it, or is it overkill?)
eventually, it could be cool to check if this has a significant effect on the trained model (e.g., are frames extracted in this way more "informative", so that we can get away with fewer of them?)
Given a video, we would like to be able to extract frames based on their pixel content.
Both DLC and SLEAP provide this option.
I understand the approach is roughly PCA -> k-means clustering -> sampling from those clusters.
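For reference, the PCA -> k-means -> cluster-sampling pipeline described above can be sketched with scikit-learn (the feature matrix is random and the component/cluster counts are illustrative, not DLC/SLEAP defaults):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.standard_normal((500, 256))   # one feature vector per frame

# Reduce dimensionality, then cluster frames in the reduced space.
reduced = PCA(n_components=32, random_state=0).fit_transform(features)
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(reduced)

# Draw one representative frame index from each cluster.
selected = [int(rng.choice(np.flatnonzero(labels == c))) for c in range(20)]
print(len(selected))  # 20 frame indices, one per cluster
```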