Is there an efficient implementation of this out there?
We could check out information retrieval / visual search techniques. I came across FAISS, a library for efficient similarity search and clustering of dense vectors, with support for running some of the algorithms on the GPU.
There is a very nice blogpost with more details about FAISS here
We could:
compute embeddings for regularly-spaced frames using DINOv2 or similar (in this blog post they suggest SIFT features)
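The frame sampling + embedding step could look roughly like the sketch below. The `embed` function here is a toy grid-average stand-in so the example is self-contained; in practice you would swap in DINOv2 embeddings or SIFT descriptors:

```python
import numpy as np

def sample_frame_indices(n_frames: int, stride: int) -> np.ndarray:
    """Indices of regularly-spaced frames."""
    return np.arange(0, n_frames, stride)

def embed(frame: np.ndarray, grid: int = 8) -> np.ndarray:
    """Toy embedding: mean intensity on a grid x grid block layout.
    Stand-in for a real feature extractor (DINOv2, SIFT, ...)."""
    gray = frame.mean(axis=-1) if frame.ndim == 3 else frame
    h, w = gray.shape
    gray = gray[: h // grid * grid, : w // grid * grid]
    return gray.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3)).ravel()

video = rng = np.random.default_rng(0).random((100, 64, 64, 3))  # fake RGB video
idx = sample_frame_indices(len(video), stride=10)
features = np.stack([embed(video[i]) for i in idx])
print(features.shape)  # (10, 64): 10 sampled frames, 64-dim embeddings
```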
I found a notebook that may be useful, doing something very similar to the above for image retrieval.
It would be nice to check:
if this approach is faster than the SLEAP / DLC methods (basically: is this worth it, or is it overkill?)
eventually, it could be cool to check if this has a significant effect on the trained model (e.g., are frames extracted in this way more "informative", so that we can get away with fewer of them?)
Given a video, we would like to be able to extract frames based on their pixel content.
Both DLC and SLEAP provide this option.
I understand the approach is roughly PCA -> k-means clustering -> sampling from those clusters.
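For reference, the PCA -> k-means -> cluster-sampling pipeline described above can be sketched with scikit-learn (the feature matrix is random and the component/cluster counts are illustrative, not DLC/SLEAP defaults):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.standard_normal((500, 256))   # one feature vector per frame

# Reduce dimensionality, then cluster frames in the reduced space.
reduced = PCA(n_components=32, random_state=0).fit_transform(features)
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(reduced)

# Draw one representative frame index from each cluster.
selected = [int(rng.choice(np.flatnonzero(labels == c))) for c in range(20)]
print(len(selected))  # 20 frame indices, one per cluster
```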