Trying something different in the spirit of ‘working in public’, I’ve made a sort of video diary of the first few hours on this challenge that led to my current entry. It’s up on YouTube: Come along for the ride as I dive into an ML contest (MABe 2 on AICROWD) - YouTube (skip to 18:00, where I decide to focus on this challenge).
To summarise the approach:
- Feed a sequence of frames into a Perceiver model (e.g. 120 frames @ 10 fps) and have it try to predict N frames ahead. This works as a self-supervised task.
- Extract the latent array from the Perceiver and use this to derive our embeddings.
- Combine these with some hand-crafted features like body angles, mouse separation, etc.
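To make the first two steps concrete, here’s a minimal sketch of the idea in PyTorch: a learned latent array cross-attends to a window of frames, a head predicts the next N frames (the self-supervised objective), and the latent array itself is what gets pooled into embeddings. All names and sizes here are illustrative placeholders, not the actual model from the entry.

```python
import torch
import torch.nn as nn

class TinyPerceiver(nn.Module):
    """Toy Perceiver-style encoder: latents cross-attend to the frame
    sequence, then a linear head predicts the next n_future frames.
    Sizes are hypothetical, chosen only to make the sketch run."""
    def __init__(self, frame_dim=28, latent_dim=40, n_latents=16, n_future=5):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, latent_dim))
        self.in_proj = nn.Linear(frame_dim, latent_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        self.self_attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(latent_dim * n_latents, frame_dim * n_future)
        self.n_future, self.frame_dim = n_future, frame_dim

    def forward(self, frames):                 # frames: (B, T, frame_dim)
        B = frames.shape[0]
        lat = self.latents.unsqueeze(0).expand(B, -1, -1)
        kv = self.in_proj(frames)
        lat, _ = self.cross_attn(lat, kv, kv)  # latents attend to the frames
        lat, _ = self.self_attn(lat, lat, lat) # latents mix with each other
        pred = self.head(lat.flatten(1))
        return pred.view(B, self.n_future, self.frame_dim), lat

# One self-supervised step: predict the next frames from the preceding window.
model = TinyPerceiver()
seq = torch.randn(8, 120, 28)    # e.g. 120 frames @ 10 fps of keypoint features
target = torch.randn(8, 5, 28)   # the next 5 frames (random stand-in data)
pred, latents = model(seq)
loss = nn.functional.mse_loss(pred, target)
loss.backward()
# `latents` (B, n_latents, latent_dim) is the array we'd pool/flatten
# into the per-frame embedding.
```

After training, you’d discard the prediction head and keep only the latent array as the representation.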
Using a latent dimension of 40 for the Perceiver, training very briefly, and submitting just the reduced latent array as the representation: 18/0.128. Using just the hand-crafted features: 18/0.135. Combining both with a few extra sprinkles: 13/0.173.
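For the combining step, one simple option (not necessarily what the entry does) is to concatenate the flattened latent array with the hand-crafted features, standardise, and project down to the 100-dim budget with PCA. A NumPy-only sketch with made-up shapes:

```python
import numpy as np

# Hypothetical shapes: random stand-ins for real per-frame features.
rng = np.random.default_rng(0)
latent_emb = rng.normal(size=(1000, 640))  # e.g. a flattened 16x40 latent array
hand_feats = rng.normal(size=(1000, 30))   # body angles, mouse separation, ...

# Concatenate, then standardise so neither source dominates the PCA.
X = np.concatenate([latent_emb, hand_feats], axis=1)
X = (X - X.mean(0)) / (X.std(0) + 1e-8)

# PCA via SVD down to the 100-dim submission representation.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
embedding = X @ Vt[:100].T                 # (1000, 100)
```

Standardising first matters here: the latent dims and hand-crafted features are on different scales, and PCA on the raw concatenation would mostly just recover whichever block has the larger variance.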
I can see that the trick with this contest is going to be packing that 100-dim representation with as many useful values as possible. What ideas do you have? Any other interesting SSL approaches? And has anyone had luck using the demo tasks to create useful embeddings? Let’s brainstorm