[Round-1 Winners] Here Are The Winners For Round-1

:wave:t3: Hello AIcrowd,

As Round-1 of the Multi-Agent Behavior Challenge 2022 comes to an end, let’s shine a spotlight on the winners of the Mouse Triplet Task and Fruit Fly Task.

Round-1 saw 224 participants make 700+ submissions. Thank you all for participating. Here are the winners of Round-1. :clap:

:honeybee: Fruit Fly Winners

:trophy: Leaderboard Ranks

:1st_place_medal: Zac Partridge
:2nd_place_medal: Team DIG111
:3rd_place_medal: Param Uttarwar

:mouse: Mouse Triplet Winners

:trophy: Leaderboard Ranks

:1st_place_medal: Param Uttarwar
:2nd_place_medal: Team Jerry Mouse
:3rd_place_medal: Zac Partridge

:notebook: Winning Solutions

@edhayes1 from :2nd_place_medal: Team Jerry Mouse shared his solution with the participants. Click here to learn from the solution that won second place in the :mouse: Mouse Triplet Task.

@Zac, the :1st_place_medal: first-place winner of the :honeybee: Fruit Fly Task, shares a casual overview of his approach here:

"I used a Bert style encoder, treating handcrafted features as the tokens and performing masked language modelling. Initially, every frame of key points is converted into a large number of handcrafted features representing angles, distances and velocities between body parts within an animal as well as features from each animal to all others. For the flies, I had 2222 features and 456 for the mice (but would use more in the future, especially for the mice) and these features are all normalised.

I’ll refer to the neural network as having a head, body and tail where the head and tail act on the single frame level and the body acts on the sequence level. The head is a two-layer fully connected network that reduces the input features down to the target dimension size (256 or 128). The body does the “language modelling” part - input is 512 partially masked tokens (masked before the head) and output is a sequence of the same shape (batch_size, seq_length, 128 or 256) but now hopefully includes higher-level sequence features. I used huggingface’s perceiver model for this - Perceiver.

The tail is a single linear layer and can be thought of in two parts: reconstructing the original unmasked features and predicting any known labels, so for example that would be of size 458 for the mice. The loss function I was using was mean square error for reconstructing the features and cross-entropy loss for the (non-nan) labels, where the mean square error loss was weighted approximately 10 times more."

Stay tuned for a more in-depth breakdown of his solution.
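To make the loss described in the quote a little more concrete, here is a minimal sketch in plain Python. It is not @Zac's code: the function names (`mse`, `masked_bce`, `combined_loss`) are hypothetical, the labels are assumed to be binary (the quote only says "cross-entropy"), and the weight of 10 is the approximate value he mentions. It shows a reconstruction term weighted about 10 times more than a label term that skips NaN (unknown) labels.

```python
import math


def mse(pred, target):
    """Mean squared error over two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def masked_bce(logits, labels):
    """Binary cross-entropy with logits, averaged over non-NaN labels only.

    Assumption: labels are 0.0 / 1.0, with NaN marking "label unknown".
    """
    terms = []
    for z, y in zip(logits, labels):
        if math.isnan(y):
            continue  # unknown label: contributes nothing to the loss
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        terms.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    return sum(terms) / len(terms) if terms else 0.0


def combined_loss(recon, features, logits, labels, mse_weight=10.0):
    """Weighted sum: reconstruction MSE (weighted ~10x) plus masked label loss."""
    return mse_weight * mse(recon, features) + masked_bce(logits, labels)


# Toy example for a single frame: two features, two labels (one unknown).
loss = combined_loss(
    recon=[1.0, 2.0],
    features=[1.0, 4.0],
    logits=[0.0, 5.0],
    labels=[1.0, float("nan")],
)
print(loss)
```

In a real training loop the same idea would be applied per frame across the whole (batch_size, seq_length, ...) output of the tail, using a tensor library rather than Python lists.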

:hourglass_flowing_sand: What Next? Participate In Round-2

In the new round, you’ll be given two sets of raw overhead videos and tracking data. As in Round 1, we ask you to submit a frame-by-frame representation of the dataset. We hope this video dataset will inspire you to try new ideas and see how much incorporating information from video improves your ability to represent animal behaviors! Explore the sub-tasks to learn more :point_down:

:mouse: Mouse Triplet Video Data
:ant: Ant & Beetle Video Data

The end goal remains the same: create a representation that captures behavior and generalizes well to any downstream task.

:trophy: $6000 USD cash prize pool for the two subtasks
:spiral_calendar: Round-2 runs till May 20th, 2022, 23:59 UTC
:medal_sports: Claim up to $400 AWS credits in Round-2
:writing_hand: Submit Your Solution to CVPR Workshop 2022