Final Evaluation Process & Team Scores

Hello Participants,

Thank you all for your efforts and contributions. This post describes the final evaluation process and shares the final team scores.

Aggregation:

For the grand prize and special awards, we aggregated performance across Task 1, Task 2, and Task 3 without applying any weighting. Each interaction, regardless of task, was treated equally in the final scoring process. This approach ensures a fair comparison across different solution strategies.
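The unweighted aggregation described above amounts to a micro-average: all interactions from all tasks are pooled into one list, so a task with more interactions contributes proportionally more. A minimal sketch (the function name, data layout, and score values below are illustrative, not the official scoring code):

```python
def aggregate_score(task_results):
    """Micro-average: pool every interaction across tasks with equal weight.

    task_results maps a task name to a list of per-interaction scores
    (illustrative values only -- the actual per-interaction metric is
    defined by the challenge's scoring rules).
    """
    pooled = [s for scores in task_results.values() for s in scores]
    return sum(pooled) / len(pooled)

# A task with more interactions weighs more than in a mean-of-task-means:
example = {
    "task1": [1.0, 0.0, 0.0, 0.0],  # 4 interactions, task mean 0.25
    "task2": [1.0, 1.0],            # 2 interactions, task mean 1.00
}
print(aggregate_score(example))  # -> 0.5, not the mean of task means (0.625)
```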

Missing Rates:

We reviewed the missing rates of all submissions selected for final evaluation. The missing rates for the winning solutions were within acceptable limits. As a result, no top-performing teams were disqualified due to missing data.
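The missing-rate check above can be sketched as a simple fraction of unanswered interactions compared against a threshold. Both the detection rule and the `MAX_MISSING_RATE` value below are hypothetical, since the actual acceptable limit is not stated in this post:

```python
def missing_rate(responses):
    """Fraction of interactions with no usable answer (empty or "I don't know")."""
    missing = sum(
        1 for r in responses
        if not r or r.strip().lower() == "i don't know"
    )
    return missing / len(responses)

# Hypothetical threshold -- the real acceptable limit was not published here.
MAX_MISSING_RATE = 0.5

def within_limits(responses):
    """True if a submission's missing rate is within the acceptable limit."""
    return missing_rate(responses) <= MAX_MISSING_RATE
```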

Task Scores:

| Task | Team | Submission ID | Score |
| --- | --- | --- | --- |
| Task 1 | Dianping-Trust-Safety | 289765 | 12.8% |
| Task 1 | db3 | 289693 | 8.4% |
| Task 1 | cruise | 289794 | 6.7% |
| Task 2 | Team_NVIDIA | 289355 | 23.3% |
| Task 2 | db3 | 289788 | 22.1% |
| Task 2 | AcroYAMALEX | 289902 | 21.4% |
| Task 3 | db3 | 289655 | 36.8% |
| Task 3 | BlackPearl | 288641 | 30.9% |
| Task 3 | Dianping-Trust-Safety | 288234 | 29.7% |
| All egocentric images | db3 | 289693, 289788, 289655 | 21.0% |

Question Type Scores:

| Question Type | Team | Submission IDs (Task 1, Task 2, Task 3) | Score |
| --- | --- | --- | --- |
| Simple | NEC_AI_ROCKETS | 289393, 288443, 289337 | 15.9% |
| Multi-hop | otonadake | 289524, 286760, 286785 | 5.9% |
| Comparison and Aggregation | gogoogo | 288599, 287800, 288715 | 3.3% |
| Reasoning | otonadake | 289524, 286760, 286785 | 10.3% |

Team Meta CRAG-MM