Hello Participants,
We want to thank all participants for their efforts and contributions. This post provides information about the final evaluation process and shares the final scores.
Aggregation:
For the grand prize and special awards, we aggregated performance across Task 1, Task 2, and Task 3 without applying any weighting. Each interaction, regardless of task, was treated equally in the final scoring process. This approach ensures a fair comparison across different solution strategies.
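As a rough illustration of what unweighted aggregation means here, the sketch below pools every interaction from all three tasks and takes a single average, then contrasts that with averaging the per-task averages. The scores, the task sizes, and the function names are all hypothetical; this is not the evaluation code, only a minimal example of the arithmetic under those assumptions.

```python
from statistics import mean

# Hypothetical per-interaction scores for one team, keyed by task.
# Values and task sizes are invented for illustration only.
scores_by_task = {
    "Task 1": [1.0, 0.0, 0.0, 0.0],
    "Task 2": [1.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    "Task 3": [1.0, 1.0],
}


def pooled_score(scores_by_task):
    """Average over all interactions, ignoring which task each came from."""
    all_scores = [s for task_scores in scores_by_task.values() for s in task_scores]
    return sum(all_scores) / len(all_scores)


def mean_of_task_means(scores_by_task):
    """Average of the per-task averages (shown for contrast only)."""
    return mean(mean(task_scores) for task_scores in scores_by_task.values())


print(f"pooled over interactions: {pooled_score(scores_by_task):.3f}")
print(f"mean of task means:       {mean_of_task_means(scores_by_task):.3f}")
```

When tasks contain different numbers of interactions, the two figures differ: pooling lets a larger task contribute more interactions, while no task receives an explicit weight.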
Missing Rates:
We reviewed the missing rates of all submissions selected for final evaluation. The missing rates for the winning solutions were within acceptable limits. As a result, no top-performing team was disqualified on account of its missing rate.
Task Scores
Task | Team | Submission ID | Score |
---|---|---|---|
Task 1 | Dianping-Trust-Safety | 289765 | 12.8% |
Task 1 | db3 | 289693 | 8.4% |
Task 1 | cruise | 289794 | 6.7% |
Task 2 | Team_NVIDIA | 289355 | 23.3% |
Task 2 | db3 | 289788 | 22.1% |
Task 2 | AcroYAMALEX | 289902 | 21.4% |
Task 3 | db3 | 289655 | 36.8% |
Task 3 | BlackPearl | 288641 | 30.9% |
Task 3 | Dianping-Trust-Safety | 288234 | 29.7% |
All egocentric images | db3 | 289693, 289788, 289655 | 21.0% |
Question Type Scores
Question Type | Team | Submission IDs (Task 1, Task 2, Task 3) | Score |
---|---|---|---|
Simple | NEC_AI_ROCKETS | 289393, 288443, 289337 | 15.9% |
Multi-hop | otonadake | 289524, 286760, 286785 | 5.9% |
Comparison and Aggregation | gogoogo | 288599, 287800, 288715 | 3.3% |
Reasoning | otonadake | 289524, 286760, 286785 | 10.3% |
Team Meta CRAG-MM