Final Evaluation Process & Team Scores

snehananavati · July 11, 2025, 10:46am

Hello Participants,

We want to thank all participants for their efforts and contributions. This post provides information about the final evaluation process and shares the final scores.

Aggregation:

For the grand prize and special awards, we aggregated performance across Task 1, Task 2, and Task 3 without applying any weighting. Each interaction, regardless of task, was treated equally in the final scoring process. This approach ensures a fair comparison across different solution strategies.
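The minimal Python sketch below shows how this unweighted (pooled) aggregation differs from averaging per-task scores. The per-interaction scores and the function names `pooled_score` and `macro_score` are hypothetical. The post does not describe the organizers' actual scoring code.

```python
from typing import Dict, List


def pooled_score(scores_by_task: Dict[str, List[float]]) -> float:
    """Unweighted aggregation: every interaction counts once, whatever its task.

    Equivalent to concatenating all per-interaction scores and averaging,
    so a task with more interactions contributes proportionally more.
    """
    all_scores = [s for scores in scores_by_task.values() for s in scores]
    if not all_scores:
        raise ValueError("no interactions to aggregate")
    return sum(all_scores) / len(all_scores)


def macro_score(scores_by_task: Dict[str, List[float]]) -> float:
    """For contrast: the mean of per-task means, giving each task equal weight."""
    task_means = [sum(s) / len(s) for s in scores_by_task.values() if s]
    return sum(task_means) / len(task_means)


# Hypothetical per-interaction scores (not real competition data).
example = {
    "task1": [1.0, 1.0, 1.0, 1.0],  # 4 interactions, mean 1.0
    "task2": [0.0, 0.0],            # 2 interactions, mean 0.0
    "task3": [0.0, 0.0],            # 2 interactions, mean 0.0
}

print(pooled_score(example))  # 4 / 8 = 0.5
print(macro_score(example))   # (1.0 + 0.0 + 0.0) / 3, about 0.333
```

In the pooled version, Task 1 supplies half of the interactions and therefore half of the weight. The result (0.5) differs from the per-task average (about 0.33).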

Missing Rates:

We reviewed the missing rates of all submissions selected for final evaluation. The missing rates for the winning solutions were within acceptable limits. As a result, no top-performing teams were disqualified due to missing data.

Task Scores

Task	Team	Submission ID	Score
Task 1	Dianping-Trust-Safety	289765	12.8%
	db3	289693	8.4%
	cruise	289794	6.7%
Task 2	Team_NVIDIA	289355	23.3%
	db3	289788	22.1%
	AcroYAMALEX	289902	21.4%
Task 3	db3	289655	36.8%
	BlackPearl	288641	30.9%
	Dianping-Trust-Safety	288234	29.7%
All egocentric images	db3	289693, 289788, 289655	21.0%

Question Type Scores

Question Type	Team	Submission ID (task 1, task 2, task 3)	Score
Simple	NEC_AI_ROCKETS	289393, 288443, 289337	15.9%
Multi-hop	otonadake	289524, 286760, 286785	5.9%
Comparison and Aggregation	gogoogo	288599, 287800, 288715	3.3%
Reasoning	otonadake	289524, 286760, 286785	10.3%

Team Meta CRAG-MM

snehananavati · July 12, 2025, 12:17pm

tereka · July 13, 2025, 8:32pm

@snehananavati @yilun_jin @Jiaqi
Thank you for sharing the detailed results.
This is tereka, a member of team AcroYAMALEX.

My submission ID 289902 is our Task 3 submission.
I also checked other teams: 289355 (Team_NVIDIA) is a Task 3 submission, and 289655 (db3) is a Task 2 submission.

Were the Task 2 and Task 3 results swapped?

l0wang · July 14, 2025, 2:32am

Could the organizers confirm whether the Task 2 and Task 3 results might have been mixed up?

l0wang · July 14, 2025, 2:39am

For Task 3, commit ID 288641 appears to be our team's (Dianping-Trust-Safety) submission.

Jiaqi · July 16, 2025, 9:55pm

@tereka @l0wang

Can you clarify what you are referring to as task2 and task3?
We did notice that some teams confused Task 2 and Task 3 when submitting the Google Form.

Task 2 is multi-source augmentation
Task 3 is multi-turn QA

Task 2 and Task 3 have different numbers of interactions, and the total counts matched, so it's unlikely that the two were swapped.
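This argument works as a consistency check. Because the two tasks differ in size, a result set filed under the wrong task would not match that task's expected interaction count. The Python sketch below illustrates the check. The counts and the `mismatched_labels` helper are hypothetical, and the thread does not state the real number of interactions per task.

```python
from typing import Dict, List

# Hypothetical sizes for illustration only; the real number of
# interactions per task is not stated in this thread.
EXPECTED_COUNTS: Dict[str, int] = {"task2": 1200, "task3": 800}


def mismatched_labels(result_counts: Dict[str, int]) -> List[str]:
    """Return the task labels whose scored-interaction count differs from
    the expected size of that task. Because the expected sizes differ,
    swapping the Task 2 and Task 3 results flags both labels."""
    return [
        task
        for task, expected in EXPECTED_COUNTS.items()
        if result_counts.get(task) != expected
    ]


print(mismatched_labels({"task2": 1200, "task3": 800}))  # []
print(mismatched_labels({"task2": 800, "task3": 1200}))  # ['task2', 'task3']
```

A check like this only detects results filed under the wrong task. It cannot detect submission IDs swapped between teams within the same task.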

Jiaqi · July 16, 2025, 9:58pm

@l0wang thanks for catching this. The submission IDs were wrong, but the scores and rankings are correct.

Here is the correct mapping. We will update the table soon.
BlackPearl, 288234 → 30.9%
Dianping-Trust-Safety, 288641 → 29.7%

tereka · July 17, 2025, 3:20am

Thank you for the reply.
I understand now. I had just mixed up the names of Task 2 and Task 3.