🏆 Behind the Winning Strategy of Team db3 [Meta CRAG 2024]

As we gear up for the Meta CRAG-MM Challenge 2025, let’s revisit the standout approaches from last year’s competition. In this Winner Spotlight, we dive into the strategy of Team db3, which took the top spot across all three tasks in the Meta KDD Cup 2024 – CRAG Challenge. You can also read the full technical report here.

This deep dive is designed to inform and inspire participants aiming to push boundaries in retrieval-augmented generation (RAG) this year.


:mag: Challenge Overview: What Was CRAG 2024 All About?

The 2024 CRAG challenge focused on building RAG systems capable of sourcing relevant knowledge from web documents and mock knowledge graphs to answer complex queries. It tested not just retrieval and generation quality but also robustness and hallucination control.

Team db3, comprising Jiazun Chen and Yikuan Xia from Peking University, achieved:

  • :1st_place_medal: 1st in Task 1 – Retrieval Summarisation (28.4%)
  • :1st_place_medal: 1st in Task 2 – Knowledge Graph + Web Retrieval (42.7%)
  • :1st_place_medal: 1st in Task 3 – End-to-End RAG (47.8%)

:small_blue_diamond: Task 1: Retrieval Summarisation

Team db3 engineered a layered retrieval-generation pipeline:

  • Parse HTML with BeautifulSoup
  • Chunk text using LangChain into retrievable segments
  • Retrieve with the bge-base-en-v1.5 model
  • Rerank results using a custom relevance model
  • Add dynamic fallback: prompt the model to say “I don’t know” when uncertain
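The steps above can be sketched end to end. This is a minimal, hedged illustration only: db3 used BeautifulSoup, LangChain’s text splitters, dense bge-base-en-v1.5 embeddings, and a custom reranker, whereas the stand-ins below (stdlib HTML parsing, fixed-size chunking, token-overlap scoring, and the `threshold` value) are assumptions chosen to keep the sketch self-contained:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def chunk(text, size=200, overlap=50):
    """Overlapping character windows (stand-in for LangChain's splitter)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def score(query, passage):
    """Token-overlap score; db3 used dense bge-base-en-v1.5 embeddings instead."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / (len(q) or 1)


def retrieve(query, html, top_k=3, threshold=0.2):
    parser = TextExtractor()
    parser.feed(html)
    chunks = chunk(" ".join(parser.parts))
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    # Dynamic fallback: with no sufficiently relevant evidence, say "I don't know"
    if not ranked or score(query, ranked[0]) < threshold:
        return "I don't know", []
    return "answer from evidence", ranked[:top_k]
```

The fallback branch is the key robustness idea: when the best-scoring evidence stays below a relevance threshold, the system abstains rather than hallucinate.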

:small_blue_diamond: Tasks 2 & 3: Knowledge Graph + Web Integration

Their architecture evolved with more complex inputs and integrations:

  • Combine structured data (mock KGs) and unstructured web pages
  • Implement a Parent-Child Chunk Retriever for fine-grained retrieval
  • Use a tuned LLM to select and orchestrate API calls from a controlled, regularised set of APIs
  • Perform heavy reranking to ensure only the most relevant data reached the generator
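The Parent-Child Chunk Retriever idea can be sketched as follows. This is an illustrative reconstruction, not db3’s implementation: the splitter, the token-overlap scoring, and the chunk sizes are all simplifying assumptions. The core principle is real, though — match the query against small child chunks for precision, then hand the larger enclosing parent chunk to the generator for context:

```python
import re


def split(text, size):
    """Fixed-size character windows (a simple stand-in for a real splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def overlap_score(query, passage):
    """Count shared tokens; a dense retriever would use embeddings here."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p)


def parent_child_retrieve(query, documents, parent_size=400,
                          child_size=100, top_k=2):
    """Score small child chunks, but return their enclosing parent chunks."""
    scored = []  # (child_score, parent_text)
    for doc in documents:
        for parent in split(doc, parent_size):
            for child in split(parent, child_size):
                scored.append((overlap_score(query, child), parent))
    scored.sort(key=lambda x: x[0], reverse=True)
    seen, parents = set(), []
    for s, parent in scored:
        if s > 0 and parent not in seen:
            seen.add(parent)
            parents.append(parent)
        if len(parents) == top_k:
            break
    return parents
```

Small children keep the matching signal sharp; returning the parent avoids starving the generator of surrounding context.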

:small_blue_diamond: Hallucination Mitigation

To keep outputs grounded and reliable, the team:

  • Fine-tuned the model to rely only on retrieved evidence
  • Added constraints to reduce overconfident generations
  • Used Python-based calculators for numerical reasoning tasks
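The report mentions Python-based calculators for numerical reasoning, since LLMs are unreliable at arithmetic. A hedged sketch of what such a tool could look like (the function name and the exact operator whitelist are my assumptions, not db3’s code) is a safe expression evaluator built on the `ast` module:

```python
import ast
import operator

# Whitelisted operators: anything else raises, so model output
# cannot trigger arbitrary code execution the way eval() would.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculate(expression: str) -> float:
    """Safely evaluate an arithmetic expression emitted by the model."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

The generator then only has to produce the expression, e.g. `calculate("(10 - 4) / 3")`, and the exact result is substituted back into the answer.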

:busts_in_silhouette: Meet the Team

Jiazun Chen and Yikuan Xia are third-year PhD candidates at Peking University, advised by Professor Gao Jun.

Their research focuses on:

  • Community search in massive graph datasets
  • Graph alignment for cross-domain analysis
  • Table data fusion across heterogeneous sources

:arrows_counterclockwise: What Carries Over from 2024 to 2025?

While the Meta CRAG-MM Challenge 2025 takes a leap into multi-modal and multi-turn territory, several principles from db3’s approach remain highly applicable:

  • Structured + Unstructured Retrieval
    db3’s integration of knowledge graphs and web data directly informs Task 2 of CRAG-MM, which fuses image-KG with web search.

  • Hallucination Mitigation
    Their use of grounded generation and standardised fallback (“I don’t know”) is vital in MM-RAG, where conciseness and truthfulness are tightly evaluated.

  • Reranking and Retrieval Granularity
    Techniques like Parent-Child Chunk Retrieval can be adapted to visual-context-aware retrieval in 2025.

  • LLM-as-Controller
    db3’s LLM-mediated API selection prefigures the multi-turn query orchestration required in this year’s Task 3.

:jigsaw: In short: while the modality has evolved, the core disciplines—retrieval quality, grounding, and structured reasoning—remain front and centre. Studying the 2024 winning strategy is still a powerful head start for 2025.


Stay tuned for the next Winner Spotlight—and good luck with your submissions.

Read other winning strategies here: dRAGonRAnGers and md_dh
