As we gear up for the new round of the Meta CRAG-MM Challenge 2025, let’s revisit the standout approaches from last year’s competition. In this Winner Spotlight, we dive into the strategy of Team db3, who took the top spot across all three tasks in the Meta KDD Cup 2024 – CRAG Challenge. You can also read their complete technical report here.
This deep dive is designed to inform and inspire participants aiming to push boundaries in retrieval-augmented generation (RAG) this year.
Challenge Overview: What Was CRAG 2024 All About?
The 2024 CRAG challenge focused on building RAG systems capable of sourcing relevant knowledge from web documents and mock knowledge graphs to answer complex queries. It tested not just retrieval and generation quality but also robustness and hallucination control.
Team db3, comprising Jiazun Chen and Yikuan Xia from Peking University, achieved:
- 1st in Task 1 – Retrieval Summarisation (28.4%)
- 1st in Task 2 – Knowledge Graph + Web Retrieval (42.7%)
- 1st in Task 3 – End-to-End RAG (47.8%)
Task 1: Retrieval Summarisation
Team db3 engineered a layered retrieval-generation pipeline:
- Parse HTML with BeautifulSoup
- Chunk text using LangChain into retrievable segments
- Retrieve with the bge-base-en-v1.5 embedding model
- Rerank results using a custom relevance model
- Add a dynamic fallback: prompt the model to say “I don’t know” when uncertain (see the pipeline sketch below)
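To make the pipeline concrete, here is a minimal sketch of those steps in Python. The chunk sizes, score threshold, and the use of bge-reranker-base as a stand-in for db3’s custom relevance model are illustrative assumptions, not the team’s exact configuration.

```python
# Minimal sketch of a Task 1-style retrieve-rerank pipeline (assumptions noted above).
from bs4 import BeautifulSoup
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import CrossEncoder, SentenceTransformer, util

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-base")  # stand-in for db3's custom reranker

def build_chunks(html_pages, chunk_size=512, overlap=64):
    """Strip HTML to plain text with BeautifulSoup, then chunk with LangChain."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    chunks = []
    for html in html_pages:
        text = BeautifulSoup(html, "html.parser").get_text(separator=" ")
        chunks.extend(splitter.split_text(text))
    return chunks

def retrieve_and_rerank(query, chunks, recall_k=20, top_k=5, min_score=0.0):
    """Dense retrieval with bge-base-en-v1.5, then cross-encoder reranking.

    Returns an empty list when nothing clears the score floor, so the
    generator can fall back to "I don't know" instead of guessing.
    """
    q_emb = embedder.encode(query, normalize_embeddings=True)
    c_emb = embedder.encode(chunks, normalize_embeddings=True)
    sims = util.cos_sim(q_emb, c_emb)[0].tolist()
    candidates = [c for c, _ in sorted(zip(chunks, sims), key=lambda x: -x[1])[:recall_k]]
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return [c for c, s in ranked[:top_k] if s >= min_score]
```

Returning an empty context when no chunk scores well enough is what lets the downstream prompt trigger the “I don’t know” fallback rather than generating from weak evidence.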
Tasks 2 & 3: Knowledge Graph + Web Integration
Their architecture evolved with more complex inputs and integrations:
- Combine structured data (mock KGs) and unstructured web pages
- Implement a Parent-Child Chunk Retriever for fine-grained retrieval (sketched below)
- Use a tuned LLM to orchestrate API calls drawn from a controlled, regularised set
- Perform heavy reranking to ensure only the most relevant data reached the generator
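The Parent-Child Chunk Retriever is worth illustrating: small child chunks are embedded and matched against the query for precision, while the larger parent passages they belong to are what reach the generator for fuller context. The sketch below assumes the same bge-base-en-v1.5 embedder as above; the chunk sizes are illustrative, not db3’s exact settings.

```python
# Minimal sketch of parent-child chunk retrieval (illustrative sizes, see note above).
from dataclasses import dataclass
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")

@dataclass
class Child:
    text: str
    parent_id: int  # index of the larger parent passage this chunk came from

def split_parent_child(documents, parent_size=1200, child_size=200):
    """Split each document into large parent passages and small child chunks."""
    parents, children = [], []
    for doc in documents:
        for i in range(0, len(doc), parent_size):
            parent = doc[i:i + parent_size]
            pid = len(parents)
            parents.append(parent)
            for j in range(0, len(parent), child_size):
                children.append(Child(parent[j:j + child_size], pid))
    return parents, children

def retrieve_parents(query, parents, children, top_k=3):
    """Match the query against fine-grained children, then return their parents."""
    q_emb = embedder.encode(query, normalize_embeddings=True)
    c_emb = embedder.encode([c.text for c in children], normalize_embeddings=True)
    scores = util.cos_sim(q_emb, c_emb)[0].tolist()
    best = sorted(range(len(children)), key=lambda i: -scores[i])
    seen, results = set(), []
    for i in best:  # deduplicate parents while preserving rank order
        pid = children[i].parent_id
        if pid not in seen:
            seen.add(pid)
            results.append(parents[pid])
        if len(results) == top_k:
            break
    return results
```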
Hallucination Mitigation
To keep outputs grounded and reliable, the team:
- Fine-tuned the model to rely only on retrieved evidence
- Added constraints to reduce overconfident generations
- Used Python-based calculators for numerical reasoning tasks (sketched below)
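The calculator idea is simple to sketch: for numerical questions, the model is prompted to emit a plain arithmetic expression rather than a final number, and the pipeline evaluates it in Python. The CALC(...) convention and the restricted evaluator below are illustrative assumptions, not db3’s exact implementation.

```python
# Minimal sketch of a "Python calculator" tool: the LLM emits an expression,
# the pipeline computes it with a restricted parser instead of eval().
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a purely arithmetic expression without exec/eval."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expr}")
    return _eval(ast.parse(expr, mode="eval"))

# Hypothetical convention: the model answers numerical questions with CALC(<expr>),
# which the pipeline intercepts and computes.
answer = "CALC((52.3 - 48.1) / 48.1 * 100)"
if answer.startswith("CALC(") and answer.endswith(")"):
    print(round(safe_eval(answer[5:-1]), 2))  # -> 8.73
```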
Meet the Team
Jiazun Chen and Yikuan Xia are third-year PhD candidates at Peking University, advised by Professor Gao Jun.
Their research focuses on:
- Community search in massive graph datasets
- Graph alignment for cross-domain analysis
- Table data fusion across heterogeneous sources
What Carries Over from 2024 to 2025?
While the Meta CRAG-MM Challenge 2025 takes a leap into multi-modal and multi-turn territory, several principles from db3’s approach remain highly applicable:
- Structured + Unstructured Retrieval: db3’s integration of knowledge graphs and web data directly informs Task 2 of CRAG-MM, which fuses image-KG with web search.
- Hallucination Mitigation: their use of grounded generation and a standardised fallback (“I don’t know”) is vital in MM-RAG, where conciseness and truthfulness are tightly evaluated.
- Reranking and Retrieval Granularity: techniques like Parent-Child Chunk Retrieval can be adapted to visual-context-aware retrieval in 2025.
- LLM-as-Controller: db3’s LLM-mediated API selection prefigures the multi-turn query orchestration required in this year’s Task 3.
In short: while the modality has evolved, the core disciplines—retrieval quality, grounding, and structured reasoning—remain front and centre. Studying the 2024 winning strategy is still a powerful head start for 2025.
Stay tuned for the next Winner Spotlight—and good luck with your submissions.
Read other winning strategies here: dRAGonRAnGers and md_dh