🏆 Behind the Winning Strategy of Team db3 [Meta CRAG 2024]

As we gear up for the Meta CRAG-MM Challenge 2025, let’s revisit the standout approaches from last year’s competition. In this Winner Spotlight, we dive into the strategy of Team db3, which took the top spot across all three tasks in the Meta KDD Cup 2024 – CRAG Challenge. You can also read the full technical report here.

This deep dive is designed to inform and inspire participants aiming to push boundaries in retrieval-augmented generation (RAG) this year.


:mag: Challenge Overview: What Was CRAG 2024 All About?

The 2024 CRAG challenge focused on building RAG systems capable of sourcing relevant knowledge from web documents and mock knowledge graphs to answer complex queries. It tested not just retrieval and generation quality but also robustness and hallucination control.

Team db3, comprising Jiazun Chen and Yikuan Xia from Peking University, achieved:

  • :1st_place_medal: 1st in Task 1 – Retrieval Summarisation (28.4%)
  • :1st_place_medal: 1st in Task 2 – Knowledge Graph + Web Retrieval (42.7%)
  • :1st_place_medal: 1st in Task 3 – End-to-End RAG (47.8%)

:small_blue_diamond: Task 1: Retrieval Summarisation

Team db3 engineered a layered retrieval-generation pipeline:

  • Parse HTML with BeautifulSoup
  • Chunk text using LangChain into retrievable segments
  • Retrieve with the bge-base-en-v1.5 model
  • Rerank results using a custom relevance model
  • Add dynamic fallback: prompt the model to say “I don’t know” when uncertain
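The steps above can be sketched end to end. This is a minimal, hedged illustration only: db3 used BeautifulSoup, LangChain’s text splitters, dense bge-base-en-v1.5 embeddings, and a custom reranker, whereas the stand-ins below (stdlib HTML parsing, fixed-size chunking, token-overlap scoring, and the `threshold` value) are assumptions chosen to keep the sketch self-contained:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def chunk(text, size=200, overlap=50):
    """Overlapping character windows (stand-in for LangChain's splitter)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def score(query, passage):
    """Token-overlap score; db3 used dense bge-base-en-v1.5 embeddings instead."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / (len(q) or 1)


def retrieve(query, html, top_k=3, threshold=0.2):
    parser = TextExtractor()
    parser.feed(html)
    chunks = chunk(" ".join(parser.parts))
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    # Dynamic fallback: with no sufficiently relevant evidence, say "I don't know"
    if not ranked or score(query, ranked[0]) < threshold:
        return "I don't know", []
    return "answer from evidence", ranked[:top_k]
```

The fallback branch is the key robustness idea: when the best-scoring evidence stays below a relevance threshold, the system abstains rather than hallucinate.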

:small_blue_diamond: Tasks 2 & 3: Knowledge Graph + Web Integration

Their architecture evolved with more complex inputs and integrations:

  • Combine structured data (mock KGs) and unstructured web pages
  • Implement a Parent-Child Chunk Retriever for fine-grained retrieval
  • Use a tuned LLM to select and orchestrate API calls from a controlled, regularised set of APIs
  • Perform heavy reranking to ensure only the most relevant data reached the generator
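The Parent-Child Chunk Retriever idea can be sketched as follows. This is an illustrative reconstruction, not db3’s implementation: the splitter, the token-overlap scoring, and the chunk sizes are all simplifying assumptions. The core principle is real, though — match the query against small child chunks for precision, then hand the larger enclosing parent chunk to the generator for context:

```python
import re


def split(text, size):
    """Fixed-size character windows (a simple stand-in for a real splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def overlap_score(query, passage):
    """Count shared tokens; a dense retriever would use embeddings here."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p)


def parent_child_retrieve(query, documents, parent_size=400,
                          child_size=100, top_k=2):
    """Score small child chunks, but return their enclosing parent chunks."""
    scored = []  # (child_score, parent_text)
    for doc in documents:
        for parent in split(doc, parent_size):
            for child in split(parent, child_size):
                scored.append((overlap_score(query, child), parent))
    scored.sort(key=lambda x: x[0], reverse=True)
    seen, parents = set(), []
    for s, parent in scored:
        if s > 0 and parent not in seen:
            seen.add(parent)
            parents.append(parent)
        if len(parents) == top_k:
            break
    return parents
```

Small children keep the matching signal sharp; returning the parent avoids starving the generator of surrounding context.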

:small_blue_diamond: Hallucination Mitigation

To keep outputs grounded and reliable, the team:

  • Fine-tuned the model to rely only on retrieved evidence
  • Added constraints to reduce overconfident generations
  • Used Python-based calculators for numerical reasoning tasks
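The report mentions Python-based calculators for numerical reasoning, since LLMs are unreliable at arithmetic. A hedged sketch of what such a tool could look like (the function name and the exact operator whitelist are my assumptions, not db3’s code) is a safe expression evaluator built on the `ast` module:

```python
import ast
import operator

# Whitelisted operators: anything else raises, so model output
# cannot trigger arbitrary code execution the way eval() would.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculate(expression: str) -> float:
    """Safely evaluate an arithmetic expression emitted by the model."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

The generator then only has to produce the expression, e.g. `calculate("(10 - 4) / 3")`, and the exact result is substituted back into the answer.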

:busts_in_silhouette: Meet the Team

Jiazun Chen and Yikuan Xia are third-year PhD candidates at Peking University, advised by Professor Gao Jun.

Their research focuses on:

  • Community search in massive graph datasets
  • Graph alignment for cross-domain analysis
  • Table data fusion across heterogeneous sources

:arrows_counterclockwise: What Carries Over from 2024 to 2025?

While the Meta CRAG-MM Challenge 2025 takes a leap into multi-modal and multi-turn territory, several principles from db3’s approach remain highly applicable:

  • Structured + Unstructured Retrieval
    db3’s integration of knowledge graphs and web data directly informs Task 2 of CRAG-MM, which fuses image-KG with web search.

  • Hallucination Mitigation
    Their use of grounded generation and standardised fallback (“I don’t know”) is vital in MM-RAG, where conciseness and truthfulness are tightly evaluated.

  • Reranking and Retrieval Granularity
    Techniques like Parent-Child Chunk Retrieval can be adapted to visual-context-aware retrieval in 2025.

  • LLM-as-Controller
    db3’s LLM-mediated API selection prefigures the multi-turn query orchestration required in this year’s Task 3.

:jigsaw: In short: while the modality has evolved, the core disciplines—retrieval quality, grounding, and structured reasoning—remain front and centre. Studying the 2024 winning strategy is still a powerful head start for 2025.


Stay tuned for the next Winner Spotlight—and good luck with your submissions.

Read other winning strategies here: dRAGonRAnGers and md_dh
