CPDC Winner Spotlight: 💡 Ideas to Improve Solutions for Round 2

As we prepare for Round 2, we want to spotlight the winning strategies from CPDC 2023. These highlights offer practical insights and implementation tips to help strengthen your approach.

:one: Task 2 Winner
:bulb: Key Insight: Synthetic Data Generation + Modern Architectures
:1st_place_medal: First Place: Kuan-Yen Lin
Username: @biu_biu

Background: NLP practitioner specialising in dialogue systems and commonsense reasoning.

Winning Strategy

Multi-Phase Approach:

  • Baseline evaluation
  • Dataset augmentation
  • Model fine-tuning

Key Methods:

  • Evaluated ComFact baseline on hidden test set
  • Merged the ConvAI2 and PeaCoK datasets
  • Generated 20,000 synthetic conversations using GPT-3.5-Turbo
  • Fine-tuned DeBERTa-V3 with comprehensive hyperparameter search
  • Predicted head/tail facts both separately and jointly
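
The separate-versus-joint prediction idea above can be sketched as two ways of building classifier inputs. This is a minimal illustration, not the winner's actual code; the function and field names are invented.

```python
# Sketch: building sentence-pair inputs for persona-fact linking, either
# scoring head and tail facts separately or jointly in one example.
# All names here are illustrative, not from the winning solution.

def build_inputs(utterance, head, tail, mode="joint"):
    """Return (text_a, text_b) pairs for a sentence-pair classifier."""
    if mode == "separate":
        # Two independent examples: one per fact slot.
        return [
            (utterance, f"head: {head}"),
            (utterance, f"tail: {tail}"),
        ]
    # Joint: a single example scoring the whole fact at once.
    return [(utterance, f"head: {head} | tail: {tail}")]

pairs = build_inputs("I run marathons every spring.",
                     head="runner", tail="enjoys exercise",
                     mode="separate")
```

Comparing validation scores across the two modes is what surfaces the structural differences in the task.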

:bulb: Insight: Evaluating head and tail predictions both separately and jointly enabled structural insight into the task. His system, powered by synthetic data, a modern architecture, and rigorous tuning, proved effective for accurate persona-grounded knowledge linking.

:computer: Implementation Tips

Establish a strong baseline:

  • Run the baseline model on test data
  • Use results to identify weaknesses

Leverage synthetic data:

  • Combine existing persona datasets
  • Use GPT-3.5-Turbo to generate new labelled conversations
  • Balance the dataset for broader coverage
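
One way to drive generation with GPT-3.5-Turbo is to assemble a chat-completions payload per persona. The sketch below only builds the message payload; the instructions and persona facts are invented placeholders, and the actual API call is indicated in a comment.

```python
# Sketch: assembling a chat-completions payload to generate a synthetic
# persona-grounded conversation. The prompt wording and persona facts are
# invented placeholders, not the winner's actual prompts.

def make_generation_messages(persona_facts, n_turns=6):
    facts = "\n".join(f"- {f}" for f in persona_facts)
    system = ("You write short dialogues. The speaker's persona is:\n"
              f"{facts}\n"
              f"Write a {n_turns}-turn conversation that subtly reflects "
              "these facts, and label each turn with the fact it uses.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": "Generate one labelled conversation."},
    ]

messages = make_generation_messages(["I am a nurse", "I love hiking"])
# In practice this payload would be sent to the chat completions API,
# e.g. client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
```

Sampling personas from the merged datasets and varying the instructions helps balance coverage across fact types.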

Optimise model performance:

  • Use DeBERTa-V3 or equivalent
  • Perform deep hyperparameter tuning
  • Experiment with separate vs. joint prediction of facts
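
A comprehensive hyperparameter search can start from a simple grid enumeration. The ranges below are illustrative, not the winning configuration.

```python
import itertools

# Sketch: enumerating a hyperparameter grid for fine-tuning runs.
# The ranges are illustrative, not the winning configuration.
grid = {
    "learning_rate": [1e-5, 2e-5, 3e-5],
    "batch_size": [16, 32],
    "warmup_ratio": [0.0, 0.1],
}

configs = [dict(zip(grid, values))
           for values in itertools.product(*grid.values())]
# 3 * 2 * 2 = 12 configurations; each would be passed to a training run.
```

Each config would then parameterise one DeBERTa-V3 fine-tuning run, with the best kept by validation score.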

:two: Task 2 Runner-Up
:bulb: Key Insight: Relational Structure Understanding Through Natural Language
:2nd_place_medal: Second Place: Jinrui Liang
Username: @TieMoJi

Background: AI algorithm engineer at NetEase Games focusing on deep learning and NLP.

Winning Strategy

Core Focus: Enhancing relational understanding between persona facts using natural language templates

Key Methods:

  • Augmented data with head/tail entities and relations
  • Translated structured relations into natural language form
  • Reframed task as sentence-triple correlation
  • Designed multi-prompt training setup
  • Used multi-loss optimisation and model fusion
  • Adopted mixed precision training
  • Applied sample resampling for class balance
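
The template idea, turning a (head, relation, tail) triple into plain language and pairing it with a dialogue sentence, can be sketched as below. The templates themselves are invented examples, not those used by the runner-up; using several templates per triple also hints at the multi-prompt setup.

```python
# Sketch: verbalising (head, relation, tail) triples with several templates,
# so one triple yields multiple training prompts. The templates are invented
# examples, not those used by the runner-up.

TEMPLATES = [
    "{head} {relation} {tail}.",
    "The persona '{head}' is related to '{tail}' via '{relation}'.",
]

def verbalise(head, relation, tail):
    return [t.format(head=head, relation=relation, tail=tail)
            for t in TEMPLATES]

def make_pairs(sentence, head, relation, tail):
    # Reframe the task as sentence-triple correlation: each
    # (dialogue sentence, verbalised triple) pair receives a label.
    return [(sentence, v) for v in verbalise(head, relation, tail)]

pairs = make_pairs("I jog before work.", "the speaker", "desires", "staying fit")
```

A sentence-pair model then scores how well each verbalised triple correlates with the dialogue sentence.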

:bulb: Insight: His layered training setup and natural-language reframing of the task produced a generalisable architecture grounded in sound theory and engineering best practice.

:computer: Implementation Tips

Refine data structure:

  • Explicitly encode relational structure
  • Use templates to express (head, relation, tail) in plain language

Advance training techniques:

  • Apply multi-prompt strategies
  • Incorporate multi-loss training
  • Experiment with model fusion
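
Model fusion can be as simple as averaging the per-example probabilities of independently trained models. A minimal sketch, with illustrative inputs:

```python
# Sketch: fusing several models by averaging their per-example
# positive-class probabilities. Scores and weights are illustrative.

def fuse(score_lists, weights=None):
    """Average aligned score lists from multiple models."""
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    return [sum(w * scores[i] for w, scores in zip(weights, score_lists))
            for i in range(len(score_lists[0]))]

fused = fuse([[0.9, 0.2], [0.7, 0.4]])  # two models, two examples
# fused holds the per-example mean of the two models' scores
```

Unequal weights let a validation set decide how much to trust each model.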

Improve efficiency:

  • Use mixed precision to accelerate training
  • Apply resampling to fix class imbalance
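
Resampling for class balance can be sketched as simple oversampling of the minority label. This is purely illustrative; real pipelines often use library utilities instead.

```python
import random

# Sketch: oversample the minority class until labels are balanced.
# Purely illustrative; dataset records are (example, binary label) pairs.

def oversample(dataset, seed=0):
    rng = random.Random(seed)
    pos = [d for d in dataset if d[1] == 1]
    neg = [d for d in dataset if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Repeat random minority examples until both classes are the same size.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

data = oversample([("a", 1), ("b", 0), ("c", 0), ("d", 0)])
# 6 records: three negatives plus the positive repeated to match
```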

:three: Task 2 Third Place
:bulb: Key Insight: Generative LLMs for Multi-Turn Dialogue Processing
:3rd_place_medal: Third Place: Yiyang Zheng
Username: @yiyang_zheng
Team: Yiyang Zheng, Yingwu Yang

Background:

  • Yiyang Zheng: Undergraduate student at Shanghai University focused on NLP
  • Yingwu Yang: Machine learning practitioner in the financial sector

Winning Strategy

Core Focus: Using generative LLMs to manage complex multi-turn dialogues with subtle persona reasoning

Key Methods:

  • Fine-tuned Phi-2 on both official and open-source datasets
  • Selected Phi-2 for its balance between reasoning and efficiency
  • Focused on implicit and ambiguous dialogue-fact connections
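
A common way to prepare such data for a generative model is to cast each dialogue-fact pair as an instruction-response example. The template, labels, and example below are assumptions for illustration, not the team's actual format.

```python
# Sketch: casting dialogue-fact linking as a generative instruction task.
# The template and the example dialogue are illustrative assumptions,
# not the team's actual fine-tuning format.

def format_example(dialogue_turns, fact, linked):
    dialogue = "\n".join(dialogue_turns)
    prompt = ("Dialogue:\n"
              f"{dialogue}\n"
              f"Persona fact: {fact}\n"
              "Is this fact grounded in the dialogue? Answer yes or no.\n"
              "Answer:")
    return {"prompt": prompt, "completion": " yes" if linked else " no"}

ex = format_example(
    ["A: I was up all night on call again.", "B: Hospital shifts are rough."],
    fact="works as a doctor",
    linked=True,
)
# ex["prompt"] and ex["completion"] would feed a causal-LM supervised
# fine-tuning loop (e.g. Phi-2 with a standard SFT trainer).
```

Note the example links the fact only implicitly ("up all night on call"), the kind of subtle connection the team targeted.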

:bulb: Insight: Showed that generative LLMs like Phi-2 can effectively handle multi-turn, persona-grounded dialogue by reasoning through subtle context cues.

:computer: Implementation Tips

Choose the right LLM:

  • Consider compact models with strong reasoning (e.g., Phi-2)
  • Evaluate for multi-turn conversational capability

Focus on implicit reasoning:

  • Curate training examples with subtle persona links
  • Emphasise commonsense bridging in dialogue-fact alignment

Fine-tune for generalisability:

  • Combine various datasets
  • Retain a balance of general fluency and persona specificity
  • Test against diverse scenarios
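
Combining datasets usually needs at least light deduplication so repeated examples don't skew training. A minimal sketch, with made-up record fields:

```python
# Sketch: merging multiple labelled datasets while dropping exact duplicates.
# Record fields are made up for illustration.

def merge_datasets(*datasets):
    seen, merged = set(), []
    for ds in datasets:
        for record in ds:
            key = (record["text"], record["label"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged

official = [{"text": "I love hiking", "label": 1}]
extra = [{"text": "I love hiking", "label": 1},
         {"text": "I hate rain", "label": 0}]
merged = merge_datasets(official, extra)
# 2 records remain: the exact duplicate is dropped
```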