Team NVIDIA, a group of data scientists and technologists, brought diverse skills to the KDD Cup 2024. Key members include Gilberto, a former #1-ranked Kaggle competitor with a background in electrical engineering; Chris, who holds a Ph.D. in computational science and mathematics and has worked across several fields; Benedikt Schifferer, a manager of Applied Research specializing in recommender systems and LLMs; Ivan Sorokin, a senior LLM technologist; Ahmet Erdem, a Kaggle Grandmaster and open-source contributor; and Simon, a senior LLM technologist focused on deep learning for computer vision and NLP.
Winning Strategy:
Team NVIDIA’s strategy for the KDD Cup 2024 centered on five fine-tuned Qwen2-72B LLMs, one for each of the competition’s tracks, combining modern fine-tuning techniques with substantial computational resources:
- Data Transformation and Model Training:
• The team transformed data from six public datasets, including Amazon-M2 and MMLU, into 500k question-answer pairs across 40 tasks and 5 task types.
• They fine-tuned multiple Qwen2-72B models with QLoRA on a node of eight NVIDIA A100 80 GB GPUs, using the Axolotl training framework with DeepSpeed for efficiency.
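The data-transformation step above can be sketched as a simple record-to-example conversion. The field names, task wording, and labeling rule below are hypothetical illustrations, not the team's actual schema:

```python
# Sketch of turning one raw dataset record into an instruction-style
# question-answer pair, in the spirit of the team's conversion of six
# public datasets into 500k QA pairs. Record fields ("title", "review",
# "stars") and the positive/negative rule are illustrative assumptions.

def to_qa_pair(record: dict) -> dict:
    """Convert a product-review record into a QA training example."""
    question = (
        "You are a helpful online shopping assistant.\n"
        f"Product: {record['title']}\n"
        "Question: Is this product review positive or negative?\n"
        f"Review: {record['review']}"
    )
    answer = "positive" if record["stars"] >= 4 else "negative"
    return {"question": question, "answer": answer}

example = {"title": "USB-C Cable", "review": "Charges fast, great value.", "stars": 5}
pair = to_qa_pair(example)
```

Casting every source dataset into one uniform QA format like this is what lets a single model family be fine-tuned across all 40 tasks at once.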
- Model Optimization and Fine-Tuning:
• Fine-tuning involved adjusting LoRA parameters and experimenting with different weights for model adapters to optimize performance across various tasks.
• The models were trained with specific prompts tailored to simulate an online shopping assistant, enhancing task-specific performance.
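The adapter-weighting experiments mentioned above can be illustrated with the underlying LoRA arithmetic: each adapter contributes a low-rank update (alpha / r) * B @ A, and several adapters can be blended with tunable weights. The shapes and values below are toy examples; real Qwen2-72B layers are far larger:

```python
# Minimal sketch of a LoRA low-rank update and of blending several
# adapters with different weights, per the team's experiments with
# "different weights for model adapters". Pure-Python matrices stand
# in for real tensors; sizes and blend weights are assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_delta(A, B, alpha, r):
    """LoRA update to a weight matrix: (alpha / r) * B @ A."""
    scale = alpha / r
    return [[scale * x for x in row] for row in matmul(B, A)]

def merge(W, deltas, weights):
    """Add a weighted sum of adapter deltas onto base weights W."""
    out = [row[:] for row in W]
    for d, w in zip(deltas, weights):
        for i, row in enumerate(d):
            for j, x in enumerate(row):
                out[i][j] += w * x
    return out

# Base 2x2 weight and two rank-1 adapters (r=1, alpha=1).
W = [[1.0, 0.0], [0.0, 1.0]]
A1, B1 = [[1.0, 0.0]], [[1.0], [0.0]]   # update touches entry (0, 0)
A2, B2 = [[0.0, 1.0]], [[0.0], [1.0]]   # update touches entry (1, 1)
d1 = lora_delta(A1, B1, alpha=1, r=1)
d2 = lora_delta(A2, B2, alpha=1, r=1)
W_merged = merge(W, [d1, d2], weights=[0.5, 0.5])
```

Because the deltas are low-rank, many adapters can be trained cheaply against one frozen base model and then blended, which is what makes per-task tuning of the blend weights practical.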
- Quantization and Inference Optimization:
• To meet the competition’s strict hardware and inference-time constraints, Team NVIDIA applied 4-bit AWQ quantization and batched inference with the vLLM library, significantly reducing the models’ memory footprint.
• During inference, logits processors were applied to constrain the models’ outputs, ensuring well-formed structured responses such as numbers and comma-separated lists.
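A logits processor of the kind described above can be sketched as a callable that masks disallowed tokens before sampling. vLLM passes the previously generated token ids and the current logits to such callables; here a toy vocabulary and plain Python lists stand in for real tensors:

```python
import math

# Sketch of a logits processor restricting generation to digits and
# commas, in the spirit of the team's structured-output constraints.
# The vocabulary is a hypothetical toy example.

VOCAB = ["0", "1", "2", ",", "cat", "hello"]
ALLOWED = {i for i, tok in enumerate(VOCAB) if tok.isdigit() or tok == ","}

def digits_and_commas_processor(token_ids, logits):
    """Mask every token outside the allowed set with -inf."""
    return [x if i in ALLOWED else -math.inf for i, x in enumerate(logits)]

logits = [0.1, 0.2, 0.3, 0.4, 2.0, 1.5]   # "cat" would otherwise win
masked = digits_and_commas_processor([], logits)
best = max(range(len(masked)), key=lambda i: masked[i])
```

Masking with -inf guarantees that greedy decoding and sampling alike can never emit a token outside the permitted format, which is exactly what structured answers such as number lists require.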
- Ensemble Techniques:
• The final submission for each track combined a base model with multiple LoRA adapters in an ensemble, tuned to improve the accuracy and robustness of the solutions.
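One simple way to combine the outputs of several base-model and adapter variants is a majority vote over their answers. The write-up does not specify the team's exact combination rule, so this is an illustrative assumption:

```python
from collections import Counter

# Hedged sketch of ensembling per-track submissions by majority vote
# over the answers of several fine-tuned variants. The combination
# rule and example answers are assumptions, not the team's method.

def majority_vote(predictions):
    """Return the most common answer across ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

answers = ["blue", "blue", "navy"]  # outputs from three model variants
final = majority_vote(answers)
```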
Impact and Contributions:
Team NVIDIA’s comprehensive approach showcased their technical prowess and ability to innovate within constraints, leading to their first-place victory in all five competition tracks. Their work demonstrates the powerful capabilities of LLMs in handling diverse and complex real-world NLP tasks, particularly in a competitive setting with limited hardware resources.