Hi everyone,
We are excited to announce several updates to our competition, including the launch of the batch prediction interface, updated baselines, an extension of Phase 1, the V3 dataset release, and a new private test set.
Deadline Extension for Phase 1
The Phase 1 deadline has been extended by 1 week, until 27th May 2024. This extended duration will be marked on the leaderboard as Phase 1b.
Private Test Set
Phase 1b uses a new private test set (the logs for the evaluation on this test set will not be available to the participants).
Dataset V3 Release
A new version of the dataset has been released, addressing errors identified in the questions/answers for a small subset of the V2 dataset. We have also added alternative answers (`alt_ans`) to the questions.
Batch Prediction Interface
Your submitted models can now make batch predictions on the test set, allowing you to fully utilize the multi-GPU setup available during evaluations.
Changes to Your Code
- Add a `get_batch_size()` function:
  - This function should return an integer in the range [1, 16]; the maximum batch size supported at the moment is 16.
  - You can also choose the batch size dynamically (see the sketch after this list).
  - This function is a required interface for your model class.
- Replace `generate_answer` with `batch_generate_answer`:
  - Update your code to replace the `generate_answer` function with `batch_generate_answer`.
  - For more details on the `batch_generate_answer` interface, please refer to the inline documentation in dummy_model.py.
```python
# Old Interface
def generate_answer(self, query: str, search_results: List[Dict], query_time: str) -> str:
    ....
    ....
    return answer

# New Interface
def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
    batch_interaction_ids = batch["interaction_id"]
    queries = batch["query"]
    batch_search_results = batch["search_results"]
    query_times = batch["query_time"]
    ....
    ....
    return [answer1, answer2, ......, answerN]
```
  - The new function should return a list of answers (`List[str]`) instead of a single answer (`str`).
  - This function is a required interface for your model class.
  - The simplest example of a valid submission with the new interface is as follows:
```python
from typing import Any, Dict, List


class DummyModel:
    def get_batch_size(self) -> int:
        return 4

    def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
        queries = batch["query"]
        answers = ["i don't know" for _ in queries]
        return answers
```
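Because `get_batch_size()` is queried by the evaluator, you can also size batches based on your own pipeline. Below is a minimal sketch of one way to choose the batch size dynamically; the `heavy_pipeline` flag and the specific sizes are illustrative assumptions, not part of the starter kit:

```python
from typing import Any, Dict, List


class BatchSizeAwareModel:
    def __init__(self, heavy_pipeline: bool = False):
        # Hypothetical flag: a heavier RAG pipeline may need smaller batches
        # to stay within GPU memory.
        self.heavy_pipeline = heavy_pipeline

    def get_batch_size(self) -> int:
        # Must return an integer in [1, 16]; 16 is the current maximum.
        return 4 if self.heavy_pipeline else 16

    def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
        return ["i don't know" for _ in batch["query"]]
```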
Backward Compatibility
To ensure a smooth transition, the evaluators will maintain backward compatibility with the `generate_answer` interface for a short period. However, we strongly recommend updating your code to use the `batch_generate_answer` interface to avoid any disruptions when support for the older interface is removed in the coming weeks.
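If you already have a working `generate_answer`, a low-effort way to migrate is to wrap it. This is an illustrative sketch (not from the starter kit) that satisfies the new interface by looping over the batch; it gains nothing from batching, but keeps your submission valid once the old interface is removed:

```python
from typing import Any, Dict, List


class WrappedModel:
    def get_batch_size(self) -> int:
        return 1  # no real batching yet, so keep batches small

    def generate_answer(self, query: str, search_results: List[Dict], query_time: str) -> str:
        # ... your existing single-query logic ...
        return "i don't know"

    def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
        # Delegate each item of the batch to the legacy single-query method.
        return [
            self.generate_answer(query, search_results, query_time)
            for query, search_results, query_time in zip(
                batch["query"], batch["search_results"], batch["query_time"]
            )
        ]
```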
Updated Baselines
We have updated the baselines to use the recently released Meta Llama3 8B Instruct model. The updated baselines also demonstrate how to use vllm for optimized batch inference with multi-GPU setups. Additionally, the RAG baseline now includes examples of using ray.remote for parallelized chunking, making better use of the numerous CPU cores available on evaluation nodes.
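For illustration, here is a minimal sketch of the two patterns the updated baselines demonstrate: ray.remote for CPU-parallel chunking and vllm for batched, multi-GPU generation. The model name, chunk size, and `tensor_parallel_size` below are assumptions; refer to the baseline code for the exact configuration:

```python
from typing import List

import ray
from vllm import LLM, SamplingParams

ray.init(ignore_reinit_error=True)


@ray.remote
def chunk_text(text: str, chunk_size: int = 512) -> List[str]:
    # Naive fixed-size character chunking; the baseline's strategy may differ.
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


# Chunk all documents in parallel across the node's CPU cores.
documents = ["first long page of search-result text ...",
             "second long page of search-result text ..."]
chunks_per_doc = ray.get([chunk_text.remote(doc) for doc in documents])

# Batched inference with vLLM, sharded across the available GPUs.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=2)
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["<your RAG prompt for query 1>", "<your RAG prompt for query 2>"]
outputs = llm.generate(prompts, sampling_params)
answers = [output.outputs[0].text for output in outputs]
```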
You can directly access the baselines here:
Best of luck with the competition!