Hi everyone,
We are excited to announce several updates to our competition, including the launch of the batch prediction interface, updated baselines, an extension of Phase 1, the V3 dataset release, and a new private test set.
Deadline Extension for Phase 1
The Phase 1 deadline has been extended by 1 week, until 27th May 2024. This extended duration will be marked on the leaderboard as Phase 1b.
Private Test Set
Phase 1b uses a new private test set (the logs for the evaluation on this test set will not be available to the participants).
Dataset V3 Release
A new version of the dataset has been released, addressing errors identified in the questions/answers for a small subset of the V2 dataset. We have also added alternative answers (`alt_ans`) to the questions.
Batch Prediction Interface
Your submitted models can now make batch predictions on the test set, allowing you to fully utilize the multi-GPU setup available during evaluations.
Changes to Your Code
- Add a `get_batch_size()` function:
  - This function should return an integer in the range [1, 16]; the maximum batch size supported at the moment is 16.
  - You can also choose the batch size dynamically (see the sketch after this list).
  - This function is a required interface for your model class.
- Replace `generate_answer` with `batch_generate_answer`:
  - Update your code to replace the `generate_answer` function with `batch_generate_answer`.
  - For more details on the `batch_generate_answer` interface, please refer to the inline documentation in dummy_model.py.
```python
# Old Interface
def generate_answer(self, query: str, search_results: List[Dict], query_time: str) -> str:
    ....
    ....
    return answer

# New Interface
def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
    batch_interaction_ids = batch["interaction_id"]
    queries = batch["query"]
    batch_search_results = batch["search_results"]
    query_times = batch["query_time"]
    ....
    ....
    return [answer1, answer2, ......, answerN]
```
  - The new function should return a list of answers (`List[str]`) instead of a single answer (`str`).
  - This function is a required interface for your model class.
  - The simplest example of a valid submission with the new interface is as follows:
```python
from typing import Any, Dict, List


class DummyModel:
    def get_batch_size(self) -> int:
        return 4

    def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
        queries = batch["query"]
        answers = ["i don't know" for _ in queries]
        return answers
```
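Because `get_batch_size()` is queried by the evaluator, you can also size batches based on your own pipeline. Below is a minimal sketch of one way to choose the batch size dynamically; the `heavy_pipeline` flag and the specific sizes are illustrative assumptions, not part of the starter kit:

```python
from typing import Any, Dict, List


class BatchSizeAwareModel:
    def __init__(self, heavy_pipeline: bool = False):
        # Hypothetical flag: a heavier RAG pipeline may need smaller batches
        # to stay within GPU memory.
        self.heavy_pipeline = heavy_pipeline

    def get_batch_size(self) -> int:
        # Must return an integer in [1, 16]; 16 is the current maximum.
        return 4 if self.heavy_pipeline else 16

    def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
        return ["i don't know" for _ in batch["query"]]
```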
Backward Compatibility
To ensure a smooth transition, the evaluators will maintain backward compatibility with the `generate_answer` interface for a short period. However, we strongly recommend updating your code to use the `batch_generate_answer` interface to avoid any disruptions when support for the older interface is removed in the coming weeks.
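If you already have a working `generate_answer`, a low-effort way to migrate is to wrap it. This is an illustrative sketch (not from the starter kit) that satisfies the new interface by looping over the batch; it gains nothing from batching, but keeps your submission valid once the old interface is removed:

```python
from typing import Any, Dict, List


class WrappedModel:
    def get_batch_size(self) -> int:
        return 1  # no real batching yet, so keep batches small

    def generate_answer(self, query: str, search_results: List[Dict], query_time: str) -> str:
        # ... your existing single-query logic ...
        return "i don't know"

    def batch_generate_answer(self, batch: Dict[str, Any]) -> List[str]:
        # Delegate each item of the batch to the legacy single-query method.
        return [
            self.generate_answer(query, search_results, query_time)
            for query, search_results, query_time in zip(
                batch["query"], batch["search_results"], batch["query_time"]
            )
        ]
```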
Updated Baselines
We have updated the baselines to use the recently released Meta Llama3 8B Instruct model. The updated baselines also demonstrate how to use vllm for optimized batch inference with multi-GPU setups. Additionally, the RAG baseline now includes examples of using ray.remote for parallelized chunking, making better use of the numerous CPU cores available on evaluation nodes.
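For illustration, here is a minimal sketch of the two patterns the updated baselines demonstrate: ray.remote for CPU-parallel chunking and vllm for batched, multi-GPU generation. The model name, chunk size, and `tensor_parallel_size` below are assumptions; refer to the baseline code for the exact configuration:

```python
from typing import List

import ray
from vllm import LLM, SamplingParams

ray.init(ignore_reinit_error=True)


@ray.remote
def chunk_text(text: str, chunk_size: int = 512) -> List[str]:
    # Naive fixed-size character chunking; the baseline's strategy may differ.
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


# Chunk all documents in parallel across the node's CPU cores.
documents = ["first long page of search-result text ...",
             "second long page of search-result text ..."]
chunks_per_doc = ray.get([chunk_text.remote(doc) for doc in documents])

# Batched inference with vLLM, sharded across the available GPUs.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=2)
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["<your RAG prompt for query 1>", "<your RAG prompt for query 2>"]
outputs = llm.generate(prompts, sampling_params)
answers = [output.outputs[0].text for output in outputs]
```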
You can directly access the baselines here:
Best of luck with the competition!