[FAQ] Analysis and tips on submission failures

Dear participants,

Over the first week since the challenge launched, we have frequently received the question, “Why did our submission fail?” I completely understand how frustrating it is when a submission fails and, on top of that, our system does not return the error messages. However, there is a specific reason for this: returning error messages to participants (e.g. via manually raised exceptions) could leak the test data.

To make your submission process smoother, we analyzed the latest 50 failed submissions to see what insights and suggestions we can offer to help your submissions go through. Here are some useful tips.

  • Keep your requirements.txt as simple and concise as possible. Avoid unnecessary dependencies. In particular, we recommend against using pip freeze > requirements.txt, because it usually produces a very bloated requirements.txt and frequently causes build errors (a minimal example follows this list).
  • Test the submission with the Dockerfile we provide in the starter kit. One easy way is to run bash ./docker_run.sh (docker_run.sh can be found here). It will locally build your submission with the same Dockerfile that we use on our server, so bash ./docker_run.sh should reveal potential build errors as well as code bugs.
  • Run local_evaluation.py. The development set data also exist in the test set, so they can be used to check whether your submission has excessively long inference time. One number to keep in mind is 15 seconds for a single prediction. In addition, you should not expect your submission to meet the total time limit if the average inference time is around 5 seconds per sample, which is too long (a sketch for timing predictions locally is given after this list).
  • Be careful when using Mistral- and Gemma-related models. We observe many submission failures caused by single-prediction or total timeouts when participants use Mistral-7B or Gemma-7B. Although these models have the same number of parameters as Vicuna-7B, their inference is much slower (as much as 2-3x). If you use them, please make an extra effort to reduce inference time, or expect timeouts.
  • Do not generate overly long answers. In the baseline, we generate 100 tokens for non-multiple-choice questions, and that already takes up to ~3 seconds per question. Keep the 15s-per-sample limit in mind and avoid excessively long generation lengths; take extra care if you are using Gemma or Mistral (see the generation-length sketch after this list).
  • Use git lfs. We see several submission failures caused by broken model checkpoints. We encourage participants to upload checkpoints with git lfs to avoid such failures (example commands follow this list).
  • Consider the difference between your device and our system. When you test your solution locally, keep in mind that we use NVIDIA T4 GPUs, which may be less powerful than your own hardware, so factor in the difference when evaluating efficiency.
  • My submission fails in the test phase, why? Since we do not return error messages, consider the following factors when your submission fails during the test phase (and sometimes during validation, too).
    • Total time limit exceeded. This can be easily identified from the ‘Total prediction time’ info in your submission’s ‘issues’. If that number is close to the time limit and the submission failed, this is the most probable reason.
    • GPU memory exceeded. This is unlikely if you only use 7B models, but if you use models larger than 10B, consider this possibility (see the memory-check sketch after this list).
    • Single question time limit exceeded. This is the most common reason why submissions fail in the test phase. It is hard to identify (the test data will not be revealed to participants), but you can simulate data with the development set and check whether your solution meets the limit.
  • Tag @yilun_jin and @aicrowd-bot for assistance. If you really cannot figure out the reason for your failure, tag me in the issue of your submission, and we will get back to you within one day.
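
To illustrate the requirements.txt tip, below is a minimal, hand-curated example in the spirit of the baseline. The packages and version pins are only placeholders; list exactly (and only) what your own code imports.

    # Pin only what your code actually imports (package versions here are examples).
    torch==2.1.2
    transformers==4.38.2
    accelerate==0.27.2
    sentencepiece==0.1.99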
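
For the inference-time tip, here is a minimal sketch of how you might time every prediction while running over the development set. The model.predict call and the sample objects are placeholders; adapt them to however your model is actually invoked in the starter kit.

    import time

    def check_latency(model, samples, per_sample_limit=15.0):
        """Time each prediction and flag any sample that exceeds the per-question limit."""
        times = []
        for sample in samples:
            start = time.perf_counter()
            _ = model.predict(sample)  # placeholder call; use your model's real prediction method
            elapsed = time.perf_counter() - start
            times.append(elapsed)
            if elapsed > per_sample_limit:
                print(f"WARNING: one prediction took {elapsed:.1f}s (limit: {per_sample_limit}s)")
        if times:
            print(f"average {sum(times) / len(times):.2f}s, max {max(times):.2f}s over {len(times)} samples")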
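
For generation length, the knob to watch in Hugging Face transformers is max_new_tokens. A rough sketch, assuming a standard causal LM ("your-model-name" is a placeholder for whatever checkpoint you actually use):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("your-model-name")
    model = AutoModelForCausalLM.from_pretrained("your-model-name", device_map="auto")

    inputs = tokenizer("Example prompt", return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,  # the baseline uses 100 tokens for non-multiple-choice questions
        do_sample=False,     # greedy decoding; sampling settings are up to you
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))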
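
For the checkpoint tip, the usual git lfs workflow looks roughly as follows. The *.bin / *.safetensors patterns and the models/ path are examples; track whatever file types and paths your checkpoints actually use.

    git lfs install                        # one-time setup per machine
    git lfs track "*.bin" "*.safetensors"  # tell LFS which file types to manage
    git add .gitattributes
    git add models/                        # example path to your checkpoint files
    git commit -m "Add model checkpoint via git lfs"
    git push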
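
Finally, for the GPU-memory point, a quick way to see your peak usage during local evaluation is PyTorch's memory counters. A minimal sketch, assuming a CUDA device is available:

    import torch

    torch.cuda.reset_peak_memory_stats()  # call once before evaluation starts

    # ... run your predictions here ...

    # Peak GPU memory allocated by this process, in GiB. Compare it against the
    # target GPU: we evaluate on NVIDIA T4s, which have 16 GB of memory.
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak GPU memory: {peak_gib:.2f} GiB")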

Please go through this checklist before submitting your model. By doing so, you increase your chances of a successful submission, and we save computation costs that can then be used to allow larger submission quotas.

Best,
Yilun Jin
On behalf of the Amazon KDD Cup 2024 Organizers.


On Ubuntu 18.04, I cannot build the image. @yilun_jin @aicrowd-bot

#8 [ 4/19] RUN apt -qq update
#8 17.06 
#8 17.06 WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
#8 17.06 
#8 24.53 W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease: At least one invalid signature was encountered.
#8 24.53 E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is not signed.
#8 24.53 W: GPG error: http://security.ubuntu.com/ubuntu focal-security InRelease: At least one invalid signature was encountered.
#8 24.53 E: The repository 'http://security.ubuntu.com/ubuntu focal-security InRelease' is not signed.
#8 24.53 W: GPG error: http://archive.ubuntu.com/ubuntu focal InRelease: At least one invalid signature was encountered.
#8 24.53 E: The repository 'http://archive.ubuntu.com/ubuntu focal InRelease' is not signed.
#8 24.53 W: GPG error: http://archive.ubuntu.com/ubuntu focal-updates InRelease: At least one invalid signature was encountered.
#8 24.53 E: The repository 'http://archive.ubuntu.com/ubuntu focal-updates InRelease' is not signed.
#8 24.53 W: GPG error: http://archive.ubuntu.com/ubuntu focal-backports InRelease: At least one invalid signature was encountered.
#8 24.53 E: The repository 'http://archive.ubuntu.com/ubuntu focal-backports InRelease' is not signed.
#8 ERROR: process "/bin/sh -c apt -qq update" did not complete successfully: exit code: 100

Please take a look at this.
https://forums.developer.nvidia.com/t/18-04-cuda-docker-image-is-broken/212892/3

Also, your submissions built successfully, so I don’t see any apparent issues.
