Note for our final evaluation

Hi @yilun_jin ,

Thank you for organizing the competition. I have two questions about selecting submission IDs.

  1. If a team does not fill out the form, will the team's top-2 submissions in each track be automatically selected and re-run with different random seeds?
  2. Additionally, for the two selected submissions, will the final score be taken from the best score among all runs or the average score?

Best,
Pengyue

Your case is noted. I am working with aicrowd folks on this.

  1. Yes. If a team does not fill in the sheets, we will take the top-2 submissions by default.
  2. We will take the higher one instead of the average.
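In other words, the stated policy is: for each selected submission, re-run it several times with different seeds and keep the best score, not the average. A minimal illustrative sketch (the function and variable names are assumptions, not the organizers' actual code):

```python
# Illustrative sketch of the stated policy: for each of a team's selected
# submissions, the final score is the best (max) across re-runs with
# different random seeds, not the average. Names are hypothetical.
def final_score(run_scores: list[float]) -> float:
    """Best score among all re-runs of one submission."""
    return max(run_scores)

def team_scores(selected: dict[str, list[float]]) -> dict[str, float]:
    """Map each selected submission ID to its final (max) score."""
    return {sub_id: final_score(runs) for sub_id, runs in selected.items()}
```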

Reply from aicrowd folks.
If your issue is created at roughly 00:00 UTC July 11, it will be counted. We will manually mark them.

Hi @yilun_jin ,
Thanks for your correspondence. I will check the manually marked issues before the selection of submissions.

@yilun_jin , sometimes the exact same code will succeed one time and fail another. For example, we submitted the exact same code [here] and [here]; the first succeeded and the second failed. During the re-run, what happens if code that previously succeeded fails? Will the admins run it a second time?

Also, can you tell us why the second link above failed?

When we select our final 2 submissions for each track, should we just select our best scoring submission twice in case it fails the first time it is re-run?

Hi @yilun_jin ,
Thank you for hosting the competition. While selecting my final submission, I have two questions.

  1. Among the submissions marked as “Evaluation Succeeded,” there are some submissions where the score is not displayed. This information is necessary for selecting the final submission. Is it possible to check these scores? Examples are here and here, etc.

  2. Sometimes, even with the same code, there are submissions that failed due to “Evaluation timed out.” In such cases, I believe some of these might succeed if submitted again. Is it possible to select one of these as one of the two final submissions?

@Chris_Deotte It may happen that a valid submission fails due to network connection issues (which is what happened with the second link). This is a problem we have encountered since the beginning but have failed to completely address. We apologize for that.

During re-evaluation, we will make sure that all submissions run through and get scores, so feel free to select two different submissions.

Hope that helps.


Hi @wakawakawakawaka ,

  1. I manually check the logs to see the final scores. They have been updated on your issue page.

  2. It is possible to select them, but since you cannot see their final scores, you would have to do so at some risk. Apologies for the occasional network issues.

Hi @yilun_jin ,

Thank you very much.
Regarding your answer to question 2, I understand. Thank you for your detailed explanation.

Concerning question 1, I’d like to ask some additional questions.

  • I can now see the scores. Thank you very much! However, an implausible score was displayed: the Multiple-Choice Score was close to 0.0, which, given the number of questions, could indicate an error in the evaluation. Or does it mean that none of the answers were correct? I'd like to request confirmation on this. Such a case is here.
  • If possible, there are three more submissions that haven't been evaluated in the same way: here, here, and here. Would it be possible for you to evaluate these as well?

Best,

Hi @yilun_jin , is it possible to re-evaluate these issues?

here
here
here
here
here
here
here
here

Just in case. Thanks so much! Best,
fer

Hello @wakawakawakawaka ,
Regarding your low multiple-choice score, I have replied with a log snippet in the issue. Basically, your output did not conform to our parser's expected format, resulting in a very low multiple-choice score. We don't believe there is any error in our evaluation.
Regarding your second question, I will post the scores in the issues as well.

The aicrowd folks have been notified about that, but we cannot guarantee that all of these can be finished.

No problem, just in case. Thanks! 🙂

We have filled out the form but have not received a confirmation email. Could you please confirm that it has been received? @yilun_jin

Hi @yilun_jin , could you please help notify aicrowd folks to reevaluate these issues?
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/178
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/189
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/179
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/197
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/198
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/205
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/200
https://gitlab.aicrowd.com/pp/amazon-kdd-cup-2024/-/issues/195
And forgive me for reconfirming: is it OK not to fill in the form, since the top-2 performing submissions will be automatically re-tested?

Hi @yilun_jin , could you please help notify aicrowd folks to reevaluate these issues?

Hi @yilun_jin ,
Thanks so much for your help! I really appreciate it!

[quote=“boren, post:25, topic:10686”]
Hi @yilun_jin , could you please help notify aicrowd folks to reevaluate these issues?
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/420
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/418
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/416
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/415
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/386
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/385
https://gitlab.aicrowd.com/boren/amazon-kdd-cup-2024-starter-kit/-/issues/390
[/quote]

@yilun_jin Thank you for your excellent organization. I have a question: could a team be notified which submissions are selected for the final testing stage?