Note for our final evaluation

Dear participants,

As we near the end of the competition, we would like to introduce the final evaluation process that will determine the rankings.

We won’t have private datasets (the test data are already hidden from you), so the final evaluation will simply re-run your submissions with different random seeds. If your solution does not depend on a particular random seed to work, you should not worry too much about it.

We have sent out a Google sheet asking you to select the submissions you want to use for the final evaluation. Note that only teams that currently rank in the top 20 of each track will receive the email. However, in case of an email glitch where you did not receive it, you can find the Google sheet [here](https://forms.gle/thFE2B3P9cC7UcVt5). Please fill out the sheet before 12 July 2024, 12:00 UTC (noon).

Please note that we will clone your selected submissions from GitLab, so do not remove or delete the submission tags you intend to use as final submissions. If the corresponding git submission tag does not exist, your submission will be marked as failed during the final re-evaluation.
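Since the final evaluation clones submissions by their git tags, it may be worth verifying locally that the tag you plan to select still exists before filling in the form. Below is a minimal sketch; the tag name `submission-final-1` is a placeholder, and the demo runs in a throwaway repository rather than your actual submission repo:

```shell
set -e
# Demonstration in a throwaway repository; in practice, run the two
# "git" checks at the bottom inside your actual submission repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "submission"

TAG="submission-final-1"   # placeholder tag name for illustration
git tag "$TAG"

# Verify the tag exists locally; exits non-zero if it is missing.
git rev-parse --verify -q "refs/tags/$TAG" >/dev/null && echo "tag present"

# For the copy on the GitLab remote, you would additionally run:
#   git ls-remote --tags origin "refs/tags/$TAG"
```

If the local check fails or `git ls-remote` returns nothing for the tag on the remote, the organizers would have nothing to clone, so re-push the tag before the selection deadline.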

Special notes:

  • All submissions made before July 10th, 23:55 UTC are valid submissions that can be considered in the final re-evaluation, regardless of when they finished running and when you fill out your Google sheet.
  • If a team does not fill out the Google sheet, we will take their top-2 submissions on the leaderboard by default for the final evaluation.
  • We will take the higher score of the two selected submissions, instead of the average.

Best,
Amazon KDD Cup 2024 Organizers

Hello @yilun_jin

thanks for sharing the process about the final submissions. I have some follow up questions and suggestions:

Who will you send the Google sheet to? We do not want to miss the email. We were not sure whether you will send it only to the team captain or to the full team. Can you send it to every team member, please?

When will you send it out? Can you share an announcement in the competition forum when the email has been sent? That way, we can double-check our inboxes.

Thanks a lot,
Best,
Benedikt


Hello Benedikt,

  1. We will use the “Send Newsletter Email” function to send out the mail, with the recipients set to ‘All participants’. We will also mark it as a discussion post so that even if there is an email glitch, you can still see the info.

  2. We will send it out in a day or two (roughly at the deadline), so we have not sent it out yet. Please rest assured.

Best,
Amazon KDD Cup 2024 Organizers


Hi @yilun_jin ,

Thanks for your notice.

I would like to confirm one question about the deadline. If a submission is made before the deadline but finishes running after the deadline, will it be considered in the final evaluation?

Thanks for your attention,
Best,
Pengyue


I just confirmed from aicrowd that it will be considered.

@yilun_jin

I would like to clarify it more precisely.

The deadline for making an eligible submission is July 10th, 23:59 UTC - is that correct?
If we submit before this deadline and the run finishes afterwards, it is an eligible submission.
If we submit after this deadline, the run will not be eligible.

Is that correct?

The deadline of July 12th is for selecting our top submissions, but a submission has to be eligible, i.e., submitted before July 10th, 23:59 UTC?


The deadline for making eligible submissions is 23:55 UTC July 10th, not 23:59. All submissions made before that will be counted as eligible, regardless of when they finished. All submissions made after that will be counted as ineligible, regardless of when you fill in the submission selection form.

Hope that helps.

Hi @yilun_jin
I pushed my git tag before 23:55 UTC, but I could not create the submission issue for this code before 23:55 UTC (no issue was shown). At 00:00 UTC (July 11th), the submission system automatically started the submission.
What happens to a submission in this situation? Is it included as a final submission or not?

Hi @yilun_jin ,

Thank you for organizing the competition. I have two questions about selecting submission IDs.

  1. If a team does not fill out the form, will their top-2 submissions from each track be automatically selected and re-run with different random seeds?
  2. Additionally, for the two selected submissions, will the final score be the best score among all runs or the average score?

Best,
Pengyue

Your case is noted. I am working with aicrowd folks on this.

  1. Yes. If a team does not fill in the sheet, we will take the top-2 submissions by default.
  2. We will take the higher one instead of the average.

Reply from aicrowd folks.
If your issue is created at roughly 00:00 UTC July 11, it will be counted. We will manually mark them.

Hi @yilun_jin ,
Thanks for your response. I will check the manually marked issues before selecting my submissions.

@yilun_jin , sometimes the exact same code will succeed one time and fail another time. For example, we submitted the exact same code [here] and [here]. The first succeeded and the second failed. During the re-run, what happens if code that previously succeeded fails? Will the admins run it a second time?

Also, can you tell us why the second link above failed?

When we select our final 2 submissions for each track, should we just select our best-scoring submission twice in case it fails the first time it is re-run?

Hi @yilun_jin ,
Thank you for hosting the competition. While selecting my final submission, I have two questions.

  1. Among the submissions marked as “Evaluation Succeeded,” there are some where the score is not displayed. This information is necessary for selecting the final submission. Is it possible to check these scores? Examples are here and here, among others.

  2. Sometimes, even with the same code, there are submissions that failed due to “Evaluation timed out.” In such cases, I believe some of these might succeed if submitted again. Is it possible to select one of these as one of the two final submissions?

@Chris_Deotte It may happen that a valid submission fails due to network connection issues (which is what happened to the second link). This is a problem we have encountered since the beginning but have been unable to completely address. We apologize for that.

During the re-evaluation, we will make sure that all submissions run through and get scores, so feel free to select two different submissions.

Hope that helps.


Hi @wakawakawakawaka ,

  1. I manually checked the logs to see the final scores. They have been updated on your issue page.

  2. It is possible to select them, but since you have not seen their final scores, you would be doing so at some risk. Apologies for the occasional network issues.

Hi @yilun_jin ,

Thank you very much.
Regarding the answer to question 2, I understand. Thank you for your detailed explanation.

Concerning question 1, I’d like to ask some additional questions.

  • I can now see the scores. Thank you very much! However, an unbelievable score was displayed: the Multiple-Choice Score was close to 0.0, which, considering the number of questions, could indicate some error in the evaluation. Or does it mean that none of the answers were correct? I’d like to request confirmation on this; such a case is here.
  • If possible, there are three more submissions that haven’t been evaluated in the same way: here, here, and here. Would it be possible for you to evaluate these as well?

Best,

Hi @yilun_jin , is it possible to re-evaluate these issues?

here
here
here
here
here
here
here
here

Just in case. Thanks so much!
Best,
fer

Hello @wakawakawakawaka ,
Regarding your low multiple-choice score, I have replied with a log snippet in the issue. Basically, your output did not conform to the format our parser expects, resulting in a very low multiple-choice score. We don’t believe there is any error in our evaluation.
Regarding your second question, I will post the scores in the issues as well.