Submission of only final predictions file

maruthi0506 · December 4, 2019, 6:35am

Hi Team,

Is it possible to train the model completely and predict on the test data locally, and then submit only the predictions file in the required layout, as happens in most competitions?

This would avoid running the entire code on the cluster and make the process less complex.

shivam · December 4, 2019, 7:05am

Hi, the process is extremely useful in the long run for multiple reasons. It guarantees the reproducibility of the results and the transparency needed. We also preserve your submissions as Docker images, which ensures the code can keep running even if any of its dependencies disappears from the public internet.

Meanwhile, if you are facing any issues with setup, please share them with us so that we can take care of them and make your participation smoother.

cc: @mohanty @kelleni2 if you have any additional points

maruthi0506 · December 4, 2019, 7:38am

Shivam,

While I completely agree with you on the above points, we feel that people are being judged more on their technical skills here than on the analytical insights they bring. Personally, I am comfortable with this approach, but there are a lot of others who lose interest in participating just because of this laborious process. For simplicity, all a team needs is the train and test data sets and the final submission layout. Just before the close of the competition, we can ask the teams to upload or deliver their final running code scripts, along with presentations if any.

kelleni2 · December 4, 2019, 9:44am

It would help to go back to the underlying motivation:

We wanted to reduce the chance of visibly fooling ourselves with top solutions that include leaked information, which would render them irrelevant for real-world decision making.
We wanted the evaluation and project team to be able to re-run all top solutions and interrogate them for generalizability, etc. By design, the combination of the Kubernetes cluster and Git enables this.

That said, we also want the best solutions possible for the larger initiative at the end of the event - which is why we were trying to ease some of the frustrations which were blocking some teams.

I discussed this with the team, and we would strongly encourage you to continue predicting on the original test data in the evaluation clusters rather than providing a table of solutions, especially for the final solution.

However, do what you feel you need to do as a team in order to come up with your optimal solution. But keep in mind that the final leaderboard will change when we add in the hold-out test data, and winners will need their model to be validated by the evaluation team, so please make it clear how one would load and interrogate your model.