According to the round 3 announcement, I understand that all the vocabulary items we should predict are in round3-vocabulary.txt. (So we don't need to predict all 109 vocabulary items, is that right?)
I checked round3-vocabulary.txt, but I found two strange lines: dty and hyacinthvegetable. Are these correct vocabulary items? They are not included in vocabulary.txt.
It seems that dty is a typo of dry, and that hyacinthvegetable is hyacinth and vegetable joined together.
@hjuinj: train.csv is something we have used consistently across the previous rounds, so changing it could create confusion.
If filtering out the invalid vocab items for round 3 is adding friction, we would be happy to upload a filtered version.
We will update this thread as soon as it's done.
Thanks for the swift reply. It's okay, I can do the filtering myself.
I just thought it made sense to exclude labels outside the actual vocab, but I understand your reasoning.
Question: have you made sure that in round 3, with the reduced vocab set, the out-of-vocab labels have also been removed from the test set's reference answers?
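The filtering itself is straightforward. A minimal sketch, assuming labels are plain strings and the allowed vocabulary has been loaded into a set (the helper name is hypothetical, not from the starter kit):

```python
def filter_to_vocab(labels, vocab):
    """Drop any label that is not in the allowed vocabulary,
    preserving the original order of the remaining labels."""
    return [label for label in labels if label in vocab]
```

The same function can be applied to both the training labels and, if needed, the reference answers, so predictions and references are scored against an identical label set.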
There seems to be a limit on the size of the model files that I can push. What is the upper limit? Is it a limit on the total size, or a limit per model file?
Yes, in this challenge only the testing phase is run online; you need to do the training offline and push your models.
Yes, the normal maximum file size is 50 MB. For uploading larger files, you can submit files up to several GB in size via git lfs. In case you are not familiar with it, you can check the quick help doc here: How to upload large files (size) to your submission
Can I follow up on the first point: what is the issue with doing the training and testing online? Is it because I would not be able to write the trained model inside the container?
Running the training phase online would mean imposing limitations on participants, in terms of the time available to train, resource requirements for this challenge, and so on.
In this challenge we are not concerned with the time taken or other factors in the training phase, which is why it is kept offline, giving participants the flexibility to play around with the data, build up their models, and so on in their familiar environment.
In case you do not want to train on your own system, you can make use of the free compute available via Google Colab and submit directly from Colab.
Meanwhile, this challenge has some awesome community-contributed notebooks which can give you a pre-set-up environment too!