Is it allowed to use pseudo-labeling on the unlabeled dataset and train the model on those labels during the Purchase Phase?


I guess this is what we are supposed to do. Train a model on the 5000 given images, use the model to predict on the 10000 unlabeled images, pick the BEST ONES, and retrain or continue training the model on the purchased images. Then save it, load it in the prediction phase, and you are done.

The only interesting part of the challenge is to figure out the BEST ONES. Random picking is not a good idea. In fact, if a participant uses random purchases then he/she shouldn’t be winning, because then the problem will remain unsolved.

The question is whether we can use pseudo-labels for training in addition to the purchased ones.

I hope not. It would nullify the research question. It’s not a semi-supervised task.

In my opinion, nothing has been written that indicates we are not allowed to use pseudo-labels in addition. It’s also very much in the spirit of the problem statement:

Which labels do we need to purchase?

Obviously not the ones for images that we already get for free by using pseudo-labeling. Another valid (though unlikely) answer would be: none of them. If you are so good that you can come up with a very good self-learning/semi-supervised algorithm that doesn’t need them, more power to you! From a business perspective I would be very happy to have such an algorithm.


Indeed. I don’t see why we can’t use pseudo-labels. One naive approach to the purchase policy is to predict on all images: 1) if the confidence is high => use the pseudo-label; 2) if the confidence is low => purchase the label.
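
To make that naive policy concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not part of the challenge's starter kit: the function name `split_by_confidence`, the 0.9 threshold, the optional `budget` cap on purchases, and the idea that "confidence" means the highest predicted class probability.

```python
def split_by_confidence(probs, threshold=0.9, budget=None):
    """Split unlabeled images into pseudo-labeled ones and ones to purchase.

    probs: dict mapping an image id to a list of class probabilities
           (hypothetical format; adapt to whatever your model outputs).
    threshold: assumed confidence cut-off, not a recommended value.
    budget: optional maximum number of labels to purchase.
    """
    pseudo_labels = {}
    uncertain = []
    for image_id, class_probs in probs.items():
        confidence = max(class_probs)
        if confidence >= threshold:
            # High confidence: keep the model's own prediction as the label.
            pseudo_labels[image_id] = class_probs.index(confidence)
        else:
            # Low confidence: candidate for purchase.
            uncertain.append((confidence, image_id))

    # Spend the budget on the least confident images first.
    uncertain.sort(key=lambda pair: pair[0])
    to_purchase = [image_id for _, image_id in uncertain]
    if budget is not None:
        to_purchase = to_purchase[:budget]
    return pseudo_labels, to_purchase


if __name__ == "__main__":
    example = {
        "img_a": [0.95, 0.05],
        "img_b": [0.55, 0.45],
        "img_c": [0.10, 0.90],
        "img_d": [0.60, 0.40],
    }
    pseudo, buy = split_by_confidence(example, threshold=0.9, budget=1)
    print(pseudo)
    print(buy)
```

Images that are below the threshold but do not fit into the budget end up with neither kind of label in this sketch. Whether to drop them or pseudo-label them anyway is a separate design choice.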