Experiments with “unlabelled” data

Here are my results. I used the same model throughout and varied only the purchase mode.

  1. Train with initial 5000 images only: LB 0.869
  2. Add 3000 random images from unlabelled dataset: 0.881
  3. “smart” purchasing (at least non random): 0.888

So “smart” purchasing does help, but not by much, maybe ~0.01.
Tuning the model would probably help more to push the score further.
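
The post does not say which “smart” method was used. As an illustration only, here is a minimal sketch of one common non-random strategy, uncertainty sampling: buy labels for the images the current model is least sure about. The function name `select_uncertain`, the `probs` array, and the assumption that the model outputs one probability per label are all hypothetical, not taken from the post.

```python
import numpy as np

def select_uncertain(probs, budget):
    """Return indices of the `budget` least confident images.

    probs: array of shape (n_images, n_labels) with per-label
           probabilities in [0, 1] (assumed model output).
    """
    probs = np.asarray(probs, dtype=float)
    # Distance of each probability from 0.5; a small value means uncertain.
    confidence = np.abs(probs - 0.5).mean(axis=1)
    # Stable sort so that ties keep their original pool order.
    order = np.argsort(confidence, kind="stable")
    return order[:budget]

# Toy pool of 4 images with 2 labels each.
pool_probs = [[0.9, 0.1], [0.5, 0.6], [0.99, 0.01], [0.4, 0.7]]
chosen = select_uncertain(pool_probs, budget=2)
print(chosen.tolist())
```

Random purchasing corresponds to replacing the ranking with a random permutation of the pool, so the two modes differ only in how the indices are ordered.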


@sergey_zlobin : Thanks for your information.

I am wondering what the score would be if you purchased all 10K.

The scores I posted are from the leaderboard, and I can’t check 10K there…
Local scores are slightly higher than LB, but they correlate with it.
Yeah, maybe I’ll check it locally.

I’ve checked it locally.
Using all 10K images beats my 3K selection by 0.006. Maybe I can recover some of that gap by changing the purchasing algorithm, but I still feel I need to tune my model.

yeah, looks like your model/training pipeline is limiting you. You should be able to get a much bigger improvement from using all labels. Maybe try a bigger model and tune it a bit.

Do you mean the score @sergey_zlobin reported for the pre-training phase, or for the purchase phase?

I mean the addition of 10000 vs 3000 shouldn’t result in only such a modest improvement.

Aah, yes. With the post. I think maybe only ~4000-5000 of the images are informative enough to improve the system, so adding all 10000 doesn’t bring much extra improvement, because the basic model will predict the rest correctly anyway.
It’s just a theory, not tested. I haven’t touched the purchase part yet.
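
One way this theory could be checked locally is to count how many pool images the baseline model already gets right. The sketch below assumes the pool labels are available for local experiments and that the model outputs one probability per label. `count_informative`, `probs`, `labels`, and the 0.5 threshold are illustrative names and values, not part of anyone's actual pipeline.

```python
import numpy as np

def count_informative(probs, labels, threshold=0.5):
    """Count pool images where the baseline is wrong on at least one label.

    probs:  (n_images, n_labels) predicted probabilities (assumed).
    labels: (n_images, n_labels) ground-truth 0/1 labels (assumed).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).astype(bool)
    preds = probs >= threshold
    # An image counts as informative if any of its labels is mispredicted.
    wrong = (preds != labels).any(axis=1)
    return int(wrong.sum())

# Toy example: only the second image is mispredicted.
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
toy_labels = [[1, 0], [1, 1], [1, 0]]
print(count_informative(toy_probs, toy_labels))
```

If the count on the real pool came out near 4000-5000, that would be consistent with the theory. A much larger count would suggest the limit is somewhere else, for example in the model or the training pipeline.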
Thanks for your insight.

yes, it’s very significant.

From my experiment notebook, it’s something like this:

| exp no. | augmentation | pretrained | purchase_method | score_pretraining_phase | score_purchase_phase | score_validation_phase | LB_Score |
|---|---|---|---|---|---|---|---|
| 1 | NO | NO | NO | 0.773 | 0.773 | 0.760 | |
| 2 | NO | NO | RANDOM 3000 | 0.773 | 0.804 | 0.760 | |
| 3 | NO | NO | ALL 10000 | 0.773 | 0.841 | 0.835 | |
| 4 | NO | YES | NO | 0.857 | 0.857 | 0.850 | |
| 5 | NO | YES | RANDOM 3000 | 0.857 | 0.864 | 0.845 | 0.851 |
| 6 | NO | YES | ALL 10000 | 0.857 | 0.892 | 0.875 | |
| 7 | YES | YES | NO | 0.868 | 0.868 | 0.865 | |
| 8 | YES | YES | RANDOM 3000 | 0.868 | 0.886 | 0.869 | 0.880 |
| 9 | YES | YES | ALL 10000 | 0.868 | 0.902 | 0.893 | |
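
To make the comparison easier to read, the purchase-phase scores above can be turned into gains over the no-purchase baseline for each configuration. This is only a small arithmetic sketch on the numbers from the table. The dictionary layout and the name `purchase_gains` are illustrative.

```python
# (augmentation, pretrained) -> purchase-phase score per purchase method,
# copied from the experiment table.
purchase_scores = {
    ("NO", "NO"): {"NO": 0.773, "RANDOM 3000": 0.804, "ALL 10000": 0.841},
    ("NO", "YES"): {"NO": 0.857, "RANDOM 3000": 0.864, "ALL 10000": 0.892},
    ("YES", "YES"): {"NO": 0.868, "RANDOM 3000": 0.886, "ALL 10000": 0.902},
}

def purchase_gains(scores):
    """Gain of each purchase method over the no-purchase run, per configuration."""
    gains = {}
    for config, by_method in scores.items():
        base = by_method["NO"]
        gains[config] = {
            method: round(score - base, 3)
            for method, score in by_method.items()
            if method != "NO"
        }
    return gains

gains = purchase_gains(purchase_scores)
for config, gain in gains.items():
    print(config, gain)
```

On these numbers, buying all 10000 labels adds between 0.034 and 0.068 to the purchase-phase score, while a random 3000 adds between 0.007 and 0.031.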

the notebook :
