Is this a fully unsupervised clustering challenge?

ngcferreira · August 24, 2020, 9:01pm

@jason_brumwell Is this supposed to be a fully unsupervised clustering problem, or can we manually cluster some folders and use those to train a clustering algorithm?
Or can we use external data to train a model, which we then apply to this problem?

Thank you for your help,
Nuno

jason_brumwell · August 25, 2020, 3:33am

@ngcferreira Thank you for the questions. I will answer here and then update the overview page and rules.

The solution is only limited in that it should not be directly tied to the dataset and should perform comparably when additional teams are added to the dataset. The code also has to be reproducible, and I think manually going through the challenge data and training on specific teams would bypass the spirit of the challenge.

ngcferreira · August 25, 2020, 5:49am

Thank you very much for your fast answer. I should have asked this before, instead of trying to go with unsupervised learning. Hope I still have time to train a supervised model.

Is it fair to assume that the final leader board will be based on the performance on a private test set, and not on the one used for the current leader board?

Thank you,
Nuno

rohitmidha23 · August 25, 2020, 6:44am

@jason_brumwell just so I’ve understood your reply clearly, we can use external datasets and (or) create a dataset on our own, for training a supervised model, so long as we don’t hand label the current dataset provided by you?

Secondly, is there a private test set in the challenge since you’ve mentioned “when additional teams are added”? If there isn’t a private test set can you explain what you mean by this or is this just a general statement?

Any clarity on this would be greatly appreciated.

Thanking you,
Rohit

jason_brumwell · August 25, 2020, 1:19pm

@ngcferreira You're welcome, Nuno. I have heard this statement a couple of times now, and we are looking at extending the challenge timeline.

@rohitmidha23 Yes. The challenge is that you're faced with 10 images of hockey players who belong to two teams, and you have to separate them as accurately as possible.

As for a second dataset, we are in the process of generating one to ensure that the final solutions perform about the same when new teams are added.
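
For readers following along, the task described above (10 player images, two teams, no team-specific training) can be sketched roughly as below. This is only an illustration, not the organizers' method or a recommended solution: the mean jersey color feature, the function names `mean_color` and `split_two_teams`, and the toy data are all assumptions made for the example.

```python
from math import dist

def mean_color(pixels):
    """Average (r, g, b) over one player crop, given as a list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def split_two_teams(features, iters=20):
    """Plain 2-means over feature vectors; returns a 0/1 label per image."""
    # Deterministic init: the first point, plus the point farthest from it.
    centers = [features[0], max(features, key=lambda f: dist(f, features[0]))]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assign each image to its nearest center.
        labels = [0 if dist(f, centers[0]) <= dist(f, centers[1]) else 1
                  for f in features]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# Hypothetical toy data: 10 tiny "images", reddish vs. bluish jerseys.
crops = ([[(200 + i, 30, 40)] * 4 for i in range(5)]
         + [[(30, 40, 200 + i)] * 4 for i in range(5)])
labels = split_two_teams([mean_color(c) for c in crops])
```

Because nothing here is fitted to particular teams, the same code runs unchanged on teams it has never seen. The 0/1 labels are arbitrary, so only the grouping matters, not which team gets which number.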

ngcferreira · August 25, 2020, 2:01pm

@jason_brumwell Great to hear that there will be an extra dataset. Otherwise people could hand label all the images and train a model on those, which would perform really badly once new data is added.
A time extension would be great.

jason_brumwell · August 25, 2020, 2:21pm

@ngcferreira We have extended the challenge until Sept 30th.

magicaditya · August 25, 2020, 5:55pm

Just to understand better: even if you cluster a few images to generate labelled datasets and then train a supervised model, won’t the supervised model fail when it’s evaluated on the new teams in the test set? Just trying to figure out if this is a clustering or a classification problem.

ngcferreira · August 25, 2020, 9:15pm

@jason_brumwell Thank you very much, Jason. Let’s see what I can come up with.

ngcferreira · August 25, 2020, 9:19pm

@magicaditya I think both approaches can work. Not sure which one will give the best results. I will see if I manage to try both, and see which one behaves better.