Is this a fully unsupervised clustering challenge?

@jason_brumwell Is this supposed to be a fully unsupervised clustering problem, or can we manually cluster some folders and use those to train a clustering algorithm?
Or can we use external data to train a model, which we then apply to this problem?

Thank you for your help,
Nuno

@ngcferreira Thank you for the questions :slight_smile: I will answer here and then update the overview page and rules.

The only restriction on the solution is that it should not be tied directly to this dataset, and it should perform comparably when additional teams are added to the dataset. The code also has to be reproducible, and I think manually going through the challenge data and training on specific teams would bypass the spirit of the challenge.

Thank you very much for your fast answer. I should have asked this before, instead of trying to go with unsupervised learning. Hope I still have time to train a supervised model.

Is it fair to assume that the final leaderboard will be based on performance on a private test set, and not on the one used for the current leaderboard?

Thank you,
Nuno

@jason_brumwell Just so I’ve understood your reply clearly: we can use external datasets and/or create a dataset of our own for training a supervised model, so long as we don’t hand-label the current dataset provided by you?

Secondly, is there a private test set in the challenge, since you mentioned “when additional teams are added”? If there isn’t a private test set, can you explain what you mean by this, or is it just a general statement?

Any clarity on this would be greatly appreciated.

Thanking you,
Rohit

@ngcferreira You’re welcome, Nuno. I have heard this a couple of times now, and we are looking at extending the challenge timeline.

@rohitmidha23 Yes, the challenge is that you’re faced with 10 images of hockey players who belong to two teams. You have to separate them as accurately as possible.

As for a second dataset, we are in the process of generating one to ensure that the final solutions perform about the same when new teams are added.
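To make the task concrete (10 player images, two teams, split them into two groups), here is a minimal unsupervised sketch. It is only an illustration under stated assumptions, not the organisers' baseline or scoring code: each image is assumed to be a flat list of RGB tuples, the feature is just the mean colour, the images are synthetic, and the helper names (`mean_color`, `two_means`, `pair_accuracy`) are hypothetical. The accuracy helper ignores which cluster is called 0 or 1; the challenge's actual metric is not stated in this thread.

```python
import random

def mean_color(pixels):
    """Average (r, g, b) of an image given as a list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(features, iters=20):
    """Plain k-means with k=2. Returns a list of 0/1 cluster labels."""
    # Deterministic init: the first point and the point farthest from it.
    c0 = features[0]
    c1 = max(features, key=lambda f: sq_dist(f, c0))
    centers = [c0, c1]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [0 if sq_dist(f, centers[0]) <= sq_dist(f, centers[1]) else 1
                  for f in features]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                dims = range(len(members[0]))
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in dims))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def pair_accuracy(pred, truth):
    """Fraction correct, ignoring which cluster is named 0 and which 1."""
    same = sum(p == t for p, t in zip(pred, truth))
    return max(same, len(pred) - same) / len(pred)

def fake_image(rng, base, n_pixels=16, noise=20):
    """Synthetic stand-in for a player crop: noisy pixels around a jersey colour."""
    return [tuple(max(0, min(255, ch + rng.randint(-noise, noise))) for ch in base)
            for _ in range(n_pixels)]

rng = random.Random(0)
# Assumed toy data: 5 reddish jerseys followed by 5 bluish jerseys.
images = ([fake_image(rng, (200, 30, 30)) for _ in range(5)] +
          [fake_image(rng, (30, 30, 200)) for _ in range(5)])
truth = [0] * 5 + [1] * 5

labels = two_means([mean_color(img) for img in images])
print(labels, pair_accuracy(labels, truth))
```

Because nothing here is fitted to particular teams, a pipeline of this shape would not need retraining when new teams are added, although mean colour alone is likely too crude for real images.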

@jason_brumwell Great to hear that there will be an extra dataset. Otherwise people could hand-label all the images and train a model on those, which would perform really badly once new data is added.
Time extension would be great :slight_smile:

@ngcferreira We have extended the challenge until Sept 30th :slight_smile:

Just to understand better: even if you cluster a few images to generate labelled datasets and then train a supervised model, won’t the supervised model fail when it’s evaluated on the new teams in the test set? Just trying to figure out whether this is a clustering or a classification problem.

@jason_brumwell Thank you very much Jason. Let’s see what I can come up with :slight_smile:

@magicaditya I think both approaches can work. Not sure which one will give the best results. I will try both if I can and see which one behaves better.