Questions about test & validation sets

Thanks for setting up this competition! I’ve just gotten set up and it looks interesting. A few quick questions:

  1. Are the testing and validation buildings in phases 2 & 3 from the same location and the same year as the buildings in the training set? If so, we have access to weather data covering the test and validation periods for the whole year as part of the training data. Can we use this information? Perhaps this is somewhat moot, as the provided predictions appear to be “perfect” (Weather Data "Predictions"), but weather data for the full year still contains more information than the predictions do.

  2. Can I ask what the reasoning is for including the training dataset in the evaluation criteria for phases 2 & 3? In theory, one could write code to “recognize” which buildings belong to the training set and deploy a strategy optimized offline for those buildings. Even if this is not done explicitly in the code, it might happen implicitly through learned parameter weights. It seems like it would be simpler to exclude the training set from the evaluation.

Thank you.


For your second point, I think it makes no sense to evaluate on the training dataset, as it is possible to overfit to it.


I’ll just add that I agree: evaluating on the training data is not standard practice. Overfitting to the first five buildings should be avoided, not encouraged.


@ludwigbald I’ve sent you a private message on aicrowd forum.

@kingsley_nweye What I’m not clear about is how the scoring weights are being applied. On the Overview tab it says

The train and validation dataset scores will carry 40% and 60% weights, respectively in the Phase 2 score.

Is this the calculation?

  • score1 = score on the training dataset
  • score2 = score on the validation dataset
  • final_score = 0.4 * score1 + 0.6 * score2
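
If that reading is right, the blend can be sketched in a few lines. This is only my interpretation of the Overview tab's 40/60 weighting, not the organizers' actual scoring code, and the function name is made up for illustration:

```python
def phase2_score(train_score: float, validation_score: float) -> float:
    """Combine per-dataset scores with the stated 40% / 60% weighting.

    Hypothetical sketch: the weights come from the Overview tab quote
    above; the organizers' real pipeline may differ.
    """
    return 0.4 * train_score + 0.6 * validation_score

# Example: in CityLearn-style cost functions lower is better, so a
# weak validation score dominates the blended Phase 2 score.
print(phase2_score(0.80, 1.10))  # 0.4*0.80 + 0.6*1.10 = 0.98
```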

If so, then I agree that overfitting the training data is a problem. However, if the final_score is calculated in a way that encourages a model to coordinate the buildings in both the training and validation set, then I can see how there is still some value in including the training data in the evaluation. Naively overfitting the training data is probably not the best idea in that case.

Thanks for your question @noam_finkelstein. The location, weather, and year remain the same across phases. The only thing that changes in each phase is the collection of building files (Building_n.csv), since new buildings are added.

Please take a look at this discussion post regarding your second question.

Thanks for your question @random_charger. Please take a look at this post regarding the addition of a coordination (grid) score and this post regarding the weighting of scores.


Awesome, thank you for the detailed answer!
