[Announcement] Average Score Weighting

kingsley_nweye · September 16, 2022, 2:20pm

We mentioned in the competition’s Overview that, as the competition progressed through its phases, the Average Score on datasets from previous phases would be weighted into the leaderboard score. Specifically:

By Phase II, the leaderboard will reflect the ranking of participants' submissions based on an unseen 5/17 buildings validation dataset as well as the seen 5/17 buildings dataset. The train and validation dataset scores will carry 40% and 60% weights, respectively in the Phase 2 score.
Finally in Phase III, participants' submissions will be evaluated on the 5/17 buildings training, 5/17 validation and remaining 7/17 test datasets. The train, validation and test dataset scores will carry 20%, 30% and 50% weights, respectively in the Phase 3 score.

However, this weighting has not been applied yet.

How was average score weighted in Phase I?

In Phase I, the schema contained the 5-building train dataset and Average Score was then calculated as:

\textrm{Average Score}_\textrm{Phase I} = \textrm{Average Score}^\textrm{5-building train dataset}

How is average score weighted in Phase II?

According to the original description on the Overview page, Average Score in Phase II was to be calculated as:

\textrm{Average Score} = 0.4 \times \textrm{Average Score}^\textrm{5-building train dataset} + 0.6 \times \textrm{Average Score}^\textrm{5-building train + 5-building validation dataset}

This meant that 2 simulations were to be run for each submission: one using the 5-building train dataset schema and another using the 5-building train + 5-building validation dataset schema, with their Average Scores weighted 40%/60%. However, no weights are currently applied, and only 1 simulation, which uses the 5-building train + 5-building validation dataset schema, is being run. Hence, Average Score in Phase II is calculated as:

\textrm{Average Score}_\textrm{Phase II} = \textrm{Average Score}^\textrm{5-building train + 5-building validation dataset}

This was an error by the organizers and was not intentional. However, to remain fair, we will not make any changes while Phase II is under way.
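To make the difference between the planned and the current Phase II calculation concrete, here is a minimal Python sketch. The function names and the example scores are hypothetical and chosen only for illustration; this is not the competition's evaluation code.

```python
def planned_phase2_score(train_score: float, combined_score: float) -> float:
    """Phase II score as originally described in the Overview (40%/60% weights)."""
    return 0.4 * train_score + 0.6 * combined_score


def actual_phase2_score(combined_score: float) -> float:
    """Phase II score as currently computed: one simulation, no weights."""
    return combined_score


# Hypothetical Average Scores from the two schemas (illustrative values only).
train_score = 0.80     # 5-building train dataset schema
combined_score = 0.90  # 5-building train + 5-building validation dataset schema

planned = planned_phase2_score(train_score, combined_score)
actual = actual_phase2_score(combined_score)
print(f"planned: {planned:.2f}, actual: {actual:.2f}")
```

With these made-up inputs the two calculations give different values, which is why the leaderboard is left unchanged for the rest of Phase II rather than switched mid-phase.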

When will average score weighting begin?

Average Score weighting will be applied from the beginning of Phase III. Please see the next section for more information on Phase III weighting.

How will average score be weighted in Phase III?

In Phase III, the weighting of Average Score will begin. There will be 2 leaderboards: a private and a public leaderboard.

Public leaderboard

The public leaderboard is what will be displayed here and Average Score will be calculated as:

\textrm{Average Score}^\textrm{Public}_\textrm{Phase III} = 0.4 \times \textrm{Average Score}^\textrm{5-building train dataset} + 0.6 \times \textrm{Average Score}^\textrm{5-building validation dataset}

The highlight here is that it will only include buildings from the train and validation datasets. 2 simulations will be run and their Average Scores will be weighted 40%/60%. The first simulation will use the 5-building train dataset schema and the second will use the 5-building validation dataset schema. The 5-building train dataset is excluded from the second simulation to avoid biasing the score toward the train dataset, which is public and can be overfitted to.

Private leaderboard

The private leaderboard will be visible to only the organizers and will be used to decide the competition’s winners. It will only be made public at the time of announcing the winners. The Average Score in the private leaderboard will be calculated as:

\textrm{Average Score}^\textrm{Private}_\textrm{Phase III} = 0.2 \times \textrm{Average Score}^\textrm{5-building train dataset} + 0.3 \times \textrm{Average Score}^\textrm{5-building validation dataset} + 0.5 \times \textrm{Average Score}^\textrm{5-building test dataset}

The highlight here is that it will include buildings from the train, validation and test datasets. 3 simulations will be run and their Average Scores will be weighted 20%/30%/50%. The first simulation will use the 5-building train dataset schema, the second will use the 5-building validation dataset schema and the third will use the 5-building test dataset schema.
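As a rough illustration of the two Phase III formulas, the sketch below combines per-simulation Average Scores with the public and private weights given in this post. The helper `weighted_average_score`, the dictionary keys and the example scores are assumptions made for this example; the organizers' actual evaluation code may be structured differently.

```python
# Weights taken from the Phase III formulas in this post.
PUBLIC_WEIGHTS = {"train": 0.4, "validation": 0.6}
PRIVATE_WEIGHTS = {"train": 0.2, "validation": 0.3, "test": 0.5}


def weighted_average_score(scores: dict, weights: dict) -> float:
    """Combine per-simulation Average Scores using the given weights."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Only datasets named in `weights` contribute to the result.
    return sum(weight * scores[name] for name, weight in weights.items())


# Hypothetical per-simulation Average Scores (illustrative values only).
scores = {"train": 0.80, "validation": 0.90, "test": 1.00}

public_score = weighted_average_score(scores, PUBLIC_WEIGHTS)
private_score = weighted_average_score(scores, PRIVATE_WEIGHTS)
print(f"public: {public_score:.2f}, private: {private_score:.2f}")
```

The same submission can therefore show different values on the two leaderboards, because the test simulation carries half of the private score and none of the public one.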

Kafka · September 19, 2022, 7:58am

Hi,

Thanks for the update! I still have some concerns:

Since the leaderboard in Phase 2 is totally different from the one in Phase 3, would it be possible to update the current leaderboard to work the same way as in Phase 3? Otherwise it is meaningless to keep submitting to it. I think the only difference is the 40%/60% weighting.
Will the winner be decided by the last submission in Phase 3, or by the highest score in the private leaderboard among all submissions in Phase 3?