We mentioned in the competition's Overview that, as the competition progressed through its phases, the Average Score would be weighted across the phase datasets. Specifically:
By Phase II, the leaderboard will reflect the ranking of participants' submissions based on an unseen 5/17 buildings validation dataset as well as the seen 5/17 buildings train dataset. The train and validation dataset scores will carry 40% and 60% weights, respectively, in the Phase 2 score.
Finally, in Phase III, participants' submissions will be evaluated on the 5/17 buildings training, 5/17 buildings validation and remaining 7/17 buildings test datasets. The train, validation and test dataset scores will carry 20%, 30% and 50% weights, respectively, in the Phase 3 score.
However, this has not been the case yet.
In Phase I, the schema contained only the 5-building train dataset, and the Average Score was then calculated as:
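A sketch of that Phase I formula, writing score(D) for the Average Score a single simulation produces on dataset schema D (the score(·) notation is only for illustration):

```latex
\text{Average Score}_{\text{Phase I}} = \mathrm{score}(\text{train})
```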
According to the original description in the Overview page, Average Score in Phase II was to be calculated as:
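Given the 40%/60% weighting over the two simulations, and writing score(D) for a single simulation's Average Score on dataset schema D (notation for illustration only), the intended Phase II formula would be:

```latex
\text{Average Score}_{\text{Phase II}} = 0.4 \cdot \mathrm{score}(\text{train}) + 0.6 \cdot \mathrm{score}(\text{train} + \text{validation})
```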
This meant that 2 simulations were to be run for each submission: one using the 5-building train dataset schema, and another using the 5-building train + 5-building validation dataset schema, with their Average Scores weighted 40%/60%. However, no weights are currently applied, and only 1 simulation, which uses the 5-building train + 5-building validation dataset schema, is being run. Hence, Average Score in Phase II is calculated as:
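That is, with score(D) denoting a single simulation's Average Score on dataset schema D (notation for illustration only), the score currently being computed is simply:

```latex
\text{Average Score}_{\text{Phase II}} = \mathrm{score}(\text{train} + \text{validation})
```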
This was an error by the organizers and was not intentional. However, to remain fair while Phase II is under way, we will not make any changes yet.
Average Score weighting will be applied from the beginning of Phase III. Please see the next section for more information on Phase III weighting.
In Phase III, the weighting of Average Score will begin. There will be 2 leaderboards: a private and a public leaderboard.
The public leaderboard is what will be displayed here, and its Average Score will be calculated as:
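Given the 40%/60% weighting over the train and validation simulations, and writing score(D) for a single simulation's Average Score on dataset schema D (notation for illustration only), a sketch of the public formula:

```latex
\text{Public Average Score}_{\text{Phase III}} = 0.4 \cdot \mathrm{score}(\text{train}) + 0.6 \cdot \mathrm{score}(\text{validation})
```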
The highlight here is that it will only include buildings from the train and validation datasets. 2 simulations will be run, and their Average Scores will be weighted 40%/60%. The first simulation will use the 5-building train dataset schema, and the second will use the 5-building validation dataset schema. The 5-building train dataset is excluded from the second simulation to avoid biasing the score towards the 5-building train dataset, which is made public and can be overfitted to.
The private leaderboard will be visible only to the organizers and will be used to decide the competition's winners. It will only be made public at the time of announcing the winners. The Average Score in the private leaderboard will be calculated as:
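Given the 20%/30%/50% weighting over the three simulations, and writing score(D) for a single simulation's Average Score on dataset schema D (notation for illustration only), a sketch of the private formula:

```latex
\text{Private Average Score}_{\text{Phase III}} = 0.2 \cdot \mathrm{score}(\text{train}) + 0.3 \cdot \mathrm{score}(\text{validation}) + 0.5 \cdot \mathrm{score}(\text{test})
\end{equation*}
```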
The highlight here is that it will include buildings from the train, validation and test datasets. 3 simulations will be run, and their Average Scores will be weighted 20%/30%/50%. The first simulation will use the 5-building train dataset schema, the second will use the 5-building validation dataset schema and the third will use the 7-building test dataset schema.
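To make the weighting concrete, here is a minimal sketch of how the per-simulation scores could be combined into a phase score. The function name, dictionary layout and the sample scores are all illustrative, not the organizers' actual evaluation code:

```python
def weighted_average_score(scores, weights):
    """Combine per-simulation Average Scores into a single phase score.

    scores  -- Average Score of each simulation, keyed by dataset name.
    weights -- weight of each dataset; the weights must sum to 1.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[name] * scores[name] for name in weights)

# Phase III private leaderboard: 20%/30%/50% over train/validation/test,
# using made-up simulation scores purely for illustration.
phase3_private = weighted_average_score(
    {"train": 0.80, "validation": 0.90, "test": 1.00},
    {"train": 0.2, "validation": 0.3, "test": 0.5},
)
print(phase3_private)  # 0.2*0.80 + 0.3*0.90 + 0.5*1.00, i.e. about 0.93
```

The same helper covers the public leaderboard by passing only the train and validation entries with 0.4/0.6 weights.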