Hi there, thanks for hosting this interesting competition!
I have a question about the rounds: what is the specific purpose of each round?
For example, what will happen after the Round 1 deadline?
Hi Patrick,
There is no significant difference between the two rounds. We aim to incorporate any feedback and suggestions shared by participants during Round 1 into Round 2, but there are no changes to the dataset or structure. The final winners will be determined based on the Round 2 leaderboard.
I hope this clarifies. Thank you!
Hi @snehananavati, can I confirm the following:
- There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and a holdout test set that only becomes available after 3rd Feb?
- If the above is true, is it possible to know the split distribution, e.g. training 20%, public test set 20%, private test set 60%?
- What are the final submission deliverables for Round 2 on 3rd Feb? Is it the current format of just the predicted CSV, or the entire model pipeline (to be run on the unseen test set)?
thanks
Hi @chan_jun_hao,
There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and a holdout test set that only becomes available after 3rd Feb?
We have provided all of the features from the test set. However, the scores on the current public leaderboard are based on only part of the test set. The final score will be based on the full set.
If the above is true, is it possible to know the split distribution, e.g. training 20%, public test set 20%, private test set 60%?
Approximately:
- 20% training set
- 45% public test set
- 35% private test set
What are the final submission deliverables for Round 2 on 3rd Feb? Is it the current format of just the predicted CSV, or the entire model pipeline (to be run on the unseen test set)?
The deliverables are:
- The current format of the predicted CSV
- The code for the model (for validation purposes)
- Solution Documentation
I don’t quite understand this distribution. There are around 10x as many samples in the public test set as in the training set. How can the public test set be 45% and the training set 20%? Am I misunderstanding something?
Response from the organisers:
This is indeed not the typical machine learning setup; however, it reflects practical realities. Unlike text or image data, publicly available datasets for buildings are extremely rare, and this is unlikely to change due to privacy concerns. Additionally, the distribution shifts between buildings are significant, driven by differences in size, design, use, legal restrictions, and occupant behaviors.
The goal of this challenge is to test generalisation capabilities. To achieve this, we intentionally moved a significant portion of the data from the training set to the testing set. This allows us to evaluate how well algorithms perform under different distributions. For context, you can think of this as analogous to a weak supervision or semi-supervised learning setup, where extensive time series data is available, but only a subset is labeled.
Thanks Sneha, but the point I am unable to understand is this: if the training data is 20% of the total and the public test set is 45%, shouldn’t the number of rows in the public test set be roughly 2-3x the number of rows in the training set (since 45 is 2.25 x 20)? However, the number of rows in the public test set is about 10x that in the training set. So my question is: do the 20% and 45% values refer to some metric other than the number of files/rows?
man above has a point
Response from the organisers:
That’s correct: the 20% and 45% values do not refer to the number of “rows” or chunks. Instead, they refer to the proportion of time series allocated to the training, public, and secret sets during dataset preparation, as described here: AIcrowd | Brick by Brick 2024 | Challenges
Specifically, during the preparation stage:
All data from the three buildings are combined into a single dataset and then segmented into distinct sets for training, leaderboard testing, and secret competition testing.
In this step, approximately 20%, 45%, and 35% of the time series are assigned to the training, public, and secret sets, respectively. Note, however, that the length of each time series can vary. The dataset then undergoes further processing, which yields the “rows” and “chunks” available to participants:
Time Series Chunking: The dataset is further divided into shorter segments or chunks with durations ranging from 2 to 8 weeks.
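To illustrate the arithmetic, here is a purely illustrative Python sketch (all numbers are made up and are not the real dataset statistics). It only shows that a 45%/20% split defined over whole time series can still yield roughly 10x as many chunks in the public set once series lengths differ and each series is chunked:

```python
# Purely illustrative sketch: all numbers below are hypothetical and are
# NOT the real dataset statistics. It demonstrates that a split defined
# over whole time series need not match the resulting chunk/row counts.

# Hypothetical series lengths in weeks per split:
train_series = [8] * 20    # 20% of the series, but each one is short
public_series = [32] * 45  # 45% of the series, each one much longer

chunk_weeks = 4  # assumed chunk duration within the 2-8 week range

train_chunks = sum(length // chunk_weeks for length in train_series)
public_chunks = sum(length // chunk_weeks for length in public_series)

print(train_chunks, public_chunks, public_chunks / train_chunks)
# -> 40 360 9.0: roughly 9x the rows, despite only a 45%/20% series split
```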
I hope this clarifies your concern.
Thank you, this makes sense!
Hi,
When will the Round 2 leaderboard open? Submissions currently just say “submitted”, but scores aren’t being shown.
Thanks!
Update: the issue is now fixed.
Yes, the team freeze has been in effect since 10th January. It typically begins three weeks before the challenge deadline to prevent individuals from submitting duplicate entries through multiple teams.
oic, thanks for the clarification!
Could you please clarify the evaluation metric used for scoring? The competition description mentions “F1 micro”, but the description of it (“This involves computing precision and recall for each label”) matches the definition of “F1 macro”.
F1 micro typically means calculating metrics globally by counting the total true positives, false negatives and false positives.
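For reference, here is a minimal sketch (toy labels, not competition data) using scikit-learn’s f1_score to show how the two averaging modes differ:

```python
# Toy multi-label example (made-up data) contrasting micro vs. macro F1.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0]])

# Micro: pool true positives / false positives / false negatives across
# all labels, then compute one global F1.
print(f1_score(y_true, y_pred, average="micro", zero_division=0))  # ~0.67

# Macro: compute F1 per label, then take the unweighted mean over labels.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.33
```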
Thank you!
Hi Thomas,
Thank you for pointing that out. You are correct—the metric used is Macro F1.
The typo has now been corrected.
Thanks for clarifying! Please note that the description still contains references to “micro”.
Not a big deal, just FYI
Hi, I have a question I’d like to ask: Can the final winning team participate in the workshop remotely?