đź’¬ Feedback & Suggestions

hi @snehananavati, can I confirm that:

  1. There is an additional test set that is not currently available, and the final score will be based on how the model performs on both the current public leaderboard test set and a holdout test set that only becomes available after 3rd Feb?

  2. If the above is true, is it possible to know the distribution of the separation? For example: training 20%, public test set 20%, private test set 60%.

  3. What are the final submission deliverables for round 2 on 3rd Feb? Is it the current format of just the predicted CSV, or the entire model pipeline (to be run on the unseen test set)?

thanks

Hi @chan_jun_hao,

There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and a holdout test set that only becomes available after 3rd Feb?

We have provided all of the features from the test set. However, the scores on the current public leaderboard are based on only part of the test set. The final score will be based on the full set.

If the above is true, is it possible to know the distribution of the separation? For example: training 20%, public test set 20%, private test set 60%.

Approximately:

  • 20% training set
  • 45% public test set
  • 35% private test set

What are the final submission deliverables for round 2 on 3rd Feb? Is it the current format of just the predicted CSV, or the entire model pipeline (to be run on the unseen test set)?

The deliverables are:

  • The current format of the predicted CSV
  • The code to the model (for validation purposes)
  • Solution Documentation

I don’t quite understand this distribution. There are around 10x as many samples in the public test set as in the training set. How can the public test set be 45% and the training set be 20%? Am I misunderstanding something?

Response from the organisers:

This is indeed not the typical machine learning setup; however, it reflects practical realities. Unlike text or image data, publicly available datasets for buildings are extremely rare, and this is unlikely to change due to privacy concerns. Additionally, the distribution shifts between buildings are significant, driven by differences in size, design, use, legal restrictions, and occupant behaviors.

The goal of this challenge is to test generalisation capabilities. To achieve this, we intentionally moved a significant portion of the data from the training set to the testing set. This allows us to evaluate how well algorithms perform under different distributions. For context, you can think of this as analogous to a weak supervision or semi-supervised learning setup, where extensive time series data is available, but only a subset is labeled.

Thanks Sneha, but the point I am unable to understand is this: if the train data is 20% of the total data and the public test set is 45% of the data, wouldn’t the number of rows in the test set be 2-3x the number of rows in the train set (since 45 is 2.25 x 20)? However, the number of rows in the public test set is 10x that in the train set. So my question is: do the 20% and 45% values refer to some other metric rather than the number of files/rows?


man above has a point

Response from the organisers:

That’s correct: the 20% and 45% values do not refer to the number of “rows” or chunks. Instead, they refer to the proportion of time series allocated to the training, public, and secret sets during dataset preparation, as described here: AIcrowd | Brick by Brick 2024 | Challenges

Specifically, during the preparation stage:

All data from the three buildings are combined into a single dataset and then segmented into distinct sets for training, leaderboard testing, and secret competition testing.

In this step, approximately 20%, 45%, and 35% of the time series are assigned to the training, public, and secret sets, respectively. However, note that the length of each time series can vary. Later, the dataset undergoes further processing which yields the “rows” and “chunks” available to the participants:

Time Series Chunking: The dataset is further divided into shorter segments or chunks with durations ranging from 2 to 8 weeks.

I hope this clarifies your concern.
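To see why a 20/45 split by series count can still produce roughly 10x as many test rows as training rows, here is a minimal sketch. All of the numbers below (series counts, series lengths) are invented for illustration and are not the actual challenge statistics:

```python
# Hypothetical illustration (invented numbers): splitting by time series
# count does not fix the row counts, because series lengths vary.

# Suppose 100 time series total, split 20 / 45 / 35 by *series count*.
n_train_series, n_public_series = 20, 45

# If training series happen to be short and public-test series long,
# the row ratio can far exceed 45/20 = 2.25.
train_rows = n_train_series * 500        # e.g. short series: 500 rows each
public_rows = n_public_series * 2300     # e.g. long series: 2300 rows each

print(train_rows)                # 10000
print(public_rows)               # 103500
print(public_rows / train_rows)  # 10.35x rows despite a 2.25x series split
```

The chunking step (2 to 8 week segments) then changes the row and chunk counts further, so the published percentages only ever describe the series-level split.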

Thank you, this makes sense!

Hi,
When will Round 2 leaderboard open? Submissions currently just say submitted but scores aren’t being shown.
Thanks!


Update: the issue is now fixed.


hi @snehananavati, are we not allowed to form/join teams in round 2? it says team freeze in effect

Yes, the team freeze has been in effect since 10th January. It typically begins three weeks before the challenge deadline to prevent individuals from submitting duplicate entries through multiple teams.

oic, thanks for the clarification!

Could you please clarify the evaluation metrics used for scoring? The competition description mentions “F1 micro”, but its description (“This involves computing precision and recall for each label”) matches the definition of “F1 macro”.

F1 micro typically means calculating metrics globally by counting the total true positives, false negatives and false positives.

Thank you!

Hi Thomas,
Thank you for pointing that out. You are correct: the metric used is Macro F1.
The typo has now been corrected.
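For anyone comparing the two, here is a minimal sketch of the difference on toy multi-label data (plain Python, not the official evaluator; the data and two-label setup are invented for illustration):

```python
# Toy two-label example contrasting micro and macro F1.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [0, 0], [0, 0]]

def f1(tp, fp, fn):
    # Define F1 as 0.0 when there are no true positives.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def counts(label):
    tp = sum(t[label] and p[label] for t, p in zip(y_true, y_pred))
    fp = sum(not t[label] and p[label] for t, p in zip(y_true, y_pred))
    fn = sum(t[label] and not p[label] for t, p in zip(y_true, y_pred))
    return tp, fp, fn

# Macro F1: compute F1 per label, then average the per-label scores.
macro = sum(f1(*counts(lbl)) for lbl in (0, 1)) / 2

# Micro F1: pool TP/FP/FN across all labels, then compute one F1.
tp, fp, fn = (sum(c) for c in zip(counts(0), counts(1)))
micro = f1(tp, fp, fn)

print(round(macro, 3))  # 0.4   (label B's F1 of 0 drags the average down)
print(round(micro, 3))  # 0.667 (dominated by the frequent label A)
```

Macro F1 weights every label equally, so rare labels matter as much as common ones; micro F1 pools the counts globally, so frequent labels dominate.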

Thanks for clarifying! Please note that the description still contains references to “micro”:

Not a big deal, just FYI

Hi, I have a question: can the final winning team participate in the workshop remotely?

Hi there,

We would like some clarifications on the labeling structure.

The description says:

Labeling Structure

The labelling adheres to a modified version of Brick schema version 1.2.1, featuring 94-point sub-classes. Each data point is classified with multiple label types:

  • Positive Labels: The true label and its parent classes.

  1. What is that modified version of the Brick Schema?
  2. What are the 94 point subclasses? Or how does that correspond to the 240 classes of the train_y csv? Are classes with a child not counted into these 94 classes (i.e. the 94 classes are lowest level)?
  3. If all the data points are subclasses of Point and the parent classes are labelled as true, why is the Point class -1 in train_y for all samples? Shouldn’t it be 1, since it is the parent class of everything?

Thanks for the clarifications!
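The positive-label rule quoted above (true label plus its parent classes) can be sketched as follows. The class chain below is a toy illustration, not the actual modified Brick 1.2.1 tree used by the challenge:

```python
# Toy sketch of the "positive labels = true label + parent classes" rule.
# This hierarchy is invented for illustration; it is NOT the actual
# Brick schema class tree used by the challenge.
PARENT = {
    "Supply_Air_Temperature_Sensor": "Air_Temperature_Sensor",
    "Air_Temperature_Sensor": "Temperature_Sensor",
    "Temperature_Sensor": "Sensor",
    "Sensor": "Point",
}

def positive_labels(true_label):
    """Return the true label plus all of its ancestor classes."""
    labels = [true_label]
    while labels[-1] in PARENT:
        labels.append(PARENT[labels[-1]])
    return labels

print(positive_labels("Supply_Air_Temperature_Sensor"))
# ['Supply_Air_Temperature_Sensor', 'Air_Temperature_Sensor',
#  'Temperature_Sensor', 'Sensor', 'Point']
```

Under this rule the root class would always be positive, which is exactly what question 3 above is asking about with respect to Point being -1.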


Hi @mike_q

Remote presentations will not be facilitated by the conference, so all attendees are expected to present in person. Each participant is responsible for their own conference registration. For those whose papers are accepted (typically among the top submissions), registration will include the paper publication fee. This process is similar to submitting a paper to a conference and having it published. The competition organisers undertake this work on a completely voluntary basis, with no personal gain involved.

Attendance at the Sydney conference is mandatory for those awarded travel grants. They must present their paper in person, and the travel grant will be disbursed following their attendance.


Hi, thanks for the reply. So this means that if the top team wants the prize money, they must submit a short paper, register for the conference, and pay the publication fees. I think the prize money for the top team may not cover the conference registration fees?
① Could you list how much the conference registration and publication fees will cost in total?
② Will the top team’s paper be treated as a workshop paper or a main conference paper?
③ Previously, I did not have to pay a conference registration fee when I participated in a NeurIPS competition, because the competition was part of a workshop. Since this competition must also be a workshop, I would think the winners should not have to register for the conference to receive the prize money. If a winner does not want to publish a paper or pay the registration fee, can they still get the prize?

Looking forward to your reply, thanks!