Hi there, thanks for hosting this interesting competition!
I have a question about the rounds: what is the specific purpose of each round?
For example, what will happen after the Round 1 deadline?
Hi Patrick,
There is no significant difference between the two rounds. We aim to incorporate any feedback and suggestions shared by participants during Round 1 into Round 2, but there are no changes to the dataset or structure. The final winners will be determined based on the Round 2 leaderboard.
I hope this clarifies. Thank you!
Hi @snehananavati, can I confirm the following:
- There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and a holdout test set that only becomes available after 3rd Feb?
- If the above is true, is it possible to know the split distribution, e.g. training 20%, public test set 20%, private test set 60%?
- What are the final submission deliverables for Round 2 on 3rd Feb? Is it the current format of just the predicted CSV, or the entire model pipeline (to be run on the unseen test set)?
thanks
Hi @chan_jun_hao,
There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and a holdout test set that only becomes available after 3rd Feb?
We have provided all of the features from the test set. However, the scores on the current public leaderboard are based on only part of the test set. The final score will be based on the full set.
If the above is true, is it possible to know the split distribution, e.g. training 20%, public test set 20%, private test set 60%?
Approximately:
- 20% training set
- 45% public test set
- 35% private test set
What are the final submission deliverables for Round 2 on 3rd Feb? Is it the current format of just the predicted CSV, or the entire model pipeline (to be run on the unseen test set)?
The deliverables are:
- The current format of the predicted CSV
- The code for the model (for validation purposes)
- Solution Documentation
I don’t quite understand this distribution. There are around 10x as many samples in the public test set as in the training set. How can the public test set be 45% and the training set 20%? Am I misunderstanding something?
Response from the organisers:
This is indeed not the typical machine learning setup; however, it reflects practical realities. Unlike text or image data, publicly available datasets for buildings are extremely rare, and this is unlikely to change due to privacy concerns. Additionally, the distribution shifts between buildings are significant, driven by differences in size, design, use, legal restrictions, and occupant behaviors.
The goal of this challenge is to test generalisation capabilities. To achieve this, we intentionally moved a significant portion of the data from the training set to the testing set. This allows us to evaluate how well algorithms perform under different distributions. For context, you can think of this as analogous to a weak supervision or semi-supervised learning setup, where extensive time series data is available, but only a subset is labeled.
Thanks Sneha, but the point I am unable to understand is this: if the training data is 20% of the total and the public test set is 45%, shouldn’t the number of rows in the public test set be roughly 2-3x the number of rows in the training set (since 45 is 2.25 x 20)? However, the number of rows in the public test set is about 10x that in the training set. So my question is: do the 20% and 45% values refer to some metric other than the number of files/rows?
man above has a point
Response from the organisers:
That’s correct: the 20% and 45% values do not refer to the number of “rows” or chunks. Instead, they refer to the proportion of time series allocated to the training, public, and secret sets during dataset preparation, as described here: AIcrowd | Brick by Brick 2024 | Challenges
Specifically, during the preparation stage:
All data from the three buildings are combined into a single dataset and then segmented into distinct sets for training, leaderboard testing, and secret competition testing.
In this step, approximately 20%, 45%, and 35% of the time series are assigned to the training, public, and secret sets, respectively. Note, however, that the length of each time series can vary. The dataset then undergoes further processing, which yields the “rows” and “chunks” available to participants:
Time Series Chunking: The dataset is further divided into shorter segments or chunks with durations ranging from 2 to 8 weeks.
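To illustrate the arithmetic, here is a purely illustrative Python sketch (all numbers are made up and are not the real dataset statistics). It only shows that a 45%/20% split defined over whole time series can still yield roughly 10x as many chunks in the public set once series lengths differ and each series is chunked:

```python
# Purely illustrative sketch: all numbers below are hypothetical and are
# NOT the real dataset statistics. It demonstrates that a split defined
# over whole time series need not match the resulting chunk/row counts.

# Hypothetical series lengths in weeks per split:
train_series = [8] * 20    # 20% of the series, but each one is short
public_series = [32] * 45  # 45% of the series, each one much longer

chunk_weeks = 4  # assumed chunk duration within the 2-8 week range

train_chunks = sum(length // chunk_weeks for length in train_series)
public_chunks = sum(length // chunk_weeks for length in public_series)

print(train_chunks, public_chunks, public_chunks / train_chunks)
# -> 40 360 9.0: roughly 9x the rows, despite only a 45%/20% series split
```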
I hope this clarifies your concern.
Thank you, this makes sense!
Hi,
When will the Round 2 leaderboard open? Submissions currently just say “submitted”, but scores aren’t being shown.
Thanks!
Update: the issue is now fixed.
Yes, the team freeze has been in effect since 10th January. It typically begins three weeks before the challenge deadline to prevent individuals from submitting duplicate entries through multiple teams.
oic, thanks for the clarification!
Could you please clarify the evaluation metric used for scoring? The competition description mentions “F1 micro”, but the description of it (“This involves computing precision and recall for each label”) matches the definition of “F1 macro”.
F1 micro typically means calculating metrics globally by counting the total true positives, false negatives and false positives.
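For reference, here is a minimal sketch (toy labels, not competition data) using scikit-learn’s f1_score to show how the two averaging modes differ:

```python
# Toy multi-label example (made-up data) contrasting micro vs. macro F1.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0]])

# Micro: pool true positives / false positives / false negatives across
# all labels, then compute one global F1.
print(f1_score(y_true, y_pred, average="micro", zero_division=0))  # ~0.67

# Macro: compute F1 per label, then take the unweighted mean over labels.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.33
```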
Thank you!
Hi Thomas,
Thank you for pointing that out. You are correct—the metric used is Macro F1.
The typo has now been corrected.
Thanks for clarifying! Please note that the description still contains references to “micro”.
Not a big deal, just FYI
Hi, I have a question I’d like to ask: Can the final winning team participate in the workshop remotely?