đź’¬ Feedback & Suggestions

We are constantly trying to improve this challenge for you and would appreciate any feedback you might have! :raised_hands:

Please reply to this thread with your suggestions and feedback on making the challenge better for you!

  • What have been your major pain points so far?
  • What would you like to see improved?

All The Best!

Hi there,

I’m new to AIcrowd and was just wondering how we can properly add new topics within the Brick by Brick category. When I click on “New topic” under the “Discussion” tab, I’m redirected to the page shown in the screenshot, but it’s not clear to me how to proceed.

I mainly wanted to ask about the submission rules:

  1. The rules say that only 5 submissions are allowed/considered per 7 days per task. Does that mean that there are multiple tasks in this challenge, with separate submission workflows?
  2. Do failed submissions (e.g. due to errors) count as one of the allowed five?

Thanks for any clarification!

1 Like

Hi Maghnie,

You can create a new post by going to the Discussion tab and clicking the “New Topic” button. Attached is an image for your reference:

  1. The challenge rules state that participants can make ten submissions per day. There are no other tracks.

Participants can upload up to ten submissions per day in CSV format. Each submission must adhere strictly to the prescribed format so that the leaderboard evaluations accurately reflect performance on the test set (see the sketch after this list).

  2. Up to 5 failed submissions are allowed every day.
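For reference, here is a minimal sketch of writing predictions to a submission CSV with pandas. The column names below are placeholders, not the challenge’s prescribed format, so check the challenge page for the exact schema before uploading:

```python
# Hypothetical example of producing a submission CSV; the column names
# ("filename", "prediction") are placeholders, not the official schema.
import pandas as pd

predictions = {
    "chunk_001.pkl": 0.92,
    "chunk_002.pkl": 0.13,
}

submission = pd.DataFrame(
    {"filename": list(predictions.keys()),
     "prediction": list(predictions.values())}
)
submission.to_csv("submission.csv", index=False)
```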

I hope this helps! :slight_smile:

1 Like

Thank you for the quick reply! My questions about the submission rules are now clarified.

As for adding a new topic, I think there may be some access settings missing from my account.
When I hover over the “New topic” button, it should lead to

  • https://discourse.aicrowd.com/new-topic?category_id=2917

But it redirects me instead to

  • https://discourse.aicrowd.com/c/2917

Not sure what’s going on with the topic creation, but I’m all set for now. :slight_smile:

Hi there,

I have seen that the data you provided are pickled files. Would it be possible to provide the data in a format that does not use pickle, e.g. CSV, Parquet, etc.? The main reason behind this request is that I prefer not to unpickle files from the internet, since unpickling can execute arbitrary code on my machine. See the warning box in the official Python documentation: pickle — Python object serialization — Python 3.13.1 documentation

You can deserialise the pickle files in the cloud and save them to CSV if you are worried about arbitrary code execution on your local machine, e.g. in Google Colab.
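A minimal sketch of that conversion, assuming each pickle file holds a pandas DataFrame (the file names here are placeholders):

```python
# Run this in a throwaway cloud session (e.g. Google Colab), not locally,
# if you are worried about unpickling untrusted files.
import pandas as pd

df = pd.read_pickle("train.pkl")      # assumes the pickle contains a DataFrame
df.to_csv("train.csv", index=False)   # download the CSV instead of the pickle
# or: df.to_parquet("train.parquet")
```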

1 Like

Hi there, thanks for hosting this interesting competition!
I have a question about the rounds - what is the specific purpose of each round?
For example, what will happen after the Round 1 deadline?

3 Likes

Hi Patrick,

There is no significant difference between the two rounds. We aim to incorporate any feedback and suggestions shared by participants during Round 1 into Round 2, but there are no changes to the dataset or structure. The final winners will be determined based on the Round 2 leaderboard.

I hope this clarifies. Thank you!

1 Like

Hi @snehananavati, can I confirm that:

  1. There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and this holdout test set, which only becomes available after 3rd Feb?

  2. If the above is true, is it possible to know the distribution of the split? E.g. training 20%, public test set 20%, private test set 60%.

  3. What are the final submission deliverables for Round 2 on 3rd Feb? Are they just the predicted CSV in the current format, or the entire model pipeline (to be run on the unseen test set)?

thanks

Hi @chan_jun_hao,

There is an additional test set that is not currently available. The final score will be based on how the model performs on both the current public leaderboard test set and this holdout test set, which only becomes available after 3rd Feb?

We have provided all of the features from the test set. However, the scores on the current public leaderboard are based on only part of the test set. The final score will be based on the full set.

If the above is true, is it possible to know the distribution of the split? E.g. training 20%, public test set 20%, private test set 60%.

Approximately:

  • 20% training set
  • 45% public test set
  • 35% private test set

What are the final submission deliverables for Round 2 on 3rd Feb? Are they just the predicted CSV in the current format, or the entire model pipeline (to be run on the unseen test set)?

The deliverables are:

  • The current format of the predicted CSV
  • The code to the model (for validation purposes)
  • Solution Documentation

1 Like

I don’t quite understand this distribution. There are around 10x as many samples in the public test set as in the training set. How can the public test set be 45% and the training set 20%? Am I misunderstanding something?

Response from the organisers:

This is indeed not the typical machine learning setup; however, it reflects practical realities. Unlike text or image data, publicly available datasets for buildings are extremely rare, and this is unlikely to change due to privacy concerns. Additionally, the distribution shifts between buildings are significant, driven by differences in size, design, use, legal restrictions, and occupant behaviors.

The goal of this challenge is to test generalisation capabilities. To achieve this, we intentionally moved a significant portion of the data from the training set to the testing set. This allows us to evaluate how well algorithms perform under different distributions. For context, you can think of this as analogous to a weak supervision or semi-supervised learning setup, where extensive time series data is available, but only a subset is labeled.

Thanks Sneha, but the point I am unable to understand is this: if the train data is 20% of the total data and the public test set is 45% of the data, wouldn’t the number of rows in the test set be roughly 2-3x the number of rows in the train set (since 45 is 2.25 x 20)? However, the number of rows in the public test set is about 10x that in the train set. So my question is: do the 20% and 45% values refer to some metric other than the number of files/rows?

1 Like

man above has a point

Response from the organisers:

That’s correct: the 20% and 45% values do not refer to the number of “rows” or chunks. Instead, they refer to the proportion of time series allocated to the training, public (leaderboard), and secret test sets during dataset preparation, as described here: AIcrowd | Brick by Brick 2024 | Challenges

Specifically, during the preparation stage:

All data from the three buildings are combined into a single dataset and then segmented into distinct sets for training, leaderboard testing, and secret competition testing.

In this step, approximately 20%, 45%, and 35% of the time series are assigned to the training, public, and secret sets, respectively. However, note that the length of each time series can vary. Later, the dataset undergoes further processing which yields the “rows” and “chunks” available to the participants:

Time Series Chunking: The dataset is further divided into shorter segments or chunks with durations ranging from 2 to 8 weeks.
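To illustrate the chunking idea (a conceptual sketch only, not the organisers’ actual preparation code), one long time series could be split into variable-length segments of 2 to 8 weeks like this:

```python
# Conceptual illustration of time series chunking: one long hourly series is
# split into segments whose lengths vary between 2 and 8 weeks.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_hours = 24 * 7 * 20  # 20 weeks of synthetic hourly readings
series = pd.Series(
    rng.normal(size=n_hours),
    index=pd.date_range("2024-01-01", periods=n_hours, freq="h"),
)

chunks = []
start = series.index[0]
while start <= series.index[-1]:
    length = pd.Timedelta(weeks=int(rng.integers(2, 9)))  # 2 to 8 weeks
    mask = (series.index >= start) & (series.index < start + length)
    chunks.append(series[mask])
    start = start + length

print([len(c) for c in chunks])  # chunk sizes in hours
```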

I hope this clarifies your concern.

Thank you, this makes sense!

Hi,
When will the Round 2 leaderboard open? Submissions currently just say “submitted”, but scores aren’t being shown.
Thanks!

1 Like

Update: the issue is now fixed.

1 Like

Hi @snehananavati, are we not allowed to form/join teams in Round 2? It says a team freeze is in effect.

Yes, the team freeze has been in effect since 10th January. It typically begins three weeks before the challenge deadline to prevent individuals from submitting duplicate entries through multiple teams.