I’m not sure what happened on the Private LB: all of the top-3 teams from the Public LB are now outside the top 3.
I know that the Public LB was evaluated by non-experts and the Private LB by an interior design expert. But how can the teams ranked 10th and 11th on Public end up 2nd and 3rd on Private? Does it mean that the Public LB was just random?
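One way to put a number on "just random" would be a rank correlation between the two leaderboards. Below is a minimal sketch, assuming every team has a unique rank (no ties) on both the Public and Private LB. The function name `spearman_rho` and the toy rankings are made up for illustration; they are not the real leaderboard data.

```python
def spearman_rho(public_ranks, private_ranks):
    """Spearman rank correlation between two tie-free rankings.

    Both lists hold the ranks (1..n) of the same teams, in the same team order.
    Returns 1.0 for identical rankings, -1.0 for fully reversed ones,
    and values near 0 when the two rankings are unrelated.
    """
    n = len(public_ranks)
    if n != len(private_ranks) or n < 2:
        raise ValueError("need two rankings of the same length (at least 2 teams)")
    # Sum of squared rank differences per team.
    d2 = sum((a - b) ** 2 for a, b in zip(public_ranks, private_ranks))
    # Closed-form Spearman formula, valid only when there are no ties.
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy rankings (hypothetical, NOT the competition data).
print(spearman_rho([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (no shake-up)
print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (fully reversed)
```

A value close to 0 on the real ranks would suggest that the two evaluations barely agree; a clearly positive value would mean the shake-up is large at the top but the Public LB was not purely random.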
My second point is the evaluation criteria on the Private LB, which changed a lot compared to the Public LB. I would say the expert on the Private LB didn’t pay attention to details, whereas the Public LB annotators had seen ~1k images and were, in my opinion, even better trained than the expert.
Here are a few examples:
This team has a geometry score of 0.93, but this image completely changed the height of the floor. According to the rules ("cannot be easily altered through basic renovation"), the image should get 0 for geometry. There are also some anomalies below the door. Lastly, and subjectively, all the furniture is small compared to the windows and doors, as if made for kids, which to me means low realism.
Here, for example, the wall was moved to the left (check the blue lines). Again, under "cannot be easily altered through basic renovation", this should be a 0 score. In addition, everything is so small that it is not realistic.
Here are more images from the top 3. There is a double chandelier, no bed with a tufted headboard, and no mirrored furniture. For me, both images have low realism and functionality.
Here again, the floor was moved by ~1m, which should be a score of 0. There is also a new wall on the right that would require a lot of work to build.
That image is really cool. The only issues are that the wall was moved on the left side and the left window was transformed into a wardrobe.
Here we have a window in a window, a new door, and a second chandelier (which is fine, but the realism score should be lower).
Of course, our generation is not perfect either.
Like:
Double-bed (points for prompt, but low realism and functionality)
But over the course of the competition we tried to eliminate geometric issues (and better geometry did give us a better Public LB score), and in the end it was not worth it.
I know these are only cherry-picked examples.
But I would like to ask the organizers to explain this huge shake-up, why the evaluation criteria changed so much (again, 10th and 11th on Public are 2nd and 3rd on Private), and why moving a wall or floor or adding a door does not lead to a 0 score (maybe it does, but if I have seen something in these 3 images, I believe the error can be repeated in others; this was the case for our team). Maybe it would be better if both non-experts and experts judged the generations.