🧞 Requesting Feedback and Suggestions

Dear Participants,

We are constantly trying to make this challenge better for everyone and would really appreciate your feedback. :raised_hands:

Feel free to reply to this thread with your suggestions and feedback on improving the challenge for you!

  • What have been your major pain points so far?
  • What would you like to see improved?

Cheers!

Things have been pretty smooth so far, with aicrowd-gym having some growing pains (issues with Pillow and with iterating over Dict spaces in aicrowd-gym) :slight_smile: . However, as long as those issues can be worked around, things are cool!

Hey @dipam,

I just gave some feedback about tight time limits in this comment.

Hi @maciej_sypetkowski,

I agree with some of your points and have shared them with the NetHack team. We’ll discuss them and may increase the time limits if they agree. Thanks for the feedback.

I would like to see continued improvement in returning log information to submitters. We’re seeing more than we were before a recent bugfix, but here’s the log section from Panic submission 46:

[image: log section from Panic submission 46]

This challenge is super awesome! Thank you so much! I love it!

One thing for improvement: I would have liked to see role, race, gender and alignment in the Observation (blstats).

I found it very nice that you provided the starter kit, so that I was able to get good scores without heavy hardware setup. My laptop is very old :slight_smile:

One thing that I do not like at all: the heavy score variance. With one and the same model, my median score ranges from 400 to 500 across evaluations, and the mean score also differs widely from one evaluation to the next. I hope you increase the final evaluation to more than 2048 episodes. Or you could use a fixed distribution of roles, e.g. 20 episodes as Tourist, 20 as Healer, and so on.

I believe the finals will use 4096 runs.

I really like the stable distribution idea – maybe next year? One of the things I have considered doing is writing a function to normalize our local scores to an even role distribution, but it has never moved to the top of my priority list.
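
For what it's worth, here is a rough sketch of what such a normalizing function could look like. It assumes local results are available as (role, score) pairs, which is my own made-up record format and not something the starter kit provides. The function name `role_balanced_mean` is hypothetical too. It averages scores within each role first, then averages those per-role means, so every role that appears counts equally no matter how often it was rolled.

```python
from collections import defaultdict
from statistics import mean


def role_balanced_mean(episodes):
    """Return (balanced_mean, per_role_means) for an iterable of (role, score) pairs.

    Assumption: each episode is recorded as a (role, score) tuple.
    Roles that never appear in `episodes` are simply left out.
    """
    scores_by_role = defaultdict(list)
    for role, score in episodes:
        scores_by_role[role].append(score)

    if not scores_by_role:
        raise ValueError("no episodes given")

    # Mean score within each role.
    per_role_means = {role: mean(scores) for role, scores in scores_by_role.items()}

    # Unweighted mean across roles: each observed role counts once.
    balanced = mean(list(per_role_means.values()))
    return balanced, per_role_means


if __name__ == "__main__":
    # Toy data, not real results.
    runs = [("Tourist", 100), ("Tourist", 200), ("Valkyrie", 900)]
    balanced, per_role = role_balanced_mean(runs)
    print(per_role)
    print(balanced)
```

This only removes the variance that comes from an uneven mix of roles. Roles with very few local episodes will still have noisy per-role means, so it probably only helps with a reasonably large number of runs.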