🧞 Requesting Feedback and Suggestions

dipam · August 19, 2021, 5:52am

Dear Participants,

We are constantly trying to make this challenge better for everyone and would really appreciate your feedback.

Feel free to reply to this thread with your suggestions and feedback on improving the challenge for you!

What have been your major pain points so far?
What would you like to see improved?

Cheers!

vrv · August 19, 2021, 5:53am

anssi · August 19, 2021, 10:11am

Things have been pretty smooth so far, with aicrowd-gym having some growing pains (Issues with Pillow and Iterating over Dict spaces in aicrowd-gym). However, as long as those issues can be worked around, things are cool!

maciej_sypetkowski · September 16, 2021, 3:09pm

Hey @dipam,

I just gave some feedback about tight time limits in this comment.

dipam · September 16, 2021, 3:55pm

Hi @maciej_sypetkowski,

I agree with some of your points and have shared them with the NetHack team. We’ll discuss them and may increase the time limit if they agree. Thanks for the feedback.

martinathome · September 16, 2021, 5:14pm

I would like to see continued improvement in returning log information to submitters. We’re seeing more than we were before a recent bugfix, but here’s the log section from Panic submission 46

paul_puntschart · October 1, 2021, 2:13pm

This challenge is super awesome! Thank you so much! I love it!

One thing for improvement: I would have liked to see role, race, gender and alignment in the Observation (blstats).

I found it very nice that you provided the starter kit, so that I was able to get good scores without a heavy hardware setup. My laptop is very old.

One thing that I do not like at all: the heavy score variance. With one and the same model, I get median scores anywhere from 400 to 500, and the mean score also differs a lot from one evaluation to the next. I hope you increase the final evaluation to more than 2048 episodes. Or you could use a more stable distribution for the roles, e.g. 20 runs as Tourist, 20 as Healer, and so on…

jon_grantham · October 1, 2021, 2:36pm

I believe the finals will use 4096 runs.

I really like the stable distribution idea – maybe next year? One of the things I have considered doing is writing a function to normalize our local scores to an even role distribution, but it has never moved to the top of my priority list.
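For anyone who wants to try it, here is a rough sketch of what such a normalization could look like. It assumes you log a (role, score) pair for every local episode. The function name and the input format are made up for illustration and are not part of the starter kit or NLE. It averages within each role first and then averages those per-role means, so each role seen locally counts equally no matter how often it was drawn.

```python
from collections import defaultdict
from statistics import mean


def role_balanced_mean(episodes):
    """Mean score with every observed role weighted equally.

    `episodes` is an iterable of (role, score) pairs from local runs.
    This is a hypothetical helper, not an official scoring function.
    """
    by_role = defaultdict(list)
    for role, score in episodes:
        by_role[role].append(score)
    if not by_role:
        raise ValueError("no episodes given")
    # Average within each role, then average the per-role means.
    return mean(mean(scores) for scores in by_role.values())


# Toy data: Tourist is over-represented relative to Healer.
episodes = [("Tourist", 100), ("Tourist", 200), ("Healer", 600)]
raw = mean(score for _, score in episodes)
balanced = role_balanced_mean(episodes)
```

Note that this only balances roles that actually appear in the local runs, and it works on the mean rather than the median, so it will not match the leaderboard numbers exactly.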