I agree with some of your points and have shared them with the NetHack team. We’ll discuss them and may increase the time if they agree. Thanks for the feedback.
I would like to see continued improvement in returning log information to submitters. We’re seeing more than we were before a recent bugfix, but here’s the log section from Panic submission 46
This challenge is super awesome! Thank you so much! I love it!
One thing for improvement: I would have liked to see role, race, gender and alignment in the Observation (blstats).
I found it very nice that you provided the starter kit, so I was able to get good scores without a heavy hardware setup. My laptop is very old.
One thing that I do not like at all: the heavy score variance. With the same model I get median scores anywhere from 400 to 500, and the mean score also differs a lot between evaluations. I hope you increase the final evaluation to more than 2048 episodes. Alternatively, you could use a more stable distribution for the roles, e.g. 20 times Tourist, 20 times Healer etc. or so…
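To make the fixed-distribution idea concrete, here is a minimal sketch of a balanced role schedule in which every role is played the same number of times. The names `ROLES` and `balanced_role_schedule` are made up for illustration, and the sketch does not cover how a role would actually be forced in the evaluator; it only builds the list of roles to play.

```python
import random
from collections import Counter

# The 13 standard NetHack roles.
ROLES = [
    "Archeologist", "Barbarian", "Caveman", "Healer", "Knight",
    "Monk", "Priest", "Rogue", "Ranger", "Samurai",
    "Tourist", "Valkyrie", "Wizard",
]


def balanced_role_schedule(episodes_per_role, seed=0):
    """Return a shuffled list in which every role appears exactly
    episodes_per_role times (hypothetical helper, not part of the challenge)."""
    schedule = [role for role in ROLES for _ in range(episodes_per_role)]
    # A seeded generator keeps the episode order reproducible between runs.
    random.Random(seed).shuffle(schedule)
    return schedule


schedule = balanced_role_schedule(20)
counts = Counter(schedule)
print(len(schedule), counts["Tourist"], counts["Healer"])
```

With 20 episodes per role this gives 260 episodes in total, so the role mix no longer adds to the variance between evaluations.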
I really like the stable distribution idea – maybe next year? One of the things I have considered doing is writing a function to normalize our local scores to an even role distribution, but it has never moved to the top of my priority list.
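A rough sketch of what such a normalization function could look like, assuming local results are available as `(role, score)` pairs (that format and the name `role_balanced_mean` are assumptions for illustration, not anything from the starter kit). It averages the scores within each role first and then averages those per-role means, so every role that was seen counts equally regardless of how often it was sampled.

```python
from collections import defaultdict


def role_balanced_mean(episodes):
    """Mean score with every observed role weighted equally.

    episodes: iterable of (role, score) pairs (assumed input format).
    Roles that never appear in the input are simply left out.
    """
    by_role = defaultdict(list)
    for role, score in episodes:
        by_role[role].append(score)
    if not by_role:
        raise ValueError("no episodes given")
    # Average within each role, then average across roles.
    per_role_means = [sum(s) / len(s) for s in by_role.values()]
    return sum(per_role_means) / len(per_role_means)


# Made-up example: Tourist was sampled twice, Healer once.
episodes = [("Tourist", 100), ("Tourist", 200), ("Healer", 400)]
raw_mean = sum(score for _, score in episodes) / len(episodes)
balanced = role_balanced_mean(episodes)
print(raw_mean, balanced)
```

In the made-up example the raw mean is pulled toward the over-sampled Tourist runs, while the balanced mean treats both roles equally. The same idea would work with per-role medians, but combining medians across roles is a different statistic from the median over all episodes.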