This challenge is super awesome! Thank you so much! I love it!
One suggestion for improvement: I would have liked to see role, race, gender and alignment in the observation (blstats).
I found it very nice that you provided the starter kit, so that I was able to get good scores without a heavy hardware setup. My laptop is very old.
One thing that I do not like at all: the heavy score variance. With one and the same model, I get median scores anywhere from 400 to 500. The mean score also differs a lot from one evaluation to the next. I hope you increase the final evaluation to more than 2048 episodes. Alternatively, you could use a more stable distribution of roles, e.g. 20 times Tourist, 20 times Healer, and so on.
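To illustrate the fixed-role idea, here is a minimal sketch that compares the run-to-run spread of the mean score when roles are drawn at random versus when every role is played the same number of times. It does not use the real environment: `play_episode`, the per-role means in `ROLE_MEAN`, and the noise model are all made up for illustration, and I am only assuming that roles are currently drawn uniformly at random per episode.

```python
import random
import statistics

# The 13 NetHack roles.
ROLES = ["Archeologist", "Barbarian", "Caveman", "Healer", "Knight", "Monk",
         "Priest", "Rogue", "Ranger", "Samurai", "Tourist", "Valkyrie", "Wizard"]

# Hypothetical per-role mean scores, invented for this sketch only.
ROLE_MEAN = {role: 200 + 60 * i for i, role in enumerate(ROLES)}


def play_episode(role, rng):
    """Stand-in for one game: a noisy score around the role's (made-up) mean."""
    mean = ROLE_MEAN[role]
    return max(0.0, rng.gauss(mean, 0.3 * mean))


def evaluate_random(n_episodes, rng):
    """Mean score when each episode draws its role uniformly at random."""
    scores = [play_episode(rng.choice(ROLES), rng) for _ in range(n_episodes)]
    return statistics.mean(scores)


def evaluate_stratified(per_role, rng):
    """Mean score when every role is played exactly `per_role` times."""
    scores = [play_episode(role, rng) for role in ROLES for _ in range(per_role)]
    return statistics.mean(scores)


def spread(evaluate, arg, n_evals=300, seed=0):
    """Standard deviation of the evaluation result over repeated evaluations."""
    rng = random.Random(seed)
    results = [evaluate(arg, rng) for _ in range(n_evals)]
    return statistics.stdev(results)


if __name__ == "__main__":
    per_role = 20
    n_episodes = per_role * len(ROLES)  # same episode budget for both schemes
    print("random roles:     ", spread(evaluate_random, n_episodes))
    print("stratified roles: ", spread(evaluate_stratified, per_role))
```

In this toy model the stratified evaluation varies less between runs, because the luck of which roles happen to be drawn no longer contributes. It only removes that part of the variance, though. The randomness within each role stays, so more episodes would still help.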