Hardcoding "Elbereth"

paul_gamble3 · July 6, 2021, 6:37pm

I’m thinking about hardcoding “engrave Elbereth with finger” as a single action available to my agent.

Two questions for the organizers:

Is this allowed? I believe it’s mentioned in the original NLE paper as a possibility, but you would expect authors to state that they did it (along with any other hardcoding). How does that apply to this competition? Would hardcoding it as an action available to an otherwise standard deep RL model move such a model into the “not using a neural network” track?
If it is allowed, at what level should it be implemented? Currently I’ve extended the base NLE gym.Env (in nle/env/base.py) to check in step for a specific action id (I manually expanded the discrete action space by 1) and then run the hardcoded steps to engrave, but this feels pretty hacky. I’m guessing the testing environment won’t allow modifications to the nle package itself, so any changes will have to be in the challenge’s env/wrappers.py. Is that correct?

Thanks, and thanks also for the fun competition!

eric_hammy · July 9, 2021, 10:06am

Hey there!

Something like this would be totally allowed by the rules. The best way to do it, though, is NOT to extend the base env, because AIcrowd will always create the NetHackChallenge-v0 env itself.

As you mention, the best thing to do is to write a wrapper around NetHackChallenge. I know it feels hacky, but the simplest solution is a wrapper that overrides the step function, accepts a slightly larger action space, and, when the extra action is taken, steps through [‘E’, ‘-’, ‘E’, ‘l’, ‘b’, ‘e’, ‘r’,…], as you mentioned! That would probably be the solution in the fewest lines of code.
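A minimal sketch of that kind of wrapper, for illustration only. It assumes the classic gym step signature returning (obs, reward, done, info). The names CompositeActionWrapper, RecordingEnv and key_to_action are hypothetical: RecordingEnv is a stand-in so the example runs without nle installed, and a real key-to-action table would have to be built from the actual environment's action list. The key list is only the prefix quoted above; the rest of the sequence is elided in the thread and is not filled in here.

```python
class CompositeActionWrapper:
    """Adds one extra discrete action that replays a fixed sequence of base actions."""

    def __init__(self, env, n_base_actions, macro_actions):
        if not macro_actions:
            raise ValueError("macro_actions must contain at least one action")
        self.env = env
        self.n_base_actions = n_base_actions       # size of the original action space
        self.macro_actions = list(macro_actions)   # base-action indices to replay
        self.macro_id = n_base_actions             # the new action takes the next free id

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Ordinary actions pass straight through to the wrapped env.
        if action != self.macro_id:
            return self.env.step(action)
        # The extra action replays the macro, summing rewards and
        # stopping early if the episode ends part-way through.
        total_reward = 0.0
        for base_action in self.macro_actions:
            obs, reward, done, info = self.env.step(base_action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


class RecordingEnv:
    """Stand-in env (NOT NetHack): records every action it receives."""

    def __init__(self):
        self.taken = []

    def reset(self):
        self.taken = []
        return {}

    def step(self, action):
        self.taken.append(action)
        return {}, 1.0, False, {}


# Key prefix quoted in the thread; the remaining keys are elided there.
keys = ["E", "-", "E", "l", "b", "e", "r"]
# Hypothetical key -> action-index table, invented for this demo only.
key_to_action = {k: i for i, k in enumerate(sorted(set(keys)))}

inner = RecordingEnv()
env = CompositeActionWrapper(
    inner,
    n_base_actions=len(key_to_action),
    macro_actions=[key_to_action[k] for k in keys],
)
env.reset()
obs, reward, done, info = env.step(env.macro_id)
print(inner.taken, reward)
```

Summing the rewards over the replayed steps keeps the composite action looking like a single transition to the agent. Whether that is the right accounting for a given training setup is a design choice, not something the thread specifies.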

The way AIcrowd does evaluations is simple: they run whatever is in run.sh and wait for you to generate rollouts using aicrowd_gym.make('NetHackChallenge-v0'), keeping track of the scores for each env (which they build for you as above). What you subsequently do with the env is up to you, so adding wrappers is totally fine, as long as at its heart you are running on the right gym environment.

If you are looking at doing this in the starter kit, look at submission_config.py and the variable MAKE_ENV_FN. This is the function that rollout.py calls to create the environment, and you can see there is already a wrapper that adds a TimeLimit. As you’ve already noticed, this wrapper is in env/wrappers.py, which is a sensible place to put your own wrappers.
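As a rough sketch of what such an environment-creation function could look like: the helper below builds the env by id and then applies a list of wrappers in order. This is not the starter kit's actual code. build_make_env, fake_make and tag are hypothetical names, and fake_make stands in for aicrowd_gym.make so the example runs on its own.

```python
def build_make_env(make, wrappers=()):
    """Return a zero-argument function that creates the env and applies wrappers in order."""

    def make_env():
        # The evaluator expects the env to be created with this id.
        env = make("NetHackChallenge-v0")
        for wrap in wrappers:
            env = wrap(env)
        return env

    return make_env


def fake_make(env_id):
    # Stand-in for aicrowd_gym.make: returns a plain dict instead of a real env.
    return {"id": env_id, "wrappers": []}


def tag(name):
    # Stand-in for a wrapper class: records that it was applied.
    def wrap(env):
        return {"id": env["id"], "wrappers": env["wrappers"] + [name]}

    return wrap


make_env_demo = build_make_env(fake_make, [tag("TimeLimit"), tag("Elbereth")])
demo_env = make_env_demo()
print(demo_env)
```

In a real submission the same shape would presumably be used with aicrowd_gym.make and actual wrapper classes, with the resulting function assigned to MAKE_ENV_FN. That wiring is an assumption about the starter kit, so check it against the files themselves.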

As for which track you would be entered into if you added this composite action but then did deep RL: I don’t think this would constitute a symbolic bot, and it would still be using a neural network. Hope this helps!

paul_gamble3 · July 9, 2021, 2:27pm

Thanks Eric, very helpful! One quick follow-up question:

In the wrapper, we’ll have to use aicrowd_gym.make(“NetHackScore-v0”) as you said, but my understanding is that by default that env enters ‘More’ whenever it’s prompted by a menu.

At testing time, will we be able to pass the argument allow_all_modes=“True” to gym.make in our wrapper? Is there a better/preferred way of enabling agent interaction with menus?

Thanks again!

dipam · July 12, 2021, 7:07am

Hi @paul_gamble3, currently the settings for gym.make cannot be changed. As Eric suggested, you need code outside the environment, in a wrapper, to handle the specific situations you care about.
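One possible shape for that kind of wrapper, as a sketch under assumptions: after each step it checks a caller-supplied predicate on the observation and, while the predicate holds, sends a fixed follow-up action. PromptHandlingWrapper, FakePromptEnv, is_prompt and response_action are hypothetical names. How to detect a prompt in real NetHackChallenge observations is not covered in this thread, so the predicate is left to the caller.

```python
class PromptHandlingWrapper:
    """After each step, answers detected prompts with a fixed action."""

    def __init__(self, env, is_prompt, response_action, max_responses=10):
        self.env = env
        self.is_prompt = is_prompt              # callable: obs -> bool (caller-supplied)
        self.response_action = response_action  # action index sent while a prompt is showing
        self.max_responses = max_responses      # guard against answering forever

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, total_reward, done, info = self.env.step(action)
        responses = 0
        while not done and self.is_prompt(obs) and responses < self.max_responses:
            obs, reward, done, info = self.env.step(self.response_action)
            total_reward += reward
            responses += 1
        return obs, total_reward, done, info


class FakePromptEnv:
    """Stand-in env (NOT NetHack): action 0 opens a prompt, action 9 closes it."""

    def __init__(self):
        self.log = []
        self.prompt_open = False

    def reset(self):
        self.log, self.prompt_open = [], False
        return {"prompt": False}

    def step(self, action):
        self.log.append(action)
        if action == 0:
            self.prompt_open = True
        elif action == 9:
            self.prompt_open = False
        return {"prompt": self.prompt_open}, 0.0, False, {}


fake = FakePromptEnv()
wrapped = PromptHandlingWrapper(fake, is_prompt=lambda o: o["prompt"], response_action=9)
wrapped.reset()
last_obs, _, _, _ = wrapped.step(0)
print(fake.log, last_obs)
```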