Issues regarding observing agent behavior and Trainer/Policy interfaces

Hi, thanks a lot for organizing this competition. I am really excited to get started on trying to solve it!

We have some issues at the moment:

  1. Upon calling `gym.make("SomeMineRLEnv")`, it takes a long time for the game to start at all, making it hard to do initial tests. Would it be possible to compile the env with a certain seed and start training immediately if a compiled version is available?

  2. We are only able to observe the behavior of the agent through a tiny 64x64 pixel window. This is useful for seeing what the agent can actually see, but we see two issues:
    i. the text, hearts, and inventory are taking up a disproportionately large share of the visual field in the 64x64 window.
    ii. It is very hard for us to actually see anything in that tiny window. On that note, would it be possible to give each env a `.render()` method that renders the complete POV of the agent instead of the downsampled one? Or could it render an upsampled version of the downsampled 64x64 image? Either would facilitate our work a lot, and I think others would profit too.

  3. Are the interfaces for the training and the policy already defined? It would be very useful to know how our system will be trained and evaluated concretely so that we design our software architecture around this.


Hi, welcome to the forum!

> Upon calling `gym.make("SomeMineRLEnv")`, it takes a long time for the game to start at all, making it hard to do initial tests. Would it be possible to compile the env with a certain seed and start training immediately if a compiled version is available?

We are working on this feature. Most of the time is spent launching Malmo and Forge, so it is possible to leave the Minecraft window open between runs; the environment just needs to be re-made in between runs, which incurs a small delay. @william_guss can speak more to this.

> We are only able to observe the behavior of the agent through a tiny 64x64 pixel window. This is useful for seeing what the agent can actually see, but we see two issues:
> i. the text, hearts, and inventory are taking up a disproportionately large share of the visual field in the 64x64 window.
> ii. It is very hard for us to actually see anything in that tiny window. On that note, would it be possible to give each env a `.render()` method that renders the complete POV of the agent instead of the downsampled one? Or could it render an upsampled version of the downsampled 64x64 image? Either would facilitate our work a lot, and I think others would profit too.

I believe re-sizing the default window should be possible, but we are resource-limited currently. There is a GitHub issue related to this here, with a wrapper to fix it.
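In the meantime, something along these lines may work as a stopgap for viewing: a small wrapper that keeps the last 64x64 POV frame and upscales it with nearest-neighbour interpolation. This is just a sketch, not part of the MineRL API — the class and function names here (`UpscaledRenderWrapper`, `upscale_pov`) are hypothetical, and it assumes the observation dict contains a `"pov"` array as in the MineRL envs:

```python
import numpy as np

def upscale_pov(pov: np.ndarray, factor: int = 8) -> np.ndarray:
    """Nearest-neighbour upscale of an HxWxC frame, e.g. 64x64 -> 512x512."""
    return np.repeat(np.repeat(pov, factor, axis=0), factor, axis=1)

class UpscaledRenderWrapper:
    """Hypothetical sketch: delegates reset/step to the wrapped env,
    remembers the last POV frame, and returns an upscaled copy on render()."""

    def __init__(self, env, factor: int = 8):
        self.env = env
        self.factor = factor
        self._last_pov = None

    def reset(self):
        obs = self.env.reset()
        self._last_pov = obs["pov"]
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._last_pov = obs["pov"]
        return obs, reward, done, info

    def render(self):
        # Returns the enlarged POV for human viewing; the agent still
        # sees the original 64x64 observation.
        if self._last_pov is None:
            return None
        return upscale_pov(self._last_pov, self.factor)
```

Since the enlarged image is only for human eyes, nearest-neighbour keeps the pixels crisp rather than blurring them the way bilinear interpolation would.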

> Are the interfaces for the training and the policy already defined? It would be very useful to know how our system will be trained and evaluated concretely so that we design our software architecture around this.

The interface will be released with the AIcrowd submission framework - apologies for any delay. Perhaps @mohanty could point you to a representative demonstration.

> i. the text, hearts, and inventory are taking up a disproportionately large share of the visual field in the 64x64 window.

Click on the Minecraft window and press F1; this should hide the hearts and hunger bars. But you will have to do this every time you run the test.