MineRL self._actions


I’m currently trying to tweak the code in RL_plus_script.py from the RL baseline code.

I’m totally new to Minecraft and have only a little knowledge of RL.

Could you please give me any tips for getting past where I’m stuck?

I trained the RL script on my local laptop (a DELL XPS-15-7590 with a GeForce GPU), and it shows a mean episode reward of about 2. Following the guideline, I tried to set each component of self._actions as a concatenation of single actions.

e.g. self._actions = [(‘forward’, 1), (‘attack’, 1), (‘jump’, 1), (‘camera’ : …), etc.]

Are there any tips for this? I am really shocked by the results on the leaderboard.

The docstring of the class ActionShaping() should be enough to figure out how to adjust the actions for the RL part of the algo. What changes do you want to make and what have you tried?
Maybe playing Minecraft for a bit or watching a YouTube guide would help with Minecraft knowledge?

Hello karolisram,

For now I have tried training on the ‘Treechop-v0’ env, and my self._actions were as below.

    _s = ('sprint', 1)
    _j = ('jump', 1)
    _a = ('attack', 1)
    _c_up = ('camera', [-random.randint(0, camera_angle*2), 0])
    _c_lr = ('camera', [0, -random.randint(-camera_angle, camera_angle)])
    _f = ('forward', 1)
    _c_reset = ('camera', [0, 0])
    # TODO: I need to change this part properly
    self._actions = [ 
        [_c_reset, _f, _j, _a, _a, _a, _a, _a], 
        [_a for _ in range(5)], 
        [_c_lr, _a, _a, _a, _a, _a], 
        [_c_up,_c_lr, _a, _a, _a, _a, _a], 
        [_f, _j, _a, _a, _a, _a, _a], 
        [_c_reset, _c_up, _c_lr, _a, _a, _a, _a, _a],
    ]

My objective is to show good performance on ‘MineRLObtainDiamond-v0’

I have also watched YouTube videos, played the game myself, and used interactive mode to see what my agent is doing. But I still don’t get it.

Ah I see the issue now. I think the confusion comes from line 121 in RL_plus_script.py:
[('forward', 1), ('jump', 1)]
This line doesn’t mean two actions, with forward on the first tick and then jump on the next tick. Instead, it means that the forward and jump keys are both pressed for a single tick.
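To make the consequence concrete, here is a minimal sketch in plain Python. It assumes each list of (key, value) pairs is merged into one no-op action dict; `noop()` and `combine()` are hypothetical stand-ins written for this post, not functions from the baseline, and the camera is a plain list rather than a NumPy array. Under that assumption, listing ('attack', 1) five times yields exactly the same single-tick action as listing it once.

```python
from collections import OrderedDict


def noop():
    # Hypothetical stand-in for env.action_space.noop(); no keys pressed.
    return OrderedDict([
        ('attack', 0), ('back', 0), ('camera', [0.0, 0.0]), ('forward', 0),
        ('jump', 0), ('left', 0), ('right', 0), ('sneak', 0), ('sprint', 0),
    ])


def combine(pairs):
    """Merge (key, value) pairs into one action dict for a single tick."""
    act = noop()
    for key, value in pairs:
        act[key] = value  # a repeated key just overwrites the earlier value
    return act


one_attack = combine([('attack', 1)])
five_attacks = combine([('attack', 1) for _ in range(5)])
forward_jump = combine([('forward', 1), ('jump', 1)])

print(one_attack == five_attacks)                      # True
print(forward_jump['forward'], forward_jump['jump'])   # 1 1
```

The same overwriting applies to repeated 'camera' entries in one list: only the last value set for the key survives.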

You can see that by printing out act = env.action_space.noop():

OrderedDict([('attack', 0),
             ('back', 0),
             ('camera', array([0., 0.], dtype=float32)),
             ('forward', 0),
             ('jump', 0),
             ('left', 0),
             ('right', 0),
             ('sneak', 0),
             ('sprint', 0)])

This is a single action that does nothing, because none of the keys are pressed. If you then do:

act['forward'] = 1
act['jump'] = 1

act will become an action with those two buttons pressed. This is what the ActionShaping() wrapper does. To create meta actions that perform 5 attacks and such you will need to do something else. Maybe frame skipping would be an easier way to achieve that?
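As a rough illustration of the frame skipping idea, the sketch below repeats whatever action the agent picks for several ticks and sums the rewards. It assumes the classic gym step signature that returns (obs, reward, done, info). The names `FrameSkip` and `CountingEnv` are hypothetical and written for this post; `CountingEnv` is only a toy environment so the example runs without MineRL installed.

```python
class FrameSkip:
    """Repeat each chosen action for `skip` ticks and sum the rewards."""

    def __init__(self, env, skip=5):
        self.env = env
        self.skip = skip  # assumed to be at least 1

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:  # stop early when the episode ends mid-skip
                break
        return obs, total_reward, done, info


class CountingEnv:
    """Hypothetical toy environment, used only to show the wrapper at work."""

    def __init__(self, max_ticks=12):
        self.max_ticks = max_ticks
        self.ticks = 0

    def reset(self):
        self.ticks = 0
        return self.ticks

    def step(self, action):
        self.ticks += 1
        reward = 1.0 if action == 'attack' else 0.0
        done = self.ticks >= self.max_ticks
        return self.ticks, reward, done, {}


env = FrameSkip(CountingEnv(max_ticks=12), skip=5)
env.reset()
obs, reward, done, info = env.step('attack')
print(obs, reward, done)  # 5 5.0 False
```

With a wrapper like this around the real environment, a single [('attack', 1)] entry in self._actions would be held for 5 ticks each time the agent selects it, which is close to the "five attacks" meta action without changing the action list format.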