May I ask how the “reward” on the leaderboard is computed? The numbers 152.0, 160.0, 158.0 do not look like averaged values. After checking the reward hierarchy of the ObtainDiamond task, I found that these numbers are not possible reward values either. The reward should be an odd number!
Also, it looks like it is the “Dense” environment, because with the evaluate_locally.sh script we got a reward for every crafted item, and after replacing “ObtainDiamond” with “ObtainDiamondDense” we got a reward only once per item.
Hi,
- We are using the “ObtainDiamond” environment.
- The reward currently displayed is the sum of the rewards over all episodes, which is wrong. It will be updated to the average reward shortly.
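To make the difference between the two aggregations concrete, here is a minimal sketch. The per-episode rewards are made-up illustrative numbers, not actual leaderboard data.

```python
from statistics import mean

# Hypothetical per-episode rewards for one submission (illustrative only).
episode_rewards = [0.0, 2.0, 4.0, 2.0]

# What the leaderboard is said to display currently: the sum over all episodes.
total_reward = sum(episode_rewards)

# What it is meant to display: the average reward per episode.
average_reward = mean(episode_rewards)

print(f"sum over episodes: {total_reward}")
print(f"average per episode: {average_reward}")
```

The sum grows with the number of evaluation episodes, so it can look implausibly large next to the per-episode reward hierarchy, while the average stays on the per-episode scale.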
As @ermekaitygulov commented, when I ran the ObtainDiamond env on my local machine, the agent was rewarded every time it obtained an item in the hierarchy.
Besides, I found a bug in the environment (ObtainDiamond). I controlled the agent manually by giving the action at each step. When I successfully break a log block, I receive a reward of 2, but there is nothing in the inventory, and from the POV it seems the agent has not collected the log.
obs, rew, done, info = env.step(action)
print(rew)
print(obs['inventory'])
# print output
# 2.0
# {'coal': 0, 'cobblestone': 0, 'crafting_table': 0, 'dirt': 0,
# 'furnace': 0, 'iron_axe': 0, 'iron_ingot': 0, 'iron_ore': 0,
# 'iron_pickaxe': 0, 'log': 0, 'planks': 0, 'stick': 0, 'stone': 0,
# 'stone_axe': 0, 'stone_pickaxe': 0, 'torch': 0, 'wooden_axe': 0,
# 'wooden_pickaxe': 0}
I found that the change in inventory arrives 1 frame later than the reward. This lag applies to “crafting” actions too.
Also, the reward values are not consistent with the docs. Mining a log block gives a reward of 2 instead of 1. This also happens during official evaluation. The rewards of each episode are always even numbers.
There are two ObtainDiamond environments. The competition is evaluated on MineRLObtainDiamond-v0, where rewards are sparse and one time only.
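As a rough sketch of what “sparse and one time only” means compared with a dense variant that rewards every pickup: the reward table below is illustrative (this thread only confirms that the docs list 1 for a log; the value for planks is an assumption), and `sparse_rewards` / `dense_rewards` are hypothetical helpers, not part of MineRL.

```python
from typing import Dict, Iterable, List

# Illustrative reward table. log = 1 is what the docs list according to this
# thread; planks = 2 is an assumption made for the example.
ITEM_REWARD: Dict[str, float] = {"log": 1.0, "planks": 2.0}


def sparse_rewards(events: Iterable[str]) -> List[float]:
    """Reward each item only the first time it is obtained."""
    seen = set()
    rewards = []
    for item in events:
        if item in seen:
            rewards.append(0.0)  # already rewarded once in this episode
        else:
            seen.add(item)
            rewards.append(ITEM_REWARD.get(item, 0.0))
    return rewards


def dense_rewards(events: Iterable[str]) -> List[float]:
    """Reward an item every time it is obtained."""
    return [ITEM_REWARD.get(item, 0.0) for item in events]


# Hypothetical sequence of items obtained during one episode.
events = ["log", "log", "planks", "log"]
sparse = sparse_rewards(events)
dense = dense_rewards(events)
print("sparse:", sparse, "total:", sum(sparse))
print("dense: ", dense, "total:", sum(dense))
```

If you see a reward for every repeated log locally, that behaviour matches the dense sketch rather than the one-time-only one.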
Also, the delayed reward you are noticing is simply a matter of the ordering of information. The flow of states goes: observation (no block), action (move-fwd), reward (2, for obtaining the block). You should see the log present in the next observation!
There seems to be an error with evaluations currently, so if you are using the live scoreboard, those rewards are twice as large as they should be. We will be fixing this soon!
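If you want to match each reward to the item that caused it, one option is to compare the inventory from the step where the reward arrived with the inventory from the following step. The sketch below works on a recorded list of (inventory, reward) pairs, so it runs without MineRL. The trajectory is made up, and `inventory_gain` and `align_rewards` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Inventory = Dict[str, int]


def inventory_gain(before: Inventory, after: Inventory) -> Inventory:
    """Return the items whose count increased between two inventory observations."""
    return {
        item: count - before.get(item, 0)
        for item, count in after.items()
        if count > before.get(item, 0)
    }


def align_rewards(steps: List[Tuple[Inventory, float]]) -> List[Tuple[int, float, Inventory]]:
    """Pair each non-zero reward at step t with the inventory gain seen at step t + 1.

    steps[t] holds (obs['inventory'], rew) as returned by env.step at step t.
    """
    aligned = []
    for t in range(len(steps) - 1):
        inventory, reward = steps[t]
        if reward != 0:
            next_inventory = steps[t + 1][0]
            aligned.append((t, reward, inventory_gain(inventory, next_inventory)))
    return aligned


# Made-up trajectory reproducing the ordering described above.
trajectory = [
    ({"log": 0}, 0.0),
    ({"log": 0}, 2.0),  # reward arrives, inventory still shows no log
    ({"log": 1}, 0.0),  # the log appears in the next observation
]
result = align_rewards(trajectory)
print(result)
```

A reward on the very last step has no following observation to compare against, so this sketch leaves it unpaired.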
Thanks for your clarification.
May I ask another question about the “nearbySmelt” action? I am new to Minecraft and I wonder how one can make “coal” using a furnace. The closest answer I found is to burn “log” to make “charcoal”, but when I manually control the agent and feed the “nearbySmelt=2” action (I have all the required resources: logs, and planks as fuel), nothing happens.
Have you fixed the error? There are entries with rewards on different scales (1.72, 30, 158) on the leaderboard.
We will make an announcement and re-run submissions once we can confirm that extraneous rewards are not present in the ObtainDiamond-v0 environment.