Meta-information in human demonstration samples



I have a question regarding the samples from the human demonstrations:
When using
data = minerl.data.make('MineRLObtainDiamond-v0')
one can only get obs, rew, done, and act, but no information about which demonstration a particular sample belongs to.

Is there a way to know the meta-information of the demonstration from which the sample was taken? (i.e. quality of the demonstration, file name, etc…).

If not, would it be possible to provide this functionality? It would be very useful.

Thanks for all the effort in setting up this challenge. We are very excited :slight_smile:


Shooting this to @BrandonHoughton

Thanks again for being so patient with us :slight_smile:


Yes, this is a great idea. Currently the data is organized by separating each sample into a folder with three files: the compressed video, the numpy actions and observations, and a JSON file containing some metadata. I’ll make a GitHub issue to track this feature. We will add it when we get the chance, and we welcome pull requests for it!


Thanks for the quick answer :slight_smile:

I have another question regarding the inventory data in the human demonstrations:

In the environment ‘MineRLObtainDiamond-v0’, the agent begins without any items. However, in the file


when loaded with
example_inv = np.load(file, allow_pickle=True, mmap_mode='r')

the first entry in the ‘observation_inventory’ is

> [ 0 1 0 0 0 0 0 0 13 0 0 3 0 1]

meaning that the agent already has a crafting table, 13 planks, 3 stone pickaxes and 1 wooden pickaxe.

Am I loading the data in a wrong way?
Am I making any mistake in the interpretation of the inventory?

The same happens in several (41) demonstrations for ‘MineRLObtainDiamond-v0’, e.g.:

[11  0  3  0  0  0  1  0  0 14  3  3  0  0]
[ 2  0 14  0  1  0  3  0 17  2  1  0  0  1]
[ 1  0  1  0 11  0  3  7  2  0  0  1  4  1]
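A check like the one described above could be sketched as follows. This is only a minimal illustration, assuming each demonstration's inventory is available as a 2-D array of shape (time-steps, items); the function name and the demonstration labels are hypothetical, and only the first row is taken from this thread.

```python
import numpy as np

def starts_with_items(inventory) -> bool:
    """Return True if the first time-step of an inventory sequence is non-empty."""
    inventory = np.asarray(inventory)
    return bool(np.any(inventory[0] != 0))

# Hypothetical labels; "demo_a" uses a first row quoted in this thread,
# "demo_b" is an all-zero row added for contrast.
sequences = {
    "demo_a": np.array([[11, 0, 3, 0, 0, 0, 1, 0, 0, 14, 3, 3, 0, 0]]),
    "demo_b": np.zeros((1, 14), dtype=int),
}

# Collect the demonstrations whose agent does not start empty-handed.
flagged = [name for name, inv in sequences.items() if starts_with_items(inv)]
print(flagged)
```

Running this over every demonstration gives the list of affected recordings, which may be easier to attach to an issue than individual arrays.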

Thanks for the support


Thanks for bringing that to our attention. Does this happen with the next time-step as well? It could be left over from their last attempt. I will create an issue for this as well.


Tracking the new information at #40 and the issue with agents starting with a non-zero inventory at #41.


Yes, it happens in the next time-steps as well.
For example, in this demonstration:
it looks like the sequence might be shifted. Also, the transitions 2-3 and 3-4 look weird to me:

(1) The inventory starts like this:
[ 1 0 1 0 11 0 3 7 2 0 0 1 4 1]
(2) at time-step 4025 it changes to this:
[ 1 1 1 0 11 0 2 7 2 0 0 1 4 1]
(3) in 4040 to
[ 0 0 0 0 0 0 0 19 0 0 0 0 0 0]
(4) and finally, in time-step 4058 to
[0 0 0 0 0 0 0 0 0 0 0 0 0 0]

The total length of the sequence is 21471.
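The time-steps at which the inventory changes can be located by comparing consecutive rows. The following is a minimal sketch, assuming the inventory is a 2-D array of shape (time-steps, items); the function name and the toy sequence are illustrative, not taken from the dataset.

```python
import numpy as np

def inventory_change_steps(inventory) -> np.ndarray:
    """Return the time-step indices at which the inventory differs from the previous step."""
    inventory = np.asarray(inventory)
    # Compare every row with its predecessor; a step counts as a change
    # if at least one item count differs.
    changed = np.any(inventory[1:] != inventory[:-1], axis=1)
    # Shift by one so the index refers to the step *after* the change.
    return np.flatnonzero(changed) + 1

# Toy sequence (not real data): the inventory changes at steps 2 and 3.
toy = np.array([
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 0],
    [0, 0],
])
steps = inventory_change_steps(toy)
print(steps.tolist())
```

Printing the rows just before and after each returned index makes jumps such as a sudden reset to an all-zero inventory easy to spot.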

I hope you find this information helpful :slight_smile:


We also found some inconsistencies with the reward.
Among the demonstrations marked as success=true, only three have a cumulative reward > 0 (overall, only 5 demonstrations have any reward at all).
Furthermore, for those that do have a reward, the values look strange.
As I understand it, a cumulative reward of 1025, composed of one log and one diamond, should not be possible.
Is this related to the inventory issue?

player_stream_agonizing_kale_tree_nymph-20_289-7919: reward = 1025
player_stream_agonizing_kale_tree_nymph-20_16180-39256: reward = 16
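One way to inspect such a stream is to list every non-zero reward together with its time-step and the cumulative total. This is a minimal sketch, assuming the per-step rewards are available as a 1-D array; the function name and the toy reward sequence are illustrative only.

```python
import numpy as np

def reward_events(rewards):
    """Return ([(time_step, reward), ...] for non-zero rewards, cumulative reward)."""
    rewards = np.asarray(rewards, dtype=float)
    steps = np.flatnonzero(rewards)
    events = [(int(t), float(rewards[t])) for t in steps]
    return events, float(rewards.sum())

# Toy stream (not real data): one reward of 1 and one of 1024,
# mirroring the "one log and one diamond" pattern described above.
toy = np.zeros(10)
toy[2] = 1.0
toy[7] = 1024.0

events, total = reward_events(toy)
print(events, total)
```

A stream whose events skip all intermediate milestones, as in the toy example, is a candidate for the kind of inconsistency reported here.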

Thanks for looking into this :slight_smile:


I noticed you are working with MineRLObtainDiamond-v0. If you try the same exploration with MineRLObtainDiamondDense-v0, does it have reasonable rewards? The second plot you shared here should not be possible in a single stream of MineRLObtainDiamond-v0.


Yes, we are working with the sparse environment, and most reward sequences look implausible or are zero for the whole sequence.
Here’s the same plot for the dense demonstration, which looks reasonable (please ignore the filepath in the previous plot; that was a small mistake).


In addition to this query: the accompanying paper mentions that the meta-data contains detailed labels:

“Additionally, trajectory meta-data includes timestamped markers for hierarchical labelings; e.g. when a house-like structure is built or certain objectives such as chopping down a tree are met.”

However, the data available only includes meta-data with the following attributes:

{"success": true, "duration_ms": 152535, "duration_steps": 3051, "total_reward": 64.0}

Is this correct, or is there another location where more detailed labels are stored for each data item?
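For reference, the metadata record quoted above can be parsed with the standard json module, and the two duration fields give the time per step. This is only a sketch that parses the record as a string; how the metadata file is named and located on disk is not shown here.

```python
import json

# The metadata record quoted in this post, written with plain ASCII quotes.
metadata_text = (
    '{"success": true, "duration_ms": 152535, '
    '"duration_steps": 3051, "total_reward": 64.0}'
)
meta = json.loads(metadata_text)

# Average wall-clock time per recorded step and the implied step rate.
ms_per_step = meta["duration_ms"] / meta["duration_steps"]
steps_per_second = 1000.0 / ms_per_step

print(sorted(meta.keys()))
print(round(ms_per_step, 2), round(steps_per_second, 2))
```

The result is close to 50 ms per step, i.e. roughly 20 steps per second for this record.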

Thanks in advance



We have been using the rewards to mark when those sub-goals are accomplished, so currently there is no such auxiliary location for this demarcation. This is a good idea for improving the competition dataset, however. I will look into adding this alongside any pending fixes!


Hi there! When looking into the beta release of the dataset we found some issues.

  • Could you clarify whether 15 minutes should be the maximum duration of demonstrations?
    For example, ‘MineRLObtainDiamond-v0/r2g1all_orange_djinn-4_193-66220’ is 54 minutes long with 66k frames.
  • Also, in this demonstration there seem to be synchronisation issues between actions and inventory state (see plot).
  • The “place” action is not recorded at all in some samples (in addition to the already known craft, nearbyCraft and nearbySmelt).
  • We also noticed that the interaction distance is very high, something like 6 or 7 blocks away.
    It’s often very hard to even see the crafting table when it’s being used because it’s so far away.
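The duration point in the list above can be checked from frame counts alone. This is a rough sketch, assuming a rate of 20 steps per second (an assumption, not a documented dataset property) and treating the 15-minute limit as the value asked about above; the function names are hypothetical.

```python
def duration_minutes(num_steps: int, steps_per_second: float = 20.0) -> float:
    """Convert a number of recorded steps to minutes (step rate is an assumption)."""
    return num_steps / steps_per_second / 60.0

def exceeds_limit(num_steps: int, limit_minutes: float = 15.0,
                  steps_per_second: float = 20.0) -> bool:
    """Return True if a demonstration runs longer than the given limit."""
    return duration_minutes(num_steps, steps_per_second) > limit_minutes

# Approximate frame count of the demonstration mentioned above ("66k frames").
long_demo_steps = 66_000
print(duration_minutes(long_demo_steps), exceeds_limit(long_demo_steps))
```

Under this assumption, 66k frames correspond to about 55 minutes, which is in the same range as the 54 minutes reported and well above 15 minutes.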