Meta-information in human demonstration samples

GrizzlyRL2019 · June 11, 2019, 1:45pm

Hi,

I have a question regarding the samples from the human demonstrations:
When using
data = minerl.data.make('MineRLObtainDiamond-v0')
one can only get obs, rew, done, act, but no information about which demonstration a particular sample belongs to.

Is there a way to know the meta-information of the demonstration from which the sample was taken? (i.e. quality of the demonstration, file name, etc…).

If not, would it be possible to provide this functionality? It would be very useful.

Thanks for all the effort in setting up this challenge. We are very excited.

william_guss · June 11, 2019, 3:48pm

Shooting this to @BrandonHoughton

Thanks again for being so patient with us

BrandonHoughton · June 11, 2019, 4:40pm

Yes, this is a great idea. Currently the data is organized by separating each sample into a folder with three files: the compressed video, the numpy actions and observations, and a JSON file containing some metadata. I’ll make a GitHub issue to track this feature. We will add it when we get the chance, and we welcome pull requests for it!
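Until such a feature exists, the per-demonstration folders described above can be read directly. The following is a minimal sketch, not part of the minerl package: `iter_demo_metadata` is a hypothetical helper, and the file name `metadata.json` is an assumption (the post only says each folder holds "a json containing some metadata").

```python
import json
from pathlib import Path


def iter_demo_metadata(env_dir, metadata_name="metadata.json"):
    """Yield (demonstration folder name, metadata dict) pairs.

    Assumes one sub-folder per demonstration, each containing a JSON
    metadata file. The default file name is an assumption.
    """
    for demo_dir in sorted(Path(env_dir).iterdir()):
        meta_path = demo_dir / metadata_name
        # Skip entries that do not contain the metadata file.
        if meta_path.is_file():
            with open(meta_path) as f:
                yield demo_dir.name, json.load(f)
```

Iterating over `iter_demo_metadata("MineRLObtainDiamond-v0")` would then give the folder name next to its metadata, which is enough to filter demonstrations before loading their samples.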

GrizzlyRL2019 · June 11, 2019, 4:54pm

Thanks for the quick answer

I have another question regarding the inventory data in the human demonstrations:

In the environment ‘MineRLObtainDiamond-v0’, the agent begins without any items. However, in the file

MineRLObtainDiamond-v0/player_stream_agonizing_kale_tree_nymph-4_4131-16401/rendered.npz

when loaded with
example_inv = np.load(file, allow_pickle=True, mmap_mode='r')

the first entry in the ‘observation_inventory’ is

example_inv['observation_inventory'][0]
> [ 0 1 0 0 0 0 0 0 13 0 0 3 0 1]

meaning that the agent already has a crafting table, 13 planks, 3 stone pickaxes and 1 wooden pickaxe.

Am I loading the data incorrectly?
Am I misinterpreting the inventory?

The same happens in several (41) demonstrations for ‘MineRLObtainDiamond-v0’, e.g.:

MineRLObtainDiamond-v0/player_stream_equal_olive_chimera-9_13991-14521
[11  0  3  0  0  0  1  0  0 14  3  3  0  0]
MineRLObtainDiamond-v0/player_stream_other_pomegranite_orc-12_29519-31578
[ 2  0 14  0  1  0  3  0 17  2  1  0  0  1]
MineRLObtainDiamond-v0/player_stream_wary_salsa_werewolf-4_20915-42385
[ 1  0  1  0 11  0  3  7  2  0  0  1  4  1]
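A check like the one described above can be sketched as follows. It assumes every demonstration folder contains a `rendered.npz` with an `observation_inventory` array, as in the example path earlier in this post; `demos_with_nonzero_start` is a hypothetical helper name.

```python
from pathlib import Path

import numpy as np


def demos_with_nonzero_start(env_dir, key="observation_inventory"):
    """Return {folder name: first inventory row} for demonstrations
    whose inventory is not all zeros at time-step 0."""
    flagged = {}
    for npz_path in sorted(Path(env_dir).glob("*/rendered.npz")):
        # allow_pickle=True mirrors the np.load call used in the post.
        with np.load(npz_path, allow_pickle=True) as data:
            first = np.asarray(data[key][0])
        if np.any(first != 0):
            flagged[npz_path.parent.name] = first
    return flagged
```

Calling `len(demos_with_nonzero_start("MineRLObtainDiamond-v0"))` gives the number of affected demonstrations for that environment.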

Thanks for the support

BrandonHoughton · June 11, 2019, 5:28pm

Thanks for bringing that to our attention - does this happen with the next time-step as well? It could be left over from their last attempt. I will create an issue for this as well.

BrandonHoughton · June 11, 2019, 5:34pm

Tracking the request for meta-information at #40 and the issue of agents starting with a non-zero inventory at #41.

GrizzlyRL2019 · June 12, 2019, 7:14am

Yes, it happens in the next time-steps as well.
For example, in this demonstration:
MineRLObtainDiamond-v0/player_stream_wary_salsa_werewolf-4_20915-42385
it looks like the sequence might be shifted. Also, the transitions 2-3 and 3-4 look weird to me:

(1) The inventory starts like this:
[ 1 0 1 0 11 0 3 7 2 0 0 1 4 1]
(2) At time-step 4025 it changes to this:
[ 1 1 1 0 11 0 2 7 2 0 0 1 4 1]
(3) At time-step 4040 to this:
[ 0 0 0 0 0 0 0 19 0 0 0 0 0 0]
(4) And finally, at time-step 4058, to this:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0]

The total length of the sequence is 21471.
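Transitions like these can be located automatically by comparing each inventory row with the previous one. This is a minimal sketch, assuming the inventory is a 2-D array of shape (time-steps, items) as loaded from `observation_inventory`; `inventory_change_points` is a hypothetical helper name.

```python
import numpy as np


def inventory_change_points(inventory):
    """Return the time-steps t at which inventory[t] differs from inventory[t - 1]."""
    inv = np.asarray(inventory)
    # Compare every row with its predecessor; a step counts as a change
    # if at least one item count differs.
    changed = np.any(inv[1:] != inv[:-1], axis=1)
    # Index 0 of `changed` corresponds to time-step 1, hence the offset.
    return np.flatnonzero(changed) + 1
```

Printing the rows at the returned time-steps shows how the inventory evolves without scanning all 21471 entries by hand.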

I hope you find this information helpful

GrizzlyRL2019 · June 12, 2019, 8:07am

We also found some inconsistencies with the reward.
When looking at demonstrations marked as success=true, only three have a cumulative reward > 0 (overall, only 5 demonstrations have any reward at all).
Furthermore, for those that do have a reward, the values look strange.
A cumulative reward of 1025, composed of one log and one diamond, should not be possible as far as I understand.
Is this related to the inventory issue?

Examples:
player_stream_agonizing_kale_tree_nymph-20_289-7919: reward = 1025
player_stream_agonizing_kale_tree_nymph-20_16180-39256: reward = 16
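To see how such a total is composed, the non-zero reward steps can be listed individually. This is a minimal sketch, assuming the per-step rewards of one demonstration are available as a 1-D array (the storage key, for example `reward` in `rendered.npz`, is an assumption); `reward_events` is a hypothetical helper name.

```python
import numpy as np


def reward_events(rewards):
    """Return (events, total): events is a list of (time-step, reward)
    pairs for every non-zero reward, total is the cumulative reward."""
    r = np.asarray(rewards, dtype=float)
    steps = np.flatnonzero(r)
    events = [(int(t), float(r[t])) for t in steps]
    return events, float(r.sum())
```

For a toy sequence `[0, 1, 0, 0, 1024]` this returns two events, at time-steps 1 and 4, and a total of 1025.0, which matches the pattern described above.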

Thanks for looking into this

BrandonHoughton · June 13, 2019, 6:08pm

I noticed you are working with MineRLObtainDiamond-v0 - if you try the same exploration with MineRLObtainDiamondDense-v0, does it have reasonable rewards? The second plot you shared here should not be possible in a single stream of MineRLObtainDiamond-v0.

GrizzlyRL2019 · June 14, 2019, 9:13am

Yes, we are working with the sparse environment, and most reward sequences look implausible or are zero for the whole sequence.
Here’s the same plot for the dense demonstration which looks reasonable (please ignore the filepath in the previous plot, that was a small mistake).

osbornep · June 18, 2019, 10:28am

In addition to this query, within the accompanying paper, meta-data is mentioned as containing detailed labels:

“Additionally, trajectory meta-data includes timestamped markers for hierarchical labelings; e.g. when a house-like structure is built or certain objectives such as chopping down a tree are met.”

However, the data available only includes meta-data with the following attributes:

{"success": true, "duration_ms": 152535, "duration_steps": 3051, "total_reward": 64.0}

Is this correct or is there another location where more detailed labels are located for each data item?
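The two duration fields in the record above can be cross-checked against each other. This is a small worked example using only the values quoted here; it suggests roughly 50 ms per step, i.e. about 20 steps per second, for this particular demonstration.

```python
import json

# Metadata record quoted above, written as a JSON string.
raw = '{"success": true, "duration_ms": 152535, "duration_steps": 3051, "total_reward": 64.0}'
meta = json.loads(raw)

# Average wall-clock time per recorded step, and the implied step rate.
ms_per_step = meta["duration_ms"] / meta["duration_steps"]
steps_per_second = 1000.0 / ms_per_step
print(f"{ms_per_step:.2f} ms per step, {steps_per_second:.2f} steps per second")
```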

Thanks in advance

Phil

BrandonHoughton · June 19, 2019, 1:49pm

We have been using the rewards to indicate when those sub-goals are accomplished, so currently there is no auxiliary location for this demarcation. This is a good idea for improving the competition dataset, however, and I will look into adding it alongside any pending fixes!

GrizzlyRL2019 · June 19, 2019, 3:21pm

Hi there! When looking into the beta release of the dataset, we found some issues.

Could you clarify whether 15 minutes should be the maximum duration of a demonstration?
For example, ‘MineRLObtainDiamond-v0/r2g1all_orange_djinn-4_193-66220’ is 54 minutes long with 66k frames.
Also, in this demonstration there seem to be synchronisation issues between the actions and the inventory state (see plot).
The “place” action is not recorded at all in some samples (in addition to the already known craft, nearbyCraft and nearbySmelt).
We also noticed that the interaction distance is very high, something like 6 or 7 blocks.
It’s often very hard to even see the crafting table when it’s being used because it’s so far away.
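Over-long demonstrations can be listed from the metadata alone. This is a minimal sketch, assuming a dictionary that maps each demonstration name to its metadata record with a `duration_ms` field (as in the record quoted earlier in the thread); `over_time_limit` is a hypothetical helper, and the 15-minute default is only the limit asked about above, not a confirmed rule.

```python
def over_time_limit(metadata_by_demo, limit_minutes=15.0):
    """Return {demo name: duration in minutes} for demos longer than the limit."""
    limit_ms = limit_minutes * 60_000
    return {
        name: meta["duration_ms"] / 60_000
        for name, meta in metadata_by_demo.items()
        if meta["duration_ms"] > limit_ms
    }
```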