How to use the dataset generator object

Hello,

I finally managed to download the data properly, but now I am stuck trying to understand the dataset API. I ran:

rewards = []
# Iterate through a single epoch gathering sequences of at most 1 step
for current_state, action, reward, next_state, done \
in data.sarsd_iter(
    num_epochs=1, max_sequence_len=1):
    rewards.append(reward)

But this seems to loop forever. So what does num_epochs actually mean? What is an epoch? I first thought it referred to an episode, but then the loop would stop at some point. I guess max_sequence_len determines the batch size.

So how can I loop through all the ObtainDiamond episodes without getting stuck in an infinite loop?

What I tried now is to break the loop upon encountering a “done”. It did stop at some point, so it could just be that it takes a really long time to go through all episodes.

For some reason the returned done was quite often a list, [False, False]. That does not seem correct to me. What does the second “False” refer to?

Did I overlook some documentation about the data where all of this is explained?

We just fixed this bug - the fix will be bundled with the next release of minerl. You should expect done to be True if and only if it belongs to the last step of a trajectory.
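
For anyone reading along, here is a minimal sketch of how per-episode returns could be collected once done behaves as described. It does not call minerl. The names episode_returns, as_list and the toy stream are made up for illustration. It only assumes that the iterator yields (state, action, reward, next_state, done) tuples, and that reward and done may arrive either as scalars or as short sequences with one entry per step.

```python
def as_list(value):
    """Wrap a scalar in a list; turn any sequence into a plain list."""
    try:
        return list(value)
    except TypeError:
        return [value]


def episode_returns(transitions):
    """Sum rewards per episode; an episode ends when the last done flag is True."""
    returns = []
    total = 0.0
    for _state, _action, reward, _next_state, done in transitions:
        total += float(sum(as_list(reward)))
        # Only the flag of the final step in the yielded sequence marks the end.
        if bool(as_list(done)[-1]):
            returns.append(total)
            total = 0.0
    return returns


# Toy stand-in for the real iterator (hypothetical data, not MineRL output).
stream = [
    (None, None, [0.0], None, [False]),
    (None, None, [1.0], None, [True]),
    (None, None, [2.0], None, [False]),
    (None, None, [4.0], None, [True]),
]
print(episode_returns(stream))
```

With the real dataset, the loop would consume the iterator from the original post in place of stream. Rewards seen after the last True flag are deliberately not reported, since that episode has not finished.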

Okay great. Did I assume correctly that setting num_epochs to 1 will iterate exactly once through the whole dataset?

Correct! One pass through all available files. Note that additional data will be released on September 25th, which may make epochs larger.
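
To make the "one pass through all available files" meaning concrete, here is a toy model of an epoch-based iterator. This is not minerl's implementation. iterate_epochs and the trajectories list are hypothetical, and each trajectory is reduced to a plain list of rewards. It only illustrates that one epoch visits every trajectory once and that done is True exactly once per trajectory.

```python
def iterate_epochs(trajectories, num_epochs=1):
    """Yield (reward, done) for every step, repeating the full set num_epochs times."""
    for _ in range(num_epochs):
        for trajectory in trajectories:
            last_index = len(trajectory) - 1
            for index, reward in enumerate(trajectory):
                # done is True only on the final step of each trajectory.
                yield reward, index == last_index


# Two made-up trajectories standing in for recorded files.
trajectories = [[0.0, 0.0, 1.0], [0.0, 2.0]]

steps = list(iterate_epochs(trajectories, num_epochs=1))
num_steps = len(steps)
num_episodes = sum(1 for _reward, done in steps if done)
print(num_steps, num_episodes)
```

Under this model a loop with num_epochs=1 always terminates. It can still take a long time on a large dataset when each iteration yields only a single step.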