Issues downloading the dataset

I’ve tried downloading the dataset several times by running minerl.data.download('./data/'), but the download keeps failing: it gets interrupted by NoneType: None, restarts, and eventually quits.
What does this mean?

I had some trouble with the download script too, so I just downloaded the data directly from the link:

data_texture_0_low_res.tar.gz

Hey, I’m new here, so my apologies if this is obvious, but…

What do I need to do next with this downloaded .tar.gz file?

UPD. Solved. You need to do 2 things:

  • extract this archive to any suitable folder.
  • set an environment variable MINERL_DATA_ROOT to point to that folder (the folder with MineRLNavigateDense-v0, MineRLObtainDiamond-v0 and other subfolders containing particular environments).
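
For anyone who prefers to script those two steps, here is a minimal Python sketch using only the standard library. The helper names (extract_dataset, find_env_root, set_data_root) are made up for illustration and are not part of minerl, and the sketch assumes the file is a gzip-compressed tar, as its name suggests. Whether the archive unpacks into a top-level folder is not stated in this thread, so find_env_root simply looks for the folder that directly contains the MineRL... subfolders.

```python
import os
import tarfile
from pathlib import Path


def extract_dataset(archive_path, target_dir):
    """Extract the .tar.gz archive into target_dir and return it as an absolute path."""
    target_dir = Path(target_dir).expanduser().resolve()
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)
    return target_dir


def find_env_root(extracted_dir):
    """Return the folder that directly contains the MineRL... subfolders, or None."""
    extracted_dir = Path(extracted_dir)
    subdirs = sorted(p for p in extracted_dir.iterdir() if p.is_dir())
    for candidate in [extracted_dir, *subdirs]:
        if any(c.is_dir() and c.name.startswith("MineRL") for c in candidate.iterdir()):
            return candidate
    return None


def set_data_root(data_root):
    """Point MINERL_DATA_ROOT at data_root for the current Python process only."""
    os.environ["MINERL_DATA_ROOT"] = str(data_root)


# Example usage (paths are placeholders):
# root = find_env_root(extract_dataset("data_texture_0_low_res.tar.gz", "~/minerl_data"))
# set_data_root(root)
```

Setting os.environ this way only lasts for the running process. To make it permanent, set the variable in your shell profile or system settings instead.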

You need to extract it. If you’re on Ubuntu, you can use this command:

Google it for other OSs.

Then use the path to the extracted folder in the setup for a dataset as per the documentation.

Also try using absolute paths rather than relative paths.

Can someone confirm whether this dataset has 60 million (state, action, reward) samples?

I downloaded data_texture_0_low_res.tar.gz and it has only ~8M samples:

MineRLTreechop-v0 426699
MineRLObtainIronPickaxe-v0 1401485
MineRLNavigateDense-v0 235786
MineRLObtainDiamond-v0 1780457
MineRLNavigateExtreme-v0 268744
MineRLNavigateDense-v0 235786
MineRLNavigateExtremeDense-v0 268326
MineRLObtainDiamondDense-v0 1780457
MineRLObtainIronPickaxeDense-v0 1401485
total_count 7799225
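
As a quick sanity check, the per-environment counts can be re-added in a few lines of Python. This sketch just copies the numbers as posted; it does not read the dataset. It also reports any environment name that shows up more than once in the list.

```python
# Counts copied verbatim from the list above, in the same order.
counts = [
    ("MineRLTreechop-v0", 426699),
    ("MineRLObtainIronPickaxe-v0", 1401485),
    ("MineRLNavigateDense-v0", 235786),
    ("MineRLObtainDiamond-v0", 1780457),
    ("MineRLNavigateExtreme-v0", 268744),
    ("MineRLNavigateDense-v0", 235786),
    ("MineRLNavigateExtremeDense-v0", 268326),
    ("MineRLObtainDiamondDense-v0", 1780457),
    ("MineRLObtainIronPickaxeDense-v0", 1401485),
]

total = sum(n for _, n in counts)
names = [name for name, _ in counts]
repeated = sorted({name for name in names if names.count(name) > 1})

print("total:", total)
print("names listed more than once:", repeated)
```

The sum matches the posted total_count, and MineRLNavigateDense-v0 is the one name that appears twice in the list.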

The latest dataset is still at this download link, right? https://router.sneakywines.me/minerl/data_texture_0_low_res.tar.gz

Great point - the link has changed with the release of the new minerl version: https://router.sneakywines.me/minerl-v1/data_texture_0_low_res.tar.gz

Thank you very much!

@weel2019, a large part of the dataset is open-world survival, which has not been packaged here with the rest of the experiments, given how complex the actions and observations become when you include the entire Minecraft item hierarchy and interactions with unique game mechanics such as villagers, redstone, enchantments, etc.
We are trying to package this anyway using the same action/observation space as ‘MineRLObtainDiamond-v0’; however, this would be auxiliary to the main dataset download, as it is both off-distribution and massive to download!

Is this link still up to date? I tried downloading the dataset both with the Python script in the utility folder of the starter repo and through the Python interpreter with minerl, but it never worked.

Did you have this issue today? The server was down briefly which could have affected the download. If it is still down for you let me know!

Yes I had the issue today. I’m trying again. Usually my problem is that the download finishes, but then there is simply no data in the folder that I set.
My MINERL_DATA_ROOT is set to /home/anton/Documents/Projects/competition_submission_starter_template/
Does that sound alright?

Yeah, that looks right - are you specifying anything in the download script? And have you updated to the latest version of minerl with pip install --upgrade minerl?

Also, is there sufficient space on your drive? By default, debug logging is disabled, but if you install a basic logger first and set the logging level to debug, you can see what errors occur during the download.
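
A minimal sketch of what "install a basic logger" can look like with Python's standard logging module. The logger name minerl.data.download is taken from the log output posted in this thread. The force=True argument needs Python 3.8 or newer and is only there so the call still takes effect if logging was already configured.

```python
import logging

# Send all messages at DEBUG level and above to the console.
# force=True (Python 3.8+) replaces any handlers configured earlier.
logging.basicConfig(level=logging.DEBUG, force=True)

# The downloader's logger inherits the DEBUG level from the root logger.
downloader_log = logging.getLogger("minerl.data.download")
downloader_log.debug("debug logging is enabled")

# Then start the download as in the first post, for example:
# import minerl
# minerl.data.download('./data/')
```

The logging setup has to run before the download call, in the same Python process.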

I have 30 GB left on my drive and I have the newest version of minerl. I do not specify anything; I just set the MINERL_DATA_ROOT to what I posted above and run the script as “python utility/verify_or_download_data.py”

I switched logging to debug and got this:
The data directory does not exist in your submission, are you running this script from the root of the repository?
data_dir=/home/anton/Documents/Projects/competition_submission_starter_template
Attempting to download the dataset...
INFO:minerl.data.download:Downloading dataset to /home/anton/Documents/Projects/competition_submission_starter_template
INFO:minerl.data.download:Using url "https://router.sneakywines.me/minerl-v1/data_texture_0_low_res.tar.gz"
INFO:minerl.data.download:Folder "/tmp/pySmartDL" does not exist. Creating...
INFO:minerl.data.download:Creating a ThreadPool of 20 thread(s).
INFO:minerl.data.download:Fetching download hash ...
INFO:minerl.data.download:Looking for SUMS files...
INFO:minerl.data.download:Found a matching hash in https://router.sneakywines.me/minerl-v1/SHA256SUMS
INFO:minerl.data.download:Starting download ...
INFO:minerl.data.download:Starting a new SmartDL operation.
INFO:minerl.data.download:One URL is loaded.
INFO:minerl.data.download:Downloading 'https://router.sneakywines.me/minerl-v1/data_texture_0_low_res.tar.gz' to '/tmp/pySmartDL/data_texture_0_low_res.tar.gz'...
INFO:minerl.data.download:Content-Length is 16212959354 (15.10 GB).
INFO:minerl.data.download:Launching 20 threads (downloads 773.1 MB/thread).

So for some reason it is downloading into that /tmp/pySmartDL folder; not sure if that is intended behavior.

Okay, I think downloading into that folder was intended behavior. I think I indeed did not have quite enough storage space, as I did not take into account that the downloaded file still needs to be extracted. Also, I have now set the MINERL_DATA_ROOT to /home/anton/Documents/Projects/competition_submission_starter_template/data instead of /home/anton/Documents/Projects/competition_submission_starter_template/

Yes, the temp folder is an attempt to avoid the issue where the data takes up twice as much space while being extracted, though it does not help on systems where the OS puts temp folders on the same disk =(
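
A rough way to check this before starting the download, using only the Python standard library. The archive size is the Content-Length from the log above. The "twice the archive size when both folders share a disk" rule is only an assumption based on the description in this thread; the real extracted size is not stated here and may be larger. The function names are made up for illustration.

```python
import os
import shutil
import tempfile

ARCHIVE_BYTES = 16_212_959_354  # Content-Length reported in the download log


def same_filesystem(path_a, path_b):
    """True if both existing paths live on the same device (disk or partition)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def required_free_bytes(archive_bytes, temp_dir, data_root):
    """Rough estimate (assumption): archive plus extracted copy if the disk is shared."""
    if same_filesystem(temp_dir, data_root):
        return 2 * archive_bytes
    return archive_bytes


def has_room(path, needed_bytes):
    """True if the disk holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Example usage (data_root must already exist):
# data_root = os.environ["MINERL_DATA_ROOT"]
# needed = required_free_bytes(ARCHIVE_BYTES, tempfile.gettempdir(), data_root)
# print("enough space:", has_room(data_root, needed))
```

Under that assumption, a shared disk needs a little over 30 GiB free (2 x 15.10 GiB), which is consistent with 30 GB not being quite enough.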