Adding the Spotify Million Playlist Dataset to Kaggle (for computing)

I am extremely low on computational resources and am currently doing research on playlist continuation. I want to add this dataset publicly on Kaggle, so that people could also share their ideas and insights on the data. But the rules say I cannot distribute it except for research purposes. May I please share this data?

Hi,

You may want to use Google Colab. It's a fantastic solution if you are low on computing power.

Cheers

Hey vrv, thanks for the suggestion.
The problem with Colab is that its storage allocation is temporary (it lasts only for the runtime). I tried it, but every time I have to upload part of the data again, which gets frustrating.

Hi md_sadakat_hussain_f,

Thanks for asking. I understand that it is a little extra work (and cost) to work with this dataset on your own machines. For free computing, I would also recommend Google Colab. For storage, you could keep the ZIP file in Google Drive (you get 15 GB free), copy it onto the Colab instance, unzip it on the ephemeral storage (`!unzip filename.zip`), and process it there; see the sketch below. It might be faster than uploading from your local machine each time, and easier to repeat if you script it.
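Something along these lines works as a single Colab cell. The Drive path and file name are placeholders, so adjust them to wherever you stored the ZIP:

```python
from google.colab import drive

# Make your Drive visible inside the Colab runtime.
drive.mount('/content/drive')

# Copy the ZIP from Drive onto the instance's fast local disk (once per session).
# The path under MyDrive is a placeholder -- use wherever you put the dataset.
!cp "/content/drive/MyDrive/spotify_million_playlist_dataset.zip" /content/

# Unzip onto the ephemeral storage. It is wiped when the runtime ends, but
# re-running this cell restores everything without a manual upload.
!unzip -q /content/spotify_million_playlist_dataset.zip -d /content/mpd
```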

Alternatively, you could follow these instructions from fast.ai (Steps 1-3) on setting up a GCP account and getting a VM with a Jupyter notebook. You should get $300 of free credit as a new GCP user. Some VMs cost less than $1/hour and include a GPU and a 100 GB persistent disk.
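In case it helps, the VM-creation step from that guide boils down to something like the command below. Treat it as a rough sketch: the zone, machine type, GPU, and image family here are illustrative assumptions, and the fast.ai instructions walk through the actual choices:

```
# Rough sketch only -- zone, machine type, GPU, and image are assumptions;
# follow the fast.ai guide (Steps 1-3) for the real setup.
gcloud compute instances create mpd-research-vm \
    --zone=us-west1-b \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --boot-disk-size=100GB \
    --image-family=pytorch-latest-gpu \
    --image-project=deeplearning-platform-release \
    --maintenance-policy=TERMINATE
```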

Of course, there are many other options for low- or no-cost computing on different platforms; perhaps others can post what works for them here?

Please do respect the terms of the dataset license - you may not redistribute the dataset on Kaggle or other public platforms.
