To be honest, we do not have any experience running it on Google Colab. I believe you need to download it to your local machine and upload it to your Google Drive; you should then be able to access the data from Google Colab. To submit your solution, you will need to upload your end-to-end system to our local GitLab repository, as we have a private test set.
The train file is about 60 GB, so downloading and re-uploading it isn't really feasible. I saw a previous post from someone who used it on Kaggle successfully, and another tutorial that did this on Colab for a different competition, so hopefully there is a way around it.
I did a quick implementation, and this might be the end-to-end example you are looking for.
Please note that in this example I only worried about dataset integration on Colab/Notebook via the CLI, and have not done end-to-end testing of the baseline.
The baseline needs some love, and you can even share your fixed/better version in the Notebook section!
I am pretty sure other participants would be interested in it too.
The dataset is quite large when downloaded to Colab this way, and I can't unzip it due to disk constraints. Any advice? Is there a way to download the dataset unzipped to Colab directly, or perhaps a download link I could use with TensorFlow, which can download from a URL and extract the archive directly?
It is highly diverse. A lot of the images are 12 MP, e.g., 4000x3000. I can start by resizing just the big images. Resizing them to 1/4 of their original size should not have a negative impact.
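To make the "resize only the big images" idea concrete, here is a minimal sketch. It assumes Pillow is installed; the 12 MP threshold and the function names are hypothetical. Whether "1/4" means a quarter of each side or a quarter of the pixel count is not stated above, so the per-side factor is left as a required parameter (0.5 per side gives 1/4 of the pixels, 0.25 per side gives 1/16).

```python
def downscaled_size(width, height, scale, big_pixels=12_000_000):
    """Return the target (width, height); images below the threshold are kept."""
    if width * height < big_pixels:
        return width, height
    # Scale both sides by the same factor so the aspect ratio is preserved.
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_if_big(src_path, dst_path, scale, big_pixels=12_000_000):
    """Shrink one image file when it is at or above the pixel threshold."""
    from PIL import Image  # Pillow, third-party: pip install Pillow

    with Image.open(src_path) as img:
        new_size = downscaled_size(img.width, img.height, scale, big_pixels)
        if new_size != img.size:
            # LANCZOS is a high-quality filter for downscaling photos.
            img = img.resize(new_size, Image.LANCZOS)
        img.save(dst_path)
```

For example, `downscaled_size(4000, 3000, 0.5)` gives `(2000, 1500)`, while a 1024x768 image is returned unchanged.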
Thank you, picekl, for your great help. I am currently downloading the dataset and will update you when I have made some progress. Hopefully I can make my first submission.
I downloaded the dataset chunks that you split. Any idea how to combine and extract them? All my attempts failed with a checksum error in the first chunk.
After successfully splitting tar files or any large file in Linux, you can join the files using the cat command. Employing cat is the most efficient and reliable method of performing a joining operation.
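For anyone working in a notebook rather than a shell, the same joining step can be sketched in Python. This is only an illustration of what cat does (byte-for-byte concatenation in order), not the command from the tutorial; the function names and chunk file names are hypothetical. The order of the chunks matters, and a truncated or missing chunk is a common reason for errors when extracting, so the sketch also computes a SHA-256 digest you could compare against a published checksum, if one is available.

```python
import hashlib
import shutil
from pathlib import Path


def join_chunks(chunk_paths, output_path):
    """Concatenate the chunks, in the given order, into one file (like cat)."""
    with open(output_path, "wb") as out:
        for chunk in chunk_paths:
            with open(chunk, "rb") as part:
                # Copies in blocks, so a large chunk is never held in memory.
                shutil.copyfileobj(part, out)
    return Path(output_path)


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

A typical call would be `join_chunks(sorted(Path("data").glob("train.tar.part_*")), "data/train.tar")`, where the directory and the `part_*` pattern are placeholders. Sorting by name only gives the right order if the suffixes sort lexicographically, e.g. `aa`, `ab` or zero-padded numbers.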
To join back all the blocks or tar files, we issue the command below: