[Resolution] Bugs With Getting Started Of Round 2

Here are some issues I faced with the Getting Started guide for Round 2.

  • The datasets download with the public_* prefix, but local_evaluation.py expects directories without the public_* prefix.
  • There is a spelling inconsistency in the unlabelled dataset: the file is currently public_unlabeled.zip, but it should be public_unlabelled.zip. You will only see this once you download the dataset, not while listing it.

I may be wrong; if so, please correct me, @shivam @mohanty.
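As a workaround until the names are fixed, the rename can be made robust to both spellings. Below is a minimal Python sketch (my own helper, not part of the starter kit); the mapping assumes the directory names produced by the current zips, and `normalize_dataset_dirs` is a hypothetical name:

```python
import os

# Downloaded directory names -> names expected by local_evaluation.py.
# Both spellings of "unlabeled"/"unlabelled" are handled, in case the zip
# is corrected later.
RENAMES = {
    "public_debug": "debug",
    "public_training": "training",
    "public_unlabeled": "unlabelled",
    "public_unlabelled": "unlabelled",
    "public_validation": "validation",
}

def normalize_dataset_dirs(root):
    """Rename downloaded dataset directories under root to the expected names."""
    for src, dst in RENAMES.items():
        src_path = os.path.join(root, src)
        dst_path = os.path.join(root, dst)
        # Only rename when the source exists and the target is not taken yet.
        if os.path.isdir(src_path) and not os.path.exists(dst_path):
            os.rename(src_path, dst_path)
```

Calling `normalize_dataset_dirs("data/v0.2-rc4")` after extraction should leave the directory layout matching what local_evaluation.py expects, regardless of which spelling the zip used.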

I have put together a Magic box (based on the magic box from Round 1) that makes the repository ready to use. It performs the following actions:

  • Clones the repository for Round 2
  • Downloads the Round 2 datasets and puts them in the relevant directories (following the latest local_evaluation.py)
  • Renames the dataset directories to match the latest local_evaluation.py

Magic Box for Colab :black_large_square:

try:
  import os
  # first_run is undefined on the very first execution of this cell,
  # so referencing it raises NameError and we fall into the except branch.
  if first_run and os.path.exists("/content/data-purchasing-challenge-2022-starter-kit/data/training"):
    first_run = False
except NameError:
  first_run = True

if first_run:
  %cd /content/
  !git clone http://gitlab.aicrowd.com/zew/data-purchasing-challenge-2022-starter-kit.git > /dev/null
  %cd data-purchasing-challenge-2022-starter-kit
  !aicrowd dataset list -c data-purchasing-challenge-2022 | grep -e 'v0.2'
  !aicrowd dataset download -c data-purchasing-challenge-2022 *-v0.2-rc4.zip
  !mkdir -p data/v0.2-rc4
  !mv *.zip data/v0.2-rc4 && cd data/v0.2-rc4 && echo "Extracting dataset" && for f in *.zip; do unzip -o "$f" > /dev/null; done
  !mv data/v0.2-rc4/public_debug data/v0.2-rc4/debug
  !mv data/v0.2-rc4/public_training data/v0.2-rc4/training
  !mv data/v0.2-rc4/public_unlabeled data/v0.2-rc4/unlabelled
  !mv data/v0.2-rc4/public_validation data/v0.2-rc4/validation

Magic Box for Local System :black_large_square:

#!/bin/bash

git clone http://gitlab.aicrowd.com/zew/data-purchasing-challenge-2022-starter-kit.git
cd data-purchasing-challenge-2022-starter-kit
aicrowd dataset list -c data-purchasing-challenge-2022 | grep -e 'v0.2'
aicrowd dataset download -c data-purchasing-challenge-2022 *-v0.2-rc4.zip
mkdir -p data/v0.2-rc4
mv *.zip data/v0.2-rc4 && cd data/v0.2-rc4 && echo "Extracting dataset" && for f in *.zip; do unzip -o "$f" > /dev/null; done
mv public_debug debug
mv public_training training
mv public_unlabeled unlabelled
mv public_validation validation

Put the above code in magic_box.sh and execute:
$ bash magic_box.sh
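After running either magic box, a quick sanity check can confirm the layout matches what local_evaluation.py expects. This is a sketch of my own (`check_dataset_layout` is a hypothetical helper; `data/v0.2-rc4` is the root produced by the script above):

```python
import os

def check_dataset_layout(root):
    """Return the list of expected dataset directories missing under root."""
    expected = ["debug", "training", "unlabelled", "validation"]
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]

missing = check_dataset_layout("data/v0.2-rc4")
if missing:
    print("Missing dataset directories:", missing)
```

An empty list means the renames worked; anything printed points to a directory that still carries the public_ prefix or was not extracted.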

Please let me know about any improvements or questions in the comments below; I would be glad to help.
Click on :heart: if this post helped you.


Hello @gaurav_singhal

Thanks for pointing these out.

We will update the file names to match the names shown in the listing.


Regarding the Colab notebook: are you using the notebook that was released for Round 1? We have an updated notebook for Round 2 that should include all these fixes, along with the additional code needed to run the post_purchase_training_phase. Can you try the new notebook if you haven't already?

This post focuses only on the Magic box part. I haven't checked the methods yet; I hope nothing is missing there, but I'll check and report any inconsistencies.
You are right: the notebook uses the public_ prefix in dataset declarations, but the GitLab code doesn't. In any case, the spelling is still inconsistent; it's not a big issue, but you may want to correct it.

@gaurav_singhal The updated notebook already includes code changes in the magic box cell along the lines of your suggestions. Sharing the link to the updated notebook just in case.


Yep, thanks.
Could you add the extract-and-rename script to the GitLab repo, or change local_evaluation.py there just like you did in Colab?


@gaurav_singhal We updated the dataset file names and the extracted directory names. The zip files should now extract to debug, training, unlabelled and validation directories. We also updated the Colab notebook with the new dataset paths.

Thanks for your help!
