Clarification on Python .zip submission

dawid_kopczyk · December 27, 2020, 12:03pm

Initially, I thought that creating a .zip submission would be a great idea, since it lets you be flexible about the libraries you use. This is particularly helpful because pickling keras models works correctly only with specific versions of the tensorflow and h5py packages. Other participants might also have various requirements, which is natural for data scientists who use Python in their everyday work.

However, it seems that sending a .zip submission (almost) always generates a DockerBuildError if basic ML packages such as tensorflow, lightgbm or xgboost are used. This is especially misleading since test.sh runs correctly locally.

Thus, I would suggest either:

1) fixing the issue with Python library flexibility, or
2) creating a complete list of libraries (and their exact versions) that can be used in a Python .zip submission.

Since most of us might have different needs and requirements, solution 1) is preferred. Regarding solution 2), I would suggest including the following packages:

numpy==1.17.2
pandas==1.1.4
tables==3.6.1
scikit-learn==0.23.2
hyperopt==0.2.5
lightgbm==2.2.3
xgboost==1.3.1
statsmodels==0.11.1
tensorflow==2.0.0
h5py==2.10.0
dill==0.3.1.1

Some explanation for the specific package versions:

versions of numpy more recent than 1.17.2 would generate errors in tensorflow==2.0.0
versions of h5py more recent than 2.10.0 would not allow pickling keras models
dill is useful for pickling complex models which cannot be pickled by pickle
statsmodels is needed for GLMs
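
As a minimal sketch (not part of the challenge tooling), pins like the ones listed here could be compared with what is installed in a local environment before submitting. The helper names `parse_pins` and `find_mismatches` are hypothetical, only three of the pins are repeated for brevity, and `importlib.metadata` assumes Python 3.8 or newer.

```python
from importlib import metadata

# A few of the pins from the list in this post; extend as needed.
PINNED = """
numpy==1.17.2
tensorflow==2.0.0
h5py==2.10.0
"""


def parse_pins(text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed or None)} for every unmet pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(parse_pins(PINNED)).items():
        print(f"{name}: pinned {wanted}, installed {installed or 'not installed'}")
```

An empty result only says the local environment matches the pins; it does not say the same pins will install on the build image.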

jyotish · December 27, 2020, 6:14pm

Hello @dawid_kopczyk

The docker build issue should be reproducible locally. This error generally means that the pip install step failed. When you specify "language": "python" in your config.json, we use Python 3.8.3 with nothing else installed. You can also specify a different version of Python according to the table below.

| `language` in `config.json` | Python version |
| --- | --- |
| `python` | `3.8.3` |
| `python3.8` | `3.8.3` |
| `python3.7` | `3.7.9` |

We can add support for more variants if need be.
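
A minimal sketch for checking whether your local interpreter matches the one used for the build, assuming only the three `language` values in the table. The dictionary is copied from the table; the function name `matches_local` is hypothetical and not part of any official tooling.

```python
import sys

# Mapping copied from the table in this post.
LANGUAGE_TO_PYTHON = {
    "python": (3, 8, 3),
    "python3.8": (3, 8, 3),
    "python3.7": (3, 7, 9),
}


def matches_local(language, local=None):
    """True if the local interpreter has the same major.minor version
    as the Python version listed for this `language` value."""
    if local is None:
        local = sys.version_info
    return LANGUAGE_TO_PYTHON[language][:2] == tuple(local[:2])


if __name__ == "__main__":
    language = "python"  # the value of "language" in your config.json
    expected = ".".join(str(part) for part in LANGUAGE_TO_PYTHON[language])
    local = ".".join(str(part) for part in sys.version_info[:3])
    status = "matches" if matches_local(language) else "differs from"
    print(f"Local Python {local} {status} build Python {expected}")
```

Only major.minor is compared, since wheel availability for pinned packages usually depends on that rather than on the patch release.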

We do not have any common packages like NumPy, sklearn, etc. installed by default. We install only the packages that are defined in your requirements.txt.

Reproducing the build error locally

With docker (100% reproducible)

1. Navigate to the directory containing your submission code.
2. Run `docker run --rm -ti -v "$(pwd)":/src python:3.8.3-slim bash`
3. Inside the container, run `cd /src && pip install -r requirements.txt`

Testing in a `virtualenv`

You can create a virtual environment using conda or virtualenv and try running pip install -r requirements.txt to see if the installation works as expected.
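
The same check can be scripted with the standard library alone. This is a rough sketch, assuming a `requirements.txt` in the current directory; `venv_python` and `try_install` are hypothetical helper names. Note that the environment is created with whatever Python runs the script, so it only mirrors the build if that version matches the one selected in config.json.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path of the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def try_install(requirements="requirements.txt"):
    """Install the requirements into a throwaway venv; return pip's exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        # Fresh environment with pip but no other packages.
        venv.create(tmp, with_pip=True)
        result = subprocess.run(
            [str(venv_python(tmp)), "-m", "pip", "install", "-r", str(requirements)]
        )
        return result.returncode


# Example (run from the submission directory):
#     exit_code = try_install("requirements.txt")
#     a non-zero exit code means the pip install step failed locally as well
```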

If you need more information/additional logs for a failed submission, please feel free to reach out to us on discord, discourse or email.

dawid_kopczyk · December 27, 2020, 10:10pm

Many thanks for the hints, it turned out to be a Python version related problem! Now it works like a charm.
