Initially, I thought that creating a .zip submission would be a great idea, since it gives you flexibility in the libraries used. This is particularly helpful because pickling keras models works correctly only for specific versions of the tensorflow and h5py packages. Other participants might also have various requirements, which is natural for data scientists who use Python in their everyday work.
However, it seems that sending a .zip submission (almost) always generates a DockerBuildError if basic ML packages such as tensorflow, lightgbm or xgboost are used. This is especially misleading since test.sh works correctly locally.
Thus, I would suggest either
- fixing the issue with Python library flexibility, or
- creating a complete list of libraries (and their exact versions) which can be used in a Python .zip submission.
Since most of us might have different needs and requirements, solution 1) is preferred. Regarding solution 2), I would suggest including the following packages:
numpy==1.17.2
pandas==1.1.4
tables==3.6.1
scikit-learn==0.23.2
hyperopt==0.2.5
lightgbm==2.2.3
xgboost==1.3.1
statsmodels==0.11.1
tensorflow==2.0.0
h5py==2.10.0
dill==0.3.1.1
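To make a mismatch between the local environment and these pins easier to spot before submitting, a small check along the following lines might help. This is only a sketch: the `PINNED` dictionary and the `check_pins` helper are names I made up for illustration, and it assumes Python 3.8+ (for `importlib.metadata`). It says nothing about what the submission Docker image actually contains.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above (hypothetical variable name).
PINNED = {
    "numpy": "1.17.2",
    "pandas": "1.1.4",
    "tables": "3.6.1",
    "scikit-learn": "0.23.2",
    "hyperopt": "0.2.5",
    "lightgbm": "2.2.3",
    "xgboost": "1.3.1",
    "statsmodels": "0.11.1",
    "tensorflow": "2.0.0",
    "h5py": "2.10.0",
    "dill": "0.3.1.1",
}


def check_pins(pins):
    """Return {package: (expected, installed)} for every package whose
    installed version differs from the pin; installed is None if missing."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Running it in the environment where test.sh passes shows which local versions differ from the pinned list.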
Some explanations for the specific package versions:
- more recent versions of numpy than 1.17.2 would generate errors in tensorflow==2.0.0
- more recent versions of h5py than 2.10.0 would not allow pickling keras models
- dill is useful for pickling complex models which cannot be pickled by pickle
- statsmodels is included for fitting GLMs
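To illustrate the point about dill, here is a minimal sketch. The lambda is just a stand-in for a "complex" object, and `roundtrip_ok` is a hypothetical helper name. The standard pickle module cannot serialize a lambda, whereas dill (if it is installed) is designed to handle such objects.

```python
import pickle

try:
    import dill  # third-party package; assumed installed as in the list above
except ImportError:
    dill = None

# A lambda stands in for an object the standard pickle module cannot handle.
scale = lambda x: 2 * x


def roundtrip_ok(obj, module):
    """Return True if obj survives module.dumps / module.loads."""
    try:
        module.loads(module.dumps(obj))
        return True
    except Exception:
        return False


if __name__ == "__main__":
    print("pickle can serialize the lambda:", roundtrip_ok(scale, pickle))
    if dill is not None:
        print("dill can serialize the lambda:", roundtrip_ok(scale, dill))
```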