Submission error although the Colab notebook runs successfully

gciam · January 2, 2021, 7:43pm

Hello,

Could you please check my submission #112236?

My Python Google Colab notebook runs without errors, and the last cell returns “successfully submitted!”.

However, when I click on the “This submission” link, the following message is displayed:

I have no idea what’s going wrong…

Many thanks in advance for your help.

Cheers

sai_krithik · January 2, 2021, 7:55pm

I am facing some submission errors too… Did you make at least one successful submission?

alfarzan · January 2, 2021, 9:00pm

Hi @gciam

The full traceback of your error is copied below. It seems that the main issue is the NameError you are getting at the last line.

PatsyError                                Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/statsmodels/base/model.py in predict(self, exog, transform, *args, **kwargs)
   1019                        '\n\nThe original error message returned by patsy is:\n'
   1020                        '{0}'.format(str(str(exc))))
-> 1021                 raise exc.__class__(msg)
   1022             if orig_exog_len > len(exog) and not is_dict:
   1023                 import warnings

PatsyError: predict requires that you use a DataFrame when predicting from a model
that was created using the formula api.

The original error message returned by patsy is:
Error evaluating factor: NameError: no data named 'pol_duration:[0, 5)' found

Hope this helps

gciam · January 3, 2021, 10:08am

Hello,

@alfarzan: Thank you for your answer!

I guess the error comes from the fact that I use pd.get_dummies() to generate the columns of my X training data in the preprocess_X_data(X_raw) function.
I train a model on the whole training dataset with the dummy columns generated from it, and no problem occurs there.
But I guess the error occurs in the final RMSE evaluation: if preprocess_X_data(X_raw) is called on only part of the training data, some of the dummy columns my model was trained with are likely to be missing from that subset…
Quite a typical error…

Could you please confirm that, in the RMSE evaluation process, the function preprocess_X_data(X_raw) is indeed called on a part of the training dataset?

Anyway, I think I should correct the way I generate my dummy columns so that it also works with only part of the training data.
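
For reference, one common way to keep the dummy columns stable is to record the column names produced on the full training data and reindex every later call against them. The snippet below is only a sketch under assumptions: the column names pol_duration_bin and vh_age, the toy data, and the helper names fit_dummy_columns and preprocess_X_data_aligned are hypothetical and not taken from the actual notebook.

```python
import pandas as pd

# Hypothetical toy training data; the real columns may differ.
X_train_raw = pd.DataFrame({
    "pol_duration_bin": ["[0, 5)", "[5, 10)", "[0, 5)"],
    "vh_age": [1, 7, 3],
})
categorical_cols = ["pol_duration_bin"]


def fit_dummy_columns(X_raw, categorical_cols):
    """Return the dummy column names produced on the full training data."""
    return pd.get_dummies(X_raw, columns=categorical_cols, dtype=int).columns


def preprocess_X_data_aligned(X_raw, categorical_cols, train_columns):
    """Hypothetical variant of preprocess_X_data that always returns
    the columns seen at training time, in the same order."""
    X = pd.get_dummies(X_raw, columns=categorical_cols, dtype=int)
    # Columns missing from this subset are added and filled with 0;
    # columns not seen at training time are dropped.
    return X.reindex(columns=train_columns, fill_value=0)


train_columns = fit_dummy_columns(X_train_raw, categorical_cols)

# A subset in which the category "[0, 5)" never appears.
X_subset_raw = X_train_raw[X_train_raw["pol_duration_bin"] == "[5, 10)"]

# Plain get_dummies on the subset loses the "[0, 5)" column ...
naive = pd.get_dummies(X_subset_raw, columns=categorical_cols, dtype=int)
# ... while the aligned version keeps it, filled with zeros.
aligned = preprocess_X_data_aligned(X_subset_raw, categorical_cols, train_columns)
```

In a real submission, train_columns would have to be stored together with the trained model so that it is still available when the functions are called at evaluation time. Note that reindexing silently drops any category that was not seen during training.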

@sai_krithik: Yes, I was able to make a successful submission with the provided example notebook.

Have a nice day

alfarzan · January 3, 2021, 4:35pm

That’s great to hear

In every evaluation process, your functions are called as is. That means that if one of your function definitions calls preprocess_X_data(X_raw) early on, that call will run during evaluation.