Issues with submissions


Do you have the ability to show us where in our submissions an error occurred?

For example, for submission #111627 (and #111622 and #111629) I get the error “REAL() can only be applied to a ‘numeric’, not a ‘logical’”, and I can’t figure out exactly which function caused that error, or which line within the function. Similarly, in #111628, I get a blank error, so I have no idea what caused that problem.

When I run it end-to-end within Colab, it runs perfectly without any errors. Could it be a package-version issue? My xgboost is (with R 4.0.3)


Hello @alan_feder

Shared the logs with you.

Hello @alan_feder, @jyotish,

I got the same error (“REAL() can only be applied to a ‘numeric’, not a ‘logical’”) on my submission 111724. Could I have the logs too, or @alan_feder, do you understand where it came from?
I submit a zip file from R and use xgboost. I got this error when I changed the save_model() and load_model() functions to use xgboost’s native save function instead of saving RData objects.
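For reference, a minimal sketch of what swapping RData serialization for the native format could look like (the function and file names here are placeholders, not the submission's actual code):

```r
library(xgboost)

# Save the booster in xgboost's native binary format instead of saveRDS()/RData.
# The native format survives package-version changes better than serialized R objects.
save_model <- function(model, path = "trained_model.xgb") {
  xgb.save(model, path)
}

# Reload the booster from the native binary file.
load_model <- function(path = "trained_model.xgb") {
  xgb.load(path)
}
```

One design note: `xgb.save()`/`xgb.load()` only persist the booster itself, not any R-side preprocessing state (factor levels, scaling parameters), which still has to be saved separately.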


Hi @demarsylvain - always good to see that I’m not the only person getting a given error :slight_smile:

I’m still not 100% sure of the cause, but I think the error was somewhat on my end. Something I did (either in preprocess_x_data or within one of the predict_*() functions), when applied to the test data, may have produced NAs that turned the matrix into a “logical” matrix (of all-missing or all-NA data), which can’t be passed to the xgb.DMatrix() function.

I’m not 100% sure that I’m right, but some subsequent tweaks that I made seem to have (for now) fixed the issue.
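A small illustration of the mechanism suspected above (this is a general R behavior, not the submission's actual code): in R, `NA` is of type logical, so a matrix built entirely from NAs is a logical matrix, which triggers exactly this REAL()/'logical' complaint in C-level code expecting numerics.

```r
# A matrix of nothing but NA values is typed "logical" in R.
x <- matrix(NA, nrow = 2, ncol = 3)
typeof(x)   # "logical" -- xgb.DMatrix() cannot accept this

# Coercing the storage mode to double before building the DMatrix avoids
# the "REAL() can only be applied to a 'numeric'" error (though an all-NA
# matrix usually signals a preprocessing bug upstream).
storage.mode(x) <- "double"
typeof(x)   # "double"
```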


Hi, my submission 112049 failed without any message. Can I take a look at the log to see what went wrong?

Hi, could you look into my submission 112010 and tell me what is wrong? It failed due to a wrong number of variables. I made an effort to make sure that in predict_expected_claims() I use the same variables as in training, so I’m surprised by this error.

Hi @pitusg

I’ve just looked through your submission and I think this might be related to how preprocessing is done.

  1. Are you certain that the preprocessing you apply to your training data is exactly the same as the preprocessing applied to the data that goes into your predict functions?
  2. For your categorical variables (especially vh_make_model), the model should be able to handle previously unseen categories in the new data; is that the case here?

Please let me know if you don’t think these are the issues, and I will try to reproduce your problem and dig deeper :slight_smile:

Hi @matiasbatto

I’ve looked through the submission and I can see that you’re treating vh_make_model as a categorical variable in your model. This is of course fine; however, keep two things in mind:

  1. There are thousands of categories here, so you might run out of memory during training, and
  2. There are many categories that might not appear in your training data but do appear in the leaderboard data. This mirrors the real world, where an insurance company has to offer premiums for cars it might not have seen before.

As a test, please try submitting a model either without this feature or with preprocessing that accounts for (2) above, so we can see if this is the issue. If it is not, we will dig further :slight_smile:
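One common way to handle point (2) is to fix the set of categories at training time and map anything unseen at prediction time to a catch-all level. A minimal sketch (the level values, column handling, and function name are illustrative, not the actual pipeline):

```r
# Levels observed in the training data, plus an explicit catch-all "other".
# In a real pipeline these would be saved alongside the model at training time.
train_levels <- c("ABCD123", "EFGH456", "other")

# Map any category not seen in training to "other", then pin the factor levels
# so one-hot encoding produces the same columns for train and predict data.
prep_make_model <- function(x, levels = train_levels) {
  x <- ifelse(x %in% levels, x, "other")
  factor(x, levels = levels)
}

new_data <- c("ABCD123", "ZZZZ999")   # second value never seen in training
prep_make_model(new_data)             # "ABCD123" "other"
```

Because the factor levels are fixed, `model.matrix()` (or any dummy encoding) on the prediction data yields exactly the same columns as in training, which avoids the "wrong number of variables" failure mode.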

Alfarzan, thanks for your advice :slight_smile: I simulated a new value in vh_make_model and an additional column was created.
