Issues with submissions


Do you have the ability to show us where in our submissions an error occurred?

For example, for submission #111627 (and #111622 and #111629) I get the error “REAL() can only be applied to a ‘numeric’, not a ‘logical’”, and I can’t figure out exactly which function caused that error, or which line within the function. Similarly, in #111628, I get a blank error, so I have no idea what caused that problem.

When I run it end-to-end within Colab, it runs perfectly without any errors. Could it be a package-version issue? My xgboost is (with R 4.0.3)


Hello @alan_feder

Shared the logs with you.

Hello @alan_feder, @jyotish,

I got the same error (“REAL() can only be applied to a ‘numeric’, not a ‘logical’”) on my submission 111724. Could I have the logs too, or @alan_feder, do you understand where it came from?
I submit a zip file from R and use xgboost. I got this error when I changed the save_model() and load_model() functions to use xgboost’s native save function instead of saving RData objects.
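For reference, a minimal sketch of what swapping RData serialization for the native format could look like (the function and file names here are placeholders, not the submission's actual code):

```r
library(xgboost)

# Save the booster in xgboost's native binary format instead of saveRDS()/RData.
# The native format survives package-version changes better than serialized R objects.
save_model <- function(model, path = "trained_model.xgb") {
  xgb.save(model, path)
}

# Reload the booster from the native binary file.
load_model <- function(path = "trained_model.xgb") {
  xgb.load(path)
}
```

One design note: `xgb.save()`/`xgb.load()` only persist the booster itself, not any R-side preprocessing state (factor levels, scaling parameters), which still has to be saved separately.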


Hi @demarsylvain - always good to see that I’m not the only person getting a given error :slight_smile:

I’m still not 100% sure of the cause, but I think the error was somewhat on my end. Something I did (either in preprocess_x_data or within one of the predict_*() functions), when applied to the test data, may have produced NAs that turned the matrix into a “logical” matrix (of all-missing or all-NA data), which can’t be passed to the xgb.DMatrix() function.

I’m not 100% sure that I’m right, but some subsequent tweaks that I made seem to have (for now) fixed the issue.
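A small illustration of the mechanism suspected above (this is a general R behavior, not the submission's actual code): in R, `NA` is of type logical, so a matrix built entirely from NAs is a logical matrix, which triggers exactly this REAL()/'logical' complaint in C-level code expecting numerics.

```r
# A matrix of nothing but NA values is typed "logical" in R.
x <- matrix(NA, nrow = 2, ncol = 3)
typeof(x)   # "logical" -- xgb.DMatrix() cannot accept this

# Coercing the storage mode to double before building the DMatrix avoids
# the "REAL() can only be applied to a 'numeric'" error (though an all-NA
# matrix usually signals a preprocessing bug upstream).
storage.mode(x) <- "double"
typeof(x)   # "double"
```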


Hi, my submission 112049 failed without any message. Can I take a look at the log to see what went wrong?

Hi, could you look into my submission 112010 and tell me what is wrong? It failed due to a wrong number of variables. I made an effort to make sure that in predict_expected_claims() I use the same variables as in training, so I’m surprised by this error.

Hi @pitusg

I’ve just looked through your submission and I think this might be related to how preprocessing is done.

  1. Are you certain that the preprocessing you apply to your training data is exactly the same as the preprocessing applied to the data that goes into your predict functions?
  2. For your categorical variables (especially vh_make_model), the model should be able to handle previously unseen categories in the new data; is that the case here?

Please let me know if you don’t think these are the issues, and I will try to reproduce your problem and dig deeper :slight_smile:

Hi @matiasbatto

I’ve looked through the submission and I can see that you’re treating vh_make_model as a categorical variable in your model. This is of course fine; however, keep two things in mind:

  1. There are thousands of categories here, so you might run out of memory during training, and
  2. There are many categories that might not appear in your training data but do appear in the leaderboard data. This mirrors the real world, where an insurance company has to offer premiums for cars it might not have seen before.

As a test, please try submitting a model either without this feature or with preprocessing that accounts for (2) above, so we can see if this is the issue. If it is not, we will dig further :slight_smile:
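One common way to handle point (2) is to fix the set of categories at training time and map anything unseen at prediction time to a catch-all level. A minimal sketch (the level values, column handling, and function name are illustrative, not the actual pipeline):

```r
# Levels observed in the training data, plus an explicit catch-all "other".
# In a real pipeline these would be saved alongside the model at training time.
train_levels <- c("ABCD123", "EFGH456", "other")

# Map any category not seen in training to "other", then pin the factor levels
# so one-hot encoding produces the same columns for train and predict data.
prep_make_model <- function(x, levels = train_levels) {
  x <- ifelse(x %in% levels, x, "other")
  factor(x, levels = levels)
}

new_data <- c("ABCD123", "ZZZZ999")   # second value never seen in training
prep_make_model(new_data)             # "ABCD123" "other"
```

Because the factor levels are fixed, `model.matrix()` (or any dummy encoding) on the prediction data yields exactly the same columns as in training, which avoids the "wrong number of variables" failure mode.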

Alfarzan, thanks for your advice :slight_smile: I simulated a new value in vh_make_model and an additional column was created.
