Any tips for successfully submitting an h2o model?

hutch3232 · January 9, 2021, 5:08pm

I am trying to use the h2o package for model building. Everything works within the R notebook, but all of my submissions fail. Any tips for how to approach this?

I’ve been initializing h2o in my global_imports function.

global_imports <- function() {
  library(h2o)
  library(bit64)
  library(data.table)
  h2o.init()
}
global_imports()

As for saving and loading h2o models, I don’t think the generic save works. However, I’ve tried using the three h2o save functions (h2o.saveModel, h2o.save_mojo, and h2o.download_mojo), and while I can load the results back in within my environment, they fail in submission.

alfarzan · January 9, 2021, 5:39pm

Hi @hutch3232

Sorry about that. We have had some issues with h2o models, and hopefully the error will help us fix this one. Can you share the submission ID so we can look into it and see if we can figure it out?

hutch3232 · January 9, 2021, 5:45pm

Thanks for helping me look! I have quite a few submissions and I’m happy to try more. Here are a couple recent examples: 113767 and 113758.

alfarzan · January 9, 2021, 5:52pm

That’s why we’re here

So it seems the important line that causes the error is:

Is that a familiar error for you?

I have looked around and this is one explanation I have found.

We’re also working to make sure this doesn’t happen as I can understand it can be confusing when the notebook runs but the submission does not.

hutch3232 · January 9, 2021, 9:19pm

Would you mind taking a look at submission 113841? It says: water.exceptions.H2ONotFoundArgumentException: File trained_model.RData does not exist

This is weird to me because I am naming my model something else, so I don’t know why it’s looking for that file. Is it perhaps hardcoded somehow in the evaluation environment?

nigel_carpenter · January 9, 2021, 9:25pm

Yes, it’s hardcoded in the load_model routine, which is part of model.R in the R zip file, unless you’ve already overridden it?

load_model <- function(){
 # Load a saved trained model from the file `trained_model.RData`.

 #    This is called by the server to evaluate your submission on hidden data.
 #    Only modify this *if* you modified save_model.

  load('trained_model.RData')
  return(model)
}

alfarzan · January 9, 2021, 9:26pm

Hi @hutch3232

This comes from our initial assumption that you could submit all sorts of models as RData files in R.

For tonight could you please make sure that the load and save models are referring to the correct model name and also change the predict.R file you are submitting?

I will update the templates later to include this flexibility, like the Python templates.
Sorry for the confusion.

hutch3232 · January 9, 2021, 11:48pm

I see, thanks for the info. Unfortunately I don’t think .RData works for h2o model objects: stackoverflow

What I’m confused about is that my load_model function was adjusted to work with h2o:

load_model <- function(model_path){ 
 # Load a saved trained model from the file `trained_model.RData`.

 #    This is called by the server to evaluate your submission on hidden data.
 #    Only modify this *if* you modified save_model.

  #load(model_path)
  model <- h2o.import_mojo(model_path)
  return(model)
}

In my last submission (113883), I tried the zip file submission and changed predict.R to use the path to my model, model_output_path = 'model_1/GBM_model_R_1610224620766_5.zip', instead of the hard-coded value. Yet it says the file doesn’t exist, even though I can see it in my environment.

I guess I’m not following what’s needed.

alfarzan · January 9, 2021, 11:57pm

That’s quite strange indeed.

I’ve just downloaded your submission (#113883) and the saved model file is not there for some reason.

Are you sure that you had included the saved model object?

hutch3232 · January 10, 2021, 12:39am

Thank you, that helped! Unfortunately another issue has come up in the latest submission (113901):

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  unused argument (optional = TRUE)
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  Test/Validation dataset column 'drv_sex2' has levels not trained on: ["0"]

I don’t see read.table being used anywhere. I also don’t see where drv_sex2 would be getting a value of ‘0’ from.

alfarzan · January 10, 2021, 12:42am

Hi @hutch3232

That error would be from us loading the data and passing it to your model.

I believe for that column when there is no second driver, it is set to zero.

If you add the handling of that to the preprocessing function it should solve your issue.

(Unless I haven’t understood what you are mentioning?)

hutch3232 · January 10, 2021, 12:57am

I see, thank you. Those were NAs in the train data (as far as I can see) but adding it to pre-processing worked!
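
For reference, a minimal sketch of that kind of preprocessing step. It assumes drv_sex2 only takes the values "F" and "M" when a second driver exists, and that a missing second driver can show up as NA, "0", or an empty string. The helper name normalise_drv_sex2 and the "none" level are hypothetical and not part of the template.

```r
# Hypothetical helper: map every "no second driver" encoding to one explicit
# level so that training and evaluation data share the same factor levels.
normalise_drv_sex2 <- function(x_raw) {
  sex2 <- as.character(x_raw$drv_sex2)
  # NA, "0" and "" are all treated as "no second driver" (assumption).
  sex2[is.na(sex2) | sex2 == "0" | sex2 == ""] <- "none"
  # The level set is fixed up front; "F" and "M" are assumed values.
  x_raw$drv_sex2 <- factor(sex2, levels = c("F", "M", "none"))
  x_raw
}
```

Calling this at the top of the preprocessing function, before converting to an h2o frame, means the model never sees a level at prediction time that it did not see during training.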

That read.table error however seems to be preventing my submission (113909) from moving forward.

nopenope · January 10, 2021, 3:47pm

Hi @hutch3232, I’ve been struggling with H2O for a few days as well. The issue has been related to the save model and load model segments. Could you share what worked for you, if anything?

hutch3232 · January 10, 2021, 4:15pm

Hi @nopenope, I finally got it to work just this morning! There are several things I had to do (note I’m using the .zip submission and R).

Here is the load_model function that works for me:

load_model <- function(model_path){ 
 # Load a saved trained model from the file `trained_model.RData`.

 #    This is called by the server to evaluate your submission on hidden data.
 #    Only modify this *if* you modified save_model.

  #load(model_path)
  model <- h2o.import_mojo(model_path)
  return(model)
}

I’m also using this save_model function. I think it works, but I’m not sure whether it is necessary with the zip submission method:

save_model <- function(model, output_path){
  # Saves this trained model to a file.

  # This is used to save the model after training, so that it can be used for prediction later.

  # Do not touch this unless necessary (if you need specific features). If you do, do not
  #  forget to update the load_model method to be compatible.
	
  # Save in `trained_model.RData`.

  #save(model, file=output_path)
  MODEL_OUTPUT_PATH <<- h2o.save_mojo(model, path = output_path, force = TRUE)
  print(MODEL_OUTPUT_PATH)
}

In preprocess_X_data, make sure you’re converting your x_raw to an h2o frame.

In fit_model I needed to use this:

 y_raw <- as.h2o(y_raw)
 train <- h2o.cbind(x_clean, y_raw)
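
Put together, the two steps above might look roughly like the sketch below. It assumes h2o.init() has already run in global_imports, that y_raw is a plain numeric vector, and that a GBM is being fitted. The column name "target" is made up for illustration, and the function signatures are a guess at the template rather than a copy of it.

```r
# Sketch only: requires a running H2O cluster (h2o.init() in global_imports).
preprocess_X_data <- function(x_raw) {
  # Convert the incoming data.frame to an H2OFrame so h2o can use it.
  h2o::as.h2o(x_raw)
}

fit_model <- function(x_raw, y_raw) {
  x_clean <- preprocess_X_data(x_raw)
  # Wrap the response in a one-column data.frame; "target" is a hypothetical name.
  y_h2o <- h2o::as.h2o(data.frame(target = y_raw))
  # Bind features and response into a single training frame.
  train <- h2o::h2o.cbind(x_clean, y_h2o)
  # Predictor names come from the raw data, which this sketch leaves unchanged.
  h2o::h2o.gbm(x = names(x_raw), y = "target", training_frame = train)
}
```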

Within predict.R change model_output_path from a hard-coded value to the mojo file you’ve saved in your submission folder.

Last step: in predict.R, you need to convert the h2o frame to a data.frame before writing the table. I used this approach:

if(Sys.getenv('WEEKLY_EVALUATION', 'false') == 'true') {
  prices = predict_premium(trained_model, Xraw)
  prices = as.data.frame(prices)
  write.table(x = prices, file = output_claims_file, row.names = FALSE, col.names=FALSE, sep = ",")
} else {
  claims = predict_expected_claim(trained_model, Xraw)
  claims = as.data.frame(claims)
  write.table(x = claims, file = output_prices_file, row.names = FALSE, col.names=FALSE, sep = ",")
}

nopenope · January 10, 2021, 4:22pm

Thanks @hutch3232: as it happens, I had my first successful submission the minute I put my question up, but thank you nonetheless! I think I am doing most of the above; my problem was a mismatch between the Colab and my local H2O versions. Zip uploads are working well. Good luck!

alfarzan · January 12, 2021, 7:58pm

Just for future reference, I have now updated the notebook template, as well as the predict.R file that gets attached to the notebook, to be more flexible in this regard.

alfarzan · January 30, 2021, 1:34pm

Finally adding this here: Colab notebooks now fully support h2o submissions.

Make sure to initialise h2o inside the global_imports function.
Make sure to save your model inside the saved_objects directory that is now created upon running the cell below “Prepare the notebook”.
Adapt your load_model to load the models from the saved_objects directory.
Lastly, it would help to specify the h2o version in your package installation. If you don’t, we just run install.packages("h2o"), which installs the most recent release from the CRAN repositories.
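
A rough sketch of how these steps could fit together, assuming the model is stored as a single MOJO .zip inside saved_objects. The helper find_saved_mojo and the function signatures are hypothetical and may differ from the updated template. Only h2o.init, h2o.save_mojo and h2o.import_mojo are h2o calls.

```r
# Hypothetical helper: locate the one MOJO archive in the directory, so the
# auto-generated file name does not need to be hard-coded anywhere.
find_saved_mojo <- function(dir = "saved_objects") {
  zips <- list.files(dir, pattern = "\\.zip$", full.names = TRUE)
  if (length(zips) != 1) {
    stop("Expected exactly one MOJO .zip in '", dir, "', found ", length(zips))
  }
  # An absolute path avoids surprises if the working directory changes.
  normalizePath(zips)
}

global_imports <- function() {
  library(h2o)
  h2o.init()  # start or connect to the H2O cluster before anything else
}

save_model <- function(model, dir = "saved_objects") {
  dir.create(dir, showWarnings = FALSE)
  # h2o.save_mojo returns the path of the file it wrote.
  h2o::h2o.save_mojo(model, path = dir, force = TRUE)
}

load_model <- function(dir = "saved_objects") {
  h2o::h2o.import_mojo(find_saved_mojo(dir))
}
```

Looking the file up at load time means the model name generated by h2o (for example a GBM_model_R_... name) never has to be copied by hand into predict.R.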

That should work!

hutch3232 · February 7, 2021, 9:17pm

Update:
After a month of losing huge amounts of money in this game, I finally realized that there is a critical bug in my code above. In the above code, I have accidentally swapped the output_claims_file and output_prices_file.

The correct code should have been:

if(Sys.getenv('WEEKLY_EVALUATION', 'false') == 'true') {
  prices = predict_premium(trained_model, Xraw)
  prices = as.data.frame(prices)
  write.table(x = prices, file = output_prices_file, row.names = FALSE, col.names=FALSE, sep = ",")
} else {
  claims = predict_expected_claim(trained_model, Xraw)
  claims = as.data.frame(claims)
  write.table(x = claims, file = output_claims_file, row.names = FALSE, col.names=FALSE, sep = ",")
}

My sincere apologies if anyone used this and also did not catch the error. If you did catch the error and didn’t tell me…

alfarzan · February 7, 2021, 9:23pm

Hi @hutch3232

Don’t worry about this one

In fact in the backend we correct for this. The server-side code doesn’t care what the filenames are inside of the script. Those names (e.g. output_prices_file) are re-written based on the WEEKLY_EVALUATION flag and are not dependent on the actual filename used in the predict.R script.

They are only included for your benefit. We caught this mistake a while ago and, I believe, updated the scripts. But I reiterate that this is not the way your scripts were ever used, so in the previous leaderboards your correct prices were competing in the markets.

hutch3232 · February 7, 2021, 9:26pm

Thanks for the info, that’s great to know! Unfortunately that just means either my model or pricing strategy (or both) sucks lol