Any tips for successfully submitting an h2o model?

I am trying to use the h2o package for model building. Everything works within the R notebook, but all of my submissions fail. Any tips for how to approach this?

I’ve been initializing h2o in my global_imports function.

global_imports <- function() {
  library(h2o)
  library(bit64)
  library(data.table)
  h2o.init()
}
global_imports()

As for saving and loading h2o models, I don’t think the generic save() works. However, I’ve tried the three h2o save functions (h2o.saveModel, h2o.save_mojo, and h2o.download_mojo), and while I can load the models back in my environment, they fail in submission.

Hi @hutch3232

Sorry about that. We have had some issues with h2o models, and hopefully the error message will help us fix this one. Can you share the submission ID so we can look into it and try to figure it out?

Thanks for helping me look! I have quite a few submissions and I’m happy to try more. Here are a couple of recent examples: 113767 and 113758.


That’s why we’re here :slight_smile:

So it seems the important line that causes the error is:

[image: screenshot of the error line]

Is that a familiar error for you?

I have looked around and this is one explanation I have found.

We’re also working to make sure this doesn’t happen again, since I understand it can be confusing when the notebook runs but the submission does not.

Would you mind taking a look at submission 113841? It says: water.exceptions.H2ONotFoundArgumentException: File trained_model.RData does not exist

This is weird to me because I am naming my model something else, so I don’t know why it’s looking for that file. Is it perhaps hardcoded somehow in the evaluation environment?

Yes, it’s hard-coded in the load_model routine of the R zip-file template, which is part of model.R, unless you’ve already overridden it:

load_model <- function(){
  # Load a saved trained model from the file `trained_model.RData`.
  #
  # This is called by the server to evaluate your submission on hidden data.
  # Only modify this *if* you modified save_model.

  load('trained_model.RData')
  return(model)
}

Hi @hutch3232

This comes from our initial assumption that you could submit all sorts of models as RData files in R.

For tonight, could you please make sure that load_model and save_model refer to the correct model file name, and also update the predict.R file you are submitting?

I will update the templates later to include the same flexibility as the Python templates :slight_smile:
Sorry for the confusion.

I see, thanks for the info. Unfortunately, I don’t think .RData works for h2o model objects (there is a Stack Overflow thread on this).

What I’m confused about is that my load_model function was adjusted to work with h2o:

load_model <- function(model_path){
  # Load a saved H2O MOJO model from `model_path`.
  #
  # This is called by the server to evaluate your submission on hidden data.
  # Only modify this *if* you modified save_model.

  #load(model_path)
  model <- h2o.import_mojo(model_path)
  return(model)
}

In my last submission (113883), I tried the zip file submission and changed predict.R to use the path to my model, model_output_path = 'model_1/GBM_model_R_1610224620766_5.zip', instead of the hard-coded value. Yet the error says that file doesn’t exist, even though I can see it in my environment.

I guess I’m not following what’s needed.

That’s quite strange indeed.

I’ve just downloaded your submission (#113883) and the saved model file is not there for some reason.

Are you sure that you had included the saved model object?
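A missing file inside the submitted zip produces exactly this kind of "does not exist" error, even when the file is visible in the notebook environment. A minimal base-R sketch, assuming you want load_model to fail with a clearer message: check the path first and list what was actually packaged next to it. `check_model_file` is a hypothetical helper, not part of the template.

```r
# Hypothetical helper: fail early, with a directory listing, if the saved
# model file was not packaged with the submission.
check_model_file <- function(model_path) {
  if (!file.exists(model_path)) {
    present <- list.files(dirname(model_path), recursive = TRUE)
    stop(sprintf(
      "Model file '%s' not found. Files under '%s': %s",
      model_path, dirname(model_path),
      if (length(present)) paste(present, collapse = ", ") else "(none)"
    ))
  }
  # Return an absolute path so later calls do not depend on the working directory
  normalizePath(model_path)
}
```

In load_model it could be used as `model <- h2o.import_mojo(check_model_file(model_path))`, assuming h2o has already been initialized in global_imports.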

Thank you, that helped! Unfortunately another issue has come up in the latest submission (113901):

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  unused argument (optional = TRUE)
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  Test/Validation dataset column 'drv_sex2' has levels not trained on: ["0"]

I don’t see read.table being used anywhere. I also don’t see where drv_sex2 would be getting a value of ‘0’ from.

Hi @hutch3232

That error would be from us loading the data and passing it to your model.

I believe that column is set to zero when there is no second driver.

If you handle that value in your preprocessing function, it should solve the issue.

(Unless I haven’t understood what you are mentioning?)

I see, thank you. Those were NAs in the training data (as far as I can see), but handling them in preprocessing worked!
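A minimal sketch of that preprocessing fix, assuming drv_sex2 exists and arrives as a character, factor, or numeric column, and that the no-second-driver case was NA in training (as described above): recode "0" to NA so the evaluation data only contains levels the model has seen. The helper name `recode_drv_sex2` is hypothetical; in the template this logic would sit inside your preprocessing function, before converting to an H2O frame.

```r
# Hypothetical helper: map the "no second driver" code "0" to NA, matching
# how the column looked in the training data.
recode_drv_sex2 <- function(x_raw) {
  # NA entries stay FALSE here because of the !is.na() guard
  no_second_driver <- !is.na(x_raw$drv_sex2) & as.character(x_raw$drv_sex2) == "0"
  x_raw$drv_sex2[no_second_driver] <- NA
  if (is.factor(x_raw$drv_sex2)) {
    # Drop the now-unused "0" level so it is not carried into as.h2o()
    x_raw$drv_sex2 <- droplevels(x_raw$drv_sex2)
  }
  x_raw
}
```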

That read.table error, however, still seems to be preventing my submission (113909) from going through.

Hi @hutch3232, I’ve been struggling with H2O for a few days as well. My issue has been with the save_model and load_model parts. Could you share what worked for you, if anything?

Hi @nopenope, I finally got it to work just this morning! There are several things I had to do (note I’m using the .zip submission and R).

Here is the load_model function that works for me:

load_model <- function(model_path){
  # Load a saved H2O MOJO model from `model_path`.
  #
  # This is called by the server to evaluate your submission on hidden data.
  # Only modify this *if* you modified save_model.

  #load(model_path)
  model <- h2o.import_mojo(model_path)
  return(model)
}

I’m also using this save_model function. I think it works, but I’m not sure whether it’s necessary with the zip submission method:

save_model <- function(model, output_path){
  # Saves this trained model to a file.
  #
  # This is used to save the model after training, so that it can be used for prediction later.
  #
  # Do not touch this unless necessary (if you need specific features). If you do, do not
  # forget to update the load_model method to be compatible.
  #
  # Saves an H2O MOJO into the directory `output_path`. h2o.save_mojo() returns the
  # full path of the saved file, which is kept in the global MODEL_OUTPUT_PATH.

  #save(model, file=output_path)
  MODEL_OUTPUT_PATH <<- h2o.save_mojo(model, path = output_path, force = TRUE)
  print(MODEL_OUTPUT_PATH)
}

In preprocess_X_data, make sure you’re converting x_raw to an H2O frame (e.g. with as.h2o).

In fit_model I needed to use this:

 y_raw <- as.h2o(y_raw)
 train <- h2o.cbind(x_clean, y_raw)

In predict.R, change model_output_path from the hard-coded value to the path of the MOJO file you’ve saved in your submission folder.

Last step: in predict.R, you need to convert the H2O frame to a data.frame before writing the table. I used this approach:

if(Sys.getenv('WEEKLY_EVALUATION', 'false') == 'true') {
  prices = predict_premium(trained_model, Xraw)
  prices = as.data.frame(prices)
  write.table(x = prices, file = output_claims_file, row.names = FALSE, col.names=FALSE, sep = ",")
} else {
  claims = predict_expected_claim(trained_model, Xraw)
  claims = as.data.frame(claims)
  write.table(x = claims, file = output_prices_file, row.names = FALSE, col.names=FALSE, sep = ",")
}

Thanks @hutch3232! As it happens, I had my first successful submission the minute I posted my question, but thank you nonetheless! I think I’m doing most of the above; my problem was with Colab and my local H2O version. Zip uploads are working well. Good luck!


Just for future reference, I have now updated the notebook template as well as the predict.R file that gets attached to the notebook to be more flexible in this regard :+1:


Finally, adding this here: Colab notebooks now fully support h2o submissions.

  1. Make sure to initialise h2o inside the global_imports function
  2. Make sure to save your model inside the saved_objects directory that is now created upon running the cell below “Prepare the notebook”
  3. Adapt your load_model to load up the models from the saved_objects directory.
  4. Lastly, it would help to specify the h2o version in your package installation. If you don’t, we just run install.packages("h2o"), which installs the latest stable release from CRAN.

That should work! :rocket:
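For steps 2 and 3, load_model needs the path of whatever was saved under saved_objects, and the MOJO file name is auto-generated (e.g. GBM_model_R_1610224620766_5.zip earlier in this thread). A base-R sketch, assuming exactly one MOJO .zip is saved in that directory; `find_mojo` is a hypothetical helper name:

```r
# Hypothetical helper: return the single MOJO zip under saved_objects/,
# so load_model does not need the auto-generated model name hard-coded.
find_mojo <- function(dir = "saved_objects") {
  zips <- list.files(dir, pattern = "\\.zip$", full.names = TRUE, recursive = TRUE)
  if (length(zips) != 1) {
    stop(sprintf("Expected exactly one .zip in '%s', found %d", dir, length(zips)))
  }
  zips
}
```

A load_model built on it might then be `load_model <- function() h2o.import_mojo(find_mojo())`, again assuming h2o.init() has already run in global_imports.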

Update:
After a month of losing huge amounts of money in this game, I finally realized there is a critical bug in my code above: I accidentally swapped output_claims_file and output_prices_file.

The correct code should have been:

if(Sys.getenv('WEEKLY_EVALUATION', 'false') == 'true') {
  prices = predict_premium(trained_model, Xraw)
  prices = as.data.frame(prices)
  write.table(x = prices, file = output_prices_file, row.names = FALSE, col.names=FALSE, sep = ",")
} else {
  claims = predict_expected_claim(trained_model, Xraw)
  claims = as.data.frame(claims)
  write.table(x = claims, file = output_claims_file, row.names = FALSE, col.names=FALSE, sep = ",")
}
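One way to make this swap harder to reintroduce is to choose the prediction function and the output file in the same branch, then share a single write step. A minimal base-R sketch; `write_outputs` is a hypothetical helper, and the two prediction functions are passed in as arguments only so the sketch can be exercised without a trained model.

```r
# Hypothetical helper: each evaluation mode picks its prediction function and
# its output file together, then both modes share one write step.
write_outputs <- function(model, x_raw, prices_file, claims_file,
                          predict_premium, predict_expected_claim,
                          weekly = Sys.getenv("WEEKLY_EVALUATION", "false") == "true") {
  if (weekly) {
    preds <- predict_premium(model, x_raw)
    out_file <- prices_file
  } else {
    preds <- predict_expected_claim(model, x_raw)
    out_file <- claims_file
  }
  # as.data.frame() also converts an H2OFrame when the h2o package is loaded
  write.table(x = as.data.frame(preds), file = out_file,
              row.names = FALSE, col.names = FALSE, sep = ",")
  invisible(out_file)
}
```

In predict.R this would be called as `write_outputs(trained_model, Xraw, output_prices_file, output_claims_file, predict_premium, predict_expected_claim)`.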

My sincere apologies if anyone used this and also did not catch the error. If you did catch the error and didn’t tell me… :cry:

Hi @hutch3232

Don’t worry about this one :slight_smile:

In fact, the backend corrects for this. The server-side code doesn’t care what the file names inside the script are: those names (e.g. output_prices_file) are rewritten based on the WEEKLY_EVALUATION flag and don’t depend on the file names used in the predict.R script.

They are only included for your benefit, and I believe we caught this mistake a while ago and updated the scripts. But to reiterate, this is not how your scripts were ever used, so in the previous leaderboards your correct prices were competing in the markets :chart:


Thanks for the info, that’s great to know! Unfortunately that just means either my model or pricing strategy (or both) sucks lol
