Training Model inside a function MASSIVELY bloats the Rdata file when saved

Hello all,

I don’t know where to turn to solve my issue but this is a real head scratcher. I’m 3 hours into this issue and I’m at a complete loss.

In the R zip kit, we have the fit_model function, which returns the trained_model.

 

Scenario A)
Whenever I train my model in “free code”, not nested in any function, the trained_model, when saved as Rdata, is only a few MB.

Scenario B)
When I train my model within the fit_model function and take the returned object, which is supposed to be the same as in Scenario A, the saved Rdata file takes more than a GB!

Same inputs, same code, same all, except one is in free code, the other is within a function.

I ran both scenarios in fresh, empty RStudio sessions.

 

I have nooooo idea what’s going on. My guess is that because it’s being trained within a function, it returns much more than solely the trained_model?

 

However, in both scenarios, the trained_model object shows up in the Global Environment, under Data, as “Large train (23 elements, 6.2 MB)”.
It’s only when I save to Rdata that there’s a massive difference (>1 GB).

Any help would be greatly appreciated!
Thanks!

EDIT: found this post which details how an object can have Environment items attached to it.
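
Here is a minimal sketch of what that can look like, assuming the bloat comes from a formula (or closure) inside the model object holding on to the environment of the function it was created in. The names fit_in_function and serialized_size are made up for illustration and are not part of the kit:

```r
# Hypothetical stand-in for a training function: it creates a large
# temporary object and returns a small object that contains a formula.
fit_in_function <- function() {
  big_temp <- rnorm(1e6)   # roughly 8 MB local variable, never returned
  list(formula = y ~ x)    # the formula keeps a reference to this function's environment
}

# The same small object built at top level ("free code").
fit_top_level <- list(formula = y ~ x)

fit_nested <- fit_in_function()

# Number of bytes save()/saveRDS() would have to write, before compression.
serialized_size <- function(obj) length(serialize(obj, connection = NULL))

object.size(fit_nested)         # small: object.size() does not count environments
serialized_size(fit_nested)     # large: big_temp is serialized along with the formula
serialized_size(fit_top_level)  # small: the global environment is stored only as a reference
```

This would be consistent with both objects showing the same size in the Environment pane while the saved files differ, but I have not confirmed it is what happens with the actual trained_model.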

Hi @michael_bordeleau

That is quite strange indeed :thinking: I’m surprised that this happens in R (or any language)

Though I can tell you this: if the size is the only issue here, you don’t need to use the fit_model function. However, you must place exactly the same code you use to train your model into that function, since we may reproduce it on our side and check the outputs.

So if the size alone is what is prohibitive because of this setup then:

  1. Go ahead and save it in a way that can be loaded by your load_model function
  2. Make sure you copy the training code exactly as it would be in the fit_model function so that if we run it we can reproduce your prices
  3. Make sure that the predict_premium and predict_expected_claim functions work using the output from the load_model function.

If the above 3 are satisfied then your submission should be successful and eligible.
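
A rough way to sanity-check steps 1 and 3 locally is to save the model, reload it, and confirm that predictions match. This is only a sketch: it uses a toy lm model, saveRDS/readRDS, and a made-up file path rather than the kit's actual load_model, predict_premium, or predict_expected_claim functions, whose signatures are not shown here.

```r
# Stand-in model; the real model, data, and file name will differ.
train_data <- data.frame(x = 1:20, y = 3 * (1:20) + 1)
trained_model <- lm(y ~ x, data = train_data)

# Hypothetical location for the saved model.
model_path <- file.path(tempdir(), "trained_model.rds")
saveRDS(trained_model, model_path)
file.size(model_path)   # size on disk, in bytes

# Reload and check that predictions match the in-memory model.
reloaded <- readRDS(model_path)
new_data <- data.frame(x = c(21, 22))
all.equal(predict(trained_model, new_data), predict(reloaded, new_data))
```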

For my own curiosity, if you could post a submission ID here once you have a successful submission this way, that would be great. Let me know if this works :rocket:

Just one other thing to try: you could also run the train.R file we have provided as an R script outside of RStudio. That way we can see if it’s something RStudio is doing or if the issue is deeper :mag:

Thanks for the feedback!

For anyone else encountering this issue, after much research, I found a helpful post here.

Before attempting to modify the zip templates, I will try this fix and see if it works.
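
For reference, one workaround I have seen for this kind of problem (not necessarily the one from the post above) is to repoint the captured environment at the global environment before returning the object. A minimal sketch, assuming the culprit is a formula stored in the model; fit_in_function_clean and serialized_size are made-up names:

```r
# Number of bytes save()/saveRDS() would have to write, before compression.
serialized_size <- function(obj) length(serialize(obj, connection = NULL))

# Hypothetical training function that cleans up before returning.
fit_in_function_clean <- function() {
  big_temp <- rnorm(1e6)   # large local variable that should not be saved
  model <- list(formula = y ~ x)
  # Repoint the formula at the global environment so the function's
  # locals (including big_temp) are no longer reachable from the result.
  environment(model$formula) <- globalenv()
  model
}

fit_clean <- fit_in_function_clean()
serialized_size(fit_clean)   # small again: big_temp is no longer attached
```

A real fitted model may hold environments in several places (terms objects, stored closures), so more than one reference might need resetting, and anything that looks up variables in the original environment could break afterwards.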