@jyotish: Can we basically have a custom base image where all the common dependencies are already installed?
It will hopefully both reduce the image build times and help us avoid a lot of back and forth on dependency-related issues.
We are already preparing a base image for R that has a large number of packages pre-installed. The package installation is taking very long. We will make an announcement as soon as things are ready, along with the pre-installed package list.
These packages will be pre-installed for R based submissions: Packages available in base environment for R
Let us know if you think we need to add any other package.
Maybe I missed something, but how about e.g. data.table, dplyr?
Guessing you intentionally left them out, not seeing them as “base image”?
Thanks
Hello @mangoloco69
They were not left out intentionally. We basically installed the packages from https://cran.r-project.org/web/views/MachineLearning.html.
You can still specify the packages you want to install
- In your install.R file if you are creating the zip files, or
- In the install_packages function if you are using the Colab notebook.

dplyr and data.table seem to take only a few seconds to install on top of the base image. Can you try including them from your side?
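For example (a minimal sketch of an install.R; add whatever packages you need):
# install.R -- sketch: install extra CRAN packages on top of the base image
install_packages <- function() {
  install.packages("dplyr")
  install.packages("data.table")
}
install_packages()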
I will include them from my side, no problem. Heavy packages are more important from your side, indeed. Thanks
@jyotish I’ve been unsuccessful giving another shot at the R zip submission following the update at:
It resulted in the same error message, although running test.bat works fine.
The only thing I could note was that I had to add the line
set WEEKLY_EVALUATION=false
prior to the first call to predict in order to generate the claims.csv file.
Otherwise, I can't see what could be wrong. Packages are installed in install.R and loaded where indicated in model.R (I even tried to load them within the fit functions, which was needed in the Colab submissions - still no success). Have you performed a test on the template to validate that it works when using packages? It would be useful to get a working example of scripts that use packages.
@jeremiedb one idea that may be worth trying… I downloaded one of my colab submissions and noticed I got a zip file that seems to look very similar to the format that the zip file submission needs to be in.
Made me wonder if you can then successfully submit this zip file? If yes, can you, through inspection, work out what the contents of the zip need to be like to create a successful zip file submission?
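(If it helps, you can list the contents of the downloaded zip straight from R; the file name below is just whatever you saved it as:)
# Inspect the layout of a downloaded Colab submission.
unzip("submission.zip", list = TRUE)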
Good hint!
I finally managed to get an R zip submission to work.
I'm unsure which moving part was critical, but having all of the following does seem to work:
config.json
{"language": "r"}
install.R
install_packages <- function() {
  install.packages("data.table")
  install.packages("xgboost")
}
install_packages()
And then have the various functions split into separate files, as listed in the source calls at the top of the predict.R file:
source("fit_model.R") # Load your code.
source("load_model.R")
source("predict_expected_claim.R")
source("predict_premium.R")
source("preprocess_X_data.R")
# This script expects sys.args arguments for (1) the dataset and (2) the output file.
output_dir = Sys.getenv('OUTPUTS_DIR', '.')
input_dataset = Sys.getenv('DATASET_PATH', 'training_data.csv') # The default value.
output_claims_file = paste(output_dir, 'claims.csv', sep = '/') # The file where the expected claims should be saved.
output_prices_file = paste(output_dir, 'prices.csv', sep = '/') # The file where the prices should be saved.
model_output_path = 'trained_model.RData'
args = commandArgs(trailingOnly=TRUE)
if(length(args) >= 1){
  input_dataset = args[1]
}
if(length(args) >= 2){
  output_claims_file = args[2]
}
if(length(args) >= 3){
  output_prices_file = args[3]
}
# Load the dataset.
# Remove the claim_amount column if it is in the dataset.
Xraw = read.csv(input_dataset)
if('claim_amount' %in% colnames(Xraw)){
  Xraw = within(Xraw, rm('claim_amount'))
}
# Load the saved model, and run it.
trained_model = load_model(model_output_path)
if(Sys.getenv('WEEKLY_EVALUATION', 'false') == 'true') {
  claims = predict_premium(trained_model, Xraw)
  write.table(x = claims, file = output_claims_file, row.names = FALSE, col.names = FALSE, sep = ",")
} else {
  prices = predict_expected_claim(trained_model, Xraw)
  write.table(x = prices, file = output_prices_file, row.names = FALSE, col.names = FALSE, sep = ",")
}
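For reference, a load_model.R compatible with the script above can be as small as the sketch below; it assumes the model was saved with save() under the name trained_model, so adjust it to however you actually persisted yours.
# load_model.R -- minimal sketch
load_model <- function(model_path) {
  env <- new.env()
  load(model_path, envir = env)   # restores the saved object(s) into env
  env$trained_model               # assumes the object was saved under this name
}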
I have a couple of questions if you guys haven't given up yet:
I’m not sure where to put the library() calls.
Also, my preprocessing involves a recipes::recipe() that wrangles the data and creates the dummy variables. I'd need to attach that 'trained recipe' to the submission, or re-train it every time from the original "Training.csv". Is it possible to attach more files to the submissions, like a my_trained_recipe.RData file?
cheers
For the zip submission, I went for the shotgun approach and likely loaded the libraries in too many places, but at least the following does work. I added require() / library() calls right at the beginning of each of the functions preprocess_X_data, predict_expected_claim, predict_premium and fit_model, for example:
preprocess_X_data <- function (x_raw){
  require("data.table")
  require("xgboost")
  ...
}
Note that for the zip submission, there is no need to include fit_model.R; it really seems like all that is necessary are the files invoked by predict.R (so the model is loaded directly from trained_model.RData).
Hello @nigel_carpenter
Indeed. When you submit via the Colab notebook, we are essentially creating a zip file and submitting that zip file to AIcrowd. For both Python and R based submissions, you can see a submission_dir directory and a submission.zip file on your Colab notebook. The submission.zip file is the exact file that is submitted.
Hello @jeremiedb
We did test the starter kit with a few packages installed. If you are still facing this issue, we would be happy to debug your submission.
Regarding including the packages, one starting point would be to load the library/packages when one of their functions is invoked in the subsequent steps.
However, if you are preparing the zip file yourself, you are free to organize your code as you like. If the test.bat file works for you locally, it is expected to work during the evaluation as well. Can you share the submission ID where the submission worked locally for you but failed during evaluation?
Hello @simon_coulombe
You can include any file that you want in your submission.zip file. However, you might want to check this post, Question about data, on including the training data during evaluation.
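For example, something along these lines should work for the trained recipe (a sketch only; the file and object names my_trained_recipe.RData / trained_recipe are illustrative, not part of the starter kit):
# At training time: prep the recipe once and save it so it can be shipped in the zip.
# trained_recipe <- recipes::prep(my_recipe, training = training_data)
# save(trained_recipe, file = "my_trained_recipe.RData")
# At prediction time, e.g. inside preprocess_X_data():
preprocess_X_data <- function(x_raw) {
  require("recipes")
  env <- new.env()
  load("my_trained_recipe.RData", envir = env)   # file included in the zip
  recipes::bake(env$trained_recipe, new_data = x_raw)
}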
@jeremiedb Yes, the fit_model function is not needed for evaluation and is optional to submit. But it would be good if you could include the code used for your training so that it becomes easier for us to validate the submissions some time later.
Ideally, the fit_model function should be submitted along with your prediction code for our validation.
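For illustration, the training side could be as simple as the sketch below; the function arguments, the xgboost call and the file name are placeholders for whatever your actual training code does.
# fit_model.R -- illustrative sketch only
fit_model <- function(x_clean, y_raw) {
  require("xgboost")
  # Train on the preprocessed design matrix; parameters are placeholders.
  xgboost(data = as.matrix(x_clean), label = y_raw, nrounds = 100, objective = "reg:squarederror")
}
# Run locally (not during evaluation) to produce the file that predict.R loads:
# trained_model <- fit_model(preprocess_X_data(x_raw), y_raw)
# save(trained_model, file = "trained_model.RData")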
Thanks @jyotish. Following the above steps, the zip file submission now works.
Something I noticed from downloading my Colab submission was that there seems to be a reversal between claims and prices at the end of the predict.R file:
claims = predict_premium(trained_model, Xraw)
Not sure if this is a problem I introduced or if it relates to the template linked to the submit.sh utility.
On another note, the leaderboard seems to be based on my worst submission (id 110383) instead of the best or most recent one (110384, 110389).
Hello @jeremiedb
Thanks for pointing this out. It is a typo.
The important thing in the script is that we invoke the predict_premium function during the weekly leaderboard evaluation and predict_expected_claim for the RMSE leaderboard evaluation. We will fix the variable names.
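Concretely, the fix just un-swaps the variable and output-file names while keeping the function calls as they are, roughly:
if (Sys.getenv('WEEKLY_EVALUATION', 'false') == 'true') {
  prices = predict_premium(trained_model, Xraw)
  write.table(x = prices, file = output_prices_file, row.names = FALSE, col.names = FALSE, sep = ",")
} else {
  claims = predict_expected_claim(trained_model, Xraw)
  write.table(x = claims, file = output_claims_file, row.names = FALSE, col.names = FALSE, sep = ",")
}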
I thought that was rather cruel when I saw it! If only all the others that are rapidly catching my RMSE score could oblige and do the same. It would take the heat off me to find a better submission!
Hi @jeremiedb
We’ve fixed the leaderboard now!
And stay tuned for a few small tweaks to the starter-kit tomorrow that should make things much easier.
Ali
Please add CatBoost also.
Hi @harnagpal
We will look into this, but for now you should be able to install and use it in your model regardless, by filling out the install.R file for a zip submission or the install_packages function for a submission through the notebooks.
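For illustration, since the CatBoost R package is not on CRAN, an install.R along these lines is the usual route; the release tarball URL is a placeholder you would need to fill in from the CatBoost releases page.
# install.R -- sketch for installing CatBoost yourself (not on CRAN)
install_packages <- function() {
  install.packages("remotes")
  # Placeholder URL: use the R-package tarball for the release you want.
  remotes::install_url("<catboost-R-linux-release-tarball-url>")
}
install_packages()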