How to use this notebook 📝¶
- Copy the notebook. This is a shared template, and any edits you make here will not be saved. Copy it into your own Drive folder: click the "File" menu (top-left), then "Save a Copy in Drive". You can edit your copy however you like.
- Link it to your AICrowd account. In order to submit your code to AICrowd, you need to provide your account's API key (see "Configure static variables" for details).
- Stick to the function definitions. The submission to AICrowd will look for the pre-defined function names:
  - install_packages
  - fit_model
  - save_model
  - load_model
  - predict_expected_claim
  - predict_premium
  - preprocess_X_data

  Anything else you write outside of these functions will not be part of the final submission (including constants and utility functions), so make sure everything is defined within them, except for:
Your pricing model 🕵️¶
In this notebook, you can play with the data, and define and train your pricing model. You can then directly submit it to the AICrowd server, with some magic code at the end.
Prepare the notebook 🛠¶
cat(system('curl -sL https://gitlab.aicrowd.com/jyotish/pricing-game-notebook-scripts/raw/r-functions/r/setup.sh > setup.sh && bash setup.sh', intern=TRUE), sep='\n')
source("aicrowd_helpers.R")
⚙️ Installing AIcrowd utilities... ✅ Installed AIcrowd utilities 💾 Downloading training data... ✅ Downloaded training data
Configure static variables 📎¶
In order to submit using this notebook, you must visit this URL https://aicrowd.com/participants/me and copy your API key.
Then set the value of AICROWD_API_KEY to that key.
TRAINING_DATA_PATH = 'training.csv'
MODEL_OUTPUT_PATH = 'trained_model.RData' # Alter if not using .RData files
AICROWD_API_KEY = '' # You can get the key from https://aicrowd.com/participants/me
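If you would rather not paste the key into a notebook you might share, one option is to read it from an environment variable. This is a minimal sketch, not part of the official template: the helper read_api_key and the choice of environment variable name are assumptions made for illustration.

```r
# Hypothetical helper: read the API key from an environment variable,
# falling back to an empty string (the template's default) when it is unset.
read_api_key <- function(var_name = "AICROWD_API_KEY") {
  key <- Sys.getenv(var_name, unset = "")
  if (!nzchar(key)) {
    warning("No API key found in ", var_name, "; the submission step may fail to log in.")
  }
  key
}

# Usage: set the variable before starting R, then replace the hard-coded value with
# AICROWD_API_KEY = read_api_key()
```

Keeping the key out of the notebook text means a copy of the notebook can be shared without exposing it.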
Download dataset files 💾¶
download_aicrowd_dataset(AICROWD_API_KEY)
Packages 🗃¶
Install and require here all the packages you need to define your model.
Note: Installing packages the first time might take some time.
install_packages <- function() {
# install.packages("caret")
# install.packages("rpart")
}
install_packages()
global_imports <- function() {
# require("caret")
# require("rpart")
}
global_imports()
NULL
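Since installing packages can be slow, it may be worth installing only those that are missing. The sketch below is one possible way to do that; install_packages_example is a hypothetical name, and the package list simply reuses the two packages mentioned in the commented-out lines above.

```r
# Hypothetical variant of install_packages: install only what is missing.
install_packages_example <- function(pkgs = c("caret", "rpart")) {
  # requireNamespace() returns FALSE (without an error) for packages that are not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)  # names of the packages that had to be installed
}
```

In a submission, the same logic would have to live inside install_packages itself, since only the pre-defined functions are picked up.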
Loading the data 📲¶
# Load the dataset.
train_data = read.csv(TRAINING_DATA_PATH)
# Create a model, train it, then save it.
Xdata = within(train_data, rm('claim_amount'))
ydata = train_data['claim_amount']
What does the data look like? 🔍¶
as.matrix(head(Xdata, 4))
as.matrix(head(ydata, 4))
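Beyond the first rows, it can help to check how skewed the target is, since most contracts have no claim. The sketch below uses a small made-up vector in place of ydata so that it runs on its own; claim_summary is a hypothetical helper, not part of the template.

```r
# Hypothetical helper: summarise a set of claim amounts.
claim_summary <- function(claims) {
  claims <- unlist(claims)  # accepts a vector or a one-column data frame
  c(
    n             = length(claims),
    share_zero    = mean(claims == 0),        # fraction of contracts with no claim
    mean_claim    = mean(claims),             # average over all contracts
    mean_positive = mean(claims[claims > 0])  # average over contracts with a claim
  )
}

toy_claims <- c(0, 0, 0, 250, 0, 0, 1200, 0, 0, 0)  # made-up values, not real data
claim_summary(toy_claims)
```

In the notebook, claim_summary(ydata) would give the same figures for the real training claims.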
Training the model 🚀¶
You must first define your first function: fit_model. This function takes training data as arguments and outputs a "model" object that you define as you wish. For instance, this could be an array of parameter values.
You may want to define the function preprocess_X_data that prepares and cleans your predictor variables for training and prediction.
Define your data preprocessing¶
You can add any class or function in this cell for preprocessing. Just make sure that you use the functions here in the fit_model, predict_expected_claim and predict_premium functions if necessary.
preprocess_X_data <- function (x_raw){
# Data preprocessing function: given X_raw, clean the data for training or prediction.
# Parameters
# ----------
# X_raw : Dataframe, with the columns described in the data dictionary.
# Each row is a different contract. This data has not been processed.
# Returns
# -------
# A cleaned / preprocessed version of the dataset
# YOUR CODE HERE ------------------------------------------------------
# ---------------------------------------------------------------------
return(x_raw) # change this to return the cleaned data
}
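As an illustration of what such a function might contain, here is a minimal sketch that relies only on column types, since the actual column names come from the data dictionary. The name preprocess_sketch and the choices of median imputation and factor conversion are assumptions, not the template's method.

```r
# Hypothetical preprocessing sketch; it depends on column types, not on column names.
preprocess_sketch <- function(x_raw) {
  x_clean <- x_raw
  for (col in names(x_clean)) {
    values <- x_clean[[col]]
    if (is.numeric(values)) {
      # Replace missing numeric values by the column median
      values[is.na(values)] <- median(values, na.rm = TRUE)
    } else if (is.character(values)) {
      # Turn text columns into factors so modelling functions treat them as categories
      values <- factor(values)
    }
    x_clean[[col]] <- values
  }
  x_clean
}

toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", "x"), stringsAsFactors = FALSE)
preprocess_sketch(toy)
```

Note that this version recomputes medians and factor levels on whatever data it receives. For consistent behaviour between training and prediction, those values would need to be computed once on the training data and stored with the model.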
Define the training logic¶
fit_model <- function (x_raw, y_raw){
# Model training function: given training data (X_raw, y_raw), train this pricing model.
# Parameters
# ----------
# X_raw : Dataframe, with the columns described in the data dictionary.
# Each row is a different contract. This data has not been processed.
# y_raw : an array, with the value of the claims, in the same order as contracts in X_raw.
# A one-dimensional array, with values either 0 (most entries) or >0.
# Returns
# -------
# trained_model: the fitted model object. You define what it contains.
# This function trains your model and returns the trained model.
# YOUR CODE HERE ------------------------------------------------------
# x_clean = preprocess_X_data(x_raw) # preprocess your data before fitting
trained_model = lm(unlist(y_raw) ~ 1) # toy intercept-only linear model, fitted on the y_raw argument
# ---------------------------------------------------------------------
# The result trained_model is something that you will save in the next section
return(trained_model)
}
model = fit_model(Xdata, ydata)
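The toy model in the template is intercept-only, so every contract receives the same prediction: the mean of the training claims. A quick check on made-up numbers, offered as an illustration only (toy_y, toy_fit and toy_new are hypothetical names):

```r
toy_y <- c(0, 0, 300, 0, 100)  # made-up claim amounts
toy_fit <- lm(toy_y ~ 1)       # intercept-only linear model

# The single coefficient is the sample mean of the response
unname(coef(toy_fit))
mean(toy_y)

# predict() returns that same value for every row of new data
toy_new <- data.frame(id = 1:3)
toy_pred <- predict(toy_fit, newdata = toy_new)
toy_pred
```

Any model that uses the predictors at all should therefore be able to improve on this baseline.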
Saving your model¶
You can save your model to a file here, so you don't need to retrain it every time.
save_model <- function(model, output_path){
# Saves this trained model to a file.
# This is used to save the model after training, so that it can be used for prediction later.
# Do not touch this unless necessary (if you need specific features). If you do, do not
# forget to update the load_model method to be compatible.
# Save in `trained_model.RData`.
save(model, file=output_path)
}
save_model(model, MODEL_OUTPUT_PATH)
If you need to load it from file, you can use this code:
load_model <- function(model_path){
# Load a saved trained model from the file `trained_model.RData`.
# This is called by the server to evaluate your submission on hidden data.
# Only modify this *if* you modified save_model.
load(model_path)
return(model)
}
model = load_model(MODEL_OUTPUT_PATH)
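Because load() restores objects under the names they were saved with, load_model only works if the object was saved under the name model. A sketch of a round trip that does not depend on the saved name is shown below. The temporary file and R's built-in cars dataset are illustration choices, not part of the template.

```r
toy_model <- lm(dist ~ speed, data = cars)  # stand-in for a trained model
tmp_path <- tempfile(fileext = ".RData")

save(toy_model, file = tmp_path)

# load() returns the names of the restored objects; loading into a fresh
# environment avoids overwriting anything in the workspace.
load_env <- new.env()
loaded_names <- load(tmp_path, envir = load_env)
restored <- get(loaded_names[1], envir = load_env)

# The restored model has the same coefficients as the original
all.equal(coef(restored), coef(toy_model))
```

If you change how the model is saved, the same kind of round-trip check is a cheap way to confirm that save_model and load_model still agree.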
Predicting the claims 💵¶
The second function, predict_expected_claim, takes your trained model and a dataframe of contracts, and outputs a prediction for the (expected) claim incurred by each contract. This expected claim can be seen as the probability of an accident multiplied by the cost of that accident.
This is the function used to compute the RMSE leaderboard, where the model best able to predict claims wins.
predict_expected_claim <- function(model, x_raw){
# Model prediction function: predicts the average claim based on the pricing model.
# This function estimates the expected claim made by a contract (typically, as the product
# of the probability of having a claim and the average cost of a claim if it occurs),
# for each contract in the dataset X_raw.
# This is the function used in the RMSE leaderboard, and hence the output should be as close
# as possible to the expected cost of a contract.
# Parameters
# ----------
# X_raw : Dataframe, with the columns described in the data dictionary.
# Each row is a different contract. This data has not been processed.
# Returns
# -------
# avg_claims: a one-dimensional array of the same length as X_raw, with one
# average claim per contract (in same order). These average claims must be POSITIVE (>0).
# YOUR CODE HERE ------------------------------------------------------
# x_clean = preprocess_X_data(x_raw) # preprocess your data before predicting
expected_claims = predict(model, newdata = x_raw) # tweak this to work with your model
return(expected_claims)
}
claims <- predict_expected_claim(model, Xdata)
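The "probability times cost" view of the expected claim can be turned into a two-part model. The sketch below is one possible approach, not the template's method: it fits a logistic regression for whether a claim occurs and a Gamma regression for its size, then multiplies the two predictions. The data is simulated, and driver_age is a hypothetical predictor chosen for illustration.

```r
set.seed(1)
n <- 2000
toy <- data.frame(driver_age = runif(n, 18, 80))       # hypothetical predictor
p_true <- plogis(-1.5 - 0.02 * (toy$driver_age - 18))  # made-up claim probability
toy$has_claim <- rbinom(n, 1, p_true)
toy$claim_amount <- toy$has_claim * rgamma(n, shape = 2, rate = 2 / 1000)

# Frequency: probability that a contract has a claim
freq_fit <- glm(has_claim ~ driver_age, family = binomial, data = toy)

# Severity: expected claim size, fitted only on contracts with a claim
sev_fit <- glm(claim_amount ~ driver_age, family = Gamma(link = "log"),
               data = toy[toy$has_claim == 1, ])

p_hat <- predict(freq_fit, newdata = toy, type = "response")
sev_hat <- predict(sev_fit, newdata = toy, type = "response")
expected_claims <- p_hat * sev_hat  # both factors are positive, so the product is too

summary(expected_claims)
```

With the logit and log links, both predictions are strictly positive, which fits the requirement that predicted claims be greater than 0.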
Pricing contracts 💰¶
The third and final function, predict_premium, takes your trained model and a dataframe of contracts, and outputs a price for each of these contracts. You are free to set these prices however you want! These prices will then be used in competition with other models: contracts will choose the model offering the lowest price, and this model will have to pay the cost if an accident occurs.
This is the function used to compute the profit leaderboard: your model will participate in many markets of size 10, populated by other participants' models, and we compute the average profit of your model over all the markets it participated in.
predict_premium <- function(model, x_raw){
# Model prediction function: predicts premiums based on the pricing model.
# This function outputs the prices that will be offered to the contracts in X_raw.
# premium will typically depend on the average claim predicted in
# predict_expected_claim, and will add some pricing strategy on top.
# This is the function used in the average profit leaderboard. Prices output here will
# be used in competition with other models, so feel free to use a pricing strategy.
# Parameters
# ----------
# X_raw : Dataframe, with the columns described in the data dictionary.
# Each row is a different contract. This data has not been processed.
# Returns
# -------
# prices: a one-dimensional array of the same length as X_raw, with one
# price per contract (in same order). These prices must be POSITIVE (>0).
# YOUR CODE HERE ------------------------------------------------------
# x_clean = preprocess_X_data(x_raw) # preprocess your data before predicting
return(predict_expected_claim(model, x_raw))
}
prices <- predict_premium(model, Xdata)
as.matrix(head(prices))
 | |
---|---|
1 | 114.1812 |
2 | 114.1812 |
3 | 114.1812 |
4 | 114.1812 |
5 | 114.1812 |
6 | 114.1812 |
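The market mechanism described in this section (each contract goes to the cheapest model, which then pays any claim) can be sketched with a small simulation. Everything here is made up for illustration: the claims, the ten competing price vectors, and all variable names. It represents a single market, whereas the leaderboard averages over many.

```r
set.seed(42)
n_contracts <- 1000
n_models <- 10  # markets of size 10, as in the text

# Made-up claims: most contracts have none
sim_claims <- rbinom(n_contracts, 1, 0.1) * rgamma(n_contracts, shape = 2, rate = 2 / 1000)

# Made-up prices: each model quotes around the expected claim (0.1 * 1000 = 100)
sim_prices <- matrix(100 * runif(n_contracts * n_models, 0.9, 1.3),
                     nrow = n_contracts, ncol = n_models)

# Each contract goes to the model offering the lowest price
winner <- apply(sim_prices, 1, which.min)
won_price <- sim_prices[cbind(seq_len(n_contracts), winner)]

# Profit per model: premiums collected minus claims paid on the contracts it won
profit <- vapply(seq_len(n_models),
                 function(m) sum(won_price[winner == m] - sim_claims[winner == m]),
                 numeric(1))
round(profit)
```

A model only earns premiums on the contracts it wins, so quoting low prices increases volume but also the share of claims it has to pay.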
Profit on training data¶
In order for your model to be considered in the profit competition, it needs to make nonnegative profit over its training set. You can check that your model satisfies this condition below:
print(paste('Income:', sum(prices)))
print(paste('Losses:', sum(ydata)))
if (sum(prices) < sum(ydata)) {
print('Your model loses money on the training data! It does not satisfy market rule 1: Non-negative training profit.')
print('This model will be disqualified from the weekly profit leaderboard, but can be submitted for educational purposes to the RMSE leaderboard.')
} else {
print('Your model passes the non-negative training profit test!')
}
[1] "Income: 26057988.08"
[1] "Losses: 26057988.08"
[1] "Your model is invalid: it loses money on its training data!"
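A model that prices exactly at its expected claims sits on the break-even boundary, where tiny numerical differences can decide the test. One way to move clear of it is a multiplicative loading with a positive floor. The sketch below is an illustration only: apply_loading, the 10% loading and the floor of 1 are arbitrary assumptions, not recommended values.

```r
# Hypothetical helper: turn expected claims into premiums.
apply_loading <- function(expected_claims, loading = 0.10, floor_price = 1) {
  stopifnot(loading >= 0, floor_price > 0)
  premiums <- expected_claims * (1 + loading)
  pmax(premiums, floor_price)  # enforce strictly positive prices
}

toy_expected <- c(80, 80, 80, 80, 80)  # made-up expected claims
toy_losses <- c(0, 0, 300, 0, 100)     # made-up observed claims (sum = 400)
toy_premiums <- apply_loading(toy_expected)

# Income (about 440) now exceeds losses (400)
sum(toy_premiums) >= sum(toy_losses)
```

Inside predict_premium, the same idea would amount to returning the loaded value of predict_expected_claim(model, x_raw) rather than the raw prediction.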
Ready? Submit to AIcrowd 🚀¶
If you are satisfied with your code, run the code below to send your code to the AICrowd servers for evaluation! This requires the variable trained_model to be defined by your previous code.
Make sure you have included all packages needed to run your code in the "Packages" section.
aicrowd_submit(AICROWD_API_KEY)
Warning message in system("curl -sL https://gitlab.aicrowd.com/jyotish/pricing-game-notebook-scripts/raw/master/r/submit.sh > submit.sh && bash submit.sh", : “running command 'curl -sL https://gitlab.aicrowd.com/jyotish/pricing-game-notebook-scripts/raw/master/r/submit.sh > submit.sh && bash submit.sh' had status 1”
🚀 Preparing to submit... ⚙️ Collecting the submission code... 💾 Preparing the submission zip file... 🚫 Failed to login to aicrowd 😢