Baseline for DOTAW

Baseline for the challenge DOTAW

Open In Colab

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

Download data

In [ ]:
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_dotaw/data/public/test.zip
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_dotaw/data/public/train.zip
!unzip train.zip
!unzip test.zip

Load Data

In [3]:
train_data = pd.read_csv('train.csv')

Analyse Data

In [4]:
train_data.head()
Out[4]:
winner cluster_id game_mode game_type hero_0 hero_1 hero_2 hero_3 hero_4 hero_5 ... hero_103 hero_104 hero_105 hero_106 hero_107 hero_108 hero_109 hero_110 hero_111 hero_112
0 -1 223 2 2 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 1 152 2 2 0 0 0 1 0 -1 ... 0 0 0 0 0 0 0 0 0 0
2 1 131 2 2 0 0 0 1 0 -1 ... 0 0 0 0 0 0 0 0 0 0
3 1 154 2 2 0 0 0 0 0 0 ... -1 0 0 0 0 0 0 0 0 0
4 -1 171 2 3 0 0 0 0 0 -1 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 117 columns

In [5]:
train_data.describe()
Out[5]:
winner cluster_id game_mode game_type hero_0 hero_1 hero_2 hero_3 hero_4 hero_5 ... hero_103 hero_104 hero_105 hero_106 hero_107 hero_108 hero_109 hero_110 hero_111 hero_112
count 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000 ... 92650.000000 92650.000000 92650.000000 92650.000000 92650.0 92650.000000 92650.000000 92650.000000 92650.000000 92650.000000
mean 0.053038 175.864145 3.317572 2.384587 -0.001630 -0.000971 0.000691 -0.000799 -0.002008 0.003173 ... -0.001371 -0.000950 0.000885 0.000594 0.0 0.001025 0.000648 -0.000227 -0.000043 0.000896
std 0.998598 35.658214 2.633070 0.486833 0.402004 0.467672 0.165052 0.355393 0.329348 0.483950 ... 0.535024 0.206112 0.283985 0.155940 0.0 0.220703 0.204166 0.168707 0.189868 0.139033
min -1.000000 111.000000 1.000000 1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000 ... -1.000000 -1.000000 -1.000000 -1.000000 0.0 -1.000000 -1.000000 -1.000000 -1.000000 -1.000000
25% -1.000000 152.000000 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
50% 1.000000 156.000000 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
75% 1.000000 223.000000 2.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
max 1.000000 261.000000 9.000000 3.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 0.0 1.000000 1.000000 1.000000 1.000000 1.000000

8 rows × 117 columns

Split Data into Train and Validation

In [6]:
X = train_data.drop('winner',1)
y = train_data['winner']
# Validation testing
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

Define the Classifier and Train

In [7]:
classifier = LogisticRegression()
classifier.fit(X_train,y_train)
/home/gera/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
Out[7]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

Predict on Validation

In [9]:
y_pred = classifier.predict(X_val)
In [10]:
df = pd.DataFrame({'Actual': y_val, 'Predicted': y_pred})
df1 = df.head(25)
df1
Out[10]:
Actual Predicted
26389 1 -1
55196 -1 1
51250 -1 1
25508 1 -1
24128 1 -1
2442 -1 -1
5638 -1 -1
3714 -1 1
36579 -1 1
10399 -1 -1
13464 -1 -1
71600 -1 1
80162 1 -1
7077 1 1
63431 -1 1
78584 1 -1
31413 1 1
13393 1 1
90845 1 1
23339 -1 -1
13756 -1 1
63563 -1 -1
81880 -1 1
77591 -1 -1
23311 1 1

Evaluate the Performance

In [11]:
print('F1 score Score:', metrics.f1_score(y_val, y_pred))  
print('ROC AUC Score:', metrics.roc_auc_score(y_val, y_pred))
F1 score Error: 0.638888888888889
ROC AUC Error: 0.5928579002999843

Load Test Set

In [12]:
test_data = pd.read_csv('test.csv')

Predict Test Set

In [13]:
y_test = classifier.predict(test_data)
In [15]:
df = pd.DataFrame(y_test,columns=['winner'])
df.to_csv('submission.csv',index=False)

To participate in the challenge click here