Round 2 is open for submissions 🚀

Hello all!

Thank you for your participation and enthusiasm during Round 1! We are now accepting Round 2 submissions from the top 50 teams of Round 1.

Hardware available for evaluations

The evaluations will run on the following hardware (AWS SageMaker ml.p3.2xlarge instances):

vCPUs 8
GPU 16 GB Tesla V100

Note: The training and rollouts for each environment will run on a separate node.

Evaluation configuration

The configuration used during evaluations is available at FAQ: Round 1 evaluations configuration.


This round will run on six public (coinrun, bigfish, miner, plunder, starpilot, chaser) and four private environments.


The final score will be a weighted average of the mean normalized rewards, with the six public environments and the four private environments each contributing half of the total:

Score = \frac{1}{12}*R_{coinrun} + \frac{1}{12}*R_{bigfish} + \frac{1}{12}*R_{miner} + \frac{1}{12}*R_{chaser} + \frac{1}{12}*R_{starpilot} \\ + \frac{1}{12}*R_{plunder} + \frac{1}{8}*R_{privateEnv1} + \frac{1}{8}*R_{privateEnv2} + \frac{1}{8}*R_{privateEnv3} + \frac{1}{8}*R_{privateEnv4}

R_{env} = Mean normalized reward for env.
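
As a rough illustration of the formula above, here is a minimal Python sketch. The function name `final_score` and the dictionary layout are assumptions made for this example, not part of the evaluator; the keys `privateEnv1` to `privateEnv4` are simply the placeholders used in the formula.

```python
# Hypothetical helper: weights follow the Score formula above
# (1/12 per public env, 1/8 per private env).
PUBLIC_ENVS = ["coinrun", "bigfish", "miner", "chaser", "starpilot", "plunder"]
PRIVATE_ENVS = ["privateEnv1", "privateEnv2", "privateEnv3", "privateEnv4"]


def final_score(mean_normalized_rewards):
    """Combine per-env mean normalized rewards into the weighted final score."""
    public_part = sum(mean_normalized_rewards[env] for env in PUBLIC_ENVS) / 12
    private_part = sum(mean_normalized_rewards[env] for env in PRIVATE_ENVS) / 8
    return public_part + private_part


if __name__ == "__main__":
    # Example input with made-up values: every env at a normalized reward of 0.5.
    rewards = {env: 0.5 for env in PUBLIC_ENVS + PRIVATE_ENVS}
    print(final_score(rewards))
```

Since the weights sum to 1, a submission that reaches a normalized reward of 1 on every environment would score 1 overall.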


Will we be able to choose which submission to use for the final 16+4 evaluation? It might be the case that our best solution that was tested locally on 16 envs is not the same as the best one for the 6+4 envs on public LB.


Hi @karolisram,

We were thinking about picking top 3 submissions, but I think it makes sense to let participants pick 3 submissions themselves.

We will circulate a Google Form at the end of Round 2 for this (defaulting to the top 3 if a participant doesn't fill it in).



Sounds good, thanks @shivam. Could you please also give us the normalization factors for the 4 private envs (Rmin, Rmax)?

It has been updated in the starter kit, along with a few more changes for Round 2 (old forks will continue to work), here:


Hello @karolisram

As @shivam shared, the return_min, return_blind, and return_max for the public envs are available in the starter kit. Please refer to Min and Max rewards for an environment for information on how to use them in your code.


What about the generalization benchmark? Are we training with unlimited or only 200 episode seeds now?

Hello @jurgisp

In round 2, we will train on unlimited levels. Towards the end of round 2, we will run two tracks, one for generalization and one for sample efficiency on all 16 public envs and 4 private envs.


Hi @jyotish

What's the use of the blind reward?

Hi @jyotish
Can you tell me the score for the private environment? I need a score for gemjourney, hovercraft, safezone.

@Paseul I think the normalization ranges for the private envs can be obtained if you log the "return max" and "return blind" as custom metrics. I haven't tried it though.

Hi! I didn't receive any emails or notifications about Round 2 starting, does that mean I'm out of the competition?
Update: I still seem to be able to make submissions.

Hello @Paseul

These are the [return_blind, return_max] for the private envs.

'caterpillar': [8.25, 24],
'gemjourney': [1.1, 16],
'hovercraft': [0.2, 18],
'safezone': [0.2, 10],

The return_min for all the four envs is 0. Please note that return_blind is used as the minimum reward when normalizing the rewards.
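
For anyone who wants to reproduce the normalization locally, here is a minimal sketch. It assumes the usual min-max form with return_blind in place of the minimum, i.e. (R - return_blind) / (return_max - return_blind); the exact formula used by the evaluator is not spelled out in this thread, and the names `PRIVATE_ENV_RANGES` and `normalize_return` are made up for this example.

```python
# [return_blind, return_max] values as posted above.
PRIVATE_ENV_RANGES = {
    "caterpillar": (8.25, 24.0),
    "gemjourney": (1.1, 16.0),
    "hovercraft": (0.2, 18.0),
    "safezone": (0.2, 10.0),
}


def normalize_return(env_name, mean_return):
    """Assumed normalization: return_blind maps to 0 and return_max maps to 1."""
    return_blind, return_max = PRIVATE_ENV_RANGES[env_name]
    return (mean_return - return_blind) / (return_max - return_blind)


if __name__ == "__main__":
    # Made-up mean return, for illustration only.
    print(normalize_return("safezone", 5.1))
```

Under this assumption, a mean return below return_blind gives a negative normalized reward; whether the evaluator clips such values is not stated here.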

Can you give any more details on when the generalization track evaluations will start? Thanks.
