Evaluation results vary even if no changes are made to the code

Sometimes, even when a submission is made without any changes to the code, the metrics vary drastically between evaluations. I am not able to figure out why. Could someone help me understand what might be causing this?

There’s inherent randomness in the training process; RL exacerbates this randomness.
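
To reduce the training-side variance, here is a minimal sketch of pinning the usual RNG sources, assuming a PyTorch-based setup; the actual starter kit may have additional sources of randomness (environment seeds, multi-process data workers, etc.):

```python
# Minimal sketch of seeding the common RNG sources; assumes a PyTorch-based
# setup. The real starter kit may have further sources of randomness.
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy and PyTorch RNGs so repeated runs start identically."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN may otherwise select non-deterministic kernels at runtime.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```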

We just pushed updates to the eval code and the starter kit, including how the metrics are calculated.

Yes, that’s right, that might be the case during training… but the submission I make uses a trained agent and only runs the evaluation. Would the metrics still vary even if we use a trained agent with frozen weights?
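
For what it’s worth, even with frozen weights the returns can vary if actions are sampled from the policy or the evaluation episodes are not seeded. Below is a minimal sketch of a fully deterministic evaluation loop, assuming a Gymnasium-style environment and a PyTorch policy that returns action logits; `policy` and `env` are placeholders for illustration, not the challenge’s actual interfaces:

```python
# Minimal sketch of a deterministic evaluation loop. `policy` and `env` are
# placeholders, and the reset/step signatures assume the newer Gymnasium API.
import numpy as np
import torch


def evaluate(policy, env, episodes: int = 10, seed: int = 0) -> float:
    """Greedy (argmax) rollouts on fixed per-episode seeds, so a frozen policy
    produces the same returns on every run."""
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)   # fix the environment's randomness
        done, total = False, 0.0
        while not done:
            with torch.no_grad():
                logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            action = int(torch.argmax(logits))  # greedy instead of sampling
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += float(reward)
        returns.append(total)
    return float(np.mean(returns))
```

If the hosted evaluation seeds its episodes differently on each run, some variance will remain even with a setup like this.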
