Sometimes, even when a submission is made without any changes to the code, the metrics vary drastically between evaluations. I can't figure out why. Could someone help me understand what might be causing this?
There’s inherent randomness in the training process; RL exacerbates this randomness.
We just pushed updates to the eval code and starter kit, including how the metrics are calculated.
Yes, that's right, it might be the case during training… but my submission uses a trained agent and only runs evaluation. Would that still happen even with a trained agent whose weights are frozen?
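It can: even with frozen weights, any stochasticity in the environment (or in action sampling) makes per-run metrics differ unless the evaluation seeds everything. A minimal toy sketch, assuming nothing about the competition's actual eval code (the environment and policy here are made up for illustration):

```python
import random

def evaluate(policy_value, seed=None, episodes=100):
    """Evaluate a fixed (frozen) policy in a toy stochastic environment.

    No learning happens here, yet the environment's random noise makes
    the average return vary run to run unless we fix the RNG seed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        # Toy dynamics: the return is the policy's value plus Gaussian noise.
        total += policy_value + rng.gauss(0, 1)
    return total / episodes

frozen_policy = 1.0  # stands in for a trained agent with frozen weights

# Unseeded runs generally differ from each other...
run_a = evaluate(frozen_policy)
run_b = evaluate(frozen_policy)

# ...while seeded runs are reproducible.
assert evaluate(frozen_policy, seed=42) == evaluate(frozen_policy, seed=42)
```

If the evaluation server does not pin environment seeds (or the agent samples actions stochastically at inference time), this kind of run-to-run variance would show up even for byte-identical submissions.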