Hello! Thank you for hosting this exciting challenge.
I had a few questions about how the agent will be evaluated:
- Will the 5 evaluation seeds have themes that we will not have seen beforehand? In other words, is this challenge closer to the weak or the strong generalization setting described in the paper?
- Does the evaluation environment use a sparse or a dense reward function?
- Is there a limit on the number of submissions each participant can make per day?
Thank you!