I had a few questions about how the agent will be evaluated:

  1. Will the 5 evaluation seeds have themes that we will not have seen? In other words, is this challenge closer to the weak or the strong generalization described in the paper?
  2. Does the evaluation environment use a sparse or a dense reward function?
  3. Is there a restriction on the number of submissions available to each participant per day?

Hi seungjaeryanlee,

For Round 1, we will be doing “weak generalization.” The environment uses dense rewards, and we are currently limiting participants to 100 submissions per round. However, we may adjust or lift this restriction depending on server load.