I have some more relatively straightforward questions about the submissions.
After reading the rules, I understand that it is perfectly OK to use operations research methods instead of reinforcement learning to find the solution. Is my understanding correct, please?
The time-step limit for each round is set to 1.5(height + width); however, from the starting kit it is not clear whether there is an actual time limit for each round’s computations. If there is, how is it calculated, or what is its general value, please?
Is it possible for participants to modify their run.py script? For example, say that I want to instantiate an object in the script that would do the decision making instead of the controller function: is this possible, please? The initial layout of the run.py script seems somewhat constraining to me.
Thank you very much for your answers and clarification.
I also have a lot of questions about the challenge regarding submissions.
- What hardware are you using for submission evaluation?
- How is the score on the leaderboard calculated?
- How will the scores from Round 1 and Round 2 be combined?
- What are the maximum possible world size and train count?
And it’s time to update the overview text, because right now a lot of useful information can only be found in the discussions.
I’m happy to answer your questions:
- Yes, you can use any algorithm you would like. Keep in mind that computational complexity will increase vastly in round 2, as we allow for different train speeds.
- Currently you have an 8-hour computational time limit to solve 1000 environments. Also, if your controller performs no action for 15 minutes, the submission scoring will be aborted. (@mohanty anything more to add here?) (Update from @mohanty: you will have access to 3 CPU cores and 8 GB of RAM. No GPUs are available at this point in time.)
- Yes, you can modify your run.py script. For example, you can pre-compute all the actions you want to perform (e.g. as a list) and have your controller just provide the appropriate actions to the environment at each step. The environment was built with reinforcement learning in mind, and thus an action is needed at each step. Therefore OR approaches need to do a little hacking to turn their results into lists of actions.
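The pre-computed action list idea could be sketched roughly as follows. This is only an illustration, not the starter kit's code: `PlannedController`, `plan`, and the integer action codes (including `DEFAULT_ACTION`) are hypothetical names and placeholders, and the real run.py loop and action enum should be checked in the starter kit.

```python
from typing import Dict, List

# Placeholder code for "do nothing"; an assumption, check the real action enum.
DEFAULT_ACTION = 0


class PlannedController:
    """Replays a precomputed plan: plan[agent_id][step] -> action code."""

    def __init__(self, plan: Dict[int, List[int]]) -> None:
        self.plan = plan
        self.step = 0

    def act(self) -> Dict[int, int]:
        """Return one action per agent for the current step, then advance."""
        actions = {
            agent: (seq[self.step] if self.step < len(seq) else DEFAULT_ACTION)
            for agent, seq in self.plan.items()
        }
        self.step += 1
        return actions


# A made-up plan, as an OR solver might produce it offline.
plan = {0: [2, 2, 1], 1: [2, 3]}
controller = PlannedController(plan)

# In a real submission each dict would be passed to the environment's step call.
trace = [controller.act() for _ in range(4)]
```

Once an agent's list is exhausted, the sketch falls back to the placeholder no-op, so the environment still receives an action at every step.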
Thanks for your questions. Here are the answers:
- Each submission has 4 CPUs and 16 GB of RAM, currently no GPU (@mohanty please update if we have changes here)
- The score is the mean percentage of agents that solved their own objective (arrived at their destination). E.g. if half the agents arrive at their target in time, the score is 0.5. Because several submissions might have the same score, we also compute the mean reward, which is simply the mean reward over all agents and all episodes. (reward = -1 for each step, reward = 0 for an agent at its target, reward = 1 for all agents if all agents reach their targets)
- The scores will not be combined. Only scores from round 2 will be considered for the prizes.
- For round 1, the max number of trains is set to 50, at a size of 100x100.
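A rough sketch of how the score and the tie-break described above might be computed. The submission names and numbers are made up, and treating a higher (less negative) mean reward as better is an assumption; the answer above only says that the mean reward is also computed.

```python
from statistics import mean
from typing import Dict, List, Tuple

Episode = Tuple[int, int]  # (agents arrived, agents total)


def mean_fraction_done(episodes: List[Episode]) -> float:
    """Mean over episodes of the fraction of agents that reached their target."""
    return mean(arrived / total for arrived, total in episodes)


# Hypothetical submissions: (episode results, mean reward over agents and episodes).
submissions: Dict[str, Tuple[List[Episode], float]] = {
    "A": ([(5, 10), (10, 10)], -120.0),
    "B": ([(5, 10), (10, 10)], -95.0),
    "C": ([(10, 10), (10, 10)], -150.0),
}

# Sort by score first; mean reward only matters when the scores are equal.
ranking = sorted(
    submissions,
    key=lambda name: (mean_fraction_done(submissions[name][0]), submissions[name][1]),
    reverse=True,
)
```

Here A and B both score 0.75, so under the stated assumption the mean reward decides their relative order.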
Thank you very much for the quick clarification!
If the scores of round 1 and round 2 won’t be combined, what’s the point of round 1? Why not start with round 2 directly (and have some percentage of environments where all agents have the same speed, i.e. round 1 type environments; this percentage could even be 0 if it doesn’t represent a sufficiently important case)?
@mlerik, thank you for the clarification. I’m still confused about how the final score will be calculated. Here is a quote from the challenge overview:
Hi @RomanChernenko and @mugurelionut
Thanks for your replies. There will be an announcement later today and an update to the rules to clarify how submission scoring works and how prizes are awarded.
Hi @ryznefil and @mlerik,
Where is it mentioned that the max number of allowed time steps is 1.5*(width+height)? Last time I read only that such a constant exists, but I didn’t see its value mentioned (so I assumed it’s hidden and maybe even different for each test case). If it’s indeed fixed at 1.5 for every test case (can anyone confirm?), I would like to use that in my solution.
Your assumption about the max number of steps allowed in Round 1 is correct.
The rail environment will terminate an episode at step = 1.5 * (width + height).
ATTENTION: If you plan to use the max number of steps per episode in your code, be sure to make it variable as this is likely to change for the more complex Round 2.
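One way to keep the limit variable is to pass the factor in as a parameter instead of hard-coding 1.5. This is a minimal sketch: `max_episode_steps` is a hypothetical helper, and truncating to an integer is an assumption about how the environment rounds.

```python
def max_episode_steps(width: int, height: int, factor: float = 1.5) -> int:
    """Step limit for one episode; `factor` is 1.5 in Round 1 and may change."""
    # Truncation to int is an assumption; the environment's rounding may differ.
    return int(factor * (width + height))


round1_limit = max_episode_steps(100, 100)            # Round 1 factor
other_limit = max_episode_steps(100, 100, factor=2.0)  # arbitrary example factor
```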
The mean percentage of agents done is calculated as you expected: the number of arrived trains divided by the total number of trains in each episode, and then we take the mean over all episodes.
If we instead calculated a single mean over all agents from all episodes at once, we would bias the mean towards the results of the larger envs. In the current setting, the bias is towards smaller envs, where a few agents have more influence on the mean of agents finished.
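The difference between the two ways of averaging can be seen on a made-up pair of episodes, one small and one large. The numbers are purely illustrative.

```python
# (agents arrived, agents total) for a small env and a large env; made-up data.
episodes = [(2, 2), (10, 50)]

# Current setting: average the per-episode fractions, so each episode counts equally.
per_episode_mean = sum(arrived / total for arrived, total in episodes) / len(episodes)

# Alternative: pool all agents first, so the large env dominates the result.
pooled_mean = sum(a for a, _ in episodes) / sum(t for _, t in episodes)
```

The two-agent episode pulls the per-episode mean up to about 0.6, while the pooled value of 12/52 stays close to the large episode's 0.2.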
Hope this answers your question.
Yes, that perfectly explains the calculations.