It seems to me that all the provided examples get the observations as the result of calling the step function on the environment. However, that means an agent has to take an action before it receives any observations.
Is it allowed to get observations before calling step for the first time, in order to plan actions from the very beginning? Without this, the agent doesn't know anything about its location or environment when choosing its first action, which seems suboptimal to me.
Locally I could achieve this by explicitly calling _get_observations() on rail_env. I was wondering whether that's allowed, because I couldn't find this behavior in any example (or maybe I didn't look carefully enough).
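
To make the question concrete, here is a minimal sketch of the pattern I have in mind. It uses a toy stand-in class (ToyEnv and choose_action are hypothetical names, not part of the real rail_env) whose reset() returns the initial observations, as in the usual Gym-style convention. Whether rail_env offers an equivalent, supported way to do this is exactly what I'm asking.

```python
from typing import Dict, Tuple


class ToyEnv:
    """Hypothetical stand-in for a multi-agent environment (NOT the real rail_env)."""

    def __init__(self, n_agents: int = 2) -> None:
        self.n_agents = n_agents
        self.positions: Dict[int, int] = {}

    def _get_observations(self) -> Dict[int, int]:
        # In this toy, an observation is just the agent's position on a line.
        return dict(self.positions)

    def reset(self) -> Dict[int, int]:
        # Place every agent at position 0 and return the initial observations.
        self.positions = {agent: 0 for agent in range(self.n_agents)}
        return self._get_observations()

    def step(
        self, actions: Dict[int, int]
    ) -> Tuple[Dict[int, int], Dict[int, float], Dict[int, bool]]:
        # Apply each agent's move, then report observations, rewards and dones.
        for agent, move in actions.items():
            self.positions[agent] += move
        obs = self._get_observations()
        rewards = {agent: -1.0 for agent in obs}
        dones = {agent: obs[agent] >= 3 for agent in obs}
        return obs, rewards, dones


def choose_action(observation: int) -> int:
    # Trivial policy: move forward until position 3 is reached.
    return 1 if observation < 3 else 0


env = ToyEnv(n_agents=2)
obs = env.reset()  # observations are available before the first step
first_actions = {agent: choose_action(o) for agent, o in obs.items()}
obs, rewards, dones = env.step(first_actions)
```

With this shape the first action is already based on an observation, and no private method has to be called from outside the environment.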