Hello, I suspect some dynamic normalization of observations or rewards happens inside the environment (e.g. normalization that depends on the trajectories seen since the env was instantiated). My reason is that instantiating the same environment after training gives me worse cumulative rewards although everything else is the same (same normalization applied on my side, same NN weights, etc.). I lose about 15% of reward that way.
So either remove this normalization or provide a way to save its inner state and restore it during submission & evaluation.
Okay, I’ve found the issue. When resetting the env, the capacity doesn’t reset …
Thanks @joseph_amigo for this discovery. I have fixed the bug in the latest release, v1.3.4. Please see this post.
Thanks for pointing it out @joseph_amigo. I was having the same problem, with the error "storage capacity cannot be less than zero".
Additionally, I was trying to randomly sample a building’s observation for a given day to get the hour, as used in the rule-based agent (env.observation_space[0].sample()[2]). I noticed the hours were floats with values in the range [0, 24]. It seems some kind of normalization took place, because the numbers were not just integers converted to floats; some were values like 1.23, etc. The same applies to day at index 1.
I’m trying to use this random sample to train a rule-based controller whose charging and discharging rates are optimized by an evolutionary algorithm. @kingsley_nweye, could you please clarify the normalization, or how the floats can be converted back to the original hour?
What I’m trying to do is start my training on a random day rather than the env.reset() day. For instance, in episode 1 I want to train my agent only on day 30, and in episode 2 only on day 55. Is it possible to achieve this in the current env, and if so, how @kingsley_nweye? I understand I can train from day 1 after the env is reset.
@Chemago thanks for your question. The reason you get a float when randomly sampling otherwise discrete observations like hour is that the observation_spaces object is defined using gym.spaces.Box, which defines a space for continuous observations. The reason for defining all observations with gym.spaces.Box instead of a mixture of gym.spaces.Box and gym.spaces.Discrete is that 1) it is easier to manage and design for a single gym.spaces type, and 2) the need for random sampling of the discrete observations is rare since they already come pre-defined in the building files.
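If you still need the original integer values from a sampled observation, a minimal sketch is to round the sample and clip it to the space’s bounds. This assumes env is an already-instantiated CityLearn environment and uses the indices from the question above (day type at index 1, hour at index 2):

```python
import numpy as np

# Hedged sketch: recover integer day/hour values from a continuous Box sample.
space = env.observation_space[0]    # per-building observation space (gym.spaces.Box)
obs_sample = space.sample()         # floats within the Box bounds

# Round to the nearest integer and clip to the bounds defined by the space.
day = int(np.clip(np.round(obs_sample[1]), space.low[1], space.high[1]))
hour = int(np.clip(np.round(obs_sample[2]), space.low[2], space.high[2]))
```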
To set the time steps you want to train on, edit the simulation_start_time_step and simulation_end_time_step variables in schema.json.
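For example, to train on day 30 only, a sketch along these lines should work. The import path and the 24-steps-per-day arithmetic are assumptions (hourly time steps); adjust them to your dataset and install:

```python
import json
from citylearn.citylearn import CityLearnEnv  # import path assumed for CityLearn v1.3.x

# Hedged sketch: narrow the simulated period to a single day by editing the
# two schema fields mentioned above, then rebuild the environment.
schema_path = 'schema.json'

with open(schema_path) as f:
    schema = json.load(f)

day = 30  # train on day 30 only; assumes 24 hourly time steps per day
schema['simulation_start_time_step'] = (day - 1) * 24
schema['simulation_end_time_step'] = day * 24 - 1

with open(schema_path, 'w') as f:
    json.dump(schema, f, indent=4)

env = CityLearnEnv(schema_path)  # re-instantiate so the new window takes effect
```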