Dynamic normalization inside the env

joseph_amigo · August 1, 2022, 1:58pm

Hello, I suspect that some dynamic normalization of observations or rewards happens inside the environment (for example, normalization that depends on the trajectories seen since the env was instantiated). My reason is that instantiating the same environment again after training gives me worse cumulative rewards even though everything else is the same (the same normalization applied on my side, the same NN weights, etc.). I lose about 15% of the reward that way.

joseph_amigo · August 1, 2022, 2:00pm

So either remove this normalization or provide a way to save its inner state and restore it during submission & evaluation.

joseph_amigo · August 2, 2022, 2:29pm

Okay I’ve found the issue. When resetting the env, the capacity doesn’t reset …

kingsley_nweye · August 2, 2022, 6:34pm

Thanks @joseph_amigo for this discovery. I have fixed the bug in the latest release, v.1.3.4. Please see this post.

Chemago · August 4, 2022, 6:03am

Thanks for pointing it out, @joseph_amigo. I was having the same problem, with the error "storage capacity cannot be less than zero".

Additionally, I was trying to randomly sample a building’s observation for a given day to get the hour, as is done in the rule-based agent (env.observation_space[0].sample()[2]). I noticed the hours were floats in the range [0,24]. It seems some kind of normalization took place, because the numbers were not just integers converted to floats; some were values like 1.23. The same applies to the day at index 1.

I’m trying to use this random sample to train a rule-based controller with charging and discharging rates optimized by an evolutionary algorithm. @kingsley_nweye, could you please clarify the normalization, or how the floats can be converted back to their original hour?

What I’m trying to do is start my training on a random day rather than the env.reset() day. For instance, in episode 1 I want to train my agent only on day 30, and in episode 2 only on day 55. Is it possible to achieve this in the current env, and if so, how, @kingsley_nweye? I understand I can train on day 1 after the env is reset.

kingsley_nweye · August 11, 2022, 11:04am

@Chemago thanks for your question. The reason you get a float when randomly sampling otherwise discrete observations like hour is that the observation_spaces object is defined using a gym.spaces.Box, which defines a space for continuous observations. All observations are defined in gym.spaces.Box, instead of a mixture of gym.spaces.Box and gym.spaces.Discrete, because 1) it is easier to manage and design for one gym.spaces type, and 2) the need for random sampling of the discrete observations is rare, since they already come pre-defined in the building files.
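To illustrate the sampling point: a bounded continuous space draws each component uniformly from its range, so a sampled hour is just a random float, not a normalized version of a real hour, and there is nothing to convert back. Here is a minimal standard-library sketch that mimics this behaviour. The bounds and the helper names `sample_continuous` and `sample_discrete_hour` are assumptions for illustration only, not part of the env or of gym.

```python
import random

# Assumed bounds for an "hour" observation; the real bounds come from the env.
HOUR_LOW, HOUR_HIGH = 1.0, 24.0

def sample_continuous(low, high, rng):
    """Mimic sampling one bounded continuous (Box-like) dimension: a uniform float."""
    return rng.uniform(low, high)

def sample_discrete_hour(low, high, rng):
    """Draw a valid integer hour directly instead of sampling the continuous space."""
    return rng.randint(int(low), int(high))

rng = random.Random(0)
float_hour = sample_continuous(HOUR_LOW, HOUR_HIGH, rng)    # some float within the bounds
int_hour = sample_discrete_hour(HOUR_LOW, HOUR_HIGH, rng)   # an integer hour within the bounds
print(float_hour, int_hour)
```

If you need a random but valid hour or day, drawing an integer yourself as in `sample_discrete_hour` is probably simpler than rounding a sample from the observation space.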

To set the time steps you want to train on, edit the simulation_start_time_step and simulation_end_time_step variables in schema.json.
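As a rough sketch of how a single day could be selected this way: the snippet below computes the time step window for a given day and sets the two variables on a dict standing in for the contents of schema.json. It assumes hourly time steps, zero-based time step indices, an inclusive end time step, and that both variables sit at the top level of the schema; please check these against your schema.json. The helper names `day_to_time_steps` and `set_simulation_window` are made up for this example.

```python
import json
import random

HOURS_PER_DAY = 24  # assumption: one time step per hour

def day_to_time_steps(day):
    """Return (start, end) zero-based, inclusive time steps covering 1-based `day`."""
    if day < 1:
        raise ValueError("day must be >= 1")
    start = (day - 1) * HOURS_PER_DAY
    end = start + HOURS_PER_DAY - 1
    return start, end

def set_simulation_window(schema, day):
    """Return a copy of the schema dict with the simulation window set to one day."""
    start, end = day_to_time_steps(day)
    updated = dict(schema)
    updated["simulation_start_time_step"] = start
    updated["simulation_end_time_step"] = end
    return updated

# Stand-in for the parsed contents of schema.json (assumed top-level keys).
schema = {"simulation_start_time_step": 0, "simulation_end_time_step": 8759}

# Pick a random day for this episode and build the matching schema.
day = random.Random(0).randint(1, 365)
schema_for_episode = set_simulation_window(schema, day)
print(json.dumps(schema_for_episode, indent=2))
```

Under these assumptions, day 30 maps to time steps 696 to 719 and day 55 to 1296 to 1319. The updated dict can be written back with `json.dump` before creating the env for the next episode.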

Chemago · August 11, 2022, 11:12am

Thanks for the explanation, @kingsley_nweye. Problem solved.