Hi everyone! I created this thread for anyone who wants to share their solution. I would really like to know how people managed to push the discomfort below 0.2.
My solution was an actor-critic. The best score was 0.74, and we finished in 19th place on the public leaderboard. Here are some specifics of my solution:
- There was no central agent in my solution; all buildings shared a single policy and critic network.
- The reward function was discomfort plus the absolute difference of net electricity consumption between two consecutive periods.
- For critic training, we applied the symlog function to the rewards and discretized the symlog rewards using two-hot encoding (based on DreamerV3).
- The actor network was trained with REINFORCE, normalizing the rewards using an exponential moving average of their 5th and 95th percentiles.
- Transformed the day and hour with sine and cosine functions.
- Transformed the day and hour using splines of degree 3.
- Min-max scaled all other observations.
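
In case it helps anyone reading along, here is a rough sketch of three of the pieces above: symlog with two-hot targets, the sine/cosine time encoding, and the percentile-based scaling. This is not my competition code. All names (`symlog`, `two_hot`, `cyclical`, `QuantileScaler`), the bin layout, the `decay` value, and the period of 24 for the hour are assumptions picked for illustration, and the `max(1, span)` floor follows the DreamerV3 recipe rather than anything stated above.

```python
import math


def symlog(x):
    """Compress large magnitudes while keeping the sign: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y):
    """Inverse of symlog."""
    return math.copysign(math.expm1(abs(y)), y)


def two_hot(value, bins):
    """Spread weight over the two bins that bracket `value` (weights sum to 1)."""
    value = min(max(value, bins[0]), bins[-1])  # clip into the bin range
    weights = [0.0] * len(bins)
    for i in range(len(bins) - 1):
        lo, hi = bins[i], bins[i + 1]
        if lo <= value <= hi:
            upper = (value - lo) / (hi - lo)
            weights[i] = 1.0 - upper
            weights[i + 1] = upper
            break
    return weights


def cyclical(value, period):
    """Map a periodic feature (e.g. hour, assumed period 24) to (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


class QuantileScaler:
    """Tracks an exponential moving average of the 5th and 95th percentiles."""

    def __init__(self, decay=0.99):  # decay is an illustrative guess
        self.decay = decay
        self.low = None
        self.high = None

    def update(self, batch):
        q05, q95 = quantile(batch, 0.05), quantile(batch, 0.95)
        if self.low is None:  # first batch initializes the averages
            self.low, self.high = q05, q95
        else:
            self.low = self.decay * self.low + (1.0 - self.decay) * q05
            self.high = self.decay * self.high + (1.0 - self.decay) * q95

    def scale(self, batch):
        # Assumed DreamerV3-style floor: never amplify small values.
        span = max(1.0, self.high - self.low)
        return [x / span for x in batch]


# Critic target for one reward, using hypothetical bins in symlog space.
target = two_hot(symlog(3.0), [-2.0, -1.0, 0.0, 1.0, 2.0])
```

The critic would then be trained with cross-entropy against `target`, and its prediction is read back by taking the weighted sum of the bins and applying `symexp`.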