Hi, I see that the actions correspond to battery charging decisions. If I understand correctly, each building’s electricity load, battery, and PV are tied together, i.e. building A’s battery can’t be charged with building B’s solar generation or be used to supply building B. In that case, shouldn’t the optimal policies for the individual buildings, taken together, simply be the optimal policy for the district? Then why is multi-agent coordination necessary?
I think you’re right and I’m also confused by this. The evaluation metric is also just the sum of individual buildings’ metrics.
You might be able to increase each building’s performance by incorporating information from other buildings, but that is different from agents actually learning to coordinate.
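To make the separability argument concrete, here is a minimal sketch under made-up assumptions: two hypothetical buildings, a three-step horizon, a discrete set of battery actions, and a district metric that is just the sum of each building's own cost. The prices, loads, battery capacity, and the `building_cost` function are all invented for illustration and are not taken from the environment. With no shared constraint and an additive metric, a brute-force search over joint plans should find the same total as optimizing each building on its own.

```python
from itertools import product

# Hypothetical toy numbers, not taken from the actual environment.
ACTIONS = (-1.0, 0.0, 1.0)   # discharge, idle, charge (kWh per step)
HORIZON = 3
PRICE = (1.0, 3.0, 2.0)      # made-up price per kWh at each step
NET_LOAD = {                 # made-up load minus PV for each building
    "A": (2.0, 4.0, 3.0),
    "B": (1.0, 1.0, 5.0),
}
CAPACITY = 1.0               # battery capacity in kWh, battery starts empty


def building_cost(net_load, plan):
    """Cost of one building's charging plan; infeasible plans cost infinity."""
    soc, cost = 0.0, 0.0
    for demand, price, action in zip(net_load, PRICE, plan):
        soc += action
        if soc < 0.0 or soc > CAPACITY:
            return float("inf")
        # Grid import only; exporting surplus earns nothing in this toy model.
        cost += price * max(demand + action, 0.0)
    return cost


plans = list(product(ACTIONS, repeat=HORIZON))

# Each building optimizes alone, using only its own load and battery.
independent_cost = sum(
    min(building_cost(load, plan) for plan in plans)
    for load in NET_LOAD.values()
)

# A "district" optimizer searches over every pair of plans jointly,
# scoring each pair by the sum of the two buildings' costs.
joint_cost = min(
    building_cost(NET_LOAD["A"], plan_a) + building_cost(NET_LOAD["B"], plan_b)
    for plan_a, plan_b in product(plans, repeat=2)
)

print(independent_cost, joint_cost)
```

The two totals match because the joint objective splits into one term per building and neither building's feasible set depends on the other. The argument relies on the metric being a plain sum: if the score were instead computed on the aggregate district load, the objective would no longer separate this way.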