Is it possible to divide the datasets based on the reward?
If that is possible, participants can take the following approaches:
・Have the agent learn from the divided datasets in turn.
・Create multiple sub-agents and have each sub-agent learn one of the divided datasets.
Then have a single master agent switch between the sub-agents.
(Like the approach of the team that finished 7th at MineRL 2019, https://openai.com/blog/learning-a-hierarchy/ and so on)
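For illustration only, the first step of either approach — splitting the demonstration trajectories into groups by their total episode reward — might look like the sketch below. The function name, the `(states, actions, rewards)` trajectory layout, and the thresholds are assumptions for this example, not part of the competition API.

```python
# Hypothetical sketch: group demonstration trajectories by total episode
# reward, so that each sub-agent (or training phase) gets one group.
def split_by_reward(trajectories, thresholds):
    """trajectories: list of (states, actions, rewards) tuples (assumed layout).
    thresholds: sorted reward boundaries defining the bins."""
    bins = [[] for _ in range(len(thresholds) + 1)]
    for traj in trajectories:
        total = sum(traj[2])  # total reward of the episode
        # place the trajectory in the first bin whose upper bound exceeds it
        idx = next((i for i, t in enumerate(thresholds) if total < t),
                   len(thresholds))
        bins[idx].append(traj)
    return bins


# Example: with a single threshold of 5, episodes with total reward below 5
# go to bins[0] and the rest to bins[1].
trajs = [([], [], [1, 2]), ([], [], [10]), ([], [], [0])]
low, high = split_by_reward(trajs, [5])
```

Each bin could then be fed to its own sub-agent, with the master agent choosing which sub-agent's policy to follow at run time.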
Would these approaches of dividing the datasets based on reward violate the rules?