We are constantly trying to make this competition better for everyone and would really appreciate your feedback.
Feel free to reply to this thread with your suggestions on how we can improve the competition for you!
- What have been your major pain points so far?
- What would you like to see improved?
Are you planning to use human feedback?
We would be happy to accommodate whatever form of learning from human feedback you have planned! Let us know in this thread or via DM, and we will prepare the system for your use case in advance!
Some useful links:
- DAgger (Dataset Aggregation) method
- Reward learning from human preferences and demonstrations in Atari
- OpenAI | Learning from Human Preferences
- OpenAI | Learning to Summarize with Human Feedback
- DeepMind | Scalable agent alignment via reward modeling
- Reward-rational (implicit) choice: A unifying formalism for reward learning
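To give a concrete flavor of the preference-based approaches linked above, here is a minimal, self-contained sketch of fitting a reward model from pairwise human preferences via the Bradley-Terry likelihood (the core idea behind "Learning from Human Preferences"). Everything here is illustrative and not part of the competition starter kit: the data is synthetic, the reward is assumed linear in summed segment features, and the hidden `true_w` stands in for a human labeler.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])  # hidden weights of the synthetic "human" labeler

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic preference dataset: pairs of 10-step segments, each summarized by
# its summed feature vector phi; the "human" prefers the higher true return.
pairs = []
for _ in range(500):
    phi_a = rng.normal(size=(10, 3)).sum(axis=0)
    phi_b = rng.normal(size=(10, 3)).sum(axis=0)
    pref = 1.0 if true_w @ phi_a > true_w @ phi_b else 0.0
    pairs.append((phi_a, phi_b, pref))

# Fit reward weights w by gradient ascent on the Bradley-Terry log-likelihood:
# P(A preferred over B) = sigmoid(w @ phi_a - w @ phi_b).
w = np.zeros(3)
lr = 0.05
for _ in range(200):
    grad = np.zeros(3)
    for phi_a, phi_b, pref in pairs:
        p = sigmoid(w @ (phi_a - phi_b))
        grad += (pref - p) * (phi_a - phi_b)
    w += lr * grad / len(pairs)

# The learned reward should rank segments like the hidden one (up to scale).
cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
acc = np.mean([(w @ (a - b) > 0) == (p > 0.5) for a, b, p in pairs])
print(f"cosine similarity to true weights: {cos:.3f}, pair accuracy: {acc:.3f}")
```

In an actual agent, the summed feature vectors would come from observed trajectory segments and the labels from real human comparisons; the learned reward then replaces or shapes the environment reward for RL training.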
If you would prefer to share your feedback privately, you can send it to us via DM.