The MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) competition aims to promote research in learning from human feedback, enabling agents that can accomplish tasks without crisp, easily defined reward functions. Our sponsors have generously provided 20,000 USD :moneybag: in prize money to support this research, with an additional 100,000 USD for especially surprising results (see “Prizes”)!
This is the second iteration of this competition. You can find the page for the BASALT 2021 competition here. Major changes this year include:
- New MineRL simulator version with human-level observation and action spaces. For example, crafting now requires opening the inventory UI and using the mouse to craft items.
- Pretrained models trained on different Minecraft tasks, which you are free to use in your solutions as you see fit (e.g., fine-tune them for a specific task, or use them for a specific part of the behavior).
- Prizes that encourage exploring learning from human feedback, even if the solution does not reach top performance.
- An “intro” track, in which the task is the original, non-restrictive MineRL competition task “Obtain diamond shovel”, to ease entry into the competition.
There are four categories of prizes:
- 1st place: $7,000 USD
- 2nd place: $4,000 USD
- 3rd place: $3,000 USD
- Blue Sky award: $100,000 USD
- Research prizes: $5,000 USD
- Community support: $1,000 USD
Winners. As described in the Evaluation section, we will evaluate submissions using human feedback to determine how well agents complete each of the four tasks. The three teams that score highest on this evaluation will receive prizes of $7,000, $4,000, and $3,000.
Blue Sky award. This award of $100,000 will be given to submissions that achieve a very high level of performance: human-level performance on at least 3 of the 4 tasks. (Human-level performance is achieved if the human evaluators prefer agent-generated trajectories to human demonstrations at least 50% of the time.) If multiple submissions achieve this milestone, the award will be split equally across all of them.
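The Blue Sky criterion above can be expressed as a simple check. The sketch below is purely illustrative (the task names and preference rates are made up, not official), assuming each task's evaluation yields the fraction of comparisons in which human evaluators preferred the agent's trajectory to a human demonstration:

```python
def human_level(preference_rate: float) -> bool:
    """Human-level on a task: evaluators prefer the agent's
    trajectories to human demonstrations at least 50% of the time."""
    return preference_rate >= 0.5

def qualifies_for_blue_sky(preference_rates: dict) -> bool:
    """Blue Sky award: human-level performance on at least 3 of the 4 tasks."""
    return sum(human_level(r) for r in preference_rates.values()) >= 3

# Made-up preference rates for four hypothetical tasks:
rates = {"task_1": 0.55, "task_2": 0.62, "task_3": 0.48, "task_4": 0.51}
print(qualifies_for_blue_sky(rates))  # True: human-level on 3 of 4 tasks
```

If several submissions qualified, each would receive an equal share, e.g. two qualifying teams would receive $50,000 each.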
Research prizes. We have reserved $5,000 of the prize pool to be given out at the organizers’ discretion to submissions that we think made a particularly interesting or valuable research contribution. We might give prizes to:
- Submissions that present novel negative results (e.g. a submission showing that having humans correct the AI's behavior doesn't help)
- Submissions that get particularly good results given their approach (e.g. best submission based on behavior cloning, or best submission based on learning from preferences)
- Approaches that create interesting agent behavior beyond “solves the task” (e.g. most human-like agent)
- New, interesting knowledge about learning from human feedback (e.g. an empirically validated scaling law that predicts how much human data is required for a given level of performance, or guidelines on how to decide which types of human feedback to use at any given point in fine-tuning)
If you wish to be considered for a research prize, please include some details on interesting research-relevant results in the README for your submission. We expect to award around 2-10 research prizes in total.
Community support. We will award $1,000 of the prize pool at the organizers’ discretion to people who provide community support, for example by answering other participants’ questions, or by creating and sharing useful tools.
Learn about the Community Contribution Prize
Looking for teammates
Join the Discord channel to meet other challenge participants like you.