Restriction Incentive


This is a beginner question; I’m new here. Is there a restriction that creates an incentive to use the training dataset? Can I just build a model from scratch for the task using RL, or is there some restriction that incentivizes using the training dataset to reduce the time the model needs to perform well?

Thanks, and feel free to correct me if I’m totally off.

There is no explicit restriction requiring you to use the dataset.
If you would like to try pure reinforcement learning or any other method, you are free to do so!
Note that we expect this to be quite difficult: random exploration with a diamond pickaxe took an average of 9 million environment steps to obtain a diamond! :gem: