This is a beginner question, I’m new here. What is the restriction incentive to use the training dataset? Can I just build a model from scratch using RL for the task or what is the restriction that incentives using the training dataset to decrease the time needed for the model to perform well?
Thanks and feel free to correct me if I’m totally off.
God bless,