I want to understand whether we can use other LLMs (outside the Llama 2 family) during the training stage, specifically for RLHF and data generation.
Below is the original model requirement:
This KDD Cup requires participants to use Llama models to build their RAG solution. Specially, participants can use or fine-tune the following 4 Llama 2 models from https://llama.meta.com/llama-downloads:
Does this mean we can use other LLMs to generate the answer, or only to generate data for training? Is using a private model API such as GPT-4 or Claude to generate training data allowed or not?
The exact constraints have been specified in the challenge overview and rules.
Below is a copy of “USE OF EXTERNAL RESOURCES” from the challenge overview:
By only providing a small development set, we encourage participants to exploit public resources to build their solutions. However, participants should ensure that the used datasets or models are publicly available and equally accessible to use by all participants. Such a constraint rules out proprietary datasets and models by large corporations. Participants are allowed to re-formulate existing datasets (e.g., adding additional data/labels manually or with Llama models), but award winners are required to make them publicly available after the competition.