Dear all,
The team has now placed the raw MIT/Informa tables in the /shared_data folders under “raw” for our data wranglers to explore. If there are questions about the data, we can start a thread for the data wranglers.
As explained earlier, for a variety of reasons this data is currently not available to the evaluation engine in this raw format.
However, since we will now be providing the test data, you will still be able to wrangle the raw data into additional columns on the test data and pass those columns along to the evaluator.
NOTICE: It is the responsibility of the team to ensure that any additional data:
- DOES NOT CONTAIN INFORMATION AFTER 2015
- DOES NOT CONTAIN INFORMATION ABOUT THE PHASE 3 TRIAL FOR A LEADERBOARD PREDICTION (note that using this information is actually encouraged for some of the challenge insights questions!)
  - This obviously includes, but is not limited to, the outcome of the trial itself
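For data wranglers who want a starting point, here is a minimal sketch of a pre-submission check for the two rules above. It is not the organizers' validation code. The column names `event_date` and `trial_phase`, the ISO date format, and the sample rows are all hypothetical and only illustrate the idea; the real MIT/Informa tables will need their own mapping. It also assumes "after 2015" means any date in 2016 or later.

```python
from datetime import date

CUTOFF_YEAR = 2015  # from the notice: no information after 2015


def is_leak_free(row, date_field="event_date", phase_field="trial_phase"):
    """Return True if a row passes both rules from the notice.

    Field names are hypothetical placeholders, not the real table schema.
    """
    # Rule 1: drop anything dated after the cutoff year (assumed: 2016 or later).
    event_date = date.fromisoformat(row[date_field])
    if event_date.year > CUTOFF_YEAR:
        return False

    # Rule 2: drop anything that mentions Phase 3. This is deliberately
    # conservative: any "3" or "iii" in the phase label rejects the row.
    phase = row[phase_field].strip().lower().replace(" ", "")
    if "3" in phase or "iii" in phase:
        return False

    return True


def filter_leak_free(rows, **kwargs):
    """Keep only the rows that pass is_leak_free."""
    return [row for row in rows if is_leak_free(row, **kwargs)]


# Made-up rows, purely for illustration.
rows = [
    {"drug_id": "A", "event_date": "2014-06-01", "trial_phase": "Phase 2"},
    {"drug_id": "A", "event_date": "2016-02-10", "trial_phase": "Phase 2"},
    {"drug_id": "B", "event_date": "2015-03-15", "trial_phase": "Phase 3"},
]
kept = filter_leak_free(rows)
print(kept)
```

A row-level filter like this only catches explicit dates and phase labels. Columns derived from later events (for example, an approval flag) can still leak the trial outcome, so each added column should also be reviewed by hand.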
Please consider: Solutions and predictions from teams that choose to add additional data will be subject to an additional layer of oversight, covering both code and model generalizability, to ensure no information has been leaked.
If you have observed a performance increase from specific data and would like to pass it into the evaluation cluster, please email me or post on the forums. Doing so makes the data available to other participants and gives you better validation that it is not leaking information. This sharing is highly encouraged and will be recognized, especially if it is demonstrated to enable the competition.