Hi team,
Where can I find the original datasets for train and test?
There must be training_phase_2.csv and testing_phase_2_release.csv…
What I see in the starter kit is just a 100-row subset of the data. Can you please share the file paths?
Hi @shravankoninti,
Are you referring to the workspace or the evaluator?
In the workspace those files are present in /shared_data/data/, while in the evaluator you can access them using the environment variable AICROWD_TEST_DATA_PATH.
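For anyone who wants the same code to run in both places, here is a minimal sketch of picking the location at runtime. It assumes only what is stated above: the workspace directory /shared_data/data/ and the environment variable AICROWD_TEST_DATA_PATH. Whether that variable holds a file path or a directory is not stated in this thread, so the sketch just returns it unchanged. The function name resolve_test_data_path is hypothetical.

```python
import os
from pathlib import Path

# Workspace location mentioned in this thread.
WORKSPACE_DATA_DIR = Path("/shared_data/data/")


def resolve_test_data_path(env=None):
    """Return the evaluator path if the variable is set, else the workspace directory.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    value = env.get("AICROWD_TEST_DATA_PATH")
    if value:
        # Assumption: the value is used as-is; it may be a file or a directory.
        return Path(value)
    return WORKSPACE_DATA_DIR


if __name__ == "__main__":
    print(resolve_test_data_path())
```

Passing the environment as an argument is only a convenience for testing; reading os.environ directly works just as well.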
@shivam Thanks for the reply. Yes, I am looking at /shared_data/data/.
I see there are 3 files.
Questions:
What is the use of file 1?
Files 2 and 3 have 8,691 records with 72 columns. Please confirm. Do we need to work only on this data as the train set?
But in the README file (starter kit) you mention there will be a file named training_phase2.csv with 1600649 records. What is this file? Which file is our training dataset? The file with 8,691 records? Please let me know.
Hi,
Consider the files present in /shared_data/data/ on the workspace as the latest version and their record counts as correct. The README in the starter kit contains numbers from a previous dataset version and can be wrong.
I am not sure about random_number_join.csv. @kelleni2 might be aware of it?
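Since the README counts may be out of date, one way to check the record and column counts yourself is a small script like the one below. This is only a sketch: it assumes the files in /shared_data/data/ are comma-separated with a single header row, and the helper names csv_shape and summarize_dir are hypothetical.

```python
import csv
from pathlib import Path


def csv_shape(path):
    """Return (data_rows, columns), treating the first row as the header."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            # Empty file: no header and no records.
            return (0, 0)
        rows = sum(1 for _ in reader)
    return (rows, len(header))


def summarize_dir(data_dir="/shared_data/data/"):
    """Map each *.csv file name in data_dir to its (rows, columns) shape."""
    return {p.name: csv_shape(p) for p in sorted(Path(data_dir).glob("*.csv"))}


if __name__ == "__main__":
    for name, (rows, cols) in summarize_dir().items():
        print(f"{name}: {rows} rows x {cols} columns")
```

Run it in the workspace and compare the printed shapes with the numbers quoted in this thread.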
The test data was originally not intended to be visible other than a sample file for column names and format.
However, for various logistical reasons, we plan to make the test data available to those who feel they need it. I will create a separate post on that topic.