Accessing the train file and test file in the same predict.py?

shravankoninti · November 29, 2019, 11:05am

Can we access and read the training files in the shared folder from predict.py, where the original test data is laid out and accessed through the production environment path?

I want to read both datasets, i.e., train and test, like this:

train_df = pd.read_csv('/shared_data/data/training_data/training_data_2015_split_on_outcome.csv')

The above path refers to the local path.

test_df = pd.read_csv(AICROWD_TEST_DATA_PATH,index_col=0)

The above path refers to the production environment path.

Please clarify this, because I want to train the model inside predict.py itself, using both the train and test files in the same place.

Regards
Shravan

shivam · November 29, 2019, 11:11am

Hi @shravankoninti,

Yes, you can access all the files at the same time during evaluation.

The starter kit has all the information about the environment variables, but let me also clarify here which environment variables are available during evaluation.

AICROWD_TEST_DATA_PATH: Refers to the testing_phase2_release.csv file, which the evaluator uses to judge your models in the testing phase (soon to be made public)
AICROWD_TRAIN_DATA_PATH: Refers to /shared_data/data/training_data/, in which all training-related files are present.
AICROWD_PREDICTIONS_OUTPUT_PATH: Refers to the path at which your code is expected to output final predictions

Now in your codebase, you can simply do something as follows to load both the files:

import os
import pandas as pd

AICROWD_TRAIN_DATA_PATH = os.getenv("AICROWD_TRAIN_DATA_PATH", "/shared_data/data/training_data/")
AICROWD_TEST_DATA_PATH = os.getenv("AICROWD_TEST_DATA_PATH", "/shared_data/data/testing_data/to_be_added_in_workspace.csv")
AICROWD_PREDICTIONS_OUTPUT_PATH = os.getenv("AICROWD_PREDICTIONS_OUTPUT_PATH", "random_prediction.csv")


train_df = pd.read_csv(AICROWD_TRAIN_DATA_PATH + 'training_data_2015_split_on_outcome.csv')
# Do pre-processing, etc
[...]
test_df = pd.read_csv(AICROWD_TEST_DATA_PATH, index_col=0)
# Make predictions
[...]
# Submit your answer
prediction_df.to_csv(AICROWD_PREDICTIONS_OUTPUT_PATH, index=False)

I hope the example clarifies your doubt.
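
One detail worth guarding against: the example above builds the training file path by string concatenation, which only works if AICROWD_TRAIN_DATA_PATH ends with a slash. The thread does not say whether the evaluator's value always has one, so the following is a minimal sketch, assuming a POSIX-style path, that uses os.path.join to handle both cases. The helper name train_file is hypothetical and not part of the starter kit.

```python
import os

# Directory holding the training files; the fallback mirrors the example above.
train_dir = os.getenv("AICROWD_TRAIN_DATA_PATH", "/shared_data/data/training_data/")


def train_file(name, base=train_dir):
    """Join a base directory and a file name (hypothetical helper).

    os.path.join inserts a separator only when the base lacks one,
    so the result is the same with or without a trailing slash.
    """
    return os.path.join(base, name)


# Both spellings of the directory resolve to the same file path.
with_slash = train_file(
    "training_data_2015_split_on_outcome.csv", "/shared_data/data/training_data/"
)
without_slash = train_file(
    "training_data_2015_split_on_outcome.csv", "/shared_data/data/training_data"
)
```

The resulting path can then be passed to pd.read_csv in place of the concatenated string.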

shravankoninti · November 29, 2019, 11:33am

Thanks very much! This is really helpful.

I hope you update the path for test data and its name ASAP.

Shravan

shivam · November 29, 2019, 11:41am

Sure. Can you point us to the file/link where you found the wrong path?

shravankoninti · November 29, 2019, 12:20pm

No, no. As of now it is good.

AICROWD_TEST_DATA_PATH = os.getenv("AICROWD_TEST_DATA_PATH", "/shared_data/data/testing_data/to_be_added_in_workspace.csv")

You mentioned “to_be_added_in_workspace.csv”, right? This needs to be replaced with testing_phase2_release.csv.

Let me know if this is right.

shivam · November 29, 2019, 1:43pm

Yes, this is correct.

maruthi0506 · December 4, 2019, 6:26am

Hi Shivam,

Is ‘AICROWD_PREDICTIONS_OUTPUT_PATH’ customizable? What will be the absolute path of this predictions file? Could you please explain with an example?

shivam · December 4, 2019, 7:07am

Hi @maruthi0506,

Your codebase needs to read this environment variable, which holds an absolute path, and write the final predictions to that location. An example is already in the starter kit, as well as in the comment above.

maruthi0506 · December 4, 2019, 7:33am

It just says 'AICROWD_PREDICTIONS_OUTPUT_PATH = os.getenv("AICROWD_PREDICTIONS_OUTPUT_PATH", "random_prediction.csv")'. But what is the default path for that file? For example, does it need to be in the shared data folder, a personal folder that I created, or any directory, and is the output file expected to have a predefined name? Do we need to give the complete path, e.g. /x/y/z/predictions.csv?

shivam · December 4, 2019, 7:49am

Hi,

The default path can be anything you prefer, e.g. a path inside your workspace that you use for testing.

While during evaluation this environment variable will be set always and default value wouldn’t be used.
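
To make the fallback behaviour concrete, here is a small sketch of how os.getenv picks between the default and the value set by the environment. The function name resolve_output_path and the path /tmp/evaluator/predictions.csv are made up for illustration; the real value supplied by the evaluator is not stated in this thread.

```python
import os

ENV_NAME = "AICROWD_PREDICTIONS_OUTPUT_PATH"


def resolve_output_path(default="random_prediction.csv"):
    """Return the evaluator-provided path if set, else the local default."""
    return os.getenv(ENV_NAME, default)


# Local run in a workspace: the variable is not set, so the default is used.
os.environ.pop(ENV_NAME, None)
local_path = resolve_output_path()

# Simulated evaluation run: the variable is set (hypothetical value),
# so the default is ignored.
os.environ[ENV_NAME] = "/tmp/evaluator/predictions.csv"
eval_path = resolve_output_path()

# Clean up so the simulation does not leak into later code.
os.environ.pop(ENV_NAME, None)
```

In other words, the second argument to os.getenv only matters for local testing; the same predict.py runs unchanged in both settings.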

maruthi0506 · December 4, 2019, 7:56am

“While during evaluation this environment variable will be set always and default value wouldn’t be used.” – What does this line mean? Does it write to some other server for evaluation?

shivam · December 4, 2019, 8:14am

Yes, the evaluations run on separate servers from your workspaces.