Evaluation Error

shivam · December 22, 2019, 4:53pm

Hi everyone, please make sure that your submissions create the prediction file with the correct row_id. The row_id was not matched strictly up to the previous evaluator version, and we have now added an assert for it. As a result, some submissions have failed with the message that the row_ids in the generated prediction file do not match those of the ground_truth.

Your solution needs to output the row_id as shared in the test data, not hardcoded or sequential values (0, 1, 2, …). Also note that the row_id values in the data used for evaluation can differ from those in the workspace, to make sure people aren’t hardcoding them from that file.

We are trying to apply an automatic patch wherever possible, but this ultimately needs to be fixed in the submitted solutions.
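A minimal sketch of what this means in practice, using only the standard library. The column names other than row_id, the predict_one stand-in, and the local check are assumptions for illustration; the check is not the evaluator's actual assert.

```python
import csv
import io


def predict_one(row):
    # Hypothetical stand-in for the real model: always predicts 0.
    return 0


def write_predictions(test_file, out_file):
    """Copy each row_id from the test data into the prediction file."""
    reader = csv.DictReader(test_file)
    writer = csv.DictWriter(out_file, fieldnames=["row_id", "prediction"])
    writer.writeheader()
    for row in reader:
        # Use the row_id that came with the row, never a counter or a saved list.
        writer.writerow({"row_id": row["row_id"], "prediction": predict_one(row)})


def row_ids_match(test_text, pred_text):
    """Local sanity check: both files contain the same set of row_ids."""
    test_ids = {r["row_id"] for r in csv.DictReader(io.StringIO(test_text))}
    pred_ids = {r["row_id"] for r in csv.DictReader(io.StringIO(pred_text))}
    return test_ids == pred_ids


# Test data with non-sequential, shuffled row_ids, as the server may provide.
test_text = "row_id,feature\n1000123,a\n1001010,b\n100001,c\n"
out = io.StringIO()
write_predictions(io.StringIO(test_text), out)
pred_text = out.getvalue()
```

Running row_ids_match(test_text, pred_text) before submitting should catch a prediction file that was built from hardcoded or sequential ids.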

maruthi0506 · December 23, 2019, 3:54am

Hi Shivam,

Do you mean that a row_id of 5 (for example) in the given test data can be a different row_id in the evaluation test data? Just trying to understand how the evaluation process works: is it run on the pickle model, or on something different?

maruthi0506 · December 23, 2019, 7:09am

Hi Shivam,

We modified the code and pushed the new one. The submission id is #31956.

We get the below error now:

“The following containers terminated prematurely. : agent
Please contact administrators, or refer to the execution logs.” Could you please help?

maruthi0506 · December 23, 2019, 7:15am

I see the reply below in the logs:

"2019-12-23T07:07:16.767598332Z Resource stopwords not found.
2019-12-23T07:07:16.767601933Z Please use the NLTK Downloader to obtain the resource:
"
On the local server, we downloaded it and used it from a desktop path. How do we resolve this on the evaluation cluster?

We need wordnet and stopwords.

maruthi0506 · December 23, 2019, 8:39am

Tried with nltk.download(XXX).

But we get the same error. Submission: #31960

shivam · December 23, 2019, 11:21am

Hi @maruthi0506,

Yes, going forward the row_id values (e.g. 5, 6, 7) in the test data provided to you on the workspace can be anything, say 1000123, 1001010, 100001 (and in random order), in the test data present on the server, so that we know predictions are actually being carried out during evaluation.

To use nltk for the evaluation, you need to provide the nltk_data folder in your repository root, which can be done as follows (current working directory: your repository root):

python -c "import nltk; nltk.download('stopwords', download_dir='./nltk_data')"
python -c "import nltk; nltk.download('wordnet', download_dir='./nltk_data')"

OR (assuming you already have it downloaded in workspace)

cp -r ~/nltk_data nltk_data

Followed by uploading it to git as:

#> git lfs install   (if not already using git lfs)
#> git lfs track "nltk_data/**"
#> git add nltk_data
#> git commit [...]
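If the code still cannot find the resources after the folder is committed, one option is to point NLTK at the folder explicitly. This is a sketch under the assumption that the evaluator does not already do this; the helper name use_repo_nltk_data is hypothetical. It relies on the NLTK_DATA environment variable, which NLTK reads when it is first imported.

```python
import os


def use_repo_nltk_data(repo_root="."):
    """Point NLTK at <repo_root>/nltk_data via the NLTK_DATA environment variable."""
    data_dir = os.path.abspath(os.path.join(repo_root, "nltk_data"))
    # NLTK reads NLTK_DATA when it is first imported,
    # so this must run before `import nltk`.
    os.environ["NLTK_DATA"] = data_dir
    return data_dir


data_dir = use_repo_nltk_data(".")
# After this point, `import nltk` followed by
# `nltk.corpus.stopwords.words("english")` should find the committed data,
# provided the nltk_data folder actually contains the stopwords corpus.
```

Calling this at the very top of the prediction script keeps the lookup independent of the home directory on the evaluation machine.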

Please let us know in case you still face any issue.

maruthi0506 · December 23, 2019, 11:47am

Hope there won’t be any disk space constraints here. Do we need to execute the git commands as-is?

Our understanding is that this nltk_data folder should be present in our git repository. Is that right?

maruthi0506 · December 23, 2019, 11:50am

One more clarification: even though the row_id of the test data may differ on the evaluation cluster, we hope the rest of the content is the same as in the shared path testing_data_full.

shivam · December 23, 2019, 11:52am

Yes, just the nltk_data folder needs to be present.

Yes, the content remains the same.

maruthi0506 · December 23, 2019, 12:01pm

git: It says ‘lfs’ is not a git command. Do we need to do anything specific?

shivam · December 23, 2019, 12:10pm

Hi, looks like git-lfs isn’t installed on your system.

Can you try sudo apt-get install git-lfs?

maruthi0506 · December 23, 2019, 12:11pm

For now, it looks like I was able to upload without lfs. Let me check whether that works and then come back.

maruthi0506 · December 23, 2019, 12:22pm

Submission #31962 failed. Could you please trace it?

We shall try to add logging in the next versions until it is stabilized.

shivam · December 23, 2019, 12:46pm

Hi, it is getting killed while running, without a traceback.

Does it have any high RAM/CPU requirement?

maruthi0506 · December 23, 2019, 12:48pm

No. It runs fine on the local server and we were able to produce the output too. Does the Docker container have any memory limit?

shivam · December 23, 2019, 12:54pm

The solution has 8GB RAM available.

Edit: out of the 8GB, ~5.5GB is available for the evaluation code.
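A rough way to see how close a run gets to that limit is to log peak memory at a few points in the code. The sketch below uses the standard resource module (Unix only); the function names and the step labels are made up for illustration.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident memory of this process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(step):
    """Write the peak memory so far to stderr, tagged with a step name."""
    message = f"[memory] {step}: peak RSS {peak_rss_mb():.1f} MB"
    # Flush immediately so the line survives if the process is killed later.
    print(message, file=sys.stderr, flush=True)
    return message


log_memory("start")
data = [0] * 5_000_000  # stand-in for a memory-heavy step
log_memory("after allocation")
```

Because an OOM kill leaves no traceback, the last line logged before the process disappears gives a hint about which step was running.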

maruthi0506 · December 23, 2019, 2:02pm

Any suggestions on how to trace the error? We are stuck at the moment.

shivam · December 23, 2019, 11:50pm

Hi @maruthi0506,

I can confirm that the recent submissions failed due to an OOM kill when their memory usage reached ~5.5G.

Upon debugging #31962, I found it is happening because of Series.str.get_dummies used in the code, which is not a memory-optimised function.
Point at which the OOM is happening: https://gitlab.aicrowd.com/maruthi0506/dsai-challenge-solution/blob/master/predict.py#L279

The following demonstrates what is happening in your submission, along with alternatives you can use (the variable name has been changed so that no information about the features used becomes public):

(suggested way #1, decently memory efficient)
>>> something_stack_2 = pd.get_dummies(something_stack)
>>> something_stack_2.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 38981 entries, (0, 0) to (8690, 2)
Columns: 4589 entries,  to <removed>
dtypes: uint8(4589)
memory usage: 170.7 MB

(suggested way #2, most memory efficient, slower than #1)
>>> something_stack_2 = pd.get_dummies(something_stack, sparse=True)
>>> something_stack_2.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 38981 entries, (0, 0) to (8690, 2)
Columns: 4589 entries,  to <removed>
dtypes: Sparse[uint8, 0](4589)
memory usage: 304.8 KB

(what your submission is doing -- ~5G was available at this time)
>>> something_stack_2 = something_stack.str.get_dummies()
Killed

NOTE: The only difference between the two approaches is that Series.str.get_dummies uses “|” as the separator by default. In case you were relying on that, you can do something like the below:

>>> pd.get_dummies(pd.Series(np.concatenate(something_stack.str.split('|'))))
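Note that the one-liner above stacks all tokens into one long Series, so its output appears to have one row per token rather than one row per original entry. If the per-entry layout of Series.str.get_dummies is needed, a sketch along the following lines may help. The sample data is made up, it assumes a single-level index (a MultiIndex would need all of its levels in the groupby), and it requires pandas 0.25 or later for Series.explode.

```python
import pandas as pd

# Hypothetical data: each entry holds "|"-separated tags.
something_stack = pd.Series(["a|b", "b|c", "a"])

# What Series.str.get_dummies produces: one row per entry, one column per tag.
expected = something_stack.str.get_dummies()

# Split on "|" and give each tag its own row (the original index is repeated),
# one-hot encode the tags, then collapse back to one row per original entry.
tokens = something_stack.str.split("|").explode()
result = pd.get_dummies(tokens).groupby(level=0).max().astype("uint8")
```

On this small input, result has the same rows, columns and values as expected. Whether it stays within the memory limit on the real data has not been measured here.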

Let us know in case the problem continues after changing this (here and its usage anywhere else in your codebase); we will be happy to debug further accordingly.

References:
[1]: https://github.com/pandas-dev/pandas/issues/19618
[2]: https://stackoverflow.com/a/31324037

maruthi0506 · December 24, 2019, 3:42am

Thanks Shivam.

That makes sense and we shall work on it.

maruthi0506 · December 24, 2019, 5:13am

Hi Shivam,

Is there any way for users to see the cause of an error by themselves? It feels odd to post here every single time to learn the reason for an error, without any clue from a traceback.