One more clarification: even though the row_id of the test data may differ in the evaluation cluster, we hope the rest of the content is intact, as made available at the shared path testing_data_full.
Yes, just the nltk_data folder needs to be present.
Yes, the content remains the same.
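If you want to double-check this on your side, here is a rough sketch that compares everything except row_id (the file name, CSV format, and paths are assumptions for illustration); it should return True if only row_id differs:
>>> import pandas as pd
>>> local = pd.read_csv('testing_data_full.csv').drop(columns=['row_id'])
>>> shared = pd.read_csv('/path/to/shared/testing_data_full.csv').drop(columns=['row_id'])
>>> local.reset_index(drop=True).equals(shared.reset_index(drop=True))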
git: It says ‘lfs’ is not a git command. Do we need to do anything specific?
Hi, it looks like git-lfs isn’t installed on your system.
Can you try sudo apt-get install git-lfs?
For now, it looks like I was able to upload without LFS. Let me check whether that works and then come back.
Submission #31962 failed. Could you please trace it?
We shall try to add logging in upcoming versions until this is stabilized.
Hi, it is getting killed during the run, without any traceback.
Does it have any high RAM/CPU requirement?
No. It is running fine on our local server and we were able to produce the output too. Does the Docker container have any memory limit?
The solution has 8GB RAM available.
Edit: out of the 8GB, ~5.5GB is available for the evaluation code.
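If you want to reproduce this locally before submitting, here is a rough sketch that caps the process address space at ~5.5GB, so an over-allocation surfaces as a MemoryError on your machine instead of a silent kill (Linux-only; the exact limiting mechanism on the cluster may differ):
>>> import resource
>>> cap = int(5.5 * 1024 ** 3)  # approximate evaluator budget, in bytes
>>> resource.setrlimit(resource.RLIMIT_AS, (cap, cap))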
Any suggestion as to how to trace the error? We are stuck at the moment.
Hi @maruthi0506,
I can confirm the recent submissions failed due to an OOM kill when they reached ~5.5GB memory usage.
Upon debugging #31962, I found it is happening due to Series.str.get_dummies used in the code, which is not a memory-optimised function.
Point at which OOM is happening: https://gitlab.aicrowd.com/maruthi0506/dsai-challenge-solution/blob/master/predict.py#L279
The following demonstrates what is happening in your submission, along with alternatives you can use (variable name changed to avoid making public any information about the feature used):
(suggested way #1, decently memory efficient)
>>> something_stack_2 = pd.get_dummies(something_stack)
>>> something_stack_2.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 38981 entries, (0, 0) to (8690, 2)
Columns: 4589 entries, to <removed>
dtypes: uint8(4589)
memory usage: 170.7 MB
(suggested way #2, most memory efficient, slower than #1)
>>> something_stack_2 = pd.get_dummies(something_stack, sparse=True)
>>> something_stack_2.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 38981 entries, (0, 0) to (8690, 2)
Columns: 4589 entries, to <removed>
dtypes: Sparse[uint8, 0](4589)
memory usage: 304.8 KB
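As a side note, if any downstream step prefers a SciPy matrix over a pandas frame, the sparse frame converts cheaply (a sketch, assuming scipy is installed and a scikit-learn-style consumer):
>>> coo = something_stack_2.sparse.to_coo()  # SciPy COO matrix
>>> csr = coo.tocsr()  # CSR form, accepted by most estimators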
(what your submission is doing; ~5GB was available at this point)
>>> something_stack_2 = something_stack.str.get_dummies()
Killed
NOTE: The only difference between the two approaches is that Series.str.get_dummies uses “|” as the separator by default. In case you were relying on it, you can do something like below:
>>> pd.get_dummies(pd.Series(np.concatenate(something_stack.str.split('|'))))
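For completeness, here is a minimal self-contained sketch with toy tokens (the values are made up) showing that an explode-based pd.get_dummies reproduces the same row-aligned indicator matrix as Series.str.get_dummies:
>>> import pandas as pd
>>> s = pd.Series(['a|b', 'b|c', 'a'])  # toy stand-in for something_stack
>>> dense = s.str.get_dummies()  # memory-hungry baseline
>>> onehot = pd.get_dummies(s.str.split('|').explode()).groupby(level=0).max()
>>> dense.astype(bool).equals(onehot.astype(bool))
True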
Let us know in case the problem continues after changing this (here and its usage anywhere else in your codebase); we will be happy to debug further accordingly.
References:
[1]: https://github.com/pandas-dev/pandas/issues/19618
[2]: https://stackoverflow.com/a/31324037
Thanks Shivam.
That makes sense and we shall work on it.
Hi Shivam,
Is there any way for users to see the cause of the error by themselves? It feels odd to have to post here to learn the reason for an error every single time, without any clue from a traceback.
I see the reply below:
2019-12-24T05:03:59.99079973Z sed: -e expression #1, char 55: unterminated `s' command
2019-12-24T05:03:59.991930842Z sed: -e expression #1, char 56: unterminated `s' command
2019-12-24T05:03:59.992293646Z bash: -c: line 4: unexpected EOF while looking for matching `''
It is not clear where the error is coming from.
I got the same error message.
2019-12-24T05:33:09.51635818Z sed: -e expression #1, char 55: unterminated `s' command
2019-12-24T05:33:09.517464192Z sed: -e expression #1, char 56: unterminated `s' command
2019-12-24T05:33:09.517708095Z bash: -c: line 4: unexpected EOF while looking for matching `''
Debug mode didn’t help; I get the same error message.
I think it’s some shell-scripting issue on the evaluation side. Someone has to look into this if it’s a common error.
Sorry for the sed issue, we were trying to apply an automated patch to user code for row_id, which went wrong. I have undone this and requeued all the submissions affected by it.
Thanks Shivam for the prompt response. We see that it is in progress now.
Hi Shivam,
Our submission #57763 is failing. Could you please let us know the reason?