Hi, I am still seeing failures as before: the run fails at the ranker evaluation step and then no logs are shown. Is this happening to anyone else? Maybe it is an error that only occurs now for certain evaluations?
For example, see #204968, which I believe was resubmitted from the host side after the fix.
We’re not providing logs apart from the validation runs, since those could be used to leak data.
The previous issues we had were with fetching valid logs, and that has been fixed.
Your run probably failed because some value that is not present in the test dataset is hardcoded. If you still want the logs, please make a new submission and I can get them for you separately.
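For illustration, the kind of pattern I mean looks roughly like this (the names below are made up, not anyone’s actual code):

```python
# Made-up illustration of the failure mode: a lookup built from the
# validation data that is missing keys which only appear in the test set.
scores = {"val_query_123": 0.7}  # hardcoded from local/validation data

def rank(query_id: str) -> float:
    # Runs fine locally, but raises KeyError on the evaluation server
    # when query_id is an ID that only exists in the hidden test set.
    return scores[query_id]
```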
Thank you, Dipam. Sorry for the confusion. I am resubmitting and will let you know, but in any case it is strange, because the new submission is essentially an old one (which ran successfully and, as far as I can tell, does not hardcode anything) with different model weights.
It has failed again. @dipam, note that it does not even show logs for the validation parts; it does not show any logs at all. It would also be nice to know what is going on with the ranker.
Hi everyone,
I also experienced the same issues as @felipe_b and @rein20:
When testing locally with local_evaluation.py everything seemed fine, but the submission fails at inference and no logs are available. @dipam, please have a look in case there is indeed some issue with the ranker.
Thank you!
I figured it out without looking at the logs; in my case it was a CUDA OOM error.
Decreasing GPU memory usage led to a successful submission. (Note that the server has a T4 with 16 GB, which may be different from other competitions.)
Not being able to look at the logs cost me two submissions on the last day, though.
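For reference, the memory reduction was roughly along these lines (a rough sketch with placeholder names, not my exact submission code; it assumes a transformers ranker with a single-logit head):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "my-ranker-checkpoint"  # placeholder, not the actual checkpoint

# Half precision roughly halves the weight memory so the ranker fits on the T4 (16 GB).
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16
).to("cuda").eval()

BATCH_SIZE = 8  # smaller batches keep peak activation memory down

@torch.no_grad()
def score(texts):
    """Score query-passage strings in small batches to stay within GPU memory."""
    out = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = tokenizer(texts[i:i + BATCH_SIZE], padding=True, truncation=True,
                          return_tensors="pt").to("cuda")
        out.extend(model(**batch).logits.squeeze(-1).float().tolist())
    return out
```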
Thank you. In my case it fails at the clariq ranker and does not provide logs from any of the steps. I have run very similar code with weights of the same size before successfully, so, all packages being equal, it should not be OOM.
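In case it helps anyone rule out OOM locally, something like the snippet below can be wrapped around the local run (run_my_ranker is a placeholder for whatever your local_evaluation.py entry point calls):

```python
import torch

def report_peak_gpu_memory(fn, *args, **kwargs):
    """Run fn once and print peak GPU memory, to compare against the server's 16 GB T4."""
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
    print(f"peak GPU memory: {peak_gb:.2f} GiB of {total_gb:.2f} GiB")
    return result

# Example (placeholder function name):
# report_peak_gpu_memory(run_my_ranker, queries)
```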