What changes should one make to let evaluation work?

siyuzhou · July 16, 2019, 4:37am

Hi, I used the latest repo for my submissions, but the evaluation always fails. I noticed that even though I am not using PyTorch for training, I have to include it as a dependency to get past the image-building stage. aicrowd_helpers.submit() is indeed included in the original TensorFlow code, but the evaluation fails regardless in my case. What else should I do to get a successful submission? I’m sorry if this appears to be a dumb question…

mohanty · July 19, 2019, 8:05pm

@siyuzhou: Can you please point us to the relevant issue with the failed evaluation?

siyuzhou · July 19, 2019, 8:31pm

Thank you for your response. I’ve had many failed submissions. https://gitlab.aicrowd.com/siyuzhou/neurips2019-disentanglement-challenge/issues/5 is the one I was referring to. I use TensorFlow. I got a successful submission later https://gitlab.aicrowd.com/siyuzhou/neurips2019-disentanglement-challenge/issues/8. The only change I made was to invoke local_evaluation.py in run.sh. What’s causing the failure?

mohanty · July 19, 2019, 8:37pm

@siyuzhou: I pasted the whole error on the relevant issue, but the line of interest seems to be:

2019-07-15T18:30:44.306963517Z AttributeError: 'GFile' object has no attribute 'seekable'

Also, you should not add local_evaluation.py to run.sh; that would probably only cause conflicts. If you add the aicrowd_helpers.submit() call, that should trigger the actual evaluation code at our end.

The key idea is that if we trusted the included local_evaluation.py, anyone could simply modify it to register arbitrary scores. Hence the actual evaluation code runs in a separate container, which computes the score after training has finished and the mean representation has been dumped.
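For illustration, the participant-side flow described above can be sketched roughly as follows. This is only a minimal sketch, not the starter kit's actual code: `run_submission` and its three callables are hypothetical names, and the real `submit` step would be the `aicrowd_helpers.submit()` call mentioned in this thread. The point is the ordering: train, dump the representation, then signal completion, with no scoring done in the participant's container.

```python
from typing import Callable, List


def run_submission(train: Callable[[], None],
                   export_representation: Callable[[], None],
                   submit: Callable[[], None]) -> None:
    """Run the three participant-side steps in order; no scoring happens here."""
    train()                  # fit the model
    export_representation()  # dump the representation for the evaluator to read
    submit()                 # signal completion (aicrowd_helpers.submit() in the thread)


# Usage with hypothetical stand-ins that only record the call order.
calls: List[str] = []
run_submission(
    train=lambda: calls.append("train"),
    export_representation=lambda: calls.append("export"),
    submit=lambda: calls.append("submit"),
)
print(calls)
```

Because the score is computed elsewhere, calling local_evaluation.py from run.sh adds nothing to this flow.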

siyuzhou · July 19, 2019, 8:53pm

Thank you. I am using the starter kit, which has aicrowd_helpers.submit() included at the end of train_tensorflow.py. I added local_evaluation.py as a hack, in the hope that it would fix an inconsistency in the environment variables, which I suspected was the cause.

The error message you pasted makes very little sense to me… But I’ll try to look into it.

mohanty · July 19, 2019, 10:24pm

@siyuzhou: The error seems to be tied to the TensorFlow version.
I found something related: https://github.com/tensorflow/datasets/issues/127

The starter kit uses tensorflow-gpu==1.13.1. Can you confirm you are using the same version?
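One quick way to confirm this from inside the environment is a small check like the one below. It is a sketch under assumptions: it relies on `importlib.metadata` (Python 3.8+), and `matches_pin` and `installed_version` are hypothetical helper names, not part of the starter kit.

```python
from importlib import metadata  # assumption: Python 3.8 or newer


def matches_pin(installed: str, pinned: str) -> bool:
    """Return True when the installed version string equals the pinned one."""
    return installed.strip() == pinned.strip()


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    pinned = "1.13.1"  # the version named in this thread
    found = installed_version("tensorflow-gpu")
    if found is None:
        print("tensorflow-gpu is not installed in this environment")
    elif matches_pin(found, pinned):
        print(f"tensorflow-gpu {found} matches the pin")
    else:
        print(f"tensorflow-gpu {found} differs from the pinned {pinned}")
```

Running the same check inside the built image would show whether the image and the local environment agree.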

siyuzhou · July 19, 2019, 10:56pm

Yes, I installed the dependencies from the requirement.txt file in the starter kit. The exported environment.yml is here: https://gitlab.aicrowd.com/siyuzhou/neurips2019-disentanglement-challenge/blob/master/environment.yml. tensorflow-gpu==1.13.1 is installed, and it is the one from pip, not conda.

mohanty · July 21, 2019, 11:20am

@siyuzhou: Weird! Can you build your code locally and run the built image by following the instructions here: https://github.com/AIcrowd/neurips2019_disentanglement_challenge_starter_kit/blob/master/FAQ.md

It might be much faster to debug that way.

siyuzhou · July 23, 2019, 3:02am

I built the image following the instructions and ran into no problems. The run was able to submit.

siyuzhou · July 23, 2019, 5:34am

All my submissions since the successful one get this error: “Unable to listen to messages intended for the Oracle. Please contact administrators”. I wasn’t even able to reproduce the successful submission with the same settings. For example, https://gitlab.aicrowd.com/siyuzhou/neurips2019-disentanglement-challenge/issues/12, along with 3 other submissions in a row. I disabled local evaluation in the latest failed submission. What’s going on…

mohanty · July 23, 2019, 7:56am

@siyuzhou: Sorry for the inconvenience. We just pushed a fix for a bug that was potentially the reason for these errors. We also requeued your submission.