Hit a connection error, exiting silently

All of my evaluations finish successfully, but no scores are reported because of the following error.

14%|█▍ | 277/2000 [11:15<3:27:11, 7.21s/it]
14%|█▍ | 278/2000 [11:26<3:59:58, 8.36s/it]
14%|█▍ | 279/2000 [11:37<4:17:18, 8.97s/it]
14%|█▍ | 280/2000 [11:47<4:30:02, 9.42s/it]
14%|█▍ | 281/2000 [11:55<4:16:08, 8.94s/it]
14%|█▍ | 282/2000 [12:03<4:08:46, 8.69s/it]
14%|█▍ | 283/2000 [12:05<3:09:33, 6.62s/it]
14%|█▍ | 284/2000 [12:16<3:42:55, 7.79s/it]
14%|█▍ | 285/2000 [12:26<4:02:44, 8.49s/it]
14%|█▍ | 286/2000 [12:27<3:02:42, 6.40s/it]
14%|█▍ | 287/2000 [12:39<3:50:03, 8.06s/it]
14%|█▍ | 288/2000 [12:49<4:04:10, 8.56s/it]
14%|█▍ | 289/2000 [13:02<4:41:35, 9.87s/it]
14%|█▍ | 290/2000 [13:05<3:40:30, 7.74s/it]
15%|█▍ | 291/2000 [13:06<2:44:39, 5.78s/it]
15%|█▍ | 292/2000 [13:19<3:43:49, 7.86s/it]
15%|█▍ | 293/2000 [13:32<4:31:01, 9.53s/it]
15%|█▍ | 294/2000 [13:34<3:30:36, 7.41s/it]
15%|█▍ | 295/2000 [13:46<4:03:05, 8.55s/it]
15%|█▍ | 296/2000 [13:59<4:44:13, 10.01s/it]
15%|█▍ | 297/2000 [14:15<5:31:23, 11.68s/it]
2021-11-27 16:39:48.119 | INFO | aicrowd_gym.clients.zmq_client:_send_request:98 - Hit a connection error, exiting silently

15%|█▍ | 297/2000 [14:35<1:23:37, 2.95s/it]

Hello @tahir_javed_cs20d407

This happens when the evaluation times out (the current timeout is 2 hours) or when the evaluation uses too much memory.
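To check locally whether a run is anywhere near those limits, a rough sketch like the one below can time a single call and record its peak Python memory. It uses only the standard library; `profile_call` and `dummy_rollout` are hypothetical names for illustration and are not part of `aicrowd_gym` or the evaluator. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries may not show up.

```python
import time
import tracemalloc


def profile_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Collect measurements even if fn raises, then stop tracing.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def dummy_rollout(n_steps):
    # Hypothetical stand-in for one evaluation episode of your agent.
    buffer = [float(i) for i in range(n_steps)]
    return sum(buffer)


if __name__ == "__main__":
    total, seconds, peak = profile_call(dummy_rollout, 100_000)
    print(f"return={total:.0f} time={seconds:.3f}s peak={peak / 1e6:.1f} MB")
```

Replacing `dummy_rollout` with your own episode loop and multiplying the per-episode time by the number of evaluation episodes gives a first estimate of total runtime. The evaluation servers may be slower than a local machine, so treat the numbers as a lower bound.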

But on my local machine it works properly. Furthermore, this error appears in all evaluations, and none of them takes more than 1 hour.

Hello @tahir_javed_cs20d407

Can you give us a few submission IDs so that we can debug this further on our end?

165879 -> Evaluation succeeded, but I got this error in each evaluation.

165813 -> Previous submission, which was successful.

@jyotish My submission terminated almost immediately for Acrobot. Submission #165890

This shouldn’t have happened because I did not change anything for Acrobot. Can you please help here too?

Hello @tahir_javed_cs20d407 @utsav_dey_cs20s009

We updated our evaluation setup. We expect these changes to provide better stability during evaluations. We are also exposing a few more metrics from the evaluation on the GitLab issue pages that should give you better insights into how well your agents are performing.

All my evaluations have shown this error instantly, without even starting. The same code works perfectly and takes less than a minute to evaluate on my local device. Submission IDs: 166049, 166136, 166141
The error I got is:
Evaluate Rollouts: FileNotFoundError: [Errno 2] No such file or directory: '/shared/rewards_acrobot.json'

@jyotish @ashivani Please check this. We are teammates and have been struggling with this since morning.