Why Did Submission 285226 Fail?

Hello admins, can I have more information about why submission 285226 failed? The submission page says “Evaluation failed with exit code 1” and “Logs are not available.” Progress had reached 85% of generating predictions before the error. Thank you.

Also, can I have more information about why my submission 285303 failed? I submitted to task 3 and received “Evaluation failed with exit code 1” and “Logs are not available.” Progress had reached 13% of generating predictions before the error. Thank you.

Submission 285226 failed because it timed out while generating predictions.

Log excerpt from 2025-05-18 10:31:16 (rank 0):

Traceback (most recent call last):
  File "/aicrowd-source/launcher.py", line 213, in <module>
    raise exc
  File "/aicrowd-source/launcher.py", line 198, in <module>
    main()
  File "/aicrowd-source/launcher.py", line 191, in main
    serve()
  File "/aicrowd-source/launcher.py", line 178, in serve
    oracle_client.run_agent()
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 195, in run_agent
    raw_response, status, message = self.process_request(
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 101, in process_request
    "data": self.route_agent_request(
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 131, in route_agent_request
    return self.execute(target_attribute, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 144, in execute
    return method(*args, **kwargs)
  File "/aicrowd-source/launcher.py", line 125, in batch_generate_response
    return run_with_timeout(
  File "/aicrowd-source/launcher.py", line 161, in run_with_timeout
    return fn(*args, **kwargs)
  File "/aicrowd-source/agents/rag_agent_61.py", line 589, in batch_generate_response
    visual_summaries = self.batch_visual_summaries(images)
  File "/aicrowd-source/agents/rag_agent_61.py", line 514, in batch_visual_summaries
    outputs = self.llm.generate(
  File "/usr/local/lib/python3.10/site-packages/vllm/utils.py", line 1134, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 470, in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1409, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1431, in step
    outputs = self.model_executor.execute_model(
  File "/usr/local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 140, in execute_model
    output = self.collective_rpc("execute_model",
  File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 420, in execute_model
    output = self.model_runner.execute_model(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/worker/enc_dec_model_runner.py", line 188, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/models/mllama.py", line 1469, in forward
    self.get_full_text_row_masked_out_mask(
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/models/mllama.py", line 1426, in get_full_text_row_masked_out_mask
    full_text_row_masked_out_mask = full_text_row_masked_out_mask.to(
  File "/aicrowd-source/launcher.py", line 149, in _timeout_handler
    raise TimeoutError("Operation timed out")
TimeoutError: Operation timed out
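The run_with_timeout and _timeout_handler frames show the evaluation launcher wrapping the agent’s batch_generate_response call in a timeout. As a rough illustration of how such a signal-based wrapper behaves (the 600-second limit and the exact structure below are assumptions for illustration, not the actual launcher.py code):

import signal

# Minimal sketch of a signal-based timeout wrapper resembling the
# run_with_timeout / _timeout_handler frames above. The 600-second
# limit is illustrative only.

def _timeout_handler(signum, frame):
    # SIGALRM interrupts whatever frame happens to be executing
    # (here: a vLLM forward pass), so the TimeoutError surfaces
    # from deep inside model code.
    raise TimeoutError("Operation timed out")

def run_with_timeout(fn, *args, timeout_s=600, **kwargs):
    signal.signal(signal.SIGALRM, _timeout_handler)
    signal.alarm(timeout_s)      # arm the alarm before calling the agent
    try:
        return fn(*args, **kwargs)
    finally:
        signal.alarm(0)          # always disarm afterwards

Because the alarm fires asynchronously, the traceback records the frame that was interrupted (inside vLLM’s mllama model) followed by the handler that raised the error, even though the root cause is simply that the batch took longer than the allowed time.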

Submission 285303 also failed because it triggered a timeout.

Log excerpt from 2025-05-18 18:54:38 (rank 0):

Traceback (most recent call last):
  File "/aicrowd-source/launcher.py", line 213, in <module>
    raise exc
  File "/aicrowd-source/launcher.py", line 198, in <module>
    main()
  File "/aicrowd-source/launcher.py", line 191, in main
    serve()
  File "/aicrowd-source/launcher.py", line 178, in serve
    oracle_client.run_agent()
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 195, in run_agent
    raw_response, status, message = self.process_request(
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 101, in process_request
    "data": self.route_agent_request(
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 131, in route_agent_request
    return self.execute(target_attribute, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 144, in execute
    return method(*args, **kwargs)
  File "/aicrowd-source/launcher.py", line 125, in batch_generate_response
    return run_with_timeout(
  File "/aicrowd-source/launcher.py", line 161, in run_with_timeout
    return fn(*args, **kwargs)
  File "/aicrowd-source/agents/rag_agent_63.py", line 632, in batch_generate_response
    features2 = self.verify_answer_with_vllm(image, queries[i], rag_context, responses[i], message_histories[i], post='rag')
  File "/aicrowd-source/agents/rag_agent_63.py", line 219, in verify_answer_with_vllm
    output = self.llm.generate(
  File "/usr/local/lib/python3.10/site-packages/vllm/utils.py", line 1134, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 470, in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1409, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1431, in step
    outputs = self.model_executor.execute_model(
  File "/usr/local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 140, in execute_model
    output = self.collective_rpc("execute_model",
  File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 420, in execute_model
    output = self.model_runner.execute_model(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/worker/enc_dec_model_runner.py", line 208, in execute_model
    output: SamplerOutput = self.model.sample(
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/models/mllama.py", line 1230, in sample
    next_tokens = self.sampler(logits, sampling_metadata)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 287, in forward
    maybe_deferred_sample_results, maybe_sampled_tokens_tensor = _sample(
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 775, in _sample
    return _sample_with_torch(
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 744, in _sample_with_torch
    return get_pythonized_sample_results(
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 614, in get_pythonized_sample_results
    sample_results = _greedy_sample(seq_groups, greedy_samples)
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 449, in _greedy_sample
    samples_lst = samples.tolist()
  File "/aicrowd-source/launcher.py", line 149, in _timeout_handler
    raise TimeoutError("Operation timed out")
TimeoutError: Operation timed out

Thank you for the info, @yilun_jin8. Can you explain how the following rules are enforced when our agents use batch prediction?

  • 10-second timeout after the first token is generated.
  • Only answer texts generated within 30 seconds are considered.

For example, how do you monitor individual answer times against the 30-second limit when an agent script uses batch_size = 8? Is the actual rule:

  • Only ANSWER BATCHES generated within “30 x batchsize” seconds are considered.

Also, can you provide more detail about the “10-second timeout”?

(The same code that triggered a timeout once succeeded when I submitted it a second time, and on the second run each answer averaged 6 seconds (52 seconds per batch of 8). So I’m trying to understand what exactly the competition rule is and what exactly happened the first time I submitted. Is it possible that the AIcrowd server behaves differently at different times, or could it just be the randomness of text generation?)
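For reference, the per-answer numbers above come from timing the whole batched generate call and dividing by the batch size, roughly like the sketch below (the model name, prompts, and sampling parameters are placeholders, not my actual agent configuration):

import time
from vllm import LLM, SamplingParams

# Sketch of how I estimate per-answer latency for a batched call.
# Model name, prompts, and sampling parameters are placeholders.
llm = LLM(model="meta-llama/Llama-3.2-11B-Vision-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=75)
prompts = [f"question {i}" for i in range(8)]   # batch_size = 8

start = time.perf_counter()
outputs = llm.generate(prompts, params)         # one wall-clock number for the whole batch
elapsed = time.perf_counter() - start

# Batched decoding only gives a single wall-clock measurement for all
# 8 answers, so "per-answer time" here is just the batch average.
print(f"batch: {elapsed:.1f}s, per answer: {elapsed / len(prompts):.1f}s")

Since only the batch total is observable from inside the agent, it isn’t obvious to me how a per-answer 30-second limit would be checked, which is the crux of my question.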

Thank you!