Help! Older submissions say 'failed' and 'Graded Successfully' at the same time; new one says Gym server stopped

Hi! My past submissions show as ‘failed’, but the Message says ‘Graded Successfully! …’

Also, my latest submission says “Gym server stopped unexpectedly. Please contact the admins. Check the submission page for more details.”

Could you please help?

Please ignore my previous post; I was able to get past that. My recent submission shows ‘Evaluation failed with exit code 1’, with the following logs:

Could you please help me understand why it errored out?

```
2025-05-13 02:30:34.239 [rank0]:
Traceback (most recent call last):
  File "/aicrowd-source/launcher.py", line 211, in <module>
    raise exc
  File "/aicrowd-source/launcher.py", line 196, in <module>
    main()
  File "/aicrowd-source/launcher.py", line 189, in main
    serve()
  File "/aicrowd-source/launcher.py", line 176, in serve
    oracle_client.run_agent()
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 195, in run_agent
    raw_response, status, message = self.process_request(
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 101, in process_request
    "data": self.route_agent_request(
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 131, in route_agent_request
    return self.execute(target_attribute, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/aicrowd_gym/clients/base_oracle_client.py", line 144, in execute
    return method(*args, **kwargs)
  File "/aicrowd-source/launcher.py", line 124, in batch_generate_response
    return run_with_timeout(
  File "/aicrowd-source/launcher.py", line 159, in run_with_timeout
    return fn(*args, **kwargs)
  File "/aicrowd-source/agents/vanilla_llama_vision_agent_ln.py", line 274, in batch_generate_response
    outputs = self.llm.generate(
  File "/usr/local/lib/python3.10/site-packages/vllm/utils.py", line 1134, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 470, in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1409, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1431, in step
    outputs = self.model_executor.execute_model(
  File "/usr/local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 140, in execute_model
    output = self.collective_rpc("execute_model",
  File "/usr/local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 420, in execute_model
    output = self.model_runner.execute_model(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/worker/enc_dec_model_runner.py", line 198, in execute_model
    logits = self.model.compute_logits(hidden_or_intermediate_states,
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/models/mllama.py", line 1221, in compute_logits
    logits = self.logits_processor(self.language_model.lm_head,
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py", line 83, in forward
    logits = _apply_logits_processors(logits, sampling_metadata)
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py", line 170, in _apply_logits_processors
    _apply_logits_processors_single_seq(
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py", line 195, in _apply_logits_processors_single_seq
    logits_row = logits_processor(past_tokens_ids, logits_row)
  File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/guided_decoding/xgrammar_decoding.py", line 395, in __call__
    xgr.apply_token_bitmask_inplace(
  File "/usr/local/lib/python3.10/site-packages/xgrammar/matcher.py", line 146, in apply_token_bitmask_inplace
    apply_token_bitmask_inplace_triton(logits, bitmask, vocab_size, indices)
  File "/usr/local/lib/python3.10/site-packages/xgrammar/kernels/apply_token_bitmask_inplace_triton.py", line 106, in apply_token_bitmask_inplace_triton
    apply_token_bitmask_inplace_kernel[grid](
  File "/usr/local/lib/python3.10/site-packages/triton/runtime/jit.py", line 330, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/triton/runtime/jit.py", line 568, in run
    device = driver.active.get_current_device()
  File "/usr/local/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/usr/local/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/usr/local/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
  File "/usr/local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 450, in __init__
    self.utils = CudaUtils()  # TODO: make static
  File "/usr/local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
  File "/usr/local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
    so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
  File "/usr/local/lib/python3.10/site-packages/triton/runtime/build.py", line 32, in _build
    raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
```

This looks like an error caused by Triton (the GPU kernel compiler that vLLM and xgrammar use for guided decoding): Triton compiles a small CUDA driver stub at runtime with a host C compiler, and the evaluation image apparently has no C compiler on the PATH, so the kernel launch fails before any grading happens.
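In case it helps anyone hitting the same trace: the error message itself says the fix is to make a C compiler visible via the `CC` environment variable (or to install one, e.g. `gcc`, in the submission image). Below is a small sketch of a helper one could call before importing vLLM/Triton. The function name `ensure_c_compiler` and the candidate list are my own choices, not part of any of these libraries.

```python
import os
import shutil

def ensure_c_compiler(env=None):
    """Best-effort sketch: make sure Triton's runtime build can find a C compiler.

    Triton's _build() shells out to a host C compiler to build its CUDA driver
    stub; without one it raises the RuntimeError seen in the logs above.
    Returns the compiler path it settled on, or None if nothing was found.
    """
    env = os.environ if env is None else env
    if env.get("CC"):  # a compiler is already pinned; leave it alone
        return env["CC"]
    for candidate in ("gcc", "cc", "clang"):
        path = shutil.which(candidate)
        if path:
            env["CC"] = path  # exactly what the error message asks for
            return path
    return None  # nothing found: install gcc in the image (e.g. apt-get install gcc)
```

If this returns `None` at container start, the image needs a compiler baked in; setting `CC` alone cannot conjure one.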