Error - crag batch iteration

We are getting this error now:

Generating responses: 0it [00:00, ?it/s]{'session_id': '3a2b69dc-7833-4c79-96a5-1d74650e2057', 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=3024x4032 at 0x78B1086D78B0>, 'image_url': None, 'image_quality': 2, 'turns': [{'interaction_id': '3a2b69dc-7833-4c79-96a5-1d74650e2057', 'domain': 6, 'query_category': 0, 'dynamism': 0, 'query': 'what is the cost of this scooter?'}], 'answers': [{'interaction_id': '3a2b69dc-7833-4c79-96a5-1d74650e2057', 'ans_full': 'the vespa gts super 300 costs $7999'}]}
Generating responses: 0it [00:00, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/fer/meta-comprehensive-rag-benchmark-starter-kit/local_evaluation.py", line 576, in <module>
[rank0]: main()
[rank0]: File "/workspace/fer/meta-comprehensive-rag-benchmark-starter-kit/local_evaluation.py", line 551, in main
[rank0]: turn_evaluation_results, score_dictionaries = evaluator.evaluate_agent()
[rank0]: File "/workspace/fer/meta-comprehensive-rag-benchmark-starter-kit/local_evaluation.py", line 409, in evaluate_agent
[rank0]: self.generate_agent_responses(_generation_progress_callback)
[rank0]: File "/workspace/fer/meta-comprehensive-rag-benchmark-starter-kit/local_evaluation.py", line 220, in generate_agent_responses
[rank0]: for batch_idx, batch in enumerate(tqdm.tqdm(self.batch_iterator, desc="Generating responses", disable=not self.show_progress)):
[rank0]: File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
[rank0]: for obj in iterable:
[rank0]: File "/workspace/fer/meta-comprehensive-rag-benchmark-starter-kit/crag_batch_iterator.py", line 280, in __iter__
[rank0]: for idx in range(len(conv_data["answers"]["interaction_id"])):
[rank0]: TypeError: list indices must be integers or slices, not str
[rank0]:[W604 16:31:02.813737357 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see Distributed communication package - torch.distributed — PyTorch 2.7 documentation (function operator())

Any help?

There are two queries in the dataset without an answer. If you use the latest version of the crag batch iterator here, it will fix that.

The bad session IDs are:
SESSIONS_TO_SKIP = ["04d98259-27af-41b1-a7be-5798fd1b8e95", "695b4b5c-7c65-4f7b-8968-50fe10482a16"]
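
If you cannot update right away, a skip list like this could be applied by hand. The following is only a minimal sketch, not the starter kit's actual code: `iter_valid_sessions` and `answer_ids` are hypothetical helpers, and it assumes each conversation is a dict with a `session_id` key, as in the sample printed in the log above. The second helper reads `answers` whether it arrives as a list of dicts (as in that printed sample) or as a dict of lists (the shape the failing line indexes into).

```python
from typing import Any, Dict, Iterable, Iterator, List

# Session IDs quoted above: the two sessions without an answer.
SESSIONS_TO_SKIP = [
    "04d98259-27af-41b1-a7be-5798fd1b8e95",
    "695b4b5c-7c65-4f7b-8968-50fe10482a16",
]


def iter_valid_sessions(dataset: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield conversations whose session_id is not in the skip list (hypothetical helper)."""
    skip = set(SESSIONS_TO_SKIP)
    for conv_data in dataset:
        if conv_data.get("session_id") in skip:
            continue
        yield conv_data


def answer_ids(conv_data: Dict[str, Any]) -> List[str]:
    """Return the interaction IDs under "answers" for either layout (hypothetical helper)."""
    answers = conv_data.get("answers") or []
    if isinstance(answers, dict):
        # Dict-of-lists layout: {"interaction_id": [...], "ans_full": [...]}
        return list(answers.get("interaction_id", []))
    # List-of-dicts layout: [{"interaction_id": ..., "ans_full": ...}, ...]
    return [answer["interaction_id"] for answer in answers]
```

Wrapping the dataset in `iter_valid_sessions(...)` before batching drops the two sessions up front, and `answer_ids(conv_data)` avoids indexing a list with a string key.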


Thanks @Chris_Deotte