Image and web search API updates and feedback

Hi participants,

Please note that the corrupted web search index for the validation set has been fixed. In addition, we added documents to the validation and public-test corpora to improve retrieval recall.

No code change is needed if you’re pulling from main. If you pinned a tag version in web_hf_dataset_tag, please update it from v0.5 to v0.6.
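For reference, a minimal sketch of the tag update, assuming the tag is set as a plain string in your configuration (where exactly it lives depends on your setup):

web_hf_dataset_tag = "v0.6"  # previously "v0.5"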

See release notes in the organization card: crag-mm-2025 (CRAG - MM Challenge 2025)

Feel free to leave any questions or feedback regarding the image and web search APIs here. We appreciate your participation and understanding!

FYI, currently we cannot use image RAG for non-egocentric images, because all non-egocentric images are resized to 960 x 1280 regardless of their original aspect ratio.

If an image is originally sized 2048 x 1024, for example, it gets resized to 960 x 1280. So shapes that were originally wide and short become narrow and tall, and image RAG then returns narrow and tall images when we want wide and short ones.

If this is intentional, then I suggest providing the original image size so that we can correct the aspect ratio before RAG. Thanks.
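For example, if the original size were provided, we could undo the distortion with something like this (a sketch assuming PIL images; restore_aspect_ratio and its parameters are hypothetical, not part of the API):

from PIL import Image

def restore_aspect_ratio(img, orig_w, orig_h, max_side=1280):
    # Rescale the distorted 960 x 1280 image back to the original aspect
    # ratio, capping the longer side at max_side for comparable resolution.
    scale = max_side / max(orig_w, orig_h)
    new_size = (round(orig_w * scale), round(orig_h * scale))
    return img.resize(new_size, Image.BILINEAR)

# A 2048 x 1024 original squashed to 960 x 1280 comes back as 1280 x 640.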

@Chris_Deotte Thanks for flagging the issue. This is not intended; the resolution of non-egocentric images should not be changed. We will fix this soon.


Please check and update the quality of the web search index. I observed that many web results for the 0.1.2 validation set are not correct.

e.g.:

>>> results = search_pipeline("Arancini")
>>> results[2]
{'index': 'https://en.wikipedia.org/wiki/Italy, https://en.wikipedia.org/wiki/Arancini_chunk_0',
 'score': 0.5711636543273926,
 'page_name': '',
 'page_snippet': '',
 'page_url': 'https://en.wikipedia.org/wiki/Italy, https://en.wikipedia.org/wiki/Arancini'}

Why would a single search result contain two URLs joined by ','? And how can different webpages have the same score? No matter the scenario (e.g., chunking that makes the content identical), those results should be flattened into separate entries.
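As a workaround on our side, merged entries could be split like this (a sketch assuming the result format shown above):

def flatten_result(result):
    # Split an entry whose page_url holds several comma-separated URLs
    # into one entry per URL, copying the remaining fields unchanged.
    urls = [u.strip() for u in result["page_url"].split(",")]
    return [{**result, "page_url": url} for url in urls]

flat = [r for res in search_pipeline("Arancini") for r in flatten_result(res)]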

Note that images are being resized incorrectly in the latest fix to crag_batch_iterator.py here. With the recent code update in the starter kit, egocentric images are now not being resized, while non-egocentric images are being resized to 960 x 1280. In other words, the recent update has made the situation worse.
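From the earlier reply, the intended behavior is presumably the opposite. A sketch (maybe_resize and is_egocentric are hypothetical names, not taken from crag_batch_iterator.py):

def maybe_resize(image, is_egocentric, target_size=(960, 1280)):
    # Only egocentric images should be resized; non-egocentric images
    # must keep their original resolution and aspect ratio.
    return image.resize(target_size) if is_egocentric else image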

Apologies for this :bowing_man:

Updated the starter kit!

@Chris_Deotte Sorry for the trouble. Please see @jyotish's response below. It should be corrected now.
Note that this bug affected the starter kit only. Submissions use the correct setup.

The quality of the new search_pipeline for web is far worse than the previous version.
1. Why do the current public-test set and validation set have similar numbers of queries, while the corresponding web search pipelines have different numbers of entries (public-test web search ~130k, current validation web search ~70k)? This suggests that much information cannot be found in the current validation web search pipeline.
2. I randomly checked 10 entities in the ground-truth validation dataset, and 5 of them cannot be found in the search pipeline (the entities are: millipedes, Gatorade, the Suits TV show, St. Patrick's Cathedral, and Hugo Boss). With the previous round's validation pipeline and the claimed settings, we could find almost all related information. There must be a problem with the current version of the web search pipeline. Besides, the end-to-end performance of our local baseline is much worse with the current search pipeline than with the pipeline provided in phase 1. Can you check whether there is a problem with the web search index in the current version? A rough reproduction sketch follows this list.
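A minimal sketch of the check in #2 (the crude substring relevance criterion here is an assumption, not necessarily how the entities were checked above):

entities = ["millipedes", "gatorade", "suits TV show",
            "st.patrick's cathedral", "hugo boss"]

for entity in entities:
    results = search_pipeline(entity)  # validation web search pipeline
    # Crude check: does the entity's first token appear in any URL or snippet?
    token = entity.split()[0].lower()
    hit = any(token in (r["page_url"] + r["page_snippet"]).lower() for r in results)
    print(f"{entity!r}: {'found' if hit else 'missing'}")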

@yikuan_xia
For #1: the public-test index covers the search corpus for both the validation and public-test sets, while the validation index covers the search corpus for the validation set only. That is why the sizes differ and the public-test index is almost double the size of the validation index. Nevertheless, you can expect similar behavior and recall across the validation and public-test indices.

For #2: thanks for raising the concern. We will investigate the quality issue and get back to you on this.

@wufanyou Thanks for flagging. Our team will fix the corrupted validation index soon.
Do you observe any issues with the public-test index?

@yikuan_xia Our team will fix the corrupted validation index soon. The public-test index should be working as expected.

Hi. Please tell @yilun_jin8 and @jyotish to fix the leaderboard (details in the discussion here). A working leaderboard will motivate participants to work harder, and you and Meta will receive better solutions! Thank you.


Hi @Chris_Deotte, I agree with you, and I'm sorry to hear that the leaderboard is not working. The AIcrowd team is aware of this issue and is working on a fix. We will keep everyone posted.


@Chris_Deotte The phase 2 leaderboard is back for all tasks!
