📢 Update of the search API (v0.4.0) and indices (v0.4)

yilun_jin8 · April 25, 2025, 3:20am

Dear participants,

We are happy to pass along the info from the Meta team in charge of the search API and the search indices. A new version of the search API has been released:

cragmm-search-pipeline==0.4.0
web search and image search indices v0.4.

Please use the latest versions of the indices to develop your solutions.

Major changes include:

Upgraded web search API: similarity search now runs at the web chunk level, whereas the previous version ran at the page level. Page URLs are deduplicated during retrieval. This should lead to better retrieval quality.
Downloading the index from the Hugging Face repo by tag is now enabled. The previous index is tagged v0.3; the current index is v0.4 / main. We recommend always loading from main.
num_threads is reset to os.cpu_count() when loading the web/image search index. This should resolve the instability seen when using cragmm-search-pipeline on nodes with < 10 vCPUs.
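
As a rough illustration of the first and third changes, here is a minimal Python sketch. It is not the code inside cragmm-search-pipeline. The helper names (dedupe_by_page_url, default_num_threads) and the hit fields (page_url, chunk, score) are hypothetical and chosen only for this example. It assumes chunk-level hits are collapsed to the best-scoring chunk per page URL, and that the thread count falls back to 1 when os.cpu_count() returns None.

```python
import os


def dedupe_by_page_url(chunk_hits, k):
    """Keep the best-scoring chunk per page URL, then return the top k.

    Hypothetical helper: the field names below are assumptions for
    illustration, not the result schema of the real search API.
    """
    best = {}
    for hit in chunk_hits:
        url = hit["page_url"]
        # Retain only the highest-scoring chunk seen so far for this page.
        if url not in best or hit["score"] > best[url]["score"]:
            best[url] = hit
    # Rank the surviving chunks by score, highest first.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:k]


def default_num_threads():
    """Use every visible CPU; os.cpu_count() may return None, so fall back to 1."""
    return os.cpu_count() or 1


# Made-up chunk-level hits: two chunks come from the same page.
hits = [
    {"page_url": "https://example.com/a", "chunk": "a-1", "score": 0.91},
    {"page_url": "https://example.com/a", "chunk": "a-2", "score": 0.88},
    {"page_url": "https://example.com/b", "chunk": "b-1", "score": 0.80},
]
top = dedupe_by_page_url(hits, k=2)
for hit in top:
    print(hit["page_url"], hit["chunk"], hit["score"])
print("threads:", default_num_threads())
```

With deduplication, the two returned results come from different pages, so a second chunk from the same page does not take up a slot in the top k.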

yikuan_xia · April 27, 2025, 5:36pm

Can we access the retrieved image through the image_url on the evaluation server?

jyotish · April 27, 2025, 6:16pm

The evaluation server will have the image cache pre-populated, so calls from the cragmm_search package should work as expected. Internet access is disabled during evaluations.

yikuan_xia · April 27, 2025, 6:25pm

Which attribute should we access to get the retrieved image? Is there an example of how we can access the image? (Not the image_url, but the raw image in the retrieval results.)

yikuan_xia · April 28, 2025, 3:49pm

Can you clarify whether we can use the image itself from the image retrieval output, or only the text descriptions of the retrieved images? If we can use the image, how can we access it during evaluation? This is crucial because it determines the input information available to the RAG system.

Tarou · April 29, 2025, 4:50am

Thanks for the update. Am I right in understanding that we must use this search library and that participants should not modify the search package? Thanks.