We are happy to pass along the info from the Meta team in charge of the search API and the search indices. A new version of the search API has been released:
- cragmm-search-pipeline==0.4.0
- web search and image search indices v0.4
Please use the latest versions of the indices to develop your solutions.
Major changes include:
- Upgraded web search API: similarity search now runs at the web chunk level, whereas the previous version ran at the page level. Page URLs are deduped during retrieval. This should lead to better retrieval quality.
- Enabled downloading the index from the Hugging Face repo by tag. The previous index was tagged v0.3; the current index is v0.4 / main. We recommend always loading from main.
- Reset num_threads to os.cpu_count() when loading the web/image search index. This should resolve the instability when using cragmm-search-pipeline on nodes with fewer than 10 vCPUs.
The evaluation server will have the image cache pre-populated, so calls from the cragmm_search package should work as expected. Internet access is disabled during evaluations.
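For anyone wondering what chunk-level retrieval with page URL dedup means in practice, here is a minimal sketch. It is not the cragmm_search implementation: the `ChunkHit` record, the `dedupe_by_page_url` helper, and the "keep the best-scoring chunk per page" rule are all assumptions made for illustration, since the announcement only states that page URLs are deduped during retrieval.

```python
from dataclasses import dataclass


@dataclass
class ChunkHit:
    """Hypothetical result record; NOT the cragmm_search result schema."""
    page_url: str
    chunk_text: str
    score: float  # higher means more similar to the query


def dedupe_by_page_url(hits, k):
    """Keep the best-scoring chunk per page URL, then return the top k.

    Assumption: 'best chunk wins' is one plausible dedup rule; the real
    pipeline may use a different one.
    """
    best = {}
    for hit in hits:
        current = best.get(hit.page_url)
        if current is None or hit.score > current.score:
            best[hit.page_url] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:k]


# Two chunks come from the same page, so only the stronger one survives.
hits = [
    ChunkHit("https://example.com/a", "chunk a1", 0.91),
    ChunkHit("https://example.com/a", "chunk a2", 0.75),
    ChunkHit("https://example.com/b", "chunk b1", 0.83),
]
top = dedupe_by_page_url(hits, k=2)
```

With page-level search, a long page is scored as a whole; scoring chunks and then collapsing by URL lets a single relevant passage surface its page without the same page filling several result slots.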
Which attribute should we access to get the retrieved image? Is there an example of how we can access the image? (Not the image_url, but the raw image in the retrieval results.)
Can you clarify whether we can use the image itself from the image retrieval output, or only the text descriptions of the retrieved images? If we can use the image, how can we access it during evaluation? This is crucial because it determines what input information is available to the RAG system.