Starter‑Kit Update – ImageLoader helper for full‑image retrieval

We have added a small utility that makes it painless—and safe—to fetch complete images for any result returned by the cragmm-search-pipeline.

Pull the latest version of the starter‑kit to pick up the changes described below.

Note: We have also released a companion utility that fetches full web pages, announced separately on the forum.

1. What changed?

| File | Purpose |
| --- | --- |
| `crag_image_loader.py` | New helper class. Handles on-demand download and local caching under `~/.cache/crag/image_cache`. |
| `docs/search_api.md` | New "Fetching the images" section with a copy-paste example. |

No existing APIs are removed or renamed.

2. Quick‑start

```python
from crag_image_loader import ImageLoader
import matplotlib.pyplot as plt

image_url = "https://upload.wikimedia.org/wikipedia/commons/b/b2/The_Beekman_tower_1_%286214362763%29.jpg"
image = ImageLoader(image_url).get_image()  # fetch (and cache) the query image

# search_pipeline is the cragmm-search-pipeline instance from the starter kit
results = search_pipeline(image, k=2)
print(f"Image search results for: '{image_url}'\n")

for result in results:
    print(result)
    plt.imshow(ImageLoader(result["url"]).get_image())  # display each hit
    plt.show()
    print()
```

That’s all—you do not need to call requests yourself.

3. How the helper works

  • First run (local development)

    • Downloads the image at the given URL and stores it in the cache directory
      ~/.cache/crag/image_cache (override with the CRAG_IMAGE_CACHE_DIR environment variable).
  • Subsequent runs

    • Reads straight from the cache, with zero network overhead.
  • Evaluation phase

    • The results of the get_image() method are pre-populated in the evaluation container, so no download attempt is made and your code stays identical.

4. Guidelines & caveats

  1. Use only the URLs returned by cragmm-search-pipeline and those in the dataset.
    Trying to fetch other sites will fail during evaluation (outbound internet is disabled).

  2. The local cache size is entirely up to you; clean or relocate it if needed.

  3. Some image URLs in the dataset and search indices are broken. In local runs, downloading from them raises 403 errors, which you are expected to handle yourself. In the online evaluator, however, these URLs return a blank image, so no special handling is needed there.

5. Action required

  • git pull (or re‑clone/re-fork) the starter‑kit.

  • When you want to load an image, wrap its URL with ImageLoader and fetch it with .get_image().

  • Run your usual tests to confirm everything works as expected.

If you hit any issues, open a thread in the forum and tag the organisers—happy hacking!
