We have added a small utility that makes it painless (and safe) to fetch complete images for any result returned by the cragmm-search-pipeline.
Pull the latest version of the starter‑kit to pick up the changes described below.
Note: We have also released a utility that pulls full web pages. Check this out here.
1. What changed?
File | Purpose
---|---
`crag_image_loader.py` | New helper class. Handles on-demand download and local caching under `~/.cache/crag/image_cache`.
`docs/search_api.md` | New “Fetching the images” section with a copy-paste example.
No existing APIs are removed or renamed.
2. Quick‑start
```python
from crag_image_loader import ImageLoader
import matplotlib.pyplot as plt

image_url = "https://upload.wikimedia.org/wikipedia/commons/b/b2/The_Beekman_tower_1_%286214362763%29.jpg"
image = ImageLoader(image_url).get_image()  # Fetch the query image from its URL

# search_pipeline is the cragmm-search-pipeline instance set up elsewhere in the starter-kit
results = search_pipeline(image, k=2)

print(f"Image search results for: '{image_url}'\n")
for result in results:
    print(result)
    plt.imshow(ImageLoader(result['url']).get_image())  # Show the retrieved image
    plt.show()
    print('\n')
```
That’s all—you do not need to call requests yourself.
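For context, here is roughly what the helper saves you from writing by hand. This is an illustrative sketch only (it reuses `image_url` from the quick-start and omits the caching the real `ImageLoader` adds):

```python
import io

import requests
from PIL import Image

# Manual fetch without the helper: download the bytes and decode them yourself.
response = requests.get(image_url, timeout=10)
response.raise_for_status()
image = Image.open(io.BytesIO(response.content))
```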
3. How the helper works
- First run (local development)
  - Downloads the image at the URL and stores it in the cache directory `~/.cache/crag/image_cache` (override with `CRAG_IMAGE_CACHE_DIR`; see the sketch after this list).
- Subsequent runs
  - Reads straight from the cache, with zero network overhead.
- Evaluation phase
  - The results for the `get_image()` method are pre-populated in the evaluation container, so no download attempt is made and your code remains identical.
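As a minimal sketch of the local caching behaviour, the example below overrides the cache location via the environment variable. It assumes `CRAG_IMAGE_CACHE_DIR` is read when the loader resolves its cache path and that `get_image()` returns a PIL image, as the quick-start's use of `plt.imshow` suggests:

```python
import os

# Hypothetical override: point the cache at a directory with more disk space
# before the loader is used (assumes the env var is read at use time).
os.environ["CRAG_IMAGE_CACHE_DIR"] = "/data/crag_image_cache"

from crag_image_loader import ImageLoader

url = "https://upload.wikimedia.org/wikipedia/commons/b/b2/The_Beekman_tower_1_%286214362763%29.jpg"
image = ImageLoader(url).get_image()        # first call downloads and caches the file
image_again = ImageLoader(url).get_image()  # subsequent calls are served from the cache
print(image.size)                           # assuming a PIL image is returned
```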
4. Guidelines & caveats
- Use only the URLs returned by cragmm-search-pipeline and those in the dataset. Trying to fetch other sites will fail during evaluation (outbound internet is disabled).
- The local cache size is entirely up to you; clean or relocate it if needed.
- There are some broken image URLs in the dataset and indices. In local runs, trying to download from them will cause 403 errors, which you are expected to handle yourself (see the sketch below). In the online evaluator, however, these URLs return a blank image, so no special handling is needed there.
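As a rough illustration, one way to tolerate broken URLs in local runs is a small wrapper like the one below. The exact exception `ImageLoader` raises on a failed download is not documented here, so this sketch catches broadly; narrow it once you know the actual error type:

```python
from crag_image_loader import ImageLoader

def try_get_image(url):
    """Best-effort fetch for local runs; returns None for broken URLs.

    Assumes ImageLoader.get_image() raises an exception (e.g. on an
    HTTP 403) when the download fails; the exception type is a guess.
    """
    try:
        return ImageLoader(url).get_image()
    except Exception as exc:
        print(f"Skipping broken image URL {url}: {exc}")
        return None

# e.g. keep only the results whose images could be fetched
# (assuming `results` from the quick-start above)
images = [img for img in (try_get_image(r['url']) for r in results) if img is not None]
```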
5. Action required
- `git pull` (or re-clone/re-fork) the starter-kit.
- When you want to load an image, wrap its URL with `ImageLoader` and get the image with `.get_image()`.
- Run your usual tests to confirm everything works as expected.
If you hit any issues, open a thread in the forum and tag the organisers—happy hacking!