Starter‑Kit Update – **`WebSearchResult` helper for full‑page retrieval**

We have added a small utility that makes it painless—and safe—to fetch complete web‑page contents for any result returned by the cragmm-search-pipeline.
Pull the latest version of the starter‑kit to pick up the changes described below.


1. What changed?

| File | Purpose |
| --- | --- |
| `agents/rag_agent.py` | Each search hit is now wrapped in `WebSearchResult`, so you can access `result["page_content"]` directly. |
| `crag_web_result_fetcher.py` | New helper class. Handles on‑demand download, local caching under `~/.cache/crag/web_search_results`, and transparent access to all original fields (`url`, `page_snippet`, etc.). |
| `docs/search_api.md` | New “Fetching the full page content” section with a copy‑paste example. |

No existing APIs are removed or renamed.
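For intuition, the wrapper behaves roughly like a lazy dict: the original fields pass through unchanged, and `page_content` is filled in on first access. The sketch below is illustrative only; the class name `LazyResultSketch` and the injected `fetch` callable are stand-ins, not the real `WebSearchResult` implementation:

```python
class LazyResultSketch(dict):
    """Illustrative stand-in for WebSearchResult (not the real class).

    Original search-hit fields pass through unchanged; `page_content`
    is computed on first access via an injected `fetch` callable.
    """

    def __init__(self, hit, fetch):
        super().__init__(hit)   # keep url, page_snippet, score, ...
        self._fetch = fetch

    def __getitem__(self, key):
        # Download lazily, and only if the field was not pre-populated
        # (as it is inside the evaluation container).
        if key == "page_content" and "page_content" not in self:
            self["page_content"] = self._fetch(dict.__getitem__(self, "url"))
        return dict.__getitem__(self, key)
```

Because pre-populated fields are returned as-is, the same code path works both locally and in the evaluation container.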


2. Quick‑start

```python
from cragmm_search.search import UnifiedSearchPipeline
from crag_web_result_fetcher import WebSearchResult

search = UnifiedSearchPipeline()
results = search("What to know about Andrew Cuomo?", k=2)

for hit in results:
    hit = WebSearchResult(hit)          # wrap once
    print(hit["page_content"][:500])    # full HTML, first 500 chars
```

That’s all: there is no need to call `requests` yourself.


3. How the helper works

  • First run (local development): downloads the page at `hit["url"]` and stores it in the cache directory `~/.cache/crag/web_search_results` (override with `CRAG_WEBSEARCH_CACHE_DIR`).
  • Subsequent runs: reads straight from the cache, with zero network overhead.
  • Evaluation phase: the same `page_content` field is pre‑populated in the evaluation container, so no download attempt is made and your code remains identical.
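Put together, the first-run/cached-run behaviour amounts to a small cache-then-fetch routine. The function below is a simplified sketch of that strategy, not the helper's actual code; `fetch` stands in for whatever downloader the helper uses, and `cache_dir` defaults to the documented location:

```python
import hashlib
import os
from pathlib import Path

def cached_page_content(url, fetch, cache_dir=None):
    """Return the page content for `url`, downloading at most once.

    Illustrative sketch of the helper's caching strategy (not its real
    code). `fetch` is any callable mapping a URL to its HTML.
    """
    cache_dir = Path(cache_dir
                     or os.environ.get("CRAG_WEBSEARCH_CACHE_DIR")
                     or Path.home() / ".cache/crag/web_search_results")
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One file per URL, keyed by a hash so arbitrary URLs make safe filenames.
    cache_file = cache_dir / (hashlib.sha256(url.encode()).hexdigest() + ".html")
    if cache_file.exists():        # subsequent runs: no network access
        return cache_file.read_text()
    content = fetch(url)           # first run: download and cache
    cache_file.write_text(content)
    return content
```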

4. Guidelines & caveats

  1. Use only the URLs returned by cragmm-search-pipeline.
    Trying to fetch other sites will fail during evaluation (outbound internet is disabled).

  2. The local cache size is entirely up to you; clean or relocate it if needed.

  3. The helper exposes every original key plus the new `page_content` field:

    • `hit["page_url"]`, `hit["page_name"]`, `hit["page_snippet"]`, `hit["score"]`, …
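To keep an eye on the local cache from point 2, a few lines of stdlib Python are enough. The function name below is hypothetical, and the directory shown is the documented default:

```python
import shutil
from pathlib import Path

def cache_size_bytes(cache_dir):
    """Total size in bytes of all files under `cache_dir` (0 if absent)."""
    cache_dir = Path(cache_dir)
    if not cache_dir.exists():
        return 0
    return sum(f.stat().st_size for f in cache_dir.rglob("*") if f.is_file())

default_dir = Path.home() / ".cache/crag/web_search_results"
print(f"cache size: {cache_size_bytes(default_dir) / 1e6:.1f} MB")
# To clear it entirely:
# shutil.rmtree(default_dir, ignore_errors=True)
```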

5. Action required

  • `git pull` (or re‑clone) the starter‑kit.
  • Ensure your code wraps each search result in `WebSearchResult` before accessing `page_content`.
  • Run your usual tests to confirm everything works as expected.

If you hit any issues, open a thread in the forum and tag the organisers—happy hacking!


What about the images in the image API output? Are such features available for images as well?

Hi, please check out our latest announcement here