Each search response item in example_data/web.json has the following schema:
[‘page_name’, ‘page_url’, ‘page_snippet’, ‘page_result’, ‘page_last_modified’, ‘search_result_type’]
where ‘search_result_type’ indicates whether the value of ‘page_result’ is Web Content (html) or a Snippet.
In the crag_task_1_v1.jsonl dataset, however,
each search response item only has [‘page_name’, ‘page_url’, ‘page_snippet’, ‘page_result’, ‘page_last_modified’]
Differences:
Search response items in the crag_task_1_v1.jsonl dataset lack the ‘search_result_type’ field.
For items in crag_task_1_v1.jsonl, does Web Content (html) only appear in ‘page_result’ and Snippet only appear in ‘page_snippet’?
In the crag_task_1_v1 dataset, some items have only Web Content (html) and a zero-length Snippet (line 5, item 5 in this dataset), while other items have only a Snippet and zero-length html (line 4, item 3 in this dataset).
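To make the observation above easier to reproduce, here is a minimal sketch of how one might count which content fields are populated per item. It assumes each item is a dict using the field names listed above; the helper names populated_fields and tally, and the sample items, are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def populated_fields(item):
    """Report which of the two content fields are non-empty for one item."""
    # Treat missing keys, None, and whitespace-only strings as empty.
    has_html = bool((item.get("page_result") or "").strip())
    has_snippet = bool((item.get("page_snippet") or "").strip())
    if has_html and has_snippet:
        return "both"
    if has_html:
        return "html_only"
    if has_snippet:
        return "snippet_only"
    return "neither"


def tally(search_results):
    """Count items per category across a list of search response items."""
    return Counter(populated_fields(item) for item in search_results)


# Made-up items for illustration only.
sample = [
    {"page_name": "a", "page_result": "<html>...</html>", "page_snippet": ""},
    {"page_name": "b", "page_result": "", "page_snippet": "short summary"},
    {"page_name": "c", "page_result": "<html>...</html>", "page_snippet": "short summary"},
]
print(tally(sample))
```

Running tally over the search results of each line in the dataset would show how common the html-only and snippet-only cases are.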
My questions:
Should I use the ‘search_result_type’ field to determine which type of content is in ‘page_result’?
Which format should I use, example_data or crag_task_1_v1?
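Until this is clarified, a possible workaround is to write code that does not depend on ‘search_result_type’ being present. The sketch below is only an assumption about how one might handle both formats, not the organizers' intended method: it prefers ‘page_result’ when it is non-empty, falls back to ‘page_snippet’ otherwise, and passes through ‘search_result_type’ unchanged if the item happens to have it. The function name pick_text is hypothetical.

```python
def pick_text(item):
    """Return (source_field, text, declared_type) for one search response item.

    declared_type is the raw 'search_result_type' value when the item has one
    (example_data-style items) and None otherwise (crag_task_1_v1-style items).
    """
    declared = item.get("search_result_type")
    html = item.get("page_result") or ""
    snippet = item.get("page_snippet") or ""
    # Assumption: the full page content is preferred over the snippet.
    if html.strip():
        return "page_result", html, declared
    return "page_snippet", snippet, declared


# Made-up item for illustration only.
field, text, declared = pick_text(
    {"page_name": "b", "page_result": "", "page_snippet": "short summary"}
)
print(field, declared)
```

This keeps the same code path working whichever of the two formats ends up being used at evaluation time.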
your submission will only have access to the query and the search_results
Won’t you provide the query_time field? Will it include the exact time and timezone?
Based on the 10 provided samples and the API results (endpoint finance/get_price_history), the answers always match the “Close” value for 2024-02-14. However, I am a bit confused because in the subset of HTML files I looked at, the query date is always 2024-02-16.