I found several possible errors in the first 10 examples.
"interaction_id": "7b267839-848a-409c-b74e-33ce86d3f5a6",
"query_time": "03/21/2024, 23:51:43 PT",
"domain": "music",
"question_type": "simple",
"static_or_dynamic": "static",
"query": "what's the date of after 7's last song/album?",
"answer": "1997-03-11",
"alternative_answers": [],
There is no information like 1997-03-11 in the refs.
"interaction_id": "dcf34e25-e5a1-4afb-8233-e357fdc1ed97",
"query_time": "03/21/2024, 23:45:52 PT",
"domain": "music",
"question_type": "simple",
"static_or_dynamic": "static",
"query": "when did one for all start performing together?",
"answer": "1993",
"alternative_answers": [],
Shouldn't the answer be 1997?
"interaction_id": "b0f8ce10-4511-4f1b-9cf3-29d1c63c99e5",
"query_time": "03/17/2024, 16:50:22 PT",
"domain": "finance",
"question_type": "set",
"static_or_dynamic": "static",
"query": "what are the top 3 tech stocks that rise in value in january 2024",
"answer": "top 3 tech stocks that rise in value in january 2024 are nvidia, oracle and netflix",
"alternative_answers": [],
The refs mention many stocks, not only these three.
"interaction_id": "6ad1e71e-273c-4433-b671-c3beacefd3ad",
"domain": "movie",
"question_type": "simple",
"static_or_dynamic": "slow-changing",
"query": "is the original dialogue of louis, martin & michael different?",
"answer": "en",
"alternative_answers": [],
"interaction_id": "7bb29eb4-12f9-45f9-bf8a-66832b3c8962",
"query_time": "03/10/2024, 23:19:21 PT",
"domain": "sports",
"question_type": "post-processing",
"static_or_dynamic": "static",
"query": "how many 3-point attempts did steve nash average per game in seasons he made the 50-40-90 club?",
"answer": "4 3-points attempts per game",
"4 3-points attempts per game" is not mentioned in the refs.
"interaction_id": "68261da9-bf1a-4675-b0b8-ffd0c2f0c4c9",
"query_time": "03/19/2024, 23:50:28 PT",
"domain": "movie",
"question_type": "simple",
"static_or_dynamic": "static",
"query": "which movie won the oscar best visual effects in 2006?",
"answer": "king kong",
"interaction_id": "f51e7907-0f76-483e-b449-11b02eea1d78",
"query_time": "02/28/2024, 07:21:06 PT",
"domain": "finance",
"question_type": "comparison",
"static_or_dynamic": "fast-changing",
"query": "which company's stock has had the lowest trading activity this week, kind or casi?",
"answer": "casi",
casi doesn't appear in the references at all.
"interaction_id": "642bbc21-ed9d-42e7-8455-382a6f2b0f08",
"query_time": "03/17/2024, 16:55:12 PT",
"domain": "sports",
"question_type": "simple_w_condition",
"static_or_dynamic": "static",
"query": "which player took home grand slam championship in 2017?",
"answer": "rafael nadal won his 16th grand slam title at the 2017 u.s. open",
"search_results": [
It is Roger Federer, not Rafael Nadal.
I am worried about whether data of such poor quality can really ensure that the competition runs properly.
At least on this one, the Oscars are a bit weird, and this may be where some ambiguity creeps in. Each Oscars ceremony gives awards for films released in the previous year. So while King Kong was released in 2005, it was awarded the Oscar in 2006. From the Oscars website:
The 78th Academy Awards | 2006
Kodak Theatre at Hollywood & Highland Center
Sunday, March 5, 2006
Honoring movies released in 2005
However, I agree with your analysis of the other examples. I have run into a variety of instances where manually inspecting the evidence provided for a query shows that either the provided answer is incorrect or the evidence contains nothing from which the correct answer could be retrieved.
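The manual inspection described above can be partly automated. The following is a minimal sketch, assuming the reference text for each example has already been extracted into plain strings; the `ref_texts` mapping, the placeholder ids, and the sample passages are hypothetical and only illustrate the idea. A literal substring match is a crude proxy: an answer that is paraphrased, computed, or only available through the Mock API will be flagged even though it is valid, so flagged ids are candidates for manual review, not confirmed errors.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def unsupported_ids(records: list, ref_texts: dict) -> list:
    """Return the interaction_ids whose answer string never appears in their reference texts."""
    flagged = []
    for rec in records:
        answer = normalize(rec["answer"])
        refs = ref_texts.get(rec["interaction_id"], [])
        if not any(answer in normalize(ref) for ref in refs):
            flagged.append(rec["interaction_id"])
    return flagged


# Hypothetical records and reference passages (not taken from the dataset).
records = [
    {"interaction_id": "a", "answer": "king kong"},
    {"interaction_id": "b", "answer": "1997-03-11"},
]
ref_texts = {
    "a": ["Visual Effects winner: King  Kong"],
    "b": ["A placeholder page with no release date."],
}

print(unsupported_ids(records, ref_texts))
```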
Hi @jjplane and @mitchelldehaven,
Thank you for the discussion! A few clarifications:
- Web pages are not the only source of reference; there is also the Mock API.
- Not all questions are guaranteed to have an answer in the provided references (web pages and Mock API), which is also what RAG systems face in practice. In such cases, the best answer would be "I don't know".
- V3 data has been released and some of those errors are fixed there.
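One way a system might act on the "I don't know" point above is to abstain whenever it is not confident in its answer. This is only a sketch under assumptions: the `generate` callable, its `(answer, confidence)` return value, and the `0.5` threshold are hypothetical and are not part of the CRAG starter kit or its evaluation.

```python
from typing import Callable, Tuple

IDK = "I don't know"


def answer_or_abstain(
    query: str,
    generate: Callable[[str], Tuple[str, float]],
    threshold: float = 0.5,
) -> str:
    """Return the generated answer, or abstain when it is empty or low-confidence."""
    answer, confidence = generate(query)
    if not answer.strip() or confidence < threshold:
        return IDK
    return answer


# Hypothetical stand-in for a real RAG pipeline, used only for illustration.
def stub_generate(query: str) -> Tuple[str, float]:
    if "oscar" in query:
        return "king kong", 0.9
    return "casi", 0.1


print(answer_or_abstain("which movie won the oscar best visual effects in 2006?", stub_generate))
print(answer_or_abstain("which stock has had the lowest trading activity?", stub_generate))
```

How the confidence is obtained (retrieval scores, model log-probabilities, a separate verifier) is left open here; the threshold would need tuning against whatever penalty the scoring applies to wrong answers.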
Among the selected examples:
b0f8ce10-4511-4f1b-9cf3-29d1c63c99e5 is asking for top 3.
6ad1e71e-273c-4433-b671-c3beacefd3ad is fixed in V3.
68261da9-bf1a-4675-b0b8-ffd0c2f0c4c9 is correct. As @mitchelldehaven mentioned, King Kong was produced in 2005 but won the award in 2006.
642bbc21-ed9d-42e7-8455-382a6f2b0f08 is also correct (please see ref attached).
7b267839-848a-409c-b74e-33ce86d3f5a6 and dcf34e25-e5a1-4afb-8233-e357fdc1ed97 seem to be real errors. We will fix them in a future release. Thank you for reporting them.
The CRAG Team
642bbc21-ed9d-42e7-8455-382a6f2b0f08: if "grand slam championship" means any one of the four tournaments, the answer should contain several names instead of randomly picking one as the ground truth.
- Australian Open 2017
  - Men's Singles: Roger Federer
  - Women's Singles: Serena Williams
- French Open 2017 (Roland Garros)
  - Men's Singles: Rafael Nadal
  - Women's Singles: Jeļena Ostapenko
- Wimbledon 2017
  - Men's Singles: Roger Federer
  - Women's Singles: Garbiñe Muguruza
- US Open 2017
  - Men's Singles: Rafael Nadal
  - Women's Singles: Sloane Stephens
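To illustrate the suggestion, here is a rough sketch of how a question with several valid answers could be scored if the other winners were listed in the record's `alternative_answers` field. The record below is a hypothetical, simplified version built from the men's singles winners in the list above, not the actual dataset entry, and the exact-match rule is an assumption that does not reflect the official evaluation.

```python
def is_correct(prediction: str, record: dict) -> bool:
    """Accept a prediction that matches the answer or any alternative answer (case-insensitive)."""
    accepted = [record["answer"], *record.get("alternative_answers", [])]
    pred = prediction.strip().lower()
    return any(pred == candidate.strip().lower() for candidate in accepted)


# Hypothetical record for illustration only.
record = {
    "query": "which player took home grand slam championship in 2017?",
    "answer": "rafael nadal",
    "alternative_answers": ["roger federer"],
}

print(is_correct("Roger Federer", record))
print(is_correct("Novak Djokovic", record))
```

With the ground truth stored this way, a system that answers with Federer is not penalized merely because the annotator happened to pick Nadal.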