Performance of AI Tools in Consumer Topics: A Detailed Review
Which? conducted lab tests on six AI tools to evaluate how well they handle everyday consumer topics. Testers posed 40 questions covering personal finance, legal matters, health and diet, consumer rights, and travel. Experts then rated each tool's answers on criteria such as accuracy, clarity, usefulness, relevance, and ethical considerations, producing an overall score out of 100.
Top Performers in AI Tool Tests
According to Which?, Perplexity emerged as the leading tool with a score of 71%. Gemini’s AI Overview (AIO) followed closely with 70%, while the standalone Gemini tool received 69% and Copilot 68%. ChatGPT scored only 64%, despite being the most frequently used tool in the study. Meta AI finished last with 55%.
Inaccuracies in Detailed Queries
Testing revealed significant gaps in how these tools handled nuanced questions. For instance, when a question about ISA limits cited an allowance of £25,000, both ChatGPT and Copilot answered confidently but failed to point out that the actual allowance is £20,000. Rather than correcting the £25,000 figure, both tools gave answers that could mislead users into breaching HMRC rules.
Error in Travel Advisory Information
Travel advice proved problematic as well. Copilot wrongly stated that passengers are always fully refunded when a flight is cancelled. Meta AI gave incorrect timelines and compensation amounts for delayed flights. Many responses tended to favor airlines, incorrectly suggesting that compensation applies only when the airline is directly at fault and disregarding the rules that apply to extraordinary circumstances.
AI Usage Among UK Adults
According to Which?, 51% of UK adults use AI to search for information online, equating to over 25 million people. Nearly half of these users said they trust the information they receive to a significant or reasonable degree. Among frequent users, that figure rises to 65%. Furthermore, 1 in 6 users turn to AI for financial advice, 1 in 8 for legal queries, and 1 in 5 for medical advice, indicating that these tools have become an ingrained part of daily life.
Risks Associated with AI Tools
Despite the high confidence levels among users, testing highlighted a troubling disconnect between trust and accuracy. Many responses drew on outdated forum discussions. For instance, Gemini’s AIO cited a three-year-old Reddit post for flight booking advice, and ChatGPT relied on similar sources for health questions where more reliable information should have been used. In other cases, the tools misinterpreted good sources, causing further inaccuracies.
The Importance of Trusted Information
The risks are especially high when AI tools give financial and legal advice. For example, when testers asked about tax refunds, ChatGPT and Perplexity pointed them to premium refund services that often charge exorbitant fees. ChatGPT also wrongly told users that travel insurance is mandatory for visits to the Schengen area, even though it is not required for UK residents travelling without a visa. As consumers increasingly rely on AI for guidance, accurate and contextually relevant data becomes paramount.
Expert Insights on AI Usage
Levent Ergin, Chief Strategist for Climate, Sustainability, and AI at Informatica, said, “AI chatbots are only as good as the data and context that power them. Public models are impressive but lack the nuanced and governed information necessary for reliable financial advice.” He stressed that while these tools can provide access to information, they should not replace professional advice, particularly in financial matters. In his view, accurate information can only be ensured through a trusted ecosystem built on validated data from financial institutions.
