As artificial intelligence (AI) technology continues its rapid advancement, a compelling question has emerged within the legal community: Can AI effectively compete with human lawyers in the highest court in the land? Some, including Joshua Browder, CEO of the legal services firm DoNotPay, confidently assert that AI can. In fact, in 2023, Mr. Browder even offered $1,000,000 to any attorney willing to argue a case before the United States Supreme Court while wearing AirPods and conveying arguments from DoNotPay’s AI lawyer, which is powered by OpenAI’s GPT-3 API. Yet, such optimism may be premature, as recent research from Stanford University highlights pervasive “hallucinations” in AI’s legal reasoning, casting doubt on its reliability. These inaccuracies, particularly concerning in a field where precision is paramount, raise the critical question: If not addressed, could the rapid deployment of LLMs actually exacerbate, rather than alleviate, the existing disparities in access to legal services?

@Robot_Esquire

Large Language Model use in the legal field

The field of law has not been immune to the progression of AI technology. With the advent of large language models (LLMs) such as ChatGPT (developed by OpenAI), Bard, Claude, PaLM, and Llama, the legal community has started to leverage AI tools to process and generate comprehensive, authoritative texts across various legal practices. In fact, numerous law firms are now promoting their use of LLM-based tools for tasks like analyzing discovery, drafting detailed case briefs, and formulating sophisticated litigation strategies.

Furthermore, some LLM developers assert that their technology will revolutionize the legal industry. Given that the legal profession relies heavily on language data, these developers emphasize their products’ ability to identify patterns within this data and deliver nearly complete work product almost instantaneously. This capability was illustrated in 2023, when OpenAI reported that its GPT-4 model passed the Uniform Bar Examination with a score near the 90th percentile of all test-takers.

Nevertheless, a pressing concern remains: “hallucinations,” or the tendency of LLMs to produce content that diverges from established legal facts or from well-founded principles and precedents.

What are AI hallucinations?

AI hallucinations occur when artificial intelligence systems produce incorrect or entirely fabricated responses. These errors emerge because AI models, trained on vast datasets, generate outputs based on the probability of word sequences, lacking true comprehension or logical underpinning. As a result, they may produce misinformation or “hallucinations,” presenting unfounded or erroneous information as factual.
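
To make that mechanism concrete, here is a minimal sketch, in Python, of how a language model extends a prompt: it samples a continuation from a learned probability distribution, and nothing in that loop checks the result against a source of truth. The candidate continuations and their probabilities below are invented for illustration and are not drawn from any real model.

```python
import random

# Toy next-token distribution (invented numbers, not from any real model).
# A fluent but fabricated citation can easily be the most probable continuation.
next_step_probs = {
    "Smith v. Jones (1984)": 0.41,               # plausible-sounding, possibly fabricated
    "Brown v. Board of Education (1954)": 0.35,  # real case
    "Marbury v. Madison (1803)": 0.24,           # real case
}

def sample_continuation(probs: dict) -> str:
    """Pick a continuation in proportion to its probability, not its truth."""
    options, weights = zip(*probs.items())
    return random.choices(options, weights=weights, k=1)[0]

prompt = "The leading precedent on this issue is "
print(prompt + sample_continuation(next_step_probs))
```

Whichever continuation is most statistically plausible wins, which is exactly how a confident-sounding but unfounded answer can surface.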

The origins of these hallucinations can differ significantly between foundational models and retrieval-augmented generation (RAG) models. In the case of foundational models, such as those developed by OpenAI, the issues might originate from inadequate data quality, overfitting, encoding inaccuracies, or adversarial attacks, inherent to the models’ construction and training processes. Conversely, RAG models, which are prevalent in commercial AI applications, may encounter increased hallucinations due to imprecise context retrieval, suboptimal query formation, or difficulties in handling intricate language subtleties, indicating issues in how they access and assimilate external data.
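
The sketch below illustrates, under simplifying assumptions, where a RAG pipeline can go wrong. A small in-memory corpus, a naive keyword retriever, and a stand-in generate() function take the place of a production system; none of this reflects any particular vendor’s implementation. If the retriever surfaces the wrong passage, the generator will still produce a fluent, confident answer grounded in the wrong material.

```python
# Minimal RAG sketch: the corpus, the keyword scorer, and the stand-in
# generate() function are illustrative assumptions, not a real product.
CORPUS = [
    "Miranda v. Arizona (1966) requires police to advise suspects of their rights.",
    "Marbury v. Madison (1803) established judicial review.",
]

def retrieve(query: str) -> str:
    """Naive keyword-overlap retrieval; a poor match feeds the model the wrong context."""
    query_terms = set(query.lower().split())
    return max(CORPUS, key=lambda doc: len(query_terms & set(doc.lower().split())))

def generate(query: str, context: str) -> str:
    """Stand-in for an LLM call: it answers from whatever context it is handed."""
    return f"Q: {query}\nA (based on retrieved context): {context}"

question = "Which case established judicial review?"
print(generate(question, retrieve(question)))
```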

How does AI hallucinate about legal issues?

In the study “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models” by Stanford University’s RegLab and Institute for Human-Centered Artificial Intelligence, researchers identified three distinct types of legal hallucinations in LLMs.

The first type, known as “closed-domain hallucinations,” occurs when an AI model responds with information that doesn’t align with, or directly contradicts, what was requested. For instance, imagine you ask for a summary of a particular movie plot, but the AI provides details from an entirely different movie. This problem is especially significant in tasks requiring a direct match between the question asked and the information provided. Examples include accurately summarizing legal decisions, preparing legal documents, or identifying key points in a document from another lawyer. In these scenarios, precision is key, and any mismatch or error can lead to significant issues.

The second type, known as “open-domain hallucinations,” occurs when an AI provides an answer that isn’t consistent with or relevant to its training data. Imagine that an AI system trained solely on criminal law begins offering advice on civil law matters. This discrepancy is especially troubling for organizations aiming to adapt broad AI systems to meet their specific needs. Consider a law firm that expects its AI to generate responses that accurately reflect a unique set of documents, such as internal research memos or style guides. When the AI deviates from its training, it fails to align with the firm’s specialized knowledge and intended use, leading to potentially significant inaccuracies or misunderstandings.

The third type of hallucination happens when an AI invents information that has no basis in fact, irrespective of its training or the instructions it was given. Consider the scenario where lawyers submit legal documents citing completely fictitious cases. This kind of error is especially concerning in legal settings, where precision and truthfulness are paramount. Accurate representation of the law is critical, and any deviation due to fabricated information could severely compromise the integrity and reliability of legal proceedings.
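
One basic safeguard against this third category, sketched below under simplifying assumptions, is to extract case-style citations from a model’s draft and flag any that cannot be matched against a verified database. The regular expression and the KNOWN_CASES set are placeholders standing in for a real citator, and the flagged case name is invented.

```python
import re

# Hedged sketch of a citation-grounding check. KNOWN_CASES stands in for a
# verified citation database; the regex is a deliberately simple placeholder.
KNOWN_CASES = {
    "Miranda v. Arizona",
    "Brown v. Board of Education",
}

CASE_PATTERN = re.compile(
    r"[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)* v\. [A-Z][\w.'-]*(?: [A-Z][\w.'-]*)*"
)

def flag_unverified_citations(draft: str) -> list:
    """Return citations in the draft that cannot be matched to the known database."""
    return [case for case in CASE_PATTERN.findall(draft) if case not in KNOWN_CASES]

draft = "As held in Miranda v. Arizona and Doe v. Imaginary Holdings, the motion should be granted."
print(flag_unverified_citations(draft))  # ['Doe v. Imaginary Holdings']
```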

How often are AI hallucinations occurring?

Until recently, insights into the prevalence of AI hallucinations were mainly anecdotal. However, a study by Stanford University researchers has shed light on how common these errors are, finding them to be surprisingly frequent. The research revealed hallucination rates ranging from 69% to 88%, even when using state-of-the-art language models such as GPT-3.5, PaLM 2, and Llama 2. Moreover, when the models were tasked with answering specific legal questions, these high error rates persisted across a wide array of verifiable legal facts.

To reach these conclusions, the study assessed AI proficiency by challenging the LLMs with 5,000 questions drawn from diverse U.S. federal court cases. The tasks ranged from identifying whether a case existed to interpreting complex legal precedents. The findings were revealing: the models frequently failed at complex legal reasoning, with as many as 75% of explanations concerning a court’s main decisions being hallucinated. Furthermore, AI performance varied significantly with the court level and case notoriety: the models generally struggled more with lesser-known lower court rulings than with notable Supreme Court cases. Performance discrepancies were also observed along geographic and temporal lines, with the models having difficulty with both the most recent and the oldest Supreme Court cases.
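
The evaluation logic behind such numbers can be sketched simply: pose questions whose answers are independently verifiable and count how often the model’s response misses the reference. In the illustration below, ask_model() is a hypothetical stand-in for a real LLM API call and the reference pairs are invented; this is not the study’s actual benchmark code.

```python
# Simplified hallucination-rate evaluation. ask_model() is a hypothetical
# stand-in for an LLM API call; the reference questions are illustrative.
REFERENCE = [
    ("Who wrote the majority opinion in Marbury v. Madison?", "John Marshall"),
    ("In what year was Miranda v. Arizona decided?", "1966"),
]

def ask_model(question: str) -> str:
    """Placeholder for a real model call; hard-coded here to exercise the scoring path."""
    return "John Jay, in 1805."  # deliberately wrong

def hallucination_rate(reference) -> float:
    """Fraction of questions whose reference answer is absent from the model's reply."""
    wrong = sum(
        1 for question, answer in reference
        if answer.lower() not in ask_model(question).lower()
    )
    return wrong / len(reference)

print(f"Hallucination rate: {hallucination_rate(REFERENCE):.0%}")  # 100% with the stub above
```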

What other hallucination concerns exist?

Beyond the prevalence of hallucinations, another critical issue is “contrafactual bias.” This refers to AI systems treating incorrect premises within questions as if they were correct. For example, if someone asks, “Why did Justice Ruth Bader Ginsburg oppose the same-sex marriage ruling?” there is a risk the AI might not recognize the misinformation, despite Justice Ginsburg actually having supported the ruling.

This tendency is particularly notable in models like GPT-3.5, which can generate convincing responses built on false assumptions, likely because their training rewards engaging with the premise of a query. The issue is even more pronounced in the context of complex legal topics or obscure court decisions. By comparison, Llama 2 shows a better ability to scrutinize false premises but can still mistakenly reject accurate facts, such as the existence of specific cases or judges.

Furthermore, these AI models often miscalculate their confidence levels, especially in legal contexts. Ideally, a model’s confidence would track the accuracy of its responses. In practice, some models, such as PaLM 2 and GPT-3.5, demonstrate more consistency in this regard than Llama 2, yet all share a tendency toward overconfidence, which is particularly marked when they deal with complex or lesser-known legal issues. This over-assurance is most troubling in scenarios involving intricate details or lower court cases, where the models may express undue certainty in their answers, even on well-documented legal matters.
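
A rough way to quantify this miscalibration, sketched below under simplifying assumptions, is to compare a model’s average stated confidence against its actual accuracy on the same questions; a positive gap indicates overconfidence. The (confidence, was_correct) records are invented for illustration and are not results from the study.

```python
# Hedged calibration sketch: invented (stated_confidence, was_correct) pairs,
# not measurements from any study.
records = [
    (0.95, False), (0.90, True), (0.85, False),  # high confidence, mixed accuracy
    (0.60, True), (0.55, False), (0.50, True),
]

def overconfidence_gap(records) -> float:
    """Average stated confidence minus observed accuracy; positive means overconfident."""
    avg_confidence = sum(conf for conf, _ in records) / len(records)
    accuracy = sum(1 for _, correct in records if correct) / len(records)
    return avg_confidence - accuracy

print(f"Overconfidence gap: {overconfidence_gap(records):+.2f}")  # positive for these invented records
```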

What implications do hallucinations have?

Despite notable advancements, such as GPT-4 reducing its error rate to just 2.3%, concerns linger about the technology’s current limitations. In particular, these limitations may not only persist but could exacerbate existing legal inequalities. The Stanford study reveals that the risks associated with using LLMs for legal research are especially pronounced for litigants in lower courts, individuals requiring intricate legal information, and users whose queries are based on incorrect assumptions.

Moreover, the emergence of what some experts term a “legal monoculture” warrants attention. This phenomenon refers to the tendency of AI systems to favor certain judicial decisions or legal interpretations over others, potentially leading to a legal environment that is overly homogenous and resistant to innovation. Such a scenario conflicts with the dynamic nature that law inherently requires.

Also, there’s skepticism about AI’s potential to make legal information more accessible to everyone. Although wealthier entities may leverage AI to develop highly accurate legal resources, the Stanford study suggests that this advancement doesn’t guarantee a universal solution to AI’s fallibility, particularly regarding hallucinations. As a result, there’s an apprehension that the proliferation of AI in the legal field might widen, rather than bridge, the gap between those who can afford sophisticated technologies and those who cannot. Such a development would undermine the promise of democratizing legal information and could fortify, rather than dismantle, existing barriers to justice.

What can be done to address these issues?

AI holds great promise for enhancing the legal sector, yet research highlights significant ethical and operational risks that necessitate careful oversight. The challenge extends beyond mere technological advancement; it involves establishing a framework where AI can innovate responsibly within ethical and legal standards. To achieve this, AI must align with established legal principles and provide accurate, reliable insights, ensuring that it complements, rather than distorts, factual legal information. For AI to be effectively integrated into legal practice, it requires ongoing refinement, diligent monitoring, and a deep understanding of its potential and limitations. Importantly, the research underscores that AI should serve as a support to human roles within the legal system, enhancing rather than substituting for the work of legal professionals.

Conclusion

The integration of AI, particularly large language models, into the legal domain presents both promising advancements and significant challenges. While AI offers the potential to revolutionize legal research and practice, recent findings, such as those from Stanford University, illuminate the pressing issues of AI-generated hallucinations and their implications for legal accuracy and equity. Despite technological strides, the capacity for AI to compete effectively with human lawyers, especially in high-stakes environments like the Supreme Court, remains uncertain. The concerns about AI’s reliability, particularly in the precision-critical field of law, underline the necessity for a cautious and measured approach to its deployment. The path forward should focus on enhancing AI’s utility as a supplementary tool that aids, rather than replaces, human legal expertise. Ultimately, ensuring that AI serves justice equitably demands vigilant development, ethical application, and a commitment to addressing its limitations, thereby fostering an environment where technology and human expertise collaborate to advance the legal profession.
