The Hallucination Conundrum and the Responsible Use of Legal AI

The hallucination problem in legal AI is real, but it is not unsolvable. Responsible law firms mitigate that risk, and achieve better outcomes, by using tools custom-built for Australian legal research and designed for our system of jurisprudence.

It is now well documented that both self-represented litigants and experienced practitioners have cited false authorities in court after relying too heavily on generic AI tools.

General-purpose large language models (LLMs) such as ChatGPT, Claude, and Gemini are prone to hallucinations: a phenomenon in which an AI model confidently generates an entirely incorrect statement. Well-publicised examples include Google’s AI Overviews advising users to add glue to pizza sauce and suggesting that geologists recommend eating one rock per day.

When hallucinations enter a legal context, the consequences are far more serious.

Research indicates that general-purpose LLMs hallucinate in 58–88% of responses to more complex legal queries (Dahl et al.). The result is not merely incorrect advice, but real professional risk: lawyers face embarrassment, sanctions, and loss of client trust, and hallucinated citations impose additional burdens on courts, forcing judicial officers to act as “human filters” separating genuine authorities from invented ones.

Judicial Concern and Professional Scepticism

These risks have not gone unnoticed.

High Court Chief Justice Stephen Gageler raised this issue at the Australian Legal Convention, warning that although AI has the potential to meaningfully assist courts, its current use in litigation is becoming “unsustainable”.

Many practitioners share this scepticism. Given the stakes, it is understandable that lawyers might assume using AI in legal work is inherently reckless. That intuition is reasonable, but it rests on assumptions worth examining more closely.

Not All AI Systems Are Equally Hallucinatory

The hallucination problem is not inevitable. It is fundamentally a design problem.

General-purpose LLMs are designed to be universal conversationalists, capable of responding to almost any question. That breadth comes at the cost of precision. A model optimised to “answer everything” is not necessarily optimised to answer legal questions accurately. The old saying rings true: jack of all trades, master of none.

Even domain-specific legal models like SaulLM, while providing some improvement, can struggle. Research on Australian legal tasks (Shareghi, Han, and Burgess) demonstrates that:

  • Untuned LLMs continue to hallucinate at high rates
  • Citation accuracy is particularly poor
  • The most reliable systems combine task-specific instruction tuning with retrieval grounding against a curated dataset

This is where architecture matters, and will continue to matter.

Why Retrieval and Search Matter in Legal AI

Retrieval-augmented generation (RAG) addresses hallucinations by changing how an AI produces answers.

Rather than “guessing” based on patterns in its training data, a RAG system retrieves real legal documents from a curated corpus and grounds its response in those sources. The model is constrained by actual legislation and case law, rather than free-form prediction.

In legal contexts, this distinction is critical. Properly implemented RAG systems are significantly safer than generic LLMs because answers cannot drift far from authoritative material.
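
To make the pattern concrete, here is a minimal sketch of a RAG loop in Python. It is an illustration only, not a description of any particular product: the toy corpus, the keyword-overlap retriever, and the call_llm placeholder are assumptions standing in for production components such as vector search and a hosted model.

from dataclasses import dataclass

@dataclass
class Passage:
    citation: str  # e.g. "Example Act 2001 (Cth) s 5"
    text: str

# Toy stand-in for a curated corpus of legislation and case law.
CORPUS = [
    Passage("Example Act 2001 (Cth) s 5",
            "A person must not, in trade or commerce, engage in misleading conduct ..."),
    Passage("Example v Example [2010] FCA 1",
            "The Court held that the respondent's conduct was misleading ..."),
]

def retrieve(query: str, k: int = 2) -> list[Passage]:
    """Rank passages by crude keyword overlap with the query (production
    systems use semantic vector search) and return the top k."""
    terms = set(query.lower().split())
    return sorted(
        CORPUS,
        key=lambda p: len(terms & set(p.text.lower().split())),
        reverse=True,
    )[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    return "<model output grounded in the quoted sources>"

def answer(query: str) -> str:
    """Constrain the model: the prompt limits it to the retrieved passages
    and requires a citation for every proposition."""
    sources = retrieve(query)
    context = "\n\n".join(f"[{p.citation}]\n{p.text}" for p in sources)
    prompt = (
        "Answer using ONLY the sources below, citing each one you rely on. "
        "If they do not cover the question, say so rather than guessing.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)

The key design choice is that generation only ever sees material that was retrieved first, which is what keeps answers tethered to authoritative text.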

What Sets Habeas Apart

1. Retrieval-Augmented Generation

Before generating an answer, Habeas retrieves relevant snippets from a corpus of over 300,000 Australian legal documents.

This means:

  • Answers are anchored in verifiable legislation or case law
  • Citations are transparent and traceable
  • Hallucinations are materially reduced because responses are grounded in real legal texts

Every answer can be followed back to its source.
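
One way to picture that traceability is a simple post-generation check that flags any citation not found among the retrieved sources. The sketch below is hypothetical (the bracketed citation convention and the RETRIEVED set are assumptions), not a description of Habeas's internals.

import re

# Hypothetical post-generation check: every citation in a draft answer must
# match an authority that was actually retrieved; anything else is flagged
# for review rather than passed on as a genuine source.

RETRIEVED = {
    "Example Act 2001 (Cth) s 5",
    "Example v Example [2010] FCA 1",
}

def unsupported_citations(draft: str) -> list[str]:
    """Return citations in the draft (marked here as [ ... ]) that do not
    correspond to any retrieved source."""
    cited = re.findall(r"\[([^\]]+)\]", draft)
    return [c for c in cited if c not in RETRIEVED]

draft = ("Misleading conduct is prohibited [Example Act 2001 (Cth) s 5], "
         "as applied in [Invented v Fabricated (2099) 123 CLR 1].")
print(unsupported_citations(draft))  # -> ['Invented v Fabricated (2099) 123 CLR 1']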

2. Jurisdiction-Specific Training

Generic LLMs are trained on everything from forum posts to recipe blogs.

Habeas is trained and tuned specifically on Australian legal material. It understands Australian legislation, case law conventions, and jurisdiction-specific legal language.

This level of localisation cannot be matched by generic LLMs, or even by legal-specific models trained primarily on foreign jurisdictions.

3. Built for Legal Accuracy, Not Apparent Plausibility

Generic models are optimised to sound correct.

Habeas is optimised for traceability and observability in every answer and search result. It sits at the midpoint between traditional legal research tools and the new possibilities enabled by generative AI.

Factual accuracy, citation validity, and explainability are prioritised over fluent but unsupported answers. The system is designed to make its reasoning observable, so users can assess whether an answer is appropriate for their task. Practitioners, of course, remain accountable for their work, and even advanced generative AI solutions like Habeas are not a replacement for it.

Responsible Use and Professional Judgment

Habeas is not a replacement for legal judgment. Some lawyers treat AI tools as a 'magic eight ball', while others conceptualise them as akin to a human colleague and accept nothing less than perfect accuracy; both framings are problematic. Like any legal research tool:

  • Poorly framed questions or incomplete facts will affect outputs
  • Professional oversight remains essential
  • Users must critically evaluate results before relying on them

What Habeas provides is transparency. Users can see how an answer was formed, review the underlying authorities, and make an informed decision about whether and how to rely on the output.

Used responsibly, Habeas functions as a tireless and accountable research assistant that strengthens, rather than undermines, professional practice.

Conclusion

The legal profession is right to be cautious.

Hallucinations are not a minor technical flaw. They pose real risks to clients, courts, and public confidence in the legal system. But the solution is not to reject AI altogether.

The solution is to use the right kind of legal AI.

Retrieval-grounded, jurisdiction-specific, evidence-based systems like Habeas represent a responsible path forward for legal assistance technologies. They reduce risk, increase transparency, and free practitioners from laborious manual research so they can focus on strategy, judgment, and advocacy.
