

It is now a well-documented problem that both self-represented litigants and experienced practitioners have cited false authorities in court after relying too heavily on generic AI tools.
General-purpose large language models (LLMs) such as ChatGPT, Claude, and Gemini are prone to hallucinations: a phenomenon where an AI model confidently generates an entirely incorrect statement. Well-publicised examples include Google’s AI Overviews advising users to add glue to pizza sauce, or suggesting that geologists recommend eating one rock per day.
When hallucinations enter a legal context, the consequences are far more serious.
Research indicates that, when faced with complex legal questions, general-purpose LLMs hallucinate in 58–88% of responses (Dahl et al.). The result is not merely incorrect advice, but real professional risk. Lawyers face embarrassment, sanctions, and loss of client trust, while courts bear an additional burden: judicial officers are forced to act as “human filters”, separating genuine authorities from invented ones.
These risks have not gone unnoticed.
High Court Chief Justice Stephen Gageler raised this issue at the Australian Legal Convention, warning that although AI has the potential to meaningfully assist courts, its current use in litigation is becoming “unsustainable”.
Many practitioners share this scepticism. Given the stakes, it is understandable that lawyers might conclude that using AI in legal work is inherently reckless. That intuition is reasonable, but it is worth examining the assumptions underneath it.
The hallucination problem is not inevitable. It is fundamentally a design problem.
General-purpose LLMs are designed to be universal conversationalists, capable of responding to almost any question. That breadth comes at the cost of precision. A model optimised to “answer everything” is not necessarily optimised to answer legal questions accurately. The old saying rings true: jack of all trades, master of none.
Even domain-specific legal models like SaulLM, while providing some improvement, can still struggle. Research on Australian legal tasks (Shareghi, Han and Burgess) bears this out.
This is where architecture matters.
Retrieval-augmented generation (RAG) addresses hallucinations by changing how an AI produces answers.
Rather than “guessing” based on patterns in its training data, a RAG system retrieves real legal documents from a curated corpus and grounds its response in those sources. The model is constrained by actual legislation and case law, rather than free-form prediction.
In legal contexts, this distinction is critical. Properly implemented RAG systems are significantly safer than generic LLMs because answers cannot drift far from authoritative material.
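To make the retrieve-then-generate pattern concrete, here is a minimal sketch in Python. The corpus structure, function names, and prompt wording are illustrative assumptions for the purpose of explanation, not Habeas’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    citation: str   # human-readable authority, e.g. "Evidence Act 1995 (Cth) s 79"
    text: str       # the snippet retrieved from the corpus

def retrieve(query: str, corpus: list[Passage], k: int = 3) -> list[Passage]:
    """Rank passages by naive keyword overlap with the query.
    A production system would use a vector index; this is only a toy scorer."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(query_terms & set(p.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_grounded_prompt(query: str, passages: list[Passage]) -> str:
    """Constrain the model to the retrieved authorities and require citations."""
    sources = "\n".join(f"[{p.citation}] {p.text}" for p in passages)
    return (
        "Answer the question using ONLY the sources below. "
        "Cite a source for every proposition. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

# The grounded prompt is then sent to the language model, so the answer
# is anchored to retrieved authorities rather than free-form prediction.
```

Because the model only sees the retrieved snippets, every statement in its answer can be checked against a named authority.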
Before generating an answer, Habeas retrieves relevant snippets from a corpus of over 300,000 Australian legal documents.
This means every answer can be followed back to its source.
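As a rough illustration of what that traceability could look like (the structure below is hypothetical, not Habeas’s actual API), each answer can carry the citations of the snippets it was grounded in:

```python
from dataclasses import dataclass

@dataclass
class SourcedAnswer:
    """A hypothetical answer object that keeps its supporting snippets attached."""
    answer: str
    sources: list[tuple[str, str]]  # (citation, snippet) pairs

    def audit_trail(self) -> list[str]:
        # The authorities a practitioner should verify before relying on the answer.
        return [citation for citation, _ in self.sources]

response = SourcedAnswer(
    answer="Expert opinion evidence is admissible where the opinion is wholly or "
           "substantially based on specialised knowledge.",
    sources=[("Evidence Act 1995 (Cth) s 79", "If a person has specialised knowledge...")],
)
print(response.audit_trail())  # ['Evidence Act 1995 (Cth) s 79']
```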
Generic LLMs are trained on everything from forum posts to recipe blogs.
Habeas is trained and tuned specifically on Australian legal material. It understands Australian legislation, case law conventions, and jurisdiction-specific legal language.
This level of localisation cannot be matched by generic LLMs, or even by legal-specific models trained primarily on foreign jurisdictions.
Generic models are optimised to sound correct.
Habeas is optimised to ensure traceability and observability for every answer or search result. In other words, it represents the midpoint between traditional legal research tools and the new possibilities enabled by Gen AI.
Factual accuracy, citation validity, and explainability are prioritised over fluent but unsupported answers. The system is designed to make its reasoning observable, so users can assess whether an answer is appropriate for their task. A high level of accountability is still, of course, expected of practitioners, and even advanced generative AI solutions like Habeas are not a replacement for their work.
Habeas is not a replacement for legal judgment. Some lawyers treat AI tools as a 'magic eight ball', or hold them to the standard of a human colleague and accept nothing less than perfect accuracy; neither framing is helpful. Like any legal research tool, its output must be checked and verified by the practitioner relying on it.
What Habeas provides is transparency. Users can see how an answer was formed, review the underlying authorities, and make an informed decision about whether and how to rely on the output.
Used responsibly, Habeas functions as a tireless and accountable research assistant that strengthens, rather than undermines, professional practice.
The legal profession is right to be cautious.
Hallucinations are not a minor technical flaw. They pose real risks to clients, courts, and public confidence in the legal system. But the solution is not to reject AI altogether.
The solution is to use the right kind of legal AI.
Retrieval-grounded, jurisdiction-specific, evidence-based systems like Habeas represent a responsible path forward for legal assistance technologies. They reduce risk, increase transparency, and free practitioners from laborious manual research so they can focus on strategy, judgment, and advocacy.
