AI Legal Research Australia: Why ChatGPT Is the Wrong Tool for the Job

ChatGPT confidently produces fake case citations and misrepresents holdings. Learn why general AI tools are dangerous in Australian legal work and what.

The AI legal research problem in Australian law is not complicated, but almost no one is discussing it plainly. Here is what it looks like: a paralegal asks ChatGPT for Queensland Court of Appeal decisions on misleading or deceptive conduct under the Australian Consumer Law, specifically recent authority on the causation element. The model answers confidently. It produces three case citations, each in neutral citation format, with court name, year, and a brief summary of the holding. The summaries are coherent. They fit the question. Two of the cases do not exist. The third is real, but the holding described is wrong.

This is not a theoretical risk. It is the documented behaviour of large language models applied to legal research, and it is happening in law firms right now.

Partners are watching paralegals run ChatGPT and Claude to draft research memos. Some firms have issued policies; many haven't. Professional bodies have addressed AI and accuracy obligations in published guidance without yet spelling out what adequate supervision of an AI research tool actually requires. What is still underappreciated is that the failure is not random or occasional. It is structural, and it runs three layers deep.

Why generic LLMs fail at Australian law

The first failure is in the training data. General-purpose language models are trained on text corpora weighted heavily toward American and British sources. Australian primary law (the legislation, judgments, and regulatory instruments that actually govern Australian disputes) is a small fraction of that data. The models know Australian law exists, but they know it the way a well-read tourist knows a foreign city: well enough to sound credible, not well enough to be reliable.

The hallucination problem compounds this. LLMs generate plausible-sounding output by predicting what text should follow from the context. When a model doesn't have reliable data about a specific body of case law, it fills the gap. The output looks exactly like a real citation. The neutral citation format is correct. The court name and year are internally consistent. The reasoning sounds like judicial prose. Nothing in the surface presentation tells you the decision was never made.
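The point about surface presentation can be made concrete with a minimal sketch. It assumes a simplified pattern for the neutral citation shape ([year] COURT number); the pattern, the function name, and the citation used are all invented for illustration. A format check of this kind passes for a decision that was never handed down, because shape is the only thing it can test.

```python
import re

# Simplified, assumed pattern for the neutral citation shape: [year] COURT number.
# It checks shape only; it knows nothing about which decisions actually exist.
NEUTRAL_CITATION = re.compile(r"\[(\d{4})\]\s+([A-Z][A-Za-z]{1,9})\s+(\d{1,4})")


def looks_like_neutral_citation(text: str) -> bool:
    """Return True if the text has the surface shape of a neutral citation."""
    return NEUTRAL_CITATION.fullmatch(text.strip()) is not None


# Deliberately invented citation (hypothetical year and number, not a real decision).
invented = "[2099] QCA 999"
print(looks_like_neutral_citation(invented))
```

A well-formed citation is a necessary condition for a real one, not a sufficient one. Existence can only be confirmed against the source record itself.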

Jurisdictional reasoning is the second failure. Australian law involves a federal-state architecture that generic models handle badly. Take a question about retail lease termination in Victoria. That question potentially engages the Retail Leases Act 2003 (Vic), the Australian Consumer Law, and the contested interaction between s 18 of the ACL and state-specific scheme obligations, an interaction that Victorian courts have not uniformly resolved. Getting the right answer requires knowing not just the legal proposition but which legislative scheme applies, whether there are recent amendments, and whether any court has addressed the interaction directly. ChatGPT will often collapse this complexity, giving you an answer that sounds settled when the question is genuinely contested, or applying Victorian law to a New South Wales question without flagging the difference.

The currency problem is the third failure. The training data for these models has a cutoff. Decisions handed down in the last twelve to eighteen months, often the decisions that matter most to a live research question, are absent. An LLM cannot tell you about a significant Full Federal Court judgment from last year if that judgment was handed down after the model's training cutoff. The model may not know this about itself, and it will answer as though the gap doesn't exist.

What purpose-built legal AI does differently

The case for purpose-built legal AI has nothing to do with AI being bad at things. It is about matching the tool to the task. A platform designed for Australian legal research works from verified Australian primary sources, retrieving from actual judgments rather than reconstructions of them. It can tell you which decisions are current, which have been overturned, and what the state of play is in a specific jurisdiction. When it produces a citation, that citation resolves to a real document.

That is what separates a language model from a research system. A general-purpose model has read about Australian law and learned to imitate its patterns. A purpose-built platform has indexed the actual sources and retrieves from them. The first is sophisticated guesswork dressed as authority. The second is the source itself.
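The difference between imitating citations and resolving them can be sketched in a few lines. This is an illustration only, not a description of how any particular platform works: the index, its entries, the status labels, and the function name are all hypothetical, and the citations are invented placeholders rather than real decisions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDecision:
    citation: str
    court: str
    status: str  # assumed labels: "current" or "overturned"


# Hypothetical stand-in for an index built from verified primary sources.
# Both entries are invented placeholders, not real decisions.
VERIFIED_INDEX = {
    "[2099] QCA 1": IndexedDecision("[2099] QCA 1", "Queensland Court of Appeal", "current"),
    "[2099] QCA 2": IndexedDecision("[2099] QCA 2", "Queensland Court of Appeal", "overturned"),
}

# Simplified, assumed pattern for the neutral citation shape.
CITATION_PATTERN = re.compile(r"\[\d{4}\]\s+[A-Z][A-Za-z]{1,9}\s+\d{1,4}")


def check_citations(draft: str) -> dict[str, str]:
    """Map each citation found in the draft to its status in the index."""
    results = {}
    for raw in CITATION_PATTERN.findall(draft):
        citation = " ".join(raw.split())  # normalise internal whitespace
        decision = VERIFIED_INDEX.get(citation)
        results[citation] = decision.status if decision else "not found"
    return results


draft = "See [2099] QCA 1 and [2099] QCA 2; compare [2099] QCA 3."
for citation, status in check_citations(draft).items():
    print(f"{citation}: {status}")
```

In this sketch, a citation that does not resolve to an indexed document is reported as not found instead of being accepted on appearance, and a citation that does resolve carries its status with it.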

Law firms that have started using generic LLMs for research memos are running an unacknowledged risk. A hallucinated case in a brief is a professional conduct issue, a potential costs consequence, and a trust problem with the client. The fact that it came from AI doesn't reduce the exposure. It increases it, because it raises questions about supervision that most firms have not yet answered clearly.

The tools exist to do this properly. The question is why the profession is still treating this as an experiment when the professional conduct consequences are already real.

Habeas is built for Australian legal research: retrieving from verified primary sources so citations resolve to real documents, covering jurisdiction-specific legislative schemes and their interactions rather than collapsing them, and updated to include decisions your general-purpose LLM has never seen.

Hero image: Benyamin Bohlouli on Unsplash
