"Trained on Australian Law" Is Not a Compliance Answer Anymore

Australia rejected a text-and-data mining exception, meaning legal AI tools must prove their right to use training data.

When a vendor tells you their legal AI tool was trained on Australian case law and legislation, the natural assumption is that this is a point in their favour. Depth of training data, local focus, jurisdictional relevance. Tick, tick, tick. What the assumption misses is a prior question: did they have the right to use that data at all?

In October 2025, Australia's Attorney-General's Department confirmed the government's position following the Productivity Commission's final report on copyright and AI. Australia would not be introducing a text-and-data mining exception. The Productivity Commission found it "premature to make changes to Australia's copyright laws," and the government agreed. That decision left Australia in a notable position among common law jurisdictions: one of the few countries without a carve-out that would permit the scraping of copyright-protected material for AI training without licence or authorisation.

The implications for practitioners evaluating legal AI tools are underappreciated.

Copyright in published judicial decisions and legislative materials sits across a complicated landscape in Australia. Commonwealth legislation is subject to Crown copyright, which carries its own access conditions. Reported decisions published by law publishers attract independent copyright claims. Legal textbooks, journal articles, and annotated legislation, all common training fodder for legal AI models, are protected in the ordinary way. When AI developers scraped these sources to train models, they did so in a jurisdiction that has now explicitly declined to retrospectively legitimise the practice. The Copyright and AI Reference Group is still working through the policy questions, and that process will run well into 2026. The landscape will not get cleaner in the short term; it will get more contested.

For the practitioners whose names are on the retainer agreements and the professional indemnity policies, this is a supply-chain question about the tools they are choosing to adopt. If a vendor's model was trained on Australian legal materials without licensed access, that copyright exposure sits outside the firm's view entirely. The vendor carries it. But the firm adopts the tool, and the firm's reputational position and professional obligations are engaged every time the tool is used in a matter.

We have watched the legal AI market in Australia develop rapidly enough that due diligence on training data provenance has rarely been part of the procurement conversation. That is changing. When regulators, courts, and professional bodies turn their attention to AI use in legal practice, the questions will not stop at "what did the tool output." They will extend to "where did the tool's knowledge come from, and on what basis."

The serviceable answer to that question looks different depending on the vendor. A general-purpose model trained on a broad internet crawl will not have a clean account of its Australian legal content. A model trained specifically on scraped Australian case law, without clear licensed access, has an exposure that the vendor may or may not have thought through. Neither is the same as a system built on a closed dataset of legitimate, verifiable Australian legal sources.

Habeas's Search Engine scans over 300,000 Australian cases and pieces of legislation in seconds, with results grounded in a closed dataset of legitimate Australian legal sources, so they are verifiable and traceable, never hallucinated. That distinction has always mattered for accuracy, because traceable citations are either right or demonstrably wrong. The copyright development gives it a second dimension: the sourcing question is now a legal question as much as a quality question.

We are not claiming Habeas has solved every problem the Copyright and AI Reference Group is working through. The policy landscape is unsettled, and anyone who tells you they have definitive answers about AI training and Australian copyright as of late 2025 is getting ahead of the facts. What we can say is that building on a closed, legitimate, traceable corpus was the right design choice before the Productivity Commission reported, and it is a more clearly defensible choice now.

For practitioners currently reviewing their AI tool stack, the question to add to the list is a simple one: where did this tool's knowledge of Australian law come from, and can the vendor account for it? "Trained on Australian law" has never been a complete answer. Right now, it is even less of one.

If that question is one you want to put to us directly, a demo is a reasonable place to start. You can book one at habeas.ai.

The legal research in this article was conducted and every citation verified using Habeas, the Australian legal AI research platform.

Hero image: Global Residence Index on Unsplash
