

Nearly every legal technology vendor is an AI company in 2026. The label appears in product names, marketing materials, and sales conversations so uniformly that it has largely stopped conveying information. AI-powered, AI-assisted, AI-enhanced: these formulations describe products ranging from genuinely retrieval-grounded research systems (like Habeas) to general-purpose language models with a legal interface layered on top. In most professional contexts, that distinction would be a matter of preference. In legal practice, where a confident-sounding wrong answer carries professional liability, it is not.
Understanding what you are actually purchasing requires looking past the label to the architecture underneath, and running tests that reveal the difference in practice.
The most important question you can ask a legal AI vendor is this: when your tool produces an answer to a legal question, where does that answer actually come from?
There are two fundamentally different answers, and they describe two fundamentally different products. The first is that the tool generates responses based on statistical patterns learned during training on legal text. It has processed a large body of legal materials and produces outputs that are statistically consistent with that corpus. This is how most general-purpose language models work, including many marketed specifically for legal use. The second is that the tool retrieves information from a specific, curated legal database and grounds its outputs in those retrieved sources, showing you exactly which documents it drew from and allowing you to navigate directly to them. This is retrieval-augmented architecture.
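The difference is easier to see in code. Below is a minimal sketch, in Python, of the retrieval-augmented pattern. The toy corpus, function names, and keyword-overlap scoring are all hypothetical stand-ins (a production system would use a real search index and a language model API), but the structural point survives the simplification: the answer is constrained to documents actually retrieved, and those documents travel with the answer as a verifiable citation trail.

```python
from dataclasses import dataclass

@dataclass
class Document:
    citation: str  # medium-neutral citation, e.g. "[1992] HCA 23"
    text: str

# Hypothetical toy corpus standing in for a curated, Australian-specific database.
CORPUS = [
    Document("[1992] HCA 23", "Mabo v Queensland (No 2): recognition of native title at common law"),
    Document("[2020] HCA 41", "Love v Commonwealth: Aboriginal Australians and the aliens power"),
]

def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Rank documents by naive keyword overlap with the query.
    A real system would use a search index or vector store here."""
    terms = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda d: len(terms & set(d.text.lower().split())),
        reverse=True,
    )[:k]

def answer_with_grounding(query: str) -> dict:
    """Retrieval-augmented pattern: generation is constrained to the
    retrieved sources, and the sources are returned with the answer."""
    sources = retrieve(query, CORPUS)
    context = "\n".join(f"{d.citation}: {d.text}" for d in sources)
    # A real pipeline would call a language model here with `context`,
    # instructing it to answer only from the supplied passages.
    return {
        "answer": f"(drafted strictly from {len(sources)} retrieved passages)",
        "grounding": context,                        # what the model saw
        "citations": [d.citation for d in sources],  # the verifiable trail
    }

print(answer_with_grounding("native title at common law"))
```

A tool in the first category has no equivalent of the `citations` field: there is nothing behind the text except the text itself.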
Tools in the first category produce authoritative-sounding text and are wrong a material percentage of the time. Stanford HAI benchmarking published in 2024 found hallucination rates of over 17% for Lexis+ AI and over 34% for Westlaw's AI-assisted research tool on questions within their primary jurisdictions. For Australian practitioners using tools trained predominantly on US and UK materials, the error rate on Australian-specific questions is higher still.
Take a specific legal question from your practice area, one where you know the relevant authority, and run it through the tool. A retrieval-grounded system will return specific citations, often with a direct path to the source. A generative tool without proper grounding will often return citations that appear correct in format but do not exist, or exist but do not say what the tool attributes to them.
Verify each citation against the primary source. This test is simple, takes fifteen minutes, and reveals the fundamental difference between a tool that retrieves verified legal authority and one that generates plausible-sounding text about legal authority. It should be standard practice for any practitioner evaluating a new legal AI tool before deploying it.
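The mechanical part of that test, pulling citations out of the tool's output and flagging which ones still need checking, is easy to script. Here is a minimal sketch, assuming Australian medium-neutral citations; the `KNOWN_GOOD` set is a hypothetical list of authorities you have already verified yourself. Note what the script cannot do: a well-formed citation may still be fabricated, or real but misattributed, so the final read against the primary source remains yours.

```python
import re

# Medium-neutral citation pattern, e.g. "[2020] HCA 41" or "[2019] FCAFC 12".
CITATION_RE = re.compile(r"\[(19|20)\d{2}\]\s+[A-Z]{2,7}\s+\d+")

# Hypothetical set of authorities you have independently verified.
KNOWN_GOOD = {"[1992] HCA 23", "[2020] HCA 41"}

def extract_citations(tool_output: str) -> list[str]:
    """Pull every medium-neutral citation out of the tool's answer."""
    return [m.group(0) for m in CITATION_RE.finditer(tool_output)]

def triage(tool_output: str) -> None:
    """Split citations into 'already verified' and 'must check by hand'.
    Matching the pattern only proves the citation is well-formed, not
    that the case exists or supports the proposition attributed to it."""
    for cite in extract_citations(tool_output):
        status = "verified" if cite in KNOWN_GOOD else "CHECK AGAINST PRIMARY SOURCE"
        print(f"{cite}: {status}")

triage(
    "The leading authority is Mabo v Queensland (No 2) [1992] HCA 23, "
    "applied in Smith v Jones [2023] HCA 99."  # the second may not exist
)
```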
Ask the vendor to specify which Australian legal sources are in the database: High Court, Federal Court, Full Federal Court, state Supreme Courts, the FCFCOA, specialist tribunals. Ask how current the data is. Ask whether the tool has been benchmarked on Australian legal questions, and what those results showed.
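One way to keep those answers honest is to record them in a structure you can compare across vendors. A minimal sketch follows; the court abbreviations, field names, and the "required" list are illustrative only, not a standard, and should be adapted to your own practice.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CoverageClaim:
    """One vendor's answers about Australian coverage, recorded verbatim."""
    vendor: str
    courts_covered: set[str] = field(default_factory=set)
    data_current_to: date | None = None         # "how current is the data?"
    au_benchmark_results: str | None = None     # "what did the benchmarks show?"

# Illustrative set of sources your practice actually needs.
REQUIRED_COURTS = {"HCA", "FCA", "FCAFC", "FCFCOA", "NSWSC", "VSC"}

def gaps(claim: CoverageClaim) -> set[str]:
    """Courts you need that the vendor did not claim to cover."""
    return REQUIRED_COURTS - claim.courts_covered

vendor_a = CoverageClaim(
    vendor="Vendor A",
    courts_covered={"HCA", "FCA", "NSWSC"},
    data_current_to=date(2025, 11, 1),
)
print(sorted(gaps(vendor_a)))  # ['FCAFC', 'FCFCOA', 'VSC']
```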
A tool built primarily for US or UK legal research is not automatically useful for Australian legal research. An Australian lawyer is usually asking questions that require Australian-specific grounding to answer reliably. Surface-level Australian coverage does not produce accurate Australian legal research.
A legal AI tool worth deploying in professional practice should do three things reliably: show you exactly where every output came from, return results you can verify against the primary source, and produce accurate results on the specific questions arising from the law of the jurisdiction where you actually practice.
A tool meeting those criteria changes the research economics of a practice in ways that compound over time. A tool that does not meet them produces outputs requiring complete reconstruction before use, which is operationally worse than working without AI at all: it consumes time and generates false confidence in results that have not been properly verified.
