The Demo Looked Good. Then Came the Call from Opposing Counsel.

Thomson Reuters data shows 91% of professionals say their organisations are falling short on AI. Discover why impressive demos don't translate to real-world courtroom wins.

The demonstration runs without a hitch. Searches come back in seconds. Authorities appear, cited, organised, apparently reliable. The room is persuaded, and sign-off comes quickly.

Three weeks later, the email arrives from opposing counsel. One of those authorities does not say what the submission says it says. The correction is minor in isolation. The cost in credibility is not.

We have sat through enough conversations with practitioners across firms and in-house teams to know the pattern. The demo is the easy part. The work breaks down at verification.

Thomson Reuters surveyed 1,816 professionals across law, tax, audit, and other professional services in 62 countries, and found that 91 percent said their organisations are falling short of what AI could deliver. The client-side figure is more precise about what falling short looks like: 78 percent of corporate clients describe AI-enabled quality improvements as very important or essential, while only 6 percent say most or all of their providers actually deliver them. Adoption has moved well past the question of whether to try. What remains is whether the technology is producing something a lawyer can stand behind.

The gap is structural, and the reason lies in what most AI tools are optimised for. Fluency and speed are measurable at the output level, and they make for a good demo. Verifiability is harder to show in a fifteen-minute session. Once the demo is over and the work is real, the failure mode is a summary that is plausible, grammatically impeccable, and subtly wrong about what the authority actually held. By the time the error surfaces (in a submission, in a client advice, in a call from opposing counsel), the speed gain has long since been spent.

The TR report does not name traceable citations as the missing variable, but read the gap closely and that is what it describes. Clients are not complaining about turnaround time. Seventy-eight percent of them call AI-enabled quality improvement very important or essential, and quality in legal work means accuracy you can source. A well-phrased answer grounded in nothing you can check is a liability with a polished surface, and experienced clients have learned to recognise the surface.

Courts have begun formalising the same point. Australian jurisdictions have issued AI guidance at pace over the past eighteen months, and the common thread holds across all of them: verification is the practitioner's obligation regardless of how the work was produced. A mis-cited authority is a mis-cited authority whether it arrived in ten seconds or two hours.

Firms that close the gap will not do it by deploying more tools. They will do it by being disciplined about which tools they trust for which tasks, and by building verification into the workflow instead of treating it as an optional final step. That means tools that surface the source alongside the conclusion, so the lawyer reads the authority rather than accepting the summary.

Habeas was built around that constraint. Every result traces to its source, a closed corpus of over 300,000 Australian cases and pieces of legislation drawn from primary law, with no hallucinated authorities. Research that once consumed much of a morning can be done in a fraction of it, but the speed is not the point on its own. It matters because the output is verifiable, and a lawyer putting their name to the work has to be able to tell a checked authority from a plausible summary.

One general counsel described the effect plainly: the Australian-law focus and the depth of nuance in the answers materially change confidence and speed when forming legal views. That is the kind of first-line intelligence layer the TR data suggests clients are looking for and not finding: a tool that earns its place in the workflow by producing something a practitioner can stand behind, rather than something that reads well until the follow-up question.

The TR report has given the profession a clear picture of where it stands, and the distribution is not subtle: 91 percent say their organisations are falling short, and only 6 percent of corporate clients say most or all of their providers deliver the quality improvements they call very important or essential. For Australian firms and in-house teams, the competitive implication is not hard to draw. Clients are watching whether the work they receive is more accurate, better sourced, and easier to defend, and they will notice which firms can answer that with confidence. Closing the gap starts with being honest about what "better" means in legal work, and then building toward that standard with some rigour.

If that framing matches how your team is thinking about AI research, habeas.ai is worth a visit.

The legal research in this article was conducted and every citation verified using Habeas, the Australian legal AI research platform.

Hero image: Paul White on Unsplash