Can AI Really ‘Interpret’ the Law? Rethinking Large Language Models as Probabilistic Evaluators, Not Robot Judges

by Dr Václav Janeček, Senior Lecturer in Law, University of Bristol Law School

This post is based on a presentation given at the Institute of Legal Informatics and Judicial Systems’ ‘IGSG Dialogues’ seminar series. The author is hosted at the IGSG-CNR as the Leverhulme International Fellow.


One of the best things about doing legal research is, I suspect, the investigator-like thrill involved in the task at hand. When trying to identify the laws demanded by a research question, lawyers can often feel like undercover journalists chasing some big case, pursuing every promising lead and tying up loose ends. They meticulously discriminate between good and bad sources, as well as between those that are squarely relevant to their question and those that are relevant only indirectly. They write down interim ideas and keep logs of steps taken and corners of the law already swept. In their heads, they play with words as they emerge from the sources and blend them into possible interpretations of the law.

Lawyers like myself imagine possible interpretations of the law as a form of jigsaw puzzle, one where individual pieces of information can be joined up in different ways, some better than others, but where none is defined by any ground truth. Sometimes, a court will say with authority how the law should be understood, but even that authoritative statement remains open to interpretation. This iterative process—putting the pieces together again and again until you find the interpretation that appears most defensible—is what my craft of interpretation looks like in practice. And I imagine yours looks very similar.

But internal mental processes are not what we are interested in. As lawyers, we need to make the results of our intellectual gymnastics intelligible to the outside world, or at least to participants in the legal debate whom we want to convince with our interpretations of the law.

The Importance of Methodology and the Lack Thereof in Modern AI Tools

This is where methodology comes in. As lawyers, we know what makes a defensible interpretation: after all, we have a whole set of methods devoted to legal interpretation. Established methods of legal research and interpretation are designed to help resolve disagreements about which legal sources are the most relevant and what the words contained in those sources mean. These methods of ‘doctrinal’ lawyering are designed to justify the ‘doctrinal’ meaning of legal sources, i.e. the meaning that is dogmatically accepted or most likely to be accepted. Such ‘doctrinal’ methods promote legal certainty by helping us to identify and understand our legal obligations.

The difficulty is that there is no robust methodology, nor even a weak but widely accepted one, for deciding what constitutes good use of generative artificial intelligence (GenAI) in legal interpretation. Without such a methodology, we can hardly achieve a trusted, safe, ethical, and scalable deployment of artificial intelligence in legal practice and education.

So, as artificial intelligence and large language models (LLMs) increasingly enter legal research, a fundamental question arises: can these systems truly engage in legal interpretation?

I think we should push against the temptation to answer that in the affirmative. On the one hand, today’s AI systems – including sophisticated ‘agents’ that can plan and act – do not engage in legal interpretation in the strong jurisprudential sense in which lawyers use that term. On the other hand, these systems might still play a meaningful role in interpretive work, if we reconceive them not as robot judges, but as probabilistic tools for evaluating competing human interpretations.

Why ‘Generative Interpretation’ Doesn’t Work

Recently, scholars and judges have proposed something called ‘generative interpretation’, conducting experiments that use large language models to answer interpretive questions about the law. These experimenters argue that AI is successful at this task. But there’s a fundamental problem: whilst these AI systems produce seemingly useful outputs, the existing scholarship offers no clear methodological framework explaining what’s going on ‘under the hood’ in a way that would legitimately qualify it as legal interpretation. We get our answers first, and explanations (if any) only later. This boils down to the classic ‘black box’ issue: if we don’t know what’s happening inside the model, we cannot establish that legal interpretation is actually occurring.

What is more, when lawyers interpret, they spot legal issues, plan data searches, access legal materials, read and analyse texts, and form actionable conclusions. Computer scientists, meanwhile, define AI agents as computational entities that sense their environment by interacting with different kinds of input data, autonomously plan steps, and execute them by using appropriate tools. These frameworks simply do not align, and so talking about one in terms of the other is misleading.
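To make the computer-science definition concrete, here is a minimal, purely illustrative Python sketch of that sense-plan-execute loop. Every name in it (LegalResearchAgent, sense, plan, execute, the tools dictionary) is a hypothetical stand-in, not any real library’s API.

```python
# A purely illustrative sketch of the computer-science notion of an 'agent':
# it senses input data, autonomously plans steps, and executes them with tools.
# All names here are hypothetical, not a real framework's API.

class LegalResearchAgent:
    def __init__(self, tools):
        # tools: a mapping from step names to callables, e.g. {"search": ..., "read": ...}
        self.tools = tools

    def sense(self, environment):
        """Observe the available input data (a query and any documents)."""
        return environment.get("query"), environment.get("documents", [])

    def plan(self, query):
        """Break the task into steps; here the plan is hard-coded for brevity."""
        return ["search", "read", "summarise"]

    def execute(self, steps, query):
        """Carry out each planned step with the appropriate tool."""
        return [self.tools[step](query) for step in steps]
```

The point of the sketch is only to show what the computational framework looks like; nothing in it corresponds to spotting a legal issue or forming a defensible reading of a text.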

Besides, it is a category mistake to regard a statistics-based computational analysis of legal texts as jurisprudential interpretation. When AI systems process legal data as bulk data, they are engaging in a categorically different exercise from that involved when lawyers closely read and interpret texts in the jurisprudential sense. Yes, textual corpora, when seen from a distance as data suited for computational analysis, can reveal interesting (and sometimes even useful) meta-level insights about the source documents. But jurisprudential arguments are not made at this meta-level.

As Grimmelmann et al put it:

[t]he superficial fluency of LLM-generated text conceals fundamental gaps between what these models are currently capable of and what legal interpretation requires to be methodologically and socially legitimate. Put simply, any human or computer can put words on a page, but it takes something more to turn those words into a legitimate act of legal interpretation. LLM proponents do not yet have a plausible story of what that ‘something more’ comprises.

A Different Approach: Probabilistic Interpretation

I think the solution to the legitimacy problem lies in large part in understanding how large language models actually operate. To make an LLM-enabled interpretation of legal texts methodologically reliable, one needs to work out whether the insights provided by the algorithmic model are valid in a scientific sense. And to do that, one needs a robust causal theory of how the model operates in a mechanistic sense. In other words, we need to be certain how the LLM-enabled interpretations came about, not merely whether they score well against some increasingly popular benchmarks.

And this is where ‘mechanistic interpretability’, a subfield of Explainable AI (XAI), can help. Mechanistic interpretability examines the internal workings of neural networks by analysing their computational mechanisms, including word embeddings (representing words and their relations), the Transformer architecture (capturing context), and, crucially, probability prediction—predicting what may come next based on how probable that next token in a sequence would be.
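To make the ‘probability prediction’ point concrete, here is a minimal sketch of how a causal language model assigns a probability to every possible next token in its vocabulary. It assumes the Hugging Face transformers library, and GPT-2 is used purely as a small illustrative model, not as a recommendation for legal work.

```python
# A sketch of 'probability prediction': given a prompt, a causal language model
# assigns a probability to every possible next token in its vocabulary.
# Assumes the Hugging Face `transformers` library; GPT-2 is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "No vehicles are allowed in the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Convert the logits at the final position into a probability distribution
# over the vocabulary, then inspect the most likely continuations.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r:>12}  p = {prob.item():.3f}")
```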

Based on these features of the generative models in question, it makes better sense, in my view, to use GenAI as a tool for what I call ‘probabilistic legal interpretation’, rather than for ‘generative interpretation’. Here’s how it works. Imagine you’re interpreting the rule ‘no vehicles in the park’. You might propose three competing interpretations: (1) all motorised transport is prohibited; (2) all wheeled conveyances except wheelchairs are prohibited; (3) anything capable of transporting people or goods is prohibited. Rather than asking the AI to generate an interpretation through bulk processing, you come up with these interpretations yourself, then ask about the probabilistic weights and values for each competing interpretation—using the system’s underlying mechanistic features to shed light on the probabilistic relations between the original text and your proposed readings.

The system compares the original text (embedded and represented as vectors and tokens) with each proposed interpretation (also embedded and represented), asking: what’s the probability that each expression is closely related to the original?
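As a rough illustration of that comparison, the sketch below embeds the ‘no vehicles in the park’ rule and the three candidate readings, then ranks the candidates by how closely each relates to the rule. Cosine similarity over sentence embeddings is used here as one possible proxy for that probabilistic relation; it is an assumption of the sketch, not the definitive method, and it presupposes the sentence-transformers library with a model chosen purely for illustration.

```python
# A sketch of the comparison step: embed the original rule and each candidate
# reading, then rank the candidates by how closely they relate to the rule.
# Cosine similarity over sentence embeddings is one possible proxy for that
# relation (an assumption of this sketch, not a definitive method).
# Assumes the `sentence-transformers` library; the model name is illustrative.
import torch
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

rule = "No vehicles in the park."
candidates = [
    "All motorised transport is prohibited.",
    "All wheeled conveyances except wheelchairs are prohibited.",
    "Anything capable of transporting people or goods is prohibited.",
]

rule_vec = model.encode(rule, convert_to_tensor=True)
cand_vecs = model.encode(candidates, convert_to_tensor=True)

# Similarity of each candidate reading to the original rule text.
similarities = util.cos_sim(rule_vec, cand_vecs)[0]

# Normalise into relative weights so the readings can be ranked against one
# another; these are weights relative to this model, not absolute probabilities.
weights = torch.softmax(similarities, dim=0)
for reading, weight in sorted(zip(candidates, weights), key=lambda pair: -pair[1].item()):
    print(f"{weight.item():.2f}  {reading}")
```

The softmax step matters: the output is a ranking of the candidate readings relative to one another and to the chosen model, not an absolute probability that any reading is ‘correct’.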

Such probabilistic interpretation can be defined as the processing of text with the aim of ascertaining its meaning by mapping mechanistic features of some original text, as represented in the foundation language model, onto its interpretive counterpart.

From Interpretation to Evaluation

This raises a crucial question of social (rather than methodological) legitimacy: can a judge (or a lawyer generally) do anything with such a ‘probabilistic interpretation’? Is this a legitimate method, alongside the established methods of legal interpretation?

The answer lies in recognising that probability is always relative to something—to a specific model, which might focus on specialist jargon, a particular historical period, or even the linguistic patterns of a specific judge. We might create custom language models to ascertain interpretations based on controlled variables: how language is used by everyone in a given language, by people in a certain historical period or within a specific legal system, or by particular judges.
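To illustrate that relativity, the sketch below scores the same candidate readings under two different off-the-shelf language models and reports which reading each model prefers. GPT-2 and DistilGPT-2 stand in, purely hypothetically, for (say) a general-language model versus a more specialised, corpus-specific one; the helper function and model choices are illustrative assumptions, not a proposal for practice.

```python
# A sketch of model-relativity: score each candidate reading by the total
# log-probability a given causal language model assigns to it as a
# continuation of the rule, and note that the preferred reading can differ
# between models. Assumes the Hugging Face `transformers` library; the two
# model names are illustrative stand-ins for differently trained models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def continuation_log_prob(model, tokenizer, prompt, continuation):
    """Total log-probability the model assigns to `continuation` after `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    # Each continuation token is scored by the prediction made at the
    # position immediately before it.
    for pos in range(prompt_len, full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

rule = "No vehicles in the park. In other words,"
candidates = [
    " all motorised transport is prohibited.",
    " all wheeled conveyances except wheelchairs are prohibited.",
    " anything capable of transporting people or goods is prohibited.",
]

for model_name in ["gpt2", "distilgpt2"]:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    scores = {c: continuation_log_prob(model, tokenizer, rule, c) for c in candidates}
    best = max(scores, key=scores.get)
    print(f"{model_name}: most probable reading ->{best}")
```

Both models here are generic English models; the point is only that the weights, and sometimes the ranking, change with the model, which is precisely why the choice of corpus or model must itself be justified.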

If you ponder these theoretical questions for a while, one conclusion starts to emerge. Probabilistic legal interpretation is not actually a method of legal interpretation at all—it’s a method of evaluation. It cannot be justified at the same level as existing interpretation methods, because the questions we ask about probability are effectively versions of traditional interpretive questions, just approached through a different technique.

The social legitimacy of probabilistic interpretation lies in its use as an evaluative framework—a rigorous approach to ranking competing interpretations. When you’ve proposed three readings of ‘no vehicles in the park’, probabilistic interpretation provides a sophisticated tool for evaluating which interpretation aligns most closely with how the relevant linguistic community uses those terms. By contrast, it is not an approach that sees AI as a robot-judge-like tool that spits out its own interpretation of the legal sources at hand.

Preserving What Matters

In the spirit of open science, I share these early thoughts to help move the research agenda forward. Properly understood, I think probabilistic legal interpretation offers a path forward for the legal profession’s engagement with AI—not replacing the investigator-like thrill of legal research or the intellectual gymnastics of interpretation, but providing an analytical tool for evaluating the interpretations that humans propose. The lawyer still does the interpretive work, while AI simply helps assess which interpretation is most defensible based on probabilistic linguistic analysis. It’s a more modest role for AI than some advocates suggest, but potentially a more valuable one that preserves what makes legal research intellectually rewarding whilst harnessing technology’s evaluative power.
