Artificial intelligence tools, particularly large language models (LLMs), are increasingly being used to draft, interpret and challenge insurance policy language. This trend was highlighted in the Eleventh Circuit’s May 2024 decision in Snell v. United Specialty Insurance Co., where U.S. Circuit Judge Kevin Newsom, in a concurring opinion, considered how LLMs could help clarify the plain meaning of contested terms like “landscaping.” While many see significant potential in using AI to streamline and improve policy interpretation, practitioners also warn of the legal and practical challenges these tools may introduce.

In an interview with Law360, Litigation partner Colin Kemp, who also serves as managing partner of Pillsbury’s San Francisco office, said, “I’m not so sure that using LLMs is accepted yet as a source of truth, or at the equal level to a dictionary—and you can question that in and of itself.”

Insurance Recovery & Advisory partner Tamara Bruno added that LLMs are highly sensitive to how questions are asked, can be hard to control when dealing with issues more complex than defining a single word, and may produce different results over time.

“It just seems to me like we’re almost creating a new expert issue, and now parties are going to have to bring in generative AI prompt experts who can explain what they did, document what they did, and demonstrate it and then show how it is a source that the court should consider,” she said.

Additionally, Bruno highlighted that at the end of Judge Newsom’s appendix of prompts, one model ultimately concluded that whether installing a trampoline counts as landscaping “is a matter of opinion. There is no right or wrong answer.”

“What benefit did we get from that, if, at the end of the day, it’s going to say, ‘Well, it can go either way’?” she said.

Both Bruno and Kemp agreed that AI tools can assist policyholders in understanding their coverage but warned against placing too much reliance on the results they provide.

“I’ve certainly gotten queries from clients where they have run things through ChatGPT and then come to me to add my expertise to the output that they’ve received, because that is one of these issues with the LLMs: It’s very hard to evaluate the output if you don’t know much about the subject,” Bruno said.

“I think it’s a tool, but using it as an oracle is problematic,” she concluded.
