The Dictionary Sues OpenAI

Merriam-Webster and Encyclopedia Britannica have sued OpenAI for copyright infringement, alleging that nearly 100,000 of their articles were used for AI training without permission.
Matilda

Merriam-Webster and Britannica Just Declared War on OpenAI — And the Stakes Could Reshape the Entire AI Industry

Encyclopedia Britannica and its subsidiary Merriam-Webster have filed a landmark lawsuit against OpenAI, accusing the artificial intelligence company of massive copyright infringement. The publishers allege that nearly 100,000 of their copyrighted online articles were scraped and used without permission to train OpenAI's large language models. This legal battle could become one of the most consequential copyright cases in the history of artificial intelligence.


The Core Accusation: 100,000 Articles Stolen Without Permission

At the heart of the lawsuit is a straightforward but explosive claim. Britannica and Merriam-Webster say OpenAI used their vast library of proprietary content — articles carefully researched, written, and maintained over decades — as raw material to build its AI systems. No license was purchased. No permission was sought. No compensation was offered.

The publishers retain full copyright over their digital content, making this alleged use a direct and deliberate violation of intellectual property law. For organizations whose entire business model depends on the value of trusted, authoritative information, the accusation is not just a legal matter. It is an existential one.

The lawsuit does not stop at training data. Britannica also alleges that OpenAI violates copyright when ChatGPT generates outputs containing full or partial verbatim reproductions of their articles — essentially regurgitating protected text word for word in response to user queries.
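A claim of verbatim reproduction is, in principle, measurable. One simple approach is to find the longest word-for-word run shared between a source article and a model's output. The sketch below is purely illustrative; the texts and the scoring are invented, not drawn from the filing.

```python
# Hypothetical illustration: measuring verbatim overlap between a
# source text and a model's output via longest shared word run.
# The example texts are invented for demonstration.

def ngrams(words, n):
    """All contiguous n-word sequences in a word list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def longest_shared_run(source, output):
    """Length (in words) of the longest sequence appearing in both texts.

    If an n-word run is shared, every shorter run inside it is shared
    too, so we can grow n until the overlap disappears.
    """
    s, o = source.split(), output.split()
    best, n = 0, 1
    while n <= min(len(s), len(o)) and ngrams(s, n) & ngrams(o, n):
        best = n
        n += 1
    return best

article = "the lighthouse was first lit in 1764 and remains in service today"
answer = "according to one source the lighthouse was first lit in 1764"

run = longest_shared_run(article, answer)
# The shared run here is "the lighthouse was first lit in 1764" (7 words).
```

A court would weigh far more than raw overlap, of course, but checks like this are how researchers have documented regurgitation of training text in practice.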

The RAG Problem: When AI Pulls Directly From Your Content

One of the most technically detailed allegations in the lawsuit involves something called retrieval-augmented generation, or RAG. In this process, a system built around a language model like ChatGPT retrieves fresh, up-to-date information from the web or external databases and feeds it into the model's prompt when answering a question.
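The retrieval step can be sketched in a few lines. This is a deliberately minimal illustration, not OpenAI's actual pipeline: real systems rank documents with vector embeddings rather than word overlap, and the corpus snippets here are invented.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# Real systems use embedding-based ranking and an LLM call;
# this toy version uses naive word overlap. Corpus is invented.
import re

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query; return the top k."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Splice retrieved passages into the prompt sent to the model.

    This splicing step is what the RAG allegation targets: protected
    text is copied into the pipeline at answer time, each time a
    relevant query arrives, not just once during training.
    """
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "An encyclopedia is a reference work with articles on many subjects.",
    "A dictionary lists words with their definitions and pronunciations.",
    "Large language models are trained on vast text corpora.",
]

prompt = build_prompt("What is a dictionary?", corpus)
```

The key point for the lawsuit is visible in `build_prompt`: the source text is reproduced inside the system on every relevant query, which is why Britannica frames RAG as recurring rather than one-time use.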

Britannica argues that OpenAI's use of its articles within this RAG workflow constitutes ongoing copyright infringement — not just a one-time training violation but a recurring, live process of using protected content. Every time ChatGPT reaches out and pulls from Britannica's database to construct a response, the publisher argues, a new infringement occurs.

This dimension of the lawsuit is significant because it moves the legal argument beyond the training-data debate that has dominated AI copyright discussions. It forces courts to examine whether AI systems are infringing in real time, not just during their creation.

Fake Answers, Real Damage: The Trademark Angle Nobody Expected

Perhaps the most unexpected layer of this case is the trademark claim. Britannica alleges that OpenAI violates the Lanham Act — a federal trademark statute — when ChatGPT generates hallucinations and then falsely attributes those fabricated answers to Britannica or Merriam-Webster.

In plain terms: when ChatGPT makes something up and presents it as if it came from a trusted dictionary or encyclopedia, it weaponizes the credibility those brands have built over centuries. A user who receives a confident but wrong definition labeled as coming from Merriam-Webster walks away with misinformation they believe to be authoritative.

The lawsuit specifically argues that this threatens the public's continued access to high-quality and trustworthy online information. That framing is deliberate. Britannica is positioning this not just as a corporate grievance but as a public interest issue — one where the casualty is the trustworthiness of knowledge itself.

Revenue Starvation: How AI Is Killing Publisher Business Models

Beyond copyright and trademarks, the complaint raises a pointed economic argument. ChatGPT, Britannica alleges, effectively starves publishers of revenue by providing answers that substitute for the original content. A user who gets a complete definition or detailed explanation from ChatGPT has no reason to click through to Merriam-Webster's website.

This is a traffic and monetization crisis hiding inside a legal filing. Publishers depend on ad revenue, subscriptions, and direct traffic. When an AI tool delivers their content as a polished answer — often without attribution or a link — the publisher loses the visit, the impression, and the income. Multiply that across millions of daily queries, and the financial damage compounds rapidly.

This argument mirrors complaints from news publishers, book authors, and content creators who have made similar observations about how generative AI disrupts the economic foundations of professional content creation.

A Growing Legal Front: Publishers Are Uniting Against OpenAI

Britannica and Merriam-Webster are far from alone in this fight. A growing coalition of publishers and media organizations has filed or joined legal actions against OpenAI over the past two years, creating a mounting legal front that the company cannot easily dismiss.

Major newspapers across the United States and Canada have taken action, including properties in Chicago, Denver, Miami, and Toronto, as well as a prominent national public broadcaster. A large digital media company that owns several major tech and entertainment publications has also pursued claims. The breadth of this legal mobilization signals that the publishing industry has moved beyond individual protests and is now coordinating a sustained challenge to how AI companies have built their products.

Britannica itself is fighting on another front simultaneously, with a separate pending lawsuit against a different AI company over similar allegations involving search and content retrieval.

The Legal Uncertainty That Could Define the AI Era

Despite the volume of lawsuits, there is still no settled legal precedent establishing whether training an AI model on copyrighted content constitutes infringement. The courts are working through these questions in real time, and the outcomes will shape the future of the entire AI industry.

In one notable development, a federal judge found that using content as training data could be considered transformative use — a core concept in fair use defenses. However, the same case produced a significant consequence for the AI company involved, which faced a substantial financial settlement after a separate finding that it had illegally obtained the source material rather than licensing it. The settlement reached into the billions of dollars and affected a large class of professional writers.

That outcome illustrates the complicated legal terrain these cases occupy. Transformative use may protect some AI practices, but the means of acquiring content — how it was obtained, not just how it was used — can independently create liability.

What This Means for the Future of AI and Trusted Information

The Britannica and Merriam-Webster lawsuit arrives at a critical moment. Generative AI tools are being used by hundreds of millions of people for everything from homework to medical research to professional writing. The quality and accuracy of what those tools produce depends, in large part, on the quality of the content they were built on.

If publishers pull back their content, restrict access, or win legal protections that force AI companies to stop using their material, the information landscape inside AI tools will shift. The authoritative, carefully curated definitions and encyclopedic entries that generations of readers have trusted may no longer flow freely into the AI systems that people increasingly treat as their first source of truth.

That is the deeper tension this lawsuit exposes. It is not simply about money or copyright law. It is about who gets to define the boundaries of knowledge in the age of artificial intelligence — and whether the organizations that built trusted information ecosystems will survive long enough to keep doing so.

The case is still in its early stages, but its implications extend far beyond a dictionary and a chatbot. Every publisher, every author, and every reader has a stake in how it resolves.
