Did OpenAI Train GPT-4o on Paywalled O’Reilly Books?
OpenAI is accused of training GPT-4o on paywalled O’Reilly books without permission.
Matilda
The AI industry continues to be mired in controversy over training data, and OpenAI is once again in the spotlight. A new study suggests that OpenAI may have trained its GPT-4o model using paywalled books from O’Reilly Media without explicit permission. If true, this could add to the growing concerns about AI companies leveraging copyrighted materials without proper licensing.

The allegations stem from research conducted by the AI Disclosures Project, a nonprofit founded in 2024 by Tim O’Reilly (CEO of O’Reilly Media) and economist Ilan Strauss. The study applied a technique called DE-COP, designed to detect copyrighted content in AI models, to determine whether GPT-4o had prior exposure to paywalled O’Reilly books.

Key findings from the study include:

GPT-4o demonstrated a significantly higher recognition of paywalled O’Reilly book content compared to previous models like GPT-3.5 Turbo.

GPT-3.5 Turbo showed greater familiarity with publicly accessible O’Reilly content, indic…