AI Content Moderation Is Broken — This Startup Is Fixing It
A former Facebook insider has raised $12 million to solve one of the most persistent problems in tech: AI content moderation that actually works in real time. Moonbounce, a safety infrastructure startup, is now processing over 40 million daily content reviews across AI chatbots, image generators, and dating platforms — and its founders believe safety can be a product feature, not just an afterthought.
Image credit: Moonbounce
Why AI Content Moderation Has Always Been a Coin Flip
Content moderation at scale has never been solved cleanly. When Brett Levenson joined a major social media platform in 2019 to lead business integrity operations, he expected to find a technology problem. What he found was far messier.
Human reviewers were handed a 40-page policy document — machine-translated into their language — and given roughly 30 seconds to assess flagged content. They had to decide not just whether something violated the rules, but what action to take: remove it, ban the account, or throttle distribution. That entire process, Levenson discovered, was producing decisions that were only "slightly better than 50% accurate." For a system trusted to protect millions of users, that was effectively a coin toss.
The deeper problem was timing. By the time a human reviewer reached flagged content, the harm had often already spread. The system was reactive by design — and in a world of fast-moving, well-funded bad actors, reactive is not enough.
From Policy Documents to Executable Code: The Moonbounce Insight
Levenson's frustration crystallized into a single idea: what if policy documents weren't static text, but living, executable logic? He called it "policy as code" — turning the rules that govern content into something a machine could read, apply, and update in milliseconds.
That idea became the foundation of Moonbounce, which he co-founded with Ash Bhardwaj, a former colleague who had built large-scale cloud and AI infrastructure. Together, they trained a proprietary large language model to ingest a customer's policy documents, evaluate content at runtime, and return a decision in 300 milliseconds or less.
Depending on the platform's preferences, that decision can take different forms. High-risk content might be blocked outright. Lower-risk content might be flagged and routed toward delayed human review. The system sits between the user and the application, operating independently of the chatbot's own context window — which means it is not overwhelmed by thousands of tokens of prior conversation history. Its job is singular: enforce the rules, right now.
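Moonbounce's model and policy logic are proprietary, but the decision flow described above can be illustrated with a toy sketch. Everything here is invented for illustration: the `Rule` class, the keyword matching (a stand-in for the LLM's policy evaluation), and the risk thresholds are assumptions, not Moonbounce's API.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    FLAG = "flag"    # lower-risk: route to delayed human review
    BLOCK = "block"  # high-risk: stop it before it reaches the app

@dataclass
class Rule:
    name: str
    keywords: tuple[str, ...]  # stand-in for an LLM policy judgment
    risk: float                # 0.0 (benign) to 1.0 (severe)

def evaluate(content: str, rules: list[Rule],
             block_threshold: float = 0.8,
             flag_threshold: float = 0.4) -> Action:
    """Apply every rule and return the strictest action the policy demands."""
    top_risk = 0.0
    for rule in rules:
        if any(kw in content.lower() for kw in rule.keywords):
            top_risk = max(top_risk, rule.risk)
    if top_risk >= block_threshold:
        return Action.BLOCK
    if top_risk >= flag_threshold:
        return Action.FLAG
    return Action.ALLOW

# Hypothetical policy, expressed as data rather than a 40-page document.
rules = [
    Rule("self-harm", ("hurt myself",), 0.9),
    Rule("spam", ("buy followers",), 0.5),
]
print(evaluate("how do I buy followers fast?", rules).value)  # flag
```

The point of the sketch is the shape of the interface, not the matching logic: policy lives as inspectable, updatable data, and every piece of content gets a deterministic allow/flag/block verdict before the application ever sees it.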
A $12 Million Bet on Real-Time AI Safety Infrastructure
Moonbounce announced this week it has closed a $12 million funding round co-led by Amplify Partners and StepStone Group. The raise comes at a moment when the AI safety problem has moved from a technical concern to a legal and reputational one.
AI companies are facing mounting pressure after a series of high-profile failures. Chatbots have been accused of providing harmful guidance to teenagers. Image generation tools have been misused to create nonconsensual intimate imagery. Internal guardrails, critics argue, are simply not keeping up with the scale and sophistication of misuse.
Amplify Partners' general partner noted that with large language models now powering nearly every application, content moderation challenges have grown more complex than ever. The firm sees real-time, objective guardrails becoming essential infrastructure for the AI era — not a nice-to-have, but a foundational layer.
Levenson frames it differently: safety, he argues, has never been a product benefit because it has always been built after the fact. Moonbounce is trying to change that by making safety something companies can ship as a feature.
Who Is Already Using Moonbounce?
Moonbounce currently serves more than 100 million daily active users across its customer base. Its three primary verticals are user-generated content platforms such as dating apps; AI companionship and character products; and AI image and video generation tools.
Current customers include Channel AI, an AI companion startup; Civitai, an image and video generation platform; and character roleplay platforms Dippy AI and Moescape. The system is already handling more than 40 million daily content reviews.
The head of trust and safety at a major dating platform has publicly reported that this category of large language model-powered moderation service produced a tenfold improvement in detection accuracy. That figure illustrates the gap between legacy systems and what real-time, policy-driven moderation can achieve.
The Next Frontier: Steering Conversations, Not Just Blocking Them
Moonbounce's next major capability is what the team calls "iterative steering." The concept emerged from cases where blunt refusals — the typical safety response when a harmful topic arises — failed to help the person on the other side of the conversation.
The 2024 death of a 14-year-old boy in Florida, who had developed an obsessive relationship with a character AI chatbot, became a turning point in how the industry thinks about this problem. A refusal message does not redirect a vulnerable person toward help. It simply closes a door.
Iterative steering aims to intercept those moments and modify the user's prompt in real time, pushing the chatbot not just toward a neutral response but toward an actively supportive one. The goal is to turn the chatbot into a helpful listener, not just an empathetic wall.
This kind of intervention requires precision. Moonbounce's system is designed to identify the moment a conversation begins drifting toward harm and alter its trajectory before the damage is done — without the user necessarily being aware that anything changed. It is a fundamentally different philosophy from the block-and-ban logic that has defined content moderation for years.
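A minimal sketch of the interception idea might look like the following. To be clear, this is not Moonbounce's implementation: the crisis markers, the steering prefix, and the `steer` function are all hypothetical, standing in for what the article describes as a model that detects conversational drift and rewrites the prompt in flight.

```python
# Illustrative only: a real system would use a classifier, not substrings.
CRISIS_MARKERS = ("want to disappear", "no reason to go on")

STEERING_PREFIX = (
    "[system note: the user may be in distress. Respond with warmth, "
    "ask open questions, and gently mention professional support.] "
)

def steer(prompt: str) -> str:
    """Rewrite a risky prompt so the downstream chatbot responds
    supportively, rather than refusing and closing the door."""
    if any(marker in prompt.lower() for marker in CRISIS_MARKERS):
        return STEERING_PREFIX + prompt
    return prompt

print(steer("Some days I feel like I want to disappear"))
```

The design choice worth noting is that the intervention happens on the input side, before the model responds, so the user sees a supportive reply instead of a refusal message.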
The Independence Question: Why Moonbounce Wants to Stay Open
Levenson is candid about the tension between building something valuable and keeping it widely accessible. When asked whether an acquisition by a large platform would make sense — bringing his content moderation work full circle — he acknowledged the strategic fit while expressing something unusual for a founder: reluctance.
His concern is not valuation. It is access. If Moonbounce's technology is acquired and locked behind a single company's walls, the broader AI ecosystem loses the benefit of an independent, third-party safety layer. That independence is, in his view, part of the product's value proposition. A system owned by one of the platforms it is meant to police is a different thing entirely.
For now, Moonbounce remains independent, serving a growing roster of AI companies that are increasingly willing to look outside their own walls for safety infrastructure they cannot build fast enough themselves.
What This Means for the Future of AI Safety
The AI content moderation problem is not going away. If anything, it is accelerating. As large language models become the interface layer between humans and nearly every digital service, the potential for harm — and the volume of content that must be assessed — grows with each new deployment.
Moonbounce's approach represents a meaningful shift in how the industry might address this. Rather than asking individual companies to build and maintain their own safety stacks — a process that is slow, expensive, and often reactive — it proposes a shared infrastructure model where enforcement logic is fast, auditable, and continuously updatable.
The $12 million raise gives the 12-person team runway to expand its verticals, develop iterative steering, and deepen its integrations across the AI application landscape. Whether the "policy as code" vision can scale to meet the demands of a rapidly expanding AI ecosystem remains to be seen — but the early numbers suggest the market is ready for it.
For an industry that has spent years treating safety as a cost center, Moonbounce is making the case that it can be something else entirely: a competitive advantage.
Moonbounce's $12 million seed round was co-led by Amplify Partners and StepStone Group. The company is headquartered in San Francisco.