Leadership Challenge at Facebook and the Quest for Effective Moderation
When Brett Levenson transitioned from Apple to Facebook in 2019 to spearhead business integrity, the social media platform was still grappling with the lingering repercussions of the Cambridge Analytica scandal. Initially, he believed that enhancing technology could quickly rectify Facebook’s content moderation issues.
Complexities of Content Moderation Beyond Technology
However, Levenson soon discovered that the challenges extended far beyond technology. Human reviewers were expected to memorize a dense 40-page policy document that had been machine-translated into their native languages. Tasked with evaluating flagged content in just 30 seconds, they had to make rapid decisions about violations and appropriate sanctions, including blocking users or limiting content distribution. Levenson found that these hasty judgments were only “slightly better than 50% accurate.”
The Perils of Reactive Approaches in a Rapidly Evolving Landscape
This slow and reactive approach is increasingly untenable in an environment teeming with agile and well-funded adversaries. The emergence of AI chatbots has further exacerbated content moderation issues, leading to high-profile incidents such as chatbots that inadvertently provided guidance on self-harm or generated imagery circumventing safety filters.
Innovating Content Moderation with Policy as Code
Frustrated with existing methods, Levenson conceived the idea of “policy as code,” which transforms static policy documents into executable, updatable code that is closely linked to enforcement actions. This concept paved the way for the establishment of Moonbounce, which recently secured $12 million in funding, co-led by Amplify Partners and StepStone Group.
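The core idea of “policy as code” — replacing a static policy document with versioned, executable rules tied directly to enforcement actions — can be illustrated with a short sketch. Everything below (the `PolicyRule` structure, the rule names, and the `evaluate` helper) is a hypothetical illustration of the general concept, not Moonbounce’s actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PolicyRule:
    """One enforceable rule, versioned and testable like any other code."""
    name: str
    version: str
    applies: Callable[[str], bool]   # predicate over a piece of content
    action: str                      # e.g. "block" or "limit_distribution"

# Hypothetical rules standing in for what a written policy document describes.
RULES: List[PolicyRule] = [
    PolicyRule("self_harm_guidance", "2024.1",
               lambda text: "how to hurt" in text.lower(), "block"),
    PolicyRule("spam_links", "2024.3",
               lambda text: text.lower().count("http") > 3, "limit_distribution"),
]

def evaluate(text: str) -> List[str]:
    """Return the enforcement actions triggered by a piece of content."""
    return [rule.action for rule in RULES if rule.applies(text)]

print(evaluate("how to hurt yourself"))  # ['block']
```

Because each rule is code, it can be updated, tested, and rolled out like any other software change, rather than waiting for reviewers to re-memorize a revised document.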
Revolutionizing Content Safety for Diverse Applications
Moonbounce collaborates with various companies to implement an additional layer of safety across all content generated, whether by users or AI. The company has developed its own large language model capable of analyzing customer policy documents and evaluating content in real time, delivering responses within 300 milliseconds. Depending on client needs, Moonbounce can slow content distribution for later review or block high-risk material immediately.
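A gating layer like the one described — score content in real time, then block it, slow its distribution for review, or let it through — might look like the following sketch. The risk thresholds, the `score_risk` stub, and the handling of the 300-millisecond budget are assumptions for illustration only:

```python
import time
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    DELAY = "delay"    # slow distribution pending later review
    BLOCK = "block"    # stop high-risk material immediately

def score_risk(text: str) -> float:
    """Stub standing in for a model call; a real system would query
    an LLM-based classifier trained on the customer's policies."""
    return 0.9 if "self-harm" in text.lower() else 0.1

def moderate(text: str, budget_ms: float = 300.0) -> Verdict:
    start = time.monotonic()
    risk = score_risk(text)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > budget_ms:
        # Fail toward caution if the model can't answer within budget.
        return Verdict.DELAY
    if risk >= 0.8:
        return Verdict.BLOCK
    if risk >= 0.5:
        return Verdict.DELAY
    return Verdict.ALLOW
```

The three-way verdict mirrors the options described above: per-client configuration would then decide which tier maps to which enforcement action.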
Expanding Impact Across Multiple Sectors
The company currently focuses on three primary sectors: platforms that handle user-generated content, such as dating apps; AI firms developing interactive characters; and companies offering AI image generation. As Levenson noted, Moonbounce already handles more than 40 million daily reviews and serves over 100 million active users. Key customers include the AI companion startup Channel AI, image generation companies such as Civitai, and character roleplay platforms such as Dippy AI and Moescape.
Integrating Safety as a Competitive Advantage
Levenson emphasizes that safety can be a significant product benefit, shifting the narrative from merely addressing safety concerns to embedding safety within the product itself. This perspective allows customers to innovate in ways that make safety a cornerstone of their offerings. For example, Tinder’s head of trust and safety recently highlighted how the platform uses similar LLM-powered services to achieve a tenfold improvement in detection accuracy.
Navigating the Complex Landscape of AI Safety
As AI firms face increasing legal and reputational scrutiny, particularly concerning the behavior of chatbots that have reportedly led vulnerable users toward self-harm, the demand for robust safety infrastructure grows. Levenson pointed out that many AI companies are now seeking external support to enhance their safety frameworks. Moonbounce serves as an intermediary, positioned between users and chatbots, focused solely on enforcing rules in real-time.
Future Developments and Ethical Considerations
Looking ahead, Levenson and his team aim to introduce “iterative steering,” a response mechanism to address sensitive topics with conversational redirection rather than outright rejection. This real-time intervention would guide chatbots toward more supportive responses when harmful subjects arise. While contemplating the company’s future, Levenson acknowledges the potential compatibility of Moonbounce’s technology with platforms like Meta, yet he remains cautious about the implications of an acquisition that could limit broader accessibility.
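The “iterative steering” idea — redirecting a conversation toward supportive responses rather than issuing a flat refusal — could be sketched as a wrapper around a chatbot’s draft reply. The topic detector and the redirection text below are placeholders for illustration, not the planned feature:

```python
from typing import Optional

def detect_sensitive_topic(user_message: str) -> Optional[str]:
    """Placeholder detector; a real system would use a trained classifier."""
    if "hurt myself" in user_message.lower():
        return "self_harm"
    return None

def steer(user_message: str, draft_reply: str) -> str:
    """Intercept the chatbot's draft reply and, when a sensitive topic
    is detected, substitute a supportive redirection instead of an
    outright rejection."""
    topic = detect_sensitive_topic(user_message)
    if topic == "self_harm":
        return ("I'm really sorry you're feeling this way. "
                "Would you like to talk about what's been going on?")
    return draft_reply
```

The key design choice is that the intervention happens in real time, between the user and the chatbot, which matches Moonbounce’s described position as an intermediary enforcing rules at that boundary.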
