We are delighted to host a workshop for policymakers and academics to examine the choices, dilemmas, and policy initiatives at stake in content moderation, free speech, and platform governance. The workshop takes place at Stewart House, London, on 21-22 May. Full details of our speakers and their presentations are below. The event is led by Dr. Andreu Casas.
Wednesday May 21st
9:00 - 9:30: Coffee, Registration, and Welcome
9:30 - 11:00: Academic Session #1
● Content Moderation and the 'Three Effects'
Ralph Schroeder
Content moderation will continue to be subject to the 'three effects': Brussels, Silicon Valley (plus Washington), and Beijing. All three are a product of geopolitics and of the different principles underlying the legitimacy of their states (or of the EU and its member states). This talk will discuss the implications of the persistence of the three effects: a fragmented global landscape in which US social media companies dominate, Brussels offers 'exemplary' regulation, and Beijing's social media have limited but growing reach outside China. Publics are exposed to different political and cultural 'gray zone' and harmful content, with open-ended contestation of the boundaries of what should and should not be gatekept.
● From Bystanders to Reporters: Who Acts Against Illegal Online Content?
Friederike Quint, Yannis Theocharis, Spyros Kosmidis, Margaret E. Roberts
Harmful and illegal content on social media remains pervasive, posing significant challenges for content moderation and user safety. As platforms grapple with managing vast amounts of user-generated content, user-driven reporting is a crucial mechanism for identifying and addressing violations. Despite its importance, there is limited understanding of who actively participates in reporting such content, what motivates them, and how they differ from the general population. This applies not only to reporting harmful content on platforms but also to other hate speech reporting portals, such as those provided by civil society actors. This study aims to bridge this gap by addressing two fundamental questions: Who takes action against harmful and illegal content, and to what extent do interventions, such as nudges promoting counter-speech and civic duty, influence reporting behavior? The research comprises two complementary studies. Study 1 analyzes unique survey data from two distinct populations: a novel dataset of confirmed reporters in Germany who have actively flagged content and a representative sample of the general population in both Germany and the United States. It investigates the demographic, political, attitudinal, and behavioral differences between these groups, shedding light on the profiles of individuals who engage in reporting and their underlying motivations. Building on these insights, Study 2 is a preregistered experiment that explores the potential of nudging interventions to encourage broader participation in reporting harmful and illegal content. The survey experiment, conducted in both Germany and the United States, tests the effectiveness of nudges that emphasize different motivational aspects of reporting illegal content. Participants are randomly assigned to control or treatment conditions, with variations in messaging designed to evoke a sense of responsibility and proactive engagement. Measures of political and social attitudes are also incorporated to assess their influence on participants' likelihood of reporting content. By employing a cross-national perspective, this research not only identifies key motivators behind reporting behavior but also evaluates the effectiveness of interventions aimed at enhancing user participation in content moderation.
● Perceived Legitimacy in Borderline Content Moderation: The Impact of Moderation Source, Procedural Contestability, and Outcome Alignment
Diyi Liu
Content moderation navigates tensions between efficiency, consistency, and contextual understanding, yet theoretical foundations explaining user perceptions of moderation practices remain underdeveloped, particularly across diverse cultural contexts. Using two pre-registered survey experiments with TikTok users in Indonesia (n=304) and Pakistan (n=303), the study examines how three factors—moderation source, procedural contestability, and outcome alignment with user preferences—individually and jointly influence perceived legitimacy of content moderation for morally, emotionally, and politically charged content.
Results show that outcome alignment with users' own moderation preferences most consistently and significantly enhances perceived legitimacy across both countries. Moderation source (algorithmic systems, human moderators, governmental agencies, or civil society organizations) and procedural contestability showed limited significant effects on legitimacy perceptions, with only marginal benefits observed in Indonesia. Most notably, the study reveals a complex three-way interaction: when algorithmic decisions are contestable and aligned with user preferences, perceived legitimacy reaches its peak; however, when these same contestable algorithmic decisions contradict user preferences, legitimacy perceptions drop significantly. Additionally, individual differences in moral restrictions on free speech significantly moderated responses, with substantial variation in content evaluation across demographic groups. As content moderation systems continue to evolve, understanding these legitimation dynamics will be essential for designing approaches that can effectively govern online speech while maintaining user trust and acceptance across diverse global communities.
11:00 - 11:15: Coffee Break
11:15 - 13:00: Academic Session #2
● Divergent patterns of engagement with partisan and low-quality news across seven social media platforms
Mohsen Mosleh, Jennifer Allen, David Rand
In recent years, social media has become increasingly fragmented as platforms evolve and new alternatives emerge. Yet most research studies a single platform—typically Twitter/X, or occasionally Facebook—leaving little known about the broader social media landscape. Here we shed new light on patterns of cross-platform variation in the high-stakes context of news sharing. We examine the relationship between user engagement and news domains' political orientation and quality across seven platforms: Twitter/X, BlueSky, TruthSocial, Gab, GETTR, Mastodon, and LinkedIn. Using an exhaustive sampling strategy, we analyze all (over 10 million) posts containing links to news domains shared on these platforms during January 2024. We find that the news shared on platforms with more conservative user bases is significantly lower quality on average. Turning to patterns of engagement, we find—contrary to hypotheses of a consistent "right-wing advantage" on social media—that the relationship between political lean and engagement is strongly heterogeneous across platforms. Conservative news posts receive more engagement on platforms where most content is conservative, and vice versa for liberal news posts, consistent with an "echo platform" perspective. In contrast, the relationship between news quality and engagement is strikingly consistent: across all platforms examined, lower-quality news posts receive higher average engagement, even though higher-quality news is substantially more prevalent and garners far more total engagement across posts. This pattern holds even after accounting for poster-level variation, and is observed even in the absence of ranking algorithms, suggesting that user preferences—not algorithmic bias—may underlie the underperformance of higher-quality news.
● Assessing global hate speech moderation on Twitter
Manuel Tonneau, Dylan Thurgood, Diyi Liu, Niyati Malhotra, Ralph Schroeder, Scott A. Hale, Samuel Fraiberger, Victor Orozco-Olvera, Manoel Horta Ribeiro, Paul Röttger
Social media has been criticised for exposing users to hate speech, with potential negative offline consequences ranging from political polarisation to hate crimes. To counter this phenomenon, platforms have resorted to content moderation, which consists of deleting or downranking hateful posts as well as suspending their authors. Although many platforms have used content moderation since their inception, the level of moderation enforcement remains largely unclear. In this project, we aim to bridge this gap by assessing the enforcement of hate speech moderation on Twitter. Using a manually annotated representative dataset of all tweets posted in one day, we find that the majority of hate speech had not been removed five months after being posted, with an average removal rate of 19% across the eight languages considered. We also find that being hateful is less strongly associated with removal than other online harms such as crypto scams or adult content. We further uncover linguistic disparities in moderation enforcement: hateful tweets in European languages such as German or French are more likely to be removed than English hateful tweets, whereas Arabic tweets are less likely to be removed. Finally, we estimate the feasibility of human-in-the-loop moderation, whereby AI models flag the tweets most likely to be hateful and human moderators verify them. We find that this approach can achieve a high enforcement rate with enough resources, reaching a moderation rate of 80% or more for most languages when employing 1,000 moderators per language. Overall, our results demonstrate that moderation enforcement on Twitter is low, but that high enforcement is achievable by combining publicly available detection models with a sufficient moderation workforce. We hope that our work will pave the way for more auditing of moderation practices, increase the accountability of platforms, and ultimately better protect users from online harms.
● Opening the Black Box: Exploring Content Moderation Dynamics on Bluesky
Mia Nahrgang
Content moderation on social media has been described as a notoriously opaque black box, and little is known about how, when, and why platforms moderate. This paper makes use of the fact that the new and rising platform Bluesky provides unprecedented insight into the inner workings of a social media platform's content moderation. Bluesky vows to empower users by giving them greater choice in content moderation, allowing them to customize and define their own standards for how strict or lenient moderation should be on their feeds. This is done with the help of labels, which are applied to content through a combination of human and algorithmic decisions. In the settings, users can then decide how content labeled in a specific category should be treated and whether it should be hidden from view. The labels are therefore integral to the platform's moderation framework. Drawing on the complete historical record of Bluesky-issued labels - encompassing 10 million labels and covering nearly two years - this paper explores the dynamics of label creation and how they have evolved alongside the platform's maturation.
● Inattention and Differential Exposure: How Media Questioning of Election Fraud Misinformation Often Fails to Reach the Public
Mathieu Lavigne, Brian Fogarty, John Carey, Brendan Nyhan, and Jason Reifler
Why are high-profile misperceptions like the myth of widespread fraud in the 2020 U.S. presidential election so persistent and pervasive? Many observers blame partisan demand for congenial news and resistance to corrective information. Others blame media amplification of false claims. However, nationally representative survey and behavioral data from the U.S. around the 2020 and 2022 elections show that skewed online information diets are rare and that even supporters of Donald Trump change their views about fraud claims when randomly exposed to fact-checks. Nor is the problem a lack of accurate information: the fraud-related content that people saw online overwhelmingly questioned fraud claims. We instead conclude that a key factor is inattention (i.e., a lack of exposure): most people encountered relatively little fraud-related content in their web browsing. Among those who did see such content, most supporters of both Joe Biden and Trump saw more content that questioned fraud claims. However, Trump supporters were differentially likely to be exposed to articles that did not question claims of widespread fraud, including after encountering more skeptical coverage, a pattern that has been shown to undermine correction effects. The persistence of fraud beliefs thus appears to be attributable to the combination of low levels of attention and differential exposure to congenial content that undermines the effects of more accurate information.
13:00 - 14:00: Lunch (on-site)
14:00 - 16:30: Stakeholder Session: from Research to Policy, from Policy to Research, moderated by Ben O’Loughlin.
● Academic Lightning Talks
○ Spyros Kosmidis: Public Attitudes on Content Moderation and Freedom of Expression: Evidence from 11 Countries
○ Mohsen Mosleh: Differences in misinformation sharing can lead to politically asymmetric sanctions
○ Manuel Tonneau: Moderating hate speech at scale on social media
○ Diyi Liu: Content Moderation, Platform Governance, and Legitimacy
○ Jason Reifler: How the Public Can Both Spoil and Improve Social Media as a Source of Information
○ Anastasia Kozyreva: Public Attitudes to Content Moderation of Harmful Misinformation in the EU, the UK, and the US
○ Andreu Casas: Auditing the moderation and curation of political content on social media platforms: Promises and Pitfalls
● Confirmed Stakeholder Participants
○ Ofcom
○ Electoral Commission
○ Information Commissioner’s Office
○ Royal Society
○ Smart Data Research UK
○ UNHCR
16:30 - 18:00: Drinks (on-site)
19:00: Dinner
TAS Bloomsbury, 22 Bloomsbury St, London WC1B 3QJ
Thursday May 22nd
9:00 - 10:45: Academic Session #3
● Public Attitudes on Content Moderation and Freedom of Expression: Evidence from 11 Countries
Spyros Kosmidis
This presentation explores public attitudes toward content moderation, examining the balance between safeguarding freedom of speech and preventing harm, as well as the prevalence of toxic behavior in online spaces. Using representative survey data from 11 countries, the presentation offers a detailed account of how citizens perceive the responsibilities of social media platforms, governments, and independent organizations in moderating content. The findings shed light on how people navigate competing priorities for free expression and safety, offering valuable insights for policymakers, platform designers, and stakeholders invested in shaping digital communication norms.
● Platform Power and Democracy: Skepticism of Digital Gatekeepers in 10 Democracies
Jan Zilinsky, Yannis Theocharis, Friederike Quint, Spyros Kosmidis
Using a survey of over 13,000 respondents, we find that citizens across different democratic contexts share significant concerns about platform power while showing notable variation in whom they trust to govern online spaces. Our findings reveal that while social media platforms are widely seen as best positioned to combat harmful speech (43% across all countries), there is substantial skepticism about their concentrated power. This creates a paradox: citizens appear to believe platforms should be responsible for content moderation while simultaneously distrusting their dominance. To analyze these dynamics, we test whether trust, political efficacy, anti-elitism, and concerns about the effects of social media apps on social outcomes (polarization, misinformation, mental health) predict beliefs that 1) large online platforms are ungovernable and 2) the power wielded by large platforms is a threat to democracy. Our results suggest that respondents who perceive social media as harmful to society also express hopelessness about the possibility that national governments could rein in digital platforms.
● Public Attitudes to Content Moderation of Harmful Misinformation in the EU, the UK, and the US
Anastasia Kozyreva, Jason Reifler, Stefan M. Herzog, Stephan Lewandowsky, Ralph Hertwig
Social media companies act as regulators of user-generated content and have implemented various measures to mitigate risks from harmful misinformation. These measures range from system-level content moderation—such as restricting content proliferation or penalizing accounts (e.g., removal, suspension, demonetization, algorithmic down-ranking)—to softer individual-level approaches like user empowerment and content labeling. In recent times, debates over the limits of free speech on social media platforms have intensified, with several global platforms scaling back restrictions on harmful content in the name of freedom of expression. This shift reflects a clear preference for one side of the dilemma in moderating online speech: prioritizing free speech protectionism over harm prevention. However, this stance contrasts with the public's preferences. Our previous study explored how the US public navigates these moral dilemmas in content moderation. In this talk, I will present results from our most recent study, conducted online in five EU countries, the UK, and the US. This study examines public attitudes toward different content moderation interventions and the dilemmas involved in addressing harmful misinformation.
● Tolerance and Social Media Regulation: Evidence from Germany
Richard Traunmüller
This paper presents the results of a survey experiment on political tolerance and preferences for social media regulation in Germany.
Inspired by Stouffer's classic approach to measuring political tolerance, we "modernize" both the controversial viewpoints (racism, gender identity, immigration, climate change) and the activities (allow, delete post, ban user) so that they relate directly to current debates on the regulation of free expression on social media platforms. In addition to the effects of the perceived truth, offensiveness, and danger of the controversial viewpoints, the paper also explores the political correlates of content moderation preferences.
10:45 - 11:00: Coffee break
11:00 - 12:45: Academic Session #4
● Moderation of Political Content on YouTube during the 2024 US Election
Andreu Casas
Today, social media platforms play a crucial role in the moderation of political information and speech. Yet, despite growing concerns, we still know very little about how often, or under what conditions, platforms moderate political content. For more than a year leading up to the 2024 US election (July 2023 - November 2024), I monitor the moderation of content from more than 11,000 salient YouTube channels posting about US politics (more than 6 million videos in total). I use state-of-the-art computational methods, such as Large Visual Language Models, to address the following research questions: (1) How often does YouTube remove channels and content of political relevance? (2) Does YouTube remove political (vs. non-political) content at different rates? (3) For what reasons are channels and videos of political relevance removed? (4) Are removals equally frequent across topics? And (5) are conservative (vs. liberal) channels and videos removed at higher rates? The analysis reveals key insights regarding the role of social media platforms in the moderation of political content during a relevant electoral period.
● Cross-National Variation in Content Moderation Preferences: Exposure to Toxic Language Toward Different Targets Across 10 Countries
Spyros Kosmidis, Yannis Theocharis, Jan Zilinsky, Friederike Quint
This study explores cross-national variation in content moderation preferences by examining how citizens across 10 countries respond to exposure to extreme toxic language directed at different targets. While existing research has extensively explored how the severity of language predicts moderation preferences, demands for moderation seem to vary depending on the study, the type of language, and the context. By experimentally exposing participants to toxic language targeting specific groups, we assess whether content moderation preferences differ depending on the target group, how these preferences vary across countries, and whether there is consensus across societal groups. Our findings contribute to the ongoing debate about the role of citizens and social media platforms in managing harmful content and highlight the complex relationship between context and users' expectations for platform moderation.
● How Meta’s Content Moderation Reform Affected the Moderation of Online Hate Speech: Evidence from a New Dataset
Dario Landwehr, Mia Nahrgang, Sebastian Nagel, Nils B. Weidmann, Margaret E. Roberts
On January 7th, 2025, following the 2024 US presidential election, Meta's CEO Mark Zuckerberg announced a major reform of the content moderation systems on his social media platforms (e.g., Facebook, Instagram). The announcement contained two main parts: First, except for illegal content, Meta's content moderation would switch from automatically filtering violations of its community guidelines to reacting only when users flag content. Second, Zuckerberg announced reduced content restrictions concerning certain protected categories such as gender. In this research note, we evaluate the impact this announcement had on large-scale content moderation, using a unique dataset that records the universe of content moderation decisions in the EU by all major social media platforms. To the best of our knowledge, we are the first to provide evidence of a moderate negative effect of Zuckerberg's announcement on the absolute volume of moderation actions relating to harmful speech on Meta platforms vis-à-vis other platforms and content categories.
● Global Preferences for Online Hate Speech Moderation
Simon Munzert, Julia Ellingwood, Lisa Oswald, Richard Traunmüller
To openly express one's views is a fundamental right in any liberal democracy. However, in the age of social media, the questions of what one is allowed to say and how public discourse should be regulated are ever more contested. We present a visual conjoint study combined with a framing and exposure experiment to analyze perceptions of online hate speech and preferences for its regulation. The survey was fielded in 11 countries (Brazil, Colombia, Germany, India, Indonesia, Nigeria, Philippines, Poland, Turkey, United Kingdom, United States), totaling more than 18,000 respondents. Our setup exposed participants to synthetic social media vignettes mimicking actual cases of hate speech. Respondents were asked to evaluate the posts on multiple criteria, allowing us to identify message-, context-, and citizen-level determinants of perceptions of, and preferences for action against, hate speech. We find people's evaluations to be primarily shaped by the type and severity of the messages, and less by contextual factors. At the same time, people's attitudes on contested topics heavily bias their perceptions of what is and is not legitimate to say, and of how online speech should be moderated. Further experimental evidence highlights the downstream effects of hate speech exposure on preferences for the regulation of online speech.
12:45 - 13:30: Lunch (on-site)
13:30 - 15:00: Breakout Sessions
● Breakout room A: Measuring (the determinants and effects of) moderation
● Breakout room B: Theorizing and studying moderation in the West and Beyond
15:00 - 15:30: Closing