Keller’s Humility about Transparency: Driving the conversation forward

Lumen Database Team
6 min read · May 5, 2021

In her recent piece, ‘Some Humility about Transparency,’ Daphne Keller, Director of the Program on Platform Regulation at Stanford’s Cyber Policy Center, asked some important questions and put forward a set of recommendations about what we mean when we say we seek ‘transparency’ from online intermediaries. A thank you to Daphne for starting this conversation. Here at the Lumen project, we agree that it is an important one and that this is a critical time to have it, as more and more regulatory and legislative attention is being paid to online ‘transparency’.

The Lumen team feels a sense of responsibility to help drive this conversation forward. For the uninitiated, Lumen, formerly known as Chilling Effects, is an independent research project that studies, and facilitates the study of, requests to remove online material. Founded in 2002 by Wendy Seltzer, Lumen maintains a database containing millions of DMCA takedown notices and other removal requests and orders that have been shared with us voluntarily by their original recipients and senders, which include copyright holders, Online Service Providers (OSPs), search engines and other online intermediaries, as well as members of the general public. The database is available to researchers, scholars, journalists, and others to facilitate research about different kinds of complaints and requests for removal, both legitimate and questionable, and to provide as much transparency as possible about such notices in terms of who sends them, why, and to what effect.

In this context, the ‘transparency’ that Lumen is able to help provide is contingent on the information that has been shared with the database. Here is a sample notice from April 23, 2021, available in the database (also visible in the figure below). The notice indicates that the Indian government is asking Twitter to take down or block critical tweets by Indian journalists. As of January 2021, Lumen’s database contains over fifteen million notices referencing just under four billion URLs.

A sample notice from the Lumen Database

This scale of data (which grows by approximately two hundred thousand notices every month) comes with challenges of its own. First, Lumen has a LOT of ‘raw’ information. We want to be able to serve all types of researchers and scholars, and so we house as much data as we can possibly receive from submitters (which include Google, Twitter, Wikipedia, and WordPress, among many others). But the range of submitters participating, and the range of notice data they have chosen to share, means the data itself is inconsistent in nature. So although it is possible to carry out a variety of advanced searches in the database, and Lumen continues to work on ways to improve them, every success in adding more data makes finding the ‘relevant’ data (which is different for different scholars) more of an uphill task.
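For researchers who want to work with the notices programmatically, Lumen offers an API alongside the web interface. As a minimal sketch (the endpoint path, parameter names, and authentication header below reflect our reading of Lumen’s public API documentation; the token placeholder is hypothetical, and the current documentation at lumendatabase.org should be treated as authoritative), a search query might look like this:

```python
# Sketch: pulling one page of notices that mention a search term from the
# Lumen API. Endpoint, parameters, and auth header follow Lumen's public
# API documentation as we understand it; check the current docs before
# relying on them.
import requests

LUMEN_SEARCH = "https://lumendatabase.org/notices/search"
TOKEN = "YOUR_RESEARCHER_TOKEN"  # placeholder; tokens are issued by the Lumen team

def fetch_notices(term, page=1, per_page=25):
    """Return one page of notice metadata matching `term`."""
    resp = requests.get(
        LUMEN_SEARCH,
        params={"term": term, "page": page, "per_page": per_page},
        headers={
            "X-Authentication-Token": TOKEN,
            "Accept": "application/json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("notices", [])

for notice in fetch_notices("defamation"):
    print(notice["id"], notice.get("title"), notice.get("date_received"))
```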

We find that the questions raised by Keller about the standard of precision, and about the need (or lack thereof) for homogeneity in the implementation of transparency laws, are important ones, even in the narrower context of the notices in the Lumen database.

As Mark MacCarthy, a Senior Fellow at the Center for Technology Innovation, notes, disclosures are costly and time-consuming, and so they demand a ‘careful and measured articulation of the rigorously defined interests they are serving.’ He puts forth five types of disclosure requirements. The one most relevant to content moderation is ‘Reports on the operation of content moderation programs’. Through this requirement, MacCarthy seeks aggregate statistics about the types and volume of content that have been taken down, as well as insight into the decision-making process through disclosures on whether moderation decisions are biased.

However, two questions arise here. First, as Keller pointed out, the statistics in transparency reports can sometimes be misleading because of the variance between the number of notices and the number of URLs per notice. Trends in the Lumen database indicate that over the past decade, the number of URLs per notice has gone up by at least an order of magnitude, from typically single- or double-digit quantities of URLs to typically three-digit quantities, with occasional notices listing as many as 20,000 URLs. With this in mind, maybe there is reason to determine not only which statistics are important but also which are practical or reasonable to ask intermediaries to disclose. To foster innovation and maintain competition, the ‘asks’ of transparency cannot be so big that small companies either break under the pressure of unending disclosures or have to sell themselves to bigger companies.
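To make the notice-versus-URL gap concrete, here is a small illustration with invented numbers (none of these figures come from the Lumen database or from any transparency report): a handful of bulk notices can make ‘notices received’ and ‘URLs affected’ tell very different stories.

```python
# Illustrative only: invented per-notice URL counts showing why "notices
# received" and "URLs referenced" can diverge sharply once bulk notices appear.
from statistics import median

# Hypothetical month: most notices list a handful of URLs,
# while a couple of bulk notices list thousands.
urls_per_notice = [3, 5, 8, 12, 2, 7, 4, 20000, 15000, 6]

print("Notices received:", len(urls_per_notice))             # 10
print("URLs referenced:", sum(urls_per_notice))               # 35047
print("Median URLs per notice:", median(urls_per_notice))     # 6.5
print("Mean URLs per notice:", sum(urls_per_notice) / len(urls_per_notice))  # 3504.7
```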

Second, asking for insight into the ‘decision-making process’ raises the question of the use(s) to which this type of information could be put and the possible steps that could be taken once these disclosures are made. In addition, it is worth keeping in mind that externally triggered removals like takedown notices are just one type of moderation. The internal decisions are another, likely much larger set of decisions.

Lawyers and policymakers alone are not equipped to address these important and complex questions. Technologists, academics, human rights activists, and even consumers themselves should have a say, in some manner, about the aspects of platform governance for which they seek the most transparency.

Whatever disclosures the laws don’t require may never happen. Scholars of transparency broadly agree that it is still unclear what kind of information we want or need to enable better content moderation laws. However, because the laws written in this moment will most likely not be reworked for many years to come, it is worth being very careful in picking information that, on the one hand, does not overburden intermediaries with disclosure requirements but, on the other hand, provides enough transparency that the conversations around platform governance can continually move forward.

Acknowledging that not all intermediaries have the infrastructure and capacity to store data that may be, as Keller notes, ‘useful later’ (and also acknowledging that storing data because it may be ‘useful later’ can conflict with privacy and data protection principles), does the solution lie in creating tiers of intermediaries based on revenue, which might be a rough proxy for their capacity to handle disclosure obligations? As an intermediary moves up or down a revenue tier, its obligations would grow or shrink accordingly.

To Keller’s point about the transparency “budget”: maybe a basis for transparency requirements could resemble the basis for filing tax returns. Just as income tax starts at a base rate and rises progressively with the amount of income earned, a helpful model may be one where a base set of transparency requirements is compulsory for everyone, with added obligations for platforms that have larger transparency “budgets” (literally, in this case). In saying this, we acknowledge that the idea may raise more questions than it answers. The first that comes to mind: what about the platforms that generate more revenue than some countries of the world? Should they face no upper limit on how much data they can gather to enable ‘transparency’ and the necessary ‘disclosures’ simply because their “budget” allows for it? A balancing act is necessary here so that access to information in the name of transparency does not harm other, equally important rights. Again, this might open more gates of questioning than it closes, but it is one way to secure access to information that can inform future policies.
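As a purely illustrative sketch of the bracket analogy (the revenue thresholds and obligation tiers below are invented for the example; they are not a proposal from Keller’s piece or from Lumen), the logic might look like this:

```python
# Illustrative only: invented revenue thresholds mapping an intermediary's
# annual revenue to a tier of disclosure obligations, in the spirit of
# progressive tax brackets. None of these numbers come from any law.
TIERS = [
    (0,             ["basic aggregate takedown counts"]),
    (10_000_000,    ["aggregate counts", "per-category breakdowns"]),
    (1_000_000_000, ["aggregate counts", "per-category breakdowns",
                     "notice-level data sharing with researchers"]),
]

def obligations_for(annual_revenue):
    """Return the obligations of the highest tier the revenue reaches."""
    current = TIERS[0][1]
    for threshold, duties in TIERS:
        if annual_revenue >= threshold:
            current = duties
    return current

print(obligations_for(5_000_000))       # base tier
print(obligations_for(250_000_000))     # middle tier
print(obligations_for(30_000_000_000))  # top tier
```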

The Lumen team is thrilled that Keller’s piece has initiated dialogue around the meaning and cost of transparency among research communities, because dialogue (even if only virtual at the moment) is the only way to find a solution that is both functional and effective. Finally, the Lumen Database is grateful to be a stakeholder in this conversation, and we seek and actively encourage collaborations and ideas about how we can best make use of the data powerhouse we host to help move the conversation along.

About the Author: Shreya is an Employee Fellow at the Berkman Klein Center, where she works on the Lumen Project. She is a passionate digital rights activist and uses her research and writing to raise awareness about how digital rights are human rights. She tweets at @shreyatewari96!
