New RSL Protocol Aims to Solve AI Data Licensing Challenges

Introducing Real Simple Licensing: A New Era for AI Data Licensing
The artificial intelligence industry is facing a pivotal moment as copyright lawsuits over unlicensed data surge. Major settlements, such as Anthropic's recent $1.5 billion agreement, highlight the risks of using unlicensed content and the need for a scalable, transparent data licensing solution.
What Is Real Simple Licensing (RSL)?
Real Simple Licensing (RSL) is a new protocol designed to address AI’s training data dilemma. Co-founded by Eckart Walther, one of the original creators of RSS, RSL provides a technical and legal framework for large-scale data licensing across the internet. RSL is already backed by influential web publishers such as Reddit, Quora, Yahoo, Medium, and others.
How Does RSL Work?
- Technical Layer: Through the RSL Protocol, publishers can specify licensing terms in a machine-readable format, typically in their robots.txt files. This enables AI companies to easily identify the conditions for using web content for training models.
- Legal Layer: The RSL Collective acts as a central licensing organization, similar to ASCAP for music or MPLC for films. It negotiates terms and collects royalties on behalf of participating publishers, streamlining the process for both licensors and AI developers.
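To make the technical layer concrete, here is a minimal Python sketch of how a crawler might look for licensing terms in a robots.txt file. It assumes, purely for illustration, that the publisher advertises its terms through a "License:" line pointing at a separate terms document. The directive name, the example.com URL, and the function name find_license_urls are assumptions made for this sketch and are not taken from the article or confirmed against the RSL specification.

```python
def find_license_urls(robots_txt: str, directive: str = "license") -> list[str]:
    """Return the value of every `<directive>:` line in a robots.txt body.

    The default directive name is an assumption made for illustration.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after '#', which starts a comment in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.strip().lower() == directive and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for a publisher.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

# Hypothetical pointer to machine-readable licensing terms
License: https://example.com/license.xml
"""

if __name__ == "__main__":
    print(find_license_urls(SAMPLE_ROBOTS_TXT))
```

A crawler built along these lines would fetch the referenced document and read the actual terms before using the content. Note that this simple parser treats "#" as the start of a comment, so a URL containing a fragment would be truncated.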
Who Is Supporting RSL?
Major publishers including Yahoo, Reddit, Ziff Davis (Mashable, CNET), Internet Brands (WebMD), People Inc., and The Daily Beast have joined the RSL Collective. Other organizations such as Fastly, Quora, and Adweek endorse the standard, even if they haven’t formally joined the royalty pool.
Some publishers, like Reddit, are already benefiting from direct licensing deals, with Reddit reportedly earning around $60 million annually from Google. RSL allows for both collective and individual agreements, giving flexibility to publishers of all sizes.
Challenges and Opportunities
Licensing music or films is straightforward compared to tracking AI’s use of web data. Determining when and how a specific piece of content is used in AI training can be complex. The RSL Protocol offers options for both per-inference and blanket licensing fees, but effective tracking will require cooperation from AI companies.
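The difference between the two fee models can be illustrated with a small Python sketch. All figures below are hypothetical and chosen only to show the arithmetic. The article does not state any actual RSL rates, and the names cheaper_model, FEE_PER_INFERENCE, and BLANKET_FEE are invented for this example.

```python
from fractions import Fraction


def per_inference_total(inferences: int, fee_per_inference: Fraction) -> Fraction:
    """Total owed under a metered model: one fee for each reported inference."""
    return inferences * fee_per_inference


def cheaper_model(inferences: int, fee_per_inference: Fraction, blanket_fee: Fraction) -> str:
    """Say which model costs the licensee less for a given reported volume."""
    metered = per_inference_total(inferences, fee_per_inference)
    if metered < blanket_fee:
        return "per-inference"
    if metered > blanket_fee:
        return "blanket"
    return "equal"


# Hypothetical figures, in dollars. Fraction keeps the arithmetic exact.
FEE_PER_INFERENCE = Fraction("0.002")
BLANKET_FEE = Fraction(10_000)

# Volume at which both models cost the same.
BREAK_EVEN = BLANKET_FEE / FEE_PER_INFERENCE

if __name__ == "__main__":
    print("Break-even inferences:", BREAK_EVEN)
    print(cheaper_model(1_000_000, FEE_PER_INFERENCE, BLANKET_FEE))
    print(cheaper_model(20_000_000, FEE_PER_INFERENCE, BLANKET_FEE))
```

The metered calculation depends entirely on the inference count, which only the AI company can report. That is why per-inference licensing hinges on the cooperation described above, while a blanket fee needs no usage tracking at all.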
Despite these technical hurdles, RSL’s founders are optimistic. As Doug Leeds, RSL co-founder and former CEO of IAC Publishing, notes, some AI companies already provide detailed usage reports as part of their licensing agreements. The goal is not perfection, but a practical system that gets publishers compensated.
Will AI Labs Join In?
The biggest unknown is whether leading AI companies will embrace RSL. While some labs are willing to pay for high-quality datasets, the web has long been a free data source. Recent controversies, such as disputes over web scraping, underscore the need for clear, enforceable standards.
Industry leaders, including Google’s Sundar Pichai, have publicly acknowledged the need for transparent licensing protocols. The RSL team is determined to hold them to their word and move the industry toward a fairer, more sustainable model.
Looking Forward
As AI continues to advance, establishing a scalable, transparent system for data licensing is crucial for both innovation and creator rights. RSL could be the breakthrough that helps turn industry calls for fair compensation into a practical reality.
References
- TechCrunch: RSS co-creator launches new protocol for AI data licensing
- The Atlantic: Pending cases on AI data licensing
- AP News: Midjourney AI copyright lawsuit
- Lawfare: Anthropic’s copyright settlement analysis
- Wired: Dataset Providers Alliance
- Reuters: Reddit’s AI licensing deal with Google
- Cloudflare Blog: Web scraping controversies
- Dealbook Summit: Sundar Pichai’s remarks on AI data licensing