When Creative Commons (CC) introduced its suite of open content licenses two decades ago, it changed how creators shared their work online. CC licenses offered a clear, permissive framework: share freely, remix legally, and credit responsibly. That model helped build the digital commons we rely on today. But the explosion of AI (specifically, large models trained on massive swaths of public content) has shaken things up.
This week, CC announced CC Signals, a framework designed to help content stewards and data holders express how they want their works used in AI training. The aim is not to restrict AI, but to redefine the norms governing it, building a new social contract for machines that mirrors the reciprocity and openness that CC licenses originally fostered.
A Framework for AI-Era Permissions
CC Signals is designed to let creators and data holders express preferences for how their work should, or should not, be used by AI systems. These “signals” can be embedded into content in human- and machine-readable ways, providing indicators of acceptable use. Unlike CC licenses, which are grounded in enforceable intellectual property rights, signals may carry normative or legal weight depending on jurisdiction and context. They are meant to be interoperable, flexible, and voluntary, prioritizing ethics and collective practice over enforcement.
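To make the idea of a machine-readable signal concrete, here is a minimal sketch of how a crawler might check for one, assuming, purely for illustration, that a site publishes its preference in an HTTP response header. The header name `CC-Signal` and its values are hypothetical; CC Signals is still in alpha and has not settled on a final format.

```python
# Hypothetical sketch only: CC Signals has not published a final wire format.
# For illustration, assume a site exposes its preference in an HTTP response
# header named "CC-Signal" -- the header name and any values are invented here.
import urllib.request

def read_ai_use_signal(url: str) -> str | None:
    """Fetch a page and return the value of a hypothetical CC-Signal header."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        # Header lookup is case-insensitive; returns None if no signal is declared.
        return resp.headers.get("CC-Signal")

signal = read_ai_use_signal("https://example.org/")
if signal is None:
    print("No signal declared; default norms apply.")
else:
    print(f"Declared preference: {signal}")
```

The point of the sketch is not the syntax but the workflow: the signal lives alongside the content, and the burden of reading and honoring it falls on the party doing the crawling.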
Anna Tumadóttir, CEO of Creative Commons, summarized the ethos behind the effort as “We give, we take, we give again.” The goal isn’t to cordon off knowledge, but to sustain openness by fostering responsible reuse. CC’s designers want to incentivize good behavior rather than punish bad actors.
Tensions with Enforcement, Feasibility, and AI Scale
Still in alpha, CC Signals is an aspirational solution in a messy, rapidly evolving space. Its effectiveness depends on one key variable: collective adoption. Without widespread recognition from platforms, developers, and aggregators, these signals may carry little practical weight. “This only works if enough people agree to play by the same rules,” Tumadóttir admitted.
Even if adopted broadly, the framework faces another challenge: how to detect and enforce preferences in an era of indirect reuse. Once content is absorbed into an AI training set, there’s often no residual fingerprint. A model can generate a sentence that reflects a training source without quoting or even closely mimicking it. This makes enforcement nearly impossible for individual creators and institutions, especially when major AI developers decline to disclose their training data or model internals.
Plus, while CC Signals offers a new vocabulary of intent, it’s unclear whether major AI companies, which have so far operated with minimal transparency, will respect or implement these signals meaningfully. As of now, some developers have even circumvented traditional mechanisms like `robots.txt` or site-specific terms of use.
The real risk, CC argues, is enclosure. In response to unrestricted scraping, many publishers and platforms are withdrawing from the open web: putting up paywalls, erecting CAPTCHA barriers, or blocking crawlers altogether. According to Cloudflare, over 84% of its hosted sites now restrict AI access. This pullback threatens the commons CC helped build.
Who Stands to Benefit, and Who Should Be Cautious
Academic repositories, digital libraries, media archives, and nonprofits are among the stakeholders most likely to benefit from CC Signals. These institutions are often mission-aligned with the idea of open access but face increasing pressure to protect their content from exploitation. The framework offers them a middle ground: stay open, but on their terms.
Individual creators, on the other hand, may face more limitations. Expressing a CC Signal requires technical knowledge or integration by platforms. For now, there’s no widespread toolset for automatically attaching signals to blog posts, images, or audio content at scale. Moreover, signals alone offer no guarantee of compliance, and without visibility into model training pipelines, creators may never know if their preferences are being honored.
Even for large-scale repositories, the incentives for compliance by AI companies remain limited. Signals rely on social norms, not legal mandates. And because AI outputs are often abstracted from their inputs, end users may consume generated results without ever encountering the original work, undermining both recognition and revenue.
Still, CC sees this as a necessary step. In the absence of enforceable global norms or binding legislation, establishing common expectations and creating pressure through visibility may be the most pragmatic option available.
Importantly, CC Signals is not an extension of copyright. In fact, Creative Commons explicitly rejects the idea of expanding copyright law to control AI training. As the organization sees it, that path would do more harm than good: limiting access to facts and educational content, and further consolidating control in the hands of those who can afford complex legal negotiations or enforcement.
Instead, CC Signals is about creating shared norms. It draws inspiration from the informal `robots.txt` protocol that once governed how crawlers behaved, with the hope of setting new standards for how machines and their makers treat public knowledge today.
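For comparison, the `robots.txt` convention works roughly like this: before fetching a page, a well-behaved crawler reads the site’s `robots.txt` file and honors its allow/deny rules. The short sketch below uses Python’s standard-library `urllib.robotparser`; the user-agent string is invented for the example.

```python
# Minimal illustration of the robots.txt convention cited as a precedent:
# a polite crawler consults the file before fetching a page.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.org/robots.txt")
parser.read()  # fetch and parse the site's crawl rules

# "ExampleAIBot/1.0" is a made-up user-agent string for this sketch.
if parser.can_fetch("ExampleAIBot/1.0", "https://example.org/article"):
    print("robots.txt permits fetching this URL for this agent.")
else:
    print("robots.txt asks this agent not to fetch this URL.")
```

Like `robots.txt`, a CC Signal would be easy to publish and easy to ignore; its force comes from convention, not code.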
Creative Commons is currently collecting public feedback and expects to launch a prototype in November 2025. The effort is part of a larger push by public interest organizations to find scalable, interoperable mechanisms for regulating machine use of open data.
Tumadóttir noted, “This is not about creating new property rights, it’s more like defining manners for machines.”