AI's free web scraping days may be over, thanks to this new licensing protocol.

  • 2025-10-03 08:00:00
  • ZDNet

Training the ever-growing number of artificial intelligence models requires huge amounts of data and content to feed these virtual monsters. To obtain enough “feed”, the companies that train them scrape the internet, combing platforms, websites and similar sources for useful data, information and material.

How can this practice be kept in check? Several major publishers and technology companies, including Reddit, Yahoo and Medium, have recently developed a solution that could prove significant: the Really Simple Licensing (RSL) standard, a sort of younger and more robust sibling of Really Simple Syndication (RSS).

While RSS deals with syndication, and therefore with distributing words, stories and videos across the web, RSL acts as a sort of gatekeeper: a ticket office through which one must pass to gain access. In essence, RSL adds machine-readable licence terms that a crawler can check.
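
To make the "ticket office" idea concrete, here is a minimal Python sketch of how a crawler might consult machine-readable licence terms before fetching content. It does not reproduce the actual RSL format: the `LicenceTerms` class, its fields, the `may_fetch` function and the purpose labels `"search"` and `"ai-training"` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceTerms:
    """Hypothetical licence terms a publisher might expose (not the real RSL schema)."""
    free_uses: frozenset       # purposes allowed without any licence
    licensed_uses: frozenset   # purposes allowed only once a licence is held


def may_fetch(terms: LicenceTerms, purpose: str, has_licence: bool = False) -> bool:
    """Return True if a crawler with the given purpose may access the content."""
    if purpose in terms.free_uses:
        return True
    if purpose in terms.licensed_uses:
        return has_licence
    # Any purpose the terms do not mention is refused by default.
    return False


# Illustrative terms: search indexing is free, AI training needs a licence.
terms = LicenceTerms(
    free_uses=frozenset({"search"}),
    licensed_uses=frozenset({"ai-training"}),
)

print(may_fetch(terms, "search"))
print(may_fetch(terms, "ai-training"))
print(may_fetch(terms, "ai-training", has_licence=True))
```

The sketch refuses any purpose the terms do not mention, which mirrors the gatekeeper analogy; a real deployment would follow whatever rules the published standard defines.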