LEAKED: A New List Reveals the Top Websites Meta Is Scraping for Copyrighted Content to Train Its AI

Nowadays, the use of copyrighted content to train artificial intelligence services is, unfortunately, not particularly surprising. Following the rapid and frightening rise of this technology, every company interested in the sector seems willing to do anything to secure a place in the AI arena, whether with chatbots, LLMs, or audiovisual content generators.

Meta, as a social media giant, is no exception. According to Drop Site News, a non-profit investigative news agency, the company has collected vast amounts of data from a wide range of sites, scraping news outlets, educational platforms, personal blogs, and, sadly, revenge porn sites. That last category highlights a shameful aspect of the whole affair: Meta is training its AI using illegal content, obtained against the will of the people depicted in it.

Drop Site News recently obtained a list of sites scraped by Meta, including news outlets and platforms known for protecting themselves against precisely this kind of violation. This suggests that Meta's bots have been programmed specifically to bypass anti-scraping protections.