Child Sex Abuse Material Was Found In a Major AI Dataset. Researchers Aren’t Surprised.

  • 2024-01-09 08:00:00
  • Motherboard - Vice

The well-known artificial intelligence models currently in circulation are trained on vast online datasets composed of content scraped from across the Internet, with little to no filtering. One of the most widely used is LAION-5B, a collection of roughly five billion image-text pairs created by the non-profit organization of the same name and used to train models such as Stable Diffusion.

A report by researchers at Stanford University has now confirmed the presence of child sexual abuse material (CSAM) within the dataset: more than 1,000 such images form part of a data pool drawn on daily by thousands of users.

This news should come as no surprise. AI ethics researchers have long warned that the enormous scale of these datasets makes it practically impossible to filter their contents or to audit the models trained on them.