April 19, 2024
AI Dataset Withdrawn Over Child Abuse Discovery

LAION-5B, a popular artificial intelligence dataset used to train text-to-image generators such as Stable Diffusion and Imagen, has been withdrawn by its creator after it was found to contain numerous instances of suspected child sexual abuse material (CSAM). LAION, or Large-scale Artificial Intelligence Open Network, is a German nonprofit organization known for developing open-source AI models and datasets.

According to a report released on December 20, 2023, by researchers from the Stanford Internet Observatory, part of Stanford’s Cyber Policy Center, 3,226 instances of suspected CSAM were identified in the LAION-5B dataset. David Thiel, Big Data Architect and Chief Technologist at the Stanford Cyber Policy Center, said that a significant portion of these instances were confirmed as CSAM by third parties.

Thiel said that the presence of CSAM in the dataset may not drastically change the output of models trained on it, but that it likely still has an effect. He clarified, “While the amount of CSAM present does not necessarily indicate that the presence of CSAM drastically influences the output of the model above and beyond the model’s ability to combine the concepts of sexual activity and children, it likely does still exert influence.”

He also expressed concern about repeated identical instances of CSAM in the dataset, which he called particularly problematic because they reinforce images of specific victims.

The LAION-5B dataset, introduced in March 2022, comprises 5.85 billion image-text pairs, according to LAION. In response to the findings, LAION has taken down both the LAION-5B and LAION-400M datasets.

In a statement, LAION said the removal was made “out of an abundance of caution” and that it would make sure the datasets are safe before republishing them.

Image By WangXiNa
