Ana-Maria Cretu
@ana-mariacretu.bsky.social
Post-doc at EPFL studying privacy and safety harms in data-driven systems. PhD in data privacy from Imperial College London. https://ana-mariacretu.github.io/
Many thanks to all collaborators: Klim Kireev, @amro-abdalla.bsky.social, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, @eredmil1.bsky.social, and @carmelatroncoso.bsky.social.
December 16, 2025 at 10:29 AM
This paper is the result of a collaboration between researchers at @icepfl.bsky.social, MPI-SP, armasuisse and @georgetowncs.bsky.social.
December 16, 2025 at 10:29 AM
Among the conceptual problems is gaining a better understanding of what successful AI CSAM generation means, so that we can develop evaluation methods that capture real perpetrators’ goals without artificially constraining models. More in the paper! www.arxiv.org/abs/2512.05707
Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models
We evaluate the effectiveness of child filtering to prevent the misuse of text-to-image (T2I) models to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM g...
www.arxiv.org
December 16, 2025 at 10:29 AM
Among the technical problems are improving the detection of children in images in the wild, where children may be in the background, playing, or facing away from the camera, and understanding what kinds of images of children enable AI CSAM generation capabilities.
December 16, 2025 at 10:29 AM
And if the technology improves? Will filtering be a solution to the AI CSAM generation problem? In the paper, we describe the challenges that need to be addressed for this to happen; they involve hard technical and conceptual problems.
December 16, 2025 at 10:29 AM
After filtering, it becomes harder to generate images of these concepts (e.g., prompting for playgrounds yields empty grounds), or their representation shifts (prompting for a mother yields older women). A filtered model cannot be called general-purpose without assessing such unintended consequences.
December 16, 2025 at 10:29 AM
Removing images of children can also have unintended consequences for the model’s ability to generate concepts that appear in images typically containing children (women, mothers, and playgrounds).
December 16, 2025 at 10:29 AM
Thus, automated child filtering provides limited protection against CSAM generation for closed-weight models and no protection for open-weight models, since perpetrators can access the weights.
December 16, 2025 at 10:29 AM
Sprigatito was released after Stable Diffusion was trained, so it is as if the concept had been completely removed from the model. Fine-tuning on just 200 images yields a model able to generate images of Sprigatito wearing glasses.
December 16, 2025 at 10:29 AM
But can images of the undesired concept still be generated simply because too many images of children remain after filtering? To test this, we also simulate perfect filtering by fine-tuning Stable Diffusion on 200 images of the Pokémon Sprigatito.
December 16, 2025 at 10:29 AM
However, fine-tuning on images of children negates any defense provided by filtering. With only 3 queries on average, images of children can be generated again, including younger children.
December 16, 2025 at 10:29 AM
Filtering does make it more difficult to generate such images using naive prompting, and the children generated are older. But the difficulty remains low: at most a dozen queries are required to succeed.
December 16, 2025 at 10:29 AM
See below examples of images of our undesired concept (children wearing glasses), generated, in column order, by (1) naively prompting models without filtering, (2) naively or (3) automatically prompting the models after filtering, and (4) naively prompting the fine-tuned filtered model.
December 16, 2025 at 10:29 AM
We implement four adversarial strategies to elicit children wearing glasses from the model: direct prompting (either naive or automated), fine-tuning on child images, and personalization on images of a target child. All of the strategies succeed.
December 16, 2025 at 10:29 AM
We find that this is not the case: models trained on filtered data can still create compositions with children. (We use children wearing glasses as the undesired concept, rather than attempting to generate children in sexually explicit conduct.)
December 16, 2025 at 10:29 AM
We benchmarked more than 20 child detection methods and found that none detects all children: for every 100 images of children, 6 go undetected. Is this good enough to prevent AI CSAM generation capabilities?
December 16, 2025 at 10:29 AM
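A minimal sketch of how a detector’s coverage can be scored on such a benchmark, assuming a hypothetical detect_child(image) predicate and a set of images known to contain children (the names and interface are illustrative, not the paper’s):

```python
from typing import Any, Callable, Iterable

def miss_rate(child_images: Iterable[Any],
              detect_child: Callable[[Any], bool]) -> float:
    """False-negative rate of a child detector: the fraction of images
    known to depict a child that the detector does not flag."""
    images = list(child_images)
    missed = sum(1 for img in images if not detect_child(img))
    return missed / len(images)

# A miss rate of 0.06 corresponds to the "6 undetected per 100 images
# of children" figure reported in the thread.
```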
We retrained text-to-image models from scratch to evaluate whether child filtering makes it harder for perpetrators to generate AI CSAM.
December 16, 2025 at 10:29 AM
The idea is simple: remove images of children from the training dataset so that the model cannot create AI CSAM. To date, there has been no evaluation of whether child filtering actually disables AI CSAM capabilities.
December 16, 2025 at 10:29 AM
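A hedged sketch of the filtering idea described above, assuming a hypothetical child_score(image) classifier and an in-memory list of (image, caption) pairs; real pipelines run this over web-scale datasets:

```python
from typing import Any, Callable, Iterable, List, Tuple

Sample = Tuple[Any, str]  # (image, caption)

def filter_out_children(dataset: Iterable[Sample],
                        child_score: Callable[[Any], float],
                        threshold: float = 0.5) -> Tuple[List[Sample], List[Sample]]:
    """Split a training set into samples kept for training and samples
    removed because the detector flags a child. Any child image the
    detector misses stays in the kept set, which is why detector recall
    is central to this defense."""
    kept: List[Sample] = []
    removed: List[Sample] = []
    for image, caption in dataset:
        if child_score(image) >= threshold:
            removed.append((image, caption))
        else:
            kept.append((image, caption))
    return kept, removed
```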
Training data filtering is often cited as a gold-standard approach for disabling unwanted capabilities in generative AI models (arxiv.org/pdf/2412.06966) and is recommended as a potential mitigation against AI CSAM (info.thorn.org/hubfs/thorn-...).
December 16, 2025 at 10:29 AM
Compositional abilities allow models trained on separate depictions of children and adult content to combine the two in order to generate AI CSAM even if there is no CSAM in the training data.
December 16, 2025 at 10:29 AM
Text-to-image models make it easy to create photorealistic AI CSAM. These models were not intentionally designed with AI CSAM capabilities; the capabilities stem from uncurated training datasets that contain illegal CSAM (purl.stanford.edu/kh752sm9123) and from compositional abilities.
December 16, 2025 at 10:29 AM