My opinion here is that “naked young person” isn’t as simple as other compound concepts, because there are physiological changes we go through during puberty that an AI can’t reverse-engineer. Something like “Italian samurai” involves concepts that occur at a surface level, which the AI can easily understand, while “naked young person” involves some components that can’t be derived simply from applying “young” to “naked person” or “naked” to “young person”.
Someone did make a valid counterargument in this subthread, though: https://sh.itjust.works/comment/11713795
Well, I haven’t gone to any of my image AIs and actually asked them to generate naked pictures of young people. So unless you want to go there, this will necessarily be somewhat theoretical.
However, according to the article, it’s possible to generate this stuff with Stable Diffusion models, and Stable Diffusion models have a negligible amount of CSAM in the training set. So short of actually doing the experiment, that would seem to settle it.
I think a lot of people don’t appreciate just how sophisticated the “world model” that these image AIs have learned is. There was a paper a while back where some researchers were trying to analyze how image generators work internally, and they discovered that if you ask one to make a picture of a bicycle, for example, it first comes up with a depth map of the image before it starts doing anything to the visual output. That shows the AI has figured out the three-dimensional form of a bicycle based entirely on a pile of two-dimensional training images, with no other clues telling it that the third dimension even exists.