sampling a fraction of another person’s imagery or written work.
So citing is a copyright violation? A scientific discussion on a specific text is a copyright violation? This makes no sense. It would mean your work couldn’t build on anything else, and that’s plain stupid.
Also, to your first point about reasoning and the “advanced collage process”: you are right and wrong. Yes, an LLM doesn’t have access to all the information a human has and isn’t as precise, so it can’t reason the same way a human can. BUT, and that is a huge caveat, the inherent goal of AI, and of neural networks in their simplest form, was to replicate human thinking. If you look at the brain and then at AIs, you will see how close the process is. Usually you give the AI an input, the AI tries to produce the desired output, then the AI gets told what the output should have looked like, and then it backpropagates the error to adjust its process. This is already pretty advanced and human-like (just look at how the brain is made up and then at how AI models are made up; it’s basically the same concept).
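To make that loop concrete, here is a minimal sketch in plain Python. It is a toy under loud assumptions: a single artificial neuron learning the logical OR function, so the "backpropagation" shrinks to one gradient step per example. The names `train_or_gate` and `predict`, the learning rate and the epoch count are all made up for illustration and have nothing to do with how any real LLM is trained.

```python
import math
import random


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(w, b, x1, x2):
    """Forward pass: weighted sum of the inputs, then the sigmoid."""
    return sigmoid(w[0] * x1 + w[1] * x2 + b)


def train_or_gate(epochs=2000, lr=0.5, seed=0):
    """Train one neuron on logical OR (hypothetical toy setup)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # random starting weights
    b = 0.0
    # Each entry is (input, what the output should have looked like).
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = predict(w, b, x1, x2)  # 1. the model tries an output
            err = out - target           # 2. it is told how far off it was
            # 3. the error is pushed back into the weights
            #    (gradient of the cross-entropy loss for a sigmoid neuron)
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b


w, b = train_or_gate()
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), round(predict(w, b, x1, x2), 3))
```

After training, the neuron outputs a value below 0.5 for (0, 0) and above 0.5 for the other three inputs. Real networks stack many layers of such units and push the error back through all of them, but the input, output, correction cycle is the same idea.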
Now you could say “well, in its simplest form an LLM like GPT is just predicting which character or word comes next”, and you would be partially right. But in that process it incorporates all of the “knowledge” it got from its training sessions, plus a few valuable tricks to improve. The truth is, differences between a human brain and an AI are marginal, and it mostly boils down to efficiency and training time.
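For a feel of what “predicting which word comes next” means in its very simplest form, here is a rough sketch in plain Python. It is only a word-pair counting table, not a neural network, and `train_bigram`, `predict_next` and the tiny corpus are hypothetical names and data invented for this example. GPT-style models do the same job with a trained network instead of raw counts, which is exactly where the “knowledge” part comes in.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words followed it in the text."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "sat"))
print(predict_next(model, "dog"))  # never seen, so None
```

A table like this can only repeat word pairs it has literally seen. The gap between this and an LLM is the whole point of the argument: the prediction task is the same, but what sits behind the prediction is very different.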
And to say that LLMs are just “an advanced collage process” is like saying “a car is just an advanced horse”. You’re not technically wrong but the description is really misleading if you look into the details.
And for detail’s sake, this is what the paper for Llama 2 looks like, the latest big LLM from Facebook, which is said to be the current standard for LLM development: