That’s where the almost comes in. Unfortunately, there are many traps for the unwary stochastic parrot.
Training a neural net can be seen as a generalized regression analysis. But that’s not where neural nets come from. The inspiration comes mainly from biology, and also from physics. They are not the result of developing better statistics. Training algorithms, like Backprop, were developed for the purpose. It’s not something that the pioneers could look up in a stats textbook. This is why the terminology is different. Where the same terms are used, they unfortunately don’t mean quite the same thing.
Many developments crucial for LLMs, like fine-tuning, RLHF, or self-attention, have no counterpart in statistics. Conversely, what you typically want from a regression, such as neatly interpretable parameters with error bars, is conspicuously absent in ANNs.
Any ideas you have formed about LLMs, based on the understanding that they are just statistics, are very likely wrong.
“such as neatly interpretable parameters”
Hahaha, hahahahahaha.
Hahahahaha.
If parameters aren’t neatly interpretable then it’s bad statistics. You’ve learned nothing about the general structure of the data.
Linear regression models are often great tools for explaining the structure of the data. You can directly see which parts of the input are more important for determining the output. You get very little of that from neural networks with more than one hidden layer.
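To make that concrete, here is a minimal sketch of what "directly see which parts of the input are more important" can look like. Everything in it is an assumption for illustration: the data is synthetic, the helper name `ols_with_standard_errors` is made up, and the error bars are the textbook ordinary-least-squares ones (roughly 95% intervals under the usual assumptions), computed with plain NumPy.

```python
import numpy as np

def ols_with_standard_errors(X, y):
    """Ordinary least squares with an intercept.

    Hypothetical helper for illustration; returns (coefficients, standard errors).
    """
    X1 = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares fit
    residuals = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]                 # observations minus parameters
    sigma2 = residuals @ residuals / dof            # estimated noise variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)         # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (an assumption): x0 matters a lot, x1 a little, x2 not at all.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 1.0 + 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

beta, se = ols_with_standard_errors(X, y)
for name, b, s in zip(["intercept", "x0", "x1", "x2"], beta, se):
    print(f"{name:>9}: {b:+.3f} ± {1.96 * s:.3f}")
```

Each fitted coefficient maps onto one input and comes with an interval, so you can read off that x0 dominates and x2 is indistinguishable from zero. The weights of a multi-layer network offer no comparable one-to-one reading.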
“If parameters aren’t neatly interpretable then it’s bad statistics.”
Haha, keep going guys. You obviously know a lot about statistics.