archive https://archive.ph/is57b
0 points
Though making an unreliable intern is amazing and was impossible 5 years ago…
0 points
I mean, it’s not shit at everything; it can be quite useful in the right context (GitHub Copilot is a prime example). Still, it doesn’t surprise me that these first-party LLM benchmarks are full of smoke and mirrors.