JFranek
I’m wondering about the benchmark too. It’s way above my level to figure out how it can be gamed. But, buried in the article:
Moreover, ARC-AGI-1 is now saturating – besides o3’s new score, the fact is that a large ensemble of low-compute Kaggle solutions can now score 81% on the private eval.
The most expensive o3 version achieved 87.5%.
Man, I don’t need to be reminded of the sorry state of meat alternatives.
It’s bitterly funny to me that fashoid governments started banning cultivated meat, as if the economic and technical issues weren’t enough. Ignoramuses terrified of threats they made up in their own heads, as always.
The promptfans testing OpenAI Sora have gotten mad that the unpaid-labor treatment is happening to them too, and (temporarily) leaked access to the API.
https://techcrunch.com/2024/11/26/artists-appears-to-have-leaked-access-to-openais-sora/
“Hundreds of artists provide unpaid labor through bug testing, feedback and experimental work for the [Sora early access] program for a $150B valued [sic] company,” the group, which calls itself “Sora PR Puppets,” wrote in a post …
“Well, they didn’t compensate actual artists, but surely they will compensate us.”
“This early access program appears to be less about creative expression and critique, and more about PR and advertisement.”
OK, I could give them the benefit of the doubt: maybe they’re new to the GenAI space, or the general ML space … or IT.
But I’m not going to. Of course it’s about PR hype.
That article gave me whiplash. First part: pretty cool. Second part: deeply questionable.
For example, these two paragraphs from the sections ‘problem with code’ and ‘magic of data’:
“Modular and interpretable code” sounds great until you are staring at 100 modules with 100,000 lines of code each and someone is asking you to interpret it.
Regardless of how complicated your program’s behavior is, if you write it as a neural network, the program remains interpretable. To know what your neural network actually does, just read the dataset.
Well, “just read the dataset bro” sounds great until you are staring at a dataset with 100,000 examples and someone is asking you to interpret it.
Yeah, neural network training is notoriously easy to reproduce /s.
Just a few things can affect the results: the source data, the data labels, the network structure, the training parameters, the version of the training script, the versions of the libraries, the seed for the random number generator, the hardware, and the operating system.
Also, deployment is another can of worms.
Also, even if you have the open-source script, data, and labels, there’s no guarantee you’ll have useful documentation for any of them.
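For illustration, here’s a minimal sketch of how many of those knobs you have to pin just to make a single training run repeatable. I’m assuming PyTorch purely as an example; the framework choice is mine, and even this doesn’t touch hardware, drivers, or the OS:

# Hedged sketch: pinning the sources of nondeterminism you *can* control
# in a PyTorch training run. Hardware, driver, and OS differences stay
# outside your reach.
import os
import random

import numpy as np
import torch

SEED = 0

# Seed every RNG the stack consults: Python, NumPy, and torch (CPU and CUDA).
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# Ask for deterministic kernels, trading away some speed.
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed for determinism on CUDA >= 10.2

# DataLoader workers each get their own RNG; those need seeding too.
def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

generator = torch.Generator()
generator.manual_seed(SEED)
# loader = torch.utils.data.DataLoader(dataset, worker_init_fn=seed_worker, generator=generator)
# (dataset is hypothetical here; supply your own.)

And after all of that, a different GPU model or library version can still shift the numbers, which is rather the point.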