diz
diz@awful.systems
3 posts • 38 comments

I seriously doubt he ever worked anywhere like that, not to mention that he’s too spineless to actually get in trouble IRL.


He’s such a complete moron. He doesn’t want to recite “DEI shibboleths”? What does he even think that would refer to? Why shibboleths?

To spell it out: that would refer to the antisemitic theory that the reason (for example) some black guy got a Medal of Honor (the “deimedal”) is the Jews.

I swear this guy is dumber than Trump. Trump, for all his rambling, uses actual language - Trump understands what the shit he is saying means to his followers. Scott… he really does not.


I think they worked specifically on cheating the benchmarks, though, as well as on popular puzzles like pre-existing variants of the river crossing. It is a very large, very popular puzzle category - if the river crossing isn’t on their list, I don’t know what would be.

Keep in mind that they are true believers, too - they think that if they cram enough little pieces of logical reasoning, taken from puzzles, into the AI, they will get a robot god that actually starts coming up with new shit.

I very much doubt that there’s some general reasoning performance improvement that results in these older puzzle variants getting solved while new ones, which aren’t particularly more difficult, fail.


> Did you use any of that kind of notation in the prompt? Or did some poor squadron of task workers write out a few thousand examples of this notation for river crossing problems in an attempt to give it an internal structure?

I didn’t use any notation in the prompt, but Gemini 2.5 Pro seems to always represent the state of the problem after every step in some way. When asked if it does anything with it, it says it is “very important”, so it may be that there’s some huge invisible prompt that says it’s very important to do this.

It also mentioned N cannibals and M missionaries.

My theory is that they wrote a bunch of little scripts that generate puzzles and solutions in that format. Since river crossing is one of the most popular puzzles, it would be on the list (and “N cannibals, M missionaries” is easy to generate variants of), although their main focus would have been the puzzles in the benchmarks they are trying to cheat on.
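For illustration, here’s roughly what I mean - a throwaway generator sketch where the template wording and parameter ranges are entirely made up by me, just to show how cheap this kind of training data is to mass-produce:

```python
# Hypothetical sketch of a puzzle-variant generator; the template text and
# parameter ranges are my own invention, not anything known about their data.
import random

TEMPLATE = (
    "There are {n} cannibals and {m} missionaries on one side of a river, "
    "with a boat that can carry at most {cap} people. If cannibals ever "
    "outnumber missionaries on either bank, the missionaries get eaten. "
    "How can they all cross?"
)

def random_variant(rng: random.Random) -> str:
    # Sample parameters; pairing each prompt with a brute-force solver's
    # output would give the step-by-step "answer key" side of the data.
    n = rng.randint(2, 6)
    return TEMPLATE.format(n=n, m=n, cap=rng.randint(2, 4))

rng = random.Random(0)
for _ in range(3):
    print(random_variant(rng), end="\n\n")
```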

edit: here’s one of the logs:

https://pastebin.com/GKy8BTYD

Basically it keeps trying to brute-force the problem. It gets the first 2 moves correct, but in a stopped-clock manner - if there are 2 people and 1 boat, they both take the boat; if there are 2 people and >=2 boats, each of them takes a boat.

It keeps doing the same shit until eventually its state tracking fails, or its reading of the state fails, and then it outputs the failure as a solution. Sometimes it deems it impossible:

https://pastebin.com/Li9quqqd

All tests were done with Gemini 2.5 Pro. I can post links if you need them, but the links don’t include the “thinking” log, and I also suspect that if >N people come through a link, they just look at it. Nobody really shares botshit unless it’s funny or stupid. A lot of people independently asking the same problem would also happen whenever there’s a new homework question, so they can’t use that as a signal so easily.


Yeah, I think the best examples are everyday problems that people solve all the time but never explicitly write out step-by-step solutions for, at least not in the puzzle-and-answer form.

It’s not even a novel problem at all - I’m sure there are plenty of descriptions of solutions to it as parts of stories and such. Just not framed as “logical puzzles”, due to triviality.

What really annoys me is when they claim high performance on benchmarks consisting of fairly difficult problems. This is basically fraud, since they know full well it is still entirely “knowledge”-reliant, and they even take steps to augment it with generated problems and solutions.

I guess the big sell is that it could use bits and pieces of logic gleaned from other solutions to solve a “new” problem. Except it can not.


And it is Google we’re talking about, lol. If no one uses their AI shit, they just replace something people use with it (see also: search).


It’s Google, though - if nobody uses their shit, they just put it inside their search.

It’s only gonna go away when they run out of cash.

edit: whoops replied to the wrong comment


I just describe it as “computer scientology, nowhere near as successful as the original”.

The other thing is that he’s a Thiel project, different from but no more sane than Curtis Yarvin, aka Moldbug. So if they’ve heard of Moldbug’s political theories (which increasingly many people have, because of, well, those theories being enacted), it’s easy to give a general picture of total fucking insanity funded by Thiel money. It doesn’t really matter what the particular insanity is, and it matters even less now that the AGI shit hit the mainstream while entirely bypassing anything Yudkowsky had to say on the subject.


Yeah, it really is fascinating. It follows some sort of recipe to try to solve the problem, like it’s been trained to work a bit like a computer algebra system.

I think they employed a lot of people to write generators for variants of select common logical puzzles (e.g. river crossings with varying boat capacities and constraints), generating both the puzzle and the corresponding step-by-step solution, with “reasoning” and a re-printing of the state of the items at every step and all that.
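Something like this sketch would cover the solution side - assuming the classic formulation of the puzzle; obviously I have no idea what their actual pipeline looks like:

```python
# Minimal sketch, assuming the classic missionaries-and-cannibals rules:
# brute-force the puzzle with BFS, then echo the full state after every move,
# which is exactly the shape of output the model seems to imitate.
from collections import deque
from itertools import product

def safe(m, c, M, C):
    # A bank is safe when it has no missionaries or they aren't outnumbered.
    if not (0 <= m <= M and 0 <= c <= C):
        return False
    return (m == 0 or m >= c) and (M - m == 0 or M - m >= C - c)

def solve(M, C, cap):
    # BFS over states (missionaries on left, cannibals on left, boat on left).
    start, goal = (M, C, True), (0, 0, False)
    prev = {start: None}
    frontier = deque([start])
    while frontier:
        m, c, boat = state = frontier.popleft()
        if state == goal:
            break
        sign = -1 if boat else 1
        for dm, dc in product(range(cap + 1), repeat=2):
            if not 1 <= dm + dc <= cap:
                continue  # the boat can't cross empty or overloaded
            if 0 < dm < dc:
                continue  # missionaries outnumbered in the boat itself
            nxt = (m + sign * dm, c + sign * dc, not boat)
            if safe(nxt[0], nxt[1], M, C) and nxt not in prev:
                prev[nxt] = state
                frontier.append(nxt)
    if goal not in prev:
        return None
    path, s = [], goal
    while s is not None:
        path.append(s)
        s = prev[s]
    return path[::-1]

for step, (m, c, boat) in enumerate(solve(3, 3, 2)):
    side = "left" if boat else "right"
    print(f"Step {step}: left bank has {m}M {c}C, boat on the {side}")
```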

It seems to me that their thinking is that successive parroting can amount to reasoning if it’s parroting well enough. I don’t think it can. They have this one-path approach, where it just tries doing steps and representing state, always trying the same thing.

What they need for this problem is a different kind of step: reduction (the duck cannot be left unsupervised -> the duck must be taken along on every trip -> rewrite the problem without the duck and with 1 less boat capacity -> solve -> rewrite the solution with “take the duck with you” on every trip).
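Here’s a sketch of that reduction as an actual procedure, with a toy encoding of my own (the item names, seat counts, and “unsupervised duck” rule are all my assumptions, not anything from the benchmarks):

```python
# Sketch of the reduction, under my own toy encoding: items sit on the left
# bank, the boat holds `cap` items besides you, and one item (the duck)
# can never be left on a bank without you.

def solve_unconstrained(items, cap):
    # The reduced problem has no constraints left, so just ferry items
    # across, returning empty between trips.
    trips, remaining = [], list(items)
    while remaining:
        load, remaining = remaining[:cap], remaining[cap:]
        trips.append(("cross", load))
        if remaining:
            trips.append(("return", []))
    return trips

def solve_with_chaperoned_item(items, cap, chaperoned="duck"):
    # Reduction: the chaperoned item rides along on every trip, so solve
    # the remaining items with one less seat, then splice it back in.
    rest = [i for i in items if i != chaperoned]
    return [(direction, load + [chaperoned])
            for direction, load in solve_unconstrained(rest, cap - 1)]

for direction, load in solve_with_chaperoned_item(
        ["duck", "cabbage", "carrot", "potato"], cap=2):
    print(direction, "with", ", ".join(load))
```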

But if they add this, then there are two possible paths it can take on every step, and this thing is far too slow to brute-force the right one. They may get it to solve my duck variant, but at the expense of making it fail a lot of other variants.

The other problem is that even the most elementary-seeming reasoning involves very many applications of basic axioms. This is what doomed symbol-manipulation “AI” in the past, and it is what is dooming it now.


Not really. Here’s the chain-of-word-vomit that led to the answers:

https://pastebin.com/HQUExXkX

Note that in its “impossible” answer it correctly echoes that you can take one other item with you, and it does not bring the duck back (while the old overfitted GPT-4 obsessively brought items back). In the duck + 3 vegetables variant, it has a correct answer in the word vomit, but, not being an AI enthusiast, it can’t actually choose that correct answer (a problem shared with the monkeys on typewriters).
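For what it’s worth, checking a candidate answer is trivial to mechanize, which is what makes “right answer present in the vomit, wrong answer chosen” so damning. A toy validator under my own encoding of the duck variant (cap counts the items you take besides yourself):

```python
# Toy validator for the duck variant: a candidate move list either survives
# the rules or it doesn't. Item names, capacity, and the "duck can't be
# left unsupervised" rule are my own encoding of the puzzle.

def valid(moves, items, cap, chaperoned="duck"):
    left, right, you_on_left = set(items), set(), True
    for load in moves:  # each move: the items you take across
        here = left if you_on_left else right
        if not set(load) <= here or len(load) > cap:
            return False  # items not on your bank, or boat overloaded
        here -= set(load)
        if chaperoned in here:
            return False  # duck left behind without you
        (right if you_on_left else left).update(load)
        you_on_left = not you_on_left
    return not you_on_left and not left  # everything ends up on the right

items = ["duck", "cabbage", "carrot", "potato"]
good = [["duck", "cabbage"], ["duck"], ["duck", "carrot"],
        ["duck"], ["duck", "potato"]]
print(valid(good, items, cap=2))           # True
print(valid([["cabbage"]], items, cap=2))  # False: duck left unsupervised
```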

I’d say it clearly isn’t ignoring the prompt or differences from the original river crossings. It just can’t actually reason, and the problem requires a modicum of reasoning, much as unloading groceries from a car does.
