Gemini 2.5 "reasoning", no real improvement on river crossings.

posted 18 days ago

So I signed up for a free month of their crap because I wanted to test if it solves novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.
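For reference, the variant is small enough to brute-force. Below is a minimal breadth-first search sketch, not anything Gemini produced. It assumes "unsupervised" means the duck ends up on a different bank from you after a crossing, and that the duck counts as one of the 2 items. The names `ITEMS`, `CAPACITY`, `is_safe` and `solve` are made up for illustration.

```python
from collections import deque
from itertools import combinations

ITEMS = ("duck", "carrot", "potato")
CAPACITY = 2  # items the boat holds in addition to you (assumption: the duck counts as one)

def is_safe(person_side, far_side):
    # Assumed rule: the duck must end every crossing on the same bank as you.
    duck_side = 1 if "duck" in far_side else 0
    return duck_side == person_side

def solve():
    # State: (your bank, set of items on the far bank). Bank 0 is the start, 1 is the far side.
    start = (0, frozenset())
    goal = (1, frozenset(ITEMS))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (side, far), path = queue.popleft()
        if (side, far) == goal:
            return path
        # Items currently on your bank are the only ones you can load.
        here = [i for i in ITEMS if (i in far) == (side == 1)]
        for n in range(CAPACITY + 1):
            for cargo in combinations(here, n):
                new_far = far | set(cargo) if side == 0 else far - set(cargo)
                state = (1 - side, frozenset(new_far))
                if state not in seen and is_safe(*state):
                    seen.add(state)
                    queue.append((state, path + [(side, cargo)]))
    return None  # no plan exists under these rules

if __name__ == "__main__":
    for bank, cargo in solve():
        print("from bank", bank, "carrying", cargo)
```

Under those assumptions the search finds a three-crossing plan: take the duck and one vegetable over, bring the duck back, then take the duck and the other vegetable. So the puzzle is neither hard nor impossible.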

Unsurprisingly, Gemini does not solve it:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only 2 new things seem to be that old variants are no longer novel, and that it is no longer limited to producing incorrect solutions - now it can also incorrectly claim that the solution is impossible.

I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone has already solved a similar problem (either online, or perhaps in a problem-solution pair they generated, if they do that to augment the training data).

But it outputs quasi reasoning to pretend that it is actually solving the problem live.

froztbyte@awful.systems

18 points

17 days ago

oh look it’s a loadbearing “just” in the wild. better hope you can shore that fucker up with some facts

Try writing it out in ASCII

my poster in christ, what in the fuck are you on about. stop prompting LLMs and go learn some things instead

some other weird way that it hasn’t been specifically trained on and I bet it actually performs better

“no no see, you just need to prompt it different. just prompt it different bro it’ll work bro I swear bro”

god, every fucking time

Sailor Sega Saturn@awful.systems

10 points

17 days ago

All along my mistake was that I was prompting it in unicode instead of latin1, alphameric BCD, or “modified UTF-8”.

froztbyte@awful.systems

7 points

17 days ago

I thought everyone knew that you had to structure prompts in ALGOL 420 to get the best performance by going close to the metal

bitofhope@awful.systems

8 points

17 days ago

I use UTF-9 to efficiently handle Unicode on my PDP-10.

Charlie Stross@wandering.shop

6 points

17 days ago

@bitofhope @techtakes Surely you need a PDP-9 for that?

BurgersMcSlopshot@awful.systems

8 points

16 days ago

Well has anyone tried prompting it in EBCDIC? How do we know doing so won’t immediately create the super intelligence that "or whatever"s us to silicon Valhalla? Asking for a friend.

froztbyte@awful.systems

5 points

16 days ago

you know, I was briefly considering trying to, and I figured you’d probably have to be forcing it by content escaping tricks or something (at least I presume their APIs will do basic type-checking…)

got other yaks to do atm tho

TechTakes

!techtakes@awful.systems
