LLMs average <5% on 2025 Math Olympiad; award each other 20x points (arxiv.org)

posted 2 months ago

slop_as_a_service@awful.systems

techtakes@awful.systems

44 comments

“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

bitofhope@awful.systems

22 points

2 months ago

Essentially they do not simply predict the next token

looks inside

it’s predicting the next token

froztbyte@awful.systems

15 points

2 months ago

every time I read these posters it’s in the voice of those Everyman characters in the Discworld who say some utter lunatic shit and follow it up with “it’s just [logical/natural/obvious/…]”

o7___o7@awful.systems

9 points

2 months ago

Stands to reason

Pennomi@lemmy.world

-11 points

2 months ago

Read the paper, it’s not simply predicting the next token. For instance, when writing a rhyming couplet, it first plans ahead on what the rhyme is, and then fills in the rest of the sentence.

The researchers were surprised by this too, they expected it to be the other way around.

bitofhope@awful.systems

18 points

2 months ago

Oh, sorry, I got so absorbed in reading the riveting material about features predicting state name tokens to predict state capital tokens that I missed that we were quibbling over the word “next”. Alright, they can predict tokens out of order, too. Very impressive, I guess.

froztbyte@awful.systems

15 points

2 months ago