“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”
Essentially they do not simply predict the next token
looks inside
it’s predicting the next token
Every time I read these posters, it's like one of those Everyman characters in Discworld who say some utter lunatic shit and follow it up with “it’s just [logical/natural/obvious/…]”
Read the paper; it’s not simply predicting the next token. For instance, when writing a rhyming couplet, it first plans ahead which word the line will rhyme on, and then fills in the rest of the line to lead up to it.
The researchers were surprised by this too; they expected it to be the other way around.