Kay mate, rational thought 101:
When the setup is “we run each query multiple times” the default position is that it costs more resources. If you claim they use roughly the same amount you need to substantiate that claim.
Like, that sounds like a pretty impressive CS paper, “we figured out how to run inference N times but pay roughly the cost of one” is a hell of an abstract.
“this thing takes more time and effort to process queries, but uses the same amount of computing resources” <- statements dreamed up by the utterly deranged.
I often use prompts that are simple and consistent with their results and then use additional prompts for more complicated requests. Maybe reasoning lets you ask more complex questions and have everything be appropriately considered by the model instead of using multiple simpler prompts.
Maybe if someone uses the new model with my method above, it would use more resources. Im not really sure. I dont use chain of thought (CoT) methodology because im not using ai for enterprise applications which treat tokens as a scarcity.
Was hoping to talk about it but i dont think im going to find that here.