Test-time training (TTT) significantly enhances language models’ abstract reasoning, improving accuracy up to 6x on the Abstraction and Reasoning Corpus (ARC). Key factors for successful TTT include initial fine-tuning, auxiliary tasks, and per-instance training. Applying TTT to an 8B-parameter model boosts accuracy to 53% on ARC’s public validation set, nearly 25% better than previous public, neural approaches. Ensemble with recent program generation methods achieves 61.9% accuracy, matching average human scores. This suggests that, in addition to explicit symbolic search, test-time training on few-shot examples significantly improves abstract reasoning in neural language models.

You are viewing a single thread.
View all comments

achieving similar statistical accuracy when training off of large datasets which probably have the answers to a lot of the parts of these benchmarks doesn’t seem too impressive

permalink
report
reply

The training datasets don’t have the answers because the benchmark is diverse enough. That’s why other models struggled to perform as well as humans until they applied the approach outlined in the paper. This is the benchmark: https://liusida.github.io/ARC/

permalink
report
parent
reply

technology

!technology@hexbear.net

Create post

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

  • 1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
  • 2. This community is about technology. Offtopic is permitted as long as it is kept in the comment sections
  • 3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
  • 4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
  • 5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
  • 6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
  • 7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

Community stats

  • 1.3K

    Monthly active users

  • 1.5K

    Posts

  • 19K

    Comments