Did nobody really question the usability of language models in designing war strategies?
https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
However, this only worked for a model trained on a synthetic dataset of games uniformly sampled from the Othello game tree. They tried the same techniques on a model trained using games played by humans and had poor results. To me, this seemed like a major caveat to the findings of the paper which may limit its real world applicability. We cannot, for example, generate code by uniformly sampling from a code tree.
The author later discusses training on your own data versus general datasets.
I am out of my depth, but this does not seem to provide strong evidence that the model is not just repeating information that shows up a lot for the given inputs.