Having dealt with ML engineers in depth before, I can say American tech companies tend to throw blank checks their way. Combine that with the fact that they rarely have backgrounds in optimization or infrastructure, and you get teams spinning up 8 billion GPU instances in the cloud and ever using maybe 10% of them, because engineers are lazy.
They could, without any exaggeration, cut their energy consumption by a factor of ten with about two weeks of honest engineering work. Yes, this bothers the fuck out of me.
That’s my biggest gripe with mainstream closed-source AI: they can optimize some of their most powerful MLAs to run on a potato, but… they don’t. And they’ll never open source, because it’d be forked by people who are genuinely passionate about improvement.
AKA they’d be run outta business in no time.