IMO they're far too fixated on making a single model into an AGI.
Some people have combined multiple specialized models (voice recognition + image recognition + LLM + controls + voice synthesis) and gotten quite compelling results.
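For anyone wondering what "combining specialized models" looks like in practice, here is a rough sketch of the wiring, not any particular project's code. Every name in it (`Pipeline`, `transcribe`, `describe_image`, `generate_reply`, `synthesize`) is hypothetical, each stage is just a plain callable you would back with whatever model you like, and the controls part is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Chains independent, specialized models behind plain callables.

    All four stages are placeholders (assumptions for illustration);
    swap in real speech, vision, LLM, and TTS backends as needed.
    """
    transcribe: Callable[[bytes], str]      # voice recognition: audio -> text
    describe_image: Callable[[bytes], str]  # image recognition: frame -> caption
    generate_reply: Callable[[str], str]    # LLM: prompt -> reply text
    synthesize: Callable[[str], bytes]      # voice synthesis: text -> audio

    def respond(self, audio: bytes, frame: bytes) -> bytes:
        heard = self.transcribe(audio)
        seen = self.describe_image(frame)
        # The LLM only ever sees text, so the other models' outputs
        # are folded into a single prompt.
        prompt = f"Scene: {seen}\nUser said: {heard}"
        return self.synthesize(self.generate_reply(prompt))
```

The point of the design is that the LLM never touches audio or pixels; each model does one job and hands text to the next, so any stage can be replaced without retraining the others.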
I'm just impressed by how snappy it was. I wish he could let it listen longer without responding right away, though.
80% of the time she's just a bot, but there are these flashes of brilliance that make me think we're closer to general-purpose intelligence than we think.
And this is just one dude using commercially available tooling. A well-funded company could do infinitely better, if they were willing to give up some of the political correctness when training the model.
EDIT: When he removed the word filter last time, it got really hilarious really quickly.
What I am 100% certain of, because humanity is terrible, is that if a true AI is created, that fact will be ignored for being inconvenient to profit-seeking.
If you're the programmer, it's not hard to use a key press to enable TTS and then send it in chunks. I made a very similar version of this project, but my GPU didn't stream the responses nearly as seamlessly.
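A minimal sketch of the "send it in chunks" part, assuming the LLM streams its reply as small text pieces and the speech engine takes one sentence at a time. The function name `sentence_chunks` is made up for illustration, and the key-press toggle is left out since it depends on whatever input library you use.

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed text pieces into whole sentences.

    Hypothetical helper: yields each sentence as soon as the start of
    the next one arrives, so speech can begin before the full reply
    has been generated.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

You would loop over `sentence_chunks(stream)` and pass each sentence to the speech engine while the key-press flag is on. Sentence-sized chunks are a trade-off: smaller ones start talking sooner but tend to sound choppier.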