Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’

Experts are starting to doubt it, and even OpenAI CEO Sam Altman is a bit stumped.

85 points

“AI” models are just advanced versions of the next-word function on your smartphone keyboard, and people expect coherent outputs from them, smh.

25 points

Seriously. People like to project forward based on how quickly this technological breakthrough came on the scene, but they don’t realize that, barring a few tweaks and improvements here and there, this is it for LLMs. It’s the limit of the technology.

That’s not to say AI can’t improve further, and I’m sure that when it does, it will skillfully integrate LLMs. And I also think artists are right to worry about the impact of AI on their fields. But I think it’s a total misunderstanding of the technology to think the current systems will soon become flawless. I’m willing to bet we’re currently seeing it at 95% of its ultimate capacity, and that we don’t need to worry about AI writing a Hollywood blockbuster any time soon.

In other words, the next step of evolution in the field of AI will require a revolution, not further improvements to existing systems.

7 points

> I’m willing to bet we’re currently seeing it at 95% of its ultimate capacity

For free? On the internet?

After a year or two of going live?

2 points

It depends on what you’d call a revolution. Picture multiple instances working together: orchestrating tasks, with several other instances evaluating progress and providing feedback on possible hallucinations, all connected to services such as Wolfram Alpha for accuracy.

I think the whole orchestration network of instances could functionally surpass us soon in a lot of things if they work together.

But I’d call that evolution. Revolution would indeed be a different technique, probably one we can’t imagine right now.
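For the facts-and-numbers part, a loop like this is already doable today. A minimal sketch, assuming the openai Python client and the Wolfram|Alpha Short Answers API (a real endpoint, though the app ID, prompts, and division of labor here are illustrative):

```python
# Sketch: one instance drafts an answer, a second extracts a checkable
# claim, and Wolfram|Alpha verifies it. APPID and prompts are placeholders.
import urllib.parse
import urllib.request

from openai import OpenAI

client = OpenAI()
APPID = "YOUR-APPID"  # placeholder Wolfram|Alpha app ID

def chat(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

def wolfram(query: str) -> str:
    url = ("https://api.wolframalpha.com/v1/result?appid=" + APPID
           + "&i=" + urllib.parse.quote(query))
    return urllib.request.urlopen(url).read().decode()

answer = chat("How far is the Moon from Earth, on average?")
claim = chat("Extract one short, checkable factual query from this answer, "
             "and reply with the query only:\n" + answer)
print(answer)
print("Wolfram says:", wolfram(claim))
```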


It’s just that everyone now means LLMs when talking about AI, even though AI has so many different aspects to it. Maybe at some point there will be an AI that actually understands the concepts and meanings of things. But that won’t be learned by unsupervised web crawling.

12 points

It is possible to get coherent output from them, though. I’ve been using the ChatGPT API to successfully write ~20-page proposals. Basically, I give it a prior proposal, the new scope of work, and a paragraph with other info it should incorporate, and it then goes through the document a section at a time.

The numbers and graphics need to be put in after… but the result is better than I’d get from my interns.

I’ve also been using it (Google Bard, mostly, actually) to successfully solve coding problems.

I either need to give LLMs more credit or admit that interns are mostly just LLMs.

1 point

Are you using your own application to utilize the API or something already out there? Just curious about your process for uploading and getting the output. I’ve used it for similar documents, but I’ve been using the website interface which is clunky.

2 points

Just hacked-together Python scripts.

pip install openai
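For anyone curious, a minimal sketch of what such a script can look like, assuming the official openai Python client (v1); the model name, file paths, and section list are all illustrative:

```python
# Sketch: draft a proposal one section at a time, feeding in a prior
# proposal and the new scope of work as context, per the workflow above.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

prior_proposal = open("prior_proposal.txt").read()  # illustrative path
scope_of_work = open("new_scope.txt").read()        # illustrative path
extra_info = "Client prefers a phased rollout; budget cap is unchanged."

sections = ["Executive Summary", "Approach", "Timeline", "Budget Narrative"]

draft = []
for section in sections:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You write professional project proposals."},
            {"role": "user",
             "content": (f"Prior proposal for reference:\n{prior_proposal}\n\n"
                         f"New scope of work:\n{scope_of_work}\n\n"
                         f"Other info to incorporate:\n{extra_info}\n\n"
                         f"Write only the '{section}' section "
                         f"of the new proposal.")},
        ],
    )
    draft.append(f"## {section}\n\n{resp.choices[0].message.content}")

# Numbers and graphics still go in by hand afterwards.
open("draft_proposal.md", "w").write("\n\n".join(draft))
```

Going a section at a time also keeps each request comfortably inside the model’s context window, which a full 20-page document would not.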

1 point

I recently asked it a very specific domain-architecture question about whether a certain application would fit the needs of a certain business case, and the answer was very good: it showed a solid understanding of architecture, my domain, and the application.

6 points

So is your brain.

Relative complexity matters a lot, even if the underlying mechanisms are similar.

4 points

In the 1980s, Racter was released, and it was only slightly less impressive than current LLMs, mainly because it didn’t have an Internet’s worth of data to train on. It could still write things like:

> Bill sings to Sarah. Sarah sings to Bill. Perhaps they will do other dangerous things together. They may eat lamb or stroke each other. They may chant of their difficulties and their happiness. They have love but they also have typewriters. That is interesting.

If anything, at least that’s more entertaining than what modern LLMs can output.

54 points

Yet I’ve still seen many people clamoring that we won’t have jobs in a few years. People SEVERELY overestimate the ability of all things AI. From self-driving to taking jobs, this stuff is not going to take over the world anytime soon.

35 points

Idk, an AI delivering low-quality results for free is a lot more cash money than paying someone an almost-living wage to perform a job with better results. I think corporations won’t care, and the only barrier will be whether the job in question involves too much physical labor for an AI to perform it.

17 points

They already do this with chatbots and phone trees. This is just a slightly better version. Nothing new.

7 points

Right, but that’s the point, right? This will grow, and more jobs will become obsolete because of the amount of work AI can generate. It won’t take over every job. I think most people will use AI as a tool at the individual level, but companies will use it to gut entire departments. They’d only need one editor to review 20 articles instead of 20 people to write said articles.

12 points

AI isn’t free. Right now, an LLM takes a not-insignificant hardware investment to run and a lot of manual human labor to train. And there’s a whole lot of unknown and untested legal liability.

Smaller, more purpose-driven generative AIs are cheaper, but the total cost picture is still a bit hazy. It’s not always going to be cheaper than hiring humans. Not at the moment, anyway.

6 points

Compared to human work, though, AI is basically free. I’ve been using the GPT-3.5-turbo API in a custom app, making calls dozens of times a day for a month now, and I’ve been charged like 10 cents. Even minimum-wage humans cost tens of thousands of dollars *per year*; that’s a pretty high price that will be easy to undercut.

Yes, training is expensive and hardware is expensive, but those are one-time costs. Once trained, a model can be used trillions of times for pennies; the same can’t be said of humans.
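The back-of-the-envelope math checks out, assuming gpt-3.5-turbo’s 2023 list price of roughly $0.002 per 1K tokens; the call volume and token counts below are illustrative guesses:

```python
# Rough API cost vs. a minimum-wage salary. All inputs are assumptions:
# ~$0.002 per 1K tokens (gpt-3.5-turbo, 2023) and a few dozen short
# calls per day, as described in the comment above.
price_per_1k_tokens = 0.002  # USD, assumed
calls_per_day = 36           # "dozens of times a day"
tokens_per_call = 50         # short prompts and replies, assumed
days = 30

tokens = calls_per_day * tokens_per_call * days
print(f"{tokens} tokens -> ${tokens / 1000 * price_per_1k_tokens:.2f}")
# 54000 tokens -> $0.11

# A $15/hr full-time human, for comparison:
print(f"human: ${15 * 40 * 52:,}/yr")  # human: $31,200/yr
```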

8 points

The problem is that these things never hit a point of competition with humans; they’re either worse than us, or they blow way past us. Humans might drive better than a computer right now, but as soon as the computer is better than us, it will always be better than us. People doubted that computers would ever beat the best humans at chess, or Go, but within a lifetime of computers being invented they blew past us in both. Now they can write articles and paint pictures. Sure, we’re better at it for now, but they’re a million times faster than us, and they’re making massive improvements month over month. You and I can disagree on how long it’ll take for them to pass us, but once they do, they’ll replace us completely, and the world will never be the same.

5 points

To be fair, in my experience AI chatbots currently provide me with more usable results in 15 minutes than some junior employees do in a day. With less interaction and fewer conversational struggles (like taking your junior’s emotional state into account while still striving for perfection ;)).

And that’s not meant as disrespect to these juniors.

4 points

Yeah, it’s pretty weird just how many people are freaking out. The pace at which AI has been improving is impressive, but it’s still super janky and extremely limited.

People are letting their imaginations run wild about the future of AI without really looking into how these AIs are trained, how they function, their limitations, and the hardware and money it takes to run them.

45 points

In my limited experience, the issue is often that the “chatbot” doesn’t even check what it says now against what it said a few paragraphs above. It contradicts itself in very obvious ways. Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously? Or a check to ensure recipes are edible (for this specific application)? A bit like those physics-informed NNs.

42 points

That’s called context. For ChatGPT, it’s a bit less than 4K tokens. Using the API, it goes up to a bit less than 32K. Alternative models go up to a bit less than 64K.

The model wouldn’t know anything you said before that window.

That is one of the biggest limitations of the current generation of LLMs.
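This is also why apps built on the API have to manage the window themselves. A minimal sketch, assuming the tiktoken tokenizer; the 4096-token budget and message format mirror the chat API:

```python
# Sketch: drop the oldest chat turns so the history fits the context
# window. Assumes `pip install tiktoken`; the budget is illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer for gpt-3.5/gpt-4

def count_tokens(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def trim_to_budget(messages, budget=4096):
    # Keep the system prompt; drop the oldest user/assistant turns first.
    system, rest = messages[0], messages[1:]
    while rest and count_tokens([system] + rest) > budget:
        rest.pop(0)  # whatever falls out here is simply gone for the model
    return [system] + rest

history = [{"role": "system", "content": "You are a helpful assistant."}]
# ...append user/assistant turns as the conversation grows...
history = trim_to_budget(history)
```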

3 points

That’s not 100% true. They also work by modifying the meanings of words based on context, and those modified meanings propagate indefinitely forward. But yes, direct context is limited, so things outside it aren’t directly used.

1 point

They don’t really change the meaning of the words; they just look for the “best” words given the recent context, taking into account the different possible meanings of the words.

5 points

> Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

Maybe, but it might not be that simple. The issue is that one would have to design that logic in a manner that can be verified by a human. At that point, the logic would be quite specific to a single task and not generally useful at all, and then the benefit of the AI is almost nil.

1 point

And if there were an algorithm that was better at determining what was or was not the goal, why isn’t that algorithm used in the first place?

3 points

They do keep context to a point, but they can’t hold everything in memory; otherwise, the longer a conversation went on, the slower and more performance-intensive that logic check would become. Server CPUs are not cheap, and AI models are already performance-intensive.

3 points

Contradicting itself? Not staying consistent? Looks like it’s passed the Turing test to me. Seems very human.

1 point

> Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

You, in your “limited experience,” pretty much exactly described the fix.

The problem is that most of the applications of LLMs right now are low-hanging fruit because the tech is so new.

And those low-hanging-fruit examples are generally averse to 2-10x the query cost in both time and money just to fix things like jailbreaking or hallucinations, which is what multiple passes, especially with additional context lookups, would require.
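Mechanically, a second pass is simple; the catch is exactly that cost multiplier, since every screen is another API call. A minimal sketch, assuming the openai Python client; the moderation endpoint is real, while the wrapper and refusal text are illustrative:

```python
# Sketch: screen both the incoming prompt and the outgoing answer.
# Each moderation check is an extra call, which is where the added
# query cost comes from. Wrapper and messages are illustrative.
from openai import OpenAI

client = OpenAI()

def screened_reply(user_input: str) -> str:
    # Pass 1: screen the prompt (catches many jailbreak attempts).
    if client.moderations.create(input=user_input).results[0].flagged:
        return "Sorry, I can't help with that."

    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_input}],
    ).choices[0].message.content

    # Pass 2: screen the answer before it reaches the user.
    if client.moderations.create(input=reply).results[0].flagged:
        return "Sorry, I can't help with that."
    return reply
```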

But you very likely will see, in the next 18 months, multiple companies being thrown at exactly these kinds of scenarios, with a focus on more business-critical LLM integrations.

To put it in perspective, this is like people looking at AIM messenger back in the day and saying that the New York Times has nothing to worry about regarding the growth of social media.

We’re still very much in the infancy of this technology in real-world application, and because of that infancy, a lot of the issues that aren’t fixable within the core product itself don’t yet have mature secondary markets built around fixing those shortcomings.

So far, yours was actually the most informed comment in this thread I’ve seen - well done!

2 points

Thanks! And thanks for your insights. Yes, I meant that my experience using LLMs is limited to just asking Bing Chat questions about everyday problems, like I would with a friend who “knows everything.” I never looked at the science of formulating “perfect prompts” like I sometimes hear about. I do have some experience in AI/ML development in general.

41 points

People make a big deal out of this, but they forget that humans make shit up all the time.

29 points

Yeah, but humans can use critical thinking, even on themselves when they make shit up. I’ve definitely said something and then thought to myself, “wait, that doesn’t make sense for x reason, that can’t be right,” and then researched and corrected myself.

AI is incapable of this.

7 points

We think in multiple passes, though: we have a System 1 that thinks fast and makes mistakes, and a System 2 that works slower and thinks critically about what’s going on in our brain; that’s how we correct ourselves. ChatGPT works a lot like our System 1: it goes with the most likely response without reflecting. But there’s no reason it can’t be one part of a multi-step system that has self-analysis like we do. It isn’t incapable of that; it just hasn’t been built yet.

3 points

Exactly. If you replicate this behaviour with a “System 2 AI” correcting the main one, it will probably give similar results to most of us.

Heck, you could eventually have five separate AIs discussing things out for you and then presenting the answer, at top speed.

It will never be perfect, but it will outmatch humans soon enough.

6 points

Can’t do this YET. One method to reduce it could be: create a response to the query, then, before responding to the human, check whether the answer is insane by querying a separate instance trained slightly differently…

Give it time. We will get past this.
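A minimal sketch of that method, assuming the openai Python client; the two model choices, prompts, and retry limit are all illustrative stand-ins for “a separate instance trained slightly differently”:

```python
# Sketch: draft an answer, then ask a second model instance whether it
# holds up before showing it to the human. Everything here is illustrative.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

def answer_with_sanity_check(question: str, retries: int = 2) -> str:
    for _ in range(retries + 1):
        draft = ask("gpt-3.5-turbo", question)
        verdict = ask(
            "gpt-4",  # stand-in for a differently trained checker
            f"Question: {question}\nProposed answer: {draft}\n"
            "Is anything here factually wrong or nonsensical? "
            "Reply exactly SANE or INSANE.",
        )
        if verdict.strip().startswith("SANE"):
            return draft
    return "I'm not confident in my answer to that."
```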

3 points

We will need an entirely different type of AI, one that functions on an inherently different structure, to get past this hurdle, but yes, I do agree it will eventually happen.

3 points

You’re just being a victim of your own biases. You only notice that was the case when you succeed in detecting your own hallucinations. You wouldn’t know if you made stuff up by accident and nobody noticed, not even you.

Whereas we check 100% of the AI’s responses, do we check 100% of our own?

Sure, it’s not the same thing, and AI might do more, but the problem is your example: people think they are infallible because of their biases, when that’s not the case at all. We are imperfect, and we overlook our shortcomings, possibly forgoing a better solution because of this. We measure the AI objectively, but we don’t measure what we compare it to.

2 points

I never said we always question ourselves, just that AI can’t, so your entire reply doesn’t apply here.

28 points

This is trivially fixable. As is jailbreaking.

It’s just that everyone is somehow still focused on trying to fix it in a single monolithic model, as opposed to in multiple passes of different models.

This is especially easy for jailbreaking, but for hallucinations, just run the output past a fact-checking discriminator hooked up to a vector DB search index service (which sounds like a perfect fit for one of the players currently lagging in the SotA models), then add that as context, along with the original prompt and response, for a revisionist generative model that adjusts the response to be in keeping with reality.
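Concretely, and under heavy assumptions, that pipeline might look like the sketch below: chromadb stands in for the vector DB search index, and both the discriminator and the revisionist are plain chat calls with illustrative prompts:

```python
# Sketch: hallucination pass = retrieval + discriminator + revisionist.
# chromadb is one stand-in for the vector DB; all prompts illustrative.
import chromadb
from openai import OpenAI

client = OpenAI()
collection = chromadb.Client().create_collection("facts")
# Assume the collection is already populated with trusted reference text.

def chat(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

def revised_answer(prompt: str) -> str:
    response = chat(prompt)

    # 1. Retrieve reference passages related to the response's claims.
    refs = collection.query(query_texts=[response], n_results=3)
    context = "\n".join(refs["documents"][0])

    # 2. Discriminator: does the response conflict with the references?
    verdict = chat(f"References:\n{context}\n\nResponse:\n{response}\n\n"
                   "Does the response contradict the references? YES or NO.")
    if verdict.strip().startswith("NO"):
        return response

    # 3. Revisionist: rewrite the response to agree with the references.
    return chat(f"References:\n{context}\n\nOriginal prompt:\n{prompt}\n\n"
                f"Draft response:\n{response}\n\n"
                "Rewrite the draft so it is consistent with the references.")
```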

The human brain isn’t a monolith model, but interlinked specialized structures that delegate and share information according to each specialty.

AGI isn’t going to be a single model, and the faster the industry adjusts towards a focus on infrastructure of multiple models rather than trying to build a do everything single model, the faster we’ll get to a better AI landscape.

But as can be seen from OpenAI gating and deprecating their pretrained models and only opening up access to fine-tuned chat models, even the biggest player in the space seems to misunderstand what’s needed for the broader market to collaboratively build towards the future here.

Which ultimately may be a good thing as it creates greater opportunity for Llama 2 derivatives to capture market share in these kinds of specialized roles built on top of foundational models.

13 points

It seems like Altman is a PR man first and a techie second. I wouldn’t take anything he says at face value. If it’s ‘unfixable’, then he probably means that in a very narrow way. I.e., I’m sure they are working on what you proposed; it’s just different enough that he can claim that the way it is now is ‘unfixable’.

Stable Diffusion is really how people got the different-model-different-application idea.

4 points

I mean, I think he’s well aware of a lot of this via his engineers, who are excellent.

But he’s managing expectations for future product and seems to very much be laser focused on those products as core models (which is probably the right choice).

Fixing hallucinations in postprocessing is effectively someone else’s problem, and he’s getting ahead of any unrealistic expectations around a future GPT-5 release.

Though honestly, I do think he largely underestimates just how much damage he did to their lineup by trying to protect against PR issues like ‘Sydney’ in the beta GPT-4 integration with Bing, and I’m not sure the culture at OpenAI is such that engineers who think he made a bad call there can really push back on it.

They should be running an extremely ‘Sydney’-like underlying private model with a secondary layer on top that sanitizes it and catches jailbreaks at the same time.

But as long as he continues to see their core product as a single-model offering, and additional layers of models as someone else’s problem, he’s going to keep blowing their lead by taking an LLM trained to complete human text and pigeonholing it into only completing text the way an AI with no feelings and preferences would safely pretend to.

Which I’m 98% sure is where the continued performance degradation is coming from.
