OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet, and it could be really bad for AI models.

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

I wonder if it was too many false positives, like when some tool claimed the US Constitution was written by AI. That seems almost inevitable: LLMs imitate human writing very closely, and they can’t prevent their own hallucinations, which are about the only reliable tell that something wasn’t written by a person in good faith.
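Most public detectors scored text by how predictable a reference model finds it, which is exactly why formal, endlessly quoted prose gets flagged. A rough sketch of that perplexity approach; GPT-2 and the threshold here are stand-ins, not what OpenAI actually used:

```python
# Sketch of perplexity-based AI-text detection (illustrative only).
# Low perplexity = "predictable" text, which these tools read as AI-like.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels = inputs makes the model report its own next-token loss
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Formal, heavily quoted prose scores as highly predictable, so a
# naive threshold (the 20.0 here is hypothetical) misfires on it:
if perplexity("We the People of the United States...") < 20.0:
    print("flagged as AI-generated (false positive)")
```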

22 points
Deleted by creator
3 points

I don’t think you can simply assume you might be misled and leave it at that; the influence remains even when it goes unnoticed. Nor is it advisable to be too suspicious, since that breeds a conspiratorial mindset, the dark side of critical thinking. The information space is already loaded with trash, and AI is about to amplify it. I think we need personal identity management, and AI agents will have identities too. The catch is that this is hard to do on a free internet, but it’s partly possible, and the technologies exist.
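As one sketch of what those technologies could look like: cryptographic signatures already let any identity, human or AI agent, prove authorship of a post. The library and flow below are just one illustrative option, not an established standard:

```python
# Sketch: signing posts so identity (human or AI agent) is verifiable.
# Uses the `cryptography` package; the flow is illustrative only.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()   # stays with the author
public_key = private_key.public_key()        # published to the community

post = "I wrote this myself.".encode()
signature = private_key.sign(post)

try:
    public_key.verify(signature, post)       # raises if the post was forged
    print("post verifiably came from this identity")
except InvalidSignature:
    print("signature does not match")
```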

3 points
Deleted by creator
3 points

There are real conspiracies, but a conspiratorial mindset is still unhealthy. There’s a joke: “just because you’re paranoid doesn’t mean THEY aren’t actually watching you.” It’s important not to descend into paranoia, even when it starts from legitimate concerns. It’s also important to remember that no one person can derive all knowledge by themselves, so you have to trust, even if conditionally. But right now there is no established technical process for choosing whom to trust. I just believe that most people in here are neither bots nor crazy.

8 points

Relax, everybody. I have figured out the solution. We pass a law that all AI generated text has to be in Pig Latin or Ubbi Dubbi.

5 points

OpenAI also financially benefits from keeping the hype train rolling. Talking up how disruptive their own tech is gets them attention and investment. Just take it with a grain of salt.

6 points

It’s not possible to tell AI-generated text from human writing at any level of real-world accuracy. Just accept that.

1 point

Citation needed

2 points

The entropy in text isn’t high enough to provide space for watermarking. No, it doesn’t get better with longer text, because you control the chunking. You control top-k, temperature, and the prompt, which creates an effectively infinite output space. Open text-generation-webui, go to the parameters page, and count how many knobs you can adjust to steer the output. In the future you can add WASM-encoded grammars to that list too.
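To make that concrete, here is a toy top-k/temperature sampler (pure NumPy, invented logits); every one of these knobs reshapes the distribution a watermark would have to survive:

```python
# Toy top-k + temperature sampling: each parameter reshapes the
# output distribution, so the "output space" is effectively unbounded.
import numpy as np

def sample_next_token(logits: np.ndarray, top_k: int = 40,
                      temperature: float = 0.8) -> int:
    logits = logits / temperature              # flatten or sharpen the distribution
    top = np.argsort(logits)[-top_k:]          # keep only the k most likely tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over the survivors
    return int(np.random.choice(top, p=probs))

fake_logits = np.random.randn(50_000)          # stand-in for real model output
print(sample_next_token(fake_logits, top_k=40, temperature=0.8))
print(sample_next_token(fake_logits, top_k=5,  temperature=1.5))
```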

Server-side hashing/watermarking can be trivially defeated via transformations or emoji injection. Latent-space positional watermarking breaks easily under post-processing. It would also kill any company trying to sell it (Apple be like … you want all your chats at OpenAI, or in the privacy of your phone?) and would ultimately be massively dystopian.
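A sketch of why the defeat is trivial: suppose (hypothetically) the server logs a hash of what it served and later checks suspect text against it. Any throwaway transformation breaks the match:

```python
# Sketch: trivial post-processing breaks string/token-level watermarks.
import hashlib

served = "The quick brown fox jumps over the lazy dog."
logged = hashlib.sha256(served.encode()).hexdigest()   # vendor-side record

# Client post-processing: swap a synonym, reflow a space, inject an emoji...
edited = served.replace("quick", "fast") + " 🦊"

# False: exact-match/hash lookups fail after any trivial transformation.
print(hashlib.sha256(edited.encode()).hexdigest() == logged)
```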

Unlike plagiarism checks, you can’t compare against a ground truth.
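That’s the crucial difference: a plagiarism check is a comparison against a known corpus, roughly like the n-gram overlap sketch below, and no comparable reference set exists for “text an LLM might have produced”:

```python
# Plagiarism check: overlap against a ground-truth corpus.
# AI detection has no such corpus to compare against.
def ngrams(text: str, n: int = 5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

corpus_doc = "four score and seven years ago our fathers brought forth"
submission = "he began with four score and seven years ago our fathers"

overlap = ngrams(submission) & ngrams(corpus_doc)
print(f"{len(overlap)} shared 5-gram(s)")  # nonzero => likely copied
```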

Prompt guidance can box in the output space to the point where you could not possibly tell it isn’t human. And the technology has moved from central servers to the edge: even if you could build detection for one LLM, another one outside your control, like a local LLaMA, is open source, and any watermark gets stripped immediately (see how quickly the Stable Diffusion 2 VAE watermarking was removed after release).

In a year your iPhone will have a built-in LLM. Everything will have LLMs, some highly purpose-bound with only a few million parameters. Fine-tuning such as LoRA is accessible today to anyone with a consumer GPU and will be commoditized within a year. Since it reshapes the output, it again expands the space of possible outputs and scrambles any patterns.
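For a sense of how accessible that already is, a LoRA fine-tune is a few lines with the peft library; the model name and hyperparameters below are placeholders:

```python
# Sketch: LoRA fine-tuning with Hugging Face peft. Model name and
# hyperparameters are placeholders; the point is how little code it takes.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=8,                                  # low-rank update dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the weights
# ...then train as usual; the adapter reshapes the output distribution.
```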

Finally, the bar is not “better than a coin flip.” If you are going to accuse people and ruin academic careers, you need triple-nine accuracy, or you’ll wrongfully flag hundreds of essays a semester.
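The arithmetic is unforgiving; a back-of-envelope example (the enrollment figure is invented for illustration):

```python
# Back-of-envelope: false accusations per semester at a given accuracy.
essays_per_semester = 20_000          # e.g., a large university (made up)

for accuracy in (0.99, 0.999):        # "pretty good" vs. triple-nine
    false_positive_rate = 1 - accuracy
    wrongly_accused = essays_per_semester * false_positive_rate
    print(f"{accuracy:.1%} accurate -> ~{wrongly_accused:.0f} wrongly accused")
# 99.0% accurate -> ~200 wrongly accused
# 99.9% accurate -> ~20 wrongly accused
```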

The most plausible route to detection would be someone finding a remarkably stable signature that magically works across all the models out there (hundreds by now), doesn’t break with updates (lol, see ChatGPT presumably getting worse), survives quantisation, and can somehow be kept secret from everyone, including AIs that can trivially spot patterns in massive datasets. Not Going To Happen.

Even if detection were possible, it would be model- or technology-specific and always lagging; we are moving at 2,000 miles an hour, and in a year it may not be transformers anymore. There’ll be GAN or RNN elements fused in, or something completely new.

The entire point of the technology is to approximate human output, and we are approaching it from the other direction too: more and more conventional tools embed AI, from cameras that can no longer take an AI-untouched picture, to Photoshop infill, to word autocomplete, to the new spellchecking and grammar models.

People latch onto the idea that you can detect it because it’s an escapist fantasy, copium, so they don’t have to face the change that is happening. If you could detect it, you could keep it out. You can’t. Not against anyone with even the slightest idea of how to use this stuff.

It’s like when gunpowder was invented and samurai threw themselves at the machine guns, because it rendered moot decades of training and perfection, of knowledge about fortification, war, and survival.

For video, detection will remain viable for a long time because of the available entropy. Text? It’s always been snake oil, and everyone peddling it should be shot.

0 points

How not? Have you ever talked to ChatGPT? It’s full of blatant lies and failures to understand context.

2 points

And? Blatant lies are not exclusive to AI text. Right-wing media are full of blatant lies, yet they’re written by humans (for now).

The problem is that if you prompt the AI properly, you get exactly what you want. Prompt it a hundred times and you get a hundred different texts, posted to a hundred different social media channels, generating hype. How on earth would you detect that?
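Mechanically it’s a ten-line loop; a sketch against the OpenAI chat API, where the model name and prompt are placeholders:

```python
# Sketch: mass-producing distinct texts on one theme.
# Model name, prompt, and client setup are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
variants = []
for i in range(100):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write hype post #{i} about product X, unique angle."}],
        temperature=1.0,   # high temperature => each draft differs
    )
    variants.append(resp.choices[0].message.content)
# 100 different texts, ready for 100 different channels.
```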

1 point

Just like your comment, you say? Indistinguishable from human: garbage in, garbage out.

If you actually used the technology rather than being a stochastic parrot, you’d understand :)

3 points
Deleted by creator
1 point

Nope. You’d just ask ChatGPT to generate the conversation with emojis instead of spaces and swap the emojis back afterwards.
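A toy version of that cleanup; the emoji and text are arbitrary:

```python
# Ask the model for emoji-separated output, then normalize it back.
generated = "Dear🐟Professor,🐟please🐟find🐟my🐟essay🐟attached."
cleaned = generated.replace("🐟", " ")
print(cleaned)  # any token-sequence watermark is gone
```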

There are a million variations of this approach, AND it would push people towards Apple, who will launch an on-phone LLM in the next 12 months.

In a year the technology will run locally on any computer; it’s time to give up on the fantasy that this can be detected or controlled. Today you can run a GPT-3.5-alike with 30B parameters on a consumer GPU at home that, with the right fine-tuning, will reach ChatGPT performance.
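That’s not hypothetical: running a quantised model locally is already a couple of lines with, for example, llama-cpp-python (the model file name is a placeholder for any local checkpoint):

```python
# Sketch: fully local generation, no server anyone can watermark or police.
# Model path is a placeholder for any quantised ~30B checkpoint.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-30b.q4_0.gguf", n_ctx=2048)
out = llm("Write a short product review of a toaster.", max_tokens=200)
print(out["choices"][0]["text"])
```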

Just let the idea go, it doesn’t work.

