Don’t use AI to summarize documents — it’s worse than humans in every way

Hahaha what a load of nonsense.

Summarised by Gemini

your post history tells me you’re pretty fucking comfortable with pointless nonsense

Well, to be fair, AI can do it in seconds. Which beats humans.

But whether that is relevant when the results are worthless is another question.

Yeah it changes the task from note taking or summarizing to proofreading.

And proofreading is notably more complex and has a worse failure state than just writing your own summary.

Thing is, you can either do it in real time and not pay as much attention to the goings-on as you write, or do it at the end and forget stuff. There is no harm in the AI summarization. You could instead write a summary yourself and use the AI to check whether you left anything out.

Facts are not a data type for LLMs

I kind of like this because it highlights the way LLMs operate kind of blind and drunk, they’re just really good at predicting the next word.

They’re not good at predicting the next word, they’re good at predicting the next common word while excluding most unique choices.

What results is essentially what you’d get if you made a Venn diagram of human language and only ever used the center of it.
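For anyone who wants to see the "center of the Venn diagram" effect concretely, here is a toy sketch of top-k sampling, which is one common decoding trick and not necessarily what any particular product does. The vocabulary, the probabilities and the function names are all made up for illustration; real models work over far larger vocabularies and usually combine this with other settings like temperature.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; everything else is dropped."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical distribution for the word after "The weather today is"
next_word = {
    "nice": 0.40,
    "cold": 0.30,
    "fine": 0.20,
    "apocalyptic": 0.07,
    "mauve": 0.03,
}

filtered = top_k_filter(next_word, k=3)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample(filtered, rng) for _ in range(1000)]

# The rare words can never appear, no matter how many times you sample.
print(sorted(set(samples)))
```

With the cutoff in place, "apocalyptic" and "mauve" have probability zero, so the unusual choices are not just unlikely but impossible.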

Yes, thanks for clarifying what I meant! AI will never create anything unique unless prompted uniquely and even then it will tend to revert back to what you expect most.

You could use them to know what the text is about, and if it’s worth your reading time. In this situation, it’s fine if the AI makes shit up, as you aren’t reading its output for the information itself anyway; and the distinction between summary and shortened version becomes moot.

However, here’s the catch. If the text is long enough to warrant the question “should I spend my time reading this?”, it should contain an introduction for that very purpose. In other words, if the text is well-written you don’t need this sort of “Gemini/ChatGPT, tell me what this text is about” in the first place.

EDIT: I’m not addressing documents in this. My bad, I know. [In my defence I’m reading shit on a screen the size of an ant.]

@lvxferre @dgerard have you bumped your head?

No, it’s just rambling. My bad.

I focused too much on using AI to summarise and ended up not talking about it summarising documents, even if the text is about the latter.

And… well, the latter is such a dumb idea that I don’t feel like telling people “the text is right, don’t do that”, it’s obvious.

You’d think so, but guess what precise use case LLMs are being pushed hard for.

ChatGPT gives you a bad summary full of hallucinations and, as a result, you choose not to read the text based on that summary.

(For clarity I’ll re-emphasise that my top comment is the result of reading past the word “documents”, so I’m speaking on general grounds about AI “summaries”, not just about AI “summaries” of documents.)

The key here is that the LLM is likely to hallucinate the claims of the text being shortened, but not the topic. So provided that you care about the latter but not the former, in order to decide if you’re going to read the whole thing, it’s good enough.

And that is useful in a few situations. For example, if you have a metaphorical pile of a hundred or so scientific papers, and you only need the ones about a specific topic (like “Indo-European urheimat” or “Argiope spiders” or “banana bonds”).

That brings us back to the OP. The issue with using AI summaries for documents is that you typically know the topic at hand, and you want the content instead. That’s bad because then the hallucinations won’t be “harmless”.

But the claims of the text are often why you read it in the first place! If you have a hundred scientific papers you’re going to read the ones that make claims either supporting or contradicting your research.

You might as well just skim the titles and guess.

Both the use cases here are government documents. I’m baffled at the idea of it being “fine if the AI makes shit up”.

if the text is well-written you don’t need this sort of “Gemini/ChatGPT, tell me what this text is about” in the first place.

And if it’s badly written then the LLM will shit itself.

Now let’s ask ourselves how much of the text in the world is “well-written”?

Or even better, you could apply this to Copilot. How much code in the world is good code? The answer is fucking none, mate.

I had GPT 3.5 break down 6x 45-minute verbatim interviews into bulleted summaries and it did great. I even asked it to anonymize people’s names and it did that too. I did re-read the summaries to make sure no duplicate info or hallucinations existed and it only needed a couple of corrections.

Beats manually summarizing that info myself.

Maybe their prompt sucks?

@RagnarokOnline @dgerard “They failed to say the magic spells correctly”

I also use it for that pretty often. I always double check and usually it’s pretty good. Once in a great while it turns the summary into a complete shitshow but I always catch it on a reread, ask a second time, and it fixes things up. My biggest problem is that I’m dragged into too many useless meetings every week and this saves a ton of time over rereading entire transcripts and doing a poor job of summarizing because I have real work to get back to.

I also use it as a rubber duck. It works pretty well if you tell it what it’s doing and tell it to ask questions.

Isn’t the whole point of rubber duck debugging that the method works when talking to a literal rubber duck?

what if your rubber duck released just an entire fuckton of CO2 into the environment constantly, even when you weren’t talking to it? surely that means it’s better

“Are you sure you’re holding it correctly?”

christ, every damn time

That is how tools tend to work, yes.

we find they tend to post here, though not for long

“tools” doesn’t mean “good”

good tools are designed well enough so it’s clear how they are used, held, or what-fucking-ever.

fuck these simpleton takes are a pain in the arse. They’re always pushed by these idiots that have based their whole world view on fortune cookie aphorisms

Said like a person who wouldn’t be able to correctly hold a hammer on the first try

I got AcausalRobotGPT to summarise your post and it said “I’m not saying it’s always programming.dev, but”

Did you conduct or read all the interviews in full in order to verify no hallucinations?

How did you make sure no hallucinations existed without reading the source material; and if you read the source material, what did using an LLM save you?


Don’t use AI to summarize documents — it’s worse than humans in every way (pivot-to-ai.com)
