Lemdro.id

0 points

8 months ago

So, super-informed OP, tell me how they work. Technically, not in CEO press-release speak. Explain the theory.

PM_ME_VINTAGE_30S [he/him]@lemmy.sdf.org

10 points

8 months ago

I’m not OP, and frankly I don’t really disagree with the characterization of ChatGPT as “fancy autocomplete”. But…

I’m still in the process of reading this cover-to-cover, but Chapter 12.2 of Deep Learning: Foundations and Concepts by Bishop and Bishop explains how natural language transformers work, and then has a short section about LLMs. All of this is in the context of a detailed explanation of the fundamentals of deep learning. The book cites the original papers from which it is derived, most of which are on arXiv. There’s a nice copy on Library Genesis. It requires some multi-variable probability and statistics, and an assload of linear algebra, reviews of which are included.

So obviously when the CEO explains their product they’re going to say anything to make the public accept it. Therefore, their word should not be trusted. However, I think that when AI researchers talk simply about their work, they’re trying to shield people from the mathematical details. Fact of the matter is that behind even a basic AI is a shitload of complicated math.

At least from personal experience, people tend to get really aggressive when I try to explain math concepts to them. So they’re probably assuming based on their experience that you would be better served by some clumsy heuristic explanation.

IMO it is super important for tech-inclined people interested in making the world a better place to learn the fundamentals and limitations of machine learning (what we typically call “AI”) and bring their benefits to the common people. Clearly, these technologies are a boon for the wealthy and powerful, and like always, have been used to fuck over everyone else.

IMO, as it is, AI as a technology has inherent patterns that induce centralization of power, particularly through the requirement for massive datasets (especially for LLMs) and the requirement to understand mathematical fundamentals that only the wealthy can afford to go to school long enough to learn. However, I still think that we can leverage AI technologies for the common good, particularly by developing open-source alternatives, encouraging the use of open and ethically sourced datasets, and distributing the computing load so that people who can’t afford a fancy TPU can still use AI somehow.

I wrote all this because I think that people dismiss AI because it is “needlessly” complex and therefore bullshit. In my view, it is necessarily complex because of the transformative potential it has. If and only if you can spare the time, then I encourage you to learn about machine learning, particularly deep learning and LLMs.

4 points

8 months ago

Fact of the matter is that behind even a basic AI is a shitload of complicated math.

Depending on how simple something can be and still be considered an AI, the math is surprisingly simple compared to what an average person might expect. The theory behind it took a good amount of effort to develop, but to make something like a basic image categorizer (e.g. optical character recognition) you really just need some matrix multiplication and calculating derivatives-- non-math-major college math type stuff.
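To make the "matrix multiplication and derivatives" claim concrete, here is a minimal sketch of a softmax classifier trained by gradient descent, in plain Python. The 3x3 "glyphs", the labels, the function names, and the learning rate are all made up for illustration; a real OCR system is far larger, but the two ingredients are the same.

```python
import math
import random

# Hypothetical toy data: 3x3 glyphs flattened to 9 pixels.
# Label 0 = vertical stroke, label 1 = horizontal stroke.
IMAGES = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0],
]
LABELS = [0, 0, 0, 1, 1, 1]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def forward(W, b, x):
    """The matrix multiplication part: logits = W x + b, then softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def train(images, labels, n_classes=2, lr=0.5, epochs=200, seed=0):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    n_pixels = len(images[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_pixels)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(images, labels):
            p = forward(W, b, x)
            for k in range(n_classes):
                # The derivative part: d(loss)/d(logit_k) = p_k - 1[k == y]
                grad = p[k] - (1.0 if k == y else 0.0)
                for j in range(n_pixels):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    """Return the index of the most probable class."""
    p = forward(W, b, x)
    return max(range(len(p)), key=p.__getitem__)


W, b = train(IMAGES, LABELS)
print([predict(W, b, x) for x in IMAGES])
```

Everything here is a weighted sum, an exponential, and one derivative formula, which is roughly the "non-math-major college math" level described above.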

PM_ME_VINTAGE_30S [he/him]@lemmy.sdf.org

4 points

8 months ago

you really just need some matrix multiplication and calculating derivatives-- non-math-major college math type stuff.

Well sure, you don’t need a math degree for that, but most people really need to put some time into those topics. I.e., that kind of math is complex enough to constitute a barrier to entry into the field, particularly for people with no free time to self-study or money for school.

Said differently: matrix math and basic calculus are hard, just not for you and me.

1 point

8 months ago

Point taken

-3 points

8 months ago

Come on… It’s not impressive to just not be aware of where the bar is for most people. No, it’s not complex math, but you are debating people who read only the headlines and then fill in the rest from their imagination.

3 points

8 months ago

That’s my point. OP doesn’t know the maths, has probably never implemented any sort of ML, and is smugly confident that people pointing out the flaws in a system generating one token at a time are just parroting some line.

These tools are excellent at manipulating text where the user controls both input and output (factoring in the biases they have: I wouldn’t recommend using one for internal communications in a multinational corporation, for example, as they’ll clobber non-European-derived cultures).

Help me summarise my report, draft an abstract for my paper, remove jargon from my email, rewrite my email in the form of a numbered question list, analyse my tone here, write 5 similar versions of this action scene I drafted to help me refine it. All excellent.

Teach me something I don’t know (e.g. summarise an article, answer a question, etc.)? Disaster!

3 points

8 months ago

They can summarize articles fairly well

2 points

8 months ago

No, they can summarise articles very convincingly! Big difference.

They have no model of what’s important, or of truth. Most of the time they probably do OK, but unless you go read the article you’ll never know if they left out something critical, hallucinated details, or inverted the truth or falsity of something.

That’s the problem: they’re not an intern, and they don’t have a human mind. They recognise patterns in articles and patterns in summaries, and they non-deterministically adjust the patterns in the article towards the patterns in summaries of articles. Do you see the problem? They produce stuff that looks very much like an article summary, but they do not summarise: there is no intent, no guarantee of truth, in fact no concern for truth at all except what incidentally falls out of the statistical probability wells.

1 point

8 months ago

That’s a good way of explaining it. I suppose you’re using a stricter definition of summary than I was.

2 points

8 months ago

I think it’s really important to keep in mind the separation between doing a task and producing something which looks like the output of a task when talking about these things. The reason is that their output is tremendously convincing regardless of its accuracy, and given that writing text is something we only see human minds do, it’s so easy to ascribe to the model’s output an intent that we have no reason to believe is there.

Amazingly, it turns out that merely producing something which looks like the output of a task often accomplishes the task along the way, apparently by accident. I have no idea why merely predicting the next plausible word can mean that the model emits something similar to what I would write down if I tried to summarise an article! That’s fascinating! But because it isn’t actually setting out to do that, there’s no guarantee it did, and if I don’t check, a failed output will be indistinguishable to me from a good one, because looking plausible is what the models are built to do above all else.
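For anyone who wants to see "predicting the next plausible word" in its most stripped-down form, here is a sketch of a bigram model in Python: it counts which word follows which, then samples the next word from those counts. This is not how an LLM is built (those use transformers over long contexts); the corpus and function names are made up for illustration. It does show the generate-one-token-at-a-time loop and why the same prompt can produce different outputs.

```python
import random
from collections import Counter, defaultdict

# Hypothetical tiny corpus, invented for this example.
CORPUS = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."


def build_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    """Relative frequencies of the words seen after `prev`."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


def generate(counts, start, n_words, rng):
    """Emit one word at a time, each sampled from the model's distribution."""
    out = [start]
    for _ in range(n_words):
        dist = next_word_distribution(counts, out[-1])
        if not dist:  # no known continuation for this word
            break
        words = list(dist)
        weights = [dist[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(out)


counts = build_bigram_counts(CORPUS)
print(next_word_distribution(counts, "sat"))
# Different seeds can give different continuations of the same prompt.
print(generate(counts, "the", 6, random.Random(1)))
print(generate(counts, "the", 6, random.Random(2)))
```

Every sentence it emits is locally plausible, because each word pair did occur in the corpus, but nothing in the loop checks whether the result is true or even sensible as a whole.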

So I think that’s why we need to keep them in closed loops with person -> model -> person. Explaining why, and intuiting whether a particular application is potentially dangerous or not, is hard if we don’t maintain a clear separation between the different processes driving human vs LLM text output.
