60 points

The real problem with LLM coding, in my opinion, is something much more fundamental than whether it can code correctly or not. One of the biggest problems coding faces right now is code bloat. In my 15 years of writing code, I write so much less code now than when I started, and spend so much more time bolting together existing libraries, dealing with CI/CD bullshit, and all the other hair that software projects have started to grow.

The amount of code is exploding. Nowadays, every website uses ReactJS. Every single tiny website loads god knows how many libraries. Just the other day, I forked and built an open source project that had a simple web front end (a list view, some forms – basic shit), and after building it, npm informed me that it had over a dozen critical vulnerabilities, and dozens more of high severity. I think the total was something like 70?

Until now, all code at least had to be written once by somebody. With ChatGPT, it doesn’t even need to be written once! We can generate arbitrary amounts of code, all the time, whenever we want! We’re going to have so much fucking code, and we have absolutely no idea how to deal with that.

12 points

I don’t think it’s gonna go that way. In my experience, the bigger the chunk of code you make it generate, the more wrong it’s gonna be; and not just in proportion to the size of the chunk, it’s gonna be exponentially more wrong.

It’s only good for generating small chunks of code at a time.

7 points

It won’t be long (maybe 3 years max) before industry adopts some technique for automatically prompting an LLM to generate code that fulfills a certain requirement, then iteratively improving it against test data until it passes all the test cases. And I’m pretty sure there are already ways to get LLMs to generate the test cases too. So this could go nightmarishly wrong very, very fast if industry adopts that technology and starts integrating hundreds of unnecessary libraries or pieces of code that the AI just learned to “spam” everywhere, so to speak. These things are way dumber than we give them credit for.
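None of this even needs new research. A minimal sketch of that generate-and-retest loop, assuming a hypothetical `ask_llm` callable standing in for whatever completion API you’d actually wire up:

```python
import subprocess
import tempfile
from typing import Callable, Optional

def passes_tests(candidate: str, tests: str) -> bool:
    """Run the candidate code plus its unit tests in a subprocess;
    exit code 0 counts as 'all tests pass'."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        return False  # hung code counts as a failure
    return result.returncode == 0

def generate_until_green(
    ask_llm: Callable[[str], str],  # hypothetical prompt -> code function
    requirement: str,
    tests: str,
    max_iters: int = 5,
) -> Optional[str]:
    prompt = requirement
    for _ in range(max_iters):
        candidate = ask_llm(prompt)
        if passes_tests(candidate, tests):
            return candidate
        # Feed the failing attempt back in and ask for a fix.
        prompt = (requirement + "\n\nThis attempt failed its tests:\n"
                  + candidate + "\nFix it.")
    return None  # never converged; a human has to look at it after all
```

Note that the only quality gate in that loop is the tests passing; nothing in it ever asks whether the generated code, or the libraries it pulls in, should exist at all.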

7 points

Oh, that’s definitely going to lead to some hilarious situations, but I don’t think we’re gonna see a complete breakdown of the whole IT sector. There’s no way companies/institutions that do really mission-critical work (kernels, firmware, automotive/aerospace software, certain kinds of banking/finance software, etc.) will let AI write that code any time soon. The rest of the stuff isn’t really that important, and it isn’t that big of a deal if it breaks for a few hours/days because the AI spazzed out.

2 points

Yes, I agree. I meant the fundamental problem with the idea of LLMs writing more and more of our code, even if they get quite good at it.

12 points

This is so true. I feel like my main job as a senior software engineer is to keep the bloat low and delete unused code. It’s very easy to write code; maintaining it and focusing on the important bits is hard.

This will be one of the biggest and most challenging problems Computer Science will have to solve in the coming years and decades.

6 points

It’s easy and fun to write new code, and it wins management’s respect. The harder work of maintaining and improving large code bases and data goes mostly unappreciated.

There’s the other half of this problem, which is that the kind of code LLMs are relatively good at pumping out with some degree of correctness is almost always the code that wasn’t difficult to begin with. A sorting algorithm on command is nice, but if you’re working on any kind of novel implementation, then the hard bits are the business logic, which in all likelihood has never been written before and is either sensitive information or just convoluted enough to make turning it into a prompt difficult. You still need coders who understand architecture and can convert requirements into raw logic, even with the LLMs.

6 points

Makes the Adeptus Mechanicus look like a realistic future: really advanced tech, but no one knows how it works.

5 points

uhhhh let me code here

36 points

Every time I’ve used it to generate code, it invents frameworks that don’t exist

32 points

CTO material right there.

Lmaooo

15 points

Just like an average enterprise architect.

6 points

I’ve had some success with it when I give it small tasks and describe them in as much detail as possible. By design (from what I gather) it can only work with stuff it was able to see in training, which means the language needs to be documented extensively for it to work well.

Stuff like WordPress or MediaWiki code it’s generally good at; it actually helped me make the modules and templates I needed on MediaWiki, but for both of those there’s like a decade of forum posts, documentation, papers, and other material it could train on. Fun fact: for one specific problem (using a MediaWiki template to display a different message depending on whether you are logged in or not), it systematically gives me the same answer no matter how I ask. It’s only after enough probing that GPT tells me that, because of caching issues, this is not possible lol. I figure someone must have asked about this same template somewhere, and it’s the only thing from its training set it can work off of to answer that question.

I also always double-check the code it gives me for any error or things that don’t exist.

21 points

Relatable

21 points

If I’m going to use AI for something, I want it to be right more often than I am, not just as often!

14 points

It actually doesn’t have to be. For example, the way I use GitHub Copilot is to give it a code snippet to generate; if it’s wrong, I just write a bit more code myself, and then it usually gets it right after 2-3 iterations, which still saves me time.

The trick is that you need to be able to quickly tell whether the code is what you want, which means you need a bit of experience under your belt, so AI is pretty useless, if not actively harmful, for junior devs.

Overall it’s a good tool if you can get your company to shell out $20 a month for it; not sure I’d pay for it out of my own pocket tho.

11 points

It… it was a joke. I was implying that 52% was better than me.

9 points

Ah ok I guess I misread that. My point is that by itself it’s not gonna help you write either better or shittier code than you already do.

8 points

GitHub Copilot is just intellisense that can complete longer code blocks.

I’ve found that it can somewhat regularly predict a couple lines of code that generally resemble what I was going to type, but it very rarely gives me correct completions. By a fairly wide margin, I end up needing to correct a piece or two. To your point, it can absolutely be detrimental to juniors or new learners by introducing bugs that are sometimes nastily subtle. I also find it getting in the way only a bit less frequently than it helps.

I do recommend that experienced developers give it a shot, because it has been a helpful tool. But to be clear: it’s really only a tool that helps me type faster. By no means does it help me produce better code, and I don’t ever see it fully replacing developers like the doomsayers like to preach. That being said, I think it’s $20 well spent for a company, in that it easily saves more than $20 worth of my time each month.

21 points

Wait a second here… I skimmed the paper and the GitHub repo and didn’t find an answer to a very important question: is this GPT-3.5 or GPT-4? There’s a huge difference in code quality between the two, and either they made a giant accidental omission or they are being intentionally misleading. Please correct me if I missed where they specified that. I’m assuming they were using GPT-3.5, so yeah, those results would be as expected. On the HumanEval benchmark, GPT-4 gets 67%, and that goes up to 90% with Reflexion prompting. GPT-3.5 gets 48.1%, which is exactly what this paper is saying. (source)
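For reference, HumanEval scoring is conceptually simple: a sampled completion counts as passing if the task’s hidden unit tests run clean against it. A rough sketch of the idea (the `Task` shape and the names here are illustrative, not the benchmark’s actual harness):

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str      # function signature + docstring shown to the model
    completion: str  # the code the model produced
    tests: str       # tests assumed to call the function and raise on failure

def passes(task: Task) -> bool:
    program = task.prompt + task.completion + "\n\n" + task.tests
    try:
        # The real harness sandboxes this; never exec untrusted model output like this.
        exec(program, {})
        return True
    except Exception:
        return False

def pass_at_1(tasks: list[Task]) -> float:
    """Fraction of tasks whose single sampled completion passes its tests."""
    return sum(passes(t) for t in tasks) / len(tasks)
```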

ChatGPT is 3.5, 4 is just called GPT4

4 points

Hmm that’s incorrect. ChatGPT (if you pay for it) does both.

I’m talking about the models and how they’re written about in the literature. I don’t care how OpenAI brands their products.

From the paper itself:

For the additional 2000 SO questions, ChatGPT 3.5 Turbo API is used.

https://arxiv.org/pdf/2308.02312.pdf

3 points

Whatever GitHub Copilot uses (the version with the chat feature), I don’t find its code answers to be particularly accurate. Do we know which version that product uses?

4 points

If we are talking Copilot then that’s not ChatGPT. But I agree it’s ok. Like it can do simple things well but I go to GPT 4 for the hard stuff. (Or my own brain haha)

3 points

Is GPT4 publicly available?

3 points

Yes… If you pay $20 a month

3 points

Yes, it’s available to anyone via the API, or to anyone who pays for a ChatGPT subscription.

-2 points

Oh that’s possible, not sure which one they used either.
