ChatGPT gets code questions wrong 52% of the time

humanplayer2@lemmy.ml

1 point

1 year ago

Condorcet sobs “so close”.

Fuckass@hexbear.net

20 points

1 year ago

Deleted by creator

HiddenLayer5@lemmy.ml

9 points

1 year ago

Because it was marketing hype (read: marketing propaganda).

GBU_28@lemm.ee

6 points

1 year ago

The trick is you have to correct for the hallucinations and teach it to return to a healthy path when it goes off course. This isn’t possible with current consumer tools.

s20@lemmy.ml

21 points

1 year ago

If I’m going to use AI for something, I want it to be right more often than I am, not just as often!

space_comrade [he/him]@hexbear.net

14 points

1 year ago

It actually doesn’t have to be. For example, the way I use GitHub Copilot is to give it a code snippet to generate, and if it’s wrong I just write a bit more code; it usually gets it right after 2-3 iterations, and it still saves me time.

The trick is you should be able to quickly determine if the code is what you want which means you need to have a bit of experience under your belt, so AI is pretty useless if not actively harmful for junior devs.

Overall it’s a good tool if you can get your company to shell out $20 a month for it; not sure if I’d pay for it out of my own pocket, though.

jvisick@programming.dev

8 points

1 year ago

GitHub Copilot is just intellisense that can complete longer code blocks.

I’ve found that it can somewhat regularly predict a couple lines of code that generally resemble what I was going to type, but it very rarely gives me correct completions. By a fairly wide margin, I end up needing to correct a piece or two. To your point, it can absolutely be detrimental to juniors or new learners by introducing bugs that are sometimes nastily subtle. I also find it getting in the way only a bit less frequently than it helps.

I do recommend that experienced developers give it a shot, because it has been a helpful tool. But to be clear, it’s really only a tool that helps me type faster. By no means does it help me produce better code, and I don’t ever see it fully replacing developers the way the doomsayers like to preach. That being said, I think it’s $20 well spent for a company, in that it easily saves more than $20 worth of my salary in time each month.

s20@lemmy.ml

11 points

1 year ago

It… it was a joke. I was implying that 52% was better than me.

space_comrade [he/him]@hexbear.net

9 points

1 year ago

Ah ok I guess I misread that. My point is that by itself it’s not gonna help you write either better or shittier code than you already do.

r00ty@kbin.life

15 points

1 year ago

I used ChatGPT once. It created non-functional code, but the general idea did help me get to where I wanted. Maybe it works better as a rubber duck substitute?

GBU_28@lemm.ee

2 points

1 year ago

Use it as a boilerplate blaster, for shit you could write yourself.

dom@lemmy.ca

1 point

1 year ago

I did my first game jam with the help of ChatGPT. It didn’t write any code in the game, but I was able to ask it how to accomplish certain things in general, and it would give me ideas that were up to me to implement.

There were other things I knew my engine could do but I couldn’t figure out from the documentation, so I would ask ChatGPT “how do you xyz in Godot” and it would give me step-by-step instructions. This was especially useful for the things that get done in the engine UI and not in code.

☆ Yσɠƚԋσʂ ☆@lemmy.mlOP

0 points

1 year ago

Yeah, generating some ideas to get you going might be the best use for this kind of stuff.

WarmSoda@lemm.ee

6 points

1 year ago

That’s how I view AI-generated art. It can come up with some really cool mashups, but you have to do the rest. Anyone just using what it outputs as if that’s the end of the story isn’t ‘using it right’ in my opinion.

☆ Yσɠƚԋσʂ ☆@lemmy.mlOP

-2 points

1 year ago

Right, I expect stuff like Stable Diffusion will become part of the toolkit actual artists use. The workflows with this stuff are already getting pretty intricate, with people using ControlNet for posing, inpainting for specific details, and so on. I would liken it to photography. You can’t just give a camera to anybody and get good results; it takes a person with skill and taste to produce an interesting image.

EssentialCoffee@midwest.social

1 point

1 year ago

I’m not sure there’s a way to ‘use art right.’

SirGolan@lemmy.sdf.org

21 points

1 year ago

Wait a second here… I skimmed the paper and GitHub and didn’t find an answer to a very important question: is this GPT-3.5 or GPT-4? There’s a huge difference in code quality between the two, so either they made a giant accidental omission or they are being intentionally misleading. Please correct me if I missed where they specified that. I’m assuming they were using GPT-3.5, in which case those results would be as expected. On the HumanEval benchmark, GPT-4 gets 67%, and that goes up to 90% with Reflexion prompting. GPT-3.5 gets 48.1%, which is exactly what this paper is saying. (source).

DPRK_Chopra [comrade/them]@hexbear.net

5 points

1 year ago

ChatGPT is 3.5; 4 is just called GPT-4.

SirGolan@lemmy.sdf.org

4 points

1 year ago

Hmm that’s incorrect. ChatGPT (if you pay for it) does both.

DPRK_Chopra [comrade/them]@hexbear.net

7 points

1 year ago

I’m talking about the models and how they’re written about in the literature. I don’t care how OpenAI brands their products.

From the paper itself:

For the additional 2000 SO questions, ChatGPT 3.5 Turbo API is used.

https://arxiv.org/pdf/2308.02312.pdf

3 points

1 year ago

Whatever GitHub Copilot uses (the version with the chat feature), I don’t find its code answers to be particularly accurate. Do we know which version that product uses?

SirGolan@lemmy.sdf.org

4 points

1 year ago

If we are talking Copilot then that’s not ChatGPT. But I agree it’s ok. Like it can do simple things well but I go to GPT 4 for the hard stuff. (Or my own brain haha)

☆ Yσɠƚԋσʂ ☆@lemmy.mlOP

-2 points

1 year ago

Oh that’s possible, not sure which one they used either.
