"AI" CTOs when confronted with some of the most basic follow-up questions in existence (hexbear.net)

posted 9 months ago

Posadas [he/him, they/them]@hexbear.net

chapotraphouse@hexbear.net

17 comments

https://twitter.com/TechBurritoUno/status/1768363023192768799

Thordros [he/him, comrade/them]@hexbear.net

35 points

9 months ago

Q: “So, what data did you use to train your model?”

A: “I am sorry, my capability to answer this question is limited, as I am an AI language model. I am not privy to the inner workings of private organizations, and can only answer general questions.”

junebug2 [comrade/them, she/her]@hexbear.net

29 points

9 months ago

I feel like it’s not that she doesn’t know the answers, it’s that the answers are not politically convenient. My understanding of the situation is that in the course of training the various GPTs, OpenAI and Microsoft have realistically scanned every piece of text and imagery that’s available on the internet. It didn’t matter how good or bad it was or who made it, the models needed every available data point. That was all well and good until covid led to a tightening of interest rates, which meant the VC overlords of Silicon Valley finally had to pay a bill. All the vaporware companies that have never turned a profit are scrambling now, and we see the mass layoffs of the last three years. Microsoft, however, got to be King Shit of VC Mountain because one of their startups invented “AI”. Say what you will about it (and I will), the public interest in and corporate adoption of AI has meant that there is positive revenue for a tech company. Now, regardless of rationality, all tech executives must find a way to cash in on the Golden Calf. Some companies are designing new applications or creating new services. The majority are realizing that they somehow, sort of, kind of, are the original data the models were made from, and they’re trying to extract rents from it. For now, that really only applies to future content. If the CTO here publicly claims that their product relies on YouTube or anything, Alphabet or whatever parent company would be stupid not to come and sue for whatever they might get.

SSJ2Marx@hexbear.net

22 points

9 months ago

The way she keeps repeating “publicly available and licensed data” makes me one hundred percent positive that this is a lawyer-written phrase and she knows that she’s in a potential legal minefield and is sticking as closely to it as she can.

hello_hello [comrade/them]@hexbear.net

15 points

9 months ago

Please give the answer, I have Twitter blocked on my DNS.

henfredemars@infosec.pub

19 points

9 months ago

My quick lazy manual transcription:

What data was used to train Sora?
We used publicly available data and licensed data.

So, videos on YouTube?
I’m actually not sure about that.

OK, videos from Facebook? Instagram?
You know if they were publicly available, um yeah, publicly available to use there might be the data but I’m not sure. I’m not confident about it.

What about Shutterstock? I know you guys have a deal with them.
I’m just not gonna go into the details of the data that was used but it was publicly available or licensed data.

EDIT: Please help, can’t figure out how to preserve line breaks. Edit: Improved it a bit.

InevitableSwing [none/use name]@hexbear.net

10 points

9 months ago

Put two spaces at the end of each line.

---

Yada yada verse
Yada yada verse
Yada yada verse

Yada yada chorus
Yada yada chorus
Yada yada chorus

henfredemars@infosec.pub

5 points

9 months ago

Thank you! I’m not sure how or why that works.

flan [they/them]@hexbear.net

11 points

9 months ago

what’s this picture

half_giraffe [comrade/them]@hexbear.net

13 points

9 months ago

I thought it was like a meme but it’s literally a still from the interview. That’s the CTO of OpenAI saying she’s not sure what data was used to train the models lmao.

flan [they/them]@hexbear.net

11 points

9 months ago

OpenAI are data goblins, so the lawyers probably don’t allow anyone in the C-suite to know where the data comes from.
