Office space meme:

“If y’all could stop calling an LLM ‘open source’ just because they published the weights… that would be great.”

103 points

Even worse is calling a proprietary, absolutely closed-source, closed-data and closed-weight company “OpenAI”.

38 points

Especially after it was founded as a nonprofit with the mission of pushing open source AI as far and wide as possible, to ensure a multipolar AI ecosystem in which AIs keep each other in check and stay respectful and prosocial.

22 points

Sorry, that was a PR move from the get-go. Sam Altman doesn’t have an altruistic cell in his whole body.

15 points

It’s even crazier that Sam Altman and other ML devs said that current machine learning models reached the peak of what they’re capable of years ago:

https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/

But that doesn’t mean shit to the marketing departments

9 points

“Look at this shiny.”

Investment goes up.

“Same shiny, but look at it and we need to warn you that we’re developing a shinier one that could harm everyone. But think of how shiny.”

Investment goes up.

“Look at this shiny.”

Investment goes up.

“Same shiny, but look at it and we need to warn you that we’re developing a shinier one that could harm everyone. But think of how shiny.”

29 points

The training data would be incredibly big. And it would contain copyright-protected material (which is completely okay in my opinion, but might invite criticism). Hell, it might even be illegal to publish the training data with the copyright-protected material in it.

They published the weights AND their training methods, which is about as open as it gets.

20 points

They could disclose how they sourced the training data, what the training data actually is, and how you could source it yourself. Also, did they publish their hyperparameters?

They could just not call it Open Source, if you can’t open source it.
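To be concrete about what “disclose” could even look like: a minimal, machine-readable data/hyperparameter card would already be a huge step. Rough sketch below; every source, field name, and number in it is made up purely for illustration, not taken from any actual release.

```python
import json

# Hypothetical "release manifest" sketch. Everything below is invented to
# illustrate the kind of disclosure meant above; it describes no real model.
release_manifest = {
    "data_sources": [
        {"name": "public web crawl", "how_sourced": "downloaded crawl dumps", "license": "mixed"},
        {"name": "licensed book corpus", "how_sourced": "commercial agreement", "license": "proprietary"},
    ],
    "preprocessing": ["deduplication", "language filtering", "quality filtering"],
    "hyperparameters": {
        "optimizer": "AdamW",
        "peak_learning_rate": 3e-4,
        "context_length": 4096,
        "global_batch_size_tokens": 4_000_000,
        "total_training_tokens": 2_000_000_000_000,
    },
}

# Print the manifest as the kind of artifact that could ship with the weights.
print(json.dumps(release_manifest, indent=2))
```

None of that requires shipping terabytes of raw text, and it still wouldn’t make the model Open Source on its own, but it would at least make the “open” claim auditable.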

12 points

For neural nets the method matters more. Data would be useful, but at the amount these things get trained on, the specific data matters little.

They can be trained on anything, and a diverse enough data set would end up making it function more or less the same as a different but equally diverse set. Assuming publicly available data is in the set, there would also be overlap.

The training data is also by necessity going to be orders of magnitude larger than the model itself. Sharing becomes impractical at a certain point before you even factor in other issues.
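Rough back-of-the-envelope to put numbers on that (all assumptions, not figures from any actual release): a 70B-parameter model in 16-bit is around 140 GB of weights, while even a conservative Chinchilla-style ~20 tokens per parameter, at roughly 4 bytes of raw text per token, is already several terabytes.

```python
# Back-of-the-envelope only; every number here is an assumption,
# not a figure from any real model release.
params = 70e9                      # assume a 70B-parameter model
weight_bytes = params * 2          # fp16/bf16: 2 bytes per parameter
tokens = params * 20               # Chinchilla-style ~20 tokens per parameter
text_bytes = tokens * 4            # ~4 bytes of raw text per token (rough)

print(f"weights:  ~{weight_bytes / 1e9:,.0f} GB")        # ~140 GB
print(f"raw text: ~{text_bytes / 1e12:,.1f} TB")         # ~5.6 TB
print(f"data/weights ratio: ~{text_bytes / weight_bytes:,.0f}x")
```

And modern models are trained on far more than 20 tokens per parameter, before you even count the raw crawls the final token set gets filtered from, so the real gap is wider still.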

2 points

That… doesn’t align with years of research. Data is king. As someone who specifically studied long-tail distributions and few-shot learning (before succumbing to long COVID, sorry if my response is a bit scattered): throwing more data at a problem always improves it more than the method does, and the method can only be simplified with more data. The exceptions are some neat tricks that modern deep learning has written off as hogwash and “classical”, and most of those don’t scale to what’s being looked at here anyway.

Also, datasets inherently impose bias upon networks, and it’s easier to create adversarial examples that fool two networks trained on the same data than the same network twice freshly trained on different data.

Sharing metadata and acquisition methods is important and should be the gold standard. Sharing network methods is also important, but that’s kind of the silver standard, just because most modern state-of-the-art models differ so minutely from each other in performance nowadays.

Open source as a term should require both. This was the standard in the academic community before tech bros started running their mouths, and should be the standard once they leave us alone.

2 points

Hell, for all we know it could be full of classified data. I guess depending on what country you’re in it definitely is full of classified data…

11 points

I like how when America does it we call it AI, and when China does it it’s just an LLM!

7 points

I’m including Facebook’s LLM in my critique. And I dislike the current hype on LLMs, no matter where they’re developed.

And LLMs are not “AI”. I’ve been calling them “so-called ‘AIs’” since waaay before this.

6 points

Yeah, this shit drives me crazy. Putting aside the fact that it all runs off stolen data from regular people who are being exploited, most of this “AI” shit is basically just freeware, if anything; it’s about as “open source” as Winamp was back in the day.

12 points

Judging by OP’s salt in the comments, I’m guessing they might be an Nvidia investor. My condolences.

7 points

Nah, just a 21st century Luddite.

