‘In awe’: scientists impressed by latest ChatGPT model o1 (www.nature.com)

posted 2 months ago

QuillcrestFalconer [he/him]@hexbear.net

technology@hexbear.net

38 comments

I know people here are very skeptical of AI in general, and there is definitely a lot of hype, but I think the progress in the last decade has been incredible.

Here are some quotes

“In my field of quantum physics, it gives significantly more detailed and coherent responses” than did the company’s last model, GPT-4o, says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany.

Strikingly, o1 has become the first large language model to beat PhD-level scholars on the hardest series of questions — the ‘diamond’ set — in a test called the Graduate-Level Google-Proof Q&A Benchmark (GPQA). OpenAI says that its scholars scored just under 70% on GPQA Diamond, and o1 scored 78% overall, with a particularly high score of 93% in physics.

OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

Catherine Brownstein, a geneticist at Boston Children’s Hospital in Massachusetts, says the hospital is currently testing several AI systems, including o1-preview, for applications such as connecting the dots between patient characteristics and genes for rare diseases. She says o1 “is more accurate and gives options I didn’t think were possible from a chatbot”.

hotcouchguy [he/him]@hexbear.net

49 points

2 months ago

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

Bro it was trained on your thesis

UlyssesT [he/him]@hexbear.net

29 points

2 months ago

Deleted by creator

woodenghost [comrade/them]@hexbear.net

5 points

2 months ago

I get it, but code isn’t usually included in publications. Unless it was put on GitHub.

Barx [none/use name]@hexbear.net

7 points

2 months ago

Physicist code tends to be pretty simple, particularly when it’s just implementing some closed form solution. It is also possible that a model focused on parsing the math in papers - like equations in his thesis - would just reproduce this in Python or whatever.

KobaCumTribute [she/her]@hexbear.net

43 points

2 months ago

All of their models have consistently done pretty good on any sort of standard test, and then performed horribly in real use. Which makes sense, because if they can train it specifically to make something that looks like the answers to that test it will probably be good at making the answers to that, but it’s still fundamentally just a language parser and predictor without knowledge or any sort of internal modeling.

Their entire approach is just so fundamentally lazy and grifty, burning massive amounts of energy on what is fundamentally a dumbshit approach to building AI. It’s like trying to make a brain by just making the speech processing lobe bigger and bigger and expecting it’ll eventually get so good at talking that the things it says will be intrinsically right instead of only looking like text.

context [fae/faer, fae/faer]@hexbear.net

6 points

2 months ago

All of their models have consistently done pretty good on any sort of standard test, and then performed horribly in real use.

fuck maybe i am a chatbot

Dirt_Owl [comrade/them, they/them]@hexbear.net

32 points

2 months ago

Oh cool, they’re using it in hospitals now

gay_king_prince_charles [she/her, he/him]@hexbear.net

18 points

2 months ago

And the bio accuracy is 31% wrong…

dat_math [they/them]@hexbear.net

11 points

2 months ago

It pains me to say this is better than some of the physicians I’ve worked with.

UlyssesT [he/him]@hexbear.net

5 points

2 months ago

Deleted by creator

InevitableSwing [none/use name]@hexbear.net

29 points

2 months ago

There is definitely a lot of hype.

I’m not being sarcastic when I say I have yet to see a single real world example where the AI does extraordinarily well and lives up to the hype. It’s always the same.

It’s brilliant!*

*When it’s spoonfed in a non-real-world situation. Your results may vary. Void where prohibited.

OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.

Ah, I read an article on the Mathematics Olympiad. The NYT agrees!..

Move Over, Mathematicians, Here Comes AlphaProof

A.I. is getting good at math — and might soon make a worthy collaborator for humans.

The problem - as always - is the US media is shit. Comments on that article by randos are better and far more informative than that PR-hype article pretending to be journalism.

Major problem with this article: competition math problems use a standardized collection of solution techniques, it is known in advance that a solution exists, and that the solution can be obtained by a prepared competitor within a few hours.

“Applying known solutions to problems of bounded complexity” is exactly what machines always do and doesn’t compete with the frontier in any discipline.

---

Note in the caption of the figure that the problem had to be translated into a formalized statement in AlphaGeometry’s own language (presumably by people). This is often the hardest part of solving one of these problems.

AI tech bros keep promising the moon and the stars. But then their AI doesn’t deliver so tech bros lie even more about everything to get more funding. But things don’t pan out again. And the churn continues. Tech bros promise the moon and the stars…

UlyssesT [he/him]@hexbear.net

13 points

2 months ago

Deleted by creator

batsforpeace [any, any]@hexbear.net

4 points

2 months ago

Despite skepticism over whether nuclear fusion—which doesn’t emit greenhouse gases or carbon dioxide—will actually come to fruition in the next few years or decades, Gates said he remains optimistic. “Although their timeframes are further out, I think the role of fusion over time will be very, very critical,” he told The Verge.

don’t worry climate folks, we will throw some dollars at nuclear fusion startups and they will make us beautiful clean energy for AI datacenters in just a few years, only a few more years of big fossil fuel use while we wait, promise

Oracle currently has 162 data centers in operation and under construction globally, Ellison told analysts during a recent earnings call, adding that he expects the company to eventually have 1,000 to 2,000 of these facilities. The company’s largest data center is 800 megawatts and will contain “acres” of Nvidia (NVDA)’s graphics processing units (GPUs) to train A.I. models, he said.

I want football fields of gpus

Ellison described a dinner with Elon Musk and Jensen Huang, the CEO of Nvidia, where the Oracle head and Musk were “begging” Jensen for more A.I. chips. “Please take our money. No, take more of it. You’re not taking enough, we need you to take more of it,” recalled Ellison, who said the strategy worked.

give us more chips brooo

Tomorrow_Farewell [any, they/them]@hexbear.net

5 points

2 months ago

Barx [none/use name]@hexbear.net

6 points

2 months ago

You have to admire the grift.

Shame it requires the energy use of entire countries and is a weapon for disciplining labor.

hypercracker@hexbear.net

15 points

2 months ago

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

yeah I’m gonna doubt that, or he didn’t actually compile/run/test that code. like all LLMs it’s amazing until you interact with it a bit and see how incredibly limited it is.
