OpenAI’s o3 model falls short of benchmark expectations, scores lower than initial claim

OpenAI’s recently released o3 AI model is facing scrutiny after independent testing revealed it scored significantly lower than previously reported on a leading mathematics benchmark.

The discrepancy has sparked debate in the AI community, raising questions about performance transparency and model comparisons.

Back in December 2024, OpenAI showcased its new o3 model during a live announcement, touting its advanced reasoning abilities. During the presentation, the company claimed that o3 achieved an unprecedented 25% score on the FrontierMath benchmark - a rigorous test designed by more than 70 mathematicians to evaluate problem-solving capabilities in AI. The benchmark is considered resistant to overfitting, as it uses entirely unpublished math problems.

However, with the public release of the o3 and o4-mini models last week, Epoch AI - the organisation behind the FrontierMath test - ran its own evaluation and reported a much lower score: just 10%. While this score still makes o3 the top-performing model on the benchmark, it falls well short of the previously claimed 25%.

Importantly, the gap does not necessarily mean OpenAI misled the public.

Experts believe the version of o3 OpenAI used internally in December likely ran with significantly more compute than the public release. To make the model efficient enough for everyday use, it may have been tuned for speed and cost, sacrificing some of its raw performance.

Supporting this, the ARC Prize organisation - which oversees the ARC-AGI benchmark for general intelligence - also commented on the discrepancy. It confirmed that the publicly available o3 model is not the same as the one tested late last year. According to ARC, the released version runs at lower compute tiers than the version it evaluated, and it was not trained on ARC-AGI data at any stage.

Both ARC Prize and Epoch AI have announced plans to re-evaluate the newly released o3 and o4-mini models and update their benchmark results accordingly.

