
OpenAI's o3 and o4

Source: Feature Flash | Editor: hotspot | Time: 2025-07-02 22:48:27

By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.

As first reported by TechCrunch, OpenAI's system card details results from PersonQA, an evaluation designed to test for hallucinations. In that evaluation, o3's hallucination rate is 33 percent and o4-mini's is 48 percent, which is almost half of the time. By comparison, o1's hallucination rate is 16 percent, meaning o3 hallucinated about twice as often.
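For readers who want to check the "about twice as often" comparison, here is a minimal sketch of the arithmetic. It uses only the PersonQA percentages quoted above; the function name `ratio_to_baseline` is a hypothetical helper for illustration and is not part of any OpenAI tooling.

```python
# PersonQA hallucination rates as quoted in this article, in percent.
rates = {"o1": 16, "o3": 33, "o4-mini": 48}


def ratio_to_baseline(rates, baseline="o1"):
    """Return each model's hallucination rate relative to the baseline model."""
    base = rates[baseline]
    return {model: round(rate / base, 2) for model, rate in rates.items()}


ratios = ratio_to_baseline(rates)
print(ratios)  # {'o1': 1.0, 'o3': 2.06, 'o4-mini': 3.0}
```

On these figures, o3's rate is roughly double o1's and o4-mini's is three times o1's.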

The system card noted how o3 "tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims." But OpenAI doesn't know the underlying cause, simply saying, "More research is needed to understand the cause of this result."

OpenAI's reasoning models are billed as more accurate than its non-reasoning models like GPT-4o and GPT-4.5 because they use more computation to "spend more time thinking before they respond," as described in the o1 announcement. Rather than largely relying on stochastic methods to provide an answer, the o-series models are trained to "refine their thinking process, try different strategies, and recognize their mistakes."

However, the system card for GPT-4.5, which was released in February, shows a 19 percent hallucination rate on the PersonQA evaluation. The same card also compares it to GPT-4o, which had a 30 percent hallucination rate.

In a statement to Mashable, an OpenAI spokesperson said, “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability.”

Evaluation benchmarks are tricky. They can be subjective, especially if developed in-house, and research has found flaws in their datasets and even how they evaluate models.

Plus, different evaluators rely on different benchmarks and methods to test accuracy and hallucinations. HuggingFace's hallucination benchmark evaluates models on the "occurrence of hallucinations in generated summaries" from around 1,000 public documents, and it found much lower hallucination rates for major models on the market than OpenAI's evaluations did. GPT-4o scored 1.5 percent, GPT-4.5 preview 1.2 percent, and o3-mini-high with reasoning scored 0.8 percent. It's worth noting that o3 and o4-mini weren't included in the current leaderboard.

That's all to say, even industry-standard benchmarks make it difficult to assess hallucination rates.

Then there's the added complexity that models tend to be more accurate when tapping into web search to source their answers. But in order to use ChatGPT search, OpenAI shares data with third-party search providers, and Enterprise customers using OpenAI models internally might not be willing to expose their prompts to that.

Regardless, if OpenAI is saying its brand-new o3 and o4-mini models hallucinate more often than its non-reasoning models, that might be a problem for its users.

UPDATE: Apr. 21, 2025, 1:16 p.m. EDT This story has been updated with a statement from OpenAI.
