GPT-5.4 Unleashed: AI Masters Computer Use and Outperforms Experts – OVEX NEWS

GPT-5.4 Arrives with Native Computer Use, Outperforming Human Experts

The artificial intelligence landscape is buzzing with a significant announcement: the release of GPT-5.4. This latest iteration from OpenAI appears to represent a substantial leap forward, particularly with its newly integrated computer use capabilities and impressive performance on tasks previously dominated by human experts. The sentiment from industry observers, like Noam Brown, suggests a rapid acceleration in AI development, with the phrase “we see no wall” capturing the feeling of relentless progress.

GPT-5.4 Dominates Expert Benchmarks

One of the most striking aspects of GPT-5.4 is its performance on the GDPval benchmark. This benchmark evaluates AI models against human experts, who typically have around 12-14 years of experience in their respective fields. These experts create rubrics to grade complex project deliverables across various industries, from manufacturing engineering to order processing and production.

Historically, AI models have struggled to match the nuanced judgment and quality of work produced by seasoned professionals. However, GPT-5.4 and its Pro variant are now achieving remarkable success rates. GPT-5.4 Pro, for instance, reportedly scores an 82% win-or-tie rate against human expert deliverables, with a direct win rate of around 70%. In other words, in roughly four out of five comparisons the AI's output is judged equal to or better than that of a human with years of specialized experience. This advancement raises significant questions about the future of certain professions and the potential for AI to automate complex, high-skill tasks.
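To make the relationship between the two reported figures concrete, here is a minimal sketch of how a win rate and a win-or-tie rate could be tallied from pairwise judgments. The function name `summarize_judgments` and the 70/12/18 split are illustrative assumptions chosen to match the reported percentages; they are not the benchmark's actual data or scoring code.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_judgments(judgments: Iterable[str]) -> Dict[str, float]:
    """Tally pairwise outcomes ("win", "tie", "loss") into rates.

    Each label describes how the model's deliverable was graded
    against the human expert's deliverable for the same task.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments supplied")
    return {
        "win": counts["win"] / total,
        "tie": counts["tie"] / total,
        # Sum the counts before dividing to avoid float rounding drift.
        "win_or_tie": (counts["win"] + counts["tie"]) / total,
    }


# Hypothetical tally: 100 comparisons chosen to mirror the reported rates.
rates = summarize_judgments(["win"] * 70 + ["tie"] * 12 + ["loss"] * 18)
print(rates)  # {'win': 0.7, 'tie': 0.12, 'win_or_tie': 0.82}
```

Under these assumed counts, the gap between the two headline numbers is the tie rate: about 12% of comparisons would be judged a draw.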

Native Computer Use: A New Frontier for AI Agents

Beyond its prowess in expert-level tasks, GPT-5.4 introduces native computer use capabilities, a feature described as unprecedented in a general-purpose model. This integration allows AI agents to perform tasks across websites and software systems by directly interacting with computer interfaces. Developers can leverage this to build sophisticated agents capable of navigating digital environments, issuing commands, and responding to visual cues.

The model demonstrates proficiency in writing code for browser automation, for example with Playwright, and can interpret screenshots to issue mouse and keyboard commands. On the OSWorld benchmark, which tests a model's ability to navigate a desktop environment, GPT-5.4 achieved a state-of-the-art 75% success rate. This not only surpasses its predecessor, GPT-5.2 (which scored 47%), but also exceeds human performance, which stands at 72.4%. This capability opens up new avenues for AI in areas like software testing, troubleshooting visual applications, and even automated game development and testing, as evidenced by a developer who used GPT-5.4 and Codex to build and test an RPG.
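The screenshot-to-action pattern described above can be sketched as a simple observe-act loop. Everything in this example is a stand-in: `FakeDesktop` is a toy environment, `scripted_policy` is a hypothetical placeholder where a real agent would call the model, and the "screenshot" is a text description rather than pixels. It does not use any OpenAI or Playwright API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str      # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Toy screen with one text field at (10, 10) and a submit button at (10, 40)."""

    def __init__(self) -> None:
        self.field = ""
        self.focused = False
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would receive an image; this returns a text summary.
        return f"field={self.field!r} focused={self.focused} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "click" and (action.x, action.y) == (10, 10):
            self.focused = True
        elif action.kind == "click" and (action.x, action.y) == (10, 40):
            self.submitted = bool(self.field)  # submit only works with text entered
        elif action.kind == "type" and self.focused:
            self.field += action.text


def scripted_policy(observation: str) -> Action:
    """Hypothetical placeholder for the model: maps an observation to one action."""
    if "submitted=True" in observation:
        return Action("done")
    if "focused=False" in observation:
        return Action("click", x=10, y=10)
    if "field=''" in observation:
        return Action("type", text="hello")
    return Action("click", x=10, y=40)


def run_agent(env: FakeDesktop, policy: Callable[[str], Action],
              max_steps: int = 10) -> List[Action]:
    """Observe, decide, act; stop on "done" or after max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(env.screenshot())
        history.append(action)
        if action.kind == "done":
            break
        env.apply(action)
    return history


desktop = FakeDesktop()
steps = run_agent(desktop, scripted_policy)
print([a.kind for a in steps])  # ['click', 'type', 'click', 'done']
```

The `max_steps` cap is a design choice worth keeping in any real version of this loop, since an agent that misreads the screen could otherwise repeat the same action indefinitely.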

Anthropic Faces Supply Chain Risk Label, Challenges Ruling

In parallel to OpenAI’s advancements, a significant development has emerged concerning Anthropic. The company has been officially labeled a “supply chain risk” by the Department of War. While this news is concerning, Anthropic has indicated it will challenge the ruling in court. Crucially, the label appears to apply specifically to the use of Anthropic’s Claude models as a direct part of contracts with the Department of War, rather than all customer use under such contracts. Anthropic is reportedly back in negotiations with the government, and the company is hopeful that the narrow scope of the ruling will mitigate its impact.

AI’s Impact on the Labor Market

Adding to the day’s significant news, Anthropic also released a report on the labor market impacts of AI. The findings suggest that while widespread, immediate impacts are not yet evident, there is a noticeable slowdown in hiring for early-career professionals. Job growth appears to be particularly affected for individuals in the first few years after college as they attempt to build skills and enter the workforce. This aligns with previous research, including a Stanford paper that utilized Anthropic’s data, highlighting that current workplace automation represents only a fraction of what is technically possible.

OpenAI’s Strategic Moves and Financial Focus

OpenAI appears to be adopting strategies previously seen from Anthropic, introducing “skills” and features that facilitate migration from Anthropic to OpenAI platforms. They have also launched “ChatGPT for Excel,” indicating a move towards specialized tools for business applications.

Furthermore, OpenAI is rolling out a suite of financial service tools, mirroring Anthropic’s expansion into specialized sectors like legal and cybersecurity. OpenAI specifically identifies the financial industry as its next major target, with a representative stating that finance is poised to benefit from AI model improvements more acutely than any other field after software engineering. GPT-5.4 is reportedly the top-scoring model on an internal investment banking benchmark, achieving an 87% success rate on tasks like financial modeling, scenario analysis, and data extraction, significantly outperforming previous models like GPT-5.2 Pro (71%) and Opus 4.6 (64%).

Additional features for GPT-5.4 include a “priority mode” for faster responses and the ability to interrupt and redirect the model mid-generation, offering greater user control.

Talent Movement in the AI Field

The dynamic nature of the AI industry is also highlighted by a key personnel move: Max Schwarzer, a prominent OpenAI researcher involved in GPT-5 and the development of reasoning paradigms, has departed OpenAI to join Anthropic. Schwarzer cited a desire to work with trusted colleagues who have moved to Anthropic in recent years, indicating a potential talent drain from OpenAI to its competitor.

Broader AI Ecosystem Developments

The GPT-5.4 release is situated within a broader wave of AI advancements. This includes OpenAI’s new research on chain-of-thought controllability, Google’s recent release of Gemini 3.1 Flash, and the beta 2 release of Grok 4.20.

The Dawn of a New Era?

The capabilities demonstrated by GPT-5.4, particularly its native computer use and expert-level performance, suggest a pivotal moment in AI development. The ability for AI to not only generate sophisticated outputs but also to directly interact with and manipulate digital environments marks a significant shift. For developers and users alike, this heralds a future where AI agents can undertake complex, multi-step tasks with greater autonomy and effectiveness, potentially reshaping industries and workflows across the board.

Source: GPT 5.4 "we see no wall" (YouTube)

Tags: AI Capabilities, Artificial Intelligence, Computer Vision, GPT-5.4, Labor Market Impact

Written by Joshua D. Ovidiu