GPT-5.4 Unleashed: AI Masters Computer Use and Outperforms Experts
OpenAI has unveiled GPT-5.4, a groundbreaking AI model featuring native computer use capabilities and outperforming human experts on complex tasks. The release also includes specialized financial tools and raises questions about AI's growing impact on the labor market.
GPT-5.4 Arrives with Native Computer Use, Outperforming Human Experts
OpenAI has released GPT-5.4. The model appears to be a substantial step forward, particularly in its newly integrated computer use capabilities and its performance on tasks previously dominated by human experts. Figures such as Noam Brown describe AI development as accelerating rapidly, with the phrase “we see no wall” capturing the sense of uninterrupted progress.
GPT-5.4 Dominates Expert Benchmarks
One of the most striking aspects of GPT-5.4 is its performance on the GDPval benchmark. The benchmark compares AI models against human experts, who typically have around 12-14 years of experience in their respective fields. These experts create rubrics to grade complex project deliverables across various industries, from manufacturing engineering to order processing and production.
Historically, AI models have struggled to match the judgment and quality of work of seasoned professionals. GPT-5.4 and its Pro variant now post much stronger results: GPT-5.4 Pro reportedly scores an 82% win-or-tie rate against human expert deliverables, with a direct win rate of around 70%. In other words, in a clear majority of cases the AI’s output is judged equal to or better than that of a human with years of specialized experience. This raises significant questions about the future of certain professions and the potential for AI to automate complex, high-skill tasks.
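The two reported figures can be related with simple arithmetic: a win-or-tie rate is the sum of the win and tie shares, so the tie and loss shares follow by subtraction. The sketch below uses only the two numbers quoted above. The function name is illustrative, and "around 70%" is treated as exactly 70 for the calculation, so the results are approximate.

```python
def implied_rates(win_or_tie_pct, win_pct):
    """Split a win-or-tie rate into win, tie, and loss shares (all in percent)."""
    if not 0 <= win_pct <= win_or_tie_pct <= 100:
        raise ValueError("expected 0 <= win_pct <= win_or_tie_pct <= 100")
    return {
        "win": win_pct,
        "tie": win_or_tie_pct - win_pct,   # ties are what remains after wins
        "loss": 100 - win_or_tie_pct,      # everything outside win-or-tie
    }


# Figures reported for GPT-5.4 Pro; 70 is an approximation ("around 70%").
rates = implied_rates(82, 70)
print(rates)
```

Under that assumption, roughly 12% of comparisons were ties and 18% went to the human expert.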
Native Computer Use: A New Frontier for AI Agents
Beyond its prowess in expert-level tasks, GPT-5.4 introduces native computer use capabilities, a feature described as unprecedented in a general-purpose model. This integration allows AI agents to perform tasks across websites and software systems by directly interacting with computer interfaces. Developers can leverage this to build sophisticated agents capable of navigating digital environments, issuing commands, and responding to visual cues.
The model can write browser automation code, for example with Playwright, and can interpret screenshots to issue mouse and keyboard commands. On the OSWorld benchmark, which tests a model’s ability to navigate a desktop environment, GPT-5.4 achieved a state-of-the-art 75% success rate. That surpasses its predecessor, GPT-5.2 (which scored 47%), and also exceeds human performance, which stands at 72.4%. The capability opens up new uses for AI in areas like software testing, troubleshooting visual applications, and even automated game development and testing, as shown by a developer who used GPT-5.4 and Codex to build and test an RPG.
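The screenshot-to-action cycle described above can be pictured as a simple loop: observe the screen, choose an action, apply it, and repeat. The following is a minimal sketch of that loop, not OpenAI's implementation. It calls no real model, API, or Playwright; `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names invented for illustration, and the "model" is a fixed script of actions.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real desktop: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def screenshot(self):
        # A real agent would capture pixels; here it is just a text summary.
        return f"screen after {len(self.log)} actions"

    def apply(self, action):
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def scripted_policy(script):
    """Hypothetical stand-in for the model: replays a fixed list of actions."""
    steps = iter(script)

    def choose(screenshot):
        # The screenshot is ignored here; a real model would condition on it.
        return next(steps, Action("done"))

    return choose


def run_agent(desktop, choose, max_steps=10):
    """Observe, decide, act, until the policy says "done" or steps run out."""
    for _ in range(max_steps):
        action = choose(desktop.screenshot())
        if action.kind == "done":
            break
        desktop.apply(action)
    return desktop.log


desktop = FakeDesktop()
policy = scripted_policy([Action("click", x=120, y=48), Action("type", text="hello")])
log = run_agent(desktop, policy)
print(log)
```

The `max_steps` cap is a design choice in this sketch: it keeps a policy that never signals completion from looping forever.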
Anthropic Faces Supply Chain Risk Label, Challenges Ruling
Alongside OpenAI’s release, there is significant news concerning Anthropic. The company has been officially labeled a “supply chain risk” by the Department of War, and it has indicated that it will challenge the ruling in court. Crucially, the label appears to apply only to the use of Anthropic’s Claude models as a direct part of contracts with the Department of War, rather than to all use of Claude by customers that hold such contracts. Anthropic is reportedly back in negotiations with the government and is hopeful that the narrow scope of the ruling will limit its impact.
AI’s Impact on the Labor Market
The same day, Anthropic also released a report on the labor market impacts of AI. The findings suggest that widespread, immediate impacts are not yet evident, but that hiring of early-career professionals has noticeably slowed. Job growth appears to be particularly affected for people in the first few years after college, as they try to build skills and enter the workforce. This aligns with previous research, including a Stanford paper that used Anthropic’s data. The report also highlights that current workplace automation represents only a fraction of what is technically possible.
OpenAI’s Strategic Moves and Financial Focus
OpenAI appears to be adopting strategies previously seen from Anthropic, introducing “skills” and features that facilitate migration from Anthropic to OpenAI platforms. They have also launched “ChatGPT for Excel,” indicating a move towards specialized tools for business applications.
Furthermore, OpenAI is rolling out a suite of financial services tools, mirroring Anthropic’s expansion into specialized sectors like legal and cybersecurity. OpenAI specifically identifies the financial industry as its next major target, with a representative stating that finance is poised to benefit from AI model improvements more acutely than any other field after software engineering. GPT-5.4 is reportedly the top-scoring model on an internal investment banking benchmark, achieving an 87% success rate on tasks like financial modeling, scenario analysis, and data extraction. That is well ahead of other models such as GPT-5.2 Pro (71%) and Opus 4.6 (64%).
Additional features for GPT-5.4 include a “priority mode” for faster responses and the ability to interrupt and redirect the model mid-generation, offering greater user control.
Talent Movement in the AI Field
The dynamic nature of the AI industry is also highlighted by a key personnel move: Max Schwarzer, a prominent OpenAI researcher involved in GPT-5 and the development of reasoning paradigms, has departed OpenAI to join Anthropic. Schwarzer cited a desire to work with trusted colleagues who have moved to Anthropic in recent years, indicating a potential talent drain from OpenAI to its competitor.
Broader AI Ecosystem Developments
The GPT-5.4 release is situated within a broader wave of AI advancements. This includes OpenAI’s new research on chain-of-thought controllability, Google’s recent release of Gemini 3.1 Flash, and the beta 2 release of Grok 4.20.
The Dawn of a New Era?
The capabilities demonstrated by GPT-5.4, particularly its native computer use and expert-level performance, suggest a pivotal moment in AI development. The ability of AI not only to generate sophisticated outputs but also to interact directly with digital environments marks a significant shift. For developers and users alike, it points to a future in which AI agents can undertake complex, multi-step tasks with greater autonomy and effectiveness, potentially reshaping industries and workflows across the board.
Source: GPT 5.4 "we see no wall" (YouTube)





