OpenAI Unleashes GPT-5.4, Challenging Competitors

OpenAI has launched GPT-5.4, showcasing significant improvements in benchmarks and real-world applications, including coding and complex research. The new model is available via API and integrated into ChatGPT and Codex, offering a more cost-effective alternative to competitors like Claude Opus, while still presenting areas where other models maintain an edge.

2 hours ago

OpenAI Launches GPT-5.4, Boosting Capabilities and API Access

OpenAI has officially released GPT-5.4, a significant upgrade to its large language model, promising enhanced performance across a wide range of tasks. The new model is now available via the API and integrated into OpenAI’s Codex platform, with a phased rollout to ChatGPT users beginning with Pro subscribers, followed by Plus users, and eventually all users.

Performance Benchmarks and Comparisons

Early benchmarks suggest GPT-5.4 offers notable improvements over its predecessor, GPT-4.6. On the OSWorld-Verified benchmark for computer use, GPT-5.4 showed a 2.3% improvement. The web browsing benchmark also showed gains, and on the valuable knowledge tasks (VKT) benchmark, which reflects real-world applications, GPT-5.4 scored 5% higher than GPT-4.6. Notably, the ‘Thinking’ version of GPT-5.4 appears more powerful than the ‘Pro’ version in some benchmarks, though the ‘Pro’ version excelled in agentic browsing, outperforming GPT-4.6 by 5.3%.

In scientific reasoning, measured by the GPQA Diamond benchmark, GPT-5.4 outperforms GPT-4.6 by 3%, a meaningful gain that puts it roughly level with Gemini 3.1 Pro on this metric. On the ARC-AGI-2 benchmark, which heavily emphasizes image processing, GPT-5.4’s ‘Deep Think’ version scored close to Gemini 3.1 Pro, a notable achievement given Google’s advantage in image-centric datasets.

Real-World Applications and Demonstrations

Beyond benchmarks, GPT-5.4’s practical capabilities are being showcased. A compelling demonstration involved a prompt to create a 30-second visual summary of the current Premier League season using real-time web data. The output featured dynamic motion graphics and rankings, grounded in current information, highlighting the model’s ability to synthesize dynamic, real-world data.

OpenAI has also integrated GPT-5.4 into Codex with a ‘/fast’ mode that speeds up inference. The feature costs extra, but it benefits users who need faster processing, such as those running multiple AI agents concurrently.

Deeper Dives: ‘Thinking’ vs. ‘Pro’ and Advanced Research

GPT-5.4 is available in two main variants: ‘Thinking’ and ‘Pro’. The ‘Pro’ version is exclusive to ChatGPT Pro subscribers ($200/month) and is described as a high-performance model. The ‘Thinking’ version is also accessible to these users.

A demonstration of GPT-5.4’s advanced research capabilities involved generating a comprehensive preparation guide for a potential World War III scenario, assuming escalation from the Iran conflict. This task required extensive web searching across hundreds of sources. The model’s output was detailed, including specific recommendations for safety, finances, supplies, and digital security, structured across immediate, 30-day, and long-term plans.

Summarization Skills: GPT-5.4 vs. Claude Opus 4.6

To test summarization, the extensive World War III preparation report generated by GPT-5.4 was fed into both GPT-5.4 and Claude Opus 4.6 with a prompt for a concise TL;DR. GPT-5.4’s summary was better structured and more readable, presenting information in a more organized and prioritized manner than Opus 4.6’s. This suggests an improvement in GPT-5.4’s ability to distill complex information effectively.

Coding and Development Capabilities

In coding tasks, GPT-5.4 demonstrated advancements, particularly in visual representation. A comparison with Opus 4.6 on a ‘Space Invaders’ style game showed GPT-5.4 producing superior graphics, animations, and physics, making the game more visually appealing and functional.

Codex, powered by GPT-5.4, is being used to develop complex applications. One example showcased a full RPG game built using GPT-5.4 and Playwright for testing, featuring AI-generated visuals. Another impressive feat was the creation of a Minecraft clone in 24 minutes, a task that reportedly took the original developer months. These examples highlight AI’s accelerating potential in game development and software creation.

Pricing and Cost-Effectiveness

GPT-5.4 is priced at $2.50 per million input tokens and $15 per million output tokens. Claude Opus, by comparison, costs $5 per million input tokens and $25 per million output tokens, making GPT-5.4 half the price on input and 40% cheaper on output. This pricing positions GPT-5.4 as a more accessible, yet potentially more powerful, alternative.
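To make the price gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted above to a single workload. The workload size (2 million input tokens, 500,000 output tokens) and the function name `workload_cost` are illustrative assumptions, not figures or names from OpenAI or Anthropic.

```python
# List prices in USD per million tokens, as quoted in this article.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Claude Opus": {"input": 5.00, "output": 25.00},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

for model in PRICES:
    cost = workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:.2f}")
```

At these prices the hypothetical workload costs $12.50 on GPT-5.4 and $22.50 on Claude Opus. Because output tokens cost several times more than input tokens on both models, the actual saving depends on how output-heavy a workload is.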

Comparative Analysis: Strengths and Weaknesses

While GPT-5.4 shows strong performance in benchmarks and certain practical applications like coding and summarization, direct comparisons with Claude Opus 4.6 reveal nuanced differences. In creative writing, Opus 4.6 was favored for its conciseness and narrative coherence. Similarly, for explaining complex technical papers like the ‘Attention Is All You Need’ transformer paper, Opus 4.6 provided a clearer, more pedagogical explanation, allocating more words where detail was needed.

In political analysis, Opus 4.6 was faster and more decisive, clearly stating fault, whereas GPT-5.4’s response was more hedged and took significantly longer, failing to meet the prompt’s requirement for decisiveness. However, in visual reasoning, specifically estimating calories from an image, GPT-5.4 performed better, though its estimate was still well off the meal’s actual calorie count.
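The head-to-head results reported in this article can be tallied per task rather than folded into one overall score. The Python sketch below simply records the qualitative outcomes described here; each reflects a single informal trial, and the task labels and function name are illustrative assumptions.

```python
from collections import Counter

# Qualitative head-to-head outcomes as reported in this article.
# Each entry reflects one informal trial, not a measured benchmark.
RESULTS = {
    "summarization": "GPT-5.4",
    "game coding (Space Invaders)": "GPT-5.4",
    "calorie estimation from image": "GPT-5.4",
    "creative writing": "Claude Opus 4.6",
    "technical paper explanation": "Claude Opus 4.6",
    "political analysis": "Claude Opus 4.6",
}


def wins_by_model(results: dict[str, str]) -> Counter:
    """Count how many tasks each model was preferred on."""
    return Counter(results.values())


for task, winner in RESULTS.items():
    print(f"{task}: {winner}")

print(dict(wins_by_model(RESULTS)))
```

The tally comes out even at three tasks each, which is why the per-task view is more informative than a single headline number.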

Why This Matters

The release of GPT-5.4 signifies a critical step in the AI arms race, directly challenging competitors like Anthropic. Its enhanced capabilities in coding, data synthesis, and complex research tasks, coupled with a more competitive pricing model, make it a compelling option for developers and businesses. The ongoing improvements in AI models are democratizing complex tasks, accelerating innovation cycles, and enabling the creation of sophisticated applications that were previously resource-prohibitive.

The nuanced performance across different tasks underscores the importance of task-specific evaluation rather than relying solely on benchmarks. While GPT-5.4 excels in some areas, established models like Claude Opus 4.6 maintain an edge in others, such as clear explanations and decisive analysis. This competitive landscape drives rapid advancement, pushing the boundaries of what AI can achieve.


Source: Codex with GPT 5.4 is insane… just watch (YouTube)

Written by

Joshua D. Ovidiu
