Google, OpenAI Boost AI Models; Microsoft Unveils Agentic AI
Google and OpenAI have unveiled significant upgrades to their AI models, while Microsoft is pushing forward with agentic AI capable of performing tasks. Robotics also sees advancements in memory and real-world application.
The artificial intelligence landscape is rapidly evolving, with major players like Google and OpenAI unveiling significant upgrades to their foundational models, while Microsoft signals a shift towards AI agents capable of performing complex tasks. This week’s developments highlight advancements in multimodal capabilities, reasoning prowess, and the emergence of AI systems designed to execute actions rather than just respond to queries.
Google Enhances Gemini and NotebookLM Capabilities
Google has rolled out several key updates, including the second iteration of its image-generation model, NanoBanana 2. The new version, available on the Gemini Pro plan, offers finer detail, broader world knowledge, precise text rendering and translation, and improved subject consistency for up to five characters and fourteen objects. While 4K upscaling was already present in earlier versions, the addition of aspect-ratio control and subject consistency marks a notable step forward.
Beyond image generation, Google has upgraded its educational tool, NotebookLM, with a ‘cinematic overview’ feature. This enhancement allows the tool to generate animated and motion-graphic-rich video summaries of information. While the underlying technology for generating on-demand motion graphics is not publicly detailed, the feature, accessible to users of Google’s top-tier Ultra plan ($200-$250/month), promises a novel way to consume and present educational content, albeit with significant generation times.
Furthermore, Google has released Gemini 3.1 Pro, its flagship model, designed as a natively multimodal, high-reasoning system. The model excels at processing diverse inputs including video, audio, and images, a capability that sets it apart from many single-modality AI models. On the MMMU-Pro benchmark it reportedly reaches an estimated score of 76.8, reflecting Google’s focus on building ‘world models’ with strong reasoning, enhanced reliability, and longer, more structured outputs. With a context window of up to 1 million tokens, Gemini 3.1 Pro can handle extensive documents, codebases, and long video or audio files, and it also supports function calling and search grounding via Google Search.
OpenAI Counters with GPT-5.4 Pro, Focusing on Edge Cases
In response, OpenAI has launched GPT-5.4 Pro, positioned as its most advanced model and reportedly outperforming rivals in highly specialized domains. While Gemini 3.1 Pro is favored for its native multimodality, GPT-5.4 Pro reportedly excels in frontier mathematics, computer use, and complex scientific problem-solving, making it a potential go-to for scientists, researchers, and professionals engaged in high-stakes technical work. GPT-5.4 Pro is available to ChatGPT Plus and Enterprise users via the API and the standard chat interface. OpenAI has also addressed earlier complaints about reasoning mistakes and inconsistent conversational flow, aiming for more consistent and reliable responses.
Microsoft’s C-Pilot Signals a New Era of AI Agents
Microsoft is making a significant strategic pivot with the introduction of ‘C-Pilot Tasks,’ a new class of AI agents designed to execute tasks autonomously. Described as a ‘to-do list that does itself,’ C-Pilot Tasks allows users to define objectives in natural language, which the AI then plans and executes using its own computing resources and browser access across various applications. This marks a departure from conversational AI, which Microsoft terms the ‘first chapter,’ ushering in a ‘second chapter’ focused on AI that actively performs work.
Potential applications include surfacing urgent emails with draft replies, automatically unsubscribing from promotional mail, tracking apartment listings and booking viewings, generating morning briefings, and creating personalized study plans. C-Pilot Tasks includes safeguards, requiring user consent for significant actions like financial transactions or sending messages on behalf of the user, with options to review, pause, or cancel tasks. Currently in a limited research preview, this initiative positions Microsoft to compete in the growing agentic AI space, targeting mainstream and consumer users.
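The consent safeguards described above amount to a simple gating rule: routine actions run autonomously, while “significant” ones pause until the user approves. The following is a minimal, purely illustrative sketch of that pattern; all class and action names are invented for this example, not taken from Microsoft’s implementation.

```python
from dataclasses import dataclass

# Action types that require explicit user approval before execution,
# mirroring the article's examples (payments, sending messages).
SIGNIFICANT = {"payment", "send_message"}

@dataclass
class Task:
    description: str
    action_type: str
    status: str = "planned"

class TaskAgent:
    """Hypothetical agent loop with a consent gate for significant actions."""

    def execute(self, task: Task, user_consents) -> Task:
        # Significant actions pause and wait for explicit approval;
        # everything else (drafting, research) proceeds autonomously.
        if task.action_type in SIGNIFICANT and not user_consents(task):
            task.status = "awaiting_consent"
        else:
            task.status = "done"
        return task

agent = TaskAgent()
draft = agent.execute(Task("Draft reply to urgent email", "draft"), lambda t: False)
payment = agent.execute(Task("Pay apartment viewing deposit", "payment"), lambda t: False)
print(draft.status)    # done
print(payment.status)  # awaiting_consent
```

The key design choice is that the gate is a property of the action type, not the task description, so a planner cannot bypass review by rephrasing its goal.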
Microsoft is also navigating public perception, with the term ‘Microslop’ gaining traction on platforms like Discord as a derogatory label for its aggressive AI integration into products, perceived by some as low-value ‘slop.’ The company’s decision to ban the term on its official Discord server has further fueled its use by critics.
Perplexity AI Launches Advanced Digital Worker
Perplexity AI has introduced a sophisticated ‘general-purpose digital worker’ that operates within user interfaces, capable of reasoning, delegating, searching, building, remembering, coding, and delivering results. This system leverages a diverse array of up to 19 AI models, including Claude Opus 4.6 as its core reasoning engine, orchestrating specialized sub-agents for tasks like deep research (Gemini), image processing (NanoBanana), video analysis (V3.1), lightweight tasks (Grok), and long-term recall (GPT-5.2). This model-agnostic approach allows Perplexity to select the optimal AI for each part of a task automatically.
The system operates in isolated compute environments with access to file systems, browsers, and tool integrations. It can manage projects end-to-end, from research and design to coding and deployment, with memory capabilities for recalling past work and connecting to hundreds of services. The Perplexity Max tier, priced at $200 per month, offers substantial usage credits, making it a powerful tool for non-technical power users seeking to automate complex workflows without deep technical setup.
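At its core, the model-agnostic orchestration described above is a routing problem: classify the sub-task, then dispatch it to a specialist model, falling back to the core reasoning engine otherwise. The sketch below is an assumption-laden illustration; the model names come from the article, but the routing table and function are invented for this example.

```python
# Hypothetical routing table: sub-task type -> specialist model.
# Names are taken from the article's description of Perplexity's setup.
ROUTING = {
    "reasoning": "claude-opus-4.6",
    "deep_research": "gemini",
    "image": "nanobanana",
    "lightweight": "grok",
    "recall": "gpt-5.2",
}

def route(task_type: str) -> str:
    """Pick the specialist model for a sub-task, falling back to the
    core reasoning engine for anything unrecognized."""
    return ROUTING.get(task_type, ROUTING["reasoning"])

print(route("image"))         # nanobanana
print(route("unknown_task"))  # claude-opus-4.6
```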
Anthropic Faces Government Scrutiny Over AI Use
Anthropic has found itself at the center of controversy following a public disagreement with the U.S. government over the use of its AI models. The company said it would not allow its Claude models to be used for mass surveillance or autonomous weapons. This stance drew a public statement from former President Trump, who criticized Anthropic as a ‘radical woke company’ and ordered federal agencies to stop using its technology, mandating a six-month phase-out and threatening consequences for non-compliance. The episode highlights the ethical tensions and governmental oversight challenges surrounding the rapid deployment of advanced AI.
‘Quit GPT’ Movement Gains Momentum Amidst Ethical Concerns
A growing ‘Quit GPT’ movement, estimated to have impacted millions of users, reflects dissatisfaction with OpenAI’s direction. Key catalysts include a $25 million donation by OpenAI President Greg Brockman to a pro-Trump PAC, the reported use of ChatGPT in an ICE resume screening tool, and OpenAI’s acceptance of a Department of Defense contract that Anthropic had refused on ethical grounds related to surveillance and autonomous weapons. These events, coupled with perceived degradation in ChatGPT’s performance, particularly with the 5.2 model, have led users, including public figures, to seek alternatives like Claude.
Robotics Advancements: Memory, Intuition, and Humanoids Enter Factories
In robotics, Stanford researchers have developed FSM (Few-Shot Memory), a system enabling robots to learn physical principles in real-time without full retraining. FSM addresses the gap between abstract AI knowledge and real-world experience by using a three-tier memory system: episodic memory for raw experiences, hypothesis generation for understanding ‘why,’ and the promotion of verified principles for future actions. This approach significantly improves success rates in real-world tasks, allowing robots to develop intuitive learning capabilities akin to humans.
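The three-tier promotion flow described above can be sketched as a small data structure: raw episodes accumulate, candidate hypotheses gather supporting evidence, and a hypothesis is promoted to a reusable principle once it has been verified often enough. This is an illustrative toy, not Stanford’s implementation; the promotion threshold and all names are assumptions.

```python
class FewShotMemory:
    """Toy three-tier memory: episodes -> hypotheses -> verified principles."""

    def __init__(self, promote_after: int = 3):
        self.episodes = []        # tier 1: raw experiences
        self.hypotheses = {}      # tier 2: candidate "why" explanations
        self.principles = set()   # tier 3: verified, reusable principles
        self.promote_after = promote_after

    def record(self, observation: str, hypothesis: str) -> None:
        self.episodes.append(observation)
        # Count supporting evidence for each hypothesis...
        self.hypotheses[hypothesis] = self.hypotheses.get(hypothesis, 0) + 1
        # ...and promote it once it has been verified often enough.
        if self.hypotheses[hypothesis] >= self.promote_after:
            self.principles.add(hypothesis)

m = FewShotMemory()
for _ in range(3):
    m.record("cup slipped when gripped at the rim", "wet ceramic needs a wider grip")
print("wet ceramic needs a wider grip" in m.principles)  # True
```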
Physical Intelligence, a well-funded robotics AI startup, has introduced MEM (Multiskill Embodied Memory), combining short-term visual tracking with long-term natural language narratives. This allows robots to maintain focus for extended periods, enabling complex tasks like cleaning a kitchen or preparing a meal. The system differentiates between dense visual memory and summarized semantic events, enabling context adaptation and improved task execution, as demonstrated by a robot successfully adapting its grip strategy after an initial failure.
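The distinction MEM draws between dense visual memory and summarized semantic events can be pictured as two stores: a bounded buffer of recent observations that drops old detail, and an append-only narrative log that persists for long-horizon tasks. The sketch below is an assumption, not Physical Intelligence’s code; capacities and names are invented.

```python
from collections import deque

class EmbodiedMemory:
    """Toy two-stream memory: bounded dense buffer + persistent event log."""

    def __init__(self, dense_capacity: int = 5):
        self.dense = deque(maxlen=dense_capacity)  # short-term visual frames
        self.semantic = []                         # long-term event narrative

    def observe(self, frame: str) -> None:
        self.dense.append(frame)  # oldest frames fall out automatically

    def summarize(self, event: str) -> None:
        # Dense detail is transient, but the natural-language summary
        # persists and can inform later decisions (e.g. a new grip strategy).
        self.semantic.append(event)

mem = EmbodiedMemory(dense_capacity=2)
for frame in ["frame1", "frame2", "frame3"]:
    mem.observe(frame)
mem.summarize("first grip on the pan handle failed; switched to a two-hand grip")
print(list(mem.dense))    # ['frame2', 'frame3']
print(len(mem.semantic))  # 1
```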
Faraday Future has launched Embodied AI robots, rebranding Chinese-made AGIBot models, including a full-size humanoid starting at $35,000 and a more athletic version at $20,000. Meanwhile, BMW has deployed its first humanoid robots in a European factory, exploring their utility in manufacturing processes. While still in early stages, these developments signal a growing trend towards integrating humanoid robots into industrial settings, promising increased automation and efficiency.
Source: AI News – New Models From Google & OpenAI , AI Drama & Humanoids In Factories (YouTube)