Google Releases Open AI Models That Run Locally

Google has surprised the AI community by releasing Gemma 4, a new family of open AI models. These models are designed to run directly on personal devices like phones, laptops, and desktops. This move allows developers and users to run powerful AI tools without relying on cloud servers, offering greater privacy and potentially lower costs.

What is Gemma 4?

Gemma 4 models are built using the same advanced research and technology behind Google’s Gemini AI. For the first time, Google is releasing these models under the open-source Apache 2.0 license. This means anyone can use, modify, and distribute the models freely.

Google designed Gemma 4 for the growing demand for AI agents that can carry out complex tasks. The models can work through difficult logic, plan multi-step actions, and manage tasks efficiently, and they are built to use tokens (the small chunks of text a model reads and writes) economically.

Model Sizes and Capabilities

The Gemma 4 family includes several models:

  • 26B (26 billion parameters): This is a “mixture of experts” model. It is fast because it activates only about 3.8 billion parameters per token. It’s well suited to advanced reasoning and coding tasks on a personal computer, without sending data elsewhere.
  • 31B (31 billion parameters): This is a “dense” model, meaning all of its parameters are used on every token. It’s optimized for the highest output quality.
  • 2B and 4B (2 billion and 4 billion parameters): These smaller models are built for maximum efficiency, especially on devices with limited memory. They bring strong AI capabilities to mobile phones and IoT devices.

The larger models, like the 31B, can handle a “context window” of up to 4 million tokens. This allows them to process and understand very large amounts of information, such as entire codebases, or engage in long, complex conversations.
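
To make that concrete, here is a minimal sketch for estimating whether a codebase fits in such a window, assuming the common rule of thumb of roughly four characters per token (the real ratio depends on the model’s tokenizer, and “my_project” is a placeholder path):

```python
from pathlib import Path

# Rule-of-thumb estimate: ~4 characters per token for English text and code.
# The actual count depends on the model's tokenizer.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 4_000_000  # the reported 4-million-token window

def estimated_tokens(root: str, suffix: str = ".py") -> int:
    """Estimate the token count of all matching files under a directory."""
    total_chars = sum(
        len(path.read_text(errors="ignore"))
        for path in Path(root).rglob(f"*{suffix}")
    )
    return total_chars // CHARS_PER_TOKEN

tokens = estimated_tokens("my_project")  # placeholder directory
print(f"~{tokens:,} tokens; fits in window: {tokens <= CONTEXT_WINDOW}")
```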

Running AI Locally: Why It’s a Big Deal

One of the most significant aspects of Gemma 4 is its ability to run locally. This means you can download the models and use them on your own hardware. You don’t need a constant internet connection, and you don’t have to pay for cloud computing services to run them.

For example, the 31B model is said to offer performance comparable to models like Google’s own Gemini 2.5 Pro, despite being far smaller. If your computer has enough RAM, you can run it privately and securely, without sending sensitive data to external servers.
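
How much RAM counts as “enough” depends mostly on how the weights are stored. The following back-of-the-envelope sketch estimates the memory needed just to hold the weights at common precisions; it is illustrative only, since real inference also needs room for activations and the KV cache, which grows with context length:

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate gigabytes needed to hold the model weights alone."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# Note: a mixture-of-experts model like the 26B still needs all of its
# weights in memory, even though only ~3.8B parameters are active per token.
for size in (2, 4, 26, 31):
    for bits in (16, 8, 4):  # fp16, int8, int4 quantization
        print(f"{size}B @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```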

The smaller 2B and 4B models are especially impressive for their efficiency. Google demonstrated a 4B model running on an iPhone 15 Pro, using minimal RAM. This opens up possibilities for advanced AI features directly on smartphones and other small devices.

Multimodal and Multilingual Support

Gemma 4 models are designed to be multimodal, meaning they can understand and process different types of information, not just text. They have native support for audio and can process images. This allows for real-time interactions that involve seeing and hearing the world.
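
If the weights appear on Hugging Face, image understanding would plausibly look like the sketch below. Treat it as a hypothetical: the checkpoint ID google/gemma-4-4b-it is an assumption, not a confirmed name, and it presumes a recent transformers version with the image-text-to-text pipeline:

```python
from transformers import pipeline

# Hypothetical checkpoint ID; substitute whatever Google actually publishes.
pipe = pipeline("image-text-to-text", model="google/gemma-4-4b-it")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

result = pipe(text=messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```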

The models also support over 140 languages. This broad language capability makes them useful for a global audience and a wide range of applications.

Security and Trust

Developed by Google DeepMind, Gemma 4 models undergo the same strict security checks as Google’s private AI products. This provides a trusted foundation for businesses and developers building AI applications.

Performance vs. Size

While raw benchmark scores might suggest these models aren’t top-tier compared to the largest cloud-based models, efficiency is where they shine. Measured by performance per parameter, Gemma 4 models are highly efficient. A 31-billion-parameter model is roughly a tenth the size of some of the largest commercial models, so it needs far less computing power, which makes it practical for local use.

This efficiency is crucial for the open-source community. Previously, powerful open-source models often required expensive cloud infrastructure to run. Gemma 4 offers a way to get high-quality AI performance at a fraction of the cost and computational footprint, even on personal devices.

Why This Matters

The release of Gemma 4 marks a significant step towards democratizing advanced AI. By making powerful models freely available and runnable on local hardware, Google is empowering a wider range of users.

  • Privacy: Users can keep their data private by running AI tasks on their own devices.
  • Cost Savings: Running models locally eliminates the ongoing costs associated with cloud-based AI services.
  • Accessibility: Developers can experiment and build AI applications without needing massive computing resources.
  • Innovation: The open-source nature encourages rapid development and new use cases, especially for AI agents that can act on behalf of users.
  • On-Device AI: This push towards local processing could lead to a future where many AI tasks are handled directly by our phones and computers, making AI more integrated into our daily lives.

Google has made the Gemma 4 model weights available for download. Developers can start experimenting with them today using familiar tools. Google plans to release guides on how to use Gemma 4 locally on various devices, including phones.
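
For text-only experimentation, a typical local workflow with the Hugging Face transformers library might look like this sketch. The checkpoint name here is an assumption; check Google’s model pages for the real IDs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-4-2b-it"  # hypothetical ID for a small instruct model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory versus fp32
    device_map="auto",           # spreads layers across GPU/CPU as available
)

prompt = "Explain mixture-of-experts models briefly."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```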


Source: Google’s Gemma 4 Just Shocked The AI Industry (YouTube)

Written by

Joshua D. Ovidiu
