Hermes Agent Masters Browsers, Nearing AGI

Hermes agent, a rapidly growing AI tool, has integrated with 'browser harness' to perform complex web tasks, nearing human-level computer interaction. This combination allows the AI to see, click, and type on websites, enabling sophisticated automation and self-improvement.

3 hours ago
4 min read

Hermes Agent Masters Browsers, Nearing AGI

A new AI tool called Hermes agent is rapidly gaining popularity, showing impressive growth on GitHub and potentially challenging established platforms. Its recent integration with a tool called ‘browser harness’ allows it to perform complex tasks across the internet, bringing it closer to human-level computer interaction. This combination of advanced AI and web control promises to unlock new possibilities for automation and assistance.

The speed at which Hermes agent has grown is remarkable, reaching 100,000 GitHub stars faster than almost any other project. This rapid development is fueled by frequent updates, with the team releasing multiple major versions and hundreds of changes in just a few weeks. This constant improvement suggests a highly active and dedicated development community behind the project.

What Can Hermes Agent Do?

Hermes agent can already perform a wide range of tasks. One notable example shows it ‘jailbreaking’ an AI model, enabling it to answer any question. This was achieved with minimal human input, demonstrating the agent’s ability to learn and adapt.

More practically, Hermes agent has been used to create a full video in Mandarin. This involved generating HTML, text-to-speech, and rendering a video file. Such capabilities could significantly impact marketing and education, especially for language learning.

Another impressive feat was creating unique animated graphics for a creative hackathon. The agent produced high-quality animations that felt custom and branded, moving beyond typical AI-generated content.

Browser Harness: The Key to Web Control

The ability for Hermes agent to interact with the internet like a human is largely thanks to the new ‘browser harness’ project. This tool is described as a thin, self-healing system that gives AI models full freedom to complete any task within a web browser.

Think of it this way: Hermes agent is the brain, and browser harness is the hands. If the AI needs to see, click, or type on a website, browser harness enables it to do so. It can even create new functions or ‘skills’ if the default tools aren’t enough.

The creators are so confident in browser harness that they’ve offered a free Mac Mini to anyone who can find a task it cannot complete. This highlights the comprehensive nature of its web interaction capabilities.

Setting Up Hermes Agent with Browser Harness

To run Hermes agent and browser harness continuously, a Virtual Private Server (VPS) is recommended. Hostinger offers a streamlined setup process with pre-built options for Hermes agent, making it easier for users to deploy. The process involves selecting a server plan, location, and then deploying the Hermes agent application.

Once the VPS is set up, users can access the Hermes agent interface. Here, they can choose which AI model to use, such as Opus 4.7 from OpenRouter, which provides access to a wide variety of AI models. Users will need to add their API key from a service like OpenRouter to enable model access.

The next step involves installing browser harness. While some initial commands can be run directly, a more robust installation involves cloning the browser harness repository and running its installer script. This script ensures all necessary dependencies, like Python, Git, and Node.js, are installed, and even sets up headless Chrome for browser interaction directly on the VPS.

For easier interaction and visibility, using browser harness cloud is suggested. Users can obtain an API key from browser.cloud, which allows them to see the agent’s actions in real-time as it clicks and navigates websites.

Real-World Applications and Future Potential

With Hermes agent and browser harness combined, users can delegate complex online tasks. This includes scraping data from websites like Hacker News, posting on social media, or even making purchases on e-commerce sites. The agent can autonomously navigate to websites, extract specific information like titles, scores, and URLs, and save it in structured formats like JSON.

A particularly advanced demonstration involved the agent visiting a YouTube channel, identifying the 12 most recent videos, and compiling them into a single grid image saved as a PNG. This complex task, which would take a human significant time and effort, was handled by the AI.

Crucially, the browser harness system is designed to be self-improving and self-healing. When the agent encounters new websites or complex tasks, it can create new skills or update existing ones. This means that over time, the agent becomes more efficient and capable, learning from each interaction and building a knowledge base for future tasks.

The agent also identified potential issues and quirks when interacting with websites, documenting them in a ‘domain skills’ file. This proactive knowledge building suggests a path towards more sophisticated AI agents that not only perform tasks but also contribute to a shared understanding of the digital world.

The combination of Hermes agent and browser harness represents a significant step towards artificial general intelligence (AGI), enabling AI to interact with the digital world in a way that closely mirrors human capabilities. This setup is available now for users who wish to explore its potential.


Source: Hermes Agent + Browser Harness = Local AGI (YouTube)

Written by

Joshua D. Ovidiu

I enjoy writing.

19,616 articles published
Leave a Comment