Google Rethinks AI Intelligence Testing
Google DeepMind has introduced a new framework for testing artificial general intelligence (AGI) that functions like an IQ test for AI. It evaluates systems across 10 cognitive abilities, comparing them to human performance to create a detailed profile rather than relying on single scores. This aims to bring scientific rigor to the debate on AI progress.
Google DeepMind has proposed a new way to measure artificial general intelligence (AGI), aiming to settle long-standing debates about AI progress. They introduced a framework that acts like an IQ test for AI, evaluating its abilities across 10 different cognitive areas. This approach moves away from single scores and instead creates a detailed profile of an AI’s strengths and weaknesses, compared directly to human performance.
A persistent challenge in AI development has been the lack of agreement on what AGI actually means. Major AI labs such as OpenAI and Google have each offered their own definitions, often framed around economic value or performance on general cognitive tasks. This new framework, detailed in a research paper, seeks to provide a more standardized, scientific method for tracking progress.
Breaking Down Intelligence into 10 Cognitive Faculties
The proposed system uses a “cognitive taxonomy” based on decades of research in psychology and neuroscience. These 10 faculties are seen as the building blocks of human cognition:
- Perception: Taking in information by seeing, hearing, and reading, and genuinely understanding it.
- Generation: Creating useful outputs like text, speech, or actions.
- Attention: Focusing on important information and ignoring distractions.
- Learning: Acquiring new knowledge in real-time, not just during initial training.
- Memory: Storing, recalling, and even forgetting information over time.
- Reasoning: Drawing logical conclusions through deduction, induction, and math.
- Meta-cognition: Knowing what it knows and recognizing uncertainty, a key area where current AI often struggles.
- Executive Functions: Planning, controlling impulses, and adapting strategies to achieve goals.
- Problem-Solving: Using perception, reasoning, and planning to tackle new challenges.
- Social Cognition: Understanding social cues, inferring others’ thoughts, and cooperating.
Importantly, this framework focuses on what an AI can achieve, not how it achieves it. The underlying technology, whether it’s transformers or something new, doesn’t matter as much as the final results.
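To make the taxonomy concrete, here is a minimal sketch of how an evaluation harness might represent it in code. Only the 10 faculty names come from the framework; the data structure and the class and method names are illustrative assumptions, not anything specified in DeepMind's paper.

```python
from dataclasses import dataclass, field

# The 10 faculties named in the taxonomy; everything else here
# (class and method names) is an illustrative assumption.
FACULTIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "meta-cognition", "executive functions",
    "problem-solving", "social cognition",
]

@dataclass
class CognitiveProfile:
    """Per-faculty scores, expressed as percentiles of human performance."""
    scores: dict[str, float] = field(default_factory=dict)

    def record(self, faculty: str, percentile: float) -> None:
        if faculty not in FACULTIES:
            raise ValueError(f"unknown faculty: {faculty}")
        if not 0.0 <= percentile <= 100.0:
            raise ValueError("percentile must be in [0, 100]")
        self.scores[faculty] = percentile

    def jaggedness(self) -> float:
        """Gap between strongest and weakest faculty: a rough proxy
        for the "jagged frontier" discussed later in this article."""
        vals = list(self.scores.values())
        return max(vals) - min(vals) if vals else 0.0
```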
A Three-Stage Testing Process
Google DeepMind outlines a three-stage evaluation protocol:
- Cognitive Assessment: AI systems are tested on specific tasks designed to isolate each cognitive faculty. These tests must be private to prevent memorization and independently verified. This addresses the major issue of “data contamination” in current AI benchmarks, where models might simply recall answers they saw during training rather than truly understanding.
- Human Baselines: The same tasks are given to a large, representative group of humans with at least a high school education. This establishes a clear benchmark of typical human performance across each ability.
- Cognitive Profiles: The AI’s performance is plotted against the human data, creating a visual “radar chart.” This chart shows exactly where an AI excels and where it falls short compared to human capabilities.
A system scoring at the 99th percentile across all 10 faculties would be a significant milestone, though the paper notes it wouldn’t definitively prove AGI. Even today’s advanced AI often shows a “jagged frontier”: performing exceptionally in some areas while failing at tasks a young child can handle, such as basic counting.
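A rough sketch of how stages two and three might be computed: score an AI against a pool of human baseline scores on the same task, convert that to a percentile, and draw the resulting radar chart. The percentile rule and the matplotlib rendering are my assumptions about one plausible implementation, not the paper’s specified method; the example numbers are invented to show a jagged profile.

```python
import numpy as np
import matplotlib.pyplot as plt

def percentile_vs_humans(ai_score: float, human_scores: list[float]) -> float:
    """Fraction of human baseline scores the AI meets or beats (0-100).
    A simple percentile rule; the paper may define this differently."""
    beaten = sum(1 for h in human_scores if h <= ai_score)
    return 100.0 * beaten / len(human_scores)

def plot_profile(profile: dict[str, float]) -> None:
    """Render a cognitive profile as the radar chart the framework describes."""
    labels = list(profile.keys())
    values = list(profile.values())
    angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
    # Close the polygon by repeating the first point.
    angles += angles[:1]
    values += values[:1]

    ax = plt.subplot(polar=True)
    ax.plot(angles, values)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(labels)
    ax.set_ylim(0, 100)
    plt.show()

# Hypothetical example: strong reasoning, weak meta-cognition --
# the kind of jagged frontier the paper warns about.
profile = {
    "perception": 92, "generation": 95, "attention": 70,
    "learning": 55, "memory": 80, "reasoning": 97,
    "meta-cognition": 30, "executive functions": 60,
    "problem-solving": 85, "social cognition": 45,
}
plot_profile(profile)
```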
Limitations and Next Steps
The paper acknowledges the framework’s limitations. It doesn’t currently measure:
- Speed: How quickly an AI can perform tasks, which is critical for real-world applications like self-driving cars.
- System Propensities: Behavioral tendencies like risk aversion or alignment with human values, which are crucial for safe deployment.
- Creativity: While related cognitive processes are covered, pure creativity remains hard to measure objectively.
The paper also discusses the challenge of testing AI systems that use tools or external resources, likening it to handing a human a calculator during an IQ test. The goal is to measure the AI’s inherent abilities, not how effectively it can lean on external tools.
Putting the Framework into Practice
Google isn’t just proposing a theory; it is putting money behind it. The company has launched a $200,000 Kaggle hackathon encouraging researchers worldwide to develop the specific evaluation tasks the framework needs. The competition focuses on the faculties with the biggest evaluation gaps: learning, meta-cognition, attention, executive functions, and social cognition. Results are expected to be announced in June.
This initiative aims to bring scientific rigor to the often speculative discussions about AGI timelines. Without a shared measurement system, claims about when AGI might arrive are based more on intuition than data. Google’s framework offers a concrete path toward understanding and quantifying AI progress.
Why This Matters
Developing a reliable way to measure AI intelligence is crucial for several reasons. It allows researchers to track progress objectively, identify specific areas needing improvement, and better understand the capabilities and limitations of AI systems. This clarity is essential for responsible development, safety, and deployment decisions. By creating a standardized cognitive profile, Google’s approach could significantly advance the scientific understanding of artificial intelligence and its journey toward human-level capabilities.
Source: Google Just Changed the Definition of AGI (YouTube)