AI Model Generates Novel Cancer Drug Hypothesis
AI is making waves in drug discovery, with a new model generating a novel cancer drug hypothesis. Meanwhile, the debate over AGI continues: new benchmarks highlight both progress and persistent limitations, particularly in continual learning, where researchers remain cautious about real-time, user-integrated training.
AI Model Generates Novel Cancer Drug Hypothesis
In a significant stride for AI-driven scientific discovery, a language model has successfully generated a novel hypothesis for a cancer treatment drug. The model, named C2S-scale, demonstrated the ability to predict how cells react to drugs, particularly in relation to interferon, a signaling protein that helps make immunologically 'cold' tumors visible to the immune system.
This groundbreaking work, detailed in a 29-page paper, involved training a language model based on Google’s Gemma 2 architecture. While not the latest iteration of Gemma, this model underwent specialized training using reinforcement learning, rewarding it for accurately predicting cellular responses to drugs. The core innovation lies in the model’s ability to translate complex gene activity within cells into a form that a large language model (LLM) can process, effectively allowing it to ‘read’ and understand biological processes much like it reads text.
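The encoding step described above can be illustrated with a short sketch. This is not code from the paper; it mimics the cells-as-text idea under the assumption that a cell's gene-expression profile is rendered as a 'sentence' of gene names ranked by expression level, so an LLM can process it like ordinary text (the gene names and values below are placeholders):

```python
# Illustrative cells-as-text encoding (assumed detail, not from the paper):
# rank a cell's genes by expression level and emit the top names as a "sentence".

def cell_to_sentence(expression, top_k=5):
    """expression: dict mapping gene name -> expression level (arbitrary units)."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return " ".join(ranked[:top_k])

# Placeholder profile for a single cell; real profiles span thousands of genes.
cell = {"CD74": 9.1, "B2M": 8.4, "HLA-A": 7.9, "ACTB": 7.2, "GAPDH": 6.5, "TP53": 1.2}
print(cell_to_sentence(cell))  # CD74 B2M HLA-A ACTB GAPDH
```

Once cells are text, standard language-model training machinery applies, which is what allows the same model to be rewarded for accurately predicting cellular responses to drugs.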
Technically, C2S-scale was trained to identify drugs that could reliably amplify the effects of interferon, especially in scenarios where interferon is present but not sufficiently active. This mirrors how an LLM learns to predict the continuation of a sentence, but applied to the complex interactions of the immune system and drug therapies.
The researchers noted that scaling the model to 27 billion parameters led to consistent improvements. While this is considerably smaller than the largest state-of-the-art LLMs, its scientific output is proving significant. Crucially, the drug candidate identified by C2S-scale, silmitasertib, had no prior links in the existing literature to treating cancer in this manner. This suggests the model was not merely regurgitating known information but was generating a new, testable hypothesis.
Further validating the model’s predictive power, its ‘in silico’ (computer-simulated) prediction was confirmed ‘in vitro’ (in the lab) using human cells. While human trials are still years away, this experimental validation marks a critical step, demonstrating the real-world applicability of the AI’s findings. The authors of the paper propose this approach as a blueprint for a new era of biological discovery, accelerated by AI.
This development stands in contrast to some concerns about LLMs focusing primarily on commercial applications like video generation or chatbots, potentially at the expense of frontier research. However, the success of C2S-scale underscores the potential of current LLM capabilities to push the boundaries of scientific understanding and accelerate breakthroughs in critical fields like medicine.
LLM Capabilities and AGI Benchmarking
Beyond scientific discovery, the conversation around LLM capabilities continues to evolve, with ongoing debates about their progress towards Artificial General Intelligence (AGI). While some reports highlight Google DeepMind’s Gemini 2.5 DeepThink achieving record performance on complex mathematical benchmarks, others suggest that OpenAI’s models, particularly GPT-5 Pro, are also demonstrating strong raw intelligence.
A researcher’s personal benchmark, ‘Simple Bench,’ which tests models on nuanced and trick questions, found GPT-5 Pro performing competitively, narrowly missing the performance of Gemini 2.5 Pro. The lack of an API for Gemini DeepThink currently limits direct comparison on this specific benchmark.
In the realm of coding, OpenAI’s Codex, integrated into GPT-5, is reportedly outperforming alternatives, including Google’s offerings and Anthropic’s Claude Code. Despite Claude Code’s mobile accessibility and Anthropic’s specialization in coding, anecdotal evidence suggests GPT-5 Codex is more reliable, with Claude Code occasionally making critical errors, such as attempting to delete essential code sections.
Defining AGI and the Challenge of Continual Learning
A recent paper proposing a definition for AGI, grounded in the Cattell-Horn-Carroll (CHC) theory of cognitive abilities, attempts to quantify AI progress. The theory breaks cognition down into ten discrete categories, each weighted equally in an overall AGI score. On this scale, GPT-4 scores 27% and GPT-5 scores 58%.
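As described, the scoring scheme reduces to an equal-weight average over ten categories. A minimal sketch follows; the category names and per-category scores are illustrative placeholders, not figures from the paper (only the ten-way equal weighting comes from the text):

```python
# Minimal sketch of an equal-weight, ten-category scoring scheme.
# Category names and scores below are illustrative placeholders.

def agi_score(category_scores):
    """Average of per-category scores (0-100), each weighted equally."""
    assert len(category_scores) == 10, "the CHC-based scheme uses ten categories"
    return sum(category_scores.values()) / len(category_scores)

scores = {
    "reasoning": 80, "knowledge": 70, "reading_writing": 60, "math": 50,
    "working_memory": 40, "long_term_memory": 0, "visual": 60,
    "auditory": 50, "speed": 70, "continual_learning": 20,
}
print(agi_score(scores))  # 50.0
```

The equal weighting is what makes weak categories such as long-term memory drag the overall score down, no matter how strong the others are.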
However, the paper highlights significant limitations in current AI models, particularly concerning memory and continual learning. LLMs can retain information within a given conversation’s context window but lack the ability to learn and retain information over time or across different interactions. This ‘amnesia’ forces them to relearn context repeatedly, limiting their utility and often leading to errors due to insufficient contextual understanding. This limitation is exacerbated by cost considerations, as larger context windows increase the expense of each interaction.
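The cost point above is simple arithmetic: if every turn resends the full context, input cost scales linearly with context length. A back-of-the-envelope sketch, using a hypothetical per-million-token price rather than any provider's actual rate:

```python
# Back-of-the-envelope sketch of why context size drives cost.
# The per-million-token price is an assumed placeholder, not a quoted rate.

PRICE_PER_MTOK = 2.50  # hypothetical input price, USD per million tokens

def turn_cost(context_tokens, price_per_mtok=PRICE_PER_MTOK):
    """Input cost of one interaction that resends the whole context."""
    return context_tokens / 1_000_000 * price_per_mtok

# Resending a 200k-token context every turn vs. an 8k-token summary:
print(f"{turn_cost(200_000):.4f}")  # 0.5000
print(f"{turn_cost(8_000):.4f}")   # 0.0200
```

At these assumed numbers the full-context turn costs 25x the summarized one, which is why models that cannot retain information across interactions pay an ongoing 'amnesia tax'.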
The challenge of enabling true continual learning in AI systems is a major focus. While reinforcement learning (RL) can be applied to LLMs, it is typically conducted in separate training runs rather than through real-time, user-integrated learning. OpenAI's VP of Research, Jerry Tworek, expressed caution regarding real-time, user-driven continual learning for large-scale models like GPT.
Tworek explained that while it is theoretically possible to train models by directly responding to users and reinforcing desired behaviors, this approach carries significant risks. Without robust safeguards, models could learn undesirable or even harmful behaviors from user interactions. Consequently, OpenAI is not currently implementing this form of continual learning for its major products, emphasizing the need for advanced safety mechanisms before such a paradigm can be safely adopted for complex, large-scale AI systems.
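The contrast between separate training runs and real-time, user-integrated updates can be caricatured in a toy sketch. Nothing here resembles production training code; the 'model' is just a score table, and the point is only that an unvetted online signal lets any single user directly reinforce arbitrary behavior:

```python
# Toy contrast: offline RL on a curated batch vs. naive online updates
# driven directly by live user feedback. Everything here is illustrative.

LEARNING_RATE = 0.1
model_score = {"helpful": 0.0, "harmful": 0.0}  # stand-in for model behavior

def offline_update(batch):
    """Separate training run: rewards come from a reviewed, curated batch."""
    for behavior, reward in batch:
        model_score[behavior] += LEARNING_RATE * reward

def online_update(behavior, user_reward):
    """Real-time regime: each user signal shifts the model immediately,
    so without safeguards a user can reinforce harmful behavior."""
    model_score[behavior] += LEARNING_RATE * user_reward

offline_update([("helpful", 1.0), ("harmful", -1.0)])  # curated, vetted signal
online_update("harmful", 1.0)  # one adversarial user rewarding bad output
print(model_score)
```

In the offline pass the curated batch penalizes harmful behavior; the single online update from an adversarial user cancels that penalty, which is the risk the safeguards are meant to address.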
Emerging Capabilities in Video Generation
In a surprising development, even video generation models are showing advanced reasoning capabilities. Sora 2, for instance, has demonstrated the ability to answer benchmark-level questions, including complex mathematical and coding problems, within its video output. While its performance in these areas does not match specialized AI models, it suggests that video generators may be performing sophisticated on-the-fly physics calculations and reasoning, hinting at deeper underlying intelligence.
Source: Did you miss these 2 AI stories? A *Real* LLM-crafted Breakthrough + Continual Learning Blocked? (YouTube)