Week 16: Silent Leaps
Security Shocks, Soaring Valuations, and Systems Outgrowing Control
Another Monday, another post to keep you up to speed with the AI world.
Here's what happened in the global AI market this week.
AI found open cracks in major software, raising some serious security concerns. OpenAI closed a $122B funding round, but the story goes beyond the numbers. Benchmarks are no longer enough: AI is getting ahead of the performance tests humans built for it. And many more intriguing stories.
Here's everything you need to know before Monday gets the best of you.
Claude Found Thousands of Zero-Day Vulnerabilities. Anthropic Kept the Findings Among Trusted Partners
This wasn't a demo or a controlled experiment. Anthropic's Claude Mythos Preview autonomously discovered thousands of previously unknown security vulnerabilities, known as zero-days, across major operating systems and browsers. The scope was significant enough that Anthropic didn't publish the findings. Instead, it launched Project Glasswing: a coordinated disclosure effort with major tech companies to quietly fix the flaws before anyone else could find and exploit them.
The significance of what happened here is easy to underestimate. Finding zero-days has historically required elite human researchers combing through code for weeks, sometimes months. Claude did it at scale, autonomously, in a fraction of the time. That's not just a capability improvement; it changed what was thought possible in cybersecurity. And now that everyone knows flaws can be found at this scale, the danger of weaponisation has gone sky-high.
Anthropic made the right call by coordinating disclosure rather than sweeping things under the rug. But the conclusion is hard to ignore: once this capability exists in one place, it's only a matter of time before it is replicated. The security industry's entire threat model just changed.
Why it matters
Anthropic won this round. But the cat is out of the bag: every future AI researcher now knows this is possible.
OpenAI's $122B Round: The Numbers Are Real, But They're Not Everything
OpenAI closed a funding round that valued it at $852 billion, a number that would have seemed like a fairytale two years ago. But the structure of the deal is worth understanding before being blown away by mere numbers. Of the $122B raised, Amazon, Nvidia, and SoftBank account for $110B. Much of that isn't clean equity. It's vendor deals and contingent capital, meaning a considerable portion is tied to commercial arrangements rather than pure investment.
Then there's the side deal: OpenAI is reportedly negotiating a ~$10B private equity joint venture that comes with a guaranteed minimum return of 17.5%. That's an unusual structure for the most prominent AI company in the world. Guaranteed returns suggest the equity story alone wasn't enough to attract capital on standard terms, and investors needed more certainty.
Separately, OpenAI confirmed it expects its advertising business to generate $2.5 billion in revenue this year, with an estimate of $100 billion by 2030. The company is no longer just a model provider. It's building a media and ad business and looking to compete directly with big names like Google and Meta, both of whom it currently relies on for infrastructure and distribution.
Why it matters
The world's most valuable AI company is paying guaranteed returns to raise capital and building an ad business to sustain itself. That's not a startup dynamic anymore. It's the behavior of a scaled platform company that is in desperate need of revenue.
Our Tests Are No Longer Good Enough to Keep Up With AI
The Time Horizon suite, one of the most rigorous evaluation frameworks in existence, has hit its ceiling. Frontier models are saturating it, performing at or near the maximum score on almost all of its tasks, which makes it impossible to distinguish capability levels at the top end.
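The ceiling problem is partly just statistics. Here's a quick sketch, using made-up scores and task counts, of why two near-ceiling results can't be told apart: once accuracy approaches 100% on a few hundred tasks, sampling error swamps the gap between models.

```python
import math

def ci95(accuracy: float, n_tasks: int) -> tuple[float, float]:
    """95% confidence interval for a benchmark score, treating each task
    as an independent pass/fail trial (normal approximation)."""
    se = math.sqrt(accuracy * (1 - accuracy) / n_tasks)
    return (accuracy - 1.96 * se, accuracy + 1.96 * se)

# Two hypothetical frontier models on a 500-task benchmark near its ceiling.
model_a = ci95(0.97, 500)  # roughly (0.955, 0.985)
model_b = ci95(0.98, 500)  # roughly (0.968, 0.992)

# The intervals overlap, so the benchmark can no longer tell the two
# models apart with any statistical confidence.
overlap = model_a[1] > model_b[0]
print(overlap)  # True
```

The only fix is more headroom or more tasks, and both mean building new benchmarks, which is exactly the bottleneck described below.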
The problem isn't just that one benchmark is getting old. It's that building replacements is genuinely hard. New benchmarks require expensive human verification, careful task design, and time. A lot of time. And AI capabilities are advancing too fast for the tests to keep up. Researchers now warn that by mid-2027, no benchmark created in 2026 or earlier will reliably be able to rule out dangerous capabilities in frontier systems. You can't govern what you can't measure, and right now, measurement is falling behind.
This matters beyond AI safety circles. Regulators, enterprise buyers, and governments rely on benchmark scores to decide which AI systems are safe to deploy. If those scores can no longer differentiate between systems that are safe and systems that aren't, the entire evaluation infrastructure those decisions rest on stops working.
Why it matters
If we can't measure what AI can do, we can't control it. The infrastructure we use to govern AI is quietly and quickly becoming outdated, and nobody has a clean solution yet.
Drug Discovery, $10B in Revenue, and More in Anthropic's Historic Week
While Project Glasswing took the headlines, Anthropic had a week that would have made front-page news either way. First, the company acquired Coefficient Bio, a 10-person startup founded by ex-Genentech researchers, for $400M in stock. Coefficient Bio was using AI to accelerate drug discovery and biological research, and the team will now integrate into Anthropic's health and life sciences division. Drug discovery is now on Anthropic's agenda.
At the same time, the company signed a deal with Google and Broadcom for multiple gigawatts of next-gen TPU capacity starting in 2027. This goes beyond renting cloud compute. Anthropic is securing the physical hardware foundation needed to train and serve future generations of models at a scale that matches its ambitious plans.
And quietly, Anthropic hit $10 billion in annual revenue, reaching the milestone faster than Shopify, ServiceNow, or almost any software company in history. It has been less than four years since the company was founded. Despite fierce competition, Anthropic has been outpacing its rivals on all major fronts, quickly becoming a major player in the AI market, and it now has the revenue numbers to show for it.
Why it matters
Anthropic is no longer just a second option to ChatGPT. It's becoming a full-stack technology company with a healthcare arm, independent compute, and revenue that has to be taken seriously.
Respected Researchers Are Now Cutting Their AI Timeline Estimates in Half
Researcher Eli Lifland updated his probability estimates this week, placing nearly 2x higher odds on full AI R&D automation by the end of 2028.
The sudden shift isn't the result of any major breakthrough in creativity. It's driven by something boring but powerful: AI can now perform large-scale, specification-driven software engineering faster and cheaper than human engineers. Tasks that require straightforward execution. Refactoring, testing, debugging, implementing specified features. The kind of work that takes up a major chunk of a software team's time.
What makes this consequential is that a large portion of AI R&D itself consists of exactly this type of task. If AI can do that work faster and cheaper, it can multiply the rate of its own development. The loop begins: AI improves AI. And once that loop starts, humans can no longer predict how advanced AI will be two or three years out. Lifland's update isn't a fire alarm. But it's something worth keeping an eye on.
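The compounding logic can be sketched with a toy model. Nothing here reflects Lifland's actual methodology; the multiplier and targets are invented purely to show how a modest annual speed-up compresses timelines.

```python
def years_to_target(multiplier: float, base_rate: float = 1.0,
                    target: float = 10.0) -> int:
    """Toy model: each year, AI assistance multiplies the R&D rate achieved
    the previous year. Returns how many years until cumulative progress
    reaches `target`, measured in baseline human-years of R&D."""
    progress, rate, years = 0.0, base_rate, 0
    while progress < target:
        progress += rate
        rate *= multiplier   # the feedback loop: AI output speeds up AI R&D
        years += 1
    return years

# Without the loop, 10 baseline-years of progress takes 10 years.
print(years_to_target(1.0))  # 10
# With a modest 1.5x annual speed-up, the same progress takes half as long.
print(years_to_target(1.5))  # 5
```

The point isn't the specific numbers. It's that even a small, steady multiplier produces timeline compression that looks sudden from the outside.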
Why it matters
The "AI speeds up AI" dynamic is no longer theoretical. When respected forecasters shift their timelines by 2x, it's worth paying attention to what changed in their reasoning, not just the number.
Meta's Muse Spark Is the Most Competitive Model They've Ever Released
Meta launched Muse Spark: multimodal reasoning with visual chain-of-thought, tool use, and multi-agent orchestration. The early benchmarks are impressive. Impressive enough to put it in competition with the leading frontier models across multiple domains, not just the categories where Meta has always been big. The release is positioned as part of the company's push toward what Zuckerberg has been calling "personal superintelligence."
The open-source question is still there. Meta confirmed new models are coming and some will eventually carry an open license, but whether that includes Muse Spark itself or only other variants in the same family hasn't been confirmed. Zuckerberg has been vocal about open-source AI as a long-term differentiator, and Llama's track record has been consequential in shaping the open model ecosystem.
What's changed with this release is the capability profile. Previous Meta models were strong in specific areas. Muse Spark appears to be competitive across much more. Reasoning, vision, tool use, and orchestration, which puts it in a different ballpark than what Meta was in six months ago.
Why it matters
Meta has historically been the company most willing to open-source capable models. If Muse Spark-tier performance eventually ships under an open license, the accessible capability baseline for anyone building on open models moves up significantly.
Perplexity Has Grown 50% in Revenue and Also Gained Access to Your Bank Account
Perplexity reported 50% revenue growth after its strategic shift from AI-powered search to AI agents. The company's bet is paying off as more and more users now want AI that does things, not just finds things. The search option still exists, but the company's growth is now being driven by agent capabilities.
Alongside the revenue boost, Perplexity launched a personal finance dashboard powered by Plaid integration. Users can now link checking accounts, savings accounts, credit cards, and loans directly to Perplexity and receive AI-driven analysis across all of it. Spending breakdowns, liability tracking, net worth calculation and much more. Traditional personal finance apps show you data. Perplexity asks what you want to understand about it and tells you.
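The dashboard's core arithmetic is straightforward aggregation. As an illustrative sketch only, with invented account data and no claim about Perplexity's actual implementation, here's the kind of rollup a Plaid-style integration enables:

```python
# Account balances as they might arrive from an account aggregator, with
# liabilities (credit cards, loans) carried as negative balances.
accounts = [
    {"name": "checking",    "type": "depository", "balance": 3200.00},
    {"name": "savings",     "type": "depository", "balance": 12500.00},
    {"name": "credit card", "type": "credit",     "balance": -1850.00},
    {"name": "car loan",    "type": "loan",       "balance": -9400.00},
]

# Assets are positive balances; liabilities are the negatives, flipped.
assets = sum(a["balance"] for a in accounts if a["balance"] > 0)
liabilities = -sum(a["balance"] for a in accounts if a["balance"] < 0)
net_worth = assets - liabilities

print(f"assets: {assets:.2f}")            # assets: 15700.00
print(f"liabilities: {liabilities:.2f}")  # liabilities: 11250.00
print(f"net worth: {net_worth:.2f}")      # net worth: 4450.00
```

The hard part isn't this math. It's the trust required to hand over the linked accounts in the first place, which is the next paragraph's point.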
Perplexity is building a product that covers search, agents, and now personal finance all under the same interface. The underlying bet is that users will anchor their daily AI usage around Perplexity the way they anchored their daily searches around Google. The 50% revenue growth suggests people want the product enough to pay for it. The product experience is good enough that users are connecting sensitive financial data to it. That level of trust is not easy to earn and even harder to maintain at scale. But with the way Perplexity is moving, the next 12 months are going to be interesting.
Why it matters
Perplexity is quietly building an AI-native super-app. The personal finance integration shows how far beyond search the company is willing to go to capture habitual daily usage.
Anthropic's Updates Just Completely Changed How You Build With Claude
Anthropic just shipped three Claude updates, each targeting a very real problem.
Managed Agents separates the agent interface from the model layer, meaning systems built on Claude can evolve with new model versions without requiring architectural rebuilds. For anyone who has managed a production AI system through a major model update, the value of that stability is enormous. It removes a category of technical debt that has made long-term AI infrastructure planning very difficult.
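Anthropic hasn't published the internals of Managed Agents, but the general decoupling pattern is familiar. A minimal sketch, assuming nothing about the real API: agent logic written against a model-agnostic interface, so a model swap never touches the architecture.

```python
from typing import Protocol

class Model(Protocol):
    """The only contract the agent layer depends on."""
    def complete(self, prompt: str) -> str: ...

class Agent:
    """Agent logic written against the Model protocol, not a concrete
    model. Swapping in a newer model requires no change to this class."""
    def __init__(self, model: Model):
        self.model = model

    def run(self, task: str) -> str:
        return self.model.complete(f"Task: {task}")

# Invented stand-ins for two model versions; in production these would
# wrap real API calls.
class ModelV1:
    def complete(self, prompt: str) -> str:
        return f"v1 handled: {prompt}"

class ModelV2:
    def complete(self, prompt: str) -> str:
        return f"v2 handled: {prompt}"

agent = Agent(ModelV1())
print(agent.run("summarize report"))  # v1 handled: Task: summarize report
agent.model = ModelV2()               # model upgrade, no architectural change
print(agent.run("summarize report"))  # v2 handled: Task: summarize report
```

This is the category of technical debt the update removes: the upgrade becomes a one-line swap instead of a rebuild.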
Claude Cowork has been updated to enterprise readiness, adding role-based access controls, group spend limits, detailed usage analytics, and Zoom integration. Companies like Zapier and Airtree are already using it for project management and operational workflows. The enterprise layer matters because it's where Anthropic's $10B revenue is actually being generated, and this update just made that layer even stronger.
The Advisor API Tool is the most technically interesting of the three. It allows developers to configure Opus as a reasoning advisor while running Sonnet or Haiku as the executor for the actual work. You get Opus-level thinking on the important decisions, at Sonnet-level pricing on the execution. The fundamental tradeoff that has forced product teams to choose between quality and cost now has a structural solution. This will change product decisions at hundreds of companies.
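The advisor/executor split can be sketched generically. This is a hedged illustration of the pattern, not the real Advisor API; `opus_advise` and `haiku_execute` are invented stubs standing in for actual Claude calls.

```python
def opus_advise(task: str) -> list[str]:
    """Stand-in for the expensive reasoning model: decides the plan."""
    return [f"step 1 of {task}", f"step 2 of {task}"]

def haiku_execute(step: str) -> str:
    """Stand-in for the cheap executor model: does the actual work."""
    return f"done: {step}"

def run_task(task: str) -> list[str]:
    # Expensive thinking happens once, on the plan; the per-step work
    # runs on the cheaper model, which is where the token volume is.
    plan = opus_advise(task)
    return [haiku_execute(step) for step in plan]

print(run_task("migrate database"))
# ['done: step 1 of migrate database', 'done: step 2 of migrate database']
```

The cost win comes from the asymmetry: planning is a small fraction of total tokens, so paying premium rates for it barely moves the bill.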
Why it matters
The Advisor Tool alone could meaningfully reduce the per-task cost for AI product builders without sacrificing output quality. That's the kind of infrastructure update that compounds quietly across thousands of products.
And that wraps up this week. Tune in next Monday, same time for another deep-dive into the major AI news.
The Sentinel lands in your inbox every Monday so you can catch up with the fast-moving AI space while sipping your morning coffee. Every detail that matters, none that doesn't.