Xiaohu AI Daily — 2026-05-30

🌟 Today's Headline

Anthropic rolls out Claude Opus 4.8 with near-Mythos level alignment and 3x cheaper fast mode

Anthropic launched Claude Opus 4.8, a new flagship model that prioritizes reliability over raw performance. The model introduces a five-tier Thinking effort selector allowing users to balance computation and output quality. Opus 4.8 scores 88.6% on SWE-bench Verified and 74.6% on Terminal-Bench 2.1, outperforming GPT-5.5 and Gemini 3.1 Pro. Its defining feature is reduced likelihood of silently approving flawed code—four times lower than version 4.7—while actively flagging uncertainties and questioning unsupported assumptions. A new Fast Mode delivers 2.5x faster output with significantly lower API pricing: $10 per million input tokens and $50 per million output tokens. Dynamic Workflows in Claude Code enable single prompts to spawn multi-agent teams for complex tasks. This release signals a strategic shift in frontier model development from capability maximization to trust and alignment.
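
As a rough illustration of the Fast Mode pricing quoted above, the sketch below estimates what a single request might cost. Only the two per-million-token rates come from the item; the helper name and the example token counts are hypothetical.

```python
# Rates quoted in the item above (USD per one million tokens).
FAST_MODE_INPUT_USD_PER_MTOK = 10.0
FAST_MODE_OUTPUT_USD_PER_MTOK = 50.0


def estimate_fast_mode_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * FAST_MODE_INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * FAST_MODE_OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 200k tokens in, 20k tokens out.
    print(f"${estimate_fast_mode_cost(200_000, 20_000):.2f}")  # $3.00
```

Because output tokens are priced at five times the input rate, output-heavy workloads dominate the bill under these figures.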

💬 Editor's Note

Anthropic is playing a different game: betting on reliability over benchmark scores. Opus 4.8's refusal to silently pass bad code and admission of uncertainty matter more to real-world developers than marginal performance gains.

Product

Anthropic Raises $65B at $965B Valuation, Becomes World's Most Valuable Startup

10/10

Anthropic has completed a historic $65 billion funding round at a $965 billion valuation, making it the world's most valuable startup and officially surpassing OpenAI in market value. The funding was led by Greenoaks, Sequoia, Altimeter, and Dragoneer, with strategic new investors including semiconductor giants Samsung, Micron, and SK Hynix joining the round.

9 demos of Gemini Omni and Gemini 3.5 in action

9/10 New Product

Google showcases Gemini Omni and Gemini 3.5 through nine live demonstrations highlighting multimodal capabilities, including real-time video understanding, speech interaction, and cross-modal reasoning. The demos illustrate the models' practical applications across different use cases.

OpenAI gives GPT-5.5 Instant a readability upgrade while phasing out two older models

9/10 New Product

OpenAI updates GPT-5.5 Instant for more natural, human-like responses and discontinues the Canvas feature, moving writing and coding tasks directly into chat. The company also retires the older o3 and GPT-4.5 models from ChatGPT, streamlining the available model lineup.

Ollama v0.30.0

9/10 New Product

Ollama v0.30.0 restructures the underlying architecture to directly support llama.cpp instead of GGML, enabling full GGUF file format compatibility. MLX acceleration on Apple Silicon is integrated to improve inference performance on Mac devices.

After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M

9/10 News

Chipmaker Groq is looking to raise $650 million in internal funding as it pivots from hardware to focus more on AI inference, the process by which trained AI models generate responses to prompts, per Axios.

TeamCity 2026.1.1 Is Now Available

9/10 News

Today we’re rolling out the first bug-fix update for TeamCity On-Premises 2026.1 servers. This update addresses over 20 bugs and performance issues. See the TeamCity 2026.1.1 Release Notes for the complete list of resolved issues. Why update? Staying up to date with minor releases ensures yo…

🕐 ~9 min read · Industry 8/10

Meta Launches Tiered Subscription Model Across Apps with Paid AI Features

💡 Industry trends and analysis

Meta has officially launched tiered subscription services across Instagram, Facebook, WhatsApp, and Meta AI under a unified "Meta One" brand, marking a significant shift in the company's core business model. The offerings include Instagram Plus and Facebook Plus at $3.99/month, with customization features and enhanced analytics; WhatsApp Plus at $2.99/month for advanced functionality; and two Meta AI tiers, Meta One Plus ($7.99/month) and Premium ($19.99/month), with the Premium tier offering faster "thinking mode" responses for complex queries. Additional creator and business subscription options, with verification badges, expanded promotional tools, and analytics capabilities, are being tested. This diversification reflects Meta's escalating AI infrastructure costs: the company has committed up to $145 billion to AI in 2026 alone and needs revenue streams beyond advertising to fund those investments.

🕐 ~3 min read · Tutorial 7/10

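This skill looks good: it turns text, URLs, or articles directly into visual assets such as WeChat Official Account cover images, Xiaohongshu image cards, and tutorial step cards, with support for 28 layouts and 10 themes.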

💡 Can be adapted into tutorial material

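claude-design-card is a Skill designed for Chinese-language content creators. It converts text, URLs, or articles directly into publishable visual cards, such as WeChat Official Account cover images, Xiaohongshu image cards, and tutorial step cards, and supports 28 layouts and 10 themes. Its core value lies in automating the most tedious steps that follow writing an article: it extracts the key points, selects a layout, generates HTML, and screenshots the result to PNG, replacing the manual work previously done in tools such as Figma or Canva. The tool is open source and worth trying for creators who regularly produce this kind of content.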

🕐 ~3 min read · Tutorial 7/10

The team at @llama_index built an awesome template using LlamaParse and the new Managed Agents in th…

💡 Can be adapted into tutorial material

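The LlamaIndex team built a template on Google's newly released Agents API that gives agents access to LlamaParse and LiteParse so they can process unstructured documents automatically. The workflow is: configure Git repositories for data and output, clone the repositories into the agent sandbox, install the LiteParse CLI, the LlamaParse SDK, and the related skills, and finally drive the agent with a prompt so that it carries out the task autonomously. The result is an agent that can use LlamaParse and LiteParse directly to process real-world documents.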

New Product

Now you can use your OpenRouter models directly inside @ComfyUI workflows!

[Quoting @ComfyUI]: ComfyUI just added @OpenRouter support. You're no longer limited to a single large language model; you can now access 20+ models directly in Comfy. More flexibility, less friction, same workflow. Workflow link below 👇

Codex for managing the Codex UI:

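[Quoting @guinnesschen]: If you're tired of managing Codex threads, let Codex manage itself! Codex can now create threads, search them, organize them, pin the important ones, and spin up worktrees for parallel tasks.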

For every ChatGPT conversation that started as "one quick thing" and became a full-on saga: table of…

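For every ChatGPT conversation that started as "just one quick thing" and turned into a full-on saga: a table of contents feature is now available. It applies to conversations with 5 or more replies.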

Opinion

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Anthropic researchers demonstrate that sparse autoencoders can extract interpretable features from Claude 3 Sonnet at production scale, with up to 34 million features extracted from the model's residual stream. The breakthrough shows dictionary learning methods scale beyond small transformers.

First head-to-head comparison of agentic AI applied to the analysis of simulated data of the Einstein Telescope

Researchers conduct the first head-to-head benchmark comparing Claude Code (Anthropic) and Codex (OpenAI) on autonomous gravitational wave data analysis pipelines. Both agentic systems execute tasks without human intervention on shared infrastructure, revealing performance differences in complex scientific workflows.

Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension

Extension of Willis et al.'s evolutionary game theory benchmark (Iterated Prisoner's Dilemma) to newer frontier models, investigating whether larger, diverse LLMs retain the cooperative biases observed in ChatGPT-4o and Claude 3.5 Sonnet or exhibit different equilibrium behavior.
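
For readers unfamiliar with the Iterated Prisoner's Dilemma mentioned above, here is a minimal sketch of the game loop. It uses two scripted strategies in place of LLM agents and the conventional payoff values (T=5, R=3, P=1, S=0); both are assumptions for illustration and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# Conventional payoff values (T=5, R=3, P=1, S=0); assumed, not from the paper.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A strategy receives the opponent's past moves and returns "C" or "D".
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history: List[str]) -> str:
    """Defect on every round regardless of history."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a fixed number of rounds and return the two total scores."""
    history_a: List[str] = []
    history_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(history_b)
        move_b = b(history_a)
        payoff_a, payoff_b = PAYOFFS[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))    # (30, 30)
    print(play(tit_for_tat, always_defect))  # (9, 14)
```

In a benchmark of the kind described, each scripted strategy would presumably be replaced by a model choosing its move from the game history.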

Industry

UEFA and UC3 have announced Alibaba Group as global AI, Cloud Computing and E-Commerce Partner for U…

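Alibaba Cloud and Qwen become UEFA's official exclusive AI, cloud computing, and e-commerce partners. The partnership covers UEFA men's club competitions from the 2027/2028 season through the 2032/2033 season, as well as UEFA EURO 2028. Alibaba Group Chairman Joe Tsai said the company will contribute its cloud computing, full-stack AI, and global e-commerce platform capabilities to support competition operations.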

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration

Study analyzes ClinicalTrials.gov records to track temporal trends in AI terminology usage and geographical distribution of AI-driven clinical trials. Researchers employed GPT-5.5 combined with human review to systematically characterize human-AI interaction patterns in medical research.

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

Audit of MathCheck benchmark identifies 4 semantically flawed paraphrases (3.1% of test set), causing ranking volatility—GPT-4o drops from rank 2 to 4, while Claude Haiku and DeepSeek V3 rise above it. Cross-model consensus (≥3/4 models) detected errors automatically at minimal cost.
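
The consensus rule mentioned above (flag an item when at least 3 of 4 models agree) can be sketched as a simple vote count. The item ids, the verdicts, and the function name are hypothetical; the paper's actual protocol may differ.

```python
from typing import Dict, List


def flag_by_consensus(verdicts: Dict[str, List[bool]], threshold: int = 3) -> List[str]:
    """Return the ids of items that at least `threshold` models marked as flawed."""
    return [item_id for item_id, votes in verdicts.items() if sum(votes) >= threshold]


if __name__ == "__main__":
    # Hypothetical verdicts from four models (True = "this paraphrase changes the meaning").
    verdicts = {
        "item-001": [True, True, True, False],
        "item-002": [False, True, False, False],
        "item-003": [True, True, True, True],
    }
    print(flag_by_consensus(verdicts))  # ['item-001', 'item-003']
```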

Tutorial

How Braintrust turns customer requests into code with Codex

Braintrust engineers leverage Codex and GPT-5.5 to accelerate code generation for customer-facing features. The case study demonstrates how AI-assisted coding reduces development time and enables faster experimentation cycles for building new capabilities.

Cognition's Scott Wu says AI coding agents shouldn't replace humans

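Cognition developed Devin, billed as the first and most successful AI coding agent. Its founder Scott Wu, a renowned programmer, has stated clearly that the agent is not intended to replace human programmers.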

I had to test it myself to believe this unreal inference speed. 3,000 tokens/s for 1 user on standa…

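The Kog team achieved extremely high single-user inference speed on standard data center GPUs, reaching 3,000 tokens/s on 8× AMD MI300X GPUs and 2,100 tokens/s on 8× NVIDIA H200. Compared with typical inference speeds (about 100-300 tokens/s), this is a 10-30x improvement.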
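
A quick arithmetic check of the speedup claim, using only the throughput figures and the 100-300 tokens/s baseline range quoted above. The helper name is hypothetical.

```python
from typing import Tuple


def speedup_range(
    tokens_per_s: float,
    baseline_low: float = 100.0,
    baseline_high: float = 300.0,
) -> Tuple[float, float]:
    """Return (min, max) speedup relative to a baseline range in tokens/s."""
    return tokens_per_s / baseline_high, tokens_per_s / baseline_low


if __name__ == "__main__":
    for name, rate in [("8x AMD MI300X", 3000), ("8x NVIDIA H200", 2100)]:
        low, high = speedup_range(rate)
        print(f"{name}: {low:.0f}x to {high:.0f}x")
    # 8x AMD MI300X: 10x to 30x
    # 8x NVIDIA H200: 7x to 21x
```

The quoted 10-30x figure matches the MI300X result; by the same baseline, the H200 result works out to roughly 7-21x.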

📭 Skip Today

Auto-filtered. Here's why — so you know you're not missing out:

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
→ Single-source paper, low reader value
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
→ Single-source paper, low reader value
First head-to-head comparison of agentic AI applied to the analysis of simulated data of the Einstein Telescope
→ Single-source paper, low reader value
FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks
→ Single-source paper, low reader value
Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension
→ Single-source paper, low reader value
Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate
→ Single-source paper, low reader value
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions
→ Single-source paper, low reader value
CA-AC-MPC: CUDA-Accelerated Actor-Critic Model Predictive Control
→ Single-source paper, low reader value
