Xiaohu AI Daily — 2026-05-17

[AINews] Cerebras' $60B IPO: Slowly, then All at Once

9/10 News

Cerebras, a leading AI chip manufacturer, completed a $60 billion IPO, marking a major milestone for AI infrastructure companies. The valuation reflects strong investor confidence in specialized AI computing hardware as demand for model training and inference accelerates globally.

Ubuntu Embraces Local AI Instead of Cloud-First OS Integration

9/10 News

Ubuntu has outlined its AI strategy, describing it as a deliberate departure from industry trends towards cloud-centric, AI-first operating systems. Instead, the company says, Ubuntu will focus future releases on local intelligence, modular design, and strict user control. By Sergio De Simone

Microsoft Releases Aspire 13.3 with Major Deployment and Frontend Updates

9/10 News

Microsoft has released Aspire 13.3, introducing a new aspire destroy command for tearing down deployments across Azure, Kubernetes, and Compose. The release adds native Kubernetes deployment in preview, first-class JavaScript publishing for Next.js and Vite, browser log capture, and a default-enable…

Santa Clara County Sues Meta Over Alleged Scam Ads

9/10 News

Brandon Pho, reporting for San Jose Spotlight: The lawsuit filed Monday alleges that instead of cracking down on deceptive ads designed to trick users out of their money, Meta has hamstrung its own fraud prevention teams and helped fake companies bypass its filters to enable the tech powerhouse to e…

Zerostack: a Unix-inspired coding agent written in pure Rust

9/10 New Product

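Zerostack is a coding agent written in pure Rust and inspired by the Unix philosophy. Version 1.0.0 has been released and is available on crates.io, the Rust package registry. The release drew 115 points on Hacker News, reflecting strong developer interest.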

We're having way too much fun working through your feedback. (Please, keep it coming.) Keyboard sh…

9/10 New Product

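We're having way too much fun working through your feedback. (Please, keep it coming.) Keyboard shortcuts are now customizable. Set Codex up around the way you actually work, then adjust the shortcuts in settings instead of adapting to our defaults.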

🕐 ~3 min read · Tutorial 7/10

RLVR might be disproportionately bad at science

💡 Can be adapted into tutorial material

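RLVR (reinforcement learning with verifiable rewards) may be disproportionately bad at validating scientific theories. The verification loop for a scientific theory can run for decades or even centuries, and theories now regarded as superior have often made worse predictions. This tension exposes a fundamental conflict between a reinforcement learning paradigm built on short-term feedback and the long-horizon, complex nature of scientific inquiry. It also highlights a structural limitation of current AI methods on tasks with extremely long feedback cycles, such as scientific discovery.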

🕐 ~3 min read · Opinion 7/10

Researchers train AI model that hits near-full performance with just 12.5 percent of its experts

💡 Views and arguments worth studying

Researchers at Allen Institute for AI and UC Berkeley developed EMO, a mixture-of-experts (MoE) model that achieves near-full performance using only 12.5% of its experts. This efficiency breakthrough demonstrates significant potential for reducing computational costs in large-scale AI systems while maintaining model quality and inference speed.

🕐 ~3 min read · Industry 7/10

Wow, the more you watch this, the more it sinks in: humanoid robots really are gradually replacing some jobs! Figure's humanoid robot is now on day 4 of nonstop autonomous operations. F.03 is…

💡 Industry trends and analysis

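Figure's F.03 humanoid robot is on its fourth day of a nonstop autonomous operation test, working 24/7 in a real warehouse environment until it fails. The test is meant to gauge the robot's long-run endurance on tasks such as picking, carrying, and sorting, and to collect data on failures, maintenance needs, and safe-recovery mechanisms. It marks a shift for humanoid robots from the "can move" stage of demonstrating single actions to the "can work" stage, where sustained operation is what gets tested.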

🕐 ~3 min read · Tutorial 7/10

Interesting interpretability paper on tool-using agents. The authors probe hidden states and find t…

💡 Can be adapted into tutorial material

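This interpretability paper focuses on tool-using agents. By probing hidden states, the authors find that the model often recognizes that it should call a tool but then fails to make the call, with a mismatch rate of 26% to 54%. The problem sits entirely in the transition from cognition to action, not in the cognition itself. The internal probe direction is decodable, but a last-token mechanism in the late layers rotates the signal until it is nearly orthogonal to the action that gets produced. The work aims to predict which interventions will help, and argues that common explanations such as insufficient prompting or training may overlook late-layer geometry. That offers a plausible account of the performance ceiling seen in A/B tests of tool-use prompts.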

New Product

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

This month brought a surge of flagship open-source model releases: Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1. Each represents major capability improvements across different model families, providing developers and creators with powerful open alternatives to commercial offerings.

I don't know why we ship on Saturdays now, but here are a bunch of nice improvements for Codex. Di…

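We don't know why we ship on Saturdays now, but Codex has picked up a batch of improvements. They make it noticeably more pleasant to use, and it didn't seem right to hold them until Tuesday. Keyboard shortcuts are now customizable. Configure Codex around the way you actually work, then adjust the shortcuts in settings instead of putting up with the defaults.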

Another day0 collaboration, another community win. Thanks @vllm_project team for the always reliable…

Another Day 0 collaboration, another community win. Thanks to the @vllm_project team for their always-reliable support~ 🫡🫡

Opinion

Unsteady Metrics and Benchmarking Cultures of AI Model Builders

This research examines how AI model builders have shifted from peer-reviewed benchmarks to selectively highlighting results on company blog posts, transforming benchmark choice into marketing strategy. The paper questions whether current practices accurately represent true model capabilities.

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

Large-scale study analyzing Google AI Overviews deployment across 2 billion users. Researchers examine which sources AI chooses, claim fidelity, and the unprecedented editorial power concentration when one synthetic answer replaces traditional ranked search results.

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

This research reveals critical failures in current defenses against malicious fine-tuning when facing adaptive adversaries. Robustness claims collapse against attacks specifically designed to circumvent defenses, exposing fundamental gaps in open-weight model safety strategies.

Industry

AI made a tiny slice of Silicon Valley filthy rich and left the rest wondering why they bother

According to Menlo Ventures partner Deedy Das, approximately 10,000 Silicon Valley employees at Anthropic, OpenAI, xAI, Meta, and Nvidia have accumulated $20+ million fortunes from the AI boom. Meanwhile, middle management feels hollowed out and many others question whether they missed the opportunity.

Understanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptation

This qualitative research explores how international students in the U.S. use conversational AI like ChatGPT and Google Gemini to navigate overlapping cultural, academic, and psychological challenges. The study reveals AI's emerging role in supplementing fragmented university support systems.

Google says GEO and AEO are a myth and traditional SEO is all you need for AI search

Google has officially dismissed emerging SEO trends such as Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO), saying they are simply traditional SEO practices rebranded. The company's new documentation rejects common tactics such as llms.txt files and content chunking, and reaffirms that quality content and standard SEO fundamentals are what matter for ranking in AI search.

Tutorial

Folks, design and development have finally truly merged. @skirano (MagicPathAI CEO, formerly of Anthropic, Brex, Uber, and Facebook) just dropped a major demo: you can now take MagicPa…

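MagicPath AI CEO @skirano demonstrated a deep integration between his product and Codex. Users can now run MagicPath as a native canvas directly inside Codex and design UI by drag and drop, while Codex stays aware of the project in real time and generates and edits code automatically. Design and development connect seamlessly, with no switching between Figma and an IDE.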

Interesting!

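Eric Jang spent the past few months implementing AlphaGo from scratch. AlphaGo was the 2016 AI breakthrough that inspired him to go into deep learning. He initially understood it as "a search-augmented deep neural network trained through self-play," but building it by hand gave him a deeper understanding.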

Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques

This paper addresses reliable agent performance evaluation in multiagent environments with limited samples or high costs. It identifies pathologies in variance reduction techniques and proposes improvements to the AIVAT family of methods for unbiased performance estimation.

📭Skip Today

Auto-filtered. Here's why — so you know you're not missing out:

Unsteady Metrics and Benchmarking Cultures of AI Model Builders
→ Single-source paper, low reader value
Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques
→ Single-source paper, low reader value
Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact
→ Single-source paper, low reader value
Towards Fine-Grained and Verifiable Concept Bottleneck Models
→ Single-source paper, low reader value
Wavelet-Based Observables for Koopman Analysis: An Extended Dynamic Mode Decomposition Framework
→ Single-source paper, low reader value
One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries
→ Single-source paper, low reader value
Action-Inspired Generative Models
→ Single-source paper, low reader value
Widening the Gap: Exploiting LLM Quantization via Outlier Injection
→ Single-source paper, low reader value
