2026-05-17 · Sun generated 10:18:16
Sources: 178
Items: 446
Score 8+: 12
Clusters: 3
🌟 Today's Headline
New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously
Researchers at Carnegie Mellon developed a benchmark measuring AI agents' ability to autonomously exploit real vulnerabilities in Google's V8 engine. Claude Mythos significantly outperforms GPT-5.5 but costs twelve times more, raising important questions about cost-efficiency tradeoffs in AI safety research.
🔥 Today's Highlights
9/10 News
Cerebras, a leading AI chip manufacturer, completed a $60 billion IPO, marking a major milestone for AI infrastructure companies. The valuation reflects strong investor confidence in specialized AI computing hardware as demand for model training and inference accelerates globally.
9/10 News
Ubuntu has outlined its AI strategy, describing it as a deliberate departure from industry trends towards cloud-centric, AI-first operating systems. Instead, the company says, Ubuntu will focus future releases on local intelligence, modular design, and strict user control. By Sergio De Simone
9/10 News
Microsoft has released Aspire 13.3, introducing a new aspire destroy command for tearing down deployments across Azure, Kubernetes, and Compose. The release adds native Kubernetes deployment in preview, first-class JavaScript publishing for Next.js and Vite, browser log capture, and a default-enable
9/10 News
Brandon Pho, reporting for San Jose Spotlight: The lawsuit filed Monday alleges that instead of cracking down on deceptive ads designed to trick users out of their money, Meta has hamstrung its own fraud prevention teams and helped fake companies bypass its filters to enable the tech powerhouse to e
9/10 New Product
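Zerostack is a coding agent tool written in pure Rust and inspired by the Unix philosophy. Version 1.0.0 has been officially released and is available on crates.io, the Rust package registry. The release drew 115 points on Hacker News, reflecting strong developer interest.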
9/10 New Product
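Working through everyone's feedback has been too much fun. (Please keep it coming.) Keyboard shortcuts are now customizable. Set up Codex around how you actually work, then adjust shortcuts in settings instead of adapting to our defaults.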
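📊 Topic Clusters
📌 AI Agent Systems
Agent framework architecture, evaluation benchmarks, multi-agent collaboration, memory management, and workflow orchestration; the hottest topic area.
📌 LLM Inference Optimization
Efficiency gains from inference acceleration, quantization, distillation, KV cache compression, and token optimization, in response to latency and cost pressure.
📌 Visual Generation and Multimodality
Breakthroughs on visual tasks including video generation, 3D generation and editing, multimodal models, and medical image reconstruction.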
📖 Worth a Deep Read
🕐 ~3 min read · Tutorial 7/10
RLVR might be disproportionately bad at science
💡 Can be adapted into tutorial material
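RLVR (reinforcement learning with verifiable rewards) may be disproportionately weak at validating scientific theories. The verification loop for a scientific theory can span decades or even centuries, and theories currently regarded as better often in fact make worse predictions. This contradiction exposes a fundamental conflict between a reinforcement learning paradigm built on short-term feedback and the long-horizon, complex nature of scientific inquiry. It also highlights a structural limitation of current AI methods on tasks with extremely long feedback cycles, such as scientific discovery.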
🕐 ~3 min read · Opinion 7/10
Researchers train AI model that hits near-full performance with just 12.5 percent of its experts
💡 Views and arguments worth studying
Researchers at Allen Institute for AI and UC Berkeley developed EMO, a mixture-of-experts (MoE) model that achieves near-full performance using only 12.5% of its experts. This efficiency breakthrough demonstrates significant potential for reducing computational costs in large-scale AI systems while maintaining model quality and inference speed.
🕐 ~3 min read · Industry 7/10
Wow, the more I look at this, the more it gets to me! Humanoid robots really are gradually replacing certain jobs! Figure's humanoid robot is now on day 4 of nonstop autonomous operations. F.03 is…
💡 Industry trends and analysis
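Figure's F.03 humanoid robot has entered the fourth day of a nonstop autonomous operation test, working 24/7 in a real warehouse environment until a failure occurs. The test is meant to assess the robot's long-term endurance on tasks such as picking, carrying, and sorting, and to collect data on failures, maintenance needs, and safe-recovery mechanisms. It marks a shift for humanoid robots from the "can move" stage of demonstrating single actions to the practical "can work" stage, where sustained operation is what gets tested.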
🕐 ~3 min read · Tutorial 7/10
Interesting interpretability paper on tool-using agents. The authors probe hidden states and find t…
💡 Can be adapted into tutorial material
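This interpretability paper focuses on tool-using agents. By probing hidden states, the authors find that the model often recognizes that it should call a tool but then fails to call it, with a mismatch rate of 26% to 54%. The problem sits entirely in the transition from cognition to action, not in cognition itself. The internal probe direction is decodable, but a last-token mechanism in the late layers rotates the signal until it is nearly orthogonal to the action produced. The work aims to predict the effect of interventions and notes that common attributions, such as insufficient prompting or training, may overlook late-layer geometry. That offers a plausible explanation for the performance ceiling seen in A/B tests of tool-use prompts.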
🕐 ~3 min read · Industry 7/10
Large-scale layoffs begin to hit AI-related jobs in the US
💡 Industry trends and analysis
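AI-related jobs in the United States are seeing large-scale layoffs. According to Bloomberg, positions affected by AI have begun to experience severe job losses. The trend suggests that AI's impact on the labor market has moved from theoretical discussion into reality; specific layoff figures and the industries involved are still emerging.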
📂 Browse by Category
New Product
This month brought a surge of flagship open-source model releases: Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1. Each represents major capability improvements across different model families, providing developers and creators with powerful open alternatives to commercial offerings.
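For some reason we chose to ship updates on a Saturday, but Codex did get a series of improvements. They make it so much more pleasant to use that it would have been wrong to hold them until Tuesday. Keyboard shortcuts are now customizable. Configure Codex around how you actually work and adjust shortcuts in settings instead of putting up with the defaults.
Another Day 0 collaboration, another community win. Thanks to the @vllm_project team for their consistently reliable support~ 🫡🫡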
Opinion
This research examines how AI model builders have shifted from peer-reviewed benchmarks to selectively highlighting results on company blog posts, transforming benchmark choice into marketing strategy. The paper questions whether current practices accurately represent true model capabilities.
Large-scale study analyzing Google AI Overviews deployment across 2 billion users. Researchers examine which sources AI chooses, claim fidelity, and the unprecedented editorial power concentration when one synthetic answer replaces traditional ranked search results.
This research reveals critical failures in current defenses against malicious fine-tuning when facing adaptive adversaries. Robustness claims collapse against attacks specifically designed to circumvent defenses, exposing fundamental gaps in open-weight model safety strategies.
Industry
According to Menlo Ventures partner Deedy Das, approximately 10,000 Silicon Valley employees at Anthropic, OpenAI, xAI, Meta, and Nvidia have accumulated $20+ million fortunes from the AI boom. Meanwhile, middle management feels hollowed out and many others question whether they missed the opportunity.
This qualitative research explores how international students in the U.S. use conversational AI like ChatGPT and Google Gemini to navigate overlapping cultural, academic, and psychological challenges. The study reveals AI's emerging role in supplementing fragmented university support systems.
Google officially debunked emerging SEO trends such as Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO), clarifying that they are simply traditional SEO practices rebranded. The company's new documentation dismisses common tactics such as llms.txt files and content chunking, reaffirming that quality content and standard SEO fundamentals remain essential for AI search ranking.
Tutorial
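MagicPath AI CEO @skirano demonstrated a deep integration between the product and Codex. Users can now run MagicPath directly inside Codex as a native canvas and design UI by drag and drop. Codex perceives the project in real time and automatically generates and edits code, connecting design and development without switching between Figma and an IDE.
Eric Jang has spent the past few months implementing AlphaGo from scratch, the 2016 AI breakthrough that inspired him to get into deep learning. He initially understood AlphaGo as "a search-augmented deep neural network trained through self-play," but building it by hand gave him a deeper understanding.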
This paper addresses reliable agent performance evaluation in multiagent environments with limited samples or high costs. It identifies pathologies in variance reduction techniques and proposes improvements to the AIVAT family of methods for unbiased performance estimation.
📭 Skip Today

Auto-filtered. Here's why — so you know you're not missing out:

📎 Long Tail (165)
Warelay -> OpenClaw 5
Quoting Julia Evans 5
Monitoring Data-aware Temporal Properties (Extended Version) 5
Fast Rates for Inverse Reinforcement Learning 5
Conditional Attribute Estimation with Autoregressive Sequence Models 5
ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation 5
Modeling Bounded Rationality in Drug Shortage Pharmacists Using Attention-Guided Dynamic Decomposition 5
Precise Verification of Transformers through ReLU-Catalyzed Abstraction Refinement 5
Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems 5
CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation 5
Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces 5
Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning 5
How Sensitive Are Radiomic AI Models to Acquisition Parameters? 5
Identifying Culprits Through Deep Deterministic Policy Gradient Deep Learning Investigation 5
Interestingness as an Inductive Heuristic for Future Compression Progress 5
A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions 5
COREKG: Coreset-Guided Personalized Summarization of Knowledge Graphs 5
Learning Developmental Scaffoldings to Guide Self-Organisation 5
SparseOIT: Improving Order-Independent Transparency 3DGS via Active Set Method 5
ARES-LSHADE: Autoresearch-Enhanced LSHADE with Memetic Polish for the GNBG Benchmark 5
Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention 5
A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study 5
AIS: Adaptive Importance Sampling for Quantized RL 5
A Regret Perspective on Online Multiple Testing 5
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors 5
CineMesh4D: Personalized 4D Whole Heart Reconstruction from Sparse Cine MRI 5
Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signal 5
R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning 5
Bridging the Rural Healthcare Gap: A Cascaded Edge-Cloud Architecture for Automated Retinal Screening 5
ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows 5
LLM-Based Robustness Testing of Microservice Applications: An Empirical Study 5
AudioMosaic: Contrastive Masked Audio Representation Learning 5
AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction 5
RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression 5
LoMETab: Beyond Rank-1 Ensembles for Tabular Deep Learning 5
Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games 5
MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse 5
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization 5
Asymmetric Generative Recommendation via Multi-Expert Projection and Multi-Faceted Hierarchical Quantization 5
PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media 5
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation 5
An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization 5
How to Evaluate and Refine your CAM 5
Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications 5
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization 5
Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke 5
TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability 5
Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions 5
Compositional Sparsity as an Inductive Bias for Neural Architecture Design 5
Beyond What to Select: A Plug-and-play Oscillatory Data-Volume Scheduling for Efficient Model Training 5
XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference 5
IFPV: An Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification 5
Exploitation of Hidden Context in Dynamic Movement Forecasting: A Neural Network Journey from Recurrent to Graph Neural Networks and General Purpose Transformers 5
Towards In-Depth Root Cause Localization for Microservices with Multi-Agent Recursion-of-Thought 5
Your CLIP has 164 dimensions of noise: Exploring the embeddings covariance eigenspectrum of contrastively pretrained vision-language transformers 5
Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models 5
Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations 5
Quantitative Video World Model Evaluation for Geometric-Consistency 5
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning 5
A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection 5
From User Preferences to Base Score Extraction Functions in Gradual Argumentation (with Appendix) 5
Personalized Digital Health Modeling with Adaptive Support Users 5
What Do EEG Foundation Models Capture from Human Brain Signals? 5
Sequential Resource Trading Using Comparison-Based Gradient Estimation 5
Safe Bayesian Optimization for Complex Control Systems via Additive Gaussian Processes 5
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning 5
Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation 5
Dual Ascent Diffusion for Inverse Problems 5
Distributions as Actions: A Unified Framework for Diverse Action Spaces 5
AVEX: What Matters for Animal Vocalization Encoding 5
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI 5
A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks 5
From Ranking to Reasoning: Explainable Web API Recommendation via Semantic Reasoning 5
Descriptor: Distance-Annotated Traffic Perception Question Answering (DTPQA) 5
AI-Driven Optimization under Uncertainty for Mineral Processing Operations 5
VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing 5
Proximal Action Replacement for Behavior Cloning Actor-Critic in Offline Reinforcement Learning 5
Krause Synchronization Transformers 5
ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning 5
Numerical exploration of the range of shape functionals using neural networks 5
Artificial Intelligence Specialization in the European Union: Underexplored Role of the Periphery at NUTS-3 Level 5
Grokking Finite-Dimensional Algebra 5
Gradient Iterated Temporal-Difference Learning 5
Neural Field Thermal Tomography: A Differentiable Physics Framework for Non-Destructive Evaluation 5
PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos 5
Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory 5
Explainable Detection of Depression Status Shifts from User Digital Traces 5
GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives 5
IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation 5
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs 5
Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents 5
Unpredictability dissociates from structured control in language agents 5
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning 5
fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding 5
Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks 5
CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications 5
Residual Stream Duality in Modern Transformer Architectures 5
UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration 5
Evaluating Adaptive Personalization of Educational Readings with Simulated Learners 5
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising 5
Evolutionary Ensemble of Agents 5
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts 5
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale 5
Scaling few-shot spoken word classification with generative meta-continual learning 5
Announcing Genkit Middleware: Intercept, extend, and harden your agentic apps 5
Accelerating on-device AI: A look at Arm and Google AI Edge optimization 5
Build Long-running AI agents that pause, resume, and never lose context with ADK 5
Troubleshoot performance issues faster with the new Grafana Assistant integration for Database Observability 5
Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning 4
On Strong Equivalence Notions in Logic Programming and Abstract Argumentation 4
Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity 4
Parallelizing Counterfactual Regret Minimization 4
Learning Scenario Reduction for Two-Stage Robust Optimization with Discrete Uncertainty 4
BiFedKD: Bidirectional Federated Knowledge Distillation Framework for Non-IID and Long-Tailed ECG Monitoring 4
Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling 4
AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification 4
Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability 4
Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games 4
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients 4
Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry 4
Analog RF Computing: A New Paradigm for Energy-Efficient Edge AI Over MU-MIMO Systems 4
Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization 4
Optimal Pattern Detection Tree for Symbolic Rule-Based Classification 4
Energy-Efficient Quadruped Locomotion with Compliant Feet 4
A plug-and-play generative framework for multi-satellite precipitation estimation 4
Quantifying Cyber-Vulnerability in Power Electronics Systems via an Impedance-Based Attack Reachable Domain 4
Vision-Based Water Level and Flow Estimation 4
Spontaneous symmetry breaking and Goldstone modes for deep information propagation 4
Addressing Terminal Constraints in Data-Driven Demand Response Scheduling 4
REALM: Retrospective Encoder Alignment for LFP Modeling 4
SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition 4
Not All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communication 4
MicroscopyMatching: Towards a Ready-to-use Framework for Microscopy Image Analysis in Diverse Conditions 4
Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition 4
Predicting Response to Neoadjuvant Chemotherapy in Ovarian Cancer from CT Baseline Using Multi-Loss Deep Learning 4
Generalized Priority-Aware Shapley Value 4
Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction 4
Logging Policy Design for Off-Policy Evaluation 4
Evidential Reasoning Advances Interpretable Real-World Disease Screening 4
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry 4
How I use LLMs as a staff engineer in 2026 4
EP215: The Anatomy of an AI Agent 4
Sony tries to explain that its AI Camera Assistant doesn’t suck 4
Some Asexuals Are Using AI Companions for Intimacy Without the Sex 4
Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination 4
The Multi-View Paradigm Shift in MRI Radiomics: Predicting MGMT Methylation in Glioblastoma 4
Quotient-Space Diffusion Models 4
Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors 4
Geometrically Constrained Stenosis Editing in Coronary Angiography via Entropic Optimal Transport 4
MLGIB: Multi-Label Graph Information Bottleneck for Expressive and Robust Message Passing 4
Constitutional Governance in Metric Spaces 4
AI-assisted testing, extensions updates, and more: k6 2.0 is here 4
Deciphering Neural Reparameterized Full-Waveform Inversion with Neural Sensitivity Kernel and Wave Tangent Kernel 3
Agreement, Diversity, and Polarization Indices for Approval Elections 3
Pluralistic: Making sense of Trump's unscheduled sudden midair disassembly of the American empire (16 May 2026) 3
Reading List 05/16/26 3
SQLAlchemy 2 In Practice - Chapter 8: SQLAlchemy and the Web 3
The Tomy Tutor and the state of 1983 home computers 3
A nicer voltmeter clock 3
ENSEMBITS: an alphabet of protein conformational ensembles 3
Superlinear Returns 3
How to Do Great Work 3
How to Get New Ideas 3
Eliminate noisy log lines with Adaptive Logs drop rules 3
Reddit Is Blocking Some Users From Accessing Its Website From Mobile Devices 2