Xiaohu AI Daily — 2026-05-02

DAILY DIGEST

2026-05-02

Sat · 10:22:14 generated

Sources

135

Items

437

Score 8+

Clusters

🌟 Today's Headline

GPT-5.5 matches Claude Mythos in cyber attack tests, UK AI Security Institute finds

OpenAI's GPT-5.5 became the second AI model to autonomously solve a full network attack simulation, according to the UK AI Security Institute. Its red-teaming performance nearly matches Claude Mythos, though Claude Mythos remains unavailable to the general public.

🔥 Today's Highlights

Google DeepMind's "AI co-clinician" beats GPT-5.4 in blind doctor tests but still trails experienced physicians

9/10 New Product

Google DeepMind developed an AI co-clinician system that outperformed GPT-5.4 in blind physician evaluations. The system shows promise in clinical simulations but still falls short of experienced physicians, which illustrates both AI's potential and its current limits in healthcare.

Beyond the Mean: Within-Model Reliable Change Detection for LLM Evaluation

9/10 Tutorial

Researchers adapted the Reliable Change Index from clinical psychology to detect statistically significant LLM version differences on MMLU-Pro benchmarks. Testing Llama 3→3.1 and Qwen 2.5→3 transitions, they found most items showed no reliable change, highlighting measurement reliability challenges.
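For readers unfamiliar with the Reliable Change Index, here is a minimal sketch of the classic Jacobson-Truax formulation from clinical psychology: the score difference divided by the standard error of the difference, with |RCI| > 1.96 commonly read as reliable change at the 95% level. How the paper adapts this to MMLU-Pro items is not described in this digest, so the function names, the per-item framing, and the example numbers below are illustrative assumptions rather than the authors' method.

```python
import math


def reliable_change_index(score_before, score_after, sd, reliability):
    """Classic Jacobson-Truax RCI (not necessarily the paper's variant).

    sd          : standard deviation of scores in a reference sample
    reliability : test-retest reliability, expected in [0, 1)
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    # Standard error of measurement for a single observation.
    se_measurement = sd * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two observations.
    s_diff = math.sqrt(2.0) * se_measurement
    return (score_after - score_before) / s_diff


def is_reliable_change(rci, threshold=1.96):
    """True when the change exceeds the conventional 95% threshold."""
    return abs(rci) > threshold


# Hypothetical numbers: accuracy on one item across repeated runs,
# measured for an old and a new model version.
rci = reliable_change_index(0.50, 0.70, sd=0.10, reliability=0.5)
print(round(rci, 3), is_reliable_change(rci))
```

A small mean improvement can fall below the threshold when run-to-run noise is high, which is one way most items could show no reliable change.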

Big tech's AI spending balloons to $725 billion this year

9/10 Industry

According to the Financial Times, Google, Amazon, Microsoft, and Meta have a combined AI budget of approximately $725 billion for 2026, covering infrastructure, chips, and data centers. This reflects the escalating commitment of big tech companies to AI infrastructure development.

rust-v0.129.0-alpha.3

9/10 New Product

Alpha pre-release rust-v0.129.0-alpha.3 was published with bug fixes and improvements. The rust- prefix appears to mark a Rust-based component of a project rather than the Rust language itself, whose releases are numbered 1.x.

0.129.0-alpha.2

9/10 New Product

Alpha pre-release 0.129.0-alpha.2 was published with updates and improvements. The version number does not match the Rust language's 1.x release series, so it likely belongs to a Rust-based project rather than the language itself.

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

9/10 Tutorial

Researchers presented an agentic framework that constructs knowledge graphs from AI policy documents to support compliance reasoning. The system demonstrates how structured knowledge representation can enhance policy-based reasoning for AI governance and safety compliance.

📖 Worth a Deep Read

🕐 ~3 min read · Tutorial 9/10

Knowledge Affordances for Hybrid Human-AI Information Seeking

💡 Can be adapted into tutorial material

The paper introduces the concept of knowledge affordance to systematize how humans and AI agents identify information-seeking opportunities in hybrid environments. The framework helps agents determine when to query humans versus AI systems, improving collaboration efficiency.

🕐 ~3 min read · Tutorial 9/10

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

💡 Can be adapted into tutorial material

Claw-Eval-Live is a live benchmark for evaluating workflow agents on evolving real-world tasks. Unlike static benchmarks, it separates signal and grading layers, supporting continuous updates and execution verification across software tools and business services.

🕐 ~3 min read · Industry 9/10

AI Models for Depressive Disorder Detection and Diagnosis: A Review

💡 Industry trends and analysis

A comprehensive survey of 55 key studies on AI methods for depression detection and diagnosis. The review examines how machine learning and AI can develop objective, scalable diagnostic tools to complement subjective clinical assessments for Major Depressive Disorder.

🕐 ~3 min read · Tutorial 9/10

Transformer-Empowered Actor-Critic Reinforcement Learning for Sequence-Aware Service Function Chain Partitioning

💡 Can be adapted into tutorial material

The paper proposes a Transformer-based actor-critic reinforcement learning approach for managing virtualized network functions in 6G networks. The method targets ultra-low latency and high bandwidth requirements through improved service function chain partitioning.

🕐 ~3 min read · Opinion 9/10

NanoKnow: How to Know What Your Language Model Knows

💡 Views and arguments worth studying

Using the open nanochat LLM family with fully transparent pre-training data, researchers investigate how LLMs encode and retrieve knowledge from training. The study reveals parametric knowledge sources and mechanisms, advancing understanding of language model internals.

📂 Browse by Category

New Product

Claude Jupiter Model Leaks Ahead of Anthropic Developer Conference

A new Claude model labeled claude-jupiter-v1-p has surfaced in red-team testing ahead of Anthropic's Code with Claude developer conference on May 6, 2026. TestingCatalog discovered the model undergoing internal security testing, typically indicating a release candidate.

Google Photos AI Virtual Closet and Try-On Feature

Google Photos has announced a new AI-powered feature that transforms clothing photos into a digital wardrobe for virtual outfit planning and try-on. The system uses advanced computer vision to identify and extract clothing items from photos, automatically adding them to a virtual closet. Users can then experiment with different outfit combinations digitally before wearing them in real life.

Anthropic launches Claude Security to give defenders the same AI edge attackers already have

Anthropic launched Claude Security, which gives cybersecurity defenders AI-powered threat detection and response capabilities. The tool repurposes offensive AI capabilities, previously restricted in other models, for defensive security operations.

Opinion

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Researchers audited five frontier vision-language models (Gemini 2.5 Pro, GPT-5, o3, GLM-4.5V, Qwen 2.5 VL) on medical visual question answering. The study revealed critical failures in anatomical localization across all models, raising significant safety concerns for clinical deployment.

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

A comprehensive review examining security risks in autonomous agent frameworks built on LLMs. The paper analyzes attack surfaces beyond prompt injection, including tool integration, continuous operation, and system-level vulnerabilities as agents become increasingly complex.

Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation

This study evaluates three small language models (EuroLLM, Aya Expanse, Gemma) on preserving fine-grained emotions during machine translation. Using the GoEmotions dataset with 28 emotion categories, the research reveals challenges in maintaining emotional fidelity alongside semantic accuracy.

Industry

Upskilling with Generative AI: Practices and Challenges for Freelance Knowledge Workers

This study examines how freelance knowledge workers leverage generative AI tools like ChatGPT to acquire new skills in competitive online labor markets. Unlike traditional employees with organizational training infrastructure, freelancers lack formal mentorship. The research explores how AI-powered learning tools reshape emerging skill demands and provide on-demand support for career advancement.

Self-Driving Cars Creating Emergency Response Challenges

Emergency first responders have formally notified federal regulators that autonomous vehicles, particularly Waymo, are creating operational challenges. Self-driving cars have repeatedly frozen during normal operations and sometimes blocked access to fire stations, delaying emergency response.

AI Accounted for 75% of U.S. GDP Growth in Q1 2026

Analysis of first-quarter 2026 U.S. GDP shows that AI-related investment and economic activity accounted for approximately 75% of total economic growth. This figure underscores AI's dominant role in the broader U.S. economy, driven by massive capital investments in infrastructure, model development, and deployment across industries.

Tech

OpenAI Achieves 10-Gigawatt Compute Target Ahead of Schedule

OpenAI has reached its 10-gigawatt computational capacity target ahead of schedule, a major infrastructure milestone. This massive computing power provides the foundation needed to train and deploy advanced AI models at scale. The accelerated timeline indicates OpenAI's infrastructure buildout is progressing faster than originally planned, potentially enabling faster model development cycles.

Tutorial

OpenAI o1 System Card

OpenAI released the system card for the o1 model series, detailing large-scale reinforcement learning training for chain-of-thought reasoning. The document highlights advances in safety and robustness, including deliberative alignment for handling unsafe prompts.

Proactive Dialogue Model with Intent Prediction

The paper introduces a lightweight intent-transition prior derived from temporal Bayesian networks to enable proactive dialogue prediction. Injecting this prior into the system prompt lets the model anticipate user intents, reducing redundant interactions in multi-intent conversations.
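To make the idea of an intent-transition prior concrete, the sketch below estimates next-intent probabilities from logged sessions and renders them as text that could be placed in a system prompt. It swaps in a simple first-order transition count with additive smoothing in place of the temporal Bayesian network the paper uses; the function names, intent labels, and prompt wording are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transition_prior(sessions, smoothing=1.0):
    """Estimate P(next intent | current intent) from intent sequences.

    First-order simplification with additive smoothing; smoothing must
    be positive so intents with no observed successors stay well defined.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    intents = sorted({intent for session in sessions for intent in session})
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each adjacent (current, next) pair within a session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    prior = {}
    for current in intents:
        total = sum(counts[current].values()) + smoothing * len(intents)
        prior[current] = {
            nxt: (counts[current][nxt] + smoothing) / total for nxt in intents
        }
    return prior


def render_prior_for_prompt(prior, current_intent, top_k=2):
    """Format the top-k likely next intents as plain text for a prompt."""
    ranked = sorted(
        prior[current_intent].items(), key=lambda kv: (-kv[1], kv[0])
    )[:top_k]
    lines = [f"- {intent}: {prob:.2f}" for intent, prob in ranked]
    header = f"Likely next user intents after '{current_intent}':"
    return header + "\n" + "\n".join(lines)


# Hypothetical logged sessions, one list of intents per conversation.
sessions = [
    ["book_flight", "book_hotel"],
    ["book_flight", "book_hotel"],
    ["book_flight", "rent_car"],
]
prior = estimate_transition_prior(sessions)
print(render_prior_for_prompt(prior, "book_flight"))
```

The rendered text would be appended to the system prompt so the model can offer the likely follow-up before the user asks for it.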

JaiTTS: A Thai Voice Cloning Model

JaiTTS-v1.0 is a state-of-the-art Thai voice cloning TTS model trained on a large Thai speech corpus. Based on the VoxCPM architecture, it handles numerals and Thai-English code-switching directly, without explicit text normalization, enabling high-quality speech generation in realistic settings.

📎 Long Tail (184)

Learning Rate Transfer in Normalized Transformers 5

Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval 5

AI will create jobs 5

AI uses less water than the public thinks 5

How Meta Is Strengthening End-to-End Encrypted Backups 5

The Federal Data Paradox: Rich in Data, Poor in Access 5

Unpacking Vibe Coding: Help-Seeking Processes in Student-AI Interactions While Programming 5

Evaluating TabPFN for Mild Cognitive Impairment to Alzheimer's Disease Conversion in Data Limited Settings 5

When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis 5

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms 5

The Two Boundaries: Why Behavioral AI Governance Fails Structurally 5