🌟 Today's Headline
OpenAI launches GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
OpenAI released three production-ready realtime voice models, marking a major leap in voice agent capability. GPT-Realtime-2 brings GPT-5-level reasoning to live speech, scoring 96.6% on Big Bench Audio versus 81.4% for its predecessor, a 15.2-point jump. Key features include simultaneous multi-tool execution, the ability to think while speaking, a 128K-token context window (a 4x expansion), adjustable reasoning levels (from minimal to xhigh), better retention of specialized terminology, graceful error handling, and audible task notifications. GPT-Realtime-Translate offers real-time interpretation across 70+ languages, and GPT-Realtime-Whisper provides streaming transcription. Early customers, including Zillow (real estate), Priceline (travel bookings), and Deutsche Telekom (customer support), are already deploying the models. The release signals an industry shift from turn-based to continuous voice interaction and positions audio as the primary interface for next-generation AI agents.
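💬 Editor's Comment
The real story is less the benchmark gains than the turning point at which voice AI becomes a practical work tool. The 128K context and simultaneous multi-tool calls mark the move from mere demos to practical voice assistants. Support for 70 languages signals a strategy aimed at global voice workflows.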
10/10
Tech
Anthropic published research on Natural Language Autoencoders, a breakthrough technique that decodes Claude's internal activations (the numerical representations the model computes before producing output) into human-readable natural language.
10/10
New Product
Hugging Face expanded its Reachy Mini robot ecosystem with a dedicated app store that lets users without programming experience build customized robot applications. The store currently hosts about 200 prebuilt apps, covering uses such as office receptionists, baby monitors, cooking assistants, and distraction trackers.
10/10
New Product
OpenAI released three new realtime audio models through its API platform: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 is the biggest advance: it quadruples the context window from 32K to 128K tokens, so the model can keep longer conversations and customer histories in context during calls.
10/10
New Product
OpenAI has rolled out GPT-5.5 Instant as the default ChatGPT model for all users, replacing GPT-5.3 Instant (which remains available to paid subscribers for three more months). The upgrade delivers measurable accuracy improvements: in internal testing, GPT-5.5 Instant made 52.5% fewer false claims in high-stakes domains like law, finance, and medicine.
9/10
News
DeepSeek is planning a funding round of up to $7.35 billion, which would be the largest ever for a Chinese AI company, and DeepSeek V4.1 is slated to launch in June. Meanwhile, Core Automation, founded just six weeks ago by former OpenAI researcher Jerry Tworek, is targeting a $4 billion valuation, a sign of intense investor appetite for AI infrastructure startups.
9/10
News
SoftBank has reduced a loan secured by OpenAI shares from $10 billion to approximately $6 billion. Lenders were reportedly reluctant to commit to a valuation of a private, unlisted company like OpenAI, reflecting broader concerns about how to value private AI companies.
New Product
OpenAI is releasing GPT-5.5-Cyber, a specialized model variant that refuses significantly fewer security-related requests and can actively execute exploits against test servers. Access is restricted to verified critical-infrastructure defenders, including Cisco, CrowdStrike, and Cloudflare.
Databricks introduces Genie, a data agent it describes as state of the art, designed to answer complex questions over enterprise data. Genie aims to automate data analysis workflows and make data insights accessible to more people across an organization.
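EMO is a new mixture-of-experts model in which modular structure emerges directly from the data through end-to-end pretraining, without relying on human-defined priors. For a specific task, the model can run on a subset of just 12.5% of its experts (16 of the 128; the model activates 8 experts at a time) while staying close to full-model performance. With all 128 experts in use, it remains a strong general-purpose model.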
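The digest does not say how EMO restricts itself to an expert subset. Purely as an illustration, the sketch below shows generic top-k mixture-of-experts routing with a hypothetical `allowed` mask over expert indices. The function name, the mask, and the toy router scores are assumptions for illustration, not EMO's published mechanism. With 128 experts, a 12.5% subset is 16 experts, and the sketch's router then picks 8 of those.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple


def route_top_k(
    logits: Sequence[float],
    k: int,
    allowed: Optional[Set[int]] = None,
) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax their logits into gate weights.

    `allowed` is a hypothetical mask: experts outside it are dropped before
    selection, which is one simple way to run a model on an expert subset.
    """
    candidates = [i for i in range(len(logits)) if allowed is None or i in allowed]
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidate experts")
    top = sorted(candidates, key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    num_experts = 128
    subset = set(range(num_experts // 8))  # 12.5% of 128 = 16 experts (illustrative choice)
    # Toy router scores, not real model outputs.
    logits = [((i * 37) % num_experts) / num_experts for i in range(num_experts)]
    routes = route_top_k(logits, k=8, allowed=subset)
    print([i for i, _ in routes])  # prints [10, 3, 13, 6, 9, 2, 12, 5]
```

Masking before the top-k step (rather than after) guarantees that all k selected experts come from the subset, and the gate weights are renormalized over only those experts.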
Opinion
This paper argues that self-consistency (sampling multiple reasoning paths and selecting the most frequent answer) has become increasingly inefficient as models grow stronger. Using Gemini 2.5 models on benchmarks such as HotpotQA, the authors show that accuracy gains diminish while computational costs rise.
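A minimal sketch of self-consistency as a majority vote, to make the cost trade-off concrete: every extra vote is another full model call, so cost grows linearly with the number of samples while the accuracy benefit can flatten out. The `sample_answer` callable is a hypothetical stand-in for one stochastic model call that returns only the extracted final answer; the canned answers below are made up.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Majority vote over n_samples independently sampled final answers.

    `sample_answer` is a hypothetical stand-in for one stochastic model call
    (temperature > 0). Cost grows linearly with n_samples, since every sample
    is a full model call.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes = Counter(sample_answer() for _ in range(n_samples))
    # most_common orders equal counts by first occurrence, so ties go to the earliest answer.
    answer, _ = votes.most_common(1)[0]
    return answer


if __name__ == "__main__":
    canned = iter(["Paris", "Lyon", "Paris", "Paris", "Lyon"])  # fake sampled answers
    print(self_consistency(lambda: next(canned), n_samples=5))  # prints Paris
```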
Research across OLMo-3, Llama-3.1, Qwen3, and Mistral reveals an inverse relationship between model confidence and accuracy: models report their highest confidence precisely when fabricating. The AUC of confidence as a predictor of correctness ranges from 0.28 to 0.36, below the 0.5 expected from random chance, suggesting that this is an observability problem rather than a capability gap.
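The AUC figures are easier to read with the metric spelled out. Assuming confidence is used as a ranking signal for correctness (the digest does not give the study's exact protocol), ROC AUC equals the probability that a randomly chosen correct answer gets higher confidence than a randomly chosen incorrect one. The sketch computes it by direct pairwise comparison; the function name and the data are illustrative, not from the study.

```python
from typing import Sequence


def confidence_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """ROC AUC of confidence as a predictor of correctness.

    Computed as the fraction of (correct, incorrect) answer pairs in which the
    correct answer received higher confidence; ties count as half. 0.5 is chance,
    and values below 0.5 mean wrong answers tend to get higher confidence.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect answer")
    score = 0.0
    for r in right:
        for w in wrong:
            if r > w:
                score += 1.0
            elif r == w:
                score += 0.5
    return score / (len(right) * len(wrong))


if __name__ == "__main__":
    # Toy data (not from the study): the most confident answer is wrong.
    conf = [0.95, 0.90, 0.70, 0.60]
    ok = [False, True, False, True]
    print(confidence_auc(conf, ok))  # prints 0.25
```

An AUC of 0.25 means a correct answer outranks an incorrect one in only a quarter of the pairs; the pairwise loop is O(n²) and meant for readability, not large evaluations.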
This paper introduces ANGOFA, four pre-trained language models tailored to Angolan languages, addressing the gap in multilingual NLP for very low-resource languages. The approach combines OFA embedding initialization with synthetic data generation.
Industry Analysis
A comparative study of five frontier LLMs (Claude Sonnet 4.6, GPT-5.5, Gemini 3 Flash, DeepSeek V3.1, Qwen3.5 397B) examines whether reasoning mode changes moral judgments. Agreement on moral verdicts is statistically consistent between instant and thinking modes (Krippendorff's alpha: 0.78 vs. 0.79).
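Krippendorff's alpha measures agreement beyond chance: 1 is perfect agreement and 0 is chance level. The digest does not say how the study grouped its ratings; one plausible reading is that each item is a moral dilemma and the labels are the five models' verdicts within one mode. The sketch implements the standard nominal-data formula, alpha = 1 - (n - 1) * sum_{c != k} o[c,k] / sum_{c != k} n_c * n_k, via a coincidence matrix. The function name and the example verdicts are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Hashable, Sequence, Tuple


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal labels.

    Each element of `units` lists the labels that different raters gave one item;
    missing ratings are simply left out. Items with fewer than two labels cannot
    be paired and are skipped.
    """
    # Coincidence matrix: every ordered pair of labels within an item adds 1 / (m - 1).
    o: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for i, j in permutations(range(m), 2):
            o[(labels[i], labels[j])] += 1.0 / (m - 1)
    # Marginal totals n_c and the number of pairable values n.
    n_c: Dict[Hashable, float] = defaultdict(float)
    for (c, _), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    observed = sum(v for (c, k), v in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined unless at least two categories occur")
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical verdicts from three raters on four dilemmas.
    verdicts = [
        ["permissible", "permissible", "permissible"],
        ["impermissible", "impermissible", "permissible"],
        ["impermissible", "impermissible", "impermissible"],
        ["permissible", "permissible", "impermissible"],
    ]
    print(round(krippendorff_alpha_nominal(verdicts), 3))  # prints 0.389
```

Comparing the alpha computed on instant-mode verdicts with the alpha on thinking-mode verdicts is one way a "0.78 vs. 0.79" comparison could arise; the study's actual design may differ.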
Databricks explores how AI can address the growing capacity challenge in HR departments by automating routine administrative tasks and augmenting human capabilities. AI-powered solutions enable HR teams to scale their impact without proportional team expansion, tackling critical challenges in recruitment, onboarding, and employee retention.
This case study demonstrates how real-time analytics powers energy trading operations, enabling traders to forecast prices and optimize trading decisions in volatile markets. Advanced analytics help identify trading opportunities and manage risk dynamically, critical for maintaining competitive advantage in commodity trading where milliseconds matter.