Even the latest AI models make three systematic reasoning errors, ARC-AGI-3 analysis shows
The ARC Prize Foundation analyzed 160 game runs of OpenAI's GPT-5.5 and Anthropic's Opus 4.7 on the ARC-AGI-3 benchmark. Three systematic error patterns explain why both models stay below 1 percent on tasks that humans can solve without much trouble.
xAI now lets developers clone their own voices for AI applications. The new "Custom Voices" feature builds on the recently launched Grok Speech-to-Text and Text-to-Speech APIs.
/elsewhere/sightings/ I have a new camera (a Canon R6 Mark II) so I'm taking a lot more photos of birds. I share my best wildlife photos on iNaturalist, and based on yesterday's successful prototype I decided to add those to my blog.
arXiv:2604.27251v1: Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as i
Routiium is a policy-governed LLM gateway enabling instructors to control AI assistance timing, content, and cost in engineering labs. The system balances providing sufficient help with preserving learning opportunities through configurable prompt and model management.
This paper investigates how LLMs respond when explicitly instructed to underperform on multiple-choice evaluations. Using Llama-3-8B and Llama-3.1-8B on 2,000 MMLU-Pro items across varying instruction-specificity gradients, researchers examine whether models engage with question content or collapse into positional shortcuts. Results reveal a critical boundary where instruction complexity determines content engagement versus position-based heuristics.
This paper examines how activation function design influences neural network plasticity in continual learning scenarios. Unlike standard i.i.d. training where activation differences diminish with proper tuning, continual learning reveals distinct effects: models can progressively lose adaptation ability beyond catastrophic forgetting. The study investigates how activation choices sustain or undermine plasticity across sequential tasks.
This theoretical work develops a geometric and information-theoretic framework for encoder-decoder learning based on the Information Bottleneck principle. By recasting representation learning as a rate-distortion problem with KL divergence as the distortion measure, the authors prove that optimal representations at any distortion level form soft clusterings of the predictive manifold, which enables linear decoders. According to the authors, this explains why self-supervised encoders naturally produce normally distributed features.
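For orientation, the classical Information Bottleneck objective can be written in rate-distortion form as sketched below. This is the textbook formulation, not necessarily the notation or the exact objective used in the paper; the symbols X (input), Y (prediction target), Z (representation), and the trade-off weight β are assumptions introduced here.

```latex
% Classical Information Bottleneck as rate-distortion (textbook form;
% the symbols X, Y, Z, \beta are assumed, not taken from the paper).
\begin{align}
  \min_{p(z \mid x)} \; & I(X;Z) \;+\; \beta \, \mathbb{E}_{p(x,z)}\big[ d(x,z) \big], \\
  d(x,z) \;=\; & D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y \mid z) \big).
\end{align}
% Under the Markov chain Y - X - Z the expected distortion equals
% I(X;Y) - I(Z;Y), so minimizing it is the same as maximizing I(Z;Y).
```

Here the rate term I(X;Z) measures how much the representation retains about the input, and the KL distortion measures how much predictive information about Y is lost by replacing x with z.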
This study challenges whether sparse autoencoders (SAEs), widely used for extracting interpretable features from neural networks, actually capture concept manifolds. While SAEs assume concepts correspond to independent linear directions, evidence suggests many concepts organize along low-dimensional manifolds with continuous geometric relationships. The paper addresses fundamental questions about SAE interpretation under this manifold perspective.
Researchers audited five frontier vision-language models (Gemini 2.5 Pro, GPT-5, o3, GLM-4.5V, Qwen 2.5 VL) on medical visual question answering. The study revealed critical failures in anatomical localization across all models, raising significant safety concerns for clinical deployment.
A comprehensive review examining security risks in autonomous agent frameworks built on LLMs. The paper analyzes attack surfaces beyond prompt injection, including tool integration, continuous operation, and system-level vulnerabilities as agents become increasingly complex.
This study evaluates three small language models (EuroLLM, Aya Expanse, Gemma) on preserving fine-grained emotions during machine translation. Using the GoEmotions dataset with 28 emotion categories, the research reveals challenges in maintaining emotional fidelity alongside semantic accuracy.
A comprehensive survey of 55 key studies on AI methods for depression detection and diagnosis. The review examines how machine learning and AI can develop objective, scalable diagnostic tools to complement subjective clinical assessments for Major Depressive Disorder.
This study examines how freelance knowledge workers leverage generative AI tools like ChatGPT to acquire new skills in competitive online labor markets. Unlike traditional employees with organizational training infrastructure, freelancers lack formal mentorship. The research explores how AI-powered learning tools reshape emerging skill demands and provide on-demand support for career advancement.
Researchers presented an agentic framework that constructs knowledge graphs from AI policy documents to support compliance reasoning. The system demonstrates how structured knowledge representation can enhance policy-based reasoning for AI governance and safety compliance.
Researchers adapted the Reliable Change Index from clinical psychology to detect statistically significant LLM version differences. Testing Llama 3→3.1 and Qwen 2.5→3 transitions, they found most performance changes were not statistically significant across analyzed items.
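The Reliable Change Index in its standard clinical form (Jacobson and Truax) can be sketched as follows. The summary does not say how the researchers map its inputs onto LLM evaluations, so the function names, the example scores, and the choice of baseline standard deviation and reliability coefficient below are hypothetical illustrations rather than the paper's procedure.

```python
import math


def reliable_change_index(score_before, score_after, sd_baseline, reliability):
    """Standard Jacobson-Truax RCI: score change divided by the
    standard error of the difference between two measurements.

    sd_baseline and reliability are assumed inputs; how a paper would
    estimate them for LLM benchmark items is not specified here.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    # Standard error of measurement for a single score.
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two scores.
    s_diff = math.sqrt(2.0) * sem
    return (score_after - score_before) / s_diff


def is_reliable_change(rci, critical=1.96):
    """A change counts as reliable at the 5% level if |RCI| exceeds 1.96."""
    return abs(rci) > critical


if __name__ == "__main__":
    # Hypothetical scores for an old and a new model version.
    rci = reliable_change_index(70.0, 75.0, sd_baseline=10.0, reliability=0.9)
    print(round(rci, 3), is_reliable_change(rci))
```

With these made-up numbers, a 5-point gain falls short of the 1.96 threshold, which illustrates how an apparent improvement between versions can fail to register as a statistically reliable change.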
The paper introduces the concept of knowledge affordance to systematize how humans and AI agents identify information-seeking opportunities in hybrid environments. The framework helps agents determine when to query humans versus AI systems, improving collaboration efficiency.