Authors: || Published: 2025-10-25T23:38:00 || Updated: 2025-10-25T23:38:00 || 4 min read
Recent AI Reading [25 October 2025]
Papers
Agentic AI
- Reinforcement Learning for Long-Horizon Interactive LLM Agents
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
- ARE: scaling up agent environments and evaluations
- Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap
- Towards General Agentic Intelligence via Environment Scaling
- Democratizing AI scientists using ToolUniverse
- Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap
- Scaling Agents via Continual Pre-training
  - Cross Topics: Training Paradigms
- EnvX: Agentize Everything with Agentic AI
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  - Cross Topics: Training Paradigms
- MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
- Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
  - Cross Topics: Training Paradigms
- Mobile-Agent-v3: Fundamental Agents for GUI Automation
- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
- A Survey on Agentic Security: Applications, Threats and Defenses
- Emergent Coordination in Multi-Agent Language Models
- Demystifying Reinforcement Learning in Agentic Reasoning
  - Cross Topics: Training Paradigms
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Agent Learning via Early Experience
- Fundamentals of Building Autonomous LLM Agents
- Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1
- Thought Communication in Multiagent Collaboration
AI Alignment (with Human Preferences, and other methods)
- Data Shapley in One Training Run
- The Evolution of LLM Adoption in Industry Data Curation Practices
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- The AI Productivity Index (APEX)
- HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
- Behavioral Fingerprinting of Large Language Models
Large Language Models
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- LLMs Can Get “Brain Rot”!
- Mathematical research with GPT-5: a Malliavin-Stein experiment
- A Survey on LLM-as-a-Judge
- TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
- Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
- When Does Reasoning Matter? A Controlled Study of Reasoning’s Contribution to Model Performance
- PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
- Why Language Models Hallucinate
- A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
  - Cross Topics: Agentic AI
- Self-Adapting Language Models
- InvThink: Towards AI Safety via Inverse Reasoning
Training Paradigms
- Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
- Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
- RL’s Razor: Why Online Reinforcement Learning Forgets Less
- zELO: ELO-inspired Training Method for Rerankers and Embedding Models
- In Their Own Words: Reasoning Traces Tailored for Small Models Make Them Better Reasoners
- Tree Search for LLM Agent Reinforcement Learning
  - Cross Topics: Agentic AI
- CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
- FlowRL: Matching Reward Distributions for LLM Reasoning
- Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
  - Cross Topics: AI Alignment
- A Survey of Reinforcement Learning for Large Reasoning Models
- RewardDance: Reward Scaling in Visual Generation
  - Cross Topics: AI Alignment
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
  - Cross Topics: AI Alignment
- TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
  - Cross Topics: AI Alignment
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  - Cross Topics: AI Alignment
- ExGRPO: Learning to Reason from Experience
  - Cross Topics: AI Alignment
- RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
  - Cross Topics: AI Alignment
- Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
  - Cross Topics: AI Alignment
- RLP: Reinforcement as a Pretraining Objective
- Is In-Context Learning Learning?
- The Art of Scaling Reinforcement Learning Compute for LLMs
- Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
- Training-Free Group Relative Policy Optimization
  - Cross Topics: AI Alignment
Model Evaluation
Retrieval-Augmented Generation
- End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
  - Cross Topics: Agentic AI, Training Paradigms
- On-Premise AI for the Newsroom: Evaluating Small Language Models for Investigative Document Search
  - Cross Topics: Model Evaluation
- ModernVBERT: Towards Smaller Visual Document Retrievers
Embodied AI
- Embodied AI: From LLMs to World Models
- Robotic Control via Embodied Chain-of-Thought Reasoning
- From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
  - Cross Topics: Agentic AI
Books
- Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory
- Understanding Deep Learning
Technical Reports
- rStar2-Agent: Agentic Reasoning Technical Report
  - Cross Topics: Agentic AI
Articles and Blog Posts
- Defeating Nondeterminism in LLM Inference - Thinking Machines Lab
  - Cross Topics: Large Language Models
- Improving Cursor Tab with online RL
  - Cross Topics: Training Paradigms
- Building LangGraph: Designing an Agent Runtime from first principles
  - Cross Topics: Agentic AI
- A Definition of AGI
Miscellaneous