Authors: || Published: 2025-05-19T10:35:00 || Updated: 2025-05-19T10:35:00
Categories: || Tags: || Post-format: link
Recent AI Reading [19 May 2025]
Papers
Agentic AI / AI Agents
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
- A-MEM: Agentic Memory for LLM Agents
- Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
- Why Do Multi-Agent LLM Systems Fail?
- Survey on Evaluation of LLM-based Agents
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges
- How to think about agent frameworks
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
AI Alignment with Human Feedback and Preferences, and other methods
- PILAF: Optimal Human Preference Sampling for Reward Modeling
- Aligning Multimodal LLM with Human Preference: A Survey
- Boost Your Human Image Generation Model via Direct Preference Optimization
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
  - GitHub repo
- AlphaPO – Reward shape matters for LLM alignment
Large Language Models
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
- NoLiMa: Long-Context Evaluation Beyond Literal Matching
- Competitive Programming with Large Reasoning Models
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
- GPT-4.5 System Card
- CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- A Survey on Post-training of Large Language Models
- Measuring AI Ability to Complete Long Tasks
- Auditing Language Models for Hidden Objectives
- On the Biology of a Large Language Model
- Taming the Titans: A Survey of Efficient LLM Inference Serving
- LLMs for Engineering: Teaching Models to Design High Powered Rockets
- LLMs Get Lost In Multi-Turn Conversation
Retrieval-Augmented Generation
Synthetic Data
Miscellaneous
- Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
- Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
- Applications of Large Models in Medicine
Technical Reports
- Gemma 3 Technical Report
- Artificial Intelligence Index Report 2025 from Stanford University
Articles and Blog Posts
- When Doctors With A.I. Are Outperformed by A.I. Alone
- Crossing the uncanny valley of conversational voice
- Reinforcement Learning from Verifiable Rewards
- Introducing HealthBench - An evaluation for AI systems and human health