05-13 · LLM Latest Papers Digest

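Today's candidate pool holds 318 papers; after hard filtering and LLM scoring, 72 passed evaluation. The Top-10 are featured below, and the other 62 are listed as quick scans.

Focus areas: multi-agent systems / LLM post-training (RL/SFT) / diffusion language models / inference acceleration / long context / quantitative trading

🌟 Featured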

1. Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

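Score 8.8 · Area cs.CL · Computation and Language · arxiv 2605.13839 · PDF

💡 Proposes TFlow, which compiles the hidden states of multiple agents into low-rank LoRA perturbations on the receiver, replacing text-message communication and reducing token and KV overhead.

Multi-agent · Agentic · LoRA · Weight-space communication

Summary: Multi-agent LLM systems usually communicate in natural language. This is intuitive, but it adds token generation, prefill overhead, and KV cache memory pressure. This paper proposes TFlow (Thought Flow), which compiles the sender's hidden states into transient, receiver-specific low-rank LoRA weight perturbations instead of writing them into the context. The method performs instance-level adaptation at inference time without permanently modifying the model or extending the text context. In experiments with three Qwen3-4B agents, TFlow improves accuracy by up to 8.5 points over a single receiver while processing 32.69% fewer tokens; compared with a three-agent text-communication baseline, it cuts tokens by up to 83.27% and runs up to 4.6× faster, while keeping competitive accuracy on 4 of 5 benchmarks.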

Score breakdown: rel 9.5 / nov 8.8 / prac 8.7 / author 6.0
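The idea in the summary above, turning a sender's hidden state into a transient low-rank weight perturbation on the receiver, can be sketched in a few lines of NumPy. This is a toy illustration only: the projection matrices `P_a` and `P_b`, the function names, the rank, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def compile_lora(h_sender, P_a, P_b, d_in, d_out, rank):
    """Map a sender hidden state to low-rank factors (hypothetical compiler)."""
    A = (P_a @ h_sender).reshape(rank, d_in)    # shape (rank, d_in)
    B = (P_b @ h_sender).reshape(d_out, rank)   # shape (d_out, rank)
    return B, A

def receiver_forward(x, W, B, A):
    """Apply the base weight plus the transient update B @ A without touching W."""
    return W @ x + B @ (A @ x)

rng = np.random.default_rng(0)
d_hidden, d_in, d_out, rank = 16, 8, 8, 2
# Assumed learned projections; random here purely for illustration.
P_a = rng.normal(size=(rank * d_in, d_hidden)) * 0.01
P_b = rng.normal(size=(d_out * rank, d_hidden)) * 0.01
W = rng.normal(size=(d_out, d_in))      # receiver's frozen base weight
W_before = W.copy()
h = rng.normal(size=d_hidden)           # sender hidden state (the "message")
x = rng.normal(size=d_in)               # receiver activation

B, A = compile_lora(h, P_a, P_b, d_in, d_out, rank)
y = receiver_forward(x, W, B, A)
```

The perturbation lives only in `B` and `A` for the current instance, so the base weight stays unchanged and no tokens are appended to the receiver's context.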

2. KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

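Score 9.1 · Area cs.AI · Artificial Intelligence · arxiv 2605.13734 · PDF

💡 Proposes KVServe for disaggregated LLM serving, using Bayesian profiling and a bandit controller to adaptively select KV compression strategies.

KV cache · Inference acceleration · LLM serving · Long context

Comments: Accepted by SIGCOMM 2026

Summary: In disaggregated LLM serving, KV state must be transferred across the network and storage, which often becomes the end-to-end bottleneck; existing KV cache compression is mostly statically configured and adapts poorly to changing workloads, bandwidth, and SLO constraints. This paper proposes KVServe, described as the first service-aware, adaptive KV communication compression framework. It unifies compression methods into a modular strategy space, introduces a Bayesian Profiling Engine to search efficiently for a Pareto candidate set, and uses an online controller that combines a latency model with a bandit to select configurations dynamically. Integrated into vLLM and evaluated across multiple models, datasets, and hardware environments, it achieves up to 9.13× JCT speedup in the PD-disaggregated setting and up to 32.8× lower TTFT in the KV-disaggregated setting.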

Score breakdown: rel 9.8 / nov 8.4 / prac 9.5 / author 7.5
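To make the "latency model plus bandit" controller from the summary above concrete, here is a minimal sketch that uses standard UCB1 over a handful of compression strategies. The strategy names, compression ratios, overheads, and the toy latency formula are all assumptions for illustration; the summary does not say which bandit algorithm or latency model KVServe uses.

```python
import math

# Hypothetical strategies: (compression ratio, fixed overhead in seconds).
PROFILES = {"none": (1.0, 0.0), "quant8": (2.0, 0.05), "quant4": (4.0, 0.6)}

def toy_latency(strategy, kv_gbit=8.0, bandwidth_gbps=1.0):
    """Toy model: transfer time of the compressed KV plus compression overhead."""
    ratio, overhead = PROFILES[strategy]
    return kv_gbit / ratio / bandwidth_gbps + overhead

class UCBController:
    """UCB1 bandit that picks one strategy per request."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.mean_reward = {a: 0.0 for a in self.arms}
        self.total = 0

    def select(self):
        # Try every arm once before relying on confidence bounds.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        return max(
            self.arms,
            key=lambda a: self.mean_reward[a]
            + math.sqrt(2 * math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.counts[arm]

controller = UCBController(PROFILES)
for _ in range(300):
    s = controller.select()
    # Lower latency gives a higher reward, bounded in (0, 1].
    controller.update(s, 1.0 / (1.0 + toy_latency(s)))
```

In a real system the reward would come from measured latency under changing bandwidth and load, which is the case where an online controller earns its keep over a static configuration.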

3. MinT: Managed Infrastructure for Training and Serving Millions of LLMs

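Score 8.9 · Area cs.AI · Artificial Intelligence · arxiv 2605.13779 · PDF

💡 Proposes MinT, an infrastructure for LoRA post-training and online serving that chains rollout, update, evaluation, and multi-policy deployment through adapter-level handoff.

LLM post-training · LoRA · Inference serving · Infrastructure

Comments: 27 pages. Technical report. Mind Lab

Summary: MinT (MindLab Toolkit) is a managed infrastructure for LoRA post-training and online serving, designed for scenarios where a small number of expensive base models back a massive number of policies. Its core idea is to avoid storing a merged full checkpoint per policy: the base model stays resident, and only LoRA adapters move through the training, export, evaluation, serving, and rollback pipeline. The system scales along three axes: Scale Up supports frontier-scale dense/MoE architectures; Scale Down hands off adapters only, speeding up a 4B dense model and a 30B MoE by 18.3× and 2.85× respectively; Scale Out supports a catalog of millions of addressable policies and large-scale concurrent loading. The results indicate that MinT can effectively manage and serve a cache of millions of LoRA policies.

Score breakdown: rel 9.6 / nov 8.4 / prac 9.7 / author 6.5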

4. CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

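Score 8.4 · Area cs.CL · Computation and Language · arxiv 2605.13295 · PDF

💡 CANTANTE performs contrastive credit attribution over multiple joint rollouts of the same query, decomposing the system-level reward onto each agent's prompt.

Multi-agent · Agentic · Credit assignment · Prompt optimization

Summary: The performance of a multi-agent LLM system is usually observable only at the system level, while the tunable parameters are spread across the individual agents, which makes this a credit assignment problem at heart. This paper proposes CANTANTE, which compares rollouts of multiple joint configurations on the same query to decompose the system-level reward into agent-level update signals, and uses those signals for prompt optimization. Compared with GEPA and MIPROv2 on MBPP, GSM8K, and HotpotQA, CANTANTE achieves the best average rank and consistently beats unoptimized prompts; relative to the strongest baseline, it gains 18.9 and 12.5 percentage points on MBPP and GSM8K respectively, at lower inference cost. A credit correlation analysis also indicates that it learns meaningful agent-level signals.

Score breakdown: rel 9.2 / nov 8.3 / prac 8.5 / author 4.5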

5. Understanding and Accelerating the Training of Masked Diffusion Language Models

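Score 8.4 · Area cs.CL · Computation and Language · arxiv 2605.13026 · PDF

💡 Attributes the slow training of masked diffusion LMs to locality bias and proposes bell-shaped time sampling, speeding up training by up to about 4×.

Diffusion language models · Training acceleration · Language models

Comments: Preprint

Summary: Masked diffusion language models (MDMs) are seen as a potential alternative to autoregressive models, but they train markedly slower, which limits their scaling. This paper first analyzes why training is slow and identifies the locality bias of language as the key factor: the information needed to predict a token is concentrated mainly in nearby positions, which slows MDM learning. Building on this finding, the authors propose bell-shaped time sampling, a simple and effective training strategy. On the One Billion Word Benchmark, the method reaches the same validation NLL up to about 4× faster while preserving final performance; it also improves faster on generative perplexity, zero-shot perplexity, and downstream tasks.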

Score breakdown: rel 8.9 / nov 8.1 / prac 8.6 / author 6.5
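The summary above names bell-shaped time sampling but does not give the distribution. As a rough sketch of what such a sampler could look like, the snippet below draws the masking time from a symmetric Beta distribution, which is bell-shaped on (0, 1), and compares it with the usual uniform draw. The choice of Beta, the `concentration` parameter, and the helper names are assumptions, not the paper's specification.

```python
import numpy as np

def sample_mask_times(batch_size, rng, concentration=3.0):
    """Draw masking times from Beta(c, c); c > 1 is bell-shaped, c = 1 is uniform."""
    return rng.beta(concentration, concentration, size=batch_size)

def mask_tokens(tokens, t, mask_id, rng):
    """Mask each position independently with probability t (one t per sequence)."""
    mask = rng.random(tokens.shape) < t[:, None]
    return np.where(mask, mask_id, tokens), mask

rng = np.random.default_rng(0)
tokens = rng.integers(0, 100, size=(4096, 16))   # toy token ids in [0, 100)
t_bell = sample_mask_times(4096, rng)            # concentrated around 0.5
t_uniform = rng.random(4096)                     # standard uniform schedule
noisy, mask = mask_tokens(tokens, t_bell, mask_id=-1, rng=rng)
```

Compared with uniform sampling, this spends fewer training steps on nearly clean or nearly fully masked sequences and more on intermediate masking ratios.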

6. SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

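Score 8.4 · Area cs.MA · Multiagent Systems · arxiv 2605.13716 · PDF

💡 Proposes SkillOps, which maintains an agent skill library with Skill Contracts and a hierarchical Skill Ecosystem Graph, improving retrieval, composition, and execution success rates.

Multi-agent · Skill library · Agent engineering · Open source

Comments: 23 pages, 9 figures. Submitted to NeurIPS 2026. Code is available at github.com/Hik289/Skil…

Summary: As LLM agents rely more and more on skill libraries, the adding, removing, reusing, and patching of many skills accumulates "skill technical debt" that hurts later retrieval, composition, and execution. This paper proposes SkillOps, a method-agnostic plugin framework for skill library maintenance. It represents skills with typed Skill Contracts, builds a hierarchical Skill Ecosystem Graph, diagnoses library health along the dimensions of utility, compatibility, risk, and verification, and then outputs a maintained skill library that existing retrieval/planning agents can use directly. On ALFWorld, SkillOps as a standalone agent reaches a 79.5% success rate, 8.8 percentage points above the strongest baseline, with no additional task-time LLM calls; as a plugin it also brings gains of 0.68–2.90 percentage points to a variety of baselines...

Score breakdown: rel 8.7 / nov 7.8 / prac 8.9 / author 7.5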

7. FlowCompile: An Optimizing Compiler for Structured LLM Workflows

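Score 8.3 · Area cs.CL · Computation and Language · arxiv 2605.13647 · PDF

💡 Proposes FlowCompile, which treats structured LLM workflows as a compilation problem and generates a set of accuracy-latency configurations through design-space search and sub-agent profiling.

Agentic Workflow · Compiler · System optimization

Summary: A structured LLM workflow executes multiple specialized sub-agents along a graph, but balancing accuracy against latency across model choice, inference budget, and workflow structure is a hard combinatorial optimization problem. This paper proposes FlowCompile, which treats the problem as "compilation" rather than just online routing: it explores the design space globally before deployment and produces a reusable set of workflow-level accuracy-latency trade-offs. Methodologically, it first decomposes the workflow and profiles each sub-agent under different configurations, then uses a structure-aware surrogate model to compose estimates of overall accuracy and latency, so that a single compilation yields a high-quality configuration set with no retraining or online adaptation. Experiments show that FlowCompile effectively discovers good trade-offs across a range of workflows and hard benchmarks.

Score breakdown: rel 8.8 / nov 7.8 / prac 8.8 / author 6.0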

8. Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

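Score 8.3 · Area cs.CL · Computation and Language · arxiv 2605.13641 · PDF

💡 Proposes RDPO, which first applies Magnitude-Aware Quantile normalization and then decorrelates with Mahalanobis whitening to stabilize multi-objective, mixed-reward RL post-training.

RLHF · Post-training · Multi-objective reinforcement learning

Summary: In multi-objective or mixed-reward reinforcement learning, reward distributions are heterogeneous and different reward dimensions are often correlated with one another, which easily makes scalar advantage construction unstable. This paper proposes Reward-Decorrelated Policy Optimization (RDPO) to address both problems together. RDPO first uses Magnitude-Aware Quantile normalization to stabilize prompt-level advantage assignment under binary, fractional, and continuous rewards, then applies Mahalanobis whitening within each active reward subspace to reduce correlation redundancy before aggregation. Applied to LongCat-Flash post-training, RDPO improves instruction following, writing quality, and robustness on hard prompts, while staying broadly competitive on reasoning and code evaluations.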

Score breakdown: rel 9.0 / nov 7.5 / prac 8.0 / author 7.0
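The two-step pipeline in the summary above (normalize each reward, then whiten across rewards before aggregating) can be approximated with a short NumPy sketch. It uses plain mid-rank quantile normalization as a stand-in, since the "magnitude-aware" variant is not specified in the summary, and it whitens over all reward columns rather than detecting an active subspace. Function names and the `eps` regularizer are illustrative assumptions.

```python
import numpy as np

def quantile_normalize(rewards):
    """Replace each reward column by centered mid-ranks in (-0.5, 0.5); ties share a rank."""
    r = np.asarray(rewards, dtype=float)
    less = (r[None, :, :] < r[:, None, :]).sum(axis=1)    # count of strictly smaller values
    equal = (r[None, :, :] == r[:, None, :]).sum(axis=1)  # count of ties, including self
    return (less + 0.5 * equal) / r.shape[0] - 0.5

def mahalanobis_whiten(z, eps=1e-8):
    """Decorrelate columns by multiplying with the inverse square root of the covariance."""
    z = z - z.mean(axis=0)
    cov = z.T @ z / (z.shape[0] - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    w = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return z @ w

def decorrelated_advantage(rewards):
    """Aggregate whitened reward dimensions into one scalar advantage per rollout."""
    return mahalanobis_whiten(quantile_normalize(rewards)).sum(axis=1)

rng = np.random.default_rng(0)
n = 64
base = rng.normal(size=n)
rewards = np.column_stack([
    base,                                            # continuous reward
    base + 0.3 * rng.normal(size=n),                 # strongly correlated reward
    (base + rng.normal(size=n) > 0).astype(float),   # binary reward
])
adv = decorrelated_advantage(rewards)
```

Without the whitening step, the two strongly correlated columns would effectively be counted twice when the rewards are summed.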

9. ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation

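Score 8.3 · Area cs.MA · Multiagent Systems · arxiv 2605.12857 · PDF

💡 Proposes ChipMATE, in which a Verilog agent and a Python reference-model agent cross-verify each other to generate RTL, with multi-agent collaboration trained jointly through two-stage RL.

Multi-agent · RL post-training · Code generation · Chips

Summary: ChipMATE targets real pain points in industrial RTL generation: existing agent systems depend on a golden testbench being available at generation time and on closed-source APIs, and cannot use the private RTL data inside chip vendors' isolated environments; self-training methods, meanwhile, are mostly single-turn generation and neglect verification. This paper proposes ChipMATE, described as the first self-trained multi-agent framework, in which a Verilog agent and a Python reference-model agent check each other without a golden oracle. The method combines a backtracking-based inference flow with a "two-stage training" strategy and builds a hybrid data framework that can generate 64.4K high-quality reference-model samples. Experiments show that it reaches 75.0%/80.1% pass@1 on VerilogEval V2 with 4B/9B base models respectively, outperforming existing self-trained models and even surpassing more...

Score breakdown: rel 8.9 / nov 8.3 / prac 8.4 / author 5.5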

10. Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

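Score 8.5 · Area cs.AI · Artificial Intelligence · arxiv 2605.12969 · PDF

💡 Rewrites GRPO from a contrastive learning perspective and proposes ConSPO, which replaces the clipped ratio with sequence log-prob and optimizes the margin between positive and negative samples within a group.

RLVR · GRPO · Post-training

Summary: This paper revisits reinforcement learning with verifiable rewards (RLVR) and its representative algorithm GRPO from a contrastive learning perspective. The authors show that GRPO can be equivalently rewritten as a weighted optimization of the score difference between positive and negative samples, which exposes two flaws: the optimization objective is inconsistent with the true generation likelihood, and credit assignment does not exploit the relative score differences between positive and negative rollouts in the same group. To address this, they propose ConSPO (Contrastive Sequence-level Policy Optimization), which replaces the clipped-ratio score with a length-normalized sequence log-probability and adopts an in-group InfoNCE-style contrastive objective that discriminates each positive sample against the negatives in its group. The method aligns optimization more closely with autoregressive generation and can more effectively amplify, for hard-to-distinguish positive samples, the...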

Score breakdown: rel 9.5 / nov 8.0 / prac 8.0 / author 5.0
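The two ingredients named in the summary above, a length-normalized sequence log-probability as the score and an in-group InfoNCE-style objective, can be sketched as follows. This is a generic InfoNCE formulation written from that description only; the temperature, the averaging over positives, and the function names are assumptions, and the paper's actual loss may differ.

```python
import numpy as np

def sequence_score(token_logprobs):
    """Length-normalized sequence log-probability: the mean of per-token log-probs."""
    return float(np.mean(token_logprobs))

def group_infonce_loss(scores, is_positive, temperature=1.0):
    """Contrast each positive rollout against the negatives of the same group."""
    s = np.asarray(scores, dtype=float) / temperature
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = s[is_positive], s[~is_positive]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0  # no contrast available in an all-correct or all-wrong group
    losses = []
    for p in pos:
        logits = np.concatenate(([p], neg))
        m = logits.max()  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denominator - p)
    return float(np.mean(losses))

# One group of three rollouts for the same prompt; only the first is verified correct.
scores = [sequence_score([-0.4, -0.6]), sequence_score([-1.0, -1.0]), sequence_score([-2.5, -1.5])]
labels = [True, False, False]
loss = group_infonce_loss(scores, labels)
```

Raising the score of the positive rollout relative to its group's negatives lowers this loss, which is the in-group discrimination the summary describes.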

📚 Quick Scan · Other Works That Passed Evaluation (62 papers)

One-line summaries, sorted by score from high to low; click a title to jump to arxiv.

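cs.CL 8.2 Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation · 💡 Studies local teachability collapse in strong-to-weak on-policy distillation and uses change-point detection on the teacher margin to truncate dense supervision on the suffix.
cs.CL 8.2 Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling · 💡 SU-01 performs SFT with a reverse-perplexity curriculum, followed by two-stage RL and test-time scaling, yielding a competition reasoning model that can handle trajectories of 100K+ tokens.
cs.CL 8.2 GAGPO: Generalized Advantage Grouped Policy Optimization · 💡 GAGPO approximates the value function with a grouped value proxy and combines TD/GAE-style advantage propagation with an action-level importance ratio for multi-turn agent RL.
cs.CL 8.2 STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes · 💡 STOP prunes long CoT using node segmentation, taxonomy annotation, and the ECN (earliest correct node), reducing redundant reasoning in low-data fine-tuning.
cs.AI 8.4 Teacher-Guided Policy Optimization for LLM Distillation · 💡 Proposes TGPO, which in on-policy LLM distillation uses the teacher's conditional predictions on student rollouts to provide dense directional guidance, improving on RKL.
cs.AI 8.4 N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation · 💡 Proposes N-vium, which applies a learned mixture and token-adaptive routing over multi-layer exit heads, achieving exact generation with a 57.9% speedup.
cs.AI 8.4 GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training · 💡 Proposes GRACE, which scores CoT steps one by one on gradient alignment and trajectory consistency, using single-forward-pass proxy gradients to filter post-training data.
cs.CL 8.0 From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning · 💡 Recasts SFT data selection as fixed-pool recipe search, using Gaussian process ranking, local edits, and warmup probes to search over filtering and mixing pipelines.
cs.CL 7.7 TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment · 💡 Proposes TokAlign++, which treats source/target vocabularies as a bilingual alignment and rearranges parameters, enabling fast recovery for new-vocabulary adaptation and token-level distillation.
cs.AI 8.0 Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging · 💡 Proposes MultiSearch, which uses parallel multi-query retrieval and explicit merging to improve the signal-to-noise ratio of RAG reasoning, and trains the search agent with a multi-process reward design.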
cs.AI 8.0 Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning · 💡 Proposes EGRSD and CL-EGRSD, which use a teacher entropy gate and causal lookahead to adjust token-level supervision weights on CoT in on-policy self-distillation.
cs.CV 8.2 Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs · 💡 Realtime-VLA FLASH brings a draft model, parallel verification by the Action Expert, and phase-aware fallback to diffusion-based VLAs, cutting replanning latency to 19.1ms.
cs.CV 8.2 Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation · 💡 Proposes Pyramid Forcing, which allocates heterogeneous KVCache across three classes of attention heads (Anchor/Wave/Veil) and uses ragged-cache attention to improve long video generation quality.
cs.AI 7.9 High-Rate Quantized Matrix Multiplication II · 💡 Brings reverse waterfilling into GPTQ weight quantization, analyzes the high-rate distortion bound of WaterSIC, and improves weight-only PTQ for LLMs.
cs.CL 7.6 R^2-Mem: Reflective Experience for Memory Search · 💡 Proposes R²-Mem, which scores past trajectories with a Rubric-guided Evaluator and distills reflective experience to guide memory search actions online.
cs.CL 7.6 OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention · 💡 Proposes OSDN, which adds an online diagonal preconditioner and an APF forgetting mechanism to Delta Rule linear attention while preserving the chunkwise parallel pipeline.
cs.CV 8.1 Learning to See What You Need: Gaze Attention for Multimodal Large Language Models · 💡 Gaze Attention groups visual KV spatially into gaze regions, selects regions dynamically during decoding, and pairs them with context tokens to reduce visual attention computation in MLLMs.
cs.AI 7.8 MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning · 💡 Proposes the Map-then-Act paradigm: first explore globally and build a task map, then augment execution with the cognitive map, improving long-horizon interactive agent reasoning.
cs.CL 7.4 Context Training with Active Information Seeking · 💡 Connects the context optimizer to Wikipedia search and a browser, and uses search-based training to maintain and prune candidate contexts.
cs.AI 7.7 MMSkills: Towards Multimodal Skills for General Visual Agents · 💡 Proposes MMSkills, which bundles text procedures, state cards, and multi-view keyframes into multimodal skill packages that visual agents retrieve and invoke at runtime.
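cs.CL 7.2 Cognifold: Always-On Proactive Memory via Cognitive Folding · 💡 Proposes Cognifold, which continuously folds the event stream into a graph-topology cognitive structure and achieves proactive memory through merging, decay, associative rewiring, and intent surfacing.
cs.CL 7.2 Query-Conditioned Test-Time Self-Training for Large Language Models · 💡 QueST constructs problem-solution pairs from the input query at test time and performs query-conditioned self-training with parameter-efficient fine-tuning.
cs.AI 7.5 GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models · 💡 Proposes GRIP-VLM, which models visual token pruning as an MDP and uses GRPO plus a budget-aware scorer to dynamically choose which tokens to keep at any compression ratio.
cs.AI 7.5 CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large VIsion-Language Models · 💡 Proposes LiteLVLM, which uses reversed CLIP text-image similarity for training-free token pruning and restores context tokens to preserve pixel grounding ability.
cs.CV 7.6 Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context · 💡 Systematically studies continued pretraining that extends LVLMs from 32K to 128K, comparing long-document VQA, length distributions, and retrieval-heavy data mixes.
cs.CL 7.0 Many-Shot CoT-ICL: Making In-Context Learning Truly Learn · 💡 Systematically studies the scaling behavior of many-shot CoT-ICL, analyzes retrieval failure and ordering variance, and interprets it as in-context test-time learning.
cs.AI 7.3 Harnessing Agentic Evolution · 💡 Proposes AEvo, which formulates agentic evolution as an editable environment state, with a meta-agent modifying the search program or context to drive long-horizon evolution.
cs.AI 7.2 AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents · 💡 Abstracts the software agent runtime as an AI Harness, defining 11 components such as context selection, verification, and project memory, along with H0-H3 levels.
cs.CV 7.4 Qwen-Image-VAE-2.0 Technical Report · 💡 Qwen-Image-VAE-2.0 improves high-compression VAE reconstruction and diffusability through Global Skip Connections, expanded latent channels, and semantic alignment training.
cs.AI 7.0 Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning · 💡 Systematically studies how data difficulty and sample budget are coupled in SFT, using a generalization-extrapolation tradeoff to explain why the optimal difficulty shifts upward with budget.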
cs.CV 7.0 RotVLA: Rotational Latent Action for Vision-Language-Action Model · 💡 Replaces discretely quantized LAM with continuous SO(n) rotational latent actions and combines them with a flow-matching action head for cross-embodiment VLA pretraining.
cs.CV 7.0 Test-time Sparsity for Extreme Fast Action Diffusion · 💡 Proposes test-time sparsity for action diffusion, accelerating it with dynamic residual pruning through a lightweight shared-encoding pruner and cross-step cache constraints.
cs.CL 6.9 Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory · 💡 Proposes PMNet, which stabilizes BPTT with Unitary Phasor Dynamics and Hierarchical Learnable Anchors, scaling explicit memory to an 85-slot hierarchical tree.
cs.CL 6.9 TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints · 💡 TruncProof uses an LL(1) parser to estimate the minimum number of tokens needed to complete the JSON, meeting the length limit under grammar-constrained decoding.
cs.CL 6.8 Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents · 💡 Proposes Persona Policies, which models user-simulator persona generation as evolutionary program search, optimizing a Python generator to cover diverse behavior patterns.
cs.AI 6.9 Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models · 💡 Proposes VLAs-as-Tools and TAPT, letting a high-level VLM call specialized VLA tools and performing post-training with invocation-aligned units.
cs.CL 6.5 GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning · 💡 GateKD uses confidence-gated soft supervision, hidden-state evolution, and attention distillation to filter low-confidence noise in the teacher's reasoning.
cs.CL 6.5 RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search · 💡 Uses RAG+LLM in Baidu Search to predict a query-specific validity horizon, replacing fixed time windows for dynamic content expiration decisions.
cs.MA 6.5 A Multi-Agent Orchestration Framework for Venture Capital Due Diligence · 💡 Builds an event-driven multi-agent workflow for VC due diligence, combining real-time retrieval, reverse scraping of registries, and layout-aware OCR to produce structured intelligence.
cs.AI 6.8 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation · 💡 Proposes AnyFlow, which trains any-step video diffusion models with flow-map transition learning and on-policy distillation.
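cs.CL 6.4 When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction · 💡 Proposes the Goal Accessibility Ratio and combines sliding-window ablation with residual probes to explain why instruction memory fails in multi-turn dialogue as attention closes.
cs.MA 6.4 Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention · 💡 Proposes DiffLNS, which uses discrete diffusion (D3PM) and sparse social attention to generate initial joint MAPF trajectories, then hands them to LNS2 to repair conflicts.
cs.AI 6.7 How to Interpret Agent Behavior · 💡 Proposes the three-level ACT*ONOMY behavior taxonomy and uses an automated analysis pipeline to parse action patterns in agent trajectories and execution logs.
cs.AI 6.7 Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning · 💡 Proposes TCE, which uses dual score-based generative models to synthesize target-domain-consistent transitions and expands offline RL coverage according to target proximity.
cs.MA 6.3 Conveyor Parcel Routing with Order-Contiguous Arrivals · 💡 Proposes DOPP to solve online MAPF-OC, searching hierarchically over order arrival sequences and individual priorities and using prioritized planning to generate contiguous-arrival paths.
cs.AI 6.6 ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding · 💡 Proposes ReTool-Video and the 134-tool MVTL tool library, recursively grounding high-level video intents into tool chains for retrieval, aggregation, reranking, and more.
cs.AI 6.6 Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence · 💡 Explains Muon through spectral flattening, rewrites momentum orthogonalization as a preconditioned update, and proves that the stable step size is governed by the mean singular value.
cs.AI 6.6 AdaFocus: Adaptive Relevance-Diversity Sampling with Zero-Cache Look-back for Efficient Long Video Understanding · 💡 AdaFocus generates video previews with Query-Aware Relevance-Diversity sampling and performs zero-cache long-video retrieval with uncertainty-triggered look-back.
cs.AI 6.5 ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles · 💡 Proposes ScioMind, which combines memory-anchored belief updates, hierarchical memory, and retrieval-based dynamic profiles in multi-agent social simulation.
cs.AI 6.5 Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation · 💡 Proposes CPPO, which converts contrastive Q-values into PPO advantage estimates for on-policy contrastive reinforcement learning in reward-free settings.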
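cs.AI 6.5 Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy · 💡 Proposes Q-Flow, which uses the deterministic dynamics of a flow policy to propagate terminal value explicitly to intermediate latent variables, avoiding unrolling the numerical solver for stable RL.
cs.AI 6.5 When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems · 💡 Designs an always-valid release wrapper for generate-verify workflows, using a hard-negative reference pool and an e-process to control release under optional stopping.
cs.CL 6.1 Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry · 💡 Performs step-level hallucination detection with hidden-state transport geometry, using a contrastive PCA teacher and a BiLSTM student to locate the first reasoning error.
cs.CL 6.1 PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents · 💡 Proposes PersonalAI 2.0, which composes entity extraction, graph traversal, and clue-query generation into a multi-stage, planning-based GraphRAG retrieval pipeline.
cs.CV 6.6 SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models · 💡 SceneGraphVLM performs SFT with TOON graph serialization, then RL with hallucination-aware rewards, to generate compact scene graphs for images and videos.
cs.CV 6.6 Asymmetric Flow Models · 💡 AsymFlow uses a rank-asymmetric velocity parameterization (low-rank noise prediction plus full-dimensional data prediction), recovers the full-dimensional velocity analytically, and supports latent-to-pixel fine-tuning.
cs.AI 6.2 D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models · 💡 Proposes D-VLA, which uses Plane Decoupling, a four-thread asynchronous Swimlane, and dual-pool GPU memory management to raise concurrent training throughput for distributed VLA RL.
cs.AI 6.1 Language-Based Agent Control · 💡 Proposes Language-Based Agent Control, in which the agent generates type-checkable programs and tool access is controlled uniformly through static types plus runtime constraints.
cs.AI 6.0 FeatCal: Feature Calibration for Post-Merging Models · 💡 Targeting the performance regression after model merging, FeatCal performs layer-by-layer forward calibration based on feature drift, updating weights with a closed-form solution and no gradient optimization.
cs.CV 6.2 QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling · 💡 Proposes QLAM, which represents long-sequence hidden states as superpositions of quantum states and replaces the linear recurrent memory of SSMs with parameterized quantum circuits.
cs.CV 6.1 Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models · 💡 GTA-VLA lets users provide spatial priors via points, boxes, and trajectories, and generates a unified spatial-visual Chain-of-Thought to drive robot actions.
cs.CV 6.0 RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data · 💡 RoboEvolve lets a VLM planner and a VGM simulator co-evolve across a two-phase day-night cycle, using near-miss mining and progressive curriculum learning to improve manipulation data efficiency.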

Data source: arxiv.org · Scores and summaries are auto-generated by an LLM and are intended for initial screening only