Shiqiang Wang, Herbert Woisetschläger, Hans Arno Jacobsen +1 more
arXiv:2605.18801v1 Announce Type: new
Abstract: Data is fundamental to large language models (LLMs). However, understanding of what makes certain data useful for different stages of an LLM workflow, including training, …
Yao Fehlis, Benjamin Bengfort, Zhangzhang Si +9 more
arXiv:2605.18818v1 Announce Type: new
Abstract: Academic research tends to focus on new models for document understanding creating a wide gap in the literature between model definition and running models at production…
arXiv:2605.18937v1 Announce Type: new
Abstract: Patient-managed Personal Health Records (PHRs) promise to empower patients to better understand their health, but information in the record is complex, potentially…
arXiv:2605.19008v1 Announce Type: new
Abstract: Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and…
arXiv:2605.19010v1 Announce Type: new
Abstract: Natural language to SQL (NL2SQL) conversion is an important problem for researchers and enterprises due to the ubiquitous importance of relational databases in…
arXiv:2605.19031v1 Announce Type: new
Abstract: Kolmogorov-Arnold Networks (KANs) have demonstrated an exceptional ability to learn complex functions on clean, low-dimensional data but struggle to maintain performance…
arXiv:2605.19035v1 Announce Type: new
Abstract: The rapid advancement of Large Language Models has given rise to autonomous LLM-based agents capable of complex reasoning and execution. As these agents transition from…
arXiv:2605.19042v1 Announce Type: new
Abstract: Machine unlearning aims to remove the contribution of designated training data from a trained model while preserving performance on the remaining data. Existing work…
Zhiyuan Jerry Lin, Benjamin Letham, Samuel Dooley +2 more
arXiv:2605.19093v1 Announce Type: new
Abstract: System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune…
arXiv:2605.19099v1 Announce Type: new
Abstract: We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL…
arXiv:2605.19127v1 Announce Type: new
Abstract: LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be …
arXiv:2605.19140v1 Announce Type: new
Abstract: We study workflow learning in a setting where specialized agents hand off control through a shared artifact, each agent observes only a local function of that artifact…
arXiv:2605.19151v1 Announce Type: new
Abstract: We formalize trust calibration for agentic tool use (deciding when an automated agent's proposed action may execute autonomously versus require human approval) as a…
Zhengxin Zhang, Ning Wang, Sainyam Galhotra +1 more
arXiv:2605.19156v1 Announce Type: new
Abstract: Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good…
arXiv:2605.19186v1 Announce Type: new
Abstract: Two decades ago, the Semantic Web Services community was asked how agents with different ontological commitments could discover, compose, and invoke web services…
arXiv:2605.19192v1 Announce Type: new
Abstract: Multimodal agents use screenshots, documents, and webpages to choose tool calls. When a false visual claim triggers a click, email, extraction, or transfer, hallucination …
arXiv:2605.19215v1 Announce Type: new
Abstract: Adaptive decision-making in biological and artificial intelligence requires balancing the exploitation of known outcomes with the exploration of uncertain alternatives.…
Han Li, Vibhor Malik, Zahra Zanjani Foumani +17 more
arXiv:2605.19219v1 Announce Type: new
Abstract: A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance,…
arXiv:2605.19250v1 Announce Type: new
Abstract: Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To…
arXiv:2605.19260v1 Announce Type: new
Abstract: Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at…
arXiv:2605.19264v1 Announce Type: new
Abstract: Voting methods weighted by stakes are the fundamental governance paradigm in Proof-of-Stake (PoS) blockchains. Such a paradigm is known to be prone to power distortions:…
Md Mehrab Tanjim, Jayakumar Subramanian, Xiang Chen +6 more
arXiv:2605.19330v1 Announce Type: new
Abstract: LLM agents organize behavior through skills - structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic…
arXiv:2605.19337v1 Announce Type: new
Abstract: A growing body of work explores how Large Language Models (LLMs) can be embedded in trading systems as agents that perceive market information, retrieve context, reason…
arXiv:2605.19376v1 Announce Type: new
Abstract: How should future neural reasoning systems implement extended computation? Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence…
arXiv:2605.19418v1 Announce Type: new
Abstract: LLM-based multi-agent systems (MAS) have demonstrated strong reasoning and decision-making capabilities that consistently surpass those of single LLM agents. However,…
arXiv:2605.19447v1 Announce Type: new
Abstract: Reinforcement learning can train LLM agents from sparse task rewards, but long-horizon credit assignment remains challenging: a single success-or-failure signal must be…
arXiv:2605.19457v1 Announce Type: new
Abstract: Automated bidding is central to modern digital advertising. Early rule-based methods lacked adaptability, while subsequent Reinforcement Learning approaches modeled…
arXiv:2605.19461v1 Announce Type: new
Abstract: On-policy reinforcement learning methods like GRPO suffer from mode collapse: they exhibit reduced solution diversity, concentrating probability mass on a single solution …
arXiv:2605.19485v1 Announce Type: new
Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generating structured, step-by-step reasoning content. However,…
arXiv:2605.19514v1 Announce Type: new
Abstract: Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer…
Carla Castedo, Enrique Iglesias, Manuel Lama +3 more
arXiv:2605.19518v1 Announce Type: new
Abstract: Generating Knowledge Graphs (KGs) remains one of the most time-consuming and labor-intensive tasks for knowledge engineers, as they need to identify semantic equivalences …
Mohamed Ouaguenouni, Felipe Garrido-Lucero, Umberto Grandi +2 more
arXiv:2605.19521v1 Announce Type: new
Abstract: We analyze the structure of the disagreement among a population of voters over a set of alternatives. Surveys typically ask either for pairwise comparisons, simple and…
arXiv:2605.19529v1 Announce Type: new
Abstract: When the same LLM generates assessment items, simulates student responses, and scores them, the validation loop is self-referential. We introduce Generative-Evaluative…
arXiv:2605.19576v1 Announce Type: new
Abstract: Self-evolving skill libraries face a silent failure mode we term library drift: unbounded skill accumulation without outcome-driven lifecycle management causes…
arXiv:2605.19587v1 Announce Type: new
Abstract: Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the…
Mert Yildiz, Pietro Spadaccino, Alexey Rolich +2 more
arXiv:2605.19593v1 Announce Type: new
Abstract: Modern deployments of Large Language Models (LLMs) increasingly require serving multiple models with diverse architectures, sizes, and specialization on shared,…
arXiv:2605.19604v1 Announce Type: new
Abstract: Large Language Model (LLM) agents increasingly act inside real workspaces, where tools and skills determine whether model reasoning becomes reliable action. Existing…
arXiv:2605.19630v1 Announce Type: new
Abstract: With every advancement in generative AI models, forensics is under increasing pressure. The constant emergence of new generation techniques makes it impossible to collect …
arXiv:2605.19662v1 Announce Type: new
Abstract: Tabular foundation models based on pretrained prior-data fitted networks (PFNs) have shown strong generalization on diverse tabular tasks, but they are typically designed …
arXiv:2605.19663v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) are becoming the cornerstone of high-level reasoning for robotic automation, enabling robots to parse natural language commands and perceive …
Jo Devriendt, Patrick De Causmaecker, Marc Denecker
arXiv:2605.19671v1 Announce Type: new
Abstract: Applying local search algorithms to combinatorial optimization problems is not an easy feat. Typically, human intervention is required to compile the constraints to input …
arXiv:2605.19674v1 Announce Type: new
Abstract: Strategic classification (SC) studies the interaction between decision models and agents who strategically manipulate their features for favorable outcomes. Existing SC…
arXiv:2605.19721v1 Announce Type: new
Abstract: Graph combinatorial optimization (GCO) has attracted growing interest, as many NP-hard problems naturally admit graph formulations, yet their combinatorial explosion…
Gioele Molinari, Florian Felten, Soheyl Massoudi +1 more
arXiv:2605.19743v1 Announce Type: new
Abstract: Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems…
arXiv:2605.19748v1 Announce Type: new
Abstract: Automatic generation of computer-aided design (CAD) models is a core technology for enabling intelligence in advanced manufacturing. Existing generation methods based on…
Yannis Bendi-Ouis (Mnemosyne), Romain de Coudenhove (ENS-PSL), Xavier Hinaut (Mnemosyne)
arXiv:2605.19758v1 Announce Type: new
Abstract: The ability to maintain and manipulate information over time is a fundamental aspect of living beings and Artificial Intelligence. While modern models have achieved…
arXiv:2605.19762v1 Announce Type: new
Abstract: Code has become a standard component of modern foundation language model (LM) training, yet its role beyond programming remains unclear. We revisit the claim that code…
Meisam Jamshidi Seikavandi, Alice Modica, Anna Obara +10 more
arXiv:2605.19765v1 Announce Type: new
Abstract: Existing affective-computing, social-signal-processing, and meeting corpora capture important parts of human interaction, but they rarely support analysis of affect in…
Pierre Boudart (SIERRA), Pierre Gaillard (Thoth), Alessandro Rudi (PSL +2 more
arXiv:2605.19768v1 Announce Type: new
Abstract: We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms…
arXiv:2605.19769v1 Announce Type: new
Abstract: We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1)…
arXiv:2605.19779v1 Announce Type: new
Abstract: We adapt split conformal prediction and adaptive conformal inference (ACI) to continuous AI agent evaluation, providing distribution-free coverage guarantees for…
arXiv:2605.19781v1 Announce Type: new
Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle…
Dmitry Redko (Applied AI Institute), Albert Fazlyev (AI Talent Hub, ITMO University) +5 more
arXiv:2605.19782v1 Announce Type: new
Abstract: LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery…
Ahmed Y. Gado, Omar Y. Goba, Alaa Hassanein +2 more
arXiv:2605.19824v1 Announce Type: new
Abstract: Recent attempts to support high-level scene interpretation and planning in Autonomous Vehicles (AVs) using ensembles of Large Language Models (LLMs) and Large Multimodal…
arXiv:2605.19826v1 Announce Type: new
Abstract: Operators of safety-critical industrial processes increasingly rely on digital twins to screen control interventions, but such simulators rarely carry certified safety…
arXiv:2605.19895v1 Announce Type: new
Abstract: Constraint programming practitioners accelerate hard problems through a layered set of techniques applied in order of risk. Standard hardening (symmetry-breaking and…
arXiv:2605.19932v1 Announce Type: new
Abstract: Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations,…
Rebecca Ramnauth, Drazen Brscic, Brian Scassellati
arXiv:2605.19940v1 Announce Type: new
Abstract: Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and…
Amin Sghaier, Ali Parviz, Alexia Jolicoeur-Martineau
arXiv:2605.19943v1 Announce Type: new
Abstract: Tiny Recursive Models (TRM) solve complex reasoning tasks with a fraction of the parameters of modern large language models (LLMs) by iteratively refining a latent state…
Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi +1 more
arXiv:2605.20006v1 Announce Type: new
Abstract: Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost…
Samuel Jacob Chacko, James Hugglestone, Chashi Mahiul Islam +1 more
arXiv:2605.20023v1 Announce Type: new
Abstract: Agent Skills, structured packages of procedural knowledge loaded into an LLM agent at inference time, are widely reported to improve task pass rates by an average of…
arXiv:2605.20025v1 Announce Type: new
Abstract: Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives,…
arXiv:2605.20072v1 Announce Type: new
Abstract: Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or…
arXiv:2605.20098v1 Announce Type: new
Abstract: Claim verification is an important problem in high-stakes settings, including health and finance. When information underpinning claims is incomplete or conflicting,…
arXiv:2605.20120v1 Announce Type: new
Abstract: AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which …
Utkarsh Tyagi, Xingang Guo, MohammadHossein Rezaei +5 more
arXiv:2605.20164v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model…
Salma Hoque Talukdar Koli, Fahima Haque Talukder Jely, Md. Samiul Alim +1 more
arXiv:2605.20167v1 Announce Type: new
Abstract: Flash floods in Bangladesh's haor wetlands show up with almost no warning. They wreck the annual boro rice harvest. Current setups, built for riverine floods, miss…
arXiv:2605.20173v1 Announce Type: new
Abstract: Production LLM agents combine stochastic model outputs with deterministic software systems, yet the boundary between the two is rarely treated as a first-class…
arXiv:2604.19892v1 Announce Type: cross
Abstract: Incremental Potential Contact (IPC) guarantees intersection-free simulation but suffers from high computational costs due to the expensive Hessian assembly and linear…
arXiv:2605.18758v1 Announce Type: cross
Abstract: Current benchmarks for graphical user interface (GUI) agents predominantly rely on static screenshots. However, real-world smartphone interaction routinely requires…
arXiv:2605.18759v1 Announce Type: cross
Abstract: Artificial intelligence (AI), exemplified by large language models (LLMs), is rapidly approaching and in some cases surpassing human performance across a wide range of…
arXiv:2605.18760v1 Announce Type: cross
Abstract: Graph Retrieval-Augmented Generation (GraphRAG) is dominated by a retrieve-then-reason paradigm, where context is retrieved using heuristics and then reasoned over.…
arXiv:2605.18762v1 Announce Type: cross
Abstract: Retrieval-Augmented Generation (RAG) is widely used to augment large language models with external knowledge retrieval to improve reliability and generalization.…
arXiv:2605.18763v1 Announce Type: cross
Abstract: Large language models (LLMs) are increasingly applied to analyzing wearable sensing data, which are long-term, multimodal, and highly personalized. A key challenge is…
arXiv:2605.18764v1 Announce Type: cross
Abstract: Artificial Intelligence (AI) pipelines have become integral to modern research, supporting fields such as Medical Sciences, Agriculture, and Social Sciences, and…
arXiv:2605.18765v1 Announce Type: cross
Abstract: To augment Large Language Models (LLMs) for multi-hop question answering, a mainstream solution within Graph Retrieval Augmented Generation (GraphRAG) leverages…
arXiv:2605.18766v1 Announce Type: cross
Abstract: Retrieving relevant tables from extensive databases for a given natural language query is essential for accurately answering questions in tasks such as text-to-SQL.…
arXiv:2605.18767v1 Announce Type: cross
Abstract: Multi-hop question answering requires aggregating information from multiple documents, a critical capability for knowledge-intensive applications. A fundamental…
arXiv:2605.18770v1 Announce Type: cross
Abstract: We present a collaborative agentic GraphRAG framework for expert analysis of commercial registry data. Public registries are often formally accessible, yet difficult to …
arXiv:2605.18772v1 Announce Type: cross
Abstract: Retrieval-Augmented Generation (RAG) improves the factual accuracy of large language model (LLM) outputs by grounding generation in external knowledge. Recent agentic…
arXiv:2605.18773v1 Announce Type: cross
Abstract: Traditional facility management often relies on centralized decision-making structures that limit stakeholder participation, leading to misalignment with occupant needs …
Payel Santra, Lavisha Sharma, Madhusudan Ghosh +1 more
arXiv:2605.18776v1 Announce Type: cross
Abstract: The rapid spread of misinformation on social media highlights the need for robust, automated fact correction frameworks. However, existing works rely on supervised…
arXiv:2605.18780v1 Announce Type: cross
Abstract: Reasoning-based Large Language Models (LLMs) like PO4ISR have set new benchmarks in session-based recommendation. However, the reproducibility of their reasoning…
Adiba Mahbub Proma, Neeley Pate, James N. Druckman +3 more
arXiv:2605.18781v1 Announce Type: cross
Abstract: Can LLMs simulate how humans form and change beliefs in social networks? We put this to the test by replicating an established study on belief dynamics, evaluating 12…
arXiv:2605.18784v1 Announce Type: cross
Abstract: The rapid diffusion of agentic AI has created a new coverage problem for commercial insurance: some AI-mediated losses are now affirmatively insured, some create…
Philipp Stecher, Sandro Radovanović, Vlasta Sikimić +1 more
arXiv:2605.18789v1 Announce Type: cross
Abstract: Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find…
arXiv:2605.18793v1 Announce Type: cross
Abstract: Accurate spatiotemporal pattern analysis is critical in fields such as urban traffic, meteorology, and public health monitoring. However, existing methods face…
arXiv:2605.18794v1 Announce Type: cross
Abstract: Decoupling is a powerful modeling paradigm for representing multivariate functions as compositions of linear transformations and univariate nonlinear functions. A…
arXiv:2605.18797v1 Announce Type: cross
Abstract: Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer…
arXiv:2605.18799v1 Announce Type: cross
Abstract: Large language models can fail in critic interaction not only by answering incorrectly, but also by abandoning an initially correct scientific solution after user…
arXiv:2605.18800v1 Announce Type: cross
Abstract: Post-training quantization has emerged as a widely adopted technique for compressing and accelerating the inference of Large Language Models (LLMs). The primary…
Ahmet H. Güzel, Jenny Seidenschwarz, Benjamin Graham +4 more
arXiv:2605.18803v1 Announce Type: cross
Abstract: Modern action-conditioned video world models achieve strong short-horizon visual realism, yet remain unreliable on rare, interaction-critical transitions that dominate…
arXiv:2605.18804v1 Announce Type: cross
Abstract: We propose Adaptive Multi-Scale Goodness Aggregation (AMSGA), a novel extension of the Forward-Forward (FF) algorithm designed to improve stability, robustness, and…
Asher Labovich, Benjamin Bradley, Vanessa Alexander +1 more
arXiv:2605.18807v1 Announce Type: cross
Abstract: Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic…
Joao Paulo Cavalcante Presa, Savio Salvarino Teles de Oliveira
arXiv:2605.18808v1 Announce Type: cross
Abstract: We characterize a compositional architecture of literary primitives in two instruction-tuned large language models (Llama 3.1 8B-Instruct and Gemma 2 9B-IT) via sparse…
arXiv:2605.18809v1 Announce Type: cross
Abstract: General-sum multi-agent learning is often governed by a stacked update field in which each agent's policy update changes the optimization landscape faced by the others. …
arXiv:2605.18810v1 Announce Type: cross
Abstract: Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel …
Sebastian Stapf, Pablo Acuaviva Huertos, Aram Davtyan +1 more
arXiv:2605.18813v1 Announce Type: cross
Abstract: World models aim to predict plausible futures consistent with past observations, a capability central to planning and decision-making in reinforcement learning. Yet,…
arXiv:2605.18816v1 Announce Type: cross
Abstract: Neural surrogates enable orders-of-magnitude acceleration of computational fluid dynamics (CFD) simulations, with the potential to transform engineering and healthcare…
arXiv:2605.18820v1 Announce Type: cross
Abstract: Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial …
arXiv:2605.18822v1 Announce Type: cross
Abstract: Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and …
Mohammed Saidul Islam, Negin Baghbanzadeh, Farnaz Kohankhaki +5 more
arXiv:2605.18824v1 Announce Type: cross
Abstract: Evaluation of foundation models often relies on aggregate scores from benchmarks that lack the coverage and metadata needed for fine-grained evaluation. We introduce …
arXiv:2605.18826v1 Announce Type: cross
Abstract: The attention interaction matrix $QK^{\top}$ contains two entangled computations: a skew-symmetric component that redistributes information between positions (routing)…
Bo Long, Deepak Agarwal, Jelena Markovic-Voronov +2 more
arXiv:2605.18832v1 Announce Type: cross
Abstract: The Transformer is the foundational building block of modern AI, yet offers no principled handling of uncertainty, which is prevalent in real applications:…
arXiv:2605.18833v1 Announce Type: cross
Abstract: Automated data quality assessment is crucial for managing big data, but existing solutions face challenges in achieving accurate context-aware assessment. This paper…
arXiv:2605.18837v1 Announce Type: cross
Abstract: Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor…
arXiv:2605.18838v1 Announce Type: cross
Abstract: Scaling laws predict loss from compute but not how capabilities interact. We measure the coupling between reasoning and truthfulness across 63 base models from 16…
Orhun Vural, Abdulaziz Ahmed, Ferhat Zengul +2 more
arXiv:2605.18839v1 Announce Type: cross
Abstract: Overcrowding in emergency departments (ED) remains a persistent operational challenge worldwide, causing delays in care delivery and downstream congestion. ED boarding…
arXiv:2605.18840v1 Announce Type: cross
Abstract: Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases -- and at the frontier, this…
arXiv:2605.18844v1 Announce Type: cross
Abstract: With the deep integration of the travel and energy industries, cross-industry supply chain finance has gradually become a high-risk field of hidden money laundering…
Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung +1 more
arXiv:2605.18845v1 Announce Type: cross
Abstract: We give the first quantitative prediction of grokking delay under AdamW. Treating the delay as a first-passage time, we derive a closed-form law T_grok - T_mem = (1 / 2 …
arXiv:2605.18846v1 Announce Type: cross
Abstract: Classical communication systems fail not only through random noise but also when transmitter and receiver use incompatible operational codebooks. Variational…
arXiv:2605.18847v1 Announce Type: cross
Abstract: Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal…
arXiv:2605.18848v1 Announce Type: cross
Abstract: This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by leveraging the exact…
arXiv:2605.18849v1 Announce Type: cross
Abstract: Explainability methods have progressed rapidly, but global explanations for time-series models remain underdeveloped, with most approaches focusing on local,…
Adrian Cierpka, Mohammad Shafiqul Islam, Johannes Steinhülb +3 more
arXiv:2605.18850v1 Announce Type: cross
Abstract: We introduce KadiAssistant, a privacy-by-design AI assistant integrated into the Kadi research data ecosystem, enabling researchers to efficiently access, aggregate,…
arXiv:2605.18852v1 Announce Type: cross
Abstract: Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are …
Vyzantinos Repantis, Harshvardhan Singh, Tony Joseph +5 more
arXiv:2605.18857v1 Announce Type: cross
Abstract: For most of the history of information retrieval (IR), search results were designed for human consumers who could scan, filter, and discard irrelevant information on…
arXiv:2605.18858v1 Announce Type: cross
Abstract: Probabilistic prediction systems often aggregate probability estimates from multiple models into a single decision. A common assumption is that if each model is…
arXiv:2605.18859v1 Announce Type: cross
Abstract: LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many…
arXiv:2605.18862v1 Announce Type: cross
Abstract: Cardiovascular disease remains the leading cause of death worldwide, and early detection of arrhythmias through continuous ECG monitoring on wearable devices can…
arXiv:2605.18865v1 Announce Type: cross
Abstract: Self-attention serves as the core foundation of large-scale transformer pretraining, but its quadratic token interaction cost makes inference expensive. Replacing…
arXiv:2605.18866v1 Announce Type: cross
Abstract: Reconstructing continuous flow fields from sparse surface-mounted sensors is central to aerodynamic design, flow control, and digital-twin instrumentation. Existing…
arXiv:2605.18867v1 Announce Type: cross
Abstract: Test-time model evolution offers a promising way for deployed models to improve from unlabeled test-time experience, yet most existing methods depend on backpropagation …
arXiv:2605.18868v1 Announce Type: cross
Abstract: While vision and multimodal foundation models underpin critical tasks from perception to complex reasoning, they remain highly vulnerable to adversarial attacks.…
Jan Büssing, Moritz Schlager, Timo Heiß +2 more
arXiv:2605.18869v1 Announce Type: cross
Abstract: Large language models (LLMs) achieve strong performance across a wide range of tasks but are highly sensitive to prompt design, motivating the need for automatic prompt …
Shireen Kudukkil Manchingal, Abhey Kalia, Fernanda Gonçalves +1 more
arXiv:2605.18871v1 Announce Type: cross
Abstract: When Large Language Models produce structured outputs such as travel plans, code solutions, or multi-step proofs, individual reasoning steps may appear correct while…
Shih-Yu Lai, Chia-Ching Yen, Yang-Ting Shen +3 more
arXiv:2605.18872v1 Announce Type: cross
Abstract: Robotic assembly in architectural construction faces a persistent bottleneck: existing planners are either highly specialized, requiring prohibitive retraining for…
arXiv:2605.18873v1 Announce Type: cross
Abstract: Training and evaluating false data injection attack (FDIA) detectors for power systems is constrained by data scarcity. Operational grid measurements are commercially…
arXiv:2605.18879v1 Announce Type: cross
Abstract: Large language models inevitably retain sensitive information, defined as inputs that may induce harmful generations, due to training on massive web corpora, raising…
arXiv:2605.18882v1 Announce Type: cross
Abstract: LLM agents exhibit a consistent tendency to over-call, invoking tools even in situations where none is needed. On the When2Call benchmark, six models from three…
Andrew Bukowski, Aditya Kothari, Simba Shi +1 more
arXiv:2605.18883v1 Announce Type: cross
Abstract: A diffusion model trained on Hamiltonian trajectories can achieve rollout MSE near $10^{-3}$, but the standard deviation of its energy over time is between 7500 and…
arXiv:2605.18885v1 Announce Type: cross
Abstract: We prove that the extremum stack of a discrete sequence is a minimal sufficient statistic for the class of all computable, causal, rate-independent functionals, in the…
Mohammed Aledhari, Ali Aledhari, Fatimah Aledhari +1 more
arXiv:2605.18889v1 Announce Type: cross
Abstract: Modern machine learning forces practitioners to choose between powerful but expensive deep networks and fast but limited classical algorithms. Here we introduce Soft…
arXiv:2605.18890v1 Announce Type: cross
Abstract: The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power …
arXiv:2605.18891v1 Announce Type: cross
Abstract: Evaluations of unlearning on reasoning models sometimes show a bypass pattern. The answer side looks unlearned, but the model's own thinking trace keeps emitting the…
arXiv:2605.18904v1 Announce Type: cross
Abstract: Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively…
Abderrahim Bendahi, Adrien Fradin, Johan Peralez +2 more
arXiv:2605.18905v1 Announce Type: cross
Abstract: Neural operators have emerged as a powerful, discretization-invariant framework for solving partial differential equations (PDEs). Although established approaches like…
arXiv:2605.18911v1 Announce Type: cross
Abstract: Wildfire prediction is important for early warning and resource allocation, yet existing Earth foundation models (Earth FMs) are pretrained for general atmospheric and…
arXiv:2605.18913v1 Announce Type: cross
Abstract: The U.S. financial system processes approximately 1.3 million interbank transactions daily, yet no system in the reviewed literature models fraud propagation across the …
arXiv:2605.18915v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses from MLLMs. Many MLLMs support multi-image inputs,…
arXiv:2605.18916v1 Announce Type: cross
Abstract: We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally…
arXiv:2605.18918v1 Announce Type: cross
Abstract: Modern AI assistants are agentic. To answer a single user request, the underlying language model pulls in information from many sources, such as web searches, retrieved …
arXiv:2605.18919v1 Announce Type: cross
Abstract: Evolutionary algorithms for adversarial attacks leverage population-based search to discover perturbations without gradient information, but suffer from inefficient…
arXiv:2605.18920v1 Announce Type: cross
Abstract: Generative Recommendation (GR) has emerged as a promising paradigm by formulating item recommendation as a sequence-to-sequence generation task over item identifiers.…
Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou +1 more
arXiv:2605.18930v1 Announce Type: cross
Abstract: Memory-augmented large language model (LLM) agents use iterative reflection and self-evolution to solve complex tasks, but these mechanisms introduce security risks.…
Abdelhakim Ziani (MICS), Andras Horvath (UNITO), Paolo Ballarini (MICS)
arXiv:2605.18931v1 Announce Type: cross
Abstract: Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling. This behavior poses a fundamental challenge for modern deep…
Nikita Klimenko, Hesam Salehipour, Parham Eftekhar +2 more
arXiv:2605.18932v1 Announce Type: cross
Abstract: In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language…
Mohamed Bouadi, Nassim Bouarour, Varun Kulkarni +3 more
arXiv:2605.18971v1 Announce Type: cross
Abstract: What determines the quality of a tabular foundation model? Unlike language or vision, tabular foundation models acquire their inductive biases almost entirely from…
Federico Melis, Davide Bilardello, Emanuele Prato +2 more
arXiv:2605.18974v1 Announce Type: cross
Abstract: Classifying artworks presents a significant challenge due to the complex interplay of fine-grained details and abstract features that condition the style or genre of an …
arXiv:2605.18988v1 Announce Type: cross
Abstract: The expansion of Multimodal Large Language Models (MLLMs) and their integration into autonomous agentic workflows has introduced a non-stationary attack surface.…
Mihai Christodorescu, Earlence Fernandes, Ashish Hooda +11 more
arXiv:2605.18991v1 Announce Type: cross
Abstract: We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and…
Thomas Sommariva, Francesca Morandi, Simone Calderara +1 more
arXiv:2605.18993v1 Announce Type: cross
Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction.…
Ehsan Ahmadi, Hunter Schofield, Behzad Khamidehi +5 more
arXiv:2605.19033v1 Announce Type: cross
Abstract: Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent…
arXiv:2605.19043v1 Announce Type: cross
Abstract: Automated grading systems have enabled scalable assessment for many response types, but handwritten mathematics remains a barrier due to the complexity of multi-step…
arXiv:2605.19049v1 Announce Type: cross
Abstract: Linear attention has recently gained significant attention for long-context inference due to its constant decoding cost with respect to context length. However,…
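The "constant decoding cost with respect to context length" claim follows from the recurrent form of causal linear attention: the keys and values seen so far are folded into a fixed-size state, so each new token costs the same regardless of how long the context is. The sketch below is a generic illustration, not the paper's method; the `elu(x) + 1` feature map and the function and class names are assumptions chosen for the example. It checks that the constant-state decoder reproduces the quadratic-cost causal form.

```python
import numpy as np

def phi(x):
    # Positive feature map elu(x) + 1 (one common choice; assumed here).
    return np.where(x > 0, x + 1.0, np.exp(x))

class LinearAttentionDecoder:
    """Recurrent causal linear attention with a fixed-size state."""

    def __init__(self, d_k, d_v):
        self.S = np.zeros((d_k, d_v))  # running sum of phi(k_j) v_j^T
        self.z = np.zeros(d_k)         # running sum of phi(k_j), for normalization

    def step(self, q, k, v):
        fk = phi(k)
        self.S += np.outer(fk, v)      # O(d_k * d_v) update, independent of t
        self.z += fk
        fq = phi(q)
        return fq @ self.S / (fq @ self.z)

def causal_linear_attention(Q, K, V):
    # Reference form: each step re-reads every earlier key, so cost grows with t.
    FQ, FK = phi(Q), phi(K)
    out = np.empty((Q.shape[0], V.shape[1]))
    for t in range(Q.shape[0]):
        w = FQ[t] @ FK[: t + 1].T      # similarity to all keys up to t
        out[t] = w @ V[: t + 1] / w.sum()
    return out

rng = np.random.default_rng(0)
T, d_k, d_v = 16, 4, 3
Q, K, V = (rng.standard_normal((T, d)) for d in (d_k, d_k, d_v))
dec = LinearAttentionDecoder(d_k, d_v)
rec = np.stack([dec.step(Q[t], K[t], V[t]) for t in range(T)])
ref = causal_linear_attention(Q, K, V)
```

Both forms give the same outputs, but the decoder's state stays at `d_k * d_v` numbers no matter how many tokens have been processed, whereas a softmax KV cache grows linearly with context.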
arXiv:2605.19060v1 Announce Type: cross
Abstract: High-resolution 3D medical image generation remains challenging because fully volumetric models are computationally expensive, while efficient 2D slice generators often …
arXiv:2605.19064v1 Announce Type: cross
Abstract: Workforce transformations are difficult to forecast and costly to mismanage. In particular, the integration of artificial intelligence into knowledge work currently…
Sajjad Abdoli (MAD), Ghassan Al-Sumaidaee (MAD), Clayton W. Taylor (MAD) +3 more
arXiv:2605.19069v1 Announce Type: cross
Abstract: Code-switching, the natural alternation between two languages within a single utterance, represents one of the most challenging and under-studied conditions for…
Ziheng Chen, Xiaojun Wu, Bernhard Schölkopf +1 more
arXiv:2605.19073v1 Announce Type: cross
Abstract: Representations on the Symmetric Positive Definite (SPD) manifold have garnered significant attention across different applications. In contrast, the manifold of…
arXiv:2605.19074v1 Announce Type: cross
Abstract: The rapid global expansion of solar photovoltaic (PV) capacity, reaching a record 597 GW in 2024, highlights the urgent need for robust forecasting models to mitigate the …
Mahesh Bhosale, Abdul Wasi, Vishvesh Trivedi +3 more
arXiv:2605.19075v1 Announce Type: cross
Abstract: Grounded multi-video question answering over real-world news events requires systems to surface query-relevant evidence across heterogeneous video archives while…
arXiv:2605.19080v1 Announce Type: cross
Abstract: In Online Continual Learning (OCL), a neural network sequentially learns from a non-stationary data stream in a single-pass with access only to a limited memory replay…
arXiv:2605.19092v1 Announce Type: cross
Abstract: Reasoning systems increasingly separate intermediate computation into private and public channels, creating evaluation cases that look similar in transcripts:…
arXiv:2605.19095v1 Announce Type: cross
Abstract: Schedule-Free Learning has shown promise as a practical anytime training method for machine learning, showing success across dozens of standard benchmark problems.…
Branden Frieden, James M. Ferguson, Alan Kuntz +1 more
arXiv:2605.19104v1 Announce Type: cross
Abstract: Continuum robots enable dexterous manipulation in constrained environments, but require accurate and efficient models for real-time manipulation and control.…
arXiv:2605.19111v1 Announce Type: cross
Abstract: Existing text-to-image (T2I) evaluation metrics mainly assess whether generated images align with information explicitly stated in the prompt, but often fail to capture …
Dongyan Lin, Phillip Rust, Angel Villar Corrales +19 more
arXiv:2605.19130v1 Announce Type: cross
Abstract: Children acquire language grounding with remarkable robustness from limited visuo-linguistic input in ways that surpass today's best large multimodal models. Recent…
Muskaan Chopra, Lorenz Sparrenberg, Jan H. Terheyden +1 more
arXiv:2605.19133v1 Announce Type: cross
Abstract: Self-supervised learning (SSL) is now a standard way to pretrain medical image models, but performance is still mostly judged by downstream accuracy. For…
Ayush Agarwal, Ansh Gandhi, Jeremy A. Collins +6 more
arXiv:2605.19138v1 Announce Type: cross
Abstract: The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a…
Diganta Misra, Antonio Orvieto, Rediet Abebe +1 more
arXiv:2605.19141v1 Announce Type: cross
Abstract: Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on…
arXiv:2605.19147v1 Announce Type: cross
Abstract: Large language models (LLMs) are highly susceptible to backdoor attacks (BAs), wherein training samples are poisoned using trigger-based harmful content. Furthermore,…
Aleksandar Terzić, Francesco Carzaniga, Nicolas Menet +4 more
arXiv:2605.19150v1 Announce Type: cross
Abstract: State-space models (SSMs) face a fundamental trade-off between efficiency and expressivity that is mainly dictated by the structure of the model's transition matrix.…
arXiv:2605.19172v1 Announce Type: cross
Abstract: Forecasting urban delivery demand becomes substantially more challenging when newly added service regions lack historical records. Existing spatiotemporal forecasters…
arXiv:2605.19185v1 Announce Type: cross
Abstract: Sparse goal-conditioned planning with few cost-to-go labels can be viewed as a graph-PDE Dirichlet extension problem: extend sparse labels on a goal-dependent boundary…
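A Dirichlet extension problem on a graph fixes values on a set of boundary nodes and fills in the remaining nodes. The classical instance is the harmonic extension, which solves the graph Laplace equation on the interior. The sketch below illustrates only that textbook construction, assuming an unweighted graph Laplacian `L = D - A`; the function name is hypothetical, and this is not necessarily the operator or method the paper uses.

```python
import numpy as np

def harmonic_extension(A, boundary_idx, boundary_vals):
    """Extend boundary labels to all nodes by solving L u = 0 on the interior.

    A: symmetric (n, n) adjacency matrix. Every connected component of the
    interior must touch the boundary, otherwise L_II is singular.
    """
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A            # combinatorial graph Laplacian
    b = np.asarray(boundary_idx)
    interior = np.setdiff1d(np.arange(n), b)
    u = np.zeros(n)
    u[b] = boundary_vals                      # Dirichlet (fixed) values
    L_II = L[np.ix_(interior, interior)]
    L_IB = L[np.ix_(interior, b)]
    # Interior rows of L u = 0 give L_II u_I = -L_IB u_B.
    u[interior] = np.linalg.solve(L_II, -L_IB @ u[b])
    return u

# Path graph 0-1-2-3-4 with labels known only at the two ends.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
u = harmonic_extension(A, [0, 4], [0.0, 4.0])
```

On a path, the harmonic extension interpolates linearly between the two labeled endpoints, so `u` is `[0, 1, 2, 3, 4]`; each interior value is the average of its neighbors.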
Charvi Rastogi, Mukul Bhutani, Minsuk Kahng +13 more
arXiv:2605.19190v1 Announce Type: cross
Abstract: Despite the global deployment of text-to-image (T2I) models, their safety frameworks are largely calibrated to a Western-centric default, creating significant…
arXiv:2605.19201v1 Announce Type: cross
Abstract: Deep learning models detect pneumonia from chest X-rays with high accuracy, but the performance declines under domain shifts caused by differences in devices, patients, …
Fausto Mauricio Lagos Suarez, Akshit Saradagi, Vidya Sumathy +2 more
arXiv:2605.19202v1 Announce Type: cross
Abstract: This paper addresses the problem of using a deep Reinforcement Learning (RL)-based low-level Quadrotor controller within an autonomous Quadrotor navigation stack for…
arXiv:2605.19207v1 Announce Type: cross
Abstract: Deep learning models have shown strong performance in medical image analysis, but deploying them in low-resource clinical environments remains difficult due to…
arXiv:2605.19218v1 Announce Type: cross
Abstract: Vision-Language Models suffer severe KV cache pressure at inference, as a single image often encodes into thousands of tokens. Most existing methods exploit token…
arXiv:2605.19220v1 Announce Type: cross
Abstract: Uncertainty Quantification (UQ) is widely regarded as the primary safeguard for deploying Large Language Models (LLMs) in high-stakes domains. However, we argue that…
Tobias Braun, Jonas Henry Grebe, Hossein Shakibania +2 more
arXiv:2605.19227v1 Announce Type: cross
Abstract: Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared parameters and a…
arXiv:2605.19228v1 Announce Type: cross
Abstract: Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step …