Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10822 publications
mmMUSE: An mmWave-based Motion-resilient Universal Speech Enhancement System
Chenming He
Yanyong Zhang
Kai Wang
Dequan Wang
Lingyu Wang
the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), ACM (2026) (to appear)
Preview abstract
Voice-based smart systems can greatly enhance user experiences by allowing higher-quality interactions through better voice perception. Speech enhancement can benefit such systems by isolating noise from speech. Recently, integrating millimeter-wave (mmWave) with audio for speech perception has gained increasing attention due to microphones' limitations in noisy environments. However, mmWave-based vocal extraction is severely affected by motion, which disperses vocal signals across ranges and introduces distortions. In this paper, we propose an mmWave-based motion-resilient universal speech enhancement system called mmMUSE, which fuses mmWave and audio signals. To mitigate motion interference, we develop a Doppler-based method for motion-robust vocal signal extraction. Moreover, by introducing the Vocal-Noise-Ratio metric to assess the prominence of vocal signals from mmWave, we achieve real-time voice activity detection that gains 3.81 dB of SISDR in noisy speeches. Additionally, we design a two-stage complex-valued network that includes an attention-based fusion network for cross-modal complementing and a time-frequency masking network for correcting amplitude and phase of speech to isolate noises.
Using mmWave and audio datasets from 46 participants, mmMUSE outperforms the state-of-the-art speech enhancement models, achieving an average SISDR improvement of 3.12 dB. Additionally, mmMUSE achieves SISDR improvements of 16.51 dB, 17.93 dB, 14.93 dB, and 18.95 dB in controlled environments involving intense noise, extensive motion, multiple speakers, and various obstructive materials, respectively. Finally, we evaluate mmMUSE in real-world scenarios including running, public spaces, and driving, maintaining a word error rate (WER) below 10%.
View details
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
A Novel CI Coding Strategy Based on a Cochlear Model and Deep Neural Network
Maryam Hosseini
Tim Brochier
Zachary Smith
Brett Swanson
Andrew Vandali
Alan Kan
Fadwa Alnafjan
Kat Fernandez
Conference on Implantable Auditory Prostheses 2025
Preview abstract
Objective: Many CI recipients face difficulties in understanding speech in noisy
environments and express frustration with the quality of music. This may be partly due
to the simple filter banks used in current CI technology, which do not fully replicate the
natural processes of the cochlea. This project aims to improve CI perception by more
accurately mimicking the responses of the auditory nerve.
Method: Audio signals were applied to CARFAC (Cascade of Asymmetric Resonators
with Fast-Acting Compression) [1] to produce a representation of the auditory nerve
response, known as a normal hearing (NH) “neurogram”. The NH neurogram was
down-sampled and applied to a deep neural network (DNN) to produce 22 electrode
stimulation currents. These currents were applied to an electrical hearing (EH) model
incorporating current spread, neural adaptation, and refractoriness, to produce a CI
neurogram. The DNN was trained on sentences from the TIMIT database to minimise
the difference between the NH and CI neurograms.
Results: The CI neurograms produced by the CARFAC-DNN strategy were more similar
to the NH neurograms than the CI neurograms produced by the Nucleus ACE strategy.
Similarity was quantified by the structural similarity index and mean squared error.
Conclusions: The CARFAC-DNN strategy may provide a more natural auditory nerve
response than traditional CI sound coding strategies. A sound-booth study with CI
recipients is planned.
This work was funded by Google through the Australian Future Hearing Initiative.
References:
[1] Lyon, R. F. (2017). Human and machine hearing. Cambridge University Press.
View details
Visualizing dynamics of charges and strings in (2 + 1)D lattice gauge theories
Tyler Cochran
Bernhard Jobst
Yuri Lensky
Gaurav Gyawali
Norhan Eassa
Melissa Will
Aaron Szasz
Dmitry Abanin
Rajeev Acharya
Laleh Beni
Trond Andersen
Markus Ansmann
Frank Arute
Kunal Arya
Abe Asfaw
Juan Atalaya
Brian Ballard
Alexandre Bourassa
Michael Broughton
David Browne
Brett Buchea
Bob Buckley
Tim Burger
Nicholas Bushnell
Anthony Cabrera
Juan Campero
Hung-Shen Chang
Jimmy Chen
Benjamin Chiaro
Jahan Claes
Agnetta Cleland
Josh Cogan
Roberto Collins
Paul Conner
William Courtney
Alex Crook
Ben Curtin
Sayan Das
Laura De Lorenzo
Paul Donohoe
ILYA Drozdov
Andrew Dunsworth
Alec Eickbusch
Aviv Elbag
Mahmoud Elzouka
Vinicius Ferreira
Ebrahim Forati
Austin Fowler
Brooks Foxen
Suhas Ganjam
Robert Gasca
Élie Genois
William Giang
Dar Gilboa
Raja Gosula
Alejo Grajales Dau
Dietrich Graumann
Alex Greene
Steve Habegger
Monica Hansen
Sean Harrington
Paula Heu
Oscar Higgott
Jeremy Hilton
Robert Huang
Ashley Huff
Bill Huggins
Cody Jones
Chaitali Joshi
Pavol Juhas
Hui Kang
Amir Karamlou
Kostyantyn Kechedzhi
Trupti Khaire
Bryce Kobrin
Alexander Korotkov
Fedor Kostritsa
John Mark Kreikebaum
Vlad Kurilovich
Dave Landhuis
Tiano Lange-Dei
Brandon Langley
Kim Ming Lau
Justin Ledford
Kenny Lee
Loick Le Guevel
Wing Li
Alexander Lill
Will Livingston
Aditya Locharla
Daniel Lundahl
Aaron Lunt
Sid Madhuk
Ashley Maloney
Salvatore Mandra
Leigh Martin
Orion Martin
Cameron Maxfield
Seneca Meeks
Anthony Megrant
Reza Molavi
Sebastian Molina
Shirin Montazeri
Ramis Movassagh
Charles Neill
Michael Newman
Murray Ich Nguyen
Chia Ni
Kris Ottosson
Alex Pizzuto
Rebecca Potter
Orion Pritchard
Ganesh Ramachandran
Matt Reagor
David Rhodes
Gabrielle Roberts
Kannan Sankaragomathi
Henry Schurkus
Mike Shearn
Aaron Shorter
Vladimir Shvarts
Vlad Sivak
Spencer Small
Clarke Smith
Sofia Springer
George Sterling
Jordan Suchard
Alex Sztein
Doug Thor
Mert Torunbalci
Abeer Vaishnav
Justin Vargas
Sergey Vdovichev
Guifre Vidal
Steven Waltman
Shannon Wang
Brayden Ware
Kristi Wong
Cheng Xing
Jamie Yao
Ping Yeh
Bicheng Ying
Juhwan Yoo
Grayson Young
Yaxing Zhang
Ningfeng Zhu
Yu Chen
Vadim Smelyanskiy
Adam Gammon-Smith
Frank Pollmann
Michael Knap
Nature, 642 (2025), 315–320
Preview abstract
Lattice gauge theories (LGTs) can be used to understand a wide range of phenomena, from elementary particle scattering in high-energy physics to effective descriptions of many-body interactions in materials. Studying dynamical properties of emergent phases can be challenging, as it requires solving many-body problems that are generally beyond perturbative limits. Here we investigate the dynamics of local excitations in a LGT using a two-dimensional lattice of superconducting qubits. We first construct a simple variational circuit that prepares low-energy states that have a large overlap with the ground state; then we create charge excitations with local gates and simulate their quantum dynamics by means of a discretized time evolution. As the electric field coupling constant is increased, our measurements show signatures of transitioning from deconfined to confined dynamics. For confined excitations, the electric field induces a tension in the string connecting them. Our method allows us to experimentally image string dynamics in a (2+1)D LGT, from which we uncover two distinct regimes inside the confining phase: for weak confinement, the string fluctuates strongly in the transverse direction, whereas for strong confinement, transverse fluctuations are effectively frozen. We also demonstrate a resonance condition at which dynamical string breaking is facilitated. Our LGT implementation on a quantum processor presents a new set of techniques for investigating emergent excitations and string dynamics.
View details
Preview abstract
Recent work suggested utilizing inference compute, showing that scaling of number of samples consistently improves the fractions of problems solved by any attempt, namely the coverage. In this work, we suggest that inference scaling gains should be compared with proper baselines, as some datasets become degenerate when allowing a large number of attempts. We focus on two domains - mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method. Otherwise, the results of inference scaling should be interpreted with caution.
View details
RemapRoute: Local Remapping of Internet Path Changes
renata cruz teixeira
Christophe Diot
italo cunha
Elverton Fazzion
Darryl Veitch
2025
Preview abstract
Several systems rely on traceroute to track a large number of Internet paths as they change over time. Monitoring systems perform this task by remapping paths periodically or whenever a change is detected. This paper shows that such complete remapping is inefficient, because most path changes are localized to a few hops of a path. We develop RemapRoute, a tool to remap a path locally given the previously known path and a change point. RemapRoute sends targeted probes to locate and remap the often few hops that have changed. Our evaluation with trace-driven simulations and in a real deployment shows that local remapping reduces the average number of probes issued during remapping by 63% and 79%, respectively, when compared with complete remapping. At the same time, our results show that local remapping has little impact on the accuracy of inferred paths.
View details
Preview abstract
Given copies of a quantum state $\rho$, a shadow tomography protocol aims to learn all expectation values from a fixed set of observables, to within a given precision $\epsilon$. We say that a shadow tomography protocol is \textit{triply efficient} if it is sample- and time-efficient, and only employs measurements that entangle a constant number of copies of $\rho$ at a time. The classical shadows protocol based on random single-copy measurements is triply efficient for the set of local Pauli observables. This and other protocols based on random single-copy Clifford measurements can be understood as arising from fractional colorings of a graph $G$ that encodes the commutation structure of the set of observables. Here we describe a framework for two-copy shadow tomography that uses an initial round of Bell measurements to reduce to a fractional coloring problem in an induced subgraph of $G$ with bounded clique number. This coloring problem can be addressed using techniques from graph theory known as \textit{chi-boundedness}. Using this framework we give the first triply efficient shadow tomography scheme for the set of local fermionic observables, which arise in a broad class of interacting fermionic systems in physics and chemistry. We also give a triply efficient scheme for the set of all $n$-qubit Pauli observables. Our protocols for these tasks use two-copy measurements, which is necessary: sample-efficient schemes are provably impossible using only single-copy measurements. Finally, we give a shadow tomography protocol that compresses an $n$-qubit quantum state into a $\poly(n)$-sized classical representation, from which one can extract the expected value of any of the $4^n$ Pauli observables in $\poly(n)$ time, up to a small constant error.
View details
AI-assisted Academic Writing
Malcolm Kane
Ian Lang
Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, Association for Computational Linguistics (2025), pp. 31-45
Preview abstract
We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user's current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.
View details
Preview abstract
Initially conceived as a way to explain memory sharing in romantic couples, the concept of transactive memory systems (TMS) has been adopted by organizational psychology, information management, and other fields of study to examine team performance in corporate settings. While findings highlight a clear advantage for humans teams with TMS, it's not evident if AI-human teams could also develop such a psychological dynamic. This paper considers AI-human interaction through the lens of TMS and identifies potential opportunities for improvement in this area.
View details
I know what I don't know: improving model cascades through confidence tuning
Stephan Rabanser
Nathalie Rauschmayr
Petra Poklukar
Congchao Wang
2025
Preview abstract
Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluated our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
View details
Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
Carlos Tejeda-Ocampo
Toni Hirvonen
Ema Souza-Blanes
Mahmoud Namazi
AES 158th Convention of the Audio Engineering Society (2025)
Preview abstract
Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications, such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate bitrate vs. quality for a typical mix presentation use case of a foreground stereo element, plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats, a recently
proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s even with reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate when using underlying legacy audio coding instances.
View details
Preview abstract
Invisible labor is work that is either not fully visible or not appropriately compensated. In open source software (OSS) ecosystems, essential tasks that do not involve code (like content moderation) often become invisible to the detriment of individuals and organizations. However, invisible labor is sufficiently difficult to measure that we do not know how much of OSS activities are invisible. Our study addresses this challenge, demonstrating that roughly half of OSS work is invisible. We do this by developing a cognitive anchoring survey technique that measures OSS developer self-assessments of labor visibility and attribution. Survey respondents (n=142) reported that their work is more likely to be invisible (2 in 3 tasks) than visible, and that half (50.1%) is uncompensated. Priming participants with the idea of visibility caused participants to think their work was more visible, and that visibility was less important, than those primed with invisibility. We also found evidence that tensions between attribution motivations probably increase how common invisible labor is. This suggests that advertising OSS activities as "open" may lead contributors to overestimate how visible their labor actually is. Our findings suggest benefits to working with varied stakeholders to make select, collectively valued activities visible, and increasing compensation in valued forms (like attribution, opportunities, or pay) when possible. This could improve fairness in software development while providing greater transparency into work designs that help organizations and communities achieve their goals.
View details
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
Mihir Parmar
Chitta Baral
Mingyang Ling
2025
Preview abstract
Recently, decomposing complex problems into simple subtasks--a crucial part of human-like natural planning--to solve the given problem has significantly boosted the performance of large language models (LLMs). However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce Plan-Tuning, a unified post-training framework that (i) distills synthetic task decompositions (termed “planning trajectories”) from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. On GSM8k and the MATH benchmarks, plan-tuned models outperform strong baselines by an average ~7%. Furthermore, plan-tuned models show better generalization capabilities on out-of-domain datasets, with average ~10% and ~12% performance improvements on OlympiadBench and AIME 2024, respectively. Our detailed analysis demonstrates how planning trajectories improves complex reasoning capabilities, showing that Plan-Tuning is an effective strategy for improving task-specific performance of smaller LLMs.
View details
LLM-based Lossless Text Simplification and its Effect on User Comprehension and Mental Load
Theo Guidroz
Diego Ardila
Jimmy Li
Adam Mansour
Paul Jhun
Nina Gonzalez
Xiang Ji
Mike Sanchez
Sujay Kakarmath
Miguel Ángel Garrido
Faruk Ahmed
Divyansh Choudhary
Jay Hartford
Georgina Xu
Henry Serrano
Yifan Wang
Jeff Shaffer
Eric (Yifan) Cao
Sho Fujiwara
Peggy Bui
arXiv (2025)
Preview abstract
Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop a LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for finance (5.5%), aerospace/computer science (3.8%) domains, and legal (3.5%). Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility.
View details