Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10822 publications
    mmMUSE: An mmWave-based Motion-resilient Universal Speech Enhancement System
    Chenming He
    Yanyong Zhang
    Kai Wang
    Dequan Wang
    Lingyu Wang
    the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), ACM (2026) (to appear)
    Voice-based smart systems can greatly enhance user experiences by allowing higher-quality interactions through better voice perception. Speech enhancement can benefit such systems by isolating noise from speech. Recently, integrating millimeter-wave (mmWave) with audio for speech perception has gained increasing attention due to microphones' limitations in noisy environments. However, mmWave-based vocal extraction is severely affected by motion, which disperses vocal signals across ranges and introduces distortions. In this paper, we propose an mmWave-based motion-resilient universal speech enhancement system called mmMUSE, which fuses mmWave and audio signals. To mitigate motion interference, we develop a Doppler-based method for motion-robust vocal signal extraction. Moreover, by introducing the Vocal-Noise-Ratio metric to assess the prominence of vocal signals from mmWave, we achieve real-time voice activity detection that gains 3.81 dB of SISDR in noisy speech. Additionally, we design a two-stage complex-valued network that includes an attention-based fusion network for cross-modal complementing and a time-frequency masking network for correcting amplitude and phase of speech to isolate noise. Using mmWave and audio datasets from 46 participants, mmMUSE outperforms the state-of-the-art speech enhancement models, achieving an average SISDR improvement of 3.12 dB. Additionally, mmMUSE achieves SISDR improvements of 16.51 dB, 17.93 dB, 14.93 dB, and 18.95 dB in controlled environments involving intense noise, extensive motion, multiple speakers, and various obstructive materials, respectively. Finally, we evaluate mmMUSE in real-world scenarios including running, public spaces, and driving, maintaining a word error rate (WER) below 10%.
    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv for now (2026) (to appear)
    For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
    A Novel CI Coding Strategy Based on a Cochlear Model and Deep Neural Network
    Maryam Hosseini
    Tim Brochier
    Zachary Smith
    Brett Swanson
    Andrew Vandali
    Alan Kan
    Fadwa Alnafjan
    Kat Fernandez
    Conference on Implantable Auditory Prostheses 2025
    Objective: Many CI recipients face difficulties in understanding speech in noisy environments and express frustration with the quality of music. This may be partly due to the simple filter banks used in current CI technology, which do not fully replicate the natural processes of the cochlea. This project aims to improve CI perception by more accurately mimicking the responses of the auditory nerve. Method: Audio signals were applied to CARFAC (Cascade of Asymmetric Resonators with Fast-Acting Compression) [1] to produce a representation of the auditory nerve response, known as a normal hearing (NH) “neurogram”. The NH neurogram was down-sampled and applied to a deep neural network (DNN) to produce 22 electrode stimulation currents. These currents were applied to an electrical hearing (EH) model incorporating current spread, neural adaptation, and refractoriness, to produce a CI neurogram. The DNN was trained on sentences from the TIMIT database to minimise the difference between the NH and CI neurograms. Results: The CI neurograms produced by the CARFAC-DNN strategy were more similar to the NH neurograms than the CI neurograms produced by the Nucleus ACE strategy. Similarity was quantified by the structural similarity index and mean squared error. Conclusions: The CARFAC-DNN strategy may provide a more natural auditory nerve response than traditional CI sound coding strategies. A sound-booth study with CI recipients is planned. This work was funded by Google through the Australian Future Hearing Initiative. References: [1] Lyon, R. F. (2017). Human and machine hearing. Cambridge University Press.
    Visualizing dynamics of charges and strings in (2 + 1)D lattice gauge theories
    Tyler Cochran
    Bernhard Jobst
    Yuri Lensky
    Gaurav Gyawali
    Norhan Eassa
    Melissa Will
    Aaron Szasz
    Dmitry Abanin
    Rajeev Acharya
    Laleh Beni
    Trond Andersen
    Markus Ansmann
    Frank Arute
    Kunal Arya
    Abe Asfaw
    Juan Atalaya
    Brian Ballard
    Alexandre Bourassa
    Michael Broughton
    David Browne
    Brett Buchea
    Bob Buckley
    Tim Burger
    Nicholas Bushnell
    Anthony Cabrera
    Juan Campero
    Hung-Shen Chang
    Jimmy Chen
    Benjamin Chiaro
    Jahan Claes
    Agnetta Cleland
    Josh Cogan
    Roberto Collins
    Paul Conner
    William Courtney
    Alex Crook
    Ben Curtin
    Sayan Das
    Laura De Lorenzo
    Paul Donohoe
    Ilya Drozdov
    Andrew Dunsworth
    Alec Eickbusch
    Aviv Elbag
    Mahmoud Elzouka
    Vinicius Ferreira
    Ebrahim Forati
    Austin Fowler
    Brooks Foxen
    Suhas Ganjam
    Robert Gasca
    Élie Genois
    William Giang
    Dar Gilboa
    Raja Gosula
    Alejo Grajales Dau
    Dietrich Graumann
    Alex Greene
    Steve Habegger
    Monica Hansen
    Sean Harrington
    Paula Heu
    Oscar Higgott
    Jeremy Hilton
    Robert Huang
    Ashley Huff
    Bill Huggins
    Cody Jones
    Chaitali Joshi
    Pavol Juhas
    Hui Kang
    Amir Karamlou
    Kostyantyn Kechedzhi
    Trupti Khaire
    Bryce Kobrin
    Alexander Korotkov
    Fedor Kostritsa
    John Mark Kreikebaum
    Vlad Kurilovich
    Dave Landhuis
    Tiano Lange-Dei
    Brandon Langley
    Kim Ming Lau
    Justin Ledford
    Kenny Lee
    Loick Le Guevel
    Wing Li
    Alexander Lill
    Will Livingston
    Aditya Locharla
    Daniel Lundahl
    Aaron Lunt
    Sid Madhuk
    Ashley Maloney
    Salvatore Mandra
    Leigh Martin
    Orion Martin
    Cameron Maxfield
    Seneca Meeks
    Anthony Megrant
    Reza Molavi
    Sebastian Molina
    Shirin Montazeri
    Ramis Movassagh
    Charles Neill
    Michael Newman
    Murray Ich Nguyen
    Chia Ni
    Kris Ottosson
    Alex Pizzuto
    Rebecca Potter
    Orion Pritchard
    Ganesh Ramachandran
    Matt Reagor
    David Rhodes
    Gabrielle Roberts
    Kannan Sankaragomathi
    Henry Schurkus
    Mike Shearn
    Aaron Shorter
    Vladimir Shvarts
    Vlad Sivak
    Spencer Small
    Clarke Smith
    Sofia Springer
    George Sterling
    Jordan Suchard
    Alex Sztein
    Doug Thor
    Mert Torunbalci
    Abeer Vaishnav
    Justin Vargas
    Sergey Vdovichev
    Guifre Vidal
    Steven Waltman
    Shannon Wang
    Brayden Ware
    Kristi Wong
    Cheng Xing
    Jamie Yao
    Ping Yeh
    Bicheng Ying
    Juhwan Yoo
    Grayson Young
    Yaxing Zhang
    Ningfeng Zhu
    Yu Chen
    Vadim Smelyanskiy
    Adam Gammon-Smith
    Frank Pollmann
    Michael Knap
    Nature, 642 (2025), 315–320
    Lattice gauge theories (LGTs) can be used to understand a wide range of phenomena, from elementary particle scattering in high-energy physics to effective descriptions of many-body interactions in materials. Studying dynamical properties of emergent phases can be challenging, as it requires solving many-body problems that are generally beyond perturbative limits. Here we investigate the dynamics of local excitations in a LGT using a two-dimensional lattice of superconducting qubits. We first construct a simple variational circuit that prepares low-energy states that have a large overlap with the ground state; then we create charge excitations with local gates and simulate their quantum dynamics by means of a discretized time evolution. As the electric field coupling constant is increased, our measurements show signatures of transitioning from deconfined to confined dynamics. For confined excitations, the electric field induces a tension in the string connecting them. Our method allows us to experimentally image string dynamics in a (2+1)D LGT, from which we uncover two distinct regimes inside the confining phase: for weak confinement, the string fluctuates strongly in the transverse direction, whereas for strong confinement, transverse fluctuations are effectively frozen. We also demonstrate a resonance condition at which dynamical string breaking is facilitated. Our LGT implementation on a quantum processor presents a new set of techniques for investigating emergent excitations and string dynamics.
    Recent work suggested utilizing inference compute, showing that scaling the number of samples consistently improves the fraction of problems solved by any attempt, namely the coverage. In this work, we suggest that inference scaling gains should be compared with proper baselines, as some datasets become degenerate when allowing a large number of attempts. We focus on two domains, mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method. Otherwise, the results of inference scaling should be interpreted with caution.
    RemapRoute: Local Remapping of Internet Path Changes
    Renata Cruz Teixeira
    Christophe Diot
    Italo Cunha
    Elverton Fazzion
    Darryl Veitch
    2025
    Several systems rely on traceroute to track a large number of Internet paths as they change over time. Monitoring systems perform this task by remapping paths periodically or whenever a change is detected. This paper shows that such complete remapping is inefficient, because most path changes are localized to a few hops of a path. We develop RemapRoute, a tool to remap a path locally given the previously known path and a change point. RemapRoute sends targeted probes to locate and remap the often few hops that have changed. Our evaluation with trace-driven simulations and in a real deployment shows that local remapping reduces the average number of probes issued during remapping by 63% and 79%, respectively, when compared with complete remapping. At the same time, our results show that local remapping has little impact on the accuracy of inferred paths.
    Triply efficient shadow tomography
    Robbie King
    David Gosset
    PRX Quantum, 6 (2025), pp. 010336
    Given copies of a quantum state $\rho$, a shadow tomography protocol aims to learn all expectation values from a fixed set of observables, to within a given precision $\epsilon$. We say that a shadow tomography protocol is triply efficient if it is sample- and time-efficient, and only employs measurements that entangle a constant number of copies of $\rho$ at a time. The classical shadows protocol based on random single-copy measurements is triply efficient for the set of local Pauli observables. This and other protocols based on random single-copy Clifford measurements can be understood as arising from fractional colorings of a graph $G$ that encodes the commutation structure of the set of observables. Here we describe a framework for two-copy shadow tomography that uses an initial round of Bell measurements to reduce to a fractional coloring problem in an induced subgraph of $G$ with bounded clique number. This coloring problem can be addressed using techniques from graph theory known as chi-boundedness. Using this framework we give the first triply efficient shadow tomography scheme for the set of local fermionic observables, which arise in a broad class of interacting fermionic systems in physics and chemistry. We also give a triply efficient scheme for the set of all $n$-qubit Pauli observables. Our protocols for these tasks use two-copy measurements, which is necessary: sample-efficient schemes are provably impossible using only single-copy measurements. Finally, we give a shadow tomography protocol that compresses an $n$-qubit quantum state into a $\mathrm{poly}(n)$-sized classical representation, from which one can extract the expected value of any of the $4^n$ Pauli observables in $\mathrm{poly}(n)$ time, up to a small constant error.
    AI-assisted Academic Writing
    Malcolm Kane
    Ian Lang
    Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, Association for Computational Linguistics (2025), pp. 31-45
    We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user's current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.
    Initially conceived as a way to explain memory sharing in romantic couples, the concept of transactive memory systems (TMS) has been adopted by organizational psychology, information management, and other fields of study to examine team performance in corporate settings. While findings highlight a clear advantage for human teams with TMS, it's not evident if AI-human teams could also develop such a psychological dynamic. This paper considers AI-human interaction through the lens of TMS and identifies potential opportunities for improvement in this area.
    Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluated our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
    Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
    Carlos Tejeda-Ocampo
    Toni Hirvonen
    Ema Souza-Blanes
    Mahmoud Namazi
    AES 158th Convention of the Audio Engineering Society (2025)
    Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications, such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate bitrate vs. quality for a typical mix presentation use case of a foreground stereo element, plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats, a recently proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s even with a reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate when using underlying legacy audio coding instances.
    Invisible labor is work that is either not fully visible or not appropriately compensated. In open source software (OSS) ecosystems, essential tasks that do not involve code (like content moderation) often become invisible to the detriment of individuals and organizations. However, invisible labor is sufficiently difficult to measure that we do not know how much of OSS activities are invisible. Our study addresses this challenge, demonstrating that roughly half of OSS work is invisible. We do this by developing a cognitive anchoring survey technique that measures OSS developer self-assessments of labor visibility and attribution. Survey respondents (n=142) reported that their work is more likely to be invisible (2 in 3 tasks) than visible, and that half (50.1%) is uncompensated. Priming participants with the idea of visibility caused participants to think their work was more visible, and that visibility was less important, than those primed with invisibility. We also found evidence that tensions between attribution motivations probably increase how common invisible labor is. This suggests that advertising OSS activities as "open" may lead contributors to overestimate how visible their labor actually is. Our findings suggest benefits to working with varied stakeholders to make select, collectively valued activities visible, and increasing compensation in valued forms (like attribution, opportunities, or pay) when possible. This could improve fairness in software development while providing greater transparency into work designs that help organizations and communities achieve their goals.
    Recently, decomposing complex problems into simple subtasks (a crucial part of human-like natural planning) to solve the given problem has significantly boosted the performance of large language models (LLMs). However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce Plan-Tuning, a unified post-training framework that (i) distills synthetic task decompositions (termed “planning trajectories”) from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. On GSM8k and the MATH benchmarks, plan-tuned models outperform strong baselines by an average of ~7%. Furthermore, plan-tuned models show better generalization capabilities on out-of-domain datasets, with average performance improvements of ~10% and ~12% on OlympiadBench and AIME 2024, respectively. Our detailed analysis demonstrates how planning trajectories improve complex reasoning capabilities, showing that Plan-Tuning is an effective strategy for improving task-specific performance of smaller LLMs.
    LLM-based Lossless Text Simplification and its Effect on User Comprehension and Mental Load
    Theo Guidroz
    Diego Ardila
    Jimmy Li
    Adam Mansour
    Paul Jhun
    Nina Gonzalez
    Xiang Ji
    Mike Sanchez
    Sujay Kakarmath
    Miguel Ángel Garrido
    Faruk Ahmed
    Divyansh Choudhary
    Jay Hartford
    Georgina Xu
    Henry Serrano
    Yifan Wang
    Jeff Shaffer
    Eric (Yifan) Cao
    Sho Fujiwara
    Peggy Bui
    arXiv (2025)
    Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop an LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for the finance (5.5%), aerospace/computer science (3.8%), and legal (3.5%) domains. Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility.