
SPRINT: Scalable, Secure & Differentially Private Inference for Transformers

About this project

SPRINT is a scalable framework for differentially private (DP) fine-tuning and inference via multiparty computation (MPC) of transformer-based models. SPRINT is built on top of PyTorch, Opacus for DP fine-tuning and CrypTen for MPC inference.

Repository for the paper "SPRINT: Scalable, Secure & Differentially Private Inference for Transformers".

If you find this code useful in your research, please cite our paper:

@inproceedings{capano2026sprint,
  title     = {{SPRINT: Scalable, Secure \& Differentially Private Inference for Transformers}},
  author    = {Capano, Francesco and Böhler, Jonas and Weggenmann, Benjamin},
  booktitle = {{Proceedings on Privacy Enhancing Technologies (PoPETs)}},
  volume    = {2026},
  number    = {1},
  year      = {2026}
}

Abstract

Machine learning as a service (MLaaS) enables deploying models for inference on cloud servers, offering scalable infrastructure and resource management. However, MLaaS exposes user queries and model parameters to the servers. To guarantee confidentiality of queries and model parameters, multi-party computation (MPC) ensures secure inference by distributing data and computations across multiple service providers. MPC eliminates single points of failure, mitigates provider breaches, and ensures confidentiality beyond legal agreements. Additionally, models can memorize and leak training data. To mitigate this privacy concern, differential privacy (DP) provides a formal privacy guarantee for training data, which can be satisfied by injecting carefully calibrated noise into gradients during training. However, naive combinations of DP and MPC amplify accuracy loss due to DP noise and MPC approximations, and incur high computational and communication overhead due to cryptographic operations.

We present SPRINT, the first scalable solution for efficient MPC inference on DP fine-tuned models with high accuracy. SPRINT fine-tunes public pre-trained models on private data using DP, and integrates DP-specific optimizations, e.g., parameter-efficient fine-tuning and noise-aware optimizers, with MPC optimizations, e.g., cleartext public parameters and efficient approximations of non-linear functions. We evaluate SPRINT on the GLUE benchmark with RoBERTa, achieving 1.6× faster MPC inference than the state-of-the-art non-DP solution (SHAFT). Notably, SPRINT maintains high accuracy during MPC inference, with a gap of less than 1 percentage point compared to its cleartext accuracy.

Repository Structure

The repository is organized as follows:

  • src/: Contains the source code for the project.
    • run_dp_finetuning.py: Script for fine-tuning models with differential privacy.
    • run_inference.py: Script for model inference (cleartext and MPC).
    • sprint_core/: Core modular components for SPRINT experiments.
      • constants.py: Constants and path management (Needs to be modified if custom paths are used).
      • config_manager.py: Configuration management and validation.
      • model_factory.py: Model creation and LoRA integration.
      • data_loaders.py: Data loading and tokenization utilities.
      • training_manager.py: Training orchestration and DP integration.
      • inference_manager.py: Inference execution and overflow handling.
      • experiment_runner.py: End-to-end experiment orchestration.
      • multiprocess_launcher.py: Multi-process execution for MPC.
    • configs/: Contains YAML configuration files for different experimental settings.
      • crypten_inference_config.yaml: Configuration file for CrypTen.
      • aws_inference_config.yaml: Example configuration file for AWS inference experiments.
      • fine-tuning_example_cpu.yaml: Example configuration file for DP fine-tuning on CPU.
      • fine-tuning_example_cuda.yaml: Example configuration file for DP fine-tuning on GPU with CUDA.
      • inference_example.yaml: Example configuration file for inference (cleartext and MPC).
    • tokenize_dataset.py: Script for downloading and tokenizing datasets.
    • modeling/: Contains the CrypTen modeling of BERT and RoBERTa.
      • models/: Model implementations (clear and encrypted versions).
      • lora/: LoRA (Low-Rank Adaptation) implementation and utilities.
      • activations/: Custom activation functions for MPC.
      • optimizers/: DP-specific optimizers (e.g., DP-AdamBC).
    • aws/: Contains the refactored scripts for running MPC experiments on AWS.
      • aws_launcher.py: Modified AWS instance launcher from CrypTen.
      • aws_mpc_inference.sh: Script for running MPC inference on AWS.
  • data/: Contains the datasets and models.
    • models/: Contains the fine-tuned models.
    • finetuning/: Contains the results of the fine-tuning experiments (for each dataset).
    • inference/: Contains the results of the MPC inference.
      • accuracy/: Inference accuracy results (for each dataset, encrypted and cleartext inference).
      • runtime/: Runtime and communication profiling results (from AWS experiments).
  • requirements.txt: Python dependencies for the project.
  • setup.sh: Setup script for installing dependencies and setting up the environment.
  • ARTIFACT_APPENDIX.md: Documentation for artifact evaluation.

Requirements and Setup

  • Python Version: Tested with Python 3.9.23.
  • Hardware: All experiments can be run on CPU, but a GPU with CUDA support is recommended for larger models and datasets.
  • Operating System: Tested on macOS and Debian-based Linux distributions (e.g., Ubuntu) with the apt package manager.

Installation

  1. Clone the repository:
   git clone https://github.com/SAP/sprint.git
   cd sprint
  2. Make the setup script executable and run it:
  • The setup script can either install Python system-wide and create a virtual environment, or install Python via conda (using the --conda flag).
   chmod +x setup.sh
   ./setup.sh --conda  # for conda installation (otherwise, omit the --conda flag)
  3. Activate the conda environment:
   conda activate sprint

or the virtual environment:

   source sprint_env/bin/activate
  4. Set the environment variable for the SPRINT path (the following command is run in the root of the cloned repo):
   export SPRINT_PATH=$(pwd)
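
Optionally, SPRINT_PATH can be made persistent across shell sessions by appending the export to your shell profile (a minimal sketch, assuming a bash shell; use ~/.zshrc for zsh):

   # Persist SPRINT_PATH (the current directory is expanded at write time)
   echo "export SPRINT_PATH=$(pwd)" >> ~/.bashrc
   source ~/.bashrc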

Alternative Manual Setup (with system-wide python installation and virtual environment)

After cloning the repository, you can follow the manual setup instructions below:

  1. Install Python 3.9 (e.g., on Linux via apt):
   sudo apt-get install python3.9 python3.9-venv python3.9-dev

This may fail, since recent OS versions (e.g., Ubuntu 22.04) do not have Python 3.9 available in the default repositories. In that case, you may need to run the following commands:

   sudo apt-get update
   sudo apt-get install -y software-properties-common
   sudo add-apt-repository ppa:deadsnakes/ppa
   sudo apt-get update
   sudo apt-get install python3.9 python3.9-venv python3.9-dev
  2. Set up and activate a virtual environment:
   python3.9 -m venv sprint_env
   source sprint_env/bin/activate
  3. Install dependencies:
   SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True pip install -r requirements.txt

NOTE: SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True is required because CrypTen's requirements include the deprecated sklearn package. Alternatively, you can download CrypTen from source and modify its requirements file (i.e., replace sklearn with scikit-learn).

  4. Modify autograd_grad_sample.py in the private_transformers library (a one-line patch command is sketched after this list):

    • The expected path with the virtual environment is sprint_env/lib/python3.9/site-packages/private_transformers/ (it may vary depending on the OS and Python version).
    • On line 97 of autograd_grad_sample.py, replace register_backward_hook (which does not support layers with multiple autograd nodes, such as LoRA layers) with register_full_backward_hook. The modified line changes from handles.append(layer.register_backward_hook(this_backward)) to handles.append(layer.register_full_backward_hook(this_backward)).
  5. Set the environment variable for the SPRINT path (the following command is run in the root of the cloned repo):

   export SPRINT_PATH=$(pwd)
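
As referenced in step 4 above, the patch can be applied with a one-line sed command (a sketch, assuming the default virtual-environment path from step 2; on macOS, use sed -i '' instead of sed -i):

   # Swap register_backward_hook for register_full_backward_hook on line 97
   sed -i '97s/register_backward_hook/register_full_backward_hook/' sprint_env/lib/python3.9/site-packages/private_transformers/autograd_grad_sample.py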

Alternative Manual Setup (with conda)

  1. Install conda (if not already installed). You can use Miniconda for a minimal installation.

  2. Create and activate a conda environment with Python 3.9:

   conda create -n sprint python=3.9
   conda activate sprint
  3. Install dependencies:
   SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True pip install -r requirements.txt

NOTE: SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True is required because CrypTen's requirements include the deprecated sklearn package. Alternatively, you can download CrypTen from source and modify its requirements file (i.e., replace sklearn with scikit-learn).

  4. Modify autograd_grad_sample.py in the private_transformers library (a one-line patch command is sketched after this list):

    • With conda, the expected path is the environment's site-packages directory, typically $CONDA_PREFIX/lib/python3.9/site-packages/private_transformers/ while the sprint environment is active (it may vary depending on the OS and Python version).
    • On line 97 of autograd_grad_sample.py, replace register_backward_hook (which does not support layers with multiple autograd nodes, such as LoRA layers) with register_full_backward_hook. The modified line changes from handles.append(layer.register_backward_hook(this_backward)) to handles.append(layer.register_full_backward_hook(this_backward)).
  5. Set the environment variable for the SPRINT path (the following command is run in the root of the cloned repo):

   export SPRINT_PATH=$(pwd)
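
As referenced in step 4 above, the same one-line sed patch applies (a sketch, assuming the sprint conda environment is active so that $CONDA_PREFIX points to it; on macOS, use sed -i ''):

   # Swap register_backward_hook for register_full_backward_hook on line 97
   sed -i '97s/register_backward_hook/register_full_backward_hook/' $CONDA_PREFIX/lib/python3.9/site-packages/private_transformers/autograd_grad_sample.py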

Run DP Fine-tuning

  1. Download and tokenize the dataset (for example sst2):
    cd src
    python tokenize_dataset.py --dataset sst2 --model_type roberta

This will download the dataset and save it in the data folder. The tokenized dataset will be saved in $SPRINT_PATH/data/tokenized_dataset/roberta/sst2/.
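
To verify the step, you can list the output directory reported above:

    ls $SPRINT_PATH/data/tokenized_dataset/roberta/sst2/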

  2. Configure the fine-tuning parameters in $SPRINT_PATH/src/configs/fine-tuning_example_cpu.yaml (or create your own config file).

  3. Run the fine-tuning script (passing the config file name or its absolute path):

   cd src
   python run_dp_finetuning.py --config fine-tuning_example_cpu.yaml

The config file $SPRINT_PATH/src/configs/fine-tuning_example_cpu.yaml uses the CPU for fine-tuning. If you have a GPU with CUDA support, you can use the config file $SPRINT_PATH/src/configs/fine-tuning_example_cuda.yaml (or create your own config file).

The fine-tuned model will be saved in the $SPRINT_PATH/data/models/ folder. The results (loss, validation accuracy) will be saved in the $SPRINT_PATH/data/finetuning/ folder.

Expected runtime: ~6 hours on NVIDIA A10G (g5.xlarge AWS instance) for RoBERTa-base on SST-2 dataset. For larger datasets (e.g., MNLI) the runtime can be more than 1 day. On CPU (e.g., c6.xlarge AWS instance) the runtime is ~2x longer.

Run Inference

The inference can run in cleartext or in MPC. The cleartext mode is used for local evaluation and debugging, while the MPC mode is used for secure inference. MPC inference can run locally across two processes (to evaluate MPC accuracy) or on AWS with multiple machines (to evaluate the runtime and communication overhead).

During inference, it is possible to apply logits capping (if it was not applied during fine-tuning) or to apply a different capping threshold. The capping threshold is set during fine-tuning in the $SPRINT_PATH/src/configs/config_cuda0.yaml file (default value: 50.0).

The output of the inference process can be saved to a log file by appending >> $log_file (e.g., log.txt) to the command, as in the sketch below.
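
For example, a local run whose output (including errors) is appended to a log file (a minimal sketch; inference_log.txt is an arbitrary file name):

   cd src
   python run_inference.py --config inference_example.yaml --crypten_config crypten_inference_config.yaml >> inference_log.txt 2>&1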

Run local inference (cleartext or in MPC)

cd src
python run_inference.py --config inference_example.yaml --crypten_config crypten_inference_config.yaml

The config file $SPRINT_PATH/src/configs/inference_example.yaml contains the parameters for the inference (dataset, model, batch size, etc.). The config file $SPRINT_PATH/src/configs/crypten_inference_config.yaml contains the parameters for CrypTen.

In the inference config file, the model_name can be either a model saved after fine-tuning (in $SPRINT_PATH/data/models/) or a pre-trained model from the HuggingFace model hub (e.g., roberta-base). In the latter case, the model is loaded from the model hub and used for inference without any fine-tuning, to test the runtime and communication overhead of MPC inference (see the sketch below).
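
For instance, to point the example config at a hub model (a hedged sketch: it assumes model_name appears as a top-level key in inference_example.yaml; if the layout differs, edit the file manually):

   # Hypothetical one-liner: set model_name to a HuggingFace hub model
   sed -i 's/^model_name:.*/model_name: roberta-base/' $SPRINT_PATH/src/configs/inference_example.yaml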

Expected runtime (MPC inference with CrypTen): ~15 minutes on NVIDIA A10G (g5.xlarge AWS instance) for RoBERTa-base on SST-2 dataset in MPC. Batched inference is ~2x faster than non-batched inference. On CPU (e.g., c6.xlarge AWS instance) the runtime is ~1.5x longer.

Expected runtime (cleartext inference with PyTorch): ~1 minute on NVIDIA A10G (g5.xlarge AWS instance) for RoBERTa-base on SST-2 dataset. On CPU (e.g., c6.xlarge AWS instance) the runtime is ~2x longer.

AWS evaluation for runtime and communication overhead

The inference on different AWS machines can be run with the script in the $SPRINT_PATH/src/aws/ folder:

cd $SPRINT_PATH/src/aws/
./aws_mpc_inference.sh

This bash script runs on toy data. The script used for inference is the same as for local inference, with different config files; the example uses $SPRINT_PATH/src/configs/aws_inference_config.yaml. The data will be saved on the AWS machine in the aws-launcher-tmp folder. The AWS machines need to be configured with the same environment as the local machine.

Expected runtime (MPC inference 2 parties): For RoBERTa-base, the expected runtime varies from 32 seconds in a LAN to 10 minutes in a WAN (e.g. Europe-US) on CPU (e.g. c6.xlarge AWS instance). GPU inference runtime varies from 20 seconds in a LAN to 10 minutes in a WAN (e.g. Europe-US) on NVIDIA A10G (on g5.xlarge AWS instance). Batched inference on GPU (with batch size equal to 32) is up to 1.7x faster than non-batched inference in a LAN and up to 1.9x faster in a WAN.

Third-party components

The code for DP-AdamBC in $SPRINT_PATH/src/modeling/optimizers has been adapted from https://github.com/ubc-systopia/DP-AdamBC.

The BERT and RoBERTa model implementations in the $SPRINT_PATH/src/modeling/models folder have been adapted for SPRINT fine-tuning and MPC inference from the Transformers library (https://github.com/huggingface/transformers).

The code for LoRA has been adapted from the LoRA repository (https://github.com/microsoft/LoRA).

The launchers for MPC inference on AWS in the $SPRINT_PATH/src/aws folder have been adapted from the CrypTen repository (https://github.com/facebookresearch/CrypTen).

Security/Privacy Issues and Ethical Concerns

This artifact does not intentionally disable security mechanisms or run vulnerable code. However, it relies on research-grade libraries for secure and private machine learning, which have the following caveats:

  • private-transformers: The codebase is not production-grade. For example, cryptographically secure PRNGs are recommended for sampling noise in differential privacy, but the current implementation uses standard PRNGs for performance reasons.
  • CrypTen: This library is intended for research and is not production-ready. It may lack some security hardening and should not be used for sensitive deployments.
  • General: The artifact does not collect, store, or process real user data. All experiments use public datasets or synthetic data. No user study or personal data is included.

Ethical Review: No user study or human subject data is included, so no IRB process was required.

Recommendation: Users should not deploy this code in production or on sensitive data without further security review and hardening.

Notes on Reusability

The scope of this repository is to create a general framework that integrates differential privacy (DP) fine-tuning and multi-party computation (MPC) inference for transformer-based models. The overall goal is not only to reproduce our research results but to foster future research and development in privacy-preserving machine learning by providing a modular, extensible foundation.

Here we list some examples of how this artifact can be adapted and extended:

  • Different Models: The modular architecture supports various transformer architectures beyond RoBERTa and BERT, including newer models like DeBERTa, GPT-like models, or custom transformer variants. This requires adding modeling files for both cleartext and MPC variants in $SPRINT_PATH/src/modeling/models.
  • Novel Datasets: The data loading framework can be extended to handle additional NLP tasks beyond GLUE benchmark tasks, including custom datasets for domain-specific applications.
  • Non-linear Function Approximations: Researchers can experiment with different MPC-friendly approximations for activation functions (e.g., polynomial approximations for GELU, ReLU variants) by adding new activation modules to $SPRINT_PATH/src/modeling/activations.
  • DP Techniques: The framework supports experimentation with different noise mechanisms, clipping strategies, and privacy accounting methods beyond the current DP-SGD implementation; thanks to the integration with Opacus, this only requires changing the accounting or noise-type configuration.

The modular design enables researchers to replace individual components (optimizers, activation functions, privacy mechanisms) without modifying the entire pipeline, facilitating systematic evaluation of privacy-utility trade-offs in secure transformer inference.

Support, Feedback, Contributing

This project is open to feature requests/suggestions, bug reports etc. via GitHub issues. Contribution and feedback are encouraged and always welcome. For more information about how to contribute, the project structure, as well as additional contribution information, see our Contribution Guidelines.

Security / Disclosure

If you find any bug that may be a security problem, please follow the instructions in our security policy on how to report it. Please do not create GitHub issues for security-related doubts or problems.

Code of Conduct

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone. By participating in this project, you agree to abide by its Code of Conduct at all times.

Licensing

Copyright 2025 SAP SE or an SAP affiliate company and sprint contributors. Please see our LICENSE for copyright and license information. Detailed information including third-party components and their licensing/copyright information is available via the REUSE tool.
