Feature
Implement Phantom Clipping, an efficient approximation of per-sample gradient clipping, in Opacus to improve fine-tuning performance for large Transformer models under differential privacy.
Motivation
Currently, Opacus materializes exact per-sample gradients in order to compute their norms, which introduces substantial computational overhead and memory usage when fine-tuning large-scale models (e.g., BERT, GPT, ViT).
This limits the practicality of differentially private fine-tuning, especially on consumer hardware.
Phantom Clipping (Ding et al., 2024) provides an efficient alternative that estimates per-sample gradient norms from shared intermediate representations, achieving nearly identical privacy guarantees with improved efficiency and accuracy.
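For intuition, Phantom Clipping belongs to the family of methods that avoid materializing per-sample gradients by exploiting the factored structure of layer gradients. The sketch below is not the Phantom Clipping algorithm itself (which targets embedding and sequence layers); it only illustrates the simpler ghost-norm identity such methods build on, for a single linear layer with one activation vector per sample. All names are illustrative, not Opacus APIs.

```python
import torch

# For a linear layer, the per-sample weight gradient of sample i is an outer
# product g_i = b_i a_i^T (output gradient x input activation), so
# ||g_i||_F^2 = ||a_i||^2 * ||b_i||^2 and the norm never requires forming g_i.
torch.manual_seed(0)
B, d_in, d_out = 4, 16, 8
a = torch.randn(B, d_in)   # layer inputs (activations), one row per sample
b = torch.randn(B, d_out)  # gradients w.r.t. layer outputs, one row per sample

# Naive route: materialize every per-sample gradient (a B x d_out x d_in tensor).
per_sample_grads = torch.einsum("bo,bi->boi", b, a)
naive_norms = per_sample_grads.flatten(1).norm(dim=1)

# Factored route: compute the same norms from the two factors alone.
factored_norms = a.norm(dim=1) * b.norm(dim=1)

assert torch.allclose(naive_norms, factored_norms, atol=1e-5)
print(factored_norms)
```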
Pitch
Add a PhantomClippingOptimizer or flag in Opacus (e.g., use_phantom_clipping=True) that:
Approximates per-sample gradient norms using precomputed feature statistics.
Reduces memory usage and compute overhead during fine-tuning.
Maintains DP-SGD compatibility and privacy accounting.
This would allow Opacus users to fine-tune large models with practical batch sizes and stable convergence under strong privacy budgets (ε ≈ 3–6).
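A hypothetical usage sketch of the proposed surface is below. The `use_phantom_clipping` flag does not exist in Opacus today; whether it is exposed as a `make_private` argument, a separate optimizer class, or wired through the existing grad-sample machinery would be up to the maintainers. Everything else follows the standard `PrivacyEngine` workflow.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(64 * 8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(
    TensorDataset(torch.randint(0, 1000, (256, 8)), torch.randint(0, 2, (256,))),
    batch_size=32,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    use_phantom_clipping=True,  # proposed flag (hypothetical): approximate per-sample norm computation
)

# Training loop is unchanged; clipping and noising still happen inside optimizer.step().
criterion = nn.CrossEntropyLoss()
for x, y in data_loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```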
Alternatives
Gradient accumulation or micro-batching: drastically increases training time.
Low-rank gradient approximations (e.g., LSG): helpful, but less general than Phantom Clipping.
Additional context
Phantom Clipping has shown up to 40% faster training and ~2–5% accuracy improvement on DP Transformers compared to standard DP-SGD.
It has been implemented in research prototypes but not yet supported in open-source DP frameworks.