
Phantom Clipping #802

@prithagupta

Description


🚀 Feature

Implement Phantom Clipping, an efficient approximation of per-sample gradient clipping, in Opacus to improve fine-tuning performance for large Transformer models under differential privacy.

Motivation

Currently, Opacus computes exact per-sample gradient norms, which introduces substantial computational overhead and memory usage when fine-tuning large-scale models (e.g., BERT, GPT, ViT).
This limits the practicality of differentially private fine-tuning, especially on consumer hardware.
Phantom Clipping (Ding et al., 2024) provides an efficient alternative that estimates gradient norms using shared intermediate representations, achieving nearly identical privacy guarantees with improved efficiency and accuracy.
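For intuition, ghost/phantom-style clipping relies on the fact that a layer's per-sample gradient norm can be computed from its saved inputs and output gradients alone, without ever materializing per-sample gradients. Below is a minimal, self-contained sketch of that identity for a single `nn.Linear` weight; it is illustrative only (not the paper's full algorithm, and not an existing Opacus API):

```python
import torch

def linear_grad_norm_sq(activations: torch.Tensor, backprops: torch.Tensor) -> torch.Tensor:
    """
    Per-sample squared gradient norm of a Linear layer's weight,
    computed without materializing the per-sample gradients.

    activations: (B, T, d_in)  -- layer inputs saved in the forward pass
    backprops:   (B, T, d_out) -- gradients w.r.t. the layer outputs
    returns:     (B,) squared Frobenius norms of the per-sample weight grads
    """
    # Gram matrices over the sequence dimension: (B, T, T) each.
    a_gram = torch.bmm(activations, activations.transpose(1, 2))
    b_gram = torch.bmm(backprops, backprops.transpose(1, 2))
    # ||B_i^T A_i||_F^2 = sum_{s,t} (A_i A_i^T)_{s,t} * (B_i B_i^T)_{s,t}
    return (a_gram * b_gram).sum(dim=(1, 2))

# Sanity check against the naive per-sample computation.
B, T, d_in, d_out = 4, 8, 16, 32
a = torch.randn(B, T, d_in)
b = torch.randn(B, T, d_out)
naive = torch.stack([(bb.t() @ aa).pow(2).sum() for aa, bb in zip(a, b)])
assert torch.allclose(linear_grad_norm_sq(a, b), naive, atol=1e-4)
```

Phantom Clipping applies this kind of norm computation to the shared intermediate representations mentioned above, so the clipping norms are obtained without the memory cost of storing per-sample gradients.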

Pitch

Add a PhantomClippingOptimizer or flag in Opacus (e.g., use_phantom_clipping=True) that:

Approximates per-sample gradient norms using precomputed feature statistics.

Reduces memory usage and compute overhead during fine-tuning.

Maintains DP-SGD compatibility and privacy accounting.

This would allow Opacus users to fine-tune large models with practical batch sizes and stable convergence under strong privacy budgets (ε ≈ 3–6). A rough sketch of the proposed usage is included below.
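Hypothetical usage sketch: the `use_phantom_clipping` argument is the flag proposed in this issue and does not exist in Opacus today; everything else follows the standard `PrivacyEngine.make_private` workflow.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Tiny stand-in model and data; in practice this would be a large Transformer.
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
train_loader = DataLoader(data, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    use_phantom_clipping=True,  # proposed flag; NOT part of the current Opacus API
)
```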

Alternatives

Using gradient accumulation or micro-batching: increases training time drastically.

Low-rank gradient approximations (e.g., LSG): helpful but less general than Phantom Clipping.

Additional context

Phantom Clipping has been reported to deliver up to 40% faster training and a ~2–5% accuracy improvement on DP Transformers compared to standard DP-SGD.
It has been implemented in research prototypes but not yet supported in open-source DP frameworks.
