Closed
28 commits
188bb42
add state_dict to privacy accountant
karthikprasad May 17, 2022
fe891c9
minor fixes and docstrings in accountant
karthikprasad May 17, 2022
6083f99
some more minor fixes in accountant
karthikprasad May 17, 2022
26348bc
add state dict support for GradSampleModule and save/load checkpoint …
karthikprasad May 17, 2022
01c5c4c
import typevar
karthikprasad May 17, 2022
13c272e
accountant unit test
karthikprasad May 18, 2022
f253a5e
lint fix in test
karthikprasad May 18, 2022
6d11754
fix typo
karthikprasad May 18, 2022
29e2c28
fix var name in test
karthikprasad May 18, 2022
66e8094
fix num steps in test
karthikprasad May 18, 2022
5887f78
fix lint again
karthikprasad May 18, 2022
64a1632
add-ons to GradSampleModule state_dict
karthikprasad May 22, 2022
345d1d7
fixes to GS and test
karthikprasad May 22, 2022
ec2fd92
test privacy engine checkpointing
karthikprasad May 23, 2022
eb3224b
remove debug comments
karthikprasad May 23, 2022
b6d5a86
fix lint
karthikprasad May 23, 2022
29ff3a8
fix lint again
karthikprasad May 23, 2022
f69819b
Minor fixex in FAQ (#430)
Kevin-Abd May 20, 2022
86a8e0d
disable poisson sampling in checkpoints test
karthikprasad May 23, 2022
d3591b0
rebase
karthikprasad May 23, 2022
8ad0545
fix sort order
karthikprasad May 23, 2022
f14e810
fix black
karthikprasad May 23, 2022
2416bb1
some more lints
karthikprasad May 23, 2022
051bc0c
address comments
karthikprasad May 25, 2022
e655df4
address comments
karthikprasad May 31, 2022
ba29092
Merge branch 'main' into master
karthikprasad May 31, 2022
33ee300
fix flake lint
karthikprasad May 31, 2022
1f9297a
Merge branch 'master' of https://github.com/karthikprasad/opacus
karthikprasad May 31, 2022
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -14,7 +14,7 @@
### Bug fixes
* Fix accountant when using number of steps instead of epochs
* Add params check when converting BatchNorm to GroupNorm (#390)
-* Fix typo in gdp accountant mechansim name (#386)
+* Fix typo in gdp accountant mechanism name (#386)
* Fix linter errors (#392)
* Add friendly and detailed message for unsupported layers (#401)
* Run linter on nightly workflow (#399)
4 changes: 2 additions & 2 deletions docs/faq.md
@@ -88,7 +88,7 @@ This statement extends to all downstream uses of this model: its inferences, fin

From the expression above it is obvious that epsilon and delta play different roles: epsilon controls the multiplicative increase in the baseline probability while delta lifts all probabilities by the same amount. For instance, if your baseline scenario (the model trained on *D*′, without your data) assigns 0 probability to some event, the bound on observing this event on *D* (that includes your data) is delta. Because of that, we’d like to target epsilon to be a small constant and select delta to be tiny. A rule of thumb is to set delta to be less than the inverse of the size of the training dataset.

-Epsilon and delta are computed *ex post*, following an optimizer run. In fact, for each delta there’s some epsilon, depending on that delta, such that the run satisfies (epsilon, delta)-DP. The call `privacy_engine.accountant.get_privacy_spent(delta=delta)` outputs that epsilon in its first return value.
+Epsilon and delta are computed *ex post*, following an optimizer run. In fact, for each delta there’s some epsilon, depending on that delta, such that the run satisfies (epsilon, delta)-DP. The call `privacy_engine.get_epsilon(delta=delta)` returns that epsilon.

Importantly, (epsilon, delta)-DP is a *conservative upper bound* on the actual privacy loss. There’s [growing](https://arxiv.org/abs/2006.07709) [evidence](https://arxiv.org/pdf/2006.11601.pdf) that the observable privacy loss of the DP-SGD algorithm can be significantly smaller.

@@ -110,7 +110,7 @@ Although we report expended privacy budget using the (epsilon, delta) language,

When the privacy engine needs to bound the privacy loss of a training run using (epsilon, delta)-DP for a given delta, it searches for the optimal order from among `alphas`. There’s very little additional cost in expanding the list of orders. We suggest using a list `[1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))`. You can pass your own alphas by passing `alphas=custom_alphas` when calling `privacy_engine.make_private_with_epsilon`.

-A call to `privacy_engine.accountant.get_privacy_spent(delta=delta)` returns a pair: an epsilon such that the training run satisfies (epsilon, delta)-DP and an optimal order alpha. An easy diagnostic to determine whether the list of `alphas` ought to be expanded is whether the returned value alpha is one of the two boundary values of `alphas`.
+A call to `privacy_engine.get_epsilon(delta=delta)` returns an epsilon such that the training run satisfies (epsilon, delta)-DP; the optimal order alpha is returned alongside epsilon by `privacy_engine.accountant.get_privacy_spent(delta=delta)`. An easy diagnostic to determine whether the list of `alphas` ought to be expanded is whether the returned value alpha is one of the two boundary values of `alphas`.

<!-- ## How do I run Opacus in Colab?

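The boundary diagnostic described in the FAQ text above can be sketched in a few lines. This is only an illustration, not Opacus code: `alpha_hits_boundary` is a hypothetical helper, and the orders passed to it are made-up inputs standing in for the optimal order reported by the accountant.

```python
from typing import Sequence

# The list of RDP orders suggested in the FAQ text.
alphas = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))


def alpha_hits_boundary(best_alpha: float, alphas: Sequence[float]) -> bool:
    """Hypothetical helper: True if the optimal order sits at either end of
    the candidate list, which suggests the list should be widened."""
    return best_alpha in (min(alphas), max(alphas))


# Made-up example inputs, not results from a real training run.
print(alpha_hits_boundary(63, alphas))   # largest candidate order
print(alpha_hits_boundary(5.0, alphas))  # an interior order
```

If the helper reports a boundary hit, extending `alphas` in that direction and recomputing epsilon is the natural next step.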
58 changes: 55 additions & 3 deletions opacus/accountants/accountant.py
@@ -13,15 +13,20 @@
 # limitations under the License.

 import abc
-from typing import Callable
+from collections import OrderedDict
+from copy import deepcopy
+from typing import Any, Callable, Mapping, TypeVar

 from opacus.optimizers import DPOptimizer


+T_state_dict = TypeVar("T_state_dict", bound=Mapping[str, Any])
+
+
 class IAccountant(abc.ABC):
     @abc.abstractmethod
     def __init__(self):
-        pass
+        self.history = []  # history of noise multiplier, sample rate, and steps

     @abc.abstractmethod
     def step(self, *, noise_multiplier: float, sample_rate: float):
@@ -67,7 +72,7 @@ def get_optimizer_hook_fn(
         """
         Returns a callback function which can be used to attach to DPOptimizer
         Args:
-            sample_rate: Expested samping rate used for accounting
+            sample_rate: Expected sampling rate used for accounting
         """

         def hook_fn(optim: DPOptimizer):
@@ -80,3 +85,50 @@ def hook_fn(optim: DPOptimizer):
             )

         return hook_fn
+
+    def state_dict(self, destination: T_state_dict = None) -> T_state_dict:
+        """
+        Returns a dictionary containing the state of the accountant.
+        Args:
+            destination: a mappable object to populate the current state_dict into.
+                If this arg is None, an OrderedDict is created and populated.
+                Default: None
+        """
+        if destination is None:
+            destination = OrderedDict()
+        destination["history"] = deepcopy(self.history)
+        destination["mechanism"] = self.__class__.mechanism
+        return destination
+
+    def load_state_dict(self, state_dict: T_state_dict):
+        """
+        Validates the supplied state_dict and populates the current
+        Privacy Accountant's state dict.
+
+        Args:
+            state_dict: state_dict to load.
+
+        Raises:
+            ValueError if supplied state_dict is invalid and cannot be loaded.
+        """
+        if state_dict is None or len(state_dict) == 0:
+            raise ValueError(
+                "state dict is either None or empty and hence cannot be loaded"
+                " into Privacy Accountant."
+            )
+        if "history" not in state_dict.keys():
+            raise ValueError(
+                "state_dict does not have the key `history`."
+                " Cannot be loaded into Privacy Accountant."
+            )
+        if "mechanism" not in state_dict.keys():
+            raise ValueError(
+                "state_dict does not have the key `mechanism`."
+                " Cannot be loaded into Privacy Accountant."
+            )
+        if self.__class__.mechanism != state_dict["mechanism"]:
+            raise ValueError(
+                f"state_dict of {state_dict['mechanism']} cannot be loaded into "
+                f"Privacy Accountant with mechanism {self.__class__.mechanism}"
+            )
+        self.history = state_dict["history"]
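The save/validate/restore cycle added to `IAccountant` can be tried out in isolation. The sketch below uses a hypothetical `ToyAccountant` (not an Opacus class) that mirrors the `state_dict` and `load_state_dict` logic from the diff above, with shortened error messages. The history tuples are made-up values.

```python
from collections import OrderedDict
from copy import deepcopy


class ToyAccountant:
    """Hypothetical stand-in for an accountant; not an Opacus class."""

    mechanism = "toy"

    def __init__(self):
        # (noise_multiplier, sample_rate, num_steps) tuples
        self.history = []

    def state_dict(self, destination=None):
        if destination is None:
            destination = OrderedDict()
        # Deep copy so the snapshot is decoupled from the live history.
        destination["history"] = deepcopy(self.history)
        destination["mechanism"] = self.__class__.mechanism
        return destination

    def load_state_dict(self, state_dict):
        if not state_dict:
            raise ValueError("state_dict is None or empty")
        for key in ("history", "mechanism"):
            if key not in state_dict:
                raise ValueError(f"state_dict does not have the key `{key}`")
        if state_dict["mechanism"] != self.__class__.mechanism:
            raise ValueError("mechanism mismatch")
        self.history = state_dict["history"]


class OtherToyAccountant(ToyAccountant):
    mechanism = "other"


saved = ToyAccountant()
saved.history.append((1.5, 0.04, 1250))
snapshot = saved.state_dict()

# Later steps on the original do not leak into the snapshot.
saved.history.append((1.5, 0.04, 1000))

restored = ToyAccountant()
restored.load_state_dict(snapshot)
print(restored.history)  # [(1.5, 0.04, 1250)]

try:
    OtherToyAccountant().load_state_dict(snapshot)
except ValueError as err:
    print("rejected:", err)  # rejected: mechanism mismatch
```

Note that, as in the diff, `load_state_dict` assigns the supplied history without copying it, so the restored accountant shares the list with the snapshot it was loaded from.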
2 changes: 1 addition & 1 deletion opacus/accountants/gdp.py
@@ -24,7 +24,7 @@ def __init__(self):
             "GDP accounting is experimental and can underestimate privacy expenditure."
             "Proceed with caution. More details: https://arxiv.org/pdf/2106.02848.pdf"
         )
-        self.history = []  # history of noise multiplier, sample rate, and steps
+        super().__init__()

     def step(self, *, noise_multiplier: float, sample_rate: float):
         if len(self.history) >= 1:
2 changes: 1 addition & 1 deletion opacus/accountants/rdp.py
@@ -22,7 +22,7 @@ class RDPAccountant(IAccountant):
     DEFAULT_ALPHAS = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))

     def __init__(self):
-        self.history = []
+        super().__init__()

     def step(self, *, noise_multiplier: float, sample_rate: float):
         if len(self.history) >= 1:
2 changes: 1 addition & 1 deletion opacus/grad_sample/grad_sample_module.py
@@ -93,7 +93,7 @@ def __init__(
                 ``[K, batch_size, ...]``
             loss_reduction: Indicates if the loss reduction (for aggregating the gradients)
                 is a sum or a mean operation. Can take values "sum" or "mean"
-            strict: If set to ``True``, the input module will be validater to check that
+            strict: If set to ``True``, the input module will be validated to check that
                 ``GradSampleModule`` has grad sampler functions for all submodules of
                 the input module (i.e. if it knows how to calculate per sample gradients)
                 for all model parameters. If set to ``False``, per sample gradients will
4 changes: 2 additions & 2 deletions opacus/optimizers/optimizer.py
@@ -280,9 +280,9 @@ def __init__(
         self.generator = generator
         self.secure_mode = secure_mode

-        self.param_groups = optimizer.param_groups
+        self.param_groups = self.original_optimizer.param_groups
         self.defaults = self.original_optimizer.defaults
-        self.state = optimizer.state
+        self.state = self.original_optimizer.state
         self._step_skip_queue = []
         self._is_last_step_skipped = False

69 changes: 68 additions & 1 deletion opacus/privacy_engine.py
@@ -12,8 +12,9 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import os
 import warnings
-from typing import List, Optional, Tuple, Union
+from typing import IO, Any, BinaryIO, Dict, List, Optional, Tuple, Union

 import torch
 from opacus.accountants import create_accountant
@@ -22,6 +23,7 @@
 from opacus.distributed import DifferentiallyPrivateDistributedDataParallel as DPDDP
 from opacus.grad_sample.grad_sample_module import GradSampleModule
 from opacus.optimizers import DPOptimizer, get_optimizer_class
+from opacus.scheduler import _NoiseScheduler
 from opacus.validators.module_validator import ModuleValidator
 from torch import nn, optim
 from torch.nn.parallel import DistributedDataParallel as DDP
@@ -493,3 +495,68 @@ def get_epsilon(self, delta):
             Privacy budget (epsilon) expended so far.
         """
         return self.accountant.get_epsilon(delta)
+
+    def save_checkpoint(
+        self,
+        *,
+        path: Union[str, os.PathLike, BinaryIO, IO[bytes]],
+        module: GradSampleModule,
+        optimizer: Optional[DPOptimizer] = None,
+        noise_scheduler: Optional[_NoiseScheduler] = None,
+        checkpoint_dict: Optional[Dict[str, Any]] = None,
+        module_state_dict_kwargs: Optional[Dict[str, Any]] = None,
+        torch_save_kwargs: Optional[Dict[str, Any]] = None,
+    ):
+        """
+        Saves the state_dict of module, optimizer, and accountant at path.
+        Args:
+            path: Path to save the state dict objects.
+            module: GradSampleModule to save; wrapped module's state_dict is saved.
+            optimizer: DPOptimizer to save; wrapped optimizer's state_dict is saved.
+            module_state_dict_kwargs: dict of kwargs to pass to ``module.state_dict()``
+            torch_save_kwargs: dict of kwargs to pass to ``torch.save()``
+
+        """
+        checkpoint_dict = checkpoint_dict or {}
+        checkpoint_dict["module_state_dict"] = module.state_dict(
+            **(module_state_dict_kwargs or {})
+        )
+        checkpoint_dict["privacy_accountant_state_dict"] = self.accountant.state_dict()
+        if optimizer is not None:
+            checkpoint_dict["optimizer_state_dict"] = optimizer.state_dict()
+        if noise_scheduler is not None:
+            checkpoint_dict["noise_scheduler_state_dict"] = noise_scheduler.state_dict()
+
+        torch.save(checkpoint_dict, path, **(torch_save_kwargs or {}))
+
+    def load_checkpoint(
+        self,
+        *,
+        path: Union[str, os.PathLike, BinaryIO, IO[bytes]],
+        module: GradSampleModule,
+        optimizer: Optional[DPOptimizer] = None,
+        noise_scheduler: Optional[_NoiseScheduler] = None,
+        module_load_dict_kwargs: Optional[Dict[str, Any]] = None,
+        torch_load_kwargs: Optional[Dict[str, Any]] = None,
+    ) -> Dict:
+        checkpoint = torch.load(path, **(torch_load_kwargs or {}))
+        module.load_state_dict(
+            checkpoint["module_state_dict"], **(module_load_dict_kwargs or {})
+        )
+        self.accountant.load_state_dict(checkpoint["privacy_accountant_state_dict"])
+
+        optimizer_state_dict = checkpoint.pop("optimizer_state_dict", {})
+        if optimizer is not None and len(optimizer_state_dict) > 0:
+            optimizer.load_state_dict(optimizer_state_dict)
+        elif (optimizer is not None) ^ (len(optimizer_state_dict) > 0):
+            # warn if only one of them is available
+            warnings.warn(
+                f"optimizer_state_dict has {len(optimizer_state_dict)} items"
+                f" but optimizer is {'' if optimizer else 'not '}provided."
+            )
+
+        noise_scheduler_state_dict = checkpoint.pop("noise_scheduler_state_dict", {})
+        if noise_scheduler is not None and len(noise_scheduler_state_dict) > 0:
+            noise_scheduler.load_state_dict(noise_scheduler_state_dict)
+
+        return checkpoint
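The optimizer branch of `load_checkpoint` has three outcomes: load, warn, or silently skip. The sketch below reproduces that decision with plain dictionaries so the cases can be checked without `torch` or Opacus. `restore_optional` and `Recorder` are hypothetical names introduced for illustration. Note that in the diff only the optimizer branch warns on a mismatch; the noise-scheduler branch skips silently.

```python
import warnings
from typing import Any, Dict, Optional


class Recorder:
    """Hypothetical object exposing load_state_dict, standing in for an optimizer."""

    def __init__(self):
        self.loaded = None

    def load_state_dict(self, state):
        self.loaded = state


def restore_optional(name: str, obj: Optional[Any], checkpoint: Dict[str, Any]) -> str:
    """Hypothetical helper mirroring the optimizer branch of load_checkpoint."""
    state = checkpoint.pop(f"{name}_state_dict", {})
    has_obj = obj is not None
    has_state = len(state) > 0
    if has_obj and has_state:
        obj.load_state_dict(state)
        return "loaded"
    if has_obj ^ has_state:
        # Exactly one of the two is available.
        warnings.warn(f"{name}: saved state and object do not match up")
        return "warned"
    return "skipped"


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    results = [
        restore_optional("optimizer", Recorder(), {"optimizer_state_dict": {"lr": 0.1}}),
        restore_optional("optimizer", Recorder(), {}),
        restore_optional("optimizer", None, {"optimizer_state_dict": {"lr": 0.1}}),
        restore_optional("optimizer", None, {}),
    ]
print(results)  # ['loaded', 'warned', 'warned', 'skipped']
```

Because the optional entries are removed with `pop`, whatever remains in the returned checkpoint dictionary is the module and accountant state plus any extra keys the caller supplied through `checkpoint_dict` at save time.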
56 changes: 56 additions & 0 deletions opacus/tests/accountants_test.py
@@ -119,3 +119,59 @@ def test_get_noise_multiplier_gdp(self):
         )

         self.assertAlmostEqual(noise_multiplier, 1.3232421875)
+
+    def test_accountant_state_dict(self):
+        noise_multiplier = 1.5
+        sample_rate = 0.04
+        steps = int(90 / 0.04)
+
+        accountant = RDPAccountant()
+        for _ in range(steps):
+            accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)
+
+        dummy_dest = {"dummy_k": "dummy_v"}
+        # history should be equal but not the same instance
+        self.assertEqual(accountant.state_dict()["history"], accountant.history)
+        self.assertFalse(accountant.state_dict()["history"] is accountant.history)
+        # mechanism populated to supplied dict
+        self.assertEqual(
+            accountant.state_dict(dummy_dest)["mechanism"], accountant.mechanism
+        )
+        # existing values in supplied dict unchanged
+        self.assertEqual(
+            accountant.state_dict(dummy_dest)["dummy_k"], dummy_dest["dummy_k"]
+        )
+
+    def test_accountant_load_state_dict(self):
+        noise_multiplier = 1.5
+        sample_rate = 0.04
+        steps = int(90 / 0.04)
+
+        accountant = RDPAccountant()
+        for _ in range(steps - 1000):
+            accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)
+
+        new_rdp_accountant = RDPAccountant()
+        new_gdp_accountant = GaussianAccountant()
+        # check corner cases
+        with self.assertRaises(ValueError):
+            new_rdp_accountant.load_state_dict({})
+        with self.assertRaises(ValueError):
+            new_rdp_accountant.load_state_dict({"1": 2})
+        with self.assertRaises(ValueError):
+            new_rdp_accountant.load_state_dict({"history": []})
+        with self.assertRaises(ValueError):
+            new_gdp_accountant.load_state_dict(accountant.state_dict())
+        # check loading logic
+        self.assertNotEqual(new_rdp_accountant.state_dict(), accountant.state_dict())
+        new_rdp_accountant.load_state_dict(accountant.state_dict())
+        self.assertEqual(new_rdp_accountant.state_dict(), accountant.state_dict())
+
+        # ensure correct output after completion
+        for _ in range(steps - 1000, steps):
+            new_rdp_accountant.step(
+                noise_multiplier=noise_multiplier, sample_rate=sample_rate
+            )
+
+        epsilon = new_rdp_accountant.get_epsilon(delta=1e-5)
+        self.assertAlmostEqual(epsilon, 7.32911117143)
21 changes: 21 additions & 0 deletions opacus/tests/grad_sample_module_test.py
@@ -228,3 +228,24 @@ def test_submodule_access(self):

         with self.assertRaises(AttributeError):
             _ = self.grad_sample_module.fc3
+
+    def test_state_dict(self):
+        gs_state_dict = self.grad_sample_module.state_dict()
+        og_state_dict = self.original_model.state_dict()
+        # check wrapped module state dict
+        for key in og_state_dict.keys():
+            self.assertTrue(f"_module.{key}" in gs_state_dict)
+            assert_allclose(og_state_dict[key], gs_state_dict[f"_module.{key}"])
+
+    def test_load_state_dict(self):
+        gs_state_dict = self.grad_sample_module.state_dict()
+        new_gs = GradSampleModule(
+            SampleConvNet(), batch_first=False, loss_reduction="mean"
+        )
+        new_gs.load_state_dict(gs_state_dict)
+        # wrapped module is the same
+        for key in self.original_model.state_dict().keys():
+            self.assertTrue(key in new_gs._module.state_dict())
+            assert_allclose(
+                self.original_model.state_dict()[key], new_gs._module.state_dict()[key]
+            )
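The tests above rely on every key of the wrapped module appearing in the `GradSampleModule` state dict under a `_module.` prefix. A consequence is that a state dict saved from the wrapper needs its keys renamed before it can be loaded into the bare model. The sketch below shows that renaming on plain dictionaries. `strip_wrapper_prefix` is a hypothetical helper, and the lists stand in for tensors.

```python
from typing import Any, Dict

PREFIX = "_module."


def strip_wrapper_prefix(gs_state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map wrapper keys back to the wrapped module's keys,
    dropping any key that does not carry the prefix."""
    return {
        key[len(PREFIX):]: value
        for key, value in gs_state_dict.items()
        if key.startswith(PREFIX)
    }


# Made-up state dict; real values would be tensors.
gs_state = {"_module.fc1.weight": [0.1, 0.2], "_module.fc1.bias": [0.0]}
print(strip_wrapper_prefix(gs_state))
# {'fc1.weight': [0.1, 0.2], 'fc1.bias': [0.0]}
```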