This repository contains code that can be used to reproduce the experimental results presented in the paper:
- L.J.P. van der Maaten* and A.Y. Hannun*. The Trade-Offs of Private Prediction. arXiv 2007.05089, 2020.
The code requires Python 3.5+, PyTorch 1.5.0+, torchvision 0.6.0+, and visdom (optional). It also uses parts of TensorFlow Privacy and pytorch_resnet_cifar10.
Assuming you have Anaconda installed, you can install all the dependencies via:
conda install -c pytorch pytorch torchvision
pip install visdom
python install.py
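Optionally, you can confirm that PyTorch and torchvision are installed and meet the version requirements (this check is just a suggestion, not part of the repository):

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"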
All experiments can be run via the private_prediction_experiment.py script.
For example, to train and test a linear model on the MNIST dataset using loss
perturbation with privacy loss 1.0, you can use the following command:
python private_prediction_experiment.py \
--dataset mnist \
--method loss_perturbation \
--epsilon 1.0
The following input arguments can be used to change the model, private prediction method, and privacy loss:
- `--model`: the model used can be `linear` (default) or `resnet{20,32,44,56,110,1202}`
- `--method`: private prediction method can be `subsagg` (default), `loss_perturbation`, `{model,logit}_sensitivity`, or `dpsgd`
- `--epsilon`: privacy loss value for predictions (default = infinity)
- `--delta`: privacy failure probability for predictions (default = 0.0)
- `--inference_budget`: number of inferences to support (default = -1 to try many values)
- `--weight_decay`: L2-regularization parameter (default = 0.0; set to -1 to cross-validate)
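For instance, to train a ResNet-20 with DP-SGD at privacy loss 1.0 and failure probability 1e-5, a command along these lines should work (an illustrative combination of the flags above, not a configuration from the paper):

python private_prediction_experiment.py \
--dataset cifar10 \
--model resnet20 \
--method dpsgd \
--epsilon 1.0 \
--delta 1e-5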
The following input arguments can be used to change details of the optimizer:
- `--optimizer`: optimizer used can be `lbfgs` (default) or `sgd`
- `--num_epochs`: number of training epochs (default = 100)
- `--batch_size`: batch size for SGD optimization (default = 32)
- `--learning_rate`: initial learning rate for SGD optimization (default = 1.0)
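For example, to switch from L-BFGS to SGD with a smaller learning rate and a larger batch size, you could add flags like the following (illustrative values, not tuned recommendations):

python private_prediction_experiment.py \
--dataset mnist \
--method loss_perturbation \
--epsilon 1.0 \
--optimizer sgd \
--num_epochs 50 \
--batch_size 64 \
--learning_rate 0.1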
The following input arguments alter hyperparameters of specific private prediction methods:
- `--num_models`: number of models in the subsample-and-aggregate method (default = 32)
- `--noise_dist`: noise distribution used in sensitivity methods can be `sqrt_gaussian` (default), `laplacian`, `gaussian`, or `advanced_gaussian`
- `--clip`: gradient clipping value for DP-SGD (default = 1e-1; set to -1 to cross-validate)
- `--use_lr_scheduler`: use learning rate reduction (for DP-SGD)
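For example, a subsample-and-aggregate run that uses 64 sub-models instead of the default 32 could look like this (illustrative setting):

python private_prediction_experiment.py \
--dataset mnist \
--method subsagg \
--epsilon 1.0 \
--num_models 64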
The following input arguments alter the dataset used for experimentation:
- `--dataset`: the dataset used can be `mnist` (default), `mnist1m`, `cifar10`, or `cifar100`
- `--num_samples`: number of training samples to use (default: all)
- `--num_classes`: number of classes to use (default: all)
- `--pca_dims`: number of PCA dimensions for the data (default: PCA not used)
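For example, to run on a 10,000-sample subset of MNIST projected onto 50 PCA dimensions (illustrative values):

python private_prediction_experiment.py \
--dataset mnist \
--method loss_perturbation \
--epsilon 1.0 \
--num_samples 10000 \
--pca_dims 50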
The following input arguments alter other system properties:
- `--device`: compute device can be `cpu` (default) or `gpu`
- `--visdom`: visdom server for learning curves (default = localhost)
- `--num_repetitions`: number of times to repeat the experiment (default = 10)
- `--data_folder`: folder in which to store the dataset for re-use
- `--result_file`: file in which to write experimental results (default: unused)
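For example, to run on a GPU, repeat the experiment 5 times, and write the results to a file (the result path below is just a placeholder):

python private_prediction_experiment.py \
--dataset mnist \
--method loss_perturbation \
--epsilon 1.0 \
--device gpu \
--num_repetitions 5 \
--result_file /tmp/results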
The MNIST-1M dataset used in the paper is not directly available for download, but can be generated using InfiniMNIST.
Download InfiniMNIST and run:
make
mkdir /tmp/infinimnist
infimnist patterns 70000 1069999 > /tmp/infinimnist/mnist1m-images-idx3-ubyte
infimnist labels 70000 1069999 > /tmp/infinimnist/mnist1m-labels-idx1-ubyte
infimnist patterns 0 9999 > /tmp/infinimnist/t10k-images-idx3-ubyte
infimnist labels 0 9999 > /tmp/infinimnist/t10k-labels-idx1-ubyte
Now, you should be able to run experiments on the MNIST-1M dataset, for example:
python private_prediction_experiment.py \
--dataset mnist1m \
--num_samples 100000 \
--method loss_perturbation \
--epsilon 1.0 \
--data_folder /tmp/infinimnist
If you use the code in this repository, please cite the corresponding paper:
- L.J.P. van der Maaten* and A.Y. Hannun*. The Trade-Offs of Private Prediction. arXiv 2007.05089, 2020.
This code is released under a CC-BY-NC 4.0 license. Please see the LICENSE file for more information.
Please review Facebook Open Source Terms of Use and Privacy Policy.