Hi there,
I was wondering whether virtual batch sizing (as explained, for example, in Opacus · Train PyTorch models with Differential Privacy) is supposed to work with multi-GPU training. For me, virtual batch sizing works well on a single GPU (even when the model is wrapped in DifferentiallyPrivateDistributedDataParallel), but it fails when using multiple GPUs. The first couple of iterations work fine, but at some point self.pre_step() in DistributedDPOptimizer returns True on one rank while returning False on the other. The code then gets stuck because the first optimizer takes a “real step” while the second doesn’t.
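For context, here is a minimal sketch of the kind of setup I mean (the model, dataset, and hyperparameters below are placeholders, not my actual code); it follows the usual Opacus distributed + BatchMemoryManager pattern and would be launched with torchrun:

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from opacus.distributed import DifferentiallyPrivateDistributedDataParallel as DPDDP
from opacus.utils.batch_memory_manager import BatchMemoryManager

dist.init_process_group("nccl")
device = torch.device(f"cuda:{dist.get_rank()}")

# Placeholder model and data, just to make the sketch self-contained.
model = DPDDP(torch.nn.Linear(10, 2).to(device))
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
data_loader = DataLoader(dataset, batch_size=64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

privacy_engine = PrivacyEngine()
# With a DPDDP-wrapped model, make_private returns a DistributedDPOptimizer.
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

# Virtual batch sizing: each logical (Poisson-sampled) batch is split into
# physical batches of at most max_physical_batch_size samples.
with BatchMemoryManager(
    data_loader=data_loader,
    max_physical_batch_size=16,
    optimizer=optimizer,
) as memory_safe_loader:
    for x, y in memory_safe_loader:
        optimizer.zero_grad()
        loss = criterion(model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()  # real step only on the last physical batch
```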
EDIT: After further investigation, the problem seems to be the following. Due to Poisson sampling, the BatchSplittingSampler may split a batch into n physical batches, where n is not divisible by the number of GPUs used. As a result, the ranks disagree about which physical batch is the last one, and hence about when pre_step() should return True.
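To make the mismatch concrete, here is a toy calculation (all numbers made up):

```python
import math

max_physical_batch_size = 16
n_gpus = 2

# Suppose Poisson sampling draws a logical batch of 70 samples.
logical_batch_size = 70
n_physical = math.ceil(logical_batch_size / max_physical_batch_size)
print(n_physical)           # 5 physical batches
print(n_physical % n_gpus)  # 1 -> one rank sees an extra physical batch
# The rank with the extra batch reaches its last physical batch (where
# pre_step() returns True and the real step, including the gradient
# synchronization, happens) one iteration later than the other rank,
# which matches the hang I'm seeing.
```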