How to calculate the overall privacy budget generated during federated differential privacy model training using Opacus?
Hello, developers! I want to apply Opacus to federated differentially private model training. My plan is to perform differentially private training on each client-side model and send each client's noisy gradients to the server, which averages and aggregates them to update the global model. How can I calculate the overall privacy budget of the whole federated training, rather than only the budget spent by each individual client-side model, when using Opacus?
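For concreteness, the server-side step I have in mind is roughly FedAvg-style averaging of the client updates. This is only a sketch of my intent, not Opacus code; `aggregate` and `client_states` are placeholder names, and it averages the state_dicts that my clients return.

import copy
import torch

def aggregate(global_model, client_states):
    # `aggregate` and `client_states` are placeholder names, not Opacus APIs.
    # Average the state_dicts returned by the selected clients and load the
    # result into the global model.
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        # stack the per-client tensors for this parameter and take the mean
        avg_state[key] = torch.stack(
            [state[key].float() for state in client_states]
        ).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model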
Hi there, I have a few clarifying questions:
- Are the guarantees you would like to offer "local differential privacy" or "central differential privacy"? That is, do you want the privacy guarantee on the server side (central), or on the client side (local), e.g., preventing the server from reconstructing a client's gradients and protecting against leakage to eavesdroppers on the client-server communication?
- Is your question more about the conceptual privacy-budget computation, or about how privacy budget handling is done in Opacus?
Feel free to share the parameters you are using, such as batch size, noise multiplier, etc.; the sketch below shows how such parameters feed into Opacus's accounting.
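For reference, here is a minimal sketch of standalone accounting with Opacus's RDP accountant. The noise multiplier, sample rate, and step count below are made-up placeholder values, so substitute your own.

from opacus.accountants import RDPAccountant

# placeholder values -- substitute the actual parameters of one client's training
noise_multiplier = 1.0   # sigma of the Gaussian noise
sample_rate = 0.01       # batch_size / len(client_dataset)
num_steps = 500          # total number of noisy optimizer steps

accountant = RDPAccountant()
for _ in range(num_steps):
    # one accounting step per noisy gradient update
    accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

# epsilon spent so far at a chosen delta
print(accountant.get_epsilon(delta=1e-5))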
Thank you for your reply!
- I would like to offer "local differential privacy" guarantees. I have 12,000 training samples, evenly divided among 100 clients. I train for 5 rounds, and every round randomly selects 5 of the 100 clients (the split is sketched below).
- My question is more about how privacy budget handling is done in Opacus.
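Concretely, I partition the samples across clients roughly like this (a sketch; `full_train_set` is a placeholder for my training set):

import torch
from torch.utils.data import random_split

# `full_train_set` is a placeholder for my training set
samples_per_client = len(full_train_set) // 100   # even split across 100 clients
client_datasets = random_split(
    full_train_set,
    [samples_per_client] * 100,
    generator=torch.Generator().manual_seed(0),
)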
# imports used by the snippets below
import copy
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from opacus import PrivacyEngine

# the global model and the client models share the same architecture
global_model = model()
client_models = [model() for _ in range(5)]
global_optim = optim.Adam(global_model.parameters(), lr=0.01)
client_optims = [optim.Adam(client_model.parameters(), lr=0.01) for client_model in client_models]

# randomly select 5 clients; the clients differ only in their train_loader
client_idx = np.random.choice(100, 5, replace=False)
client_train_loaders = [train_loader[i] for i in client_idx]
# I am not sure how to use Opacus in the client model training; maybe as follows. I found that the test accuracy is 82% with DP
# but 98% without DP after 50 rounds on MNIST 10-class classification; 10 clients are selected randomly before every round.
def client_update(self, client_idx, global_model=cnn, epochs=5, batch_size=50, lr=0.01):
    """Local client update."""
    model = copy.deepcopy(global_model)
    model.train()
    # get this client's data
    client_dataset = self.client_datasets[client_idx]
    print(len(client_dataset))  # 600
    client_loader = DataLoader(client_dataset, batch_size=batch_size, shuffle=True)
    optimizer = optim.SGD(model.parameters(), lr=lr)
    # apply differential privacy
    if self.DP:
        privacy_engine = PrivacyEngine()
        model, optimizer, client_loader = privacy_engine.make_private(
            module=model,
            optimizer=optimizer,
            data_loader=client_loader,
            noise_multiplier=0.01,
            max_grad_norm=1.0,
            poisson_sampling=False,
        )
    # local training
    for epoch in range(epochs):
        for batch_idx, (data, target) in enumerate(client_loader):
            data, target = data.to(self.device), target.to(self.device)
            optimizer.zero_grad()
            output = model(data)
            loss = nn.functional.nll_loss(output, target)
            loss.backward()
            optimizer.step()
    return model._module.state_dict() if self.DP else model.state_dict()
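For what it's worth, I can read each client's own epsilon from its engine by adding something like the following inside client_update, just before the return (a sketch; `target_delta` is a placeholder value):

    if self.DP:
        target_delta = 1e-5  # placeholder; typically chosen well below 1 / len(client_dataset)
        # epsilon accumulated by this client's local DP-SGD steps
        epsilon = privacy_engine.get_epsilon(delta=target_delta)
        print(f"client {client_idx}: epsilon = {epsilon:.2f} at delta = {target_delta}")

What I am still unsure about is how these per-client values should be composed into the overall budget across rounds and clients.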