Potential Issue with Member/Non-member Data Handling in Inference Code

Open YYuGAN opened this issue 1 year ago • 3 comments

Hello, I've been reviewing the code and noticed a potential issue with how member and non-member data are handled in the inference.py script. The indices from train_indices.csv appear to be used to select the non-member data for inference. Since members are normally the samples the target model was trained on, this may not match the usual definitions in membership inference attacks (MIA).

I hope this information is helpful, and I look forward to any clarification or updates you can provide on this matter.

Thank you for your attention to this issue.

YYuGAN avatar May 05 '24 09:05 YYuGAN

@ganyuhhutao Thank you for your attention to the repository!

I think it's my fault for writing the code in a confusing manner, but I believe I implemented it correctly.

The target model is trained on the test set of the CIFAR dataset, and the indices of the data points used are saved as train_indices.csv. The CIFAR test set is split into the train and validation splits that the model is optimized on. https://github.com/yonsei-sslab/MIA/blob/5dc858b7b8ef7eea5cd9a58520a42f97ab855805/train_target.py#L46-L47 https://github.com/yonsei-sslab/MIA/blob/5dc858b7b8ef7eea5cd9a58520a42f97ab855805/train_target.py#L64-L69

Subsequently, the target model runs inference on member and non-member data. The model has never seen the training set of the CIFAR dataset, so that data is non-member data. For the non-member data, the train_indices.csv saved earlier is used to index into the CIFAR training set. The only reason for indexing with train_indices.csv is to match the number of data points between the member and non-member sets. https://github.com/yonsei-sslab/MIA/blob/5dc858b7b8ef7eea5cd9a58520a42f97ab855805/inference_attack.py#L53-L64
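
To illustrate the indexing described above, here is a minimal sketch that uses small stand-in arrays instead of the real CIFAR splits. The names cifar_test, cifar_train, and train_indices are hypothetical and the sizes are arbitrary; this is not the repository's code. It only shows that the same positional indices pick out different samples when applied to a different dataset, so the two sets have equal size and no overlap.

```python
import numpy as np

# Stand-ins for the two CIFAR splits (hypothetical sizes and contents).
# Each sample is represented by a unique ID so that overlap can be checked.
cifar_test = np.arange(0, 100)        # the target model is trained on a subset of this
cifar_train = np.arange(1000, 1500)   # never seen by the target model

# Positions of the test-set samples used for training
# (stand-in for the contents of train_indices.csv).
rng = np.random.default_rng(0)
train_indices = rng.choice(len(cifar_test), size=40, replace=False)

members = cifar_test[train_indices]       # samples the model was trained on
non_members = cifar_train[train_indices]  # same positions, but in the other dataset

# IDs present in both sets; empty because the two datasets are disjoint.
overlap = np.intersect1d(members, non_members)
print(len(members), len(non_members), overlap.size)
```

The indices are reused purely as positions; they carry no membership information once they are applied to a dataset the model was not trained on.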

Sorry if this part was confusing. If I were to write the code now, I would write it as follows:

list_nonmember_indices = np.random.choice(len(trainset), len(pd.read_csv("./attack/train_indices.csv")["index"].to_list()) , replace=False)
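
A self-contained variant of the line above might look like the sketch below. It assumes the standard CIFAR training set size of 50,000 and uses a hypothetical num_members in place of the row count of train_indices.csv. It also swaps in NumPy's Generator API (np.random.default_rng) with a fixed seed for reproducibility, which is a choice made for this sketch and not something the repository does.

```python
import numpy as np

trainset_size = 50_000  # assumed: number of samples in the CIFAR training set
num_members = 5_000     # hypothetical: would be the row count of train_indices.csv

# Draw as many non-member indices as there are members, without repeats.
rng = np.random.default_rng(seed=0)
list_nonmember_indices = rng.choice(trainset_size, size=num_members, replace=False)

print(len(list_nonmember_indices))
```

Sampling without replacement keeps the non-member set the same size as the member set while making it clear that the chosen indices have nothing to do with the training indices.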

But in the end, I don't think there is a mismatch between the code and the paper.

snoop2head avatar May 05 '24 10:05 snoop2head

If possible, could @dokyungs please give me access to the repo again so that I can clean up the issues and fix the code?

snoop2head avatar May 05 '24 10:05 snoop2head

Thank you for your explanation.

YYuGAN avatar May 05 '24 12:05 YYuGAN