How to Analyze Communication Cost?
Describe the feature
I would like to ask how we can analyze the communication cost, or more precisely the size in bytes of the ciphertexts (a single scalar and a vector) that cloud service providers send to the data owner in an interactive workflow.
Sample code:
import numpy as np
from Pyfhel import PyCtxt

def _calculate_msr(self, data, rows, cols, HE, t_enc, t_dec):
    # Select the submatrix (this line was commented out, leaving sub_data undefined)
    sub_data = data[rows][:, cols]
    sub_data = np.ascontiguousarray(sub_data)
    enc_sub_data = sub_data.flatten()
    # Encrypt every element into its own ciphertext
    arr_sub_data = np.empty(len(enc_sub_data), dtype=PyCtxt)
    for i in np.arange(len(enc_sub_data)):
        arr_sub_data[i] = HE.encryptFrac(enc_sub_data[i])
    arr_sub_data = arr_sub_data.reshape(sub_data.shape)
    # Encrypting data_mean
    enc_len_array_sub_data = HE.encodeFrac(1 / len(arr_sub_data))
    enc_data_mean = np.sum(arr_sub_data) * enc_len_array_sub_data
    # Encrypting row_means
    enc_row_means = np.sum(arr_sub_data, axis=1) * enc_len_array_sub_data
    enc_row_means = enc_row_means.reshape((sub_data.shape[0], 1))
    # Encrypting col_means
    enc_col_means = np.mean(arr_sub_data, axis=0)
    # Encrypting Residues
    enc_residues = arr_sub_data - enc_row_means - enc_col_means + enc_data_mean
    # Encrypting Squared Residues
    enc_squared_residues = enc_residues ** 2
    # Encrypting msr (single scalar)
    enc_len_squared_residue = HE.encodeFrac(1 / len(enc_squared_residues))
    enc_msr = np.sum(enc_squared_residues) * enc_len_squared_residue
    # Encrypting row_msr (vector)
    enc_row_msr = np.sum(enc_squared_residues, axis=1) * enc_len_squared_residue
    # Encrypting col_msr (vector)
    enc_col_msr = np.mean(enc_squared_residues, axis=0)
    # Decrypting msr
    decrypted_msr = HE.decryptFrac(enc_msr)
    # Decrypting msr_row
    decrypted_msr_row = np.empty(len(enc_row_msr), dtype=PyCtxt)
    for i in np.arange(len(enc_row_msr)):
        decrypted_msr_row[i] = HE.decryptFrac(enc_row_msr[i])
    # Decrypting msr_col
    decrypted_msr_col = np.empty(len(enc_col_msr), dtype=PyCtxt)
    for i in np.arange(len(enc_col_msr)):
        decrypted_msr_col[i] = HE.decryptFrac(enc_col_msr[i])
    return decrypted_msr, decrypted_msr_row, decrypted_msr_col
Input data (https://arep.med.harvard.edu/biclustering/yeast.matrix): a matrix with 2884 rows and 17 columns.
The best way to analyze communication costs would probably be to use SEAL's serialization feature, either by actually writing the ciphertext to a file or by using something like save_size (see https://github.com/microsoft/SEAL/blob/main/native/src/seal/ciphertext.h#L458-L466).
iirc, Pyfhel has serialization support, but I assume we don't expose the save_size function.
If needed, save_size could be easily exposed for individual PyCtxt objects.
AttributeError: 'Pyfhel.PyCtxt.PyCtxt' object has no attribute 'save_size' ( Pyfhel 2.3.1)
I said it could! Currently it is not implemented, but I can do it if you deem it necessary
OK, understood. That would be good, but I assume it would be a new feature in the current stable version, which I am not working with at the moment. The approach I take for now is to write each ciphertext object to a file individually and then measure the total size of the files.
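A minimal sketch of that file-based approach, for reference. The helper name `saved_size_bytes` is hypothetical, and the stand-in `fake_save` only mimics an object that writes itself to a path; with Pyfhel one would pass something like `lambda p: enc_msr.save(p)`, assuming the installed version exposes `save` on `PyCtxt`.

```python
import os
import tempfile


def saved_size_bytes(save_fn):
    """Call save_fn(path) on a temporary file and return the file size in bytes.

    save_fn is any callable that serializes an object to the given path
    (hypothetical interface; adapt it to the ciphertext class in use).
    """
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)  # only the path is needed; save_fn opens the file itself
    try:
        save_fn(path)
        return os.path.getsize(path)
    finally:
        os.remove(path)  # do not leave temporary files behind


def fake_save(path):
    """Stand-in for a ciphertext: writes 1024 zero bytes to path."""
    with open(path, "wb") as f:
        f.write(b"\x00" * 1024)


size = saved_size_bytes(fake_save)
print(size)
```

Measuring one file per ciphertext keeps scalar and vector results separate: a vector of n individually encrypted elements costs the sum of n such sizes.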
As per 5e0a456 (v3.3.0), all the save and load functions return the bytesize of the serialized/loaded object. You can use this to directly get the serialized object sizes and thus measure the communication cost.
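As a rough sketch of how those returned sizes could be collected: the helper `collect_save_sizes` and the `FakeCtxt` class below are hypothetical stand-ins so the snippet runs without Pyfhel. The only assumption about the real API is the one stated above, namely that `save(path, compr_mode)` returns the number of bytes written.

```python
import os
import tempfile


def collect_save_sizes(ciphertexts, directory, compr_mode="zlib"):
    """Save each object and record the byte size returned by its save() call."""
    sizes = []
    for i, ctxt in enumerate(ciphertexts):
        path = os.path.join(directory, f"ctxt_{i}.bin")
        # Assumption: save() returns the serialized size in bytes.
        sizes.append(ctxt.save(path, compr_mode))
    return sizes


class FakeCtxt:
    """Stand-in for a ciphertext; writes a fixed payload and returns its length."""

    def __init__(self, payload):
        self.payload = payload

    def save(self, path, compr_mode):
        with open(path, "wb") as f:
            return f.write(self.payload)  # write() returns the byte count


with tempfile.TemporaryDirectory() as tmp:
    sizes = collect_save_sizes([FakeCtxt(b"a" * 10), FakeCtxt(b"b" * 20)], tmp)
print(sizes)
```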
Closing as completed!
Hi @ibarrond
I came back again to analyze the communication size :)
In my implementation, there are a number of iterations ranging from 330 (by setting up different parameters for better performance). I measured the size of each communicated ciphertext with ciphertext.save("file_name.txt", "zlib"), so I now have an array containing these sizes for a specific ciphertext.
To report this cost, I am not sure which statistic is best to take from this array (max, mean or average). Could you help me with this?
Thanks Alberto!
I think the average per round and total communication are the only metrics that matter. And I agree on the method to calculate them via the bytesize of the ciphertexts compressed with zlib.
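Both metrics can be computed directly from the recorded sizes. This is a small sketch, assuming the sizes are grouped as one list of byte counts per round; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean


def summarize_communication(sizes_per_round):
    """Return total bytes, average bytes per round, and the number of rounds.

    sizes_per_round: one list per round holding the byte size of every
    ciphertext sent in that round.
    """
    per_round_totals = [sum(round_sizes) for round_sizes in sizes_per_round]
    return {
        "rounds": len(per_round_totals),
        "total_bytes": sum(per_round_totals),
        "avg_bytes_per_round": mean(per_round_totals),
    }


# Illustrative sizes only: three rounds with two, two, and one ciphertext.
rounds = [[100, 200], [150, 250], [300]]
summary = summarize_communication(rounds)
print(summary)
```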