Calculating Mean Value of Two-Dimensional Matrix
Hi!
I am trying to calculate the mean value of a two-dimensional matrix (sub_data) with Pyfhel (2.3.1). Since it doesn't support the division of two ciphertexts, I have to compute the mean value (data_mean) like this:
enc_len_sub_data = HE.encodeFrac(1 / len(sub_data))
enc_data_mean = np.sum(sub_data) * enc_len_sub_data
However, the final result is completely different from what I get with data_mean = numpy.mean(sub_data). Are there any alternative approaches to the one I described above?
Hi @ShokofehVS , can you post the piece of code (or a smaller, reproducible example) that yields wrong results? Your approach seems to be the correct one, but without the actual code it is hard to diagnose the issue.
Thank you for your response. Below is the implementation of the Cheng and Church biclustering algorithm with homomorphic encryption (the original algorithm can be found at https://github.com/padilha/biclustlib/blob/master/biclustlib/algorithms/cca.py):
import numpy as np
from Pyfhel import PyCtxt

def _calculate_msr(self, data, rows, cols, HE, t_enc, t_dec):
    """Calculate the mean squared residues of the rows, of the columns and of the full data matrix by homomorphic encryption"""
    sub_data = data[rows][:, cols]
    sub_data = np.ascontiguousarray(sub_data)
    enc_sub_data = sub_data.flatten()
    arr_sub_data = np.empty(len(enc_sub_data), dtype=PyCtxt)
    for i in np.arange(len(enc_sub_data)):
        arr_sub_data[i] = HE.encryptFrac(enc_sub_data[i])
    arr_sub_data = arr_sub_data.reshape(sub_data.shape)
    # Encrypting data_mean
    enc_len_array_sub_data = HE.encodeFrac(1 / len(arr_sub_data))
    enc_data_mean = np.sum(arr_sub_data) * enc_len_array_sub_data
    # Encrypting row_means
    enc_row_means = np.sum(arr_sub_data, axis=1) * enc_len_array_sub_data
    enc_row_means = enc_row_means.reshape((sub_data.shape[0], 1))
    # Encrypting col_means
    enc_col_means = np.mean(arr_sub_data, axis=0)
    # Encrypting Residues
    enc_residues = arr_sub_data - enc_row_means - enc_col_means + enc_data_mean
    # Encrypting Squared Residues
    enc_squared_residues = enc_residues ** 2
    # Encrypting msr
    enc_len_squared_residue = HE.encodeFrac(1 / len(enc_squared_residues))
    enc_msr = np.sum(enc_squared_residues) * enc_len_squared_residue
    # Encrypting row_msr
    enc_row_msr = np.sum(enc_squared_residues, axis=1) * enc_len_squared_residue
    # Encrypting col_msr
    enc_col_msr = np.mean(enc_squared_residues, axis=0)
    # Decrypting msr
    decrypted_msr = HE.decryptFrac(enc_msr)
    # Decrypting msr_row
    decrypted_msr_row = np.empty(len(enc_row_msr), dtype=PyCtxt)
    for i in np.arange(len(enc_row_msr)):
        decrypted_msr_row[i] = HE.decryptFrac(enc_row_msr[i])
    # Decrypting msr_col
    decrypted_msr_col = np.empty(len(enc_col_msr), dtype=PyCtxt)
    for i in np.arange(len(enc_col_msr)):
        decrypted_msr_col[i] = HE.decryptFrac(enc_col_msr[i])
    return decrypted_msr, decrypted_msr_row, decrypted_msr_col
My problem is mainly with calculating data_mean and row_means.
The code seems to be correct (all the operations are based on additions, multiplications and divisions by public values).
If you are getting a very unexpected result, the first thing you should do is to check whether the parameters picked for Pyfhel.contextGen allow enough encrypted operations to yield a correct result. What parameters are you using?
Additionally, do you get a correct result when executing only np.sum(arr_sub_data)? If so, the error might come from the encoding of 1/len(arr_sub_data); otherwise there might be something wrong with the encrypted addition. What size do you typically have for arr_sub_data?
As a third question, could you provide some fake data (something like np.random.randint(low=0, high=255, size=(100, 200)), just to have an idea of shapes and data intervals) and the code to initialize the HE object? This might help in diagnosing the problem further.
Thank you!
- Pyfhel.contextGen(p=65537, m=2048, flagBatching=True, base=2, intDigits=64, fracDigits=3)
- np.sum(arr_sub_data) and 1/len(arr_sub_data) both produce correct results, but the result of the multiplication np.sum(arr_sub_data) * enc_len_array_sub_data is unexpectedly very different
- Size of arr_sub_data is at most (2884,17)
I see the problem! Your context sets fracDigits=3, which means only 3 bits are used to encode the fractional part, so all the fractional parts of your numbers are coerced into one of {0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875}. Since your encoding of the array length is encode(1/len(arr_sub_data)) = encode(1/(2884*17)) ~ encode(0.0000204), you need many more fracDigits to encode this small number. Try at least 20 bits, and maybe reduce intDigits (do you really need to hold values of 2^64?).
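To make the effect of fracDigits concrete, here is a small plain-Python sketch that rounds a value to a given number of binary fractional digits. It is only a simplified model of fixed-point rounding, not Pyfhel's actual encoder, and `quantize` is a hypothetical helper introduced for illustration.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (simplified model)."""
    step = 2 ** frac_bits
    return round(value * step) / step

inv_len = 1 / (2884 * 17)  # about 2.04e-05, the reciprocal discussed above

for bits in (3, 20, 32):
    approx = quantize(inv_len, bits)
    rel_error = abs(approx - inv_len) / inv_len
    print(f"frac bits={bits}: approx={approx!r}, relative error={rel_error:.2e}")
```

In this model the reciprocal rounds to exactly 0 with 3 fractional bits, so the product with the encrypted sum cannot come close to the true mean, while 20 or more bits keep it within a few percent or better.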
I understand that increasing fracDigits is of paramount importance.
On the other hand, after enc_squared_residues = enc_residues ** 2 the noise budget of the Pyfhel ciphertext is zero (encoding=FRACTIONAL, size=3/3, noiseBudget=0), and I am getting more negative values than before. Do you think this could be what changes the value of, e.g., enc_msr?
If your noiseBudget reaches zero, you cannot decrypt anymore (the resulting values look like gibberish). That being said, the main way to solve it is to increase the encryption parameters to have a higher noiseBudget at the beginning.
Thank you, I will check the parameter by which the issue might be solved.
After trying different parameters (p=65537, m=4096, flagBatching=True, base=2, intDigits=16, fracDigits=32), getting mean values over either axis 0 or axis 1 is working properly.
However, right after squaring (enc_squared_residues), the mean over all elements and the mean over axis=1 (here enc_msr and enc_row_msr) come out negative, even though the noise budget is still above 0. The mean over axis=0, by contrast, is correct.
I am not sure how this overflow in numpy.mean happened or how to solve it within a ciphertext.
Without some example input it is still hard to diagnose. Could you share some example input?
You can try a value of intDigits higher than 16 to ensure there is enough space in the integer part to hold the values.
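A rough way to judge whether the integer part is large enough is to bound the intermediate values before encrypting. The sketch below assumes, purely for illustration, that residues stay within ±255 (the real data range was never posted in this thread), and it follows the earlier reading that intDigits=n leaves room for values up to about 2^n.

```python
rows, cols = 2884, 17          # largest shape mentioned in the thread
max_abs_residue = 255          # assumption: the real data range was never posted

max_squared = max_abs_residue ** 2
bounds = {
    "squared residue": max_squared,
    "row sum (axis=1)": max_squared * cols,
    "column sum (axis=0)": max_squared * rows,
    "full sum": max_squared * rows * cols,
}

for name, bound in bounds.items():
    # int.bit_length() is the number of bits needed to hold the magnitude
    print(f"{name}: <= {bound}, needs {bound.bit_length()} integer bits")
```

Under these assumed bounds the sums may need noticeably more than 16 integer bits. With real data the bounds should be recomputed, and this check says nothing about the noise budget, which is a separate limit.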
Thank you for your response. I will have to look in detail as maybe the problem is not related to HE parameters.
I would like to add that in the above-mentioned piece of code, my problem was dividing the sum of the ciphertexts by its length. In addition, my implementation of the mean with numpy.mean worked properly for one particular axis (for example enc_col_msr = np.mean(enc_squared_residues, axis=0)). I assumed this was because it behaves as a regular numpy operation.
Could you explain how the numpy.mean operation can work for one particular axis but not for the others?
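One difference between the manual means and np.mean shows up even on plaintext data: for a 2D array, len() returns the number of rows, not the number of elements. The sketch below uses a small made-up matrix of plain floats (no encryption) and mirrors the structure of the posted _calculate_msr code. Whether this accounts for the behaviour reported here is an assumption, since the original inputs were not shared.

```python
import numpy as np

rng = np.random.default_rng(0)
sub_data = rng.integers(1, 255, size=(6, 4)).astype(float)  # made-up data

inv_len = 1 / len(sub_data)   # 1 / number of rows (6), not 1 / (6 * 4)

# Manual versions, all scaled by the same 1/len factor
manual_total = np.sum(sub_data) * inv_len
manual_rows = np.sum(sub_data, axis=1) * inv_len
manual_cols = np.sum(sub_data, axis=0) * inv_len

print(np.isclose(manual_total, np.mean(sub_data)))          # total mean
print(np.allclose(manual_rows, np.mean(sub_data, axis=1)))  # row means
print(np.allclose(manual_cols, np.mean(sub_data, axis=0)))  # column means

# Scaling by the matching element count instead
fixed_total = np.sum(sub_data) * (1 / sub_data.size)
fixed_rows = np.sum(sub_data, axis=1) * (1 / sub_data.shape[1])
```

Only the column means (axis=0) agree with np.mean when 1/len is used, because each column contains exactly len(sub_data) values. The total needs 1/size and the row means need 1/shape[1]. This does not by itself explain negative decrypted values, which point to overflow or noise as discussed earlier.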
Revisiting this now that scalar product is implemented for CKKS in v3.3.0.
To calculate the mean of a 2D array, you can encrypt the NxM matrix row-wise or column-wise (SIMD), add all rows/columns, sum the accumulated values using cumul_add, and multiply by 1/(N*M).
Here's some example code:
# Initialize Pyfhel
from Pyfhel import Pyfhel
HE = Pyfhel({'scheme':'CKKS', 'n':16384, 'qi_sizes':[60, 40, 40, 60], 'scale':2**40}, key_gen=True)
HE.rotateKeyGen()
HE.relinKeyGen()
# Generate 2D matrix
import numpy as np
np.random.seed(42)
N, M = 1000, 200
matrix2D = np.random.normal(50, 100, (N, M))
mean_2Dm = np.mean(matrix2D) # 50.09739510884986
# Encrypt matrix rows, one per ciphertext.
# For more efficiency, pack multiple rows per ciphertext.
c_matrix_2D = [HE.encrypt(matrix2D[j]) for j in range(N)]
# Calculate the global sum
c_matrix_2D_rowsum = sum(c_matrix_2D)
c_matrix_2D_sum = HE.cumul_add(c_matrix_2D_rowsum, in_new_ctxt=True)
# Downscale to obtain the mean
c_mean = c_matrix_2D_sum * (1 / (N * M))
# Decrypt result!
print(HE.decrypt(c_mean)[0])
#> 50.097393843257834 --> Could be a bit different, noise in ciphertexts is not seeded.
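The remark about packing several rows into one ciphertext can be illustrated on plaintext vectors. With n=16384, CKKS offers n/2 = 8192 slots, so 40 rows of 200 values fit into a single slot vector. The sketch below only models the slot arithmetic with NumPy arrays and performs no encryption; `pack_rows` is a hypothetical helper. In an encrypted version, each packed vector would presumably be encrypted on its own and the slot sum would come from cumul_add, as in the example above.

```python
import numpy as np

def pack_rows(matrix, slots):
    """Flatten the matrix and split it into zero-padded vectors of `slots` values."""
    flat = matrix.ravel()
    n_vectors = -(-flat.size // slots)          # ceiling division
    padded = np.zeros(n_vectors * slots)
    padded[:flat.size] = flat
    return padded.reshape(n_vectors, slots)

np.random.seed(42)
N, M = 1000, 200
matrix2D = np.random.normal(50, 100, (N, M))

slots = 16384 // 2                              # CKKS slot count for n=16384
packed = pack_rows(matrix2D, slots)             # 25 vectors instead of 1000 rows

slotwise_sum = packed.sum(axis=0)               # stands in for adding ciphertexts
total = slotwise_sum.sum()                      # stands in for cumul_add
mean = total * (1 / (N * M))                    # scalar product with 1/(N*M)

print(packed.shape[0], mean)
```

Zero padding does not change the sum, so the result matches np.mean(matrix2D) up to floating-point error, while the number of vectors to add drops from 1000 to 25.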
With this, we can happily close this issue.