Pyfhel Calculating Mean Value of Two-Dimensional Matrix

Hi!

I am trying to calculate the mean value of a two-dimensional matrix (sub_data) with Pyfhel (2.3.1). Since it doesn't support dividing one ciphertext by another, I compute the mean (data_mean) like this:

enc_len_sub_data = HE.encodeFrac(1 / len(sub_data))
enc_data_mean = np.sum(sub_data) * enc_len_sub_data

However, the final result is completely different from what I get with data_mean = numpy.mean(sub_data). Are there any potential solutions other than the one I described above?
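As a plaintext reference for the approach above, dividing by a public count can be replaced by multiplying with its precomputed reciprocal. The sketch below checks this with NumPy on a random matrix (the shape and value range are arbitrary assumptions). Note which count belongs to which mean: the total element count for the global mean, the column count for row means, and the row count for column means.

```python
import numpy as np

# Assumption: a small random integer matrix standing in for sub_data
rng = np.random.default_rng(0)
sub_data = rng.integers(low=0, high=255, size=(100, 20)).astype(float)
n_rows, n_cols = sub_data.shape

# Global mean: sum of all elements times 1 / (total number of elements)
data_mean = np.sum(sub_data) * (1 / sub_data.size)

# Row means: each row sum (axis=1) covers n_cols elements
row_means = np.sum(sub_data, axis=1) * (1 / n_cols)

# Column means: each column sum (axis=0) covers n_rows elements
col_means = np.sum(sub_data, axis=0) * (1 / n_rows)

print(np.isclose(data_mean, np.mean(sub_data)))           # True
print(np.allclose(row_means, np.mean(sub_data, axis=1)))  # True
print(np.allclose(col_means, np.mean(sub_data, axis=0)))  # True
```

With ciphertexts, the same reciprocals would be encoded as plaintexts, so they must also be representable with enough precision by the encoder.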

Jun 14 '22 08:06 ShokofehVS

Hi @ShokofehVS , can you post the piece of code (or a smaller, reproducible example) that yields wrong results? Your approach seems to be the correct one, but without the actual code it is hard to diagnose the issue.

Jun 14 '22 08:06 ibarrond

Thank you for your response. Below is the implementation of the Cheng and Church biclustering algorithm with homomorphic encryption (the original algorithm can be found at https://github.com/padilha/biclustlib/blob/master/biclustlib/algorithms/cca.py ):

import numpy as np
from Pyfhel import PyCtxt

def _calculate_msr(self, data, rows, cols, HE, t_enc, t_dec):
    """Calculate the mean squared residues of the rows, of the columns and of the full data matrix by homomorphic encryption"""

    sub_data = data[rows][:, cols]
    sub_data = np.ascontiguousarray(sub_data)

    # Encrypt every element of the flattened (plaintext) sub-matrix separately
    enc_sub_data = sub_data.flatten()
    arr_sub_data = np.empty(len(enc_sub_data), dtype=PyCtxt)
    for i in range(len(enc_sub_data)):
        arr_sub_data[i] = HE.encryptFrac(enc_sub_data[i])

    arr_sub_data = arr_sub_data.reshape(sub_data.shape)

    # Encrypting data_mean
    # (note: for a 2D array, len() returns the number of rows, sub_data.shape[0])
    enc_len_array_sub_data = HE.encodeFrac(1 / len(arr_sub_data))
    enc_data_mean = np.sum(arr_sub_data) * enc_len_array_sub_data

    # Encrypting row_means
    enc_row_means = np.sum(arr_sub_data, axis=1) * enc_len_array_sub_data
    enc_row_means = enc_row_means.reshape((sub_data.shape[0], 1))

    # Encrypting col_means
    enc_col_means = np.mean(arr_sub_data, axis=0)

    # Encrypting residues
    enc_residues = arr_sub_data - enc_row_means - enc_col_means + enc_data_mean

    # Encrypting squared residues
    enc_squared_residues = enc_residues ** 2

    # Encrypting msr
    enc_len_squared_residue = HE.encodeFrac(1 / len(enc_squared_residues))
    enc_msr = np.sum(enc_squared_residues) * enc_len_squared_residue

    # Encrypting row_msr
    enc_row_msr = np.sum(enc_squared_residues, axis=1) * enc_len_squared_residue

    # Encrypting col_msr
    enc_col_msr = np.mean(enc_squared_residues, axis=0)

    # Decrypting msr
    decrypted_msr = HE.decryptFrac(enc_msr)

    # Decrypting msr_row (decrypted values are plain floats, not ciphertexts)
    decrypted_msr_row = np.empty(len(enc_row_msr), dtype=float)
    for i in range(len(enc_row_msr)):
        decrypted_msr_row[i] = HE.decryptFrac(enc_row_msr[i])

    # Decrypting msr_col
    decrypted_msr_col = np.empty(len(enc_col_msr), dtype=float)
    for i in range(len(enc_col_msr)):
        decrypted_msr_col[i] = HE.decryptFrac(enc_col_msr[i])

    return decrypted_msr, decrypted_msr_row, decrypted_msr_col

My problem is mainly with calculating data_mean and row_means.
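To debug an encrypted pipeline like this, it helps to compute the same quantities in plaintext and compare them with the decrypted results. Below is a minimal sketch of the plaintext mean squared residue computation, mirroring the steps of the method above; the function name `plain_msr` is hypothetical and not part of biclustlib or Pyfhel.

```python
import numpy as np

def plain_msr(sub_data):
    """Plaintext mean squared residues: overall, per row, per column."""
    data_mean = np.mean(sub_data)
    row_means = np.mean(sub_data, axis=1, keepdims=True)  # shape (n_rows, 1)
    col_means = np.mean(sub_data, axis=0)                 # shape (n_cols,)
    residues = sub_data - row_means - col_means + data_mean
    squared = residues ** 2
    return np.mean(squared), np.mean(squared, axis=1), np.mean(squared, axis=0)

rng = np.random.default_rng(1)
msr, row_msr, col_msr = plain_msr(rng.normal(size=(50, 8)))
print(msr, row_msr.shape, col_msr.shape)
```

Since every row has the same length, msr equals the mean of row_msr (and of col_msr), which gives a cheap consistency check on decrypted values.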

Jun 14 '22 09:06 ShokofehVS

The code seems to be correct (all the operations are based on additions, multiplications and divisions by public values).

If you are getting a very unexpected result, the first thing you should do is to check whether the parameters picked for Pyfhel.contextGen allow enough encrypted operations to yield a correct result. What parameters are you using?

Additionally, do you get a correct result when executing only np.sum(arr_sub_data)? If so, the error might come from the encoding of 1/len(arr_sub_data); otherwise there might be something wrong with the encrypted addition. What size do you typically have for arr_sub_data?

As a third question, could you provide some fake data (something like np.random.randint(low=0, high=255, size=(100, 200)), just to get an idea of shapes and data ranges) and the code that initializes the HE object? This might help diagnose the problem further.

Jun 15 '22 08:06 ibarrond

Thank you!

Pyfhel.contextGen(p=65537, m=2048, flagBatching=True, base=2, intDigits=64, fracDigits=3)
np.sum(arr_sub_data) and 1/len(arr_sub_data) each give correct results, but their product, np.sum(arr_sub_data) * enc_len_array_sub_data, is unexpectedly far off.
The size of arr_sub_data is at most (2884, 17).
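To get a feel for how much room the integer part needs, one can bound the largest intermediate value. This rough sketch assumes integer data in [0, 255) as in the fake-data suggestion above (the real range of sub_data is not stated in the thread); the helper `int_bits_needed` is hypothetical and ignores the sign bit and any growth inside the encoder itself.

```python
def int_bits_needed(max_abs_value, n_terms):
    """Bits needed for the magnitude of a sum of n_terms values <= max_abs_value."""
    return (max_abs_value * n_terms).bit_length()

n_rows, n_cols = 2884, 17
max_value = 254  # assumption: integer data in [0, 255)

print(int_bits_needed(max_value, n_rows * n_cols))       # 24: whole-matrix sum
print(int_bits_needed(max_value ** 2, n_rows * n_cols))  # 32: sum of squares
```

Under this assumption, the full-matrix sum alone needs about 24 integer bits, and a sum of squared values of similar magnitude (a rough stand-in for the squared residues) needs about 32.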

Jun 15 '22 10:06 ShokofehVS

I see the problem! Your context sets fracDigits=3, which means only 3 bits are used to encode the fractional part, so every fractional part is coerced into one of {0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875}. Since the array length is encoded as encode(1/len(arr_sub_data)) = encode(1/(2884*17)) ~ encode(0.0000204), you need many more fracDigits to represent such a small number. Try at least 20 bits, and maybe reduce intDigits (do you really need to hold values up to 2^64?).
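A simplified model of the fractional encoding illustrates why 1/(2884*17) vanishes with 3 bits. It assumes the fractional part is simply truncated to `frac_bits` binary digits; Pyfhel's actual encoder may round differently, so the exact errors are illustrative only.

```python
import math

def quantize_frac(x, frac_bits):
    """Keep only frac_bits binary digits of x (truncation model)."""
    scale = 2 ** frac_bits
    return math.floor(x * scale) / scale

inv_n = 1 / (2884 * 17)  # ~0.0000204

for bits in (3, 16, 20, 32):
    q = quantize_frac(inv_n, bits)
    rel_err = abs(q - inv_n) / inv_n
    print(f"{bits:2d} bits -> {q:.10f} (relative error {rel_err:.2%})")

# The 3-bit grid of representable fractional parts
print([k / 8 for k in range(8)])
```

In this model, 3 bits turn the reciprocal into exactly 0 (so the "mean" is 0 regardless of the data), 16 bits still leave an error of roughly 25%, and 20 bits bring it under 2%.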

Jun 15 '22 11:06 ibarrond

I understand that increasing fracDigits is of paramount importance. On the other hand, after enc_squared_residues = enc_residues ** 2, the Pyfhel ciphertext's noise budget is zero (encoding=FRACTIONAL, size=3/3, noiseBudget=0), and I am getting more negative values than before. Could this affect the value of, e.g., enc_msr?

Jun 15 '22 14:06 ShokofehVS

If your noiseBudget reaches zero, you can no longer decrypt correctly (the resulting values look like gibberish). The main way to solve this is to increase the encryption parameters so that you start with a higher noiseBudget.

Jun 16 '22 08:06 ibarrond

Thank you, I will check the parameter by which the issue might be solved.

Jun 17 '22 09:06 ShokofehVS

After trying different parameters (p=65537, m=4096, flagBatching=True, base=2, intDigits=16, fracDigits=32), computing mean values over axis 0 or axis 1 works properly. However, right after squaring (enc_squared_residues), the mean over all axes and over axis=1 (here enc_msr and enc_row_msr) comes out negative even though the noiseBudget is still above 0. Note that the mean over axis=0 is correct. I am not sure how this overflow in numpy.mean happened or how to solve it within a ciphertext.

Jun 25 '22 19:06 ShokofehVS

Without some example input it is still hard to diagnose. Could you share some example input?

You can try values higher than intDigits=16 to ensure there is enough space in the integer part to hold the values.
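Another mechanism worth ruling out (an assumption here, not something confirmed in this thread): with base-2 integer/fractional encoders in BFV-style schemes, each plaintext polynomial coefficient is stored modulo the plaintext modulus p and decoded in a roughly centered range around zero, so a coefficient that grows past about p/2 through many additions or a squaring decodes as a negative number. The toy model below (the helper `centered` is hypothetical) shows that wraparound for p=65537.

```python
P = 65537  # plaintext modulus from the thread's contextGen call

def centered(value, p=P):
    """Reduce value mod p into the centered range (-p/2, p/2]."""
    r = value % p
    return r - p if r > p // 2 else r

# A coefficient accumulating many contributions is fine until it passes p/2
for total in (30000, 32768, 32769, 49028):
    print(total, "->", centered(total))
```

In this model, if each of the 2884*17 = 49028 entries added 1 to the same coefficient, the sum would already decode as negative; a larger p, or fewer terms summed per ciphertext, keeps coefficients in range.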

Jul 08 '22 16:07 ibarrond

Thank you for your response. I will have to look in detail as maybe the problem is not related to HE parameters.

Jul 24 '22 18:07 ShokofehVS

I would like to add that, in the above-mentioned code, my problem was dividing the sum of the ciphertexts by their length. On the other hand, the mean values I computed with numpy.mean worked properly for a specific axis (such as enc_col_msr = np.mean(enc_squared_residues, axis=0)). I assumed this was because it behaves like a regular numpy operation.

Could you possibly explain why the numpy.mean operation works for a particular axis but not for both?
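One likely explanation, judging from the method posted earlier: for a 2D NumPy array, `len()` returns only the number of rows (`shape[0]`), not the number of elements. So `1 / len(arr_sub_data)` divides the full sum by N instead of N*M, and the row sums by N instead of M, whereas `np.mean(..., axis=0)` divides each column sum by N and therefore comes out right. A plaintext sketch with an object array (standing in for an array of ciphertexts) shows the difference:

```python
import numpy as np

rng = np.random.default_rng(2)
plain = rng.normal(50, 10, size=(6, 4))
arr = plain.astype(object)  # object array, like an array of ciphertexts

n_rows, n_cols = arr.shape
print(len(arr), arr.size)  # 6 24: len() counts rows only

wrong_mean = np.sum(arr) * (1 / len(arr))          # divides by 6
right_mean = np.sum(arr) * (1 / arr.size)          # divides by 24
wrong_rows = np.sum(arr, axis=1) * (1 / len(arr))  # divides by 6, not 4
right_rows = np.sum(arr, axis=1) * (1 / n_cols)

print(np.isclose(right_mean, plain.mean()))                       # True
print(np.allclose(right_rows.astype(float), plain.mean(axis=1)))  # True
# np.mean(axis=0) divides by the row count, which is what len() returns
print(np.allclose(np.mean(arr, axis=0).astype(float), plain.mean(axis=0)))  # True
```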

Jul 27 '22 13:07 ShokofehVS

Revisiting this now that scalar product is implemented for CKKS in v3.3.0.

To calculate the mean of a 2D array, you can encrypt the NxM matrix row-wise or column-wise (SIMD), add all rows/columns, sum the accumulated slot values using cumul_add, and multiply by 1/(N*M).

Here's some example code:

# Initialize Pyfhel
from Pyfhel import Pyfhel
HE = Pyfhel({'scheme':'CKKS', 'n':16384, 'qi_sizes':[60, 40, 40, 60], 'scale':2**40}, key_gen=True)
HE.rotateKeyGen()
HE.relinKeyGen()

# Generate 2D matrix
import numpy as np
np.random.seed(42)
N, M = 1000, 200

matrix2D = np.random.normal(50, 100, (N, M))
mean_2Dm = np.mean(matrix2D)   # 50.09739510884986

# Encrypt matrix rows, one per ciphertext.
#    For more efficiency, pack multiple rows per ciphertext.
c_matrix_2D = [HE.encrypt(matrix2D[j]) for j in range(N)]

# Calculate the global sum
c_matrix_2D_rowsum = sum(c_matrix_2D)
c_matrix_2D_sum = HE.cumul_add(c_matrix_2D_rowsum, in_new_ctxt=True)

# Downscale to obtain the mean
c_mean = c_matrix_2D_sum * (1 / (N * M))

# Decrypt result!
print(HE.decrypt(c_mean)[0]) 
#> 50.097393843257834  --> Could be a bit different, noise in ciphertexts is not seeded.
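The rotate-and-add idea behind summing all slots of a ciphertext can be simulated in plaintext with NumPy. This is a sketch of the general technique (log2(n) rotations on a power-of-two slot vector), using `np.roll` in place of ciphertext rotations; it is not Pyfhel's actual `cumul_add` implementation, and zero-padding the row sum to 256 slots is an assumption of the sketch.

```python
import numpy as np

def rotate_and_add_total(slots):
    """Sum all slots with log2(n) rotate-and-add steps; every slot ends with the total."""
    n = len(slots)
    assert n & (n - 1) == 0, "slot count must be a power of two"
    acc = slots.copy()
    step = 1
    while step < n:
        acc = acc + np.roll(acc, -step)  # rotate left by `step`, then add
        step *= 2
    return acc

np.random.seed(42)
N, M = 1000, 200
matrix2D = np.random.normal(50, 100, (N, M))

# Element-wise sum of all rows (like summing the row ciphertexts), padded to 256 slots
row_sum = np.zeros(256)
row_sum[:M] = matrix2D.sum(axis=0)

total = rotate_and_add_total(row_sum)
mean = total[0] * (1 / (N * M))
print(np.isclose(mean, np.mean(matrix2D)))  # True
```

Each step doubles the window each slot has accumulated, so only 8 rotations are needed for 256 slots instead of 255 separate additions.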

With this, we can happily close this issue.

Oct 04 '22 21:10 ibarrond