Calculating Mean Value of Two-Dimensional Matrix

Open ShokofehVS opened this issue 3 years ago • 12 comments

Hi!

I am trying to calculate the mean value of a two-dimensional matrix (sub_data) with Pyfhel (2.3.1). Since it doesn't support division of two ciphertexts, I have to compute the mean value (data_mean) like this:

enc_len_sub_data = HE.encodeFrac(1 / len(sub_data))
enc_data_mean = np.sum(sub_data) * enc_len_sub_data

However, the final result is completely different from what I get with data_mean = numpy.mean(sub_data). Are there any potential solutions other than the one I mentioned above?

ShokofehVS avatar Jun 14 '22 08:06 ShokofehVS

Hi @ShokofehVS, can you post the piece of code (or a smaller, reproducible example) that yields wrong results? Your approach seems to be the correct one, but without the actual code it is hard to diagnose the issue.

ibarrond avatar Jun 14 '22 08:06 ibarrond

Thank you for your response. Below is the implementation of the Cheng and Church biclustering algorithm with homomorphic encryption (the original algorithm can be found at https://github.com/padilha/biclustlib/blob/master/biclustlib/algorithms/cca.py ):

def _calculate_msr(self, data, rows, cols, HE, t_enc, t_dec):
        """Calculate the mean squared residues of the rows, of the columns and of the full data matrix by homomorphic encryption"""

        sub_data = data[rows][:, cols]
        sub_data = np.ascontiguousarray(sub_data)
        enc_sub_data = sub_data.flatten()
        arr_sub_data = np.empty(len(enc_sub_data), dtype=PyCtxt)
        for i in np.arange(len(enc_sub_data)):
            arr_sub_data[i] = HE.encryptFrac(enc_sub_data[i])

        arr_sub_data = arr_sub_data.reshape(sub_data.shape)

        # Encrypting data_mean
        enc_len_array_sub_data = HE.encodeFrac(1 / len(arr_sub_data))
        enc_data_mean = np.sum(arr_sub_data) * enc_len_array_sub_data 

        # Encrypting row_means
        enc_row_means = np.sum(arr_sub_data, axis=1) * enc_len_array_sub_data  
        enc_row_means = enc_row_means.reshape((sub_data.shape[0], 1))

        # Encrypting col_means
        enc_col_means = np.mean(arr_sub_data, axis=0)

        # Encrypting Residues
        enc_residues = arr_sub_data - enc_row_means - enc_col_means + enc_data_mean
        
        # Encrypting Squared Residues
        enc_squared_residues = enc_residues ** 2

        # Encrypting msr
        enc_len_squared_residue = HE.encodeFrac(1 / len(enc_squared_residues))
        enc_msr = np.sum(enc_squared_residues) * enc_len_squared_residue 
   
        # Encrypting row_msr
        enc_row_msr = np.sum(enc_squared_residues, axis=1) * enc_len_squared_residue
    
        # Encrypting col_msr
        enc_col_msr = np.mean(enc_squared_residues, axis=0)

        #  Decrypting msr
        decrypted_msr = HE.decryptFrac(enc_msr)
    
        # Decrypting msr_row
        decrypted_msr_row = np.empty(len(enc_row_msr), dtype=PyCtxt)
        for i in np.arange(len(enc_row_msr)):
            decrypted_msr_row[i] = HE.decryptFrac(enc_row_msr[i])
        
        # Decrypting msr_col
        decrypted_msr_col = np.empty(len(enc_col_msr), dtype=PyCtxt)
        for i in np.arange(len(enc_col_msr)):
            decrypted_msr_col[i] = HE.decryptFrac(enc_col_msr[i])

  
        return decrypted_msr, decrypted_msr_row, decrypted_msr_col

My problem is mainly with calculating the mean of data_mean and row_mean.

ShokofehVS avatar Jun 14 '22 09:06 ShokofehVS

The code seems to be correct (all the operations are based on additions, multiplications and divisions by public values).

If you are getting a very unexpected result, the first thing you should do is to check whether the parameters picked for Pyfhel.contextGen allow enough encrypted operations to yield a correct result. What parameters are you using?

Additionally, do you get a correct result when executing only np.sum(arr_sub_data)? If so, the error might come from the encoding of 1/len(arr_sub_data); otherwise there might be something wrong with the encrypted addition. What size do you typically have for arr_sub_data?

As a third question, could you provide some fake data (something like np.random.randint(low=0, high=255, size=(100, 200)), just to have an idea of shapes and data intervals) and the code to initialize the HE object? This might help diagnose the problem further.
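
One plaintext check that may help separate the two cases above is to confirm which element count the scaling factor actually uses. The following is a minimal sketch with plain nested lists standing in for the array (the shape and values are made up for illustration). Note that len() of a 2D NumPy array likewise returns only the size of the first axis, so 1/len(...) scales by the number of rows, not by the number of elements:

```python
# Hypothetical 4x3 matrix; values chosen only so the sums are easy to check.
rows, cols = 4, 3
matrix = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]

# Sum over every element, as np.sum(arr) does when no axis is given.
total = sum(sum(row) for row in matrix)

# Mean using the full element count.
true_mean = total / (rows * cols)

# "Mean" using len(), which counts rows only.
scaled_by_len = total * (1 / len(matrix))

print(true_mean)      # mean over all rows * cols elements
print(scaled_by_len)  # larger by a factor of cols
```

If the decrypted value is off from the expected mean by roughly the number of columns, the scaling factor rather than the encryption would be the first thing to look at.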

ibarrond avatar Jun 15 '22 08:06 ibarrond

Thank you!

  1. Pyfhel.contextGen(p=65537, m=2048, flagBatching=True, base=2, intDigits=64, fracDigits=3)
  2. np.sum(arr_sub_data) and 1/len(arr_sub_data) each produce correct results, but the result of the multiplication np.sum(arr_sub_data) * enc_len_array_sub_data is unexpectedly very different.
  3. Size of arr_sub_data is at most (2884,17)

ShokofehVS avatar Jun 15 '22 10:06 ShokofehVS

I see the problem! Your context sets fracDigits=3, which means only 3 bits are used to encode the fractional part, so the fractional parts of all your numbers are coerced into one of {0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875}. Since your encoding of the array length is encode(1/len(arr_sub_data)) = encode(1/(2884*17)) ~ encode(0.0000204), you need many more fracDigits to encode this small number. Try with at least 20 bits, and maybe reduce intDigits (do you really need to hold values up to 2^64?).
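
A rough way to see this effect in plaintext, assuming the fractional part is simply rounded to the nearest multiple of 2^-fracDigits. This is an idealized fixed-point model, not Pyfhel's actual fractional encoder, and quantize_fraction is a hypothetical helper written for this illustration:

```python
def quantize_fraction(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (idealized model)."""
    steps = 2 ** frac_bits
    return round(x * steps) / steps


# Scaling factor from the thread: 1 / (2884 * 17), about 0.0000204.
inv_len = 1 / (2884 * 17)

coarse = quantize_fraction(inv_len, 3)   # 3 fractional bits
fine = quantize_fraction(inv_len, 20)    # 20 fractional bits

print(coarse)                            # collapses to zero
print(fine, abs(fine - inv_len) / inv_len)  # small relative error
```

Under this model, 3 fractional bits turn the scaling factor into exactly zero, so any product with it is zero as well, while 20 bits keep it within a few percent of the true value.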

ibarrond avatar Jun 15 '22 11:06 ibarrond

I understand that increasing fracDigits is of paramount importance. On the other hand, after enc_squared_residues = enc_residues ** 2, the Pyfhel ciphertext's noiseBudget is zero (encoding=FRACTIONAL, size=3/3, noiseBudget=0) and I am getting more negative values than before. Do you think this could change the value of, e.g., enc_msr?

ShokofehVS avatar Jun 15 '22 14:06 ShokofehVS

If your noiseBudget reaches zero, you cannot decrypt anymore (the resulting values look like gibberish). That being said, the main way to solve it is to increase the encryption parameters to have a higher noiseBudget at the beginning.

ibarrond avatar Jun 16 '22 08:06 ibarrond

Thank you, I will check the parameter by which the issue might be solved.

ShokofehVS avatar Jun 17 '22 09:06 ShokofehVS

After trying different parameters (p=65537, m=4096, flagBatching=True, base=2, intDigits=16, fracDigits=32), computing mean values over either axis 0 or axis 1 works properly. However, right after squaring (enc_squared_residues), the mean over all axes and over axis=1 (here enc_msr and enc_row_msr) comes out negative despite the noiseBudget being greater than 0. Note that the mean over axis=0 is correct. I am not sure how this overflow in numpy.mean happened or how to solve it within a ciphertext.

ShokofehVS avatar Jun 25 '22 19:06 ShokofehVS

Without some example input it is still hard to diagnose. Could you share some example input?

You can try values of intDigits higher than 16 to ensure there is enough space in the integer part to hold the values.
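
As a back-of-the-envelope estimate of how much integer space the squared residues might need, here is a sketch under stated assumptions: residues bounded by 255 in absolute value (borrowed from the fake-data range suggested earlier, not from the real data) and the maximum shape of (2884, 17). It only counts bits of the plaintext magnitudes and does not model the encoder; int_bits_needed is a hypothetical helper:

```python
import math


def int_bits_needed(max_abs_value: float) -> int:
    """Bits required to represent integers up to max_abs_value (unsigned)."""
    return max(1, math.ceil(math.log2(max_abs_value + 1)))


# Assumed bounds, for illustration only.
max_residue = 255
n_rows, n_cols = 2884, 17

worst_square = max_residue ** 2                 # one squared residue
worst_total = worst_square * n_rows * n_cols    # sum over the full matrix

print(int_bits_needed(worst_square))  # bits for a single squared value
print(int_bits_needed(worst_total))   # bits for the sum over all elements
```

Under these assumptions a single squared residue already uses the full 16 bits, and the sum over the whole matrix needs about twice that, which would be consistent with sums going wrong after squaring even though the noise budget is not exhausted.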

ibarrond avatar Jul 08 '22 16:07 ibarrond

Thank you for your response. I will have to look in detail as maybe the problem is not related to HE parameters.

ShokofehVS avatar Jul 24 '22 18:07 ShokofehVS

I would like to add that in the above-mentioned piece of code, my problem was dividing the sum of the ciphertexts by its length. Furthermore, my implementation of mean values with numpy.mean worked properly for a specific axis (such as enc_col_msr = np.mean(enc_squared_residues, axis=0)). I suppose this is because it behaves as a numpy operation.

Could you possibly explain why the numpy.mean operation works for one particular axis but not for both?

ShokofehVS avatar Jul 27 '22 13:07 ShokofehVS

Revisiting this now that scalar product is implemented for CKKS in v3.3.0.

To calculate the mean of a 2D array you could just encrypt the NxM matrix row-wise or column-wise (SIMD), add all rows/columns, add up the accumulated values using cumul_add, and multiply by 1/(N*M).

Here's some example code:

# Initialize Pyfhel
from Pyfhel import Pyfhel
HE = Pyfhel({'scheme':'CKKS', 'n':16384, 'qi_sizes':[60, 40, 40, 60], 'scale':2**40}, key_gen=True)
HE.rotateKeyGen()
HE.relinKeyGen()

# Generate 2D matrix
import numpy as np
np.random.seed(42)
N, M = 1000, 200

matrix2D = np.random.normal(50, 100, (N, M))
mean_2Dm = np.mean(matrix2D)   # 50.09739510884986

# Encrypt matrix rows, one per ciphertext.
#    For more efficiency, pack multiple rows per ciphertext.
c_matrix_2D = [HE.encrypt(matrix2D[j]) for j in range(N)]

# Calculate the global sum
c_matrix_2D_rowsum = sum(c_matrix_2D)
c_matrix_2D_sum = HE.cumul_add(c_matrix_2D_rowsum, in_new_ctxt=True)

# Downscale to obtain the mean
c_mean = c_matrix_2D_sum * (1 / (N * M))

# Decrypt result!
print(HE.decrypt(c_mean)[0]) 
#> 50.097393843257834  --> Could be a bit different, noise in ciphertexts is not seeded.
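
For checking the logic without Pyfhel installed, the same three steps can be mirrored in plaintext. This is only a sketch of the arithmetic: Python lists stand in for ciphertext slots, the built-in sum stands in for the slot-wise additions and for cumul_add, and the small shape is chosen just for illustration:

```python
import random

random.seed(0)
N, M = 6, 4  # small stand-in shape

# Each inner list plays the role of one encrypted row (M slots).
matrix = [[random.gauss(50, 100) for _ in range(M)] for _ in range(N)]

# Step 1: slot-wise addition of all rows, leaving one vector of M slots.
row_sum = [sum(column) for column in zip(*matrix)]

# Step 2: accumulate the M slots into one value (the role of cumul_add).
total = sum(row_sum)

# Step 3: multiply by the public constant 1 / (N * M).
mean = total * (1 / (N * M))

# Reference: direct mean over all elements.
reference = sum(sum(row) for row in matrix) / (N * M)
print(mean, reference)
```

The scaling constant uses N * M, the total number of elements, since the accumulated value is a sum over the whole matrix.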

With this, we can happily close this issue.

ibarrond avatar Oct 04 '22 21:10 ibarrond