BPR results not reproducible using multiple threads
Hello,
Thank you for sharing this great library with us! 😊 I am using implicit version 0.7.2 and I think I found a bug.
If I create two instances of BPR using the same arguments, with a fixed random_state and when using more than one thread via num_threads, I do get different user and item factors after fitting.
But, when I use just one thread this issue does not occur.
It seems that BPR produces non-deterministic results when using multiple threads.
I suspect that there may be some bug using the RNGVector class in bpr.pyx (c++ RNG object).
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L43
A Numpy Random Generator is seeded using the supplied random_state argument (check_random_state in utils.py).
https://github.com/benfred/implicit/blob/v0.7.2/implicit/utils.py#L83
Each thread is supposed to get a RNGVector instance using a different random seed by using the Random Generator.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L185-L187
But, this issue does not occur for me using the LMF matrix factorization model, though it also uses a c++ RNG object. https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/lmf.pyx#L40
There is a difference between BPR and LMF regarding the generation of random numbers with the RNGVector class.
-
LMF uses in
lmf_updatewhen running in parallel the functionthreadidfromcython.parallelto obtain the thread id, which is then used to generate a random number usingRNGVectorwithrng.generate(thread_id). https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/lmf.pyx#L250 - In contrast, BPR obtains in
bpr_updatethe thread id by using an external functionget_thread_numinbpr.h, which returnsomp_get_thread_numif the_OPENMPmacro is defined, otherwise it returns 0. https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L267 https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.h#L10-L20
Maybe this is the reason for the non-deterministic behaviour of BPR?
Best regards, Bernhard
Minimal example of creating 2 BPR instances using a fixed seed and multiple threads. The resulting user and item factors are different after fitting the models. Assertion errors occur.
import implicit
from threadpoolctl import threadpool_limits
from scipy.sparse import csr_matrix
import numpy as np
print('implicit version', implicit.__version__, '\n')
raw = [
[1, 1, 0, 1, 0, 0],
[0, 1, 1, 1, 0, 0],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 1],
[0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 1],
]
def get_bpr_model():
return implicit.bpr.BayesianPersonalizedRanking(factors=3,
learning_rate=1e-3,
regularization=1e-3,
use_gpu=False,
verify_negative_samples=True,
iterations=5,
num_threads=2, # Assertions fail if num_threads > 1
random_state=42)
with threadpool_limits(limits=1):
model_A = get_bpr_model()
model_A.fit(csr_matrix(raw), show_progress=False)
print(model_A.item_factors)
print(model_A.user_factors)
print(model_A.user_factors.dot(model_A.item_factors.T))
model_B = get_bpr_model()
model_B.fit(csr_matrix(raw), show_progress=False)
print(model_B.item_factors)
print(model_B.user_factors)
print(model_B.user_factors.dot(model_B.item_factors.T))
assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-2)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-2)
assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-3)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-3)
assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-4)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-4)
assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-5)
assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-5)
Output:
implicit version 0.7.2
[[-0.13629021 0.09140649 0.05145105 -0.01928077]
[-0.02253029 0.11990372 -0.13830613 0.06782728]
[-0.10003278 -0.13561453 0.00902637 0.15518866]
[ 0.07921306 0.08666488 0.07246038 0.09480929]
[ 0.00444893 -0.1239751 0.11326855 -0.01699392]
[-0.00050317 -0.04274316 -0.10560265 0.14299433]]
[[ 0.09415838 0.0487572 -0.03222072 1. ]
[ 0.01535796 -0.01886933 -0.01671766 1. ]
[-0.13623102 0.01816958 0.12942049 1. ]
[ 0.11905595 0.10971054 -0.07423817 1. ]
[-0.1114961 0.08572128 0.06691711 1. ]
[-0.14401852 0.15679215 -0.01908839 1. ]
[ 0.05942317 0.09276801 0.08664715 1. ]]
[[-0.0293147 0.07600836 0.13886672 0.10415867 -0.02626929 0.14426552]
[-0.02395883 0.06753092 0.15606043 0.09317917 -0.01647985 0.14555857]
[ 0.00760582 0.05517557 0.16752037 0.09497053 -0.00519331 0.1286191 ]
[-0.02929831 0.08856721 0.12773071 0.1083688 -0.03847447 0.14608479]
[ 0.00719349 0.07136258 0.1553209 0.0982552 -0.02053766 0.13231981]
[ 0.01369725 0.09251206 0.14815965 0.09560636 -0.03923509 0.13838078]
[-0.01444188 0.06562787 0.13744582 0.11383459 -0.01841608 0.12984906]]
[[-0.13642174 0.09134811 0.05149103 -0.01928074]
[-0.02247692 0.11985338 -0.1383458 0.06830399]
[-0.10003294 -0.1356145 0.00902635 0.15518874]
[ 0.07921314 0.08666489 0.07246038 0.09480958]
[ 0.00444896 -0.12397511 0.11326852 -0.01699418]
[-0.00050322 -0.04274315 -0.10560266 0.14299443]]
[[ 0.0941582 0.04875714 -0.03222046 1. ]
[ 0.01535813 -0.01886933 -0.01671774 1. ]
[-0.1362314 0.01816951 0.12942068 1. ]
[ 0.11917121 0.10970803 -0.07422689 1. ]
[-0.11149608 0.08572138 0.06691711 1. ]
[-0.14401817 0.15679215 -0.01908856 1. ]
[ 0.05942311 0.092768 0.08664716 1. ]]
[[-0.02933116 0.07648887 0.1388668 0.10415898 -0.02626951 0.14426559]
[-0.02396042 0.06801005 0.15606049 0.09317947 -0.01648012 0.14555869]
[ 0.00762793 0.05563892 0.16752052 0.0949708 -0.00519355 0.1286192 ]
[-0.02933869 0.08904324 0.12771969 0.10837884 -0.03847263 0.14608376]
[ 0.00720586 0.07182638 0.15532097 0.0982555 -0.02053794 0.13231991]
[ 0.01370624 0.09297396 0.14815971 0.09560666 -0.03923537 0.13838091]
[-0.01445162 0.06609963 0.13744588 0.11383489 -0.01841634 0.12984917]]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[15], line 50
47 assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-3)
48 assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-3)
---> 50 assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-4)
51 assert np.allclose(model_A.item_factors, model_B.item_factors, atol=1e-4)
53 assert np.allclose(model_A.user_factors, model_B.user_factors, atol=1e-5)
AssertionError:
I am using a win64 architecture and this conda environment:
name: my-conda-env
channels:
- conda-forge
- nodefaults
dependencies:
- implicit=0.7.2
- libblas=3.9.0=21_win64_mkl
- mkl=2024.0.0=h66d3029_49657
- numpy=1.24.4
- pandas=2.0.3
- pip=24.0
- python=3.8.18
- scipy=1.10.1
- pip:
- ipykernel==6.29.2