Incorrect results and segmentation faults
I keep getting extremely unreasonable results when fitting a model, regardless of the data or parameters. I am not sure why, but I suspect it is due to the initialization of the factor matrices, or to some input being passed in the wrong order to the underlying libmf library.
I also get segmentation faults if the ratings contain negative values, which leads me to think some input is being passed to the wrong argument in the call to the C++ code.
As an example, if I run the following sample code:
import numpy as np
inp = np.array([
[3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0],
[6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1],
[7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, 9, 5, 5]
]).T
from libmf import mf
engine = mf.MF()
engine.fit(inp)
I get this log:
iter tr_rmse obj
0 5.0551 4.1289e+02
1 4.4090 3.1610e+02
2 3.7154 2.2675e+02
3 3.1115 1.6138e+02
4 2.6273 1.1733e+02
5 2.2683 8.9466e+01
6 2.0092 7.1917e+01
7 1.8223 6.0572e+01
8 1.6926 5.3367e+01
9 1.6013 4.8609e+01
10 1.5395 4.5542e+01
11 1.4966 4.3483e+01
12 1.4670 4.2098e+01
13 1.4467 4.1173e+01
14 1.4298 4.0405e+01
15 1.4182 3.9881e+01
16 1.4098 3.9509e+01
17 1.4028 3.9204e+01
18 1.3959 3.8900e+01
19 1.3912 3.8690e+01
If I change the input to contain negative values:
inp = np.array([
[3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0],
[6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1],
[-7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, -9, 5, 5]
]).T
Then I get a segmentation fault instead.
When I run the same thing in recosystem for R:
library(recosystem)
inp = data_memory(as.integer(c(3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0)),
as.integer(c(6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1)),
rating=c(7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, 9, 5, 5),
index1=FALSE)
r = Reco()
r$train(inp, opts=list(dim=8))
Then the log looks a lot better:
iter tr_rmse obj
0 4.5503 3.3641e+02
1 3.8732 2.4760e+02
2 3.1535 1.6900e+02
3 2.5291 1.1421e+02
4 2.0629 8.1257e+01
5 1.7487 6.3139e+01
6 1.5302 5.2326e+01
7 1.3728 4.5442e+01
8 1.2615 4.1277e+01
9 1.1749 3.8192e+01
10 1.1056 3.5838e+01
11 1.0540 3.4197e+01
12 1.0115 3.2985e+01
13 0.9736 3.1885e+01
14 0.9378 3.0881e+01
15 0.9127 3.0278e+01
16 0.8730 2.9144e+01
17 0.8518 2.8633e+01
18 0.8268 2.8056e+01
19 0.8002 2.7445e+01
If I try it with real data, say MovieLens 10M, then the log looks completely off:
iter tr_rmse obj
0 14563.5307 1.6968e+15
1 13515.4114 1.4613e+15
2 13296.2147 1.4143e+15
3 13173.7712 1.3884e+15
4 13069.1206 1.3664e+15
5 12969.8403 1.3457e+15
6 12875.6878 1.3263e+15
7 12785.1806 1.3077e+15
8 12687.9232 1.2879e+15
9 12581.6009 1.2664e+15
10 12486.9251 1.2474e+15
11 12393.4994 1.2288e+15
12 12313.2395 1.2129e+15
13 12224.9060 1.1956e+15
14 12148.5366 1.1807e+15
15 12082.6648 1.1679e+15
16 12016.2389 1.1551e+15
17 11953.0185 1.1430e+15
18 11889.6046 1.1309e+15
19 11823.5130 1.1184e+15
Compared to recosystem:
iter tr_rmse obj
0 0.9644 1.3232e+07
1 0.8875 1.2022e+07
2 0.8569 1.1675e+07
3 0.8431 1.1550e+07
4 0.8328 1.1463e+07
5 0.8259 1.1415e+07
6 0.8210 1.1376e+07
7 0.8167 1.1348e+07
8 0.8133 1.1327e+07
9 0.8114 1.1317e+07
10 0.8090 1.1299e+07
11 0.8075 1.1294e+07
12 0.8056 1.1281e+07
13 0.8039 1.1270e+07
14 0.8027 1.1266e+07
15 0.8014 1.1259e+07
16 0.8001 1.1253e+07
17 0.7990 1.1246e+07
18 0.7981 1.1246e+07
19 0.7971 1.1236e+07
I think I found the issue: the wrapper assumes that the input is in row-major order but does not check that it is. If the array is in column-major order (as it is after the .T transpose in the example above), the wrong values get passed to libmf.
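To illustrate the memory-layout point, here is a minimal NumPy-only sketch on a shortened version of the sample input. It does not call libmf. It only shows that transposing with .T yields a view that is not row-major, so code reading the raw buffer as consecutive (user, item, rating) triples would see different values than the array shows. That such a misread is also behind the segmentation fault with negative ratings is an assumption, not something verified here.

```python
import numpy as np

# Shortened version of the sample input: users, items, ratings as three rows.
rows = [
    [3, 7, 1, 9],  # user ids
    [6, 2, 3, 6],  # item ids
    [7, 1, 1, 6],  # ratings
]
inp = np.array(rows).T  # shape (4, 3), but only a transposed view

print(inp.flags['C_CONTIGUOUS'])  # False: the buffer is not row-major

# Elements in the order they actually sit in memory.
memory_order = inp.ravel(order='K')
print(memory_order.tolist())
# [3, 7, 1, 9, 6, 2, 3, 6, 7, 1, 1, 6]
# Read as triples, the first "row" would be (3, 7, 1), not (3, 6, 7).

# Possible workaround: copy into a row-major buffer before fitting.
fixed = np.ascontiguousarray(inp)
print(fixed.flags['C_CONTIGUOUS'])  # True
print(fixed.ravel(order='K').tolist())
# [3, 6, 7, 7, 2, 1, 1, 3, 1, 9, 6, 6]
```

With the layout fixed this way, passing fixed instead of inp to engine.fit should hand libmf the triples in the expected order, assuming the wrapper reads the buffer row by row as described.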
Hey David - I know this was a while ago but would you consider submitting a PR that adds this check?
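For reference, a check along those lines could look roughly like the sketch below. The helper name as_row_major_triples is hypothetical and not part of the wrapper. It only assumes that the input is meant to be a two-dimensional array with three columns (user, item, rating), as in the examples in this thread.

```python
import numpy as np

def as_row_major_triples(data):
    """Hypothetical helper: validate the shape and return a row-major array.

    Assumes the expected input is an (n, 3) array of
    (user, item, rating) rows, as in the examples above.
    """
    arr = np.asarray(data)
    if arr.ndim != 2 or arr.shape[1] != 3:
        raise ValueError(
            "expected a 2-D array with 3 columns, got shape %r" % (arr.shape,)
        )
    # Returns the input unchanged if it is already C-contiguous,
    # otherwise makes a row-major copy.
    return np.ascontiguousarray(arr)

# Usage: a transposed (column-major) input gets converted.
transposed = np.array([[3, 7], [6, 2], [7, 1]]).T  # shape (2, 3)
safe = as_row_major_triples(transposed)
print(safe.flags['C_CONTIGUOUS'])  # True
```

Converting rather than raising keeps existing callers working. A stricter variant could raise an error on non-contiguous input instead, which avoids a silent copy on large datasets.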