python-libmf

Incorrect results and segmentation faults

Open: david-cortes opened this issue 5 years ago (2 comments)

I keep getting implausible results when fitting a model, regardless of the data or parameters. I'm not sure why, but I suspect either the initialization of the factor matrices or some input being passed in the wrong order to the underlying libmf library.

I also get segmentation faults when the ratings contain negative values, which makes me think some input might be passed to the wrong argument in the call to the C++ code.

For example, if I run the following code:

```python
import numpy as np
from libmf import mf

# Each row of the transposed array is a (row index, column index, rating) triple
inp = np.array([
    [3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0],
    [6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1],
    [7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, 9, 5, 5]
]).T

engine = mf.MF()
engine.fit(inp)
```
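To see what memory layout the wrapper actually receives here, one can inspect the array's flags. A minimal NumPy-only check: the nested-list constructor produces a C-ordered (row-major) `(3, 16)` array, and `.T` returns a `(16, 3)` view of the same buffer with swapped strides, which NumPy reports as Fortran-ordered (column-major).

```python
import numpy as np

inp = np.array([
    [3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0],
    [6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1],
    [7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, 9, 5, 5],
]).T

# .T does not copy: it reuses the (3, 16) C-ordered buffer with swapped strides
print(inp.shape)                   # (16, 3)
print(inp.flags["C_CONTIGUOUS"])   # False
print(inp.flags["F_CONTIGUOUS"])   # True
```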

I get this log:

```
iter      tr_rmse          obj
   0       5.0551   4.1289e+02
   1       4.4090   3.1610e+02
   2       3.7154   2.2675e+02
   3       3.1115   1.6138e+02
   4       2.6273   1.1733e+02
   5       2.2683   8.9466e+01
   6       2.0092   7.1917e+01
   7       1.8223   6.0572e+01
   8       1.6926   5.3367e+01
   9       1.6013   4.8609e+01
  10       1.5395   4.5542e+01
  11       1.4966   4.3483e+01
  12       1.4670   4.2098e+01
  13       1.4467   4.1173e+01
  14       1.4298   4.0405e+01
  15       1.4182   3.9881e+01
  16       1.4098   3.9509e+01
  17       1.4028   3.9204e+01
  18       1.3959   3.8900e+01
  19       1.3912   3.8690e+01
```

If I change the input to contain negative values:

```python
# Same data as above, but with two negative ratings
inp = np.array([
    [3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0],
    [6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1],
    [-7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, -9, 5, 5]
]).T
engine.fit(inp)
```

Then I get a segmentation fault instead.

When I run the same thing in recosystem for R:

```r
library(recosystem)
inp = data_memory(as.integer(c(3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0)),
                  as.integer(c(6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1)),
                  rating = c(7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, 9, 5, 5),
                  index1 = FALSE)
r = Reco()
r$train(inp, opts = list(dim = 8))
```

Then the log looks a lot better:

```
iter      tr_rmse          obj
   0       4.5503   3.3641e+02
   1       3.8732   2.4760e+02
   2       3.1535   1.6900e+02
   3       2.5291   1.1421e+02
   4       2.0629   8.1257e+01
   5       1.7487   6.3139e+01
   6       1.5302   5.2326e+01
   7       1.3728   4.5442e+01
   8       1.2615   4.1277e+01
   9       1.1749   3.8192e+01
  10       1.1056   3.5838e+01
  11       1.0540   3.4197e+01
  12       1.0115   3.2985e+01
  13       0.9736   3.1885e+01
  14       0.9378   3.0881e+01
  15       0.9127   3.0278e+01
  16       0.8730   2.9144e+01
  17       0.8518   2.8633e+01
  18       0.8268   2.8056e+01
  19       0.8002   2.7445e+01
```

With real data, such as MovieLens 10M, the log looks completely off, with a training RMSE that never drops below 11,000:

```
iter      tr_rmse          obj
   0   14563.5307   1.6968e+15
   1   13515.4114   1.4613e+15
   2   13296.2147   1.4143e+15
   3   13173.7712   1.3884e+15
   4   13069.1206   1.3664e+15
   5   12969.8403   1.3457e+15
   6   12875.6878   1.3263e+15
   7   12785.1806   1.3077e+15
   8   12687.9232   1.2879e+15
   9   12581.6009   1.2664e+15
  10   12486.9251   1.2474e+15
  11   12393.4994   1.2288e+15
  12   12313.2395   1.2129e+15
  13   12224.9060   1.1956e+15
  14   12148.5366   1.1807e+15
  15   12082.6648   1.1679e+15
  16   12016.2389   1.1551e+15
  17   11953.0185   1.1430e+15
  18   11889.6046   1.1309e+15
  19   11823.5130   1.1184e+15
```

Compared to recosystem:

```
iter      tr_rmse          obj
   0       0.9644   1.3232e+07
   1       0.8875   1.2022e+07
   2       0.8569   1.1675e+07
   3       0.8431   1.1550e+07
   4       0.8328   1.1463e+07
   5       0.8259   1.1415e+07
   6       0.8210   1.1376e+07
   7       0.8167   1.1348e+07
   8       0.8133   1.1327e+07
   9       0.8114   1.1317e+07
  10       0.8090   1.1299e+07
  11       0.8075   1.1294e+07
  12       0.8056   1.1281e+07
  13       0.8039   1.1270e+07
  14       0.8027   1.1266e+07
  15       0.8014   1.1259e+07
  16       0.8001   1.1253e+07
  17       0.7990   1.1246e+07
  18       0.7981   1.1246e+07
  19       0.7971   1.1236e+07
```

david-cortes, Oct 31 '20

I think I found the issue: the wrapper assumes the input array is in row-major (C) order but never verifies this, so a column-major (Fortran-ordered) array, such as the transposed `inp` in the examples above, gets passed to libmf with its values misaligned.
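A small NumPy-only sketch of what this diagnosis implies, assuming (as the comment suggests) that the wrapper hands libmf the array's underlying buffer and reads it as consecutive (row, col, value) triples. Read that way, the negative-rating example gets scrambled: the first "triple" is built entirely from row indices, and one triple ends up with a negative rating in its row-index slot, which would plausibly explain the segmentation fault.

```python
import numpy as np

# The negative-rating example from above: columns are (row, col, rating)
inp = np.array([
    [3, 7, 1, 9, 6, 7, 3, 4, 9, 3, 4, 2, 6, 0, 8, 0],
    [6, 2, 3, 6, 4, 4, 5, 7, 3, 7, 3, 6, 3, 4, 5, 1],
    [-7, 1, 1, 6, 1, 1, 6, 4, 2, 9, 7, 7, 2, -9, 5, 5],
]).T

# Elements in the order they sit in memory for this Fortran-ordered array:
# all row indices, then all column indices, then all ratings
raw = inp.ravel(order="F")

# Reading that buffer as if it were row-major (n, 3) triples
misread = raw.reshape(-1, 3)
print(misread[0].tolist())   # [3, 7, 1]  (intended: [3, 6, -7])
print(misread[15].tolist())  # [-9, 5, 5] (a rating used as a row index)

# Converting to row-major order first restores the intended triples
fixed = np.ascontiguousarray(inp)
print(fixed[0].tolist())     # [3, 6, -7]
```

If the diagnosis is right, calling `engine.fit(np.ascontiguousarray(inp))` should sidestep the problem until the wrapper checks the layout itself.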

david-cortes, Nov 01 '20

Hey David, I know this was a while ago, but would you consider submitting a PR that adds this check?
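One possible shape for the requested check, as a minimal sketch only: `ensure_row_major_triples` is a hypothetical helper (not part of python-libmf) that the wrapper's `fit` could call before passing the array on. It assumes the input is an (n, 3) array of (row, col, value) triples, as in the examples in this issue. The non-negative index check is an extra assumption beyond what was asked, aimed at the segfault.

```python
import numpy as np


def ensure_row_major_triples(X):
    """Hypothetical helper: validate an (n, 3) array of (row, col, value)
    triples and return it in C-contiguous (row-major) order."""
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != 3:
        raise ValueError(f"expected an (n, 3) array, got shape {X.shape}")
    # Assumption: negative row/column indices could become out-of-bounds
    # offsets on the C++ side, so reject them before calling into libmf.
    if (X[:, :2] < 0).any():
        raise ValueError("row and column indices must be non-negative")
    # Copies only when needed, e.g. for Fortran-ordered views produced by .T
    return np.ascontiguousarray(X)
```

Raising an error on column-major input would also be a valid design; silently converting with `np.ascontiguousarray` keeps calls like `engine.fit(inp)` above working at the cost of one copy.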

PorkShoulderHolder, Aug 07 '22