Can deepFM use a sparse data format?
I tried running deepFM.py on the sparse dataset a8a.train, whose format is "label index:value index:value ...". In S1_4.txt, a feature whose value is 0 still appears in the feature line, but in a8a.train it is omitted. When I run python deepFM.py, I get "Input to reshape is a tensor with 5528 values, but the requested shape requires a multiple of 672". Does the code not support this format?
hi sddi, deepFM reads sparse data as input, but note that each instance must have exactly the same number of features, which is the "field number" in the paper. So if a field is empty or missing, you should append a fake zero value for it.
hi @Leavingseason, thank you for answering. I have millions of features; if I append a fake zero value for each of them, the input file may become very large. Could you update the code to support libsvm-style input (index:value, where a feature whose value is zero is omitted from the input file)?
hi sddi, how many feature fields (not the number of features) do you have? Actually, we do not require a zero value to be appended for every feature; instead, for each field that has no feature under it, a fake zero value is appended. The deepFM model uses field-wise dense embeddings as the input to the deep neural network, so the number of fields cannot be too large.
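A minimal sketch of the per-field padding described above, not the repository's actual preprocessing. It assumes each field owns a contiguous index range and that the fake feature reuses the first index of that range with value 0; the function name `pad_missing_fields` and the `field_ranges` layout are hypothetical.

```python
def pad_missing_fields(line, field_ranges):
    """Return a line with exactly one index:value entry per field.

    field_ranges is a list of inclusive (low, high) index ranges, one per
    field. This layout is an assumption made for illustration only.
    """
    parts = line.split()
    label, tokens = parts[0], parts[1:]
    chosen = {}  # field position -> (index, value string)
    for token in tokens:
        idx_str, value = token.split(":")
        idx = int(idx_str)
        # Find the field whose range contains this index.
        for field, (low, high) in enumerate(field_ranges):
            if low <= idx <= high:
                break
        else:
            raise ValueError(f"index {idx} is outside every field range")
        # Each field may hold at most one feature in this input format.
        if field in chosen:
            raise ValueError(f"field {field} has more than one feature")
        chosen[field] = (idx, value)
    out = [label]
    for field, (low, _high) in enumerate(field_ranges):
        # Assumption: an empty field gets a placeholder at its first index
        # with value 0.
        idx, value = chosen.get(field, (low, "0"))
        out.append(f"{idx}:{value}")
    return " ".join(out)
```

With two hypothetical fields covering indices 0 to 100 and 101 to 1000, `pad_missing_fields("1 36:1", [(0, 100), (101, 1000)])` yields a line with two entries, so the file grows with the number of fields rather than the number of features.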
Oh, @Leavingseason, I see. For instance, I have two feature fields: userID features (indices 0 to 100) and itemID features (indices 101 to 1000). For one sample, the line in the input file might be "1 36:1 108:1 123:1 365:1". Is that OK?
That's partially right. Currently my code supports at most one feature per field, which follows the original paper's framework. So for itemID features, you can keep only one itemID. I understand your concern: in the real world, multiple features under one field are very common. We have a version of the code that handles this case; it leverages sparse embedding lookup (https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup_sparse), and the input format becomes fieldID:featureID:value. We will consider releasing this version.
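A rough sketch of how a fieldID:featureID:value line could be read and grouped per field, assuming whitespace-separated tokens after the label. It is not the unreleased code mentioned above. The helper names `parse_field_line` and `sparse_components` are hypothetical, and feature IDs are used exactly as written in the file. The returned pieces (indices, ids, weights, dense shape) are the kind of components a sparse tensor is built from before a sparse embedding lookup.

```python
from collections import defaultdict


def parse_field_line(line):
    """Parse 'label fieldID:featureID:value ...' into (label, fields).

    fields maps each field ID to a list of (feature_id, value) pairs, so a
    field may carry several features.
    """
    parts = line.split()
    label = float(parts[0])
    fields = defaultdict(list)
    for token in parts[1:]:
        field_str, feat_str, val_str = token.split(":")
        fields[int(field_str)].append((int(feat_str), float(val_str)))
    return label, dict(fields)


def sparse_components(batch, field):
    """Collect COO-style pieces for one field across parsed instances."""
    indices, ids, weights = [], [], []
    max_len = 0
    for row, (_label, fields) in enumerate(batch):
        feats = fields.get(field, [])  # a missing field contributes nothing
        max_len = max(max_len, len(feats))
        for col, (feat_id, value) in enumerate(feats):
            indices.append((row, col))  # (instance, position within field)
            ids.append(feat_id)
            weights.append(value)
    dense_shape = (len(batch), max_len)
    return indices, ids, weights, dense_shape
```

For example, parsing the two lines "0 1:1:1 1:2:1 1:3:1 2:4:1" and "1 1:2:1" and calling `sparse_components(batch, 1)` gives four entries for field 1: three from the first instance and one from the second.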
OK, thank you very much! I am waiting for your new version~~~:D
Has the version that supports "multiple features under one field" been released? Thanks
Not yet. All right, since some people are interested in this version, I will release preview code, which is very ugly right now. I will try to find some time in the next two days (sadly, the KDD deadline is near...)
Done.
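@Leavingseason hello, I have a question about the format. In fieldID:featureID:value, if fieldID==1 has 3 featureIDs and fieldID==2 has 2 featureIDs, do the featureID values for fieldID==2 need to continue from the featureIDs of fieldID==1? For example:
0 1:1:1 1:2:1 1:3:1 2:1:1 # here the featureID for fieldID==2 can restart its numbering
0 1:1:1 1:2:1 1:3:1 2:4:1 # here the featureID for fieldID==2 cannot restart its numbering; it must continue from the IDs of fieldID==1
Thanks!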