Use libsvm in hadoop

Open rendi7936 opened this issue 9 years ago • 4 comments

Hello, everyone. I have a question: can I use libsvm with Apache Hadoop? Does it work with Hadoop's MapReduce programming model?

rendi7936 avatar Oct 19 '16 01:10 rendi7936

Why do you want to use libsvm on Hadoop? I think kernel SVM may be a poor fit for a MapReduce setting, because the current kernel SVM solver cannot handle very large data sets.

infwinston avatar Oct 19 '16 04:10 infwinston

I want to do a performance analysis of Hadoop and Spark using the SVM algorithm.

If I only use a dataset smaller than 1 GB, is that OK? I have read many papers saying that SVM can be implemented on Hadoop, but none of them explain which library they use. So I started with LibSVM.

So what should I do? Or is there another SVM library that supports the MapReduce programming model?

rendi7936 avatar Oct 19 '16 22:10 rendi7936

Spark ML contains an implementation of linear SVMs, similar to, but not as comprehensive as, those in LIBLINEAR. As @infwinston mentioned, SVMs with kernels, which is what LibSVM is for, are not really suited to Hadoop and Spark: they don't scale well to large datasets, and large datasets are the reason you would use Hadoop/Spark in the first place. If your dataset is not large, just use LibSVM directly.

GerbenKD avatar Oct 20 '16 11:10 GerbenKD

I think you may want to check out the LIBLINEAR webpage and GitHub page. In some cases, linear SVMs give good enough performance and are much faster than kernel SVMs.

infwinston avatar Oct 20 '16 15:10 infwinston
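
To make the scaling argument in this thread a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from LibSVM or LIBLINEAR; the function names are made up for illustration. It assumes a dense kernel matrix of 8-byte floats, which is an upper bound: LibSVM keeps a kernel cache rather than materializing the whole matrix, but the number of kernel entries a solver may need still grows with the square of the number of samples. A linear model, by contrast, only needs one weight per feature.

```python
def kernel_matrix_bytes(n_samples: int, bytes_per_entry: int = 8) -> int:
    """Bytes for a dense n x n kernel matrix (assumption: 8-byte floats)."""
    return n_samples * n_samples * bytes_per_entry


def linear_model_bytes(n_features: int, bytes_per_entry: int = 8) -> int:
    """Bytes for the weight vector of a linear SVM (one weight per feature)."""
    return n_features * bytes_per_entry


def human(num_bytes: int) -> str:
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {units[-1]}"


# Hypothetical dataset sizes, chosen only to show the quadratic growth.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} samples: full kernel matrix ~ {human(kernel_matrix_bytes(n))}")

# Hypothetical feature count for comparison with a linear model.
print(f"linear model with 1000 features ~ {human(linear_model_bytes(1000))}")
```

Under these assumptions, going from 10,000 to 100,000 samples multiplies the kernel matrix size by 100, while the linear model's size does not depend on the number of samples at all. This is one way to see why a dataset under 1 GB can usually go straight into LibSVM, while truly large datasets push you toward linear solvers.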