Quick model: lasso and liblinear
I see that the model for lasso is sklearn.LogisticRegression with l1 penalty and liblinear solver, and that for liblinear it is sklearn.LogisticRegression with l2 penalty and lbfgs solver.
What I suggest:
- rename "Liblinear" to "Logistic", and give it the lbfgs solver (because of this: "liblinear can only handle binary classification by default" -> this shows in our current "Lasso", it never predicts the third class or above!)
- change lasso function to sklearn.Lasso: it's the same objective as l1-penalty logistic, but a different solver. We can keep the C parameter for consistency with Logistic, and set alpha = 1 / (2C)
on it
is that what you wanted?
Exactly!
Right, using Lasso is just a pain because it does not have the same attributes as the other models.
What did you want to change for the lasso function exactly?
In the meantime, I've left LogisticRegression with the "saga" solver, which works with "L1" and multiclass.
The Lasso algorithm is different, more efficient in high-dim cases. And if we advertise Lasso in the UI, it's better to call an actual lasso
Ok, actually in sklearn Lasso is only for (numeric) regression, not classification...
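For reference, a sketch of why the two models are not interchangeable, using the objectives as I understand them from the scikit-learn documentation (binary case with labels $y_i \in \{-1, 1\}$ for the logistic one; the notation is an assumption of this sketch, not something from our code). Both share the L1 penalty, but the loss term differs: squared error for sklearn.Lasso, log-loss for the l1-penalized LogisticRegression.

```latex
% sklearn.Lasso: squared-error loss, continuous target y
\min_{w} \; \frac{1}{2 n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1

% sklearn.LogisticRegression(penalty="l1"): log-loss, class labels y_i in {-1, 1}
\min_{w, c} \; \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\!\left( \exp\!\left( -y_i \left( x_i^\top w + c \right) \right) + 1 \right)
```

So the sparsity-inducing penalty is the same, but only the second objective is a classifier.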
So, new suggestion:
- replace Lasso name by Linear-L1, and in backend LogisticRegression, l1 and lbfgs
- replace Logistic name by Linear-L2
- default to Linear-L1? (better for DFM, not sure for SBERT...)
right, in that case, we will need to adapt the code to take this specificity into account.
My two cents from a user perspective: if regularization is applied, explicitly mentioning it in the name (whether L1/L2 or Lasso/Ridge) does indeed seem necessary. Most sociologists/political scientists seeing “logistic” without any clarification will assume it’s the standard, unpenalized version. Source: I’m ashamed of how long it took me to understand why sklearn logistic regressions were giving me “weird” results…
commit 94ba500 had nothing to do with this issue, I reverted it
> - replace Lasso name by Linear-L1, and in backend LogisticRegression, l1 and lbfgs
I get the following error: "Error in train_quickmodel : Solver lbfgs supports only 'l2' or None penalties, got l1 penalty." I set the solver to saga because it works; if you want something else, feel free to let me know. You know more than I do!
I leave this issue open, close it if you're happy with the solution
Indeed, saga seems to be the only viable option for multiclass with L1.
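To make the solver/penalty constraint explicit, here is a minimal sketch of how the two model names could map to LogisticRegression keyword arguments. The names `SOLVER_PENALTIES`, `MULTINOMIAL_SOLVERS`, `QUICKMODELS` and `logistic_kwargs` are hypothetical (not from our codebase), and the compatibility table only covers the three solvers discussed here, as I read them in the scikit-learn docs.

```python
# Penalties accepted by each solver (only the solvers discussed in this thread).
SOLVER_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "saga": {"l1", "l2", "elasticnet", None},
}

# Solvers that fit a true multinomial model; liblinear is limited to one-vs-rest.
MULTINOMIAL_SOLVERS = {"lbfgs", "saga"}

# Hypothetical UI names mapped to LogisticRegression keyword arguments.
QUICKMODELS = {
    "Linear-L1": {"penalty": "l1", "solver": "saga"},
    "Linear-L2": {"penalty": "l2", "solver": "lbfgs"},
}


def logistic_kwargs(name, C=1.0):
    """Return keyword arguments for LogisticRegression, checking compatibility first."""
    params = dict(QUICKMODELS[name], C=C)
    solver, penalty = params["solver"], params["penalty"]
    if penalty not in SOLVER_PENALTIES[solver]:
        raise ValueError(f"Solver {solver} does not support penalty {penalty!r}")
    if solver not in MULTINOMIAL_SOLVERS:
        raise ValueError(f"Solver {solver} cannot fit a multinomial model")
    return params


l1_params = logistic_kwargs("Linear-L1")
l2_params = logistic_kwargs("Linear-L2", C=0.5)
```

The resulting dict could then be passed as `LogisticRegression(**l1_params)`. Checking the combination up front would surface the lbfgs + l1 mismatch before training starts rather than inside train_quickmodel.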
While we're at it, I see that sklearn.LogisticRegression also takes a "class_weight" argument to handle unbalanced classes, which is likely to be a frequent use case. Could we add a checkbox "balance classes" in the hidden options, checked by default, that sets "class_weight='balanced'"?
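For context, a small sketch of what "balanced" would do, assuming the formula given in the scikit-learn docs, n_samples / (n_classes * count of each class). The function names `balanced_class_weights` and `class_weight_option` and the toy labels are made up for illustration; the real weighting happens inside sklearn.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Reproduce the 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def class_weight_option(balance_classes=True):
    """Hypothetical mapping from the checkbox state to the class_weight argument."""
    return "balanced" if balance_classes else None


# Toy unbalanced labels: 6 of class "a", 3 of "b", 1 of "c".
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
weights = balanced_class_weights(labels)
```

Rare classes get larger weights, so each class contributes the same total weight to the loss regardless of how many annotated examples it has.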
Yeah, can you open a new issue? I’ll take care of it on Monday.