
Quick model: lasso and liblinear

Open jboelaert opened this issue 5 months ago • 13 comments

I see that the model for lasso is sklearn.LogisticRegression with l1 penalty and liblinear solver, and that for liblinear it is sklearn.LogisticRegression with l2 penalty and lbfgs solver.

What I suggest:

  • rename "Liblinear" to "Logistic", and give it the lbfgs solver (because of this: "liblinear can only handle binary classification by default" -> this shows in our current "Lasso", it never predicts the third class or above!)
  • change lasso function to sklearn.Lasso: it's the same objective as l1-penalty logistic, but a different solver. We can keep the C parameter for consistency with Logistic, and set alpha = 1 / (2C)

jboelaert avatar Nov 21 '25 11:11 jboelaert

on it

AXLMRIN avatar Nov 21 '25 11:11 AXLMRIN

Image

is that what you wanted?

AXLMRIN avatar Nov 21 '25 11:11 AXLMRIN

Exactly!

jboelaert avatar Nov 21 '25 11:11 jboelaert

Right, using Lasso is just a pain because it does not have the same attributes as the other models.

What did you want to change for the lasso function exactly?

AXLMRIN avatar Nov 21 '25 12:11 AXLMRIN

Image

In the meantime, I've left LogisticRegression with the "saga" solver, which works with the "L1" penalty and multiclass.

AXLMRIN avatar Nov 21 '25 12:11 AXLMRIN

The Lasso algorithm is different, and more efficient in high-dimensional cases. And if we advertise Lasso in the UI, it's better to call an actual lasso.

jboelaert avatar Nov 21 '25 12:11 jboelaert

Ok, actually in sklearn Lasso is only for (numeric) regression, not classification...

So, new suggestion:

  • replace Lasso name by Linear-L1, and in backend LogisticRegression, l1 and lbfgs
  • replace Logistic name by Linear-L2
  • default to Linear-L1? (better for DFM, not sure for SBERT...)

jboelaert avatar Nov 21 '25 13:11 jboelaert

right, in that case, we will need to adapt the code to take this specificity into account.

AXLMRIN avatar Nov 21 '25 13:11 AXLMRIN

My two cents from a user perspective: if regularization is applied, explicitly mentioning it in the name (whether L1/L2 or Lasso/Ridge) does indeed seem necessary. Most sociologists/political scientists seeing “logistic” without any clarification will assume it’s the standard, unpenalized version. Source: I’m ashamed by how long it took me to understand why sklearn logistic regressions were giving me “weird” results…

leomignot avatar Nov 21 '25 17:11 leomignot

Commit 94ba500 had nothing to do with this issue, so I reverted it.

AXLMRIN avatar Nov 28 '25 10:11 AXLMRIN

  • replace Lasso name by Linear-L1, and in backend LogisticRegression, l1 and lbfgs

I get the following error: "Error in train_quickmodel : Solver lbfgs supports only 'l2' or None penalties, got l1 penalty." I set the solver to saga because it works; if you want something else, feel free to let me know. You know more than I do!

I'll leave this issue open; close it if you're happy with the solution.

AXLMRIN avatar Nov 28 '25 10:11 AXLMRIN

Indeed, saga seems to be the only viable option for multiclass with L1.
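A minimal sketch of why saga is the one left standing, assuming only the three solvers discussed in this thread. The table copies the penalty support listed in the scikit-learn documentation for LogisticRegression; the names SOLVER_PENALTIES, BINARY_ONLY and compatible_solvers are hypothetical helpers for illustration, not part of sklearn or activetigger.

```python
# Penalties accepted by each solver, restricted to the solvers mentioned
# in this thread (None means "no penalty"). Illustrative table only.
SOLVER_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "saga": {"elasticnet", "l1", "l2", None},
}

# Solvers that cannot fit a multinomial model on 3+ classes directly.
BINARY_ONLY = {"liblinear"}


def compatible_solvers(penalty, n_classes):
    """Return the listed solvers usable for this penalty and class count."""
    return sorted(
        name
        for name, penalties in SOLVER_PENALTIES.items()
        if penalty in penalties
        and (n_classes <= 2 or name not in BINARY_ONLY)
    )


print(compatible_solvers("l1", 3))  # ['saga']
print(compatible_solvers("l2", 3))  # ['lbfgs', 'saga']
print(compatible_solvers("l1", 2))  # ['liblinear', 'saga']
```

With three or more classes and an L1 penalty, lbfgs is excluded by the penalty and liblinear by the class count, which matches the error reported above.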

While we're at it, I see that sklearn.LogisticRegression also takes a "class_weight" argument to handle unbalanced classes, which is likely to be a frequent use case. Could we add a checkbox "balance classes" in the hidden options, checked by default, that sets "class_weight='balanced'"?
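For reference, a rough sketch of what the "balanced" setting computes, using the formula given in the scikit-learn documentation: n_samples / (n_classes * count of each class). The function name balanced_class_weights and the toy labels are made up for illustration; in practice, passing class_weight="balanced" to LogisticRegression would be enough.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * number of samples in class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Toy, made-up annotation set: 6 "a", 3 "b", 1 "c".
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(cls, round(weight, 3))
```

The rarest class gets the largest weight, so its errors count more in the loss.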

jboelaert avatar Nov 28 '25 14:11 jboelaert

Yeah, can you open a new issue? I'll take care of it on Monday.

AXLMRIN avatar Nov 28 '25 14:11 AXLMRIN