Clarification on `Logistic` (and possibly rename?)

Open ParadaCarleton opened this issue 4 years ago • 16 comments

> For likelihoods, TuringGLM.jl supports:
>
> - `Gaussian()` (the default if not specified): linear regression
> - `Student()`: robust linear regression
> - **`Logistic()`: logistic regression**
> - `Pois()`: Poisson count data regression
> - `NegBin()`: negative binomial robust count data regression

I'm marking this because it seems to imply two different things, and I'm not sure which one is meant. "Logistic regression" almost always means regression with a Binomial likelihood and a logit link (logistic inverse link). However, the Logistic distribution also exists and can be used for robust linear regression. (It has slightly heavier tails than a normal distribution, but unlike the t distribution's, they decay exponentially, making it a good compromise between efficiency and robustness.) If these names are supposed to refer to likelihoods, `Logistic` is inappropriate and could cause misunderstandings.

Perhaps we should make a clear distinction between a likelihood and the link function associated with it? There's no reason you can't use a logistic link with a normal likelihood, for example.
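To make the two readings concrete, here is a minimal pure-Julia sketch (no packages needed). All function names are illustrative and are not part of TuringGLM.jl or Distributions.jl. The first reading pairs a Bernoulli likelihood with a logit link; the second uses the Logistic distribution itself as a likelihood for a continuous outcome.

```julia
# Illustrative names only; none of these come from TuringGLM.jl.

# Logistic (inverse-logit) function: maps a linear predictor to a probability.
logistic(η) = 1 / (1 + exp(-η))

# Reading 1: "logistic regression" = Bernoulli likelihood + logit link.
# Log-probability of a binary outcome y ∈ {0, 1} given linear predictor η.
bernoulli_logit_logpdf(y, η) = y == 1 ? log(logistic(η)) : log(1 - logistic(η))

# Reading 2: the Logistic *distribution* as a likelihood for continuous y.
# Density: exp(-z) / (s * (1 + exp(-z))^2), with z = (y - μ) / s.
function logistic_dist_logpdf(y, μ, s)
    z = (y - μ) / s
    return -z - log(s) - 2 * log1p(exp(-z))
end

# Normal log-density, for comparing tail behaviour.
normal_logpdf(y, μ, σ) = -0.5 * ((y - μ) / σ)^2 - log(σ) - 0.5 * log(2π)

# A Logistic distribution with scale s has variance s^2 * π^2 / 3,
# so s = sqrt(3) / π gives unit variance, comparable to Normal(0, 1).
s_unit = sqrt(3) / π
tail_logistic = logistic_dist_logpdf(5.0, 0.0, s_unit)
tail_normal = normal_logpdf(5.0, 0.0, 1.0)
```

At equal variance, `tail_logistic` comes out larger than `tail_normal`, which is the heavier-tail behaviour that makes the Logistic likelihood a candidate for robust regression.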

ParadaCarleton, Feb 14 '22 16:02

Yes, this is a hack. I tried using a Bernoulli, but could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.

If you can find a way around this, I would love to know.

storopoli, Feb 14 '22 16:02

> Yes, this is a hack. I tried using a Bernoulli, but could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.
>
> If you can find a way around this, I would love to know.

We could give them all names like `BernoulliLike`, `TDistLike`, etc., to keep recognizable names without clashing with Distributions. That said, I would like a way to use an arbitrary likelihood.

Perhaps we could use the type itself, e.g. `turing_model(@formula(y~x), Bernoulli)`?

ParadaCarleton, Feb 14 '22 18:02

The `*Like` naming is a viable option. I tried using the type itself and hit a nasty bug.

storopoli, Feb 14 '22 20:02

> The `*Like` naming is a viable option. I tried using the type itself and hit a nasty bug.

Hmm, what happened? Maybe @devmotion will have some idea (since he's more familiar with Distributions.jl)?

ParadaCarleton, Feb 14 '22 20:02

> Yes, this is a hack. I tried using a Bernoulli, but could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.

Which name clashes exactly? These packages should play together nicely since they are designed to do so. Reexporting should also not introduce any name clashes.

devmotion, Feb 14 '22 21:02

> > Yes, this is a hack. I tried using a Bernoulli, but could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.
>
> Which name clashes exactly? These packages should play together nicely since they are designed to do so. Reexporting should also not introduce any name clashes.

I assume he just means we can't write, say, `turing_model(@formula(y~x), Bernoulli())` with a new struct `Bernoulli`, since that name's already taken by Distributions.jl.

ParadaCarleton, Feb 14 '22 21:02

Is a new struct needed? Could you just use Distributions.Bernoulli? Or the link functions in GLM?

devmotion, Feb 14 '22 21:02

> Is a new struct needed? Could you just use Distributions.Bernoulli? Or the link functions in GLM?

I would expect you could use Distributions.Bernoulli by passing the type itself, but @storopoli said that caused bugs?

And using Bernoulli(p) directly doesn’t work, since the parameter is exactly what we want to estimate.
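The difference between passing a type and passing an instance can be sketched in plain Julia. The structs and function names below are hypothetical stand-ins so the example runs without Distributions.jl; they are not TuringGLM.jl's API. With the real package, the method signature would probably need `Type{<:Bernoulli}`, since `Distributions.Bernoulli` is a parametric type.

```julia
# Hypothetical stand-ins for distribution types (not Distributions.jl).
abstract type LikelihoodSketch end

struct BernoulliSketch <: LikelihoodSketch
    p::Float64
end

struct PoissonSketch <: LikelihoodSketch
    λ::Float64
end

# Dispatch on the *type*: no parameter value is needed to pick a model.
default_invlink(::Type{BernoulliSketch}) = η -> 1 / (1 + exp(-η))  # inverse logit
default_invlink(::Type{PoissonSketch}) = exp                       # inverse log link

# An instance is built only once a linear predictor η is available,
# which is the quantity the regression estimates.
build_likelihood(::Type{D}, η) where {D<:LikelihoodSketch} = D(default_invlink(D)(η))

# Usage: the caller names the family, never a parameter value.
b = build_likelihood(BernoulliSketch, 0.0)
c = build_likelihood(PoissonSketch, 0.0)
```

The caller chooses only the family; the parameter comes from the linear predictor inside the model, which is why an instance such as `Bernoulli(p)` is the wrong thing to pass.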

ParadaCarleton, Feb 14 '22 22:02

You might want to take a look at the `fit` function in Distributions.jl, which works with distribution types directly: https://github.com/JuliaStats/Distributions.jl/blob/71f1b1e39ad2b66b4865b5e1fd537315c8a53ae8/src/genericfit.jl#L8-L15

ParadaCarleton, Feb 15 '22 03:02

> With a new struct Bernoulli, since that name's already taken by Distributions.jl.

Yes, that was the issue.

storopoli, Feb 15 '22 09:02

Yeah, but why can't you use the type `Distributions.Bernoulli` instead of an instance of it? That's more natural, as @ParadaCarleton also said above.

devmotion, Feb 15 '22 09:02

It should also be more generalizable: it would be super useful if you could pass an arbitrary likelihood from Distributions.jl.

ParadaCarleton, Feb 15 '22 16:02

@storopoli does TuringGLM currently work by writing out the most common GLMs one at a time? In theory, you should be able to work with any likelihood, including ones specified by the user, by converting anything of the form y ~ x (with x = [x_1, x_2,...] a vector of features) into:

β ~ Prior()
y .~ Likelihood(InvLink(β ⋅ x))

Showing off how a general approach can work with an unusual likelihood, e.g. a Gumbel for predicting extreme values, would be very cool!
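As a rough illustration of the generic form above, here is a pure-Julia sketch of just the likelihood line, with priors and sampling left out. The names `glm_loglik` and `gumbel_logpdf` are made up for this example, and the Gumbel scale is assumed fixed and known for simplicity.

```julia
using LinearAlgebra: dot  # standard library

# Generic GLM log-likelihood: sums logpdf(Likelihood(invlink(β ⋅ x_i)), y_i)
# over observations. `logpdf_at(y, μ)` stands in for the likelihood and
# `invlink` for the inverse link; both are ordinary functions.
function glm_loglik(logpdf_at, invlink, β, X, y)
    total = 0.0
    for i in eachindex(y)
        η = dot(β, X[i, :])              # linear predictor β ⋅ x_i
        total += logpdf_at(y[i], invlink(η))
    end
    return total
end

# Gumbel log-density with location μ and scale θ (θ assumed known here).
# Density: (1/θ) * exp(-(z + exp(-z))), with z = (y - μ) / θ.
function gumbel_logpdf(y, μ; θ = 1.0)
    z = (y - μ) / θ
    return -(z + exp(-z)) - log(θ)
end

# Usage: an intercept column plus one feature, identity link on the location.
X = [1.0 0.5; 1.0 -0.2; 1.0 1.3]
y = [0.9, 0.1, 2.0]
β = [0.2, 1.0]
ll = glm_loglik(gumbel_logpdf, identity, β, X, y)
```

Swapping in a different likelihood only means passing a different `logpdf_at` and `invlink`; nothing in `glm_loglik` is specific to the Gumbel case.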

ParadaCarleton, Feb 15 '22 19:02

The likelihood API would need a rewrite. I haven't touched anything InvLink related.

storopoli, Feb 16 '22 15:02

> The likelihood API would need a rewrite. I haven't touched anything InvLink related.

Sorry, can you clarify what you mean by this?

ParadaCarleton, Feb 28 '22 21:02

Check https://github.com/TuringLang/TuringGLM.jl/blob/main/src/model.jl. There is no multiple dispatch on any Distributions.jl type such as `Bernoulli`, or on anything like `InvLink`.

So the likelihood API would need a rewrite. I am currently focusing on some tutorials (the ones tagged `tutorials`), so any PR would be most welcome.

storopoli, Feb 28 '22 21:02

Now we use `model=Bernoulli`.

storopoli, Sep 05 '22 21:09