Machine-Learning-with-Python

Statistically significant function in regression model

Open: MatthiVH opened this issue 5 years ago • 1 comment

Hi,

I'm wondering what the yes_no function does in the following notebook: https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Regression/Regression_Diagnostics.ipynb

    def yes_no(b):
        if b:
            return 'Yes'
        else:
            return 'No'

Is it meant to decide whether a parameter is statistically significant for the model? What does b refer to, and what threshold does it use to decide that a parameter is not statistically significant?

I usually look at the p-values in the statsmodels OLS summary table and treat parameters whose p-values fall below 0.05 as significant. In this notebook something else seems to be happening, and I'm wondering if you could elaborate on it: What is b? How is it calculated? What is b's threshold? How can I change the threshold from 0.01 to 0.05? And when the p-value in the OLS table is above 0.05 but the yes_no function decides the parameter is significant, what should I do (leave the parameter out or not)?

Kind regards, Matthias

MatthiVH, Nov 13 '20 16:11

Strangely, I don't remember its context now, but I agree with your simple threshold-based approach. Would you like to create a PR that rewrites the function to take a threshold parameter?

tirthajyoti, Dec 08 '20 07:12
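
For reference, here is a minimal sketch of what the threshold-based rewrite discussed in this thread could look like. It is not the notebook's actual code: the `threshold` parameter, its default of 0.05, and the input check are assumptions made for illustration, and the p-values in the usage lines are made-up numbers, not results from the notebook.

```python
def yes_no(p_value, threshold=0.05):
    """Return 'Yes' if p_value is below threshold, otherwise 'No'.

    Hypothetical signature: the original yes_no(b) only maps a boolean
    to 'Yes'/'No'; here the comparison against a significance level is
    moved inside the function so the level can be changed by the caller.
    """
    # A p-value is a probability, so reject anything outside [0, 1]
    # (this also rejects NaN, because comparisons with NaN are False).
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"p_value must lie in [0, 1], got {p_value!r}")
    return 'Yes' if p_value < threshold else 'No'


# Illustrative p-values only (not taken from the notebook).
example_p_values = {'x1': 0.003, 'x2': 0.03, 'x3': 0.20}

for name, p in example_p_values.items():
    # Compare the stricter 0.01 level with the more common 0.05 level.
    print(name, yes_no(p, threshold=0.01), yes_no(p, threshold=0.05))
```

With statsmodels, the per-parameter p-values that appear in the OLS summary table are also available as the `pvalues` attribute of the fitted results object, so a function like this could be applied to each of them in turn. Whether that reproduces what the notebook originally intended is unclear, since the thread does not establish how `b` was computed.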