Regression and Other Stories - Tidyverse Examples

  • Examples by chapters
    • 1 Introduction
    • 2 Data and measurement
    • 3 Some basic methods in mathematics and probability
    • 4 Generative models and statistical inference
    • 5 Simulation
    • 6 Background on regression modeling
    • 7 Linear regression with a single predictor
    • 8 Fitting regression models
    • 9 Prediction and Bayesian inference
    • 10 Linear regression with multiple predictors
    • 11 Assumptions, diagnostics, and model evaluation
    • 12 Transformations
    • 13 Logistic regression
    • 14 Working with logistic regression
    • 15 Other generalized linear models
    • 16 Design and sample size decisions
    • 17 Poststratification and missing-data imputation
    • 18 Causal inference basics and randomized experiments
    • 19 Causal inference using regression on the treatment variable
    • 20 Observational studies with all confounders assumed to be measured
    • 21 More advanced topics in causal inference
    • 22 Advanced regression and multilevel models
    • A Computing in R
  • Examples alphabetically

This repository contains Tidyverse implementations of examples from Regression and Other Stories by Andrew Gelman, Jennifer Hill, and Aki Vehtari (2020).

Tidyverse version by Bill Behrman.


Examples by chapters

1 Introduction

  • ElectionsEconomy/
    • hibbs_tv.md - Predicting presidential vote share from the economy
  • ElectricCompany/
    • electric_tv.md - Analysis of “Electric Company” data
  • Peacekeeping/
    • peace_tv.md - Outcomes after civil war in countries with and without United Nations peacekeeping
  • SimpleCausal/
    • causal_tv.md - Simple graphs illustrating regression for causal inference
  • Helicopters/
    • helicopters_tv.md - Example data file for helicopter flying time exercise

2 Data and measurement

  • HDI/
    • hdi_tv.md - Human Development Index - Looking at data in different ways
  • Pew/
    • pew_tv.md - Miscellaneous analyses using raw Pew data
  • HealthExpenditure/
    • healthexpenditure_tv.md - Discovery through graphs of data and models
  • Names/
    • lastletters_tv.md - Last letters - Distributions of last letters of names of American babies
  • Congress/
    • congress_plots_tv.md - Predictive uncertainty for congressional elections
  • AgePeriodCohort/
    • births_tv.md - Age adjustment

3 Some basic methods in mathematics and probability

  • Mile/
    • mile_tv.md - Trend of record times in the mile run
  • Metabolic/
    • metabolic_tv.md - How to interpret a power law or log-log regression
  • CentralLimitTheorem/
    • heightweight_tv.md - Illustrate central limit theorem and normal distribution
  • Stents/
    • stents_tv.md - Stents - comparing distributions

4 Generative models and statistical inference

  • Coverage/
    • coverage_tv.md - Example of coverage
  • Death/
    • polls_tv.md - Proportion of American adults supporting the death penalty
  • Coop/
    • riverbay_tv.md - Example of hypothesis testing
  • Girls/

5 Simulation

  • ProbabilitySimulation/
    • probsim_tv.md - Simulation of probability models
  • Earnings/
    • earnings_bootstrap_tv.md - Bootstrapping to simulate the sampling distribution

6 Background on regression modeling

  • Simplest/
    • simplest_tv.md - Linear regression with a single predictor
  • Earnings/
    • earnings_regression_tv.md - Predict respondents’ yearly earnings using survey data from 1990
  • PearsonLee/
    • heights_tv.md - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
  • FakeMidtermFinal/
    • simulation_tv.md - Fake dataset of 1,000 students’ scores on a midterm and final exam

7 Linear regression with a single predictor

  • ElectionsEconomy/
    • hibbs_tv.md - Predicting presidential vote share from the economy
    • hibbs_coverage_tv.md - Checking the coverage of intervals
  • Simplest/
    • simplest_tv.md - Linear regression with a single predictor

8 Fitting regression models

  • ElectionsEconomy/
    • hills_tv.md - Present uncertainty in parameter estimates
    • hibbs_tv.md - Predicting presidential vote share from the economy
  • Influence/
    • influence_tv.md - Influence of individual points in a fitted regression

9 Prediction and Bayesian inference

  • ElectionsEconomy/
    • hibbs_tv.md - Predicting presidential vote share from the economy
    • bayes_tv.md - Demonstration of Bayesian information aggregation
  • Earnings/
    • height_and_weight_tv.md - Predict weight
  • SexRatio/
    • sexratio_tv.md - Example where an informative prior makes a difference

10 Linear regression with multiple predictors

  • KidIQ/
    • kidiq_tv.md - Linear regression with multiple predictors
  • Earnings/
    • height_and_weight_tv.md - Predict weight
  • Congress/
    • congress_tv.md - Predictive uncertainty for congressional elections
  • NES/
    • nes_linear_tv.md - Fitting the same regression to many datasets
  • Beauty/
    • beauty_tv.md - Student evaluations of instructors’ beauty and teaching quality

11 Assumptions, diagnostics, and model evaluation

  • KidIQ/
    • kidiq_tv.md - Linear regression with multiple predictors
  • Residuals/
    • residuals_tv.md - Plotting the data and fitted model
  • Introclass/
    • residual_plots_tv.md - Plot residuals vs. predicted values, or residuals vs. observed values?
  • Newcomb/
    • newcomb_tv.md - Posterior predictive checking of Normal model for Newcomb’s speed of light data
  • Unemployment/
    • unemployment_tv.md - Time series fit and posterior predictive model checking for unemployment series
  • Rsquared/
    • rsquared_tv.md - Bayesian R^2
  • CrossValidation/
    • crossvalidation_tv.md - Demonstration of cross validation
  • FakeKCV/
    • fake_kcv_tv.md - Demonstration of K-fold cross-validation using simulated data
  • Pyth/

12 Transformations

  • KidIQ/
    • kidiq_tv.md - Linear regression with multiple predictors
  • Earnings/
    • earnings_regression_tv.md - Predict respondents’ yearly earnings using survey data from 1990
  • Gay/
    • gay_simple_tv.md - Simple models (linear and discretized age) and political attitudes as a function of age
  • Mesquite/
    • mesquite_tv.md - Predicting the yields of mesquite bushes
  • Student/
    • student_tv.md - Models for regression coefficients
  • Pollution/
    • pollution_tv.md - Pollution data

13 Logistic regression

  • NES/
    • nes_logistic_tv.md - Logistic regression, identifiability, and separation
  • LogisticPriors/
    • logistic_priors_tv.md - Effect of priors in logistic regression
  • Arsenic/
    • arsenic_logistic_building_tv.md - Building a logistic regression model: wells in Bangladesh
  • Rodents/

14 Working with logistic regression

  • LogitGraphs/
    • logitgraphs_tv.md - Different ways of displaying logistic regression
  • Arsenic/
    • arsenic_logistic_building_tv.md - Building a logistic regression model: wells in Bangladesh
    • arsenic_logistic_apc_tv.md - Average predictive comparisons for a logistic regression model: wells in Bangladesh
    • arsenic_logistic_residuals_tv.md - Residual plots for a logistic regression model: wells in Bangladesh
  • NES/
    • nes_logistic_tv.md - Logistic regression, identifiability, and separation

15 Other generalized linear models

  • PoissonExample/
    • poisson_regression_tv.md - Demonstrate Poisson regression with simulated data
  • Roaches/
    • roaches_tv.md - Analyze the effect of integrated pest management on reducing cockroach levels in urban apartments
  • Storable/
    • storable_tv.md - Ordered categorical data analysis with a study from experimental economics, on the topic of “storable votes”
  • Robit/
    • robit_tv.md - Comparison of robit and logit models for binary data
  • Earnings/
    • earnings_compound_tv.md - Compound discrete-continuous model
  • RiskyBehavior/
    • risky_tv.md - Risky behavior data
  • NES/
  • Lalonde/
  • Congress/
  • AcademyAwards/

16 Design and sample size decisions

  • SampleSize/
    • simulation_tv.md - Sample size simulation
  • FakeMidtermFinal/
    • simulation_based_design_tv.md - Fake dataset of a randomized experiment on student grades
  • ElectricCompany/
    • electric_tv.md - Analysis of “Electric Company” data

17 Poststratification and missing-data imputation

  • Poststrat/
    • poststrat_tv.md - Poststratification after estimation
    • poststrat2_tv.md - Poststratification after estimation
  • Imputation/
    • imputation_tv.md - Regression-based imputation for the Social Indicators Survey

18 Causal inference basics and randomized experiments

  • Sesame/
    • sesame_tv.md - Causal analysis of Sesame Street experiment

19 Causal inference using regression on the treatment variable

  • ElectricCompany/
    • electric_tv.md - Analysis of “Electric Company” data
  • Incentives/
    • incentives_tv.md - Simple analysis of incentives data
  • Cows/

20 Observational studies with all confounders assumed to be measured

  • ElectricCompany/
    • electric_tv.md - Analysis of “Electric Company” data
  • Childcare/
    • childcare_tv.md - Infant Health and Development Program (IHDP) example
  • Lalonde/

21 More advanced topics in causal inference

  • Sesame/
    • sesame_tv.md - Causal analysis of Sesame Street experiment
  • ChileSchools/
    • chile_schools_tv.md - ChileSchools example.
  • Bypass/

22 Advanced regression and multilevel models

  • Golf/
    • golf_tv.md - Golf putting accuracy: Fitting a nonlinear model using Stan
  • Gay/
    • gay_tv.md - Nonlinear models (LOESS and spline) and political attitudes as a function of age
  • ElectionsEconomy/
    • hibbs_tv.md - Predicting presidential vote share from the economy
  • Scalability/
    • scalability_tv.md - Demonstrate computation speed with 100,000 observations

A Computing in R

  • Coins/
  • Mile/
    • mile_tv.md - Trend of record times in the mile run
  • Earnings/
    • earnings_data_tv.md - Read in and prepare earnings data
  • Parabola/
    • parabola_tv.md - Demonstration of using Stan for optimization
  • Restaurant/
    • restaurant_tv.md - Demonstration of using Stan for optimization
  • DifferentSoftware/
    • linear_tv.md - Linear regression using different software options

Examples alphabetically

  • AcademyAwards/
  • AgePeriodCohort/
    • births_tv.md - Age adjustment
  • Arsenic/
    • arsenic_logistic_building_tv.md - Building a logistic regression model: wells in Bangladesh
    • arsenic_logistic_apc_tv.md - Average predictive comparisons for a logistic regression model: wells in Bangladesh
    • arsenic_logistic_residuals_tv.md - Residual plots for a logistic regression model: wells in Bangladesh
  • Beauty/
    • beauty_tv.md - Student evaluations of instructors’ beauty and teaching quality
  • Bypass/
  • CentralLimitTheorem/
    • heightweight_tv.md - Illustrate central limit theorem and normal distribution
  • Childcare/
    • childcare_tv.md - Infant Health and Development Program (IHDP) example
  • ChileSchools/
    • chile_schools_tv.md - ChileSchools example.
  • Coins/
  • Congress/
    • congress_tv.md - Predictive uncertainty for congressional elections
    • congress_plots_tv.md - Predictive uncertainty for congressional elections
  • Coop/
    • riverbay_tv.md - Example of hypothesis testing
  • Coverage/
    • coverage_tv.md - Example of coverage
  • Cows/
  • CrossValidation/
    • crossvalidation_tv.md - Demonstration of cross validation
  • Death/
    • polls_tv.md - Proportion of American adults supporting the death penalty
  • DifferentSoftware/
    • linear_tv.md - Linear regression using different software options
  • Earnings/
    • earnings_bootstrap_tv.md - Bootstrapping to simulate the sampling distribution
    • earnings_compound_tv.md - Compound discrete-continuous model
    • earnings_regression_tv.md - Predict respondents’ yearly earnings using survey data from 1990
    • height_and_weight_tv.md - Predict weight
    • earnings_data_tv.md - Read in and prepare earnings data
  • ElectionsEconomy/
    • bayes_tv.md - Demonstration of Bayesian information aggregation
    • hibbs_coverage_tv.md - Checking the model-fitting procedure using fake-data simulation.
    • hibbs_tv.md - Predicting presidential vote share from the economy
    • hills_tv.md - Present uncertainty in parameter estimates
  • ElectricCompany/
    • electric_tv.md - Analysis of “Electric Company” data
  • FakeKCV/
    • fake_kcv_tv.md - Demonstration of K-fold cross-validation using simulated data
  • FakeMidtermFinal/
    • simulation_tv.md - Fake dataset of 1,000 students’ scores on a midterm and final exam
    • simulation_based_design_tv.md - Fake dataset of a randomized experiment on student grades
  • Gay/
    • gay_simple_tv.md - Simple models (linear and discretized age) and political attitudes as a function of age
    • gay_tv.md - Nonlinear models (LOESS and spline) and political attitudes as a function of age
  • Girls/
  • Golf/
    • golf_tv.md - Golf putting accuracy: Fitting a nonlinear model using Stan
  • HDI/
    • hdi_tv.md - Human Development Index - Looking at data in different ways
  • HealthExpenditure/
    • healthexpenditure_tv.md - Discovery through graphs of data and models
  • Helicopters/
    • helicopters_tv.md - Example data file for helicopter flying time exercise
  • Imputation/
    • imputation_tv.md - Regression-based imputation for the Social Indicators Survey
  • Incentives/
    • incentives_tv.md - Simple analysis of incentives data
  • Influence/
    • influence_tv.md - Influence of individual points in a fitted regression
  • Introclass/
    • residual_plots_tv.md - Plot residuals vs. predicted values, or residuals vs. observed values?
  • KidIQ/
    • kidiq_tv.md - Linear regression with multiple predictors
  • Lalonde/
  • LogisticPriors/
    • logistic_priors_tv.md - Effect of priors in logistic regression
  • LogitGraphs/
    • logitgraphs_tv.md - Different ways of displaying logistic regression
  • Mesquite/
    • mesquite_tv.md - Predicting the yields of mesquite bushes
  • Metabolic/
    • metabolic_tv.md - How to interpret a power law or log-log regression
  • Mile/
    • mile_tv.md - Trend of record times in the mile run
  • Names/
    • lastletters_tv.md - Last letters - Distributions of last letters of names of American babies
  • NES/
    • nes_linear_tv.md - Fitting the same regression to many datasets
    • nes_logistic_tv.md - Logistic regression, identifiability, and separation
  • Newcomb/
    • newcomb_tv.md - Posterior predictive checking of Normal model for Newcomb’s speed of light data
  • Parabola/
    • parabola_tv.md - Demonstration of using Stan for optimization
  • Peacekeeping/
    • peace_tv.md - Outcomes after civil war in countries with and without United Nations peacekeeping
  • PearsonLee/
    • heights_tv.md - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
  • Pew/
    • pew_tv.md - Miscellaneous analyses using raw Pew data
  • PoissonExample/
    • poisson_regression_tv.md - Demonstrate Poisson regression with simulated data
  • Pollution/
    • pollution_tv.md - Pollution data
  • Poststrat/
    • poststrat_tv.md - Poststratification after estimation
    • poststrat2_tv.md - Poststratification after estimation
  • ProbabilitySimulation/
    • probsim_tv.md - Simulation of probability models
  • Pyth/
  • Residuals/
    • residuals_tv.md - Plotting the data and fitted model
  • Restaurant/
    • restaurant_tv.md - Demonstration of using Stan for optimization
  • RiskyBehavior/
    • risky_tv.md - Risky behavior data
  • Roaches/
    • roaches_tv.md - Analyze the effect of integrated pest management on reducing cockroach levels in urban apartments
  • Robit/
    • robit_tv.md - Comparison of robit and logit models for binary data
  • Rodents/
  • Rsquared/
    • rsquared_tv.md - Bayesian R^2
  • SampleSize/
    • simulation_tv.md - Sample size simulation
  • Scalability/
    • scalability_tv.md - Demonstrate computation speed with 100,000 observations
  • Sesame/
    • sesame_tv.md - Causal analysis of Sesame Street experiment
  • SexRatio/
    • sexratio_tv.md - Example where an informative prior makes a difference
  • SimpleCausal/
    • causal_tv.md - Simple graphs illustrating regression for causal inference
  • Simplest/
    • simplest_tv.md - Linear regression with a single predictor
  • Stents/
    • stents_tv.md - Stents - comparing distributions
  • Storable/
    • storable_tv.md - Ordered categorical data analysis with a study from experimental economics, on the topic of “storable votes”
  • Student/
    • student_tv.md - Models for regression coefficients
  • Unemployment/
    • unemployment_tv.md - Time series fit and posterior predictive model checking for unemployment series