INSANITY UNHINGED

Linear Regression Failure (and Remedy): an easy example

4/28/2020

 

Once upon a time there was a cute, little method called the Linear Regression. It had some interesting uses. It blew many people’s minds. It was loved and cherished by economists. But then some people loved it too much. They used it for everything. Everything. I’d like to give a little example of where that goes wrong (a nonlinear relationship) and a possible fix for the problem.

I’m usually a Python user myself, but for convenience, and because a lot more economists and social scientists use R, I’ve gone with the latter. Let’s imagine an independent variable, $x$, and two dependent variables ($y$, $z$) entirely derived from $x$ and noise:
$$y = x^2 + \varepsilon_y$$
$$z = x^2 + \varepsilon_z$$
$$\varepsilon_y, \varepsilon_z \sim \mathcal{N}(0, \tfrac{1}{16}) \ \text{independently (standard deviation } 0.25\text{)}$$
created in the following R code:

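# 1000 draws: x is uniform on [-1, 1]; y and z are x^2 plus independent Gaussian noise (sd = 0.25)
x <- runif(1000, -1, 1)
y <- x^2 + rnorm(1000, 0, 0.25)
z <- x^2 + rnorm(1000, 0, 0.25)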

You might see where this is heading. Let’s try running the simple linear regression
$$z = \beta_0 + \beta_x x + \beta_y y + \varepsilon$$

summary(lm(z~x+y))

             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.123104   0.013118   9.385   <2e-16 ***
x           -0.003933   0.017132  -0.230    0.818    
y            0.601872   0.025118  23.961   <2e-16 ***

Residual standard error: 0.3168 on 997 degrees of freedom

Well here’s a problem!!! In case you didn’t notice, the variable from which $z$ is derived ($x$) does not show as "significant" (that’s a discussion for another day) in the linear regression. Oh, and $y$, the variable which is only related to $z$ via $x$, shows as unbelievably significant. The only thing even close to right here is the residual standard error (0.3168, against a true noise standard deviation of 0.25). To be fair, who can blame the computer?
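A back-of-the-envelope check suggests the output is roughly what theory predicts. This is only a sketch, assuming $x$ is uniform on $[-1, 1]$ and the two noise terms are independent with standard deviation 0.25, as in the R code above. Because $x$ is uncorrelated with $x^2$ on a symmetric interval (and therefore with $y$), each slope reduces to a simple covariance ratio:

$$
\begin{aligned}
\operatorname{Cov}(x, z) &= E[x^3] - E[x]\,E[x^2] = 0 \quad\Rightarrow\quad \beta_x = 0 \\
\operatorname{Var}(x^2) &= E[x^4] - E[x^2]^2 = \tfrac{1}{5} - \tfrac{1}{9} = \tfrac{4}{45} \\
\beta_y &= \frac{\operatorname{Cov}(y, z)}{\operatorname{Var}(y)} = \frac{4/45}{4/45 + 1/16} = \frac{64}{109} \approx 0.587 \\
\beta_0 &= E[z] - \beta_y\,E[y] = \tfrac{1}{3}\left(1 - \tfrac{64}{109}\right) = \tfrac{15}{109} \approx 0.138
\end{aligned}
$$

Those values are in the neighborhood of the estimates in the table (0.60 for $y$, 0.12 for the intercept, and essentially zero for $x$). So the linear model isn’t malfunctioning: $y$ is simply the best linear proxy for $x^2$ it has been given.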

So here we are, using a technique that does not automatically pick up nonlinearities. In such a simple example with only two variables the problem could perhaps be rectified by eyeballing the plots and accordingly making something like x2 = x^2 and then lm(z~x2+y). In real life, however, we’re usually dealing with at least a dozen variables, perhaps hundreds (or thousands). Eyeballing a relationship for each variable isn’t even remotely plausible. What is one to do in light of such misfortune and villainy?!
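For the record, the eyeballed fix might look something like the sketch below. The seed and the names fit_manual and fit_inline are assumptions added for illustration, so the numbers will not match the output shown earlier.

```r
# Same simulated setup as in the post; the seed is an arbitrary choice
# added for reproducibility and is not from the original code.
set.seed(1)
x <- runif(1000, -1, 1)
y <- x^2 + rnorm(1000, 0, 0.25)
z <- x^2 + rnorm(1000, 0, 0.25)

# Hand-built quadratic term, as described in the text
x2 <- x^2
fit_manual <- lm(z ~ x2 + y)
summary(fit_manual)

# Same model without creating a new variable: I() protects ^ inside a formula
fit_inline <- lm(z ~ I(x^2) + y)
coef(fit_inline)
```

With the squared term in the model, the coefficient on x2 should land near 1 and the one on y near 0, but only because we already knew which transformation to try.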
In walks nonparametric statistics, a wide-brim Stetson shading its eyes. A Native American flute plays a foreboding tune.
Let’s try something else. An extraordinary tool that should have been implemented in everyday econometrics a decade ago. The GAM (generalized additive model). The simplest way to describe a GAM is as a sum of penalized splines. For example, instead of $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, one can think of a GAM as estimating smooth curves, $y = \beta_0 + f_1(x_1) + f_2(x_2)$, one per variable. The curviness is optimized (rather than being forced, like our prior $x^2$ example), so it gives you a line if the relationship is linear, and a curve if appropriate. And yes, you can even get p-values. The most commonly used R package is mgcv with the gam() function (a difficult to remember function name…). Trying out gam():

library(mgcv)
summary(gam(z~s(x)+s(y)))

Approximate significance of smooth terms:
       edf Ref.df      F p-value    
s(x) 5.403  6.541 84.176  <2e-16 ***
s(y) 1.633  2.055  0.654   0.568

Holy crap! $y$ drops out like the poser it really is, while $x$ makes a move to become $z$'s new significant other (pun totally intended). And thus they lived happily ever after. And look at that smile!
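The "smile" is presumably the fitted smooth for $x$. A minimal sketch for drawing it and reading off the estimated effect at a few points follows. The seed, the data frame d, the grid values, and the name fit_gam are assumptions added for illustration.

```r
library(mgcv)

# Same simulated setup as in the post; the seed is an arbitrary addition
set.seed(1)
d <- data.frame(x = runif(1000, -1, 1))
d$y <- d$x^2 + rnorm(1000, 0, 0.25)
d$z <- d$x^2 + rnorm(1000, 0, 0.25)

fit_gam <- gam(z ~ s(x) + s(y), data = d)

# Draw both estimated smooths on one page (skipped in non-interactive runs)
if (interactive()) plot(fit_gam, pages = 1)

# Estimated contribution of each smooth at a few x values, holding y fixed;
# type = "terms" returns one column per model term
grid <- data.frame(x = c(-0.9, 0, 0.9), y = 0.3)
effects <- predict(fit_gam, newdata = grid, type = "terms")
effects[, "s(x)"]
```

If the fit is doing its job, the s(x) column should be noticeably higher at both ends than in the middle, which is the parabola showing through.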

Ok. I’ll concede, a GAM isn’t a cure-all elixir. But it solves problems of indescribable magnitude in empirical research. And I’ll give my opinion without shame: whenever you run a linear model in econometrics, you should always fit a GAM for comparison. If I can dare to be even more controversial, I’d say we should just stop using linear regressions altogether in favor of GAMs.
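One way such a side-by-side comparison could look is sketched below, using AIC as the yardstick. AIC is just one reasonable choice here, not something the analysis above relied on, and the seed and object names are assumptions.

```r
library(mgcv)

# Same simulated setup as in the post; the seed is an arbitrary addition
set.seed(1)
d <- data.frame(x = runif(1000, -1, 1))
d$y <- d$x^2 + rnorm(1000, 0, 0.25)
d$z <- d$x^2 + rnorm(1000, 0, 0.25)

fit_lm  <- lm(z ~ x + y, data = d)
fit_gam <- gam(z ~ s(x) + s(y), data = d)

# Lower AIC means a better trade-off between fit and complexity
comparison <- AIC(fit_lm, fit_gam)
comparison

# Residual standard deviations; the true noise sd in the simulation is 0.25
c(lm = sigma(fit_lm), gam = sqrt(fit_gam$sig2))
```

If the relationship really were linear, the smooths would shrink toward straight lines and the two models would score about the same, so the comparison costs little.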

I’ve considered working on a paper to similar effect as this blog post, obviously more involved, using previous studies, technical technicalities, and cool applications. Suggestions are appreciated!


    Author

    I'm the cofounder of a fintech startup
