Local regression involves fitting a flexible, non-linear function by computing the fit at a target point $x_0$ using only the nearby training observations. We assign weights to the nearby observations for each value of $x_0$.

When we fit a local regression, we can make a few choices, such as the weighting function and whether to fit a linear model, a quadratic model, etc., at each step. The most important choice is the span $s$, which defines the proportion of points used to compute the local regression at $x_0$. A small $s$ leads to a more wiggly model, whereas a larger $s$ gives a more global fit.

The span is a hyperparameter, much like $\lambda$ in regularized regression, and we can tune it via cross-validation to choose the best value.
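As a concrete sketch of that tuning, the snippet below estimates a cross-validated error for several candidate spans on the `Wage` data from the ISLR package. The helper `cv_error` and the fold count are illustrative choices, not a standard API:

```r
library(ISLR)  # provides the Wage data

# illustrative helper: k-fold CV error for a given span
cv_error <- function(span, data, k = 5) {
  folds <- sample(rep(1:k, length.out = nrow(data)))
  errs <- sapply(1:k, function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    # surface = "direct" lets predict() evaluate outside the training range
    fit <- loess(wage ~ age, span = span, data = train,
                 control = loess.control(surface = "direct"))
    mean((test$wage - predict(fit, newdata = test))^2)
  })
  mean(errs)
}

set.seed(1)
spans <- seq(0.1, 0.9, by = 0.1)
cv <- sapply(spans, cv_error, data = Wage)
spans[which.min(cv)]  # span with the lowest estimated test error
```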

## Local Regression Algorithm

The algorithm for fitting a local regression with 1 predictor is:

  1. Limit the data to the fraction $s$ of points that are closest to $x_0$.
  2. Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in the neighborhood. The weights of the points farthest from $x_0$ should be the smallest, and the weights of the points closest to $x_0$ should be the largest. Any points outside of the neighborhood have a weight of 0.
  3. Fit a weighted least squares regression of $y_i$ on $x_i$ using these weights; that is, find $\hat{\beta}_{0}$ and $\hat{\beta}_{1}$ that minimize $$\sum_{i=1}^{n} K_{i0}(y_{i}-\beta_{0}-\beta_{1}x_{i})^{2}$$
  4. The fitted value at $x_0$ is given by $\hat{f}(x_{0}) = \hat{\beta}_{0} + \hat{\beta}_{1}x_{0}$.
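To make these steps concrete, here is a minimal hand-rolled sketch for a single target point. The name `local_fit` is hypothetical, and the tricube weights used in step 2 match the default kernel in R's `loess()`:

```r
# illustrative helper: local linear fit at one target point x0 with span s
local_fit <- function(x, y, x0, s = 0.5) {
  # 1. keep the fraction s of points closest to x0
  d <- abs(x - x0)
  k <- ceiling(s * length(x))
  in_hood <- d <= sort(d)[k]

  # 2. tricube weights: largest near x0, zero at and beyond the neighborhood edge
  w <- rep(0, length(x))
  w[in_hood] <- (1 - (d[in_hood] / max(d[in_hood]))^3)^3

  # 3. weighted least squares of y on x
  fit <- lm(y ~ x, weights = w)

  # 4. fitted value at x0
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# e.g. local_fit(Wage$age, Wage$wage, x0 = 50)
```

Repeating this for a grid of target points traces out the full fitted curve $\hat{f}$.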
## Generalizations

Local regression can be generalized in different ways. One way involves fitting a model that is global in some variables but local in others. We can also fit a local regression with multiple local predictors, but if $p$ is much larger than 3, it can be hard to find observations suitably close to $x_0$ in this high-dimensional space. [[K Nearest Neighbors|KNN]] has a similar problem.

## R

Local regressions can be fit in [[R]] using the `loess()` function:

```r
fit <- loess(wage ~ age, data = Wage)
```

The model is specified via a formula, as is common in most of R's statistical modeling functions.
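The `span` argument controls $s$ (it defaults to 0.75), and `predict()` returns fitted values at new points; a quick sketch, again assuming the `Wage` data from the ISLR package:

```r
library(ISLR)  # provides the Wage data

# a wigglier fit using a smaller span
fit2 <- loess(wage ~ age, span = 0.2, data = Wage)

# fitted values over a grid of ages
age_grid <- seq(from = min(Wage$age), to = max(Wage$age))
preds <- predict(fit2, newdata = data.frame(age = age_grid))
```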