Regression splines are a flexible class of functions, built from basis functions, that extend polynomial regression and piecewise constant regression.

Piecewise polynomials allow the functions to be discontinuous at the knots, but we can constrain the functions to be continuous. We can also constrain the first and second derivatives to be continuous, which makes the function very smooth. These constraints are also beneficial because they free up degrees of freedom (by reducing the allowable complexity).

Cubic Spline

A cubic spline is an example of a piecewise polynomial with continuity in the function as well as continuity in the first and second derivatives (so a total of 3 constraints at each knot). A cubic spline with $K$ knots uses a total of $4 + K$ degrees of freedom.
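
To see where that count comes from: a piecewise cubic with $K$ knots has $K + 1$ cubic pieces with 4 coefficients each, and each knot imposes 3 constraints (continuity of the function, its first derivative, and its second derivative), leaving

$$4(K + 1) - 3K = K + 4$$

free parameters, i.e. $4 + K$ degrees of freedom.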

In general, we can represent splines as basis functions, where a cubic spline with $K$ knots could be modeled as

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i$$

The most direct way to represent a cubic spline this way is to start off with a basis for a cubic polynomial (i.e. $x$, $x^2$, $x^3$), and then add one truncated power function per knot. A truncated power basis function can be defined as:

$$h(x, \xi) = (x - \xi)^3_+ = \begin{cases} (x - \xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases}$$

where $\xi$ is the knot.

Essentially, to fit a cubic spline with $K$ knots, we perform least squares regression with an intercept and $3 + K$ predictors — $X, X^2, X^3, h(X, \xi_1), h(X, \xi_2), \ldots, h(X, \xi_K)$ — where the $\xi_1, \ldots, \xi_K$ terms are the knots.
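
As a minimal sketch of this construction in plain NumPy (the data and knot locations below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

knots = np.array([2.5, 5.0, 7.5])  # the xi_k (chosen by hand here)

def truncated_power(x, xi):
    """h(x, xi) = (x - xi)^3 if x > xi, else 0."""
    return np.maximum(x - xi, 0.0) ** 3

# Design matrix: intercept, x, x^2, x^3, plus one truncated power column per knot.
X = np.column_stack(
    [np.ones_like(x), x, x**2, x**3] + [truncated_power(x, xi) for xi in knots]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
y_hat = X @ beta
```

In practice you would usually let a library generate the spline basis rather than building the truncated power design matrix by hand, since this basis can be numerically ill-conditioned.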

Natural Cubic Spline

One issue with splines is that they have high variance at the outer range of $X$. A natural spline addresses this by forcing the function to be linear in the boundary regions (where $X$ is smaller than the smallest knot or larger than the largest knot).
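
One way to fit a natural cubic spline in Python is sketched below, assuming the patsy package (its cr() term builds a natural cubic regression spline basis); the data and df value are made up:

```python
import numpy as np
from patsy import dmatrix

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Natural cubic regression spline basis with 4 degrees of freedom (patsy's formula
# also adds an intercept column).
X = np.asarray(dmatrix("cr(x, df=4)", {"x": x}))

# lstsq tolerates any redundancy between the intercept and the spline columns.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
```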

We tend to choose the number of knots in the model via cross-validation. And we often just let whatever statistical software we’re using choose the location of the knots, which is typically done at uniform quantiles of the data — i.e. if we have 3 knots, we’d place them at the 25th, 50th, and 75th percentiles of $X$.
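
A sketch of that kind of percentile-based placement, assuming SciPy's LSQUnivariateSpline (a least squares cubic spline fit through whichever interior knots you supply); the data are made up:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))  # LSQUnivariateSpline needs increasing x
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Three knots placed at the 25th, 50th, and 75th percentiles of x.
knots = np.percentile(x, [25, 50, 75])

spline = LSQUnivariateSpline(x, y, t=knots, k=3)  # k=3 -> cubic pieces
y_hat = spline(x)
```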

Smoothing Splines

A smoothing spline takes a slightly different approach to fitting a smooth curve to a set of data.

The goal is to fit some function $g(x)$ to the observed data with little error, i.e.

$$\text{RSS} = \sum_{i=1}^{n} (y_i - g(x_i))^2$$

should be small.

The approach that smoothing splines use for estimating $g$ is basically the same approach that regularized regression uses in that it calculates the RSS and then adds a penalty term. So the way we estimate $g$ is by minimizing:

$$\sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$$

where $\lambda$ is a non-negative tuning parameter. The function $g$ that minimizes this is a smoothing spline.
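
A minimal sketch of fitting such a curve, assuming SciPy 1.10+ (which provides scipy.interpolate.make_smoothing_spline, a penalized smoothing spline with exactly this kind of roughness penalty); the data and the value of lam are made up:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))  # abscissas must be increasing
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Larger lam penalizes curvature more heavily, pulling g toward a straight line;
# lam=None instead picks the penalty by generalized cross-validation (GCV).
g = make_smoothing_spline(x, y, lam=1.0)
y_hat = g(x)
```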

The first part of this equation is just the residual sum of squares (i.e. the loss). The second part is the penalty term. By penalizing the integral of the squared second derivative (i.e. the total rate of change of the slope of $g$ over the range of $t$), we encourage the function to be less wiggly, with higher values of $\lambda$ producing curves that are closer to linear. If we let $\lambda \to \infty$, then we just get the linear least squares fit. If $\lambda = 0$, on the other hand, we get a curve that perfectly interpolates every training point.

In other words, $\lambda$ controls the bias-variance tradeoff (as is the case in regularized regression). And as is the case in regularized regression, we can choose $\lambda$ by using some sort of cross-validation strategy.
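
For example, a simple $k$-fold search over candidate $\lambda$ values could look like the sketch below (again assuming SciPy's make_smoothing_spline; the candidate grid and fold scheme are arbitrary choices):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
k = 5
folds = np.arange(x.size) % k  # interleaved folds keep each training set spread out
folds[0] = folds[-1] = -1      # keep the extreme points in every training set

cv_error = []
for lam in lambdas:
    fold_errs = []
    for fold in range(k):
        train, test = folds != fold, folds == fold
        g = make_smoothing_spline(x[train], y[train], lam=lam)
        fold_errs.append(np.mean((y[test] - g(x[test])) ** 2))
    cv_error.append(np.mean(fold_errs))

best_lam = lambdas[int(np.argmin(cv_error))]
```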

The function $g$ that ends up minimizing the loss function above is actually a natural cubic spline with knots at each of the training values $x_1, \ldots, x_n$.

While it might seem like this will have far too many degrees of freedom (estimating a knot at each $x_i$), we end up with far fewer effective degrees of freedom because many of the parameters are heavily constrained/shrunken.
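
To make "effective degrees of freedom" concrete: for a fixed $\lambda$ the smoothing spline is a linear smoother, $\hat{y} = S_\lambda y$, and its effective degrees of freedom are $\mathrm{tr}(S_\lambda)$. The sketch below builds $S_\lambda$ one column at a time by smoothing each standard basis vector (again assuming SciPy's make_smoothing_spline; $\lambda$ and the $x$ values are arbitrary):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
lam = 1.0

# Column i of the smoother matrix is the smoothing spline fit to the i-th unit vector
# (for fixed lam, the smoother depends only on x, not on y).
n = x.size
S = np.empty((n, n))
for i in range(n):
    e = np.zeros(n)
    e[i] = 1.0
    S[:, i] = make_smoothing_spline(x, e, lam=lam)(x)

effective_df = np.trace(S)  # much smaller than n for moderate lam
```

As $\lambda$ grows, this trace falls toward 2 (the straight-line limit); as $\lambda \to 0$ it approaches $n$.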