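Logistic regression is a statistical technique for predicting a binary outcome from a set of independent variables. It is a generalized linear model (GLM) that extends linear regression via a link function. In logistic regression, the link function is the logit, whose inverse is the logistic (sigmoid) function:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$$
where $p$ is a probability strictly between 0 and 1, and $p / (1 - p)$ is the corresponding odds.
In a logistic regression, we're predicting the probability, $p$, of an outcome from predictors $x_1, \dots, x_k$ with coefficients $\beta_0, \beta_1, \dots, \beta_k$, so the equations end up being:
$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
then
$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}$$
then
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$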
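As a quick check on the relationship between probabilities and log-odds, the following sketch converts a few illustrative probabilities to log-odds and back. The values and variable names are made up for illustration; `qlogis()` and `plogis()` are base R's logit and inverse-logit functions.

```r
# Illustrative probabilities (not from any real data)
p <- c(0.1, 0.5, 0.9)

# Logit: probability -> log-odds
log_odds <- log(p / (1 - p))

# Inverse logit (sigmoid): log-odds -> probability
recovered <- 1 / (1 + exp(-log_odds))

# Base R equivalents of the two hand-written transformations
log_odds_builtin <- qlogis(p)
recovered_builtin <- plogis(log_odds)
```

Applying the logit and then the sigmoid returns the original probabilities, which is why a model that is linear on the log-odds scale can still produce predictions bounded between 0 and 1.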
From this model, we can either generate probabilistic predictions or assign cases to yes/no categories by comparing the predicted probabilities to some threshold. Usually, we'll just use p = 0.5 as the threshold: cases with a predicted probability greater than or equal to 0.5 are assigned to yes, and cases with a predicted probability below 0.5 are assigned to no. The threshold can, however, be set at any value between 0 and 1.
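The thresholding step might look like the sketch below. The probabilities are hypothetical stand-ins for model output, and `threshold` is an assumed name for the cutoff.

```r
# Hypothetical predicted probabilities, standing in for model output
predicted_prob <- c(0.12, 0.50, 0.73, 0.49, 0.91)

# Assumed cutoff; 0.5 is the usual default, but any value in (0, 1) can be used
threshold <- 0.5

# Probabilities at or above the threshold map to "yes", the rest to "no"
predicted_class <- ifelse(predicted_prob >= threshold, "yes", "no")
```

Raising the threshold makes the model more conservative about predicting yes, and lowering it does the opposite, so the choice usually depends on the relative cost of false positives and false negatives.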
Model Fitting
The model coefficients for a logistic regression are typically estimated using maximum likelihood estimation.
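To make the idea concrete, here is a minimal sketch that maximizes the Bernoulli log-likelihood directly with a general-purpose optimizer and compares the result to `glm()`. The simulated data, the true coefficients (-0.5 and 1.2), and the helper name `neg_log_lik` are all assumptions made for illustration. In practice `glm()` uses iteratively reweighted least squares rather than `optim()`, but both approaches target the same maximum.

```r
set.seed(1)

# Simulated data with one predictor (illustrative values only)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Negative log-likelihood of a logistic regression with intercept and slope
neg_log_lik <- function(beta) {
  eta <- beta[1] + beta[2] * x  # linear predictor (log-odds)
  -sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}

# Minimizing the negative log-likelihood maximizes the likelihood
fit <- optim(par = c(0, 0), fn = neg_log_lik, method = "BFGS")

# Reference fit from glm() for comparison
glm_fit <- glm(y ~ x, family = binomial(link = "logit"))

fit$par
coef(glm_fit)
```

The two sets of estimates should agree to within the numerical tolerance of the optimizer.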
Implementation in R
We can fit a logistic regression in R using the glm() function, like so:
model <- glm(y ~ x1 + x2, data = my_data, family = binomial(link = "logit"))
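For a version that runs end to end, the sketch below simulates a small data set and then fits and uses the model. The contents of `my_data` and the coefficients used to generate `y` are invented for illustration; the original data set is not specified.

```r
set.seed(42)

# Simulated stand-in for my_data (illustrative values only)
my_data <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
my_data$y <- rbinom(200, size = 1,
                    prob = plogis(0.5 + 1.0 * my_data$x1 - 0.8 * my_data$x2))

model <- glm(y ~ x1 + x2, data = my_data, family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; exponentiating gives odds ratios
coef(model)
odds_ratios <- exp(coef(model))

# type = "response" returns probabilities; the default ("link") returns log-odds
predicted_prob <- predict(model, newdata = my_data, type = "response")
predicted_log_odds <- predict(model, newdata = my_data)
```

Calling `summary(model)` additionally reports standard errors and significance tests for each coefficient.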