The K Nearest Neighbors (KNN) algorithm is a non-parametric method used for both regression and classification.
The idea is that, given a positive integer $k$, we can predict the outcome (class, numeric value, whatever) of an observation, $x_0$, by using the $k$ data points nearest to $x_0$. So, for example, if we want to estimate whether a person will default on a loan, we can do so by looking at the historical records of the $k$ people who are most similar to them and seeing if those people defaulted on a loan.
In a classification setting, the mathematical notation for the KNN model is:

$$\Pr(Y = j \mid X = x_0) = \frac{1}{k} \sum_{i \in \mathcal{N}_0} I(y_i = j)$$

where $\mathcal{N}_0$ is the set of the $k$ closest data points to $x_0$.
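To make the formula concrete, here is a minimal sketch of the neighbor vote in Python (the labels and value of $k$ are made up for the example):

```python
# Hypothetical labels of the k = 5 nearest neighbors of x0
# (1 = defaulted on a loan, 0 = did not default)
neighbor_labels = [1, 0, 1, 1, 0]

k = len(neighbor_labels)
# Pr(Y = 1 | X = x0) is the fraction of the k neighbors in class 1
prob_default = sum(label == 1 for label in neighbor_labels) / k
print(prob_default)  # 0.6, so we would predict "default"
```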
“Nearest” in the algorithm requires some way to operationalize the distance between data points. There are different approaches, but a straightforward one for continuous features is Euclidean distance: $d(x, x') = \sqrt{\sum_{j=1}^{p} (x_j - x'_j)^2}$.
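Putting the distance computation and the neighbor vote together, here is a from-scratch sketch of a KNN classifier (NumPy is assumed; the function name, variable names, and toy data are my own):

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k):
    """Predict the majority class among the k nearest training points to x0."""
    # Euclidean distance from x0 to every training point
    distances = np.sqrt(np.sum((X_train - x0) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two continuous features per observation
X_train = np.array([[1.0, 2.0], [2.0, 1.5], [8.0, 9.0], [9.0, 8.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([8.5, 9.0]), k=3))  # 1
```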
Another point is that the value of $k$ in these models is a hyperparameter that requires tuning. In general, smaller values of $k$ will increase model variance (and likely lead to overfitting), whereas larger values will increase bias and lead to underfitting.
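A common way to tune $k$ is cross-validation. As a sketch, here is how that might look with scikit-learn (the dataset and the range of $k$ values are just examples):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a range of k values and keep the one with the best
# cross-validated accuracy
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": range(1, 21)},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)  # the k that scored best across the folds
```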