From the Postgres introduction to window functions:
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
In other words, we can perform some aggregations while still retaining the number of rows in our original data. It also means that we don’t need to include all of the columns in our table in a GROUP BY
clause.
For instance, imagine we have a table with student testing information:
If we wanted to just get the average scores for each test, we could do a normal GROUP BY
query:
And this would work just fine. But imagine we also want to see each student’s test score, but we also want a column in our results that contains the average test score (so we can compare each student to the average score, for instance). We could join our average score table back to our original table:
Or we could use a a window function with a PARTITION BY
clause, which is more succinct:
Window functions always contain an OVER
clause directly following the function’s name and arguments. This defines how the rows are subset when being processed.
This is similar to a grouped mutate in R and dplyr:
Window Functions with ORDER BY
This can be especially helpful if we need to generate in-group row numbers or if we need to order stuff within a group. Imagine if we wanted to order our test scores by the date a person took the test, so we can label their first attempt, second attempt, etc.