Calculating medians and grouping them by date with Redshift

I’m a big fan of calculating medians. They’re a bit tricky to calculate with normal Postgres, but Redshift makes it really easy with the MEDIAN window function.

Here’s a quick example to calculate medians of a column and group them by month. It’s also easy to swap out the timestamp column with another data type to group with.

-- Select our date range
-- Since the median function isn't an aggregate function, use distinct
-- If we don't use distinct, it will return every row in the table
select distinct date_trunc('month', timestamp_column) as month, 

-- Use the median function on the column to calculate median

-- Since median isn't an aggregate, we use a window function 
-- to partition data by categories and type
over (partition by month)

-- Identify which table we're grabbing from
from table

-- Group partitions
group by timestamp_column, column_to_calculate_medians

-- Order by most recent month first
order by month desc