DuckDB Summarize Query

This snippet demonstrates how to use the SUMMARIZE statement in DuckDB to calculate aggregate statistics for a table or query result.

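-- summarize a specific table
SUMMARIZE my_table;

-- summarize a specific column by summarizing a query that selects it
-- (my_table.my_column would be parsed as a table named my_column in schema my_table)
SUMMARIZE SELECT my_column FROM my_table;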

The SUMMARIZE command in DuckDB provides a comprehensive overview of your data. It returns one row per column, identified by column_name and column_type, with the following aggregates:

  • min and max: The minimum and maximum values in the column.
  • approx_unique: An approximation of the number of unique values.
  • avg: The average value for numeric columns.
  • std: The standard deviation for numeric columns.
  • q25, q50, q75: The approximate 25th, 50th (median), and 75th percentiles for numeric columns.
  • count: The total number of rows.
  • null_percentage: The percentage of NULL values in the column.

This command is particularly useful for quick data exploration and understanding the distribution of values across your dataset.
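
As a minimal sketch of that kind of exploration, the statements below build a small table, summarize it, and then treat the summary as a subquery so that only a few of the statistics listed above are returned. The table name orders, its columns, and the inserted rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical example table; names and values are illustrative only.
CREATE TABLE orders (order_id INTEGER, amount DOUBLE, region VARCHAR);

INSERT INTO orders VALUES
    (1, 19.99, 'EU'),
    (2, 5.00,  'US'),
    (3, NULL,  'EU'),
    (4, 42.50, NULL);

-- Returns one row of statistics for each column of orders.
SUMMARIZE orders;

-- Summarize the result of a query instead of a whole table.
SUMMARIZE SELECT amount FROM orders WHERE region = 'EU';

-- Use SUMMARIZE as a subquery to keep only the statistics of interest.
SELECT column_name, column_type, approx_unique, null_percentage
FROM (SUMMARIZE orders);
```

The subquery form is handy when a table has many columns and only a few statistics matter, for example when checking which columns contain NULL values.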

You can read more about the SUMMARIZE command in the DuckDB documentation.