SQL Snippets
DuckDB SQL Snippets for the Hugging Face SQL Console
Open a discussion to request a new snippet.
Alpaca to Conversation Format
Convert Alpaca format to conversational format
-- Convert Alpaca format to Conversation format WITH source_view AS ( SELECT * FROM train -- Change 'train' to your desired view name here ) SELECT [ struct_pack( "from" := 'user', "value" := CASE WHEN input IS NOT NULL AND input != '' THEN instruction || '\n\n' || input ELSE instruction END ), struct_pack( "from" := 'assistant', "value" := output ) ] AS conversation FROM source_view WHERE instruction IS NOT NULL AND output IS NOT NULL;
Check Duplicate Rows
Count the number of duplicate rows in a table using DuckDB.
-- Count duplicate rows in the 'train' table SELECT COUNT(*) - COUNT(DISTINCT columns(*)) FROM train;
Histogram
Create a histogram for a specific column to visualize the distribution of values.
from histogram( table_name, column_name, )
Leakage Detection
Detect data leakage between train and test datasets by calculating the overlap percentage.
WITH overlapping_rows AS ( SELECT COUNT(*) AS overlap_count FROM train INTERSECT SELECT COUNT(*) AS overlap_count FROM test ), total_unique_rows AS ( SELECT COUNT(*) AS total_count FROM ( SELECT * FROM train UNION SELECT * FROM test ) combined ) SELECT overlap_count, total_count, (overlap_count * 100.0 / total_count) AS overlap_percentage FROM overlapping_rows, total_unique_rows;
Show Tables
List all tables that are loaded using SHOW TABLES or PRAGMA show_tables.
-- List all tables using SQL command SHOW TABLES; -- List all tables using PRAGMA PRAGMA show_tables;
Summarize
Summarize a specific table or columns for a quick overview of the dataset's structure and statistics.
-- summarize a specific table SUMMARIZE my_table -- summarize a specific column SUMMARIZE my_table.my_column
Table Info
Get detailed information about a table's structure
-- Returns detailed information about a table's structure -- Use this to get the column names, types, and other details PRAGMA table_info('table_name');
Table Info
Get detailed information about a table's structure
with nested as ( select unnest(reasoning_chains) as reasoning_chains from train limit 100 ) select reasoning_chains.* from nested;