SQL Snippets

DuckDB SQL Snippets for the Hugging Face SQL Console

Open a discussion to request a new snippet.

  • Alpaca to Conversation Format

    Convert Alpaca format to conversational format

    -- Convert Alpaca format to Conversation format
    WITH 
    source_view AS (
      SELECT * FROM train  -- Change 'train' to your desired view name here
    )
    SELECT 
      [
        struct_pack(
          "from" := 'user',
          "value" := CASE 
                      WHEN input IS NOT NULL AND input != '' 
                      THEN instruction || '\n\n' || input
                      ELSE instruction
                    END
        ),
        struct_pack(
          "from" := 'assistant',
          "value" := output
        )
      ] AS conversation
    FROM source_view
    WHERE instruction IS NOT NULL 
      AND output IS NOT NULL;
    
  • Check Duplicate Rows

    Count the number of duplicate rows in a table using DuckDB.

    -- Count duplicate rows in the 'train' table
    SELECT COUNT(*) - COUNT(DISTINCT columns(*))
    FROM train;
    
  • Histogram

    Create a histogram for a specific column to visualize the distribution of values.

    from histogram(
        table_name,
        column_name,
    )
    
  • Leakage Detection

    Detect data leakage between train and test datasets by calculating the overlap percentage.

    WITH
        overlapping_rows AS (
            SELECT COUNT(*) AS overlap_count
            FROM train
            INTERSECT
            SELECT COUNT(*) AS overlap_count
            FROM test
        ),
        total_unique_rows AS (
            SELECT COUNT(*) AS total_count
            FROM (
                SELECT * FROM train
                UNION
                SELECT * FROM test
            ) combined
        )
    SELECT
        overlap_count,
        total_count,
        (overlap_count * 100.0 / total_count) AS overlap_percentage
    FROM overlapping_rows, total_unique_rows;
    
  • Show Tables

    List all tables that are loaded using SHOW TABLES or PRAGMA show_tables.

    -- List all tables using SQL command
    SHOW TABLES;
    
    -- List all tables using PRAGMA
    PRAGMA show_tables;
    
  • Summarize

    Summarize a specific table or columns for a quick overview of the dataset's structure and statistics.

    -- summarize a specific table
    SUMMARIZE my_table
    
    -- summarize a specific column
    SUMMARIZE my_table.my_column
    
  • Table Info

    Get detailed information about a table's structure

    -- Returns detailed information about a table's structure
    -- Use this to get the column names, types, and other details
    PRAGMA table_info('table_name');
    
  • Table Info

    Get detailed information about a table's structure

    with nested as 
    (
    select unnest(reasoning_chains) as reasoning_chains
    from train limit 100
    )
    
    select reasoning_chains.* from nested;