Converting Alpaca to ChatML Conversation Format
-- Convert Alpaca format to Conversation format
WITH source_view AS (
    SELECT * FROM train  -- Change 'train' to your desired view name here
)
SELECT
    [
        struct_pack(
            "from" := 'user',
            "value" := CASE
                -- Append the optional input only when it is present and non-empty.
                -- chr(10) is a newline; a plain '\n' literal is not expanded in DuckDB.
                WHEN input IS NOT NULL AND input != ''
                    THEN instruction || chr(10) || chr(10) || input
                ELSE instruction
            END
        ),
        struct_pack(
            "from" := 'assistant',
            "value" := output
        )
    ] AS conversation
FROM source_view
-- Skip rows that cannot form a complete user/assistant exchange.
WHERE instruction IS NOT NULL
  AND output IS NOT NULL;
Why?
Differences between Alpaca and ChatML Conversation Format:
Alpaca Format:
- The Alpaca format usually has three columns: instruction, input, and output.
ChatML Conversation Format:
- The ChatML Conversation format is a JSON format that contains a list of messages.
- Each message has a from field, which can be either system, user, or assistant.
- The value field contains the message content.
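To make the mapping concrete, here is a minimal Python sketch of what the query does to a single row. It is only an illustration: the helper name alpaca_to_conversation and the sample row are made up for this example, and the query itself remains the thing to run on the Hub.

```python
import json


def alpaca_to_conversation(row):
    """Hypothetical helper that mirrors the SQL query for one Alpaca row."""
    instruction = row.get("instruction")
    output = row.get("output")

    # Mirrors the WHERE clause: rows without instruction or output are dropped.
    if instruction is None or output is None:
        return None

    extra_input = row.get("input")

    # Mirrors the CASE expression: append input only when it is non-empty,
    # separated from the instruction by a blank line.
    if extra_input is not None and extra_input != "":
        user_value = instruction + "\n\n" + extra_input
    else:
        user_value = instruction

    return [
        {"from": "user", "value": user_value},
        {"from": "assistant", "value": output},
    ]


# Made-up sample row, not taken from any real dataset.
sample_row = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}

print(json.dumps(alpaca_to_conversation(sample_row), indent=2))
```

The system role never appears in the result because an Alpaca row has no column that would map to it.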
Example
yahma/alpaca-cleaned
You can run this query via the sql_console on the Hugging Face Hub.