by: Walid DjebaliPosted on: 27/08/2025

dbt: Hybrid Slowly Changing Dimension (SCD) Type1 & Type 2

🧠 What is a Hybrid SCD?

A Hybrid SCD combines:

Type 1 columns → overwrite values in place (no history)
Type 2 columns → track history by versioning rows with effective dates and surrogate keys

✅ Why use Hybrid SCD?

Feature	Benefit
Type 1 → Overwrite	For non-critical attributes like phone number, email, etc.
Type 2 → Versioning	For business-critical changes like region, segment, or status
Hybrid → Combined model	Avoids splitting tables or duplicating unnecessary history

📦 Example Use Case: Customer Dimension

We have a dim_customers table with these columns:

Attribute	SCD Type	Notes
`customer_id`	Natural Key
`customer_sk`	Surrogate Key (Primary Key)
`name`	Type 1	Typo corrections are fine
`email`	Type 1	Can be updated directly
`region`	Type 2	We want to track changes over time
`is_active`	Type 2	Used in reporting/audits
`valid_from`	–	Start timestamp for this version
`valid_to`	–	End timestamp (null = current)
`is_current`	–	Boolean flag for current row

🛠️ dbt Implementation: Hybrid SCD with Selective Versioning

✅ Step 1: Define your model

📁 models/dim/dim_customers.sql

{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='insert_overwrite'
) }}

with source as (

  select
    customer_id,
    name,
    email,
    region,
    is_active,
    current_timestamp() as updated_at
  from {{ ref('stg_customers') }}

),

-- Get current version of customer from dimension
existing as (

  select *
  from {{ this }}
  where is_current = true

),

-- Join source with existing data
changes as (

  select
    src.*,
    dim.customer_sk,
    dim.region as old_region,
    dim.is_active as old_is_active,
    dim.name as old_name,
    dim.email as old_email
  from source src
  left join existing dim
    on src.customer_id = dim.customer_id

),

-- Identify rows needing Type 2 handling
scd_type_2 as (

  select *
  from changes
  where region != old_region or is_active != old_is_active

),

-- Identify rows needing Type 1 update only
scd_type_1 as (

  select *
  from changes
  where customer_sk is not null
    and (region = old_region and is_active = old_is_active)
    and (name != old_name or email != old_email)

),

final as (

  -- New rows for Type 2 changes
  select
    {{ dbt_utils.surrogate_key(['customer_id', 'updated_at']) }} as customer_sk,
    customer_id,
    name,
    email,
    region,
    is_active,
    updated_at as valid_from,
    null as valid_to,
    true as is_current
  from scd_type_2

  union all

  -- Updated rows for Type 1 changes
  select
    customer_sk,
    customer_id,
    name,
    email,
    region,
    is_active,
    valid_from,
    valid_to,
    is_current
  from {{ this }}
  where is_current = true
    and customer_id in (select customer_id from scd_type_1)

)

select * from final

✅ Step 2: Benefits of this Hybrid SCD Strategy

Aspect	Advantage
🧠 Logic separation	Clear Type 1 vs Type 2 distinction
🧾 Efficient storage	Only version rows when necessary
📊 Unified query surface	One dimension table, no need to join Type 1 and Type 2 tables
🚀 Incremental-friendly	Works well with `insert_overwrite` strategy in dbt
🔍 Easier maintenance	All rules live in one model with clean logic blocks

✅ Step 3: Downstream Usage

When querying, you typically filter for current records:

select *
from {{ ref('dim_customers') }}
where is_current = true

Or use valid_from and valid_to for historical views or point-in-time analysis.

🧼 Summary

Hybrid SCD is a balanced approach to handling changing dimensions.
Use Type 2 versioning only when needed, and Type 1 updates otherwise.
dbt makes this efficient via:
- incremental models
- insert_overwrite strategies
- Macros like surrogate_key

Leave a Reply Cancel reply