dbt: Understanding dbt manifest file structure and documentation


📘 Reusable Documentation Blocks in dbt’s Manifest

In dbt, documentation blocks (defined with {% docs %} ... {% enddocs %} and referenced with the doc() function) are first-class citizens in the project’s metadata. Like models, sources, and seeds, each block has its own:

  • Unique identifiers (IDs),
  • Dedicated entries in the manifest file (manifest.json), and
  • References from other resources (like models or columns).

This design supports reuse, consistency, and full documentation coverage in a maintainable way.


🧠 Why does this matter?

  • You can write a documentation block once and reuse it in multiple places.
  • dbt resolves every doc() reference when it parses the project, so each resource (model, source, column) carries the text of the block it points to.
  • Because descriptions are recorded in the project metadata, you can check which parts are documented and generate a clean, searchable documentation website with dbt docs generate.

🧱 Example: Shared Documentation Block

Step 1: Define a reusable documentation block

Create a file like models/docs/common_docs.md:

{% docs customer_id_doc %}
This is the unique identifier for a customer in our system. It is used across fact and dimension tables to join customer-related data.
{% enddocs %}
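
To make the block syntax concrete, here is a minimal Python sketch that pulls named blocks out of a Markdown string with a regular expression. It is a simplified illustration, not dbt's actual parser, and the function name extract_doc_blocks is hypothetical.

```python
import re

# Matches {% docs name %} ... {% enddocs %}, tolerating whitespace-control dashes.
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(\w+)\s*-?%\}(.*?)\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)


def extract_doc_blocks(markdown_text: str) -> dict[str, str]:
    """Return {block_name: block_contents} for every docs block found."""
    return {name: body.strip() for name, body in DOCS_BLOCK.findall(markdown_text)}


sample = """
{% docs customer_id_doc %}
This is the unique identifier for a customer in our system.
{% enddocs %}
"""

blocks = extract_doc_blocks(sample)
print(blocks["customer_id_doc"])
```

The block name (customer_id_doc) is the handle that other files use to refer to this text, which is why it has to be unique within a project.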

Step 2: Reference it from multiple models

# models/staging/stg_orders.yml

version: 2

models:
  - name: stg_orders
    columns:
      - name: customer_id
        description: "{{ doc('customer_id_doc') }}"

# models/staging/stg_payments.yml

version: 2

models:
  - name: stg_payments
    columns:
      - name: customer_id
        description: "{{ doc('customer_id_doc') }}"

🔁 Now, both models share the same documentation source.
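
The sharing can be sketched in a few lines of Python. This is a rough stand-in for what happens when descriptions are rendered, assuming references are written with dbt's doc() function; real dbt renders them through Jinja, and the names render_description and doc_blocks are hypothetical.

```python
import re

# Hypothetical registry of doc blocks, keyed by block name.
doc_blocks = {
    "customer_id_doc": "This is the unique identifier for a customer in our system.",
}

# Matches {{ doc('name') }} or {{ doc("name") }}.
DOC_CALL = re.compile(r"\{\{\s*doc\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def render_description(raw: str, blocks: dict[str, str]) -> str:
    """Replace each doc() call in a description with the referenced block's text."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in blocks:
            raise KeyError(f"Documentation block not found: {name}")
        return blocks[name]

    return DOC_CALL.sub(lookup, raw)


# Two columns in different models point at the same block.
columns = {
    "stg_orders.customer_id": "{{ doc('customer_id_doc') }}",
    "stg_payments.customer_id": "{{ doc('customer_id_doc') }}",
}
rendered = {col: render_description(raw, doc_blocks) for col, raw in columns.items()}
print(rendered)
```

Editing the single entry in doc_blocks changes both rendered descriptions at once, which is the whole point of the shared block.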


🔍 What happens in the manifest.json?

When you run:

dbt compile

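dbt generates a manifest.json file inside the target/ directory, which contains:

  • An entry for the documentation block under the top-level "docs" key, keyed by its unique ID. The ID has the form doc.<project_name>.<block_name>; my_project below is a placeholder project name:
    "docs": {
      "doc.my_project.customer_id_doc": {
        "name": "customer_id_doc",
        "resource_type": "doc",
        "unique_id": "doc.my_project.customer_id_doc",
        "block_contents": "This is the unique identifier for a customer..."
      }
    }
  • Each column that references this block has its description already rendered, so the manifest stores the block's text rather than the raw {{ doc('customer_id_doc') }} call: "description": "This is the unique identifier for a customer..."
  • The documentation website then displays that rendered description.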
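
A quick way to explore this is to load the manifest in Python. The sketch below assumes that entries under "docs" are keyed by a unique ID of the form doc.<project_name>.<block_name> and that column descriptions are stored already rendered, which is how recent dbt versions write the file as far as I know; exact fields vary by dbt version. The sample manifest, the project name my_project, and the helper columns_using_block are all hypothetical.

```python
import json

# Hypothetical, heavily trimmed stand-in for target/manifest.json.
sample_manifest = json.loads("""
{
  "docs": {
    "doc.my_project.customer_id_doc": {
      "name": "customer_id_doc",
      "resource_type": "doc",
      "unique_id": "doc.my_project.customer_id_doc",
      "block_contents": "This is the unique identifier for a customer in our system."
    }
  },
  "nodes": {
    "model.my_project.stg_orders": {
      "resource_type": "model",
      "columns": {
        "customer_id": {
          "name": "customer_id",
          "description": "This is the unique identifier for a customer in our system."
        }
      }
    }
  }
}
""")


def columns_using_block(manifest: dict, doc_unique_id: str) -> list[str]:
    """List 'node_id.column' entries whose description equals the block's text."""
    text = manifest["docs"][doc_unique_id]["block_contents"].strip()
    hits = []
    for node_id, node in manifest.get("nodes", {}).items():
        for col_name, col in node.get("columns", {}).items():
            if (col.get("description") or "").strip() == text:
                hits.append(f"{node_id}.{col_name}")
    return sorted(hits)


# For a real project, load the file instead:
#   with open("target/manifest.json") as fh: manifest = json.load(fh)
print(columns_using_block(sample_manifest, "doc.my_project.customer_id_doc"))
```

Matching on text is a heuristic: it only finds columns whose description consists of exactly that one block.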

🌐 How does this help?

  • Reusability: Write once, use anywhere.
  • Consistency: Avoid divergence in wording across models.
  • Coverage tracking: Descriptions are recorded in the manifest, so you can check which columns are documented.
  • First-class manifest entry: Each doc block has its own entry in manifest.json, so tooling can inspect it.
  • Web-ready: dbt can generate a full static site that shows the resolved docs.

🚀 Bonus Tip

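Want to enforce documentation coverage?

dbt's data tests (such as not_null) run against the data in a table, not against project metadata, so a test listed under a column's tests: key cannot detect a missing description. Coverage is usually checked against the project's metadata instead, for example with community tools such as dbt-checkpoint (its check-model-columns-have-desc hook) or the dbt_project_evaluator package, which reports model-level documentation coverage. Check each tool's documentation for the exact setup.

Or write a custom macro or script to fail builds if documentation is missing.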
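
One way to build such a custom check outside dbt is a small script over manifest.json. The sketch below is an assumption-laden example rather than an official dbt feature: it assumes model nodes live under "nodes" with a "columns" mapping, and the function names and sample data are hypothetical.

```python
import json


def undocumented_columns(manifest: dict) -> list[str]:
    """Return 'model_id.column' for every declared model column with no description."""
    missing = []
    for node_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        for col_name, col in node.get("columns", {}).items():
            if not (col.get("description") or "").strip():
                missing.append(f"{node_id}.{col_name}")
    return sorted(missing)


def check_manifest(path: str = "target/manifest.json") -> int:
    """Return 0 if every declared column is documented, else 1 (usable as an exit code)."""
    with open(path, encoding="utf-8") as fh:
        missing = undocumented_columns(json.load(fh))
    for item in missing:
        print(f"Missing description: {item}")
    return 1 if missing else 0


# Hypothetical trimmed manifest used only to demonstrate the check.
sample = {
    "nodes": {
        "model.my_project.stg_orders": {
            "resource_type": "model",
            "columns": {
                "customer_id": {"description": "Unique identifier for a customer."},
                "order_total": {"description": ""},
            },
        }
    }
}
print(undocumented_columns(sample))
```

In CI, you could call sys.exit(check_manifest()) after dbt compile so the build fails when anything is missing. Note that only columns declared in a YAML file appear in the manifest, so columns that were never declared are invisible to this check.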

