Docs
dbt Integration

dbt Integration

Overview

Glean can import models directly from dbt by specifying metrics in your dbt yaml file. Glean will automatically sync dbt documentation and keep metrics in sync with downstream dashboards (whether they are checked into version control or not). The Glean dbt integration can also be integrated in CI/CD workflows.

Configure your dbt_project.yml

You can use the glean-defaults key to specify the glean version (which is required) and connection information.

dbt_project.yml
 
models:
  +meta:
    # glean-defaults settings will be merged into each Glean model
    glean-defaults:
      glean: "1.0"
      # preventUpdatesFromUI defaults to true, but can be set to false
      preventUpdatesFromUI: false
      source:
        connectionName: bigquery-prod # the name of the conneciton in Glean

Define model information

The simplest example of a Glean model just defines a metric. Glean models must include the glean key in the dbt model.

models/schema.yml
version: 2
 
models:
  # 
  - name: order_attribution
    description: This model describes how each order is attributed to different marketing touchpoints.
    columns:
      - name: attribution_id
        description: Unique ID for the attribution.
      - name: order_id
        description: Order ID, matches the order_id in the sales model.
      - name: touchpoint_id
        description: ID of the marketing touchpoint which led to the order.
    meta:
      glean:
        cols:
          - id: row_count
            type: metric
            name: Order Attribution Records
            aggregate: row_count

Build your Glean models

This is the simplest way to preview your models, there are more advanced options below.

# assuming you've already built your tables with dbt run
glean preview --dbt
 
# and later
glean deploy --dbt

Model specification

The glean-defaults connection configuration in the example above gets merged into the Glean configuration inside each dbt model. If you don't specify the defaults, you must add the complete connection information in the meta key of each dbt model you want imported to Glean. Only dbt models that specify a glean key will get built into Glean models.

schema.yml
 
version: 2
 
models:
  - name: a_model
    description: model docs here
    meta:
      glean:
        glean: "1.0" # Glean dbt integration version
        source:
          connectionName: snowflake_production # which Glean data connection to use, can also be specified globally if you use glean-defaults
    columns:
      - name: a_str
        description: some column documentation
      # ... your dbt columns

Some notes on how models get built from dbt:

  • Glean will read the dbt manifest file and find all models with a glean key specified within the meta key.
  • The name of dbt model is imported into Glean as the model name (this can be overriden).
  • Columns specified explicitly in the dbt model will become attributes in Glean.
  • Descriptions on the model will get imported as model documentation.
  • You can specify additional attribute configuration in a meta key for a column.

How data model source is determined:

  • source has the same options as the corresponding field of Glean data models.
  • Unless otherwise specified, Glean automatically infers defaults for the source physicalName and schema. If no connectionName or connectionId is specified, Glean will try to use a connection whose name is the same as the dbt target database (e.g. "bigquery" or "snowflake").

You may then use dbt-core to produce a Manifest JSON file (opens in a new tab), which should be included with any Glean DataOps build (or preview):

glean preview --dbt-manifest=./target/manifest.json

Glean will include each dbt column as an attribute in the corresponding Glean model. You can also add additional fields to the dbt columns to take advantage of Glean's more advanced modeling features:

    # ... as above
    columns:
      - name: my_dbt_attribute
        documentation: this is a dbt attribute
        tests:
          - unique
          - not_null
        # add additional configuration to this column
        meta:
          glean:
            primaryKey: true

To add Glean metrics (or other attributes not present in the dbt model) to the imported model, you can add a cols field to the dbt model's meta.glean configuration:

models:
  - # ... dbt schema configuration, as above
    meta:
      glean:
        cols:
          - id: daily_active_users
            type: metric
            name: Daily Active Users
            sql: COUNT(DISTINCT customer_id) / COUNT(DISTINCT create_at)

Glean configuration in your dbt schema file has access to the same templating and macros (opens in a new tab) as your dbt configuration. You could, for example, configure the Glean data connection based on your dbt target:

models:
  - # ... dbt model configuration 
    meta:
      glean:
        glean: "1.0"
        name: "customers"
        source:
          # configure the source based on the dbt target
          connectionName: "{{ target.database }}"

Imported dbt models are exclusively controlled by DataOps and cannot be saved from the web UI, though you are still free to prototype and experiment with dbt models in the Glean model editor.

Column descriptions will be automatically synced from dbt to the corresponding Glean model during deploys.

Global configuration

If all models use the same Glean source, you can configure a default source in the dbt_project.yml file:

dbt_project.yml
 
models:
  the_models:
    +meta: # will be applied to every model in models/the_models
      glean-defaults:
        source:
          connectionName: # which Glean data connection to use

To import all dbt models in a folder as Glean models, simply add

dbt_project.yml
 
models:
  the_models:
    +meta:
      glean-defaults:
        glean: "1.0"
        source:
          connectionName: # which Glean data connection to use
          # schema is always optional because Glean will infer it
          schema: "{{ target.schema }}" # can use jinja substitution here

This configuration will add the meta.glean object to every model in the_models, so they will all be imported into Glean. You can still add additional metadata for the specific models (such as adding additional metrics).