dbt Integration
Overview
Glean can import models directly from dbt by specifying metrics in your dbt yaml file. Glean will automatically sync dbt documentation and keep metrics in sync with downstream dashboards (whether they are checked into version control or not). The Glean dbt integration can also be integrated in CI/CD workflows.
Configure your dbt_project.yml
You can use the glean-defaults
key to specify the glean version (which is required) and connection information.
models:
+meta:
# glean-defaults settings will be merged into each Glean model
glean-defaults:
glean: "1.0"
# preventUpdatesFromUI defaults to true, but can be set to false
preventUpdatesFromUI: false
source:
connectionName: bigquery-prod # the name of the conneciton in Glean
Define model information
The simplest example of a Glean model just defines a metric. Glean models must include the glean
key in the dbt model.
version: 2
models:
#
- name: order_attribution
description: This model describes how each order is attributed to different marketing touchpoints.
columns:
- name: attribution_id
description: Unique ID for the attribution.
- name: order_id
description: Order ID, matches the order_id in the sales model.
- name: touchpoint_id
description: ID of the marketing touchpoint which led to the order.
meta:
glean:
cols:
- id: row_count
type: metric
name: Order Attribution Records
aggregate: row_count
Build your Glean models
This is the simplest way to preview your models, there are more advanced options below.
# assuming you've already built your tables with dbt run
glean preview --dbt
# and later
glean deploy --dbt
Model specification
The glean-defaults
connection configuration in the example above gets merged into the Glean configuration inside each dbt model. If you don't specify the defaults, you must add the complete connection information in the meta
key of each dbt model you want imported to Glean. Only dbt models that specify a glean
key will get built into Glean models.
version: 2
models:
- name: a_model
description: model docs here
meta:
glean:
glean: "1.0" # Glean dbt integration version
source:
connectionName: snowflake_production # which Glean data connection to use, can also be specified globally if you use glean-defaults
columns:
- name: a_str
description: some column documentation
# ... your dbt columns
Some notes on how models get built from dbt:
- Glean will read the dbt manifest file and find all models with a
glean
key specified within themeta
key. - The name of dbt model is imported into Glean as the model name (this can be overriden).
- Columns specified explicitly in the dbt model will become attributes in Glean.
- Descriptions on the model will get imported as model documentation.
- You can specify additional attribute configuration in a
meta
key for a column.
How data model source is determined:
source
has the same options as the corresponding field of Glean data models.- Unless otherwise specified, Glean automatically infers defaults for the source
physicalName
andschema
. If noconnectionName
orconnectionId
is specified, Glean will try to use a connection whose name is the same as the dbt target database (e.g."bigquery"
or"snowflake"
).
You may then use dbt-core to produce a Manifest JSON file (opens in a new tab), which should be included with any Glean DataOps build (or preview):
glean preview --dbt-manifest=./target/manifest.json
Glean will include each dbt column as an attribute in the corresponding Glean model. You can also add additional fields to the dbt columns to take advantage of Glean's more advanced modeling features:
# ... as above
columns:
- name: my_dbt_attribute
documentation: this is a dbt attribute
tests:
- unique
- not_null
# add additional configuration to this column
meta:
glean:
primaryKey: true
To add Glean metrics (or other attributes not present in the dbt model) to the imported model, you can add a cols
field to the dbt model's meta.glean
configuration:
models:
- # ... dbt schema configuration, as above
meta:
glean:
cols:
- id: daily_active_users
type: metric
name: Daily Active Users
sql: COUNT(DISTINCT customer_id) / COUNT(DISTINCT create_at)
Glean configuration in your dbt schema file has access to the same templating and macros (opens in a new tab) as your dbt configuration. You could, for example, configure the Glean data connection based on your dbt target:
models:
- # ... dbt model configuration
meta:
glean:
glean: "1.0"
name: "customers"
source:
# configure the source based on the dbt target
connectionName: "{{ target.database }}"
Imported dbt models are exclusively controlled by DataOps and cannot be saved from the web UI, though you are still free to prototype and experiment with dbt models in the Glean model editor.
Column descriptions will be automatically synced from dbt to the corresponding Glean model during deploys.
Global configuration
If all models use the same Glean source
, you can configure a default source
in the dbt_project.yml
file:
models:
the_models:
+meta: # will be applied to every model in models/the_models
glean-defaults:
source:
connectionName: # which Glean data connection to use
To import all dbt models in a folder as Glean models, simply add
models:
the_models:
+meta:
glean-defaults:
glean: "1.0"
source:
connectionName: # which Glean data connection to use
# schema is always optional because Glean will infer it
schema: "{{ target.schema }}" # can use jinja substitution here
This configuration will add the meta.glean
object to every model in the_models
, so they will all be imported into Glean. You can still add additional metadata for the specific models (such as adding additional metrics).