Once you add a database connection, it will be made available from the Add Data Model workflow and you'll be able to specify tables or SQL queries to start visualizing in Glean.
Your database must be accessible from our systems at
Create a database connection
- Go to the Settings page using the link on the navigation sidebar
- Click + Add Database Connection
- Select the type of database: Athena, Snowflake, Postgres, Redshift, or BigQuery
- Fill out the credentials according to your type of database
- Click the antenna icon to test your connection
- Click Add to complete the process
For most database types, you can optionally specify a schema. If you specify a schema in Glean, only tables from that schema will be made available in the Add Data Model workflow.
SSH tunneling beta
Currently in beta, Glean supports connecting to a Postgres or Redshift database via a bastion host. A bastion host is a server which you control that has access to your database but is not the database host itself. This may be useful when handling especially sensitive data. To join the beta, contact us.
To use a bastion host, toggle on Use SSH tunnel when creating a Postgres database connection. Fill out the additional credentials according to your bastion host setup. Download Glean's public SSH key and set it as an authorized key on your server under the bastion host user that you specify in the database connection.
Fill out the remaining credentials as they'll be accessed by your bastion host, e.g. for Port, specify the port on which your bastion host connects to the database.
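Installing the public key on the bastion host can be sketched in Python as follows. This is a minimal sketch, not Glean tooling: the function name `authorize_key`, the `home` argument (the home directory of the bastion host user), and the key string in the usage note are all hypothetical. It appends the key to `~/.ssh/authorized_keys` only if it is not already there and tightens file permissions, since sshd commonly ignores an `authorized_keys` file that other users can write to.

```python
import os
from pathlib import Path


def authorize_key(public_key: str, home: Path) -> bool:
    """Append public_key to <home>/.ssh/authorized_keys if it is not present.

    Returns True if the key was added, False if it was already authorized.
    `home` is assumed to be the home directory of the bastion host user.
    """
    key = public_key.strip()
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"

    text = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False

    with auth_file.open("a") as fh:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")

    # Restrict the file to its owner.
    os.chmod(auth_file, 0o600)
    return True
```

For example, `authorize_key(Path("glean_key.pub").read_text(), Path.home())`, run as the bastion host user, would install a key saved under the hypothetical file name `glean_key.pub`.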
We currently support the following SQL dialects, but can relatively easily support any SQL-oriented database dialect.
- Amazon Redshift
- Amazon Athena
Your service user requires access to a few AWS services for all features in Glean to work correctly. It needs access to S3 to store query output and to read the source data, and access to Glue to explore database metadata.
The service user should have the following permissions for the relevant resources:
"athena:Batch*", "athena:CreateNamedQuery", "athena:DeleteNamedQuery", "athena:Get*", "athena:List*", "athena:StartQueryExecution", "athena:StopQueryExecution",
"glue:Get*", "glue:List*",
"s3:CreateBucket", "s3:Get*", "s3:Head*", "s3:List*", "s3:Put*"
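One way to turn the permission list above into an IAM policy document is sketched below. The action names are copied from the list. The helper name `build_policy` and the default `"*"` resource are assumptions made for illustration. In practice you would likely narrow `resources` to the ARNs of the relevant buckets, workgroups, and catalog.

```python
import json

# Actions copied verbatim from the permission list above.
GLEAN_ATHENA_ACTIONS = [
    "athena:Batch*",
    "athena:CreateNamedQuery",
    "athena:DeleteNamedQuery",
    "athena:Get*",
    "athena:List*",
    "athena:StartQueryExecution",
    "athena:StopQueryExecution",
    "glue:Get*",
    "glue:List*",
    "s3:CreateBucket",
    "s3:Get*",
    "s3:Head*",
    "s3:List*",
    "s3:Put*",
]


def build_policy(resources=("*",)):
    """Return an IAM policy document (as a dict) allowing the listed actions.

    The "*" default for `resources` is only a placeholder (an assumption);
    pass the specific ARNs that apply to your account instead.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(GLEAN_ATHENA_ACTIONS),
                "Resource": list(resources),
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(build_policy(), indent=2))
```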
- BigQuery
BigQuery connections in Glean use service accounts to connect to your database. You will need to copy and paste the entire contents of a service account JSON key file into this field. See Google's documentation on authenticating with service accounts.
The account will require three roles to be added in the IAM configuration: BigQuery Data Viewer, BigQuery Job User, and BigQuery Metadata Viewer.
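Because the whole key file has to be pasted, a quick local check can catch a truncated or wrong file before you create the connection. The sketch below is a hypothetical helper, not part of Glean. It only checks that the text parses as JSON and carries the fields normally present in a Google service account key (`type`, `project_id`, `private_key`, `client_email`). It does not verify the key with Google or check the IAM roles.

```python
import json

# Fields normally present in a Google service account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw):
    """Return a list of problems found in the pasted key text.

    An empty list means the text looks like a service account key.
    """
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + exc.msg]
    if not isinstance(key, dict):
        return ["expected a JSON object"]

    problems = [
        "missing field: " + name for name in REQUIRED_FIELDS if not key.get(name)
    ]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append('type is "%s", expected "service_account"' % key_type)
    return problems
```

A typical use would be `check_service_account_key(open("key.json").read())`, where `key.json` stands in for whatever file name your downloaded key has.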
- Snowflake
The following needs to be granted to your Glean user:
set database_name = 'YOUR DATABASE NAME';
set warehouse_name = 'YOUR WAREHOUSE NAME';
set glean_user = 'YOUR GLEAN USER';
set glean_role = 'GLEAN_ROLE';

create role if not exists identifier($glean_role);
grant role identifier($glean_role) to user identifier($glean_user);

-- grant Glean role access to warehouse
grant USAGE on warehouse identifier($warehouse_name) to role identifier($glean_role);

-- grant Glean access to database
grant MONITOR, USAGE on database identifier($database_name) to role identifier($glean_role);
grant USAGE on all schemas in database identifier($database_name) to role identifier($glean_role);
grant SELECT on all tables in database identifier($database_name) to role identifier($glean_role);
grant SELECT on future tables in database identifier($database_name) to role identifier($glean_role);