Google BigQuery (GCP)
BigQuery is a great cloud data warehouse to get started with. It has a free tier and feels fully serverless: you don't have to configure instance or node sizes as you do with Snowflake.
BigQuery is also a good fit for US-based healthcare companies that need to be HIPAA compliant and don't want to sign onerous enterprise licenses. In short, we recommend BigQuery if you have a moderate amount of data to query and want to set up a cloud data warehouse. We use BigQuery for Glean's own analytics.
If you have relatively small data and are very early, something lightweight like Postgres, or uploading CSV or Parquet files to our DuckDB integration, could also be a good option.
How to get set up
- Sign up for a free GCP / BigQuery account if you don't have one.
- Set up a service account in Google Cloud.
- Set up a database connection in Glean.
Create a BigQuery service account
BigQuery connections in Glean use GCP service accounts to connect to your database. You will need to copy and paste the entire contents of a service account JSON key file into the JSON Key field of the connection settings. See the Google documentation on authenticating with service accounts.
- Create a service account in GCP by going to IAM > Service Accounts.
- Select + Create Service Account from the top of the page.
- When prompted, add the four IAM roles below:
  - BigQuery User
  - BigQuery Data Viewer
  - BigQuery Job User
  - BigQuery Metadata Viewer
- Once you've created the service account, click into it and select the Keys tab.
- Click Add Key and make sure you create a JSON key (not a P12 key). The key is downloaded to your computer automatically. Open the file and copy its contents so you can paste them into Glean below.
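Before pasting the downloaded key into Glean, it can help to confirm that the file really is a JSON service account key and not, say, a truncated copy or a different credential type. The following is a minimal sketch using only the Python standard library. The function name `check_service_account_key` and the choice of which fields to check are our own assumptions for illustration. This is not a check that Glean or Google performs in this form.

```python
import json

# Fields that a Google service account JSON key file normally contains.
# Checking only this subset is an assumption made for this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_text):
    """Return a list of problems found in the key text (empty if none)."""
    try:
        key = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object at the top level"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)
    ]
    # A present but different "type" means this is some other credential.
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected key type: {key_type}")
    return problems


if __name__ == "__main__":
    # Placeholder values only; a real key file has more fields.
    sample = '{"type": "service_account", "project_id": "demo-project"}'
    for problem in check_service_account_key(sample):
        print(problem)
```

To check a real file, read its contents as text and pass them to `check_service_account_key`. An empty list means the basic shape looks right, though it does not prove that the key is still active or that the roles were granted.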
Create BigQuery database connection in Glean
- First, go to your Glean settings page from the project dropdown.
- Click + New Database Connection and fill out the fields below.
Settings
- Connection Name: A name for this database connection. If you use DataOps, you can use the connection name to refer to this database.
- Project Name: (optional) The GCP project. This is optional since it's usually also included in the key file below.
- JSON Key: BigQuery service accounts use a JSON file as a credential. The entire contents of this file need to be copied and pasted into this field.
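The Project Name field can be left blank because the key file usually carries a `project_id` of its own. The sketch below illustrates one plausible way the two settings could combine: an explicitly entered project wins, otherwise the project from the key file is used. The function name `resolve_project` and this precedence order are assumptions made for illustration, not a description of Glean's actual implementation.

```python
import json


def resolve_project(project_name, key_json_text):
    """Pick the GCP project: an explicit name first, else the key's project_id."""
    # Treat None, empty, and whitespace-only input as "not provided".
    if project_name and project_name.strip():
        return project_name.strip()

    key = json.loads(key_json_text)
    project_id = key.get("project_id")
    if not project_id:
        raise ValueError("no project name given and the key file has no project_id")
    return project_id


if __name__ == "__main__":
    # Placeholder key text; a real key file has many more fields.
    key_text = '{"type": "service_account", "project_id": "demo-project"}'
    print(resolve_project("", key_text))
```

If the key file turns out to lack a `project_id`, filling in Project Name explicitly is the straightforward workaround.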