Introduction to repositories

This document helps you understand the concept of repositories in Dataform.

Each Dataform repository houses a collection of SQLX and JavaScript files that make up your SQL workflow, as well as Dataform configuration files and packages. You interact with the contents of your repository in a development workspace.

Dataform lists your repositories on the Dataform page in alphabetical order by repository ID. You can sort and filter the list.


Each Dataform repository is connected to a service account. You can select a service account when you create a repository, or edit the service account later.

By default, Dataform uses a service account derived from your project number in the following format:

service-YOUR_PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
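
As a minimal sketch of the format above, the following Python helper builds the default service account address from a project number. The function name `default_dataform_service_account` and the project number `123456789012` are hypothetical and used only for illustration; the address pattern is the one shown above.

```python
def default_dataform_service_account(project_number: str) -> str:
    """Return the default Dataform service account address for a project number.

    The helper name is hypothetical; only the address pattern comes from
    the documentation above.
    """
    number = str(project_number).strip()
    # Google Cloud project numbers are numeric; reject anything else early.
    if not number.isdigit():
        raise ValueError(f"project number must be numeric, got {project_number!r}")
    return f"service-{number}@gcp-sa-dataform.iam.gserviceaccount.com"


if __name__ == "__main__":
    # 123456789012 is a placeholder, not a real project number.
    print(default_dataform_service_account("123456789012"))
```

Note that the address uses the project number, not the project ID, so a value such as `my-project` is rejected by this sketch.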

Dataform uses Git to record changes and manage file versions. Each Dataform repository corresponds with a Git repository. After you create a Dataform repository, you can connect it to a remote GitHub, GitLab, or Bitbucket repository.

By default, Dataform itself stores the repository code. If you connect the repository to a third-party Git repository, the third-party repository stores the code, and Dataform interacts with it so that you can edit and execute its contents in a Dataform development workspace.

A Dataform repository page consists of the following components:

Development workspaces tab
Displays development workspaces created in the repository.
Release configurations tab
Lets you inspect, create, edit, and delete release configurations.
Workflow execution logs tab
Displays Dataform workflow execution logs.
Workflow configurations tab
Lets you inspect, create, edit, and delete workflow configurations.
Settings tab
Displays the name and location of the repository. For a repository connected to a third-party Git repository, also displays the third-party repository source, default branch name, and secret token. Provides buttons to connect the repository to a third-party Git repository and to edit the Git connection.
Create development workspace button
Lets you create a development workspace.

After you create and initialize a development workspace, you can edit the dataform.json file to configure the following Dataform settings of your repository:

  • The default database (Google Cloud project ID)
  • The default schema (BigQuery dataset ID)
  • The default BigQuery location
  • The default schema (BigQuery dataset ID) for assertions
  • The warehouse, which must be set to bigquery
  • User-defined variables that are made available to project code during compilation

For more information about Dataform repository settings, see IProjectConfig in the Dataform core reference.
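
To illustrate the settings listed above, the following Python sketch assembles a `dataform.json` document and prints it. The key names (`warehouse`, `defaultDatabase`, `defaultSchema`, `defaultLocation`, `assertionSchema`, `vars`) are assumed to match the IProjectConfig reference, so verify them there before relying on them. The function name `build_dataform_config` and every value passed to it are hypothetical placeholders.

```python
import json


def build_dataform_config(project_id, dataset_id, location,
                          assertion_dataset_id, variables=None):
    """Return a dict shaped like dataform.json (key names are an assumption)."""
    return {
        # The warehouse must be set to bigquery.
        "warehouse": "bigquery",
        # Default database: the Google Cloud project ID.
        "defaultDatabase": project_id,
        # Default schema: the BigQuery dataset ID.
        "defaultSchema": dataset_id,
        # Default BigQuery location.
        "defaultLocation": location,
        # Default schema (BigQuery dataset ID) for assertions.
        "assertionSchema": assertion_dataset_id,
        # User-defined variables available to project code during compilation.
        "vars": dict(variables or {}),
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    config = build_dataform_config(
        project_id="my-project-id",
        dataset_id="dataform",
        location="US",
        assertion_dataset_id="dataform_assertions",
        variables={"environment": "dev"},
    )
    print(json.dumps(config, indent=2))
```

The printed JSON could be pasted into `dataform.json` in a development workspace after you replace the placeholder values with those of your own project.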

What's next