Create materialized view replicas

This document describes how to create materialized view replicas in BigQuery. You can use materialized view replicas to make Amazon Simple Storage Service (Amazon S3) data available locally for joins.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  5. Make sure that billing is enabled for your Google Cloud project.

Required permissions

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

To create materialized view replicas, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.replicateData
  • bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to create a materialized view:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information about BigQuery Identity and Access Management (IAM), see Introduction to IAM in BigQuery.

Prerequisites

Before creating a materialized view replica, you must complete the following tasks:

  1. Create a dataset in a region that supports Amazon S3.
  2. Create an Amazon S3 BigLake table in the dataset you created in Step 1. Enable metadata caching when creating the table.
  3. Create a materialized view over the Amazon S3 BigLake table in the dataset you created in Step 1. You could also create the materialized view in a different dataset that is in a region that supports Amazon S3.
  4. Authorize the materialized view on the datasets that contain the source Amazon S3 BigLake tables used in the query that created the materialized view.
  5. If you configured manual metadata cache refreshing for the Amazon S3 BigLake table, run the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the metadata cache.
  6. Run the BQ.REFRESH_MATERIALIZED_VIEW system procedure to refresh the materialized view.

Create materialized view replicas

To create materialized view replicas, use the CREATE MATERIALIZED VIEW AS REPLICA OF statement:

CREATE MATERIALIZED VIEW PROJECT_ID.BQ_DATASET.REPLICA_NAME
OPTIONS(replication_interval_seconds=REPLICATION_INTERVAL)
AS REPLICA OF PROJECT_ID.S3_DATASET.MATERIALIZED_VIEW_NAME;

Replace the following:

  • PROJECT_ID: the name of your project in which you want to create the materialized view replica—for example, myproject.
  • BQ_DATASET: the name of the BigQuery dataset that you want to create the materialized view replica in—for example, bq_dataset. The dataset must be in the BigQuery region that maps to the region of the source materialized view.
  • REPLICA_NAME: the name of the materialized view replica that you want to create—for example, my_mv_replica.
  • REPLICATION_INTERVAL: specifies how often to replicate the data from the source materialized view to the replica, in seconds. Must be a value between 60 and 3,600, inclusive. Defaults to 300 (5 minutes).
  • S3_DATASET: the name of the dataset that contains the source materialized view—for example, s3_dataset.
  • MATERIALIZED_VIEW_NAME: the name of the materialized view to replicate—for example, my_mv.

The following example creates a materialized view replica named mv_replica in bq_dataset:

CREATE MATERIALIZED VIEW `myproject.bq_dataset.mv_replica`
OPTIONS(
  replication_interval_seconds=600
)
AS REPLICA OF `myproject.s3_dataset.my_s3_mv`

After you create the materialized view replica, the replication process polls the source materialized view for changes and replicates data to the materialized view replica, refreshing the data at the interval you specified in the replication_interval_seconds option. If you query the replica before the first backfill completes, you get a backfill in progress error. You can query the data in the materialized view replica after the first replication completes.