Monitor listings

As a data provider, you can track the usage metrics for you listings. There are two methods to get the usage metrics for your shared data:

  • Use Analytics Hub. With Analytics Hub you can view the usage metrics dashboard for your listings that includes daily subscriptions, daily executed jobs, number of subscribers for each organization, and jobs frequency for each table. The usage metrics for your shared data is retrieved by querying the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

  • Use the INFORMATION_SCHEMA view. You can track how subscribers use your datasets by directly querying the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.

Use Analytics Hub

To get the usage metrics for your shared data by using Analytics Hub, follow these steps:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the data exchange name that contains the listing for which you want to view the usage metrics.

  3. Click Usage metrics, and then do the following:

    1. From the Listings menu, select the listing.

    2. Set the time range.

The page displays the following usage metrics:

  • Total Subscriptions: the number of current subscriptions on the selected listing. You can view total subscriptions for up to 60 days.
  • Total Subscribers: the number of unique subscribers across all subscriptions on the selected listing. You can view total subscribers for up to 60 days.
  • Total jobs executed: the number of unique jobs run on each table of the selected listing.
  • Total bytes scanned: the total number of bytes scanned from all tables of the selected listing.
  • Daily Subscriptions: the chart that tracks the number of subscriptions for the selected listing over a time period. You can view daily subscriptions for up to 60 days.
  • Subscribers per organization: lists the organizations and their number of subscribers that consume your selected listing.
  • Daily Executed Jobs: this chart displays the jobs consumption from the selected listing.
  • Tables' job frequency: the frequency at which the tables are accessed on the selected listing.

Use INFORMATION_SCHEMA view

If you are a data provider, you can track how subscribers use your datasets by querying the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view. Ensure that you have the required role to query this view.

To run the query against a project other than your default project, add the project ID in the following format:

PROJECT_ID.region-REGION_NAME.INFORMATION_SCHEMA.SHARED_DATASET_USAGE

For example, myproject.region-us.INFORMATION_SCHEMA.SHARED_DATASET_USAGE.

The following examples describe how to view the usage metrics by querying the INFORMATION_SCHEMA view:

Get the total number of jobs executed on all shared tables

The following example calculates total jobs run by subscribers for a project:

SELECT
  COUNT(DISTINCT job_id) AS num_jobs
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE

The result is similar to the following:

+------------+
| num_jobs   |
+------------+
| 1000       |
+------------+

To check the total jobs run by subscribers, use the WHERE clause:

  • For datasets, use WHERE dataset_id = "...".
  • For tables, use WHERE dataset_id = "..." AND table_id = "...".

Get the most used table based on the number of rows processed

The following query calculates the most used table based on the number of rows processed by subscribers.

SELECT
  dataset_id,
  table_id,
  SUM(num_rows_processed) AS usage_rows
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY
  1,
  2
ORDER BY
  3 DESC
LIMIT
  1

The output is similar to the following:

+---------------+-------------+----------------+
| dataset_id    | table_id      | usage_rows     |
+---------------+-------------+----------------+
| mydataset     | mytable     | 15             |
+---------------+-------------+----------------+

Find the top organizations that consume your tables

The following query calculates the top subscribers based on the number of bytes processed from your tables. You can also use the num_rows_processed column as a metric.

SELECT
  subscriber_org_number,
  ANY_VALUE(subscriber_org_display_name) AS subscriber_org_display_name,
  SUM(total_bytes_processed) AS usage_bytes
FROM
  `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY
  1

The output is similar to the following:

+--------------------------+--------------------------------+----------------+
|subscriber_org_number     | subscriber_org_display_name    | usage_bytes    |
+-----------------------------------------------------------+----------------+
| 12345                    | myorganization                 | 15             |
+--------------------------+--------------------------------+----------------+

For subscribers without an organization, you can use job_project_number instead of subscriber_org_number.

Get usage metrics for your data exchange

If your data exchange and source dataset are in different projects, follow these step to view the usage metrics for your data exchange:

  1. Find all listings that belong to your data exchange.
  2. Retrieve the source dataset attached to the listing.
  3. To view the usage metrics for your data exchange, use the following query:
SELECT
  *
FROM
  source_project_1.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE
  dataset_id='source_dataset_id'
AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"
UNION ALL
SELECT
  *
FROM
  source_project_2.`region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE
  dataset_id='source_dataset_id'
AND data_exchange_id="projects/4/locations/us/dataExchanges/x1"

What's next