# Teacher Training Entitlement - dfe_analytics

## Summary

For analytics purposes, we send data to Google Cloud Platform (GCP) BigQuery using the `DfE::Analytics` gem.

## Enabling analytics

Analytics is enabled via a feature flag in the admin console.
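
Under the hood, the gem only sends events while its `enable_analytics` check returns true. The sketch below shows one way the feature flag could be wired into the gem configuration; the `Feature.active?` helper and the flag name are illustrative assumptions rather than this app's actual code.

```ruby
# config/initializers/dfe_analytics.rb - sketch only
DfE::Analytics.configure do |config|
  # Assumed wiring: only emit events while the admin-console feature flag is on.
  # `Feature.active?` and the :dfe_analytics flag name are illustrative placeholders.
  config.enable_analytics = proc { Feature.active?(:dfe_analytics) }
end
```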

## Entity table checks

We have the `entity_table_checks_enabled` config option set to `true`, and the `CheckAnalyticsEntity` job ensures the latest version of each entity table in BigQuery is in sync with the database.
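
For reference, a minimal sketch of where that option sits in the gem configuration (the initializer path is an assumption):

```ruby
# config/initializers/dfe_analytics.rb - sketch only
DfE::Analytics.configure do |config|
  # Turns on the checks (run by the CheckAnalyticsEntity job) that compare
  # BigQuery entity tables with the database.
  config.entity_table_checks_enabled = true
end
```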

## Environment variables

The following environment variables are required:
- `BIGQUERY_PROJECT_ID` contains the GCP BigQuery project ID
- `BIGQUERY_TABLE_NAME` contains the name of the table to write to - set to `events`
- `GOOGLE_CLOUD_CREDENTIALS` contains the GCP service account credentials in JSON format, already set by the infra team via Terraform
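
To sanity-check the credentials locally, a quick connectivity test can be run from a Rails console; the sketch below assumes the environment variables above are present and simply lists the datasets the service account can see.

```ruby
# Sketch: confirm the service account credentials and project ID work
require "json"
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(
  project: ENV["BIGQUERY_PROJECT_ID"],
  credentials: JSON.parse(ENV["GOOGLE_CLOUD_CREDENTIALS"])
)

# Listing datasets confirms the service account can reach the project
puts bigquery.datasets.map(&:dataset_id)
```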

## Review apps

To test the analytics export on a review app, you can use the following steps:

- Enable the DfE Analytics feature flag in the admin console
- DfE Analytics events should then be viewable in the GCP BigQuery console, in the `npq_events_review` dataset
To get access to the GCP BigQuery console, you need a GCP account (requested via the IT Help Centre); access to the BigQuery project can then be requested on #twd_data_insights.
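
Once the flag is on and some activity has been generated on the review app, a quick query in the BigQuery console can confirm events are arriving. This is only a sketch: `occurred_at` and `event_type` are fields referenced elsewhere in this document, while `entity_table_name` is an assumption about the standard dfe-analytics events schema.

```sql
-- Sketch: list the most recent review-app events
SELECT occurred_at, event_type, entity_table_name
FROM `npq_events_review.events`
ORDER BY occurred_at DESC
LIMIT 10
```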

## Creating datasets and tables

There are datasets for each environment (i.e. `npq_events_production`, `npq_events_sandbox`, `npq_events_staging`, `npq_events_review`), and an `events` table in each dataset.
If we need to create a new dataset for a new environment, the following process can be used (a scripted equivalent is sketched after the list):
- In the GCP console, navigate to BigQuery, and create a new dataset with the following:
  - use the London region (europe-west2)
  - for encryption, use "Cloud KMS key" and use the "Enter key manually" option to set the key to the same one used by other datasets
  - set the "Time Travel Window" to 7 days
- Go to an existing `events` table, and on the schema tab, tick the first checkbox to select all fields, then click "Copy" and "Copy as JSON"
- In the new dataset, create a new empty table called `events`, with the following:
  - for the schema, use the "Edit as text" option, and paste the JSON from the previous step
  - for partitioning settings, partition by the "occurred_at" field, with partitioning type "By day"
  - for clustering settings, cluster by the "event_type" field
  - in Advanced options, set the encryption to "Cloud KMS key" and use the "Enter key manually" option to set the key to the same one used by other tables
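
Most of these settings can equivalently be applied from a Ruby console with the google-cloud-bigquery gem. The sketch below is illustrative only: the dataset name and KMS key are placeholders, the schema is truncated to the two fields needed for partitioning and clustering (the full schema JSON copied above still needs to be applied), and the 7-day time travel window is not set here.

```ruby
# Sketch only: dataset name, KMS key and schema are placeholders
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(project: "ecf-bq")
kms_key  = "projects/<project>/locations/europe-west2/keyRings/<ring>/cryptoKeys/<key>" # placeholder

# Dataset in the London region, encrypted with the shared KMS key
dataset = bigquery.create_dataset "npq_events_new_environment", location: "europe-west2"
dataset.default_encryption = bigquery.encryption(kms_key: kms_key)

# Partitioned and clustered events table (apply the full schema JSON separately)
dataset.create_table "events" do |t|
  t.schema do |s|
    s.timestamp "occurred_at" # only the fields needed for partitioning/clustering are shown
    s.string    "event_type"
  end
  t.time_partitioning_type  = "DAY"          # "By day"
  t.time_partitioning_field = "occurred_at"
  t.clustering_fields       = ["event_type"]
  t.encryption              = bigquery.encryption(kms_key: kms_key)
end
```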

## Helper scripts

Below is an example script that lists BigQuery records; it can be run from a Rails console, since it reads credentials from the `DfE::Analytics` config.
require "google/cloud/bigquery"
def list_records(project_id, dataset_id, table_id)
bigquery = Google::Cloud::Bigquery.new(project: project_id, credentials: JSON.parse(DfE::Analytics.config.bigquery_api_json_key))
bigquery.dataset(dataset_id)
query = "SELECT * FROM `#{dataset_id}.#{table_id}` LIMIT 10"
rows = bigquery.query(query)
rows.each do |row|
puts row
end
end
list_records "ecf-bq", "npq_events_review", "events"