Apply for QTS in England - Setting up analytics

1. Configure dfe-analytics

Follow the setup instructions in the dfe-analytics README. For reference, see the pull request that set it up in this service: https://github.com/DFE-Digital/apply-for-qualified-teacher-status/pull/99

2. Get a BigQuery project set up and add initial owners

Ask in Slack on the #twd_data_insights channel for someone to help you procure a BigQuery instance in the digital.education.gov.uk Google Cloud Organisation.

Ask for your @digital.education.gov.uk Google account to be set up as an owner via the IAM and Admin settings. Add other team members as necessary.

Set up billing

You also need to set up your BigQuery instance with paid billing, because dfe-analytics uses streaming inserts and streaming isn't allowed in the free tier:

accessDenied: Access Denied: BigQuery BigQuery: Streaming insert is not allowed
in the free tier
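
For context: dfe-analytics delivers events with BigQuery streaming inserts. Below is a minimal Python sketch of an equivalent insert (the table name and row values are illustrative); this is the kind of call that fails with the error above until billing is enabled.

# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="apply-for-qts-in-england")

# Streaming inserts (the tabledata.insertAll API) are a paid feature;
# on the free tier this call returns the "Access Denied" error above.
errors = client.insert_rows_json(
    "apply-for-qts-in-england.events_local.events",
    [{
        "occurred_at": "2023-01-01T00:00:00Z",
        "event_type": "web_request",
        "environment": "local",
    }],
)
print(errors or "streamed one row")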

3. Create a data set and table

Update: The dfe_analytics terraform module now automates these steps.

You should create separate data sets for each environment (dev/preprod/prod).

  1. Select the BigQuery instance
  2. Go to the Analysis -> SQL Workspace section
  3. Click the three dots next to the project name and select "Create data set"
  4. Name it events_ENVIRONMENT, such as events_local for local development testing, and set the location to europe-west2 (London)
  5. Select your new events_local data set
  6. Create a table
  7. Name it events
  8. Set the schema to match the one below (including the nested fields inside request_query and data)
  9. Set Partitioning to occurred_at
  10. Set Partitioning type to By day
  11. Set Clustering order to event_type
  12. Click on "Create table"

Tip: You can copy this empty table between environments to save time, rather than repeating the last few steps for each environment.
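
If you'd rather script the copy, here is a sketch using the Python BigQuery client (dataset names just follow the events_ENVIRONMENT convention above):

# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="apply-for-qts-in-england")

# Copy the empty events table (schema, partitioning and clustering included)
# from the local dataset into another environment's dataset.
job = client.copy_table(
    "apply-for-qts-in-england.events_local.events",
    "apply-for-qts-in-england.events_dev.events",
)
job.result()  # wait for the copy job to finish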

Schema

Field name                    Type       Mode
occurred_at                   TIMESTAMP  REQUIRED
event_type                    STRING     REQUIRED
environment                   STRING     REQUIRED
namespace                     STRING     NULLABLE
user_id                       STRING     NULLABLE
request_uuid                  STRING     NULLABLE
request_method                STRING     NULLABLE
request_path                  STRING     NULLABLE
request_user_agent            STRING     NULLABLE
request_referer               STRING     NULLABLE
request_query                 RECORD     REPEATED
request_query.key             STRING     REQUIRED
request_query.value           STRING     REPEATED
response_content_type         STRING     NULLABLE
response_status               STRING     NULLABLE
data                          RECORD     REPEATED
data.key                      STRING     REQUIRED
data.value                    STRING     REPEATED
entity_table_name             STRING     NULLABLE
event_tags                    STRING     REPEATED
anonymised_user_agent_and_ip  STRING     NULLABLE

If you edit the schema as text, you can paste this JSON:

[
  {
    "name": "occurred_at",
    "type": "TIMESTAMP",
    "mode": "REQUIRED"
  },
  {
    "name": "event_type",
    "type": "STRING",
    "mode": "REQUIRED"
  },
  {
    "name": "environment",
    "type": "STRING",
    "mode": "REQUIRED"
  },
  {
    "name": "namespace",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "user_id",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "request_uuid",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "request_method",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "request_path",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "request_user_agent",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "request_referer",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "request_query",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "key",
        "type": "STRING",
        "mode": "REQUIRED"
      },
      {
        "name": "value",
        "type": "STRING",
        "mode": "REPEATED"
      }
    ]
  },
  {
    "name": "response_content_type",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "response_status",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "data",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "key",
        "type": "STRING",
        "mode": "REQUIRED"
      },
      {
        "name": "value",
        "type": "STRING",
        "mode": "REPEATED"
      }
    ]
  },
  {
    "name": "entity_table_name",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "event_tags",
    "type": "STRING",
    "mode": "REPEATED"
  },
  {
    "name": "anonymised_user_agent_and_ip",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
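
If you prefer scripting to clicking through the console, the steps above can be reproduced with the Python BigQuery client. A sketch, assuming you saved the JSON above as schema.json:

# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="apply-for-qts-in-england")

# Create the data set in London.
dataset = bigquery.Dataset("apply-for-qts-in-england.events_local")
dataset.location = "europe-west2"
client.create_dataset(dataset, exists_ok=True)

# Create the events table with the schema above, partitioned by day on
# occurred_at and clustered by event_type.
table = bigquery.Table(
    "apply-for-qts-in-england.events_local.events",
    schema=client.schema_from_json("schema.json"),
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="occurred_at",
)
table.clustering_fields = ["event_type"]
client.create_table(table)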

4. Create custom roles

Update: The dfe_analytics terraform module now automates these steps.

  1. Go to IAM and Admin settings > Roles
  2. Click on "+ Create role"
  3. Create the 3 roles outlined below

Analyst

Field              Value
Title              BigQuery Analyst Custom
Description        Assigned to accounts used by performance analysts.
ID                 bigquery_analyst_custom
Role launch stage  General Availability
+ Add permissions  See below

Developer

Field              Value
Title              BigQuery Developer Custom
Description        Assigned to accounts used by developers.
ID                 bigquery_developer_custom
Role launch stage  General Availability
+ Add permissions  See below

Appender

Field              Value
Title              BigQuery Appender Custom
Description        Assigned to accounts used by appenders (apps and scripts).
ID                 bigquery_appender_custom
Role launch stage  General Availability
+ Add permissions  See below
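
These roles can also be created programmatically. A sketch for the appender role using the Google IAM API client (the other two roles follow the same pattern); the permissions list here is only a placeholder, so substitute the real list from the permissions section referenced above:

# Requires: pip install google-api-python-client
from googleapiclient import discovery

iam = discovery.build("iam", "v1")

# Placeholder only: use the role's real permissions list.
PERMISSIONS = ["bigquery.tables.get"]

iam.projects().roles().create(
    parent="projects/apply-for-qts-in-england",
    body={
        "roleId": "bigquery_appender_custom",
        "role": {
            "title": "BigQuery Appender Custom",
            "description": "Assigned to accounts used by appenders (apps and scripts).",
            "includedPermissions": PERMISSIONS,
            "stage": "GA",  # General Availability launch stage
        },
    },
).execute()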

5. Create an appender service account

Update: The dfe_analytics terraform module now automates these steps.

  1. Go to IAM and Admin settings > Create service account
  2. Name it like "Appender NAME_OF_SERVICE ENVIRONMENT", for example "Appender Apply Local"
  3. Add a description, like "Used when developing locally."
  4. Grant the service account access to the project using the "BigQuery Appender Custom" role you set up earlier
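
A scripted sketch of the same steps (the service account ID is illustrative; granting the role means adding a binding to the project's IAM policy):

# Requires: pip install google-api-python-client
from googleapiclient import discovery

PROJECT = "apply-for-qts-in-england"

# Steps 1-3: create the service account with a name and description.
iam = discovery.build("iam", "v1")
account = iam.projects().serviceAccounts().create(
    name=f"projects/{PROJECT}",
    body={
        "accountId": "appender-apply-local",  # illustrative ID
        "serviceAccount": {
            "displayName": "Appender Apply Local",
            "description": "Used when developing locally.",
        },
    },
).execute()

# Step 4: grant it the custom appender role by adding a binding to the
# project's IAM policy (read-modify-write).
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=PROJECT, body={}).execute()
policy.setdefault("bindings", []).append({
    "role": f"projects/{PROJECT}/roles/bigquery_appender_custom",
    "members": [f"serviceAccount:{account['email']}"],
})
crm.projects().setIamPolicy(resource=PROJECT, body={"policy": policy}).execute()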

6. Get an API JSON key

Update: The dfe_analytics terraform module now automates these steps.

  1. Access the service account you previously set up
  2. Go to the "Keys" tab and click "Add key > Create new key"
  3. Create a JSON private key

The full contents of this JSON file are your BIGQUERY_API_JSON_KEY.
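
Key creation can be scripted too. A sketch using the IAM API client (the service account email is illustrative); the returned privateKeyData is base64-encoded, and the decoded payload is the JSON key:

# Requires: pip install google-api-python-client
import base64

from googleapiclient import discovery

iam = discovery.build("iam", "v1")

# Illustrative service account email from the previous step.
SA = "appender-apply-local@apply-for-qts-in-england.iam.gserviceaccount.com"

key = iam.projects().serviceAccounts().keys().create(
    name=f"projects/-/serviceAccounts/{SA}",
    body={"privateKeyType": "TYPE_GOOGLE_CREDENTIALS_FILE"},
).execute()

# The decoded payload is the JSON key: your BIGQUERY_API_JSON_KEY.
print(base64.b64decode(key["privateKeyData"]).decode())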

7. Set up environment variables

Update: The dfe_analytics terraform module now automates these steps.

Putting the previous steps together, you need these environment variables to finish setting up dfe-analytics:

BIGQUERY_TABLE_NAME=events
BIGQUERY_PROJECT_ID=apply-for-qts-in-england
BIGQUERY_DATASET=events_local
BIGQUERY_API_JSON_KEY=<contents of the JSON, make sure to strip or escape newlines>
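
Before booting the app, you can sanity-check these variables with a short script that authenticates using the key and looks up the events table. A sketch:

# Requires: pip install google-cloud-bigquery
import json
import os

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_info(
    json.loads(os.environ["BIGQUERY_API_JSON_KEY"])
)
client = bigquery.Client(
    project=os.environ["BIGQUERY_PROJECT_ID"], credentials=credentials
)

# Fails here if the key, project, dataset or table is wrong.
table = client.get_table(
    f"{os.environ['BIGQUERY_PROJECT_ID']}"
    f".{os.environ['BIGQUERY_DATASET']}"
    f".{os.environ['BIGQUERY_TABLE_NAME']}"
)
print(f"Found {table.table_id}, partitioned on {table.time_partitioning.field}")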

8. Configure dfe-analytics-dataform

Follow the setup instructions in the README: https://github.com/DFE-Digital/dfe-analytics-dataform#how-to-install