Apply for teacher training - 25. Redacting personal data fields for local database copies

Date: 2024-11-27

Status

Agreed

Context

2024-11-27

This revision

  • Adds find_feedback and vendor_api_requests to the tables to be emptied. The API responses in particular would have contained all application data, including the fields we are anonymising elsewhere.
  • Removes equality_and_diversity (jsonb) from the columns to be anonymised. A lot of logic in the form and in the HESA conversion depends on this column, and without realistic input a local copy would be of little use for debugging issues in that logic.

2024-11-18

It is occasionally useful to have access to production-like data for analysis and experiments, such as running large or dangerous migrations or analysing queries.

This need usually becomes most apparent during an incident, which is where this idea originated.

The first hurdle for us to overcome is deciding what we need to redact/sanitise/pseudonymise.

Decision

Primary / Foreign keys

These should be left alone.

The following tables will be empty in the data dump

  • audits
  • blazer_audits
  • blazer_checks
  • blazer_dashboard_queries
  • blazer_dashboards
  • blazer_queries
  • email_clicks
  • emails
  • find_feedback
  • vendor_api_requests
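
One way to produce a dump in which these tables exist but hold no rows is to skip their data at dump time. The sketch below is illustrative only and is not the project's actual tooling: it assumes a PostgreSQL database dumped with pg_dump, and the function name and the database name `example_db` are hypothetical. Only the table names come from the list above.

```python
import shlex

# Tables whose rows are left out of the dump (taken from the list above).
EMPTIED_TABLES = [
    "audits",
    "blazer_audits",
    "blazer_checks",
    "blazer_dashboard_queries",
    "blazer_dashboards",
    "blazer_queries",
    "email_clicks",
    "emails",
    "find_feedback",
    "vendor_api_requests",
]


def pg_dump_command(database, tables):
    """Build a pg_dump argument list that keeps each table's schema but omits its rows.

    `database` is a hypothetical database name supplied by the caller.
    """
    args = ["pg_dump", "--format=custom"]
    # --exclude-table-data dumps the table definition without any of its data.
    args += [f"--exclude-table-data={table}" for table in tables]
    args.append(database)
    return args


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(pg_dump_command("example_db", EMPTIED_TABLES)))
```

Excluding the data at dump time means the sensitive rows never leave the source database, which is arguably safer than copying everything and truncating afterwards.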

The following fields will be anonymised

Table name | Fields
application_forms | first_name, last_name, phone_number, address_line1, address_line2, address_line3, address_line4, postcode, disability_disclosure, becoming_a_teacher, safeguarding_issues, international_address, right_to_work_or_study_details
application_choices | personal_statement
candidates | email_address
provider_users | email_address, first_name, last_name
references | email_address, feedback, name
support_users | email_address, first_name, last_name
vendor_api_users | full_name, email_address
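
The table-to-fields mapping above could be expressed as data and turned into one UPDATE statement per table, run against a restored copy rather than production. This is a minimal sketch under several assumptions that the decision does not state: the database is PostgreSQL, every listed column is text-typed, each table has an integer `id` primary key, and the placeholder values (`[REDACTED]`, `redacted-<id>@example.com`) and function names are hypothetical. Primary and foreign keys are never written to, in line with the decision above.

```python
# Columns to anonymise per table (taken from the table above).
ANONYMISED_FIELDS = {
    "application_forms": [
        "first_name", "last_name", "phone_number",
        "address_line1", "address_line2", "address_line3", "address_line4",
        "postcode", "disability_disclosure", "becoming_a_teacher",
        "safeguarding_issues", "international_address",
        "right_to_work_or_study_details",
    ],
    "application_choices": ["personal_statement"],
    "candidates": ["email_address"],
    "provider_users": ["email_address", "first_name", "last_name"],
    "references": ["email_address", "feedback", "name"],
    "support_users": ["email_address", "first_name", "last_name"],
    "vendor_api_users": ["full_name", "email_address"],
}


def replacement_expression(column):
    """Return a SQL expression for the redacted value (columns assumed to be text)."""
    if column == "email_address":
        # Derive the address from the row id (assumed primary key) so that
        # any uniqueness constraint on email addresses still holds.
        return "'redacted-' || id || '@example.com'"
    return "'[REDACTED]'"


def update_statement(table, columns):
    """Build one UPDATE that overwrites every listed column in `table`."""
    assignments = ", ".join(
        f'"{column}" = {replacement_expression(column)}' for column in columns
    )
    # Identifiers are double-quoted because `references` is a reserved word in SQL.
    return f'UPDATE "{table}" SET {assignments};'


if __name__ == "__main__":
    for table, columns in ANONYMISED_FIELDS.items():
        print(update_statement(table, columns))
```

A fixed placeholder overwrites NULLs as well as real values; if the distinction between empty and filled fields matters for debugging, the statements would need a per-column `WHERE ... IS NOT NULL` or a `CASE` expression instead.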

Consequences

This is enabling work: it is a first step towards eventually allowing developers to use production-like data in local development.