Skip to main content

Find a lost TRN - How to restore database from point in time backup

The below steps can be used to restore a database to its point in time backup. We can restore to any point from 5 minutes to 7 days ago, with a resolution of one second.

Adding the user to SpaceDeveloper role

The user must be in the SpaceDeveloper role to perform any of the below actions. A user can be added to the SpaceDeveloper role by anyone in the SpaceManager role using the below command

  • CF CLI cf set-space-role <firstname.lastname@digital.education.gov.uk|Google SSO UID> dfe <space-name> SpaceDeveloper

  • Makefile

make <ENV> set-space-developer USER_ID=<user id obtained from $(cf target) command>

The username of the person being added to the role can be obtained by asking them to run cf target, this is usually their @digital.education.gov.uk email address unless they have enabled Google SSO in their GovUK PaaS account.

Restoring from a point in time backup

1. Set the correct space context and stop any apps bound to the database service instance

Set the context for the cf cli commands to the space under which the service exists.

Stop the app that depend on the database service instance using the cf stop command. The list of apps in the space can be obtained by running cf apps.

This can be done by running Makefile rule as follows:

For example run make <ENV> stop-app CONFIRM_STOP=1

2. Identify the service instance's GUID

Find the GUID of the database service instance which has been corrupted.

Run Makefile for example make <ENV> get-postgres-instance-guid.

After this we need to know the latest timestamp in UTC from which we want to restore using the backup.

3. Login to Azure CLI and obtain PaaS passcode

Obtain a passcode from GOV.UK PaaS

Login to Azure using Azure CLI. Run az login if not logged in previously. This will take you to the azure portal to authenticate the cli. This is required for reading the terraform state and KeyVault secrets.

4. Rename the postgres database

This is because as part of the recovery process we will recreate the service using the same name as before with the backup parameters in one terraform action.

Run make <ENV> rename-postgres-service CONFIRM_RENAME=1 NEW_NAME_SUFFIX=old

5. Remove existing database service from terraform state

This is needed so that, a new postgres service instance with point in time backup is created instead of the resource in state file.

Run make <ENV> remove-postgres-tf-state PASSCODE=<CF_SSO_PASSCODE> where PASSCODE is the PaaS passcode obtained in step 3.

6. Deploy new database instance from PaaS backup

This will create a new instance of the postgres database with point in time backup specified by the time passed in.

Please note in the below command, the timestamp is in UTC, so if we are in BST, make sure to offset the hour by 1, ie: 5:00 PM BST would be 4:00 GMT during summer.

make <ENV> restore-postgres PASSCODE=<CF_SSO_PASSCODE> DB_INSTANCE_GUID=<> BEFORE_TIME="<UTC_TIMESTAMP>".

Note: DB_INSTANCE_GUID is the output of step 2. BEFORE_TIME should be in format 2022-04-21 09:00:00

Running the command above will prompt for few terraform variables if you dont pass it as arguments to the command in TF_VAR_name_of_variable=VALUE form.

  • var.flt_docker_image

    You can pass the image url found from the Github packages. For example ghcr.io/dfe-digital/find-a-lost-trn:19486740313c9c89566bcfa4d1f60981d27fb2e3

    Find the current version of the Github SHA of image by running cf app APP_NAME on CF CLI, where APP_NAME is the name of the app you stopped in step one. Make sure you use the same image as the current one deployed.

Review the generated deploy plan and check that the database instance will be created using the specified point in time backup. Make sure there are no changes to the flt_docker_image property of the app as we are deploying the app again with the same version as before the incident.

Once the plan is reviewed, type yes to initiate the deployment process. This process can take between 10 and 60 mins to complete. When this is in progress, you can run cf service to check the progress. The database service instance will be created and the dependant app will be started on successful completion.

7. Delete renamed database instance

Once restore has been ran successfully and checked, make sure to delete the renamed instance created before the restore process was intitated in step 4.

Run cf delete-service <db-service-instance-name>-old

How to restore backup from Azure storage

Backup of production database find-a-lost-trn-production-pg-svc is taken every night and stored in Azure storage s165p01dbbackup for 35 days. You can download the backup file and restore it to the PaaS postgres instance.

Note: This process assumes that the database instance is available on PaaS.

Permissions required

  • SpaceDeveloper role to run the restore command in Gov PaaS.
  • Contributor access to s165-teachingqualificationsservice-production subscription raised through PIM request. This will enable access to storage account.

Tool required

  • Conduit plugin

Install CF Conduit plugin by running cf install-plugin conduit

1. Download the backup file

  • Go to Azure storage account s165p01dbbackup and container find-a-lost-trn.
  • Download the specific backup file avaialable to restore
  • extract the zip file locally

2. Restore to database

Run cf conduit find-a-lost-trn-production-pg-svc -- psql < backup.sql, where backup.sql is the backup you downloaded from Azure storage.