Apply for teacher training - 11. Storing Diversity Data
Date: 2019-01-14
Status
Accepted
Context
This service will need to store sensitive diversity data for each candidate (gender, ethnicity, etc.). Although this data is of little interest to attackers, it could have a serious impact on the candidate if someone were to gain unauthorised access to it. We therefore need to take extra care to minimise the risk of this happening.
Findings
The risks and threats we considered:
- Non-malicious but still unauthorised access. For instance, a genuine provider or support user just clicking around out of interest, or to check up on a family member, etc.
- SQL-injection-style attack: "script kiddies" applying as Mr ';SELECT * FROM Candidates;', etc.
- Application bug accidentally dumping too much data into the response
- Accidental exposure through opsec breach. For instance, leaving unencrypted backups on a publicly-accessible file store, etc.
Options we have considered:
1. One app, two databases
Storing the data in a separate database within the same infrastructure and ops setup
Pros
- Rails 6 has native support for multiple databases
- Most performant of the separate-datastore solutions
Cons
- Possibly too transparent to the app/developers - if it presents too much like the same database, we'll have the constant cognitive overhead of trying to do things like cross-database joins, and then wasting time while we remember that "oh yes, that's why it did not work..."
- Ops overhead - all DB management tasks will need to be scoped to the database (applying migrations, etc), deploy chain needs extending to support multiple databases, multiple environment variables with database URLs, multiple backups, etc
- Adds complexity to an already quite-heavyweight single application
- Not much separation in practice
2. Completely separate microservice
Creating a separate microservice with a dedicated RESTful API, deployed as an entirely separate standalone application.
Pros
- Fully isolated
- Access can be managed, logged and monitored entirely separately
- Clear in Apply codebase that it's a remote call
- Keeps each app inside the default single-database assumption
Cons
- Operationally complex to set up
- Needs entirely separate monitoring, deploy pipeline, backups, etc
- Creates a hard dependency on a synchronous remote call in Apply (for retrieving & storing data)
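The hard dependency on a synchronous remote call can be made concrete with a minimal sketch. It assumes a hypothetical `fetcher` callable standing in for the HTTP request to the diversity service; the method name, error class and two-second timeout are illustrative assumptions, not part of any existing Apply code.

```ruby
require "timeout"

# Hypothetical error raised when the diversity service cannot be reached.
class DiversityServiceUnavailable < StandardError; end

# `fetcher` is any callable that performs the remote call (an assumption for
# this sketch). Every read of diversity data now depends on it succeeding
# within the timeout.
def fetch_diversity_data(application_id, fetcher:, timeout_seconds: 2)
  Timeout.timeout(timeout_seconds) { fetcher.call(application_id) }
rescue Timeout::Error, SystemCallError => e
  # Network failures and timeouts surface as a single, explicit error that
  # the calling code has to handle.
  raise DiversityServiceUnavailable, "diversity service call failed: #{e.class}"
end
```

Whatever the real client looks like, Apply would need to decide what the user sees when this error is raised, which is part of the cost listed above.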
3. Dedicated microservice in the same application
Similar to option 2 above, but deploying the separate microservice & database as a separate container within the same infrastructure and ops setup
Pros
- Access can be managed, logged and monitored entirely separately
- Clear in Apply codebase that it's a remote call
- Keeps each app inside the default single-database assumption
- Keeps all Apply infrastructure & components within the Apply CIP setup
- Data transfer between the Apply app and the diversity microservice can be kept inside the Apply 'firewall'
Cons
- Adds complexity to monitoring
- Adds complexity to deployment pipeline
- Adds a hard dependency on a synchronous remote call within Apply
4. Not storing the data at all
Pushing back on the need for our service to collect & store the data, when we would only be doing so to give it to the providers
Pros
- Adds no complexity at all to us
Cons
- Pushes complexity back to the user's experience - if they're going to have to provide this data, it makes most sense for them to do so at the point when they're already providing the rest of their data (at the point of submitting their application)
5. Worrying less about storage, more about access
Accepting the risk of not separating the data at the point of storage, and focussing our efforts on separation at the point of access
We can store the diversity data in a separate table, providing some slight mitigation against SQL-injection attack (the attacker needs to know the table name & join conditions).
We can mitigate risk 1 (non-malicious but still unauthorised access) through a couple of enhancements:
- Only show the diversity data in the provider UI on a separate page, potentially with an interstitial reminder to the user that the data they're about to view is sensitive and their access will be logged & audited. This will at least make the casual browser think twice before proceeding
- Do not include the diversity data within the /applications/(id) endpoint, but only on a separate sub-resource endpoint (/applications/(id)/diversity_data or similar)
For both of the above dedicated URLs, access can be logged, monitored, audited and even alerted on with simple config changes to existing tooling.
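A minimal sketch of this separation at the point of access, in plain Ruby rather than Rails. The `Application` struct, field names, method names and log event name are all hypothetical, chosen only for illustration. The main payload is built from an allow-list, so diversity data cannot leak into it by accident, and the sub-resource read always writes an audit entry.

```ruby
require "json"
require "logger"

# Hypothetical in-memory stand-in for an application record.
Application = Struct.new(:id, :candidate_name, :course, :diversity_data, keyword_init: true)

# Fields the main /applications/(id) payload may expose. Diversity data is
# deliberately absent, so it is never included by default.
APPLICATION_FIELDS = %i[id candidate_name course].freeze

# Builds the main payload from the allow-list only.
def application_payload(application)
  APPLICATION_FIELDS.to_h { |field| [field, application[field]] }
end

# Builds the sub-resource payload and records who accessed it.
def diversity_payload(application, user_id:, audit_log:)
  audit_log.info(
    JSON.generate({ event: "diversity_data_viewed", application_id: application.id, user_id: user_id })
  )
  application.diversity_data
end
```

In a real deployment the audit entry would more likely come from request logging on the dedicated URL, as described above; the explicit log call here just makes the idea visible.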
Decision
For every option except 4 (not storing the data at all), the Apply app will still need to decrypt the data to present it as plain text in the API and UI, to the Student Records Systems or the user. So whichever of those storage options we chose, we would add complexity to the code and to operations management, yet still be vulnerable to all the considered risks except risk 2 (SQL injection), where we already have framework-level protection and our most recent pen test found no SQL-injection vulnerabilities.
We have discounted solution 4 as the user should not have to deal with the complexity involved - we should be aiming to offer the best possible user experience, and this is not it.
Therefore we have decided to pursue option 5 - concentrating on separation of the data at the point of access.