Based in San Francisco Bay Area, Securesql is a blog by John Menerick. His insights dissect complex systems, offering a masterclass in cyber guardianship through expert analysis and cutting-edge protective strategies.

PII Code Review

GDPR

Let’s jump right into it. GDPR stands for the General Data Protection Regulation and it is a law adopted by the European Union. The EU Charter of Fundamental Rights stipulates that EU citizens have the right to protection of their personal data and GDPR is the law that enforces this right.

It may seem like a straightforward, common sense goal, but it may not be easy to achieve once you start to look at the details.

Let’s break down the law, in layman’s terms, and see how coding is affected by these requirements.

DISCLAIMER: I am not a lawyer so do not take this as legal advice. Speak with your corporate counsel or data protection office for official guidance :)

First, something to get your attention.

GDPR Penalties

Penalties can amount to as much as €20 million or 4% of the business’s total annual worldwide turnover, whichever is higher. Recently in the news: Google received a €50 million fine for “lack of transparency, inadequate information and lack of valid consent regarding ads personalisation”. Google is planning to fight the ruling.

Fifty million euros is not a very large amount for Google, but it may not be so easy for others to handle. So, the first point: your company may be severely impacted financially by an infringement of the law.
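To make the ceiling concrete, here is a minimal sketch of the arithmetic. It assumes the higher tier of GDPR fines, where the cap is the greater of €20 million or 4% of total worldwide annual turnover. The function name and the turnover figures are made up for illustration.

```python
def max_gdpr_fine_eur(annual_turnover_eur: int) -> int:
    """Upper bound of a higher-tier GDPR fine, in whole euros."""
    fixed_cap = 20_000_000
    turnover_cap = annual_turnover_eur * 4 // 100  # 4% of worldwide turnover
    return max(fixed_cap, turnover_cap)


# Hypothetical company with EUR 50 million turnover: the fixed cap applies.
small = max_gdpr_fine_eur(50_000_000)
# Hypothetical company with EUR 10 billion turnover: the 4% cap applies.
large = max_gdpr_fine_eur(10_000_000_000)
```

For the smaller company the exposure is a large share of its yearly revenue, which is why the fixed cap matters more to small businesses than to the likes of Google.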

Informed Consent

So why did Google receive the aforementioned fine? Because it allegedly failed to inform its users how their data is being used. Was Google missing a Privacy Policy? I highly doubt it. Could that privacy policy be read by a regular person who is not a lawyer? Well, I’ll let you figure out the answer.

Does your site or service have a Privacy Policy? Is it clear, concise and easy to understand or does it contain a lot of legal terms and fine print? Even if the Privacy Policy is in place and it is legible, does the software behave according to the privacy policy?

As developers we try to take advantage of data to unlock better and greater functionality in our software. That is perfectly fine as long as our actions are transparently communicated to our users and as long as our users fully control how their data is being used. This information can be better communicated through user interface and product documentation instead of a stuffy Privacy Policy document.

Code review tip #1: If the code collects personal information, ensure that this action is described in product documentation and that it can be configured by the end user. The person should be able to access and delete their own data and choose whether their data is collected in the first place.
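As a rough illustration of what a reviewer might look for, the sketch below gates collection on an explicit opt-in and offers access and erasure operations. The class, its method names and the in-memory dictionaries are all hypothetical; a real system would sit on top of its own storage and consent UI.

```python
from dataclasses import dataclass, field


@dataclass
class UserDataStore:
    """In-memory stand-in for wherever personal data is really stored."""
    consent: dict = field(default_factory=dict)   # user_id -> bool
    records: dict = field(default_factory=dict)   # user_id -> list of events

    def set_consent(self, user_id: str, granted: bool) -> None:
        self.consent[user_id] = granted

    def collect(self, user_id: str, event: dict) -> bool:
        """Store the event only if the user opted in. The default is no collection."""
        if not self.consent.get(user_id, False):
            return False
        self.records.setdefault(user_id, []).append(event)
        return True

    def export(self, user_id: str) -> list:
        """Data access request: return a copy of everything held for the user."""
        return list(self.records.get(user_id, []))

    def erase(self, user_id: str) -> None:
        """Erasure request: drop the user's records and the consent flag."""
        self.records.pop(user_id, None)
        self.consent.pop(user_id, None)
```

The point for code review is the shape, not the storage: every code path that writes personal data should pass through a consent check, and the same module should expose a way to read back and delete what was written.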

Data Controller and Data Processor

Since we spoke about the requirement for a person to control their data, this is a good time to introduce some GDPR related terms.

Data Controller: The entity that controls the data; it “determines ‘why’ and ‘how’ personal data should be processed”.

Data Processor: Handles the personal data on behalf of a controller.

If your software powers a cloud service that other companies use to handle personal data then your company is a data processor. Furthermore the cloud platform you may be leveraging to host your software is also a data processor.

A company that is using your service can be the data controller or can be in turn a data processor for a different data controller.

This creates an interesting Russian doll effect, which is not fun at all when dealing with contractual obligations and vetting security requirements.

The European Union site provides the example of a company using a payroll service to store employee private information. Imagine the payroll service is then in turn using a web hosting provider which in turn is using a cloud platform like AWS or Azure. So if the data controller wanted to ensure the security of the data they would have to review security practices from the payroll service, the web hosting company and finally the cloud services provider.

Code review tip #2: Before your code sends user data to a 3rd party, you must ensure that there is a data processing agreement in place with that 3rd party. For example, the agreement should allow your company to retrieve any data collected for an individual in response to a data access request. Such a decision is likely not something to be made during code review, so if you are the code reviewer and you spot a 3rd party host name that was not there before, you need to ask for more details.

With the ever increasing expansion of cloud services, it becomes more and more likely that individual data may end up in the wrong places and be used for unintended purposes. Take for example the Facebook and Cambridge Analytica scandal.
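One way to make new 3rd party hosts stand out in review is to route outbound calls through an explicit allowlist, so that adding a destination means editing one visible list. The sketch below assumes a hypothetical set of approved hosts and a hypothetical `send_user_data` helper. It also insists on HTTPS, which is an extra assumption of this example.

```python
from urllib.parse import urlsplit

# Hypothetical list: hosts covered by a signed data processing agreement.
APPROVED_PROCESSORS = {"api.payroll.example", "storage.cloud.example"}


def is_approved_destination(url: str) -> bool:
    """True only for HTTPS URLs whose host is on the approved list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in APPROVED_PROCESSORS


def send_user_data(url: str, payload: dict) -> None:
    """Refuse to send personal data to a destination nobody has vetted."""
    if not is_approved_destination(url):
        raise PermissionError(f"no data processing agreement on file for {url!r}")
    # The actual network call is omitted from this sketch.
```

With this in place, a diff that touches `APPROVED_PROCESSORS` is the cue for the reviewer to ask whether the agreement exists.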

Data Classification

Not all data is personal in nature. Should we protect all data to be on the safe side? In my opinion, when in doubt, err on the side of caution, but protecting everything will create an unnecessary burden on a development team.

The European Commission provides the following definition for personal data:

Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which collected together can lead to the identification of a particular person, also constitute personal data.

It is very important to come up with a Data Classification document pertaining to the software. Here’s a simple example:

[Image: 1.png, example data classification table]

Code review tip #3: Consult/update the data classification document for any code change that handles data.
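A classification document can also live next to the code, which makes it easier to consult during review. The sketch below is one possible shape, not the table from this article: the field names and the three levels are hypothetical.

```python
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PERSONAL = "personal"


# Hypothetical entries; a real document would list every field the software handles.
DATA_CLASSIFICATION = {
    "product_name": Classification.PUBLIC,
    "build_number": Classification.INTERNAL,
    "email": Classification.PERSONAL,
    "home_address": Classification.PERSONAL,
}


def unclassified_fields(record: dict) -> list:
    """Fields in the record that the classification document does not cover."""
    return sorted(name for name in record if name not in DATA_CLASSIFICATION)


def personal_fields(record: dict) -> list:
    """Fields in the record that are classified as personal data."""
    return sorted(
        name for name in record
        if DATA_CLASSIFICATION.get(name) is Classification.PERSONAL
    )
```

A check like `unclassified_fields` can run in tests, so a change that introduces a new field fails until someone decides how that field is classified.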

Pseudonymizing/Anonymizing Data

If the personal data is not needed or if it provides marginal value to the business case, the safest approach is to avoid collection.

Code review tip #4: Ask yourself, does the software really need to store a person’s age and home address? Avoid the approach of collecting all the information just in case.

If business requirements exist to analyze and process personal data, data can undergo a process of de-identification where personal information can be removed through various methods.

Anonymization is the process of completely removing the sensitive portions of the data:

  • Replacing characters with ***: the local part of an email address is masked so that only its first character stays visible (j**** followed by the domain)

  • Applying a one way salted cryptographic hashing algorithm: Jane Doe becomes EBC95E915A….1DC437491326E

Pseudonymization is the process of replacing the sensitive portions of the data with pseudonyms:

  • Using an indexing mechanism: Jean Dupont becomes User 8472

  • Encrypting the sensitive portions of the data with a key.

The main difference between Pseudonymization and Anonymization is that Pseudonymization allows reverting the data back to its original form.
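The techniques listed above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the function and class names are hypothetical, SHA-256 is just one choice of hash, and key-based encryption is left out because it needs a third-party library.

```python
import hashlib


def mask_email(email: str) -> str:
    """Keep the first character of the local part and mask the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def salted_hash(value: str, salt: bytes) -> str:
    """One-way salted hash (SHA-256 over salt + value), as uppercase hex."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest().upper()


class Pseudonymizer:
    """Replace names with stable 'User N' labels; the lookup table allows reversal."""

    def __init__(self) -> None:
        self._by_name = {}
        self._by_label = {}

    def pseudonym(self, name: str) -> str:
        if name not in self._by_name:
            label = f"User {len(self._by_name) + 1}"
            self._by_name[name] = label
            self._by_label[label] = name
        return self._by_name[name]

    def reveal(self, label: str) -> str:
        """Revert a pseudonym to the original value."""
        return self._by_label[label]
```

Note that the lookup table inside `Pseudonymizer` is itself personal data and needs the same protection as the original values. The salt used for hashing should be generated randomly (for example with `secrets.token_bytes`) and kept secret.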

Code review tip #5: Software logs are one of the most likely avenues for data leakage. Logs easily cross the boundary between production and development environments. Logs can be sent to 3rd party support organizations and Security Information and Event Management (SIEM) systems. Logs can be stored indefinitely in support and development tickets. Logging is part of common software changes and is likely to be part of code review. Before logging, ensure personal data is anonymized or pseudonymized.
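As a safety net for tip #5, a logging filter can scrub values that look like personal data before they reach any log destination. The sketch below uses Python's standard `logging` module. The class name, the placeholder text and the email pattern are assumptions of this example, and a regular expression will not catch every form of personal data.

```python
import logging
import re

# Deliberately simple pattern; it is a heuristic, not a full email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmailFilter(logging.Filter):
    """Replace anything that looks like an email address before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges msg and args, so values passed as arguments are covered too.
        record.msg = EMAIL_RE.sub("[redacted-email]", record.getMessage())
        record.args = ()
        return True  # keep the record, now scrubbed


# Attach the filter to the handler so every record written through it is scrubbed.
handler = logging.StreamHandler()
handler.addFilter(RedactEmailFilter())
```

A filter like this is a backstop. The reviewer should still push back on log statements that pass personal data in the first place.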

Obviously, personal data should be anonymized or pseudonymized everywhere it is stored or transmitted, not just in logs. If an attacker gains access to the database and the personal data is not encrypted or the encryption is weak, then we have a data breach.

Over the past couple of years, occurrences of personal data stashed in unprotected cloud resources have become commonplace. Remember that a data breach of this kind qualifies for GDPR penalties.

Code review tip #6: Ensure personal data is encrypted in storage and that it is transmitted over a secure connection. The encryption key should be stored in a different location, such as a Key Management System (KMS). Transparent data encryption (TDE) offered by database servers only protects against physical attacks, such as theft of the storage media. TDE won’t protect against software flaws like SQL Injection or access flaws where an attacker obtains the database password. It’s best to ensure personal data is individually encrypted on top of platform/infrastructure protection methods.

How Software Best Practices Help with GDPR Requirements

Let’s do a quick recap to review how the code review tips in this article help with adherence to GDPR.

Data classification, minimizing data collection and ensuring anonymization/pseudonymization are measures that support the Data Protection by Design and by Default requirement.

Transparency and individuals’ control over their data, as well as handling 3rd party relationships, allow companies to respect the GDPR requirements regarding Right of Access and Right of Erasure.

Compliance Standards

Before ending the article, let’s take a quick look at security compliance standards. Compliance standards help companies speak the same language when it comes to Security.

Let’s take ISO 27001 for example. ISO 27001 is an internationally recognized standard that outlines the requirements for establishing an Information Security Management System (ISMS). For example, ISO 27001 provides a set of security controls, which specifically address security of the data and development practices:

  • A.10.1.1 — A policy on the use of cryptographic controls for protection of information shall be developed and implemented.

  • A.14.2.1 — Rules for the development of software and systems shall be established and applied to developments within the organization.

But ISO 27001 covers more than that. It outlines all aspects of Information Security, from physical access to network access, from user management to data management.

When an organization wishes to vet a 3rd party vendor or service, they can use the ISO 27001 framework to ensure compliance with internationally recognized security standards.

Having an ISMS in place goes a long way toward meeting the European legislation’s requirement to have Technical and Organizational Measures in place to protect the data. A company can engage a certified ISO 27001 auditor to obtain a certificate demonstrating adherence to the standard.

Other well known security compliance standards are:

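  • SOC 2: A voluntary attestation report defined by the AICPA, commonly requested from service providers that handle customer data (it is not a legal requirement for publicly traded companies in the United States; that role belongs to SOX)

  • PCI-DSS: Required for companies that handle credit card information

  • HIPAA: Required for companies or health care providers that handle patient information in the United States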

There is significant overlap between security compliance standards. Almost all of them cover the same requirements regarding Network Security, Access Control, Data Protection and Software Development. While there are some differences between them, if you comply with one you are likely close to adhering to the others.

To sum it all up

Some key takeaways from this article:

  • GDPR penalties could put a company out of business

  • Avoid collecting personal data, but if collection is needed allow individuals to control their data

  • Cloud services create a Russian doll effect and complexities in controlling the data

  • Protect the data with anonymization and pseudonymization

  • Compliance standards help companies speak a common language and demonstrate technical and organizational measures for protecting data

Hopefully you’re still with me. I know compliance is no fun but it is important. Security and privacy of personal data is a subject that impacts all of us.
