Privacy engineering: 8 tips to mitigate risks and secure your data

Understand what can go wrong and how to protect against the most likely scenarios.

Artwork: Ariel Davis

Ayden Férdeline // Researcher and Policy Analyst

System design can address privacy proactively or reactively. You either build systems that are intentional about respecting privacy from day one, or you accept that you might eventually end up in a painful compliance exercise that requires you to re-engineer your product or say goodbye to it. This is the difference between a bug in the code and a flaw in the design. Faulty architectural assumptions can lead to cracks in the virtual foundations of software and systems, cracks that are difficult, and sometimes impossible, to correct.

I speak from experience here. Four years ago, I was appointed to an expert working group that sought to bring a legacy database, developed decades earlier, into compliance with data protection law. But sometimes you just can’t square the circle. No matter how much privacy expertise we brought in or how much we spent on outside legal counsel, the problem remained that the database was not compliant with data protection law, and re-engineering it for compliance was impossible. It had to be decommissioned. By the time the organization accepted this reality, authorities had begun investigating the product in question. To make matters worse, real people were being harmed by it, as the database was harnessed by bad actors to stalk or dox victims. Addressing the product’s privacy issues became such a bottleneck in the organization that almost nothing else was getting done.

Privacy engineering is a new field, one that was unheard of when that database was envisioned. That is not to say that sloppy or uninformed privacy work in critical systems was ever acceptable; however, it was somewhat understandable. Thankfully, we have learned a lot over the past two decades about assessing, mitigating, and addressing privacy risks. When we build systems from scratch that are intentional about respecting privacy safeguards, we keep people and data safe.

Here are eight lessons I’ve learned about designing for privacy:

Distinguish privacy risks from information security risks 

You might be tempted to address privacy and security risks at the same time, especially when resources are tight (which, let’s be honest, is always the case when it comes to privacy programs). This strategy can make a lot of sense, as there are overlaps, but it is important to understand the differences between the two and not conflate them. Privacy risks concern people, while security risks concern data. You manage privacy risks on behalf of other people, who will be directly affected by the consequences. Protecting data is information security, not privacy—it’s important, and you don’t want your trade secrets stolen. Still, privacy is ultimately about protecting the interests of the people behind the data. In many jurisdictions, such as the European Union, you, as the data processor, have the legal obligation to put the interests of the data subjects above your own interests. Good privacy practices usually support good information security practices, but the inverse is not always true. Data being secure doesn’t make its very existence harmless.

Not all data is personal information

Addressing privacy can be daunting because you might think that all data constitutes personal information. That’s not the case. The definition of personal information varies by jurisdiction, and is often much narrower than you might think. Some elements are indeed personal information—a social security number or a home address, for example—but many other personal facts, like someone’s age, weight, job title, or wealth, do not constitute personal information in and of themselves. You might still want to keep this information private, but when you classify your data, bear in mind your legal obligations extend only so far. Your moral compass can, of course, go further.
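
If it helps to make classification concrete, here is a minimal sketch, in Python, of tagging fields by classification tier so that downstream handling can differ. The tiers, field names, and mappings are illustrative assumptions, not legal determinations; where a given field actually falls depends on your jurisdiction and context.

```python
from enum import Enum

class DataClass(Enum):
    """Illustrative tiers; where a field lands is a legal question, not a technical one."""
    NOT_PERSONAL = "not_personal"
    PERSONAL = "personal"
    SPECIAL_CATEGORY = "special_category"  # needs the strictest handling

# Hypothetical field-to-tier map for a user profile record.
FIELD_CLASSIFICATION = {
    "user_id": DataClass.PERSONAL,
    "home_address": DataClass.PERSONAL,
    "job_title": DataClass.NOT_PERSONAL,          # often not personal information on its own
    "health_conditions": DataClass.SPECIAL_CATEGORY,
}

def fields_requiring_strict_handling(record: dict) -> set:
    """Return the fields in a record that fall into the most sensitive tier."""
    return {
        name for name in record
        if FIELD_CLASSIFICATION.get(name) == DataClass.SPECIAL_CATEGORY
    }

print(fields_requiring_strict_handling({
    "user_id": "u-1", "job_title": "engineer", "health_conditions": "asthma",
}))  # {'health_conditions'}
```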

Some data is especially private

When classifying data, keep in mind that some data isn’t just personal information; it’s highly sensitive. Laws often require that information such as biometric data, trade union membership status, religious or philosophical beliefs, or personal data concerning one’s sex life be stored in a more stringent manner. This requirement can be context dependent: medical records and financial records need to be handled more carefully than, say, a wedding registry wishlist. As a result, you might want to consider what access control mechanisms can be placed on this data and how to secure it both “at rest” and “in motion” between systems. Beware of your data blind spots: You probably don’t want applications storing sensitive information in server logs meant for troubleshooting. You can accept some risks, but as with any trade-off, you should always be clear about what those risks are and how they could harm others. There’s an obvious ethical component here.
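
As one concrete guardrail for the server-log point, here is a minimal sketch of a logging filter that redacts values matching a couple of illustrative patterns before records are written. The patterns and labels are assumptions; tailor them to whatever your own systems treat as sensitive.

```python
import logging
import re

# Illustrative patterns; extend with whatever your own systems treat as sensitive.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

class RedactingFilter(logging.Filter):
    """Scrub sensitive-looking values from log messages before they are written."""
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for label, pattern in SENSITIVE_PATTERNS.items():
            message = pattern.sub(f"[redacted {label}]", message)
        record.msg, record.args = message, None
        return True  # keep the (now redacted) record

logger = logging.getLogger("app")
logger.addHandler(logging.StreamHandler())
logger.addFilter(RedactingFilter())

logger.warning("Charge failed for card 4111 1111 1111 1111")
# Output: Charge failed for card [redacted card]
```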

Only collect the data you really need

A fundamental principle in data protection law is data minimization. The personal information you collect must be relevant and limited to what is necessary to accomplish your purpose for processing that data. Leaving aside the fact that this is a legal requirement, this principle can make it easier to implement an effective privacy program. There’s no need to classify a data element or assess whether it belongs to a sensitive category if you never collect it in the first place! For example, you can prevent the accidental collection of overly sensitive information by using checkboxes to limit user input, instead of allowing free-form input fields. You can also use automated processes to audit and remove select information, such as credit card numbers or social security numbers, from a field before this data is ingested into enterprise systems and backed up in a thousand locations.
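
Here is a minimal sketch of both ideas in this section, under assumed field names: constraining a field to a fixed set of choices (the checkbox approach) and scrubbing obvious identifiers out of free text before it is ingested anywhere.

```python
import re

# Constrained choices stand in for checkboxes: anything else is rejected, never stored.
ALLOWED_CONTACT_REASONS = {"billing", "bug_report", "feature_request"}

# Illustrative patterns for identifiers that should never reach enterprise systems.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def minimize(form: dict) -> dict:
    """Keep only the fields we need, validate choices, and scrub free text."""
    if form.get("contact_reason") not in ALLOWED_CONTACT_REASONS:
        raise ValueError("unsupported contact reason")

    comment = form.get("comment", "")
    comment = SSN_PATTERN.sub("[removed]", comment)
    comment = CARD_PATTERN.sub("[removed]", comment)

    # Everything else on the form is dropped rather than stored "just in case".
    return {"contact_reason": form["contact_reason"], "comment": comment}

print(minimize({
    "contact_reason": "billing",
    "comment": "My card 4242 4242 4242 4242 was charged twice.",
    "date_of_birth": "1990-01-01",  # never needed, so never kept
}))
```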

You can’t truly de-identify data, so don’t say you can

It is reassuring for users to hear that their personal information will be anonymized before being shared with third parties. The problem is, data anonymization is extremely difficult and, in all likelihood, not possible. Even the largest enterprises struggle to sever the link between data and the original data subject. The chance that the data will either not be sufficiently scrubbed, or that your dataset will be combined with another dataset, is too high. Either scenario would allow the identity of a natural person to be deduced and “anonymized” data to be re-identified. The privacy consequences here are so grave that the Federal Trade Commission has labeled this practice a form of consumer deception.
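
To see why combining datasets is so dangerous, here is a small, invented illustration: an “anonymized” dataset with direct identifiers removed can still be linked to an outside dataset on a few quasi-identifiers, pointing straight back at one person. All records and field names below are made up for illustration.

```python
# "Anonymized" records: direct identifiers removed, quasi-identifiers kept.
scrubbed = [
    {"zip": "10115", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10117", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# A second, outside dataset (say, a public membership list) with names attached.
public = [
    {"name": "A. Example", "zip": "10115", "birth_year": 1984, "sex": "F"},
    {"name": "B. Example", "zip": "10243", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def reidentify(scrubbed_rows, public_rows):
    """Link rows that share every quasi-identifier; a unique match names the person."""
    for row in scrubbed_rows:
        matches = [
            p for p in public_rows
            if all(p[key] == row[key] for key in QUASI_IDENTIFIERS)
        ]
        if len(matches) == 1:
            yield matches[0]["name"], row["diagnosis"]

print(list(reidentify(scrubbed, public)))  # [('A. Example', 'asthma')]
```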

Trust no one

It’s generally a bad idea to share data with third parties. Even if your systems are secure and well-designed, can you be certain the same is true of the organizations you share records with? There are countless examples of organizations being found jointly negligent for the data processing activities of subprocessors and other third parties with whom they share data, even when contracts supposedly limit liability. If you are going to share data, don’t take anyone’s word for it that they know how to handle data securely; verify that they know what they’re doing. Ask to see their breach response plan, cybersecurity insurance, and precise details of their internal operating procedures, and verify with an independent auditor that these are actually in effect. And remember proportionality: Only send a third party the information they truly require, not everything and the kitchen sink.
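
The proportionality point lends itself to a simple technical guardrail: an explicit per-recipient allowlist of fields, so a full record can never be shipped to a partner by accident. This is a minimal sketch; the recipient names and fields are hypothetical.

```python
# Hypothetical per-recipient allowlists; any field not listed is never sent.
SHARE_ALLOWLISTS = {
    "shipping_partner": {"name", "street_address", "postal_code", "country"},
    "payment_processor": {"order_id", "amount", "currency"},
}

def payload_for(recipient: str, record: dict) -> dict:
    """Return only the fields this recipient is entitled to receive."""
    allowed = SHARE_ALLOWLISTS.get(recipient)
    if allowed is None:
        raise ValueError(f"no data-sharing agreement on file for {recipient!r}")
    return {key: value for key, value in record.items() if key in allowed}

order = {
    "order_id": "A-1001", "amount": 42.00, "currency": "EUR",
    "name": "A. Example", "street_address": "Example Str. 1",
    "postal_code": "10115", "country": "DE",
    "email": "a@example.com",  # on no allowlist, so never shared
}
print(payload_for("payment_processor", order))
# {'order_id': 'A-1001', 'amount': 42.0, 'currency': 'EUR'}
```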

It’s not just third parties that you need to be wary of: Insiders can also be bad actors. Staff who have opportunities to misuse systems just might, so account for the fact that even properly authenticated and authorized users with legitimate access to records may need to be distrusted from time to time. Think through how and when you’d revoke an internal stakeholder’s access to data, and how you’d deal with an internal incident.
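
One way to keep insider access revocable and reviewable is to make every grant time-limited and easy to pull. A minimal sketch follows; the in-memory grant store and names are hypothetical, and a real system would persist and audit every change.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory grant store; a real system would persist and audit every change.
access_grants = {}

def grant_access(user: str, dataset: str, days: int = 7) -> None:
    """Grant time-limited access; expiry forces periodic re-justification."""
    access_grants[(user, dataset)] = datetime.now(timezone.utc) + timedelta(days=days)

def revoke_access(user: str, dataset: str) -> None:
    """Cut off a grant immediately, e.g. while handling an internal incident."""
    access_grants.pop((user, dataset), None)

def has_access(user: str, dataset: str) -> bool:
    expiry = access_grants.get((user, dataset))
    return expiry is not None and datetime.now(timezone.utc) < expiry

grant_access("analyst1", "support_tickets", days=1)
print(has_access("analyst1", "support_tickets"))   # True
revoke_access("analyst1", "support_tickets")
print(has_access("analyst1", "support_tickets"))   # False
```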

Communicate openly and honestly

Some privacy risks may need to be accepted when budgets for privacy work are small, when internal expertise is limited, or when there is simply no way to minimize a danger. Be upfront and honest with people about how their data will be used and what could potentially go wrong. These risks need to be managed carefully, and people should be voluntarily opting in to high-risk data processing activities. Never assume someone would be okay with what you’re doing. At the end of the day, you’re a custodian of someone else’s personal information, and it’s the unexpected use of personal information that usually sends users running to data protection authorities to lodge complaints. When you remember the user and get their approval before doing something, you’re respecting their wishes. (Just remember, a basic tenet of law is that you cannot ask anyone to consent to something unlawful, such as a prohibited data processing activity.)
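
To make opt-in something the code enforces rather than a policy on paper, you could check a recorded consent per user and purpose before any high-risk processing runs. This is a rough sketch with invented identifiers and purpose names.

```python
# Hypothetical consent records keyed by (user_id, purpose); absence means no consent.
consents = {("user-42", "analytics"): True}

def require_opt_in(user_id: str, purpose: str) -> None:
    """Refuse to proceed unless the user explicitly opted in to this purpose."""
    if not consents.get((user_id, purpose), False):
        raise PermissionError(f"{user_id} has not opted in to {purpose}")

require_opt_in("user-42", "analytics")      # recorded opt-in, so processing may proceed
try:
    require_opt_in("user-42", "marketing")  # no recorded opt-in, so this is refused
except PermissionError as error:
    print(error)
```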

Build your work around the gold standard in data protection

There are now over 130 countries with data protection laws and regulations. Some are more stringent than others. You should seek legal advice on your specific circumstances, particularly if you are processing sensitive data. Don’t have legal counsel? If you’re a startup or small business, contact a local law school with a legal clinic and ask for advice. They can help you directly or refer you to licensed attorneys who will work pro bono or for a reasonable fee, depending on your resources.

As a general rule, complying with the most robust privacy framework, the European Union’s General Data Protection Regulation (GDPR), will set you up for compliance with more forgiving laws. It’s almost always better to over-comply with a law than to under-comply. And as privacy laws and regulations around the world harmonize, the GDPR is rapidly becoming the gold standard to emulate. One GDPR principle requires that you conduct a data protection impact assessment before engaging in any high-risk processing activities. Several government agencies have developed ‘how-to’ guides for completing such an assessment, and these are handy for identifying the privacy risks your products pose. At the end of the day, your system won’t be privacy-friendly just because you secure its data well; it will be privacy-friendly if you understand what can go wrong and mitigate the most likely harmful scenarios. And because history so often repeats itself, the most common harmful scenarios are already known, and thus preventable.

Ayden Férdeline is a public interest technologist based in Berlin, Germany. He was previously a technology policy fellow with the Mozilla Foundation, where he researched the ongoing development and harmonization of privacy and data protection laws around the world. As an independent researcher, he has studied the impact of the internet on society for Coworker.org, the National Democratic Institute, the National Endowment for Democracy, and YouGov. He is an alumnus of the London School of Economics.
