Say that you want to develop a new payment gateway for your online store. A team of developers is hired, the improvements you want are designed, and the new system is built. Finally, you need to test it all, both to ensure that the improvements work the way you intended and to ensure that all the existing payment information is still handled correctly.
What data do you use to test the system?
According to a recent report by the Ponemon Institute, 80% of companies use a copy of their ‘live’ production data. By ‘production data’ I mean they take real customer records and real credit card details, and give them to the developers. The developers run all the tests they want, send tests offshore and, once the system is working to an acceptable standard, deploy it and sign off. The test data is usually erased.
It seems obvious that this is risky, but given that 80% of companies are doing it, perhaps it isn’t obvious at all. Given the amount of money most organizations spend to secure their live environments from external threats, it is puzzling that the same companies will take direct copies of those systems and allow them to be used in non-secure environments. Imagine your personal banking information, or mine, being shared amongst numerous people in test and development teams. Scary.
You’d hope that all of your developers are competent, friendly, moral people, but you can’t guarantee that. A disgruntled developer could use the data to severely damage your company’s reputation and public image. Maybe that’s not very likely – you look after your developers, of course, and ensure that they don’t become so antagonistic – but what is far more likely is that somebody makes a mistake. Maybe somebody accidentally places a real charge on the credit cards, rather than just a virtual one. Maybe somebody copies the data onto a laptop to work on during the commute, and the laptop is stolen. No matter how good your developers are, everybody has a bad day occasionally.
However, while you might be willing to take the risk, the law is not willing to let you.
There are a number of regulations governing the way that personal information is handled. Payment card information, for example, is governed by the PCI DSS (Payment Card Industry Data Security Standard), which dictates the security measures and policies that must be in place, such as encrypting all card data sent across public networks, and having firewalls and up-to-date antivirus software installed. Other standards include the USA’s GLBA for financial data, HIPAA for healthcare information, the UK’s Data Protection Act, the European Data Protection Directive, and many others. A common theme across all of them is the principle that data should be kept on a ‘need-to-know’ basis.
Still, maybe you think you can justify that the developers ‘need to know’ the data, because the systems need to be tested. Even if you successfully argue that, you’ve then got another problem: you now need to take measures to protect the data on the developers’ machines, just as you do for your production database servers. Just because an environment is ‘non-production’ doesn’t mean it’s exempt from the regulations.
What does that mean in practice? If you’re complying with HIPAA, you have to keep the development offices physically secure, with full sign-in and sign-out logs for developers (HIPAA §164.310), and provide and maintain a full training program to ensure developers are using the data appropriately (HIPAA §164.308(5)(i)). The PCI DSS requires that your developers’ access to the data be fully audited (PCI DSS v2 10.2), and that the software they’re developing be kept secure (PCI DSS v2 6.3), even while it’s still in development. The UK Data Protection Act states that data may only be used for the specific purposes for which it was collected (DPA98 Sch1 I.2), so unless you told your users at the time of collection that their data would be used for testing, using it that way is a DPA violation.
In short: unless you’re taking lengthy and expensive measures to ensure that your development and testing environment is just as secure as your production environment, then it’s not legal to use production data in development and testing.
What can be done about this? You’ve got to test with some kind of data.
The most popular approach is data masking. Data masking takes a copy of your live production data and then de-identifies the sensitive content. The masked data no longer contains sensitive information, so it falls outside the regulations and can be freely shared with developers. A minimal sketch of the technique is shown below.
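To make the idea concrete, here is an illustrative sketch in Python of masking a single customer record. All the field names and helpers are invented for this example; a real masking tool would also have to preserve referential integrity across tables and produce checksum-valid card numbers, neither of which this toy version attempts.

    import random

    FAKE_NAMES = ["Alice Moore", "Ben Okafor", "Carla Reyes", "Dev Patel"]

    def mask_card_number(card):
        # Replace all but the last four digits with random digits, preserving
        # the original spacing. (The result won't generally pass a Luhn check;
        # real masking tools fix the checksum up afterwards.)
        digits = [c for c in card if c.isdigit()]
        masked = [str(random.randint(0, 9)) for _ in digits[:-4]] + digits[-4:]
        it = iter(masked)
        return "".join(next(it) if c.isdigit() else c for c in card)

    def mask_record(record):
        masked = dict(record)
        masked["name"] = random.choice(FAKE_NAMES)                  # substitution
        masked["card_number"] = mask_card_number(record["card_number"])
        masked["email"] = "user%d@example.com" % random.randint(1000, 9999)
        return masked

    row = {"name": "John Smith",
           "card_number": "4111 1111 1111 1111",
           "email": "john.smith@realmail.example"}
    print(mask_record(row))

The mechanical substitution is the easy part; the hard part, as discussed next, is ensuring that the fields you leave intact can’t be combined to re-identify the original customer.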
Data masking is quick and fairly easy to understand, which is why it’s a popular method. However, it’s not without a fair number of problems, foremost of which is that successfully masking data to the point that sensitive information can’t be inferred or deduced is often extremely hard. Conversely, once highly sensitive data is masked so thoroughly that it can never be traced back to individuals or reverse-engineered, it is often no longer realistic enough to be useful for testing. For this reason, there is an alternative called test data creation: the automated generation of completely synthetic data, which can mimic your production data but is not derived from it, making it free from regulation. Again, a sketch follows.
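As before, this is only an illustrative sketch with invented names and formats, not a description of any particular tool. The point is that every record is built from scratch, so nothing traces back to a real customer, while the field formats still match production – here the card numbers are even generated to be Luhn-valid, so any validation logic in the system under test behaves as it would against real data.

    import random
    import string

    def luhn_check_digit(partial):
        # Compute the digit that makes partial + digit pass the Luhn check.
        total = 0
        for i, ch in enumerate(reversed(partial)):
            d = int(ch)
            if i % 2 == 0:  # these positions are doubled once the check digit is appended
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return str((10 - total % 10) % 10)

    def synthetic_card_number(prefix="4"):
        body = prefix + "".join(random.choice(string.digits) for _ in range(14))
        return body + luhn_check_digit(body)

    def synthetic_customer(i):
        first = random.choice(["Ada", "Grace", "Alan", "Edsger"])
        last = random.choice(["Lovelace", "Hopper", "Turing", "Dijkstra"])
        return {"id": i,
                "name": "%s %s" % (first, last),
                "email": "%s.%s.%d@example.test" % (first.lower(), last.lower(), i),
                "card_number": synthetic_card_number()}

    for row in (synthetic_customer(i) for i in range(5)):
        print(row)

A production-grade tool would generate against the full database schema, maintaining foreign-key relationships and realistic value distributions, but the principle is the same: sensitive data never enters the test environment in the first place.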
In conclusion: using live data in non-production environments is either illegal or expensive. For the companies using it illegally, it’s only a matter of time before somebody slips up and the practice is discovered. The companies paying extra to keep their developers compliant will find that the overhead slows new development, leaving them open to being undercut by competitors who handle their data more strategically. In the long run, the small benefit is just not worth the risk.
About Richard Fine
Richard Fine is a Technical Writer for Grid-Tools Ltd. He graduated from Oxford University in 2009 with a Master’s degree in Computer Science. A programmer since the age of 4, he has extensive experience working with real-time interactive simulation systems, as well as a range of web technologies. He has received Microsoft’s MVP Award five times for his contributions to online communities and IT education.