The new European GDPR (General Data Protection Regulation) will be enforced from May 2018, and companies have started preparing for it.
In a nutshell, GDPR will:
- Require you to be more open about how you store and process user data
- Expand the definition of such data
- Set heavy sanctions for companies that do not comply with the regulation
The preparation process - seminars, workshops and such - is run by privacy officers and privacy lawyers. This is understandable, but it has one major flaw: everything is based on documentation.
Documentation is never a detailed map of a system. It is always an executive summary, never detailed enough to fully describe the system from one specific point of view. In this case we're interested in the data: how it is stored, where, based on what thresholds, and how we can remove it. No document describes that unless it was written exactly for this purpose.
Generic technical documentation rarely describes how data flows through the system, or the different ways the data is processed, copied, cached, and removed. It's usually written from the system's perspective, not the data's. And in a lot of cases the system is far too complicated for a single person to know in detail. The chief architect, the service owner, or someone in a similar role knows how the system works at a high level, but not the details.
Why are we, the technical experts, silent about what actually doesn’t match the documentation? Because that's how things work day to day. We don't go into details, known or unknown, with non-technical people.
If you ask, "Where do we store the user's data?", we hear, "What's the master data storage of the user data?" Just to keep things simple. The typical answer would be: "We store everything in the CRM." If the data is in the CRM, we can integrate the CRM with other systems, we can edit the data in the CRM, and we can trust the CRM to have up-to-date information at all times. For most purposes, the data truly is only in the CRM.
What we didn't tell you is that the user data, in whole or in part, is stored in a lot of other places, too. They're just not places that are relevant in a lot of situations.
"The user data is also cached in the form submissions table of the self-service platform. And in the caches of the system. And in a log file that audits all changes to the user data. It's also stored in the database of the self-service platform, then replicated to three different continents for redundancy and fast access. Then backed up. Then it's moved to the Enterprise Service Bus, a message queue, in the cloud. It's probably stored there, too; we have no way of knowing. From there it'll go to the CRM, which is the data master. And to the CRM's backups. The CRM indexes all the data to a cloud indexing service we use as the search backend."
Can the law really require us to explain and control all that? I have no idea. But if there's a law saying we need to describe where we store the users' data, shouldn't we tell where we knowingly store it? What happens if someone wants to use their right to be forgotten? Is it enough that we remove it from one place but leave it in the other 15?
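To make the gap concrete, here is a minimal sketch of what "we removed it from the CRM" leaves behind. The store names paraphrase the example answer above, and `STORES` and `remaining_copies` are hypothetical names made up for this illustration; a real inventory would look different for every system.

```python
# Hypothetical inventory; the names paraphrase the example answer above.
STORES = [
    "self-service form submissions table",
    "system caches",
    "audit log",
    "self-service platform database",
    "database replicas",
    "database backups",
    "enterprise service bus",
    "CRM",
    "CRM backups",
    "cloud search index",
]


def remaining_copies(stores, erased_from):
    """Return the stores that still hold the user's data after an erasure."""
    erased = set(erased_from)
    unknown = erased - set(stores)
    if unknown:
        # Erasing from a store we never listed means the inventory is wrong.
        raise ValueError(f"not in inventory: {sorted(unknown)}")
    return [store for store in stores if store not in erased]


# The "documented" answer: the data lives in the CRM, so we delete it there.
left = remaining_copies(STORES, erased_from=["CRM"])
print(f"{len(left)} of {len(STORES)} stores still hold the data")
for store in left:
    print(" -", store)
```

Even in this simplified list, deleting from the master leaves nine other copies untouched.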
In my opinion, all data controllers should map out the actual use of user data in detail during 2017. External consultants can help, where needed, to get to the bottom of it. Ask the technical staff to think about removal, portability and aggregation of the data. Tell them what the GDPR requires.
That's the only way to be prepared for what you'll have to change in 2018.
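As a rough starting point for that mapping work, the data map can be a simple list where every store has an explicit "we haven't checked yet" state. This is only a sketch: `DataMapEntry`, `open_questions` and the sample values are assumptions invented for illustration, not a GDPR-mandated format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataMapEntry:
    store: str                          # where the user data lives
    can_remove: Optional[bool] = None   # None means nobody has checked yet
    can_export: Optional[bool] = None   # needed for data portability


def open_questions(entries):
    """List (store, capability) pairs that still need an answer."""
    questions = []
    for entry in entries:
        if entry.can_remove is None:
            questions.append((entry.store, "removal"))
        if entry.can_export is None:
            questions.append((entry.store, "portability"))
    return questions


# Illustrative values only; real answers must come from the technical staff.
data_map = [
    DataMapEntry("CRM", can_remove=True, can_export=True),
    DataMapEntry("audit log"),
    DataMapEntry("enterprise service bus", can_remove=None, can_export=False),
]

for store, capability in open_questions(data_map):
    print(f"TODO: find out whether {capability} is possible in: {store}")
```

Keeping "unknown" separate from "no" matters here: an unanswered question is work for 2017, while a "no" is something you'll have to change.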
Kalle Varisvirta, Technology Director