Data protection and anonymisation procedures
More data protection through anonymisation
Anonymisation is one of the most important measures for protecting personal data, because it removes any direct or indirect reference to an individual. Recital 26 of the GDPR states: "The principles of data protection should [...] not apply to anonymous information, i.e. information which does not relate to an identified or identifiable natural person, or personal data which has been rendered anonymous in such a way that the data subject cannot or can no longer be identified". At the same time, the data should remain useful for analysis despite the anonymisation.
Anonymisation removes identifying information such as names, telephone numbers and e-mail addresses, or modifies records to make them less precise, for example by adding "noise" to the data. A distinction is made between absolutely anonymous and factually anonymous data.
Absolutely anonymous data have been modified by coarsening and removing features to such an extent that identification is impossible. Data are referred to as factually anonymous if de-anonymisation cannot be completely ruled out, but the data can "only be allocated with a disproportionate expenditure of time, cost and labour", as defined in Section 16 (6) of the Federal Statistics Act. Under this provision, however, such data may only be passed on for scientific projects and to scientific institutions.
Many procedures are not user-friendly
There are several anonymisation procedures. Common are procedures for reducing information (for example, aggregation and class formation) and for changing information (for example, the swapping procedure). However, the poor usability of many anonymisation solutions on the Internet is a problem. Privacy-friendly techniques will only conquer the mass market if they are enabled by default and do not noticeably degrade service quality, for instance through higher latency or lower bandwidth.
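The two information-reduction procedures mentioned above can be sketched in a few lines. This is a minimal illustration, not a specific tool's API; the function names and sample records are invented for the example.

```python
# Illustrative sketch of two information-reduction techniques:
# class formation (coarsening a value into a band) and aggregation
# (publishing only a group statistic instead of individual values).

def generalise_age(age: int, width: int = 10) -> str:
    """Class formation: replace an exact age with an age band."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def aggregate_incomes(records: list) -> float:
    """Aggregation: release only the group mean, never single incomes."""
    return sum(r["income"] for r in records) / len(records)

records = [
    {"age": 34, "income": 41000},
    {"age": 37, "income": 45000},
    {"age": 52, "income": 58000},
]

print([generalise_age(r["age"]) for r in records])  # ['30-39', '30-39', '50-59']
print(aggregate_incomes(records))                   # 48000.0
```

The coarsened ages can no longer be traced to a single year of birth, and the published average reveals no individual income.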
A special form of anonymisation is data masking. It preserves the format of the data but changes its values to prevent the identification of persons. The data is either distorted or replaced by fictitious data. Alternatively, data can be truncated so that it no longer reveals anything about specific persons. There is static masking, which removes the personal reference from stored data, and dynamic masking, which changes data almost in real time so that no personal data is stored at all.
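A format-preserving static-masking step might look like the following sketch. The function is hypothetical and kept deliberately simple: each digit and letter is replaced by a random one of the same kind, while separators survive, so the value keeps its shape but loses its personal reference.

```python
import random

# Hypothetical static-masking sketch: each character is replaced by a
# fictitious character of the same class (digit -> digit, letter ->
# letter, case preserved), so the format of the value is retained.

def mask_value(value: str, rng: random.Random) -> str:
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(str(rng.randrange(10)))
        elif ch.isalpha():
            repl = chr(rng.randrange(26) + ord("a"))
            out.append(repl.upper() if ch.isupper() else repl)
        else:
            out.append(ch)  # keep separators such as '-', ' ' or '@'
    return "".join(out)

rng = random.Random(42)
print(mask_value("+49-30-1234567", rng))  # same pattern, different digits
print(mask_value("Anna Schmidt", rng))    # same shape, fictitious name
```

Because the format is preserved, masked data can still be used to test software or validate input formats without exposing real persons.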
Various tools are available, especially for the subsequent anonymisation of existing databases. However, it is better to avoid collecting personal data in the first place wherever possible. In market research this is usually feasible, because it is not the individual consumer but groups of consumers that are being studied. "Especially the development of anonymisation and pseudonymisation procedures as privacy-by-default solutions represents an important contribution to the protection of data privacy", said the Federal Data Protection Commissioner Andrea Voßhoff.
Anonymisation can be cracked
However, even anonymisation offers no absolute guarantee of privacy. Researchers at Imperial College London and the Université catholique de Louvain have developed a machine-learning model that estimates how easily individuals can be re-identified from an anonymised data set containing postcode, gender and date of birth. The study shows how far anonymisation techniques lag behind our ability to break them.
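Why these three attributes suffice is easy to demonstrate. The following toy example (with invented records) counts how many combinations of postcode, gender and date of birth occur only once; a unique combination pins down one person even though no name is stored.

```python
from collections import Counter

# Toy re-identification check: even without names, the combination of
# postcode, gender and date of birth is unique for most people.
records = [
    ("10115", "f", "1984-03-02"),
    ("10115", "m", "1990-07-19"),
    ("10117", "f", "1984-03-02"),
    ("10115", "m", "1990-07-19"),
]

counts = Counter(records)
unique = [combo for combo, n in counts.items() if n == 1]
print(len(unique), "of", len(counts), "combinations are unique")
# → 2 of 3 combinations are unique
```

In real population data the share of unique combinations is far higher than in this tiny sample, which is precisely the risk the study quantifies.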
Companies are therefore advised to use differential privacy. This is a complex mathematical model that allows organisations to exchange aggregated data about user habits while protecting an individual's identity. Differential privacy aims to keep the answers to database queries as accurate as possible without revealing which records were used to answer them. Attackers then cannot determine whether a particular person is contained in the database. This technique will undergo a first major test in 2020, when it will be used to secure the US census database.
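At its core, differential privacy adds carefully calibrated random noise to query answers. The sketch below shows the classic Laplace mechanism for a counting query; it is a minimal illustration (the function names and the epsilon value are chosen for the example), not the mechanism actually deployed for the census.

```python
import math
import random

# Minimal differential-privacy sketch (Laplace mechanism): a count
# query is answered with noise scaled to sensitivity/epsilon, so the
# presence or absence of any single person changes the answer
# distribution only slightly.

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = rng.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one person
    changes the true answer by at most 1.
    """
    scale = 1.0 / epsilon
    return true_count + laplace_noise(scale, rng)

rng = random.Random(0)
print(private_count(1000, epsilon=0.5, rng=rng))  # noisy answer near 1000
```

A smaller epsilon means more noise and stronger privacy; repeated queries consume the privacy budget, which is why real deployments track the total epsilon spent.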