University of Valencia, Privacy and Digital Transformation Chair Microsoft-UV

Anonymization & Pseudonymization: Legitimate Interests as legal basis for research activity

  • July 7th, 2022
Research

Professor Ricard Martinez sets out the European legal framework for protecting personal data in healthcare research and highlights the requirements for complying with it.

The asymmetries between the rules applicable in the countries of origin of the data create problems that can only be solved by anonymization. Moreover, while the standards require a type of anonymization determined by the risk of re-identification through reasonable effort, the data protection authorities have been requiring irreversible anonymization since the Article 29 Working Party’s Opinion 05/2014.

Thus, over the last two years the use of data analytics methodologies for health research has evolved towards well-defined strategies common to the countries of the European Union: first, anonymization and, second, the construction of federated data spaces where processing is governed by national law.

As you know, the General Data Protection Regulation (GDPR) considers that health research represents a clear public interest and should be promoted. To this end, it provides legal bases legitimizing this processing (Articles 6, 9(2)(j) and 89). It also treats research uses as compatible with the original purpose of the data (Article 5(1)(b)). Finally, it requires regulatory development in European Union or Member State law, covering both the grounds of legitimacy for processing and the conditions for processing that provide adequate safeguards (Articles 6, 9(2)(j) and 89).

Indeed, Article 89(1) expresses a clear preference regarding the identifiability of the data. Anonymization should be the rule and pseudonymization the safe alternative where anonymization proves impossible. Moreover, pseudonymization is conceived as necessary by default and as an adequate safeguard.

 

Risk-based approach

As noted above, the reference framework on anonymization was defined by the Article 29 Working Party. Opinion 05/2014 on anonymization techniques takes a risk-based approach focused on the risk of re-identification through singling out, linkability and inference. In summary, the Working Party considers that:

  1. Anonymization is itself a processing operation. Therefore, personal data must have been collected and processed in compliance with the applicable legislation. This implies the need to obtain verifiable evidence of the legitimate origin of the data, respect for the patients' right to transparency and appropriate ethical and legal approvals for the processing of the data.
  2. Anonymization is a technique applied to personal data to achieve irreversible de-identification.
  3. The concept of reasonable effort on the part of the controller or any third party requires consideration of:
    1. The technical means available considering possible changes in the evolution of information technologies.
    2. Ensuring the maximum level of aggregation of data.
    3. Third parties should analyze the risks of re-identification. This implies that when a research project receives anonymized data it should implement an additional verification process aimed at analyzing the risks of singling out, linking and inference. 
  4. Anonymization does not imply the absence of rights of the persons concerned. 

In our view, this may imply additional obligations such as providing traceability of the use of anonymized datasets so that we are able to identify their origin, the conditions and purposes of use of the data, the users of the data repository and the actions executed. 
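As a rough illustration of what such traceability could look like, the following sketch keeps an append-only record of who used which dataset, for what purpose and how. The `AccessEvent` fields and the `AuditLog` class are hypothetical names chosen for this example; they are not taken from the Opinion or from any regulation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One traceable action on an anonymized dataset (hypothetical schema)."""
    dataset_id: str   # which dataset was used
    origin: str       # where the dataset came from
    user: str         # registered user of the repository
    purpose: str      # declared purpose of use
    action: str       # action executed, e.g. "query" or "download"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditLog:
    """Append-only log: events can be added and read, not modified."""

    def __init__(self) -> None:
        self._events: list[AccessEvent] = []

    def record(self, event: AccessEvent) -> None:
        self._events.append(event)

    def history(self, dataset_id: str) -> list[AccessEvent]:
        """Return every recorded action for one dataset, oldest first."""
        return [e for e in self._events if e.dataset_id == dataset_id]


log = AuditLog()
log.record(AccessEvent("ds-001", "Hospital A", "researcher-17",
                       "retrospective oncology study", "query"))
log.record(AccessEvent("ds-002", "Hospital B", "researcher-17",
                       "retrospective oncology study", "download"))
for event in log.history("ds-001"):
    print(event.user, event.action, event.purpose)
```

A real repository would persist these events in tamper-evident storage; the in-memory list is only meant to show which pieces of information need to be captured.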

 

Permanent anonymization

 

Finally, the document takes Directive 2002/58/EC as a reference and goes so far as to state the following: “The underlying rationale is that the outcome of anonymization as a technique applied to personal data should be, in the current state of technology, as permanent as erasure, i.e., making it impossible to process personal data.”

To achieve this goal, the Article 29 Working Party proposes a combined strategy of randomization, generalization, pseudonymization, differential privacy, diversity and proximity techniques. For their part, some data protection authorities have issued guidelines that provide criteria in organizational as well as technical terms. For example, the Spanish Data Protection Agency recommends a double-layer anonymization process with independent teams and with a clear segmentation of tasks and responsibilities in terms of decision-making, risk analysis or security.
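To make the idea of generalization and the associated singling-out risk more concrete, here is a minimal sketch. It coarsens two quasi-identifiers and then measures the size of the smallest group of records sharing the same values. The toy records, the choice of quasi-identifiers and the function names are assumptions made for illustration only; the Opinion does not prescribe any particular implementation.

```python
from collections import Counter

# Toy records; all values are invented for illustration.
records = [
    {"age": 34, "postcode": "46010", "diagnosis": "asthma"},
    {"age": 36, "postcode": "46011", "diagnosis": "diabetes"},
    {"age": 38, "postcode": "46015", "diagnosis": "asthma"},
    {"age": 71, "postcode": "46980", "diagnosis": "rare disease X"},
]


def generalize(record: dict) -> dict:
    """Coarsen quasi-identifiers: 10-year age bands, 3-digit postcode prefix."""
    decade = record["age"] // 10 * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "postcode_prefix": record["postcode"][:3],
        "diagnosis": record["diagnosis"],
    }


def smallest_group(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one person can still be singled out.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


generalized = [generalize(r) for r in records]
k = smallest_group(generalized, ["age_band", "postcode_prefix"])
print("smallest group size:", k)
```

In this toy dataset the fourth record remains alone in its group even after generalization, which is the kind of residual risk that a second review layer would be expected to catch.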

But what scenario lies behind the Proposal for a Regulation of the European Parliament and of the Council on the European Health Data Space (EHDS)? The Proposal refers to anonymization and pseudonymization at several points. It designs a system based on the creation of one or more national health data access bodies that would operate as an intermediary and control agent in relation to secondary uses of data.

 

Processing of special categories of data

 

In its Recital (45), the EHDS Proposal states two basic principles for the processing of special categories of data: “privacy by design” and “bring questions to data instead of moving data” should be respected whenever possible.

These principles should be considered together with the second and third paragraphs of Art. 44 of the Proposal. This rule proposes an approach consistent with Article 89 of the GDPR. It prioritizes as a first option the anonymization of electronic health data where it is possible to achieve the purpose of the processing requested by the user. Failing this, health data access bodies shall provide access to electronic health data in pseudonymized format. 

Moreover, the Proposal adopts a strategy identical to that of the seventeenth additional provision of the Spanish law, on health data processing. Like the Spanish law, it defines a scenario of functional separation between those who hold the identifiers and those who research with the data, and a commitment to non-re-identification for the latter, since “the information necessary to reverse the pseudonymization will only be available to the health data access body” and “data users will not re-identify the electronic health data provided to them” in this format.

Finally, where data are to be requested in pseudonymized format, an ethical assessment may be required in addition to a justification of the reasons for the request.

The EHDS Proposal is consistent with the criteria of Opinion 05/2014 and the GDPR. Particularly instructive in this respect is Recital (46), from which principles common to both can be drawn:

  1. Secondary use of data shall require the invocation of a legitimate basis for processing founded on the GDPR as well as compliance with the principles of Article 5 of the GDPR (purpose limitation, data minimization and security).
  2. The use of anonymized data is preferred and if pseudonymized data is required the data requester should explain why this is necessary and why anonymized data would not be sufficient. When the applicant needs anonymized statistical data, it should submit a data request application, requiring the health data access body to directly provide the result. 
     

 


Pseudonymization as an alternative

Recital (60) completes the EHDS Proposal's approach to anonymization by considering not only the risk of re-identification but also the fact that “certain categories of electronic health data can remain particularly sensitive even when they are in anonymized format and thus non-personal”. In practice, this recognizes that irreversible anonymization is impossible in some cases, such as rare diseases, where there will always be a residual risk of re-identification.

Article 44(3) of the EHDS Proposal provides for pseudonymization as an alternative when anonymization is not feasible. Where the purpose of the data user’s processing cannot be achieved with anonymized data, taking into account the information provided by the data user, the health data access bodies shall provide access to electronic health data in pseudonymized format. The information necessary to reverse the pseudonymization shall be available only to the health data access body. Data users shall not re-identify the electronic health data provided to them in pseudonymized format. The data user’s failure to respect the health data access body’s measures ensuring pseudonymization shall be subject to appropriate penalties.
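The functional separation described here can be sketched as follows. This is a minimal illustration, assuming keyed hashing (HMAC-SHA256) plus a lookup table held by the access body; the `HealthDataAccessBody` class, its methods and the sample identifiers are hypothetical, and the Proposal does not mandate any specific technique.

```python
import hashlib
import hmac
import secrets


class HealthDataAccessBody:
    """Holds the secret key and the re-identification table (hypothetical)."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # never shared with data users
        self._reverse: dict[str, str] = {}    # pseudonym -> patient id

    def pseudonymize(self, patient_id: str) -> str:
        digest = hmac.new(self._key, patient_id.encode(), hashlib.sha256)
        pseudonym = digest.hexdigest()[:16]
        self._reverse[pseudonym] = patient_id
        return pseudonym

    def release(self, rows: list[dict]) -> list[dict]:
        """Return copies of the rows with the identifier replaced."""
        return [
            {"pseudonym": self.pseudonymize(r["patient_id"]),
             "diagnosis": r["diagnosis"]}
            for r in rows
        ]

    def reidentify(self, pseudonym: str) -> str:
        """Reversal is possible only here, where key and table are held."""
        return self._reverse[pseudonym]


body = HealthDataAccessBody()
released = body.release([
    {"patient_id": "ES-0001", "diagnosis": "asthma"},
    {"patient_id": "ES-0001", "diagnosis": "hypertension"},
    {"patient_id": "ES-0002", "diagnosis": "asthma"},
])
# The data user sees only `released`: the same patient keeps the same
# pseudonym, but neither the key nor the reverse table is included.
print(released)
```

The data user can still link records belonging to the same person, which is what makes the data pseudonymized rather than anonymized; the non-re-identification commitment remains a legal obligation that code alone cannot enforce.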

Finally, the requirements set out in Article 45 of the EHDS Proposal should be considered when requesting access to data. The application includes:

  • a detailed explanation of the intended use of the electronic health data, including for which of the purposes referred to in Article 34(1) access is sought;
  • a description of the requested electronic health data, their format and data sources, where possible, including geographical coverage where data is requested from several Member States;
  • an indication whether electronic health data should be made available in an anonymized format;
  • where applicable, an explanation of the reasons for seeking access to electronic health data in a pseudonymized format;
  • a description of the safeguards planned to prevent any other use of the electronic health data;
  • a description of the safeguards planned to protect the rights and interests of the data holder and of the natural persons concerned;
  • an estimation of the period during which the electronic health data is needed for processing;
  • a description of the tools and computing resources needed for a secure environment.
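The items above could be captured in a simple structure that checks an application for completeness before it is submitted. The sketch below is only an illustration: the `DataAccessApplication` class, its field names and the validation rules are assumptions, not a format defined by the Proposal.

```python
from dataclasses import dataclass


@dataclass
class DataAccessApplication:
    """Hypothetical structure mirroring the items listed above."""
    intended_use: str
    data_description: str
    anonymized_format: bool
    pseudonymization_reasons: str   # may be empty if anonymized data suffices
    safeguards_against_other_use: str
    safeguards_for_rights: str
    period_months: int
    computing_resources: str

    def problems(self) -> list[str]:
        """Return a list of problems; an empty list means complete."""
        found = []
        text_fields = {
            "intended use": self.intended_use,
            "data description": self.data_description,
            "safeguards against other use": self.safeguards_against_other_use,
            "safeguards for rights and interests": self.safeguards_for_rights,
            "tools and computing resources": self.computing_resources,
        }
        for label, value in text_fields.items():
            if not value.strip():
                found.append(f"missing: {label}")
        if not self.anonymized_format and not self.pseudonymization_reasons.strip():
            found.append("pseudonymized format requested without reasons")
        if self.period_months <= 0:
            found.append("estimated period must be positive")
        return found


application = DataAccessApplication(
    intended_use="retrospective study on asthma readmissions",
    data_description="discharge records, CSV, two Member States",
    anonymized_format=False,
    pseudonymization_reasons="",
    safeguards_against_other_use="access restricted to the project team",
    safeguards_for_rights="no re-identification attempts; audit logging",
    period_months=18,
    computing_resources="secure processing environment with Python",
)
print(application.problems())
```

The one rule with legal weight in this sketch is that a request for pseudonymized data must come with an explanation of why anonymized data would not be sufficient.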

 


Risks in the data anonymization process

Today, anonymization of the data is the most common technique for the development of retrospective studies and for the creation of data repositories. Its scope is more limited in prospective studies, for which it is not unusual for national laws to require consent as the basis for legitimizing the data processing.

This has also led to the implementation of very precise privacy safeguards. Thus, from the point of view of the origin of data collection, research projects are usually required to accredit the legitimate origin of the data; to provide evidence of how the project obtained it, either by declaratory acts of the data provider, by accreditation of the terms of use of the source open data environment, or by a data sharing agreement; and, finally, to have an ethics approval statement issued by an ethics committee accredited under national law.

However, this is not sufficient and, from the point of view of anonymization, different strategies are adopted:

  1. Applying risk-focused methodologies by reviewing the anonymized datasets that are delivered. In practice this involves incorporating two or even three layers of anonymization. 
  2. Building data spaces that, following a controlled open data philosophy, include legal and technological measures. The legal measures include the signing of data sharing agreements, acceptance of terms and conditions and the assumption of non-re-identification commitments. From a security point of view, the user must be registered and his or her actions in the repository must be traceable.
  3. The adoption of increasingly sophisticated anonymization techniques such as differential privacy techniques, Multi-Party Computation and/or synthetic data generation. To some extent, the use of encryption techniques is a common element in many of these techniques.
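As an illustration of the differential privacy techniques mentioned in the list above, the following sketch applies the Laplace mechanism to a counting query. It is a teaching example under stated assumptions: the function names are hypothetical, the counting query is assumed to have sensitivity 1, and Python's `random` module is used for simplicity even though a production system would need a cryptographically secure noise source and protection against floating-point attacks.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale 1 / epsilon.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so this scale gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random()          # NOT cryptographically secure; sketch only
patients_with_condition = 128  # invented figure for illustration
print(noisy_count(patients_with_condition, epsilon=0.5, rng=rng))
```

A smaller `epsilon` means more noise and stronger protection, at the cost of less accurate statistics; choosing it is a policy decision as much as a technical one.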

 


The new legal framework

The entry into force of the Data Governance Act and the forthcoming EHDS Proposal raise significant questions. The former foresees the promotion of data reuse through the principle of "open by design and by default", promoting the creation and collection of data in formats and structures that facilitate anonymization. In the public health system this will be realized in each state through “one or more competent bodies”, which may be sectoral, to support the public sector bodies that grant access to the re-use of data.

They will provide support by:

  • providing technical support by making available a secure processing environment for providing access for the re-use of data;
  • providing guidance and technical support on how to best structure and store data to make data easily accessible;
  • providing technical support for pseudonymization and ensuring data processing in a manner that effectively preserves the privacy, confidentiality, integrity and accessibility of the information contained in the data for which re-use is allowed, including techniques for the anonymization, generalization, suppression, randomization of personal data or other state-of-the-art privacy preserving methods, and the deletion of commercially confidential information, including trade secrets or content protected by intellectual property rights;
  • where relevant, assisting the public sector bodies to provide assistance to re-users in requesting consent for re-use from data subjects or permission from data holders in line with their specific decisions, including on the jurisdiction or jurisdictions in which the data processing is intended to take place, and assisting the public sector bodies in establishing technical mechanisms that allow the transmission of requests for consent from re-users, where practically feasible;
  • providing public sector bodies with assistance on the adequacy of undertakings made by a re-user, in the case of transmission of non-personal confidential data or data protected by intellectual property rights to a re-user which intends to transfer those data to a third country.

On the other hand, in the case of health, the EHDS Proposal assigns the supporting infrastructure to national health data access bodies. Finally, it becomes clear that healthcare systems and hospitals should be considered as data holders, and a similar position could be considered for the data repositories that are being created in multi-partner research projects. Data holders will undoubtedly have the task of anonymizing the datasets, cataloguing them appropriately and providing trustworthy conditions.

However, the risk inherent in any data anonymization process leads us to conclude that there is only one way out. Given the need to achieve and demonstrate a level of irreversible anonymization equivalent to erasure, there is no other recourse but to use pseudonymization strategies. Even when data sets are anonymized, the control, security and traceability techniques that apply to pseudonymized data should be applied. 

 



 

 
