Research participants must identify which data sets constitute personal data to ensure compliance with the GDPR.

By Frances Stocks Allen and Mihail Krepchev

The UK Medical Research Council (MRC) has published a useful guidance note on the identifiability, anonymisation, and pseudonymisation of personal data in the context of research activities (the Guidance). The Guidance reminds research organisations that the General Data Protection Regulation (GDPR) applies to health data used in research and contains a number of recommendations that participants in the research process, particularly clinical trial sponsors, should bear in mind. The Guidance has been developed with the participation of the UK privacy regulator, the Information Commissioner’s Office (ICO).

Identifiability of personal data is a continuum

The Guidance reiterates that identifiability is a continuum. Fully identifiable data, e.g., data including a person’s name, sits at one end of the continuum, whereas fully anonymised data, i.e., data from which it would be impossible to identify an individual, sits at the other. Key-coded (or, in the terminology of the GDPR, “pseudonymised”) data, which is commonly used in many research contexts, sits between fully identifiable data and fully anonymised data. Clinical trial sponsors need to be in a position to decide where each data set they process falls on this continuum in order to ensure they apply the GDPR correctly. As explained below, context is crucial in this determination.

By contrast, sponsors are reminded that whether the GDPR applies is a binary question: GDPR obligations apply whenever “personal data” is processed, and personal data may include key-coded data. The Guidance reminds organisations that the likelihood of identifiability is judged from the perspective of a hypothetical person “who may be more motivated than most, using all means that might reasonably be available to them” to identify individuals.

Anonymisation is context-specific and considers what the recipient of the data has available to them

Where data falls on the identifiability continuum depends on both the content and form of the data and, crucially, the context in which the data is processed or shared.

The Guidance refers to information being “anonymous to” a recipient. This expression highlights that the same data set may be anonymous in the hands of one party, but identifiable in the hands of another party. For example, a key-coded patient list might be anonymous in the hands of an unrelated third party that has no access to the key, but may be identifiable in the hands of the treating physician.
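To make the key-coded patient list example concrete, the following is a minimal sketch of key-coding in Python. It is an illustration only and is not taken from the Guidance: the function names (`key_code`, `reidentify`), the `name` and `subject_code` fields, and the record layout are all hypothetical. It shows that a holder of the key can re-identify a coded row, while a recipient holding only the coded rows cannot do so from the code alone.

```python
import secrets


def key_code(records, id_field="name"):
    """Split records into a key table and a key-coded data set.

    Hypothetical helper for illustration; not a method prescribed by the Guidance.
    """
    key = {}          # code -> original identifier (kept by the key holder only)
    codes_by_id = {}  # original identifier -> code (so one person gets one code)
    coded = []        # rows with the direct identifier replaced by a random code
    for record in records:
        identifier = record[id_field]
        if identifier not in codes_by_id:
            code = secrets.token_hex(8)  # random code, not derived from the identifier
            codes_by_id[identifier] = code
            key[code] = identifier
        row = {k: v for k, v in record.items() if k != id_field}
        row["subject_code"] = codes_by_id[identifier]
        coded.append(row)
    return key, coded


def reidentify(coded_row, key):
    """Return the original identifier if the key contains the row's code, else None."""
    return key.get(coded_row["subject_code"])


records = [
    {"name": "Alice Example", "result": 5.1},
    {"name": "Bob Example", "result": 6.3},
]
key, coded = key_code(records)
with_key = reidentify(coded[0], key)     # the key holder recovers the name
without_key = reidentify(coded[0], {})   # a recipient with no key recovers nothing
```

Removing the direct identifier is only part of the analysis. Whether the coded rows are “anonymous to” a given recipient also depends on what else that recipient can reasonably access, including the remaining fields in each row.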

In addition, two or more data points which, taken in isolation, would be anonymous, may become personal data if they are reasonably likely to identify someone when combined (also known as jigsaw identification).
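One rough way to see how jigsaw identification arises is to count how many records share each combination of indirect identifiers. The sketch below is an assumption-laden illustration, not a legal test and not a method described in the Guidance: the function name `unique_combinations` and the `postcode_area` and `birth_year` fields are hypothetical. In the sample data, each field taken alone is shared by more than one record, yet every pair of values points to exactly one record.

```python
from collections import Counter


def unique_combinations(rows, fields):
    """Return the value combinations (over `fields`) that occur in exactly one row.

    Hypothetical helper: uniqueness is only one indicator of re-identification
    risk and does not by itself decide whether data is personal data.
    """
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


rows = [
    {"postcode_area": "CB1", "birth_year": 1980},
    {"postcode_area": "CB1", "birth_year": 1975},
    {"postcode_area": "OX2", "birth_year": 1980},
    {"postcode_area": "OX2", "birth_year": 1975},
]

alone_postcode = unique_combinations(rows, ["postcode_area"])            # no unique values
alone_year = unique_combinations(rows, ["birth_year"])                   # no unique values
combined = unique_combinations(rows, ["postcode_area", "birth_year"])    # every row is unique
```

A check of this kind only flags combinations that single out a record within the data set itself. Whether that makes identification reasonably likely still depends on what other information a recipient could combine with it.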

The leading Court of Justice of the European Union (CJEU) case on identifiability is Breyer v. Bundesrepublik Deutschland. This case held that key-coded data in the hands of a party (even if another party holds the key) is likely to be considered personal data if that party has the “means likely reasonably to be used” to access the key and to combine the key with the key-coded data. The CJEU noted that a party would not have the “means likely reasonably to be used” to identify data subjects if the party is “prohibited by law” from obtaining access to the key. In practice, there will often be intermediate cases in which a party does not have the means to access the key, but is also not “prohibited by law” from accessing further information that could lead to jigsaw identification. Such cases are not clear-cut and organisations need to treat them with care before making a determination that the GDPR does not apply.

The same principles apply to genetic data, but a more nuanced analysis may be required

While the same principles apply to genetic data, organisations should note that certain genetic data may be inherently identifiable if unique to an individual. The MRC encourages organisations to take a context-sensitive approach when deciding where genetic data falls on the identifiability continuum, in particular noting the possibility of jigsaw identification.

In most cases the holder of genetic data will not be able to identify an individual solely from the genetic material available. However, with advances in technology and gene sequencing projects making more and more genetic data available to researchers, the possibility of identification is likely to increase in the future. Organisations processing genetic information should be cognisant of this and ensure that they use robust technical and security measures if the genetic data in their possession is reasonably likely to be, or to become, identifiable personal data.

Intra-company anonymisation does not prevent the GDPR from applying

If the key and the key-coded data are held within the same organisation (even if held by separate teams with strict access controls between them), the organisation will still be processing personal data and the GDPR will still apply. Key-coding and segregation are valuable technical measures to reduce the risk of a security breach, but they will not take personal data in the hands of the organisation entirely out of the scope of the GDPR.

Key practical takeaways

  • Organisations should form a view as to which of their data sets constitute personal data and set up appropriate measures to protect such data. This applies both to data sets in their direct possession and to information processed on their behalf, for example, by a clinical site or a service provider such as a laboratory.
  • Context is key: to determine if the data has been robustly anonymised, organisations must consider who the potential recipient of the information is and what other information they might have access to.
  • While key-coded data sets can potentially be “anonymous to” a data recipient, companies should be careful when making that determination, in particular taking into account the possibility of jigsaw identification.
  • The same principles apply to genetic data, and organisations should form a view as to whether individual-level or family-level identification can occur on the basis of the genetic data alone, or in combination with other available information.
  • Intra-organisational access controls are a useful tool to manage the risk of reidentification, but will not in themselves be sufficient to take that organisation outside of the scope of the GDPR.