GDPR Section 2 Recital 18 (?) reads:

Not Applicable to Personal or Household Activities

This Regulation does not apply to the processing of personal data by a natural person in the course of a purely personal or household activity and thus with no connection to a professional or commercial activity. Personal or household activities could include correspondence and the holding of addresses, or social networking and online activity undertaken within the context of such activities.

What information do we have to determine what counts as "a purely personal or household activity"?

Data processing tools available to the individual have become much more powerful in recent years, and it seems this is likely to increase rapidly in the near future. Last year I thought the most powerful tools available to anyone that had the potential to break the GDPR were bellingcat's tools for building personal network graphs from public information such as social media and Companies House. However, in the last month, open source tools derived from Meta’s LLaMA Large Language Model have improved to the point that they are competitive with the best in the world, and Google expects them to eclipse this performance in coming months.

For example, Vicuna-13B runs on a reasonable gaming computer and, when asked "tell me about [FIRST SURNAME] from [INSTITUTION]", will provide a mostly inaccurate paragraph that includes some true personally identifiable information. This works for me and for some others who have a web presence, for instance through published peer-reviewed literature, but who are not in any way famous. There is an online demo that requires no registration; you can try it yourself here.

It seems that, at the very least, it would be challenging to justify the use of such a tool under the GDPR. As I understand it, all the information within the network is used to answer every question, even if just as far as "do not use this bit of the network". Therefore any use could be classed as the processing of personal data. So if one wanted to download and experiment with such a model, the easiest way would be to rely on the Household Activities exception. However, it is not totally clear what would and would not be counted. What information could we use to try to determine whether such a use was legal, and what the limits would be?

Previous Research

The ICO has a page on the exceptions, but that explicitly does not cover domestic purposes: "This is simply because they are not covered by the UK GDPR". It gives the topic two sentences: the first repeats the law and the second gives a couple of examples that are not relevant here.

There is a denied FOI request for "any policy, guidance, lines-to-take or other material you hold on the scope of the exemption contained in GDPR art 2(2)(c) and/ or recital 18" with the denial based on this information appearing in guidance including "a final published version". However I cannot find this guidance with either Google or the ICO website search.

There is a case from the Netherlands that excludes posting photos of one's child on social media from the household use exception. While this may not directly relate to my question, it does illustrate that the exception can be interpreted quite narrowly. It rather surprised me, as posting photos of one's children seems to be quite a big part of social media.

User65535

1 Answer

The household exemption is part of Art 2 GDPR, which lays out the subject matter scope. Clearly, the use of LLMs and other technologies can be GDPR-covered processing of personal data, unless this is done “by a natural person in the course of a purely personal or household activity”. You cited Recital 18, which provides a bit more background, and gives correspondence, keeping addresses, and some social media use as examples of personal or household activities.

There is some CJEU case law on the GDPR household exemption.

In Lindqvist (C-101/01), the court noted:

47. That exception must therefore be interpreted as relating only to activities which are carried out in the course of private or family life of individuals, which is clearly not the case with the processing of personal data consisting in publication on the internet so that those data are made accessible to an indefinite number of people.

That case was about a personal blog about the lives of coworkers. It would likely have been fine for that person to keep a personal diary of those matters, but here the problem was publication to the internet. Since anyone could read it, the blog was not purely personal.

A decade later in Rynes (C‑212/13), the court noted that the legislators' use of the word "purely" requires this exception to be interpreted narrowly:

30. The fact that Article 3(2) of Directive 95/46 falls to be narrowly construed has its basis also in the very wording of that provision, under which the directive does not cover the processing of data where the activity in the course of which that processing is carried out is a ‘purely’ personal or household activity, that is to say, not simply a personal or household activity.

Such a narrow interpretation is also necessary to maintain the data subject's fundamental rights.

Note that Directive 95/46 is the old Data Protection Directive which was replaced by the GDPR. It contains a household exception that is worded virtually identically to the one in the GDPR, so these old CJEU judgments remain directly applicable. They also remain relevant in the UK, as all of this predates Exit Day.

To my knowledge there has not been any further guidance on the household exception, either by the ICO or by the EDPB. This is also because these data protection agencies do not make law or policy, but merely apply it. It is up to courts to correctly interpret the household exception. In the Dutch case you mention (GDPRHub has more details and the original judgement is here), the court does not explain why exactly the household exemption wouldn't apply, but the narrow interpretation seems in line with the CJEU case law. I want to highlight in particular that the defendant in that case failed to demonstrate that the personal data in question wasn't accessible to an "indefinite number of people", to borrow the expression from the Lindqvist case.

My personal opinion is that this "indefinite number of people" aspect provides the clearest criterion for whether the household exception might apply. If processing activities are not shared at all, or only made available to close friends and family, there's a good chance the exception might apply. In the LLM context, to the degree that their use constitutes processing of personal data, running such models for personal amusement or out of personal curiosity is probably fine. It might also be OK to share such outputs in a closed chat group. But creating them for non-personal purposes (e.g. employment purposes) or disseminating them to the public could cause GDPR to apply. LLMs are far from the only scenario where such difficult questions arise. Another fun topic is the degree to which the use of LinkedIn is GDPR compliant (since no one uses that platform for purely personal purposes), or how large a friend group's Discord server can grow while still counting as "purely personal".

If the household exception doesn't apply, this means that data processing activities have to be performed in compliance with the GDPR. This doesn't generally mean that they would be illegal. However, the use of AI models does pose some potential challenges:

  • What is the legal basis for training the models? Perhaps there is a legitimate interest for using publicly available data, maybe not. Can an unlawfully trained model still be used lawfully?
  • What is the legal basis for using the models for inference? This will depend massively on the purpose for which the models are used.
  • How does the use of (potentially wrong) AI output interact with the Art 5(1)(d) accuracy principle and the Art 22 restrictions on automated individual decision-making?
  • To which degree are prompts other people's personal data? How does this interact with the legal basis used for inference? Whose personal data is the output?
amon