
I have written a script that asks a user who uploads a GIF to a social media website whether it may re-upload the GIF to another website (to reduce other users' internet usage). Now I want to give users the option to save their consent for a longer time, so they don't need to click a link every time they upload a GIF. Users don't have a random ID; I can only use their usernames, which are surely personal data (many users create their account through Facebook, and their usernames are then generated from their first and last names).

So my question is: if I store a hash of the usernames (an irreversible form), is it still considered their personal data?

lnl

2 Answers


It depends on whether you can identify the person to whom a username hash belongs.

If you store both the username and its hash in the same database row, then yes.

If it is impracticable for you to identify the person from the hash alone, then no.

This follows from the definition of personal data, "any information relating to an identified or identifiable natural person", and from Recital 26, under which the GDPR does not apply to anonymous data. The hash is essentially anonymous data when it does not, on its own, allow the person to be identified with reasonable effort (i.e. without spending $$$ on detectives or forensic science).

phoog
Greendrake

The Art. 29 WP has released Opinion 05/2014 on Anonymisation Techniques. There it defines a hash function like this:

Hash function: this corresponds to a function which returns a fixed size output from an input of any size (the input may be a single attribute or a set of attributes) and cannot be reversed; this means that the reversal risk seen with encryption no longer exists. However, if the range of input values the hash function are known they can be replayed through the hash function in order to derive the correct value for a particular record. For instance, if a dataset was pseudonymised by hashing the national identification number, then this can be derived simply by hashing all possible input values and comparing the result with those values in the dataset. Hash functions are usually designed to be relatively fast to compute, and are subject to brute force attacks. Pre-computed tables can also be created to allow for the bulk reversal of a large set of hash values.

The use of a salted-hash function (where a random value, known as the “salt”, is added to the attribute being hashed) can reduce the likelihood of deriving the input value but nevertheless, calculating the original attribute value hidden behind the result of a salted hash function may still be feasible with reasonable means.

So hashing is considered pseudonymisation, not anonymisation. Pseudonymised data is still personal data. See also Art. 4 GDPR, which contains the definitions of ‘personal data’ and ‘pseudonymisation’.
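The replay attack described in the Opinion can be illustrated with a short Python sketch. The usernames, the choice of SHA-256, and the helper names are all hypothetical; the point is only that anyone holding a list of candidate usernames (for example, the site's public user list) can hash each one and compare it with the stored values.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a string (unsalted, for illustration)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical consent store: only hashes are kept, no usernames.
stored_hashes = {sha256_hex("john.smith"), sha256_hex("jane.doe")}

# Hypothetical list of candidate usernames an attacker (or the site) knows.
candidates = ["alice.jones", "john.smith", "bob.brown", "jane.doe"]


def replay(candidates, stored_hashes):
    """Hash every candidate and keep those whose hash appears in the store."""
    recovered = {}
    for name in candidates:
        digest = sha256_hex(name)
        if digest in stored_hashes:
            recovered[digest] = name
    return recovered


recovered = replay(candidates, stored_hashes)
print(sorted(recovered.values()))
```

Because usernames are typically public and drawn from a small space, this kind of lookup is cheap, which is why the hash alone does not make the data anonymous.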

In your question you say you want to give users an option to save their consent for a longer time, so you use the hash to recognise such a user. That implies that you are able to identify a user, so the hash is personal data. That does not mean your processing is unlawful. If you make it clear to users that you save their consent when they select that option, they implicitly also consent to the saving itself, so Art. 6(1)(a) applies. Art. 6(1)(b) would probably apply as well.
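A minimal sketch of what such a consent lookup might look like, to show why the stored value still singles out a user. This is not the asker's actual script: the function names, the in-memory set, and the secret key are assumptions, and a keyed hash (HMAC-SHA256) is swapped in for a plain hash so that outsiders without the key cannot replay usernames. The operator, who holds the key, can still link each value to a username, so the data remains pseudonymised rather than anonymous.

```python
import hashlib
import hmac

# Assumption: a secret kept on the server, separate from the consent store.
SECRET_KEY = b"hypothetical-server-side-secret"

# Hypothetical in-memory consent store holding only keyed hashes.
consents = set()


def pseudonym(username: str) -> str:
    """Derive a keyed hash (HMAC-SHA256) of the username."""
    return hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256).hexdigest()


def save_consent(username: str) -> None:
    """Record that this user opted in to saving their consent."""
    consents.add(pseudonym(username))


def has_consent(username: str) -> bool:
    """Check at upload time whether consent was saved for this user."""
    return pseudonym(username) in consents


def withdraw_consent(username: str) -> None:
    """Remove the saved consent, e.g. when the user withdraws it."""
    consents.discard(pseudonym(username))


save_consent("john.smith")
print(has_consent("john.smith"), has_consent("jane.doe"))
```

The lookup only works because the script recomputes the same value from the username on every upload, which is exactly the link between record and person that makes it personal data.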

Art. 25 and Art. 32 encourage the use of pseudonymisation when processing personal data.

The above only applies if the hash function generates a unique hash for each input. If you use a hash function where different inputs generate the same hash, it would be considered aggregation (or generalization). The Art. 29 WP has also written about that in Opinion 05/2014 on Anonymisation Techniques; see the section on "Aggregation and K-anonymity" (3.2.1). Even aggregation does not allow effective anonymisation in all cases.
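To illustrate the collision case, here is a rough sketch that maps usernames into a small number of buckets by reducing a hash, then counts how many usernames share each bucket. The bucket count, the usernames, and the function names are hypothetical. The smallest bucket size plays the role of k in k-anonymity: a bucket containing a single username still points at one person.

```python
import hashlib
from collections import Counter


def bucket(username: str, n_buckets: int = 4) -> int:
    """Map a username to one of n_buckets values, so collisions are intended."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


# Hypothetical user base.
usernames = [
    "john.smith", "jane.doe", "alice.jones", "bob.brown",
    "carol.white", "dave.black", "erin.green", "frank.gray",
]

# Number of usernames falling into each occupied bucket.
sizes = Counter(bucket(name) for name in usernames)

# Smallest group size among occupied buckets; k == 1 means someone is unique.
k = min(sizes.values())
print(dict(sizes), "k =", k)
```

Note the trade-off for the consent use case: with shared buckets, a saved consent would also match every other user in the same bucket, so the record no longer proves that a specific user agreed.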

wimh