Data clean rooms: What are they and how do they work?

The cookieless future of tracking is upon us. Web browsers are phasing out third-party cookies, and privacy regulations such as GDPR are becoming more and more strict. Without third-party cookies, tracking marketing efforts becomes increasingly challenging, because cookies were the essential mechanism that allowed marketers to show personalised ads and measure campaign performance.
In one of our articles, we dive into Google’s enhanced conversions tracking, which relies on first-party data for better Ads attribution and audience targeting. Facebook (Meta) offers an “advanced matching” feature that relies on the same concept.

All marketing efforts are now directed towards using first-party data for more granular and precise tracking. Among the up-and-coming solutions, such as universal IDs and Google’s Privacy Sandbox, data clean rooms have been garnering a lot of attention recently.

What is a data clean room?

A data clean room is a privacy-first, closed environment built on the cloud. It is a software-as-a-service (SaaS) solution that acts as a safe and protected platform, allowing brands to match their first-party user data with a publisher’s first-party user data for measurement, targeting and analytics, without either party gaining access to the other’s personally identifiable information (PII).

The way it works is as follows. On one side, a company (an advertiser) submits its first-party data to the data clean room, which technically acts as a black box that no external party can access or interfere with. This data can originate from any of the sources the company runs (e-commerce data, CRM, Google Analytics, apps, etc.). On the other side, another party (a publisher, for example Facebook) also provides its first-party user data to the box. It is important that the data sets share common identifiers, such as (hashed) email addresses or user IDs, so that records can be matched into single profiles. Once both streams of data are in the box, they are stripped of user PII through a series of transformations (hashing, pseudonymisation, etc.). The audiences that both parties have in common can then be overlapped, creating an identity graph without either party needing to access the other’s data set. The output of the process is an aggregated audience that does not allow any single user from either party’s data to be identified.

User-level data goes into the clean room, and aggregated insights come out as a commingled audience called a cohort, which can then be reused for better audience targeting and measurement.
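To make the matching idea more concrete, here is a minimal Python sketch of how two parties could turn email addresses into comparable hashed keys and count their overlap. The sample addresses, the function names and the choice of plain SHA-256 are our own assumptions for illustration; real clean room products define their own normalisation and hashing rules, and often add further protections, since a plain hash is a pseudonym rather than full anonymisation.

```python
import hashlib


def normalise_email(email: str) -> str:
    """Trim whitespace and lowercase so both parties hash identical strings."""
    return email.strip().lower()


def hash_identifier(email: str) -> str:
    """Return the SHA-256 hex digest of a normalised email address."""
    return hashlib.sha256(normalise_email(email).encode("utf-8")).hexdigest()


# Hypothetical sample data; each party hashes its own list before sharing it.
advertiser_emails = ["Anna@example.com", " bob@example.com", "carol@example.com"]
publisher_emails = ["anna@example.com", "BOB@example.com ", "dave@example.com"]

advertiser_keys = {hash_identifier(e) for e in advertiser_emails}
publisher_keys = {hash_identifier(e) for e in publisher_emails}

# Inside the clean room, only the hashed keys are compared.
overlap = advertiser_keys & publisher_keys

# Only aggregate counts leave the clean room, never the matched keys.
overlap_report = {
    "advertiser_users": len(advertiser_keys),
    "publisher_users": len(publisher_keys),
    "matched_users": len(overlap),
}
print(overlap_report)
```

Because both sides normalise before hashing, "Anna@example.com" and "anna@example.com" produce the same key, so two of the three users match here even though neither party ever sees the other’s raw addresses.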

How do data clean rooms work?

We will summarise the process in the following steps:

1. Ingest and store data
The publisher and the advertiser typically already have accounts with the same cloud provider (for example, a shared Snowflake database or any data clean room offering), and both parties agree to launch a data clean room between them. In doing so, they adhere to the security and privacy framework set by the data clean room provider. Their data stays completely separated, and neither side has any means of accessing the other party’s raw data. In practice, a clean room can include more than two parties.
2. Join data sets
Data sets from both parties are joined on shared keys (such as normalised and hashed email addresses or IP addresses that no longer expose PII), which eliminates the need for ETL tools to extract and access each other’s data. Where shared keys do not exist, machine learning and probabilistic modelling may be applied to optimise matching. By taking part, advertisers and publishers agree to each other’s data joining rules.
3. Analyse data for better insights
Once the join succeeds, advertisers can analyse aggregated results based on the publisher’s anonymised first-party data, see where both data sets overlap, and gain better insights for audience targeting (demographics, shared user behaviour).
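The three steps above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the hashed keys ("h1", "h2", ...), the revenue and age-group values, the function name aggregate_overlap and the minimum cohort size of 3 are all hypothetical, and real clean room providers enforce their own aggregation rules.

```python
from collections import defaultdict

# Assumed privacy threshold: cohorts smaller than this are suppressed.
MIN_COHORT_SIZE = 3

# Step 1: each party stores its own rows, keyed by an already-hashed identifier.
# Advertiser side: hashed key -> order value. Publisher side: hashed key -> age group.
advertiser_rows = {"h1": 40.0, "h2": 25.0, "h3": 10.0, "h4": 60.0, "h5": 15.0, "h6": 80.0}
publisher_rows = {"h1": "25-34", "h2": "25-34", "h3": "25-34",
                  "h4": "35-44", "h5": "35-44", "h7": "18-24"}


def aggregate_overlap(advertiser, publisher, min_size=MIN_COHORT_SIZE):
    """Join on shared keys and return per-cohort totals above the threshold."""
    # Step 2: join the data sets on the keys both parties have in common.
    matched_keys = advertiser.keys() & publisher.keys()

    # Step 3: aggregate per cohort instead of exposing user-level rows.
    users = defaultdict(int)
    revenue = defaultdict(float)
    for key in matched_keys:
        cohort = publisher[key]
        users[cohort] += 1
        revenue[cohort] += advertiser[key]

    # Drop cohorts that are too small to hide an individual user.
    return {
        cohort: {"users": users[cohort], "revenue": revenue[cohort]}
        for cohort in users
        if users[cohort] >= min_size
    }


report = aggregate_overlap(advertiser_rows, publisher_rows)
print(report)
```

In this sample, five users match, but only the "25-34" cohort reaches the threshold, so the "35-44" cohort with two users is left out of the report. This kind of suppression is one reason clean room output is less granular than cookie-based tracking.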

Benefits and disadvantages of data clean rooms

Benefits

• Privacy and GDPR compliance: The publisher’s and the advertiser’s first-party user data are completely separated and secured from each other. Personal user information is kept hidden, and audiences are only overlapped based on keys. This helps the setup conform to current GDPR privacy laws and protects your users’ PII.

• Trends, segmentation, and analytics: Data clean rooms provide aggregated user information and thus grant visibility into trends and audience segments. By using the overlapping audiences, companies can build more granular audiences, update and hone their campaign targeting, optimise reach and frequency measurements, and achieve a better return on investment.

Disadvantages

• Limited flexibility (especially with the walled gardens): Data clean rooms provided by the tech giants are restricted to their own platforms. It is not possible to merge data from two different clean rooms automatically, so this has to be done manually. For example, companies would need one data clean room with Meta and a separate one for Google Ads.

• Human errors and potential breaches: Very sensitive data is shared in the data clean room, so a data breach could have serious consequences. Manually managed clean rooms also introduce the element of human error, where a wrong query or a wrong access permission could expose data.

• Reduced data granularity: Since first-party data is anonymised and stripped of its PII, the data is less granular than what cookie tracking provided.

• Non-standard implementation: Even though data clean rooms have been around for a few years, the topic is only now gaining momentum. The software is relatively new, which means there is no standard implementation yet.

Biggest data clean room providers

First, we have the data clean rooms of the walled gardens (Google, Amazon), which provide their secured first-party data to companies that use their advertising platforms.

Google Ads Data Hub

Ads Data Hub is a safe, privacy-friendly warehousing solution built on Google Cloud that allows customers to create custom reports based on event-level ad campaign data and aggregated insights. You can upload your first-party data into BigQuery and join it with Google’s ad campaign data.

Amazon Marketing Cloud

In turn, Amazon Marketing Cloud is a warehousing solution built on the AWS cloud. It also provides the ability to create custom event-level reports across multiple data sets and gives a holistic view of campaign performance.

There are also other tech companies that provide data clean rooms, such as InfoSum, Habu, Snowflake and AppsFlyer.

Closing notes

Third-party cookies will disappear from most browsers very soon, and GDPR privacy policies are becoming stricter. Tracking marketing efforts and ad campaign performance will become increasingly difficult. The walled gardens and tech companies have been focusing heavily on developing technologies built on a first-party data blueprint. Data clean rooms are emerging as one of the most promising solutions, as they allow two entities to match their data without needing access to each other’s data sets, support GDPR compliance, and provide valuable insights for tracking marketing efforts.
