What is a Data Clean Room, and Do You Need One?

There’s a lot of talk in the market these days about data clean rooms, along with some confusion about what exactly a data clean room is and how it differs from other data sharing methods. In this blog post, I’d like to shed some light on this topic.

What is data sharing?

Traditional data sharing methods include copying files to FTP servers or cloud buckets, building ETL pipelines, or maintaining and calling APIs. These methods can inhibit effective collaboration, leading to gaps in insights, inaccurate data, and security issues. Once data is moved, it’s only as secure as the location it’s moved to, which is nearly impossible to govern effectively. Custom pipelines can also be costly to maintain, among numerous other problems.

Secure data sharing, in contrast, is a Snowflake platform capability that enables organizations to securely grant access to their data, a major leap forward from the days of copying large files over FTP or building and maintaining fragile ETL pipelines. Instead of copying files from one system to another, for example, a Snowflake user can simply define which tables within a database can be accessed by another Snowflake customer. 
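To make the contrast with copying concrete, here is a minimal Python sketch of sharing modeled as a revocable grant. It is a toy model for illustration only, not Snowflake’s API: the Share class, its methods, and the table and account names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Share:
    """Toy model of a share: named tables plus the accounts allowed to read them."""
    name: str
    tables: dict = field(default_factory=dict)   # table name -> provider's live list of rows
    consumers: set = field(default_factory=set)  # accounts that currently hold access

    def grant(self, account: str) -> None:
        self.consumers.add(account)

    def revoke(self, account: str) -> None:
        self.consumers.discard(account)

    def read(self, account: str, table: str) -> tuple:
        # Access is checked on every read, so a revocation takes effect immediately.
        if account not in self.consumers:
            raise PermissionError(f"{account} has no access to share {self.name}")
        # Rows are read from the provider's table at query time; no copy is
        # handed over when the grant is made.
        return tuple(self.tables[table])


# Hypothetical usage: the provider keeps updating its own table.
orders = [("o-1", 120.0)]
share = Share("sales_share", tables={"orders": orders})
share.grant("partner_account")
orders.append(("o-2", 75.5))
rows = share.read("partner_account", "orders")  # includes the newly added row
share.revoke("partner_account")                 # later reads raise PermissionError
```

The point of the sketch is that the consumer never holds its own copy, so the provider’s updates and revocations apply the next time the data is read.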

When to use Secure Data Sharing?

Secure data sharing is great when you want to provide raw access to a data set, for example when you’re sharing data across business units or with trusted agencies or partners for analysis that drives better business decisions and outcomes. It’s also used by companies or data providers who monetize or sell data because it provides secure and revocable access to their customers.

Snowflake Secure Data Sharing

Snowflake’s platform enables seamless data collaboration while helping reduce costs and reveal new business insights. Snowflake Secure Data Sharing allows organizations to securely share data across their business ecosystem so they can:

  • Quickly access live data from across their organization 
  • Control governed access to shared data
  • Easily publish data sets for discovery while setting access controls

Snowflake’s Cloud Data Platform

With Snowflake, organizations can be data consumers, data providers, or both. The Snowflake Data Cloud is also a powerful tool for applications, allowing customers to discover, build, and distribute apps that run natively within their Snowflake account.

Businesses that take advantage of Snowflake’s platform to facilitate privacy-preserving collaboration can see a wide range of benefits, including: 

  • Sharing across business ecosystems without copying or moving data
  • Analyzing data without exposing it
  • Discovering and monetizing data in the Data Cloud
  • Securely sharing data with companies not on Snowflake

Today’s data demands a new approach

Businesses also generate a wealth of sensitive and/or regulated data that simply cannot be shared with other parties, such as customer lists and personally identifiable information (PII). As a result, this data either never leaves the organization it originated from, or it’s aggregated before sharing, which limits the types of analysis that can be done. To derive insights from or run queries on sensitive and/or regulated data without exposing the underlying data, a business should consider using a data clean room.

What is a data clean room?

“Current industry dynamics have accelerated demand for sharing and collaboration,” writes Snowflake Principal Data Strategist Jennifer Belissent. “At the same time, new use cases and data-intensive analytic methods have resulted in an explosion in demand for data. Yet concerns about preserving data privacy have also grown. These dynamics have resulted in a perfect storm: the need for secure data collaboration via data clean rooms.”

A data clean room is not necessarily a “room” at all. While some traditional clean rooms require physical infrastructure, modern data clean rooms are not physical spaces but rather a framework that doesn’t require data to be moved into a different system or environment. 

A data clean room differs from data sharing in that a provider can define rules about the types of queries that can be run on the data, while preventing the company that runs those queries from accessing the underlying data itself.
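One way to picture such rules is an allow-list of query templates. The Python sketch below uses the standard sqlite3 module as a stand-in for the provider’s database. Everything in it is an assumption made for illustration (the table, the made-up rows, the template name, and the run_request helper), and it does not reflect how any particular clean room product is implemented.

```python
import sqlite3

# Provider-approved query templates (hypothetical). Consumers pick a template by
# name and supply parameters; they never submit free-form SQL.
ALLOWED_TEMPLATES = {
    "customers_by_region": (
        "SELECT region, COUNT(*) AS customers FROM customers "
        "WHERE signup_year >= ? GROUP BY region ORDER BY region"
    ),
}


def build_provider_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the provider's data (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (email TEXT, region TEXT, signup_year INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("a@example.com", "EMEA", 2020),
            ("b@example.com", "EMEA", 2021),
            ("c@example.com", "APAC", 2021),
        ],
    )
    return conn


def run_request(conn: sqlite3.Connection, template_name: str, params=()) -> list:
    """Run a consumer request only if it names an approved template."""
    if template_name not in ALLOWED_TEMPLATES:
        raise PermissionError(f"query template {template_name!r} is not approved")
    return conn.execute(ALLOWED_TEMPLATES[template_name], params).fetchall()


conn = build_provider_db()
summary = run_request(conn, "customers_by_region", (2021,))  # aggregate counts only
```

The consumer gets counts per region but has no template that returns the email column, so the underlying rows stay with the provider.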

Secure Data Sharing and data clean rooms are similar in that they enable two or more parties to securely collaborate on data. But just as Secure Data Sharing was a big improvement over legacy data-sharing methods, the emergence of data clean rooms is the next big step in more secure data collaboration methods for organizations.

Do you need a data clean room?

To reiterate, secure data sharing is a great option when sharing data across business units or with trusted third parties. But there are scenarios in which a company may decide it’s time to set up a data clean room environment.

With the introduction of various regulations, including the California Consumer Privacy Act (CCPA) and General Data Protection Regulation (GDPR), consumer data must now be handled under strict privacy requirements. Collaborating with other companies through a data clean room is an option when each party’s data is sensitive and/or regulated.

For media and advertising companies, for example, a data clean room can enable personalized segment insights for advertising and campaign attribution in a privacy-preserving fashion. One multinational media company uses a cross-cloud data clean room environment powered by Snowflake to feed its first-party audience data to advertising partners, who can securely join it with their own respective data sets, all without moving, copying, or exposing any underlying PII.

In fact, a data clean room should be considered for every industry that has sensitive and/or regulated data, especially when the value (or risk) of collaborating on that data is high. 

Snowflake Global Data Clean Room

Snowflake Global Data Clean Room is a framework for secure, multi-party collaboration. It allows two or more Snowflake customers to analyze data without disclosing the raw data to one another. It is a solution that leverages core Snowflake collaboration and data governance features such as:

  • Row Access Policies and database roles, which let parties match customer data without exposing either party’s PII
  • Stored Procedures to generate and validate query requests
  • Secure Data Sharing for automatically and securely sharing tables between multiple Snowflake accounts without the need to move data outside of Snowflake
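To give a feel for what matching customer data without exposing PII can mean, here is a simplified Python sketch that compares keyed hashes of email addresses instead of the addresses themselves. This is an illustrative technique swapped in for the example, not a description of how Row Access Policies or the Snowflake framework work; the function names and the shared key are hypothetical, and keyed hashing on its own is not a complete privacy guarantee.

```python
import hashlib
import hmac


def pseudonymize(email: str, shared_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a normalized email address."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(shared_key, normalized, hashlib.sha256).hexdigest()


def overlap_count(provider_emails, consumer_emails, shared_key: bytes) -> int:
    """Count customers present in both lists by comparing keyed hashes only."""
    provider_tokens = {pseudonymize(e, shared_key) for e in provider_emails}
    consumer_tokens = {pseudonymize(e, shared_key) for e in consumer_emails}
    return len(provider_tokens & consumer_tokens)


# Hypothetical usage with made-up addresses and a made-up key.
shared = overlap_count(
    ["A@example.com ", "b@example.com"],
    ["a@example.com", "c@example.com"],
    shared_key=b"agreed-out-of-band",
)
```

Each side learns how many customers overlap, while the comparison itself only ever touches the hashed tokens.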

Key benefits of Snowflake Global Data Clean Room include:

  • Enable two or more parties to analyze data across clouds and regions without exposing it to one another.
  • Protect against reverse engineering or re-identification of highly sensitive data by limiting the types of queries that can be run on the data.
  • Audit Global Data Clean Room access with custom event logging.
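Two of these ideas, limiting what a query can return and logging access, can be sketched in a few lines of Python. The minimum group size, the segment_counts function, and the audit record fields below are assumptions chosen for illustration rather than features of the Snowflake framework.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical threshold chosen for illustration


def segment_counts(rows, key, requester, audit_log):
    """Return per-segment counts, suppressing segments smaller than MIN_GROUP_SIZE."""
    counts = Counter(row[key] for row in rows)
    # Small groups are withheld because a count of one or two people can
    # single out individuals when combined with other information.
    released = {segment: n for segment, n in counts.items() if n >= MIN_GROUP_SIZE}
    # Record who asked for what, and how much was released or withheld.
    audit_log.append({
        "requester": requester,
        "group_by": key,
        "segments_released": len(released),
        "segments_suppressed": len(counts) - len(released),
    })
    return released


# Hypothetical usage with made-up audience rows.
audience = [{"segment": "sports"}] * 3 + [{"segment": "opera"}]
log = []
result = segment_counts(audience, "segment", "partner_account", log)
```

Here the three-person segment is released and the one-person segment is withheld, and the log entry records both facts for later review.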

Which approach is best for my business?

All enterprise data collaboration should be secure, so continuing to copy data using legacy methods is becoming obsolete. That said, when you’re collaborating with trusted partners and that collaboration does not violate privacy regulations, using Snowflake Secure Data Sharing is a viable approach: it’s fast, easy, and secure. However, when you begin collaborating on sensitive and/or regulated data, and the risk of collaborating on that data is high, consider using a data clean room.

Originally posted on October 7, 2022 @ 2:39 am