Full article at https://brave.com/privacy-updates/19-star/
Researchers at Brave have developed STAR1, a system that allows users to participate in private data collection, under cryptographic guarantees that their data will be readable only if other users have contributed the exact same values. Such systems are important for performing privacy-protecting, Web-scale measurements of software (sometimes referred to as analytics or telemetry).
STAR’s main goals are to provide strong privacy guarantees while still being usable and affordable for small-to-medium sized companies. Existing systems2 are extremely expensive to deploy (making them unusable for all but the largest companies), require trusted third-parties or special hardware, and/or require millions of users to achieve useful results. STAR, by contrast, provides privacy guarantees similar to, or better than, existing systems, while being practical and affordable for projects and organizations serving anywhere from dozens to millions of users.
The STAR system will be presented at the 2022 ACM Conference on Computer and Communications Security (CCS) in Los Angeles, and is being discussed for possible standardization in the IETF. STAR is available in an open source Rust implementation, and will be used to protect user privacy in many current and future Brave products.
Privacy-preserving data collection through k-anonymity
Collecting detailed data on how software is used can be helpful for both developers and users. Developers can use this information to fix bugs and optimize code; users benefit from better software.
But capturing this user data carries the ethical and often legal responsibility of collecting it in a privacy-preserving way. We emphasize that collecting data in a privacy-protecting manner is a necessary, but not sufficient, part of ethical data collection. Users should always be in control and aware when contributing data.
Brave’s new system STAR protects user privacy by ensuring the data users contribute are never unique to that user. This property, sometimes called k-anonymity, ensures that the data collector can only see a submitted value if the same value has also been submitted by some number of other users. K-anonymity (and thus the STAR system) prevents the data collector from ever seeing values that are unique—this means the values can’t be used to identify users.
K-anonymity is one of many approaches for ensuring privacy during data collection3, each with different strengths and weaknesses. STAR adopts k-anonymity because:
It embodies an easy-to-understand approach to privacy.
It successfully allows the data collector to learn the “heavy hitters” (i.e., the most commonly shared values) without requiring very large user bases.
The applications targeted by STAR are not the kinds of cases where k-anonymity systems have been attacked in the past.
A simple k-anonymity example: ice cream
As an example of how k-anonymity protects user privacy, consider this hypothetical scenario:
An organization wants to learn its employees’ favorite ice cream flavors. But people only want to participate in this ice-cream survey if they’re assured the organization can’t tell how any individual voted. Participants want anonymity.
People who cast ballots for chocolate, vanilla, and strawberry (or other common flavors) aren’t at risk: many people like these flavors, so a common answer reveals little about who submitted it.
However, votes for uncommon flavors of ice cream do risk revealing who gave that answer. If everyone knows a person’s favorite flavor of ice cream is olive—and if that person completed the survey—the organization will be pretty confident about who submitted that answer.
With k-anonymity, the “olive” answer would be removed before the data is ever analyzed.
In essence, k-anonymity is an approach to data collection that builds on the ice cream example: it allows the party collecting data to see common, popular values, but prevents the data collector from seeing rare (and therefore potentially identifying) values.
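As a concrete illustration of this threshold rule (an illustration only, not the STAR protocol itself; the function name and the choice of k = 3 are ours), a collector enforcing k-anonymity would drop any value reported by fewer than k participants before analysis:

```python
from collections import Counter

def k_anonymous_counts(submissions, k):
    # Keep only values submitted by at least k participants;
    # rarer (potentially identifying) values are dropped
    # before the collector ever analyzes them.
    counts = Counter(submissions)
    return {value: n for value, n in counts.items() if n >= k}

votes = ["chocolate"] * 5 + ["vanilla"] * 4 + ["strawberry"] * 3 + ["olive"]
print(k_anonymous_counts(votes, k=3))
# → {'chocolate': 5, 'vanilla': 4, 'strawberry': 3}
```

The lone “olive” vote never appears in the output, so it can’t be linked back to the one person known to love olive ice cream.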
K-anonymity is difficult in practice
K-anonymity is simple in concept, but difficult to build in practice. For example, who decides (and how) which values are common or rare without revealing the potentially identifying values in the first place?
One sub-optimal option would be to let a neutral third party count the values first, before those values are shared with the data collector. But that only introduces a “shell game”: privacy now depends on trusting the third party instead of the data collector, and the same privacy risk still exists.
There are many other similar difficulties when trying to implement real-world k-anonymity systems.
With STAR, we’ve found a way to achieve k-anonymity without these suboptimal workarounds.
STAR achieves k-anonymity cheaply and securely
STAR is a practical, effective, and cheap way to build data collection systems that protect k-anonymity. STAR differs from existing systems by being the first deployed system to achieve each of the following goals:
Cheap to deploy: STAR is extremely fast and does not require special hardware. This means that STAR can be deployed by everything from small hobbyist projects to large, multi-million-user software projects. In our simulations, STAR is 24 times cheaper than the existing state-of-the-art approach4.
Easy to understand: STAR uses a unique combination of existing, vetted, and well-understood cryptographic tools (i.e., symmetric encryption to encrypt data, Shamir secret sharing to enforce k-anonymity, and verifiable oblivious pseudorandom functions to boost randomness). Using existing cryptographic tools (rather than relying on novel cryptographic primitives) means more people can safely implement, deploy, and audit STAR systems.
Strong privacy guarantees: STAR delivers privacy similar to, or better than, existing state-of-the-art systems, including fallback protections in the case of server compromise5.
Accurate results with small user bases: STAR provides strong accuracy guarantees, even with small numbers of users. This is unlike other existing approaches6 that provide accurate results only when thousands or millions of users contribute results.
Doesn’t require special hardware: STAR runs on standard computing hardware, and can thus be deployed on personal servers, standard cloud infrastructure, or any other stock hardware. This ensures STAR can be used by more projects, especially those with smaller budgets (unlike some existing systems that rely on special “trusted” hardware, such as AWS Nitro or Intel SGX).
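To illustrate the Shamir secret sharing building block mentioned above, here is a minimal, unoptimized Python sketch (this is not Brave’s Rust implementation, and not the full STAR protocol, which shares a message-derived encryption key rather than the message itself). The key property: any k of the n shares reconstruct the secret, while fewer than k reveal essentially nothing about it.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is in the field GF(PRIME)

def make_shares(secret, k, n):
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # evaluate via Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat's little theorem)
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(12345, k=3, n=5)
print(reconstruct(shares[:3]))  # any 3 of the 5 shares recover 12345
```

In STAR, each client derives the secret deterministically from its measurement, so shares from clients holding the same value combine to reveal it, while values held by fewer than k clients stay hidden.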
Using STAR to protect user privacy
Brave has developed STAR as a practical system focused on real-world privacy improvements. While we hope others will use STAR to protect privacy in their own projects, its primary aim is to let users share data with Brave in a way that still preserves their privacy.
To that end, Brave is making three commitments with STAR:
First, Brave will use STAR in its own products, in cases where we give users the option to allow data collection. For example, Brave’s Web Discovery Project uses a form of STAR to allow users to share browsing information to help build the Brave Search index. Similarly, we’ve incorporated STAR into the “Privacy Preserving Product Analytics” (P3A) system that allows users to share browser usage data with Brave.
More important, even with STAR’s protections, Brave users will always have the option not to share data with Brave. STAR is intended only to add additional privacy protections to data that users want to share, rather than to allow Brave to collect more data about users.
Second, Brave is developing STAR in the open, for other projects to use, adopt, or modify as they choose. Brave maintains both Rust and WASM versions of STAR, and both are published under the Mozilla Public License v2.
Third, Brave is working to standardize STAR in the Internet Engineering Task Force (IETF) as part of the Privacy Preserving Measurement (PPM) working group. Our goal is to ensure that there is a standards-based way for small organizations to collect data in a privacy-respecting way.