Misfire

We're in a bad place when even the W3C TAG falls for Apple's privacy schtick.

July 30, 2024

The W3C Technical Architecture Group^[1] is out with a blog post and an updated Finding regarding Google's recent announcement that it will not be imminently removing third-party cookies.

The current TAG members are competent technologists who have a long history of nuanced advice that looks past the shouting to get at the technical bedrock of complex situations. The TAG also plays a uniquely helpful role in boiling down the guidance it issues into actionable principles that developers can easily follow.

All of which makes these pronouncements seem like weak tea. To grok why, we need to walk through the threat model, look at the technology options, and try to understand the limits of technical interventions.

Unmasking The Problem
Fire And Movement
Finding A Way Forward

But before that, I should stipulate my personal position on third-party cookies: they aren't great!

They should be removed from browsers when replacements are good and ready, and Google's climbdown isn't helpful. That said, we have seen nothing of the hinted-at alternatives, so the jury's out on what the impact will be in practice.^[2]

So why am I disappointed in the TAG, given that my position is essentially what they wrote? Because it failed to acknowledge the limited and contingent upside of removing third-party cookies, or the thorny issues we're left with after they're gone.

Unmasking The Problem

So, what do third-party cookies do? And how do they relate to the privacy threat model?

Like a lot of web technology, third-party cookies have both positive and negative uses. Owing to a historical lack of platform-level identity APIs, they form the backbone of nearly every large Single Sign-On (SSO) system. Thankfully, replacements have been developed and are being iterated on.

Unfortunately, some browsers have unilaterally removed them without developing such replacements, disrupting sign-in flows across the web, harming users and pushing businesses toward native mobile apps. That's bad, as native apps face no limits on communicating with third parties and are generally worse for tracking. They're not even subject to pro-user interventions like browser extensions. The TAG should have called out this aspect of the current debate in its Finding, encouraging vendors to adopt APIs that will make the transition smoother.

The creepy uses of third-party cookies relate to advertising. Third-party cookies provide ad networks and data brokers the ability to silently reidentify users as they browse the web. Some build "shadow profiles", and most target ads based on sites users visit. This targeting is at the core of the debate around third-party cookies.

Adtech companies like to claim targeting based on these dossiers allows them to put ads in front of users most likely to buy, reducing wasted ad spending. The industry even has a shorthand: "right people, right time, right place."

Despite the bold claims and a consensus that "targeting works," there's reason to believe pervasive surveillance doesn't deliver, and even when it does, isn't more effective.

Assuming the social utility of targeted ads is low — likely much lower than adtech firms claim — shouldn't we support the TAG's finding? Sadly, no. The TAG missed a critical opportunity to call for legislative fixes to the technically unfixable problems it failed to enumerate.

Privacy isn't just about collection, it's about correlation across time. Adtech can and will migrate to the server-side, meaning publishers will become active participants in tracking, funneling data back to ad networks directly from their own logs. Targeting pipelines will still work, with the largest adtech vendors consolidating market share in the process.

This is why "give us your email address for a 30% discount" popups and account signup forms are suddenly everywhere. Email addresses are stable, long-lived reidentifiers. Overt mechanisms like this are already replacing third-party cookies. Make no mistake: post-removal, tracking will continue for as long as reidentification has perceived positive economic value. The only way to change that equation is legislation; anything else is a band-aid.

Pulling tracking out of the shadows is good, but a limited and contingent good. Users have a terrible time recognising and mitigating risk on the multi-month time-scales where privacy invasions play out. There's virtually no way to control or predict where collected data will end up in most jurisdictions, and long-term collection gets cheaper by the day.

Once correlates are established, or "consent" is given to process data in ways that facilitate unmasking, re-identification becomes trivial. It only takes giving a phone number to one delivery company, or an email address to one e-commerce site to suddenly light up a shadow profile, linking a vast amount of previously un-attributed browsing to a user. Clearing caches can reset things for a little while, but any tracking vendor that can observe a large proportion of browsing will eventually be able to join things back up.
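The mechanics are simple enough to show in a toy model. What follows is a minimal sketch, not a description of any real vendor's system: the `TrackerStore` class, the pseudonyms, and the example domains are all hypothetical. It assumes a vendor that logs page views under a pseudonymous ID and learns a stable identifier whenever a user submits one.

```python
from collections import defaultdict


class TrackerStore:
    """Toy model of a tracking vendor's data store (hypothetical)."""

    def __init__(self) -> None:
        # pseudonymous ID (e.g. a cookie value) -> pages seen under it
        self.visits: dict[str, list[str]] = defaultdict(list)
        # pseudonymous ID -> stable identifier (email address, phone number)
        self.owner: dict[str, str] = {}

    def record_visit(self, pseudonym: str, page: str) -> None:
        self.visits[pseudonym].append(page)

    def link(self, pseudonym: str, stable_id: str) -> None:
        """Called once a user hands over an email address or phone number."""
        self.owner[pseudonym] = stable_id

    def history_for(self, stable_id: str) -> list[str]:
        """Every page seen under any pseudonym linked to this identifier."""
        return [
            page
            for pseudonym, pages in self.visits.items()
            if self.owner.get(pseudonym) == stable_id
            for page in pages
        ]


store = TrackerStore()

# Browsing that is un-attributed at the time it is collected.
store.record_visit("cookie-a", "news.example/article")
store.record_visit("cookie-a", "health.example/symptoms")

# One checkout form later, the earlier browsing has a name attached.
store.link("cookie-a", "user@example.com")

# Clearing caches yields a fresh pseudonym, but only until the same
# stable identifier is supplied again.
store.record_visit("cookie-b", "shop.example/cart")
store.link("cookie-b", "user@example.com")

print(store.history_for("user@example.com"))
```

Note that the link is retroactive: nothing about the earlier visits changes, yet all of them become attributable the moment one join key appears. That is why dormant data matters as much as new collection.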

Removal of third-party cookies can temporarily disrupt this reidentification while collection funnels are rebuilt to use "first party" data, but that's not going to improve the situation over the long haul. The problem isn't just what's being collected now, it's the ocean of dormant data that was previously slurped up.^[3] The only way to avoid pervasive collection and reidentification over the long term is to change the economics of correlation.

The TAG surely understands the only way to make that happen is for more jurisdictions to pass privacy laws worth a damn. It should say so.

Fire And Movement

The goal of tracking is to pick users out of crowds, or at least bucket them into small unique clusters. As I explained on Mastodon, this boils down to bits of entropy, and those bits are everywhere. From screen resolution and pixel density, to the intrinsic properties of the networks, to extensions, to language and accessibility settings that folks rely on to make browsing liveable. Every attribute that is even subtly different can be a building block for silent reidentification; A.K.A., "fingerprinting."^[4]
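To make the arithmetic concrete, here is a minimal sketch of how identifying bits add up. Everything numeric in it is an assumption: the share of users exhibiting each attribute value is invented for illustration, and the population is a round hypothetical number. Treating attributes as independent also overstates the total, because real attributes are correlated.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits revealed by observing a value that `share` of all users have."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


def combined_bits(shares: dict[str, float]) -> float:
    """Sum of per-attribute bits.

    Assumes the attributes are statistically independent, which makes
    this an upper bound in practice.
    """
    return sum(surprisal_bits(share) for share in shares.values())


def expected_crowd(population: int, bits: float) -> float:
    """Expected number of users who share the whole fingerprint."""
    return population / 2 ** bits


# Hypothetical shares, NOT measurements.
HYPOTHETICAL_SHARES = {
    "screen_resolution": 1 / 16,
    "pixel_density": 1 / 4,
    "language_settings": 1 / 8,
    "installed_extensions": 1 / 1024,
    "accessibility_settings": 1 / 64,
    "network_properties": 1 / 512,
}
HYPOTHETICAL_POPULATION = 2 ** 32  # roughly 4.3 billion, a round number

bits = combined_bits(HYPOTHETICAL_SHARES)
crowd = expected_crowd(HYPOTHETICAL_POPULATION, bits)
print(f"{bits:.1f} bits -> expected crowd of {crowd:.2f} users")
```

An expected crowd below one means the combination is likely to single out an individual. No attribute in the table is rare on its own; the damage comes from stacking them.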

In jurisdictions where laws allow collected data to remain the property of the collector, the risk posed by data-at-rest is only slightly attenuated by narrowing the funnel through which collection takes place.

It's possible to imagine computing that isn't fingerprintable, but that isn't what anyone is selling. For complex reasons, even the most cautious use of commodity computers is likely to be uniquely identifiable with enough time. This means that the question to answer isn't "do we think tracking is bad?", it's "given that we can't technically eliminate it, how can we rebuild privacy?". The TAG's new Finding doesn't wrestle with that question, doing the community a disservice in the process.

The most third-party cookie removal can deliver is temporary disruption. That disruption will affect distasteful collectors, costing them money in the short run. Many think of this as a win, I suspect because they fail to think through the longer-term consequences. The predictable effect will be a recalibration and entrenchment of surveillance methods. It will not put the panopticon out of business; only laws can do that.

For a preview of what this will look like, think back on Apple's "App Tracking Transparency" kayfabe, which did not visibly dent Facebook's long-term profits.

So this is not a solution to privacy, it's fire-and-movement tactics against corporate enemies. Because of the deep technical challenges in defeating fingerprinting^[4], even the most outspoken vendors have given up, introducing "nutrition labels" to shift responsibility for privacy onto consumers.

If the best vertically-integrated native ecosystems can do is to shift blame, the TAG should call out posturing about ineffective changes and push for real solutions. Vendors should loudly lobby for stronger laws that can truly change the game and the TAG should join those calls. The TAG should also advocate for the web, rather than playing into technically ungrounded fearmongering by folks trying to lock users into proprietary native apps whilst simultaneously depriving users of more private browsers.

Finding A Way Forward

The most generous take I can muster is that the TAG's work is half-done. Calling on vendors to drop third-party cookies has the virtue of being technical and actionable, properties I believe all TAG writing should embody. But having looked deeply at the situation, the TAG should have also called on browser vendors to support further reform along several axes — particularly vendors that also make native OSes.

First, if the TAG is serious about preventing tracking and improving the web ecosystem, it should call on all OS vendors to prohibit the use of "in-app browsers" when displaying third-party content within native apps.

It is not sufficient to prevent JavaScript injection because the largest native apps can simply convince the sites to include their scripts directly. For browser-based tracking attenuation to be effective, these side-doors must be closed. Firms grandstanding about browser privacy features without ensuring users can reliably enjoy the protections of their browser need to do better. The TAG is uniquely positioned to call for this erosion of privacy and the web ecosystem to end.

Next, the TAG should have outlined the limits of technical approaches to attenuating data collection. It should also call on browser vendors to adopt scale-based interventions (rather than absolutism) in mitigating high-entropy API use.^[5] The TAG should go first in moving past debates that don't acknowledge impossibilities in removing all reidentification, and encourage vendors to do the same. The privacy puzzle can't be solved by the purchase of a new phone, and the TAG should be clarion about what will end our privacy nightmare: privacy laws worth a damn.

Lastly, the TAG should highlight discrepancies between privacy marketing and the failure of vendors to push for strong privacy laws and enforcement. Because the threat model of privacy intrusion renders solely technical interventions ineffective on long timeframes, this is the rare case in which the TAG should push past providing technical advice.

The TAG's role is to explain complex things with rigor and signpost credible ways forward. It has not done that yet regarding third-party cookies, but it's not too late.

Praise, as well as concern, in this post is specific to today's TAG, not the output of the group while I served. I surely got a lot of things wrong, and the current TAG is providing a lot of value. My hope here is that it can extend this good work by expanding its new Finding. ↩︎
Also, James Roswell can go suck eggs. ↩︎
It's neither here nor there, but the TAG also failed in these posts to encourage users and developers to move their use of digital technology into real browsers and out of native apps which invasively track and fingerprint users to a degree web adtech vendors only fantasize about.

A balanced finding would call on Apple to stop stonewalling the technologies needed to bring users to safer waters, including PWA installation prompts. ↩︎
As part of the drafting of the 2015 finding on Unsanctioned Web Tracking, the then-TAG (myself included) spent a great deal of time working through the details of potential fingerprinting vectors. What we came to realise was that only the Tor Browser had done the work to credibly analyse fingerprinting vectors and produce a coherent threat model. To the best of my knowledge, that remains true today.

Other vendors continue to publish gussied-up marketing documents and stroppy blog posts that purport to cover the same ground, but consistently fail to do so. It's truly objectionable that those same vendors also prevent users from choosing disciplined privacy-focused browsers.

To understand the difference, we can do a small thought experiment, enumerating what would be necessary to sand off currently-identifiable attributes of individual users. Because only about 33 bits are needed to uniquely identify anybody on Earth (often fewer), we want a high safety factor. This means bundling users into very large crowds by removing distinct observable properties. To sand off variations between users, a truly private browser might:
- Run the entire browser in a VM in order to:
  - Cap the number of CPU cores, frequency, and centralise on a single instruction set (e.g., emulating ARM when running on x86). Will likely result in a 2-5x slowdown.
  - Ensure (high) fixed latency for all disk access.
  - Set a uniform (low) cap on total memory.
- Disable hardware acceleration for all graphics and media.
- Disable JIT. Will slow JavaScript by 3-10x.
- Only allow a fixed set of fonts, screen sizes, pixel densities, gamuts, and refresh rates; no more resizing browsers with a mouse. The web will be pixelated and drab, and animations will feel choppy.
- Remove most accessibility settings.
- Remove the ability to install extensions.
- Eliminate direct typing and touch-based interactions, as those can leak timing information that's unique.
- Run all traffic through Tor or similarly high-latency VPN egress nodes.
- Disable all reidentifying APIs (no more web-based video conferencing!)
Only the Tor project is shipping a browser anything like this today, and it's how you can tell that most of what passes for "privacy" features in other browsers are anti-annoyance and anti-creep-factor interventions; they matter, but won't end the digital panopticon. ↩︎ ↩︎
It's not a problem that sign-in flows need third-party cookies today, but it is a problem that they're used for pervasive tracking.

Likewise, the privacy problems inherent in email collection or camera access or filesystem folders aren't absolute, they're related to scale of use. There are important use-cases that demand these features, and computers aren't going to stop supporting them. This means the debate is only whether or not users can use the web to meet those needs. Folks who push an absolutist line are, in effect, working against the web's success. This is anti-user, as the alternatives are generally much more invasive native apps.

Privacy problems arise at scale and across time. Browsers should be doing more to discourage high-quality reidentification across cache clearing and in ways that escalate with risk. The first site you grant camera access isn't the issue; it's the 10th. Similarly, speed bumps should be put in place for use of reidentifying APIs on sites across cache clearing where possible.
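As a rough illustration of what "escalating with risk" could mean, here is a minimal sketch of a scale-based policy. It is an assumption throughout: the function name, the friction levels, and the lower threshold are invented, only the 10th-grant figure comes from the text above, and no shipping browser is claimed to work this way.

```python
def friction_for_grant(
    prior_grants: int,
    remind_from: int = 3,   # hypothetical threshold
    slow_from: int = 10,    # "it's the 10th" site that matters
) -> str:
    """Pick a friction level for the next grant of a reidentifying capability.

    `prior_grants` is how many sites already hold the capability
    (camera access, for example). Friction rises with scale of use
    rather than being all-or-nothing.
    """
    if prior_grants < 0:
        raise ValueError("prior_grants cannot be negative")
    nth_grant = prior_grants + 1
    if nth_grant >= slow_from:
        return "speed_bump"            # e.g. a delay plus a review of past grants
    if nth_grant >= remind_from:
        return "prompt_with_reminder"  # e.g. show which sites already have access
    return "plain_prompt"


for already_granted in (0, 2, 9):
    print(already_granted + 1, friction_for_grant(already_granted))
```

The point of the design is that the first few grants stay cheap, so important use-cases keep working on the web, while broad accumulation gets progressively harder to do silently.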

The TAG can be instrumental in calling for this sort of change in approach. ↩︎