The Cookie Clearinghouse

As you may recall from almost six weeks ago, we held the Safari-like third-party cookie patch back from Firefox Beta. The patch blocks cookies set for domains you have not visited, according to your browser’s cookie database, and we held it back because of two problems:

False positives. For example, say you visit a site named foo.com, which embeds cookie-setting content from a site named foocdn.com. With the patch, Firefox sets cookies from foo.com because you visited it, yet blocks cookies from foocdn.com because you never visited foocdn.com directly, even though there is actually just one company behind both sites.

False negatives. Meanwhile, in the other direction, just because you visit a site once does not mean you are OK with it tracking you all over the Internet on unrelated sites, forevermore. Suppose you click on an ad by accident, for example. Or a site you trust directly starts setting third-party cookies you do not want.

Our challenge is to find a way to address these sorts of cases. We are looking for more granularity than deciding automatically and exclusively based upon whether you visit a site or not, although that is often a good place to start the decision process.

The logic driving us along the path to a better default third-party cookie policy looks like this:

  1. We want a third-party cookie policy that better protects privacy and encourages transparency.
  2. Naive visited-based blocking results in significant false negative and false positive errors.
  3. We need an exception management mechanism to refine the visited-based blocking verdicts.
  4. This exception mechanism cannot rely solely on the user in the loop, managing exceptions by hand. (When Safari users run into a false positive, they are advised to disable the block, and apparently many do so, permanently.)
  5. The only credible alternative is a centralized block-list (to cure false negatives) and allow-list (for false positives) service.
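
To make the decision order concrete, here is a minimal sketch in Python of the policy the list above describes. The function and list names are hypothetical; nothing here reflects the actual Firefox patch or the CCH list format.

```python
# Hedged sketch: a visited-based default, refined by CCH-style exception
# lists. Names and data shapes are invented for illustration only.

def third_party_cookie_allowed(cookie_domain, visited_domains,
                               cch_allow_list, cch_block_list):
    """Decide whether a third-party cookie may be set."""
    if cookie_domain in cch_block_list:
        return False  # block-list cures false negatives
    if cookie_domain in cch_allow_list:
        return True   # allow-list cures false positives
    return cookie_domain in visited_domains  # naive visited-based default

# The foo.com/foocdn.com example from the post: foocdn.com was never
# visited directly, but an allow-list entry would let its cookies through.
visited = {"foo.com"}
assert third_party_cookie_allowed("foocdn.com", visited, {"foocdn.com"}, set())
assert not third_party_cookie_allowed("tracker.example", visited, set(), set())
```

Checking the block-list before the allow-list is one conservative ordering; the real precedence rules would be part of the CCH design.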

I’m very pleased that Aleecia McDonald of the Center for Internet and Society at Stanford has launched just such a list-based exception mechanism, the Cookie Clearinghouse (CCH).

Today Mozilla is committing to work with Aleecia and the CCH Advisory Board, whose members include Opera Software, to develop the CCH so that browsers can use its lists to manage exceptions to a visited-based third-party cookie block.

The CCH proposal is at an early stage, so we crave feedback. This means we will hold the visited-based cookie-blocking patch in Firefox Aurora while we bring up CCH and its Firefox integration, and test them.

Of course, browsers would cache the block- and allow-lists, just as we do for safe browsing. I won’t try to anticipate or restate details here, since we’re just starting. Please see the CCH site for the latest.

We are planning a public “brown bag” event for July 2nd at Mozilla to provide an update on where things stand and to gather feedback. I’ll update this post with details as they become available (UPDATE: details are here), but I wanted to share the date ASAP.

/be

23 Replies to “The Cookie Clearinghouse”

  1. False positives are a big deal, and I don’t feel it’s reasonable to have a centralized (even if best-intended) limited set of people listing and solving them all. Maybe that can sort of work as a short-term solution, but it doesn’t scale well in space (there are a lot of websites with subdomains and “sibling domains”) or in time (people will need to contact the CCH forever to get their domains whitelisted).

    From the CCH website:
    “During the technical review, Cookie Clearinghouse staff will work through the two competing claims and make a factual evaluation. In some cases this will involve contacting the parties involved.”
    => Having to convince a group of English speakers of the need to whitelist a website, when not everyone on Earth is proficient in English, should be a non-starter :-/

    A more decentralized, long-term solution could involve an HTTP header (or a meta element? but there is always the HTML parsing/prefetching mess with meta elements…) where a first party lists all the third parties whose cookies are OK. A “friend” or “sibling” domain list.

    It is decentralized because each website can make its own decision without the CCH approval.
    A good thing is that it would be declarative, so a browser can inform the user of what’s going on, run heuristics, or compare with the CCH whitelist to say if something fishy may be going on.

    If false positives are the main reason that keeps 3rd-party cookie blocking from being shipped, the “sibling domain” idea could be the solution. Just implement it, ship it at the same time as 3rd-party cookie blocking, and every broken website with a maintainer can be fixed (hopefully there aren’t too many unmaintained broken websites?)

    Mozilla and the CCH can deal with false negatives as time passes. False negatives don’t break websites.
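
As a hedged illustration of the commenter’s “sibling domain” header idea, a browser-side parser might look like the following. The header name (Sibling-Domains) and its comma-separated syntax are invented here; no such header exists.

```python
# Hypothetical sketch only: parsing a first party's declared sibling
# domains from an invented "Sibling-Domains" response header.

def parse_sibling_domains(headers):
    """Return the set of domains the first party declares as siblings."""
    value = headers.get("Sibling-Domains", "")
    return {d.strip() for d in value.split(",") if d.strip()}

# foo.com declaring its CDN as a sibling, per the post's example:
headers = {"Sibling-Domains": "foocdn.com, cdn.foo.com"}
assert parse_sibling_domains(headers) == {"foocdn.com", "cdn.foo.com"}
```

A browser could then treat cookies from declared siblings as it treats the first party’s own, without consulting the CCH.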

  2. “C is for Cut the crap out of Javascript”

    Will you ever hear what developers say about JavaScript being a PITA for large projects due to variable hoisting, insane this, unpredictable performance, no type checking, etc.?

    Is JavaScript’s only ambition to stay a “script kiddie” language? Yes, even ES6 does way too little to alleviate the aforementioned PITA.

  3. Hi David — a couple of points.

    First, the automatic flip (from blocked to allowed or vice versa) on appeal is key. That will resolve a lot of cases without humans in the loop, we believe.

    Second, the human in the loop is running a short decision process based on standards and policy (regulation, self- and gov’t). We need this to be simple too. Perhaps more of it could be automated some day with secure attestation (e.g., of DNT compliance).

    Letting facebook.com list fbcdn.net as a sibling would be fine, but how do you avoid the obvious problem that sites integrating third parties that track across the web could easily make those parties “siblings” and we’re back to the _status quo_?

    The false-negative (first party empowerment) problem remains and I think it’s actually the bigger one. As you seem to acknowledge, it needs a CCH-like solution.

    /be

  4. re your first point
    => After this point and a re-read of the challenge/counter-challenge idea, I better understand it and it’s much less human work than I had originally understood.
    However, without some form of authenticity matching between a website and the “challenger”, challenging could become some sort of subtle and weird attack vector.
    I imagine that’ll be figured out as part of Phase I.

    “Second, the human in the loop is running a short decision process based on standards and policy (regulation, self- and gov’t). We need this to be simple too.”
    => I’m still skeptical because of language/cultural (and now gov regulation) barriers. Hopefully, seeing it happen and working will prove me wrong.

    “how do you avoid the obvious problem that sites integrating third parties that track across the web could easily make those parties “siblings” and we’re back to the _status quo_?”
    => I’m proposing an opt-in. Sites need an incentive to make a 3rd party a sibling. I believe this makes a major difference. I’m interested in whether there are reasons to think I’m guessing wrong.
    If sites and tracking 3rd-party do cooperate in tracking, the 3rd-party can always ask for a script to be included (a FB like button already seems to require importing a script https://developers.facebook.com/docs/reference/plugins/like/ ). Why not served as same-origin if that ever becomes necessary. Fighting against tracking when websites cooperate with tracking 3rd-party sounds like a lost battle.
    As I asked elsewhere: what is the exact threat model? https://bugzilla.mozilla.org/show_bug.cgi?id=536509#c48 (and the exact threat being defended against?)

    We’d also be in a situation different, because websites would explicitly list who they consider to be sibling domains. It then makes very clear and observable who’s cooperating with whom in tracking (unlike today where embedding always means tracking whether the embedder wants to cooperate in that or not).
    For instance, a crawler could tell which domains consider one another siblings, etc. Or Collusion could be improved to make very explicit who’s collaborating with whom. UAs or addons could display warnings and offer user choices. An opt-in for tracking forces a level of transparency we don’t really have today. I believe that would make a difference.

    “The false-negative (first party empowerment) problem remains and I think it’s actually the bigger one.”
    => Agreed. But again, false negatives don’t break websites and can be dealt with after the first version of the third-party cookie patch is shipped. (you started the post on the reasons that prevented the patch from shipping and I believe the earlier it’s shipped, the better)

  5. I’ve been accepting/rejecting cookies manually for years, so this gives me experience, if not wisdom. 🙂

    Firefox has always had the best semantics around cookie handling, IMO. I like the separate checkboxes to accept 1st & 3rd-party cookies. I like that FF can force session-only treatment, and it remembers domains whose cookies I remove when I review the on-disk inventory.

    I find that most names of 3rd-party sites I don’t want suggest an ad or metrics purpose, so I can reject them quickly (doubleclick, specificclick, sitemeter). I seldom have a false-positive problem, but when I do, I have to dig through my block list and guess which domains I need to allow. A button that would “Allow cookies from domains appearing on this page”, or show a picklist, would solve that situation quickly.

  6. The immediate result of this is that I now have to attend a meeting on Monday about switching from cookie-based tracking to browser fingerprinting… :/

  7. Safari has blocked 3rd-party cookies for years. No issues with foocdn when visiting foo.

  8. Brendan, this is my 1st visit to your blog. I’ve used Firefox for many years. IMO, track/no-track is a small problem. Frequent 30-sec. hangs on page loads is the BIG problem (almost non-existent before 2012) — apparently often due to ad servers. I urge you to integrate best practices for minimizing/eliminating/escaping-from this into user settings ASAP!

  9. I can see how a service would work well in the short to medium term, but if you get too aggressive blocking 3rd party cookies what is to stop websites proxying the requests or using a sub-domain? Probably too difficult for some webmasters, but easy for anyone who’s at the level of administering their own servers (particularly when the analytics services provide step by step instructions).

    I think this change would make things harder for both typical webmasters (since they will need to support the proxying) and for security-conscious users (since they won’t be able to just block the domains of the analytics services).

  10. David: you wrote

    “Why not served as same-origin if that ever becomes necessary.”

    Because everything costs, including another subdomain, which implies more server-to-server traffic. The way ad-tech integrates today uses the third-party domain as the source of the iframe, wherein the 3rd-party cookie is set while the user is on the top-level (publisher) page. This cookie is then sent to the 3rd party’s domain automatically by the browser.

    “Fighting against tracking when websites cooperate with tracking 3rd-party sounds like a lost battle.”

    Indeed and we are not proposing to fight it by making sites add JS and publishers with 3rd parties change their domain architectures. You are!

    We’re proposing fighting tracking by using the visited status as the first in a series of 5 or 6 presumptions that can be vetted centrally, with policy enforcement.

    “As I asked elsewhere: what is the exact threat model? https://bugzilla.mozilla.org/show_bug.cgi?id=536509#c48 (and the exact threat being defended against?)”

    In that bug you ask “That’s a part of the threat model that hasn’t been always clear to me. Are the website and the tracking 3rd party cooperating in tracking?” The answer is yes! Tracking is part of the bargain. Long-tail sites (smaller blogs, e.g.) use 3rd parties to arbitrage the big first party powerhouses that own ad networks and exchanges. This is why the IAB accuses Mozilla of hurting small businesses by contemplating shipping the naive patch.

    “We’d also be in a situation different, because websites would explicitly list who they consider to be sibling domains.”

    Isn’t this clear already from iframe src= attribute values?

    “An opt-in for tracking forces a level of transparency we don’t really have today. I believe that would make a difference.”

    Here you seem to contradict yourself re: “fighting against tracking when websites cooperate with tracking 3rd-party sounds like a lost battle”. Requiring websites to declare siblings (or I should write express them via JS API calls) could help but the site info architecture already has enough data to tell what is going on.

    “”The false-negative (first party empowerment) problem remains and I think it’s actually the bigger one”
    => Agreed. But again, false negatives don’t break websites and can be dealt with after the first version of the third-party cookie patch is shipped.”

    We should get this right up front. First parties have too much power already. Consider the FB shadow accounts problem.

    “(you started the post on the reasons that prevented the patch from shipping and I believe the earlier it’s shipped, the better)”

    I don’t agree. We know the naive visited-based blocking policy is simplistic and too often wrong. Shipping it will affect the ecosystem, rapidly and in perverse ways. We do not get to take our time and keep trying to bite the apple.

    /be
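
The iframe integration Brendan describes can be sketched briefly: the cookie set inside the ad iframe belongs to the iframe’s own origin, not to the publisher page embedding it. The domains below are illustrative only.

```python
# Hedged sketch of why the iframe pattern yields a third-party cookie:
# a cookie set inside an iframe is scoped to the iframe's own host.
from urllib.parse import urlparse

def cookie_host_for(iframe_src):
    """Host a cookie set inside this iframe would belong to."""
    return urlparse(iframe_src).hostname

# publisher.example embeds an ad iframe served by adnetwork.example;
# the resulting cookie is adnetwork.example's, and the browser sends it
# back to adnetwork.example automatically on later embeds anywhere.
assert cookie_host_for("https://adnetwork.example/ad.html") == "adnetwork.example"
```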

  11. Tim: please file a bug and cc: me and :dveditz (that string with the leading : will auto-complete in bugzilla) asking for this enhancement. I will back it. Thanks.

    Developer at online advertising company: why don’t you identify yourself so I can point out how your employer is overreacting and wasting money? We are not shipping the naive patch. When the CCH is up and running, perhaps your employer will have a policy-based presumption in their favor?

    Alarmism and escalation are dumb. Now is the time for the IAB and their sock-puppets to rescue DNT from its death by a thousand W3C cuts, and comply with it.

    Steve Jobs: Hi, how’s the temperature in your afterlife? And: nope.

    Mark: get AdblockPlus. We are working on general networking and layout improvements. Most sites do not block layout on ads, that’s dumb and a well-known (thanks to Steve Souders) source of jank. But you seem to frequent such sites, so: AdblockPlus.

    Ian: see my answer to David. Yes, we could trigger an arms race, but it costs. We are instead trying to thread the needle and increase net-net privacy. Wish us luck! No one among the privacy zealots and the ad-tech folks likes us right now.

    /be

  12. ” “We’d also be in a situation different, because websites would explicitly list who they consider to be sibling domains.”

    Isn’t this clear already from iframe src= attribute values?”
    => Not really. Currently, embedding means participating in tracking whether you (the website owner) want it or not. A website may be willing to embed an ad, but not to participate in tracking, yet it has no technical way to express such an intent. Hmm… Re-reading my comments, I realize that anytime I wrote “cooperate/participate”, I meant “willfully cooperate/participate” (that is, with the intent of having the user tracked by the 3rd party). This applies to my bugzilla comment too.
    There is cooperation today between websites and trackers, but it’s implicit because of a technical limitation.
    Collusion makes a pessimistic approximation of who’s tracking you, based on @src attributes (or on actual HTTP requests with cookies set, which is approximately the same thing in today’s web).

    Whenever the UA+CCH makes the decision of sending cookies or not on behalf of the user without the website having a say, the information of who’s cooperating with whom is lost.

    “(or I should write express them via JS API calls)”
    => I suggested declaring them via an HTTP header or a meta element. The list can be frozen after the head element is parsed or something (like the @sandbox value can’t be changed programmatically after it’s first set).

    “An opt-in for tracking forces a level of transparency we don’t really have today. I believe that would make a difference.”

    Here you seem to contradict yourself re: “fighting against tracking when websites cooperate with tracking 3rd-party sounds like a lost battle”
    => There are 2 battles: one is technical (the one I consider lost, because I see too many workarounds) and the other one is social/political (the one where websites being transparent about their relationships would make a difference I believe).
    Both Collusion and DNT have been amazing social/political initiatives to raise awareness, but don’t make a difference in the technical part of the web. (The semantics of DNT are still fuzzy and very context-specific. And who can make any guarantee on whether the server-side code actually respects DNT anyway?)
    I see CCH as an attempt to bring social/political wisdom/decisions to the technical level (so is DNT at a softer level).
    But having websites be explicit about who they want to help with tracking would help raise more awareness, I believe.

    “the site info architecture already has enough data to tell what is going on.”
    => I’m torn on that point. Again, a website may want to put an ad, but not want to willfully participate in tracking. Currently both actions can’t be differentiated, because browsers attach cookies regardless. It still won’t be differentiated after CCH+3PCB.

  13. > “Yes, we could trigger an arms race, but it costs.”

    It costs, but not very much, and it will cost us if we can’t see the identifiers being used for the tracking.

    Setting up a CNAME on a subdomain is easy for all but the smallest websites, and it will only get easier if it becomes a standard requirement when adding 3rd party javascript. Alternatively the ad networks can switch to techniques such as fingerprinting, which requires a relatively small investment by the networks and zero change on the sites. Which brings me to…

    > “your employer is overreacting and wasting money” [response to ad network that’s already planning to work around this change]

    I disagree. How much has this online advertising company spent on working around this change so far? One meeting by the sound of it. Even the cost to write a full workaround would likely be petty cash compared to what they could lose if they weren’t ready by the time this change is released. It’s prudent to keep an eye on developing risks and invest proportionately to mitigate those risks.

  14. Ian: the CNAME is not the only cost. The subdomain must be run by the 3rd party, or else must be run by the publisher and make server-to-server calls to the 3rd party. That (either) is the more significant cost than the CNAME.

    Fingerprinting is needed for identifying users cross-device, but by itself it doesn’t replace cookies.

    I don’t take the “we’re changing because Firefox might block our 3rd party cookie” maybe-trolling seriously, sorry. Ad-tech folks we talk to all say they know cookies are under assault and they must look into other options in the next 18 months. Many therefore kick the can.

    If we can rescue DNT from the W3C and make a CCH presumption for allowing tracking if the 3rd party complies with rescued-DNT, then cookies can continue to be the low-cost tracking device that they are — for users who don’t set DNT.

    /be

  15. David: you wrote

    “Again, a website may want to put an ad, but not want to willfully participate in tracking.”

    I don’t buy it. The site wants the ad to make revenue, which in fact does depend in many cases on tracking (to arbitrage the big networks and 1st parties).

    Say a blogger is unaware of this, we add declarative means for that blogger to allow 3rd party cookies to be set, and that blogger does not employ it out of a desire to avoid tracking. The 3rd party takes a hit, which if it can discern the cause, must mean it pays that blogger less.

    That “discern the cause” part is yet another cost, which further reduces the profitability of the ads and the success of the arb. All else equal, this is a losing strategy for the blogger who embeds ads to make some (usually small, $20–80/month I’m told) money.

    Could it work due to social movement upside? Maybe, but if we require the declarative markup to allow the 3rd party cookie, then it is opt-in and most bloggers/publishers won’t do it manually — but sites such as blogspot.com and wordpress.com can easily whitelist.

    So it looks to me likely to be ineffective, and just add overhead and some temporary cost and stress.

    /be

  16. Re CNAMEs: Have I misunderstood how this would work? If my site is www.example.com, I’d set analytics.example.com as a CNAME to www.myanalytics.com, which can then serve cookies with subdomain privileges. I don’t need to set up a proxy (unless we start blocking subdomains or varying IPs), and there was already a service available at www.myanalytics.com; it just now has an alternative address.

    Re Ad-tech response: Yes, I appreciate ad-tech folk will already be looking at alternatives, but that doesn’t mean we want to bump it up their todo list.

    It’s a shame that cookies are under assault, because they are the tool, not the problem behaviour. I now get annoying nag screens warning me that the site I’m on uses cookies (well, yeah, so does almost every site on the internet). I’d much rather that nag screen said something like “We will record how you use this site and share this data with x, y and z so that they can better tailor their services to your needs.”

    I’m much less against this if it is tied into DNT, although it would be a shame if this caused people to disable DNT because it was breaking websites they were trying to use.
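
The CNAME workaround Ian describes can be sketched with a naive first-party check. The two-label heuristic below is a deliberate simplification (real browsers consult the Public Suffix List), and the domains are the hypothetical ones from the comment above.

```python
# Sketch: why a tracker aliased under the publisher's own subdomain
# (analytics.example.com CNAME www.myanalytics.com) evades naive
# third-party blocking. Two-label base-domain match is a simplification.

def is_first_party(cookie_domain, page_domain):
    """Compare registrable domains, naively taken as the last two labels."""
    base = lambda d: ".".join(d.lower().split(".")[-2:])
    return base(cookie_domain) == base(page_domain)

# Served via the publisher's subdomain, the tracker looks first-party:
assert is_first_party("analytics.example.com", "www.example.com")
# Served from its own domain, it does not:
assert not is_first_party("www.myanalytics.com", "www.example.com")
```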

  17. David wrote:
    “Again, a website may want to put an ad, but not want to willfully participate in tracking.”

    I think that would be a tiny proportion of websites choosing that option unless there’s some other incentive or coercion. In my experience many developers will just blindly follow the instructions they are given by the partner networks, and the ones who do raise a privacy concern will be overruled by management interested in the higher ad revenue.

  18. Ian: you’re right, that should work, but publishers don’t generally want that. Could be they ought to, but the 3rd parties don’t sell it. An iframe src= cross-site is just that much easier.

    What is giving you annoying nag screens that warn you about the site using cookies? The site itself? Not Firefox I trust.

    /be

  19. No, it’s the sites, thanks to the well intentioned but poorly executed EU privacy law.
    http://www.ico.org.uk/for_organisations/privacy_and_electronic_communications/the_guide/cookies

    Thankfully it did at least get watered down from the original guidance which suggested users would need to click an ‘OK’ button before they could use a site that uses cookies (i.e. the vast majority of sites).

    The law makes no distinction between the purposes of the cookie, and has no protection against non cookie-based tracking techniques.

  20. I’m really astonished to see someone who “breathes the web” and at the same time believes that some central institution could be any kind of solution…

    1. Piotr: networks have path dependencies, which can and do breed monopolies, duopolies, Pareto-optimal powers. These then take advantage of who’s-on-first (or second) to tilt the playing field. Fighting fire with fire is sometimes necessary, e.g. all browsers have anti-phishing and anti-malware protection based on centralized or consolidated info.

      This is a deep problem, and CCH is not “the” solution, just another source of info.

      /be
