Safari now blocks Google Analytics on sites, new Privacy Report feature shows (appleinsider.com)
170 points by imheretolearn on June 23, 2020 | 113 comments


This article is incorrect.

There does not appear to be a change between Safari 13.1 and Safari 14 for how it handles GA. I have tested it myself.

With Safari 13.1 Apple made a change [1] to their Intelligent Tracking Prevention that blocks all 3rd party cookies for cross site resources.

The only change in Safari 14 appears to be that it reports which domains have had cookies blocked. The Google Analytics beacon is still sent by the browser.

Benedict Evans (cited in the article) has since deleted his tweet, but the article does not yet seem to have been updated.

[1] https://webkit.org/blog/10218/full-third-party-cookie-blocki...


I haven't looked too deep at the code for this (stupid shared cache…) but yes, it's easy to see that this is just a repackaging of the resource load statistics database (~/Library/Containers/com.apple.Safari/Data/Library/WebKit/WebsiteData/ResourceLoadStatistics/observations.db), which has historically been populated using the notion of "prevalence" and "tracker-like behavior" that has been described by the many WebKit blog posts. Looks like there's nothing new under the hood, and no specific singling out of Google Analytics.


Could someone translate this statement into something a layman might understand? (I'm not a web dev, nor am I following SEO news closely enough.)

Are you implying that Apple is putting up a misleading interface in the new version of Safari, where GA appears blocked while it actually isn't?

That's a pretty bold assertion, and the provided link says nothing relevant to it.

Again, I'm only looking for more information; maybe I didn't get your point at all because I'm not in the SEO field.


No, it's that Safari is blocking third-parties under the same policies that it laid out earlier.


Original commenter stated that

> The Google Analytics beacon is still sent by the browser

This is what I'd like some clarification about.

Whether the policy changed or it is just a UI change is another subject. Do they actually block GA or not? And if not, does the UI mislead users into thinking otherwise?


Original commenter here. They are not blocking GA.

Apple's ITP affects both 3rd party HTTP cookies and 1st party JS cookies. There does not appear to be anything but a UI change between Safari 13.1 and 14, from my initial analysis.

GA primarily utilises 1st party JS cookies. If you have GA on example.com then the cookie will be stored against example.com. One of these cookies stores a unique ID for you as a user.

GA will then send 'beacons' (GET requests to Google endpoint, with data passed in URL parameters) for pageviews and other events. Your unique ID is included in these URL parameters, which allows the various events from you to be joined up into a 'session'. That has not changed.
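To make the beacon idea concrete, here is a rough sketch in Python of what such a request URL could look like. The endpoint and parameter names (`v`, `tid`, `cid`, `t`, `dp`) follow the Universal Analytics Measurement Protocol as I understand it; the tracking ID and client ID are made-up placeholders, and the helper names are mine, not anything GA ships.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint (Universal Analytics Measurement Protocol).
GA_ENDPOINT = "https://www.google-analytics.com/collect"


def build_beacon_url(tracking_id: str, client_id: str, page_path: str) -> str:
    """Return the GET URL a pageview beacon might use."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the site's GA property ID (placeholder here)
        "cid": client_id,    # unique ID read from the 1st party cookie
        "t": "pageview",     # hit type
        "dp": page_path,     # page being viewed
    }
    return f"{GA_ENDPOINT}?{urlencode(params)}"


def client_id_of(beacon_url: str) -> str:
    """Extract the client ID, the value that lets separate hits be joined up."""
    return parse_qs(urlsplit(beacon_url).query)["cid"][0]


# Two pageviews from the same (hypothetical) visitor carry the same client ID.
first = build_beacon_url("UA-00000-1", "555.1234567890", "/home")
second = build_beacon_url("UA-00000-1", "555.1234567890", "/pricing")
print(first)
print(client_id_of(first) == client_id_of(second))
```

The point is only that the ID travels in the URL, not in a 3rd party cookie, which is why blocking 3rd party cookies does not stop the hits from being joined up.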

What ITP did previously (just over a year ago, I think) is limit the age of these cookies to auto delete after 7 days, meaning that if you leave a site for more than 7 days your new session won't be able to be joined up with your previous session. That is still the case.

If it is of interest I have a deck [1] that discusses this (link goes to the relevant section - skip to slide 71 if you just want to understand the ITP impact on GA).

I'll caveat this a bit - I've not done an in-depth analysis. The person I strongly recommend following for expert updates on this topic is (my friend) Simo Ahava [2] - he is an authority on GA and the impact of ITP.

[1] https://www.slideshare.net/TomAnthony/browser-changes-that-w...

[2] https://twitter.com/SimoAhava


Simo also has this[1] handy site, for tracking the impact of ITP and other browser cookie restriction schemes.

[1] https://www.cookiestatus.com/


I'm not an expert on this either, but as far as I understand, WebKit does not block beacon requests when you click on links. However, it does clear third-party cookies as it did before, which is described in the blog post. So what the commenter above is saying is that this shows instances where third-party cookies are being blocked, not where beacon requests are (which WebKit isn't interfering with). I haven't looked at the code for this in detail yet, but from what I saw the implementation seems to match this.


And such beacons are then expected to be anonymous because no cookie data could be sent along.

If any tracking-oriented company wanted to overcome this for nefarious reasons, they wouldn't be able to do that without explicit help from the visited website, which would then become legally liable. (For example, by including in the beacon call a hash of a personal identifier like IP or e-mail, in violation of GDPR and such.)

Is that a good enough layman understanding of the stakes here?


TomAnthony described beacons, but even without JavaScript any server you have requested gets your IP, OS and browser (the usual server log data). That includes ads, trackers, CDNs, images and stylesheets. For Google (besides what is usually blocked) it is at least ajax.googleapis.com and fonts.googleapis.com.


Suggestion: the access logs your web server collects are enough for basic analytics, and if you are curious for more, you can manually instrument your site a little. You don't need to give Google the power to spy on your users.
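As a rough illustration of what "basic analytics from access logs" can mean, here is a minimal Python sketch that counts page hits per path. It assumes the server writes Common Log Format lines; the sample lines, the list of asset extensions to skip, and the function name are all invented for the example.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def page_hits(lines):
    """Count successful GET requests per path, skipping static assets."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # ignore lines that are not in the expected format
        if m["method"] != "GET" or m["status"] != "200":
            continue
        path = m["path"].split("?", 1)[0]  # drop the query string
        if path.endswith((".css", ".js", ".png", ".ico")):
            continue
        hits[path] += 1
    return hits


# Made-up sample lines using documentation IP ranges.
sample = [
    '203.0.113.5 - - [23/Jun/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.5 - - [23/Jun/2020:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900',
    '198.51.100.7 - - [23/Jun/2020:10:02:00 +0000] "GET /about?ref=hn HTTP/1.1" 200 2048',
    '198.51.100.7 - - [23/Jun/2020:10:03:00 +0000] "GET /missing HTTP/1.1" 404 150',
    '203.0.113.5 - - [23/Jun/2020:10:05:00 +0000] "GET / HTTP/1.1" 200 5120',
]
print(page_hits(sample).most_common())
```

This only sees requests that actually reach the server, so anything answered from a cache along the way will not be counted.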


Server logs can't do this job well.

If the system is designed well, in a lot of cases the users won't be receiving the resource from your server, they will be receiving it from a) their browser cache, b) their local network cache, c) their ISP's cache, or d) a CDN. Your server logs might be showing several thousand hits while the number of times a page is actually shown to somebody might be tens of thousands of hits.

Server logs are great for logging what is happening on your server, but awful for understanding your user base.


Are b) and c) still a common thing with most traffic now encrypted?

I wouldn't worry about a) too much, as you've already counted them, most sites are "one time use", and those that aren't usually have some interactive or time-sensitive feature that requires short (if any) ttl on any caches not under the site's control. d) is true, but also is somewhat under the site's control and should provide numbers.

Most people use GA for a rough picture of "how many people visit my site?", and stats from any CDN will most likely be enough for that. And browser caching doesn't really concern those people, because most visitors aren't returning visitors.


Is it possible to add a basic ping that pings your server whenever a page loads (cached or not) to get around this?


Yes, with Cache-Control, and then if you turn that product into a SaaS so that not every website has to run a copy, you get Google Analytics.


I think some startups are now trying to offer this as a self hosted service. That sidesteps the concerns around using a half baked, DIY solution as well as issues with privacy and data ownership.


Oh yes they do, and why would you want to break network caching to spy on your users? You can always add some little js to ping your server (logs! Again!) or a non-cacheable tracking pixel.
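For what it's worth, a non-cacheable tracking pixel is only a few lines. Below is a minimal sketch using Python's standard `http.server`; the `/pixel.gif?page=...` URL shape, the `page` parameter and the helper names are my own assumptions, not a standard.

```python
import base64
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit, parse_qs

# A 1x1 transparent GIF, the classic tracking-pixel payload.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def pixel_headers():
    """Headers asking browsers and intermediate caches not to store the pixel."""
    return {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL)),
        "Cache-Control": "no-store, max-age=0",
    }


def page_from_request(path):
    """Pull the reported page out of a request like /pixel.gif?page=/about."""
    query = parse_qs(urlsplit(path).query)
    return query.get("page", ["(unknown)"])[0]


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # send_response() also logs the request line, query string included,
        # so every pixel load ends up in the server's own log output.
        self.send_response(200)
        for name, value in pixel_headers().items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PIXEL)
```

To try it, hand `PixelHandler` to `http.server.HTTPServer` on a port of your choice and embed an `<img>` pointing at it; only the pixel skips the cache, so the pages themselves can stay cacheable.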


Sure, you can do that. Google Analytics is merely a free version of that service.


Free to you, the user pays with their data. This is apple giving the user the option not to subsidise the website operator.


Tracking (especially first party) is not some shadowy value extraction scheme. You need at least some data to improve your service and understand what brings traffic to your page.


The developer also pays. By giving Google their entire viewer/user base, so Google knows whom to target if they decide to create an alternative.


> Server logs are great for logging what is happening on your server, but awful for understanding your user base.

There is a high correlation between a usage pattern being interesting and likelihood of it making a representative appearance in the logs.

Understanding usage patterns shouldn't depend on knowing absolute numbers. Assuming "dumb" content is cached relatively uniformly, the usage patterns themselves should be sufficiently inferable from the logs. For the non-dumb content, why would consumption of highly dynamic endpoints have distortive cache utilization at all?

To add more, how often does high-resolution usage data really translate into actual product improvement? What percentage of sites have enough traffic to justify engineering hours spent on micro-growth-hacking based on this level of data?


Here's another suggestion: Learn how to configure Google Analytics properly.

https://support.google.com/analytics/answer/1011397?hl=en

"If all [data sharing] settings are OFF, your Analytics data is only used to provide and maintain the Analytics service."


Well, this only "works" for you if you trust Google. Given multiple security incidents in the past, I am certainly not one of those people.


What security incidents?

Of all tech companies I'd feel most secure with Google being the stewards of my data.


Snowden. They were leaking a lot of data to the NSA and they didn't even know.

Of course, they upped their game since then, but their sheer size will always make them an interesting target and they have to fail only once.


Thank you. I am purposely avoiding having the conversation channeled into a discussion of this or that incident; those who want to dig into the matter can find plenty and draw their own conclusions.


If Google makes you feel more secure, that doesn't mean it's secure. Feeling secure perhaps works for you, but if we assessed how they deal with our data, they would be quite far from the pool of companies we can trust with data.

And that's only based on what we know currently from security incidents.


I have worked for a few years at Google and have a decent insight into all the processes that go into keeping user data secure. It's perfectly fine to have your reservations but so far as I am concerned, I sleep well.

I don't share the same confidence for all my data that is floating around carelessly in some government spreadsheet somewhere, though.


What security incidents?


IIRC you can thank Mozilla for that setting!


The fact that that does not default to 'off' is quite telling.


There are multiple things you can do with GA of course, like IP anonymization, setting a smaller expiration date for that cookie, etc, however trust in Google Analytics is low and that won't change.
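To show what IP anonymization amounts to, here is a small Python sketch that truncates an address before it is stored. The prefix lengths (keep the first 24 bits of IPv4, the first 48 bits of IPv6) reflect my understanding of how GA's anonymization behaves and should be treated as an assumption; the function name is mine.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the host part of an address so it no longer identifies one machine."""
    ip = ipaddress.ip_address(address)
    # Assumed prefixes: /24 for IPv4 (last octet zeroed),
    # /48 for IPv6 (last 80 bits zeroed).
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


# Documentation-range addresses, not real visitors.
print(anonymize_ip("203.0.113.77"))
print(anonymize_ip("2001:db8:abcd:12:34:56:78:9a"))
```

Whether a truncated address still counts as personal data is a legal question this sketch does not answer.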

Also a lot of websites, especially personal ones, are in violation of ePrivacy. You may not need GDPR consent for analytics, qualifying for a legitimate interest, but you still need a cookie banner when dropping cookies for analytics.

The market is clear about it, analytics have to be first party and privacy preserving.


Collecting server logs doesn't work if you host your website behind Cloudflare or another CDN.

You can set up some sort of pixel that's never cached and that transmits the page via query params, but then the implementation isn't reliable and is a PITA.

If you have your own server, something like Matomo.org works better.


Or you could just use Cloudflare logs. It's far more reliable to use log information than JavaScript given the number of people who use blockers of some sort.


Is that available for Free and Pro users? I think that's only available for Enterprise customers, no?


You need CloudFlare Enterprise to get those logs.


I am running the new betas iOS 14, iPadOS 14, and macOS Big Sur.

I love this feature in Safari. You open a new tab or browser window and there is a message area letting you know what is being blocked.

I am sick and tired of large tech companies like Facebook, Google, Amazon, etc. making money off of my data with scant benefit to me.

Not to pick on Facebook and Google, but I only use them now (well, almost) for paid services. I happily pay Facebook for Oculus Quest hardware and entertainment. I happily pay Google for music, books, movies, and Google Cloud Platform (which I personally like much better than AWS).


I've done a bit more reading. This is not as dramatic as it seems. First party GA is not blocked. Access to 3rd party storage may be blocked, but this would not prevent web interactions from being visible to GA.


What is the difference between First party and third party GA?


All cookies set via Javascript code are 1st party, and belong to the site that was in the address bar when the code ran (unless they are in a frame).

GA primarily utilises 1st party JS cookies. If you have GA on example.com then the cookie will be stored against example.com. One of these cookies stores a unique ID for you as a user.

GA will then send 'beacons' (GET requests to Google endpoint, with data passed in URL parameters) for pageviews and other events. Your unique ID is included in these URL parameters, which allows the various events from you to be joined up into a 'session'. That has not changed.

Apple's ITP affects both 3rd party HTTP cookies and 1st party JS cookies.

The main impact of ITP on GA is to limit the age of these cookies to auto delete after 7 days (usually), meaning that if you leave a site for more than 7 days your new session won't be able to be joined up with your previous session.
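Here is a toy Python model of that 7-day cap, just to show why a gap of more than a week splits one visitor into two. It assumes the ID cookie is re-set on every visit and that the site asks for a two-year lifetime; both the numbers and the function names are illustrative, not taken from WebKit or GA code.

```python
from datetime import datetime, timedelta

ITP_JS_COOKIE_CAP = timedelta(days=7)  # the cap described above


def effective_expiry(set_at, requested_lifetime):
    """Expiry once the browser clamps the requested lifetime to the cap."""
    return set_at + min(requested_lifetime, ITP_JS_COOKIE_CAP)


def same_visitor(last_visit, next_visit, requested_lifetime=timedelta(days=730)):
    """True if the ID cookie set on the last visit still exists on the next one."""
    return next_visit < effective_expiry(last_visit, requested_lifetime)


last_visit = datetime(2020, 6, 1)
print(same_visitor(last_visit, last_visit + timedelta(days=3)))
print(same_visitor(last_visit, last_visit + timedelta(days=8)))
```

In the second case the old ID is gone, so GA would mint a new one and count a new user.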

There is an edge case where ITP 3rd party cookie limits will directly block GA JS cookies, which is when it is running inside an iFrame (hat tip to Simo [1] for alerting me to this), in which case the JS code may not be able to set a cookie and will not fire a beacon.

[1] https://twitter.com/SimoAhava


Strictly speaking, GA is always 3rd party (it would only be 1st party if/when it's used to measure google-analytics.com, which is not a user-facing domain). For measuring any other website, it's 3rd party.

I believe that this would limit GA's ability to store data in cookies on my system (user or session-based attributes). GA could still create unique identifiers based on my user or session-based attributes and store those IDs on their own servers, then call them back up when I revisit the site.

I'm still coming up to speed on this, maybe someone with more experience could correct or clarify my statements.


In the context of web analytics, first-party vs third-party usually refers to how the user is identified, not to where the analytics status is hosted. Google Analytics uses first-party data for its tracking. Third-party tracking builds a universal profile of the user across the internet, first-party tracking builds user profiles that are silo'd to the website.

Google Analytics stores a first-party cookie with a unique ID for each user. This is used to de-duplicate traffic from the same user to the same domain. This cookie is not shared across websites. If you go to example.com and website.com, those are two different first-party cookies without views into each other.

Optionally, Google Analytics can be configured to correlate this with a third-party cookie from google.com for advertising purposes. This allows the website owner for example.com to connect a user converting on example.com to that same user having seen or clicked an ad on google.com. It also allows the website owner to target ads on google.com towards users based on their activity on example.com.

Previous iterations of Safari's ITP have outright blocked the third-party cookie, and have limited first-party cookies based on whether they're HTTP cookies or JavaScript cookies.


Source?


There are a few threads in Twitter where people are discussing the validity of this article's claim. Here are the two most prominent (referenced in the article) [1], [2].

[1]: https://twitter.com/TomAnthonySEO/status/1275524965077524482

[2]: https://twitter.com/SimoAhava/status/1275151672218705922


Simo Ahava is an authority on GA.


It’s time. GA does seem like a privacy destroying data grab and it’s so hard to argue with marketing because it’s free (for us).

I like the fact that open source and managed analytics which is privacy-sensitive is becoming a thing.

When looking for a privacy-sensitive analytics service I stumbled on https://plausible.io/ which checked my boxes at the time. I imagine that there are more out there.


I have no love for Google Analytics, but as a small-scale publisher, I have serious questions about what tools can replace it:

1) Which analytics can a website employ that is not geared toward enriching the analytics platform developer by violating site visitor privacy?

2) Do those analytics mechanisms have any kind of aftermarket, such that third-party developers integrate them into the publishing platforms that publishers use?

3) What do those analytics mechanisms and the associated tooling cost?


I think collecting analytics from the server logs is going to get more popular again, though a problem for SPAs? We could call it Edge Analytics this time around and run it on cloudflare.

For publishing systems, couldn't WordPress report analytics data back to itself, on its own domain? Something like that existed, but I'm not sure if it was using Jetpack servers or not. If this isn't already possible, it seems like an obvious thing for someone privacy-conscious to build.


This exists over here: https://github.com/ibericode/koko-analytics - never used it myself though.


I'm not sure about all of your questions but there's platforms like matomo out there, that seem to be able to handle some of the use cases for ga.

Matomo: #1 Secure Open Web Analytics Platform https://matomo.org/


I've been very happy with Plausible.io for my dinky little blog. It has all the information I need, and isn't as intrusive as GA.


I use a self-hosted Matomo instance, in a Docker setup on my own server. Low maintenance, works great:

https://matomo.org/

You can turn off the cookies too, anonymize IPs (the default actually) and if you do that you don't even need GDPR consent or cookie banners.


Goat Counter (yeah, weird name) seems to be getting popular, and it's Open Source:

https://www.goatcounter.com


There's been a recent thread with a lot of options discussed here: https://news.ycombinator.com/item?id=23560823, also https://lobste.rs/s/lvdj3w/lightweight_alternatives_google


For small-scale anything, the best analytics are simply talking to your customers.


The WebKit engineer behind "Safari’s Intelligent Tracking Prevention" confirmed on Twitter, when asked, that there have been no changes to how it blocks GA in the new version; it is just visible in the report now. And the details: "ITP doesn’t block loads, it blocks cookie access, deletes cookies and website data for sites you don’t interact with, downgrades referrers, and more. Its cookie blocking applies to all third parties." I responded that it seems all the news sites reporting that GA is now being blocked/prevented in the new version are incorrect, then. And it looks like these headlines appearing on Hacker News will only further spread that misinformation, unintentionally.


That little rumour about Apple buying DuckDuckGo seem to make more and more sense by the day.


DDG is also front and center in the Big Sur new features for safari page.

https://www.apple.com/macos/big-sur-preview/


If you look up street addresses within DDG it now returns a result with Apple Maps.

I don't mind this but I hope they've partnered with Apple and not sold out.


I've had that for what I thought was 2 years, so I looked it up and it appears they partnered with Apple back in Jan 2019.

https://www.theverge.com/2019/1/15/18183653/duckduckgo-apple...


Nonsense that Apple would buy DuckDuckGo. DDG is just a wrapper around Bing Search. Therefore Apple could recreate DDG from scratch in a month.


The value in DDG is the userbase and brand.


I'm not convinced of that. Apple is in a position to invent a search engine, make it real pretty (if not as powerful), and slap it as the default on iOS Safari. Throw in some claims that "it's the most private" so the apple fan base eats it up, and I think apple would take a significant chunk of iOS searches overnight.


It seems like an obvious win, but Apple are terrible when it comes to building web services. They don’t have the capability to pull this off in-house and they don’t seem willing to bring in leadership from their competitors to improve on that front. An acquihire or partnership seems to be the only way Apple could get this done successfully.


LOL, and what about the Apple brand ?


Not saying there isn't value in the Apple brand, just that DDG's value is in its brand.


Apple is orders of magnitude bigger on both accounts.


Not saying there isn't value in the Apple brand, or that it isn't bigger, just that DDG's value is in its brand.


I'd love if DDG offered email, but they were just a front-end for your iCloud.com email. My Gmail usage would drop a ton.

I do have an iCloud email address, but on a desktop I have to leave the browser and open up an email client. I only open email clients while at work because I have to.


You do realize that you can read e-mail from the browser if you log into icloud.com?


Yeah, but if there was a link within DDG search that took me to a branded DDG email service that would be great. It would just be a cool DDG email front end design, but using iCloud or another 3rd party they partner with. I don't think DDG wants to create their own email service, so this was a thought and Apple/DDG's marketing are similar.


If that happens, it's safe to assume that Apple will simply try to replicate Google's business model. iAds may have been a failure, but I would imagine digital ad market is simply too lucrative for Apple to leave to Google and Facebook (and Bing and Yahoo/Verizon) alone.


How about the 500 other tracking/analytics companies?

Even Adobe is now in analytics business.


I think the biggest problem is one entity having insights on all web users and all the sites they visit. If each site had its own internal analytics, limited to that domain only and maybe only session-based, the privacy improvements would be huge.


From the screenshot in the article, it looks like many other trackers are blocked as well. It's just less title worthy than Google's.


Adobe somehow hosts critical CSS assets for some adobe.com pages using their tracking domain(s), resulting in a broken page in Firefox due to tracking protection.

I think they do this to encourage people to turn off FF or other tracking protection. Instead they lost an enterprise license renewal as I was “encouraged” to look elsewhere.


Adobe is on my list, affiliated with the domain "demdex.net".


Every time I see that domain name, I think of skin mites (Demodex folliculorum).


Google Analytics is the default attribution tool for many (most?) small and medium sized digital marketing agencies. Most businesses have some kind of GA account set up with a few years of history and it's quick to leverage the data.

With ad and tracking blocking becoming more popular all the time, there will come a tipping point where the data is unreliable enough that ROI can no longer be proved with it. It will be interesting to see what happens then in this space.

Alternative attribution models, more insidious fingerprinting?


What's a good tool for doing server side analytics?

Like for a small site I've made for fun. The only way I can easily know how many users visit is via Google analytics.

Well easily that is.


When I was a child, I used AWStats. It's still around. Sorry this isn't of much help! Honestly I just wanted to reminisce :)


I fetch logs and run https://goaccess.io/


Won't this be the norm with pushing to get rid of third party cookies? Or are we backpedaling on that now?


We run Matomo (formerly Piwik), and I recommend it. It's actually better than Google Analytics in some ways, and you get your own data. This makes deeper analytics easier and also helps protect user privacy since the data is held only by the operators of the site being visited.


Are you happy with how fast the data loads in Matomo? I tried their online demo and everything loaded very slowly and content kept jumping around; it felt pretty bad in terms of UX.


Agreed on this – everything in Matomo is slow and I find the UI clunky.


I am actually working on an alternative ( https://usertrack.net/ ), even though I never used Matomo. I did try their demo a few days ago and, being so popular, I expected it to be really good, but it felt like an outdated platform that couldn't care less about UX.


Piwik utterly failed (for me) for any semi-high traffic site. It simply couldn't handle large-volume logs fast enough no matter what hardware we threw at it :(


Good. Currently I have to modify my hosts file or use something like NextDNS to block Google and others from gathering data from me even when I’m not explicitly using their services.


Is there any chance that this is about reducing Google's competitive advantage and not really about users' privacy?


I wish Apple would enable this at the OS level, though in the long run it won't matter. People will proxy requests and find all sorts of ways around it. It's an arms race with billions of dollars on one side and a vague sense of "privacy would be nice" on the other.


I used simpleanalytics for a bit and it was nice. Less granular data but that is by design.


This is not new. Safari has been blocking third-party cookies for a while. Even Chrome offers this setting now. With that setting turned on, Chrome will also block Google Analytics when you are not on a google.com page.


Federated analytics soon? - Google gives you a docker container and an API key to run that collects metrics on your public/own cloud, that gets slurped up to Google and you still get your demographic info.


Clever, but could just send them logs


If that's really the case, a 3rd party analytics service can still work by taking over the traffic for the site's domain and then proxying it back to the web server, somewhat like Cloudflare does.


While it certainly blocks it by default in the new Safari beta, can anyone confirm if the existing Safari which already blocked third party trackers explicitly blocked google analytics?


This is pretty awesome. All browsers should do this with malware.

I get that people like statistics, but if you share it with a third party, it can safely be condemned.


Shots fired - Apple will take Google’s money to be the default search engine but won’t let them track “their” users.


I've been using Simple Analytics since day one so I can kick back and relax I guess


Personally I am not at all concerned when my data is used for marketing. Instead, I like learning about products tailor-made for me.

By privacy I understand not sharing data with the public or with the oppressive regimes when it may lead to harassment, firing or arrests of people.

But using my data to sell me some watches or bike parts? I'm in.


The internet is used for more than shopping. Your searches should not be public domain. Your health records. Your location. The time you go to bed at night. Where your house is. How many children you have. When you’ll next be away from home on vacation. That you looked around on a dating website while married. That you’re gay in a country where it carries the death penalty.


Is this currently public?

I advertise on Facebook but I can't see anyone's searches or current location. Although I can target people who are interested in LGBT or people who have been to SFO recently. If they like my post, I can also see their name and photo.

That seems reasonable to me.


What about using your data to send you malware and to steal money from your bank account for example?

I've had to use safari on Mac. Sites are constantly trying to auto download dmg files (executables on mac) of malware. The most common ad is saying your mac is broken and needs to download an antivirus (that's fake and trying to get you to download a virus). Top google results for "download vlc" or equivalent search is pointing to fake websites distributing malware.

All of this is the result of ads, analytics and mac detection.

I've tried to set up uBlock Origin on Safari, but Apple blocked Safari extensions and doesn't allow replacing Safari fully with another browser.


I really think it's a weird move. Safari users will disappear from our understanding of the web. We might not design and code for Safari because these users will become virtually nonexistent.

I am a small publisher. Safari users are demonetized and untraceable; I might just say to them gtfo.


Aren't safari users more rich on average, and more willing to spend money on your website?

If you need to track your visitors without thinking about their privacy, perhaps your business model isn't great.


Haha because surveillance capitalism isn’t making hundreds of billions with data!


That would be leaving a lot of money on the table up for grabs by your competitors that do know how to see through this situation.


I'm wondering how self-hosted solutions like Matomo will be treated?


How do you think they should be treated?


Tag manager too?


I see Google Tag Manager in my list.



