Whose Servers Are You Sharing My IP With?

I'm sat writing this in Austria. If I run the traceroute command to trace the network path from my laptop to fonts.googleapis.com, I see a number of different routers (hops) along the route. They start within Austria, and then hop over the Atlantic to an obscurely-named domain that WHOIS informs me is registered to Google LLC in the US state of California. This means that an include as seemingly innocuous as the one below (would you think twice about adding this to your website?) exposes:

  • My approximate location (from my IP address)
  • And the very fact I visited your website (which alone could cause me privacy harms or even see me prosecuted)
  • To not just Google, but to the state government and law enforcement of California and, by extension, to other US government agencies; and also to any third party that Google choose to transfer this data to.
<html>
  <head>
    <link rel="stylesheet"
          href="https://fonts.googleapis.com/css2?family=Crimson+Pro">
  ...

This means that, under GDPR, this simple font import counts as a cross-border transfer of personal data - to a country that (since the Schrems II decision) isn't considered to provide an adequate level of data protection for EU users. Double plus ungood. You just exposed me as the user to potential privacy harms and put your company at legal risk. Maybe it's time to just host the font on your own server? Fortunately, Google Fonts are free to download and mostly have quite permissive licenses, so this is straightforward. There are some minor performance drawbacks to doing so, which Google describe here.
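
Before you can self-host anything, you need to know which third-party hosts your pages pull from in the first place. The following is a minimal sketch of such an audit, not a definitive tool: the function name `thirdPartyHosts` is my own invention, and it uses a regular expression rather than a real HTML parser, so treat its output as a starting point for a manual check.

```typescript
// Sketch: list the third-party hosts referenced by src/href attributes in a page's HTML.
// `thirdPartyHosts` is a hypothetical helper; a regex is a rough heuristic, not a parser.
function thirdPartyHosts(html: string, siteUrl: string): string[] {
  const site = new URL(siteUrl);
  const hosts: string[] = [];
  // Tags whose src/href the browser fetches automatically when the page loads.
  const pattern = /<(?:link|script|img|iframe)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    let target: URL;
    try {
      // Resolve relative and protocol-relative URLs against the site's own URL.
      target = new URL(match[1], site.href);
    } catch (err) {
      continue; // Skip attribute values that are not valid URLs.
    }
    const isWeb = target.protocol === "http:" || target.protocol === "https:";
    if (isWeb && target.hostname !== site.hostname && hosts.indexOf(target.hostname) === -1) {
      hosts.push(target.hostname);
    }
  }
  return hosts.sort();
}
```

Run against a page containing the font include above, the returned list would contain `fonts.googleapis.com`. Each host in that list receives the visitor's IP address, so each one is a candidate for self-hosting or removal.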



Case Study: Why Is Stripe Everywhere?

Here's a question for all the devs. Have you ever been poking around in your browser's Developer Tools and wondered: why is this page bundling Stripe's JavaScript and making network calls to Stripe, when there's no payment or checkout dialog here? (If you haven't, perhaps I need to reassess my hobbies 😜). The article below reveals the true invasiveness and deceptive design of Stripe's JavaScript library, and discusses a range of mitigations at the code level. By default, just importing the library - even if you never run any of its functions - starts sending telemetry data back to Stripe, and they actively encourage you to import it on every page of your website. Stripe support responded that this "is in the best interests of the user." 🤦‍♀️

📚 Reading Assignment: Stripe is Silently Recording Your Movements On its Customers' Websites - Michael Lynch (2020)

"Note that my app never even calls the loadStripe function. Stripe.js begins tracking user behavior as soon as the client app imports the library. For a single-page app, this occurs the moment the end-user loads any page of the website."

"Integrating it according to Stripe’s documentation causes the library to share user tracking data with Stripe throughout the user’s browsing session...This data includes:

  • Full URLs of each page the user visits, including query parameters and URL fragments [which can contain sensitive data]
  • Timings of how quickly the user moves their mouse during browsing
  • A cookie that allows Stripe to track the same user across the web"
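
One code-level mitigation is to load the payment script only on the pages that actually take payments. Here is a minimal sketch of that idea. The route prefixes and function names are hypothetical, and only the script URL is Stripe's documented one. Note that this goes against Stripe's advice to load the library on every page, so check what that trade-off means for your fraud detection before adopting it.

```typescript
// Hypothetical route prefixes where a payment form actually appears.
const CHECKOUT_PREFIXES = ["/checkout", "/billing"];

// Stripe's documented script URL; everything else in this sketch is illustrative.
const STRIPE_JS_URL = "https://js.stripe.com/v3/";

// True when the path is one of the checkout routes or sits underneath one.
function needsPaymentScript(pathname: string, prefixes: string[] = CHECKOUT_PREFIXES): boolean {
  for (const prefix of prefixes) {
    if (pathname === prefix || pathname.indexOf(prefix + "/") === 0) {
      return true;
    }
  }
  return false;
}

// Remembers whether the script has already been requested during this page session.
let paymentScriptRequested = false;

// Calls `inject` with the script URL at most once, and only on checkout routes.
// Returns true when it triggered the injection on this call.
function loadPaymentScriptIfNeeded(pathname: string, inject: (src: string) => void): boolean {
  if (paymentScriptRequested || !needsPaymentScript(pathname)) {
    return false;
  }
  paymentScriptRequested = true;
  inject(STRIPE_JS_URL);
  return true;
}

// In a browser, `inject` could append a script tag, for example:
//   loadPaymentScriptIfNeeded(location.pathname, (src) => {
//     const el = document.createElement("script");
//     el.src = src;
//     el.async = true;
//     document.head.appendChild(el);
//   });
```

Passing the injection step in as a callback keeps the routing decision free of any DOM dependency, so it can be checked without a browser. In a single-page app you would call `loadPaymentScriptIfNeeded` on every route change rather than once at startup.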


User Analytics and Third-Party Integrations

What about analytics and user tracking libraries that you voluntarily include in your code (unlike Stripe's), or other third-party integrations like search providers? The Appendix of the Guide to GDPR for App Developers offers advice for a range of common libraries such as Google Analytics, Google DoubleClick, and the Facebook SDK. In general:

  • Never blindly import code. Carefully examine what it does first. What network requests is it making, and what data do they contain? (Don't just shrug and ignore it if the data's encoded or encrypted.) Which country is the data going to? Is this a new cross-border data transfer for your product?
  • Check the developer's Privacy Policy. Is it compatible with the jurisdictions in which you operate and your own product's Privacy Policy? Do they sell user data on to third parties? Are they setting cookies that they will use to track your users across other websites?
  • Do they offer privacy configuration settings? Does the integration wait until user consent is given before any data is transferred? If you use it for analytics, can you anonymize the stored IP addresses and mask sensitive data?
  • Do they provide support for exercising data subject rights? If one of your users asked you to delete all their personal data, would you be able to delete it from this third-party system? What is their data retention policy?
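
To make the consent and anonymization questions concrete, here is a rough sketch of what a privacy-respecting wrapper might look like if you run your own collector. All of the names here (`anonymizeIPv4`, `maskUrl`, `createTracker`, the `ALLOWED_PARAMS` allow-list) are assumptions of mine for illustration, not any real library's API, and I'm assuming a self-hosted collector that can see the client IP.

```typescript
// Zero the last octet of a dotted IPv4 address so it no longer points at a single host.
// Only checks that there are four dot-separated parts; it does not fully validate the address.
function anonymizeIPv4(ip: string): string {
  const parts = ip.split(".");
  if (parts.length !== 4) {
    throw new Error("not an IPv4 address: " + ip);
  }
  parts[3] = "0";
  return parts.join(".");
}

// Hypothetical allow-list of query parameters considered safe to record.
const ALLOWED_PARAMS = ["page", "lang"];

// Drop the fragment and every query parameter that is not on the allow-list.
function maskUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.hash = "";
  const kept: string[] = [];
  url.searchParams.forEach((value, key) => {
    if (ALLOWED_PARAMS.indexOf(key) !== -1) {
      kept.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
    }
  });
  url.search = kept.join("&");
  return url.toString();
}

interface AnalyticsEvent {
  url: string;
  ip: string;
}

// Build a tracker that sends nothing until consent is granted.
// Events seen before consent are dropped, not queued for later.
function createTracker(send: (event: AnalyticsEvent) => void) {
  let consented = false;
  return {
    grantConsent(): void {
      consented = true;
    },
    // Returns true if the (masked) event was passed on to `send`.
    track(event: AnalyticsEvent): boolean {
      if (!consented) {
        return false;
      }
      send({ url: maskUrl(event.url), ip: anonymizeIPv4(event.ip) });
      return true;
    },
  };
}
```

Dropping pre-consent events rather than queuing them is a deliberate choice in this sketch: a queue would still be holding personal data that the user has not yet agreed to share. An allow-list for query parameters is also safer than a block-list, because a new sensitive parameter (a password reset token, say) is excluded by default.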

Most third-party integrations come out terribly when assessed with the questions above. Where possible, favor integrations that have been designed to be privacy-preserving. For example, Google Analytics is hugely invasive and (as of 2023) several EU data protection authorities have ruled its use to track EU users unlawful in the wake of Schrems II, but there are privacy-preserving alternatives that do not repurpose the collected data for their own uses, such as Matomo, Simple Analytics, and Plausible Analytics. For low-traffic websites, Shynet is an open-source self-hosted solution. Note that some options by design have fewer features than Google Analytics because they deliberately collect less data. For example, Plausible only collects aggregate data and never collects IP addresses, while Matomo provides the option to track and record individual user sessions. Assess your needs realistically: if you only rarely examine session recordings (for example for debugging), then you can ditch them, saving money, protecting your users, and reducing your legal risk in one fell swoop.

💻 Exercise: try out The Markup's free tool, Blacklight, to scan your website. Which third-party trackers do you have? Are you using Google Analytics or the Facebook Pixel? Do you know how they are configured and what user data they are collecting? What privacy harms might occur because Google and Facebook know a user visited a specific webpage on your site?



Further Reading