Tracked Online


How it's done & how you can protect yourself

Techno-Activism 3rd Mondays NYC

October 20th 2014

Julia Angwin's tweet about Strongbox
Strongbox screenshot
The Guardian screenshot
Washington Post screenshot
Facebook Like

The 10,000 foot view

  • Privacy is a fundamental right.
  • The Web's (current) business model is advertising.
  • Advertising calls for collecting everything, always, forever.
"The Internet is a surveillance state."
(Bruce Schneier, security and privacy expert)

What is Web tracking?

  • Web tracking is collecting everything you do online.
  • Trackers are parts of pages you visit. For example, Facebook Like buttons.
  • If you click a Like button, we can call that "active" tracking.
  • If you don't, you still get tracked. Let's call that "passive" tracking.
    "Facebook Likes can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender."
    (Private traits and attributes are predictable from digital records of human behavior, 2012 University of Cambridge study)

What gets passively tracked?

  • The page you are on
  • Your browser/computer info (screen size, plugins, OS version, fonts, ...)
  • Your approximate location (from IP address)
  • Anything previously stored by the tracker's domain on your computer (cookies, for example)
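
As a rough illustration of the list above, here is a minimal sketch of what a tracker script embedded in a page could read without any action from you. The function name `collectPassiveSignals` and the `env` parameter are hypothetical; `env` stands in for the browser's global object so the sketch also runs outside a browser. Your IP address and the tracker's own cookies are not read by script at all: they arrive with the HTTP request itself.

```javascript
// Hypothetical sketch: gather signals a tracker script can read passively.
// `env` stands in for the browser globals (window) so this runs anywhere.
function collectPassiveSignals(env) {
  const nav = env.navigator || {};
  const scr = env.screen || {};
  return {
    // The page you are on, and the page that linked you here.
    page: env.location ? env.location.href : null,
    referrer: env.document ? env.document.referrer : null,
    // Browser/computer info.
    userAgent: nav.userAgent || null,
    language: nav.language || null,
    screen: scr.width && scr.height ? scr.width + "x" + scr.height : null,
    plugins: Array.from(nav.plugins || [], (plugin) => plugin.name),
  };
}
```

In a browser, this would be called as `collectPassiveSignals(window)`.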

Google Analytics is on 70.9% of the top 10,000 sites.

Source: BuiltWith Trends

Building a (permanent) record.

Google Ads Preferences Manager

www.google.com/settings/ads/onweb/

Google Ads settings screenshot

To what end?

Decisions big and small


  • Which ad to show
  • Which price to offer
  • Who doesn't get the job
  • ...

Who are you to a tracker?

  • Advertising platforms (Facebook/Google)
  • Location trackers (Foursquare)
    • In-store (Euclid Analytics, Nomi, ...)
  • Data brokers (Acxiom, Paramount Lists, ...)
  • ...

Display Advertising Technology Landscape

How do I track thee? (on the Web)

Client-side

  • Standard HTTP Cookies
  • Local Shared Objects (Flash Cookies)
  • Silverlight Isolated Storage
  • Storing cookies in PNGs
  • Storing cookies in Web History
  • Storing cookies in HTTP ETags
  • Storing cookies in Web cache
  • window.name caching
  • Internet Explorer userData storage
  • HTML5 Session/Local/Global Storage
  • HTML5 Database Storage via SQLite / IndexedDB
Clear Private Data screenshot
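
Why so many storage mechanisms? A minimal sketch of the idea, assuming each mechanism in the list above is wrapped in an object with `get()` and `set()` methods: if an identifier survives in any one store, it can be copied back into all the others. The names `respawnId`, `stores`, and `makeId` are hypothetical, and plain `Map` objects stand in for real browser storage.

```javascript
// Hypothetical sketch of identifier "respawning" across storage mechanisms.
// Each store wraps one mechanism (HTTP cookie, localStorage, window.name, ...)
// behind get()/set(). Plain Maps stand in for them here.
function respawnId(stores, makeId) {
  // Look for an ID that survived in any store.
  let id;
  for (const store of stores) {
    const value = store.get("uid");
    if (value) {
      id = value;
      break;
    }
  }
  // Nothing survived: this looks like a new visitor.
  if (!id) id = makeId();
  // Write the ID back everywhere, restoring any copies that were deleted.
  for (const store of stores) store.set("uid", id);
  return id;
}
```

Clearing cookies alone removes one copy; the next page load restores it from the others.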

Samy "I'm Popular" Kamkar

Samy Kamkar profile photo

How do I track thee, pt. 2

Server-side: Device/browser fingerprinting

  • Server creates fingerprint based on browser request signals and script queries
    • User Agent
    • Screen Size
    • Fonts
    • Browser plugins
    • IP address
    • ...
  • Hard to detect
  • Can effectively persist across browsers/devices
  • Already an industry: BlueCava, ThreatMetrix, ReputationManager, ...
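
A minimal sketch of the fingerprinting idea, assuming the signals have already been gathered into a plain object: join them into one canonical string and hash it. The function names and the choice of FNV-1a (a simple non-cryptographic hash, picked only to keep the example dependency-free) are assumptions for illustration, not how any of the vendors above do it.

```javascript
// FNV-1a, 32-bit, applied to UTF-16 code units rather than bytes.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical sketch: derive a stable identifier from browser signals.
function fingerprint(signals) {
  // Sort keys so the same signals always produce the same string.
  const canonical = Object.keys(signals)
    .sort()
    .map((key) => key + "=" + String(signals[key]))
    .join("|");
  return fnv1a(canonical);
}
```

Nothing is stored on your computer, so there is nothing to clear: the same browser yields the same value on the next visit.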

What are trackers?

Web page elements

  • scripts
  • embedded objects (Flash)
  • images (pixels)
  • iframes
  • ...

Google Analytics

<script type="text/javascript">
  // Command queue; ga.js processes these commands once it has loaded.
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);  // the site's Analytics account ID
  _gaq.push(['_trackPageview']);             // report this page view
  (function() {
    // Inject ga.js asynchronously, matching the page's protocol.
    var ga = document.createElement('script');
    ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ?
      'https://ssl' : 'http://www') +
        '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(ga, s);
  })();
</script>

Terminology minute!

First-party vs.
third-party vs.
fourth-party
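
A minimal sketch of the first-party/third-party distinction, assuming a request is "first-party" when it goes to the same site as the page that triggered it. The helper names are hypothetical, and taking the last two hostname labels as the "site" is a shortcut: it is wrong for suffixes such as co.uk, which need the Public Suffix List. Fourth parties (roughly, resources pulled in by a third party) are not covered, since detecting them requires knowing which script initiated the request.

```javascript
// Naive "site" of a URL: the last two labels of its hostname.
// Assumption for brevity; real tools consult the Public Suffix List.
function siteOf(url) {
  return new URL(url).hostname.split(".").slice(-2).join(".");
}

// Classify a request relative to the page that triggered it.
function partyOf(pageUrl, requestUrl) {
  return siteOf(pageUrl) === siteOf(requestUrl) ? "first-party" : "third-party";
}
```

For example, a request from a page on www.newyorker.com to www.google-analytics.com would be classified as third-party.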

How does passive tracking work?

Request URL: http://www.newyorker.com/strongbox/
Request Method: GET
Status Code: 200 OK

Request headers

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cache-Control: no-cache
Connection: keep-alive
Cookie: mobify=0; mbox=check#true#1372203648|session#1372203589523-979009#1372205448
DNT: 1
Host: www.newyorker.com
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1541.0 Safari/537.36

Response headers

Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: max-age=358
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 30018
Content-Type: text/html; charset=ISO-8859-1
Date: Tue, 25 Jun 2013 23:39:59 GMT
ETag: "a647b0-d556-4dcc54324dd00"
Expires: Tue, 25 Jun 2013 23:45:57 GMT
Last-Modified: Wed, 15 May 2013 17:41:40 GMT
Server: Apache/2.2.15 (Red Hat) mod_ssl/2.2.15 OpenSSL/1.0.0-fips
Vary: Accept-Encoding
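
To make the headers above concrete, here is a minimal sketch of the record a server could keep for one such request. The function names and the shape of the log entry are hypothetical; the point is only that the request alone carries an address, a browser description, and any identifiers previously stored in cookies.

```javascript
// Split a Cookie request header into name/value pairs.
function parseCookieHeader(header) {
  const cookies = {};
  for (const pair of (header || "").split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue;
    cookies[pair.slice(0, index).trim()] = pair.slice(index + 1).trim();
  }
  return cookies;
}

// Hypothetical server-side view of a single request.
function logEntry(headers, ip) {
  return {
    ip: ip,
    host: headers["Host"],
    userAgent: headers["User-Agent"],
    cookies: parseCookieHeader(headers["Cookie"]),
    // DNT: 1 only states a preference; nothing in HTTP enforces it.
    doNotTrack: headers["DNT"] === "1",
  };
}
```

When the same thing happens for an embedded third-party resource, the record lands on the tracker's server instead of the site's.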

Detection by URL matching

  • Intercept requests
  • Compare request URLs to known tracker URLs
  • Cancel requests matching blocked trackers
  • No request, no tracking
  • Adblock Plus, Ghostery, ...
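
The steps above can be sketched as a single check. This is a minimal sketch, not how Adblock Plus or Ghostery are implemented: the blocklist entries and the name `shouldBlock` are placeholders, and real tools ship much larger curated lists with richer pattern syntax. In a browser extension, a check like this would sit inside a request-interception hook (for example, the webRequest API in Chrome extensions).

```javascript
// Placeholder blocklist; "tracker.example" is a made-up entry.
const BLOCKED_HOSTS = ["google-analytics.com", "tracker.example"];

// Return true when the request goes to a listed host or one of its subdomains.
function shouldBlock(requestUrl, blockedHosts) {
  const host = new URL(requestUrl).hostname;
  return blockedHosts.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
}
```

Matching on "." plus the listed host avoids blocking unrelated domains that merely end with the same characters.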

Heuristic-based detection

  • Intercept activity
    • Cookie updates
    • JavaScript queries
    • ...
  • Apply heuristic
  • Cancel requests matching offending domains
  • No request, no tracking
  • Privacy Badger, Chameleon, ...
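
One possible heuristic, sketched under stated assumptions: treat a third-party domain as a tracker once it has been seen setting cookies on several unrelated sites. The class name, method names, and the threshold of 3 are all hypothetical choices for illustration, not the actual logic of Privacy Badger or Chameleon.

```javascript
// Hypothetical heuristic: block a third-party domain once it has set
// cookies on `threshold` distinct first-party sites.
class HeuristicBlocker {
  constructor(threshold = 3) {
    this.threshold = threshold;
    // third-party domain -> Set of first-party sites it was seen on
    this.sitesSeen = new Map();
  }

  // Record one observed request.
  observe(firstPartySite, thirdPartyDomain, setsCookie) {
    if (!setsCookie || firstPartySite === thirdPartyDomain) return;
    if (!this.sitesSeen.has(thirdPartyDomain)) {
      this.sitesSeen.set(thirdPartyDomain, new Set());
    }
    this.sitesSeen.get(thirdPartyDomain).add(firstPartySite);
  }

  // Decide whether future requests to this domain should be cancelled.
  isBlocked(thirdPartyDomain) {
    const sites = this.sitesSeen.get(thirdPartyDomain);
    return sites !== undefined && sites.size >= this.threshold;
  }
}
```

Unlike URL matching, this needs no list of known trackers, but it has to watch a domain misbehave a few times before it reacts.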

What can we do about it?

  • Privacy tools can help with passive tracking.
  • Adblock Plus / Ghostery / Disconnect / Privacy Badger / ...

  • Incognito/Private Browsing modes help too, but incompletely, since storing things on your computer is only part of the problem.
  • Neither helps with active tracking: what you search for, what you post, tweet, "like", "favorite", "follow", which apps you install, what you buy with credit cards, ...

Separating tracking from content

You can't, sometimes

  • Disqus
  • Brightcove
  • social buttons
  • ...
Ghostery blocking Disqus Click-to-Play screenshot

See also: Mozilla Lightbeam

Collusion screenshot

See also: Panopticlick

panopticlick.eff.org screenshot

Good reading

THE END

Alexei <alexeiatyahoodotcom@gmail.com>


ghostwords.github.io/ta3m-2014/

github.com/ghostwords/chameleon