Bits on the Wire

Foundations for Understanding Velocity

Mark Nottingham
@mnot

Follow Along: http://www.mnot.net/talks/bits-on-the-wire/

Abstractions.

APIs, libraries, protocols, services


  >>> import requests
  >>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
  >>> r.status_code
  200
  >>> r.headers['content-type']
  'application/json; charset=utf8'
  >>> r.encoding
  'utf-8'
  >>> r.text
  u'{"type":"User"...'
  >>> r.json()
  {u'private_gists': 419, u'total_private_repos': 77, ...}
          

Requests


  function handler() {
    if(this.readyState == this.DONE) {
      if(this.status == 200 && this.responseXML != null &&
         this.responseXML.getElementById('test').textContent) {
        processData(this.responseXML.getElementById('test').textContent);
        return;
      }
      // something went wrong
      processData(null);
    }
  }

  var client = new XMLHttpRequest();
  client.onreadystatechange = handler;
  client.open("GET", "unicorn.xml");
  client.send();
          

XMLHttpRequest

Details are hidden.

  17:35:09.582380 IP6 mail.ietf.org.http > 2001-44b8-4170-ce00-0443-ba2c-387f-1861.static.ipv6.internode.on.net.49306: Flags [S.], seq 3049445547, ack 2649890532, win 5712, options [mss 1440,sackOK,TS val 198299816 ecr 1526375783], length 0
  	0x0000:  c82a 1457 1b19 0004 ed38 f73d 86dd 6000  .*.W.....8.=..`.
  	0x0010:  0000 0024 0632 2001 1890 123a 0000 0000  ...$.2.....:....
  	0x0020:  0000 0001 001e 2001 44b8 4170 ce00 0443  ........D.Ap...C
  	0x0030:  ba2c 387f 1861 0050 c09a b5c2 d8ab 9df2  .,8..a.P........
  	0x0040:  1ee4 9012 1650 8e52 0000 0204 05a0 0402  .....P.R........
  	0x0050:  080a 0bd1 d0a8 5afa a567                 ......Z..g
  17:35:09.582458 IP6 2001-44b8-4170-ce00-0443-ba2c-387f-1861.static.ipv6.internode.on.net.49306 > mail.ietf.org.http: Flags [.], ack 1, win 65535, options [nop,nop,TS val 1526375983 ecr 198299816], length 0
  	0x0000:  0004 ed38 f73d c82a 1457 1b19 86dd 6000  ...8.=.*.W....`.
  	0x0010:  0000 0020 0640 2001 44b8 4170 ce00 0443  .....@..D.Ap...C
  	0x0020:  ba2c 387f 1861 2001 1890 123a 0000 0000  .,8..a.....:....
  	0x0030:  0000 0001 001e c09a 0050 9df2 1ee4 b5c2  .........P......
  	0x0040:  d8ac 8010 ffff be84 0000 0101 080a 5afa  ..............Z.
  	0x0050:  a62f 0bd1 d0a8                           ./....
  17:35:09.585939 IP6 2001-44b8-4170-ce00-0443-ba2c-387f-1861.static.ipv6.internode.on.net.49306 > mail.ietf.org.http: Flags [P.], seq 1:329, ack 1, win 65535, options [nop,nop,TS val 1526375986 ecr 198299816], length 328
  	0x0000:  0004 ed38 f73d c82a 1457 1b19 86dd 6000  ...8.=.*.W....`.
  	0x0010:  0000 0168 0640 2001 44b8 4170 ce00 0443  ...h.@..D.Ap...C
  	0x0020:  ba2c 387f 1861 2001 1890 123a 0000 0000  .,8..a.....:....
  	0x0030:  0000 0001 001e c09a 0050 9df2 1ee4 b5c2  .........P......
  	0x0040:  d8ac 8018 ffff 0d02 0000 0101 080a 5afa  ..............Z.
  	0x0050:  a632 0bd1 d0a8 4745 5420 2f20 4854 5450  .2....GET./.HTTP
  	0x0060:  2f31 2e31 0d0a 486f 7374 3a20 6965 7466  /1.1..Host:.ietf
  	0x0070:  2e6f 7267 0d0a 5573 6572 2d41 6765 6e74  .org..User-Agent
  	0x0080:  3a20 4d6f 7a69 6c6c 612f 352e 3020 284d  :.Mozilla/5.0.(M
  	0x0090:  6163 696e 746f 7368 3b20 496e 7465 6c20  acintosh;.Intel.
  	0x00a0:  4d61 6320 4f53 2058 2031 305f 385f 3329  Mac.OS.X.10_8_3)
  	0x00b0:  2041 7070 6c65 5765 624b 6974 2f35 3336  .AppleWebKit/536
  	0x00c0:  2e32 392e 3133 2028 4b48 544d 4c2c 206c  .29.13.(KHTML,.l
  	0x00d0:  696b 6520 4765 636b 6f29 2056 6572 7369  ike.Gecko).Versi
  	0x00e0:  6f6e 2f36 2e30 2e34 2053 6166 6172 692f  on/6.0.4.Safari/
  	0x00f0:  3533 362e 3239 2e31 330d 0a41 6363 6570  536.29.13..Accep
  	0x0100:  743a 2074 6578 742f 6874 6d6c 2c61 7070  t:.text/html,app
  	0x0110:  6c69 6361 7469 6f6e 2f78 6874 6d6c 2b78  lication/xhtml+x
  	0x0120:  6d6c 2c61 7070 6c69 6361 7469 6f6e 2f78  ml,application/x
  	0x0130:  6d6c 3b71 3d30 2e39 2c2a 2f2a 3b71 3d30  ml;q=0.9,*/*;q=0
  	0x0140:  2e38 0d0a 444e 543a 2031 0d0a 4163 6365  .8..DNT:.1..Acce
  	0x0150:  7074 2d4c 616e 6775 6167 653a 2065 6e2d  pt-Language:.en-
  	0x0160:  6175 0d0a 4163 6365 7074 2d45 6e63 6f64  au..Accept-Encod
  	0x0170:  696e 673a 2067 7a69 702c 2064 6566 6c61  ing:.gzip,.defla
  	0x0180:  7465 0d0a 436f 6e6e 6563 7469 6f6e 3a20  te..Connection:.
  	0x0190:  6b65 6570 2d61 6c69 7665 0d0a 0d0a       keep-alive....
          

Abstractions are BRILLIANT!


"Standing on the shoulders of giants"

Managing
Complexity

http://www.flickr.com/photos/karlfrankowski/4964679574/

Specialisation

http://www.flickr.com/photos/barnshaws/8291192036/

Reuse

http://www.flickr.com/photos/zemlinki/3418067412/

But they can also

SUCK.

Abstractions are "Leaky."

http://www.flickr.com/photos/leftymgp/8095825898/

They're Inefficient.

http://www.flickr.com/photos/coda/8875325/

Problems You Can't See.

http://www.flickr.com/photos/norby/16692225/

Understanding the abstractions you use (and the ones they use in turn) can help you debug, extend, and optimise.

Abstracting

a Distributed System is Asking for

Trouble

A Note on Distributed Computing

Differences between local and distributed:

  • Latency (obvious, right?)
  • Memory Access (still obvious)
  • Partial Failure
  • Concurrency

What Abstractions Does
the Web Use?

Down the RFC1122 Rabbit Hole

  • Application: URL ➠ HTTP ( ➠ TLS ) ➠ DNS
  • Transport: TCP
  • Internet: IP[v4, v6]
  • Link: ...

A Typical Web Request


  >>> r = requests.get('http://www.mnot.net/photo')
	        

Input: URI or IRI


  >>> r = requests.get('http://www.mnot.net/photo')
	        
  • URI: ASCII string
  • IRI: Unicode string

What we need

  • DNS needs an ASCII hostname
  • HTTP needs an ASCII request-target
  • TCP will need an integer port

  >>> from urlparse import urlsplit  # Python 2; urllib.parse in Python 3
  >>> urlsplit("http://www.mnot.net/photo")
  SplitResult(
    scheme='http',
    netloc='www.mnot.net',
    path='/photo',
    query='',
    fragment=''
  )
          
  • hostname for DNS comes from netloc
  • request-target for HTTP is path + query
  • port is either in netloc or default

But what about...

http://JP納豆.例.jp/引き割り.html


  >>> urlsplit(u'http://JP納豆.例.jp/引き割り.html')
  SplitResult(
    scheme=u'http',
    netloc=u'JP\u7d0d\u8c46.\u4f8b.jp',
    path=u'/\u5f15\u304d\u5272\u308a.html',
    query='',
    fragment='')

  >>> from socket import gethostbyname
  >>> gethostbyname(u'JP\u7d0d\u8c46.\u4f8b.jp')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)
          

IRIs


  >>> urlsplit(u'http://JP納豆.例.jp/引き割り.html'
  ... ).netloc.encode('idna')
  'xn--jp-cd2fp15c.xn--fsq.jp'
  >>> gethostbyname('xn--jp-cd2fp15c.xn--fsq.jp')
  '62.116.181.25'

  >>> import urllib
  >>> urllib.quote(
  ... urlsplit(u'http://JP納豆.例.jp/引き割り.html')
  ... .path.encode('utf-8'))
  '/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html'
          
  • hostname is IDNA-encoded
  • path and query are percent-encoded UTF-8
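The two encodings can be combined into one conversion. This is a rough Python 3 sketch; `iri_to_uri` is a hypothetical helper that ignores userinfo, ports, and non-ASCII fragments.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Convert an IRI to an ASCII-only URI (sketch; no userinfo or port handling)."""
    parts = urlsplit(iri)
    # Hostname: IDNA-encode each label so that DNS sees plain ASCII.
    host = parts.hostname.encode("idna").decode("ascii")
    # Path and query: percent-encode the UTF-8 bytes of non-ASCII characters,
    # leaving existing percent-escapes and delimiters alone.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))

print(iri_to_uri("http://JP納豆.例.jp/引き割り.html"))
# http://xn--jp-cd2fp15c.xn--fsq.jp/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
```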

This is because neither DNS hostnames nor the HTTP request-target can carry non-ASCII characters directly.

Next Step:

HyperText Transfer Protocol

The Spec: RFC2616

The Spec Yet to Be: HTTPbis

HTTP Request Bits


GET / HTTP/1.1
Host: www.etsy.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.14 (KHTML, like Gecko) Version/6.0.1 Safari/536.26.14
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
DNT: 1
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Cookie: uaid=uaid%3DVdhk5W6sexG-_Y7ZBeQFa3cq7yMQ%26_now%3D1325204464%26_slt%3Ds_LCLVpU%26_kid%3D1%26_ver%3D1%26_mac%3DlVnlM3hMdb3Cs3hqMVuk_dQEixsqQzUlNYCs9H_Kj8c.; user_prefs=1&2596706699&q0tPzMlJLaoEAA==
Connection: keep-alive
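Those bits are just lines of text separated by CRLF, with an empty line at the end. A minimal sketch of producing them; `build_request` is an illustrative helper, not a library call.

```python
def build_request(method, target, headers):
    """Serialise an HTTP/1.1 request head; headers is a list of (name, value) pairs."""
    lines = ["%s %s HTTP/1.1" % (method, target)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    # Every line ends with CRLF, and an empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/", [
    ("Host", "www.etsy.com"),
    ("Accept-Encoding", "gzip, deflate"),
    ("Connection", "keep-alive"),
])
print(len(request), "bytes on the wire")
```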

HTTP Response Bits


HTTP/1.0 200 OK
Server: Apache
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Set-Cookie: gzip=1; path=/; domain=.etsy.com
Set-Cookie: etala=111461200.758229460.1371513249.1371513249.1371513249.1.0; expires=Thu, 18-Jun-2015 11:31:41 GMT; path=/; domain=.etsy.com
Set-Cookie: etalb=111461200.1.10.1371513249; expires=Tue, 18-Jun-2013 00:24:09 GMT; path=/; domain=.etsy.com
Set-Cookie: last_browse_page=%2F; path=/; domain=.etsy.com
X-Recruiting: Is code your craft? http://www.etsy.com/careers
Set-Cookie: uaid=uaid%3DWk-Kje6yrVP1mQv26h4cRVIwjvkb%26_now%3D1371513249%26_slt%3DucnavmQa%26_kid%3D1%26_ver%3D1%26_mac%3DkMp6qo5aPuE3bzibzJVoWxlNC0Tz2wWvcoMNR0ZbgIU.; expires=Wed, 17-Jun-2015 23:54:09 GMT; path=/; domain=.etsy.com; httponly
Set-Cookie: autosuggest_split=1; expires=Tue, 18-Jun-2013 23:54:09 GMT; path=/; domain=.etsy.com
Set-Cookie: user_prefs=1&2596706699&q0tPzMlJLaoEAA==; expires=Tue, 17-Jun-2014 23:54:09 GMT; path=/; domain=.etsy.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Type: text/html; charset=UTF-8
Accept-Ranges: bytes
Date: Mon, 17 Jun 2013 23:54:09 GMT
Age: 0
X-Served-By: cache-sv63-SJC3
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1371513249.602169991,VS0,VE137
Vary: Accept-Encoding
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Via: 1.1 varnish, 1.1 localhost:3128 (squid/2.7.STABLE3)
Connection: close
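Going the other way, a response head can be split back into its parts. This sketch uses a hypothetical `parse_response_head` helper and a shortened copy of the response above; it keeps headers as a list of pairs because fields such as Set-Cookie repeat.

```python
def parse_response_head(raw):
    """Split a raw response head into (version, status, reason, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalise them to lower case.
        headers.append((name.strip().lower(), value.strip()))
    return version, int(status), reason, headers

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Set-Cookie: gzip=1; path=/; domain=.etsy.com\r\n"
       b"Set-Cookie: last_browse_page=%2F; path=/; domain=.etsy.com\r\n"
       b"Vary: Accept-Encoding\r\n"
       b"\r\n")
version, status, reason, headers = parse_response_head(raw)
cookies = [value for name, value in headers if name == "set-cookie"]
print(version, status, reason, len(cookies))
```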

Tool:

REDbot

HTTP Concurrency


htracr

Head of Line Blocking

  • Usually, one outstanding request per connection
  • Even with pipelining, a big / slow response blocks
  • State of the Art: Use 4-8 persistent connections
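A toy model of why more connections help: each connection serves one response at a time, so a slow response holds up everything queued behind it. The durations are made-up numbers and `completion_time` is an illustrative name.

```python
import heapq

def completion_time(durations, connections):
    """Time until all responses finish, one outstanding request per connection."""
    # Min-heap of the times at which each connection next becomes idle.
    free_at = [0.0] * connections
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # take the connection that frees up first
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

# One slow response (5 time units) followed by four quick ones.
responses = [5, 1, 1, 1, 1]
print(completion_time(responses, 1))  # everything waits behind the slow one
print(completion_time(responses, 2))  # the quick ones use the second connection
```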

HTTP Intermediaries

The good guys (1):

Proxies

  • Explicitly Configured by Client

function FindProxyForURL(url, host) {
        // our local URLs from the domains below example.com don't need a proxy:
        if (shExpMatch(host, "*.example.com"))
        {
                return "DIRECT";
        }

        // URLs within this network are accessed through
        // port 8080 on fastproxy.example.com:
        if (isInNet(host, "10.0.0.0", "255.255.248.0"))
        {
                return "PROXY fastproxy.example.com:8080";
        }

        // All other requests go through port 8080 of proxy.example.com.
        // should that fail to respond, go directly to the WWW:
        return "PROXY proxy.example.com:8080; DIRECT";
}					
				

The good guys (2):

Gateways

  • Explicitly Configured by Server

The bad guys:

Interception Proxies

  • Configured by the Network Operator

Virus Scanners

  • strip Accept-Encoding
  • block responses
  • buffer responses

Content Modification

  • insert ads
  • filter content
  • adapt content

Take-away:

Cache-Control: no-transform
or TLS

Captive Portals

Caching

1. Freshness

2. Validation

Explicit Freshness


Cache-Control: max-age=60
Expires: Tue, 18 Jun 2013 06:09:22 GMT
				

NOT

Pragma

  • One-Second Granularity
  • Some Caches Ignore Small Numbers
  • Max-age beats Expires
  • Invalid value means it's stale
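These rules can be sketched as a small function. It assumes `headers` is a plain dict with canonical field names, and `freshness_lifetime` is a hypothetical helper rather than any cache's real code.

```python
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    """Return the explicit freshness lifetime in seconds, or None if there is none."""
    # Cache-Control: max-age beats Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return max(0, int(value.strip('"')))
            except ValueError:
                return 0  # assumption: an unparseable max-age is treated as stale
    if "Expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            return max(0, int((expires - date).total_seconds()))
        except (KeyError, TypeError, ValueError):
            return 0  # an invalid Expires value means the response is stale
    return None

print(freshness_lifetime({
    "Date": "Tue, 18 Jun 2013 06:08:22 GMT",
    "Cache-Control": "max-age=60",
    "Expires": "Tue, 18 Jun 2013 07:08:22 GMT",
}))
```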

Heuristic Freshness

  • For status codes like 200, 404...
  • Usually based upon Last-Modified, but
  • No algorithm provided
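Since no algorithm is mandated, caches pick their own. One common choice, shown here purely as an assumption, is a fixed fraction (often 10%) of the time between Last-Modified and Date; `heuristic_freshness` is an illustrative name.

```python
from email.utils import parsedate_to_datetime

def heuristic_freshness(date, last_modified, fraction=0.1):
    """Estimate a freshness lifetime in seconds from how long ago the response changed."""
    unchanged_for = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    # The longer something has gone unchanged, the longer we guess it will stay fresh.
    return max(0, int(unchanged_for.total_seconds() * fraction))

print(heuristic_freshness(
    "Mon, 17 Jun 2013 23:54:09 GMT",   # Date
    "Mon, 17 Jun 2013 22:54:09 GMT",   # Last-Modified, one hour earlier
))
```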

The Other Half of Freshness:

Stale Responses

Is This Cacheable?


HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close

My personalised content.

Caching When You Least Expect It

Validation


GET /foo HTTP/1.1
If-None-Match: "abcdefg"
				

HTTP/1.1 304 Not Modified
Content-Type: text/html
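On the server side, the exchange above boils down to comparing entity tags. A minimal sketch, assuming a dict of request headers and a hypothetical `respond` helper; weak validators (`W/` prefixes) are not handled.

```python
def respond(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    header = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in header.split(",")]
    # "*" matches any current representation; otherwise the tags must be identical.
    if "*" in candidates or current_etag in candidates:
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body

status, headers, payload = respond({"If-None-Match": '"abcdefg"'}, '"abcdefg"', b"<html>...</html>")
print(status, len(payload))
```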
				

Note:

If-Modified-Since validation
doesn't need to be based upon Last-Modified.

Next Step:

Where Are We Going?

DNS

The APIs


    >>> import socket
    >>> socket.gethostbyname("www.mnot.net")
    '50.56.234.188'
    >>> socket.getaddrinfo("www.mnot.net", 80)
    [(2, 2, 17, '', ('50.56.234.188', 80)), 
     (2, 1, 6, '', ('50.56.234.188', 80))]
  

The spec: RFC1035

(with some help from RFC1034)

The Bits

DNS OPCODEs

  • 0 - Query
  • 2 - Status

See IANA for the full list

DNS RCODEs

  • 0 - No Error
  • 2 - Server Failure
  • 3 - Non-Existent Domain (NXDOMAIN)
  • 5 - Query Refused

See IANA for the full list

DNS Record Types

  • 1 - A (host address)
  • 2 - NS (authoritative name server)
  • 5 - CNAME (canonical name for alias)
  • 15 - MX (mail exchange)
  • 16 - TXT (text)
  • 28 - AAAA (IP6 address)
  • 33 - SRV (server selection)

See IANA for the full list

Getting DNS onto the Wire

  • UDP-based (usually)
    • lightweight
    • 512-byte limit
    • retries and timeouts
  • TCP-based (optionally)
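To see how small a query is next to the 512-byte limit, here is a sketch that builds one by hand following the RFC1035 layout. `build_query` and the fixed query ID are illustrative; nothing is sent on the network.

```python
import struct

def build_query(name, qtype=1, query_id=0x1234):
    """Build a DNS query packet for an ASCII name (qtype 1 = A)."""
    # Header: ID, flags (OPCODE 0 = Query, only the RD bit set),
    # then QDCOUNT=1 and ANCOUNT, NSCOUNT, ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE as requested, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", qtype, 1)

packet = build_query("www.mnot.net")
print(len(packet), "bytes")
```

Sending it would be a single UDP datagram to port 53 of a resolver, with the retries and timeouts left to the caller.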

Tool: dig


DIG(1)                               BIND9                              DIG(1)

NAME
       dig - DNS lookup utility

SYNOPSIS
       dig [@server] [-b address] [-c class] [-f filename] [-k filename] [-m]
           [-p port#] [-q name] [-t type] [-x addr] [-y [hmac:]name:key] [-4]
           [-6] [name] [type] [class] [queryopt...]

       dig [-h]

       dig [global-queryopt...] [query...]

DESCRIPTION
       dig (domain information groper) is a flexible tool for interrogating
       DNS name servers. It performs DNS lookups and displays the answers that
       are returned from the name server(s) that were queried. Most DNS
       administrators use dig to troubleshoot DNS problems because of its
       flexibility, ease of use and clarity of output. Other lookup tools tend
       to have less functionality than dig.
								 

DNS Resolvers

  • In OS ("stub" resolvers)
  • On Network (e.g., BIND)
  • In Process (e.g., Chrome)

DNS Forwarding
and Recursion

DNS Forwarding

DNS Recursion

Implications of DNS' Design

Latency

chrome://histograms

Histogram: DNS.AttemptSuccessDuration recorded 204 samples, average = 175.4 (flags = 0x1)
0     ------------------------------------------------------------------------O (7 = 3.4%)
1     ... 
18    ---O                                                                      (1 = 0.5%) {3.4%}
21    ----------O                                                               (3 = 1.5%) {3.9%}
24    ---------------------O                                                    (8 = 3.9%) {5.4%}
28    ----------O                                                               (4 = 2.0%) {9.3%}
32    -------------------O                                                      (9 = 4.4%) {11.3%}
37    -------------------O                                                      (9 = 4.4%) {15.7%}
43    ------------O                                                             (6 = 2.9%) {20.1%}
50    ------------O                                                             (6 = 2.9%) {23.0%}
58    ----O                                                                     (2 = 1.0%) {26.0%}
67    ------------O                                                             (6 = 2.9%) {27.0%}
77    --------O                                                                 (4 = 2.0%) {29.9%}
89    ------------O                                                             (6 = 2.9%) {31.9%}
103   -----------------------------------------O                                (20 = 9.8%) {34.8%}
119   ---------------------------------O                                        (16 = 7.8%) {44.6%}
137   --------------O                                                           (7 = 3.4%) {52.5%}
158   -------------------O                                                      (9 = 4.4%) {55.9%}
182   ------------------------------------------------------------O             (29 = 14.2%) {60.3%}
210   -----------------------------O                                            (14 = 6.9%) {74.5%}
242   ---------------------------O                                              (13 = 6.4%) {81.4%}
279   ----------------O                                                         (8 = 3.9%) {87.7%}
322   --------O                                                                 (4 = 2.0%) {91.7%}
372   --------O                                                                 (4 = 2.0%) {93.6%}
429   --O                                                                       (1 = 0.5%) {95.6%}
495   ----O                                                                     (2 = 1.0%) {96.1%}
571   O                                                                         (0 = 0.0%) {97.1%}
659   ----O                                                                     (2 = 1.0%) {97.1%}
761   --O                                                                       (1 = 0.5%) {98.0%}
878   --O                                                                       (1 = 0.5%) {98.5%}
1013  ... 
1349  --O                                                                       (1 = 0.5%) {99.0%}
1557  O                                                                         (0 = 0.0%) {99.5%}
1797  --O                                                                       (1 = 0.5%) {99.5%}
2074  ... 
					

Global Load Balancing





The Enemy

Chromium Issue 122566

But Wait, There's More...

DNS (in-)Security

Mitigation:

Port Randomisation

NAT/PAT

DNSSEC

RFC4034 RFC4035

Before

sonic.net.berkeley.edu. 10801   IN      A       128.32.155.9

After

sonic.net.berkeley.edu. 10801   IN      A       128.32.155.9
sonic.net.berkeley.edu. 10801   IN      RRSIG   A 5 4 10801 20091221121838 20091215121838 19760 net.berkeley.edu. VarS4+QL+XVcP0LDXslOaPSWZCi6JutrzEM1JJ0H0hk5rTxmT1rtLskT1JOTxd64VeQtChSUxgTEB5hBKs16PSvTGiWYoFrSBN9tQ4DFTMQYaMNINNUDM51ny5KtOvdjW78Ogv9lRfe8Pg+ESpwYgtFUjzG80fKiZsf4qYpOu1c=

Remember, 512 bytes!

  • Many firewalls and NATs don't like TCP-based DNS
  • EDNS0 is another approach, but has its own problems

Other Obstacles

Worth Mentioning

  • Amplification Attacks
  • Application Over-Caching
  • Introducing new Record Types

Next Step:

Open A Connection

Transmission Control Protocol

The API


    >>> sock = socket.socket()
    >>> sock.connect(("www.mnot.net", 80))
  
  

The Spec

RFC793

The Bits

TCP Three Way Handshake

TCP Fast Open


draft-ietf-tcpm-fastopen

Managing Data Flow

  • Window - receiver capacity
  • Congestion Window - network capacity
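The sender is bound by whichever of the two windows is smaller. A tiny sketch, with made-up byte counts and an illustrative `sendable_bytes` helper:

```python
def sendable_bytes(receive_window, congestion_window, bytes_in_flight):
    """Bytes the sender may put on the wire right now."""
    # The receiver's capacity and the network's estimated capacity both cap the sender.
    limit = min(receive_window, congestion_window)
    return max(0, limit - bytes_in_flight)

# A generous receiver, but a small congestion window early in the connection.
print(sendable_bytes(receive_window=65535, congestion_window=4380, bytes_in_flight=1460))
```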

The Internets

One User

Congestion

Fairness

Congestion Control

TCP Slow Start

Congestion Avoidance

Remember Those Big HTTP Headers?

  • 83 asset requests
  • IW = 3
  • ~1,400 bytes of headers
  • 7-8 Round Trips


Big req * many reqs / small IW = SLOW
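A rough model of that arithmetic, assuming each ~1,400-byte request fills one segment and that the congestion window grows by a fixed factor every round trip. The `growth` parameter and the function name are assumptions for illustration; real stacks, delayed ACKs, and multiple connections all change the numbers.

```python
def slow_start_round_trips(segments, initial_window, growth=2.0):
    """Round trips needed to send `segments` segments during slow start."""
    cwnd = float(initial_window)
    sent = 0
    rounds = 0
    while sent < segments:
        sent += int(cwnd)   # send a full window, then wait a round trip for ACKs
        cwnd *= growth      # the window grows once those ACKs arrive
        rounds += 1
    return rounds

print(slow_start_round_trips(83, 3))              # ideal doubling
print(slow_start_round_trips(83, 3, growth=1.5))  # slower growth, e.g. with delayed ACKs
print(slow_start_round_trips(83, 10))             # a larger initial window
```

Under these assumptions the slower-growth case lands in the same range as the figure on the slide, and a larger initial window cuts the count noticeably.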

HTTP Concurrency Redux


htracr

Optional Step:

Protecting Communication

Transport Layer Security

The Spec: RFC5246

The TLS Handshake

TLS Session Resumption

Certificates and CAs

  • Certificate Revocation Lists (CRLs)
  • Online Certificate Status Protocol (OCSP)
https://revocation-report.x509labs.com/

OCSP Stapling

TLS is Great, But...

  • Certificate Authorities are broken
  • Client Trust Model is naive

Fixing TLS Man-in-the-Middle

... by "pinning" the key:
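In its simplest form, pinning means remembering a hash of the key you expect and refusing anything else. This sketch assumes you already hold the DER-encoded public key bytes from the handshake; `matches_pin` and the sample bytes are hypothetical.

```python
import hashlib
import hmac

def matches_pin(public_key_der, pinned_sha256_hex):
    """Return True if the presented key hashes to the pinned SHA-256 value."""
    fingerprint = hashlib.sha256(public_key_der).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(fingerprint, pinned_sha256_hex.lower())

# Stand-in bytes; a real client would take these from the server's certificate.
expected_key = b"example public key bytes"
pin = hashlib.sha256(expected_key).hexdigest()
print(matches_pin(expected_key, pin))
print(matches_pin(b"some other key", pin))
```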

... or, by logging:

Next Step:

The Tubes

Internet Protocol

(src_addr, dest_addr)

  • No Reliability
  • No Integrity

IPv4: The Bits

20 bytes

IPv6: The Bits

40 bytes

IPv4 and IPv6

  • Address Space (32 vs. 128 bits)
  • Fragmentation vs. Path MTU Discovery
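The 20-byte and 40-byte header sizes fall straight out of the field layouts. A sketch using `struct` format strings, covering the IPv4 header without options and the IPv6 fixed header without extension headers:

```python
import struct

# IPv4 without options: version/IHL, DSCP/ECN, total length, identification,
# flags/fragment offset, TTL, protocol, header checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

# IPv6 fixed header: version/traffic class/flow label, payload length,
# next header, hop limit, source, destination.
IPV6_HEADER = struct.Struct("!IHBB16s16s")

print(IPV4_HEADER.size, IPV6_HEADER.size)
# 20 40
```

Most of the growth comes from the addresses: two 4-byte fields become two 16-byte fields.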

IPv6 Requires ICMP!

Who are we Connecting To Again?


>>> socket.getaddrinfo('www.google.com', 80)
[ (2, 1, 6, '', ('74.125.232.50', 80))
, (2, 1, 6, '', ('74.125.232.49', 80))
, (2, 2, 17, '', ('74.125.232.50', 80))
, (2, 2, 17, '', ('74.125.232.51', 80))
, (2, 1, 6, '', ('74.125.232.51', 80))
, (2, 2, 17, '', ('74.125.232.52', 80))
, (2, 1, 6, '', ('74.125.232.52', 80))
, (2, 2, 17, '', ('74.125.232.48', 80))
, (2, 1, 6, '', ('74.125.232.48', 80))
, (2, 2, 17, '', ('74.125.232.49', 80))
, (30, 1, 6, '', ('2a00:1450:4010:c04::69', 80, 0, 0))
, (30, 2, 17, '', ('2a00:1450:4010:c04::69', 80, 0, 0))]					

Happy Eyeballs
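Happy Eyeballs clients try both address families rather than waiting on one, and use whichever connects first. The sketch below covers only one possible ordering step, alternating IPv6 and IPv4 candidates from a `getaddrinfo()`-style list; the racing of connection attempts is left out, and `interleave_by_family` is an illustrative name.

```python
import socket
from itertools import zip_longest

def interleave_by_family(addrinfos):
    """Order candidates as IPv6, IPv4, IPv6, IPv4, ... keeping each family's own order."""
    v6 = [ai for ai in addrinfos if ai[0] == socket.AF_INET6]
    v4 = [ai for ai in addrinfos if ai[0] == socket.AF_INET]
    ordered = []
    for pair in zip_longest(v6, v4):
        ordered.extend(ai for ai in pair if ai is not None)
    return ordered

candidates = [
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("74.125.232.50", 80)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("74.125.232.49", 80)),
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2a00:1450:4010:c04::69", 80, 0, 0)),
]
for family, _, _, _, sockaddr in interleave_by_family(candidates):
    print(sockaddr[0])
```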

Packet Loss

  • Queue drop (NOT random)
  • Signal quality (random)

Remember, packet loss is the signal that drives TCP congestion control.

Bufferbloat

Tools:

Questions?

Thanks.