CMPUT 404

Web Applications and Architecture

Part 03: The Internet

Created by
Abram Hindle (abram.hindle@ualberta.ca)
and Hazel Campbell (hazel.campbell@ualberta.ca).
Copyright 2014-2023.

The Internet

The Simple View of a Web Client and Web Server

  • We use the web to request, search, navigate, and share information.
  • We use the web to access and operate software.
  • What protocols go over this wire?
  • But really it's more like this...

Ethernet

  • An Ethernet frame is 64 to 1518 bytes (72 to 1526 bytes on the wire, counting the 8-byte preamble and SFD)
Field:  Preamble | SFD | Destination | Source | Length | Payload | CRC
Bytes:  7        | 1   | 6           | 6      | 2      | 46-1500 | 4
  • I'm going by the base Ethernet spec, IEEE Std 802.3™-2012 (Revision of IEEE Std 802.3-2008), and not anything fancy (no VLAN tags or jumbo frames).
  • CRC means each frame has some error detection.
  • SFD – Start Frame Delimiter
  • Inspired by the ALOHAnet radio network; originally ran over shared coaxial cable, now mostly over twisted-pair wires and fibre.
  • This is not gigabit
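A minimal Python sketch of the frame layout above, assuming hypothetical MAC addresses and an IPv4 EtherType (0x0800) in the Length field (802.3 treats values of 0x0600 and up as a type rather than a length). It pads short payloads to 46 bytes and appends a CRC computed with zlib.crc32, on the assumption that it matches the 802.3 frame check sequence (same CRC-32 polynomial).

```python
import struct
import zlib

PREAMBLE_AND_SFD = b"\x55" * 7 + b"\xd5"  # 7 preamble bytes + 1 Start Frame Delimiter
MIN_PAYLOAD = 46
MAX_PAYLOAD = 1500


def build_frame(dst_mac, src_mac, length_or_type, payload):
    """Build an 802.3 frame (without preamble/SFD): header + padded payload + CRC."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payloads over 1500 bytes must be split across frames")
    padded = payload.ljust(MIN_PAYLOAD, b"\x00")  # short payloads are padded to 46 bytes
    header = dst_mac + src_mac + struct.pack("!H", length_or_type)
    # Assumption: zlib.crc32 matches the 802.3 FCS, sent least-significant byte first.
    fcs = struct.pack("<I", zlib.crc32(header + padded))
    return header + padded + fcs


# Hypothetical MAC addresses; 0x0800 is the EtherType value for an IPv4 payload.
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("020000000001")
small = build_frame(dst, src, 0x0800, b"x")
large = build_frame(dst, src, 0x0800, b"x" * 1500)
print(len(small), len(PREAMBLE_AND_SFD + small))  # 64 72
print(len(large), len(PREAMBLE_AND_SFD + large))  # 1518 1526
```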

Why are ethernet frames important to the Web?

  • Know the minimum packet sizes that can be sent.
  • Know the potential waste in a transmission
  • Know sizes that aren't fragmented or split
  • Know when you'll incur latency due to split packets/frames
  • Keep your message sizes under about 1.5 KB to stay inside a single packet (over IPv4 that leaves 1460 bytes for TCP data or 1472 bytes for UDP data, assuming no options).
  • Sending 1 byte still incurs many headers.
  • Most people are connected to the internet with something like ethernet, and their MTU is 1500 bytes or less.
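The overhead arithmetic can be sketched in Python, assuming TCP over IPv4 with no options, a 1500-byte MTU, and counting the preamble/SFD but not the idle gap between frames. tcp_wire_sizes is a hypothetical helper, not part of any library.

```python
PREAMBLE_SFD = 8   # bytes on the wire before each frame
ETH_HEADER = 14    # destination (6) + source (6) + length/type (2)
ETH_CRC = 4
MIN_ETH_PAYLOAD = 46
MTU = 1500         # maximum Ethernet payload
IPV4_HEADER = 20   # assuming no IP options
TCP_HEADER = 20    # assuming no TCP options


def tcp_wire_sizes(app_bytes):
    """Bytes on the wire for each Ethernet frame needed to carry app_bytes over TCP/IPv4."""
    per_segment = MTU - IPV4_HEADER - TCP_HEADER  # 1460 bytes of application data per frame
    sizes = []
    remaining = app_bytes
    while True:
        chunk = min(remaining, per_segment)
        # Tiny segments are padded up to the 46-byte minimum Ethernet payload.
        eth_payload = max(MIN_ETH_PAYLOAD, IPV4_HEADER + TCP_HEADER + chunk)
        sizes.append(PREAMBLE_SFD + ETH_HEADER + eth_payload + ETH_CRC)
        remaining -= chunk
        if remaining <= 0:
            break
    return sizes


for n in (1, 1460, 1461):
    sizes = tcp_wire_sizes(n)
    print(n, sizes, f"{n / sum(sizes):.1%} useful")
```

With these assumptions, tcp_wire_sizes(1) is [72]: one byte of data costs a full minimum-size frame, and 1461 bytes already spill into a second frame.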

IPV4

  • Ethernet is not routable
  • We need to communicate over large distances
  • We need to communicate to many computers
  • IP was a compromise to address computers
  • Stateless
  • Backbone of the internet
  • But we ran out of IPv4 addresses (32 bits gives only about 4.3 billion)

IPV4 Header

IPv4 Header Format (one row per 32-bit word; field widths in bits)
Offset (octet / bit)   Fields
0 / 0                  Version (4) | IHL (4) | DSCP (6) | ECN (2) | Total Length (16)
4 / 32                 Identification (16) | Flags (3) | Fragment Offset (13)
8 / 64                 Time To Live (8) | Protocol (8) | Header Checksum (16)
12 / 96                Source IP Address (32)
16 / 128               Destination IP Address (32)
20+ / 160+             Options (if IHL > 5)
Wikipedia contributors, "IPv4," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=IPv4&oldid=877557146 (accessed January 11, 2019).
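To make the layout concrete, here is a minimal Python sketch that unpacks the fixed 20-byte part of an IPv4 header with the struct module. The sample header (addresses, identification, TTL) is made up for illustration, and its checksum is left at 0 rather than computed.

```python
import ipaddress
import struct


def parse_ipv4_header(packet):
    """Unpack the fixed 20-byte IPv4 header (options, if any, follow it)."""
    (version_ihl, dscp_ecn, total_length, identification, flags_fragment,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = version_ihl & 0x0F  # header length in 32-bit words
    return {
        "version": version_ihl >> 4,
        "ihl": ihl,
        "header_length": ihl * 4,
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_fragment >> 13,             # top 3 bits
        "fragment_offset": flags_fragment & 0x1FFF,  # low 13 bits
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


sample = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,  # version 4, IHL 5 (20-byte header, no options)
    0,             # DSCP / ECN
    20 + 8,        # total length: this header + an empty UDP datagram
    0x1234,        # identification (hypothetical)
    0x4000,        # flags 0b010 (Don't Fragment), fragment offset 0
    64,            # Time To Live
    17,            # protocol 17 = UDP
    0,             # checksum left as 0 in this sketch
    ipaddress.IPv4Address("192.168.0.2").packed,
    ipaddress.IPv4Address("8.8.8.8").packed,
)
print(parse_ipv4_header(sample))
```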

IPV6

  • Like IPV4 but...
  • More address space (32->128 bit!)
  • TCP can fit over top of it
  • But the addresses look totally different...
    • 2001:0db8:0000:0000:0000:0000:0000:0001
    • 2001:db8:0:0:0:0:0:1
    • 2001:db8::1
    • So much for host:port... (the colons clash with the port separator, so URLs wrap IPv6 addresses in brackets)
    • https://[2001:db8::1]:443/
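Python's standard ipaddress and urllib.parse modules already understand these spellings, so a quick sketch can show that the three forms above are the same address and how the bracketed URL is split back into host and port:

```python
import ipaddress
from urllib.parse import urlsplit

full = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(full.compressed)  # 2001:db8::1
print(full.exploded)    # 2001:0db8:0000:0000:0000:0000:0000:0001
print(ipaddress.ip_address("2001:db8:0:0:0:0:0:1") == full)  # True

# Brackets keep the address's colons apart from the port separator.
url = urlsplit("https://[2001:db8::1]:443/")
print(url.hostname, url.port)  # 2001:db8::1 443

# Address space: 32-bit IPv4 versus 128-bit IPv6.
print(ipaddress.ip_network("0.0.0.0/0").num_addresses)  # 4294967296
print(ipaddress.ip_network("::/0").num_addresses == 2**128)  # True
```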

IPV6 Header

Fixed header format (one row per 32-bit word; field widths in bits)
Offset (octet / bit)   Fields
0 / 0                  Version (4) | Traffic Class (8) | Flow Label (20)
4 / 32                 Payload Length (16) | Next Header (8) | Hop Limit (8)
8-20 / 64-160          Source Address (128)
24-36 / 192-288        Destination Address (128)
Wikipedia contributors, "IPv6 packet," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=IPv6_packet&oldid=876779509 (accessed January 11, 2019).
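A minimal Python sketch of unpacking the fixed 40-byte header, assuming a hypothetical packet between two 2001:db8:: documentation addresses. Version, Traffic Class and Flow Label share the first 32-bit word, so they are pulled apart with shifts and masks.

```python
import ipaddress
import struct


def parse_ipv6_fixed_header(packet):
    """Split the 40-byte fixed IPv6 header into its fields."""
    first_word, payload_length, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    return {
        "version": first_word >> 28,                 # top 4 bits
        "traffic_class": (first_word >> 20) & 0xFF,  # next 8 bits
        "flow_label": first_word & 0xFFFFF,          # low 20 bits
        "payload_length": payload_length,            # bytes after this 40-byte header
        "next_header": next_header,                  # e.g. 6 = TCP, 17 = UDP (as in IPv4's Protocol)
        "hop_limit": hop_limit,                      # plays the role of IPv4's TTL
        "src": str(ipaddress.IPv6Address(packet[8:24])),
        "dst": str(ipaddress.IPv6Address(packet[24:40])),
    }


# Hypothetical header: flow label 0x12345, 8-byte UDP payload, hop limit 64.
first_word = (6 << 28) | (0 << 20) | 0x12345
sample = (struct.pack("!IHBB", first_word, 8, 17, 64)
          + ipaddress.IPv6Address("2001:db8::1").packed
          + ipaddress.IPv6Address("2001:db8::2").packed)
print(parse_ipv6_fixed_header(sample))
```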

UDP

  • User means that user-space applications can use it
  • Provides checksums (some integrity)
  • Provides port numbers
  • Stateless
  • Lossy, not ordered
    • Sent: 0 1 2 3 4 5 6 7 8 9
    • Received: 0 1 2 4 3 5 6 9
  • No connections
  • No guarantees
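A small loopback experiment in Python shows the "no connections" part: the sender never connects, it just addresses each datagram. This is only a sketch; on 127.0.0.1 datagrams are rarely lost or reordered, so the loss and reordering above are much easier to see on a real network.

```python
import socket


def udp_send_and_collect(messages, timeout=0.5):
    """Send each message as its own UDP datagram over loopback and collect what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        receiver.settimeout(timeout)
        address = receiver.getsockname()
        for message in messages:
            sender.sendto(message, address)  # no connect, no handshake: just send
        received = []
        while len(received) < len(messages):
            try:
                data, _ = receiver.recvfrom(65535)
            except socket.timeout:
                break  # lost datagrams simply never arrive; UDP does not report them
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()


sent = [str(i).encode() for i in range(10)]
print(udp_send_and_collect(sent))
```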

UDP

  • This header sits on top of IP
UDP Header (one row per 32-bit word; field widths in bits)
Offset (octet / bit)   Fields
0 / 0                  Source port (16) | Destination port (16)
4 / 32                 Length (16) | Checksum (16)
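  • The data comes after it; its size is the IP payload size minus the 8-byte UDP header.
  • The checksum is unfortunately optional over IPv4 (mandatory over IPv6). It covers the data, the UDP header, and a pseudo-header of IP fields (source and destination addresses, protocol, and UDP length).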
Wikipedia contributors, "User Datagram Protocol," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=User_Datagram_Protocol&oldid=877748831 (accessed January 11, 2019).
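A sketch of building a UDP datagram in Python, assuming IPv4 and the RFC 1071 Internet checksum (the one's complement sum used by IP, UDP and TCP). The addresses and ports are hypothetical. Re-running the checksum over the pseudo-header plus the finished datagram gives 0 when nothing was corrupted.

```python
import ipaddress
import struct


def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip, dst_ip, udp_length):
    """IP fields the UDP checksum covers: addresses, zero byte, protocol 17, UDP length."""
    return (ipaddress.IPv4Address(src_ip).packed + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload):
    length = 8 + len(payload)  # 8-byte header + data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0 while computing
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    if checksum == 0:
        checksum = 0xFFFF  # 0 means "no checksum", so a computed 0 is sent as all ones
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


datagram = build_udp_datagram("192.168.0.2", "8.8.8.8", 5353, 53, b"hello")
print(struct.unpack("!HHHH", datagram[:8]))
```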

DNS

  • Domain Name System
  • Allows us to bind a name to another name, IP, or set of IPs.
  • A records point to an IPv4 address (AAAA records point to an IPv6 address)
  • CNAME records point to another name
  • Works on IPV6 and IPV4
  • Use host, dig or nslookup to check names
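Tools like host and dig send a small binary question over UDP port 53. A minimal Python sketch of that query packet, assuming a plain A-record lookup with recursion desired (what a home machine asks its nameserver for); build_query, send_query and answer_count are hypothetical helpers, and send_query needs network access to a resolver such as the 8.8.8.8 used in the examples that follow.

```python
import random
import socket
import struct

QTYPE_A, QTYPE_CNAME, QTYPE_AAAA = 1, 5, 28
QCLASS_IN = 1


def encode_name(name):
    """Encode a domain name as DNS labels: each label is prefixed by its length."""
    labels = name.rstrip(".").split(".")
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def build_query(name, qtype=QTYPE_A, query_id=None):
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)  # lets the client match a response to its query
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


def send_query(packet, server="8.8.8.8", port=53, timeout=2.0):
    """Send one query as a single UDP datagram and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        response, _ = sock.recvfrom(512)  # plain DNS over UDP caps responses at 512 bytes
    return response


def answer_count(response):
    """ANCOUNT is the fourth 16-bit field of the 12-byte header."""
    return struct.unpack("!HHHHHH", response[:12])[3]
```

Only the packet building can be checked offline; on a networked machine, answer_count(send_query(build_query("ualberta.ca"))) asks the resolver for A records and reads how many answers came back.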

Example Name Record


$ host -a ualberta.ca 8.8.8.8
Trying "ualberta.ca"
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52691
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ualberta.ca.                   IN      ANY

;; ANSWER SECTION:
ualberta.ca.            1799    IN      MX      5 mx.ualberta.ca.
ualberta.ca.            3599    IN      TXT     "v=spf1 include:_spf.ualberta.ca include:_spf.google.com ?all"
ualberta.ca.            1799    IN      A       52.202.119.65
ualberta.ca.            1799    IN      SOA     uans1prd.ualberta.ca. dnsmaster.ualberta.ca. 17171 10800 1800 604800 1800
ualberta.ca.            3599    IN      NS      ns2.d-zone.ca.
ualberta.ca.            3599    IN      NS      name.ualberta.ca.
ualberta.ca.            3599    IN      NS      ns1.d-zone.ca.
ualberta.ca.            3599    IN      NS      nom.ualberta.ca.
ualberta.ca.            3599    IN      NS      uans1prd.ualberta.ca.

Received 286 bytes from 8.8.8.8#53 in 67 ms

Example CNAME Record


$ host -a beams.softwareprocess.es 8.8.8.8
Trying "beams.softwareprocess.es"
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60632
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;beams.softwareprocess.es.      IN      ANY

;; ANSWER SECTION:
beams.softwareprocess.es. 14399 IN      CNAME   ghs.google.com.

Received 70 bytes from 8.8.8.8#53 in 152 ms

TCP

  • Transmission Control Protocol
  • Connections
  • 3-packet handshake (SYN, SYN+ACK, ACK)
  • Acknowledges received data, starting with the handshake
  • Maintains order (and retransmits lost packets)
    • Sent: 0 1 2 3 4 5 6 7 8 9
    • Received: 0 1 2 3 4 5 6 7 8 9
  • Used by most internet applications
  • Used by HTTP, FTP, SMTP, IMAP, POP3, Gopher, Telnet
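A minimal loopback sketch in Python of a TCP connection: create_connection performs the SYN, SYN+ACK, ACK handshake, and the bytes come back in the order they were sent. The one-shot echo server is a hypothetical helper written only for this example.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Hypothetical one-shot server: accept one connection, echo everything, then close."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once():
        conn, _ = server.accept()  # returns once the three-way handshake has completed
        with conn:
            while True:
                chunk = conn.recv(1024)
                if not chunk:  # client sent FIN: no more data is coming
                    break
                conn.sendall(chunk)
        server.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


def echo(message, port, host="127.0.0.1"):
    with socket.create_connection((host, port)) as sock:  # SYN, SYN+ACK, ACK
        sock.sendall(message)
        sock.shutdown(socket.SHUT_WR)  # send our FIN but keep reading
        received = b""
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # server closed its side
                break
            received += chunk
    return received


port = start_echo_server()
print(echo(b"0123456789", port))  # b'0123456789'
```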

TCP Header

TCP Header Format (one row per 32-bit word; field widths in bits)
Offset (octet / bit)   Fields
0 / 0                  Source port (16) | Destination port (16)
4 / 32                 Sequence number (32)
8 / 64                 Acknowledgment number (32, if ACK set)
12 / 96                Data offset (4) | Reserved (3, "0 0 0") | Flags NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN (1 each) | Window Size (16)
16 / 128               Checksum (16) | Urgent pointer (16, if URG set)
20+ / 160+             Options (if data offset > 5. Padded at the end with "0" bytes if necessary.)
Wikipedia contributors, "Transmission Control Protocol," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Transmission_Control_Protocol&oldid=877273958 (accessed January 11, 2019).
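A Python sketch of decoding the header, assuming the layout above: the flag bits live in the 16-bit word at offset 12, and the data offset says where options end and the payload begins. The SYN segment is made up (hypothetical ports and sequence number).

```python
import struct

# Flag names from the least significant bit (FIN) upward; NS sits just below the reserved bits.
TCP_FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]


def parse_tcp_header(segment):
    (src_port, dst_port, seq, ack, offset_and_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_and_flags >> 12  # header length in 32-bit words
    flags = [name for bit, name in enumerate(TCP_FLAGS) if offset_and_flags & (1 << bit)]
    header_length = data_offset * 4
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack if "ACK" in flags else None,  # only meaningful when ACK is set
        "header_length": header_length,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent": urgent if "URG" in flags else None,  # only meaningful when URG is set
        "options": segment[20:header_length],  # non-empty only if data offset > 5
        "payload": segment[header_length:],
    }


# A hypothetical SYN to port 80: data offset 5, only the SYN bit set.
syn = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | 0x002, 64240, 0, 0)
print(parse_tcp_header(syn)["flags"])  # ['SYN']
```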

TCP Connections

Sergiodc2, Marty Pauley, Scil100 [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)]

Firewalls

  • Usually prevent hosts from communicating on certain ports or from hosting services.
  • Because of firewalls, web clients are unlikely to also be web servers: communication must be initiated by the client rather than by the web service.
  • IETF seems unaware of their existence but at least HTTP gets through.

Scenario

Get http://slashdot.org
  • Context
    • I am at home on a Friday evening. It is 10pm and I haven't been outside all day.
    • I need to read slashdot because I'm bored
    • I have a cable modem internet connection from Shaw.
    • I've connected to the cable modem with CAT5 cables and ethernet.
$ python3
Python 3.7.2 (default, Jan  3 2019, 02:55:40) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.get("http://slashdot.org")
<Response [200]>
>>> 
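What requests.get does for a plain http:// URL can be approximated with a bare socket. This is a simplified sketch (HTTP/1.0, no redirects, no TLS, no header parsing), and http_get_raw is a hypothetical helper rather than anything requests provides:

```python
import socket


def http_get_raw(host, port=80, path="/"):
    """Fetch path from host with a hand-written HTTP/1.0 request; return the raw response."""
    # create_connection resolves the name (DNS) and then does the TCP three-way handshake.
    with socket.create_connection((host, port), timeout=10) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))  # carried in one or more TCP data packets
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed its side (FIN), as HTTP/1.0 allows
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Usage on a networked machine (output omitted, it depends on the live site):
# print(http_get_raw("slashdot.org").split(b"\r\n", 1)[0])
```

The exact status line depends on how the site is configured today; the sequence of DNS lookup, handshake, request and teardown is the same one walked through step by step in this scenario.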
    
  • Python requests tells the OS to connect to slashdot.org via TCP on TCP port 80.
  • The OS looks up slashdot.org and needs to contact a nameserver
  • The OS sends a UDP packet to port 53 on the configured nameserver (my Shaw router, 192.168.0.1)
  • The UDP packet is over IP
  • The IP packet is over ethernet
  • The cable modem accepts this packet and contacts Shaw's DNS server on behalf of my computer, over UDP over IP over the cable modem link.
  • Shaw's DNS server receives my query and doesn't know slashdot.org so it asks a more authoritative server.
  • Sends DNS request over UDP over IP over ethernet to the switch in its datacenter, to an edge router, over the internet, to the root DNS server, asking about org.
  • Gets a DNS response indicating the address of the org server.
  • Sends a DNS request to the org server asking for the SOA (Start of Authority) of slashdot.org
  • Gets a DNS response indicating the DNS server that knows about slashdot.org
  • Sends a DNS request to the authoritative server for slashdot.org
  • Gets a response back on the UDP port, response contains an A record listing an IP of slashdot.org
  • Shaw's DNS server makes a DNS response packet and sends it back to me over UDP, over IP, over ethernet, over their private network, back to my cable modem, back on to ethernet, IP, and UDP back to my home computer.
  • My OS receives the DNS response, records the IP address and then initiates a TCP connection to port 80 of the slashdot.org IP.
  • A TCP SYN packet is sent to the slashdot.org IP at port 80, over IP, over ethernet to the cable modem, through shaw and through the internet to slashdot's datacenter where a copy of the packet appears on some ethernet cable, decoded as an IP, TCP connect SYN packet.
  • Slashdot.org sends a TCP SYN+ACK packet back across IP, across ethernet, over their network and internet back to shaw, over shaw's network, to my cable modem, over ethernet, over IP, back to my computer
  • My computer sends a TCP ACK packet all the way back to slashdot.org through all the prior layers
  • A connection is established!
  • Now that my home computer is connected with slashdot.org over TCP I can send data packets across that TCP connection.
  • Python eventually runs something like
    send(ourSlashdotConnection, "GET / HTTP/1.0\r\nHost: slashdot.org\r\n\r\n");
  • This causes a TCP data packet on the slashdot connection to be made, shuffled off to IP and ethernet, across to cable modem and back all the way to slashdot.org
  • Slashdot's webserver is waiting on the connection, reading bytes from it. After my packet is delivered (over TCP, over IP, over ethernet, over the datacenter network, over the internet, ...), the webserver reads "GET / HTTP/1.0\r\nHost: slashdot.org\r\n\r\n".
  • Slashdot.org's webserver's TCP layers send a TCP ACK packet back to my IP address acknowledging the receipt of the packet that contained the GET request I sent.
  • Slashdot.org's webserver sends an HTTP response of over 40 KB, broken up across 29 packets. All of these packets need to be acknowledged by my home computer.
  • 1 UDP DNS Request for slashdot.org
  • 1 UDP DNS Response from my nameserver for slashdot.org of 1.2.3.4
  • 1 TCP SYN for 1.2.3.4 on port 80
  • 1 TCP SYN+ACK from 1.2.3.4 port 80
  • 1 TCP ACK to 1.2.3.4 on port 80
  • 1 TCP data packet with the GET request to 1.2.3.4
  • 1 TCP ACK from 1.2.3.4
  • 1 TCP data packet from 1.2.3.4
  • 1 TCP ACK to 1.2.3.4
  • ... 26 data & ACKs later
  • 1 TCP data packet from 1.2.3.4
  • 1 TCP ACK to 1.2.3.4
  • 1 TCP FIN close from 1.2.3.4
  • 1 TCP FIN+ACK to 1.2.3.4
  • 1 TCP ACK from 1.2.3.4
  • ~2 UDP packets (not counting the ones I didn't see because Shaw's DNS server sent them on my behalf)
  • ~60 TCP packets
  • ~62 Ethernet frames
  • The TCP packets are probably copied at least 10 times across 10 or more links.
  • So my 1 request of 50KiB in size could cost the entire network more than 500KiB in traffic.
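The back-of-the-envelope estimate can be written down as a Python sketch, assuming a 1460-byte MSS, 58 bytes of Ethernet (header and CRC), IPv4 and TCP headers per segment on every link, and ignoring ACKs, the handshake and DNS. estimate_traffic is a hypothetical helper.

```python
import math


def estimate_traffic(response_bytes, links, mss=1460, header_bytes=14 + 4 + 20 + 20):
    """Rough bytes carried by all links for one TCP response.

    header_bytes assumes Ethernet header + CRC, IPv4 and TCP (no options) on every link;
    ACKs, the handshake, DNS and retransmissions are ignored.
    """
    segments = math.ceil(response_bytes / mss)
    per_link = response_bytes + segments * header_bytes
    return segments, per_link * links


segments, total = estimate_traffic(50 * 1024, links=10)
print(segments, total, total / 1024)  # 36 532880 520.390625
```

Under those assumptions, 50 KiB copied across 10 links comes to about 520 KiB, in line with the "more than 500KiB" figure.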
How did we get routed to slashdot?
hindle1@piggy:~$ sudo traceroute slashdot.org
traceroute to slashdot.org (216.34.181.45), 30 hops max, 60 byte packets
1 192.168.0.1 (192.168.0.1) 0.171 ms
2 * * *
3 xxxxxxxxxxxx.ed.shawcable.net (64.59.184.245) 33.812 ms
4 rc3sc-tge0-0-0-10.wp.shawcable.net (66.163.74.226) 44.058 ms
5 rc2so-tge0-4-0-1.cg.shawcable.net (66.163.77.98) 77.525 ms
6 ix-3-3-2-0.tcore1.ct8-chicago.as6453.net (66.110.14.13) 74.733 ms
7 64.86.78.10 (64.86.78.10) 70.375 ms
8 hr1-te-9-0-0.elkgrovech3.savvis.net (204.70.196.14) 74.230 ms
9 das5-v3032.ch3.savvis.net (64.37.207.158) 71.660 ms
10 64.27.160.194 (64.27.160.194) 83.311 ms
11 slashdot.org (216.34.181.45) 73.920 ms

Takeaways

  • The bandwidth used to send large or small messages.
  • If latency matters, which transport should you use?
  • Packets get routed!
  • Ethernet often imposes requirements on communication
  • Can you think of any other takeaways?

Special IPs and Ports

  • 127.0.0.1 localhost (packets loop back to your computer)
  • 192.168.*.* (192.168.0.0/16) and 10.*.*.* (10.0.0.0/8) are common private subnets for local IP communication; 172.16.0.0/12 is the third private range. E.g. 192.168.0.2 is my computer and 192.168.0.1 is my Shaw cable modem.
  • TCP Port 80 - HTTP
  • TCP Port 443 - HTTPS
  • UDP Port 53 - DNS
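Python's ipaddress module can classify these addresses. A small sketch, assuming the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); describe and WELL_KNOWN_PORTS are hypothetical names:

```python
import ipaddress

# The RFC 1918 private ranges: 10.*.*.*, 172.16-31.*.*, 192.168.*.*
PRIVATE_SUBNETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Well-known ports from the list above, keyed by (transport, port).
WELL_KNOWN_PORTS = {("tcp", 80): "HTTP", ("tcp", 443): "HTTPS", ("udp", 53): "DNS"}


def describe(address_text):
    """Label an address as loopback, private (RFC 1918) or public."""
    address = ipaddress.ip_address(address_text)
    if address.is_loopback:  # 127.0.0.0/8 for IPv4, ::1 for IPv6
        return "loopback"
    if any(address in subnet for subnet in PRIVATE_SUBNETS):
        return "private"
    return "public"


for text in ("127.0.0.1", "192.168.0.2", "10.1.2.3", "216.34.181.45"):
    print(text, describe(text))
print(WELL_KNOWN_PORTS[("udp", 53)])
```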

Resources


License

Copyright 2014-2023 ⓒ Abram Hindle

Copyright 2019-2023 ⓒ Hazel Victoria Campbell and contributors

The textual components and original images of this slide deck are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Other images are used under fair use and are copyright their respective copyright holders.

License


Copyright (C) 2019-2023 Hazel Victoria Campbell
Copyright (C) 2014-2023 Abram Hindle and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
