TCP/IP is a set of network protocols best known for connecting the machines that make up the Internet. Many people assume that the Internet started around 1995, and few had heard of TCP/IP before then. The truth is that TCP/IP is one of the oldest network protocols, and its survival is mainly based on its simplicity and universality. The underlying IP protocol knows nothing about the network state and does not attempt to make any guarantees, so all error control must be done at the end points.
The basics of networking were designed in the late 1960s and early 1970s. The first email was sent in 1971, with FTP appearing in 1973. The Network Voice Protocol (precursor to VoIP) was designed in 1977, but had technical shortcomings. Our modern ethernet was designed in the 1970s by Xerox at the Palo Alto Research Center (PARC). The Xerox PUP (PARC Universal Packet) first ran in late 1974 and was the protocol most similar to modern TCP/IP (including the 16-bit port numbers). It was at Xerox that the “big dream” of personalizing computers was being realized. Until that time, computers were largely based on batch processing. It was an absolute necessity that maximal use be made of the expensive early machines, and so multitasking algorithms were developed to share a single CPU among multiple users by dividing CPU time into small slices. The next logical step was to do the same to the wires that connected one “personal” machine to another. By sending information in small packets, you can send different packets to different destinations. This is generally referred to as packet switching.
The original protocol became official in RFC 675 in December of 1974 and started talking on networks in 1975 as the Transmission Control Program (TCP; although it’s now Protocol, not Program). There were a number of independent achievements, mainly because Xerox had a history of dropping the ball and not sharing information, even for profit. ARPANET started around 1969, but didn’t officially adopt TCP/IP until 1982! Back in 1972, Robert Kahn at DARPA was working with satellite and radio networks and saw the value of a protocol that could span both. By 1973, Kahn and Vinton Cerf had worked out the basis of the “Internetwork Protocol”, whose key feature was that reliability would be delegated to the hosts, allowing for an unparalleled degree of flexibility. It was based heavily on the Xerox PUP running on Ethernet, rather than the existing NCP (Network Control Program) used by ARPANET at the time. By 1978, TCP (version 3) was redefined as just the transport layer, while the network layer was named the Internetwork Protocol (IP). This gave rise to TCP/IP as the new common name. Later on, the connectionless portion developed into what we call the User Datagram Protocol (UDP), which runs on top of the IP protocol, but with no further name changes. Our IP version 4 dates to 1980.
Let’s briefly discuss the different layers of the OSI model. This helps us understand what is built on what, and how it all fits together.
- Physical Layer – This is plain and simple. It’s the network wire itself. It might be thin or thick coaxial ethernet cable with resistor terminations, twisted-pair (the most commonly known these days), or even wireless radio transmissions. This layer determines how bits are sent. There is even RFC 1149 for sending IP packets by “Avian Carrier” (yes, carrier pigeon, a joke dated April 1, 1990).
- Data Link Layer – The second layer is responsible for getting a packet over the physical layer. This layer determines how bits are organized into “frames”. For ethernet, this includes collision handling. Examples include the ethernet MAC layer with its 48-bit MAC address and ethernet frames, 802.11 WiFi networks, 802.15.4 ZigBee, and PPP. The MAC addresses are generally only useful on the local physical network.
- Network Layer – This is where your IP (Internet Protocol) address comes in. The IP packet defined here is contained within the data of the underlying layers. This means we can now implement more advanced networking concepts because we can take the IP packet out of its originating ethernet frame (or wherever it may have come from) and wrap it into a new frame on a completely new network. This is why it’s called “Inter-net”.
- Transport Layer – This layer provides for the ability to perform error detection, error recovery, connection information, data sequencing, and the like. Both the UDP and TCP protocols reside in this layer, each for different purposes. Note, however, that UDP doesn’t do most of these; it just identifies the remote service by UDP port number. Different TCP and UDP services listen on different ports, and new outgoing connections are assigned a source port number.
- Session Layer – UDP is a sessionless protocol, but TCP allocates a sequence number that keeps track of the packets within a session. The source and destination IP addresses and source and destination port numbers of the transport layer identify the connection for which we keep track of the packet sequence for the session. This layer is sometimes called the “socket layer” because the BSD socket library (now on pretty much all machines in some form) abstracts all these details and differences.
- Presentation Layer – This layer determines the encoding of the data: ASCII text, JPEG, GIF, or some form of encryption, which is fed into the socket.
- Application Layer – The application layer is the top layer of the stack, such as the HTTP protocol and the commands that make up the protocol.
Let’s look at this another way. You want to send something through the mail. You stick an address on your envelope. This is the IP address (layer 3). We don’t care if the mailbox is on the side of a house, a wall of mailboxes at an apartment complex, or what color it is. That’s layer 1. We don’t care if it gets there by truck, boat, or plane, or some combination. That’s layer 2. Do you care how many post offices it stops at? Nope … that’s handled by layer 3 routing protocols that just need that layer 3 IP address.
Now, will it get there in time? Do you want a return-receipt with signature-requested? These are layer 4 issues which determine our choice between UDP and TCP. Following the analogy, layer 5 gives us a flat-rate box, layer 6 is what we stick in the box, and layer 7 determines if the box says Happy Birthday, Merry Christmas, or RMA.
The most common form of IP addressing, and the easiest to understand, is the IPv4 address. This uses a 32-bit (4-byte) address size capable of a maximum of about 4 billion machines. This is version 4 of the protocol and is still widely used today. It coexists with its successor, IPv6, which dispenses with the header checksum since most networks are fairly immune to bit-errors these days, and extends the address size to 128 bits. The two versions are not directly compatible on the wire, so hosts and routers must support each one explicitly.
The IPv4 address scheme never imagined we’d have computers in every home, much less half a dozen IP addressable devices per house. We are now looking at the Internet Of Things (IoT) where your thermostat, oven, coffee pot, and even each individual light bulb in your house may have its own IP address!
You will commonly see an IPv4 address in a “dotted quad” format, a series of 4 numbers (one per byte), each from 0-255 (the values of a byte), separated by periods. An IPv6 address is generally given in hexadecimal (each digit, 0-9 or A-F, represents 4 bits of the total address), with colons inserted between every 4 digits (16 bits), forming a total of 8 fields. Remember that in the packet itself, you just have bits. The written representation is just for human readability.
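To illustrate that the written form is only a rendering of the underlying bits, here is a small sketch using Python's standard `ipaddress` module. The sample addresses are taken from the reserved documentation ranges and the helper name `dotted_quad_to_int` is made up for this example.

```python
import ipaddress

def dotted_quad_to_int(text):
    """Pack four decimal byte values into a single 32-bit integer."""
    value = 0
    for part in text.split("."):
        value = (value << 8) | int(part)
    return value

v4 = ipaddress.IPv4Address("192.0.2.1")
v6 = ipaddress.IPv6Address("2001:db8::1")

print(dotted_quad_to_int("192.0.2.1") == int(v4))  # manual math matches the library
print(v4.packed.hex())   # the 4 raw bytes as they would sit in the packet
print(v6.exploded)       # all 8 fields of 4 hex digits written out in full
```

The `exploded` form shows every field, while the usual short form collapses a run of zero fields into `::`.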
Network Address Translation
Before IPv6, a number of methods were devised to expand the number of addresses available. The most common is called a “NAT”. A NAT requires a specialized router that performs the address translation, basically translating a large number of addresses on a private network (usually using a set of reserved “non routable” IP addresses) to a smaller set of public IP addresses. In home routers, you will generally have a single public IP address that is assigned dynamically using DHCP (Dynamic Host Configuration Protocol) or assigned statically by a network service provider.
NAT is actually a bit of a hack operating mostly at Layer 4. As a new packet is sent out, the router records the local IP, destination IP, source port number, and destination port number. It stores this information in a table. It then rewrites the packet, placing its public IP into the source for packets destined to public addresses. When a packet comes in from the public network, it looks up the source IP (as the previous destination) and two port numbers in the table to find the original source of the packet. It can then overwrite the destination field of the IP packet and send it off to the local network. It’s important to note that incoming connections are limited since the destination won’t be in the table.
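The table lookup described above can be sketched in a few lines. This is a toy model that follows the description literally; the class name `SimpleNat` and the addresses are invented for illustration, and it leaves out things a real router typically does, such as rewriting the source port to avoid collisions between local hosts and expiring old entries.

```python
class SimpleNat:
    """Toy NAT table following the description above (no port rewriting)."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}  # (remote_ip, remote_port, local_port) -> local_ip

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        # Remember who opened the flow, then rewrite the source address.
        self.table[(dst_ip, dst_port, src_port)] = src_ip
        return (self.public_ip, src_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        # The remote end is now the source; find the original local host.
        local_ip = self.table.get((src_ip, src_port, dst_port))
        if local_ip is None:
            return None  # unsolicited packet: no table entry, so drop it
        return (src_ip, src_port, local_ip, dst_port)


nat = SimpleNat("203.0.113.5")
sent = nat.outbound("192.168.1.20", 40000, "198.51.100.7", 80)
reply = nat.inbound("198.51.100.7", 80, "203.0.113.5", 40000)
stray = nat.inbound("198.51.100.99", 80, "203.0.113.5", 40000)
print(sent, reply, stray)
```

The reply is delivered to the local host that opened the flow, while the stray packet from an unknown source finds no entry and is dropped.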
Different routers may handle incoming connections in a couple of ways. They may have a static table where specific incoming ports are sent to a specific IP address, configured by the network administrator. This process is referred to as “port forwarding”. They may allow a DMZ (DeMilitarized Zone) host, where incoming packets are sent that don’t otherwise have a destination. They may also simply drop incoming connections.
NATs also have problems when an IP address is sent in the upper layers of the OSI model. The NAT doesn’t generally know an IP address from other data. This may make direct connections between two people behind a NAT almost impossible without additional help, such as the NAT helper plugins of the Linux iptables that can recognize many common protocols. UDP applications may support the use of a STUN service to get through the NAT. With IPv6, these tricks go away and we can use real publicly routable IP addresses for every light bulb in your house (with all the security implications that entails).
The simplest routing method, static routing, is still used by almost every machine in existence, although once you leave the local machine it’s generally used only for the smallest networks. In this scheme, all routing decisions are based on an internal table. Part of the IP address is used as the “network number”. All packets for a specific network are passed to that network. In the earlier days, the IP address itself could determine the network “class”, which would determine the network size and netmask. Any documentation that talks about network classes these days is simply outdated and incorrect thanks to CIDR, or Classless Inter-Domain Routing. CIDR was first introduced in 1993 and was last updated in 2006, making network classes a historical oddity of the 80s.
In CIDR, all networks must have a “netmask”. First, a binary AND of the packet’s destination address and the netmask is performed to get the destination’s network number. Then, this network number is compared with each interface’s network number. If there is a match, the packet is sent on that interface. If there is no match, then the packet is sent to the default gateway (if one exists). If there are multiple routes that match or multiple default gateways, then the matching interface with the lowest metric is chosen as the best route.
It’s common to see an IP address in CIDR format, which is just the machine’s IP address followed by a slash, then the number of 1 bits in the netmask. The netmask will always be filled from left to right. For example, if you see 220.127.116.11/24, then you know that the first 24 bits of the netmask are 1’s (commonly written as 255.255.255.0), meaning the network number will be 220.127.116.0.
The network number itself is not available for use as a host address. Another reserved address, called the broadcast address, is the one where all host bits are 1’s. In the last example, the broadcast address would be 220.127.116.255.
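The lookup just described might be sketched as follows. The routing table, interface names, metrics, and gateway string are all hypothetical. Real IP stacks also prefer the most specific (longest) matching netmask before comparing metrics, which this sketch leaves out.

```python
import ipaddress

# Hypothetical routing table: (network, interface name, metric)
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth0", 10),
    (ipaddress.ip_network("192.168.1.0/24"), "wlan0", 20),
    (ipaddress.ip_network("10.0.0.0/8"), "eth1", 10),
]
DEFAULT_GATEWAY = "eth0 via 192.168.1.1"

def pick_route(destination):
    dest = int(ipaddress.ip_address(destination))
    matches = []
    for network, interface, metric in ROUTES:
        # Binary AND with the netmask yields the destination's network number.
        if dest & int(network.netmask) == int(network.network_address):
            matches.append((metric, interface))
    if not matches:
        return DEFAULT_GATEWAY
    return min(matches)[1]  # lowest metric wins

print(pick_route("192.168.1.77"))  # eth0 (beats wlan0 on metric)
print(pick_route("10.2.3.4"))      # eth1
print(pick_route("198.51.100.9"))  # falls through to the default gateway
```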
The broadcast address is used when you want to send a packet to all hosts on the local subnet. This can be used for a number of reasons and is often used for network discovery and various other broadcast messages. In general, you do not want broadcast messages to cross your firewall.
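The network number and broadcast address arithmetic can be checked with a few bitwise operations. This sketch uses a different, illustrative address (192.0.2.10/24, from a range reserved for documentation) and compares the manual result with Python's standard `ipaddress` module.

```python
import ipaddress

iface = ipaddress.ip_interface("192.0.2.10/24")

netmask = int(iface.netmask)                    # 24 one-bits, then 8 zero-bits
network = int(iface.ip) & netmask               # clear the host bits
broadcast = network | (~netmask & 0xFFFFFFFF)   # set every host bit to 1

print(ipaddress.IPv4Address(network))     # 192.0.2.0
print(ipaddress.IPv4Address(broadcast))   # 192.0.2.255
print(iface.network.broadcast_address)    # 192.0.2.255, the library's answer
```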
While /24 is the netmask you will most commonly see for IPv4 networks, it bears mention that IPv6 uses a fixed 64 bits as the smallest subnet size. Yes, the smallest subnet is 4 billion times the total number of IPv4 addresses. When you get an IPv6 allocation, you are given a block of subnets. A /128 is a single address, such as the loopback address. A /64 is a single subnet. The recommendation (as of RFC 3177) is that most sites be given a /48 block, or 64K subnets (each with 2^64 addresses to be used internally)! Home users (according to RFC 6177) should be given a /56 block, or 256 subnets. It should be clear why you don’t need NAT tricks with IPv6 and why you can give every LED in your house its own IP address, and every room its own subnet, or however you want to arrange things.
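The subnet counts quoted above follow directly from the prefix lengths. A quick check, where the function name `subnets_in_block` is just an illustrative helper:

```python
SUBNET_BITS = 64  # fixed IPv6 subnet size from the text

def subnets_in_block(prefix_length):
    """Number of /64 subnets inside a block with the given prefix length."""
    return 2 ** (SUBNET_BITS - prefix_length)

print(subnets_in_block(48))   # 65536 subnets in a /48
print(subnets_in_block(56))   # 256 subnets in a /56
print(2 ** 64 // 2 ** 32)     # 4294967296: one /64 versus the whole IPv4 space
```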
Layer 4 Protocols
IP is designed to be as basic and flexible as possible. This means that a router may send different packets through different routes, resulting in packets being out of order. Some networks may have a smaller MTU/MSS resulting in packets being fragmented into smaller pieces. Packets may also be lost. TCP is about guaranteed delivery.
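It is up to TCP to maintain the sequence of data as a single connected stream. Note that TCP (and UDP) port numbers are 16 bits, which limits a host to 65536 port numbers per IP address. Because a connection is identified by the addresses and ports at both ends, this caps the number of simultaneous connections a host can open to any single remote address and port. Let’s look at how TCP does this, step-by-step.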
The 3-Way Handshake
A connection is opened with a SYN packet with a suitably random initial sequence number (to make hijacking connections more difficult). SYN is a bit set in the TCP header contained within the IP packet. It means “let’s chat” and includes any specific options that the client wants to support. When the server gets a SYN packet it knows the source and destination ports and the client’s initial sequence number and it allocates space in a connection table. It then responds with SYN-ACK.
The SYN-ACK means both the SYN and ACK bits are set. It tells the client that everything was received and the connection is ready to go, which options the server supports, and the server’s own random initial sequence number. The client’s sequence number is returned after incrementing it by 1 (the SYN itself consumes one sequence number; after that, sequence numbers count bytes of data rather than packets). If the server is up, but nothing is listening on the indicated port, a RST packet would have been sent instead.
Now the client must send an ACK that acknowledges that we got the SYN-ACK and our connection is open with both sides aware of all options and sequence numbers needed. This ACK will return the server’s sequence number incremented by 1.
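The sequence number bookkeeping in the three steps above can be modelled without touching a real network. This is only a simulation with dictionaries standing in for TCP headers; the function and field names are invented for the example, and options are omitted.

```python
import random

def three_way_handshake(rng=random):
    client_isn = rng.getrandbits(32)   # client's random initial sequence number
    server_isn = rng.getrandbits(32)   # server's random initial sequence number

    # Step 1: client -> server
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server -> client, acknowledging the client's number plus one
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % 2**32,   # sequence numbers wrap at 32 bits
    }
    # Step 3: client -> server, acknowledging the server's number plus one
    ack = {
        "flags": {"ACK"},
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % 2**32,
    }
    return syn, syn_ack, ack

for packet in three_way_handshake():
    print(sorted(packet["flags"]), packet["seq"], packet.get("ack"))
```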
With all this information (and a 4-way handshake to actually close the connection), we can reassemble out-of-order packets and tell the other end to re-send specific packets. TCP also allows for a number of options for congestion control, selective windowed acknowledgements, out of band data, retransmissions, etc. One such example is timestamps, which allow for calculating round-trip delay. Consult Wikipedia and the relevant RFCs if you want to know all the intricacies and options available.
TCP has quite a bit of overhead. The 3-way handshake alone can contribute considerable latency. Sometimes, you need to get data sent fast, and it may not matter if it actually gets there. This is where UDP comes in. The two most common uses of UDP are VoIP (Voice over IP or “Internet Telephony”) and DNS.
When you need to translate a name to an IP address, you look this information up in DNS servers. The actual process may require multiple successive lookups, starting with the “root” of the domain, the “.com” or “.net” at the end, through the domain, and into the hostname, often “www”. You will likely have multiple caching DNS servers to speed up this process. It’s common to ask all configured DNS servers for the required information and then use whatever response comes back first. All required information for the query is sent in the initial packet, and the reply contains the complete response. If the remote DNS server is down, we don’t really care. No connections need to be established. This makes UDP ideal for DNS.
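To show how little is needed for a single-packet query, here is a sketch that builds a minimal DNS question for an A record by hand. The function name and the query ID are arbitrary choices for this example, and it only builds the bytes; it does not send anything or parse a reply.

```python
import struct

def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record (IPv4 address)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

packet = build_dns_query("www.example.com")
print(len(packet))  # 33 bytes: the entire request fits in one small datagram
```

Sending it would be a single `sendto()` call on a `socket.SOCK_DGRAM` socket aimed at port 53 of a DNS server, with no handshake beforehand.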
For live streaming audio like a telephone conversation, latency can cause strange delays in the conversation, echo problems, and all sorts of issues. To make sure the data has the smallest latency possible, VoIP protocols generally use UDP. If a packet is missed, we approximate the missing packet’s data rather than trying to resend. By the time a resend arrives, it will be too late. The delay of the audio playback must be set to the longest expected packet delay in the stream to prevent buffer underruns. If you allow for packet retransmissions, this delay would be excessive and impossible to calculate. We basically blow packets at each other and assume they get there. The protocol has a few additions to detect major problems, but you can have considerable delay before the system detects an error.
The Internet Control Message Protocol (ICMP) is used mainly by routers and other networking gear to report problems. If you ever got a “Host Unreachable” message, it’s because a router couldn’t reach the IP address you asked for and sent back an ICMP packet to tell you. The old ping and traceroute tools generally use ICMP packets.
Entire books can and have been written about TCP/IP routing protocols. Here we address the most common.
Interior Routing Protocols
Interior protocols are for routing within a network, not for routing between networks like the Internet.
Routing Information Protocol (RIP)
You are likely to see RIP v1 and v2 for IPv4 and RIPng for IPv6. Like TCP/IP, RIP has roots in Xerox PARC as the Gateway Information Protocol (GWINFO) for running on Xerox PUP (PARC Universal Packet). In RIP, routers use a distance-vector algorithm to select lowest cost routes. RIP routers periodically broadcast information about the routes they know. This will include information such as an available host and how many network hops (the “cost”) are required to reach it. As routers receive this information, they can add the nearby routes, adding on to the cost. The messages sent between RIP routers use the UDP protocol.
RIP is relatively slow to reach convergence (when all routers have complete routing tables). The internal timeouts and default broadcast intervals mean that minutes may elapse before a network change is detected. It also has only crude ways to deal with routing loops, such as capping the hop count at 15 and treating 16 as unreachable.
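The distance-vector update described above might look like the following. This is a simplified sketch, not the RIP wire protocol: the table layout, router names, and function name are invented, and timers and split horizon are left out. The value 16 as "unreachable" is RIP's hop-count limit.

```python
INFINITY = 16  # RIP treats a cost of 16 hops as unreachable

def merge_advertisement(table, neighbor, advertised):
    """Update `table` (destination -> (cost, next_hop)) from a neighbor's routes.

    `advertised` maps destination -> cost as seen by the neighbor.
    Returns True if anything changed.
    """
    changed = False
    for destination, cost in advertised.items():
        new_cost = min(cost + 1, INFINITY)   # one extra hop to reach the neighbor
        current = table.get(destination)
        if current is None:
            better = new_cost < INFINITY
        else:
            # Accept cheaper routes, and any change reported by the current next hop.
            better = new_cost < current[0] or (
                current[1] == neighbor and new_cost != current[0]
            )
        if better:
            table[destination] = (new_cost, neighbor)
            changed = True
    return changed

table = {"net-a": (1, "direct")}
merge_advertisement(table, "router-b", {"net-a": 3, "net-c": 2})
merge_advertisement(table, "router-d", {"net-c": 1})
print(table)  # net-a stays direct; net-c is reached via router-d at cost 2
```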
Open Shortest Path First (OSPF)
The OSPF protocol is more complex, but also more capable than RIP. It dates back to late 1989 (although it was quickly revised by 1991), and its specification is over 240 pages! Research into this type of routing protocol began on the ARPANET in the 1970s. OSPF uses a distributed database of the link-states of the network, which is shared among routing peers. The cost metric used can be set up by the network administrator and is not limited to hop-counts.
OSPF can allow larger networks to use a hierarchy of routers such that routing peers only share information about the specific area to which they are assigned. This forms a tree of information called the SPF tree (Shortest Path First).
OSPF messages do not use UDP, but rather use IP datagrams with the protocol field set to 89.
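The SPF tree mentioned above is conventionally computed with Dijkstra's shortest-path algorithm over the link-state database. The sketch below assumes a tiny, made-up database of four routers with administrator-assigned costs; it ignores areas and everything else in the real protocol.

```python
import heapq

def shortest_path_tree(links, root):
    """Dijkstra's algorithm over a link-state database.

    `links` maps router -> {neighbor: cost}. Returns router -> (cost, parent).
    """
    tree = {}
    queue = [(0, root, None)]
    while queue:
        cost, router, parent = heapq.heappop(queue)
        if router in tree:
            continue  # already reached by a cheaper path
        tree[router] = (cost, parent)
        for neighbor, link_cost in links.get(router, {}).items():
            if neighbor not in tree:
                heapq.heappush(queue, (cost + link_cost, neighbor, router))
    return tree

# Hypothetical link-state database
LSDB = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2},
    "D": {"B": 1},
}
print(shortest_path_tree(LSDB, "A"))
```

From router A, the direct link to B (cost 10) loses to the path through C (cost 3), which shows why a configurable cost metric matters more than a raw hop count.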
Exterior Routing Protocols
Routing within a network that a particular person or company controls is a much simpler affair than trying to route packets between independent entities. This is done via exterior protocols. There was an obsolete protocol named simply “Exterior Gateway Protocol (EGP)”. Today, the protocol of choice for the Internet is BGP.
Border Gateway Protocol (BGP/BGP4)
You simply can’t do justice to BGP in a paragraph or two. BGP is likely one of the most critical protocols of the Internet. BGP ensures that different organizations can share routing information. BGP comes to us from RFC 1105 from 1989. It is now up to version 4.
Border routers that know about BGP are referred to as “speakers”. They don’t necessarily know anything about the interior of the network system. They simply speak to other BGP routers, called “neighbors”. Like simpler protocols, they use a database of information, the Routing Information Base (RIB), but the information is much more detailed. You may see the term path-vector algorithm as each data entry contains information about the entire path used to reach the given endpoint.
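The idea of storing the whole path with each entry can be sketched as below. Here each path is a list of autonomous system (AS) numbers, and a route whose path already contains our own AS number is rejected as a loop. The AS numbers, prefix, and function names are hypothetical, and real BGP compares many other attributes before path length.

```python
LOCAL_AS = 64512  # hypothetical private AS number for this speaker

def accept_route(rib, prefix, as_path):
    """Store a route if it is loop-free and shorter than the one we hold."""
    if LOCAL_AS in as_path:
        return False          # our own AS is already on the path: a loop
    current = rib.get(prefix)
    if current is not None and len(current) <= len(as_path):
        return False          # keep the existing, shorter (or equal) path
    rib[prefix] = list(as_path)
    return True

def advertise(rib, prefix):
    """Path we would announce to a neighbor: our AS prepended to the stored path."""
    return [LOCAL_AS] + rib[prefix]

rib = {}
accept_route(rib, "203.0.113.0/24", [64501, 64502, 64503])
accept_route(rib, "203.0.113.0/24", [64510, 64503])         # shorter, replaces
accept_route(rib, "203.0.113.0/24", [64501, 64512, 64503])  # loop, rejected
print(advertise(rib, "203.0.113.0/24"))
```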
BGP routers connect to neighbors that must be pre-configured. The neighbors then create TCP connections to carry the routing information messages, and send Open, Update, Keepalive, and Notification messages over these long-lived connections. It also bears noting that ingress and egress traffic may be routed differently. The number of options in BGP is staggering.
The best way to troubleshoot IP problems is to start with a basic “ping” to test network connectivity. You can then move on to other tests such as a “traceroute” if you are trying to reach a distant host. Remember to test names and IP addresses separately since a broken DNS can look like a connectivity problem.
Larger problems often require a network analyzer such as Wireshark. Wireshark will allow you to record a set of network transactions and then open each layer of the OSI model. For example, you open an ethernet frame to see the IP header and its data payload. The IP data payload might be a TCP packet, and you can view its header. Inside that may be an HTTP message. Being able to see the exact sequence of data packets being sent and received over the line can save hours of guesswork. You will also want to familiarize yourself with all the individual high-level protocols and protocol extensions being used on your network.
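The layer-by-layer peeling that an analyzer performs can be imitated on a small scale. This sketch assumes a plain ethernet frame (no VLAN tag) carrying IPv4 and, optionally, TCP; the function name and the returned dictionary layout are invented for the example, and it does no checksum or length validation.

```python
import struct

def dissect(frame):
    """Peel ethernet, IPv4, and TCP headers off a raw frame (no VLAN tags)."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:
        return None                      # not IPv4
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4             # IP header length in bytes
    protocol = ip[9]
    src_ip = ".".join(str(b) for b in ip[12:16])
    dst_ip = ".".join(str(b) for b in ip[16:20])
    if protocol != 6:                    # 6 is TCP
        return {"src_ip": src_ip, "dst_ip": dst_ip, "protocol": protocol}
    tcp = ip[ihl:]
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4     # TCP header length in bytes
    return {
        "src_ip": src_ip, "dst_ip": dst_ip, "protocol": protocol,
        "src_port": src_port, "dst_port": dst_port,
        "payload": tcp[data_offset:],    # e.g. the start of an HTTP request
    }
```

Given the raw bytes of a captured frame, `dissect(frame)` returns the addresses, ports, and application payload, mirroring the nested view Wireshark presents.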