The Internet – Technical Basics

When I first got an Internet connection I was very confused. I read some old material which talked about Veronica, Archie and Jughead. Hey, aren’t these comics characters? I could guess that a gopher wasn’t a rodent, but had something to do with finding things. What I read just didn’t seem to make any sense compared to what I was seeing.

I read someplace that it was better to copy files with FTP than to use email to get or send them. I read somewhere else that the “most practical” way to move files was as an attachment to email. It wasn’t clear whether the “web” was something which ran separately from the Internet, or was somehow part of it.

I decided I needed to educate myself so I could decide for myself the best ways to do things. If you are in a similar state of confusion or simply want to have a better idea what’s going on, this is written for you.

Clients and Servers

The world wide web is not a network separate from the Internet; it is one of the services that runs on it. The “web” refers to how information is displayed. Viewing web pages is a two-step process which begins with file transfer from a special computer, a web server, to a computer running a program called a web browser, a client. The browser (Google Chrome, Firefox, Internet Explorer) interprets the file and displays it on the screen.

The focus of this document is on how information is handled from source to destination.

In order to make the best possible use of applications it’s helpful to understand how they are supported by the Internet. There are often several ways to do things. Choosing the best and most efficient way can save time and frustration.

Internet Structure and Packet Routing

One way to visualize how the Internet is structured is to imagine it as a spider web. The points where the legs (or strands if you like) come together are occupied by special computers called routers. Each router has a link to two or three others, and sometimes many more. Some diagrams of the Internet show it as a cloud rather than a spider web. Information is sent into the cloud at one end and somehow finds its way out at the other end. What happens in between is routing.

Information (data if you like) is broken up into manageable sized pieces called packets. Each packet has enough information added at the beginning of it to identify where it came from and where to send it. The packets are passed along from router to router. The routers themselves decide which paths to use. It’s not uncommon for information to traverse ten or fifteen routers on the way from source to destination. This is what’s called a packet switched network.
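To make the idea concrete, here is a small Python sketch of breaking data into packets and putting it back together. It is only an illustration, not how any real protocol lays out its headers. The Packet class, the packetize and reassemble functions, the 8 byte packet size and the example addresses are all made up for this sketch, and the sequence number is an assumption added so the pieces can be put back in order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    src: str        # where the packet came from
    dst: str        # where to send it
    seq: int        # position of this piece in the original data (assumed field)
    payload: bytes  # the piece of data itself


def packetize(data: bytes, src: str, dst: str, size: int = 8) -> List[Packet]:
    """Split data into pieces of at most `size` bytes, each with its own header."""
    return [
        Packet(src, dst, seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Put the pieces back in order, whatever order they arrived in."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"Hello from one host to another!"
packets = packetize(message, src="192.0.2.10", dst="198.51.100.7")
packets.reverse()  # pretend the packets took different paths and arrived out of order
restored = reassemble(packets)
```

Because every packet carries its own addressing information, each one can be handled by the routers independently of the others.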

Internet Protocols

In order to organize the process and keep it functioning smoothly, a set of rules was devised that can operate on nearly any computer. A protocol is a set of rules which are used to write a set of instructions (a program). The protocols employed on the Internet fit fairly neatly into compartments, with each compartment performing a task or related tasks. Most of the protocols operate inside each source and destination computer.

Hosts and Peers

Two terms to understand are host and peer. A host is simply a computer connected to the Internet and running the protocols required to communicate. The Internet is a packet switched peer to peer network. All the computers directly connected to it deal with each other as equals. Virtually any peer can run the software necessary to serve up files or receive them and any machine can directly communicate with any other.

The reason the term host is used is historical. Most of the machines connected to the early Internet ran multi-user operating systems, principally UNIX. These machines often had dumb terminals connected to them. When someone on a terminal executed a command, it ran on the host machine, not on the terminal. The multi-user computer was a host to terminal operations, and a peer to other machines out on the Internet. If you create a direct Internet connection for your computer, your machine is referred to as a host and is also a peer to other directly connected machines.

Non-Internet Networks

The other common type of network is called a client-server network, where each machine has a much narrower and more defined role. Two client machines attached to such a network can only share information by the first sending it to a dedicated server and the second picking it up from there. Two clients can’t communicate with each other directly.

Some client-server networks have a gateway to the internet. A gateway is simply a router. It’s connected on one side to the local network and on the other to all external destinations. The gateway machine puts packets from the local network inside Internet packets. This allows them to be sent out to some remote network where the internet packaging is stripped off. To the user it’s as though the remote client-server network were connected locally. This process is called tunneling.

Local IP Networks and Firewalls

Some local networks run the same protocols found on the internet. They are called IP (internet protocol) networks. If such a network is connected to the rest of the internet via a gateway, it becomes a part of the internet. Since the packets are already IP packets, no additional packaging is needed. The packets are simply routed in and out the gate. A computer connected to an IP network with a gateway can therefore be considered a host and a peer on the Internet. The Internet is essentially just a collection of such local networks connected by routers.

The only limitations on machines on a local network are imposed on them by the gateway computer. These limitations (both coming in and going out) are called a firewall. This is done for security reasons. A firewall can also be set up in individual machines to limit access or prevent certain operations.

Internet Service Providers

Internet service providers (ISPs) run local IP networks with no firewall on the gateway, but many internal security precautions. They provide specialized servers to allow the use of email, news groups, and quite often to provide for client web pages. These servers are behind a firewall in the sense that access to them is limited to clients. When you dial an ISP, a server answers the call, performs the login process, and establishes a connection to the local network. Once the connection is complete, the dial-in server acts as a router. All it does is put your IP packets onto the local network. If the packets have a destination outside the local network they get picked up by the gateway router and sent on their way.

The Department of Defense Layer Model

Below is a chart showing the Department of Defense Model on which the Internet is based. In order to get connected to the net and perform any work, protocols need to be loaded in your machine to perform the work that occurs at each layer. Each of these programs does its work and passes the results to the program above or below it. The programs are called a stack because they communicate only with the program above and below. Information gets passed up and down through each layer.

Process or Application Layer	Interface to user. This is where web browsers, email, file transfer, Internet Relay Chat, or any other user program operates.
Host to Host Layer	Maintains connections from computer to computer, breaks data into packets, and reassembles packets into data. Performs error checking and recovery. Protocols operating at this layer are: TCP and UDP
Internet or Internetwork Layer	Breaks information into smaller packets (if needed) and routes packets of data between computers from local network to local network. Checks packet headers for errors, reports delivery problems, and performs network to physical address resolution. Protocols: IP, ICMP, ARP, RARP
Network Access Layer	Provides the Physical connection between computers. Protocols and Frame (packet) types: Ethernet, Token Ring, FDDI, PPP, V.42, V.34 (etc.) and many others.

Understanding what’s happening at each layer is useful when trying to understand what’s possible or why things work the way they do. When two hosts communicate, each layer in each machine “talks” to the corresponding layer in the other machine. Most network processes occur at the lower layers without user intervention.

As packets pass through each layer a header is assigned by the protocol operating at that layer. You could think of this as each layer putting another envelope around the information, so by the time it leaves your machine there are several envelopes inside each envelope. Each envelope is opened in turn by the same protocol operating at the other end.
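The envelope idea can be sketched in a few lines of Python. This is a toy model only: real headers are binary structures, not bracketed text labels, and the function names (wrap, unwrap, send, receive) are invented for the illustration. The choice of three layers, with Ethernet as the outermost, is also just an assumption for the example.

```python
def wrap(header: str, payload: bytes) -> bytes:
    """Put another 'envelope' around the payload by prefixing a header."""
    return f"[{header}]".encode() + payload


def unwrap(header: str, packet: bytes) -> bytes:
    """Open one envelope, checking that it is the one this layer expects."""
    prefix = f"[{header}]".encode()
    if not packet.startswith(prefix):
        raise ValueError(f"expected a {header} header")
    return packet[len(prefix):]


# Order in which the layers add their headers on the sending machine.
LAYERS = ["TCP", "IP", "Ethernet"]


def send(data: bytes) -> bytes:
    for layer in LAYERS:
        data = wrap(layer, data)
    return data


def receive(frame: bytes) -> bytes:
    # The receiving machine opens the envelopes in the opposite order.
    for layer in reversed(LAYERS):
        frame = unwrap(layer, frame)
    return frame


frame = send(b"GET /index.html")
print(frame)           # b'[Ethernet][IP][TCP]GET /index.html'
print(receive(frame))  # b'GET /index.html'
```

The last header added is the first one read at the other end, which is why each layer only ever needs to understand its own envelope.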

Application or Process Layer

Once your machine is up and connected to the Internet the only layer you are likely to be aware of is the Application layer. This is where the programs you interact with operate. Some examples are web browsers, email programs, news readers, and file transfer programs. Some web browsers (Netscape, for example) have many of these functions built in. Programs are available to perform each of them separately, usually with more flexibility and efficiency. However, it may be less convenient to use a separate application for a small task.

Applications are discussed after the protocols because how they work depends upon which of the lower layer protocols they employ.

Host to Host Layer

The protocols (programs) called TCP and UDP operate at this layer. These protocols govern the nature of the communication. The connection between TCP and applications or processes occurs through what is called a port or a socket. There may be several applications running simultaneously, with TCP passing information to and from several sockets to the appropriate application.
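One way to picture what a port does is as a lookup table from port numbers to applications. The Python sketch below is only a model of that idea, not real networking code. The handlers table and the deliver function are invented for the illustration; 80 and 25 are the port numbers conventionally used by web and mail servers.

```python
from typing import Callable, Dict, List

# Each pretend application just collects whatever is delivered to it.
received: Dict[str, List[bytes]] = {"web": [], "mail": []}

# Hypothetical applications, each listening on its own port number.
handlers: Dict[int, Callable[[bytes], None]] = {
    80: received["web"].append,
    25: received["mail"].append,
}


def deliver(dst_port: int, data: bytes) -> bool:
    """Hand incoming data to whichever application owns the destination port."""
    handler = handlers.get(dst_port)
    if handler is None:
        return False  # nothing is listening on that port
    handler(data)
    return True


deliver(80, b"GET /")
deliver(25, b"HELO example.org")
deliver(80, b"GET /about")
```

After these three calls the web application holds two messages and the mail application holds one, even though all three arrived over the same network connection.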

Internet or Internetwork Layer

This is the layer which locates computers and paths between them.

Network Access Layer

This is the physical connection to the network, and the programs necessary to operate the devices that make the physical connection. This layer is divided into (at least) two sub layers: MAC and LLC. MAC (Media Access Control) is how and when information gets put on a wire. LLC (Logical Link Control) is the process whereby the connection to a particular machine is initiated and regulated. MAC addresses are the physical (electronic) addresses of the hardware which reads from and writes to a wire.

Strictly speaking, programs that operate at this layer are not part of the Internet, but part of your local network. The internet is a collection of local networks that are capable of having packets pass through them to other local networks. Software operating at this layer is sometimes proprietary to a vendor and hardware specific.

Host to Host Layer Protocols:

TCP

TCP stands for Transmission Control Protocol. TCP knows where to look for things to send out, how to prepare them to be sent and which lower layer protocol to pass them to. It breaks outgoing data into packets. Each packet is given a unique header or tag, which will identify to the receiving machine what application and machine it came from. If a packet is missing, TCP will request that it be sent again. It also sends an acknowledgment of every packet (or set of packets) it receives.

TCP works in a way similar to the way a telephone call works. It requests a connection and the other machine responds by granting or denying the request. Packets may be exchanged periodically to make certain the connection is still intact. When the connection is to be closed, packets are exchanged to confirm this. TCP is more commonly used than UDP because it is more reliable.
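The telephone call pattern can be tried out with Python's standard socket module. The sketch below runs both ends of the "call" on one machine using the loopback address 127.0.0.1. The function names echo_once and tcp_round_trip are made up for the example, and the server answering in upper case is just a way to show that the data really made the round trip.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection, send back what arrives in upper case, then close."""
    conn, _addr = server.accept()  # grant the connection request
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def tcp_round_trip(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the operating system pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # create_connection makes the connection request ("dialing the call").
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


reply = tcp_round_trip(b"hello")
print(reply)  # b'HELLO'
```

Notice that the program never deals with acknowledgments or retransmission. TCP takes care of those below the application.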

UDP

UDP stands for User Datagram Protocol. TCP and UDP perform the same function with one huge difference. Unlike TCP packets, UDP packets are not acknowledged. This greatly improves speed, but is less reliable. If a packet fails to arrive at its destination, neither machine will be aware of the loss.

UDP is often used for such things as sending voice. The speech can usually be understood, but it may sound choppy. This is the result of lost packets not getting retransmitted. The big advantage is that the sound can be sent in almost real time, and less bandwidth (space on the wires) is used than with TCP because no acknowledgments or retransmissions are sent.
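For comparison with TCP, here is a minimal Python sketch of sending a single UDP datagram between two sockets on the same machine. The function name udp_send_and_receive is invented for the example. On the loopback address the datagram will normally arrive, but nothing in UDP itself guarantees that, so the receiver is given a timeout rather than waiting forever.

```python
import socket


def udp_send_and_receive(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the operating system pick a free port
        receiver.settimeout(2)           # give up if the datagram never shows up
        # No connection request is made. The datagram is simply sent,
        # and the sender gets no acknowledgment that it arrived.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data


datagram = udp_send_and_receive(b"voice sample")
```

There is no listen, accept or connect step here, which is a large part of why UDP has less overhead than TCP.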

Internet Layer Protocols:

IP

IP stands for Internet Protocol. IP is responsible for routing packets from your machine to the machine you want to connect to. In some cases it knows the entire route, and in some cases it only knows about one other machine (a router) that is the gate to all possible destinations. IP takes on much greater importance outside your machine. It determines how the various legs of the journey from you to the other computer are chosen.
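The case where a machine knows only its own local network and one gateway can be sketched with Python's standard ipaddress module. The network 192.168.1.0/24, the gateway address 192.168.1.1 and the function name next_hop are all assumptions made up for this example.

```python
import ipaddress

LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")  # assumed local network
DEFAULT_GATEWAY = ipaddress.ip_address("192.168.1.1")   # assumed gateway router


def next_hop(destination: str):
    """Return the address the packet should be handed to next."""
    dest = ipaddress.ip_address(destination)
    if dest in LOCAL_NETWORK:
        return dest          # on the local network: deliver directly
    return DEFAULT_GATEWAY   # anywhere else: let the gateway router deal with it


print(next_hop("192.168.1.42"))  # 192.168.1.42
print(next_hop("203.0.113.9"))   # 192.168.1.1
```

A real router makes the same kind of decision, only with a table of many networks instead of just one.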

Keeping the header from the TCP protocol (or UDP or another protocol) intact, IP breaks the information into smaller packets (if needed) and adds a header which contains the source and destination addresses along with sequence and control information. At the other end, IP reassembles packets into TCP packets and passes these to the TCP protocol.
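Breaking a packet into smaller pieces and reassembling it can be sketched as follows. This is a simplified model, not the real IP header layout: real IP counts fragment offsets in units of 8 bytes, while this sketch uses plain byte offsets. The Fragment class and the two function names are invented for the illustration, and the bracketed text stands in for a real TCP header.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    ident: int   # same for every fragment of one original packet
    offset: int  # where this piece belongs in the original payload
    more: bool   # True if further fragments follow this one
    data: bytes


def fragment_packet(ident: int, payload: bytes, max_size: int) -> List[Fragment]:
    """Cut the payload into pieces no larger than max_size bytes."""
    pieces = []
    for offset in range(0, len(payload), max_size):
        chunk = payload[offset:offset + max_size]
        more = offset + max_size < len(payload)
        pieces.append(Fragment(ident, offset, more, chunk))
    return pieces


def reassemble_fragments(fragments: List[Fragment]) -> bytes:
    """Sort the pieces by offset and join them back together."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    return b"".join(f.data for f in ordered)


# The TCP header travels inside the payload, untouched by fragmentation.
segment = b"[TCP hdr]" + b"x" * 20
fragments = fragment_packet(ident=1, payload=segment, max_size=10)
rebuilt = reassemble_fragments(list(reversed(fragments)))
```

The 29 byte segment comes out as three fragments, and the last one is marked as having no more pieces after it, which is how the receiving end knows it has everything.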

A router may have several connections to other computers, each connection being a hardware interface to a local network. A table is maintained which contains the physical addresses of machines on the local networks to which it’s connected and the corresponding IP addresses.

ICMP

The Internet Control Message Protocol is a special control protocol which can talk to either TCP or IP, but is normally considered a part of IP. When you see the message “No route to Host”, for example, what has happened is that an ICMP packet was sent to your IP program telling it that the destination can’t be reached, usually because there is a broken link somewhere that can’t be bypassed. ICMP can also tell IP that a better route is available. You wouldn’t know that this had happened because it occurs at the IP layer and gets resolved there.

Routers on the Internet use this protocol to report such problems back to the machine that sent a packet. To update their information about routes, routers talk to each other using separate routing protocols such as RIP, OSPF and BGP. As network conditions change, routers may choose alternate routes for speed, efficiency, cost, or because of congestion.

Network Access Layer Protocols:

This layer is divided into two levels. The first is called Logical Link Control (LLC). Protocols at this level handle connections between the upper layer protocols and the network hardware. Media Access Control (MAC) is the second level. MAC layer protocols control the hardware itself.

LLC Level Protocols: PPP and SLIP

PPP stands for Point to Point Protocol, and operates at the LLC (Logical Link Control) level of the Network Access Layer. SLIP stands for Serial Line Internet Protocol. They perform the same function, but PPP is a more robust, complete and reliable protocol. SLIP is no longer much used. PPP is used to establish a serial line link to another machine. It sends commands and moves data to and from the modem (or other device) and moves data to and from the IP protocol. All the protocols above do error checking, but the error checking performed by PPP is the most important, because the physical connection is where most errors occur. PPP also can perform the login function to a service provider.

MAC Level Protocols: V.34, V.42, V.42bis, V.FC, V.whatever

These are modem protocols. They operate at the Media Access Control level of the Network Access Layer. Essentially, the modem converts the data it receives into tones. V.34 defines how this is done, and its extended version, often referred to as V.34+, allows data transfer rates of up to 33,600 bits per second over good (but not “data grade”) phone lines. V.42 handles error detection and correction between the two modems. Its companion, V.42bis, compresses the data by tokenizing it. The tokenization process is similar to that performed by compression programs such as Zip. Since this is done at the modem, your computer’s microprocessor is relieved of the task. Hardware handshaking is also done by the modem. Handshaking is the process which starts and stops transmissions, allowing the receiving and sending machines to stay synchronized and not be overrun with data. Incidentally, compressing a file with Zip before sending it leaves the modem little or nothing to compress. This is because zip programs have already removed the repetition which modem compression relies on. (V.35, despite the similar name, is not a dial-up modem protocol. It is an interface standard for high speed serial lines.)
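You can see why a modem gets little benefit from data that has already been zipped with a short Python experiment. This is only a stand-in: it uses zlib (the same DEFLATE method used by Zip programs) for both passes rather than the modem's own compression, and the word list and sample text are made up for the test.

```python
import random
import zlib

random.seed(0)  # fixed seed so the sample text is the same on every run
WORDS = ["packet", "router", "host", "peer", "modem", "server", "client", "layer",
         "frame", "header", "socket", "port", "gateway", "network", "link", "data"]
text = " ".join(random.choice(WORDS) for _ in range(5000)).encode()

once = zlib.compress(text)   # first pass removes most of the repetition
twice = zlib.compress(once)  # second pass has almost nothing left to remove

print(len(text), len(once), len(twice))
```

The first pass shrinks the text considerably, while the second pass leaves the size about where it was. Once the repetition has been squeezed out, a second compressor has very little to work with.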

Other Network Access Layer Items: Ethernet, Token Ring, FDDI

These are local network frame types, not internet protocols. A frame is essentially the same as a packet, but is always the outermost envelope since it is what is written and read by hardware devices. Ethernet frames are used with hardware that uses twisted pair wires (like phone wire) or coaxial cable. Dial-up modem links normally carry PPP frames rather than Ethernet frames.

The term token ring refers both to a frame type and a particular network logical link control method. Token Ring frames are used with token ring hardware, which usually uses shielded twisted pair wires on a local network. FDDI stands for fiber distributed data interface and refers to systems using fiber optic cable. Information can be moved through fiber optic cable dramatically faster than through wires.