
Web Servers

How Web Servers Work

(February 25, 2000 - Tom Cameron)

[NOTE: This article covers some general information about web servers as well as some references to a specific server set-up that no longer exists. The information is still relevant and useful, so I have left it unchanged. I have removed some names to keep the information anonymous.]

For most people the Internet and how it gets to their computer is a bit of a mystery. Acronyms like HTML, XML, CGI, TCP/IP are meaningless. And for most, this is probably how it should remain. Today we drive a car and (although many will not admit it) the vast majority of people have no idea how this machine converts petrol to motion. I am sure that there will be a day when the Internet - and computers in general - will be the same.

Until then, however, we will continue to hear these words and acronyms, and those of us in the industry should at least have a general understanding of where they fit into the whole picture. Lately I have mentioned that we are using a technology called ZOPE. Most will have heard that we use MS SQL and ASP, and some will know of XML. I recently was asked - "Why don’t we use XML instead of ZOPE?" To those in the know this is like asking "Why don’t we use a banana instead of a mobile phone?" So I figured I should do a little explaining.

In this article I have decided to concentrate on Web Server Systems - those very expensive, mysterious things that reside somewhere a long way away, misbehave regularly, and require a team of 20 people to tame. In particular I will talk about our servers and the technology we are using. This is, of course, just a small subset of the entire range of possibilities. There are hundreds of components (software, hardware, applications, protocols), and they form a complex puzzle constrained by compatibility and by the functional boundaries of each component.

Definitions

Open Source - some of the systems we use are what is called Open Source. There are a few variations of what this means depending on the licence chosen by the developer, but in general it can be described as follows. Open Source software is software that is freely available to any person or corporation. In general you can use it for any purpose you like. You get full access to the software and the source code used to develop it. You are free to use, copy, and even modify the source. Some licences add the condition that any additions or changes you make to the source are also made freely available to the general public. Linux, FreeBSD, Apache, Roxen, Zope, ODP, Squid, and Sendmail are all Open Source applications that are commonly used. We make use of most of them.

Scalable - I often refer to the necessity for scalability in servers. By this I mean the ability to easily increase the capacity of the system without any discontinuity in functionality. If we have to move the software to a bigger machine, or change the code significantly to increase capacity, then the system is not scalable. To be truly scalable we need a system that can simply have new pieces plugged in (while running) to add capacity.

Redundancy - a necessary part of a good web server is the ability to stay running when parts fail (and they always do). Essentially this means having spare parts that "kick in" automatically when one fails. Of course, to be truly redundant the system must have redundancy of each and every component. In the case of our system, each machine has multiple hard drives, power supplies and processors. The machines are joined together through one of two switches, we have two load balancers, and the machines are grouped by task. We have multiple machines for web serving, multiple database machines, and then tape backup of all. Every time we add a new server for a specific task, we actually add two identical machines, one of which just sits there in case the first one fails.

It is also important to understand the relationship between scalability and redundancy. Often a truly scalable system has inherent redundancy, but this is not always the case. Let's say we had a system that needed a web server and a database. We could buy two machines and set them up in two different ways. We could install the web serving and database software on both machines and then let them share the load, or we could make one machine the web server and the other the database. The first option would have good redundancy, because if one of the machines died then the other could handle all the tasks. The second system would not have any redundancy, as both machines are required to stay alive for the system to function. The first system is also more scalable, as we could simply keep adding more identical machines to the system to increase capacity.
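The difference between the two set-ups can be sketched in a few lines of Python. This is only an illustrative model: the machine names, the `REQUIRED_ROLES` set and the helper functions are all made up for the example and do not describe our actual configuration.

```python
# Roles the example system needs in order to function (an assumption for this sketch).
REQUIRED_ROLES = {"web", "database"}

# Option 1: both machines run everything and share the load.
shared_layout = {"machine_a": {"web", "database"}, "machine_b": {"web", "database"}}
# Option 2: one machine is the web server, the other is the database.
split_layout = {"machine_a": {"web"}, "machine_b": {"database"}}


def roles_available(layout, failed):
    """Return the roles still provided when the machines in `failed` are down."""
    available = set()
    for machine, roles in layout.items():
        if machine not in failed:
            available |= roles
    return available


def survives_failure(layout, failed):
    """True if every required role is still running on some machine."""
    return REQUIRED_ROLES <= roles_available(layout, failed)


def is_redundant(layout):
    """True if the system survives the loss of any single machine."""
    return all(survives_failure(layout, {machine}) for machine in layout)


print(is_redundant(shared_layout))  # True
print(is_redundant(split_layout))   # False
```

Adding a third identical machine to `shared_layout` keeps it redundant and adds capacity, which is the link between scalability and redundancy described above.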

Operating System

Web Servers can make use of many different Operating Systems (OS) - the very first bit of software that is installed onto the computer. This layer determines most of the software that runs on the computer. A simple IBM compatible PC (which is often referred to these days as an Intel Machine, from the name of the most common processor chip) can handle many different Operating Systems. Windows NT, Windows 98, Windows 3.11, UNIX, Linux, FreeBSD, BeOS, and DOS all run on Intel machines. Apple Macintosh, Solaris, and UNIX run on non-Intel platforms.

The choice of OS is often determined by the purpose of the machine. Most people are familiar with Windows and Mac as these are the most popular OS choices for a standard workstation PC. In the Internet server environment, however, Linux, FreeBSD and Solaris are more popular. Microsoft has worked very hard to get NT into the server environment. They have been relatively successful in doing so. The main reason for this is that its graphical interface is familiar to most people, which makes it a comfortable choice for many. NT, however, is still not the most scalable or powerful OS for servers. We make use of NT and FreeBSD on most of our servers.

Client - Server

These terms are used quite a bit. In the context of the web they are relatively straightforward. The server is the part that holds all the information and sends it out over the Internet. The client is the bit that runs on your PC and helps you view the web. At the client end is the PC, your choice of OS (most likely Windows 98 or NT), a web browser such as Microsoft Internet Explorer or Netscape Navigator, and you. The other end is what we will be discussing here.

It is accurate enough to simply think of these as two independent systems that talk to each other. As with any conversation between two parties, it is important that they speak the same language(s). The beauty of the Internet is that it has generally seen the development of a common language that runs on all types of machines (platform independent). The servers send out information in HTML, VRML, JavaScript and XML, and the clients generally know what to do with this, even if the server and client are on different platforms. Likewise the same information can be sent out by the server and interpreted by all types of clients, such as Mac or Windows NT.
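To make the conversation concrete, here is a minimal sketch of both ends using only Python's standard library. It is not how our servers are built; the page content, the `HelloHandler` class and the `fetch_once` helper are invented for the example. The server sends out a small HTML page, and the client asks for it and reads the reply.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# The HTML the example server sends out (made-up content).
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a server on a free local port, request one page as a client, shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # Client side: open a connection, send a request, read the response.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status, body.decode())
```

The client here neither knows nor cares what OS the server runs on. All it needs is the shared language: an HTTP request going one way and HTML coming back.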

Parts of the Server System

The Server system can be broken up into many parts. This diagram shows a simple breakdown of these parts. For ease I will break these down into Caching, Web Serving, Application and Database.

Figure 1 - Client/Server Components

Caching/Load Balancing - This layer is the link to the outside world. It is the first part of the system to receive requests from clients; it also handles load balancing and firewall/security functions. When you type in a URL in your web browser, or click on a link, your machine sends off a request to the appropriate server. This layer gets that request and decides what to do with it. If there is a security risk this layer will not let the request in. If the servers are heavily loaded, this layer will decide which machine handles the task. If it is a request that this layer has seen many times before then it may already know the correct response and can actually reply without loading the web server layer at all.

Many small websites do not have this layer.
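The decisions made by the Caching/Load Balancing layer can be sketched roughly as follows. This is a simplified illustration, not a description of any real product: the `FrontLayer` class, the backend names and the client addresses are hypothetical, the "load" is just a count of requests handed out, and a response is cached after the first request rather than after many.

```python
class FrontLayer:
    """Toy front layer: security check, then cache, then load balancing."""

    def __init__(self, backends, blocked_clients=()):
        self.backends = backends                      # name -> function(url) -> page
        self.load = {name: 0 for name in backends}    # requests sent to each backend
        self.blocked = set(blocked_clients)
        self.cache = {}                               # url -> page seen before

    def handle(self, client, url):
        # 1. Security: refuse requests from clients we do not trust.
        if client in self.blocked:
            return "403 Forbidden"
        # 2. Caching: answer directly if we already know the response.
        if url in self.cache:
            return self.cache[url]
        # 3. Load balancing: hand the request to the least-loaded backend.
        name = min(self.load, key=self.load.get)
        self.load[name] += 1
        page = self.backends[name](url)
        self.cache[url] = page
        return page


def make_backend(name):
    """Build a stand-in web server that just reports who produced the page."""
    def serve(url):
        return f"page for {url} (built by {name})"
    return serve


front = FrontLayer(
    {"web1": make_backend("web1"), "web2": make_backend("web2")},
    blocked_clients={"10.0.0.66"},
)
print(front.handle("10.0.0.1", "/home"))   # goes to web1
print(front.handle("10.0.0.2", "/news"))   # goes to web2, the less loaded machine
print(front.handle("10.0.0.3", "/home"))   # answered from the cache
print(front.handle("10.0.0.66", "/home"))  # 403 Forbidden
```

Note that the third request never reaches a web server at all, which is exactly the saving this layer is there to provide.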

Web Server - This layer is not to be confused with the Hardware Web Servers - which are physical machines. This is a thin software layer that converts the application output into a format that is web compatible and sends it out. In the case of small web sites with no Caching/Load Balancing layer this is the first layer to get requests.  On an NT machine you can use the Microsoft Internet Information Server (MS IIS). Other Web servers include Apache, Roxen, and Netscape Server.

The boundary between the Caching/Load Balancing layer and the Web Server layer is not solid. Many web servers handle firewall, proxy and even some load balancing tasks.

Application - This is the working part of the site. It can range from plain HTML pages to a complicated application that processes the requests and does complicated calculations. In the case of ####### this is the most complicated layer. This is where most of the programmers do the work. This layer can be made up of multiple small applications on different machines, even on different sites.

Much of ####### is coded in Active Server Pages (ASP) and runs on an NT OS. We use Visual Basic modules, which perform faster for some tasks. We are also developing some applications with other tools such as ZOPE, an application server built on the Python language. When talking about application layers you may hear of ColdFusion, Perl, ModPerl, PHP, and Java Server - all of these are alternative solutions for the application layer.

Part of the application layer overlaps with the next layer - database. We can develop small modules that perform repeated functions inside the Database. These are called "stored procedures". They reside on the database machines and are written inside the database, but essentially they are part of the Application Layer.

Database - In a well-structured system, the data/content of the site is separated from the application (not all sites are well structured, so this is not always the case). In the case of ####### the Database layer holds all the information. The content of each page, all our business listings, classifieds, news feeds and weather are all stored in the Database. Whenever a page is requested, the application layer retrieves this information from the database, builds the page and sends it to the web server. There can be many separate databases of information on multiple machines in different locations.

Most of our data is stored in a Microsoft SQL database (MS SQL). An alternative is MySQL, which runs on UNIX and is an Open Source solution. We also have Oracle and are planning to use this to back up our systems.
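The separation between data and application can be shown with a small sketch. It uses Python and an in-memory SQLite database purely as a stand-in, since those need no set-up; our own system uses ASP and MS SQL. The `listings` table, its contents and the function names are invented for the example.

```python
import sqlite3
from html import escape


def create_database():
    """Database layer: holds the raw content (here, a few made-up listings)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (name TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?)",
        [
            ("Example Bakery", "food"),
            ("Example Motors", "cars"),
            ("Example Cafe", "food"),
        ],
    )
    conn.commit()
    return conn


def build_page(conn, category):
    """Application layer: fetch the data for a request and build the HTML page."""
    rows = conn.execute(
        "SELECT name FROM listings WHERE category = ? ORDER BY name", (category,)
    ).fetchall()
    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    return f"<html><body><h1>{escape(category)}</h1><ul>{items}</ul></body></html>"


conn = create_database()
print(build_page(conn, "food"))
```

Because the page is built from the database on each request, adding a new listing to the table changes what visitors see without anyone touching the application code.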

Hardware Components

The above information outlines the "layers" or parts of the web server. The following is a breakdown of the most common hardware components.

Switch - The switch is the piece that connects all the components together. Each of the other devices is physically plugged into the switch.

Proxy Servers - The proxy servers are the machines that handle the Caching tasks. As outlined above, they are the first point to handle requests from the Internet.

Load Balancers - The load balancers are the devices that control access to all the other machines. Good load balancers are intelligent enough to know the status of the system and will only send tasks to machines that are functioning properly. They are also used to control access to machines and form the largest part of the security system.

Web Servers - These are the machines that do most of the work. They hold both the web server layer and application layer and are set-up in farms to share the load.

Database Servers - These machines hold the database software and control all the raw data/information on the site. They also do some amount of data processing in "stored procedures".

File Servers - The file server is the part where most of the information is stored. The code that runs on the web servers, and in many cases the database containers that hold the data, do not reside on those servers but on the file servers. The file server is made up of many hard drives (usually 6 or more) and is a highly redundant device itself. By keeping all the data in one location it is easy to control and back up. When we plug a new web server into the system we don’t actually load code onto that machine, we simply point it to the file server. This way, we can change the code on the site once and all web servers see the new code instantly.
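The idea that a good load balancer only sends tasks to machines that are functioning properly can be sketched as below. This is a toy model under stated assumptions: the `HealthAwareBalancer` class and the machine names are hypothetical, machines are marked up or down by hand rather than by real health checks, and requests are simply handed out in rotation.

```python
from itertools import cycle


class HealthAwareBalancer:
    """Toy load balancer: rotate through machines, skipping any marked as down."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.healthy = set(self.machines)
        self._rotation = cycle(self.machines)

    def mark_down(self, machine):
        """Record that a machine has failed and must not receive tasks."""
        self.healthy.discard(machine)

    def mark_up(self, machine):
        """Record that a known machine is working again."""
        if machine in self.machines:
            self.healthy.add(machine)

    def next_machine(self):
        """Return the next healthy machine in the rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy machines available")
        # One full turn of the rotation is enough to reach a healthy machine.
        for _ in range(len(self.machines)):
            machine = next(self._rotation)
            if machine in self.healthy:
                return machine
        raise RuntimeError("no healthy machines available")


balancer = HealthAwareBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")
print([balancer.next_machine() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

When `web2` is repaired, a call to `mark_up("web2")` puts it back into the rotation without any interruption to the other machines.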

The Servers

All the main servers are now located in ########. They are located in a facility owned by #######, one of the top hosting companies in the world. This location gives us ideal connectivity to all of the US as well as UK and reasonable connectivity to Australia.

This facility offers all the necessary security and stability of service. The security system uses palm recognition and security card access on all doors in the facility. A UPS (uninterruptible power supply) offers 30 minutes of full power if mains power is lost, which provides enough time to fire up the diesel generators, which can run the facility for over 1 month. A laser smoke detection system is used in all rooms and is linked to an FM200 gas fire extinguishing system, which will suppress any fire without affecting the machines.

We presently have 5 racks of space (a rack is about 8’ tall and 2’ wide and will hold about 16 web servers). This space is 30% full now, but will be completely full within 8 weeks. We also have first right of refusal on a further 25 racks beside our present set-up.

Our system consists of 2 Switches, 2 Load Balancers, 1 File Storage system, 4 Database Machines, and 12 Web servers. We are expanding the web servers to almost 30 and splitting the Database up into more machines.

The entire system can be managed remotely and we only need people on site for installations and routine checks.

Well, hopefully all this will give you an idea of the parts of a server system and an idea of where some of those unusual words and acronyms fit in. As you can see now, ZOPE is an application server and XML is a language that servers can use to communicate. The two are not interchangeable.

No two web server systems are the same, and the architecture and construction of servers is still quite a new science. We are constantly researching alternatives and experimenting with new software and hardware components. We work hard to ensure that we do not make choices that limit our options.


Last modified 2005-01-24 04:48 PM
 
