2. ASPECTS OF THE MEDIUM

2.1 Background

The World Wide Web was first proposed in 1989 at CERN, the European Laboratory for Particle Physics, as a means for physicists to share information (Steadman et al., 1996). Since then the Web has grown into a phenomenon used and shared by nearly everyone on the Internet. Like the Internet itself, the Web relies on an underlying communication protocol to which every Web site must adhere. Called the Hypertext Transfer Protocol (HTTP), it defines a means for retrieving, transporting, and interacting with hypermedia "objects" (Steadman et al., 1996). These objects can include HTML documents, images, scripts, applets, or any other sort of document.

HTTP is a stateless protocol. When a client and server interact using HTTP, the server retains no information, or state, about the client between requests. If a client and server interact several times, each interaction is a separate encounter with no knowledge of any previous one. This is in contrast to a connection-oriented, stateful protocol such as telnet. When a client and server interact using telnet, a single connection is held open for the entire session, and state information is maintained and passed back and forth between client and server. One advantage of HTTP's statelessness is that once a document has been downloaded, it can be read or used off-line at the client's convenience without holding a connection to the server. The drawback is that each time the client wants another document from the server, a new connection must be established, a process that occasionally takes longer than downloading the document itself.

2.2 Establishing a Connection

Establishing a connection is a costly, resource-intensive process. It begins with a packet from the client requesting a connection with the server. As is the nature of the Internet's architecture, this packet is routed from the client's machine to the server's machine through a number of intermediate machines. Once the connection request arrives at the server, it may be queued until the server is ready to handle it; a busy server may hold a client's request in its queue for 3 to 5 minutes or more. When the server is ready, control information, called a handshake, is exchanged between the two hosts to establish a dialog before any data is transmitted (Hunt, 1992). This is a three-way handshake: three packets cross the network (a synchronize request from the client, a combined synchronize and acknowledgment from the server, and a final acknowledgment from the client) so that both sides agree on where their data streams begin.
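
The three packets of the handshake can be modeled in a few lines. The following is a simplified Python sketch of the segments TCP exchanges to open a connection; the Segment class, the three_way_handshake function, and the initial sequence numbers 1000 and 5000 are illustrative assumptions (real TCP implementations choose initial sequence numbers unpredictably), and every header field other than the two flags and the sequence and acknowledgment numbers is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A simplified TCP segment holding only the fields the handshake uses."""
    sender: str
    syn: bool
    ack: bool
    seq: int
    ack_no: Optional[int]  # acknowledgment number; None when the ACK flag is unset


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged to open a TCP connection."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment("client", syn=True, ack=False, seq=client_isn, ack_no=None)
    # 2. Server -> client: SYN+ACK acknowledges client_isn + 1 and
    #    announces the server's own initial sequence number.
    syn_ack = Segment("server", syn=True, ack=True, seq=server_isn, ack_no=syn.seq + 1)
    # 3. Client -> server: ACK of server_isn + 1; the connection is now open.
    ack = Segment("client", syn=False, ack=True, seq=syn.seq + 1, ack_no=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(client_isn=1000, server_isn=5000):
    flags = "+".join(name for name, on in (("SYN", seg.syn), ("ACK", seg.ack)) if on)
    print(f"{seg.sender:>6}: {flags:<7} seq={seg.seq} ack={seg.ack_no}")
```

Each side acknowledges the other's initial sequence number plus one, which is how both hosts confirm they are synchronized before any HTTP request is sent.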

After the handshake is complete, the client knows that the server is alive and ready to receive its request, and data transfer can begin. Unfortunately, the three-way handshake can be the slowest, and is often the most resource-intensive, part of a connection, and it must occur for every request the client makes of the server. Once the client and server are connected, they use HTTP's request methods to transfer data.
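
Why a per-request handshake weighs so heavily on small documents can be shown with a rough cost model. This Python sketch assumes one network round trip for the handshake, one round trip for the request and the first byte of the reply, and then a constant transfer rate; it ignores TCP slow start, packet loss, and server queueing. The 200 ms round trip and 28.8 kbit/s modem are hypothetical figures, not measurements from this study.

```python
def fetch_time(rtt_s: float, size_bytes: int, bandwidth_bps: float) -> float:
    """Rough time in seconds to fetch one document over a fresh TCP connection.

    Simplified model (an assumption): no slow start, no loss, no queueing.
    """
    handshake = rtt_s   # SYN out, SYN+ACK back, before the request can be sent
    request = rtt_s     # request out, first byte of the response back
    transfer = size_bytes * 8 / bandwidth_bps  # body streamed at a constant rate
    return handshake + request + transfer


# Hypothetical figures: 200 ms round trip over a 28.8 kbit/s modem.
RTT, MODEM = 0.2, 28_800
for size in (500, 2_000, 20_000):
    total = fetch_time(RTT, size, MODEM)
    print(f"{size:>6} bytes: {total:.2f} s total, handshake is {RTT / total:.0%} of it")
```

Under these assumptions a 500-byte image takes about 0.14 s to transfer but 0.2 s to handshake, so opening the connection costs more than downloading the document, as the text describes.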

HTTP supports a number of request methods, including GET, POST, and HEAD (Steadman et al., 1996). GET is the most common and is used to retrieve documents from the server. POST is used to send form data from the client to the server, usually to be processed by a Common Gateway Interface (CGI) script. HEAD returns only the header information that a GET for the same document would return, without the document itself; it is used to retrieve status information about a document, such as when it was last modified.
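
The three methods differ mainly in the request line and in whether a body follows the headers. The Python sketch below builds the raw text of minimal HTTP/1.0 requests; the build_request function, the host www.example.com, and the paths /index.html and /cgi-bin/register are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode


def build_request(method: str, path: str, host: str, body: str = "") -> str:
    """Return the text of a minimal HTTP/1.0 request message."""
    lines = [f"{method} {path} HTTP/1.0", f"Host: {host}"]
    if body:
        # Form fields encoded the way a browser submits an HTML form by default.
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('ascii'))}")
    # Lines end in CRLF; an empty line separates the headers from any body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


get_req = build_request("GET", "/index.html", "www.example.com")
head_req = build_request("HEAD", "/index.html", "www.example.com")
post_req = build_request("POST", "/cgi-bin/register", "www.example.com",
                         urlencode({"name": "Ann", "color": "blue"}))
print(post_req)
```

GET and HEAD requests for the same document look identical apart from the method name; the difference is in the server's reply, which for HEAD omits the document body.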

Just as the Web uses HTTP as the protocol of communication between client and server, it uses HTML (Hypertext Markup Language) to describe the structure of a document (Steadman et al., 1996). HTML is the basis of Web pages and is used to define the links from one document to another. Because the Web was designed to make information available to all computer platforms regardless of display hardware, HTML was likewise designed to be largely platform independent. To display and arrange information, HTML uses markup tags embedded in plain ASCII text, a simple format that can be read and produced on a wide variety of computer systems.
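
Because HTML is just tagged ASCII text, any program can read a page's structure, including the links it defines. The sketch below uses Python's standard html.parser module to collect the targets of the anchor tags in a small page; the page content and the LinkCollector class are hypothetical examples.

```python
from html.parser import HTMLParser

PAGE = """<HTML>
<HEAD><TITLE>Lobby</TITLE></HEAD>
<BODY>
<H1>Welcome</H1>
<P>Go to the <A HREF="chat.html">chat room</A> or read the <A HREF="faq.html">FAQ</A>.</P>
</BODY>
</HTML>"""


class LinkCollector(HTMLParser):
    """Collects the HREF of every anchor tag, i.e. the page's outgoing links."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```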

Through CGI scripts, HTTP servers can accept information from clients and serve dynamic documents based on it. CGI stands for Common Gateway Interface and, as the name implies, provides a "gateway" between the World Wide Web and many other types of services. Although CGI was intended to make non-Web information available on the Web, it is more commonly used to create Web documents dynamically from information the client sends to the server. Using HTML forms, a user can enter information that the client passes to the server, which can then use it to build a document tailored specifically to that user.
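
A CGI script is an ordinary program: the server passes request data to it in environment variables, and whatever it writes to standard output (a header, an empty line, then the document) is returned to the client. The following minimal Python sketch greets the user named in a form field; the field name "name" and the render_page function are hypothetical, and the sketch handles only GET submissions, where form data arrives in QUERY_STRING (a POST body would instead be read from standard input, with its length given in CONTENT_LENGTH).

```python
#!/usr/bin/env python3
"""A minimal CGI script that greets the user named in a GET form submission."""
import html
import os
from urllib.parse import parse_qs


def render_page(environ) -> str:
    """Build the CGI response (header, empty line, HTML) from the request environment."""
    # For a GET form submission the server passes the encoded fields in QUERY_STRING.
    form = parse_qs(environ.get("QUERY_STRING", ""))
    name = form.get("name", ["stranger"])[0]
    # Escape user-supplied text before inserting it into HTML.
    body = f"<HTML><BODY><P>Hello, {html.escape(name)}!</P></BODY></HTML>\n"
    # CGI output: header lines, an empty line, then the generated document.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    print(render_page(os.environ), end="")
```

Keeping the page logic in a function that takes the environment as a plain mapping makes it easy to try the script outside a Web server.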

2.3 Constraints of the Medium

Though the Web is easy to use and popular as an information delivery system, it imposes at least two serious constraints on a virtual community: multiple stateless connections, and net lag. Interaction between users is the heart of any virtual community, and it is exactly this kind of user-to-user interaction that is difficult to accomplish through the Web. The Web was initially designed as a stateless, platform-independent system for delivering static documents. However, when two or more people interact via computer, the system must keep some state information about each user, such as the user's name, a unique identifier for that user, the user's configuration settings, and the user's previous activity. Because the Web is inherently stateless, using it as a conduit for real-time interaction between two or more users is difficult, and HTTP's statelessness must be overcome before users can interact through a Web page.

To let two or more people interact, a combination of dynamic pages created by CGI scripts and state-saving URLs must be used. One way around HTTP's statelessness is to embed state information in a URL. A URL (Uniform Resource Locator) is the Web's addressing scheme. Information can be passed from client to server in two parts of a URL: extra path information following the script's name, and a query string following a question mark; a CGI script receives these in the PATH_INFO and QUERY_STRING environment variables. If the requested document is actually a CGI script, and the URL carries information in its PATH_INFO and/or QUERY_STRING, state can be maintained from request to request. The CGI script processes the client's information and serves back a dynamic document in which every URL has that state information embedded. Because the client's previously submitted information is embedded in every link on the new page, the client's next request carries both the new and the previous information to the server, which can then process the new information in light of the previous information. This cycle of passing new and previous information between client and server is the means by which state is maintained, and thus the means by which interaction can be accomplished through a Web page.
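
The cycle described above can be sketched in Python. This is an illustration under stated assumptions, not bianca's actual code: the script path /cgi-bin/room, the convention that PATH_INFO names the user's current room, the state fields user, id, and last, and the functions read_state, state_url, and render_page are all hypothetical.

```python
import html
from urllib.parse import parse_qs, urlencode

SCRIPT_URL = "/cgi-bin/room"  # hypothetical path of the CGI script


def read_state(path_info: str, query_string: str) -> dict:
    """Recover the state that the previous page embedded in the requested URL."""
    # parse_qs maps each name to a list of values; keep the first value of each.
    state = {name: values[0] for name, values in parse_qs(query_string).items()}
    # Hypothetical convention: the extra path information names the current room.
    state["room"] = path_info.strip("/") or "lobby"
    return state


def state_url(state: dict, room: str, **changes: str) -> str:
    """Build a link that carries the previous state forward plus any new values."""
    carried = {k: v for k, v in state.items() if k != "room"}
    carried.update(changes)
    return f"{SCRIPT_URL}/{room}?{urlencode(carried)}"


def render_page(path_info: str, query_string: str) -> str:
    """Generate a page in which every link re-embeds the client's state."""
    state = read_state(path_info, query_string)
    links = [
        f'<A HREF="{html.escape(state_url(state, room, last="enter-" + room))}">{room}</A>'
        for room in ("lobby", "garden")
    ]
    user = html.escape(state.get("user", "guest"))
    return (f"<HTML><BODY><P>{user} is in the {html.escape(state['room'])}.</P>"
            + " | ".join(links) + "</BODY></HTML>")


print(render_page("/lobby", "user=ann&id=42"))
```

Whichever link the user follows, the next request arrives with the user's name and identifier plus the new action, so the server itself remembers nothing between requests, in keeping with HTTP's stateless design.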

In a popular virtual community such as bianca, many connections are established every second, which by itself taxes the server. Although most documents served are static HTML pages, images, and sounds, dynamic documents generated on the fly are also served. Static documents are the least costly to serve, because the HTTP server can handle the request without executing any other process. Dynamic documents, on the other hand, can involve a good deal of behind-the-scenes work and can greatly add to the server's load.
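
The cost difference between static and dynamic documents can be made concrete with a small timing sketch in Python. It is a hypothetical illustration, not bianca's server: serve_static only reads a file, while serve_cgi starts a fresh process for every request and passes the request through environment variables, as CGI does. The one-line Python program standing in for a CGI script is an assumption. Process start-up usually dominates, so the CGI path should be much slower per request, though the exact figures depend on the machine.

```python
import os
import subprocess
import sys
import tempfile
import time


def serve_static(path: str) -> bytes:
    """Static document: the server only reads the file and sends its bytes."""
    with open(path, "rb") as f:
        return f.read()


def serve_cgi(argv: list, query_string: str) -> bytes:
    """Dynamic document: start a new process per request, CGI style."""
    env = dict(os.environ, GATEWAY_INTERFACE="CGI/1.1",
               REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(argv, env=env, capture_output=True, check=True)
    return result.stdout


def time_it(fn, *args, repeat: int = 20) -> float:
    """Average wall-clock seconds per call of fn(*args)."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
        f.write("<HTML><BODY>Hello</BODY></HTML>")
    # Hypothetical stand-in for a CGI script: echo the query string as plain text.
    script = [sys.executable, "-c",
              "import os; print('Content-Type: text/plain'); print(); "
              "print(os.environ['QUERY_STRING'])"]
    print(f"static: {time_it(serve_static, f.name) * 1000:.3f} ms per request")
    print(f"CGI:    {time_it(serve_cgi, script, 'user=ann') * 1000:.3f} ms per request")
    os.remove(f.name)
```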

One consequence of a heavily loaded system is lag in user communication. Because establishing a connection carries high overhead, and because CGI scripts must be used to accomplish interaction between users, a system with many connections arriving every second can become bogged down as its CPU becomes overloaded. In a heavily loaded system all processes slow down, introducing an annoying lag into users' interactions. At its worst, users may wait 20 to 30 seconds between requests. In this situation users can become extremely frustrated, and more often than not the lag itself becomes the topic of conversation.
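
Why lag grows so sharply as load rises can be illustrated with the textbook M/M/1 queueing model, in which the mean time a request spends waiting and being served is 1/(μ − λ) for service rate μ and arrival rate λ. This model is swapped in here for illustration; it is not an analysis from the original study, and the service rate of 10 dynamic pages per second is hypothetical. The point is the shape of the curve: response time stays modest at moderate load and climbs steeply as arrivals approach the server's capacity.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean seconds a request spends queued plus in service in an M/M/1 queue.

    Assumes Poisson arrivals at arrival_rate per second and exponentially
    distributed service times averaging 1 / service_rate, one request at a time.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 10.0  # hypothetical: the server can build 10 dynamic pages per second
for load in (5.0, 8.0, 9.0, 9.5, 9.95):
    t = mm1_response_time(load, SERVICE_RATE)
    print(f"{load:5.2f} req/s -> {load / SERVICE_RATE:4.0%} busy, mean response {t:6.2f} s")
```

In this model, going from 50% to 99.5% utilization raises the mean response time from 0.2 s to about 20 s, the same order as the worst-case lag reported above.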
