
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
   <HEAD>
      <TITLE>
         Understanding FastCGI Application Performance
      </TITLE>
<STYLE TYPE="text/css">
 body {
  background-color: #FFFFFF;
  color: #000000;
 }
 :link { color: #cc0000 }
 :visited { color: #555555 }
 :active { color: #000011 }
 div.c3 {margin-left: 2em}
 h5.c2 {text-align: center}
 div.c1 {text-align: center}
</STYLE>
   </HEAD>
   <BODY>
      <DIV CLASS="c1">
         <A HREF="http://fastcgi.com"><IMG BORDER="0" SRC="../images/fcgi-hd.gif" ALT="[[FastCGI]]"></A>
      </DIV>
      <BR CLEAR="all">
      <DIV CLASS="c1">
         <H3>
            Understanding FastCGI Application Performance
         </H3>
      </DIV>
      <!--Copyright (c) 1996 Open Market, Inc.                                    -->
      <!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
      <!--of this file, and for a DISCLAIMER OF ALL WARRANTIES.                   -->
      <DIV CLASS="c1">
         Mark R. Brown<BR>
         Open Market, Inc.<BR>
         <P>
            10 June 1996<BR>
         </P>
      </DIV>
      <P>
      </P>
      <H5 CLASS="c2">
         Copyright &copy; 1996 Open Market, Inc. 245 First Street, Cambridge, MA 02142 U.S.A.<BR>
         Tel: 617-621-9500 Fax: 617-621-1703 URL: <A HREF=
         "http://www.openmarket.com/">http://www.openmarket.com/</A><BR>
         $Id: fcgi-perf.htm,v 1.4 2002/02/25 00:42:59 robs Exp $<BR>
      </H5>
      <HR>
      <UL TYPE="square">
         <LI>
            <A HREF="#S1">1. Introduction</A>
         </LI>
         <LI>
            <A HREF="#S2">2. Performance Basics</A>
         </LI>
         <LI>
            <A HREF="#S3">3. Caching</A>
         </LI>
         <LI>
            <A HREF="#S4">4. Database Access</A>
         </LI>
         <LI>
            <A HREF="#S5">5. A Performance Test</A> 
            <UL TYPE="square">
               <LI>
                  <A HREF="#S5.1">5.1 Application Scenario</A>
               </LI>
               <LI>
                  <A HREF="#S5.2">5.2 Application Design</A>
               </LI>
               <LI>
                  <A HREF="#S5.3">5.3 Test Conditions</A>
               </LI>
               <LI>
                  <A HREF="#S5.4">5.4 Test Results and Discussion</A>
               </LI>
            </UL>
         </LI>
         <LI>
            <A HREF="#S6">6. Multi-threaded APIs</A>
         </LI>
         <LI>
            <A HREF="#S7">7. Conclusion</A>
         </LI>
      </UL>
      <P>
      </P>
      <HR>
      <H3>
         <A NAME="S1">1. Introduction</A>
      </H3>
      <P>
         Just how fast is FastCGI? How does the performance of a FastCGI application compare with the performance of
         the same application implemented using a Web server API?
      </P>
      <P>
         Of course, the answer is that it depends upon the application. A more complete answer is that FastCGI often
         wins by a significant margin, and seldom loses by very much.
      </P>
      <P>
         Papers on computer system performance can be laden with complex graphs showing how this varies with that.
         Seldom do the graphs shed much light on <I>why</I> one system is faster than another. Advertising copy is
         often even less informative. An ad from one large Web server vendor says that its server &quot;executes web
         applications up to five times faster than all other servers,&quot; but the ad gives little clue where the
         number &quot;five&quot; came from.
      </P>
      <P>
         This paper is meant to convey an understanding of the primary factors that influence the performance of Web
         server applications and to show that architectural differences between FastCGI and server APIs often give an
         &quot;unfair&quot; performance advantage to FastCGI applications. We run a test that shows a FastCGI
         application running three times faster than the corresponding Web server API application. Under different
         conditions this factor might be larger or smaller. We show you what you&#39;d need to measure to figure that
         out for the situation you face, rather than just saying &quot;we&#39;re three times faster&quot; and moving
         on.
      </P>
      <P>
         This paper makes no attempt to prove that FastCGI is better than Web server APIs for every application. Web
         server APIs enable lightweight protocol extensions, such as Open Market&#39;s SecureLink extension, to be
         added to Web servers, as well as allowing other forms of server customization. But APIs are not well matched
         to mainstream applications such as personalized content or access to corporate databases, because of API
         drawbacks including high complexity, low security, and limited scalability. FastCGI shines when used for the
         vast majority of Web applications.
      </P>
      <P>
      </P>
      <H3>
         <A NAME="S2">2. Performance Basics</A>
      </H3>
      <P>
         Since this paper is about performance we need to be clear on what &quot;performance&quot; is.
      </P>
      <P>
         The standard way to measure performance in a request-response system like the Web is to measure peak request
         throughput subject to a response time constraint. For instance, a Web server application might be capable of
         performing 20 requests per second while responding to 90% of the requests in less than 2 seconds.
      </P>
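      <P>
         As a minimal sketch of that definition, the check below finds the highest request rate whose measured
         response times still satisfy the constraint. The function names and the sample measurements are
         hypothetical and exist only for illustration; the 90% and 2 second figures come from the example above.
      </P>
<PRE>
```python
def meets_constraint(response_times, limit_seconds=2.0, fraction=0.90):
    """True if at least `fraction` of the requests took less than the limit."""
    if not response_times:
        return False
    within = sum(1 for t in response_times if limit_seconds > t)
    return within >= fraction * len(response_times)


def peak_throughput(trials):
    """Highest request rate whose response times meet the constraint.

    `trials` maps a request rate (requests per second) to the response
    times (seconds) observed while the system ran at that rate.
    """
    passing = [rate for rate, times in trials.items() if meets_constraint(times)]
    return max(passing) if passing else None


# Hypothetical measurements, not data from this paper.
trials = {
    10: [0.2] * 10,
    20: [0.5] * 9 + [3.0],
    30: [1.0] * 5 + [4.0] * 5,
}
print(peak_throughput(trials))
```
</PRE>
      <P>
         With these made-up numbers the run at 30 requests per second fails the constraint, so the reported peak
         throughput is the 20 requests per second run.
      </P>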
      <P>
         Response time is a thorny thing to measure on the Web because client communications links to the Internet have
         widely varying bandwidth. If the client is slow to read the server&#39;s response, response time at both the
         client and the server will go up, and there&#39;s nothing the server can do about it. For the purposes of
         making repeatable measurements the client should have a high-bandwidth communications link to the server.
      </P>
      <P>
         [Footnote: When designing a Web server application that will be accessed over slow (e.g. 14.4 or even 28.8
         kilobit/second modem) channels, pay attention to the simultaneous connections bottleneck. Some servers are
         limited by design to only 100 or 200 simultaneous connections. If your application sends 50 kilobytes of data
         to a typical client that can read 2 kilobytes per second, then a request takes 25 seconds to complete. If your
         server is limited to 100 simultaneous connections, throughput is limited to just 4 requests per second.]
      </P>
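      <P>
         The arithmetic in the footnote can be written out as a short calculation. The function name is an
         illustrative assumption; the figures are the ones quoted in the footnote.
      </P>
<PRE>
```python
def connection_limited_throughput(response_kilobytes,
                                  client_kilobytes_per_second,
                                  max_connections):
    """Requests per second when every connection is held by a slow reader."""
    seconds_per_request = response_kilobytes / client_kilobytes_per_second
    return max_connections / seconds_per_request


# 50 kilobytes at 2 kilobytes per second holds a connection for 25 seconds,
# so 100 connections complete 100 / 25 = 4 requests per second.
print(connection_limited_throughput(50, 2, 100))  # 4.0
```
</PRE>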
      <P>
         Response time is seldom an issue when load is light, but response times rise quickly as the system approaches
         a bottleneck on some limited resource. The three resources that typical systems run out of are network I/O,
         disk I/O, and processor time. If short response time is a goal, it is a good idea to stay at or below 50% load
         on each of these resources. For instance, if your disk subsystem is capable of delivering 200 I/Os per second,
         then try to run your application at 100 I/Os per second to avoid having the disk subsystem contribute to slow
         response times. Through careful management it is possible to succeed in running closer to the edge, but
         careful management is both difficult and expensive, so few systems get it.
      </P>
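      <P>
         One way to see why response times climb near a bottleneck is a simple queueing model. The sketch below
         assumes a single-server M/M/1 queue, which is an idealization and not something measured in this paper;
         under that assumption the mean response time is the service time divided by one minus the utilization.
         The 200 I/Os per second figure is the disk example from the paragraph above.
      </P>
<PRE>
```python
def mean_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue: service_time / (1 - utilization)."""
    if 0 > utilization or utilization >= 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return service_time / (1 - utilization)


# A disk subsystem that can deliver 200 I/Os per second spends 5 msec on each.
DISK_CAPACITY = 200
SERVICE_MSEC = 1000 / DISK_CAPACITY

for ios_per_second in (50, 100, 150, 180):
    load = ios_per_second / DISK_CAPACITY
    print(ios_per_second, round(mean_response_time(SERVICE_MSEC, load), 1))
```
</PRE>
      <P>
         Under this model the mean response time at 50% load is twice the bare service time, and at 90% load it is
         ten times the service time, which is the motivation for keeping some headroom.
      </P>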
      <P>
         If a Web server application is local to the Web server machine, then its internal design has no impact on
         network I/O. Application design can have a big impact on usage of disk I/O and processor time.
      </P>
      <P>
      </P>
      <H3>
         <A NAME="S3">3. Caching</A>
      </H3>
      <P>
         It is a rare Web server application that doesn&#39;t run fast when all the information it needs is available
         in its memory. And if the application doesn&#39;t run fast under those conditions, the possible solutions are
         evident: Tune the processor-hungry parts of the application, install a faster processor, or change the
         application&#39;s functional specification so it doesn&#39;t need to do so much work.
      </P>
      <P>
         The way to make information available in memory is by caching. A cache is an in-memory data structure that
         contains information that&#39;s been read from its permanent home on disk. When the application needs
         information, it consults the cache, and uses the information if it is there. Otherwise it reads the
         information from disk and places a copy in the cache. If the cache is full, the application discards some old
         information before adding the new. When the application needs to change cached information, it changes both
         the cache entry and the information on disk. That way, if the application crashes, no information is lost; the
         application just runs more slowly for a while after restarting, because the cache doesn&#39;t improve
         performance when it is empty.
      </P>
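      <P>
         The cache behavior just described can be sketched in a few lines. This is an illustration under stated
         assumptions, not code from any FastCGI library: the class name is hypothetical, a dictionary stands in
         for the permanent home on disk, and least-recently-used eviction is one possible choice for &quot;discards
         some old information&quot;.
      </P>
<PRE>
```python
from collections import OrderedDict


class WriteThroughCache:
    """In-memory cache in front of a slower permanent store."""

    def __init__(self, store, capacity):
        self.store = store            # stand-in for the data on disk
        self.capacity = capacity      # maximum number of cached entries
        self.entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key in self.entries:       # hit: use the cached copy
            self.entries.move_to_end(key)
            return self.entries[key]
        value = self.store[key]       # miss: read from the permanent store
        self._insert(key, value)
        return value

    def put(self, key, value):
        self.store[key] = value       # change the permanent copy...
        self._insert(key, value)      # ...and the cache entry

    def _insert(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # discard the oldest entry
        self.entries[key] = value
        self.entries.move_to_end(key)


disk = {"a": 1, "b": 2, "c": 3}
cache = WriteThroughCache(disk, capacity=2)
cache.get("a")
cache.get("b")
cache.get("c")      # cache is full, so "a" is discarded
cache.put("b", 20)  # updates both the cache and the store
```
</PRE>
      <P>
         Because every change goes to the store as well as the cache, discarding the whole cache (as a crash
         would) loses no information.
      </P>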
      <P>
         Caching can reduce both disk I/O and processor time, because reading information from disk uses more processor
         time than reading it from the cache. Because caching addresses both of the potential bottlenecks, it is the
         focal point of high-performance Web server application design. CGI applications couldn&#39;t perform in-memory
         caching, because they exited after processing just one request. Web server APIs promised to solve this
         problem. But how effective is the solution?
      </P>
      <P>
         Today&#39;s most widely deployed Web server APIs are based on a pool-of-processes server model. The Web server
         consists of a parent process and a pool of child processes. Processes do not share memory. An incoming request
         is assigned to an idle child at random. The child runs the request to completion before accepting a new
         request. A typical server has 32 child processes; a large server has 100 or 200.
      </P>
      <P>
         In-memory caching works very poorly in this server model because processes do not share memory and incoming
         requests are assigned to processes at random. For instance, to keep a frequently-used file available in memory
         the server must keep a file copy per child, which wastes memory. When the file is modified all the children
         need to be notified, which is complex (the APIs don&#39;t provide a way to do it).
      </P>
      <P>
         FastCGI is designed to allow effective in-memory caching. Requests are routed from any child process to a
         FastCGI application server. The FastCGI application process maintains an in-memory cache.
      </P>
      <P>
         In some cases a single FastCGI application server won&#39;t provide enough performance. FastCGI provides two
         solutions: session affinity and multi-threading.
      </P>
      <P>
         With session affinity you run a pool of application processes and the Web server routes requests to individual
         processes based on any information contained in the request. For instance, the server can route according to
         the area of content that&#39;s been requested, or according to the user. The user might be identified by an
         application-specific session identifier, by the user ID contained in an Open Market Secure Link ticket, by the
         Basic Authentication user name, or whatever. Each process maintains its own cache, and session affinity
         ensures that each incoming request has access to the cache that will speed up processing the most.
      </P>
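      <P>
         A minimal sketch of affinity routing follows. It assumes the routing key has already been extracted from
         the request and that a stable hash of the key picks the process; the function name and the hashing scheme
         are illustrative assumptions, not the algorithm used by any particular Web server.
      </P>
<PRE>
```python
import zlib


def pick_process(affinity_key, pool_size):
    """Map a routing key (for example a user id) to a process index.

    crc32 is used because it gives the same value on every run, so a
    given key always lands on the same process and finds its cache there.
    """
    return zlib.crc32(affinity_key.encode("utf-8")) % pool_size


POOL_SIZE = 4  # hypothetical number of application processes
for user in ("alice", "bob", "alice"):
    print(user, pick_process(user, POOL_SIZE))
```
</PRE>
      <P>
         Both requests for the same user print the same index, which is the property session affinity relies on.
      </P>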
      <P>
         With multi-threading you run an application process that is designed to handle several requests at the same
         time. The threads handling concurrent requests share process memory, so they all have access to the same
         cache. Multi-threaded programming is complex -- concurrency makes programs difficult to test and debug -- but
         with FastCGI you can write single threaded <I>or</I> multithreaded applications.
      </P>
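      <P>
         To illustrate the shared cache, the sketch below has several threads consult one cache guarded by a lock.
         The class and the loader function are hypothetical, and holding the lock while loading is the simplest
         correct choice rather than the fastest one.
      </P>
<PRE>
```python
import threading


class SharedCache:
    """One cache shared by all request-handling threads in a process."""

    def __init__(self, load):
        self._load = load              # called on a miss to fetch the value
        self._lock = threading.Lock()  # protects the dictionary below
        self._entries = {}

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                self._entries[key] = self._load(key)
            return self._entries[key]


loads = []


def load_profile(user_id):
    """Stand-in for a slow read; records each call so misses can be counted."""
    loads.append(user_id)
    return {"user": user_id}


cache = SharedCache(load_profile)
threads = [threading.Thread(target=cache.get, args=("u1",)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(loads))  # 1: only the first thread had to load the profile
```
</PRE>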
      <P>
      </P>
      <H3>
         <A NAME="S4">4. Database Access</A>
      </H3>
      <P>
         Many Web server applications perform database access. Existing databases contain a lot of valuable
         information; Web server applications allow companies to give wider access to the information.
      </P>
      <P>
         Access to database management systems, even within a single machine, is via connection-oriented protocols. An
         application &quot;logs in&quot; to a database, creating a connection, then performs one or more accesses.
         Frequently, the cost of creating the database connection is several times the cost of accessing data over an
         established connection.
      </P>
      <P>
         To a first approximation database connections are just another type of state to be cached in memory by an
         application, so the discussion of caching above applies to caching database connections.
      </P>
      <P>
         But database connections are special in one respect: They are often the basis for database licensing. You pay
         the database vendor according to the number of concurrent connections the database system can sustain. A
         100-connection license costs much more than a 5-connection license. It follows that caching a database
         connection per Web server child process is not just wasteful of the system&#39;s hardware resources, it could
         break your software budget.
      </P>
      <P>
      </P>
      <H3>
         <A NAME="S5">5. A Performance Test</A>
      </H3>
      <P>
         We designed a test application to illustrate performance issues. The application represents a class of
         applications that deliver personalized content. The test application is quite a bit simpler than any real
         application would be, but still illustrates the main performance issues. We implemented the application using
         both FastCGI and a current Web server API, and measured the performance of each.
      </P>
      <P>
      </P>
      <H4>
         <A NAME="S5.1">5.1 Application Scenario</A>
      </H4>
      <P>
         The application is based on a user database and a set of content files. When a user requests a content file,
         the application performs substitutions in the file using information from the user database. The application
         then returns the modified content to the user.
      </P>
      <P>
         Each request accomplishes the following:
      </P>
      <P>
      </P>
      <OL>
         <LI>
            authentication check: The user id is used to retrieve and check the password.
            <P>
            </P>
         </LI>
         <LI>
            attribute retrieval: The user id is used to retrieve all of the user&#39;s attribute values.
            <P>
            </P>
         </LI>
         <LI>
            file retrieval and filtering: The request identifies a content file. This file is read and all occurrences
            of variable names are replaced with the user&#39;s corresponding attribute values. The modified HTML is
            returned to the user.<BR>
            <BR>
         </LI>
      </OL>
      <P>
         Of course, it is fair game to perform caching to shortcut any of these steps.
      </P>
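      <P>
         The three steps can be sketched as a single function. Everything concrete here is an assumption made for
         illustration: dictionaries stand in for the user database and the content files, passwords are compared as
         plain strings, and variables in a content file are written as <I>$name</I>, a syntax the scenario does not
         specify.
      </P>
<PRE>
```python
from string import Template


def handle_request(users, files, user_id, password, file_name):
    """Return the personalized content, or None if authentication fails."""
    record = users.get(user_id)

    # 1. Authentication check: retrieve and check the password.
    if record is None or record["password"] != password:
        return None

    # 2. Attribute retrieval: fetch all of the attribute values for the user.
    attributes = record["attributes"]

    # 3. File retrieval and filtering: substitute attribute values for
    #    variable names; unknown variables are left untouched.
    return Template(files[file_name]).safe_substitute(attributes)


# Hypothetical data, much smaller than the records described in the scenario.
users = {"u1": {"password": "pw", "attributes": {"name": "Ada", "city": "Boston"}}}
files = {"home.html": "Hello $name from $city"}

print(handle_request(users, files, "u1", "pw", "home.html"))
```
</PRE>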
      <P>
         Each user&#39;s database record (including password and attribute values) is approximately 100 bytes long.
         Each content file is 3,000 bytes long. Both database and content files are stored on disks attached to the
         server platform.
      </P>
      <P>
         A typical user makes 10 file accesses with realistic think times (30-60 seconds) between accesses, then
         disappears for a long time.
      </P>
      <P>
      </P>
      <H4>
         <A NAME="S5.2">5.2 Application Design</A>
      </H4>
      <P>
         The FastCGI application maintains a cache of recently-accessed attribute values from the database. When the
         cache misses, the application reads from the database. Because only a small number of FastCGI application
         processes are needed, each process opens a database connection on startup and keeps it open.
      </P>
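      <P>
         A rough sketch of that design is shown below. It is not the test application itself: the class name, the
         table layout, and the use of an in-memory SQLite database are all assumptions chosen so that the example
         is self-contained. The point is that the connection is opened once and the database is read only on a
         cache miss.
      </P>
<PRE>
```python
import sqlite3


class AttributeService:
    """Long-lived application process state: one connection plus a cache."""

    def __init__(self, connect):
        self.connection = connect()  # opened once, at startup
        self.cache = {}              # user id mapped to attribute string
        self.database_reads = 0

    def attributes(self, user_id):
        if user_id not in self.cache:  # cache miss: read from the database
            self.database_reads += 1
            row = self.connection.execute(
                "SELECT attributes FROM users WHERE id = ?", (user_id,)
            ).fetchone()
            self.cache[user_id] = row[0] if row else None
        return self.cache[user_id]


opened = []


def connect():
    """Create a tiny stand-in database and record that a connection opened."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE users (id TEXT PRIMARY KEY, attributes TEXT)")
    connection.execute("INSERT INTO users VALUES ('u1', 'name=Ada')")
    opened.append(connection)
    return connection


service = AttributeService(connect)
for _ in range(10):  # ten requests from the same user
    service.attributes("u1")
print(len(opened), service.database_reads)  # 1 1
```
</PRE>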
      <P>
         The FastCGI application is configured as multiple application processes. This is desirable in order to get
         concurrent application processing during database reads and file reads. Requests are routed to these
         application processes using FastCGI session affinity keyed on the user id. This way, all of a user&#39;s
         requests after the first one hit in the application&#39;s cache.
      </P>
      <P>
         The API application does not maintain a cache; the API application has no way to share the cache among its
         processes, so the cache hit rate would be too low to make caching pay. The API application opens and closes a
         database connection on every request; keeping database connections open between requests would result in an
         unrealistically large number of database connections open at the same time, and very low utilization of each
         connection.
      </P>
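      <P>
         A back-of-the-envelope estimate shows how low that hit rate would be. The sketch assumes requests are
         assigned to child processes uniformly at random, that each child keeps its own cache and never evicts
         during a session, and it uses the figures given earlier: 32 child processes for a typical server and 10
         accesses per user. The function names are illustrative.
      </P>
<PRE>
```python
def random_assignment_hit_rate(children, requests_per_user):
    """Expected per-user hit rate when each child has a private cache.

    A request misses the first time the user reaches a given child. The
    expected number of distinct children reached in k requests is
    n * (1 - (1 - 1/n) ** k), and each of those costs one miss.
    """
    distinct = children * (1 - (1 - 1 / children) ** requests_per_user)
    return 1 - distinct / requests_per_user


def affinity_hit_rate(requests_per_user):
    """With session affinity only the first request of a session misses."""
    return 1 - 1 / requests_per_user


print(round(random_assignment_hit_rate(32, 10), 2))
print(round(affinity_hit_rate(10), 2))
```
</PRE>
      <P>
         Under these assumptions roughly one request in eight would hit a per-child cache, compared with nine in
         ten when session affinity sends every request from a user to the same process.
      </P>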
      <P>
      </P>
      <H4>
         <A NAME="S5.3">5.3 Test Conditions</A>
      </H4>
      <P>
         The test load is generated by 10 HTTP client processes. The processes represent disjoint sets of users. A
         process makes a request for a user, then a request for a different user, and so on until it is time for the
         first user to make another request.
      </P>
      <P>
         For simplicity the 10 client processes run on the same machine as the Web server. This avoids the possibility
         that a network bottleneck will obscure the test results. The database system also runs on this machine, as
         specified in the application scenario.
      </P>
      <P>
         Response time is not an issue under the test conditions. We just measure throughput.
      </P>
      <P>
         The API Web server in these tests is Netscape 1.1.
      </P>
      <P>
      </P>
      <H4>
         <A NAME="S5.4">5.4 Test Results and Discussion</A>
      </H4>
      <P>
         Here are the test results:
      </P>
      <P>
      </P>
      <DIV CLASS="c3">
<PRE>
    FastCGI  12.0 msec per request = 83 requests per second
    API      36.6 msec per request = 27 requests per second
</PRE>
      </DIV>
      <P>
         Given the big architectural advantage that the FastCGI application enjoys over the API application, it is not
         surprising that the FastCGI application runs a lot faster. To gain a deeper understanding of these results we
         measured two more conditions:
      </P>
      <P>
      </P>
      <UL>
         <LI>
            API with sustained database connections. If you could afford the extra licensing cost, how much faster
            would your API application run?
            <P>
            </P>
<PRE>
    API      16.0 msec per request = 61 requests per second
</PRE>
            Answer: Still not as fast as the FastCGI application.
            <P>
            </P>
         </LI>
         <LI>
            FastCGI with cache disabled. How much benefit does the FastCGI application get from its cache?
            <P>
            </P>
<PRE>
    FastCGI  20.1 msec per request = 50 requests per second
</PRE>
            Answer: A very substantial benefit, even though the database access is quite simple.<BR>
            <BR>
         </LI>
      </UL>
      <P>
         What these two extra experiments show is that if the API and FastCGI applications are implemented in exactly
         the same way -- caching database connections but not caching user profile data -- the API application is
         slightly faster. This is what you&#39;d expect, since the FastCGI application has to pay the cost of
         inter-process communication not present in the API application.
      </P>
      <P>
         In the real world the two applications would not be implemented in the same way. FastCGI&#39;s architectural
         advantage results in much higher performance -- a factor of 3 in this test. With a remote database or more
         expensive database access the factor would be higher. With more substantial processing of the content files
         the factor would be smaller.
      </P>
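      <P>
         The ratios discussed above follow directly from the measured times per request. The short calculation
         below uses only the four timings reported in this section; the dictionary keys and the function name are
         labels invented for the example.
      </P>
<PRE>
```python
# Milliseconds per request, as reported in the test results.
MSEC_PER_REQUEST = {
    "fastcgi": 12.0,
    "api": 36.6,
    "api_sustained_connections": 16.0,
    "fastcgi_cache_disabled": 20.1,
}


def speedup(slower, faster):
    """How many times faster `faster` is than `slower`."""
    return MSEC_PER_REQUEST[slower] / MSEC_PER_REQUEST[faster]


# FastCGI as designed versus the API application as designed.
print(speedup("api", "fastcgi"))

# Same implementation on both sides: connections kept open, no data cache.
print(speedup("fastcgi_cache_disabled", "api_sustained_connections"))

# Benefit the FastCGI application gets from its cache.
print(speedup("fastcgi_cache_disabled", "fastcgi"))
```
</PRE>
      <P>
         The first ratio is the factor of about 3 quoted above. The second, about 1.26, is the price of
         inter-process communication when both applications are built the same way. The third, about 1.7, is the
         benefit of the cache.
      </P>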
      <P>
      </P>
      <H3>
         <A NAME="S6">6. Multi-threaded APIs</A>
      </H3>
      <P>
         Web servers with a multi-threaded internal structure (and APIs to match) are now starting to become more
         common. These servers don&#39;t have all of the disadvantages described in Section 3. Does this mean that
         FastCGI&#39;s performance advantages will disappear?
      </P>
      <P>
         A superficial analysis says yes. An API-based application in a single-process, multi-threaded server can
         maintain caches and database connections the same way a FastCGI application can. The API-based application
         does not pay for inter-process communication, so the API-based application will be slightly faster than the
         FastCGI application.
      </P>
      <P>
         A deeper analysis says no. Multi-threaded programming is complex, because concurrency makes programs much more
         difficult to test and debug. In the case of multi-threaded programming to Web server APIs, the normal problems
         with multi-threading are compounded by the lack of isolation between different applications and between the
         applications and the Web server. With FastCGI you can write programs in the familiar single-threaded style,
         get all the reliability and maintainability of process isolation, and still get very high performance. If you
         truly need multi-threading, you can write multi-threaded FastCGI and still isolate your multi-threaded
         application from other applications and from the server. In short, multi-threading makes Web server APIs
         unusable for practically all applications, reducing the choice to FastCGI versus CGI. The performance winner in
         that contest is obviously FastCGI.
      </P>
      <P>
      </P>
      <H3>
         <A NAME="S7">7. Conclusion</A>
      </H3>
      <P>
         Just how fast is FastCGI? The answer: very fast indeed. Not because it has some specially-greased path through
         the operating system, but because its design is well matched to the needs of most applications. We invite you
         to make FastCGI the fast, open foundation for your Web server applications.
      </P>
      <P>
      </P>
      <HR>
      <A HREF="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></A> 
      <ADDRESS>
         &copy; 1995, Open Market, Inc. / mbrown@openmarket.com
      </ADDRESS>
   </BODY>
</HTML>
Commit	Line	Data
852467e2	1	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
	2	<HTML>
	3	<HEAD>
	4	<TITLE>
	5	Understanding FastCGI Application Performance
	6	</TITLE>
	7	<STYLE TYPE="text/css">
	8	body {
	9	background-color: #FFFFFF;
	10	color: #000000;
	11	}
	12	:link { color: #cc0000 }
	13	:visited { color: #555555 }
	14	:active { color: #000011 }
	15	div.c3 {margin-left: 2em}
	16	h5.c2 {text-align: center}
	17	div.c1 {text-align: center}
	18	</STYLE>
	19	</HEAD>
	20	<BODY>
	21	<DIV CLASS="c1">
	22	<A HREF="http://fastcgi.com"><IMG BORDER="0" SRC="../images/fcgi-hd.gif" ALT="[[FastCGI]]"></A>
	23	</DIV>
	24	<BR CLEAR="all">
	25	<DIV CLASS="c1">
	26	<H3>
	27	Understanding FastCGI Application Performance
	28	</H3>
	29	</DIV>
	30	<!--Copyright (c) 1996 Open Market, Inc. -->
	31	<!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
	32	<!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. -->
	33	<DIV CLASS="c1">
	34	Mark R. Brown<BR>
	35	Open Market, Inc.<BR>
	36	<P>
	37	10 June 1996<BR>
	38	</P>
	39	</DIV>
	40	<P>
	41	</P>
	42	<H5 CLASS="c2">
	43	Copyright © 1996 Open Market, Inc. 245 First Street, Cambridge, MA 02142 U.S.A.<BR>
	44	Tel: 617-621-9500 Fax: 617-621-1703 URL: <A HREF=
	45	"http://www.openmarket.com/">http://www.openmarket.com/</A><BR>
	46	$Id: fcgi-perf.htm,v 1.4 2002/02/25 00:42:59 robs Exp $<BR>
	47	</H5>
	48	<HR>
	49	<UL TYPE="square">
	50	<LI>
	51	<A HREF="#S1">1. Introduction</A>
	52	</LI>
	53	<LI>
	54	<A HREF="#S2">2. Performance Basics</A>
	55	</LI>
	56	<LI>
	57	<A HREF="#S3">3. Caching</A>
	58	</LI>
	59	<LI>
	60	<A HREF="#S4">4. Database Access</A>
	61	</LI>
	62	<LI>
	63	<A HREF="#S5">5. A Performance Test</A>
	64	<UL TYPE="square">
65	<LI>
66	<A HREF="#S5.1">5.1 Application Scenario</A>
67	</LI>
68	<LI>
69	<A HREF="#S5.2">5.2 Application Design</A>
70	</LI>
71	<LI>
72	<A HREF="#S5.3">5.3 Test Conditions</A>
73	</LI>
74	<LI>
75	<A HREF="#S5.4">5.4 Test Results and Discussion</A>
76	</LI>
77	</UL>
78	</LI>
79	<LI>
80	<A HREF="#S6">6. Multi-threaded APIs</A>
81	</LI>
82	<LI>
83	<A HREF="#S7">7. Conclusion</A>
84	</LI>
85	</UL>
86	<P>
87	</P>
88	<HR>
89	<H3>
90	<A NAME="S1">1. Introduction</A>
91	</H3>
92	<P>
93	Just how fast is FastCGI? How does the performance of a FastCGI application compare with the performance of
94	the same application implemented using a Web server API?
95	</P>
96	<P>
97	Of course, the answer is that it depends upon the application. A more complete answer is that FastCGI often
98	wins by a significant margin, and seldom loses by very much.
99	</P>
100	<P>
101	Papers on computer system performance can be laden with complex graphs showing how this varies with that.
102	Seldom do the graphs shed much light on <I>why</I> one system is faster than another. Advertising copy is
103	often even less informative. An ad from one large Web server vendor says that its server "executes web
104	applications up to five times faster than all other servers," but the ad gives little clue where the
105	number "five" came from.
106	</P>
107	<P>
108	This paper is meant to convey an understanding of the primary factors that influence the performance of Web
109	server applications and to show that architectural differences between FastCGI and server APIs often give an
110	"unfair" performance advantage to FastCGI applications. We run a test that shows a FastCGI
111	application running three times faster than the corresponding Web server API application. Under different
112	conditions this factor might be larger or smaller. We show you what you'd need to measure to figure that
113	out for the situation you face, rather than just saying "we're three times faster" and moving
114	on.
115	</P>
116	<P>
117	This paper makes no attempt to prove that FastCGI is better than Web server APIs for every application. Web
118	server APIs enable lightweight protocol extensions, such as Open Market's SecureLink extension, to be
119	added to Web servers, as well as allowing other forms of server customization. But APIs are not well matched
120	to mainstream applications such as personalized content or access to corporate databases, because of API
121	drawbacks including high complexity, low security, and limited scalability. FastCGI shines when used for the
122	vast majority of Web applications.
123	</P>
124	<P>
125	</P>
126	<H3>
127	<A NAME="S2">2. Performance Basics</A>
128	</H3>
129	<P>
130	Since this paper is about performance we need to be clear on what "performance" is.
131	</P>
132	<P>
133	The standard way to measure performance in a request-response system like the Web is to measure peak request
134	throughput subject to a response time constriaint. For instance, a Web server application might be capable of
135	performing 20 requests per second while responding to 90% of the requests in less than 2 seconds.
136	</P>
137	<P>
138	Response time is a thorny thing to measure on the Web because client communications links to the Internet have
139	widely varying bandwidth. If the client is slow to read the server's response, response time at both the
140	client and the server will go up, and there's nothing the server can do about it. For the purposes of
141	making repeatable measurements the client should have a high-bandwidth communications link to the server.
142	</P>
143	<P>
144	[Footnote: When designing a Web server application that will be accessed over slow (e.g. 14.4 or even 28.8
145	kilobit/second modem) channels, pay attention to the simultaneous connections bottleneck. Some servers are
146	limited by design to only 100 or 200 simultaneous connections. If your application sends 50 kilobytes of data
147	to a typical client that can read 2 kilobytes per second, then a request takes 25 seconds to complete. If your
148	server is limited to 100 simultaneous connections, throughput is limited to just 4 requests per second.]
149	</P>
150	<P>
151	Response time is seldom an issue when load is light, but response times rise quickly as the system approaches
152	a bottleneck on some limited resource. The three resources that typical systems run out of are network I/O,
153	disk I/O, and processor time. If short response time is a goal, it is a good idea to stay at or below 50% load
154	on each of these resources. For instance, if your disk subsystem is capable of delivering 200 I/Os per second,
155	then try to run your application at 100 I/Os per second to avoid having the disk subsystem contribute to slow
156	response times. Through careful management it is possible to succeed in running closer to the edge, but
157	careful management is both difficult and expensive so few systems get it.
158	</P>
159	<P>
160	If a Web server application is local to the Web server machine, then its internal design has no impact on
161	network I/O. Application design can have a big impact on usage of disk I/O and processor time.
162	</P>
163	<P>
164	</P>
165	<H3>
166	<A NAME="S3">3. Caching</A>
167	</H3>
168	<P>
169	It is a rare Web server application that doesn't run fast when all the information it needs is available
170	in its memory. And if the application doesn't run fast under those conditions, the possible solutions are
171	evident: Tune the processor-hungry parts of the application, install a faster processor, or change the
172	application's functional specification so it doesn't need to do so much work.
173	</P>
174	<P>
175	The way to make information available in memory is by caching. A cache is an in-memory data structure that
176	contains information that's been read from its permanent home on disk. When the application needs
177	information, it consults the cache, and uses the information if it is there. Otherwise is reads the
178	information from disk and places a copy in the cache. If the cache is full, the application discards some old
179	information before adding the new. When the application needs to change cached information, it changes both
180	the cache entry and the information on disk. That way, if the application crashes, no information is lost; the
181	application just runs more slowly for awhile after restarting, because the cache doesn't improve
182	performance when it is empty.
183	</P>
184	<P>
185	Caching can reduce both disk I/O and processor time, because reading information from disk uses more processor
186	time than reading it from the cache. Because caching addresses both of the potential bottlenecks, it is the
187	focal point of high-performance Web server application design. CGI applications couldn't perform in-memory
188	caching, because they exited after processing just one request. Web server APIs promised to solve this
189	problem. But how effective is the solution?
190	</P>
191	<P>
192	Today's most widely deployed Web server APIs are based on a pool-of-processes server model. The Web server
193	consists of a parent process and a pool of child processes. Processes do not share memory. An incoming request
194	is assigned to an idle child at random. The child runs the request to completion before accepting a new
195	request. A typical server has 32 child processes; a large server has 100 or 200.
196	</P>
197	<P>
198	In-memory caching works very poorly in this server model because processes do not share memory and incoming
199	requests are assigned to processes at random. For instance, to keep a frequently-used file available in memory
200	the server must keep a file copy per child, which wastes memory. When the file is modified, all the children
201	need to be notified, which is complex (the APIs don't provide a way to do it).
202	</P>
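<P>
The effect of random assignment on a per-process cache can be estimated with a little arithmetic. The Python
sketch below is a simplified model, not a measurement: it assumes each request goes to a uniformly random child,
that a child keeps a user's data cached once it has served that user, and that a user makes a fixed number of
requests. The function name and the figure of 10 requests per user are assumptions chosen for illustration.
</P>
<DIV CLASS="c3">
<PRE>
```python
def expected_hit_rate(children, requests_per_user):
    """Expected fraction of one user's requests that arrive at a child
    process which has already served (and cached data for) that user."""
    total = 0.0
    for k in range(1, requests_per_user + 1):
        # Chance that at least one of the k-1 earlier requests went to this child.
        total += 1.0 - (1.0 - 1.0 / children) ** (k - 1)
    return total / requests_per_user


for children in (1, 32, 100):
    print(children, round(expected_hit_rate(children, 10), 3))
```
</PRE>
</DIV>
<P>
Under these assumptions a single process sees a hit rate of 0.9 (every request after the first), while a pool
of 32 children sees roughly 0.13, and a larger pool does worse still.
</P>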
203	<P>
204	FastCGI is designed to allow effective in-memory caching. Requests are routed from any child process to a
205	FastCGI application server. The FastCGI application process maintains an in-memory cache.
206	</P>
207	<P>
208	In some cases a single FastCGI application server won't provide enough performance. FastCGI provides two
209	solutions: session affinity and multi-threading.
210	</P>
211	<P>
212	With session affinity you run a pool of application processes and the Web server routes requests to individual
213	processes based on any information contained in the request. For instance, the server can route according to
214	the area of content that's been requested, or according to the user. The user might be identified by an
215	application-specific session identifier, by the user ID contained in an Open Market Secure Link ticket, by the
216	Basic Authentication user name, or by some other identifier. Each process maintains its own cache, and session affinity
217	ensures that each incoming request has access to the cache that will speed up processing the most.
218	</P>
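<P>
One simple way to picture session affinity is a function that maps a session key to a process in the pool. The
Python sketch below is only an illustration of the idea; the function name <TT>route</TT>, the pool size, and the
use of a hash of the key are assumptions, and do not describe how any particular Web server implements routing.
</P>
<DIV CLASS="c3">
<PRE>
```python
import hashlib


def route(session_key, pool_size):
    """Map a session key (for example a user ID) to an application process index."""
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    # A stable hash sends the same key to the same process every time.
    return int.from_bytes(digest[:8], "big") % pool_size


POOL_SIZE = 4  # hypothetical number of application processes
for user in ("alice", "bob", "alice"):
    print(user, "goes to process", route(user, POOL_SIZE))
```
</PRE>
</DIV>
<P>
Because the mapping depends only on the key, repeated requests from one user always reach the process whose
cache already holds that user's data.
</P>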
219	<P>
220	With multi-threading you run an application process that is designed to handle several requests at the same
221	time. The threads handling concurrent requests share process memory, so they all have access to the same
222	cache. Multi-threaded programming is complex -- concurrency makes programs difficult to test and debug -- but
223	with FastCGI you can write single-threaded <I>or</I> multi-threaded applications.
224	</P>
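<P>
The shared-cache arrangement can be sketched in a few lines of Python. The <TT>SharedCache</TT> class and the
<TT>loader</TT> function are hypothetical names for this example, and holding one lock around the whole lookup is
the simplest safe choice rather than the fastest one.
</P>
<DIV CLASS="c3">
<PRE>
```python
import threading


class SharedCache:
    """One cache shared by all request-handling threads in a process."""

    def __init__(self, loader):
        self.loader = loader          # hypothetical function that reads from disk
        self.entries = {}
        self.lock = threading.Lock()  # guards entries and the load counter
        self.loads = 0

    def get(self, key):
        with self.lock:
            if key not in self.entries:
                self.loads += 1
                self.entries[key] = self.loader(key)
            return self.entries[key]


cache = SharedCache(loader=lambda key: key.upper())
results = []


def handle_request(key):
    results.append(cache.get(key))


threads = [threading.Thread(target=handle_request, args=(k,))
           for k in ("home", "news", "home", "news", "home")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
</PRE>
</DIV>
<P>
Five requests arrive on five threads, but each distinct key is loaded only once because all threads consult the
same cache. The lock is the price of sharing: without it, concurrent updates could corrupt the cache.
</P>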
225	<P>
226	</P>
227	<H3>
228	<A NAME="S4">4. Database Access</A>
229	</H3>
230	<P>
231	Many Web server applications perform database access. Existing databases contain a lot of valuable
232	information; Web server applications allow companies to give wider access to the information.
233	</P>
234	<P>
235	Access to database management systems, even within a single machine, is via connection-oriented protocols. An
236	application "logs in" to a database, creating a connection, then performs one or more accesses.
237	Frequently, the cost of creating the database connection is several times the cost of accessing data over an
238	established connection.
239	</P>
240	<P>
241	To a first approximation database connections are just another type of state to be cached in memory by an
242	application, so the discussion of caching above applies to caching database connections.
243	</P>
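<P>
A long-lived application process can treat its database connection like any other cached state: open it once
and reuse it. The Python sketch below shows the shape of this, using an in-memory <TT>sqlite3</TT> database
purely as a stand-in for a real database login; the class, table, and column names are assumptions for the
example.
</P>
<DIV CLASS="c3">
<PRE>
```python
import sqlite3


class Application:
    """Opens one database connection at startup and reuses it for every request."""

    def __init__(self):
        # sqlite3 in memory is only a stand-in for logging in to a real database.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
        self.connection.execute("INSERT INTO users VALUES ('u1', 'Pat')")
        self.connections_opened = 1

    def handle_request(self, user_id):
        # No login here: the request only pays for the data access itself.
        row = self.connection.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


app = Application()
names = [app.handle_request("u1") for _ in range(3)]
```
</PRE>
</DIV>
<P>
Three requests are served over a single connection. A process that exits after every request would instead pay
the connection cost three times.
</P>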
244	<P>
245	But database connections are special in one respect: They are often the basis for database licensing. You pay
246	the database vendor according to the number of concurrent connections the database system can sustain. A
247	100-connection license costs much more than a 5-connection license. It follows that caching a database
248	connection per Web server child process is not just wasteful of the system's hardware resources, it could
249	break your software budget.
250	</P>
251	<P>
252	</P>
253	<H3>
254	<A NAME="S5">5. A Performance Test</A>
255	</H3>
256	<P>
257	We designed a test application to illustrate performance issues. The application represents a class of
258	applications that deliver personalized content. The test application is quite a bit simpler than any real
259	application would be, but still illustrates the main performance issues. We implemented the application using
260	both FastCGI and a current Web server API, and measured the performance of each.
261	</P>
262	<P>
263	</P>
264	<H4>
265	<A NAME="S5.1">5.1 Application Scenario</A>
266	</H4>
267	<P>
268	The application is based on a user database and a set of content files. When a user requests a content file,
269	the application performs substitutions in the file using information from the user database. The application
270	then returns the modified content to the user.
271	</P>
272	<P>
273	Each request accomplishes the following:
274	</P>
275	<P>
276	</P>
277	<OL>
278	<LI>
279	authentication check: The user id is used to retrieve and check the password.
280	<P>
281	</P>
282	</LI>
283	<LI>
284	attribute retrieval: The user id is used to retrieve all of the user's attribute values.
285	<P>
286	</P>
287	</LI>
288	<LI>
289	file retrieval and filtering: The request identifies a content file. This file is read and all occurrences
290	of variable names are replaced with the user's corresponding attribute values. The modified HTML is
291	returned to the user.<BR>
292	<BR>
293	</LI>
294	</OL>
295	<P>
296	Of course, it is fair game to perform caching to shortcut any of these steps.
297	</P>
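<P>
The three steps can be sketched as a single request handler. This Python fragment is an illustration only, not
the test application itself: the dictionaries standing in for the user database and content files, the
<TT>$NAME</TT> variable syntax, and the function name are all assumptions.
</P>
<DIV CLASS="c3">
<PRE>
```python
USERS = {  # hypothetical user database: user id to password and attributes
    "u1": {"password": "secret", "attributes": {"NAME": "Pat", "CITY": "Boston"}},
}
FILES = {  # hypothetical content files
    "welcome.html": "Hello $NAME from $CITY!",
}


def handle_request(user_id, password, file_name):
    record = USERS.get(user_id)
    # 1. Authentication check.
    if record is None or record["password"] != password:
        return None
    # 2. Attribute retrieval.
    attributes = record["attributes"]
    # 3. File retrieval and filtering.
    content = FILES[file_name]
    for name, value in attributes.items():
        content = content.replace("$" + name, value)
    return content


print(handle_request("u1", "secret", "welcome.html"))
```
</PRE>
</DIV>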
298	<P>
299	Each user's database record (including password and attribute values) is approximately 100 bytes long.
300	Each content file is 3,000 bytes long. Both database and content files are stored on disks attached to the
301	server platform.
302	</P>
303	<P>
304	A typical user makes 10 file accesses with realistic think times (30-60 seconds) between accesses, then
305	disappears for a long time.
306	</P>
307	<P>
308	</P>
309	<H4>
310	<A NAME="S5.2">5.2 Application Design</A>
311	</H4>
312	<P>
313	The FastCGI application maintains a cache of recently-accessed attribute values from the database. When the
314	cache misses, the application reads from the database. Because only a small number of FastCGI application
315	processes are needed, each process opens a database connection on startup and keeps it open.
316	</P>
317	<P>
318	The FastCGI application is configured as multiple application processes. This is desirable in order to get
319	concurrent application processing during database reads and file reads. Requests are routed to these
320	application processes using FastCGI session affinity keyed on the user id. This way, every request from a user
321	after the first one hits in the application's cache.
322	</P>
323	<P>
324	The API application does not maintain a cache; the API application has no way to share the cache among its
325	processes, so the cache hit rate would be too low to make caching pay. The API application opens and closes a
326	database connection on every request; keeping database connections open between requests would result in an
327	unrealistically large number of database connections open at the same time, and very low utilization of each
328	connection.
329	</P>
330	<P>
331	</P>
332	<H4>
333	<A NAME="S5.3">5.3 Test Conditions</A>
334	</H4>
335	<P>
336	The test load is generated by 10 HTTP client processes. The processes represent disjoint sets of users. A
337	process makes a request for a user, then a request for a different user, and so on until it is time for the
338	first user to make another request.
339	</P>
340	<P>
341	For simplicity the 10 client processes run on the same machine as the Web server. This avoids the possibility
342	that a network bottleneck will obscure the test results. The database system also runs on this machine, as
343	specified in the application scenario.
344	</P>
345	<P>
346	Response time is not an issue under the test conditions. We just measure throughput.
347	</P>
348	<P>
349	The API Web server in these tests is Netscape 1.1.
350	</P>
351	<P>
352	</P>
353	<H4>
354	<A NAME="S5.4">5.4 Test Results and Discussion</A>
355	</H4>
356	<P>
357	Here are the test results:
358	</P>
359	<P>
360	</P>
361	<DIV CLASS="c3">
362	<PRE>
363	FastCGI 12.0 msec per request = 83 requests per second
364	API 36.6 msec per request = 27 requests per second
365	</PRE>
366	</DIV>
367	<P>
368	Given the big architectural advantage that the FastCGI application enjoys over the API application, it is not
369	surprising that the FastCGI application runs a lot faster. To gain a deeper understanding of these results we
370	measured two more conditions:
371	</P>
372	<P>
373	</P>
374	<UL>
375	<LI>
376	API with sustained database connections. If you could afford the extra licensing cost, how much faster
377	would your API application run?
378	<P>
379	</P>
380	<PRE>
381	API 16.0 msec per request = 61 requests per second
382	</PRE>
383	Answer: Still not as fast as the FastCGI application.
384	<P>
385	</P>
386	</LI>
387	<LI>
388	FastCGI with cache disabled. How much benefit does the FastCGI application get from its cache?
389	<P>
390	</P>
391	<PRE>
392	FastCGI 20.1 msec per request = 50 requests per second
393	</PRE>
394	Answer: A very substantial benefit, even though the database access is quite simple.<BR>
395	<BR>
396	</LI>
397	</UL>
398	<P>
399	What these two extra experiments show is that if the API and FastCGI applications are implemented in exactly
400	the same way -- caching database connections but not caching user profile data -- the API application is
401	slightly faster. This is what you'd expect, since the FastCGI application has to pay the cost of
402	inter-process communication not present in the API application.
403	</P>
404	<P>
405	In the real world the two applications would not be implemented in the same way. FastCGI's architectural
406	advantage results in much higher performance -- a factor of 3 in this test. With a remote database or more
407	expensive database access the factor would be higher. With more substantial processing of the content files
408	the factor would be smaller.
409	</P>
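<P>
The relationship between the reported times and rates is simple arithmetic, reproduced in the Python sketch
below from the figures given in this section. Recomputing from the rounded times gives 62.5 requests per second
for the API with sustained connections rather than the reported 61; the reported rate may have been derived
from unrounded measurements.
</P>
<DIV CLASS="c3">
<PRE>
```python
MSEC_PER_REQUEST = {  # times reported in the test results
    "FastCGI": 12.0,
    "API": 36.6,
    "API, sustained connections": 16.0,
    "FastCGI, cache disabled": 20.1,
}


def requests_per_second(msec_per_request):
    return 1000.0 / msec_per_request


for label, msec in MSEC_PER_REQUEST.items():
    print(label, round(requests_per_second(msec), 1))

# The "factor of 3" is the ratio of the two headline times.
speedup = MSEC_PER_REQUEST["API"] / MSEC_PER_REQUEST["FastCGI"]
print("FastCGI speedup over API:", round(speedup, 2))
```
</PRE>
</DIV>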
410	<P>
411	</P>
412	<H3>
413	<A NAME="S6">6. Multi-threaded APIs</A>
414	</H3>
415	<P>
416	Web servers with a multi-threaded internal structure (and APIs to match) are now starting to become more
417	common. These servers don't have all of the disadvantages described in Section 3. Does this mean that
418	FastCGI's performance advantages will disappear?
419	</P>
420	<P>
421	A superficial analysis says yes. An API-based application in a single-process, multi-threaded server can
422	maintain caches and database connections the same way a FastCGI application can. The API-based application
423	does not pay for inter-process communication, so the API-based application will be slightly faster than the
424	FastCGI application.
425	</P>
426	<P>
427	A deeper analysis says no. Multi-threaded programming is complex, because concurrency makes programs much more
428	difficult to test and debug. In the case of multi-threaded programming to Web server APIs, the normal problems
429	with multi-threading are compounded by the lack of isolation between different applications and between the
430	applications and the Web server. With FastCGI you can write programs in the familiar single-threaded style,
431	get all the reliability and maintainability of process isolation, and still get very high performance. If you
432	truly need multi-threading, you can write multi-threaded FastCGI and still isolate your multi-threaded
433	application from other applications and from the server. In short, multi-threading makes Web server APIs
434	unusable for practically all applications, reducing the choice to FastCGI versus CGI. The performance winner in
435	that contest is obviously FastCGI.
436	</P>
437	<P>
438	</P>
439	<H3>
440	<A NAME="S7">7. Conclusion</A>
441	</H3>
442	<P>
443	Just how fast is FastCGI? The answer: very fast indeed. Not because it has some specially-greased path through
444	the operating system, but because its design is well matched to the needs of most applications. We invite you
445	to make FastCGI the fast, open foundation for your Web server applications.
446	</P>
447	<P>
448	</P>
449	<HR>
450	<A HREF="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></A>
451	<ADDRESS>
452	© 1995, Open Market, Inc. / mbrown@openmarket.com
453	</ADDRESS>
454	</BODY>
455	</HTML>
456