<html>
<head><title>Understanding FastCGI Application Performance</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#cc0000" alink="#000011"
vlink="#555555">

<center>
<a href="/fastcgi/words">
<img border=0 src="../images/fcgi-hd.gif" alt="[[FastCGI]]"></a>
</center>
<br clear=all>
<h3><center>Understanding FastCGI Application Performance</center></h3>

<!--Copyright (c) 1996 Open Market, Inc. -->
<!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
<!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. -->

<center>
Mark R. Brown<br>
Open Market, Inc.<br>
<p>

10 June 1996<br>
</center>
<p>

<h5 align=center>
Copyright © 1996 Open Market, Inc. 245 First Street, Cambridge,
MA 02142 U.S.A.<br>
Tel: 617-621-9500 Fax: 617-621-1703 URL:
<a href="http://www.openmarket.com/">http://www.openmarket.com/</a><br>
$Id: fcgi-perf.htm,v 1.1 1997/09/16 15:36:26 stanleyg Exp $ <br>
</h5>
<hr>

<ul type=square>
<li><a HREF = "#S1">1. Introduction</a>
<li><a HREF = "#S2">2. Performance Basics</a>
<li><a HREF = "#S3">3. Caching</a>
<li><a HREF = "#S4">4. Database Access</a>
<li><a HREF = "#S5">5. A Performance Test</a>
<ul type=square>
<li><a HREF = "#S5.1">5.1 Application Scenario</a>
<li><a HREF = "#S5.2">5.2 Application Design</a>
<li><a HREF = "#S5.3">5.3 Test Conditions</a>
<li><a HREF = "#S5.4">5.4 Test Results and Discussion</a>
</ul>
<li><a HREF = "#S6">6. Multi-threaded APIs</a>
<li><a HREF = "#S7">7. Conclusion</a>
</ul>
<p>

<hr>


<h3><a name = "S1">1. Introduction</a></h3>


Just how fast is FastCGI? How does the performance of a FastCGI
application compare with the performance of the same
application implemented using a Web server API?<p>

Of course, the answer is that it depends upon the application.
A more complete answer is that FastCGI often wins by a significant
margin, and seldom loses by very much.<p>

Papers on computer system performance can be laden with complex graphs
showing how this varies with that. Seldom do the graphs shed much
light on <i>why</i> one system is faster than another. Advertising copy is
often even less informative. An ad from one large Web server vendor
says that its server "executes web applications up to five times
faster than all other servers," but the ad gives little clue where the
number "five" came from.<p>

This paper is meant to convey an understanding of the primary factors
that influence the performance of Web server applications and to show
that architectural differences between FastCGI and server APIs often
give an "unfair" performance advantage to FastCGI applications. We
run a test that shows a FastCGI application running three times faster
than the corresponding Web server API application. Under different
conditions this factor might be larger or smaller. We show you what
you'd need to measure to figure that out for the situation you face,
rather than just saying "we're three times faster" and moving on.<p>

This paper makes no attempt to prove that FastCGI is better than Web
server APIs for every application. Web server APIs enable lightweight
protocol extensions, such as Open Market's SecureLink extension, to be
added to Web servers, as well as allowing other forms of server
customization. But APIs are not well matched to mainstream applications
such as personalized content or access to corporate databases, because
of API drawbacks including high complexity, low security, and
limited scalability. FastCGI shines when used for the vast
majority of Web applications.<p>



<h3><a name = "S2">2. Performance Basics</a></h3>


Since this paper is about performance we need to be clear on
what "performance" is.<p>

The standard way to measure performance in a request-response system
like the Web is to measure peak request throughput subject to a
response time constraint. For instance, a Web server application
might be capable of performing 20 requests per second while responding
to 90% of the requests in less than 2 seconds.<p>

Response time is a thorny thing to measure on the Web because client
communications links to the Internet have widely varying bandwidth.
If the client is slow to read the server's response, response time at
both the client and the server will go up, and there's nothing the
server can do about it. For the purposes of making repeatable
measurements the client should have a high-bandwidth communications
link to the server.<p>

[Footnote: When designing a Web server application that will be
accessed over slow (e.g. 14.4 or even 28.8 kilobit/second modem)
channels, pay attention to the simultaneous connections bottleneck.
Some servers are limited by design to only 100 or 200 simultaneous
connections. If your application sends 50 kilobytes of data to a
typical client that can read 2 kilobytes per second, then a request
takes 25 seconds to complete. If your server is limited to 100
simultaneous connections, throughput is limited to just 4 requests per
second.]<p>
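The footnote's arithmetic generalizes: peak throughput is the connection limit divided by the seconds each connection stays occupied. A quick sketch of the calculation (the function name is ours, and the figures are the footnote's illustrative assumptions, not measurements):<p>

```python
def max_throughput(max_connections, response_bytes, client_bytes_per_sec):
    # A slow client occupies a connection for the whole time it needs
    # to read the response, so each connection serves at most
    # 1/seconds_per_request requests per second.
    seconds_per_request = response_bytes / client_bytes_per_sec
    return max_connections / seconds_per_request

# 100 connections, 50-kilobyte responses, 2 kilobyte/second clients:
print(max_throughput(100, 50000, 2000))  # -> 4.0 requests per second
```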

Response time is seldom an issue when load is light, but response
times rise quickly as the system approaches a bottleneck on some
limited resource. The three resources that typical systems run out of
are network I/O, disk I/O, and processor time. If short response time
is a goal, it is a good idea to stay at or below 50% load on each of
these resources. For instance, if your disk subsystem is capable of
delivering 200 I/Os per second, then try to run your application at
100 I/Os per second to avoid having the disk subsystem contribute to
slow response times. Through careful management it is possible to
succeed in running closer to the edge, but careful management is both
difficult and expensive so few systems get it.<p>

If a Web server application is local to the Web server machine, then
its internal design has no impact on network I/O. Application design
can have a big impact on usage of disk I/O and processor time.<p>


<h3><a name = "S3">3. Caching</a></h3>


It is a rare Web server application that doesn't run fast when all the
information it needs is available in its memory. And if the
application doesn't run fast under those conditions, the possible
solutions are evident: Tune the processor-hungry parts of the
application, install a faster processor, or change the application's
functional specification so it doesn't need to do so much work.<p>

The way to make information available in memory is by caching. A
cache is an in-memory data structure that contains information that's
been read from its permanent home on disk. When the application needs
information, it consults the cache, and uses the information if it is
there. Otherwise it reads the information from disk and places a copy
in the cache. If the cache is full, the application discards some old
information before adding the new. When the application needs to
change cached information, it changes both the cache entry and the
information on disk. That way, if the application crashes, no
information is lost; the application just runs more slowly for a while
after restarting, because the cache doesn't improve performance
when it is empty.<p>
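The read-through, write-through scheme just described can be sketched in a few lines of Python. This is an illustration of the idea, not code from any FastCGI kit; the class and method names are ours, and a dictionary stands in for the data's permanent home on disk:<p>

```python
import collections

class WriteThroughCache:
    def __init__(self, store, capacity):
        self.store = store                      # permanent home of the data
        self.cache = collections.OrderedDict()  # in-memory copies
        self.capacity = capacity

    def _insert(self, key, value):
        if key not in self.cache and len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)      # full: discard old information
        self.cache[key] = value
        self.cache.move_to_end(key)

    def read(self, key):
        if key in self.cache:                   # hit: use the cached copy
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store[key]                 # miss: read from "disk"
        self._insert(key, value)                # ...and place a copy in the cache
        return value

    def write(self, key, value):
        self.store[key] = value                 # change the information on disk
        self._insert(key, value)                # ...and the cache entry
```

Because every write goes to the permanent store first, a crash loses nothing; the restarted process merely starts with an empty cache, exactly as described above.<p>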

Caching can reduce both disk I/O and processor time, because reading
information from disk uses more processor time than reading it from
the cache. Because caching addresses both of the potential
bottlenecks, it is the focal point of high-performance Web server
application design. CGI applications couldn't perform in-memory
caching, because they exited after processing just one request. Web
server APIs promised to solve this problem. But how effective is the
solution?<p>

Today's most widely deployed Web server APIs are based on a
pool-of-processes server model. The Web server consists of a parent
process and a pool of child processes. Processes do not share memory.
An incoming request is assigned to an idle child at random. The child
runs the request to completion before accepting a new request. A
typical server has 32 child processes, a large server has 100 or 200.<p>

In-memory caching works very poorly in this server model because
processes do not share memory and incoming requests are assigned to
processes at random. For instance, to keep a frequently-used file
available in memory the server must keep a file copy per child, which
wastes memory. When the file is modified all the children need to be
notified, which is complex (the APIs don't provide a way to do it).<p>

FastCGI is designed to allow effective in-memory caching. Requests
are routed from any child process to a FastCGI application server.
The FastCGI application process maintains an in-memory cache.<p>

In some cases a single FastCGI application server won't
provide enough performance. FastCGI provides two solutions:
session affinity and multi-threading.<p>

With session affinity you run a pool of application processes and the
Web server routes requests to individual processes based on any
information contained in the request. For instance, the server can
route according to the area of content that's been requested, or
according to the user. The user might be identified by an
application-specific session identifier, by the user ID contained in
an Open Market Secure Link ticket, by the Basic Authentication user
name, or whatever. Each process maintains its own cache, and session
affinity ensures that each incoming request has access to the cache
that will speed up processing the most.<p>
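However the user is identified, session affinity boils down to a stable mapping from that identifier to one member of the process pool. A minimal sketch (in a real deployment the Web server performs this routing from its configuration; the function here is only illustrative):<p>

```python
import zlib

def route(user_id, num_processes):
    # A stable hash picks the same pool member for a given user every
    # time, so later requests find the cache the first request warmed.
    return zlib.crc32(user_id.encode("utf-8")) % num_processes

process = route("alice", 8)   # always the same process for "alice"
```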

With multi-threading you run an application process that is designed
to handle several requests at the same time. The threads handling
concurrent requests share process memory, so they all have access to
the same cache. Multi-threaded programming is complex -- concurrency
makes programs difficult to test and debug -- but with FastCGI you can
write single-threaded <i>or</i> multi-threaded applications.<p>
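The heart of the multi-threaded approach is a single cache guarded against concurrent access. A sketch of that one idea (Python, with names of our choosing; a real application would use whatever locking its threads package provides):<p>

```python
import threading

class SharedCache:
    """One cache shared by every request-handling thread in the process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, load):
        # `load` stands in for the disk read performed on a miss.
        with self._lock:              # serialize access to shared memory
            if key not in self._data:
                self._data[key] = load(key)
            return self._data[key]
```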



<h3><a name = "S4">4. Database Access</a></h3>


Many Web server applications perform database access. Existing
databases contain a lot of valuable information; Web server
applications allow companies to give wider access to the information.<p>

Access to database management systems, even within a single machine,
is via connection-oriented protocols. An application "logs in" to a
database, creating a connection, then performs one or more accesses.
Frequently, the cost of creating the database connection is several
times the cost of accessing data over an established connection.<p>

To a first approximation database connections are just another type of
state to be cached in memory by an application, so the discussion of
caching above applies to caching database connections.<p>
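A back-of-the-envelope model shows why caching the connection pays. The millisecond figures below are invented for illustration (the text says only that connecting costs several times an access), but the shape of the result holds generally:<p>

```python
def total_cost_ms(requests, connect_ms, access_ms, reuse_connection):
    # Total time to serve `requests` accesses, either logging in once
    # and keeping the connection, or reconnecting for every request.
    if reuse_connection:
        return connect_ms + requests * access_ms
    return requests * (connect_ms + access_ms)

# Connecting at 3x the cost of an access, over 100 requests:
print(total_cost_ms(100, 6, 2, reuse_connection=False))  # 800 ms
print(total_cost_ms(100, 6, 2, reuse_connection=True))   # 206 ms
```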

But database connections are special in one respect: They are often
the basis for database licensing. You pay the database vendor
according to the number of concurrent connections the database system
can sustain. A 100-connection license costs much more than a
5-connection license. It follows that caching a database connection
per Web server child process is not just wasteful of your system's
hardware resources, it could break your software budget.<p>



<h3><a name = "S5">5. A Performance Test</a></h3>


We designed a test application to illustrate performance issues. The
application represents a class of applications that deliver
personalized content. The test application is quite a bit simpler
than any real application would be, but still illustrates the main
performance issues. We implemented the application using both FastCGI
and a current Web server API, and measured the performance of each.<p>

<h4><a name = "S5.1">5.1 Application Scenario</a></h4>

The application is based on a user database and a set of
content files. When a user requests a content file, the application
performs substitutions in the file using information from the
user database. The application then returns the modified
content to the user.<p>

Each request accomplishes the following:<p>

<ol>
<li>authentication check: The user id is used to retrieve and
check the password.<p>

<li>attribute retrieval: The user id is used to retrieve all
of the user's attribute values.<p>

<li>file retrieval and filtering: The request identifies a
content file. This file is read and all occurrences of variable
names are replaced with the user's corresponding attribute values.
The modified HTML is returned to the user.<p>
</ol>
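In outline, one request works through the three steps like this. The sketch below is a toy, not the test application itself: the `$NAME`-style variable syntax and the in-memory `users` table are our assumptions for illustration:<p>

```python
def handle_request(user_id, password, template, users):
    record = users[user_id]
    if record["password"] != password:   # 1. authentication check
        return "403 Forbidden"
    attrs = record["attributes"]         # 2. attribute retrieval
    page = template                      # 3. file retrieval and filtering
    for name, value in attrs.items():
        page = page.replace("$" + name, value)
    return page

users = {"alice": {"password": "s3cret",
                   "attributes": {"NAME": "Alice", "CITY": "Cambridge"}}}
print(handle_request("alice", "s3cret", "Hello $NAME of $CITY!", users))
# -> Hello Alice of Cambridge!
```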

Of course, it is fair game to perform caching to shortcut
any of these steps.<p>

Each user's database record (including password and attribute
values) is approximately 100 bytes long. Each content file is 3,000
bytes long. Both database and content files are stored
on disks attached to the server platform.<p>

A typical user makes 10 file accesses with realistic think times
(30-60 seconds) between accesses, then disappears for a long time.<p>


<h4><a name = "S5.2">5.2 Application Design</a></h4>

The FastCGI application maintains a cache of recently-accessed
attribute values from the database. When the cache misses
the application reads from the database. Because only a small
number of FastCGI application processes are needed, each process
opens a database connection on startup and keeps it open.<p>

The FastCGI application is configured as multiple application
processes. This is desirable in order to get concurrent application
processing during database reads and file reads. Requests are routed
to these application processes using FastCGI session affinity keyed on
the user id. This way all of a user's requests after the first one
hit in the application's cache.<p>

The API application does not maintain a cache; the API application has
no way to share the cache among its processes, so the cache hit rate
would be too low to make caching pay. The API application opens and
closes a database connection on every request; keeping database
connections open between requests would result in an unrealistically
large number of database connections open at the same time, and very
low utilization of each connection.<p>


<h4><a name = "S5.3">5.3 Test Conditions</a></h4>

The test load is generated by 10 HTTP client processes. The processes
represent disjoint sets of users. A process makes a request for a
user, then a request for a different user, and so on until it is time
for the first user to make another request.<p>

For simplicity the 10 client processes run on the same machine
as the Web server. This avoids the possibility that a network
bottleneck will obscure the test results. The database system
also runs on this machine, as specified in the application scenario.<p>

Response time is not an issue under the test conditions. We just
measure throughput.<p>

The API Web server used in these tests is Netscape 1.1.<p>


<h4><a name = "S5.4">5.4 Test Results and Discussion</a></h4>

Here are the test results:<p>

<ul>
<pre>
FastCGI    12.0 msec per request = 83 requests per second
API        36.6 msec per request = 27 requests per second
</pre>
</ul>
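The two columns report the same measurement two ways: requests per second is roughly 1000 divided by the milliseconds per request, rounded:<p>

```python
def requests_per_second(msec_per_request):
    return round(1000.0 / msec_per_request)

print(requests_per_second(12.0))  # FastCGI: 83
print(requests_per_second(36.6))  # API:     27
```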

Given the big architectural advantage that the FastCGI application
enjoys over the API application, it is not surprising that the
FastCGI application runs a lot faster. To gain a deeper
understanding of these results we measured two more conditions:<p>

<ul>
<li>API with sustained database connections. If you could
afford the extra licensing cost, how much faster would
your API application run?<p>

<pre>
API        16.0 msec per request = 61 requests per second
</pre>

Answer: Still not as fast as the FastCGI application.<p>

<li>FastCGI with cache disabled. How much benefit does the
FastCGI application get from its cache?<p>

<pre>
FastCGI    20.1 msec per request = 50 requests per second
</pre>

Answer: A very substantial benefit, even though the database
access is quite simple.<p>
</ul>

What these two extra experiments show is that if the API and FastCGI
applications are implemented in exactly the same way -- caching
database connections but not caching user profile data -- the API
application is slightly faster. This is what you'd expect, since the
FastCGI application has to pay the cost of inter-process
communication not present in the API application.<p>

In the real world the two applications would not be implemented in the
same way. FastCGI's architectural advantage results in much higher
performance -- a factor of 3 in this test. With a remote database
or more expensive database access the factor would be higher.
With more substantial processing of the content files the factor
would be smaller.<p>



<h3><a name = "S6">6. Multi-threaded APIs</a></h3>


Web servers with a multi-threaded internal structure (and APIs to
match) are now starting to become more common. These servers don't
have all of the disadvantages described in Section 3. Does this mean
that FastCGI's performance advantages will disappear?<p>

A superficial analysis says yes. An API-based application in a
single-process, multi-threaded server can maintain caches and database
connections the same way a FastCGI application can. The API-based
application does not pay for inter-process communication, so the
API-based application will be slightly faster than the FastCGI
application.<p>

A deeper analysis says no. Multi-threaded programming is complex,
because concurrency makes programs much more difficult to test and
debug. In the case of multi-threaded programming to Web server APIs,
the normal problems with multi-threading are compounded by the lack of
isolation between different applications and between the applications
and the Web server. With FastCGI you can write programs in the
familiar single-threaded style, get all the reliability and
maintainability of process isolation, and still get very high
performance. If you truly need multi-threading, you can write
multi-threaded FastCGI applications and still isolate your multi-threaded
application from other applications and from the server. In short,
multi-threading makes Web server APIs unusable for practically all
applications, reducing the choice to FastCGI versus CGI. The
performance winner in that contest is obviously FastCGI.<p>



<h3><a name = "S7">7. Conclusion</a></h3>


Just how fast is FastCGI? The answer: very fast indeed. Not because
it has some specially-greased path through the operating system, but
because its design is well matched to the needs of most applications.
We invite you to make FastCGI the fast, open foundation for your Web
server applications.<p>



<hr>
<a href="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></a>

<address>
© 1995, Open Market, Inc. / mbrown@openmarket.com
</address>

</body>
</html>