<html>
<head><title>Understanding FastCGI Application Performance</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#cc0000" alink="#000011"
vlink="#555555">

<center>
<a href="/fastcgi/words">
 <img border=0 src="../images/fcgi-hd.gif" alt="[[FastCGI]]"></a>
</center>
<br clear=all>
<h3><center>Understanding FastCGI Application Performance</center></h3>

<!--Copyright (c) 1996 Open Market, Inc. -->
<!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
<!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. -->

<center>
Mark R. Brown<br>
Open Market, Inc.<br>
<p>

10 June 1996<br>
</center>
<p>

<h5 align=center>
Copyright &copy; 1996 Open Market, Inc. 245 First Street, Cambridge,
 MA 02142 U.S.A.<br>
Tel: 617-621-9500 Fax: 617-621-1703 URL:
 <a href="http://www.openmarket.com/">http://www.openmarket.com/</a><br>
$Id: fcgi-perf.htm,v 1.1 1997/09/16 15:36:26 stanleyg Exp $ <br>
</h5>
<hr>

<ul type=square>
 <li><a HREF = "#S1">1. Introduction</a>
 <li><a HREF = "#S2">2. Performance Basics</a>
 <li><a HREF = "#S3">3. Caching</a>
 <li><a HREF = "#S4">4. Database Access</a>
 <li><a HREF = "#S5">5. A Performance Test</a>
 <ul type=square>
  <li><a HREF = "#S5.1">5.1 Application Scenario</a>
  <li><a HREF = "#S5.2">5.2 Application Design</a>
  <li><a HREF = "#S5.3">5.3 Test Conditions</a>
  <li><a HREF = "#S5.4">5.4 Test Results and Discussion</a>
 </ul>
 <li><a HREF = "#S6">6. Multi-threaded APIs</a>
 <li><a HREF = "#S7">7. Conclusion</a>
</ul>
<p>

<hr>

<h3><a name = "S1">1. Introduction</a></h3>


Just how fast is FastCGI? How does the performance of a FastCGI
application compare with the performance of the same
application implemented using a Web server API?<p>

Of course, the answer is that it depends upon the application.
A more complete answer is that FastCGI often wins by a significant
margin, and seldom loses by very much.<p>

Papers on computer system performance can be laden with complex graphs
showing how this varies with that. Seldom do the graphs shed much
light on <i>why</i> one system is faster than another. Advertising copy is
often even less informative. An ad from one large Web server vendor
says that its server "executes web applications up to five times
faster than all other servers," but the ad gives little clue where the
number "five" came from.<p>

This paper is meant to convey an understanding of the primary factors
that influence the performance of Web server applications and to show
that architectural differences between FastCGI and server APIs often
give an "unfair" performance advantage to FastCGI applications. We
run a test that shows a FastCGI application running three times faster
than the corresponding Web server API application. Under different
conditions this factor might be larger or smaller. We show you what
you'd need to measure to figure that out for the situation you face,
rather than just saying "we're three times faster" and moving on.<p>

This paper makes no attempt to prove that FastCGI is better than Web
server APIs for every application. Web server APIs enable lightweight
protocol extensions, such as Open Market's SecureLink extension, to be
added to Web servers, as well as allowing other forms of server
customization. But APIs are not well matched to mainstream applications
such as personalized content or access to corporate databases, because
of API drawbacks including high complexity, low security, and
limited scalability. FastCGI shines when used for the vast
majority of Web applications.<p>

<h3><a name = "S2">2. Performance Basics</a></h3>


Since this paper is about performance we need to be clear on
what "performance" is.<p>

The standard way to measure performance in a request-response system
like the Web is to measure peak request throughput subject to a
response time constraint. For instance, a Web server application
might be capable of performing 20 requests per second while responding
to 90% of the requests in less than 2 seconds.<p>

Response time is a thorny thing to measure on the Web because client
communications links to the Internet have widely varying bandwidth.
If the client is slow to read the server's response, response time at
both the client and the server will go up, and there's nothing the
server can do about it. For the purposes of making repeatable
measurements the client should have a high-bandwidth communications
link to the server.<p>

[Footnote: When designing a Web server application that will be
accessed over slow (e.g. 14.4 or even 28.8 kilobit/second modem)
channels, pay attention to the simultaneous connections bottleneck.
Some servers are limited by design to only 100 or 200 simultaneous
connections. If your application sends 50 kilobytes of data to a
typical client that can read 2 kilobytes per second, then a request
takes 25 seconds to complete. If your server is limited to 100
simultaneous connections, throughput is limited to just 4 requests per
second.]<p>
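The footnote's arithmetic generalizes into a one-line throughput ceiling. The numbers below are the footnote's own example, not new measurements:<p>

<pre>
# Throughput ceiling imposed by a simultaneous-connections limit,
# using the footnote's example numbers.
response_kbytes = 50        # data sent per request
client_kbytes_per_sec = 2   # what a slow modem client can absorb
max_connections = 100       # server's simultaneous-connection limit

seconds_per_request = response_kbytes / client_kbytes_per_sec
requests_per_second = max_connections / seconds_per_request

print(seconds_per_request)   # 25.0 seconds per request
print(requests_per_second)   # 4.0 requests per second
</pre>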

Response time is seldom an issue when load is light, but response
times rise quickly as the system approaches a bottleneck on some
limited resource. The three resources that typical systems run out of
are network I/O, disk I/O, and processor time. If short response time
is a goal, it is a good idea to stay at or below 50% load on each of
these resources. For instance, if your disk subsystem is capable of
delivering 200 I/Os per second, then try to run your application at
100 I/Os per second to avoid having the disk subsystem contribute to
slow response times. Through careful management it is possible to
run closer to the edge, but careful management is both
difficult and expensive, so few systems get it.<p>

If a Web server application is local to the Web server machine, then
its internal design has no impact on network I/O. Application design
can have a big impact on usage of disk I/O and processor time.<p>

<h3><a name = "S3">3. Caching</a></h3>


It is a rare Web server application that doesn't run fast when all the
information it needs is available in its memory. And if the
application doesn't run fast under those conditions, the possible
solutions are evident: Tune the processor-hungry parts of the
application, install a faster processor, or change the application's
functional specification so it doesn't need to do so much work.<p>

The way to make information available in memory is by caching. A
cache is an in-memory data structure that contains information that's
been read from its permanent home on disk. When the application needs
information, it consults the cache, and uses the information if it is
there. Otherwise it reads the information from disk and places a copy
in the cache. If the cache is full, the application discards some old
information before adding the new. When the application needs to
change cached information, it changes both the cache entry and the
information on disk. That way, if the application crashes, no
information is lost; the application just runs more slowly for a while
after restarting, because the cache doesn't improve performance
when it is empty.<p>
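The policy just described (read-through on a miss, write-through on update, discard old information when full) can be sketched in a few lines. This is an illustrative sketch, not code from any FastCGI library; the <code>disk</code> dictionary stands in for the real on-disk store:<p>

<pre>
from collections import OrderedDict

class Cache:
    """Read-through, write-through cache with oldest-first eviction."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for the permanent store
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order approximates age

    def read(self, key):
        if key in self.entries:       # hit: use the in-memory copy
            return self.entries[key]
        value = self.disk[key]        # miss: read the permanent copy,
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # discard old information,
        self.entries[key] = value     # and place a copy in the cache
        return value

    def write(self, key, value):
        self.entries[key] = value     # change the cache entry
        self.disk[key] = value        # and the copy on disk, so a crash
                                      # loses no information
</pre>

After a crash the application simply restarts with an empty <code>entries</code> table and refills it on demand, which is why it runs more slowly for a while.<p>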

Caching can reduce both disk I/O and processor time, because reading
information from disk uses more processor time than reading it from
the cache. Because caching addresses both of the potential
bottlenecks, it is the focal point of high-performance Web server
application design. CGI applications couldn't perform in-memory
caching, because they exited after processing just one request. Web
server APIs promised to solve this problem. But how effective is the
solution?<p>

Today's most widely deployed Web server APIs are based on a
pool-of-processes server model. The Web server consists of a parent
process and a pool of child processes. Processes do not share memory.
An incoming request is assigned to an idle child at random. The child
runs the request to completion before accepting a new request. A
typical server has 32 child processes, a large server has 100 or 200.<p>

In-memory caching works very poorly in this server model because
processes do not share memory and incoming requests are assigned to
processes at random. For instance, to keep a frequently-used file
available in memory the server must keep a file copy per child, which
wastes memory. When the file is modified all the children need to be
notified, which is complex (the APIs don't provide a way to do it).<p>

FastCGI is designed to allow effective in-memory caching. Requests
are routed from any child process to a FastCGI application server.
The FastCGI application process maintains an in-memory cache.<p>

In some cases a single FastCGI application server won't
provide enough performance. FastCGI provides two solutions:
session affinity and multi-threading.<p>

With session affinity you run a pool of application processes and the
Web server routes requests to individual processes based on any
information contained in the request. For instance, the server can
route according to the area of content that's been requested, or
according to the user. The user might be identified by an
application-specific session identifier, by the user ID contained in
an Open Market Secure Link ticket, by the Basic Authentication user
name, or whatever. Each process maintains its own cache, and session
affinity ensures that each incoming request has access to the cache
that will speed up processing the most.<p>
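One plausible routing function for session affinity is a stable hash of the user id, so every request from a given user lands on the process whose cache already holds that user's data. The function below is a hypothetical sketch made for illustration; the actual affinity key and routing policy are left to the Web server's configuration:<p>

<pre>
import zlib

def route_request(user_id, num_processes):
    """Map a user id to an application process index. The same user
    always maps to the same process, keeping that process's cache warm
    for that user, while different users spread across the pool."""
    return zlib.crc32(user_id.encode("utf-8")) % num_processes
</pre>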

With multi-threading you run an application process that is designed
to handle several requests at the same time. The threads handling
concurrent requests share process memory, so they all have access to
the same cache. Multi-threaded programming is complex -- concurrency
makes programs difficult to test and debug -- but with FastCGI you can
write single-threaded <i>or</i> multi-threaded applications.<p>


<h3><a name = "S4">4. Database Access</a></h3>


Many Web server applications perform database access. Existing
databases contain a lot of valuable information; Web server
applications allow companies to give wider access to the information.<p>

Access to database management systems, even within a single machine,
is via connection-oriented protocols. An application "logs in" to a
database, creating a connection, then performs one or more accesses.
Frequently, the cost of creating the database connection is several
times the cost of accessing data over an established connection.<p>

To a first approximation, database connections are just another type of
state to be cached in memory by an application, so the discussion of
caching above applies to caching database connections.<p>
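The cost argument can be made concrete with a small model. The millisecond figures below are hypothetical, chosen only to reflect the observation that login costs several times a single access:<p>

<pre>
LOGIN_MS = 15.0   # hypothetical cost of creating a connection ("logging in")
ACCESS_MS = 3.0   # hypothetical cost of one access on an open connection

def avg_cost_per_request(num_requests, cache_connection):
    """Average per-request cost: a cached connection pays the login
    once, while opening per request pays it every time."""
    logins = 1 if cache_connection else num_requests
    total_ms = logins * LOGIN_MS + num_requests * ACCESS_MS
    return total_ms / num_requests

# Opening per request always costs 18 ms per request; with a cached
# connection the login amortizes toward the 3 ms access cost.
</pre>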

But database connections are special in one respect: They are often
the basis for database licensing. You pay the database vendor
according to the number of concurrent connections the database system
can sustain. A 100-connection license costs much more than a
5-connection license. It follows that caching a database connection
per Web server child process is not just wasteful of the system's
hardware resources, it could break your software budget.<p>


<h3><a name = "S5">5. A Performance Test</a></h3>


We designed a test application to illustrate performance issues. The
application represents a class of applications that deliver
personalized content. The test application is quite a bit simpler
than any real application would be, but still illustrates the main
performance issues. We implemented the application using both FastCGI
and a current Web server API, and measured the performance of each.<p>

<h4><a name = "S5.1">5.1 Application Scenario</a></h4>

The application is based on a user database and a set of
content files. When a user requests a content file, the application
performs substitutions in the file using information from the
user database. The application then returns the modified
content to the user.<p>

Each request accomplishes the following:<p>

<ol>
 <li>authentication check: The user id is used to retrieve and
     check the password.<p>

 <li>attribute retrieval: The user id is used to retrieve all
     of the user's attribute values.<p>

 <li>file retrieval and filtering: The request identifies a
     content file. This file is read and all occurrences of variable
     names are replaced with the user's corresponding attribute values.
     The modified HTML is returned to the user.<p>
</ol>
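Step 3 is a simple template filter. The paper doesn't specify the variable syntax, so the <code>$name</code> convention below is an assumption made only for illustration:<p>

<pre>
import re

def personalize(content, attributes):
    """Replace each occurrence of a variable name ($var) in a content
    file with the user's corresponding attribute value; variables with
    no matching attribute are left untouched."""
    return re.sub(
        r"\$(\w+)",
        lambda m: str(attributes.get(m.group(1), m.group(0))),
        content,
    )
</pre>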

Of course, it is fair game to perform caching to shortcut
any of these steps.<p>

Each user's database record (including password and attribute
values) is approximately 100 bytes long. Each content file is 3,000
bytes long. Both database and content files are stored
on disks attached to the server platform.<p>

A typical user makes 10 file accesses with realistic think times
(30-60 seconds) between accesses, then disappears for a long time.<p>


<h4><a name = "S5.2">5.2 Application Design</a></h4>

The FastCGI application maintains a cache of recently-accessed
attribute values from the database. When the cache misses,
the application reads from the database. Because only a small
number of FastCGI application processes are needed, each process
opens a database connection on startup and keeps it open.<p>

The FastCGI application is configured as multiple application
processes. This is desirable in order to get concurrent application
processing during database reads and file reads. Requests are routed
to these application processes using FastCGI session affinity keyed on
the user id. This way, all of a user's requests after the first one
hit in the application's cache.<p>

The API application does not maintain a cache; the API application has
no way to share the cache among its processes, so the cache hit rate
would be too low to make caching pay. The API application opens and
closes a database connection on every request; keeping database
connections open between requests would result in an unrealistically
large number of database connections open at the same time, and very
low utilization of each connection.<p>


<h4><a name = "S5.3">5.3 Test Conditions</a></h4>

The test load is generated by 10 HTTP client processes. The processes
represent disjoint sets of users. A process makes a request for a
user, then a request for a different user, and so on until it is time
for the first user to make another request.<p>

For simplicity the 10 client processes run on the same machine
as the Web server. This avoids the possibility that a network
bottleneck will obscure the test results. The database system
also runs on this machine, as specified in the application scenario.<p>

Response time is not an issue under the test conditions. We just
measure throughput.<p>

The API Web server in these tests is Netscape 1.1.<p>


<h4><a name = "S5.4">5.4 Test Results and Discussion</a></h4>

Here are the test results:<p>

<ul>
<pre>
    FastCGI    12.0 msec per request = 83 requests per second
    API        36.6 msec per request = 27 requests per second
</pre>
</ul>

Given the big architectural advantage that the FastCGI application
enjoys over the API application, it is not surprising that the
FastCGI application runs a lot faster. To gain a deeper
understanding of these results we measured two more conditions:<p>

<ul>
 <li>API with sustained database connections. If you could
     afford the extra licensing cost, how much faster would
     your API application run?<p>

<pre>
    API        16.0 msec per request = 61 requests per second
</pre>

     Answer: Still not as fast as the FastCGI application.<p>

 <li>FastCGI with cache disabled. How much benefit does the
     FastCGI application get from its cache?<p>

<pre>
    FastCGI    20.1 msec per request = 50 requests per second
</pre>

     Answer: A very substantial benefit, even though the database
     access is quite simple.<p>
</ul>

What these two extra experiments show is that if the API and FastCGI
applications are implemented in exactly the same way -- caching
database connections but not caching user profile data -- the API
application is slightly faster. This is what you'd expect, since the
FastCGI application has to pay the cost of inter-process
communication not present in the API application.<p>

In the real world the two applications would not be implemented in the
same way. FastCGI's architectural advantage results in much higher
performance -- a factor of 3 in this test. With a remote database
or more expensive database access the factor would be higher.
With more substantial processing of the content files the factor
would be smaller.<p>


<h3><a name = "S6">6. Multi-threaded APIs</a></h3>


Web servers with a multi-threaded internal structure (and APIs to
match) are now starting to become more common. These servers don't
have all of the disadvantages described in Section 3. Does this mean
that FastCGI's performance advantages will disappear?<p>

A superficial analysis says yes. An API-based application in a
single-process, multi-threaded server can maintain caches and database
connections the same way a FastCGI application can. The API-based
application does not pay for inter-process communication, so the
API-based application will be slightly faster than the FastCGI
application.<p>

A deeper analysis says no. Multi-threaded programming is complex,
because concurrency makes programs much more difficult to test and
debug. In the case of multi-threaded programming to Web server APIs,
the normal problems with multi-threading are compounded by the lack of
isolation between different applications and between the applications
and the Web server. With FastCGI you can write programs in the
familiar single-threaded style, get all the reliability and
maintainability of process isolation, and still get very high
performance. If you truly need multi-threading, you can write
multi-threaded FastCGI applications and still isolate your
multi-threaded application from other applications and from the
server. In short, multi-threading makes Web server APIs unusable for
practically all applications, reducing the choice to FastCGI versus
CGI. The performance winner in that contest is obviously FastCGI.<p>


<h3><a name = "S7">7. Conclusion</a></h3>


Just how fast is FastCGI? The answer: very fast indeed. Not because
it has some specially-greased path through the operating system, but
because its design is well matched to the needs of most applications.
We invite you to make FastCGI the fast, open foundation for your Web
server applications.<p>


<hr>
<a href="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></a>

<address>
&#169; 1995, Open Market, Inc. / mbrown@openmarket.com
</address>

</body>
</html>