1 Understanding FastCGI Application Performance
5 <center>Understanding FastCGI Application Performance</center>
7 <!--Copyright (c) 1996 Open Market, Inc. -->
8 <!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
9 <!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. -->
21 Copyright © 1996 Open Market, Inc. 245 First Street, Cambridge,
23 Tel: 617-621-9500 Fax: 617-621-1703 URL:
24 <a href="http://www.openmarket.com/">http://www.openmarket.com/</a><br>
25 $Id: fcgi-perf.gut,v 1.1 1997/09/16 15:36:26 stanleyg Exp $ <br>
30 <li><a HREF = "#S1">1. Introduction</a>
31 <li><a HREF = "#S2">2. Performance Basics</a>
32 <li><a HREF = "#S3">3. Caching</a>
33 <li><a HREF = "#S4">4. Database Access</a>
34 <li><a HREF = "#S5">5. A Performance Test</a>
36 <li><a HREF = "#S5.1">5.1 Application Scenario</a>
37 <li><a HREF = "#S5.2">5.2 Application Design</a>
38 <li><a HREF = "#S5.3">5.3 Test Conditions</a>
39 <li><a HREF = "#S5.4">5.4 Test Results and Discussion</a>
41 <li><a HREF = "#S6">6. Multi-threaded APIs</a>
42 <li><a HREF = "#S7">7. Conclusion</a>
49 <h3><a name = "S1">1. Introduction</a></h3>
52 Just how fast is FastCGI? How does the performance of a FastCGI
53 application compare with the performance of the same
54 application implemented using a Web server API?<p>
56 Of course, the answer is that it depends upon the application.
57 A more complete answer is that FastCGI often wins by a significant
58 margin, and seldom loses by very much.<p>
60 Papers on computer system performance can be laden with complex graphs
61 showing how this varies with that. Seldom do the graphs shed much
62 light on <i>why</i> one system is faster than another. Advertising copy is
63 often even less informative. An ad from one large Web server vendor
64 says that its server "executes web applications up to five times
65 faster than all other servers," but the ad gives little clue where the
66 number "five" came from.<p>
68 This paper is meant to convey an understanding of the primary factors
69 that influence the performance of Web server applications and to show
70 that architectural differences between FastCGI and server APIs often
71 give an "unfair" performance advantage to FastCGI applications. We
72 run a test that shows a FastCGI application running three times faster
73 than the corresponding Web server API application. Under different
74 conditions this factor might be larger or smaller. We show you what
75 you'd need to measure to figure that out for the situation you face,
76 rather than just saying "we're three times faster" and moving on.<p>
78 This paper makes no attempt to prove that FastCGI is better than Web
79 server APIs for every application. Web server APIs enable lightweight
80 protocol extensions, such as Open Market's SecureLink extension, to be
81 added to Web servers, as well as allowing other forms of server
82 customization. But APIs are not well matched to mainstream applications
83 such as personalized content or access to corporate databases, because
84 of API drawbacks including high complexity, low security, and
85 limited scalability. FastCGI shines when used for the vast
86 majority of Web applications.<p>
90 <h3><a name = "S2">2. Performance Basics</a></h3>
93 Since this paper is about performance we need to be clear on
94 what "performance" is.<p>
96 The standard way to measure performance in a request-response system
97 like the Web is to measure peak request throughput subject to a
98 response time constraint. For instance, a Web server application
99 might be capable of performing 20 requests per second while responding
100 to 90% of the requests in less than 2 seconds.<p>
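As a rough illustration of this definition, the sketch below picks the highest offered load whose 90th-percentile response time stays under 2 seconds. The function names and the sample measurements are made up for the example; they are not taken from any test in this paper.<p>

```python
import math

def percentile(samples, fraction):
    """Smallest sample such that at least `fraction` of all samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(fraction * len(ordered))  # nearest-rank method
    return ordered[max(rank, 1) - 1]

def meets_constraint(response_times_s, fraction=0.90, limit_s=2.0):
    """True if `fraction` of the requests finished within `limit_s` seconds."""
    return percentile(response_times_s, fraction) <= limit_s

def peak_throughput(runs, fraction=0.90, limit_s=2.0):
    """`runs` maps an offered load (requests/second) to measured response times.

    Returns the highest load that still satisfies the constraint, or None.
    """
    passing = [load for load, times in runs.items()
               if meets_constraint(times, fraction, limit_s)]
    return max(passing) if passing else None

# Hypothetical measurements (seconds), for illustration only.
runs = {
    10: [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.7, 0.9],
    20: [0.4, 0.5, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 1.9, 3.5],
    30: [0.9, 1.4, 1.8, 2.2, 2.6, 3.1, 3.9, 4.4, 5.0, 7.5],
}
print(peak_throughput(runs))  # highest load meeting "90% under 2 seconds"
```

With these invented numbers the run at 20 requests per second passes (its 90th-percentile time is 1.9 seconds) and the run at 30 does not, so the reported performance would be 20 requests per second.<p>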
102 Response time is a thorny thing to measure on the Web because client
103 communications links to the Internet have widely varying bandwidth.
104 If the client is slow to read the server's response, response time at
105 both the client and the server will go up, and there's nothing the
106 server can do about it. For the purposes of making repeatable
107 measurements the client should have a high-bandwidth communications
108 link to the server.<p>
110 [Footnote: When designing a Web server application that will be
111 accessed over slow (e.g. 14.4 or even 28.8 kilobit/second modem)
112 channels, pay attention to the simultaneous connections bottleneck.
113 Some servers are limited by design to only 100 or 200 simultaneous
114 connections. If your application sends 50 kilobytes of data to a
115 typical client that can read 2 kilobytes per second, then a request
116 takes 25 seconds to complete. If your server is limited to 100
117 simultaneous connections, throughput is limited to just 4 requests per second.]<p>
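The arithmetic in this footnote can be checked with a few lines of code. This is only a sketch; the function name is ours, and it assumes every connection is busy for the full time the client needs to read the response.<p>

```python
def connection_limited_throughput(max_connections, response_kbytes, client_kbytes_per_s):
    """Upper bound on requests/second when every connection is held until
    the client has finished reading the response."""
    seconds_per_request = response_kbytes / client_kbytes_per_s
    return max_connections / seconds_per_request

# Footnote figures: 50 kilobytes per response, 2 kilobytes/second clients,
# a server limited to 100 simultaneous connections.
print(connection_limited_throughput(100, 50, 2))  # 50 / 2 = 25 s each, 100 / 25 = 4.0
```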
120 Response time is seldom an issue when load is light, but response
121 times rise quickly as the system approaches a bottleneck on some
122 limited resource. The three resources that typical systems run out of
123 are network I/O, disk I/O, and processor time. If short response time
124 is a goal, it is a good idea to stay at or below 50% load on each of
125 these resources. For instance, if your disk subsystem is capable of
126 delivering 200 I/Os per second, then try to run your application at
127 100 I/Os per second to avoid having the disk subsystem contribute to
128 slow response times. Through careful management it is possible to
129 succeed in running closer to the edge, but careful management is both
130 difficult and expensive so few systems get it.<p>
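To see why response times rise so quickly near a bottleneck, it can help to plug numbers into a simple queueing model. The sketch below assumes a single-server queue with random arrivals (the textbook M/M/1 model), where mean response time is service time divided by (1 - utilization). That model and the 5 msec service time are assumptions made for illustration; they are not measurements from this paper.<p>

```python
def mean_response_time(service_time_s, utilization):
    """Mean response time of an M/M/1 queue (an assumed model, see lead-in)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)

# Assumed: 5 msec per disk I/O, i.e. 200 I/Os per second at saturation.
service_time = 0.005
for load in (0.10, 0.50, 0.80, 0.90, 0.95):
    response_ms = mean_response_time(service_time, load) * 1000
    print(f"{load:.0%} load: {response_ms:.1f} msec per I/O")
```

Under this model the resource at 50% load answers in twice its bare service time, while at 95% load it takes twenty times as long, which is the reason for leaving headroom.<p>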
132 If a Web server application is local to the Web server machine, then
133 its internal design has no impact on network I/O. Application design
134 can have a big impact on usage of disk I/O and processor time.<p>
138 <h3><a name = "S3">3. Caching</a></h3>
141 It is a rare Web server application that doesn't run fast when all the
142 information it needs is available in its memory. And if the
143 application doesn't run fast under those conditions, the possible
144 solutions are evident: Tune the processor-hungry parts of the
145 application, install a faster processor, or change the application's
146 functional specification so it doesn't need to do so much work.<p>
148 The way to make information available in memory is by caching. A
149 cache is an in-memory data structure that contains information that's
150 been read from its permanent home on disk. When the application needs
151 information, it consults the cache, and uses the information if it is
152 there. Otherwise it reads the information from disk and places a copy
153 in the cache. If the cache is full, the application discards some old
154 information before adding the new. When the application needs to
155 change cached information, it changes both the cache entry and the
156 information on disk. That way, if the application crashes, no
157 information is lost; the application just runs more slowly for a while
158 after restarting, because the cache doesn't improve performance when it is empty.<p>
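The cache behavior described above can be sketched in a few lines. This is a minimal illustration, not the code of any particular FastCGI application: the class name is ours, a plain dictionary stands in for the information's permanent home on disk, and "discards some old information" is interpreted as least-recently-used eviction, which is one reasonable choice among several.<p>

```python
from collections import OrderedDict

class WriteThroughCache:
    """In-memory cache in front of a slower store (hypothetical example)."""

    def __init__(self, store, capacity):
        self.store = store            # stands in for the permanent home on disk
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)   # mark as recently used
            return self.entries[key]
        self.misses += 1
        value = self.store[key]             # the slow read from "disk"
        self._remember(key, value)
        return value

    def put(self, key, value):
        self.store[key] = value             # change the permanent copy...
        self._remember(key, value)          # ...and the cache entry

    def _remember(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # discard the oldest entry

disk = {"alice": "blue", "bob": "green", "carol": "red"}
cache = WriteThroughCache(disk, capacity=2)
for user in ("alice", "alice", "bob", "carol"):  # reading carol evicts alice
    cache.get(user)
cache.put("bob", "orange")
print(cache.hits, cache.misses, disk["bob"])
```

Because every change goes to the store as well as the cache, discarding the whole cache (as a crash would) loses nothing; the next reads are simply misses.<p>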
161 Caching can reduce both disk I/O and processor time, because reading
162 information from disk uses more processor time than reading it from
163 the cache. Because caching addresses both of the potential
164 bottlenecks, it is the focal point of high-performance Web server
165 application design. CGI applications couldn't perform in-memory
166 caching, because they exited after processing just one request. Web
167 server APIs promised to solve this problem. But how effective is the solution?<p>
170 Today's most widely deployed Web server APIs are based on a
171 pool-of-processes server model. The Web server consists of a parent
172 process and a pool of child processes. Processes do not share memory.
173 An incoming request is assigned to an idle child at random. The child
174 runs the request to completion before accepting a new request. A
175 typical server has 32 child processes; a large server has 100 or 200.<p>
177 In-memory caching works very poorly in this server model because
178 processes do not share memory and incoming requests are assigned to
179 processes at random. For instance, to keep a frequently-used file
180 available in memory the server must keep a file copy per child, which
181 wastes memory. When the file is modified all the children need to be
182 notified, which is complex (the APIs don't provide a way to do it).<p>
184 FastCGI is designed to allow effective in-memory caching. Requests
185 are routed from any child process to a FastCGI application server.
186 The FastCGI application process maintains an in-memory cache.<p>
188 In some cases a single FastCGI application server won't
189 provide enough performance. FastCGI provides two solutions:
190 session affinity and multi-threading.<p>
192 With session affinity you run a pool of application processes and the
193 Web server routes requests to individual processes based on any
194 information contained in the request. For instance, the server can
195 route according to the area of content that's been requested, or
196 according to the user. The user might be identified by an
197 application-specific session identifier, by the user ID contained in
198 an Open Market SecureLink ticket, by the Basic Authentication user
199 name, or whatever. Each process maintains its own cache, and session
200 affinity ensures that each incoming request has access to the cache
201 that will speed up processing the most.<p>
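The effect of routing on cache hit rate can be shown with a small simulation. The sketch below is an assumption-laden toy, not the Web server's actual routing code: the routing function, the CRC-based hash, the 200 users, and the 8 processes are all invented for the example. Each user makes 10 requests, and each process caches a user's data the first time it sees that user.<p>

```python
import random
import zlib

def route_by_affinity(user_id, num_processes):
    """Always send the same user to the same process.
    crc32 is stable across runs, unlike Python's built-in hash() for strings."""
    return zlib.crc32(user_id.encode("utf-8")) % num_processes

def hit_rate(requests, num_processes, choose_process):
    """Fraction of requests that find the user's data already cached
    in the process that was chosen to handle them."""
    caches = [set() for _ in range(num_processes)]  # one private cache per process
    hits = 0
    for user_id in requests:
        cache = caches[choose_process(user_id)]
        if user_id in cache:
            hits += 1
        else:
            cache.add(user_id)  # miss: load the data and keep a copy
    return hits / len(requests)

rng = random.Random(1996)                 # fixed seed so runs are repeatable
users = [f"user{n}" for n in range(200)]  # hypothetical user population
requests = users * 10                     # 10 requests per user
rng.shuffle(requests)                     # interleave the users' requests
processes = 8

affinity_rate = hit_rate(requests, processes,
                         lambda user: route_by_affinity(user, processes))
random_rate = hit_rate(requests, processes,
                       lambda user: rng.randrange(processes))
print(f"session affinity:  {affinity_rate:.2f}")
print(f"random assignment: {random_rate:.2f}")
```

With affinity, only a user's first request misses, so 9 of every 10 requests hit. With random assignment the same user keeps landing in processes that have never seen him, and every such process ends up holding its own copy of the data.<p>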
203 With multi-threading you run an application process that is designed
204 to handle several requests at the same time. The threads handling
205 concurrent requests share process memory, so they all have access to
206 the same cache. Multi-threaded programming is complex -- concurrency
207 makes programs difficult to test and debug -- but with FastCGI you can
208 write single threaded <i>or</i> multithreaded applications.<p>
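As a minimal sketch of what sharing a cache between request-handling threads involves, the example below guards one dictionary with a lock. It is not FastCGI library code; the class name and the loader function are hypothetical, and holding the lock while loading is a simplification that serializes cache misses.<p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedCache:
    """One cache shared by every thread in the process (hypothetical example)."""

    def __init__(self, load):
        self._load = load              # function that fetches a missing value
        self._lock = threading.Lock()  # protects _entries and loads
        self._entries = {}
        self.loads = 0

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                self._entries[key] = self._load(key)
                self.loads += 1
            return self._entries[key]

# Stand-in for a database read.
cache = SharedCache(load=lambda user_id: {"user": user_id})

# Four threads handle 100 requests for two users.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cache.get, ["alice", "bob"] * 50))

print(cache.loads)  # each user's data is loaded once, whichever thread asks first
```

Even this small example needs a lock to stay correct, which hints at why concurrent programs are harder to test and debug than single-threaded ones.<p>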
212 <h3><a name = "S4">4. Database Access</a></h3>
215 Many Web server applications perform database access. Existing
216 databases contain a lot of valuable information; Web server
217 applications allow companies to give wider access to the information.<p>
219 Access to database management systems, even within a single machine,
220 is via connection-oriented protocols. An application "logs in" to a
221 database, creating a connection, then performs one or more accesses.
222 Frequently, the cost of creating the database connection is several
223 times the cost of accessing data over an established connection.<p>
225 To a first approximation database connections are just another type of
226 state to be cached in memory by an application, so the discussion of
227 caching above applies to caching database connections.<p>
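Treating a connection as cached state might look like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in, since an in-memory SQLite database has none of the login cost of a real database server; the class name, table, and data are invented for the example.<p>

```python
import sqlite3

class ConnectionHolder:
    """Opens the connection on first use and keeps it for later requests."""

    def __init__(self, database):
        self.database = database
        self.connection = None
        self.opened = 0   # how many times we paid the connection cost

    def get(self):
        if self.connection is None:
            self.connection = sqlite3.connect(self.database)
            self.opened += 1
        return self.connection

holder = ConnectionHolder(":memory:")
holder.get().execute("CREATE TABLE users (id TEXT PRIMARY KEY, password TEXT)")
holder.get().execute("INSERT INTO users VALUES (?, ?)", ("alice", "secret"))

def lookup_password(user_id):
    """One request's database access, made over the long-lived connection."""
    row = holder.get().execute(
        "SELECT password FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

for _ in range(100):
    lookup_password("alice")
print(holder.opened)  # the connection was created once, not once per request
```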
229 But database connections are special in one respect: They are often
230 the basis for database licensing. You pay the database vendor
231 according to the number of concurrent connections the database system
232 can sustain. A 100-connection license costs much more than a
233 5-connection license. It follows that caching a database connection
234 per Web server child process is not just wasteful of the system's hardware
235 resources, it could break your software budget.<p>
239 <h3><a name = "S5">5. A Performance Test</a></h3>
242 We designed a test application to illustrate performance issues. The
243 application represents a class of applications that deliver
244 personalized content. The test application is quite a bit simpler
245 than any real application would be, but still illustrates the main
246 performance issues. We implemented the application using both FastCGI
247 and a current Web server API, and measured the performance of each.<p>
249 <h4><a name = "S5.1">5.1 Application Scenario</a></h4>
251 The application is based on a user database and a set of
252 content files. When a user requests a content file, the application
253 performs substitutions in the file using information from the
254 user database. The application then returns the modified
255 content to the user.<p>
257 Each request accomplishes the following:<p>
260 <li>authentication check: The user id is used to retrieve and
261 check the password.<p>
263 <li>attribute retrieval: The user id is used to retrieve all
264 of the user's attribute values.<p>
266 <li>file retrieval and filtering: The request identifies a
267 content file. This file is read and all occurrences of variable
268 names are replaced with the user's corresponding attribute values.
269 The modified HTML is returned to the user.<p>
272 Of course, it is fair game to perform caching to shortcut
273 any of these steps.<p>
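The three steps of a request can be sketched as follows. This is an illustration only, not the test application's source: dictionaries stand in for the user database and the content files, the names are invented, and the `$name` variable syntax handled by Python's `string.Template` is an assumption, since the scenario does not say how variables are marked in a content file.<p>

```python
from string import Template

# Hypothetical stand-ins for the user database and the content files.
USERS = {
    "alice": {"password": "secret",
              "attributes": {"name": "Alice", "city": "Cambridge"}},
}
CONTENT = {
    "welcome.html": "<h1>Hello, $name</h1><p>Weather for $city</p>",
}

def handle_request(user_id, password, filename):
    """Return the personalized content, or None if authentication fails."""
    # 1. Authentication check: retrieve the user's record and check the password.
    record = USERS.get(user_id)
    if record is None or record["password"] != password:
        return None
    # 2. Attribute retrieval: fetch all of the user's attribute values.
    attributes = record["attributes"]
    # 3. File retrieval and filtering: replace variable names with the values.
    template = Template(CONTENT[filename])
    return template.safe_substitute(attributes)

print(handle_request("alice", "secret", "welcome.html"))
```

Each numbered step is a place where a cache could save work: the user record serves steps 1 and 2, and the content file serves step 3.<p>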
275 Each user's database record (including password and attribute
276 values) is approximately 100 bytes long. Each content file is 3,000
277 bytes long. Both database and content files are stored
278 on disks attached to the server platform.<p>
280 A typical user makes 10 file accesses with realistic think times
281 (30-60 seconds) between accesses, then disappears for a long time.<p>
284 <h4><a name = "S5.2">5.2 Application Design</a></h4>
286 The FastCGI application maintains a cache of recently-accessed
287 attribute values from the database. When the cache misses
288 the application reads from the database. Because only a small
289 number of FastCGI application processes are needed, each process
290 opens a database connection on startup and keeps it open.<p>
292 The FastCGI application is configured as multiple application
293 processes. This is desirable in order to get concurrent application
294 processing during database reads and file reads. Requests are routed
295 to these application processes using FastCGI session affinity keyed on
296 the user id. This way, every request from a user after the first one
297 hits in the application's cache.<p>
299 The API application does not maintain a cache; the API application has
300 no way to share the cache among its processes, so the cache hit rate
301 would be too low to make caching pay. The API application opens and
302 closes a database connection on every request; keeping database
303 connections open between requests would result in an unrealistically
304 large number of database connections open at the same time, and very
305 low utilization of each connection.<p>
308 <h4><a name = "S5.3">5.3 Test Conditions</a></h4>
310 The test load is generated by 10 HTTP client processes. The processes
311 represent disjoint sets of users. A process makes a request for a
312 user, then a request for a different user, and so on until it is time
313 for the first user to make another request.<p>
315 For simplicity the 10 client processes run on the same machine
316 as the Web server. This avoids the possibility that a network
317 bottleneck will obscure the test results. The database system
318 also runs on this machine, as specified in the application scenario.<p>
320 Response time is not an issue under the test conditions. We just
321 measure throughput.<p>
323 The API Web server in these tests is Netscape 1.1.<p>
326 <h4><a name = "S5.4">5.4 Test Results and Discussion</a></h4>
328 Here are the test results:<p>
332 FastCGI 12.0 msec per request = 83 requests per second
333 API 36.6 msec per request = 27 requests per second
337 Given the big architectural advantage that the FastCGI application
338 enjoys over the API application, it is not surprising that the
339 FastCGI application runs a lot faster. To gain a deeper
340 understanding of these results we measured two more conditions:<p>
343 <li>API with sustained database connections. If you could
344 afford the extra licensing cost, how much faster would
345 your API application run?<p>
348 API 16.0 msec per request = 61 requests per second
351 Answer: Still not as fast as the FastCGI application.<p>
353 <li>FastCGI with cache disabled. How much benefit does the
354 FastCGI application get from its cache?<p>
357 FastCGI 20.1 msec per request = 50 requests per second
360 Answer: A very substantial benefit, even though the database
361 access is quite simple.<p>
364 What these two extra experiments show is that if the API and FastCGI
365 applications are implemented in exactly the same way -- caching
366 database connections but not caching user profile data -- the API
367 application is slightly faster. This is what you'd expect, since the
368 FastCGI application has to pay the cost of inter-process
369 communication not present in the API application.<p>
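The four measurements can be combined to estimate where the time goes. The arithmetic below uses only the per-request times reported above; attributing each difference to a single cause (connection setup, caching, inter-process communication) follows the reasoning in the text and is an approximation, not a separate measurement.<p>

```python
# Per-request times in msec, as reported in the test results.
msec = {
    "fastcgi": 12.0,
    "api": 36.6,
    "api_sustained_connections": 16.0,
    "fastcgi_no_cache": 20.1,
}

# Overall advantage of FastCGI in this test (about 3.05).
speedup = msec["api"] / msec["fastcgi"]

# What the API application pays for opening a connection per request (about 20.6).
connection_cost = msec["api"] - msec["api_sustained_connections"]

# What the FastCGI application saves through its cache (about 8.1).
cache_benefit = msec["fastcgi_no_cache"] - msec["fastcgi"]

# Gap when both keep connections open and neither caches (about 4.1);
# the text attributes it to inter-process communication.
same_design_gap = msec["fastcgi_no_cache"] - msec["api_sustained_connections"]

print(f"speedup factor:        {speedup:.2f}")
print(f"connection setup cost: {connection_cost:.1f} msec")
print(f"cache benefit:         {cache_benefit:.1f} msec")
print(f"same-design gap:       {same_design_gap:.1f} msec")
```

Read this way, the cost FastCGI adds per request is smaller than either of the two costs its architecture lets the application avoid.<p>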
371 In the real world the two applications would not be implemented in the
372 same way. FastCGI's architectural advantage results in much higher
373 performance -- a factor of 3 in this test. With a remote database
374 or more expensive database access the factor would be higher.
375 With more substantial processing of the content files the factor
376 would be smaller.<p>
380 <h3><a name = "S6">6. Multi-threaded APIs</a></h3>
383 Web servers with a multi-threaded internal structure (and APIs to
384 match) are now starting to become more common. These servers don't
385 have all of the disadvantages described in Section 3. Does this mean
386 that FastCGI's performance advantages will disappear?<p>
388 A superficial analysis says yes. An API-based application in a
389 single-process, multi-threaded server can maintain caches and database
390 connections the same way a FastCGI application can. The API-based
391 application does not pay for inter-process communication, so the
392 API-based application will be slightly faster than the FastCGI application.<p>
395 A deeper analysis says no. Multi-threaded programming is complex,
396 because concurrency makes programs much more difficult to test and
397 debug. In the case of multi-threaded programming to Web server APIs,
398 the normal problems with multi-threading are compounded by the lack of
399 isolation between different applications and between the applications
400 and the Web server. With FastCGI you can write programs in the
401 familiar single-threaded style, get all the reliability and
402 maintainability of process isolation, and still get very high
403 performance. If you truly need multi-threading, you can write
404 multi-threaded FastCGI and still isolate your multi-threaded
405 application from other applications and from the server. In short,
406 multi-threading makes Web server APIs unusable for practically all
407 applications, reducing the choice to FastCGI versus CGI. The
408 performance winner in that contest is obviously FastCGI.<p>
412 <h3><a name = "S7">7. Conclusion</a></h3>
415 Just how fast is FastCGI? The answer: very fast indeed. Not because
416 it has some specially-greased path through the operating system, but
417 because its design is well matched to the needs of most applications.
418 We invite you to make FastCGI the fast, open foundation for your Web
419 server applications.<p>
424 <a href="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></a>
427 © 1995, Open Market, Inc. / mbrown@openmarket.com