doc/fcgi-perf.htm

   1 <html>
   2 <head><title>Understanding FastCGI Application Performance</title>
   3 </head>
   4
   5 <body bgcolor="#FFFFFF" text="#000000" link="#cc0000" alink="#000011"
   6 vlink="#555555">
   7
   8 <center>
   9 <a href="/fastcgi/words">
  10     <img border=0 src="../images/fcgi-hd.gif" alt="[[FastCGI]]"></a>
  11 </center>
  12 <br clear=all>
  13 <h3><center>Understanding FastCGI Application Performance</center></h3>
  14
  15 <!--Copyright (c) 1996 Open Market, Inc.                                    -->
  16 <!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
  17 <!--of this file, and for a DISCLAIMER OF ALL WARRANTIES.                   -->
  18
  19 <center>
  20 Mark R. Brown<br>
  21 Open Market, Inc.<br>
  22 <p>
  23
  24 10 June 1996<br>
  25 </center>
  26 <p>
  27
  28 <h5 align=center>
  29 Copyright &copy; 1996 Open Market, Inc.  245 First Street, Cambridge,
  30   MA 02142 U.S.A.<br>
  31 Tel: 617-621-9500 Fax: 617-621-1703 URL:
  32   <a href="http://www.openmarket.com/">http://www.openmarket.com/</a><br>
  33 $Id: fcgi-perf.htm,v 1.1 1997/09/16 15:36:26 stanleyg Exp $ <br>
  34 </h5>
  35 <hr>
  36
  37 <ul type=square>
  38     <li><a HREF = "#S1">1. Introduction</a>
  39     <li><a HREF = "#S2">2. Performance Basics</a>
  40     <li><a HREF = "#S3">3. Caching</a>
  41     <li><a HREF = "#S4">4. Database Access</a>
  42     <li><a HREF = "#S5">5. A Performance Test</a>
  43     <ul type=square>
  44         <li><a HREF = "#S5.1">5.1 Application Scenario</a>
  45         <li><a HREF = "#S5.2">5.2 Application Design</a>
  46         <li><a HREF = "#S5.3">5.3 Test Conditions</a>
  47         <li><a HREF = "#S5.4">5.4 Test Results and Discussion</a>
  48     </ul>
  49     <li><a HREF = "#S6">6. Multi-threaded APIs</a>
  50     <li><a HREF = "#S7">7. Conclusion</a>
  51 </ul>
  52 <p>
  53
  54 <hr>
  55
  56
  57 <h3><a name = "S1">1. Introduction</a></h3>
  58
  59
  60 Just how fast is FastCGI?  How does the performance of a FastCGI
  61 application compare with the performance of the same
  62 application implemented using a Web server API?<p>
  63
  64 Of course, the answer is that it depends upon the application.
  65 A more complete answer is that FastCGI often wins by a significant
  66 margin, and seldom loses by very much.<p>
  67
  68 Papers on computer system performance can be laden with complex graphs
  69 showing how this varies with that.  Seldom do the graphs shed much
  70 light on <i>why</i> one system is faster than another.  Advertising copy is
  71 often even less informative.  An ad from one large Web server vendor
  72 says that its server "executes web applications up to five times
  73 faster than all other servers," but the ad gives little clue where the
  74 number "five" came from.<p>
  75
  76 This paper is meant to convey an understanding of the primary factors
  77 that influence the performance of Web server applications and to show
  78 that architectural differences between FastCGI and server APIs often
  79 give an "unfair" performance advantage to FastCGI applications.  We
  80 run a test that shows a FastCGI application running three times faster
  81 than the corresponding Web server API application.  Under different
  82 conditions this factor might be larger or smaller.  We show you what
  83 you'd need to measure to figure that out for the situation you face,
  84 rather than just saying "we're three times faster" and moving on.<p>
  85
  86 This paper makes no attempt to prove that FastCGI is better than Web
  87 server APIs for every application.  Web server APIs enable lightweight
  88 protocol extensions, such as Open Market's SecureLink extension, to be
  89 added to Web servers, as well as allowing other forms of server
  90 customization.  But APIs are not well matched to mainstream applications
  91 such as personalized content or access to corporate databases, because
  92 of API drawbacks including high complexity, low security, and
  93 limited scalability.  FastCGI shines when used for the vast
  94 majority of Web applications.<p>
  95
  96
  97
  98 <h3><a name = "S2">2. Performance Basics</a></h3>
  99
 100
 101 Since this paper is about performance we need to be clear on
 102 what "performance" is.<p>
 103
 104 The standard way to measure performance in a request-response system
 105 like the Web is to measure peak request throughput subject to a
  106 response time constraint.  For instance, a Web server application
 107 might be capable of performing 20 requests per second while responding
 108 to 90% of the requests in less than 2 seconds.<p>
 109
 110 Response time is a thorny thing to measure on the Web because client
 111 communications links to the Internet have widely varying bandwidth.
 112 If the client is slow to read the server's response, response time at
 113 both the client and the server will go up, and there's nothing the
 114 server can do about it.  For the purposes of making repeatable
 115 measurements the client should have a high-bandwidth communications
 116 link to the server.<p>
 117
 118 [Footnote: When designing a Web server application that will be
 119 accessed over slow (e.g. 14.4 or even 28.8 kilobit/second modem)
 120 channels, pay attention to the simultaneous connections bottleneck.
 121 Some servers are limited by design to only 100 or 200 simultaneous
 122 connections.  If your application sends 50 kilobytes of data to a
 123 typical client that can read 2 kilobytes per second, then a request
 124 takes 25 seconds to complete.  If your server is limited to 100
 125 simultaneous connections, throughput is limited to just 4 requests per
 126 second.]<p>
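The arithmetic in this footnote can be checked with a short sketch.  The
figures are the ones given above; the variable names are illustrative only.<p>

```python
# Figures from the footnote; the variable names are illustrative.
response_kilobytes = 50          # data sent per request
client_kilobytes_per_second = 2  # rate at which a slow client reads
max_connections = 100            # server's simultaneous-connection limit

# Each request occupies one connection for the whole transfer.
seconds_per_request = response_kilobytes / client_kilobytes_per_second

# With every connection busy, this many requests complete per second.
max_requests_per_second = max_connections / seconds_per_request

print(seconds_per_request)      # 25.0
print(max_requests_per_second)  # 4.0
```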
 127
 128 Response time is seldom an issue when load is light, but response
 129 times rise quickly as the system approaches a bottleneck on some
 130 limited resource.  The three resources that typical systems run out of
 131 are network I/O, disk I/O, and processor time.  If short response time
 132 is a goal, it is a good idea to stay at or below 50% load on each of
 133 these resources.  For instance, if your disk subsystem is capable of
 134 delivering 200 I/Os per second, then try to run your application at
 135 100 I/Os per second to avoid having the disk subsystem contribute to
 136 slow response times.  Through careful management it is possible to
 137 succeed in running closer to the edge, but careful management is both
 138 difficult and expensive so few systems get it.<p>
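One way to see why response times climb near a bottleneck is a simple
single-server queueing model (M/M/1).  That model is an assumption made
here for illustration, not something this paper measures, and the function
name below is hypothetical.  The sketch applies it to the 200 I/Os per
second disk subsystem from the example above.<p>

```python
def mean_response_msec(capacity_per_second, load_per_second):
    """Mean time in system under an assumed M/M/1 queueing model."""
    if not 0 <= load_per_second < capacity_per_second:
        raise ValueError("load must be non-negative and below capacity")
    service_msec = 1000.0 / capacity_per_second        # time for one I/O
    utilization = load_per_second / capacity_per_second
    # Queueing delay grows without bound as utilization approaches 1.
    return service_msec / (1.0 - utilization)

# Disk subsystem capable of 200 I/Os per second, at increasing load.
for load in (50, 100, 150, 180, 190):
    print(load, round(mean_response_msec(200, load), 1))
```

Under this assumed model the mean response is about 10 msec at 100 I/Os
per second (50% load) and about 50 msec at 180 I/Os per second (90% load).<p>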
 139
 140 If a Web server application is local to the Web server machine, then
 141 its internal design has no impact on network I/O.  Application design
 142 can have a big impact on usage of disk I/O and processor time.<p>
 143
 144
 145
 146 <h3><a name = "S3">3. Caching</a></h3>
 147
 148
 149 It is a rare Web server application that doesn't run fast when all the
 150 information it needs is available in its memory.  And if the
 151 application doesn't run fast under those conditions, the possible
 152 solutions are evident: Tune the processor-hungry parts of the
 153 application, install a faster processor, or change the application's
 154 functional specification so it doesn't need to do so much work.<p>
 155
 156 The way to make information available in memory is by caching.  A
 157 cache is an in-memory data structure that contains information that's
 158 been read from its permanent home on disk.  When the application needs
 159 information, it consults the cache, and uses the information if it is
  160 there.  Otherwise it reads the information from disk and places a copy
 161 in the cache.  If the cache is full, the application discards some old
 162 information before adding the new.  When the application needs to
 163 change cached information, it changes both the cache entry and the
 164 information on disk.  That way, if the application crashes, no
  165 information is lost; the application just runs more slowly for a while
 166 after restarting, because the cache doesn't improve performance
 167 when it is empty.<p>
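The caching behavior just described can be sketched in a few lines.  This
is a minimal illustration, not code from any FastCGI library: the class
name is hypothetical, a dictionary stands in for the disk, and
least-recently-used eviction is one assumed policy for "discards some old
information."<p>

```python
from collections import OrderedDict

class WriteThroughCache:
    """In-memory cache in front of a slower store (a dict stands in for disk)."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest entry first
        self.disk_reads = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as recently used
            return self.entries[key]
        value = self.disk[key]              # cache miss: go to disk
        self.disk_reads += 1
        self._store(key, value)
        return value

    def write(self, key, value):
        self.disk[key] = value              # update disk first, so a crash loses nothing
        self._store(key, value)

    def _store(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # discard the least recently used entry

disk = {"a": 1, "b": 2, "c": 3}
cache = WriteThroughCache(disk, capacity=2)
cache.read("a")
cache.read("a")          # served from memory
print(cache.disk_reads)  # 1
```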
 168
 169 Caching can reduce both disk I/O and processor time, because reading
 170 information from disk uses more processor time than reading it from
 171 the cache.  Because caching addresses both of the potential
 172 bottlenecks, it is the focal point of high-performance Web server
 173 application design.  CGI applications couldn't perform in-memory
 174 caching, because they exited after processing just one request.  Web
 175 server APIs promised to solve this problem.  But how effective is the
 176 solution?<p>
 177
 178 Today's most widely deployed Web server APIs are based on a
 179 pool-of-processes server model.  The Web server consists of a parent
 180 process and a pool of child processes.  Processes do not share memory.
 181 An incoming request is assigned to an idle child at random.  The child
 182 runs the request to completion before accepting a new request.  A
  183 typical server has 32 child processes; a large server has 100 or 200.<p>
 184
 185 In-memory caching works very poorly in this server model because
 186 processes do not share memory and incoming requests are assigned to
 187 processes at random.  For instance, to keep a frequently-used file
 188 available in memory the server must keep a file copy per child, which
 189 wastes memory.  When the file is modified all the children need to be
 190 notified, which is complex (the APIs don't provide a way to do it).<p>
 191
 192 FastCGI is designed to allow effective in-memory caching.  Requests
 193 are routed from any child process to a FastCGI application server.
 194 The FastCGI application process maintains an in-memory cache.<p>
 195
 196 In some cases a single FastCGI application server won't
 197 provide enough performance.  FastCGI provides two solutions:
 198 session affinity and multi-threading.<p>
 199
 200 With session affinity you run a pool of application processes and the
 201 Web server routes requests to individual processes based on any
 202 information contained in the request.  For instance, the server can
 203 route according to the area of content that's been requested, or
 204 according to the user.  The user might be identified by an
 205 application-specific session identifier, by the user ID contained in
  206 an Open Market SecureLink ticket, by the Basic Authentication user
 207 name, or whatever.  Each process maintains its own cache, and session
 208 affinity ensures that each incoming request has access to the cache
 209 that will speed up processing the most.<p>
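The effect of session affinity on cache hit rate can be illustrated with a
small simulation.  This is a sketch under stated assumptions: the function
names are hypothetical, routing uses a CRC of the user id (one possible
affinity key), and each process cache is unbounded for simplicity.<p>

```python
import random
import zlib

def route_by_affinity(user_id, process_count):
    # The same user id always maps to the same process.
    return zlib.crc32(user_id.encode("utf-8")) % process_count

def count_cache_misses(requests, process_count, route):
    caches = [set() for _ in range(process_count)]  # one cache per process
    misses = 0
    for user_id in requests:
        cache = caches[route(user_id, process_count)]
        if user_id not in cache:
            misses += 1
            cache.add(user_id)
    return misses

rng = random.Random(0)
users = ["user%d" % n for n in range(100)]
requests = users * 10   # each user makes 10 requests

affinity_misses = count_cache_misses(requests, 32, route_by_affinity)
random_misses = count_cache_misses(
    requests, 32, lambda user_id, count: rng.randrange(count))

print(affinity_misses)  # 100: only each user's first request misses
print(random_misses)    # considerably higher with random assignment
```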
 210
 211 With multi-threading you run an application process that is designed
 212 to handle several requests at the same time.  The threads handling
 213 concurrent requests share process memory, so they all have access to
 214 the same cache.  Multi-threaded programming is complex -- concurrency
 215 makes programs difficult to test and debug -- but with FastCGI you can
  216 write single-threaded <i>or</i> multi-threaded applications.<p>
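A minimal sketch of threads sharing one cache follows.  It uses Python
threads and a lock purely for illustration; the class and function names
are hypothetical and are not part of any FastCGI library.<p>

```python
import threading

class SharedCache:
    """One cache shared by every request-handling thread in the process."""

    def __init__(self, load):
        self._load = load
        self._lock = threading.Lock()
        self._entries = {}
        self.loads = 0

    def get(self, key):
        # The lock keeps concurrent requests from corrupting the cache
        # or loading the same entry twice.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = self._load(key)
                self.loads += 1
            return self._entries[key]

def serve(cache, keys, results):
    for key in keys:
        results.append(cache.get(key))

cache = SharedCache(load=lambda key: key.upper())
results = []
threads = [threading.Thread(target=serve, args=(cache, ["a", "b", "c"], results))
           for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(cache.loads)   # 3: each entry is loaded once, then shared
print(len(results))  # 12
```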
 217
 218
 219
 220 <h3><a name = "S4">4. Database Access</a></h3>
 221
 222
 223 Many Web server applications perform database access.  Existing
 224 databases contain a lot of valuable information; Web server
 225 applications allow companies to give wider access to the information.<p>
 226
 227 Access to database management systems, even within a single machine,
 228 is via connection-oriented protocols.  An application "logs in" to a
 229 database, creating a connection, then performs one or more accesses.
 230 Frequently, the cost of creating the database connection is several
 231 times the cost of accessing data over an established connection.<p>
 232
 233 To a first approximation database connections are just another type of
 234 state to be cached in memory by an application, so the discussion of
 235 caching above applies to caching database connections.<p>
 236
 237 But database connections are special in one respect: They are often
 238 the basis for database licensing.  You pay the database vendor
 239 according to the number of concurrent connections the database system
 240 can sustain.  A 100-connection license costs much more than a
 241 5-connection license.  It follows that caching a database connection
  242 per Web server child process is not just wasteful of the system's hardware
 243 resources, it could break your software budget.<p>
 244
 245
 246
 247 <h3><a name = "S5">5. A Performance Test</a></h3>
 248
 249
 250 We designed a test application to illustrate performance issues.  The
 251 application represents a class of applications that deliver
 252 personalized content.  The test application is quite a bit simpler
 253 than any real application would be, but still illustrates the main
 254 performance issues.  We implemented the application using both FastCGI
 255 and a current Web server API, and measured the performance of each.<p>
 256
 257 <h4><a name = "S5.1">5.1 Application Scenario</a></h4>
 258
 259 The application is based on a user database and a set of
 260 content files.  When a user requests a content file, the application
 261 performs substitutions in the file using information from the
 262 user database.  The application then returns the modified
 263 content to the user.<p>
 264
 265 Each request accomplishes the following:<p>
 266
 267 <ol>
 268     <li>authentication check: The user id is used to retrieve and
 269         check the password.<p>
 270
 271     <li>attribute retrieval: The user id is used to retrieve all
 272         of the user's attribute values.<p>
 273
 274     <li>file retrieval and filtering: The request identifies a
 275         content file. This file is read and all occurrences of variable
 276         names are replaced with the user's corresponding attribute values.
 277         The modified HTML is returned to the user.<p>
 278 </ol>
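The three steps above can be sketched as a single request handler.  This
is an illustration only: the data layout, the function name, and the
<code>$name</code> variable syntax (Python's <code>string.Template</code>)
are assumptions, since the paper does not specify them, and the plain-text
password comparison is for brevity.<p>

```python
from string import Template

def handle_request(users, content_files, user_id, password, file_name):
    # 1. Authentication check: retrieve the record and check the password.
    record = users.get(user_id)
    if record is None or record["password"] != password:
        return None
    # 2. Attribute retrieval: all of the user's attribute values.
    attributes = record["attributes"]
    # 3. File retrieval and filtering: replace variable names with values.
    return Template(content_files[file_name]).safe_substitute(attributes)

users = {"alice": {"password": "secret",
                   "attributes": {"name": "Alice", "city": "Boston"}}}
content_files = {"welcome.html": "Hello, $name of $city."}

print(handle_request(users, content_files, "alice", "secret", "welcome.html"))
# Hello, Alice of Boston.
```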
 279
 280 Of course, it is fair game to perform caching to shortcut
 281 any of these steps.<p>
 282
 283 Each user's database record (including password and attribute
 284 values) is approximately 100 bytes long.  Each content file is 3,000
 285 bytes long.  Both database and content files are stored
 286 on disks attached to the server platform.<p>
 287
 288 A typical user makes 10 file accesses with realistic think times
 289 (30-60 seconds) between accesses, then disappears for a long time.<p>
 290
 291
 292 <h4><a name = "S5.2">5.2 Application Design</a></h4>
 293
 294 The FastCGI application maintains a cache of recently-accessed
 295 attribute values from the database.  When the cache misses
 296 the application reads from the database.  Because only a small
 297 number of FastCGI application processes are needed, each process
 298 opens a database connection on startup and keeps it open.<p>
 299
 300 The FastCGI application is configured as multiple application
 301 processes.  This is desirable in order to get concurrent application
 302 processing during database reads and file reads.  Requests are routed
 303 to these application processes using FastCGI session affinity keyed on
  304 the user id.  This way all of a user's requests after the first one hit in
  305 the application's cache.<p>
 306
 307 The API application does not maintain a cache; the API application has
 308 no way to share the cache among its processes, so the cache hit rate
 309 would be too low to make caching pay.  The API application opens and
 310 closes a database connection on every request; keeping database
 311 connections open between requests would result in an unrealistically
 312 large number of database connections open at the same time, and very
 313 low utilization of each connection.<p>
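The two designs can be contrasted with a small sketch.  The stub database
below is hypothetical and only counts logins; it stands in for a real
database so that the difference in connection handling is visible.<p>

```python
class StubDatabase:
    """Hypothetical stand-in for a database; counts logins."""

    def __init__(self, records):
        self.records = records
        self.logins = 0

    def connect(self):
        self.logins += 1
        return StubConnection(self)

class StubConnection:
    def __init__(self, database):
        self.database = database
        self.is_open = True

    def fetch(self, user_id):
        if not self.is_open:
            raise RuntimeError("connection is closed")
        return self.database.records[user_id]

    def close(self):
        self.is_open = False

class FastCgiStyleProcess:
    """Connects once at startup and caches attribute values."""

    def __init__(self, database):
        self.connection = database.connect()
        self.cache = {}

    def handle(self, user_id):
        if user_id not in self.cache:   # cache miss: read from the database
            self.cache[user_id] = self.connection.fetch(user_id)
        return self.cache[user_id]

def api_style_handle(database, user_id):
    """Opens and closes a connection on every request; no cache."""
    connection = database.connect()
    try:
        return connection.fetch(user_id)
    finally:
        connection.close()

fastcgi_db = StubDatabase({"alice": {"city": "Boston"}})
process = FastCgiStyleProcess(fastcgi_db)
for _ in range(10):
    process.handle("alice")

api_db = StubDatabase({"alice": {"city": "Boston"}})
for _ in range(10):
    api_style_handle(api_db, "alice")

print(fastcgi_db.logins)  # 1
print(api_db.logins)      # 10
```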
 314
 315
 316 <h4><a name = "S5.3">5.3 Test Conditions</a></h4>
 317
 318 The test load is generated by 10 HTTP client processes.  The processes
 319 represent disjoint sets of users.  A process makes a request for a
 320 user, then a request for a different user, and so on until it is time
 321 for the first user to make another request.<p>
 322
 323 For simplicity the 10 client processes run on the same machine
 324 as the Web server.  This avoids the possibility that a network
 325 bottleneck will obscure the test results.  The database system
 326 also runs on this machine, as specified in the application scenario.<p>
 327
 328 Response time is not an issue under the test conditions.  We just
 329 measure throughput.<p>
 330
  331 The API Web server in these tests is Netscape 1.1.<p>
 332
 333
 334 <h4><a name = "S5.4">5.4 Test Results and Discussion</a></h4>
 335
 336 Here are the test results:<p>
 337
 338 <ul>
 339 <pre>
 340     FastCGI  12.0 msec per request = 83 requests per second
 341     API      36.6 msec per request = 27 requests per second
 342 </pre>
 343 </ul>
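The per-request times in the table above convert to throughput as
follows.  The function name is illustrative; the figures are the measured
values from the table.<p>

```python
def requests_per_second(msec_per_request):
    return 1000.0 / msec_per_request

fastcgi_msec = 12.0
api_msec = 36.6

print(int(requests_per_second(fastcgi_msec)))  # 83
print(int(requests_per_second(api_msec)))      # 27

# Ratio of per-request times, i.e. how much faster FastCGI ran.
speedup = api_msec / fastcgi_msec
print(round(speedup, 2))                       # 3.05
```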
 344
 345 Given the big architectural advantage that the FastCGI application
 346 enjoys over the API application, it is not surprising that the
 347 FastCGI application runs a lot faster.  To gain a deeper
 348 understanding of these results we measured two more conditions:<p>
 349
 350 <ul>
 351     <li>API with sustained database connections.  If you could
 352         afford the extra licensing cost, how much faster would
 353         your API application run?<p>
 354
 355 <pre>
 356     API      16.0 msec per request = 61 requests per second
 357 </pre>
 358
 359         Answer: Still not as fast as the FastCGI application.<p>
 360
 361     <li>FastCGI with cache disabled.  How much benefit does the
 362         FastCGI application get from its cache?<p>
 363
 364 <pre>
 365     FastCGI  20.1 msec per request = 50 requests per second
 366 </pre>
 367
 368         Answer: A very substantial benefit, even though the database
 369         access is quite simple.<p>
 370 </ul>
 371
 372 What these two extra experiments show is that if the API and FastCGI
 373 applications are implemented in exactly the same way -- caching
 374 database connections but not caching user profile data -- the API
 375 application is slightly faster.  This is what you'd expect, since the
 376 FastCGI application has to pay the cost of inter-process
 377 communication not present in the API application.<p>
 378
 379 In the real world the two applications would not be implemented in the
 380 same way.  FastCGI's architectural advantage results in much higher
 381 performance -- a factor of 3 in this test.  With a remote database
 382 or more expensive database access the factor would be higher.
 383 With more substantial processing of the content files the factor
 384 would be smaller.<p>
 385
 386
 387
 388 <h3><a name = "S6">6. Multi-threaded APIs</a></h3>
 389
 390
 391 Web servers with a multi-threaded internal structure (and APIs to
 392 match) are now starting to become more common.  These servers don't
 393 have all of the disadvantages described in Section 3.  Does this mean
 394 that FastCGI's performance advantages will disappear?<p>
 395
 396 A superficial analysis says yes.  An API-based application in a
 397 single-process, multi-threaded server can maintain caches and database
 398 connections the same way a FastCGI application can.  The API-based
 399 application does not pay for inter-process communication, so the
 400 API-based application will be slightly faster than the FastCGI
 401 application.<p>
 402
 403 A deeper analysis says no.  Multi-threaded programming is complex,
 404 because concurrency makes programs much more difficult to test and
 405 debug.  In the case of multi-threaded programming to Web server APIs,
 406 the normal problems with multi-threading are compounded by the lack of
 407 isolation between different applications and between the applications
 408 and the Web server.  With FastCGI you can write programs in the
 409 familiar single-threaded style, get all the reliability and
 410 maintainability of process isolation, and still get very high
 411 performance.  If you truly need multi-threading, you can write
 412 multi-threaded FastCGI and still isolate your multi-threaded
 413 application from other applications and from the server.  In short,
  414 multi-threading makes Web server APIs unusable for practically all
 415 applications, reducing the choice to FastCGI versus CGI.  The
 416 performance winner in that contest is obviously FastCGI.<p>
 417
 418
 419
 420 <h3><a name = "S7">7. Conclusion</a></h3>
 421
 422
 423 Just how fast is FastCGI?  The answer: very fast indeed.  Not because
 424 it has some specially-greased path through the operating system, but
 425 because its design is well matched to the needs of most applications.
 426 We invite you to make FastCGI the fast, open foundation for your Web
 427 server applications.<p>
 428
 429
 430
 431 <hr>
 432 <a href="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></a>
 433
 434 <address>
  435 &#169; 1995, Open Market, Inc. / mbrown@openmarket.com
 436 </address>
 437
 438 </body>
 439 </html>