doc/fcgi-perf.gut

   1 Understanding FastCGI Application Performance
   2 /fastcgi/words
   3 fcgi-hd.gif
   4 [FastCGI]
   5 <center>Understanding FastCGI Application Performance</center>
   6
   7 <!--Copyright (c) 1996 Open Market, Inc.                                    -->
   8 <!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
   9 <!--of this file, and for a DISCLAIMER OF ALL WARRANTIES.                   -->
  10
  11 <center>
  12 Mark R. Brown<br>
  13 Open Market, Inc.<br>
  14 <p>
  15
  16 10 June 1996<br>
  17 </center>
  18 <p>
  19
  20 <h5 align=center>
  21 Copyright &copy; 1996 Open Market, Inc.  245 First Street, Cambridge,
  22   MA 02142 U.S.A.<br>
  23 Tel: 617-621-9500 Fax: 617-621-1703 URL:
  24   <a href="http://www.openmarket.com/">http://www.openmarket.com/</a><br>
  25 $Id: fcgi-perf.gut,v 1.1 1997/09/16 15:36:26 stanleyg Exp $ <br>
  26 </h5>
  27 <hr>
  28
  29 <ul type=square>
  30     <li><a HREF = "#S1">1. Introduction</a>
  31     <li><a HREF = "#S2">2. Performance Basics</a>
  32     <li><a HREF = "#S3">3. Caching</a>
  33     <li><a HREF = "#S4">4. Database Access</a>
  34     <li><a HREF = "#S5">5. A Performance Test</a>
  35     <ul type=square>
  36         <li><a HREF = "#S5.1">5.1 Application Scenario</a>
  37         <li><a HREF = "#S5.2">5.2 Application Design</a>
  38         <li><a HREF = "#S5.3">5.3 Test Conditions</a>
  39         <li><a HREF = "#S5.4">5.4 Test Results and Discussion</a>
  40     </ul>
  41     <li><a HREF = "#S6">6. Multi-threaded APIs</a>
  42     <li><a HREF = "#S7">7. Conclusion</a>
  43 </ul>
  44 <p>
  45
  46 <hr>
  47
  48
  49 <h3><a name = "S1">1. Introduction</a></h3>
  50
  51
  52 Just how fast is FastCGI?  How does the performance of a FastCGI
  53 application compare with the performance of the same
  54 application implemented using a Web server API?<p>
  55
  56 Of course, the answer is that it depends upon the application.
  57 A more complete answer is that FastCGI often wins by a significant
  58 margin, and seldom loses by very much.<p>
  59
  60 Papers on computer system performance can be laden with complex graphs
  61 showing how this varies with that.  Seldom do the graphs shed much
  62 light on <i>why</i> one system is faster than another.  Advertising copy is
  63 often even less informative.  An ad from one large Web server vendor
  64 says that its server "executes web applications up to five times
  65 faster than all other servers," but the ad gives little clue where the
  66 number "five" came from.<p>
  67
  68 This paper is meant to convey an understanding of the primary factors
  69 that influence the performance of Web server applications and to show
  70 that architectural differences between FastCGI and server APIs often
  71 give an "unfair" performance advantage to FastCGI applications.  We
  72 run a test that shows a FastCGI application running three times faster
  73 than the corresponding Web server API application.  Under different
  74 conditions this factor might be larger or smaller.  We show you what
  75 you'd need to measure to figure that out for the situation you face,
  76 rather than just saying "we're three times faster" and moving on.<p>
  77
  78 This paper makes no attempt to prove that FastCGI is better than Web
  79 server APIs for every application.  Web server APIs enable lightweight
  80 protocol extensions, such as Open Market's SecureLink extension, to be
  81 added to Web servers, as well as allowing other forms of server
  82 customization.  But APIs are not well matched to mainstream applications
  83 such as personalized content or access to corporate databases, because
  84 of API drawbacks including high complexity, low security, and
  85 limited scalability.  FastCGI shines when used for the vast
  86 majority of Web applications.<p>
  87
  88
  89
  90 <h3><a name = "S2">2. Performance Basics</a></h3>
  91
  92
  93 Since this paper is about performance we need to be clear on
  94 what "performance" is.<p>
  95
  96 The standard way to measure performance in a request-response system
  97 like the Web is to measure peak request throughput subject to a
   98 response time constraint.  For instance, a Web server application
  99 might be capable of performing 20 requests per second while responding
 100 to 90% of the requests in less than 2 seconds.<p>
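As a rough illustration of the measurement just described, the following
Python sketch checks whether one measured run meets a response time
constraint and reports its throughput.  The function name, the sample
latencies, and the default 90% / 2 second thresholds are illustrative
assumptions, not part of any test harness described in this paper.<p>

```python
def meets_constraint(latencies_sec, duration_sec, percentile=0.90, limit_sec=2.0):
    """Return (throughput, ok) for one measured run.

    throughput is completed requests per second; ok is True when at least
    `percentile` of the requests finished in under `limit_sec` seconds.
    """
    if not latencies_sec or duration_sec <= 0:
        raise ValueError("need at least one request and a positive duration")
    throughput = len(latencies_sec) / duration_sec
    fast_enough = sum(1 for t in latencies_sec if t < limit_sec)
    ok = fast_enough / len(latencies_sec) >= percentile
    return throughput, ok


# Hypothetical run: 20 requests completed in 1 second, two of them slow.
sample = [0.5] * 18 + [3.0] * 2
print(meets_constraint(sample, duration_sec=1.0))  # (20.0, True)
```

Peak throughput under the constraint would then be the highest offered
load for which the second value is still true.<p>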
 101
 102 Response time is a thorny thing to measure on the Web because client
 103 communications links to the Internet have widely varying bandwidth.
 104 If the client is slow to read the server's response, response time at
 105 both the client and the server will go up, and there's nothing the
 106 server can do about it.  For the purposes of making repeatable
 107 measurements the client should have a high-bandwidth communications
 108 link to the server.<p>
 109
 110 [Footnote: When designing a Web server application that will be
 111 accessed over slow (e.g. 14.4 or even 28.8 kilobit/second modem)
 112 channels, pay attention to the simultaneous connections bottleneck.
 113 Some servers are limited by design to only 100 or 200 simultaneous
 114 connections.  If your application sends 50 kilobytes of data to a
 115 typical client that can read 2 kilobytes per second, then a request
 116 takes 25 seconds to complete.  If your server is limited to 100
 117 simultaneous connections, throughput is limited to just 4 requests per
 118 second.]<p>
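The arithmetic in the footnote can be written out as a small Python
sketch.  The function and parameter names are hypothetical; the numbers
in the usage line are the ones given in the footnote.<p>

```python
def connection_limited_throughput(max_connections, response_kbytes, client_kbytes_per_sec):
    """Upper bound on requests per second when each connection is held
    open until a slow client has read the whole response."""
    seconds_per_request = response_kbytes / client_kbytes_per_sec
    return max_connections / seconds_per_request


# Numbers from the footnote: 100 connections, 50 KB responses, 2 KB/s clients.
print(connection_limited_throughput(100, 50, 2))  # 4.0
```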
 119
 120 Response time is seldom an issue when load is light, but response
 121 times rise quickly as the system approaches a bottleneck on some
 122 limited resource.  The three resources that typical systems run out of
 123 are network I/O, disk I/O, and processor time.  If short response time
 124 is a goal, it is a good idea to stay at or below 50% load on each of
 125 these resources.  For instance, if your disk subsystem is capable of
 126 delivering 200 I/Os per second, then try to run your application at
 127 100 I/Os per second to avoid having the disk subsystem contribute to
 128 slow response times.  Through careful management it is possible to
 129 succeed in running closer to the edge, but careful management is both
 130 difficult and expensive so few systems get it.<p>
 131
 132 If a Web server application is local to the Web server machine, then
 133 its internal design has no impact on network I/O.  Application design
 134 can have a big impact on usage of disk I/O and processor time.<p>
 135
 136
 137
 138 <h3><a name = "S3">3. Caching</a></h3>
 139
 140
 141 It is a rare Web server application that doesn't run fast when all the
 142 information it needs is available in its memory.  And if the
 143 application doesn't run fast under those conditions, the possible
 144 solutions are evident: Tune the processor-hungry parts of the
 145 application, install a faster processor, or change the application's
 146 functional specification so it doesn't need to do so much work.<p>
 147
 148 The way to make information available in memory is by caching.  A
 149 cache is an in-memory data structure that contains information that's
 150 been read from its permanent home on disk.  When the application needs
 151 information, it consults the cache, and uses the information if it is
  152 there.  Otherwise it reads the information from disk and places a copy
 153 in the cache.  If the cache is full, the application discards some old
 154 information before adding the new.  When the application needs to
 155 change cached information, it changes both the cache entry and the
 156 information on disk.  That way, if the application crashes, no
  157 information is lost; the application just runs more slowly for a while
 158 after restarting, because the cache doesn't improve performance
 159 when it is empty.<p>
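The cache behavior described above might be sketched in Python as
follows.  This is a minimal sketch under stated assumptions: a plain
dictionary stands in for the information on disk, and "discards some old
information" is interpreted here as least-recently-used eviction, which
the text does not specify.  The class and attribute names are
hypothetical.<p>

```python
from collections import OrderedDict


class WriteThroughCache:
    """Bounded in-memory cache in front of a slower permanent store."""

    def __init__(self, store, capacity):
        self.store = store            # stands in for the data on disk
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry first
        self.misses = 0

    def _remember(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # discard the oldest entry

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1
        value = self.store[key]       # the slow path: read from "disk"
        self._remember(key, value)
        return value

    def write(self, key, value):
        self.store[key] = value       # update the permanent copy first
        self._remember(key, value)    # then the cached copy


disk = {"alice": "gold", "bob": "silver", "carol": "bronze"}
cache = WriteThroughCache(disk, capacity=2)
cache.read("alice")
cache.read("alice")   # served from memory
cache.read("bob")
cache.read("carol")   # evicts "alice"
cache.write("bob", "platinum")
print(cache.misses, disk["bob"], list(cache.entries))  # 3 platinum ['carol', 'bob']
```

Because every write reaches the store before the cache, discarding the
cache (for example in a crash) loses nothing; only the next reads are
slower.<p>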
 160
 161 Caching can reduce both disk I/O and processor time, because reading
 162 information from disk uses more processor time than reading it from
 163 the cache.  Because caching addresses both of the potential
 164 bottlenecks, it is the focal point of high-performance Web server
 165 application design.  CGI applications couldn't perform in-memory
 166 caching, because they exited after processing just one request.  Web
 167 server APIs promised to solve this problem.  But how effective is the
 168 solution?<p>
 169
 170 Today's most widely deployed Web server APIs are based on a
 171 pool-of-processes server model.  The Web server consists of a parent
 172 process and a pool of child processes.  Processes do not share memory.
 173 An incoming request is assigned to an idle child at random.  The child
 174 runs the request to completion before accepting a new request.  A
  175 typical server has 32 child processes; a large server has 100 or 200.<p>
 176
 177 In-memory caching works very poorly in this server model because
 178 processes do not share memory and incoming requests are assigned to
 179 processes at random.  For instance, to keep a frequently-used file
 180 available in memory the server must keep a file copy per child, which
 181 wastes memory.  When the file is modified all the children need to be
 182 notified, which is complex (the APIs don't provide a way to do it).<p>
 183
 184 FastCGI is designed to allow effective in-memory caching.  Requests
 185 are routed from any child process to a FastCGI application server.
 186 The FastCGI application process maintains an in-memory cache.<p>
 187
 188 In some cases a single FastCGI application server won't
 189 provide enough performance.  FastCGI provides two solutions:
 190 session affinity and multi-threading.<p>
 191
 192 With session affinity you run a pool of application processes and the
 193 Web server routes requests to individual processes based on any
 194 information contained in the request.  For instance, the server can
 195 route according to the area of content that's been requested, or
 196 according to the user.  The user might be identified by an
 197 application-specific session identifier, by the user ID contained in
  198 an Open Market SecureLink ticket, by the Basic Authentication user
 199 name, or whatever.  Each process maintains its own cache, and session
 200 affinity ensures that each incoming request has access to the cache
 201 that will speed up processing the most.<p>
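One way to picture session affinity is as a stable mapping from some
key in the request to a process in the pool.  The Python sketch below
is only an illustration of that idea; it is not how any particular Web
server implements routing, and the function name and the choice of a
CRC-32 hash are assumptions.<p>

```python
import zlib


def pick_process(affinity_key, process_count):
    """Map a request's affinity key (for example a user id) to one of
    `process_count` application processes, always the same one."""
    digest = zlib.crc32(affinity_key.encode("utf-8"))
    return digest % process_count


# Every request from the same user lands on the same process.
for user in ["alice", "bob", "alice", "carol", "alice"]:
    print(user, "->", pick_process(user, process_count=4))
```

Since a given user always reaches the same process, that process's
cache is the one most likely to hold the user's data.<p>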
 202
 203 With multi-threading you run an application process that is designed
 204 to handle several requests at the same time.  The threads handling
 205 concurrent requests share process memory, so they all have access to
 206 the same cache.  Multi-threaded programming is complex -- concurrency
 207 makes programs difficult to test and debug -- but with FastCGI you can
  208 write single-threaded <i>or</i> multi-threaded applications.<p>
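The shared-cache idea can be sketched with Python threads.  This is a
minimal sketch, not FastCGI library code: the class name, the loader
function, and the decision to hold one lock during loads (which
serializes cache misses) are all simplifying assumptions.<p>

```python
import threading


class SharedCache:
    """One cache shared by all request-handling threads in a process."""

    def __init__(self, load):
        self._load = load              # slow lookup used on a miss
        self._entries = {}
        self._lock = threading.Lock()  # guards _entries and loads
        self.loads = 0

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                self.loads += 1
                self._entries[key] = self._load(key)
            return self._entries[key]


cache = SharedCache(load=lambda user: "profile of " + user)


def handle_request(user, results, index):
    results[index] = cache.get(user)


results = [None] * 8
threads = [
    threading.Thread(target=handle_request, args=("alice", results, i))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.loads, results[0])  # 1 profile of alice
```

Eight concurrent requests cause a single load; the lock is the kind of
extra care that makes multi-threaded code harder to test and debug.<p>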
 209
 210
 211
 212 <h3><a name = "S4">4. Database Access</a></h3>
 213
 214
 215 Many Web server applications perform database access.  Existing
 216 databases contain a lot of valuable information; Web server
 217 applications allow companies to give wider access to the information.<p>
 218
 219 Access to database management systems, even within a single machine,
 220 is via connection-oriented protocols.  An application "logs in" to a
 221 database, creating a connection, then performs one or more accesses.
 222 Frequently, the cost of creating the database connection is several
 223 times the cost of accessing data over an established connection.<p>
 224
 225 To a first approximation database connections are just another type of
 226 state to be cached in memory by an application, so the discussion of
 227 caching above applies to caching database connections.<p>
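A long-lived application process can treat its database connection as
cached state.  The Python sketch below uses the standard sqlite3 module
purely as a stand-in for a database management system; the table
layout, the class name, and the counting connection factory are
hypothetical.<p>

```python
import sqlite3


class ProfileService:
    """Long-lived application process that opens its database
    connection once and reuses it for every request."""

    def __init__(self, connect):
        self.connection = connect()   # paid once, at startup

    def password_for(self, user_id):
        row = self.connection.execute(
            "SELECT password FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


connect_calls = []


def connect():
    """Hypothetical connection factory; records each call."""
    connect_calls.append(1)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, password TEXT)")
    db.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return db


service = ProfileService(connect)
for _ in range(100):
    service.password_for("alice")
print(len(connect_calls))  # 1
```

One hundred requests share one connection; a design that connected per
request would pay the connection cost one hundred times.<p>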
 228
 229 But database connections are special in one respect: They are often
 230 the basis for database licensing.  You pay the database vendor
 231 according to the number of concurrent connections the database system
 232 can sustain.  A 100-connection license costs much more than a
 233 5-connection license.  It follows that caching a database connection
  234 per Web server child process is not just wasteful of the system's hardware
 235 resources, it could break your software budget.<p>
 236
 237
 238
 239 <h3><a name = "S5">5. A Performance Test</a></h3>
 240
 241
 242 We designed a test application to illustrate performance issues.  The
 243 application represents a class of applications that deliver
 244 personalized content.  The test application is quite a bit simpler
 245 than any real application would be, but still illustrates the main
 246 performance issues.  We implemented the application using both FastCGI
 247 and a current Web server API, and measured the performance of each.<p>
 248
 249 <h4><a name = "S5.1">5.1 Application Scenario</a></h4>
 250
 251 The application is based on a user database and a set of
 252 content files.  When a user requests a content file, the application
 253 performs substitutions in the file using information from the
 254 user database.  The application then returns the modified
 255 content to the user.<p>
 256
 257 Each request accomplishes the following:<p>
 258
 259 <ol>
 260     <li>authentication check: The user id is used to retrieve and
 261         check the password.<p>
 262
 263     <li>attribute retrieval: The user id is used to retrieve all
 264         of the user's attribute values.<p>
 265
 266     <li>file retrieval and filtering: The request identifies a
 267         content file. This file is read and all occurrences of variable
 268         names are replaced with the user's corresponding attribute values.
 269         The modified HTML is returned to the user.<p>
 270 </ol>
 271
 272 Of course, it is fair game to perform caching to shortcut
 273 any of these steps.<p>
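The three steps of a request can be sketched in Python as follows.
This is not the test application's code: the in-memory dictionaries
stand in for the user database and the content files, the
<code>$name</code> variable syntax is an assumption (the paper does not
say how variables are written), and the numeric status values are
illustrative.<p>

```python
from string import Template

# Hypothetical stand-ins for the user database and a content file.
USERS = {
    "alice": {"password": "s3cret", "attributes": {"name": "Alice", "city": "Boston"}},
}
CONTENT = {
    "welcome.html": "<p>Hello $name, here is the news for $city.</p>",
}


def handle_request(user_id, password, file_name):
    record = USERS.get(user_id)
    # 1. Authentication check.
    if record is None or record["password"] != password:
        return 401, "authentication failed"
    # 2. Attribute retrieval.
    attributes = record["attributes"]
    # 3. File retrieval and filtering.
    text = CONTENT.get(file_name)
    if text is None:
        return 404, "no such content file"
    return 200, Template(text).safe_substitute(attributes)


print(handle_request("alice", "s3cret", "welcome.html"))
# (200, '<p>Hello Alice, here is the news for Boston.</p>')
```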
 274
 275 Each user's database record (including password and attribute
 276 values) is approximately 100 bytes long.  Each content file is 3,000
 277 bytes long.  Both database and content files are stored
 278 on disks attached to the server platform.<p>
 279
 280 A typical user makes 10 file accesses with realistic think times
 281 (30-60 seconds) between accesses, then disappears for a long time.<p>
 282
 283
 284 <h4><a name = "S5.2">5.2 Application Design</a></h4>
 285
 286 The FastCGI application maintains a cache of recently-accessed
 287 attribute values from the database.  When the cache misses
 288 the application reads from the database.  Because only a small
 289 number of FastCGI application processes are needed, each process
 290 opens a database connection on startup and keeps it open.<p>
 291
 292 The FastCGI application is configured as multiple application
 293 processes.  This is desirable in order to get concurrent application
 294 processing during database reads and file reads.  Requests are routed
 295 to these application processes using FastCGI session affinity keyed on
  296 the user id.  This way all of a user's requests after the first one hit
  297 the application's cache.<p>
 298
 299 The API application does not maintain a cache; the API application has
 300 no way to share the cache among its processes, so the cache hit rate
 301 would be too low to make caching pay.  The API application opens and
 302 closes a database connection on every request; keeping database
 303 connections open between requests would result in an unrealistically
 304 large number of database connections open at the same time, and very
 305 low utilization of each connection.<p>
 306
 307
 308 <h4><a name = "S5.3">5.3 Test Conditions</a></h4>
 309
 310 The test load is generated by 10 HTTP client processes.  The processes
 311 represent disjoint sets of users.  A process makes a request for a
 312 user, then a request for a different user, and so on until it is time
 313 for the first user to make another request.<p>
 314
 315 For simplicity the 10 client processes run on the same machine
 316 as the Web server.  This avoids the possibility that a network
 317 bottleneck will obscure the test results.  The database system
 318 also runs on this machine, as specified in the application scenario.<p>
 319
 320 Response time is not an issue under the test conditions.  We just
 321 measure throughput.<p>
 322
  323 The API Web server in these tests is Netscape 1.1.<p>
 324
 325
 326 <h4><a name = "S5.4">5.4 Test Results and Discussion</a></h4>
 327
 328 Here are the test results:<p>
 329
 330 <ul>
 331 <pre>
 332     FastCGI  12.0 msec per request = 83 requests per second
 333     API      36.6 msec per request = 27 requests per second
 334 </pre>
 335 </ul>
 336
 337 Given the big architectural advantage that the FastCGI application
 338 enjoys over the API application, it is not surprising that the
 339 FastCGI application runs a lot faster.  To gain a deeper
 340 understanding of these results we measured two more conditions:<p>
 341
 342 <ul>
 343     <li>API with sustained database connections.  If you could
 344         afford the extra licensing cost, how much faster would
 345         your API application run?<p>
 346
 347 <pre>
 348     API      16.0 msec per request = 61 requests per second
 349 </pre>
 350
 351         Answer: Still not as fast as the FastCGI application.<p>
 352
 353     <li>FastCGI with cache disabled.  How much benefit does the
 354         FastCGI application get from its cache?<p>
 355
 356 <pre>
 357     FastCGI  20.1 msec per request = 50 requests per second
 358 </pre>
 359
 360         Answer: A very substantial benefit, even though the database
 361         access is quite simple.<p>
 362 </ul>
 363
 364 What these two extra experiments show is that if the API and FastCGI
 365 applications are implemented in exactly the same way -- caching
 366 database connections but not caching user profile data -- the API
 367 application is slightly faster.  This is what you'd expect, since the
 368 FastCGI application has to pay the cost of inter-process
 369 communication not present in the API application.<p>
 370
 371 In the real world the two applications would not be implemented in the
 372 same way.  FastCGI's architectural advantage results in much higher
 373 performance -- a factor of 3 in this test.  With a remote database
 374 or more expensive database access the factor would be higher.
 375 With more substantial processing of the content files the factor
 376 would be smaller.<p>
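The relationship between the two headline figures can be checked with a
few lines of Python.  The per-request times are the ones from the first
results table; the function and variable names are hypothetical.<p>

```python
def requests_per_second(msec_per_request):
    """Convert a per-request time in milliseconds to a throughput."""
    return 1000.0 / msec_per_request


fastcgi_msec = 12.0   # from the results table
api_msec = 36.6       # from the results table

print(round(requests_per_second(fastcgi_msec)))   # 83
print(round(requests_per_second(api_msec)))       # 27
print(round(api_msec / fastcgi_msec, 2))          # 3.05
```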
 377
 378
 379
 380 <h3><a name = "S6">6. Multi-threaded APIs</a></h3>
 381
 382
 383 Web servers with a multi-threaded internal structure (and APIs to
 384 match) are now starting to become more common.  These servers don't
 385 have all of the disadvantages described in Section 3.  Does this mean
 386 that FastCGI's performance advantages will disappear?<p>
 387
 388 A superficial analysis says yes.  An API-based application in a
 389 single-process, multi-threaded server can maintain caches and database
 390 connections the same way a FastCGI application can.  The API-based
 391 application does not pay for inter-process communication, so the
 392 API-based application will be slightly faster than the FastCGI
 393 application.<p>
 394
 395 A deeper analysis says no.  Multi-threaded programming is complex,
 396 because concurrency makes programs much more difficult to test and
 397 debug.  In the case of multi-threaded programming to Web server APIs,
 398 the normal problems with multi-threading are compounded by the lack of
 399 isolation between different applications and between the applications
 400 and the Web server.  With FastCGI you can write programs in the
 401 familiar single-threaded style, get all the reliability and
 402 maintainability of process isolation, and still get very high
 403 performance.  If you truly need multi-threading, you can write
 404 multi-threaded FastCGI and still isolate your multi-threaded
 405 application from other applications and from the server.  In short,
  406 multi-threading makes Web server APIs unusable for practically all
 407 applications, reducing the choice to FastCGI versus CGI.  The
 408 performance winner in that contest is obviously FastCGI.<p>
 409
 410
 411
 412 <h3><a name = "S7">7. Conclusion</a></h3>
 413
 414
 415 Just how fast is FastCGI?  The answer: very fast indeed.  Not because
 416 it has some specially-greased path through the operating system, but
 417 because its design is well matched to the needs of most applications.
 418 We invite you to make FastCGI the fast, open foundation for your Web
 419 server applications.<p>
 420
 421
 422
 423 <hr>
 424 <a href="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></a>
 425
 426 <address>
  427 &#169; 1995, Open Market, Inc. / mbrown@openmarket.com
 428 </address>
 429
 430 </body>
 431 </html>