Commit | Line | Data |
e88ae2ce |
1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">\r |
2 | <HTML>\r |
3 | <HEAD>\r |
4 | <TITLE>\r |
5 | Understanding FastCGI Application Performance\r |
6 | </TITLE>\r |
7 | <STYLE TYPE="text/css">\r |
8 | body {\r |
9 | background-color: #FFFFFF;\r |
10 | color: #000000;\r |
11 | }\r |
12 | :link { color: #cc0000 }\r |
13 | :visited { color: #555555 }\r |
14 | :active { color: #000011 }\r |
15 | div.c3 {margin-left: 2em}\r |
16 | h5.c2 {text-align: center}\r |
17 | div.c1 {text-align: center}\r |
18 | </STYLE>\r |
19 | </HEAD>\r |
20 | <BODY>\r |
21 | <DIV CLASS="c1">\r |
22 | <A HREF="http://fastcgi.com"><IMG BORDER="0" SRC="../images/fcgi-hd.gif" ALT="[[FastCGI]]"></A>\r |
23 | </DIV>\r |
24 | <BR CLEAR="all">\r |
25 | <DIV CLASS="c1">\r |
26 | <H3>\r |
27 | Understanding FastCGI Application Performance\r |
28 | </H3>\r |
29 | </DIV>\r |
30 | <!--Copyright (c) 1996 Open Market, Inc. -->\r |
31 | <!--See the file "LICENSE.TERMS" for information on usage and redistribution-->\r |
32 | <!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. -->\r |
33 | <DIV CLASS="c1">\r |
34 | Mark R. Brown<BR>\r |
35 | Open Market, Inc.<BR>\r |
36 | <P>\r |
37 | 10 June 1996<BR>\r |
38 | </P>\r |
39 | </DIV>\r |
40 | <P>\r |
41 | </P>\r |
42 | <H5 CLASS="c2">\r |
43 | Copyright © 1996 Open Market, Inc. 245 First Street, Cambridge, MA 02142 U.S.A.<BR>\r |
44 | Tel: 617-621-9500 Fax: 617-621-1703 URL: <A HREF=\r |
45 | "http://www.openmarket.com/">http://www.openmarket.com/</A><BR>\r |
46 | $Id: fcgi-perf.htm,v 1.3 2001/11/27 01:03:47 robs Exp $<BR>\r |
47 | </H5>\r |
48 | <HR>\r |
49 | <UL TYPE="square">\r |
50 | <LI>\r |
51 | <A HREF="#S1">1. Introduction</A>\r |
52 | </LI>\r |
53 | <LI>\r |
54 | <A HREF="#S2">2. Performance Basics</A>\r |
55 | </LI>\r |
56 | <LI>\r |
57 | <A HREF="#S3">3. Caching</A>\r |
58 | </LI>\r |
59 | <LI>\r |
60 | <A HREF="#S4">4. Database Access</A>\r |
61 | </LI>\r |
62 | <LI>\r |
63 | <A HREF="#S5">5. A Performance Test</A> \r |
64 | <UL TYPE="square">\r |
65 | <LI>\r |
66 | <A HREF="#S5.1">5.1 Application Scenario</A>\r |
67 | </LI>\r |
68 | <LI>\r |
69 | <A HREF="#S5.2">5.2 Application Design</A>\r |
70 | </LI>\r |
71 | <LI>\r |
72 | <A HREF="#S5.3">5.3 Test Conditions</A>\r |
73 | </LI>\r |
74 | <LI>\r |
75 | <A HREF="#S5.4">5.4 Test Results and Discussion</A>\r |
76 | </LI>\r |
77 | </UL>\r |
78 | </LI>\r |
79 | <LI>\r |
80 | <A HREF="#S6">6. Multi-threaded APIs</A>\r |
81 | </LI>\r |
82 | <LI>\r |
83 | <A HREF="#S7">7. Conclusion</A>\r |
84 | </LI>\r |
85 | </UL>\r |
86 | <P>\r |
87 | </P>\r |
88 | <HR>\r |
89 | <H3>\r |
90 | <A NAME="S1">1. Introduction</A>\r |
91 | </H3>\r |
92 | <P>\r |
93 | Just how fast is FastCGI? How does the performance of a FastCGI application compare with the performance of\r |
94 | the same application implemented using a Web server API?\r |
95 | </P>\r |
96 | <P>\r |
97 | Of course, the answer is that it depends upon the application. A more complete answer is that FastCGI often\r |
98 | wins by a significant margin, and seldom loses by very much.\r |
99 | </P>\r |
100 | <P>\r |
101 | Papers on computer system performance can be laden with complex graphs showing how this varies with that.\r |
102 | Seldom do the graphs shed much light on <I>why</I> one system is faster than another. Advertising copy is\r |
103 | often even less informative. An ad from one large Web server vendor says that its server "executes web\r |
104 | applications up to five times faster than all other servers," but the ad gives little clue where the\r |
105 | number "five" came from.\r |
106 | </P>\r |
107 | <P>\r |
108 | This paper is meant to convey an understanding of the primary factors that influence the performance of Web\r |
109 | server applications and to show that architectural differences between FastCGI and server APIs often give an\r |
110 | "unfair" performance advantage to FastCGI applications. We run a test that shows a FastCGI\r |
111 | application running three times faster than the corresponding Web server API application. Under different\r |
112 | conditions this factor might be larger or smaller. We show you what you'd need to measure to figure that\r |
113 | out for the situation you face, rather than just saying "we're three times faster" and moving\r |
114 | on.\r |
115 | </P>\r |
116 | <P>\r |
117 | This paper makes no attempt to prove that FastCGI is better than Web server APIs for every application. Web\r |
118 | server APIs enable lightweight protocol extensions, such as Open Market's SecureLink extension, to be\r |
119 | added to Web servers, as well as allowing other forms of server customization. But APIs are not well matched\r |
120 | to mainstream applications such as personalized content or access to corporate databases, because of API\r |
121 | drawbacks including high complexity, low security, and limited scalability. FastCGI shines when used for the\r |
122 | vast majority of Web applications.\r |
123 | </P>\r |
124 | <P>\r |
125 | </P>\r |
126 | <H3>\r |
127 | <A NAME="S2">2. Performance Basics</A>\r |
128 | </H3>\r |
129 | <P>\r |
130 | Since this paper is about performance we need to be clear on what "performance" is.\r |
131 | </P>\r |
132 | <P>\r |
133 | The standard way to measure performance in a request-response system like the Web is to measure peak request\r |
134 | throughput subject to a response time constraint. For instance, a Web server application might be capable of\r |
135 | performing 20 requests per second while responding to 90% of the requests in less than 2 seconds.\r |
136 | </P>\r |
137 | <P>\r |
138 | Response time is a thorny thing to measure on the Web because client communications links to the Internet have\r |
139 | widely varying bandwidth. If the client is slow to read the server's response, response time at both the\r |
140 | client and the server will go up, and there's nothing the server can do about it. For the purposes of\r |
141 | making repeatable measurements the client should have a high-bandwidth communications link to the server.\r |
142 | </P>\r |
143 | <P>\r |
144 | [Footnote: When designing a Web server application that will be accessed over slow (e.g. 14.4 or even 28.8\r |
145 | kilobit/second modem) channels, pay attention to the simultaneous connections bottleneck. Some servers are\r |
146 | limited by design to only 100 or 200 simultaneous connections. If your application sends 50 kilobytes of data\r |
147 | to a typical client that can read 2 kilobytes per second, then a request takes 25 seconds to complete. If your\r |
148 | server is limited to 100 simultaneous connections, throughput is limited to just 4 requests per second.]\r |
149 | </P>\r |
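<P>
The footnote's arithmetic can be checked with a short calculation. The sketch below uses Python purely for
illustration. The function name <CODE>max_throughput</CODE> and its parameters are hypothetical, not part of any
FastCGI or Web server API, and the sketch assumes every connection is held open for the full transfer time.
</P>

```python
def max_throughput(max_connections, response_bytes, client_bytes_per_sec):
    """Upper bound on requests/second imposed by a simultaneous-connection limit."""
    # Time one connection stays busy while a slow client reads the response.
    seconds_per_request = response_bytes / client_bytes_per_sec
    # With every connection slot occupied, this many requests finish per second.
    return max_connections / seconds_per_request


# Figures from the footnote: 50 kilobyte response, 2 kilobytes/second client,
# server limited to 100 simultaneous connections.
footnote_rps = max_throughput(100, 50_000, 2_000)
print(footnote_rps)  # 4.0
```

<P>
Doubling the connection limit doubles the bound, while halving the response size has the same effect without
any change to the server.
</P>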
150 | <P>\r |
151 | Response time is seldom an issue when load is light, but response times rise quickly as the system approaches\r |
152 | a bottleneck on some limited resource. The three resources that typical systems run out of are network I/O,\r |
153 | disk I/O, and processor time. If short response time is a goal, it is a good idea to stay at or below 50% load\r |
154 | on each of these resources. For instance, if your disk subsystem is capable of delivering 200 I/Os per second,\r |
155 | then try to run your application at 100 I/Os per second to avoid having the disk subsystem contribute to slow\r |
156 | response times. With careful management it is possible to run closer to the edge, but such\r |
157 | management is both difficult and expensive, so few systems get it.\r |
158 | </P>\r |
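<P>
The paper gives no formula for how response time grows with load. As a rough illustration only, the sketch below
assumes the simplest textbook queueing model (a single server with random arrivals and service times, usually
called M/M/1), in which mean response time is the service time divided by one minus the utilization. The model
and the function name <CODE>mean_response_time</CODE> are assumptions made for this example; real disks and
processors will not follow it exactly.
</P>

```python
def mean_response_time(service_time, utilization):
    """Mean response time under the assumed M/M/1 model."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# A disk subsystem that delivers 200 I/Os per second needs 5 msec per I/O.
service_time = 1.0 / 200

for load in (0.25, 0.50, 0.75, 0.90, 0.95):
    msec = 1000 * mean_response_time(service_time, load)
    print(f"{load:.0%} load: about {msec:.0f} msec per I/O")
```

<P>
Under this model, response time at 50% load is twice the bare service time, and at 90% load it is ten times the
bare service time, which is one way to see why staying near 50% is a comfortable target.
</P>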
159 | <P>\r |
160 | If a Web server application is local to the Web server machine, then its internal design has no impact on\r |
161 | network I/O. Application design can have a big impact on usage of disk I/O and processor time.\r |
162 | </P>\r |
163 | <P>\r |
164 | </P>\r |
165 | <H3>\r |
166 | <A NAME="S3">3. Caching</A>\r |
167 | </H3>\r |
168 | <P>\r |
169 | It is a rare Web server application that doesn't run fast when all the information it needs is available\r |
170 | in its memory. And if the application doesn't run fast under those conditions, the possible solutions are\r |
171 | evident: Tune the processor-hungry parts of the application, install a faster processor, or change the\r |
172 | application's functional specification so it doesn't need to do so much work.\r |
173 | </P>\r |
174 | <P>\r |
175 | The way to make information available in memory is by caching. A cache is an in-memory data structure that\r |
176 | contains information that's been read from its permanent home on disk. When the application needs\r |
177 | information, it consults the cache, and uses the information if it is there. Otherwise it reads the\r |
178 | information from disk and places a copy in the cache. If the cache is full, the application discards some old\r |
179 | information before adding the new. When the application needs to change cached information, it changes both\r |
180 | the cache entry and the information on disk. That way, if the application crashes, no information is lost; the\r |
181 | application just runs more slowly for a while after restarting, because the cache doesn't improve\r |
182 | performance when it is empty.\r |
183 | </P>\r |
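<P>
The cache behavior described above can be sketched in a few lines. This is a minimal illustration in Python, not
code from any FastCGI library. The class name <CODE>WriteThroughCache</CODE> is hypothetical, a plain dictionary
stands in for the permanent home on disk, and discarding the least recently used entry is an assumed policy
(the text only says that some old information is discarded).
</P>

```python
from collections import OrderedDict


class WriteThroughCache:
    """Bounded in-memory cache in front of a slower permanent store."""

    def __init__(self, store, capacity):
        self.store = store            # stands in for the data on disk
        self.capacity = capacity      # maximum number of cached entries
        self.entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)   # mark as recently used
            return self.entries[key]
        self.misses += 1
        value = self.store[key]             # the slow read from "disk"
        self._remember(key, value)
        return value

    def put(self, key, value):
        # Change both the permanent copy and the cache entry, so a crash
        # loses nothing and only empties the cache.
        self.store[key] = value
        self._remember(key, value)

    def _remember(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # discard the oldest entry


disk = {"alice": "red", "bob": "blue", "carol": "green"}
cache = WriteThroughCache(disk, capacity=2)
cache.get("alice")          # miss: read from disk, then cached
cache.get("alice")          # hit
cache.get("bob")            # miss
cache.get("carol")          # miss: cache is full, so "alice" is discarded
cache.put("bob", "black")   # updates both the disk copy and the cache
```

<P>
Because every change goes to the permanent store as well, restarting with an empty cache costs only speed, never
data.
</P>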
184 | <P>\r |
185 | Caching can reduce both disk I/O and processor time, because reading information from disk uses more processor\r |
186 | time than reading it from the cache. Because caching addresses both of the potential bottlenecks, it is the\r |
187 | focal point of high-performance Web server application design. CGI applications couldn't perform in-memory\r |
188 | caching, because they exited after processing just one request. Web server APIs promised to solve this\r |
189 | problem. But how effective is the solution?\r |
190 | </P>\r |
191 | <P>\r |
192 | Today's most widely deployed Web server APIs are based on a pool-of-processes server model. The Web server\r |
193 | consists of a parent process and a pool of child processes. Processes do not share memory. An incoming request\r |
194 | is assigned to an idle child at random. The child runs the request to completion before accepting a new\r |
195 | request. A typical server has 32 child processes; a large server has 100 or 200.\r |
196 | </P>\r |
197 | <P>\r |
198 | In-memory caching works very poorly in this server model because processes do not share memory and incoming\r |
199 | requests are assigned to processes at random. For instance, to keep a frequently-used file available in memory\r |
200 | the server must keep a file copy per child, which wastes memory. When the file is modified all the children\r |
201 | need to be notified, which is complex (the APIs don't provide a way to do it).\r |
202 | </P>\r |
203 | <P>\r |
204 | FastCGI is designed to allow effective in-memory caching. Requests are routed from any child process to a\r |
205 | FastCGI application server. The FastCGI application process maintains an in-memory cache.\r |
206 | </P>\r |
207 | <P>\r |
208 | In some cases a single FastCGI application server won't provide enough performance. FastCGI provides two\r |
209 | solutions: session affinity and multi-threading.\r |
210 | </P>\r |
211 | <P>\r |
212 | With session affinity you run a pool of application processes and the Web server routes requests to individual\r |
213 | processes based on any information contained in the request. For instance, the server can route according to\r |
214 | the area of content that's been requested, or according to the user. The user might be identified by an\r |
215 | application-specific session identifier, by the user ID contained in an Open Market SecureLink ticket, by the\r |
216 | Basic Authentication user name, or whatever. Each process maintains its own cache, and session affinity\r |
217 | ensures that each incoming request has access to the cache that will speed up processing the most.\r |
218 | </P>\r |
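<P>
To see why routing matters, the following sketch compares per-process cache hit rates when requests are assigned
at random (as in the pool-of-processes model) and when they are routed by user. It is a toy simulation in Python
under stated assumptions: the function names are hypothetical, each process keeps an unbounded cache, and a CRC32
hash of the user id stands in for whatever routing rule a real Web server would apply.
</P>

```python
import random
import zlib


def route_by_affinity(user_id, num_processes):
    """Pick a process index from the user id; the same user always gets the same index."""
    return zlib.crc32(user_id.encode("utf-8")) % num_processes


def hit_rate(num_users, requests_per_user, num_processes, use_affinity, seed=0):
    """Fraction of requests that find the user's data already cached."""
    rng = random.Random(seed)                      # seeded for repeatable runs
    caches = [set() for _ in range(num_processes)]  # one private cache per process
    hits = 0
    total = 0
    for _ in range(requests_per_user):
        for n in range(num_users):
            user_id = f"user{n}"
            if use_affinity:
                index = route_by_affinity(user_id, num_processes)
            else:
                index = rng.randrange(num_processes)
            if user_id in caches[index]:
                hits += 1
            else:
                caches[index].add(user_id)          # cache miss: load and remember
            total += 1
    return hits / total


# 1000 users making 10 requests each, as in a short browsing session.
affinity_rate = hit_rate(1000, 10, 4, use_affinity=True)    # small FastCGI pool
random_rate = hit_rate(1000, 10, 32, use_affinity=False)    # 32 server children
print(f"session affinity: {affinity_rate:.0%} hits")
print(f"random assignment: {random_rate:.0%} hits")
```

<P>
With affinity, only each user's first request misses, so nine requests in ten hit the cache. With random
assignment across 32 children, most requests land on a process that has never seen the user.
</P>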
219 | <P>\r |
220 | With multi-threading you run an application process that is designed to handle several requests at the same\r |
221 | time. The threads handling concurrent requests share process memory, so they all have access to the same\r |
222 | cache. Multi-threaded programming is complex -- concurrency makes programs difficult to test and debug -- but\r |
223 | with FastCGI you can write single-threaded <I>or</I> multi-threaded applications.\r |
224 | </P>\r |
225 | <P>\r |
226 | </P>\r |
227 | <H3>\r |
228 | <A NAME="S4">4. Database Access</A>\r |
229 | </H3>\r |
230 | <P>\r |
231 | Many Web server applications perform database access. Existing databases contain a lot of valuable\r |
232 | information; Web server applications allow companies to give wider access to the information.\r |
233 | </P>\r |
234 | <P>\r |
235 | Access to database management systems, even within a single machine, is via connection-oriented protocols. An\r |
236 | application "logs in" to a database, creating a connection, then performs one or more accesses.\r |
237 | Frequently, the cost of creating the database connection is several times the cost of accessing data over an\r |
238 | established connection.\r |
239 | </P>\r |
240 | <P>\r |
241 | To a first approximation database connections are just another type of state to be cached in memory by an\r |
242 | application, so the discussion of caching above applies to caching database connections.\r |
243 | </P>\r |
244 | <P>\r |
245 | But database connections are special in one respect: They are often the basis for database licensing. You pay\r |
246 | the database vendor according to the number of concurrent connections the database system can sustain. A\r |
247 | 100-connection license costs much more than a 5-connection license. It follows that caching a database\r |
248 | connection per Web server child process is not just wasteful of the system's hardware resources, it could\r |
249 | break your software budget.\r |
250 | </P>\r |
251 | <P>\r |
252 | </P>\r |
253 | <H3>\r |
254 | <A NAME="S5">5. A Performance Test</A>\r |
255 | </H3>\r |
256 | <P>\r |
257 | We designed a test application to illustrate performance issues. The application represents a class of\r |
258 | applications that deliver personalized content. The test application is quite a bit simpler than any real\r |
259 | application would be, but still illustrates the main performance issues. We implemented the application using\r |
260 | both FastCGI and a current Web server API, and measured the performance of each.\r |
261 | </P>\r |
262 | <P>\r |
263 | </P>\r |
264 | <H4>\r |
265 | <A NAME="S5.1">5.1 Application Scenario</A>\r |
266 | </H4>\r |
267 | <P>\r |
268 | The application is based on a user database and a set of content files. When a user requests a content file,\r |
269 | the application performs substitutions in the file using information from the user database. The application\r |
270 | then returns the modified content to the user.\r |
271 | </P>\r |
272 | <P>\r |
273 | Each request accomplishes the following:\r |
274 | </P>\r |
275 | <P>\r |
276 | </P>\r |
277 | <OL>\r |
278 | <LI>\r |
279 | authentication check: The user id is used to retrieve and check the password.\r |
280 | <P>\r |
281 | </P>\r |
282 | </LI>\r |
283 | <LI>\r |
284 | attribute retrieval: The user id is used to retrieve all of the user's attribute values.\r |
285 | <P>\r |
286 | </P>\r |
287 | </LI>\r |
288 | <LI>\r |
289 | file retrieval and filtering: The request identifies a content file. This file is read and all occurrences\r |
290 | of variable names are replaced with the user's corresponding attribute values. The modified HTML is\r |
291 | returned to the user.<BR>\r |
292 | <BR>\r |
293 | </LI>\r |
294 | </OL>\r |
295 | <P>\r |
296 | Of course, it is fair game to perform caching to shortcut any of these steps.\r |
297 | </P>\r |
298 | <P>\r |
299 | Each user's database record (including password and attribute values) is approximately 100 bytes long.\r |
300 | Each content file is 3,000 bytes long. Both database and content files are stored on disks attached to the\r |
301 | server platform.\r |
302 | </P>\r |
303 | <P>\r |
304 | A typical user makes 10 file accesses with realistic think times (30-60 seconds) between accesses, then\r |
305 | disappears for a long time.\r |
306 | </P>\r |
307 | <P>\r |
308 | </P>\r |
309 | <H4>\r |
310 | <A NAME="S5.2">5.2 Application Design</A>\r |
311 | </H4>\r |
312 | <P>\r |
313 | The FastCGI application maintains a cache of recently-accessed attribute values from the database. When the\r |
314 | cache misses, the application reads from the database. Because only a small number of FastCGI application\r |
315 | processes are needed, each process opens a database connection on startup and keeps it open.\r |
316 | </P>\r |
317 | <P>\r |
318 | The FastCGI application is configured as multiple application processes. This is desirable in order to get\r |
319 | concurrent application processing during database reads and file reads. Requests are routed to these\r |
320 | application processes using FastCGI session affinity keyed on the user id. This way, all of a user's requests\r |
321 | after the first one hit in the application's cache.\r |
322 | </P>\r |
323 | <P>\r |
324 | The API application does not maintain a cache; the API application has no way to share the cache among its\r |
325 | processes, so the cache hit rate would be too low to make caching pay. The API application opens and closes a\r |
326 | database connection on every request; keeping database connections open between requests would result in an\r |
327 | unrealistically large number of database connections open at the same time, and very low utilization of each\r |
328 | connection.\r |
329 | </P>\r |
330 | <P>\r |
331 | </P>\r |
332 | <H4>\r |
333 | <A NAME="S5.3">5.3 Test Conditions</A>\r |
334 | </H4>\r |
335 | <P>\r |
336 | The test load is generated by 10 HTTP client processes. The processes represent disjoint sets of users. A\r |
337 | process makes a request for a user, then a request for a different user, and so on until it is time for the\r |
338 | first user to make another request.\r |
339 | </P>\r |
340 | <P>\r |
341 | For simplicity the 10 client processes run on the same machine as the Web server. This avoids the possibility\r |
342 | that a network bottleneck will obscure the test results. The database system also runs on this machine, as\r |
343 | specified in the application scenario.\r |
344 | </P>\r |
345 | <P>\r |
346 | Response time is not an issue under the test conditions. We just measure throughput.\r |
347 | </P>\r |
348 | <P>\r |
349 | The API Web server in these tests is Netscape 1.1.\r |
350 | </P>\r |
351 | <P>\r |
352 | </P>\r |
353 | <H4>\r |
354 | <A NAME="S5.4">5.4 Test Results and Discussion</A>\r |
355 | </H4>\r |
356 | <P>\r |
357 | Here are the test results:\r |
358 | </P>\r |
359 | <P>\r |
360 | </P>\r |
361 | <DIV CLASS="c3">\r |
362 | <PRE>\r |
363 | FastCGI 12.0 msec per request = 83 requests per second\r |
364 | API 36.6 msec per request = 27 requests per second\r |
365 | </PRE>\r |
366 | </DIV>\r |
367 | <P>\r |
368 | Given the big architectural advantage that the FastCGI application enjoys over the API application, it is not\r |
369 | surprising that the FastCGI application runs a lot faster. To gain a deeper understanding of these results we\r |
370 | measured two more conditions:\r |
371 | </P>\r |
372 | <P>\r |
373 | </P>\r |
374 | <UL>\r |
375 | <LI>\r |
376 | API with sustained database connections. If you could afford the extra licensing cost, how much faster\r |
377 | would your API application run?\r |
378 | <P>\r |
379 | </P>\r |
380 | <PRE>\r |
381 | API 16.0 msec per request = 61 requests per second\r |
382 | </PRE>\r |
383 | Answer: Still not as fast as the FastCGI application.\r |
384 | <P>\r |
385 | </P>\r |
386 | </LI>\r |
387 | <LI>\r |
388 | FastCGI with cache disabled. How much benefit does the FastCGI application get from its cache?\r |
389 | <P>\r |
390 | </P>\r |
391 | <PRE>\r |
392 | FastCGI 20.1 msec per request = 50 requests per second\r |
393 | </PRE>\r |
394 | Answer: A very substantial benefit, even though the database access is quite simple.<BR>\r |
395 | <BR>\r |
396 | </LI>\r |
397 | </UL>\r |
398 | <P>\r |
399 | What these two extra experiments show is that if the API and FastCGI applications are implemented in exactly\r |
400 | the same way -- caching database connections but not caching user profile data -- the API application is\r |
401 | slightly faster. This is what you'd expect, since the FastCGI application has to pay the cost of\r |
402 | inter-process communication not present in the API application.\r |
403 | </P>\r |
404 | <P>\r |
405 | In the real world the two applications would not be implemented in the same way. FastCGI's architectural\r |
406 | advantage results in much higher performance -- a factor of 3 in this test. With a remote database or more\r |
407 | expensive database access the factor would be higher. With more substantial processing of the content files\r |
408 | the factor would be smaller.\r |
409 | </P>\r |
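<P>
The headline figures can be reproduced from the measured times per request. The short Python sketch below is
only a worked check of that arithmetic; the function name <CODE>requests_per_second</CODE> is introduced here for
illustration.
</P>

```python
def requests_per_second(msec_per_request):
    """Convert a measured time per request into throughput."""
    return 1000.0 / msec_per_request


# Measured times from the main test in Section 5.4.
fastcgi_msec = 12.0
api_msec = 36.6

fastcgi_rps = requests_per_second(fastcgi_msec)
api_rps = requests_per_second(api_msec)
speedup = api_msec / fastcgi_msec

print(f"FastCGI: {fastcgi_rps:.0f} requests per second")
print(f"API:     {api_rps:.0f} requests per second")
print(f"FastCGI is faster by a factor of about {speedup:.0f}")
```

<P>
Applying the same conversion to your own measurements gives the factor for the situation you face.
</P>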
410 | <P>\r |
411 | </P>\r |
412 | <H3>\r |
413 | <A NAME="S6">6. Multi-threaded APIs</A>\r |
414 | </H3>\r |
415 | <P>\r |
416 | Web servers with a multi-threaded internal structure (and APIs to match) are now starting to become more\r |
417 | common. These servers don't have all of the disadvantages described in Section 3. Does this mean that\r |
418 | FastCGI's performance advantages will disappear?\r |
419 | </P>\r |
420 | <P>\r |
421 | A superficial analysis says yes. An API-based application in a single-process, multi-threaded server can\r |
422 | maintain caches and database connections the same way a FastCGI application can. The API-based application\r |
423 | does not pay for inter-process communication, so the API-based application will be slightly faster than the\r |
424 | FastCGI application.\r |
425 | </P>\r |
426 | <P>\r |
427 | A deeper analysis says no. Multi-threaded programming is complex, because concurrency makes programs much more\r |
428 | difficult to test and debug. In the case of multi-threaded programming to Web server APIs, the normal problems\r |
429 | with multi-threading are compounded by the lack of isolation between different applications and between the\r |
430 | applications and the Web server. With FastCGI you can write programs in the familiar single-threaded style,\r |
431 | get all the reliability and maintainability of process isolation, and still get very high performance. If you\r |
432 | truly need multi-threading, you can write multi-threaded FastCGI and still isolate your multi-threaded\r |
433 | application from other applications and from the server. In short, multi-threading makes Web server APIs\r |
434 | unusable for practically all applications, reducing the choice to FastCGI versus CGI. The performance winner in\r |
435 | that contest is obviously FastCGI.\r |
436 | </P>\r |
437 | <P>\r |
438 | </P>\r |
439 | <H3>\r |
440 | <A NAME="S7">7. Conclusion</A>\r |
441 | </H3>\r |
442 | <P>\r |
443 | Just how fast is FastCGI? The answer: very fast indeed. Not because it has some specially-greased path through\r |
444 | the operating system, but because its design is well matched to the needs of most applications. We invite you\r |
445 | to make FastCGI the fast, open foundation for your Web server applications.\r |
446 | </P>\r |
447 | <P>\r |
448 | </P>\r |
449 | <HR>\r |
450 | <A HREF="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></A> \r |
451 | <ADDRESS>\r |
452 | © 1995, Open Market, Inc. / mbrown@openmarket.com\r |
453 | </ADDRESS>\r |
454 | </BODY>\r |
455 | </HTML>\r |
456 | \r |