<html>
<head><title>Understanding FastCGI Application Performance</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#cc0000" alink="#000011"
vlink="#555555">

<center>
<a href="/fastcgi/words">
<img border=0 src="../images/fcgi-hd.gif" alt="[[FastCGI]]"></a>
</center>
<br clear=all>
<h3><center>Understanding FastCGI Application Performance</center></h3>

<!--Copyright (c) 1996 Open Market, Inc. -->
<!--See the file "LICENSE.TERMS" for information on usage and redistribution-->
<!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. -->

<center>
Mark R. Brown<br>
Open Market, Inc.<br>
<p>

10 June 1996<br>
</center>
<p>

<h5 align=center>
Copyright © 1996 Open Market, Inc. 245 First Street, Cambridge,
MA 02142 U.S.A.<br>
Tel: 617-621-9500 Fax: 617-621-1703 URL:
<a href="http://www.openmarket.com/">http://www.openmarket.com/</a><br>
$Id: fcgi-perf.htm,v 1.1 1997/09/16 15:36:26 stanleyg Exp $ <br>
</h5>
<hr>

<ul type=square>
<li><a HREF = "#S1">1. Introduction</a>
<li><a HREF = "#S2">2. Performance Basics</a>
<li><a HREF = "#S3">3. Caching</a>
<li><a HREF = "#S4">4. Database Access</a>
<li><a HREF = "#S5">5. A Performance Test</a>
<ul type=square>
<li><a HREF = "#S5.1">5.1 Application Scenario</a>
<li><a HREF = "#S5.2">5.2 Application Design</a>
<li><a HREF = "#S5.3">5.3 Test Conditions</a>
<li><a HREF = "#S5.4">5.4 Test Results and Discussion</a>
</ul>
<li><a HREF = "#S6">6. Multi-threaded APIs</a>
<li><a HREF = "#S7">7. Conclusion</a>
</ul>
<p>

<hr>


<h3><a name = "S1">1. Introduction</a></h3>


Just how fast is FastCGI? How does the performance of a FastCGI
application compare with the performance of the same
application implemented using a Web server API?<p>

Of course, the answer is that it depends upon the application.
A more complete answer is that FastCGI often wins by a significant
margin, and seldom loses by very much.<p>

Papers on computer system performance can be laden with complex graphs
showing how this varies with that. Seldom do the graphs shed much
light on <i>why</i> one system is faster than another. Advertising copy is
often even less informative. An ad from one large Web server vendor
says that its server "executes web applications up to five times
faster than all other servers," but the ad gives little clue where the
number "five" came from.<p>

This paper is meant to convey an understanding of the primary factors
that influence the performance of Web server applications and to show
that architectural differences between FastCGI and server APIs often
give an "unfair" performance advantage to FastCGI applications. We
run a test that shows a FastCGI application running three times faster
than the corresponding Web server API application. Under different
conditions this factor might be larger or smaller. We show you what
you'd need to measure to figure that out for the situation you face,
rather than just saying "we're three times faster" and moving on.<p>

This paper makes no attempt to prove that FastCGI is better than Web
server APIs for every application. Web server APIs enable lightweight
protocol extensions, such as Open Market's SecureLink extension, to be
added to Web servers, as well as allowing other forms of server
customization. But APIs are not well matched to mainstream applications
such as personalized content or access to corporate databases, because
of API drawbacks including high complexity, low security, and
limited scalability. FastCGI shines when used for the vast
majority of Web applications.<p>



<h3><a name = "S2">2. Performance Basics</a></h3>


Since this paper is about performance we need to be clear on
what "performance" is.<p>

The standard way to measure performance in a request-response system
like the Web is to measure peak request throughput subject to a
response time constraint. For instance, a Web server application
might be capable of performing 20 requests per second while responding
to 90% of the requests in less than 2 seconds.<p>

Response time is a thorny thing to measure on the Web because client
communications links to the Internet have widely varying bandwidth.
If the client is slow to read the server's response, response time at
both the client and the server will go up, and there's nothing the
server can do about it. For the purposes of making repeatable
measurements the client should have a high-bandwidth communications
link to the server.<p>

[Footnote: When designing a Web server application that will be
accessed over slow (e.g. 14.4 or even 28.8 kilobit/second modem)
channels, pay attention to the simultaneous connections bottleneck.
Some servers are limited by design to only 100 or 200 simultaneous
connections. If your application sends 50 kilobytes of data to a
typical client that can read 2 kilobytes per second, then a request
takes 25 seconds to complete. If your server is limited to 100
simultaneous connections, throughput is limited to just 4 requests per
second.]<p>
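The footnote's arithmetic generalizes: peak throughput is the connection limit divided by the seconds each connection stays occupied. A quick sketch of the calculation (the function name is ours, and the figures are the footnote's illustrative assumptions, not measurements):<p>

```python
def max_throughput(max_connections, response_bytes, client_bytes_per_sec):
    # A slow client occupies a connection for the whole time it needs
    # to read the response, so each connection serves at most
    # 1/seconds_per_request requests per second.
    seconds_per_request = response_bytes / client_bytes_per_sec
    return max_connections / seconds_per_request

# 100 connections, 50-kilobyte responses, 2 kilobyte/second clients:
print(max_throughput(100, 50000, 2000))  # -> 4.0 requests per second
```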

Response time is seldom an issue when load is light, but response
times rise quickly as the system approaches a bottleneck on some
limited resource. The three resources that typical systems run out of
are network I/O, disk I/O, and processor time. If short response time
is a goal, it is a good idea to stay at or below 50% load on each of
these resources. For instance, if your disk subsystem is capable of
delivering 200 I/Os per second, then try to run your application at
100 I/Os per second to avoid having the disk subsystem contribute to
slow response times. Through careful management it is possible to
succeed in running closer to the edge, but careful management is both
difficult and expensive so few systems get it.<p>

If a Web server application is local to the Web server machine, then
its internal design has no impact on network I/O. Application design
can have a big impact on usage of disk I/O and processor time.<p>


<h3><a name = "S3">3. Caching</a></h3>


It is a rare Web server application that doesn't run fast when all the
information it needs is available in its memory. And if the
application doesn't run fast under those conditions, the possible
solutions are evident: Tune the processor-hungry parts of the
application, install a faster processor, or change the application's
functional specification so it doesn't need to do so much work.<p>

The way to make information available in memory is by caching. A
cache is an in-memory data structure that contains information that's
been read from its permanent home on disk. When the application needs
information, it consults the cache, and uses the information if it is
there. Otherwise it reads the information from disk and places a copy
in the cache. If the cache is full, the application discards some old
information before adding the new. When the application needs to
change cached information, it changes both the cache entry and the
information on disk. That way, if the application crashes, no
information is lost; the application just runs more slowly for a while
after restarting, because the cache doesn't improve performance
when it is empty.<p>
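The read-through, write-through scheme just described can be sketched in a few lines of Python. This is an illustration of the idea, not code from any FastCGI kit; the class and method names are ours, and a dictionary stands in for the data's permanent home on disk:<p>

```python
import collections

class WriteThroughCache:
    def __init__(self, store, capacity):
        self.store = store                      # permanent home of the data
        self.cache = collections.OrderedDict()  # in-memory copies
        self.capacity = capacity

    def _insert(self, key, value):
        if key not in self.cache and len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)      # full: discard old information
        self.cache[key] = value
        self.cache.move_to_end(key)

    def read(self, key):
        if key in self.cache:                   # hit: use the cached copy
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store[key]                 # miss: read from "disk"
        self._insert(key, value)                # ...and place a copy in the cache
        return value

    def write(self, key, value):
        self.store[key] = value                 # change the information on disk
        self._insert(key, value)                # ...and the cache entry
```

Because every write goes to the permanent store first, a crash loses nothing; the restarted process merely starts with an empty cache, exactly as described above.<p>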

Caching can reduce both disk I/O and processor time, because reading
information from disk uses more processor time than reading it from
the cache. Because caching addresses both of the potential
bottlenecks, it is the focal point of high-performance Web server
application design. CGI applications couldn't perform in-memory
caching, because they exited after processing just one request. Web
server APIs promised to solve this problem. But how effective is the
solution?<p>

Today's most widely deployed Web server APIs are based on a
pool-of-processes server model. The Web server consists of a parent
process and a pool of child processes. Processes do not share memory.
An incoming request is assigned to an idle child at random. The child
runs the request to completion before accepting a new request. A
typical server has 32 child processes, a large server has 100 or 200.<p>

In-memory caching works very poorly in this server model because
processes do not share memory and incoming requests are assigned to
processes at random. For instance, to keep a frequently-used file
available in memory the server must keep a file copy per child, which
wastes memory. When the file is modified all the children need to be
notified, which is complex (the APIs don't provide a way to do it).<p>

FastCGI is designed to allow effective in-memory caching. Requests
are routed from any child process to a FastCGI application server.
The FastCGI application process maintains an in-memory cache.<p>

In some cases a single FastCGI application server won't
provide enough performance. FastCGI provides two solutions:
session affinity and multi-threading.<p>

With session affinity you run a pool of application processes and the
Web server routes requests to individual processes based on any
information contained in the request. For instance, the server can
route according to the area of content that's been requested, or
according to the user. The user might be identified by an
application-specific session identifier, by the user ID contained in
an Open Market Secure Link ticket, by the Basic Authentication user
name, or whatever. Each process maintains its own cache, and session
affinity ensures that each incoming request has access to the cache
that will speed up processing the most.<p>
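However the user is identified, session affinity boils down to a stable mapping from that identifier to one member of the process pool. A minimal sketch (in a real deployment the Web server performs this routing from its configuration; the function here is only illustrative):<p>

```python
import zlib

def route(user_id, num_processes):
    # A stable hash picks the same pool member for a given user every
    # time, so later requests find the cache the first request warmed.
    return zlib.crc32(user_id.encode("utf-8")) % num_processes

process = route("alice", 8)   # always the same process for "alice"
```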

With multi-threading you run an application process that is designed
to handle several requests at the same time. The threads handling
concurrent requests share process memory, so they all have access to
the same cache. Multi-threaded programming is complex -- concurrency
makes programs difficult to test and debug -- but with FastCGI you can
write single-threaded <i>or</i> multi-threaded applications.<p>
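The heart of the multi-threaded approach is a single cache guarded against concurrent access. A sketch of that one idea (Python, with names of our choosing; a real application would use whatever locking its threads package provides):<p>

```python
import threading

class SharedCache:
    """One cache shared by every request-handling thread in the process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, load):
        # `load` stands in for the disk read performed on a miss.
        with self._lock:              # serialize access to shared memory
            if key not in self._data:
                self._data[key] = load(key)
            return self._data[key]
```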



<h3><a name = "S4">4. Database Access</a></h3>


Many Web server applications perform database access. Existing
databases contain a lot of valuable information; Web server
applications allow companies to give wider access to the information.<p>

Access to database management systems, even within a single machine,
is via connection-oriented protocols. An application "logs in" to a
database, creating a connection, then performs one or more accesses.
Frequently, the cost of creating the database connection is several
times the cost of accessing data over an established connection.<p>

To a first approximation database connections are just another type of
state to be cached in memory by an application, so the discussion of
caching above applies to caching database connections.<p>
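A back-of-the-envelope model shows why caching the connection pays. The millisecond figures below are invented for illustration (the text says only that connecting costs several times an access), but the shape of the result holds generally:<p>

```python
def total_cost_ms(requests, connect_ms, access_ms, reuse_connection):
    # Total time to serve `requests` accesses, either logging in once
    # and keeping the connection, or reconnecting for every request.
    if reuse_connection:
        return connect_ms + requests * access_ms
    return requests * (connect_ms + access_ms)

# Connecting at 3x the cost of an access, over 100 requests:
print(total_cost_ms(100, 6, 2, reuse_connection=False))  # 800 ms
print(total_cost_ms(100, 6, 2, reuse_connection=True))   # 206 ms
```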

But database connections are special in one respect: They are often
the basis for database licensing. You pay the database vendor
according to the number of concurrent connections the database system
can sustain. A 100-connection license costs much more than a
5-connection license. It follows that caching a database connection
per Web server child process is not just wasteful of your system's
hardware resources, it could break your software budget.<p>



<h3><a name = "S5">5. A Performance Test</a></h3>


We designed a test application to illustrate performance issues. The
application represents a class of applications that deliver
personalized content. The test application is quite a bit simpler
than any real application would be, but still illustrates the main
performance issues. We implemented the application using both FastCGI
and a current Web server API, and measured the performance of each.<p>

<h4><a name = "S5.1">5.1 Application Scenario</a></h4>

The application is based on a user database and a set of
content files. When a user requests a content file, the application
performs substitutions in the file using information from the
user database. The application then returns the modified
content to the user.<p>

Each request accomplishes the following:<p>

<ol>
<li>authentication check: The user id is used to retrieve and
check the password.<p>

<li>attribute retrieval: The user id is used to retrieve all
of the user's attribute values.<p>

<li>file retrieval and filtering: The request identifies a
content file. This file is read and all occurrences of variable
names are replaced with the user's corresponding attribute values.
The modified HTML is returned to the user.<p>
</ol>
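In outline, one request works through the three steps like this. The sketch below is a toy, not the test application itself: the `$NAME`-style variable syntax and the in-memory `users` table are our assumptions for illustration:<p>

```python
def handle_request(user_id, password, template, users):
    record = users[user_id]
    if record["password"] != password:   # 1. authentication check
        return "403 Forbidden"
    attrs = record["attributes"]         # 2. attribute retrieval
    page = template                      # 3. file retrieval and filtering
    for name, value in attrs.items():
        page = page.replace("$" + name, value)
    return page

users = {"alice": {"password": "s3cret",
                   "attributes": {"NAME": "Alice", "CITY": "Cambridge"}}}
print(handle_request("alice", "s3cret", "Hello $NAME of $CITY!", users))
# -> Hello Alice of Cambridge!
```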

Of course, it is fair game to perform caching to shortcut
any of these steps.<p>

Each user's database record (including password and attribute
values) is approximately 100 bytes long. Each content file is 3,000
bytes long. Both database and content files are stored
on disks attached to the server platform.<p>

A typical user makes 10 file accesses with realistic think times
(30-60 seconds) between accesses, then disappears for a long time.<p>


<h4><a name = "S5.2">5.2 Application Design</a></h4>

The FastCGI application maintains a cache of recently-accessed
attribute values from the database. When the cache misses
the application reads from the database. Because only a small
number of FastCGI application processes are needed, each process
opens a database connection on startup and keeps it open.<p>

The FastCGI application is configured as multiple application
processes. This is desirable in order to get concurrent application
processing during database reads and file reads. Requests are routed
to these application processes using FastCGI session affinity keyed on
the user id. This way all of a user's requests after the first one
hit in the application's cache.<p>

The API application does not maintain a cache; the API application has
no way to share the cache among its processes, so the cache hit rate
would be too low to make caching pay. The API application opens and
closes a database connection on every request; keeping database
connections open between requests would result in an unrealistically
large number of database connections open at the same time, and very
low utilization of each connection.<p>


<h4><a name = "S5.3">5.3 Test Conditions</a></h4>

The test load is generated by 10 HTTP client processes. The processes
represent disjoint sets of users. A process makes a request for a
user, then a request for a different user, and so on until it is time
for the first user to make another request.<p>

For simplicity the 10 client processes run on the same machine
as the Web server. This avoids the possibility that a network
bottleneck will obscure the test results. The database system
also runs on this machine, as specified in the application scenario.<p>

Response time is not an issue under the test conditions. We just
measure throughput.<p>

The API Web server used in these tests is Netscape 1.1.<p>


<h4><a name = "S5.4">5.4 Test Results and Discussion</a></h4>

Here are the test results:<p>

<ul>
<pre>
FastCGI    12.0 msec per request = 83 requests per second
API        36.6 msec per request = 27 requests per second
</pre>
</ul>
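The two columns report the same measurement two ways: requests per second is roughly 1000 divided by the milliseconds per request, rounded:<p>

```python
def requests_per_second(msec_per_request):
    return round(1000.0 / msec_per_request)

print(requests_per_second(12.0))  # FastCGI: 83
print(requests_per_second(36.6))  # API:     27
```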

Given the big architectural advantage that the FastCGI application
enjoys over the API application, it is not surprising that the
FastCGI application runs a lot faster. To gain a deeper
understanding of these results we measured two more conditions:<p>

<ul>
<li>API with sustained database connections. If you could
afford the extra licensing cost, how much faster would
your API application run?<p>

<pre>
API        16.0 msec per request = 61 requests per second
</pre>

Answer: Still not as fast as the FastCGI application.<p>

<li>FastCGI with cache disabled. How much benefit does the
FastCGI application get from its cache?<p>

<pre>
FastCGI    20.1 msec per request = 50 requests per second
</pre>

Answer: A very substantial benefit, even though the database
access is quite simple.<p>
</ul>

What these two extra experiments show is that if the API and FastCGI
applications are implemented in exactly the same way -- caching
database connections but not caching user profile data -- the API
application is slightly faster. This is what you'd expect, since the
FastCGI application has to pay the cost of inter-process
communication not present in the API application.<p>

In the real world the two applications would not be implemented in the
same way. FastCGI's architectural advantage results in much higher
performance -- a factor of 3 in this test. With a remote database
or more expensive database access the factor would be higher.
With more substantial processing of the content files the factor
would be smaller.<p>



<h3><a name = "S6">6. Multi-threaded APIs</a></h3>


Web servers with a multi-threaded internal structure (and APIs to
match) are now starting to become more common. These servers don't
have all of the disadvantages described in Section 3. Does this mean
that FastCGI's performance advantages will disappear?<p>

A superficial analysis says yes. An API-based application in a
single-process, multi-threaded server can maintain caches and database
connections the same way a FastCGI application can. The API-based
application does not pay for inter-process communication, so the
API-based application will be slightly faster than the FastCGI
application.<p>

A deeper analysis says no. Multi-threaded programming is complex,
because concurrency makes programs much more difficult to test and
debug. In the case of multi-threaded programming to Web server APIs,
the normal problems with multi-threading are compounded by the lack of
isolation between different applications and between the applications
and the Web server. With FastCGI you can write programs in the
familiar single-threaded style, get all the reliability and
maintainability of process isolation, and still get very high
performance. If you truly need multi-threading, you can write
multi-threaded FastCGI applications and still isolate your multi-threaded
application from other applications and from the server. In short,
multi-threading makes Web server APIs unusable for practically all
applications, reducing the choice to FastCGI versus CGI. The
performance winner in that contest is obviously FastCGI.<p>



<h3><a name = "S7">7. Conclusion</a></h3>


Just how fast is FastCGI? The answer: very fast indeed. Not because
it has some specially-greased path through the operating system, but
because its design is well matched to the needs of most applications.
We invite you to make FastCGI the fast, open foundation for your Web
server applications.<p>



<hr>
<a href="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></a>

<address>
© 1995, Open Market, Inc. / mbrown@openmarket.com
</address>

</body>
</html>