1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN"> |
2 | <HTML> |
3 | <HEAD> |
4 | <TITLE> |
5 | Understanding FastCGI Application Performance |
6 | </TITLE> |
7 | <STYLE TYPE="text/css"> |
8 | body { |
9 | background-color: #FFFFFF; |
10 | color: #000000; |
11 | } |
12 | :link { color: #cc0000 } |
13 | :visited { color: #555555 } |
14 | :active { color: #000011 } |
15 | div.c3 {margin-left: 2em} |
16 | h5.c2 {text-align: center} |
17 | div.c1 {text-align: center} |
18 | </STYLE> |
19 | </HEAD> |
20 | <BODY> |
21 | <DIV CLASS="c1"> |
22 | <A HREF="http://fastcgi.com"><IMG BORDER="0" SRC="../images/fcgi-hd.gif" ALT="[[FastCGI]]"></A> |
23 | </DIV> |
24 | <BR CLEAR="all"> |
25 | <DIV CLASS="c1"> |
26 | <H3> |
27 | Understanding FastCGI Application Performance |
28 | </H3> |
29 | </DIV> |
30 | <!--Copyright (c) 1996 Open Market, Inc. --> |
31 | <!--See the file "LICENSE.TERMS" for information on usage and redistribution--> |
32 | <!--of this file, and for a DISCLAIMER OF ALL WARRANTIES. --> |
33 | <DIV CLASS="c1"> |
34 | Mark R. Brown<BR> |
35 | Open Market, Inc.<BR> |
36 | <P> |
37 | 10 June 1996<BR> |
38 | </P> |
39 | </DIV> |
40 | <P> |
41 | </P> |
42 | <H5 CLASS="c2"> |
43 | Copyright © 1996 Open Market, Inc. 245 First Street, Cambridge, MA 02142 U.S.A.<BR> |
44 | Tel: 617-621-9500 Fax: 617-621-1703 URL: <A HREF= |
45 | "http://www.openmarket.com/">http://www.openmarket.com/</A><BR> |
46 | $Id: fcgi-perf.htm,v 1.4 2002/02/25 00:42:59 robs Exp $<BR> |
47 | </H5> |
48 | <HR> |
49 | <UL TYPE="square"> |
50 | <LI> |
51 | <A HREF="#S1">1. Introduction</A> |
52 | </LI> |
53 | <LI> |
54 | <A HREF="#S2">2. Performance Basics</A> |
55 | </LI> |
56 | <LI> |
57 | <A HREF="#S3">3. Caching</A> |
58 | </LI> |
59 | <LI> |
60 | <A HREF="#S4">4. Database Access</A> |
61 | </LI> |
62 | <LI> |
63 | <A HREF="#S5">5. A Performance Test</A> |
64 | <UL TYPE="square"> |
65 | <LI> |
66 | <A HREF="#S5.1">5.1 Application Scenario</A> |
67 | </LI> |
68 | <LI> |
69 | <A HREF="#S5.2">5.2 Application Design</A> |
70 | </LI> |
71 | <LI> |
72 | <A HREF="#S5.3">5.3 Test Conditions</A> |
73 | </LI> |
74 | <LI> |
75 | <A HREF="#S5.4">5.4 Test Results and Discussion</A> |
76 | </LI> |
77 | </UL> |
78 | </LI> |
79 | <LI> |
80 | <A HREF="#S6">6. Multi-threaded APIs</A> |
81 | </LI> |
82 | <LI> |
83 | <A HREF="#S7">7. Conclusion</A> |
84 | </LI> |
85 | </UL> |
86 | <P> |
87 | </P> |
88 | <HR> |
89 | <H3> |
90 | <A NAME="S1">1. Introduction</A> |
91 | </H3> |
92 | <P> |
93 | Just how fast is FastCGI? How does the performance of a FastCGI application compare with the performance of |
94 | the same application implemented using a Web server API? |
95 | </P> |
96 | <P> |
97 | Of course, the answer is that it depends upon the application. A more complete answer is that FastCGI often |
98 | wins by a significant margin, and seldom loses by very much. |
99 | </P> |
100 | <P> |
101 | Papers on computer system performance can be laden with complex graphs showing how this varies with that. |
102 | Seldom do the graphs shed much light on <I>why</I> one system is faster than another. Advertising copy is |
103 | often even less informative. An ad from one large Web server vendor says that its server "executes web |
104 | applications up to five times faster than all other servers," but the ad gives little clue where the |
105 | number "five" came from. |
106 | </P> |
107 | <P> |
108 | This paper is meant to convey an understanding of the primary factors that influence the performance of Web |
109 | server applications and to show that architectural differences between FastCGI and server APIs often give an |
110 | "unfair" performance advantage to FastCGI applications. We run a test that shows a FastCGI |
111 | application running three times faster than the corresponding Web server API application. Under different |
112 | conditions this factor might be larger or smaller. We show you what you'd need to measure to figure that |
113 | out for the situation you face, rather than just saying "we're three times faster" and moving |
114 | on. |
115 | </P> |
116 | <P> |
117 | This paper makes no attempt to prove that FastCGI is better than Web server APIs for every application. Web |
118 | server APIs enable lightweight protocol extensions, such as Open Market's SecureLink extension, to be |
119 | added to Web servers, as well as allowing other forms of server customization. But APIs are not well matched |
120 | to mainstream applications such as personalized content or access to corporate databases, because of API |
121 | drawbacks including high complexity, low security, and limited scalability. FastCGI shines when used for the |
122 | vast majority of Web applications. |
123 | </P> |
124 | <P> |
125 | </P> |
126 | <H3> |
127 | <A NAME="S2">2. Performance Basics</A> |
128 | </H3> |
129 | <P> |
130 | Since this paper is about performance we need to be clear on what "performance" is. |
131 | </P> |
132 | <P> |
133 | The standard way to measure performance in a request-response system like the Web is to measure peak request |
134 | throughput subject to a response time constraint. For instance, a Web server application might be capable of |
135 | performing 20 requests per second while responding to 90% of the requests in less than 2 seconds. |
136 | </P> |
137 | <P> |
138 | Response time is a thorny thing to measure on the Web because client communications links to the Internet have |
139 | widely varying bandwidth. If the client is slow to read the server's response, response time at both the |
140 | client and the server will go up, and there's nothing the server can do about it. For the purposes of |
141 | making repeatable measurements the client should have a high-bandwidth communications link to the server. |
142 | </P> |
143 | <P> |
144 | [Footnote: When designing a Web server application that will be accessed over slow (e.g. 14.4 or even 28.8 |
145 | kilobit/second modem) channels, pay attention to the simultaneous connections bottleneck. Some servers are |
146 | limited by design to only 100 or 200 simultaneous connections. If your application sends 50 kilobytes of data |
147 | to a typical client that can read 2 kilobytes per second, then a request takes 25 seconds to complete. If your |
148 | server is limited to 100 simultaneous connections, throughput is limited to just 4 requests per second.] |
149 | </P> |
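<P>
The footnote's arithmetic can be checked with a short sketch. The function name and its parameters are
illustrative only; the numbers are the ones used in the footnote.
</P>

```python
def max_throughput(response_kbytes, client_kbytes_per_sec, max_connections):
    """Upper bound on requests/second when slow clients hold connections open."""
    # Time one request occupies a connection while the client drains the response.
    seconds_per_request = response_kbytes / client_kbytes_per_sec
    # Each connection completes 1 / seconds_per_request requests per second.
    return max_connections / seconds_per_request


# 50 KB at 2 KB/s is 25 s per request; 100 connections / 25 s = 4 requests/s.
print(max_throughput(50, 2, 100))  # 4.0
```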
150 | <P> |
151 | Response time is seldom an issue when load is light, but response times rise quickly as the system approaches |
152 | a bottleneck on some limited resource. The three resources that typical systems run out of are network I/O, |
153 | disk I/O, and processor time. If short response time is a goal, it is a good idea to stay at or below 50% load |
154 | on each of these resources. For instance, if your disk subsystem is capable of delivering 200 I/Os per second, |
155 | then try to run your application at 100 I/Os per second to avoid having the disk subsystem contribute to slow |
156 | response times. Through careful management it is possible to succeed in running closer to the edge, but |
157 | careful management is both difficult and expensive so few systems get it. |
158 | </P> |
159 | <P> |
160 | If a Web server application is local to the Web server machine, then its internal design has no impact on |
161 | network I/O. Application design can have a big impact on usage of disk I/O and processor time. |
162 | </P> |
163 | <P> |
164 | </P> |
165 | <H3> |
166 | <A NAME="S3">3. Caching</A> |
167 | </H3> |
168 | <P> |
169 | It is a rare Web server application that doesn't run fast when all the information it needs is available |
170 | in its memory. And if the application doesn't run fast under those conditions, the possible solutions are |
171 | evident: Tune the processor-hungry parts of the application, install a faster processor, or change the |
172 | application's functional specification so it doesn't need to do so much work. |
173 | </P> |
174 | <P> |
175 | The way to make information available in memory is by caching. A cache is an in-memory data structure that |
176 | contains information that's been read from its permanent home on disk. When the application needs |
177 | information, it consults the cache, and uses the information if it is there. Otherwise it reads the |
178 | information from disk and places a copy in the cache. If the cache is full, the application discards some old |
179 | information before adding the new. When the application needs to change cached information, it changes both |
180 | the cache entry and the information on disk. That way, if the application crashes, no information is lost; the |
181 | application just runs more slowly for a while after restarting, because the cache doesn't improve |
182 | performance when it is empty. |
183 | </P> |
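<P>
The cache behavior just described might be sketched as follows. This is a minimal illustration, not code from
any FastCGI library: the class name is hypothetical, a Python dict stands in for the data on disk, and the
choice to discard the least recently used entry is an assumption (the text only says "some old information").
</P>

```python
from collections import OrderedDict


class WriteThroughCache:
    """Bounded in-memory cache in front of a slower permanent store."""

    def __init__(self, store, capacity):
        self.store = store            # stand-in for the on-disk data (a dict here)
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # cache hit: mark as recently used
            return self.entries[key]
        value = self.store[key]            # cache miss: read from "disk"
        self._insert(key, value)
        return value

    def put(self, key, value):
        self.store[key] = value            # update the permanent copy ...
        self._insert(key, value)           # ... and the cache entry

    def _insert(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # full: discard the oldest entry
        self.entries[key] = value
        self.entries.move_to_end(key)


disk = {"alice": "red", "bob": "blue", "carol": "green"}
cache = WriteThroughCache(disk, capacity=2)
for user in ("alice", "bob", "carol"):  # the third read evicts "alice"
    cache.get(user)
cache.put("bob", "black")               # changes both the cache and the store
```

<P>
Because every change goes to the store as well as the cache, losing the cache in a crash loses no data; the
restarted process simply begins with an empty cache.
</P>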
184 | <P> |
185 | Caching can reduce both disk I/O and processor time, because reading information from disk uses more processor |
186 | time than reading it from the cache. Because caching addresses both of the potential bottlenecks, it is the |
187 | focal point of high-performance Web server application design. CGI applications couldn't perform in-memory |
188 | caching, because they exited after processing just one request. Web server APIs promised to solve this |
189 | problem. But how effective is the solution? |
190 | </P> |
191 | <P> |
192 | Today's most widely deployed Web server APIs are based on a pool-of-processes server model. The Web server |
193 | consists of a parent process and a pool of child processes. Processes do not share memory. An incoming request |
194 | is assigned to an idle child at random. The child runs the request to completion before accepting a new |
195 | request. A typical server has 32 child processes; a large server has 100 or 200. |
196 | </P> |
197 | <P> |
198 | In-memory caching works very poorly in this server model because processes do not share memory and incoming |
199 | requests are assigned to processes at random. For instance, to keep a frequently-used file available in memory |
200 | the server must keep a file copy per child, which wastes memory. When the file is modified all the children |
201 | need to be notified, which is complex (the APIs don't provide a way to do it). |
202 | </P> |
203 | <P> |
204 | FastCGI is designed to allow effective in-memory caching. Requests are routed from any child process to a |
205 | FastCGI application server. The FastCGI application process maintains an in-memory cache. |
206 | </P> |
207 | <P> |
208 | In some cases a single FastCGI application server won't provide enough performance. FastCGI provides two |
209 | solutions: session affinity and multi-threading. |
210 | </P> |
211 | <P> |
212 | With session affinity you run a pool of application processes and the Web server routes requests to individual |
213 | processes based on any information contained in the request. For instance, the server can route according to |
214 | the area of content that's been requested, or according to the user. The user might be identified by an |
215 | application-specific session identifier, by the user ID contained in an Open Market SecureLink ticket, by the |
216 | Basic Authentication user name, or whatever. Each process maintains its own cache, and session affinity |
217 | ensures that each incoming request has access to the cache that will speed up processing the most. |
218 | </P> |
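<P>
One way to picture session affinity is a routing function that maps a key from the request to a fixed
application process. The sketch below is only an assumption about how such routing could work: the text does
not say how the Web server chooses a process, and the names <I>route</I>, <I>handle</I>, and <I>caches</I>
are hypothetical.
</P>

```python
import zlib


def route(user_id, num_processes):
    """Pick the application process for a request, keyed on the user id."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(user_id.encode("utf-8")) % num_processes


# One private cache per application process (plain dicts stand in for processes).
caches = [dict() for _ in range(4)]


def handle(user_id):
    """Return (process index, whether the user's data was already cached)."""
    index = route(user_id, len(caches))
    cache = caches[index]
    hit = user_id in cache
    cache.setdefault(user_id, "profile of " + user_id)  # load on a miss
    return index, hit


first = handle("alice")   # first request: a miss in the chosen process
second = handle("alice")  # same process again, so this one is a hit
```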
219 | <P> |
220 | With multi-threading you run an application process that is designed to handle several requests at the same |
221 | time. The threads handling concurrent requests share process memory, so they all have access to the same |
222 | cache. Multi-threaded programming is complex -- concurrency makes programs difficult to test and debug -- but |
223 | with FastCGI you can write single-threaded <I>or</I> multi-threaded applications. |
224 | </P> |
225 | <P> |
226 | </P> |
227 | <H3> |
228 | <A NAME="S4">4. Database Access</A> |
229 | </H3> |
230 | <P> |
231 | Many Web server applications perform database access. Existing databases contain a lot of valuable |
232 | information; Web server applications allow companies to give wider access to the information. |
233 | </P> |
234 | <P> |
235 | Access to database management systems, even within a single machine, is via connection-oriented protocols. An |
236 | application "logs in" to a database, creating a connection, then performs one or more accesses. |
237 | Frequently, the cost of creating the database connection is several times the cost of accessing data over an |
238 | established connection. |
239 | </P> |
240 | <P> |
241 | To a first approximation database connections are just another type of state to be cached in memory by an |
242 | application, so the discussion of caching above applies to caching database connections. |
243 | </P> |
244 | <P> |
245 | But database connections are special in one respect: They are often the basis for database licensing. You pay |
246 | the database vendor according to the number of concurrent connections the database system can sustain. A |
247 | 100-connection license costs much more than a 5-connection license. It follows that caching a database |
248 | connection per Web server child process is not just wasteful of the system's hardware resources, it could |
249 | break your software budget. |
250 | </P> |
251 | <P> |
252 | </P> |
253 | <H3> |
254 | <A NAME="S5">5. A Performance Test</A> |
255 | </H3> |
256 | <P> |
257 | We designed a test application to illustrate performance issues. The application represents a class of |
258 | applications that deliver personalized content. The test application is quite a bit simpler than any real |
259 | application would be, but still illustrates the main performance issues. We implemented the application using |
260 | both FastCGI and a current Web server API, and measured the performance of each. |
261 | </P> |
262 | <P> |
263 | </P> |
264 | <H4> |
265 | <A NAME="S5.1">5.1 Application Scenario</A> |
266 | </H4> |
267 | <P> |
268 | The application is based on a user database and a set of content files. When a user requests a content file, |
269 | the application performs substitutions in the file using information from the user database. The application |
270 | then returns the modified content to the user. |
271 | </P> |
272 | <P> |
273 | Each request accomplishes the following: |
274 | </P> |
275 | <P> |
276 | </P> |
277 | <OL> |
278 | <LI> |
279 | authentication check: The user id is used to retrieve and check the password. |
280 | <P> |
281 | </P> |
282 | </LI> |
283 | <LI> |
284 | attribute retrieval: The user id is used to retrieve all of the user's attribute values. |
285 | <P> |
286 | </P> |
287 | </LI> |
288 | <LI> |
289 | file retrieval and filtering: The request identifies a content file. This file is read and all occurrences |
290 | of variable names are replaced with the user's corresponding attribute values. The modified HTML is |
291 | returned to the user.<BR> |
292 | <BR> |
293 | </LI> |
294 | </OL> |
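<P>
The three steps above can be sketched as follows. This is not the test application's code: the in-memory
dicts stand in for the user database and the content files, the sample user and file are invented, and the
<I>$name</I> variable syntax is an assumption, since the text does not say how variables are marked.
</P>

```python
from string import Template

# Hypothetical stand-ins for the user database and the content files.
USERS = {
    "u17": {"password": "secret", "attributes": {"name": "Pat", "city": "Boston"}},
}
CONTENT = {
    "welcome.html": "Hello $name, here is the news for $city.",
}


def handle_request(user_id, password, file_name):
    record = USERS.get(user_id)
    # 1. Authentication check: retrieve and compare the password.
    if record is None or record["password"] != password:
        return None
    # 2. Attribute retrieval: fetch all of the user's attribute values.
    attributes = record["attributes"]
    # 3. File retrieval and filtering: substitute attribute values for variables.
    return Template(CONTENT[file_name]).safe_substitute(attributes)


print(handle_request("u17", "secret", "welcome.html"))
```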
295 | <P> |
296 | Of course, it is fair game to perform caching to shortcut any of these steps. |
297 | </P> |
298 | <P> |
299 | Each user's database record (including password and attribute values) is approximately 100 bytes long. |
300 | Each content file is 3,000 bytes long. Both database and content files are stored on disks attached to the |
301 | server platform. |
302 | </P> |
303 | <P> |
304 | A typical user makes 10 file accesses with realistic think times (30-60 seconds) between accesses, then |
305 | disappears for a long time. |
306 | </P> |
307 | <P> |
308 | </P> |
309 | <H4> |
310 | <A NAME="S5.2">5.2 Application Design</A> |
311 | </H4> |
312 | <P> |
313 | The FastCGI application maintains a cache of recently-accessed attribute values from the database. When the |
314 | cache misses, the application reads from the database. Because only a small number of FastCGI application |
315 | processes are needed, each process opens a database connection on startup and keeps it open. |
316 | </P> |
317 | <P> |
318 | The FastCGI application is configured as multiple application processes. This is desirable in order to get |
319 | concurrent application processing during database reads and file reads. Requests are routed to these |
320 | application processes using FastCGI session affinity keyed on the user id. This way, all of a user's requests |
321 | after the first one hit in the application's cache. |
322 | </P> |
323 | <P> |
324 | The API application does not maintain a cache; the API application has no way to share the cache among its |
325 | processes, so the cache hit rate would be too low to make caching pay. The API application opens and closes a |
326 | database connection on every request; keeping database connections open between requests would result in an |
327 | unrealistically large number of database connections open at the same time, and very low utilization of each |
328 | connection. |
329 | </P> |
330 | <P> |
331 | </P> |
332 | <H4> |
333 | <A NAME="S5.3">5.3 Test Conditions</A> |
334 | </H4> |
335 | <P> |
336 | The test load is generated by 10 HTTP client processes. The processes represent disjoint sets of users. A |
337 | process makes a request for a user, then a request for a different user, and so on until it is time for the |
338 | first user to make another request. |
339 | </P> |
340 | <P> |
341 | For simplicity the 10 client processes run on the same machine as the Web server. This avoids the possibility |
342 | that a network bottleneck will obscure the test results. The database system also runs on this machine, as |
343 | specified in the application scenario. |
344 | </P> |
345 | <P> |
346 | Response time is not an issue under the test conditions. We just measure throughput. |
347 | </P> |
348 | <P> |
349 | The API Web server in these tests is Netscape 1.1. |
350 | </P> |
351 | <P> |
352 | </P> |
353 | <H4> |
354 | <A NAME="S5.4">5.4 Test Results and Discussion</A> |
355 | </H4> |
356 | <P> |
357 | Here are the test results: |
358 | </P> |
359 | <P> |
360 | </P> |
361 | <DIV CLASS="c3"> |
362 | <PRE> |
363 | FastCGI 12.0 msec per request = 83 requests per second |
364 | API 36.6 msec per request = 27 requests per second |
365 | </PRE> |
366 | </DIV> |
367 | <P> |
368 | Given the big architectural advantage that the FastCGI application enjoys over the API application, it is not |
369 | surprising that the FastCGI application runs a lot faster. To gain a deeper understanding of these results we |
370 | measured two more conditions: |
371 | </P> |
372 | <P> |
373 | </P> |
374 | <UL> |
375 | <LI> |
376 | API with sustained database connections. If you could afford the extra licensing cost, how much faster |
377 | would your API application run? |
378 | <P> |
379 | </P> |
380 | <PRE> |
381 | API 16.0 msec per request = 61 requests per second |
382 | </PRE> |
383 | Answer: Still not as fast as the FastCGI application. |
384 | <P> |
385 | </P> |
386 | </LI> |
387 | <LI> |
388 | FastCGI with cache disabled. How much benefit does the FastCGI application get from its cache? |
389 | <P> |
390 | </P> |
391 | <PRE> |
392 | FastCGI 20.1 msec per request = 50 requests per second |
393 | </PRE> |
394 | Answer: A very substantial benefit, even though the database access is quite simple.<BR> |
395 | <BR> |
396 | </LI> |
397 | </UL> |
398 | <P> |
399 | What these two extra experiments show is that if the API and FastCGI applications are implemented in exactly |
400 | the same way -- caching database connections but not caching user profile data -- the API application is |
401 | slightly faster. This is what you'd expect, since the FastCGI application has to pay the cost of |
402 | inter-process communication not present in the API application. |
403 | </P> |
404 | <P> |
405 | In the real world the two applications would not be implemented in the same way. FastCGI's architectural |
406 | advantage results in much higher performance -- a factor of 3 in this test. With a remote database or more |
407 | expensive database access the factor would be higher. With more substantial processing of the content files |
408 | the factor would be smaller. |
409 | </P> |
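<P>
The ratios discussed in this section follow directly from the measured times per request. The sketch below
simply divides those times; the dictionary keys and the function name are illustrative.
</P>

```python
# Measured times per request (milliseconds), copied from the results above.
MSEC = {
    "fastcgi": 12.0,
    "api": 36.6,
    "api_sustained_connections": 16.0,
    "fastcgi_cache_disabled": 20.1,
}


def speedup(slower, faster):
    """How many times faster `faster` is than `slower`."""
    return MSEC[slower] / MSEC[faster]


# FastCGI versus the API application as each would really be built.
print(round(speedup("api", "fastcgi"), 2))  # 3.05
# Same design on both sides: the API application is slightly ahead.
print(round(speedup("fastcgi_cache_disabled", "api_sustained_connections"), 2))  # 1.26
# Benefit the FastCGI application gets from its cache.
print(round(speedup("fastcgi_cache_disabled", "fastcgi"), 1))  # 1.7
```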
410 | <P> |
411 | </P> |
412 | <H3> |
413 | <A NAME="S6">6. Multi-threaded APIs</A> |
414 | </H3> |
415 | <P> |
416 | Web servers with a multi-threaded internal structure (and APIs to match) are now starting to become more |
417 | common. These servers don't have all of the disadvantages described in Section 3. Does this mean that |
418 | FastCGI's performance advantages will disappear? |
419 | </P> |
420 | <P> |
421 | A superficial analysis says yes. An API-based application in a single-process, multi-threaded server can |
422 | maintain caches and database connections the same way a FastCGI application can. The API-based application |
423 | does not pay for inter-process communication, so the API-based application will be slightly faster than the |
424 | FastCGI application. |
425 | </P> |
426 | <P> |
427 | A deeper analysis says no. Multi-threaded programming is complex, because concurrency makes programs much more |
428 | difficult to test and debug. In the case of multi-threaded programming to Web server APIs, the normal problems |
429 | with multi-threading are compounded by the lack of isolation between different applications and between the |
430 | applications and the Web server. With FastCGI you can write programs in the familiar single-threaded style, |
431 | get all the reliability and maintainability of process isolation, and still get very high performance. If you |
432 | truly need multi-threading, you can write multi-threaded FastCGI and still isolate your multi-threaded |
433 | application from other applications and from the server. In short, multi-threading makes Web server APIs |
434 | unusable for practically all applications, reducing the choice to FastCGI versus CGI. The performance winner in |
435 | that contest is obviously FastCGI. |
436 | </P> |
437 | <P> |
438 | </P> |
439 | <H3> |
440 | <A NAME="S7">7. Conclusion</A> |
441 | </H3> |
442 | <P> |
443 | Just how fast is FastCGI? The answer: very fast indeed. Not because it has some specially-greased path through |
444 | the operating system, but because its design is well matched to the needs of most applications. We invite you |
445 | to make FastCGI the fast, open foundation for your Web server applications. |
446 | </P> |
447 | <P> |
448 | </P> |
449 | <HR> |
450 | <A HREF="http://www.openmarket.com/"><IMG SRC="omi-logo.gif" ALT="OMI Home Page"></A> |
451 | <ADDRESS> |
452 | © 1995, Open Market, Inc. / mbrown@openmarket.com |
453 | </ADDRESS> |
454 | </BODY> |
455 | </HTML> |
456 | |