From: Tomas Doran Date: Wed, 28 Jul 2010 22:50:57 +0000 (+0000) Subject: Merge 'trunk' into 'fix_iis_cgi' X-Git-Tag: 5.80025~2 X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?p=catagits%2FCatalyst-Runtime.git;a=commitdiff_plain;h=fef8c827fc1a87b1c32c487e330b5e967874f581;hp=-c Merge 'trunk' into 'fix_iis_cgi' r13096@spaceinvaders (orig r13060): rafl | 2010-03-22 10:30:30 +0000 It's --help, not -help. r13098@spaceinvaders (orig r13062): miyagawa | 2010-03-22 11:07:22 +0000 added cpanm :) r13099@spaceinvaders (orig r13063): ijw | 2010-03-22 11:30:38 +0000 Disabled name lookup for server hostname in favour of returning IP address. Annoyingly, I can't find any actual use of the name in this module (it's passed as SERVER_NAME in the env), but we believe it ends up in the 'Host' header. r13100@spaceinvaders (orig r13064): rafl | 2010-03-22 12:32:25 +0000 no tabs. kthx! r13118@spaceinvaders (orig r13082): rafl | 2010-03-26 16:00:54 +0000 Make sure to construct Upload objects properly, even if there are multiple Content-Type headers. Closes RT#55976. r13119@spaceinvaders (orig r13083): t0m | 2010-03-28 17:43:37 +0100 Back out 13063. This changes the CGI environment we construct to just be plain wrong, SERVER_NAME is meant to be a name after all. I have a feeling that all of this crap could be simplified, but I don't want to spend a lot of tuits changing it (and possibly breaking things for people with non-obvious behaviour changes like this one) - when those tuits could be spent on Plack stuff so that all this code dies anyway (and we do a major release so breakage can be more reasonable expected / extensively tested for). This is all possibly being _too_ paranoid here, but given the original commit didn't know what was happening / what changed _at all_, I think that's fair.. r13120@spaceinvaders (orig r13084): t0m | 2010-03-28 17:54:36 +0100 Cache the IP address => hostname lookups which could be performed multiple times to mitigate slow DNS servers. Poor man's replacement for 13063 r13121@spaceinvaders (orig r13085): t0m | 2010-03-28 18:24:18 +0100 Back out behaviour change in debug logging we don't want, keeping only the addition of 1 line of response info. More refactoring to give hooks to put the behaviour I just removed back from a plugin r13122@spaceinvaders (orig r13086): t0m | 2010-03-28 19:18:45 +0100 More splitting up of the response logging methods r13123@spaceinvaders (orig r13087): t0m | 2010-03-28 19:25:41 +0100 Make the tables from the log_headers method scale to term size nicely r13139@spaceinvaders (orig r13103): t0m | 2010-03-29 16:55:39 +0100 Fail, commit version bump r13142@spaceinvaders (orig r13106): rafl | 2010-03-29 17:09:05 +0100 Remove $VERSION hacks. For future dev releases we'll just use "-TRIAL" in the dist name. r13143@spaceinvaders (orig r13107): rafl | 2010-03-29 17:19:15 +0100 We always have a metaclass after setup, right? r13152@spaceinvaders (orig r13116): t0m | 2010-03-31 21:09:17 +0100 Adding ability to switch X-Catalyst on by config. For everyone who would like to let Netcraft know how awesome Catalyst is. (Sorry rafl...) r13157@spaceinvaders (orig r13121): rafl | 2010-04-02 00:30:06 +0100 Import croak, which the test already uses in various places. r13160@spaceinvaders (orig r13124): t0m | 2010-04-02 19:27:33 +0100 Require new CGI::Simple::Cookie, changelog r13187@spaceinvaders (orig r13151): t0m | 2010-04-12 20:16:39 +0100 Fix dsadinoff's mod_rewrite bug I hope r13188@spaceinvaders (orig r13152): t0m | 2010-04-12 20:18:04 +0100 Revert unintentional change in r13151 r13191@spaceinvaders (orig r13155): t0m | 2010-04-13 22:03:58 +0100 Fix spelling errors - RT#54335 r13192@spaceinvaders (orig r13156): t0m | 2010-04-13 23:06:19 +0100 Fix RT#41442 so that temporary files are always, always cleaned up. r13193@spaceinvaders (orig r13157): t0m | 2010-04-13 23:12:13 +0100 Document return of C::T::get is bytes not characters, RT#53678 r13194@spaceinvaders (orig r13158): t0m | 2010-04-13 23:18:19 +0100 Fix RT#49267 r13195@spaceinvaders (orig r13159): t0m | 2010-04-13 23:29:10 +0100 Trivial test feature, fixes RT#53653 r13202@spaceinvaders (orig r13166): t0m | 2010-04-19 00:03:56 +0100 Fix unquoted regex as per RT#24951 r13205@spaceinvaders (orig r13169): t0m | 2010-04-19 03:40:24 +0100 Document the action config here, as people don't seem to find it and this may help.. r13206@spaceinvaders (orig r13170): t0m | 2010-04-19 03:41:57 +0100 Bah, accidentally removed.. r13207@spaceinvaders (orig r13171): t0m | 2010-04-19 08:22:49 +0100 Go away useless warning r13213@spaceinvaders (orig r13177): ajgb | 2010-04-21 12:10:51 +0100 Fix not stripping backslashes in DispatchType::Regex::uri_for_action r15483@spaceinvaders (orig r13190): rafl | 2010-04-28 23:54:04 +0100 Make sure path_to returns an instance of the right Path::Class class. r15484@spaceinvaders (orig r13191): edenc | 2010-04-29 00:29:02 +0100 minor documentation fix for handle_request r15489@spaceinvaders (orig r13193): rafl | 2010-05-03 00:16:25 +0100 Allow parameterized roles to be applied as plugins. r15490@spaceinvaders (orig r13194): t0m | 2010-05-03 00:27:43 +0100 Back out crazy heuristics r15492@spaceinvaders (orig r13196): rafl | 2010-05-03 00:44:30 +0100 Unbreak tests by actually adding the module they're supposed to test. r15494@spaceinvaders (orig r13198): rafl | 2010-05-03 01:51:43 +0100 Remove useless conditional. r15515@spaceinvaders (orig r13219): wreis | 2010-05-06 13:34:10 +0100 make uri_for a bit cleaner r15516@spaceinvaders (orig r13220): wreis | 2010-05-06 14:30:19 +0100 minor fix for Changes file | add me as a contributor r15517@spaceinvaders (orig r13221): rafl | 2010-05-07 22:11:10 +0100 Pass along options to load_class for plugins. r15518@spaceinvaders (orig r13222): rafl | 2010-05-07 22:48:51 +0100 Changelogging. r15519@spaceinvaders (orig r13223): rafl | 2010-05-07 23:06:26 +0100 Version 5.80023. r15535@spaceinvaders (orig r13239): ribasushi | 2010-05-12 12:48:40 +0100 Better stats API explanation (SpiceMan) r15559@spaceinvaders (orig r13263): t0m | 2010-05-15 10:42:58 +0100 r13208@spaceinvaders (orig r13172): t0m | 2010-04-19 09:54:56 +0200 Branch to try and fix the request uri stuff. r13209@spaceinvaders (orig r13173): t0m | 2010-04-19 09:58:37 +0200 Just add comments to tests, no functional changes r13210@spaceinvaders (orig r13174): t0m | 2010-04-19 09:59:14 +0200 Get it mostly working, except uri_for is still buggered r15488@spaceinvaders (orig r13192): t0m | 2010-05-03 00:26:22 +0200 Revert to old behaviour, allow config for new behaviour. Config option name is rubbish, needs fixing r15532@spaceinvaders (orig r13236): t0m | 2010-05-09 01:09:01 +0200 I hate this name less. Others may feel differently r15556@spaceinvaders (orig r13260): t0m | 2010-05-15 10:52:16 +0200 Simplify madness some more, back to how it looked in the original fix_path_info_decoding branch so that we aren't using dodgy heuristics to determine the path. Alter the prepare_path tests so that they're testing the appropriate config option so that we now have tests for both code paths r15557@spaceinvaders (orig r13261): t0m | 2010-05-15 11:20:16 +0200 Add a pile of docs for the new use_request_uri_for_path setting r15558@spaceinvaders (orig r13262): t0m | 2010-05-15 11:38:06 +0200 Add recommendation r15560@spaceinvaders (orig r13264): t0m | 2010-05-15 10:55:07 +0100 Changelog, bump versions, add new contributor :) r15567@spaceinvaders (orig r13271): jhannah | 2010-05-19 23:36:21 +0100 We appear to have a bug where if lazy => 1 isn't set an exception occurs. r15575@spaceinvaders (orig r13279): jhannah | 2010-05-20 20:46:31 +0100 Oops. I should have TODO'd this one. rafl++ r15577@spaceinvaders (orig r13281): t0m | 2010-05-20 21:31:39 +0100 r13203@t0mlaptop (orig r13167): t0m | 2010-04-19 02:57:27 +0100 Branch for doys upcoming metaclass compat fixes. r13204@t0mlaptop (orig r13168): t0m | 2010-04-19 03:02:36 +0100 Remove the fugly hack to avoid metaclass compat issues now that Moose is fixed r15584@spaceinvaders (orig r13282): t0m | 2010-05-20 21:34:34 +0100 Changelog and dep bump for more_metaclass_compat branch merge r15585@spaceinvaders (orig r13283): t0m | 2010-05-20 21:35:20 +0100 Back out hunks I accidentally committed r15598@spaceinvaders (orig r13296): t0m | 2010-05-22 11:17:18 +0100 Changelog, bump Moose dep r15599@spaceinvaders (orig r13297): t0m | 2010-05-22 11:18:14 +0100 r15581@spaceinvaders: t0m | 2010-05-21 18:11:33 +0100 Mech tests branch r15582@spaceinvaders: t0m | 2010-05-21 18:20:01 +0100 Fix --mech as reportedon list as 'create controller 'option' -mechanize fails' r15583@spaceinvaders: t0m | 2010-05-21 18:21:18 +0100 Fix missing - in option, options must be -- r15645@spaceinvaders (orig r13336): t0m | 2010-06-10 10:54:11 +0100 Fix $self vs $class, davewood++ r15660@spaceinvaders (orig r13351): rainboxx | 2010-06-15 12:53:05 +0100 Applied doc patches that might help to figure out how to receive component's config values. r15672@spaceinvaders (orig r13363): jester | 2010-06-22 21:47:45 +0100 Fixed typo r15674@spaceinvaders (orig r13365): bricas | 2010-06-24 13:52:20 +0100 add typo fix to Changes r15689@spaceinvaders (orig r13380): arcanez | 2010-07-03 01:04:07 +0100 remove extra ' r15716@spaceinvaders (orig r13407): hobbs | 2010-07-09 10:40:44 +0100 Try harder to make finalize_error encoding-safe. r15750@spaceinvaders (orig r13441): t0m | 2010-07-28 22:39:41 +0100 r15666@spaceinvaders (orig r13357): jnapiorkowski | 2010-06-16 20:12:46 +0100 new branch r15667@spaceinvaders (orig r13358): jnapiorkowski | 2010-06-16 20:55:08 +0100 broke out and documented action_class method r15749@spaceinvaders (orig r13440): t0m | 2010-07-28 22:39:32 +0100 Changelog r15755@spaceinvaders (orig r13446): t0m | 2010-07-28 23:00:14 +0100 r15662@spaceinvaders (orig r13353): t0m | 2010-06-15 22:15:31 +0100 Branch for rt58057 r15663@spaceinvaders (orig r13354): t0m | 2010-06-15 22:17:32 +0100 Commit standalone test, test which makes the TestApp blow up and the nasty workaround r15726@spaceinvaders (orig r13417): t0m | 2010-07-23 13:29:21 +0100 I know hobbs has a patch to add a load of these, but we should at least add (es) r15753@spaceinvaders (orig r13444): t0m | 2010-07-28 22:56:25 +0100 Entirely the wrong branch, idiot. Remove r13417 r15754@spaceinvaders (orig r13445): t0m | 2010-07-28 23:00:05 +0100 Document the horrible, changelog. I'm going to merge this and fix a real issue as I don't have time to fuck around inside Moose right now and it's been waiting way too long already r15756@spaceinvaders (orig r13447): t0m | 2010-07-28 23:26:21 +0100 Fix TODO tests. r15757@spaceinvaders (orig r13448): t0m | 2010-07-28 23:26:31 +0100 Whitespace --- fef8c827fc1a87b1c32c487e330b5e967874f581 diff --combined lib/Catalyst/Engine/CGI.pm index a8d64a5,0617742..076f5b0 --- a/lib/Catalyst/Engine/CGI.pm +++ b/lib/Catalyst/Engine/CGI.pm @@@ -28,6 -28,43 +28,43 @@@ appropriate engine module This is the Catalyst engine specialized for the CGI environment. + =head1 PATH DECODING + + Most web server environments pass the requested path to the application using environment variables, + from which Catalyst has to reconstruct the request base (i.e. the top level path to / in the application, + exposed as C<< $c->request->base >>) and the request path below that base. + + There are two methods of doing this, both of which have advantages and disadvantages. Which method is used + is determined by the C<< $c->config(use_request_uri_for_path) >> setting (which can either be true or false). + + =head2 use_request_uri_for_path => 0 + + This is the default (and the) traditional method that Catalyst has used for determining the path information. + The path is synthesised from a combination of the C and C environment variables. + The allows the application to behave correctly when C is being used to redirect requests + into the application, as these variables are adjusted by mod_rewrite to take account for the redirect. + + However this method has the major disadvantage that it is impossible to correctly decode some elements + of the path, as RFC 3875 says: "C<< Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot + contain path-segment parameters. >>" This means PATH_INFO is B decoded, and therefore Catalyst + can't distinguish / vs %2F in paths (in addition to other encoded values). + + =head2 use_request_uri_for_path => 1 + + This method uses the C and C environment variables. As C is never + decoded, this means that applications using this mode can correctly handle URIs including the %2F character + (i.e. with C set to C in Apache). + + Given that this method of path resolution is provably more correct, it is recommended that you use + this unless you have a specific need to deploy your application in a non-standard environment, and you are + aware of the implications of not being able to handle encoded URI paths correctly. + + However it also means that in a number of cases when the app isn't installed directly at a path, but instead + is having paths rewritten into it (e.g. as a .cgi/fcgi in a public_html directory, with mod_rewrite in a + .htaccess file, or when SSI is used to rewrite pages into the app, or when sub-paths of the app are exposed + at other URIs than that which the app is 'normally' based at with C), the resolution of + C<< $c->request->base >> will be incorrect. + =head1 OVERLOADED METHODS This class overloads some methods from C. @@@ -117,19 -154,13 +154,19 @@@ sub prepare_path my $scheme = $c->request->secure ? 'https' : 'http'; my $host = $ENV{HTTP_HOST} || $ENV{SERVER_NAME}; my $port = $ENV{SERVER_PORT} || 80; + + # fix up for IIS + if ($ENV{SERVER_SOFTWARE} && $ENV{SERVER_SOFTWARE} =~ m{IIS/[6-9]\.\d}) { + $ENV{PATH_INFO} =~ s/^\Q$ENV{SCRIPT_NAME}\E//; + } + my $script_name = $ENV{SCRIPT_NAME}; $script_name =~ s/([^$URI::uric])/$URI::Escape::escapes{$1}/go if $script_name; my $base_path; if ( exists $ENV{REDIRECT_URL} ) { $base_path = $ENV{REDIRECT_URL}; - $base_path =~ s/$ENV{PATH_INFO}$//; + $base_path =~ s/\Q$ENV{PATH_INFO}\E$//; } else { $base_path = $script_name || '/'; @@@ -154,30 -185,19 +191,19 @@@ } } - # RFC 3875: "Unlike a URI path, the PATH_INFO is not URL-encoded, - # and cannot contain path-segment parameters." This means PATH_INFO - # is always decoded, and the script can't distinguish / vs %2F. - # See https://issues.apache.org/bugzilla/show_bug.cgi?id=35256 - # Here we try to resurrect the original encoded URI from REQUEST_URI. my $path_info = $ENV{PATH_INFO}; - if (my $req_uri = $ENV{REQUEST_URI}) { - $req_uri =~ s/^\Q$base_path\E//; - $req_uri =~ s/\?.*$//; - if ($req_uri) { - # Note that if REQUEST_URI doesn't start with a /, then the user - # is probably using mod_rewrite or something to rewrite requests - # into a sub-path of their application.. - # This means that REQUEST_URI needs information from PATH_INFO - # prepending to it to be useful, otherwise the sub path which is - # being redirected to becomes the app base address which is - # incorrect. - if (substr($req_uri, 0, 1) ne '/') { - my ($match) = $req_uri =~ m|^([^/]+)|; - my ($path_info_part) = $path_info =~ m|^(.*?\Q$match\E)|; - substr($req_uri, 0, length($match), $path_info_part) - if $path_info_part; + if ($c->config->{use_request_uri_for_path}) { + # RFC 3875: "Unlike a URI path, the PATH_INFO is not URL-encoded, + # and cannot contain path-segment parameters." This means PATH_INFO + # is always decoded, and the script can't distinguish / vs %2F. + # See https://issues.apache.org/bugzilla/show_bug.cgi?id=35256 + # Here we try to resurrect the original encoded URI from REQUEST_URI. + if (my $req_uri = $ENV{REQUEST_URI}) { + if (defined $script_name) { + $req_uri =~ s/^\Q$script_name\E//; } - $path_info = $req_uri; + $req_uri =~ s/\?.*$//; + $path_info = $req_uri if $req_uri; } } diff --combined t/aggregate/unit_core_engine_cgi-prepare_path.t index 9e297a1,d5cdc58..2f30ce9 --- a/t/aggregate/unit_core_engine_cgi-prepare_path.t +++ b/t/aggregate/unit_core_engine_cgi-prepare_path.t @@@ -8,18 -8,18 +8,18 @@@ use Catalyst::Engine::CGI # mod_rewrite to app root for non / based app { - my $r = get_req ( + my $r = get_req (0, REDIRECT_URL => '/comics/', SCRIPT_NAME => '/comics/dispatch.cgi', REQUEST_URI => '/comics/', ); - is ''.$r->uri, 'http://www.foo.com/comics/'; - is ''.$r->base, 'http://www.foo.com/comics/'; + is ''.$r->uri, 'http://www.foo.com/comics/', 'uri is correct'; + is ''.$r->base, 'http://www.foo.com/comics/', 'base is correct'; } # mod_rewrite to sub path under app root for non / based app { - my $r = get_req ( + my $r = get_req (0, PATH_INFO => '/foo/bar.gif', REDIRECT_URL => '/comics/foo/bar.gif', SCRIPT_NAME => '/comics/dispatch.cgi', @@@ -31,7 -31,7 +31,7 @@@ # Standard CGI hit for non / based app { - my $r = get_req ( + my $r = get_req (0, PATH_INFO => '/static/css/blueprint/screen.css', SCRIPT_NAME => '/~bobtfish/Gitalist/script/gitalist.cgi', REQUEST_URI => '/~bobtfish/Gitalist/script/gitalist.cgi/static/css/blueprint/screen.css', @@@ -41,19 -41,19 +41,19 @@@ } # / %2F %252F escaping case. { - my $r = get_req ( + my $r = get_req (1, PATH_INFO => '/%2F/%2F', SCRIPT_NAME => '/~bobtfish/Gitalist/script/gitalist.cgi', REQUEST_URI => '/~bobtfish/Gitalist/script/gitalist.cgi/%252F/%252F', ); - is ''.$r->uri, 'http://www.foo.com/~bobtfish/Gitalist/script/gitalist.cgi/%252F/%252F'; - is ''.$r->base, 'http://www.foo.com/~bobtfish/Gitalist/script/gitalist.cgi/'; + is ''.$r->uri, 'http://www.foo.com/~bobtfish/Gitalist/script/gitalist.cgi/%252F/%252F', 'uri correct'; + is ''.$r->base, 'http://www.foo.com/~bobtfish/Gitalist/script/gitalist.cgi/', 'base correct'; } # Using rewrite rules to ask for a sub-path in your app. # E.g. RewriteRule ^(.*)$ /path/to/fastcgi/domainprofi.fcgi/iframeredirect$1 [L,NS] { - my $r = get_req ( + my $r = get_req (0, PATH_INFO => '/iframeredirect/info', SCRIPT_NAME => '', REQUEST_URI => '/info', @@@ -64,7 -64,7 +64,7 @@@ # nginx example from espent with path /"foo" { - my $r = get_req ( + my $r = get_req (0, PATH_INFO => '"foo"', SCRIPT_NAME => '/', REQUEST_URI => '/%22foo%22', @@@ -76,30 -76,43 +76,55 @@@ # nginx example from espent with path /"foo" and the app based at /oslobilder { - my $r = get_req ( + my $r = get_req (1, PATH_INFO => 'oslobilder/"foo"', SCRIPT_NAME => '/oslobilder/', REQUEST_URI => '/oslobilder/%22foo%22', ); - is ''.$r->path, '%22foo%22'; - is ''.$r->uri, 'http://www.foo.com/oslobilder/%22foo%22'; - is ''.$r->base, 'http://www.foo.com/oslobilder/'; + is ''.$r->path, '%22foo%22', 'path correct'; + is ''.$r->uri, 'http://www.foo.com/oslobilder/%22foo%22', 'uri correct'; + is ''.$r->base, 'http://www.foo.com/oslobilder/', 'base correct'; } +# CGI hit on IIS for non / based app +{ - my $r = get_req ( ++ my $r = get_req(0, + SERVER_SOFTWARE => 'Microsoft-IIS/6.0', + PATH_INFO => '/bobtfish/Gitalist/script/gitalist.cgi/static/css/blueprint/screen.css', + SCRIPT_NAME => '/bobtfish/Gitalist/script/gitalist.cgi', + PATH_TRANSLATED => +'C:\\Inetpub\\vhosts\\foo.com\\httpdocs\\bobtfish\\Gitalist\\script\\gitalist.cgi\\static\\css\\blueprint\\screen.css', + ); + is ''.$r->uri, 'http://www.foo.com/bobtfish/Gitalist/script/gitalist.cgi/static/css/blueprint/screen.css'; + is ''.$r->base, 'http://www.foo.com/bobtfish/Gitalist/script/gitalist.cgi/'; +} + + { + my $r = get_req (0, + PATH_INFO => '/auth/login', + SCRIPT_NAME => '/tx', + REQUEST_URI => '/login', + ); + is ''.$r->path, 'auth/login', 'path correct'; + is ''.$r->uri, 'http://www.foo.com/tx/auth/login', 'uri correct'; + is ''.$r->base, 'http://www.foo.com/tx/', 'base correct'; + } + + # test req->base and c->uri_for work correctly after an internally redirected request + # (i.e. REDIRECT_URL set) when the PATH_INFO contains a regex + { + my $path = '/engine/request/uri/Rx(here)'; + my $r = get_req (0, + SCRIPT_NAME => '/', + PATH_INFO => $path, + REQUEST_URI => $path, + REDIRECT_URL => $path, + ); + + is $r->path, 'engine/request/uri/Rx(here)', 'URI contains correct path'; + is $r->base, 'http://www.foo.com/', 'Base is correct'; + } - # FIXME - Test proxy logic # - Test query string # - Test non standard port numbers @@@ -107,6 -120,8 +132,8 @@@ # - Test scheme (secure request on port 80) sub get_req { + my $use_request_uri_for_path = shift; + my %template = ( HTTP_HOST => 'www.foo.com', PATH_INFO => '/', @@@ -115,6 -130,9 +142,9 @@@ local %ENV = (%template, @_); my $i = TestApp->new; + $i->setup_finished(0); + $i->config(use_request_uri_for_path => $use_request_uri_for_path); + $i->setup_finished(1); $i->engine(Catalyst::Engine::CGI->new); $i->engine->prepare_path($i); return $i->req;