X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?p=catagits%2FCatalyst-Runtime.git;a=blobdiff_plain;f=lib%2FCatalyst%2FUTF8.pod;h=ec398df76d71babec3d5c9453bc9a1ede36e2966;hp=b32bc31a3e340cd7dd655b801120dc39d8fd950a;hb=f9d5afbcf71ea8161b7145df4047cd95b8f63be0;hpb=473078ffb70c9a5585a6b190fc973f5e9000c11b

diff --git a/lib/Catalyst/UTF8.pod b/lib/Catalyst/UTF8.pod
index b32bc31..ec398df 100644
--- a/lib/Catalyst/UTF8.pod
+++ b/lib/Catalyst/UTF8.pod
@@ -196,6 +196,26 @@ to send over the wire via HTTP needs to be bytes (not unicode characters).
 Remember if you use any utf8 literals in your source code, you should use the
 C pragma.
 
+B Assuming UTF-8 in your query parameters and keywords may be an issue if you have
+legacy code where you created URL in templates manually and used an encoding other than UTF-8.
+In these cases you may find versions of Catalyst after 5.90080+ will incorrectly decode. For
+backwards compatibility we offer three configurations settings, here described in order of
+precedence:
+
+C
+
+If true, then do not try to character decode any wide characters in your
+request URL query or keywords.
+
+C
+
+This setting allows one to specify a fixed value for how to decode your query, instead of using
+the default, UTF-8.
+
+C
+
+If this is true we decode using whatever you set C to.
+
 =head2 UTF8 in Form POST
 
 In general most modern browsers will follow the specification, which says that POSTed
@@ -254,6 +274,43 @@ based tricks and workarounds for even more odd cases (just search the web for th
 a number of approaches. Hopefully as more compliant browsers become popular
 these edge cases will fade.
 
+B It is possible for a form POST multipart response (normally a file upload) to contain
+inline content with mixed content character sets and encoding. For example one might create
+a POST like this:
+
+    use utf8;
+    use HTTP::Request::Common;
+
+    my $utf8 = 'test ♥';
+    my $shiftjs = 'test テスト';
+    my $req = POST '/root/echo_arg',
+        Content_Type => 'form-data',
+        Content => [
+            arg0 => 'helloworld',
+            Encode::encode('UTF-8','♥') => Encode::encode('UTF-8','♥♥'),
+            arg1 => [
+                undef, '',
+                'Content-Type' =>'text/plain; charset=UTF-8',
+                'Content' => Encode::encode('UTF-8', $utf8)],
+            arg2 => [
+                undef, '',
+                'Content-Type' =>'text/plain; charset=SHIFT_JIS',
+                'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
+            arg2 => [
+                undef, '',
+                'Content-Type' =>'text/plain; charset=SHIFT_JIS',
+                'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
+        ];
+
+In this case we've created a POST request but each part specifies its own content
+character set (and setting a content encoding would also be possible). Generally one
+would not run into this situation in a web browser context but for completeness sake
+Catalyst will notice if a multipart POST contains parts with complex or extended
+header information and in those cases it will not attempt to apply decoding to the
+form values. Instead the part will be represented as an instance of an object
+L which will contain all the header information needed
+for you to perform custom parser of the data.
+
 =head1 UTF8 Encoding in Body Response
 
 When does L encode your response body and what rules does it use to
@@ -558,10 +615,17 @@ so you can disable this with the following configurations setting:
 Where C is your L subclass.
 
+If you do not wish to disable all the Catalyst encoding features, you may disable specific
+features via two additional configuration options: 'skip_body_param_unicode_decoding'
+and 'skip_complex_post_part_handling'. The first will skip any attempt to decode POST
+parameters in the creating of body parameters and the second will skip creation of instances
+of L in the case that the multipart form upload contains parts
+with a mix of content character sets.
+
 If you believe you have discovered a bug in UTF8 body encoding, I strongly
 encourage you to report it (and not try to hack a workaround in your local code).
 We also recommend that you regard such a workaround as a temporary solution. It
 is ideal if L extension
-authors can start to count on L doing the write thing for encoding
+authors can start to count on L doing the write thing for encoding.
 
 =head1 Conclusion
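
For reference, the two per-feature switches named in the last hunk, 'skip_body_param_unicode_decoding' and 'skip_complex_post_part_handling', would presumably be set through the application's config like any other Catalyst setting. A minimal sketch, assuming a hypothetical application class called MyApp (the class name and the surrounding boilerplate are illustrative and not part of this change):

```perl
package MyApp;    # hypothetical application class, for illustration only

use strict;
use warnings;
use Catalyst;

__PACKAGE__->config(
    # Skip decoding of POST parameters when body parameters are built.
    skip_body_param_unicode_decoding => 1,

    # Skip the special handling of multipart parts that carry their own
    # content character set or other extended header information.
    skip_complex_post_part_handling => 1,
);

__PACKAGE__->setup;

1;
```

Either option can be set on its own. Per the added text, both are meant for applications that want to keep the rest of the encoding support enabled rather than turning it off wholesale.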