From: John Napiorkowski
Date: Mon, 23 Feb 2015 22:07:33 +0000 (-0600)
Subject: extend the docs on UTF8 to include recent updates
X-Git-Tag: 5.90084~1
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?p=catagits%2FCatalyst-Runtime.git;a=commitdiff_plain;h=b16a64afb615b6037bfddcca42e92c909a77f0db

extend the docs on UTF8 to include recent updates
---

diff --git a/Changes b/Changes
index a782049..d334b25 100644
--- a/Changes
+++ b/Changes
@@ -8,7 +8,7 @@
     file uploads. In these cases when Catalyst can't determine what the value of
     a form upload is, will return an instance of Catalyst::Request::PartData will
     all the information need to figure it out. Documentation about this corner
-    case.
+    case. For RT https://rt.cpan.org/Ticket/Display.html?id=101556
   - Two new application configuration parameters 'skip_body_param_unicode_decoding'
     and 'skip_complex_post_part_handling' to assist you with any backward
     compatibility issues with all the new UTF8 work in the most recent stable
diff --git a/lib/Catalyst/UTF8.pod b/lib/Catalyst/UTF8.pod
index b32bc31..91aeaed 100644
--- a/lib/Catalyst/UTF8.pod
+++ b/lib/Catalyst/UTF8.pod
@@ -254,6 +254,43 @@ based tricks and workarounds for even more odd cases (just search the web for th
 a number of approaches. Hopefully as more compliant browsers become popular these
 edge cases will fade.
 
+B It is possible for a form POST multipart response (normally a file upload) to contain
+inline content with mixed content character sets and encoding. For example one might create
+a POST like this:
+
+    use utf8;
+    use HTTP::Request::Common;
+
+    my $utf8 = 'test ♥';
+    my $shiftjs = 'test テスト';
+    my $req = POST '/root/echo_arg',
+        Content_Type => 'form-data',
+          Content => [
+            arg0 => 'helloworld',
+            Encode::encode('UTF-8','♥') => Encode::encode('UTF-8','♥♥'),
+            arg1 => [
+              undef, '',
+              'Content-Type' =>'text/plain; charset=UTF-8',
+              'Content' => Encode::encode('UTF-8', $utf8)],
+            arg2 => [
+              undef, '',
+              'Content-Type' =>'text/plain; charset=SHIFT_JIS',
+              'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
+            arg2 => [
+              undef, '',
+              'Content-Type' =>'text/plain; charset=SHIFT_JIS',
+              'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
+          ];
+
+In this case we've created a POST request but each part specifies its own content
+character set (and setting a content encoding would also be possible). Generally one
+would not run into this situation in a web browser context but for completeness sake
+Catalyst will notice if a multipart POST contains parts with complex or extended
+header information and in those cases it will not attempt to apply decoding to the
+form values. Instead the part will be represented as an instance of an object
+L<Catalyst::Request::PartData> which will contain all the header information needed
+for you to perform custom parser of the data.
+
 =head1 UTF8 Encoding in Body Response
 
 When does L<Catalyst> encode your response body and what rules does it use to
@@ -558,10 +595,17 @@ so you can disable this with the following configurations setting:
 
 Where C is your L<Catalyst> subclass.
 
+If you do not wish to disable all the Catalyst encoding features, you may disable specific
+features via two additional configuration options: 'skip_body_param_unicode_decoding'
+and 'skip_complex_post_part_handling'. The first will skip any attempt to decode POST
+parameters in the creating of body parameters and the second will skip creation of instances
+of L<Catalyst::Request::PartData> in the case that the multipart form upload contains parts
+with a mix of content character sets.
+
 If you believe you have discovered a bug in UTF8 body encoding, I strongly encourage you to
 report it (and not try to hack a workaround in your local code). We also recommend that you
 regard such a workaround as a temporary solution. It is ideal if L<Catalyst> extension
-authors can start to count on L<Catalyst> doing the write thing for encoding
+authors can start to count on L<Catalyst> doing the write thing for encoding.
 
 =head1 Conclusion
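
Note on the two options documented in the UTF8.pod hunk: a minimal sketch of how they might be set in an application class. The class name MyApp is hypothetical, and the sketch assumes each option is switched on with a true value; neither detail is stated in this patch.

```perl
package MyApp;    # hypothetical application class, not named in the patch

use strict;
use warnings;
use Catalyst;

# Assumption: each option takes effect when set to a true value.
__PACKAGE__->config(
    # Skip decoding of POST parameters when body parameters are built.
    skip_body_param_unicode_decoding => 1,

    # Skip creation of Catalyst::Request::PartData objects for multipart
    # parts that declare their own character set.
    skip_complex_post_part_handling => 1,
);

__PACKAGE__->setup;

1;
```

Either key can be set on its own, so an application that only has trouble with mixed-charset multipart uploads could leave body parameter decoding enabled.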