a09b49d2 1=encoding UTF-8
2
3=head1 Name
4
d63cc9c8 5Catalyst::UTF8 - All About UTF8 and Catalyst Encoding
a09b49d2 6
7=head1 Description
8
Starting in 5.90080 L<Catalyst> will enable UTF8 encoding by default for
text-like body responses. In addition we've made a ton of fixes around encoding
and utf8 scattered throughout the codebase. This document attempts to give
an overview of the assumptions and practices that L<Catalyst> uses when
dealing with UTF8 and encoding issues. You should also review the
Changes file, L<Catalyst::Delta> and L<Catalyst::Upgrading> for more.
15
We attempt to describe all relevant processes, try to give some advice
and explain where we have made exceptions in order to respect our commitment
to backwards compatibility.
19
b596572b 20=head1 UTF8 in Controller Actions
a09b49d2 21
22Using UTF8 characters in your Controller classes and actions.
23
24=head2 Summary
25
In this section we will review changes to how UTF8 characters can be used in
controller actions, how they look in the debugging screens (and your logs)
as well as how you construct L<URI> objects to actions with UTF8 paths
(or using UTF8 args or captures).
30
31=head2 Unicode in Controllers and URLs
32
    package MyApp::Controller::Root;

    use utf8;
    use base 'Catalyst::Controller';

    sub heart_with_arg :Path('♥') Args(1) {
        my ($self, $c, $arg) = @_;
    }

    sub base :Chained('/') CaptureArgs(0) {
        my ($self, $c) = @_;
    }

    sub capture :Chained('base') PathPart('♥') CaptureArgs(1) {
        my ($self, $c, $capture) = @_;
    }

    sub arg :Chained('capture') PathPart('♥') Args(1) {
        my ($self, $c, $arg) = @_;
    }
53
54=head2 Discussion
55
56In the example controller above we have constructed two matchable URL routes:
57
58 http://localhost/root/♥/{arg}
59 http://localhost/base/♥/{capture}/♥/{arg}
60
61The first one is a classic Path type action and the second uses Chaining, and
62spans three actions in total. As you can see, you can use unicode characters
473078ff 63in your Path and PathPart attributes (remember to use the C<utf8> pragma to allow
a09b49d2 64these multibyte characters in your source). The two constructed matchable routes
65would match the following incoming URLs:
66
67 (heart_with_arg) -> http://localhost/root/%E2%99%A5/{arg}
68 (base/capture/arg) -> http://localhost/base/%E2%99%A5/{capture}/%E2%99%A5/{arg}
69
The path part C<%E2%99%A5> is URL encoded unicode (assuming you are hitting this with
a reasonably modern browser). It's basically what goes over HTTP when you type a
browser location that has the unicode 'heart' in it. However we will use the unicode
symbol in your debugging messages:
74
75 [debug] Loaded Path actions:
76 .-------------------------------------+--------------------------------------.
77 | Path | Private |
78 +-------------------------------------+--------------------------------------+
79 | /root/♥/* | /root/heart_with_arg |
80 '-------------------------------------+--------------------------------------'
81
82 [debug] Loaded Chained actions:
83 .-------------------------------------+--------------------------------------.
84 | Path Spec | Private |
85 +-------------------------------------+--------------------------------------+
86 | /base/♥/*/♥/* | /root/base (0) |
87 | | -> /root/capture (1) |
88 | | => /root/arg |
89 '-------------------------------------+--------------------------------------'
90
And if the requested URL uses unicode characters in your captures or args (such as
C<http://localhost/base/♥/♥/♥/♥>) you should see the arguments and captures as their
unicode characters as well:
94
95 [debug] Arguments are "♥"
96 [debug] "GET" request for "base/♥/♥/♥/♥" from "127.0.0.1"
97 .------------------------------------------------------------+-----------.
98 | Action | Time |
99 +------------------------------------------------------------+-----------+
100 | /root/base | 0.000080s |
101 | /root/capture | 0.000075s |
102 | /root/arg | 0.000755s |
103 '------------------------------------------------------------+-----------'
104
Again, remember that we are displaying the unicode character and using it to match actions
containing such multibyte characters BUT over HTTP you are getting these as URL encoded
bytes. For example if you looked at the L<PSGI> C<$env> value for C<REQUEST_URI> you
would see (for the above request):
a09b49d2 109
110 REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5"
111
So on the incoming request we decode so that we can match and display unicode characters
(after decoding the URL encoding). This makes it straightforward to use these types of
multibyte characters in your actions and see them incoming in captures and arguments. Please
keep this in mind if you are doing, for example, regular expression matching, length determination
or other string comparisons: you will need to treat these incoming variables as decoded Unicode
character strings. For example in the following action:
118
119 sub arg :Chained('capture') PathPart('♥') Args(1) {
120 my ($self, $c, $arg) = @_;
121 }
122
123when $arg is "♥" you should expect C<length($arg)> to be C<1> since it is indeed one character
124although it will take more than one byte to store.
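
To make the difference between characters and bytes concrete, here is a minimal sketch that
uses only the core L<Encode> module. The C<percent_decode> helper is hypothetical and exists
only to illustrate the two steps described above (undo the URL encoding, then decode the UTF8
bytes); it is not how L<Catalyst> implements this internally.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Hypothetical helper: undo percent-encoding, then decode the resulting
# UTF-8 bytes into a Perl character string.
sub percent_decode {
    my ($raw) = @_;
    (my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $bytes);
}

my $raw = '%E2%99%A5';             # what arrives over HTTP
my $arg = percent_decode($raw);    # comparable to what the action sees

my $char_length = length($arg);                     # counts characters
my $byte_length = length(encode('UTF-8', $arg));    # counts UTF-8 bytes

print "characters: $char_length, bytes: $byte_length\n";
```

The decoded value compares equal to the literal C<"♥"> and has a character length of 1, while
its UTF8 encoded form occupies 3 bytes.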
125
126=head2 UTF8 in constructing URLs via $c->uri_for
127
128For the reverse (constructing meaningful URLs to actions that contain multibyte characters
129in their paths or path parts, or when you want to include such characters in your captures
130or arguments) L<Catalyst> will do the right thing (again just remember to use the C<utf8>
131pragma).
132
133 use utf8;
134 my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['♥','♥']);
135
473078ff 136When you stringify this object (for use in a template, for example) it will automatically
a09b49d2 137do the right thing regarding utf8 encoding and url encoding.
138
139 http://localhost/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5
140
141Since again what you want is a properly url encoded version of this. In this case your string
142length will reflect URL encoded bytes, not the character length. Ultimately what you want
143to send over the wire via HTTP needs to be bytes.
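
The reverse direction can be sketched the same way. The C<percent_encode> helper below is an
assumption made for illustration (it is not part of L<Catalyst> or L<URI>); it encodes a
character string to UTF8 bytes and then percent-encodes every byte outside the unreserved set.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: character string -> UTF-8 bytes -> percent-encoding.
sub percent_encode {
    my ($string) = @_;
    my $bytes = encode('UTF-8', $string);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $segment = percent_encode('♥');

# Build the path portion of the URL shown above from four heart segments.
my $path = join '/', '', 'base', map { percent_encode($_) } ('♥') x 4;

print "$segment\n$path\n";
```

Note that C<length($segment)> is 9 here, since the string now holds URL encoded bytes rather
than a single character.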
144
145=head1 UTF8 in GET Query and Form POST
146
147What Catalyst does with UTF8 in your GET and classic HTML Form POST
148
149=head2 UTF8 in URL query and keywords
150
473078ff 151The same rules that we find in URL paths also cover URL query parts. That is
152if one types a URL like this into the browser
a09b49d2 153
88e5a8b0 154 http://localhost/example?♥=♥♥
a09b49d2 155
When this goes 'over the wire' to your application server it's going to be as
percent encoded bytes:
158
159
88e5a8b0 160 http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5
a09b49d2 161
162When L<Catalyst> encounters this we decode the percent encoding and the utf8
163so that we can properly display this information (such as in the debugging
164logs or in a response.)
165
88e5a8b0 166 [debug] Query Parameters are:
167 .-------------------------------------+--------------------------------------.
168 | Parameter | Value |
169 +-------------------------------------+--------------------------------------+
170 | ♥ | ♥♥ |
171 '-------------------------------------+--------------------------------------'
a09b49d2 172
173All the values and keys that are part of $c->req->query_parameters will be
174utf8 decoded. So you should not need to do anything special to take those
175values/keys and send them to the body response (since as we will see later
176L<Catalyst> will do all the necessary encoding for you).
177
Again, remember that the values of your parameters are now decoded into Unicode strings, so
for example you'd expect the result of length to reflect the character length, not
the byte length.
a09b49d2 181
182Just like with arguments and captures, you can use utf8 literals (or utf8
183strings) in $c->uri_for:
184
88e5a8b0 185 use utf8;
186 my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'♥' => '♥♥'});
a09b49d2 187
473078ff 188When you stringify this object (for use in a template, for example) it will automatically
a09b49d2 189do the right thing regarding utf8 encoding and url encoding.
190
88e5a8b0 191 http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5
a09b49d2 192
193Since again what you want is a properly url encoded version of this. Ultimately what you want
b596572b 194to send over the wire via HTTP needs to be bytes (not unicode characters).
a09b49d2 195
196Remember if you use any utf8 literals in your source code, you should use the
197C<use utf8> pragma.
198
B<NOTE:> Assuming UTF-8 in your query parameters and keywords may be an issue if you have
legacy code where you created URLs in templates manually and used an encoding other than UTF-8.
In these cases you may find that Catalyst 5.90080 and later will decode them incorrectly. For
backwards compatibility we offer three configuration settings, described here in order of
precedence:
204
205C<do_not_decode_query>
206
207If true, then do not try to character decode any wide characters in your
27d0c51a 208request URL query or keywords. You will need to handle this manually in your action code
77b90892 209(although if you choose this setting, chances are you already do this).
f9d5afbc 210
211C<default_query_encoding>
212
213This setting allows one to specify a fixed value for how to decode your query, instead of using
214the default, UTF-8.
215
216C<decode_query_using_global_encoding>
217
218If this is true we decode using whatever you set C<encoding> to.
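
As a rough sketch, these settings would go in your application configuration like any other.
The fragment below assumes C<default_query_encoding> takes an encoding name that L<Encode>
understands; the C<Shift_JIS> value is only an illustration, so check the L<Catalyst>
configuration documentation for the exact values your version accepts.

```perl
package MyApp;

use Catalyst;

# Normally you would pick just one of these; they are listed in the order
# of precedence described above.
__PACKAGE__->config(
    # do_not_decode_query => 1,                  # handle query decoding yourself
    default_query_encoding => 'Shift_JIS',       # assumed example value
    # decode_query_using_global_encoding => 1,   # reuse the 'encoding' setting
);

1;
```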
219
a09b49d2 220=head2 UTF8 in Form POST
221
222In general most modern browsers will follow the specification, which says that POSTed
223form fields should be encoded in the same way that the document was served with. That means
224that if you are using modern Catalyst and serving UTF8 encoded responses, a browser is
225supposed to notice that and encode the form POSTs accordingly.
226
227As a result since L<Catalyst> now serves UTF8 encoded responses by default, this means that
228you can mostly rely on incoming form POSTs to be so encoded. L<Catalyst> will make this
229assumption and decode accordingly (unless you explicitly turn off encoding...) If you are
b596572b 230running Catalyst in developer debug, then you will see the correct unicode characters in
a09b49d2 231the debug output. For example if you generate a POST request:
232
 use Catalyst::Test 'MyApp';
 use HTTP::Request::Common;
 use utf8;

 my $res = request POST "/example/posted", ['♥'=>'♥', '♥♥'=>'♥'];
a09b49d2 237
238Running in CATALYST_DEBUG=1 mode you should see output like this:
239
240 [debug] Body Parameters are:
241 .-------------------------------------+--------------------------------------.
242 | Parameter | Value |
243 +-------------------------------------+--------------------------------------+
244 | ♥ | ♥ |
245 | ♥♥ | ♥ |
246 '-------------------------------------+--------------------------------------'
247
248And if you had a controller like this:
249
 package MyApp::Controller::Example;

 use utf8;
 use base 'Catalyst::Controller';

 sub posted :POST Local {
     my ($self, $c) = @_;
     $c->res->content_type('text/plain');
     $c->res->body("hearts => ${\$c->req->post_parameters->{'♥'}}");
 }
a09b49d2 259
260The following test case would be true:
261
 use Encode 2.21 'decode_utf8';
 is decode_utf8($res->content), 'hearts => ♥';
a09b49d2 264
b596572b 265In this case we decode so that we can print and compare strings with multibyte characters.
a09b49d2 266
B<NOTE> In some cases some browsers may not follow the specification and may fail to set the form
POST encoding based on the server response. Catalyst itself doesn't attempt any workarounds, but one
common approach is to use a hidden form field with a UTF8 value (you might be familiar with
this from how Ruby on Rails has HTML form helpers that do that automatically). In that case
some browsers will send UTF8 encoded data if they notice the hidden input field contains such a
character. Also, you can add an HTML attribute to your form tag which many modern browsers
will respect to set the encoding (C<accept-charset="utf-8">). And lastly there are some javascript
based tricks and workarounds for even more odd cases (a web search will return
a number of approaches). Hopefully as more compliant browsers become popular these edge cases
will fade.
277
B<NOTE> It is possible for a form POST multipart request (normally a file upload) to contain
inline content with mixed content character sets and encoding. For example one might create
a POST like this:
281
 use utf8;
 use Encode;
 use HTTP::Request::Common;

 my $utf8 = 'test ♥';
 my $shiftjs = 'test テスト';
 my $req = POST '/root/echo_arg',
   Content_Type => 'form-data',
     Content => [
       arg0 => 'helloworld',
       Encode::encode('UTF-8','♥') => Encode::encode('UTF-8','♥♥'),
       arg1 => [
         undef, '',
         'Content-Type' =>'text/plain; charset=UTF-8',
         'Content' => Encode::encode('UTF-8', $utf8)],
       arg2 => [
         undef, '',
         'Content-Type' =>'text/plain; charset=SHIFT_JIS',
         'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
       arg2 => [
         undef, '',
         'Content-Type' =>'text/plain; charset=SHIFT_JIS',
         'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
     ];
305
In this case we've created a POST request but each part specifies its own content
character set (and setting a content encoding would also be possible). Generally one
would not run into this situation in a web browser context but for completeness' sake
Catalyst will notice if a multipart POST contains parts with complex or extended
header information. In these cases we will try to inspect the metadata and do the
right thing (in the above case we'd use SHIFT_JIS to decode C<arg2>, not UTF-8). However if
after inspecting the headers we cannot figure out how to decode the data, we
will not attempt to apply decoding to the form values. Instead the part will be represented as
an instance of L<Catalyst::Request::PartData>, which will contain all the header
information needed for you to perform custom parsing of the data.
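
The idea of letting each part's own header pick the decoder can be sketched with the core
L<Encode> module. The C<decode_part> helper below is hypothetical and much simpler than the
real multipart handling; it only shows the decision between "charset found, decode with it"
and "nothing usable, leave it to the caller".

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode find_encoding);

# Hypothetical helper: decode one part using the charset named in its own
# Content-Type header. Returns nothing when no usable charset is declared.
sub decode_part {
    my ($content_type, $bytes) = @_;
    my ($charset) = $content_type =~ /charset=["']?([\w.:-]+)/i;
    return unless defined $charset;
    my $encoding = find_encoding($charset) or return;
    return $encoding->decode($bytes);
}

my $shift_jis = decode_part(
    'text/plain; charset=SHIFT_JIS',
    encode('SHIFT_JIS', 'test テスト'),
);

# No charset parameter, so the helper declines to decode.
my $unknown = decode_part('application/octet-stream', "\x00\x01");

print defined $unknown ? "decoded\n" : "left undecoded\n";
```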
316
317Ideally we'd fix L<Catalyst> to be smarter about decoding so please submit your cases of
79fb8f95 318this so we can add intelligence to the parser and find a way to extract a valid value out
743f6b46 319of it.
b16a64af 320
a09b49d2 321=head1 UTF8 Encoding in Body Response
322
When does L<Catalyst> encode your response body, and what rules does it use to
determine when that is needed?
325
326=head2 Summary
327
88e5a8b0 328 use utf8;
329 use warnings;
330 use strict;
a09b49d2 331
88e5a8b0 332 package MyApp::Controller::Root;
a09b49d2 333
88e5a8b0 334 use base 'Catalyst::Controller';
335 use File::Spec;
a09b49d2 336
88e5a8b0 337 sub scalar_body :Local {
338 my ($self, $c) = @_;
339 $c->response->content_type('text/html');
340 $c->response->body("<p>This is scalar_body action ♥</p>");
341 }
a09b49d2 342
88e5a8b0 343 sub stream_write :Local {
344 my ($self, $c) = @_;
345 $c->response->content_type('text/html');
346 $c->response->write("<p>This is stream_write action ♥</p>");
347 }
a09b49d2 348
88e5a8b0 349 sub stream_write_fh :Local {
350 my ($self, $c) = @_;
351 $c->response->content_type('text/html');
a09b49d2 352
88e5a8b0 353 my $writer = $c->res->write_fh;
354 $writer->write_encoded('<p>This is stream_write_fh action ♥</p>');
355 $writer->close;
356 }
a09b49d2 357
88e5a8b0 358 sub stream_body_fh :Local {
359 my ($self, $c) = @_;
360 my $path = File::Spec->catfile('t', 'utf8.txt');
361 open(my $fh, '<', $path) || die "trouble: $!";
362 $c->response->content_type('text/html');
363 $c->response->body($fh);
364 }
a09b49d2 365
366=head2 Discussion
367
Beginning with L<Catalyst> version 5.90080 you no longer need to set the encoding
configuration (although doing so won't hurt anything).
370
371Currently we only encode if the content type is one of the types which generally expects a
372UTF8 encoding. This is determined by the following regular expression:
373
374 our $DEFAULT_ENCODE_CONTENT_TYPE_MATCH = qr{text|xml$|javascript$};
375 $c->response->content_type =~ /$DEFAULT_ENCODE_CONTENT_TYPE_MATCH/
376
377This is a global variable in L<Catalyst::Response> which is stored in the C<encodable_content_type>
378attribute of $c->response. You may currently alter this directly on the response or globally. In
379the future we may offer a configuration setting for this.
380
381This would match content-types like the following (examples)
382
383 text/plain
384 text/html
385 text/xml
386 application/javascript
387 application/xml
388 application/vnd.user+xml
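
If you want to check a particular media type against this rule, a small sketch like the
following may help. It copies the pattern quoted above into a local variable so it runs
without L<Catalyst> loaded; the list of sample types is our own choice.

```perl
use strict;
use warnings;

# Same pattern as quoted above, copied so this sketch is self-contained.
my $encodable = qr{text|xml$|javascript$};

my @samples = qw(
    text/html
    application/javascript
    application/vnd.user+xml
    application/json
    image/png
);

# Map each sample media type to 1 (would be encoded) or 0 (left alone).
my %matches = map { $_ => ($_ =~ $encodable ? 1 : 0) } @samples;

printf "%-28s %s\n", $_, $matches{$_} ? 'encode' : 'skip' for @samples;
```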
389
b596572b 390You should set your content type prior to header finalization if you want L<Catalyst> to
a09b49d2 391encode.
392
B<NOTE> We do not attempt to encode C<application/json> since the two most commonly used
approaches (L<Catalyst::View::JSON> and L<Catalyst::Action::REST>) have already configured
their JSON encoders to produce properly encoded UTF8 responses. If you are rolling your
own JSON encoding, you may need to set the encoder to do the right thing (or override
the global regular expression to include the JSON media type).
398
399=head2 Encoding with Scalar Body
400
401L<Catalyst> supports several methods of supplying your response with body content. The first
402and currently most common is to set the L<Catalyst::Response> ->body with a scalar string (
403as in the example):
404
88e5a8b0 405 use utf8;
a09b49d2 406
88e5a8b0 407 sub scalar_body :Local {
408 my ($self, $c) = @_;
409 $c->response->content_type('text/html');
410 $c->response->body("<p>This is scalar_body action ♥</p>");
411 }
a09b49d2 412
413In general you should need to do nothing else since L<Catalyst> will automatically encode
414this string during body finalization. The only matter to watch out for is to make sure
415the string has not already been encoded, as this will result in double encoding errors.
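
To see what double encoding looks like, here is a small sketch using the core L<Encode>
module directly (no L<Catalyst> involved). Encoding an already encoded string treats each
byte as a separate character and encodes it again:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

my $body  = "♥";
my $once  = encode('UTF-8', $body);    # correct: 3 bytes
my $twice = encode('UTF-8', $once);    # double encoded: 6 bytes

# A client that decodes once gets three junk characters, not the heart.
my $garbled = decode('UTF-8', $twice);

printf "once: %d bytes, twice: %d bytes\n", length($once), length($twice);
```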
416
B<NOTE> Pay attention to the content-type setting in the example. L<Catalyst> inspects that
content type carefully to determine if the body needs encoding.
419
420B<NOTE> If you set the character set of the response L<Catalyst> will skip encoding IF the
473078ff 421character set is set to something that doesn't match $c->encoding->mime_name. We will assume
a09b49d2 422if you are setting an alternative character set, that means you want to handle the encoding
423yourself. However it might be easier to set $c->encoding for a given response cycle since
424you can override this for a given response. For example here's how to override the default
425encoding and set the correct character set in the response:
426
427 sub override_encoding :Local {
88e5a8b0 428 my ($self, $c) = @_;
429 $c->res->content_type('text/plain');
430 $c->encoding(Encode::find_encoding('Shift_JIS'));
431 $c->response->body("テスト");
a09b49d2 432 }
433
434This will use the alternative encoding for a single response.
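
The effect of choosing a different encoding can be seen outside of L<Catalyst> with L<Encode>
alone. This sketch only compares the bytes each encoding would produce for the same string
and shows the C<mime_name> that the character set comparison above refers to:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode find_encoding);

my $text = "テスト";

# The same three characters produce different byte sequences.
my $utf8_bytes = encode('UTF-8', $text);        # 3 bytes per character
my $sjis_bytes = encode('Shift_JIS', $text);    # 2 bytes per character

# mime_name is the value compared against the response character set.
my $charset = find_encoding('UTF-8')->mime_name;

printf "UTF-8: %d bytes, Shift_JIS: %d bytes, charset: %s\n",
    length($utf8_bytes), length($sjis_bytes), $charset;
```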
435
436B<NOTE> If you manually set the content-type character set to whatever $c->encoding->mime_name
437is set to, we STILL encode, rather than assume your manual setting is a flag to override. This
aca337aa 438is done to support backward compatible assumptions (in particular L<Catalyst::View::TT> has set
439a utf-8 character set in its default content-type for ages, even though it does not itself do any
440encoding on the body response). If you are going to handle encoding manually you may set
441$c->clear_encoding for a single request response cycle, or as in the above example set an alternative
442encoding.
a09b49d2 443
444=head2 Encoding with streaming type responses
445
446L<Catalyst> offers two approaches to streaming your body response. Again, you must remember
447to set your content type prior to streaming, since invoking a streaming response will automatically
448finalize and send your HTTP headers (and your content type MUST be one that matches the regular
449expression given above.)
450
451Also, if you are going to override $c->encoding (or invoke $c->clear_encoding), you should do
452that before anything else!
453
454The first streaming method is to use the C<write> method on the response object. This method
455allows 'inlined' streaming and is generally used with blocking style servers.
456
88e5a8b0 457 sub stream_write :Local {
458 my ($self, $c) = @_;
459 $c->response->content_type('text/html');
460 $c->response->write("<p>This is stream_write action ♥</p>");
461 }
a09b49d2 462
463You may call the C<write> method as often as you need to finish streaming all your content.
464L<Catalyst> will encode each line in turn as long as the content-type meets the 'encodable types'
465requirement and $c->encoding is set (which it is, as long as you did not change it).
466
467B<NOTE> If you try to change the encoding after you start the stream, this will invoke an error
473078ff 468response. However since you've already started streaming this will not show up as an HTTP error
a09b49d2 469status code, but rather error information in your body response and an error in your logs.
470
B<NOTE> If you use ->body AFTER using ->write (for example you may do this to write your HTML
HEAD information as fast as possible) we expect the contents of body to be encoded as they
normally would be if you never called ->write. In general unless you are doing weird custom
stuff with encoding this is likely to just already do the correct thing.
475
a09b49d2 476The second way to stream a response is to get the response writer object and invoke methods
477on that directly:
478
88e5a8b0 479 sub stream_write_fh :Local {
480 my ($self, $c) = @_;
481 $c->response->content_type('text/html');
a09b49d2 482
88e5a8b0 483 my $writer = $c->res->write_fh;
484 $writer->write_encoded('<p>This is stream_write_fh action ♥</p>');
485 $writer->close;
486 }
a09b49d2 487
This can be used just like the C<write> method, but typically you request this object when
you want to do a nonblocking style response since the writer object can be closed over or
sent to a model that will invoke it in a non blocking manner. For more on using the writer
object for non blocking responses you should review the C<Catalyst> documentation and also
you can look at several articles from last year's Advent calendar, in particular:
493
494L<http://www.catalystframework.org/calendar/2013/10>, L<http://www.catalystframework.org/calendar/2013/11>,
495L<http://www.catalystframework.org/calendar/2013/12>, L<http://www.catalystframework.org/calendar/2013/13>,
496L<http://www.catalystframework.org/calendar/2013/14>.
497
498The main difference this year is that previously calling ->write_fh would return the actual
27d0c51a 499L<Plack> writer object that was supplied by your Plack application handler, whereas now we wrap
a09b49d2 500that object in a lightweight decorator object that proxies the C<write> and C<close> methods
501and supplies an additional C<write_encoded> method. C<write_encoded> does the exact same thing
502as C<write> except that it will first encode the string when necessary. In general if you are
503streaming encodable content such as HTML this is the method to use. If you are streaming
504binary content, you should just use the C<write> method (although if the content type is set
505correctly we would skip encoding anyway, but you may as well avoid the extra noop overhead).
506
507The last style of content response that L<Catalyst> supports is setting the body to a filehandle
508like object. In this case the object is passed down to the Plack application handler directly
509and currently we do nothing to set encoding.
510
88e5a8b0 511 sub stream_body_fh :Local {
512 my ($self, $c) = @_;
513 my $path = File::Spec->catfile('t', 'utf8.txt');
514 open(my $fh, '<', $path) || die "trouble: $!";
515 $c->response->content_type('text/html');
516 $c->response->body($fh);
517 }
a09b49d2 518
In this example we create a filehandle to a text file that contains UTF8 encoded characters. We
pass this down without modification, which I think is correct since we don't want to double
encode. However this may change in a future development release so please be sure to double
check the current docs and changelog. It's possible a future release will require you to set
an encoding on the IO layer level so that we can be sure to properly encode at body finalization.
So this is still an edge case we are writing test examples for. But for now if you are returning
a filehandle like response, you are expected to make sure you are following the L<PSGI> specification
and return raw bytes.
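
As a sketch of what "raw bytes" means for a filehandle, the following core-only example writes
a small UTF8 encoded file (standing in for C<t/utf8.txt>, whose contents we are assuming) and
reads it back without any decoding layer:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Write a small UTF-8 encoded file to stand in for t/utf8.txt.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} '<p>♥</p>';
close($out) or die "close failed: $!";

# Open without a decoding layer so reads return raw bytes.
open(my $fh, '<:raw', $path) or die "trouble: $!";
my $bytes = do { local $/; <$fh> };
close($fh);

my $byte_count = length($bytes);                     # bytes on disk
my $char_count = length(decode('UTF-8', $bytes));    # characters after decoding

print "bytes: $byte_count, characters: $char_count\n";
```

A handle opened this way yields already encoded bytes, so no further encoding should be
applied to what it returns.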
a09b49d2 527
528=head2 Override the Encoding on Context
529
As already noted you may change the current encoding (or remove it) by setting an alternative
encoding on the context:
532
533 $c->encoding(Encode::find_encoding('Shift_JIS'));
534
535Please note that you can continue to change encoding UNTIL the headers have been finalized. The
536last setting always wins. Trying to change encoding after header finalization is an error.
537
538=head2 Setting the Content Encoding HTTP Header
539
540In some cases you may set a content encoding on your response. For example if you are encoding
541your response with gzip. In this case you are again on your own. If we notice that the
542content encoding header is set when we hit finalization, we skip automatic encoding:
543
544 use Encode;
545 use Compress::Zlib;
546 use utf8;
547
548 sub gzipped :Local {
88e5a8b0 549 my ($self, $c) = @_;
a09b49d2 550
88e5a8b0 551 $c->res->content_type('text/plain');
552 $c->res->content_type_charset('UTF-8');
553 $c->res->content_encoding('gzip');
a09b49d2 554
88e5a8b0 555 $c->response->body(
556 Compress::Zlib::memGzip(
557 Encode::encode_utf8("manual_1 ♥")));
a09b49d2 558 }
559
560
561If you are using L<Catalyst::Plugin::Compress> you need to upgrade to the most recent version
562in order to be compatible with changes introduced in L<Catalyst> 5.90080. Other plugins may
563require updates (please open bugs if you find them).
564
B<NOTE> Content encoding may be set to 'identity' and we will still perform automatic encoding
if the content type is encodable and an encoding is present for the context.
567
568=head2 Using Common Views
569
570The following common views have been updated so that their tests pass with default UTF8
571encoding for L<Catalyst>:
572
573L<Catalyst::View::TT>, L<Catalyst::View::Mason>, L<Catalyst::View::HTML::Mason>,
574L<Catalyst::View::Xslate>
575
576See L<Catalyst::Upgrading> for additional information on L<Catalyst> extensions that require
577upgrades.
578
In general for the common views you should not need to do anything special. If your actual
template files contain UTF8 literals you should set configuration on your View to enable that.
For example in TT, if your template has actual UTF8 characters in it you should do the following:
582
583 MyApp::View::TT->config(ENCODING => 'utf-8');
584
However L<Catalyst::View::Xslate> wants to do the UTF8 encoding for you (we assume that the
authors of that view did this as a workaround to the fact that until now encoding was not core
to L<Catalyst>). So if you use that view, you either need to tell it to not encode, or you need
to turn off encoding for Catalyst.
589
590 MyApp::View::Xslate->config(encode_body => 0);
591
592or
593
594 MyApp->config(encoding=>undef);
595
596Preference is to disable it in the View.
597
Other views may be similar. You should review View documentation and test during upgrading.
We tried to make sure most common views worked properly and noted all workarounds but if we
missed something please alert the development team (instead of introducing a local hack into
your application that will mean nobody will ever upgrade it...).
602
aca337aa 603=head2 Setting the response from an external PSGI application.
604
605L<Catalyst::Response> allows one to set the response from an external L<PSGI> application.
606If you do this, and that external application sets a character set on the content-type, we
607C<clear_encoding> for the rest of the response. This is done to prevent double encoding.
608
609B<NOTE> Even if the character set of the content type is the same as the encoding set in
610$c->encoding, we still skip encoding. This is a regrettable difference from the general rule
611outlined above, where if the current character set is the same as the current encoding, we
612encode anyway. Nevertheless I think this is the correct behavior since the earlier rule exists
613only to support backward compatibility with L<Catalyst::View::TT>.
614
In general if you want L<Catalyst> to handle encoding, you should avoid setting the content
type character set since Catalyst will do so automatically based on the requested response
encoding. It's best to request alternative encodings by setting $c->encoding and if you really
want manual control of encoding you should always call $c->clear_encoding so that programmers that
come after you are very clear as to your intentions.
620
a09b49d2 621=head2 Disabling default UTF8 encoding
622
You may encounter issues with your legacy code running under default UTF8 body encoding. If
so you can disable this with the following configuration setting:
625
88e5a8b0 626 MyApp->config(encoding=>undef);
a09b49d2 627
628Where C<MyApp> is your L<Catalyst> subclass.
629
If you do not wish to disable all the Catalyst encoding features, you may disable specific
features via two additional configuration options: C<skip_body_param_unicode_decoding>
and C<skip_complex_post_part_handling>. The first will skip any attempt to decode POST
parameters when creating body parameters and the second will skip creation of instances
of L<Catalyst::Request::PartData> in the case that the multipart form upload contains parts
with a mix of content character sets.
636
a09b49d2 637If you believe you have discovered a bug in UTF8 body encoding, I strongly encourage you to
638report it (and not try to hack a workaround in your local code). We also recommend that you
639regard such a workaround as a temporary solution. It is ideal if L<Catalyst> extension
d4764084 640authors can start to count on L<Catalyst> doing the right thing for encoding.
a09b49d2 641
642=head1 Conclusion
643
644This document has attempted to be a complete review of how UTF8 and encoding works in the
645current version of L<Catalyst> and also to document known issues, gotchas and backward
646compatible hacks. Please report issues to the development team.
647
648=head1 Author
649
88e5a8b0 650John Napiorkowski L<jjnapiork@cpan.org|mailto:jjnapiork@cpan.org>
a09b49d2 651
652=cut
653