Commit | Line | Data |
a09b49d2 |
1 | =encoding UTF-8 |
2 | |
3 | =head1 Name |
4 | |
d63cc9c8 |
5 | Catalyst::UTF8 - All About UTF8 and Catalyst Encoding |
a09b49d2 |
6 | |
7 | =head1 Description |
8 | |
b596572b |
9 | Starting in 5.90080 L<Catalyst> will enable UTF8 encoding by default for |
a09b49d2 |
10 | text like body responses. In addition we've made a ton of fixes around encoding |
11 | and utf8 scattered throughout the codebase. This document attempts to give |
12 | an overview of the assumptions and practices that L<Catalyst> uses when |
13 | dealing with UTF8 and encoding issues. You should also review the |
14 | Changes file, L<Catalyst::Delta> and L<Catalyst::Upgrading> for more. |
15 | |
d63cc9c8 |
16 | We attempt to describe all relevant processes, try to give some advice |
a09b49d2 |
17 | and explain where we have made exceptions to respect our commitment |
18 | to backwards compatibility. |
19 | |
b596572b |
20 | =head1 UTF8 in Controller Actions |
a09b49d2 |
21 | |
22 | Using UTF8 characters in your Controller classes and actions. |
23 | |
24 | =head2 Summary |
25 | |
26 | In this section we will review changes to how UTF8 characters can be used in |
27 | controller actions, how it looks in the debugging screens (and your logs) |
28 | as well as how you construct L<URI> objects to actions with UTF8 paths |
29 | (or using UTF8 args or captures). |
30 | |
31 | =head2 Unicode in Controllers and URLs |
32 | |
33 | package MyApp::Controller::Root; |
34 | |
35 | use utf8; |
36 | use base 'Catalyst::Controller'; |
37 | |
38 | sub heart_with_arg :Path('♥') Args(1) { |
39 | my ($self, $c, $arg) = @_; |
40 | } |
41 | |
42 | sub base :Chained('/') CaptureArgs(0) { |
43 | my ($self, $c) = @_; |
44 | } |
45 | |
46 | sub capture :Chained('base') PathPart('♥') CaptureArgs(1) { |
47 | my ($self, $c, $capture) = @_; |
48 | } |
49 | |
50 | sub arg :Chained('capture') PathPart('♥') Args(1) { |
51 | my ($self, $c, $arg) = @_; |
52 | } |
53 | |
54 | =head2 Discussion |
55 | |
56 | In the example controller above we have constructed two matchable URL routes: |
57 | |
58 | http://localhost/root/♥/{arg} |
59 | http://localhost/base/♥/{capture}/♥/{arg} |
60 | |
61 | The first one is a classic Path type action and the second uses Chaining, and |
62 | spans three actions in total. As you can see, you can use unicode characters |
63 | in your Path and PathPart attributes (remember to use the C<utf8> pragma to allow |
64 | these multibyte characters in your source). The two constructed matchable routes |
65 | would match the following incoming URLs: |
66 | |
67 | (heart_with_arg) -> http://localhost/root/%E2%99%A5/{arg} |
68 | (base/capture/arg) -> http://localhost/base/%E2%99%A5/{capture}/%E2%99%A5/{arg} |
69 | |
70 | The path part C<%E2%99%A5> is URL encoded unicode (assuming you are hitting this with |
71 | a reasonably modern browser). It's basically what goes over HTTP when you type a |
72 | browser location that has the unicode 'heart' in it. However we will use the unicode |
73 | symbol in your debugging messages: |
74 | |
75 | [debug] Loaded Path actions: |
76 | .-------------------------------------+--------------------------------------. |
77 | | Path | Private | |
78 | +-------------------------------------+--------------------------------------+ |
79 | | /root/♥/* | /root/heart_with_arg | |
80 | '-------------------------------------+--------------------------------------' |
81 | |
82 | [debug] Loaded Chained actions: |
83 | .-------------------------------------+--------------------------------------. |
84 | | Path Spec | Private | |
85 | +-------------------------------------+--------------------------------------+ |
86 | | /base/♥/*/♥/* | /root/base (0) | |
87 | | | -> /root/capture (1) | |
88 | | | => /root/arg | |
89 | '-------------------------------------+--------------------------------------' |
90 | |
91 | And if the requested URL uses unicode characters in your captures or args (such as |
92 | C<http://localhost/base/♥/♥/♥/♥>) you should see the arguments and captures as their |
93 | unicode characters as well: |
94 | |
95 | [debug] Arguments are "♥" |
96 | [debug] "GET" request for "base/♥/♥/♥/♥" from "127.0.0.1" |
97 | .------------------------------------------------------------+-----------. |
98 | | Action | Time | |
99 | +------------------------------------------------------------+-----------+ |
100 | | /root/base | 0.000080s | |
101 | | /root/capture | 0.000075s | |
102 | | /root/arg | 0.000755s | |
103 | '------------------------------------------------------------+-----------' |
104 | |
105 | Again, remember that we are displaying the unicode character and using it to match actions |
106 | containing such multibyte characters BUT over HTTP you are getting these as URL encoded |
b596572b |
107 | bytes. For example if you looked at the L<PSGI> C<$env> value for C<REQUEST_URI> you |
108 | would see (for the above request) |
a09b49d2 |
109 | |
110 | REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5" |
111 | |
112 | So on the incoming request we decode so that we can match and display unicode characters |
113 | (after decoding the URL encoding). This makes it straightforward to use these types of |
114 | multibyte characters in your actions and see them incoming in captures and arguments. Please |
115 | keep this in mind if you are doing, for example, regular expression matching, length determination |
116 | or other string comparisons: you will need to treat these incoming variables as UTF8 |
117 | strings. For example in the following action: |
118 | |
119 | sub arg :Chained('capture') PathPart('♥') Args(1) { |
120 | my ($self, $c, $arg) = @_; |
121 | } |
122 | |
123 | when $arg is "♥" you should expect C<length($arg)> to be C<1> since it is indeed one character |
124 | although it will take more than one byte to store. |
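
As a rough, self-contained illustration of the difference between character length and byte length (a sketch outside of L<Catalyst>, using only the core L<Encode> module; C<$arg> and the other variable names are hypothetical and simply stand in for a decoded argument):

```perl
use strict;
use warnings;
use utf8;                                 # this source contains a literal heart
use Encode qw(encode_utf8);

# Hypothetical value standing in for a decoded argument such as $arg above.
my $arg = '♥';

my $char_length = length($arg);           # counts characters
my $bytes       = encode_utf8($arg);      # the UTF-8 bytes used for storage or transport
my $byte_length = length($bytes);         # counts bytes

# Percent-encode each byte, which is the form the path part takes over HTTP.
my $escaped = join '', map { sprintf('%%%02X', ord($_)) } split //, $bytes;

print "characters: $char_length, bytes: $byte_length, escaped: $escaped\n";
```

The escaped form should be the same C<%E2%99%A5> sequence shown in the incoming URLs earlier in this section.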
125 | |
126 | =head2 UTF8 in constructing URLs via $c->uri_for |
127 | |
128 | For the reverse (constructing meaningful URLs to actions that contain multibyte characters |
129 | in their paths or path parts, or when you want to include such characters in your captures |
130 | or arguments) L<Catalyst> will do the right thing (again just remember to use the C<utf8> |
131 | pragma). |
132 | |
133 | use utf8; |
134 | my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['♥','♥']); |
135 | |
136 | When you stringify this object (for use in a template, for example) it will automatically |
137 | do the right thing regarding utf8 encoding and url encoding. |
138 | |
139 | http://localhost/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5 |
140 | |
141 | Since again what you want is a properly url encoded version of this. In this case your string |
142 | length will reflect URL encoded bytes, not the character length. Ultimately what you want |
143 | to send over the wire via HTTP needs to be bytes. |
144 | |
145 | =head1 UTF8 in GET Query and Form POST |
146 | |
147 | What Catalyst does with UTF8 in your GET and classic HTML Form POST |
148 | |
149 | =head2 UTF8 in URL query and keywords |
150 | |
151 | The same rules that we find in URL paths also cover URL query parts. That is if |
152 | one types a URL like this into the browser (again assuming a modernish UI that |
153 | allows unicode) |
154 | |
155 | http://localhost/example?♥=♥♥ |
156 | |
157 | When this goes 'over the wire' to your application server it's going to be as |
158 | percent encoded bytes: |
159 | |
160 | |
161 | http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5 |
162 | |
163 | When L<Catalyst> encounters this we decode the percent encoding and the utf8 |
164 | so that we can properly display this information (such as in the debugging |
165 | logs or in a response.) |
166 | |
167 | [debug] Query Parameters are: |
168 | .-------------------------------------+--------------------------------------. |
169 | | Parameter | Value | |
170 | +-------------------------------------+--------------------------------------+ |
171 | | ♥ | ♥♥ | |
172 | '-------------------------------------+--------------------------------------' |
173 | |
174 | All the values and keys that are part of $c->req->query_parameters will be |
175 | utf8 decoded. So you should not need to do anything special to take those |
176 | values/keys and send them to the body response (since as we will see later |
177 | L<Catalyst> will do all the necessary encoding for you). |
178 | |
179 | Again, remember that values of your parameters are now decoded into Unicode strings, so |
180 | for example you'd expect the result of length to reflect the character length not |
b596572b |
181 | the byte length. |
a09b49d2 |
182 | |
183 | Just like with arguments and captures, you can use utf8 literals (or utf8 |
184 | strings) in $c->uri_for: |
185 | |
186 | use utf8; |
187 | my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'♥' => '♥♥'}); |
188 | |
189 | When you stringify this object (for use in a template, for example) it will automatically |
190 | do the right thing regarding utf8 encoding and url encoding. |
191 | |
192 | http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5 |
193 | |
194 | Since again what you want is a properly url encoded version of this. Ultimately what you want |
b596572b |
195 | to send over the wire via HTTP needs to be bytes (not unicode characters). |
a09b49d2 |
196 | |
197 | Remember if you use any utf8 literals in your source code, you should use the |
198 | C<use utf8> pragma. |
199 | |
200 | =head2 UTF8 in Form POST |
201 | |
202 | In general most modern browsers will follow the specification, which says that POSTed |
203 | form fields should be encoded in the same way that the document was served with. That means |
204 | that if you are using modern Catalyst and serving UTF8 encoded responses, a browser is |
205 | supposed to notice that and encode the form POSTs accordingly. |
206 | |
207 | As a result since L<Catalyst> now serves UTF8 encoded responses by default, this means that |
208 | you can mostly rely on incoming form POSTs to be so encoded. L<Catalyst> will make this |
209 | assumption and decode accordingly (unless you explicitly turn off encoding...) If you are |
b596572b |
210 | running Catalyst in developer debug, then you will see the correct unicode characters in |
a09b49d2 |
211 | the debug output. For example if you generate a POST request: |
212 | |
213 | use Catalyst::Test 'MyApp'; |
214 | use utf8; |
215 | |
216 | my $res = request POST "/example/posted", ['♥'=>'♥', '♥♥'=>'♥']; |
217 | |
218 | Running in CATALYST_DEBUG=1 mode you should see output like this: |
219 | |
220 | [debug] Body Parameters are: |
221 | .-------------------------------------+--------------------------------------. |
222 | | Parameter | Value | |
223 | +-------------------------------------+--------------------------------------+ |
224 | | ♥ | ♥ | |
225 | | ♥♥ | ♥ | |
226 | '-------------------------------------+--------------------------------------' |
227 | |
228 | And if you had a controller like this: |
229 | |
230 | package MyApp::Controller::Example; |
b596572b |
231 | |
a09b49d2 |
232 | use base 'Catalyst::Controller'; |
233 | |
234 | sub posted :POST Local { |
235 | my ($self, $c) = @_; |
236 | $c->res->content_type('text/plain'); |
237 | $c->res->body("hearts => ${\$c->req->post_parameters->{♥}}"); |
238 | } |
239 | |
240 | The following test case would be true: |
241 | |
242 | use Encode 2.21 'decode_utf8'; |
243 | is decode_utf8($res->content), 'hearts => ♥'; |
244 | |
b596572b |
245 | In this case we decode so that we can print and compare strings with multibyte characters. |
a09b49d2 |
246 | |
247 | B<NOTE> In some cases some browsers may not follow the specification and set the form POST |
248 | encoding based on the server response. Catalyst itself doesn't attempt any workarounds, but one |
249 | common approach is to use a hidden form field with a UTF8 value (You might be familiar with |
250 | this from how Ruby on Rails has HTML form helpers that do that automatically). In that case |
251 | some browsers will send UTF8 encoded data if they notice the hidden input field contains such a |
252 | character. Also, you can add an HTML attribute to your form tag which many modern browsers |
253 | will respect to set the encoding (accept-charset="utf-8"). And lastly there are some javascript |
254 | based tricks and workarounds for even more odd cases (searching the web for this will return |
255 | a number of approaches). Hopefully as more compliant browsers become popular these edge cases |
256 | will fade. |
257 | |
258 | =head1 UTF8 Encoding in Body Response |
259 | |
260 | When does L<Catalyst> encode your response body and what rules does it use to |
261 | determine when that is needed. |
262 | |
263 | =head2 Summary |
264 | |
265 | use utf8; |
266 | use warnings; |
267 | use strict; |
268 | |
269 | package MyApp::Controller::Root; |
270 | |
271 | use base 'Catalyst::Controller'; |
272 | use File::Spec; |
273 | |
274 | sub scalar_body :Local { |
275 | my ($self, $c) = @_; |
276 | $c->response->content_type('text/html'); |
277 | $c->response->body("<p>This is scalar_body action ♥</p>"); |
278 | } |
279 | |
280 | sub stream_write :Local { |
281 | my ($self, $c) = @_; |
282 | $c->response->content_type('text/html'); |
283 | $c->response->write("<p>This is stream_write action ♥</p>"); |
b596572b |
284 | } |
a09b49d2 |
285 | |
286 | sub stream_write_fh :Local { |
287 | my ($self, $c) = @_; |
288 | $c->response->content_type('text/html'); |
289 | |
290 | my $writer = $c->res->write_fh; |
291 | $writer->write_encoded('<p>This is stream_write_fh action ♥</p>'); |
292 | $writer->close; |
293 | } |
294 | |
295 | sub stream_body_fh :Local { |
296 | my ($self, $c) = @_; |
297 | my $path = File::Spec->catfile('t', 'utf8.txt'); |
298 | open(my $fh, '<', $path) || die "trouble: $!"; |
299 | $c->response->content_type('text/html'); |
300 | $c->response->body($fh); |
301 | } |
302 | |
303 | =head2 Discussion |
304 | |
305 | Beginning with L<Catalyst> version 5.90080 you no longer need to set the encoding |
306 | configuration (although doing so won't hurt anything). |
307 | |
308 | Currently we only encode if the content type is one of the types which generally expects a |
309 | UTF8 encoding. This is determined by the following regular expression: |
310 | |
311 | our $DEFAULT_ENCODE_CONTENT_TYPE_MATCH = qr{text|xml$|javascript$}; |
312 | $c->response->content_type =~ /$DEFAULT_ENCODE_CONTENT_TYPE_MATCH/ |
313 | |
314 | This is a global variable in L<Catalyst::Response> which is stored in the C<encodable_content_type> |
315 | attribute of $c->response. You may currently alter this directly on the response or globally. In |
316 | the future we may offer a configuration setting for this. |
317 | |
318 | This would match content-types like the following (examples) |
319 | |
320 | text/plain |
321 | text/html |
322 | text/xml |
323 | application/javascript |
324 | application/xml |
325 | application/vnd.user+xml |
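
To see how that regular expression treats the listed types, here is a small standalone sketch. It copies the pattern quoted above into a local variable (it does not load L<Catalyst::Response>), and the extra C<image/png> entry and the variable names are our own illustrative assumptions:

```perl
use strict;
use warnings;

# Pattern copied from the text above; in Catalyst it lives in Catalyst::Response.
my $match = qr{text|xml$|javascript$};

# The example types from the list, plus two that are not expected to match.
my @types = qw(
    text/plain text/html text/xml
    application/javascript application/xml application/vnd.user+xml
    application/json image/png
);

# Record whether each content type would be considered encodable.
my %encodable;
for my $type (@types) {
    $encodable{$type} = $type =~ $match ? 1 : 0;
}

printf "%-26s %s\n", $_, $encodable{$_} ? 'encoded' : 'left alone' for @types;
```

Note that the C<text> alternative is not anchored, so any content type containing that substring would also match.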
326 | |
b596572b |
327 | You should set your content type prior to header finalization if you want L<Catalyst> to |
a09b49d2 |
328 | encode. |
329 | |
330 | B<NOTE> We do not attempt to encode C<application/json> since the two most commonly used |
331 | approaches (L<Catalyst::View::JSON> and L<Catalyst::Action::REST>) have already configured |
332 | their JSON encoders to produce properly encoded UTF8 responses. If you are rolling your |
333 | own JSON encoding, you may need to set the encoder to do the right thing (or override |
334 | the global regular expression to include the JSON media type). |
335 | |
336 | =head2 Encoding with Scalar Body |
337 | |
338 | L<Catalyst> supports several methods of supplying your response with body content. The first |
339 | and currently most common is to set the L<Catalyst::Response> ->body with a scalar string ( |
340 | as in the example): |
341 | |
342 | use utf8; |
343 | |
344 | sub scalar_body :Local { |
345 | my ($self, $c) = @_; |
346 | $c->response->content_type('text/html'); |
347 | $c->response->body("<p>This is scalar_body action ♥</p>"); |
348 | } |
349 | |
350 | In general you should need to do nothing else since L<Catalyst> will automatically encode |
351 | this string during body finalization. The only matter to watch out for is to make sure |
352 | the string has not already been encoded, as this will result in double encoding errors. |
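
For readers who have not run into double encoding before, the following standalone sketch shows the effect using only the core L<Encode> module. It does not involve L<Catalyst> at all, and the variable names are hypothetical:

```perl
use strict;
use warnings;
use utf8;                                 # this source contains a literal heart
use Encode qw(encode_utf8 decode_utf8);

my $text  = '♥';                          # one character
my $once  = encode_utf8($text);           # correct UTF-8: three bytes
my $twice = encode_utf8($once);           # each byte is re-encoded as if it were a character

# A client decoding the doubly encoded bytes once gets the original
# UTF-8 bytes back as three separate characters, not the heart.
my $seen_by_client = decode_utf8($twice);

printf "once: %d bytes, twice: %d bytes, client sees %d characters\n",
    length($once), length($twice), length($seen_by_client);
```

This is the situation you would create by passing an already encoded string to C<< $c->response->body >> while automatic encoding is still active.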
353 | |
354 | B<NOTE> Pay attention to the content-type setting in the example. L<Catalyst> inspects that |
355 | content type carefully to determine if the body needs encoding. |
356 | |
357 | B<NOTE> If you set the character set of the response L<Catalyst> will skip encoding IF the |
358 | character set is set to something that doesn't match $c->encoding->mime_name. We will assume |
359 | if you are setting an alternative character set, that means you want to handle the encoding |
360 | yourself. However it might be easier to set $c->encoding for a given response cycle since |
361 | you can override this for a given response. For example here's how to override the default |
362 | encoding and set the correct character set in the response: |
363 | |
364 | sub override_encoding :Local { |
365 | my ($self, $c) = @_; |
366 | $c->res->content_type('text/plain'); |
367 | $c->encoding(Encode::find_encoding('Shift_JIS')); |
368 | $c->response->body("テスト"); |
369 | } |
370 | |
371 | This will use the alternative encoding for a single response. |
372 | |
373 | B<NOTE> If you manually set the content-type character set to whatever $c->encoding->mime_name |
374 | is set to, we STILL encode, rather than assume your manual setting is a flag to override. This |
aca337aa |
375 | is done to support backward compatible assumptions (in particular L<Catalyst::View::TT> has set |
376 | a utf-8 character set in its default content-type for ages, even though it does not itself do any |
377 | encoding on the body response). If you are going to handle encoding manually you may set |
378 | $c->clear_encoding for a single request response cycle, or as in the above example set an alternative |
379 | encoding. |
a09b49d2 |
380 | |
381 | =head2 Encoding with streaming type responses |
382 | |
383 | L<Catalyst> offers two approaches to streaming your body response. Again, you must remember |
384 | to set your content type prior to streaming, since invoking a streaming response will automatically |
385 | finalize and send your HTTP headers (and your content type MUST be one that matches the regular |
386 | expression given above.) |
387 | |
388 | Also, if you are going to override $c->encoding (or invoke $c->clear_encoding), you should do |
389 | that before anything else! |
390 | |
391 | The first streaming method is to use the C<write> method on the response object. This method |
392 | allows 'inlined' streaming and is generally used with blocking style servers. |
393 | |
394 | sub stream_write :Local { |
395 | my ($self, $c) = @_; |
396 | $c->response->content_type('text/html'); |
397 | $c->response->write("<p>This is stream_write action ♥</p>"); |
398 | } |
399 | |
400 | You may call the C<write> method as often as you need to finish streaming all your content. |
401 | L<Catalyst> will encode each line in turn as long as the content-type meets the 'encodable types' |
402 | requirement and $c->encoding is set (which it is, as long as you did not change it). |
403 | |
404 | B<NOTE> If you try to change the encoding after you start the stream, this will invoke an error |
405 | response. However since you've already started streaming this will not show up as an HTTP error |
406 | status code, but rather error information in your body response and an error in your logs. |
407 | |
408 | The second way to stream a response is to get the response writer object and invoke methods |
409 | on that directly: |
410 | |
411 | sub stream_write_fh :Local { |
412 | my ($self, $c) = @_; |
413 | $c->response->content_type('text/html'); |
414 | |
415 | my $writer = $c->res->write_fh; |
416 | $writer->write_encoded('<p>This is stream_write_fh action ♥</p>'); |
417 | $writer->close; |
418 | } |
419 | |
420 | This can be used just like the C<write> method, but typically you request this object when |
421 | you want to do a nonblocking style response since the writer object can be closed over or |
422 | sent to a model that will invoke it in a non blocking manner. For more on using the writer |
423 | object for non blocking responses you should review the C<Catalyst> documentation and also |
424 | you can look at several articles from last year's advent, in particular: |
425 | |
426 | L<http://www.catalystframework.org/calendar/2013/10>, L<http://www.catalystframework.org/calendar/2013/11>, |
427 | L<http://www.catalystframework.org/calendar/2013/12>, L<http://www.catalystframework.org/calendar/2013/13>, |
428 | L<http://www.catalystframework.org/calendar/2013/14>. |
429 | |
430 | The main difference this year is that previously calling ->write_fh would return the actual |
431 | L<Plack> writer object that was supplied by your plack application handler, whereas now we wrap |
432 | that object in a lightweight decorator object that proxies the C<write> and C<close> methods |
433 | and supplies an additional C<write_encoded> method. C<write_encoded> does the exact same thing |
434 | as C<write> except that it will first encode the string when necessary. In general if you are |
435 | streaming encodable content such as HTML this is the method to use. If you are streaming |
436 | binary content, you should just use the C<write> method (although if the content type is set |
437 | correctly we would skip encoding anyway, but you may as well avoid the extra noop overhead). |
438 | |
439 | The last style of content response that L<Catalyst> supports is setting the body to a filehandle |
440 | like object. In this case the object is passed down to the Plack application handler directly |
441 | and currently we do nothing to set encoding. |
442 | |
443 | sub stream_body_fh :Local { |
444 | my ($self, $c) = @_; |
445 | my $path = File::Spec->catfile('t', 'utf8.txt'); |
446 | open(my $fh, '<', $path) || die "trouble: $!"; |
447 | $c->response->content_type('text/html'); |
448 | $c->response->body($fh); |
449 | } |
450 | |
451 | In this example we create a filehandle to a text file that contains UTF8 encoded characters. We |
452 | pass this down without modification, which I think is correct since we don't want to double |
453 | encode. However this may change in a future development release so please be sure to double |
454 | check the current docs and changelog. It's possible a future release will require you to set |
455 | an encoding on the IO layer level so that we can be sure to properly encode at body finalization. |
456 | So this is still an edge case we are writing test examples for. But for now if you are returning |
457 | a filehandle like response, you are expected to make sure you are following the L<PSGI> specification |
458 | and that unencoded bytes are returned. |
459 | |
460 | =head2 Override the Encoding on Context |
461 | |
462 | As already noted you may change the current encoding (or remove it) by setting an alternative |
463 | encoding on the context: |
464 | |
465 | $c->encoding(Encode::find_encoding('Shift_JIS')); |
466 | |
467 | Please note that you can continue to change encoding UNTIL the headers have been finalized. The |
468 | last setting always wins. Trying to change encoding after header finalization is an error. |
469 | |
470 | =head2 Setting the Content Encoding HTTP Header |
471 | |
472 | In some cases you may set a content encoding on your response, for example if you are encoding |
473 | your response with gzip. In this case you are again on your own. If we notice that the |
474 | content encoding header is set when we hit finalization, we skip automatic encoding: |
475 | |
476 | use Encode; |
477 | use Compress::Zlib; |
478 | use utf8; |
479 | |
480 | sub gzipped :Local { |
481 | my ($self, $c) = @_; |
482 | |
483 | $c->res->content_type('text/plain'); |
484 | $c->res->content_type_charset('UTF-8'); |
485 | $c->res->content_encoding('gzip'); |
486 | |
487 | $c->response->body( |
488 | Compress::Zlib::memGzip( |
489 | Encode::encode_utf8("manual_1 ♥"))); |
490 | } |
491 | |
492 | |
493 | If you are using L<Catalyst::Plugin::Compress> you need to upgrade to the most recent version |
494 | in order to be compatible with changes introduced in L<Catalyst> 5.90080. Other plugins may |
495 | require updates (please open bugs if you find them). |
496 | |
497 | B<NOTE> Content encoding may be set to 'identity' and we will still perform automatic encoding |
498 | if the content type is encodable and an encoding is present for the context. |
499 | |
500 | =head2 Using Common Views |
501 | |
502 | The following common views have been updated so that their tests pass with default UTF8 |
503 | encoding for L<Catalyst>: |
504 | |
505 | L<Catalyst::View::TT>, L<Catalyst::View::Mason>, L<Catalyst::View::HTML::Mason>, |
506 | L<Catalyst::View::Xslate> |
507 | |
508 | See L<Catalyst::Upgrading> for additional information on L<Catalyst> extensions that require |
509 | upgrades. |
510 | |
511 | In general for the common views you should not need to do anything special. If your actual |
512 | template files contain UTF8 literals you should set configuration on your View to enable that. |
513 | For example in TT, if your template has actual UTF8 characters in it you should do the following: |
514 | |
515 | MyApp::View::TT->config(ENCODING => 'utf-8'); |
516 | |
517 | However L<Catalyst::View::Xslate> wants to do the UTF8 encoding for you (we assume that the |
518 | authors of that view did this as a workaround to the fact that until now encoding was not core |
519 | to L<Catalyst>). So if you use that view, you either need to tell it to not encode, or you need |
520 | to turn off encoding for Catalyst. |
521 | |
522 | MyApp::View::Xslate->config(encode_body => 0); |
523 | |
524 | or |
525 | |
526 | MyApp->config(encoding=>undef); |
527 | |
528 | Preference is to disable it in the View. |
529 | |
530 | Other views may be similar. You should review View documentation and test during upgrading. |
531 | We tried to make sure most common views worked properly and noted all workarounds but if we |
532 | missed something please alert the development team (instead of introducing a local hack into |
533 | your application that will mean nobody will ever upgrade it...). |
534 | |
aca337aa |
535 | =head2 Setting the response from an external PSGI application. |
536 | |
537 | L<Catalyst::Response> allows one to set the response from an external L<PSGI> application. |
538 | If you do this, and that external application sets a character set on the content-type, we |
539 | call C<clear_encoding> for the rest of the response. This is done to prevent double encoding. |
540 | |
541 | B<NOTE> Even if the character set of the content type is the same as the encoding set in |
542 | $c->encoding, we still skip encoding. This is a regrettable difference from the general rule |
543 | outlined above, where if the current character set is the same as the current encoding, we |
544 | encode anyway. Nevertheless I think this is the correct behavior since the earlier rule exists |
545 | only to support backward compatibility with L<Catalyst::View::TT>. |
546 | |
547 | In general if you want L<Catalyst> to handle encoding, you should avoid setting the content |
548 | type character set since Catalyst will do so automatically based on the requested response |
549 | encoding. It's best to request alternative encodings by setting $c->encoding and if you really |
550 | want manual control of encoding you should always $c->clear_encoding so that programmers that |
551 | come after you are very clear as to your intentions. |
552 | |
a09b49d2 |
553 | =head2 Disabling default UTF8 encoding |
554 | |
555 | You may encounter issues with your legacy code running under default UTF8 body encoding. If |
556 | so you can disable this with the following configuration setting: |
557 | |
558 | MyApp->config(encoding=>undef); |
559 | |
560 | Where C<MyApp> is your L<Catalyst> subclass. |
561 | |
562 | If you believe you have discovered a bug in UTF8 body encoding, I strongly encourage you to |
563 | report it (and not try to hack a workaround in your local code). We also recommend that you |
564 | regard such a workaround as a temporary solution. It is ideal if L<Catalyst> extension |
565 | authors can start to count on L<Catalyst> doing the right thing for encoding. |
566 | |
567 | =head1 Conclusion |
568 | |
569 | This document has attempted to be a complete review of how UTF8 and encoding works in the |
570 | current version of L<Catalyst> and also to document known issues, gotchas and backward |
571 | compatible hacks. Please report issues to the development team. |
572 | |
573 | =head1 Author |
574 | |
575 | John Napiorkowski L<jjnapiork@cpan.org|email:jjnapiork@cpan.org> |
576 | |
577 | =cut |
578 | |