=encoding UTF-8

=head1 Name

Catalyst::UTF8 - All About UTF8 and Catalyst Encoding

=head1 Description

Starting in 5.90080 L<Catalyst> will enable UTF8 encoding by default for
text-like body responses. In addition we've made many fixes around encoding
and UTF8 throughout the codebase. This document attempts to give
an overview of the assumptions and practices that L<Catalyst> uses when
dealing with UTF8 and encoding issues. You should also review the
Changes file, L<Catalyst::Delta> and L<Catalyst::Upgrading> for more.

We attempt to describe all relevant processes, give some advice,
and explain where we have made exceptions in order to respect our commitment
to backwards compatibility.
19 | |
=head1 UTF8 in Controller Actions

Using UTF8 characters in your Controller classes and actions.

=head2 Summary

In this section we will review changes to how UTF8 characters can be used in
controller actions, how they look in the debugging screens (and your logs),
and how you construct L<URI> objects to actions with UTF8 paths
(or using UTF8 args or captures).
30 | |
=head2 Unicode in Controllers and URLs

    package MyApp::Controller::Root;

    use utf8;
    use base 'Catalyst::Controller';

    sub heart_with_arg :Path('♥') Args(1) {
        my ($self, $c, $arg) = @_;
    }

    sub base :Chained('/') CaptureArgs(0) {
        my ($self, $c) = @_;
    }

    sub capture :Chained('base') PathPart('♥') CaptureArgs(1) {
        my ($self, $c, $capture) = @_;
    }

    sub arg :Chained('capture') PathPart('♥') Args(1) {
        my ($self, $c, $arg) = @_;
    }
53 | |
=head2 Discussion

In the example controller above we have constructed two matchable URL routes:

    http://localhost/root/♥/{arg}
    http://localhost/base/♥/{capture}/♥/{arg}

The first one is a classic Path type action and the second uses Chaining, and
spans three actions in total. As you can see, you can use unicode characters
in your Path and PathPart attributes (remember to use the C<utf8> pragma to allow
these multibyte characters in your source). The two constructed matchable routes
would match the following incoming URLs:

    (heart_with_arg) -> http://localhost/root/%E2%99%A5/{arg}
    (base/capture/arg) -> http://localhost/base/%E2%99%A5/{capture}/%E2%99%A5/{arg}

That path part C<%E2%99%A5> is URL encoded unicode (assuming you are hitting this with
a reasonably modern browser). It's basically what goes over HTTP when you type a
browser location that has the unicode 'heart' in it. However we will use the unicode
symbol in your debugging messages:
74 | |
    [debug] Loaded Path actions:
    .-------------------------------------+--------------------------------------.
    | Path                                | Private                              |
    +-------------------------------------+--------------------------------------+
    | /root/♥/*                           | /root/heart_with_arg                 |
    '-------------------------------------+--------------------------------------'

    [debug] Loaded Chained actions:
    .-------------------------------------+--------------------------------------.
    | Path Spec                           | Private                              |
    +-------------------------------------+--------------------------------------+
    | /base/♥/*/♥/*                       | /root/base (0)                       |
    |                                     | -> /root/capture (1)                 |
    |                                     | => /root/arg                         |
    '-------------------------------------+--------------------------------------'
90 | |
And if the requested URL uses unicode characters in your captures or args (such as
C<http://localhost/base/♥/♥/♥/♥>) you should see the arguments and captures as their
unicode characters as well:

    [debug] Arguments are "♥"
    [debug] "GET" request for "base/♥/♥/♥/♥" from "127.0.0.1"
    .------------------------------------------------------------+-----------.
    | Action                                                     | Time      |
    +------------------------------------------------------------+-----------+
    | /root/base                                                 | 0.000080s |
    | /root/capture                                              | 0.000075s |
    | /root/arg                                                  | 0.000755s |
    '------------------------------------------------------------+-----------'
104 | |
Again, remember that we are displaying the unicode character and using it to match actions
containing such multibyte characters, BUT over HTTP you are getting these as URL encoded
bytes. For example if you looked at the L<PSGI> C<$env> value for C<REQUEST_URI> you
would see (for the above request)

    REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5"

So on the incoming request we decode so that we can match and display unicode characters
(after decoding the URL encoding). This makes it straightforward to use these types of
multibyte characters in your actions and see them incoming in captures and arguments. Please
keep this in mind if you are doing, for example, regular expression matching, length determination
or other string comparisons: you will need to treat these incoming variables as decoded
character strings, not as bytes. For example in the following action:

    sub arg :Chained('capture') PathPart('♥') Args(1) {
        my ($self, $c, $arg) = @_;
    }

when C<$arg> is "♥" you should expect C<length($arg)> to be C<1> since it is indeed one character,
although it takes three bytes to store when UTF8 encoded.
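
To make the decoding step concrete, here is a minimal standalone sketch using only the core
L<Encode> module. The C<decode_path_segment> helper is hypothetical and for illustration only;
it is not part of the L<Catalyst> API and is not necessarily how L<Catalyst> does this
internally. It turns the percent escapes back into raw bytes and then decodes those bytes
as UTF-8, which is why the result has a character length of one:

    use strict;
    use warnings;
    use Encode qw(decode encode);

    # Hypothetical helper (not Catalyst API): turn each %XX escape back
    # into the byte it stands for, then decode the bytes as UTF-8.
    sub decode_path_segment {
        my ($segment) = @_;
        (my $bytes = $segment) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        return decode('UTF-8', $bytes);
    }

    my $arg = decode_path_segment('%E2%99%A5');
    print length($arg), "\n";                  # prints 1 (one character)
    print length(encode('UTF-8', $arg)), "\n"; # prints 3 (three bytes)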
125 | |
=head2 UTF8 in constructing URLs via $c->uri_for

For the reverse (constructing meaningful URLs to actions that contain multibyte characters
in their paths or path parts, or when you want to include such characters in your captures
or arguments) L<Catalyst> will do the right thing (again, just remember to use the C<utf8>
pragma).

    use utf8;
    my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['♥','♥']);

When you stringify this object (for use in a template, for example) it will automatically
do the right thing regarding UTF8 encoding and URL encoding.

    http://localhost/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5

Again, what you want is a properly URL encoded version of this. In this case the string
length will reflect the URL encoded bytes, not the character length. Ultimately what you
send over the wire via HTTP needs to be bytes.
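
Going the other way, the following core Perl sketch shows roughly what has to happen when a
character string ends up in a URL: encode it to UTF-8 bytes first, then percent encode the
bytes. The C<encode_path_segment> helper is hypothetical; in a real application you would let
C<< $c->uri_for >> do this for you. As an assumption for the sketch, it escapes every byte
outside the RFC 3986 "unreserved" set:

    use strict;
    use warnings;
    use utf8;
    use Encode qw(encode);

    # Hypothetical helper (not Catalyst API): characters -> UTF-8 bytes,
    # then percent encode every byte outside A-Z a-z 0-9 - . _ ~
    sub encode_path_segment {
        my ($string) = @_;
        my $bytes = encode('UTF-8', $string);
        $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
        return $bytes;
    }

    my $encoded = encode_path_segment('♥');
    print "$encoded\n";           # prints %E2%99%A5
    print length($encoded), "\n"; # prints 9 (bytes on the wire, not 1 character)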
144 | |
=head1 UTF8 in GET Query and Form POST

What Catalyst does with UTF8 in your GET query and classic HTML Form POST.

=head2 UTF8 in URL query and keywords

The same rules that we find in URL paths also cover URL query parts. That is,
if one types a URL like this into the browser

    http://localhost/example?♥=♥♥

then when this goes 'over the wire' to your application server it's going to be
percent encoded bytes:

    http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5

When L<Catalyst> encounters this we decode the percent encoding and the UTF8
so that we can properly display this information (such as in the debugging
logs or in a response).

    [debug] Query Parameters are:
    .-------------------------------------+--------------------------------------.
    | Parameter                           | Value                                |
    +-------------------------------------+--------------------------------------+
    | ♥                                   | ♥♥                                   |
    '-------------------------------------+--------------------------------------'
172 | |
All the values and keys that are part of C<< $c->req->query_parameters >> will be
UTF8 decoded. So you should not need to do anything special to take those
values/keys and send them to the body response (since, as we will see later,
L<Catalyst> will do all the necessary encoding for you).

Again, remember that the values of your parameters are now decoded into Unicode strings, so,
for example, you'd expect the result of C<length> to reflect the character length, not
the byte length.
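
As a standalone illustration of what "decoded" means for query parameters, this sketch splits
a raw query string and decodes both keys and values with the core L<Encode> module. The
C<decode_component> and C<parse_query> helpers are hypothetical and deliberately simplified
(for instance, a repeated key keeps only its last value); this is not how L<Catalyst> parses
requests, which it does for you:

    use strict;
    use warnings;
    use Encode qw(decode);

    # Hypothetical helper: "+" becomes a space, each %XX escape becomes
    # the byte it stands for, and the bytes are decoded from UTF-8.
    sub decode_component {
        my ($bytes) = @_;
        $bytes =~ tr/+/ /;
        $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        return decode('UTF-8', $bytes);
    }

    # Hypothetical, simplified query parser (not Catalyst API).
    sub parse_query {
        my ($query) = @_;
        my %params;
        for my $pair (split /&/, $query) {
            my ($key, $value) = split /=/, $pair, 2;
            $value = '' unless defined $value;
            $params{ decode_component($key) } = decode_component($value);
        }
        return \%params;
    }

    my $params = parse_query('%E2%99%A5=%E2%99%A5%E2%99%A5');
    print length($params->{"\x{2665}"}), "\n"; # prints 2 (characters, not 6 bytes)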
181 | |
Just like with arguments and captures, you can use UTF8 literals (or UTF8
strings) in C<< $c->uri_for >>:

    use utf8;
    my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'♥' => '♥♥'});

When you stringify this object (for use in a template, for example) it will automatically
do the right thing regarding UTF8 encoding and URL encoding.

    http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5

Again, what you want is a properly URL encoded version of this. Ultimately what you
send over the wire via HTTP needs to be bytes (not unicode characters).

Remember, if you use any UTF8 literals in your source code, you should use the
C<use utf8> pragma.
198 | |
=head2 UTF8 in Form POST

In general most modern browsers will follow the specification, which says that POSTed
form fields should be encoded in the same way that the document was served with. That means
that if you are using modern Catalyst and serving UTF8 encoded responses, a browser is
supposed to notice that and encode the form POSTs accordingly.

As a result, since L<Catalyst> now serves UTF8 encoded responses by default, you can
mostly rely on incoming form POSTs being so encoded. L<Catalyst> will make this
assumption and decode accordingly (unless you explicitly turn off encoding). If you are
running Catalyst in developer debug mode, you will see the correct unicode characters in
the debug output. For example if you generate a POST request:

    use utf8;
    use Catalyst::Test 'MyApp';
    use HTTP::Request::Common;   # provides POST

    my $res = request POST "/example/posted", ['♥'=>'♥', '♥♥'=>'♥'];

Running in CATALYST_DEBUG=1 mode you should see output like this:

    [debug] Body Parameters are:
    .-------------------------------------+--------------------------------------.
    | Parameter                           | Value                                |
    +-------------------------------------+--------------------------------------+
    | ♥                                   | ♥                                    |
    | ♥♥                                  | ♥                                    |
    '-------------------------------------+--------------------------------------'
226 | |
And if you had a controller like this:

    package MyApp::Controller::Example;

    use utf8;
    use base 'Catalyst::Controller';

    sub posted :POST Local {
        my ($self, $c) = @_;
        $c->res->content_type('text/plain');
        # Quote the key: '♥' is not a valid bareword hash key.
        $c->res->body("hearts => ${\$c->req->post_parameters->{'♥'}}");
    }

The following test case would be true:

    use Test::More;
    use Encode 2.21 'decode_utf8';
    is decode_utf8($res->content), 'hearts => ♥';

In this case we decode so that we can print and compare strings with multibyte characters.
245 | |
B<NOTE> In some cases some browsers may not follow the specification of setting the form POST
encoding based on the server response. Catalyst itself doesn't attempt any workarounds, but one
common approach is to use a hidden form field with a UTF8 value (you might be familiar with
this from how Ruby on Rails has HTML form helpers that do that automatically). In that case
some browsers will send UTF8 encoded data if they notice the hidden input field contains such a
character. Also, you can add an HTML attribute to your form tag which many modern browsers
will respect to set the encoding (C<accept-charset="utf-8">). And lastly there are some JavaScript
based tricks and workarounds for even more odd cases (searching the web for this will return
a number of approaches). Hopefully as more compliant browsers become popular these edge cases
will fade.
256 | |
B<NOTE> It is possible for a multipart form POST request (normally a file upload) to contain
inline content with mixed character sets and encodings. For example one might create
a POST like this:

    use utf8;
    use Encode;
    use HTTP::Request::Common;

    my $utf8 = 'test ♥';
    my $shiftjs = 'test テスト';
    my $req = POST '/root/echo_arg',
      Content_Type => 'form-data',
      Content => [
        arg0 => 'helloworld',
        Encode::encode('UTF-8','♥') => Encode::encode('UTF-8','♥♥'),
        arg1 => [
          undef, '',
          'Content-Type' =>'text/plain; charset=UTF-8',
          'Content' => Encode::encode('UTF-8', $utf8)],
        arg2 => [
          undef, '',
          'Content-Type' =>'text/plain; charset=SHIFT_JIS',
          'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
        # A second part with the same name, so arg2 carries two values.
        arg2 => [
          undef, '',
          'Content-Type' =>'text/plain; charset=SHIFT_JIS',
          'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
      ];

In this case we've created a POST request but each part specifies its own content
character set (and setting a content encoding would also be possible). Generally one
would not run into this situation in a web browser context but for completeness' sake
Catalyst will notice if a multipart POST contains parts with complex or extended
header information and in those cases it will not attempt to apply decoding to the
form values. Instead the part will be represented as an instance of
L<Catalyst::Request::PartData>, which will contain all the header information needed
for you to perform custom parsing of the data.
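
The general idea of honoring each part's own character set can be sketched with core modules
only. This is not the L<Catalyst::Request::PartData> API: the C<decode_part> helper and its
arguments (a raw C<Content-Type> header value and the raw bytes) are hypothetical stand-ins for
the header and data you would read from such a part. The charset parsing is simplified and
falls back to UTF-8 when no charset is given, which is an assumption of the sketch:

    use strict;
    use warnings;
    use utf8;
    use Encode qw(decode encode);

    # Hypothetical helper: pick the charset out of a Content-Type header
    # value (defaulting to UTF-8) and decode the raw bytes with it.
    sub decode_part {
        my ($content_type, $bytes) = @_;
        my ($charset) = $content_type =~ /charset\s*=\s*"?([^";\s]+)/i;
        $charset = 'UTF-8' unless defined $charset;
        return decode($charset, $bytes);
    }

    my $raw  = encode('SHIFT_JIS', 'test テスト');
    my $text = decode_part('text/plain; charset=SHIFT_JIS', $raw);
    print length($raw), " ", length($text), "\n"; # prints "11 8"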
293 | |
=head1 UTF8 Encoding in Body Response

When does L<Catalyst> encode your response body and what rules does it use to
determine when that is needed?

=head2 Summary

    use utf8;
    use warnings;
    use strict;

    package MyApp::Controller::Root;

    use base 'Catalyst::Controller';
    use File::Spec;

    sub scalar_body :Local {
        my ($self, $c) = @_;
        $c->response->content_type('text/html');
        $c->response->body("<p>This is scalar_body action ♥</p>");
    }

    sub stream_write :Local {
        my ($self, $c) = @_;
        $c->response->content_type('text/html');
        $c->response->write("<p>This is stream_write action ♥</p>");
    }

    sub stream_write_fh :Local {
        my ($self, $c) = @_;
        $c->response->content_type('text/html');

        my $writer = $c->res->write_fh;
        $writer->write_encoded('<p>This is stream_write_fh action ♥</p>');
        $writer->close;
    }

    sub stream_body_fh :Local {
        my ($self, $c) = @_;
        my $path = File::Spec->catfile('t', 'utf8.txt');
        open(my $fh, '<', $path) || die "trouble: $!";
        $c->response->content_type('text/html');
        $c->response->body($fh);
    }
338 | |
=head2 Discussion

Beginning with L<Catalyst> version 5.90080 you no longer need to set the encoding
configuration (although doing so won't hurt anything).

Currently we only encode if the content type is one of the types which generally expects a
UTF8 encoding. This is determined by the following regular expression:

    our $DEFAULT_ENCODE_CONTENT_TYPE_MATCH = qr{text|xml$|javascript$};
    $c->response->content_type =~ /$DEFAULT_ENCODE_CONTENT_TYPE_MATCH/

This is a global variable in L<Catalyst::Response> which is stored in the C<encodable_content_type>
attribute of C<< $c->response >>. You may currently alter this directly on the response or globally. In
the future we may offer a configuration setting for this.

This would match content types like the following (examples):

    text/plain
    text/html
    text/xml
    application/javascript
    application/xml
    application/vnd.user+xml

You should set your content type prior to header finalization if you want L<Catalyst> to
encode.

B<NOTE> We do not attempt to encode C<application/json> since the two most commonly used
approaches (L<Catalyst::View::JSON> and L<Catalyst::Action::REST>) have already configured
their JSON encoders to produce properly encoded UTF8 responses. If you are rolling your
own JSON encoding, you may need to set the encoder to do the right thing (or override
the global regular expression to include the JSON media type).
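
Because the rule is just a regular expression, you can try it against candidate content types
outside of L<Catalyst>. The sketch below copies the default pattern quoted above into a local
variable and also shows a hypothetical extended pattern that additionally matches JSON, as in
the note above. How you install such a pattern on your application should be checked against
the L<Catalyst::Response> documentation for your version:

    use strict;
    use warnings;

    # Copy of the default pattern quoted above.
    my $default = qr{text|xml$|javascript$};

    # Hypothetical extension that also treats JSON as encodable.
    my $with_json = qr{text|xml$|javascript$|json$};

    for my $type (qw(
        text/html application/xml application/vnd.user+xml
        application/javascript application/json image/png
    )) {
        printf "%-26s default=%s with_json=%s\n", $type,
            ($type =~ $default   ? 'yes' : 'no'),
            ($type =~ $with_json ? 'yes' : 'no');
    }

Only C<application/json> changes between the two patterns (C<no> by default, C<yes> with the
extension), and C<image/png> is not matched by either.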
371 | |
=head2 Encoding with Scalar Body

L<Catalyst> supports several methods of supplying your response with body content. The first
and currently most common is to set the L<Catalyst::Response> C<body> with a scalar string
(as in the example):

    use utf8;

    sub scalar_body :Local {
        my ($self, $c) = @_;
        $c->response->content_type('text/html');
        $c->response->body("<p>This is scalar_body action ♥</p>");
    }

In general you should not need to do anything else, since L<Catalyst> will automatically encode
this string during body finalization. The only thing to watch out for is to make sure
the string has not already been encoded, as this will result in double encoding errors.
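
To see why double encoding is a problem, this core Perl snippet (independent of L<Catalyst>)
encodes a string that is already UTF-8 bytes a second time. Each of the three bytes is treated
as a separate Latin-1 character and encoded again, so the output grows from three bytes to six
and a browser would display mojibake instead of the heart:

    use strict;
    use warnings;
    use utf8;
    use Encode qw(encode);

    my $text  = '♥';                     # one character
    my $once  = encode('UTF-8', $text);  # correct: 3 bytes
    my $twice = encode('UTF-8', $once);  # wrong: each byte encoded again

    print length($once), " ", length($twice), "\n"; # prints "3 6"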
389 | |
B<NOTE> Pay attention to the content type setting in the example. L<Catalyst> inspects that
content type carefully to determine if the body needs encoding.

B<NOTE> If you set the character set of the response L<Catalyst> will skip encoding IF the
character set is set to something that doesn't match C<< $c->encoding->mime_name >>. We assume
that if you are setting an alternative character set, you want to handle the encoding
yourself. However it might be easier to set C<< $c->encoding >>, since you can override it
for a given response. For example here's how to override the default encoding and set the
correct character set in the response:

    use utf8;
    use Encode ();

    sub override_encoding :Local {
        my ($self, $c) = @_;
        $c->res->content_type('text/plain');
        # Use Shift_JIS instead of UTF-8 for this response only.
        $c->encoding(Encode::find_encoding('Shift_JIS'));
        $c->response->body("テスト");
    }

This will use the alternative encoding for a single response.
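
The effect of swapping the encoding can be checked with L<Encode> alone. This sketch (not
L<Catalyst> code) looks up the same Shift_JIS encoding object used above, prints the MIME name
that would be used for a C<charset> parameter, and compares the byte length of the same string
under Shift_JIS and UTF-8:

    use strict;
    use warnings;
    use utf8;
    use Encode qw(find_encoding);

    my $sjis = find_encoding('Shift_JIS');
    my $utf8 = find_encoding('UTF-8');
    my $text = 'テスト';                         # three characters

    print $sjis->mime_name, "\n";               # prints Shift_JIS
    print length($sjis->encode($text)), "\n";   # prints 6 (2 bytes each)
    print length($utf8->encode($text)), "\n";   # prints 9 (3 bytes each)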
408 | |
B<NOTE> If you manually set the content type character set to whatever C<< $c->encoding->mime_name >>
is set to, we STILL encode, rather than assume your manual setting is a flag to override. This
is done to support backward compatible assumptions (in particular L<Catalyst::View::TT> has set
a utf-8 character set in its default content type for ages, even though it does not itself do any
encoding on the body response). If you are going to handle encoding manually you may call
C<< $c->clear_encoding >> for a single request response cycle, or, as in the above example, set an
alternative encoding.
416 | |
=head2 Encoding with streaming type responses

L<Catalyst> offers two approaches to streaming your body response. Again, you must remember
to set your content type prior to streaming, since invoking a streaming response will automatically
finalize and send your HTTP headers (and your content type MUST be one that matches the regular
expression given above).

Also, if you are going to override C<< $c->encoding >> (or invoke C<< $c->clear_encoding >>), you
should do that before anything else!

The first streaming method is to use the C<write> method on the response object. This method
allows 'inlined' streaming and is generally used with blocking style servers.

    sub stream_write :Local {
        my ($self, $c) = @_;
        $c->response->content_type('text/html');
        $c->response->write("<p>This is stream_write action ♥</p>");
    }

You may call the C<write> method as often as you need to finish streaming all your content.
L<Catalyst> will encode each chunk in turn as long as the content type meets the 'encodable types'
requirement and C<< $c->encoding >> is set (which it is, as long as you did not change it).

B<NOTE> If you try to change the encoding after you start the stream, this will invoke an error
response. However since you've already started streaming this will not show up as an HTTP error
status code, but rather as error information in your body response and an error in your logs.
443 | |
The second way to stream a response is to get the response writer object and invoke methods
on that directly:

    sub stream_write_fh :Local {
        my ($self, $c) = @_;
        $c->response->content_type('text/html');

        my $writer = $c->res->write_fh;
        $writer->write_encoded('<p>This is stream_write_fh action ♥</p>');
        $writer->close;
    }

This can be used just like the C<write> method, but typically you request this object when
you want to do a nonblocking style response, since the writer object can be closed over or
sent to a model that will invoke it in a nonblocking manner. For more on using the writer
object for nonblocking responses you should review the L<Catalyst> documentation and also
look at several articles from the 2013 Catalyst Advent calendar, in particular:

L<http://www.catalystframework.org/calendar/2013/10>, L<http://www.catalystframework.org/calendar/2013/11>,
L<http://www.catalystframework.org/calendar/2013/12>, L<http://www.catalystframework.org/calendar/2013/13>,
L<http://www.catalystframework.org/calendar/2013/14>.

The main difference from those articles is that previously calling C<< ->write_fh >> would return
the actual L<Plack> writer object that was supplied by your Plack application handler, whereas now
we wrap that object in a lightweight decorator object that proxies the C<write> and C<close> methods
and supplies an additional C<write_encoded> method. C<write_encoded> does the exact same thing
as C<write> except that it will first encode the string when necessary. In general if you are
streaming encodable content such as HTML this is the method to use. If you are streaming
binary content, you should just use the C<write> method (although if the content type is set
correctly we would skip encoding anyway, you may as well avoid the extra no-op overhead).
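
The decorator idea can be illustrated without L<Catalyst> or L<Plack>. In this sketch
C<My::EncodingWriter> is a made-up class name and the "underlying writer" is just a code
reference collecting output; the real wrapper class, its constructor and the way it decides
whether encoding is needed are internal to L<Catalyst> and may differ. It only shows the shape
described above: C<write> and C<close> pass through, while C<write_encoded> encodes first.

    use strict;
    use warnings;
    use Encode ();

    package My::EncodingWriter;   # hypothetical, not a Catalyst class

    sub new {
        my ($class, %args) = @_;
        # writer:   code ref that receives raw bytes
        # encoding: an Encode::Encoding object, or undef to skip encoding
        return bless { %args }, $class;
    }

    # Pass-through: the caller must already supply bytes.
    sub write { my ($self, $bytes) = @_; $self->{writer}->($bytes); }

    # Encode characters to bytes first when an encoding is configured.
    sub write_encoded {
        my ($self, $string) = @_;
        $string = $self->{encoding}->encode($string) if $self->{encoding};
        return $self->write($string);
    }

    sub close { my ($self) = @_; $self->{closed} = 1; }

    package main;

    my $out = '';
    my $w = My::EncodingWriter->new(
        writer   => sub { $out .= $_[0] },
        encoding => Encode::find_encoding('UTF-8'),
    );
    $w->write_encoded("<p>\x{2665}</p>");
    $w->close;
    print length($out), "\n";   # prints 10: 7 ASCII bytes + 3 bytes for the heart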
474 | |
The last style of content response that L<Catalyst> supports is setting the body to a
filehandle-like object. In this case the object is passed down to the Plack application handler
directly and currently we do nothing to set encoding.

    sub stream_body_fh :Local {
        my ($self, $c) = @_;
        my $path = File::Spec->catfile('t', 'utf8.txt');
        open(my $fh, '<', $path) || die "trouble: $!";
        $c->response->content_type('text/html');
        $c->response->body($fh);
    }

In this example we create a filehandle to a text file that contains UTF8 encoded characters. We
pass this down without modification, which I think is correct since we don't want to double
encode. However this may change in a future development release, so please be sure to double
check the current docs and changelog. It's possible a future release will require you to set
an encoding on the IO layer level so that we can be sure to properly encode at body finalization.
So this is still an edge case we are writing test examples for. But for now if you are returning
a filehandle-like response, you are expected to make sure you are following the L<PSGI>
specification and return raw bytes.
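
Here is a small standalone sketch of what "raw bytes" means for a filehandle body. It uses the
core L<File::Temp> module to stand in for C<t/utf8.txt> (the file contents are invented for the
example), writes text through an C<:encoding(UTF-8)> layer, and then reopens the file with the
C<:raw> layer, which is the kind of byte-oriented handle PSGI expects:

    use strict;
    use warnings;
    use utf8;
    use File::Temp qw(tempfile);

    # Create a throwaway file containing UTF-8 encoded text.
    my ($tmp, $path) = tempfile(UNLINK => 1);
    binmode($tmp, ':encoding(UTF-8)');
    print {$tmp} "<p>♥</p>";
    close($tmp) or die "close failed: $!";

    # Reopen it in raw (byte) mode, as a body filehandle should be.
    open(my $fh, '<:raw', $path) or die "trouble: $!";
    my $bytes = do { local $/; <$fh> };
    close($fh);

    print length($bytes), "\n";   # prints 10: the heart is 3 bytes on disk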
495 | |
=head2 Override the Encoding on Context

As already noted you may change the current encoding (or remove it) by setting an alternative
encoding on the context:

    $c->encoding(Encode::find_encoding('Shift_JIS'));

Please note that you can continue to change encoding UNTIL the headers have been finalized. The
last setting always wins. Trying to change encoding after header finalization is an error.
505 | |
=head2 Setting the Content Encoding HTTP Header

In some cases you may set a content encoding on your response, for example if you are
compressing your response with gzip. In this case you are again on your own. If we notice that
the content encoding header is set when we hit finalization, we skip automatic encoding:

    use Encode;
    use Compress::Zlib;
    use utf8;

    sub gzipped :Local {
        my ($self, $c) = @_;

        $c->res->content_type('text/plain');
        $c->res->content_type_charset('UTF-8');
        $c->res->content_encoding('gzip');

        # Encode to UTF-8 bytes first, then compress those bytes.
        $c->response->body(
          Compress::Zlib::memGzip(
            Encode::encode_utf8("manual_1 ♥")));
    }
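
The order of operations in that action matters: characters must be encoded to bytes before
compression, and the client reverses this by decompressing and then decoding. This standalone
round trip, using the same core L<Compress::Zlib> and L<Encode> functions as the action above,
checks that the body survives intact:

    use strict;
    use warnings;
    use utf8;
    use Encode ();
    use Compress::Zlib ();

    my $text = "manual_1 ♥";

    # Server side: characters -> UTF-8 bytes -> gzip.
    my $body = Compress::Zlib::memGzip(Encode::encode_utf8($text));

    # Client side: gunzip -> UTF-8 bytes -> characters.
    my $round_trip = Encode::decode_utf8(Compress::Zlib::memGunzip($body));

    print $round_trip eq $text ? "round trip ok\n" : "mismatch\n";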
527 | |
528 | |
If you are using L<Catalyst::Plugin::Compress> you need to upgrade to the most recent version
in order to be compatible with changes introduced in L<Catalyst> 5.90080. Other plugins may
require updates (please open bugs if you find them).

B<NOTE> Content encoding may be set to 'identity' and we will still perform automatic encoding
if the content type is encodable and an encoding is present for the context.
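
Pulling the rules from this section together, the decision can be summarized as a small
predicate. This is a hypothetical sketch for reasoning about your own responses, not the code
L<Catalyst> actually runs; the function name and arguments are invented, it covers only the
conditions described above (an encoding is set, the content type is encodable, no content
encoding other than C<identity>, and no charset differing from the encoding's MIME name), and
the case-insensitive comparisons are an assumption:

    use strict;
    use warnings;

    # Hypothetical summary of the rules above (not Catalyst internals).
    # encoding_mime_name is undef when the encoding has been cleared.
    sub would_encode {
        my (%args) = @_;
        my $mime    = $args{encoding_mime_name};
        my $type    = defined $args{content_type} ? $args{content_type} : '';
        my $cenc    = $args{content_encoding};
        my $charset = $args{charset};

        return 0 unless defined $mime;                           # no encoding
        return 0 unless $type =~ qr{text|xml$|javascript$};      # not encodable
        return 0 if defined $cenc && lc($cenc) ne 'identity';    # e.g. gzip
        # Compared case-insensitively here (an assumption).
        return 0 if defined $charset && lc($charset) ne lc($mime);
        return 1;
    }

    print would_encode(content_type => 'text/html',
                       encoding_mime_name => 'UTF-8'), "\n";     # prints 1
    print would_encode(content_type => 'text/plain', content_encoding => 'gzip',
                       encoding_mime_name => 'UTF-8'), "\n";     # prints 0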
535 | |
=head2 Using Common Views

The following common views have been updated so that their tests pass with default UTF8
encoding for L<Catalyst>:

L<Catalyst::View::TT>, L<Catalyst::View::Mason>, L<Catalyst::View::HTML::Mason>,
L<Catalyst::View::Xslate>

See L<Catalyst::Upgrading> for additional information on L<Catalyst> extensions that require
upgrades.

In general, for the common views you should not need to do anything special. If your actual
template files contain UTF8 literals you should set configuration on your View to enable that.
For example in TT, if your template has actual UTF8 characters in it you should do the following:

    MyApp::View::TT->config(ENCODING => 'utf-8');

However L<Catalyst::View::Xslate> wants to do the UTF8 encoding for you (we assume that the
authors of that view did this as a workaround to the fact that until now encoding was not core
to L<Catalyst>). So if you use that view, you either need to tell it to not encode, or you need
to turn off encoding for Catalyst.

    MyApp::View::Xslate->config(encode_body => 0);

or

    MyApp->config(encoding=>undef);

The preference is to disable it in the View.

Other views may be similar. You should review View documentation and test during upgrading.
We tried to make sure most common views worked properly and noted all workarounds, but if we
missed something please alert the development team (instead of introducing a local hack into
your application that will mean nobody will ever upgrade it...).
570 | |
=head2 Setting the response from an external PSGI application

L<Catalyst::Response> allows one to set the response from an external L<PSGI> application.
If you do this, and that external application sets a character set on the content type, we
C<clear_encoding> for the rest of the response. This is done to prevent double encoding.

B<NOTE> Even if the character set of the content type is the same as the encoding set in
C<< $c->encoding >>, we still skip encoding. This is a regrettable difference from the general rule
outlined above, where if the current character set is the same as the current encoding, we
encode anyway. Nevertheless I think this is the correct behavior since the earlier rule exists
only to support backward compatibility with L<Catalyst::View::TT>.

In general if you want L<Catalyst> to handle encoding, you should avoid setting the content
type character set since Catalyst will do so automatically based on the requested response
encoding. It's best to request alternative encodings by setting C<< $c->encoding >>, and if you
really want manual control of encoding you should always call C<< $c->clear_encoding >> so that
programmers who come after you are very clear as to your intentions.
588 | |
=head2 Disabling default UTF8 encoding

You may encounter issues with your legacy code running under default UTF8 body encoding. If
so you can disable this with the following configuration setting:

    MyApp->config(encoding=>undef);

Where C<MyApp> is your L<Catalyst> subclass.

If you do not wish to disable all the Catalyst encoding features, you may disable specific
features via two additional configuration options: C<skip_body_param_unicode_decoding>
and C<skip_complex_post_part_handling>. The first will skip any attempt to decode POST
parameters when creating body parameters and the second will skip creation of instances
of L<Catalyst::Request::PartData> in the case that the multipart form upload contains parts
with a mix of content character sets.

If you believe you have discovered a bug in UTF8 body encoding, I strongly encourage you to
report it (and not try to hack a workaround in your local code). We also recommend that you
regard such a workaround as a temporary solution. It is ideal if L<Catalyst> extension
authors can start to count on L<Catalyst> doing the right thing for encoding.
609 | |
=head1 Conclusion

This document has attempted to be a complete review of how UTF8 and encoding work in the
current version of L<Catalyst> and also to document known issues, gotchas and backward
compatible hacks. Please report issues to the development team.

=head1 Author

John Napiorkowski L<jjnapiork@cpan.org|mailto:jjnapiork@cpan.org>

=cut
621 | |