.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.3
.IX Title "URI::Escape 3"
.TH URI::Escape 3 "2009-05-28" "perl v5.8.7" "User Contributed Perl Documentation"
.SH "NAME"
URI::Escape \- Escape and unescape unsafe characters
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 4
\& use URI::Escape;
\& $safe = uri_escape("10% is enough\en");
\& $verysafe = uri_escape("foo", "\e0\-\e377");
\& $str = uri_unescape($safe);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module provides functions to escape and unescape \s-1URI\s0 strings as
defined by \s-1RFC\s0 2396 (and updated by \s-1RFC\s0 2732).
A \s-1URI\s0 consists of a restricted set of characters,
denoted as \f(CW\*(C`uric\*(C'\fR in \s-1RFC\s0 2396. The restricted set of characters
consists of digits, letters, and a few graphic symbols chosen from
those common to most of the character encodings and input facilities
available to Internet users:
.PP
.Vb 3
\& "A" .. "Z", "a" .. "z", "0" .. "9",
\& ";", "/", "?", ":", "@", "&", "=", "+", "$", ",", "[", "]", # reserved
\& "\-", "_", ".", "!", "~", "*", "'", "(", ")"
.Ve
.PP
In addition, any byte (octet) can be represented in a \s-1URI\s0 by an escape
sequence: a triplet consisting of the character \*(L"%\*(R" followed by two
hexadecimal digits. A byte can also be represented directly by a
character, using the US-ASCII character for that octet (iff the
character is part of \f(CW\*(C`uric\*(C'\fR).
.PP
Some of the \f(CW\*(C`uric\*(C'\fR characters are \fIreserved\fR for use as delimiters
or as part of certain \s-1URI\s0 components. These must be escaped if they are
to be treated as ordinary data. Read \s-1RFC\s0 2396 for further details.
.PP
The functions provided (and exported by default) from this module are:
.ie n .IP "uri_escape( $string )" 4
.el .IP "uri_escape( \f(CW$string\fR )" 4
.IX Item "uri_escape( $string )"
.ie n .IP "uri_escape( $string\fR, \f(CW$unsafe )" 4
.el .IP "uri_escape( \f(CW$string\fR, \f(CW$unsafe\fR )" 4
.IX Item "uri_escape( $string, $unsafe )"
Replaces each unsafe character in the \f(CW$string\fR with the corresponding
escape sequence and returns the result. The \f(CW$string\fR argument should
be a string of bytes. The \fIuri_escape()\fR function will croak if given
characters with code above 255. Use \fIuri_escape_utf8()\fR if you know you
have such chars and/or want chars in the 128 .. 255 range treated as
\&\s-1UTF\-8\s0.
.Sp
The \fIuri_escape()\fR function takes an optional second argument that
overrides the set of characters that are to be escaped. The set is
specified as a string that can be used in a regular expression
character class (between [ ]). E.g.:
.Sp
.Vb 3
\& "\ex00\-\ex1f\ex7f\-\exff" # all control and hi\-bit characters
\& "a\-z" # all lower case characters
\& "^A\-Za\-z" # everything not a letter
.Ve
.Sp
The default set of characters to be escaped is all those which are
\&\fInot\fR part of the \f(CW\*(C`uric\*(C'\fR character class shown above as well as the
reserved characters. I.e. the default is:
.Sp
.Vb 1
\& "^A\-Za\-z0\-9\e\-_.!~*'()"
.Ve
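.Sp
As an illustration only, the escaping rule described above can be sketched in core Perl without loading the module. The helper name escape_sketch is hypothetical and is not part of URI::Escape; it assumes the default set quoted above and accepts an optional second argument in the same character class notation. The module's own implementation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::Escape: replaces every character
# matched by the "unsafe" character class with a %XX escape sequence.
sub escape_sketch {
    my ($bytes, $unsafe) = @_;

    # Default set as quoted in this manual page (an assumption of the sketch).
    $unsafe = q{^A-Za-z0-9\-_.!~*'()} unless defined $unsafe;

    # uri_escape() is documented to croak on characters above 255;
    # this sketch simply dies.
    die "character with code above 255 in input" if $bytes =~ /[^\x00-\xFF]/;

    $bytes =~ s/([$unsafe])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

print escape_sketch("10% is enough\n"), "\n";   # prints 10%25%20is%20enough%0A
print escape_sketch("foo bar", "a-z"), "\n";    # prints %66%6F%6F %62%61%72
```

With the second argument set to a\-z only lower case letters are escaped, so the space in the second call is left alone.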
.ie n .IP "uri_escape_utf8( $string )" 4
.el .IP "uri_escape_utf8( \f(CW$string\fR )" 4
.IX Item "uri_escape_utf8( $string )"
.ie n .IP "uri_escape_utf8( $string\fR, \f(CW$unsafe )" 4
.el .IP "uri_escape_utf8( \f(CW$string\fR, \f(CW$unsafe\fR )" 4
.IX Item "uri_escape_utf8( $string, $unsafe )"
Works like \fIuri_escape()\fR, but will encode chars as \s-1UTF\-8\s0 before
escaping them. This makes this function able to deal with characters
with code above 255 in \f(CW$string\fR. Note that chars in the 128 .. 255
range will be escaped differently by this function compared to what
\&\fIuri_escape()\fR would. For chars in the 0 .. 127 range there is no
difference.
.Sp
The call:
.Sp
.Vb 1
\& $uri = uri_escape_utf8($string);
.Ve
.Sp
will be the same as:
.Sp
.Vb 2
\& use Encode qw(encode);
\& $uri = uri_escape(encode("UTF\-8", $string));
.Ve
.Sp
but will even work for perl\-5.6 for chars in the 128 .. 255 range.
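.Sp
To make the difference in the 128 .. 255 range concrete, here is a small sketch that uses only core Perl and the Encode module. The helper escape_bytes is a hypothetical stand-in for escaping with the default set; it is not the module's own code.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper standing in for escaping with the default set.
sub escape_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

my $char = "\x{E9}";                                   # U+00E9, e with acute accent

my $as_latin1 = escape_bytes($char);                   # one byte: 0xE9
my $as_utf8   = escape_bytes(encode("UTF-8", $char));  # two bytes: 0xC3 0xA9

print "$as_latin1 $as_utf8\n";   # prints %E9 %C3%A9
```

The first result corresponds to escaping the byte as is, the second to encoding as UTF\-8 before escaping.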
.Sp
Note: JavaScript has a function called \fIescape()\fR that produces the
sequence \*(L"%uXXXX\*(R" for chars in the 256 .. 65535 range. This function
really has nothing to do with \s-1URI\s0 escaping, but some folks got confused
since it \*(L"does the right thing\*(R" in the 0 .. 255 range. Because of
this you sometimes see \*(L"URIs\*(R" with this kind of escape. The
JavaScript \fIencodeURIComponent()\fR function is similar to \fIuri_escape_utf8()\fR.
.IP "uri_unescape($string,...)" 4
.IX Item "uri_unescape($string,...)"
Returns a string with each \f(CW%XX\fR sequence replaced with the actual byte
(octet).
.Sp
This does the same as:
.Sp
.Vb 1
\& $string =~ s/%([0\-9A\-Fa\-f]{2})/chr(hex($1))/eg;
.Ve
.Sp
but does not modify the string in-place as this \s-1RE\s0 would. Using the
\&\fIuri_unescape()\fR function instead of the \s-1RE\s0 might make the code look
cleaner and is a few characters less to type.
.Sp
In a simple benchmark test I did,
calling the function (instead of the inline \s-1RE\s0 above) was something like
40% slower if a few chars were unescaped, and something like 700% slower if
none were. If you are going to unescape a lot of times it might be a good
idea to inline the \s-1RE\s0.
.Sp
If the \fIuri_unescape()\fR function is passed multiple strings, then each
one is returned unescaped.
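.Sp
The behaviour described above (same substitution as the inline RE, no in-place modification, several strings accepted) can be sketched as follows. The helper unescape_sketch is hypothetical; it only wraps the RE shown above and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: copies its arguments before substituting, so the
# caller's strings are left unmodified.
sub unescape_sketch {
    my @copies = @_;
    s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg for @copies;
    return wantarray ? @copies : $copies[0];
}

my $escaped = "10%25%20is%20enough%0A";
my $plain   = unescape_sketch($escaped);

print $plain;          # prints "10% is enough" followed by a newline
print "$escaped\n";    # prints 10%25%20is%20enough%0A (original untouched)
```

In list context the sketch returns one unescaped string per argument.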
.PP
The module can also export the \f(CW%escapes\fR hash, which contains the
mapping from all 256 bytes to the corresponding escape codes. Lookup
in this hash is faster than evaluating \f(CW\*(C`sprintf("%%%02X", ord($byte))\*(C'\fR
each time.
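.PP
A table of this kind can be sketched in core Perl as shown here. The name %table is hypothetical and stands in for the exported hash; the sketch only shows the lookup pattern and makes no claim about relative speed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the exported hash: one entry per byte, computed
# once with the sprintf expression quoted above.
my %table = map { (chr($_) => sprintf("%%%02X", $_)) } 0 .. 255;

# Escaping every byte of a string is then one hash lookup per byte.
my $escaped = join "", map { $table{$_} } split //, "a b";

print "$escaped\n";   # prints %61%20%62
```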
.SH "SEE ALSO"
.IX Header "SEE ALSO"
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 1995\-2004 Gisle Aas.
.PP
This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.