Commit | Line | Data |
1141d9f8 |
1 | package PerlIO; |
2 | |
af048c18 |
3 | our $VERSION = '1.06'; |
8de1277c |
4 | |
1141d9f8 |
5 | # Map layer name to package that defines it |
c1a61b17 |
6 | our %alias; |
1141d9f8 |
7 | |
8 | sub import |
9 | { |
10 | my $class = shift; |
11 | while (@_) |
12 | { |
13 | my $layer = shift; |
14 | if (exists $alias{$layer}) |
15 | { |
16 | $layer = $alias{$layer} |
17 | } |
18 | else |
19 | { |
20 | $layer = "${class}::$layer"; |
21 | } |
22 | eval "require $layer"; |
23 | warn $@ if $@; |
24 | } |
25 | } |
26 | |
39f7a870 |
27 | sub F_UTF8 () { 0x8000 } |
28 | |
1141d9f8 |
29 | 1; |
30 | __END__ |
b3d30bf7 |
31 | |
32 | =head1 NAME |
33 | |
7d3b96bb |
34 | PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space |
b3d30bf7 |
35 | |
36 | =head1 SYNOPSIS |
37 | |
a7845df8 |
38 | open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files |
1cbfc93d |
39 | |
40 | open($fh,"<","his.jpg"); # portably open a binary file for reading |
41 | binmode($fh); |
7d3b96bb |
42 | |
43 | Shell: |
44 | PERLIO=perlio perl .... |
b3d30bf7 |
45 | |
46 | =head1 DESCRIPTION |
47 | |
ec28694c |
48 | When an undefined layer 'foo' is encountered in an C<open> or |
49 | C<binmode> layer specification then C code performs the equivalent of: |
b3d30bf7 |
50 | |
51 | use PerlIO 'foo'; |
52 | |
53 | The perl code in PerlIO.pm then attempts to locate a layer by doing |
54 | |
55 | require PerlIO::foo; |
56 | |
47bfe92f |
57 | Otherwise the C<PerlIO> package is a place holder for additional |
58 | PerlIO related functions. |
b3d30bf7 |
59 | |
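As a rough illustration of the on-demand loading described above, the
sketch below calls C<import> by hand at run time. The layer name
C<encoding> is only an example, chosen on the assumption that
C<PerlIO::encoding> is installed alongside perl; C<$loaded> is a
hypothetical variable used to show the effect.

```perl
use strict;
use warnings;
use PerlIO;

# Run-time equivalent of "use PerlIO 'encoding'": with no entry in %alias,
# import() builds the package name PerlIO::encoding and requires it.
PerlIO->import('encoding');

# %INC records every file loaded via require/use.
my $loaded = exists $INC{'PerlIO/encoding.pm'} ? 1 : 0;
print "PerlIO::encoding loaded: $loaded\n";
```

If the C<require> fails, C<import> only warns, so a misspelled layer name
shows up as a warning rather than a fatal error at this point.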
7d3b96bb |
60 | The following layers are currently defined: |
b3d30bf7 |
61 | |
7d3b96bb |
62 | =over 4 |
63 | |
3d897973 |
64 | =item :unix |
7d3b96bb |
65 | |
3d897973 |
66 | Lowest level layer which provides basic PerlIO operations in terms of |
67 | UNIX/POSIX numeric file descriptor calls |
68 | (open(), read(), write(), lseek(), close()). |
7d3b96bb |
69 | |
3d897973 |
70 | =item :stdio |
7d3b96bb |
71 | |
47bfe92f |
72 | Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note |
73 | that as this is "real" stdio it will ignore any layers beneath it and |
9ec269cb |
74 | go straight to the operating system via the C library as usual. |
7d3b96bb |
75 | |
3d897973 |
76 | =item :perlio |
7d3b96bb |
77 | |
3d897973 |
78 | A from-scratch implementation of buffering for PerlIO. Provides fast |
79 | access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt> |
80 | and in general attempts to minimize data copying. |
7d3b96bb |
81 | |
3d897973 |
82 | C<:perlio> will insert a C<:unix> layer below itself to do low level IO. |
7d3b96bb |
83 | |
3d897973 |
84 | =item :crlf |
7d3b96bb |
85 | |
3d897973 |
86 | A layer that implements DOS/Windows like CRLF line endings. On read |
87 | converts pairs of CR,LF to a single "\n" newline character. On write |
88 | converts each "\n" to a CR,LF pair. Note that this layer likes to be |
89 | one of its kind: it silently ignores attempts to be pushed into the |
90 | layer stack more than once. |
91 | |
92 | It currently does I<not> mimic MS-DOS in treating Control-Z |
93 | as an end-of-file marker. |
94 | |
95 | (Gory details follow) To be more exact what happens is this: after |
96 | pushing itself to the stack, the C<:crlf> layer checks all the layers |
97 | below itself to find the first layer that is capable of being a CRLF |
98 | layer but is not yet enabled to be a CRLF layer. If it finds such a |
99 | layer, it enables the CRLFness of that other deeper layer, and then |
100 | pops itself off the stack. If not, fine, use the one we just pushed. |
101 | |
102 | The end result is that a C<:crlf> means "please enable the first CRLF |
103 | layer you can find, and if you can't find one, here would be a good |
104 | spot to place a new one." |
105 | |
106 | Based on the C<:perlio> layer. |
107 | |
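The read and write translation described for C<:crlf> can be observed
with a small round trip through a temporary file. This is a sketch
only; the file comes from C<File::Temp> and the variable names are
hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; removed automatically when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write through :crlf, so each "\n" goes to disk as a CR,LF pair.
open(my $out, ">:crlf", $path) or die "open for write: $!";
print $out "one\ntwo\n";
close $out;

# Read the bytes back untranslated to see what was stored.
open(my $raw, "<:raw", $path) or die "open raw: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read again through :crlf, so each CR,LF pair comes back as "\n".
open(my $in, "<:crlf", $path) or die "open for read: $!";
my @lines = <$in>;
close $in;

printf "%d bytes on disk, %d lines read\n", length($bytes), scalar @lines;
```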
108 | =item :mmap |
109 | |
110 | A layer which implements "reading" of files by using C<mmap()> to |
9ec269cb |
111 | make a (whole) file appear in the process's address space, and then |
3d897973 |
112 | using that as PerlIO's "buffer". This I<may> be faster in certain |
113 | circumstances for large files, and may result in less physical memory |
114 | use when multiple processes are reading the same file. |
115 | |
116 | Files which are not C<mmap()>-able revert to behaving like the C<:perlio> |
9ec269cb |
117 | layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write |
3d897973 |
118 | needs extra house-keeping (to extend the file) which negates any advantage. |
119 | |
9ec269cb |
120 | The C<:mmap> layer will not exist if the platform does not support C<mmap()>. |
3d897973 |
121 | |
122 | =item :utf8 |
7d3b96bb |
123 | |
2575c402 |
124 | Declares that the stream accepts perl's I<internal> encoding of |
47bfe92f |
125 | characters. (Which really is UTF-8 on ASCII machines, but is |
126 | UTF-EBCDIC on EBCDIC machines.) This allows any character perl can |
127 | represent to be read from or written to the stream. The UTF-X encoding |
128 | is chosen to render simple text parts (i.e. non-accented letters, |
129 | digits and common punctuation) human readable in the encoded file. |
130 | |
131 | Here is how to write your native data out using UTF-8 (or UTF-EBCDIC) |
132 | and then read it back in. |
133 | |
134 | open(F, ">:utf8", "data.utf"); |
135 | print F $out; |
136 | close(F); |
137 | |
138 | open(F, "<:utf8", "data.utf"); |
139 | $in = <F>; |
140 | close(F); |
7d3b96bb |
141 | |
740d4bb2 |
142 | Note that this layer does not validate byte sequences. For reading |
9ec269cb |
143 | input, using C<:encoding(utf8)> instead of bare C<:utf8> is strongly |
740d4bb2 |
144 | recommended. |
145 | |
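A minimal sketch of the recommended C<:encoding(utf8)> form, assuming an
ASCII platform, a temporary file from C<File::Temp>, and a made-up
sample string:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; removed automatically when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Six characters, two of them outside ASCII (e-acute and U+263A).
my $out = "caf\x{e9} \x{263a}";

open(my $w, ">:encoding(utf8)", $path) or die "open for write: $!";
print $w $out;
close $w;

open(my $r, "<:encoding(utf8)", $path) or die "open for read: $!";
my $in = <$r>;
close $r;

# The character count and the byte count on disk differ, because the
# non-ASCII characters are stored as multi-byte sequences.
my $size = -s $path;
printf "%d characters, %d bytes, round trip %s\n",
    length($in), $size, ($in eq $out ? "ok" : "FAILED");
```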
3d897973 |
146 | =item :bytes |
c1a61b17 |
147 | |
9ec269cb |
148 | This is the inverse of the C<:utf8> layer. It turns off the flag |
c1a61b17 |
149 | on the layer below so that data read from it is considered to |
9ec269cb |
150 | be "octets" i.e. characters in the range 0..255 only. Likewise |
c1a61b17 |
151 | on output perl will warn if a "wide" character is written |
152 | to such a stream. |
153 | |
3d897973 |
154 | =item :raw |
7d3b96bb |
155 | |
0226bbdb |
156 | The C<:raw> layer is I<defined> as being identical to calling |
9ec269cb |
157 | C<binmode($fh)> - the stream is made suitable for passing binary data, |
18aba96f |
158 | i.e. each byte is passed as-is. The stream will still be |
3d897973 |
159 | buffered. |
160 | |
161 | In Perl 5.6 and some books the C<:raw> layer (previously sometimes also |
162 | referred to as a "discipline") is documented as the inverse of the |
163 | C<:crlf> layer. That is no longer the case - other layers which would |
9ec269cb |
164 | alter the binary nature of the stream are also disabled. If you want UNIX |
3d897973 |
165 | line endings on a platform that normally does CRLF translation, but still |
9ec269cb |
166 | want UTF-8 or encoding defaults, the appropriate thing to do is to add |
167 | C<:perlio> to the PERLIO environment variable. |
1cbfc93d |
168 | |
0226bbdb |
169 | The implementation of C<:raw> is as a pseudo-layer which when "pushed" |
170 | pops itself and then any layers which do not declare themselves as suitable |
171 | for binary data. (Undoing :utf8 and :crlf is implemented by clearing |
39f7a870 |
172 | flags rather than popping layers but that is an implementation detail.) |
01e6739c |
173 | |
9ec269cb |
174 | As a consequence of the fact that C<:raw> normally pops layers, |
39f7a870 |
175 | it usually only makes sense to have it as the only or first element in |
176 | a layer specification. When used as the first element it provides |
0226bbdb |
177 | a known base on which to build e.g. |
7d3b96bb |
178 | |
0226bbdb |
179 | open($fh,":raw:utf8",...) |
7d3b96bb |
180 | |
0226bbdb |
181 | will construct a "binary" stream, but then enable UTF-8 translation. |
b3d30bf7 |
182 | |
3d897973 |
183 | =item :pop |
4ec2216f |
184 | |
185 | A pseudo layer that removes the top-most layer. Gives perl code |
186 | a way to manipulate the layer stack. Should be considered |
187 | as experimental. Note that C<:pop> only works on real layers |
188 | and will not undo the effects of pseudo layers like C<:utf8>. |
189 | An example of a possible use might be: |
190 | |
191 | open($fh,...) |
192 | ... |
193 | binmode($fh,":encoding(...)"); # next chunk is encoded |
194 | ... |
3c4b39be |
195 | binmode($fh,":pop"); # back to un-encoded |
4ec2216f |
196 | |
197 | A more elegant (and safer) interface is needed. |
198 | |
3d897973 |
199 | =item :win32 |
200 | |
9ec269cb |
201 | On Win32 platforms this I<experimental> layer uses the native "handle" IO |
202 | rather than the unix-like numeric file descriptor layer. Known to be |
3d897973 |
203 | buggy as of perl 5.8.2. |
204 | |
7d3b96bb |
205 | =back |
206 | |
39f7a870 |
207 | =head2 Custom Layers |
208 | |
209 | It is possible to write custom layers in addition to the above builtin |
210 | ones, both in C/XS and Perl. Two such layers (and one example written |
211 | in Perl using the latter) come with the Perl distribution. |
212 | |
213 | =over 4 |
214 | |
215 | =item :encoding |
216 | |
217 | Use C<:encoding(ENCODING)> either in open() or binmode() to install |
9ec269cb |
218 | a layer that transparently does character set and encoding transformations, |
e76300d6 |
219 | for example from Shift-JIS to Unicode. Note that under C<stdio> |
220 | an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding> |
221 | for more information. |
39f7a870 |
222 | |
223 | =item :via |
224 | |
225 | Use C<:via(MODULE)> either in open() or binmode() to install a layer |
226 | that does whatever transformation (for example compression / |
227 | decompression, encryption / decryption) to the filehandle. |
228 | See L<PerlIO::via> for more information. |
229 | |
230 | =back |
231 | |
01e6739c |
232 | =head2 Alternatives to raw |
233 | |
0226bbdb |
234 | To get a binary stream an alternate method is to use: |
01e6739c |
235 | |
0226bbdb |
236 | open($fh,"whatever"); |
01e6739c |
237 | binmode($fh); |
238 | |
9ec269cb |
239 | This has the advantage of being backward compatible with how such things have |
01e6739c |
240 | had to be coded on some platforms for years. |
01e6739c |
241 | |
9ec269cb |
242 | To get an unbuffered stream specify an unbuffered layer (e.g. C<:unix>) |
0226bbdb |
243 | in the open call: |
01e6739c |
244 | |
245 | open($fh,"<:unix",$path) |
246 | |
7d3b96bb |
247 | =head2 Defaults and how to override them |
248 | |
ec28694c |
249 | If the platform is MS-DOS like and normally does CRLF to "\n" |
250 | translation for text files then the default layers are: |
7d3b96bb |
251 | |
252 | unix crlf |
253 | |
47bfe92f |
254 | (The low level "unix" layer may be replaced by a platform specific low |
255 | level layer.) |
7d3b96bb |
256 | |
9ec269cb |
257 | Otherwise if C<Configure> found out how to do "fast" IO using the system's |
046e4a6a |
258 | stdio, then the default layers are: |
7d3b96bb |
259 | |
260 | unix stdio |
261 | |
262 | Otherwise the default layers are: |
263 | |
264 | unix perlio |
265 | |
266 | These defaults may change once perlio has been better tested and tuned. |
267 | |
47bfe92f |
268 | The default can be overridden by setting the environment variable |
39f7a870 |
269 | PERLIO to a space separated list of layers (C<unix> or platform low |
270 | level layer is always pushed first). |
47bfe92f |
271 | |
7d3b96bb |
272 | This can be used to see the effects of (or bugs in) the various layers, e.g. |
273 | |
274 | cd .../perl/t |
275 | PERLIO=stdio ./perl harness |
276 | PERLIO=perlio ./perl harness |
277 | |
9ec269cb |
278 | For the various values of PERLIO see L<perlrun/PERLIO>. |
3b0db4f9 |
279 | |
4c11337c |
280 | =head2 Querying the layers of filehandles |
39f7a870 |
281 | |
282 | The following returns the B<names> of the PerlIO layers on a filehandle. |
283 | |
9d569fce |
284 | my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH". |
39f7a870 |
285 | |
286 | The layers are returned in the order an open() or binmode() call would |
f0fd62e2 |
287 | use them. Note that the "default stack" depends on the operating |
cc83745d |
288 | system and on the Perl version, and both the compile-time and |
289 | runtime configurations of Perl. |
79d9a4d7 |
290 | |
79d9a4d7 |
291 | The following table summarizes the default layers on UNIX-like and |
9ec269cb |
292 | DOS-like platforms and depending on the setting of C<$ENV{PERLIO}>: |
79d9a4d7 |
293 | |
f0fd62e2 |
294 | PERLIO     UNIX-like                DOS-like |
a7845df8 |
295 | ------     ---------                -------- |
f0fd62e2 |
296 | unset / "" unix perlio / stdio [1]  unix crlf |
297 | stdio      unix perlio / stdio [1]  stdio |
298 | perlio     unix perlio              unix perlio |
299 | mmap       unix mmap                unix mmap |
39f7a870 |
300 | |
f0fd62e2 |
301 | # [1] "stdio" if Configure found out how to do "fast stdio" (depends |
302 | # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio" |
046e4a6a |
303 | |
9ec269cb |
304 | By default the layers from the input side of the filehandle are |
305 | returned; to get the output side, use the optional C<output> argument: |
39f7a870 |
306 | |
2ae85e59 |
307 | my @layers = PerlIO::get_layers($fh, output => 1); |
39f7a870 |
308 | |
309 | (Usually the layers are identical on either side of a filehandle but |
2ae85e59 |
310 | for example with sockets there may be differences, or if you have |
311 | been using the C<open> pragma.) |
39f7a870 |
312 | |
92a3e63c |
313 | There is no set_layers(), nor does get_layers() return a tied array |
314 | mirroring the stack, or anything fancy like that. This is not |
315 | accidental or unintentional. The PerlIO layer stack is a bit more |
316 | complicated than just a stack (see for example the behaviour of C<:raw>). |
317 | You are supposed to use open() and binmode() to manipulate the stack. |
318 | |
39f7a870 |
319 | B<Implementation details follow, please close your eyes.> |
320 | |
9ec269cb |
321 | The arguments to layers are by default returned in parentheses after |
39f7a870 |
322 | the name of the layer, and certain layers (like C<utf8>) are not real |
9ec269cb |
323 | layers but instead flags on real layers; to get all of these returned |
324 | separately, use the optional C<details> argument: |
39f7a870 |
325 | |
2ae85e59 |
326 | my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1); |
39f7a870 |
327 | |
328 | The result will be up to three times the number of layers: |
329 | the first element will be a name, the second element the arguments |
330 | (unspecified arguments will be C<undef>), the third element the flags, |
331 | the fourth element a name again, and so forth. |
332 | |
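The flat list can be walked three elements at a time. The sketch below
assumes a temporary file opened with C<:utf8> and tests each layer's
flags against C<PerlIO::F_UTF8>; the printed layer names depend on the
platform and on how perl was built, so no output is shown.

```perl
use strict;
use warnings;
use PerlIO;                       # loads PerlIO.pm, which defines F_UTF8
use File::Temp qw(tempfile);

# Scratch file; removed automatically when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open(my $fh, "<:utf8", $path) or die "open: $!";

my @names   = PerlIO::get_layers($fh);
my @details = PerlIO::get_layers($fh, details => 1);
close $fh;

print "layers: @names\n";

# Each layer contributes a (name, arguments, flags) triple.
for (my $i = 0; $i < @details; $i += 3) {
    my ($name, $args, $flags) = @details[$i .. $i + 2];
    printf "%-8s args=%s utf8=%s\n",
        $name,
        defined $args ? $args : "undef",
        ($flags & PerlIO::F_UTF8) ? "yes" : "no";
}
```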
333 | B<You may open your eyes now.> |
334 | |
7d3b96bb |
335 | =head1 AUTHOR |
336 | |
337 | Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt> |
338 | |
339 | =head1 SEE ALSO |
340 | |
39f7a870 |
341 | L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>, |
342 | L<Encode> |
7d3b96bb |
343 | |
344 | =cut |