Commit | Line | Data |
50b80e25 |
1 | =head1 NAME |
2 | |
3 | perliol - C API for Perl's implementation of IO in Layers. |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | /* Defining a layer ... */ |
8 | #include <perliol.h> |
9 | |
50b80e25 |
10 | =head1 DESCRIPTION |
11 | |
9d799145 |
12 | This document describes the behavior and implementation of the PerlIO |
13 | abstraction described in L<perlapio> when C<USE_PERLIO> is defined (and |
14 | C<USE_SFIO> is not). |
50b80e25 |
15 | |
16 | =head2 History and Background |
17 | |
9d799145 |
18 | The PerlIO abstraction was introduced in perl5.003_02 but languished as |
19 | just an abstraction until perl5.7.0. However, during that time a number |
d1be9408 |
20 | of perl extensions switched to using it, so the API is mostly fixed to |
9d799145 |
21 | maintain (source) compatibility. |
50b80e25 |
22 | |
9d799145 |
23 | The aim of the implementation is to provide the PerlIO API in a flexible |
24 | and platform neutral manner. It is also a trial of an "Object Oriented |
25 | C, with vtables" approach which may be applied to perl6. |
50b80e25 |
26 | |
cc83745d |
27 | =head2 Basic Structure |
28 | |
cc7ef057 |
29 | PerlIO is a stack of layers. |
cc83745d |
30 | |
31 | The low levels of the stack work with the low-level operating system |
32 | calls (file descriptors in C), getting bytes in and out; the higher |
cc7ef057 |
33 | layers of the stack buffer, filter, and otherwise manipulate the I/O, |
34 | and return characters (or bytes) to Perl. The terms I<above> and I<below> |
35 | are used to refer to the relative positioning of the stack layers. |
cc83745d |
36 | |
37 | A layer contains a "vtable", the table of I/O operations (at C level |
38 | a table of function pointers), and status flags. The functions in the |
39 | vtable implement operations like "open", "read", and "write". |
40 | |
41 | When I/O, for example "read", is requested, the request goes from Perl |
42 | first down the stack using "read" functions of each layer, then at the |
43 | bottom the input is requested from the operating system services, then |
44 | the result is returned up the stack, finally being interpreted as Perl |
45 | data. |
46 | |
cc7ef057 |
47 | Requests do not always go all the way down to the |
48 | operating system: that's where PerlIO buffering comes into play. |
49 | |
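The effect of buffering on the request flow can be illustrated with a toy model. The sketch below is not PerlIO code: C<ToySource>, C<ToyBuffer> and their functions are hypothetical names invented here, with C<ToySource> standing in for the operating system and counting how often it is actually asked for data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the operating system: a fixed byte
   source that counts how many times it is called. */
typedef struct {
    const char *data;
    size_t      len, pos;
    int         calls;
} ToySource;

size_t toy_source_read(ToySource *s, char *dst, size_t n)
{
    size_t left = s->len - s->pos;
    if (n > left)
        n = left;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    s->calls++;
    return n;
}

/* Hypothetical buffering layer sitting above the source. */
typedef struct {
    ToySource *below;
    char       buf[8];
    size_t     ptr, end;   /* read position and end of valid data */
} ToyBuffer;

size_t toy_buffer_read(ToyBuffer *b, char *dst, size_t n)
{
    size_t done = 0;
    while (done < n) {
        if (b->ptr == b->end) {
            /* Buffer exhausted: only now does the request go down. */
            b->end = toy_source_read(b->below, b->buf, sizeof b->buf);
            b->ptr = 0;
            if (b->end == 0)
                break;     /* end of input */
        }
        dst[done++] = b->buf[b->ptr++];
    }
    return done;
}
```

With an 8-byte buffer, two successive 2-byte reads reach the source only once; the second is served entirely from the buffer.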
cc83745d |
50 | When you do an open() and specify extra PerlIO layers to be deployed, |
51 | the layers you specify are "pushed" on top of the already existing |
cc7ef057 |
52 | default stack. One way to see it is that "operating system is |
53 | on the left" and "Perl is on the right". |
54 | |
55 | What exact layers are in this default stack depends on a lot of |
56 | things: your operating system, Perl version, Perl compile time |
57 | configuration, and Perl runtime configuration. See L<PerlIO>, |
cc83745d |
58 | L<perlrun/PERLIO>, and L<open> for more information. |
59 | |
60 | binmode() operates similarly to open(): by default the specified |
61 | layers are pushed on top of the existing stack. |
62 | |
63 | However, note that even as the specified layers are "pushed on top" |
64 | for open() and binmode(), this doesn't mean that the effects are |
65 | limited to the "top": PerlIO layers can be very 'active' and inspect |
66 | and affect layers also deeper in the stack. As an example there |
67 | is a layer called "raw" which repeatedly "pops" layers until |
68 | it reaches the first layer that has declared itself capable of |
69 | handling binary data. The "pushed" layers are processed in left-to-right |
70 | order. |
71 | |
72 | sysopen() operates (unsurprisingly) at a lower level in the stack than |
73 | open(). For example in UNIX or UNIX-like systems sysopen() operates |
74 | directly at the level of file descriptors: in the terms of PerlIO |
75 | layers, it uses only the "unix" layer, which is a rather thin wrapper |
76 | on top of the UNIX file descriptors. |
77 | |
50b80e25 |
78 | =head2 Layers vs Disciplines |
79 | |
9d799145 |
80 | Initial discussion of the ability to modify IO streams' behaviour used |
81 | the term "discipline" for the entities which were added. This came (I |
82 | believe) from the use of the term in "sfio", which in turn borrowed it |
83 | from "line disciplines" on Unix terminals. However, this document (and |
84 | the C code) uses the term "layer". |
85 | |
1d11c889 |
86 | This is, I hope, a natural term given the implementation, and should |
87 | avoid connotations that are inherent in earlier uses of "discipline" |
88 | for things which are rather different. |
50b80e25 |
89 | |
90 | =head2 Data Structures |
91 | |
92 | The basic data structure is a PerlIOl: |
93 | |
94 | typedef struct _PerlIO PerlIOl; |
95 | typedef struct _PerlIO_funcs PerlIO_funcs; |
96 | typedef PerlIOl *PerlIO; |
97 | |
98 | struct _PerlIO |
99 | { |
100 | PerlIOl * next; /* Lower layer */ |
101 | PerlIO_funcs * tab; /* Functions for this layer */ |
102 | IV flags; /* Various flags for state */ |
103 | }; |
104 | |
1d11c889 |
105 | A C<PerlIOl *> is a pointer to the struct, and the I<application> |
106 | level C<PerlIO *> is a pointer to a C<PerlIOl *> - i.e. a pointer |
107 | to a pointer to the struct. This allows the application level C<PerlIO *> |
108 | to remain constant while the actual C<PerlIOl *> underneath |
109 | changes. (Compare perl's C<SV *> which remains constant while its |
110 | C<sv_any> field changes as the scalar's type changes.) An IO stream is |
111 | then in general represented as a pointer to this linked-list of |
112 | "layers". |
50b80e25 |
113 | |
9d799145 |
114 | It should be noted that because of the double indirection in a C<PerlIO *>, |
d4165bde |
115 | a C<< &(perlio->next) >> "is" a C<PerlIO *>, and so to some degree |
11e1c8f2 |
116 | at least one layer can use the "standard" API on the next layer down. |
50b80e25 |
117 | |
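The double indirection can be tried out in isolation. The following is a minimal sketch using simplified stand-in types (C<DemoLayer>, C<DemoIO> and the C<demo_*> functions are hypothetical and are not the definitions in F<perliol.h>). It shows the handle staying constant while the top layer changes, and the address of a C<next> member serving as a handle for the layers below.

```c
#include <stddef.h>

/* Simplified stand-ins; the real PerlIOl also has tab and flags. */
typedef struct demo_layer DemoLayer;
struct demo_layer {
    DemoLayer  *next;   /* lower layer */
    const char *name;   /* stands in for the function table */
};
typedef DemoLayer *DemoIO;   /* plays the role of PerlIO */

/* Push a layer on top of the stream that 'f' refers to. */
void demo_push(DemoIO *f, DemoLayer *l)
{
    l->next = *f;
    *f = l;
}

/* Pop the top layer; returns it, or NULL if the stack is empty. */
DemoLayer *demo_pop(DemoIO *f)
{
    DemoLayer *top = *f;
    if (top != NULL) {
        *f = top->next;
        top->next = NULL;
    }
    return top;
}

/* The address of the top layer's 'next' member is itself a handle
   for the stream formed by the layers below it. */
DemoIO *demo_below(DemoIO *f)
{
    return (*f != NULL) ? &(*f)->next : NULL;
}
```

The caller keeps the same C<DemoIO *> across pushes and pops; only the pointer stored in the slot it refers to changes.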
118 | A "layer" is composed of two parts: |
119 | |
120 | =over 4 |
121 | |
210b36aa |
122 | =item 1. |
50b80e25 |
123 | |
210b36aa |
124 | The functions and attributes of the "layer class". |
125 | |
126 | =item 2. |
127 | |
128 | The per-instance data for a particular handle. |
50b80e25 |
129 | |
130 | =back |
131 | |
132 | =head2 Functions and Attributes |
133 | |
9d799145 |
134 | The functions and attributes are accessed via the "tab" (for table) |
135 | member of C<PerlIOl>. The functions (methods of the layer "class") are |
136 | fixed, and are defined by the C<PerlIO_funcs> type. They are broadly the |
137 | same as the public C<PerlIO_xxxxx> functions: |
50b80e25 |
138 | |
b76cc8ba |
139 | struct _PerlIO_funcs |
140 | { |
2dc2558e |
141 | Size_t fsize; |
b76cc8ba |
142 | char * name; |
143 | Size_t size; |
144 | IV kind; |
2dc2558e |
145 | IV (*Pushed)(pTHX_ PerlIO *f,const char *mode,SV *arg, PerlIO_funcs *tab); |
d4165bde |
146 | IV (*Popped)(pTHX_ PerlIO *f); |
b76cc8ba |
147 | PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab, |
148 | AV *layers, IV n, |
149 | const char *mode, |
150 | int fd, int imode, int perm, |
151 | PerlIO *old, |
152 | int narg, SV **args); |
86e05cf2 |
153 | IV (*Binmode)(pTHX_ PerlIO *f); |
d4165bde |
154 | SV * (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags); |
155 | IV (*Fileno)(pTHX_ PerlIO *f); |
156 | PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags); |
b76cc8ba |
157 | /* Unix-like functions - cf sfio line disciplines */ |
d4165bde |
158 | SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count); |
159 | SSize_t (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count); |
160 | SSize_t (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count); |
161 | IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence); |
162 | Off_t (*Tell)(pTHX_ PerlIO *f); |
163 | IV (*Close)(pTHX_ PerlIO *f); |
b76cc8ba |
164 | /* Stdio-like buffered IO functions */ |
d4165bde |
165 | IV (*Flush)(pTHX_ PerlIO *f); |
166 | IV (*Fill)(pTHX_ PerlIO *f); |
167 | IV (*Eof)(pTHX_ PerlIO *f); |
168 | IV (*Error)(pTHX_ PerlIO *f); |
169 | void (*Clearerr)(pTHX_ PerlIO *f); |
170 | void (*Setlinebuf)(pTHX_ PerlIO *f); |
b76cc8ba |
171 | /* Perl's snooping functions */ |
d4165bde |
172 | STDCHAR * (*Get_base)(pTHX_ PerlIO *f); |
173 | Size_t (*Get_bufsiz)(pTHX_ PerlIO *f); |
174 | STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f); |
175 | SSize_t (*Get_cnt)(pTHX_ PerlIO *f); |
176 | void (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt); |
b76cc8ba |
177 | }; |
178 | |
2dc2558e |
179 | The first few members of the struct give a function table size for a |
180 | compatibility check, the "name" of the layer, the size to C<malloc> for the per-instance data, |
181 | and some flags which are attributes of the class as a whole (such as whether it is a buffering |
9d799145 |
182 | layer). The functions follow, and fall into four basic groups: |
50b80e25 |
183 | |
184 | =over 4 |
185 | |
aa500c9e |
186 | =item 1. |
50b80e25 |
187 | |
aa500c9e |
188 | Opening and setup functions |
50b80e25 |
189 | |
aa500c9e |
190 | =item 2. |
50b80e25 |
191 | |
aa500c9e |
192 | Basic IO operations |
193 | |
194 | =item 3. |
195 | |
196 | Stdio class buffering options. |
197 | |
198 | =item 4. |
199 | |
200 | Functions to support Perl's traditional "fast" access to the buffer. |
50b80e25 |
201 | |
202 | =back |
203 | |
1d11c889 |
204 | A layer does not have to implement all the functions, but the whole |
205 | table has to be present. Unimplemented slots can be NULL (which will |
206 | result in an error when called) or can be filled in with stubs to |
207 | "inherit" behaviour from a "base class". This "inheritance" is fixed |
208 | for all instances of the layer, but as the layer chooses which stubs |
209 | to populate the table, limited "multiple inheritance" is possible. |
50b80e25 |
210 | |
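The "whole table present, some slots NULL or inherited" idea can be sketched with a cut-down, two-slot table. Everything named C<Mini*> or C<mini_*> below is hypothetical and exists only for illustration. The dispatchers follow the rule stated above, treating a NULL slot as an error.

```c
#include <stddef.h>

typedef struct mini_layer MiniLayer;

/* Hypothetical cut-down function table with only two slots. */
typedef struct {
    const char *name;
    int (*Flush)(MiniLayer *l);
    int (*Fill)(MiniLayer *l);
} MiniFuncs;

struct mini_layer {
    MiniLayer       *next;
    const MiniFuncs *tab;
    int              flushed;   /* counts calls, for demonstration */
};

/* A "base class" stub that any layer may place in its table. */
int MiniBase_flush(MiniLayer *l)
{
    l->flushed++;
    return 0;
}

/* Dispatchers: a NULL slot is reported as an error (-1). */
int mini_flush(MiniLayer *l)
{
    if (l == NULL || l->tab == NULL || l->tab->Flush == NULL)
        return -1;
    return l->tab->Flush(l);
}

int mini_fill(MiniLayer *l)
{
    if (l == NULL || l->tab == NULL || l->tab->Fill == NULL)
        return -1;
    return l->tab->Fill(l);
}

/* This layer "inherits" Flush from the base and leaves Fill empty. */
const MiniFuncs MiniDemo_funcs = { "demo", MiniBase_flush, NULL };
```

Because the choice of stubs is made once, when the table is written, every instance of the layer shares the same "inheritance".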
211 | =head2 Per-instance Data |
212 | |
1d11c889 |
213 | The per-instance data are held in memory beyond the basic PerlIOl |
214 | struct, by making a PerlIOl the first member of the layer's struct |
215 | thus: |
50b80e25 |
216 | |
217 | typedef struct |
218 | { |
219 | struct _PerlIO base; /* Base "class" info */ |
220 | STDCHAR * buf; /* Start of buffer */ |
221 | STDCHAR * end; /* End of valid part of buffer */ |
222 | STDCHAR * ptr; /* Current position in buffer */ |
223 | Off_t posn; /* Offset of buf into the file */ |
224 | Size_t bufsiz; /* Real size of buffer */ |
225 | IV oneword; /* Emergency buffer */ |
226 | } PerlIOBuf; |
227 | |
1d11c889 |
228 | In this way (as for perl's scalars) a pointer to a PerlIOBuf can be |
229 | treated as a pointer to a PerlIOl. |
50b80e25 |
230 | |
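The pointer conversion relies on the base struct being the first member. A minimal sketch of the technique, with hypothetical simplified types (C<BaseLayer> and C<BufLayer> are not the real C<PerlIOl> and C<PerlIOBuf>):

```c
#include <stddef.h>

/* Simplified base struct; the real PerlIOl also carries tab. */
typedef struct base_layer {
    struct base_layer *next;
    long               flags;
} BaseLayer;

/* A derived layer: the base struct must be the first member. */
typedef struct {
    BaseLayer base;
    char     *buf;
    size_t    bufsiz;
} BufLayer;

/* Generic code sees only the base part. */
void base_set_flag(BaseLayer *l, long bit)
{
    l->flags |= bit;
}

/* Layer-specific code converts back to reach its per-instance data.
   Valid only when 'l' really points at the base of a BufLayer. */
size_t buf_capacity(BaseLayer *l)
{
    return ((BufLayer *)l)->bufsiz;
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, which is what makes both conversions well defined.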
231 | =head2 Layers in action. |
232 | |
233 |              table           perlio          unix |
234 |          |           | |
235 |          +-----------+    +----------+    +--------+ |
236 | PerlIO ->|           |--->|  next    |--->|  NULL  | |
237 |          +-----------+    +----------+    +--------+ |
238 |          |           |    |  buffer  |    |   fd   | |
239 |          +-----------+    |          |    +--------+ |
240 |          |           |    +----------+ |
241 | |
242 | |
243 | The above attempts to show how the layer scheme works in a simple case. |
9d799145 |
244 | The application's C<PerlIO *> points to an entry in the table(s) |
245 | representing open (allocated) handles. For example the first three slots |
246 | in the table correspond to C<stdin>, C<stdout> and C<stderr>. The table |
247 | in turn points to the current "top" layer for the handle - in this case |
248 | an instance of the generic buffering layer "perlio". That layer in turn |
249 | points to the next layer down - in this case the low-level "unix" layer. |
50b80e25 |
250 | |
9d799145 |
251 | The above is roughly equivalent to a "stdio" buffered stream, but with |
252 | much more flexibility: |
50b80e25 |
253 | |
254 | =over 4 |
255 | |
256 | =item * |
257 | |
9d799145 |
258 | If Unix level C<read>/C<write>/C<lseek> is not appropriate for (say) |
259 | sockets then the "unix" layer can be replaced (at open time or even |
260 | dynamically) with a "socket" layer. |
50b80e25 |
261 | |
262 | =item * |
263 | |
1d11c889 |
264 | Different handles can have different buffering schemes. The "top" |
265 | layer could be the "mmap" layer if reading disk files was quicker |
266 | using C<mmap> than C<read>. An "unbuffered" stream can be implemented |
267 | simply by not having a buffer layer. |
50b80e25 |
268 | |
269 | =item * |
270 | |
271 | Extra layers can be inserted to process the data as it flows through. |
9d799145 |
272 | This was the driving need for including the scheme in perl 5.7.0+ - we |
d1be9408 |
273 | needed a mechanism to allow data to be translated between perl's |
9d799145 |
274 | internal encoding (conceptually at least Unicode as UTF-8), and the |
275 | "native" format used by the system. This is provided by the |
276 | ":encoding(xxxx)" layer which typically sits above the buffering layer. |
50b80e25 |
277 | |
278 | =item * |
279 | |
1d11c889 |
280 | A layer can be added that does "\n" to CRLF translation. This layer |
281 | can be used on any platform, not just those that normally do such |
282 | things. |
50b80e25 |
283 | |
284 | =back |
285 | |
286 | =head2 Per-instance flag bits |
287 | |
1d11c889 |
288 | The generic flag bits are a hybrid of C<O_XXXXX> style flags deduced |
289 | from the mode string passed to C<PerlIO_open()>, and state bits for |
290 | typical buffer layers. |
50b80e25 |
291 | |
9d799145 |
292 | =over 4 |
50b80e25 |
293 | |
294 | =item PERLIO_F_EOF |
295 | |
296 | End of file. |
297 | |
298 | =item PERLIO_F_CANWRITE |
299 | |
3039a93d |
300 | Writes are permitted, i.e. opened as "w" or "r+" or "a", etc. |
50b80e25 |
301 | |
302 | =item PERLIO_F_CANREAD |
303 | |
3039a93d |
304 | Reads are permitted, i.e. opened "r" or "w+" (or even "a+" - ick). |
50b80e25 |
305 | |
306 | =item PERLIO_F_ERROR |
307 | |
d4165bde |
308 | An error has occurred (for C<PerlIO_error()>). |
50b80e25 |
309 | |
310 | =item PERLIO_F_TRUNCATE |
311 | |
312 | Truncate file suggested by open mode. |
313 | |
314 | =item PERLIO_F_APPEND |
315 | |
316 | All writes should be appends. |
317 | |
318 | =item PERLIO_F_CRLF |
319 | |
11e1c8f2 |
320 | Layer is performing Win32-like "\n" mapped to CR,LF for output and CR,LF |
321 | mapped to "\n" for input. Normally the provided "crlf" layer is the only |
322 | layer that need bother about this. C<PerlIO_binmode()> will mess with this |
9d799145 |
323 | flag rather than add/remove layers if the C<PERLIO_K_CANCRLF> bit is set |
324 | for the layer's class. |
50b80e25 |
325 | |
326 | =item PERLIO_F_UTF8 |
327 | |
3039a93d |
328 | Data written to this layer should be UTF-8 encoded; data provided |
50b80e25 |
329 | by this layer should be considered UTF-8 encoded. Can be set on any layer |
330 | by the ":utf8" dummy layer. Also set on the ":encoding" layer. |
331 | |
332 | =item PERLIO_F_UNBUF |
333 | |
334 | Layer is unbuffered - i.e. write to next layer down should occur for |
335 | each write to this layer. |
336 | |
337 | =item PERLIO_F_WRBUF |
338 | |
339 | The buffer for this layer currently holds data written to it but not sent |
340 | to next layer. |
341 | |
342 | =item PERLIO_F_RDBUF |
343 | |
344 | The buffer for this layer currently holds unconsumed data read from |
345 | layer below. |
346 | |
347 | =item PERLIO_F_LINEBUF |
348 | |
9d799145 |
349 | Layer is line buffered. Write data should be passed to next layer down |
350 | whenever a "\n" is seen. Any data beyond the "\n" should then be |
351 | processed. |
50b80e25 |
352 | |
353 | =item PERLIO_F_TEMP |
354 | |
9d799145 |
355 | File has been C<unlink()>ed, or should be deleted on C<close()>. |
50b80e25 |
356 | |
357 | =item PERLIO_F_OPEN |
358 | |
359 | Handle is open. |
360 | |
361 | =item PERLIO_F_FASTGETS |
362 | |
9d799145 |
363 | This instance of this layer supports the "fast C<gets>" interface. |
364 | Normally set based on C<PERLIO_K_FASTGETS> for the class and by the |
d1be9408 |
365 | existence of the function(s) in the table. However a class that |
50b80e25 |
366 | normally provides that interface may need to avoid it on a |
367 | particular instance. The "pending" layer needs to do this when |
d1be9408 |
368 | it is pushed above a layer which does not support the interface. |
9d799145 |
369 | (Perl's C<sv_gets()> does not expect the stream's fast C<gets> behaviour |
50b80e25 |
370 | to change during one "get".) |
371 | |
372 | =back |
373 | |
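How an open mode maps onto the read/write/truncate/append bits can be sketched as follows. The C<DEMO_F_*> constants and C<demo_mode_flags()> are hypothetical; the bit values are arbitrary and are not the real C<PERLIO_F_XXXXX> values. In perl itself this conversion is the job of C<PerlIOBase_pushed()>.

```c
#include <stddef.h>

/* Hypothetical bit values, chosen only for this sketch. */
enum {
    DEMO_F_CANWRITE = 1,
    DEMO_F_CANREAD  = 2,
    DEMO_F_TRUNCATE = 4,
    DEMO_F_APPEND   = 8
};

/* Derive flag bits from an fopen()-like mode string.
   Returns -1 if the mode does not start with r, w or a
   (after an optional 'I' or '#' prefix). */
long demo_mode_flags(const char *mode)
{
    long flags;

    if (mode == NULL)
        return -1;
    if (*mode == 'I' || *mode == '#')
        mode++;                       /* skip optional prefix */
    switch (*mode++) {
    case 'r': flags = DEMO_F_CANREAD;                    break;
    case 'w': flags = DEMO_F_CANWRITE | DEMO_F_TRUNCATE; break;
    case 'a': flags = DEMO_F_CANWRITE | DEMO_F_APPEND;   break;
    default:  return -1;
    }
    if (*mode == '+')                 /* '+' permits both directions */
        flags |= DEMO_F_CANREAD | DEMO_F_CANWRITE;
    return flags;
}
```

For example, C<"a+"> yields the append, write and read bits together, matching the descriptions of C<PERLIO_F_CANREAD> and C<PERLIO_F_CANWRITE> above.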
374 | =head2 Methods in Detail |
375 | |
376 | =over 4 |
377 | |
e2d9456f |
378 | =item fsize |
2dc2558e |
379 | |
380 | Size_t fsize; |
381 | |
a489db4d |
382 | Size of the function table. This is compared against the value PerlIO |
383 | code "knows" as a compatibility check. Future versions I<may> be able |
384 | to tolerate layers compiled against an old version of the headers. |
2dc2558e |
385 | |
5cb3728c |
386 | =item name |
387 | |
388 | char * name; |
d4165bde |
389 | |
390 | The name of the layer whose open() method Perl should invoke on |
391 | open(). For example if the layer is called APR, you will call: |
392 | |
393 | open $fh, ">:APR", ... |
394 | |
395 | and Perl knows that it has to invoke the PerlIOAPR_open() method |
396 | implemented by the APR layer. |
397 | |
5cb3728c |
398 | =item size |
399 | |
400 | Size_t size; |
d4165bde |
401 | |
402 | The size of the per-instance data structure, e.g.: |
403 | |
404 | sizeof(PerlIOAPR) |
405 | |
a489db4d |
406 | If this field is zero then C<PerlIO_pushed> does not malloc anything |
407 | and assumes layer's Pushed function will do any required layer stack |
408 | manipulation - used to avoid malloc/free overhead for dummy layers. |
2dc2558e |
409 | If the field is non-zero it must be at least the size of C<PerlIOl>, |
410 | C<PerlIO_pushed> will allocate memory for the layer's data structures |
411 | and link new layer onto the stream's stack. (If the layer's Pushed |
412 | method returns an error indication the layer is popped again.) |
413 | |
5cb3728c |
414 | =item kind |
415 | |
416 | IV kind; |
d4165bde |
417 | |
d4165bde |
418 | =over 4 |
419 | |
420 | =item * PERLIO_K_BUFFERED |
421 | |
86e05cf2 |
422 | The layer is buffered. |
423 | |
424 | =item * PERLIO_K_RAW |
425 | |
426 | The layer is acceptable to have in a binmode(FH) stack - i.e. it does not |
427 | (or will configure itself not to) transform bytes passing through it. |
428 | |
d4165bde |
429 | =item * PERLIO_K_CANCRLF |
430 | |
86e05cf2 |
431 | Layer can translate between "\n" and CRLF line ends. |
432 | |
d4165bde |
433 | =item * PERLIO_K_FASTGETS |
434 | |
86e05cf2 |
435 | Layer allows buffer snooping. |
436 | |
d4165bde |
437 | =item * PERLIO_K_MULTIARG |
438 | |
439 | Used when the layer's open() accepts more arguments than usual. The |
440 | extra arguments should not come before the C<MODE> argument. When this |
441 | flag is used it's up to the layer to validate the args. |
442 | |
d4165bde |
443 | =back |
444 | |
5cb3728c |
445 | =item Pushed |
446 | |
447 | IV (*Pushed)(pTHX_ PerlIO *f,const char *mode, SV *arg, PerlIO_funcs *tab); |
50b80e25 |
448 | |
1d11c889 |
449 | The only absolutely mandatory method. Called when the layer is pushed |
450 | onto the stack. The C<mode> argument may be NULL if this occurs |
451 | post-open. The C<arg> will be non-C<NULL> if an argument string was |
452 | passed. In most cases this should call C<PerlIOBase_pushed()> to |
453 | convert C<mode> into the appropriate C<PERLIO_F_XXXXX> flags in |
454 | addition to any actions the layer itself takes. If a layer is not |
455 | expecting an argument it need neither save the one passed to it, nor |
456 | provide C<Getarg()> (it could perhaps C<Perl_warn> that the argument |
457 | was un-expected). |
50b80e25 |
458 | |
d4165bde |
459 | Returns 0 on success. On failure returns -1 and should set errno. |
460 | |
5cb3728c |
461 | =item Popped |
462 | |
463 | IV (*Popped)(pTHX_ PerlIO *f); |
50b80e25 |
464 | |
1d11c889 |
465 | Called when the layer is popped from the stack. A layer will normally |
466 | be popped after C<Close()> is called. But a layer can be popped |
467 | without being closed if the program is dynamically managing layers on |
468 | the stream. In such cases C<Popped()> should free any resources |
469 | (buffers, translation tables, ...) not held directly in the layer's |
470 | struct. It should also C<Unread()> any unconsumed data that has been |
471 | read and buffered from the layer below back to that layer, so that it |
472 | can be re-provided to whatever is now above. |
b76cc8ba |
473 | |
3077d0b1 |
474 | Returns 0 on success and failure. If C<Popped()> returns I<true> then |
475 | I<perlio.c> assumes that either the layer has popped itself, or the |
476 | layer is super special and needs to be retained for other reasons. |
477 | In most cases it should return I<false>. |
d4165bde |
478 | |
5cb3728c |
479 | =item Open |
480 | |
481 | PerlIO * (*Open)(...); |
b76cc8ba |
482 | |
1d11c889 |
483 | The C<Open()> method has lots of arguments because it combines the |
484 | functions of perl's C<open>, C<PerlIO_open>, perl's C<sysopen>, |
485 | C<PerlIO_fdopen> and C<PerlIO_reopen>. The full prototype is as |
486 | follows: |
b76cc8ba |
487 | |
488 | PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab, |
489 | AV *layers, IV n, |
490 | const char *mode, |
491 | int fd, int imode, int perm, |
492 | PerlIO *old, |
493 | int narg, SV **args); |
494 | |
1d11c889 |
495 | Open should (perhaps indirectly) call C<PerlIO_allocate()> to allocate |
496 | a slot in the table and associate it with the layers information for |
497 | the opened file, by calling C<PerlIO_push>. The I<layers> AV is an |
498 | array of all the layers destined for the C<PerlIO *>, and any |
499 | arguments passed to them, I<n> is the index into that array of the |
500 | layer being called. The macro C<PerlIOArg> will return a (possibly |
501 | C<NULL>) SV * for the argument passed to the layer. |
502 | |
503 | The I<mode> string is an "C<fopen()>-like" string which would match |
504 | the regular expression C</^[I#]?[rwa]\+?[bt]?$/>. |
505 | |
506 | The C<'I'> prefix is used during creation of C<stdin>..C<stderr> via |
507 | special C<PerlIO_fdopen> calls; the C<'#'> prefix means that this is |
508 | C<sysopen> and that I<imode> and I<perm> should be passed to |
509 | C<PerlLIO_open3>; C<'r'> means B<r>ead, C<'w'> means B<w>rite and |
510 | C<'a'> means B<a>ppend. The C<'+'> suffix means that both reading and |
a489db4d |
511 | writing/appending are permitted. The C<'b'> suffix means file should |
512 | be binary, and C<'t'> means it is text. (Almost all layers should do |
513 | the IO in binary mode, and ignore the b/t bits. The C<:crlf> layer |
514 | should be pushed to handle the distinction.) |
1d11c889 |
515 | |
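A layer that wants to sanity-check the mode it is handed could do so without a regular expression engine. The function below is a hypothetical helper, not part of the PerlIO API. It accepts exactly the strings described by the pattern above, with the simplification that a trailing newline is not tolerated.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if 'mode' has the form [I#]?[rwa]\+?[bt]? and 0 otherwise. */
int mode_is_valid(const char *mode)
{
    if (mode == NULL)
        return 0;
    if (*mode == 'I' || *mode == '#')
        mode++;                      /* optional prefix */
    if (*mode == '\0' || strchr("rwa", *mode) == NULL)
        return 0;                    /* exactly one of r, w, a is required */
    mode++;
    if (*mode == '+')
        mode++;                      /* optional read/write suffix */
    if (*mode == 'b' || *mode == 't')
        mode++;                      /* optional binary/text suffix */
    return *mode == '\0';            /* nothing may follow */
}
```

Note that the order of the suffixes matters: C<"r+b"> is accepted, but C<"rb+"> is not.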
516 | If I<old> is not C<NULL> then this is a C<PerlIO_reopen>. Perl itself |
517 | does not use this (yet?) and semantics are a little vague. |
518 | |
519 | If I<fd> is not negative then it is the numeric file descriptor I<fd>, |
520 | which will be open in a manner compatible with the supplied mode |
521 | string; the call is thus equivalent to C<PerlIO_fdopen>. In this case |
522 | I<narg> will be zero. |
523 | |
524 | If I<narg> is greater than zero then it gives the number of arguments |
525 | passed to C<open>, otherwise it will be 1 if for example |
526 | C<PerlIO_open> was called. In simple cases SvPV_nolen(*args) is the |
527 | pathname to open. |
528 | |
529 | Having said all that translation-only layers do not need to provide |
530 | C<Open()> at all, but rather leave the opening to a lower level layer |
531 | and wait to be "pushed". If a layer does provide C<Open()> it should |
532 | normally call the C<Open()> method of next layer down (if any) and |
533 | then push itself on top if that succeeds. |
b76cc8ba |
534 | |
3077d0b1 |
535 | If C<PerlIO_push> was performed and the open then fails, the layer must |
536 | C<PerlIO_pop> itself; otherwise it will be left on the stack |
537 | and may cause serious problems. |
538 | |
d4165bde |
539 | Returns C<NULL> on failure. |
540 | |
86e05cf2 |
541 | =item Binmode |
542 | |
543 | IV (*Binmode)(pTHX_ PerlIO *f); |
544 | |
545 | Optional. Used when C<:raw> layer is pushed (explicitly or as a result |
546 | of binmode(FH)). If not present layer will be popped. If present |
547 | should configure layer as binary (or pop itself) and return 0. |
548 | If it returns -1 for error C<binmode> will fail with layer |
549 | still on the stack. |
550 | |
5cb3728c |
551 | =item Getarg |
552 | |
553 | SV * (*Getarg)(pTHX_ PerlIO *f, |
554 | CLONE_PARAMS *param, int flags); |
b76cc8ba |
555 | |
d4165bde |
556 | Optional. If present should return an SV * representing the string |
557 | argument passed to the layer when it was |
558 | pushed. e.g. ":encoding(ascii)" would return an SvPV with value |
559 | "ascii". (I<param> and I<flags> arguments can be ignored in most |
560 | cases) |
b76cc8ba |
561 | |
5cb3728c |
562 | =item Fileno |
563 | |
564 | IV (*Fileno)(pTHX_ PerlIO *f); |
b76cc8ba |
565 | |
d1be9408 |
566 | Returns the Unix/Posix numeric file descriptor for the handle. Normally |
b76cc8ba |
567 | C<PerlIOBase_fileno()> (which just asks next layer down) will suffice |
568 | for this. |
50b80e25 |
569 | |
a489db4d |
570 | Returns -1 on error, which is considered to include the case where the |
571 | layer cannot provide such a file descriptor. |
d4165bde |
572 | |
5cb3728c |
573 | =item Dup |
574 | |
575 | PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o, |
576 | CLONE_PARAMS *param, int flags); |
d4165bde |
577 | |
2dc2558e |
578 | XXX: Needs more docs. |
579 | |
a489db4d |
580 | Used as part of the "clone" process when a thread is spawned (in which |
581 | case param will be non-NULL) and when a stream is being duplicated via |
582 | '&' in the C<open>. |
d4165bde |
583 | |
584 | Similar to C<Open>, returns PerlIO* on success, C<NULL> on failure. |
585 | |
5cb3728c |
586 | =item Read |
587 | |
588 | SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count); |
d4165bde |
589 | |
590 | Basic read operation. |
50b80e25 |
591 | |
d4165bde |
592 | Typically will call C<Fill> and manipulate pointers (possibly via the |
593 | API). C<PerlIOBuf_read()> may be suitable for derived classes which |
594 | provide "fast gets" methods. |
50b80e25 |
595 | |
d4165bde |
596 | Returns actual bytes read, or -1 on an error. |
597 | |
5cb3728c |
598 | =item Unread |
599 | |
600 | SSize_t (*Unread)(pTHX_ PerlIO *f, |
601 | const void *vbuf, Size_t count); |
50b80e25 |
602 | |
9d799145 |
603 | A superset of stdio's C<ungetc()>. Should arrange for future reads to |
604 | see the bytes in C<vbuf>. If there is no obviously better implementation |
605 | then C<PerlIOBase_unread()> provides the function by pushing a "fake" |
606 | "pending" layer above the calling layer. |
50b80e25 |
607 | |
d4165bde |
608 | Returns the number of unread chars. |
609 | |
5cb3728c |
610 | =item Write |
611 | |
612 | SSize_t (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count); |
50b80e25 |
613 | |
d4165bde |
614 | Basic write operation. |
50b80e25 |
615 | |
d4165bde |
616 | Returns bytes written or -1 on an error. |
617 | |
5cb3728c |
618 | =item Seek |
619 | |
620 | IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence); |
50b80e25 |
621 | |
1d11c889 |
622 | Position the file pointer. Should normally call its own C<Flush> |
623 | method and then the C<Seek> method of next layer down. |
50b80e25 |
624 | |
d4165bde |
625 | Returns 0 on success, -1 on failure. |
626 | |
5cb3728c |
627 | =item Tell |
628 | |
629 | Off_t (*Tell)(pTHX_ PerlIO *f); |
50b80e25 |
630 | |
9d799145 |
631 | Return the file pointer. May be based on the layer's cached concept of |
632 | position to avoid overhead. |
50b80e25 |
633 | |
d4165bde |
634 | Returns -1 on failure to get the file pointer. |
635 | |
5cb3728c |
636 | =item Close |
637 | |
638 | IV (*Close)(pTHX_ PerlIO *f); |
50b80e25 |
639 | |
9d799145 |
640 | Close the stream. Should normally call C<PerlIOBase_close()> to flush |
641 | itself and close layers below, and then deallocate any data structures |
642 | (buffers, translation tables, ...) not held directly in the data |
643 | structure. |
50b80e25 |
644 | |
d4165bde |
645 | Returns 0 on success, -1 on failure. |
646 | |
5cb3728c |
647 | =item Flush |
648 | |
649 | IV (*Flush)(pTHX_ PerlIO *f); |
50b80e25 |
650 | |
9d799145 |
651 | Should make stream's state consistent with layers below. That is, any |
652 | buffered write data should be written, and file position of lower layers |
d1be9408 |
653 | adjusted for data read from below but not actually consumed. |
b76cc8ba |
654 | (Should perhaps C<Unread()> such data to the lower layer.) |
50b80e25 |
655 | |
d4165bde |
656 | Returns 0 on success, -1 on failure. |
657 | |
5cb3728c |
658 | =item Fill |
659 | |
660 | IV (*Fill)(pTHX_ PerlIO *f); |
d4165bde |
661 | |
662 | The buffer for this layer should be filled (for read) from layer |
663 | below. When you "subclass" PerlIOBuf layer, you want to use its |
664 | I<_read> method and to supply your own fill method, which fills the |
665 | PerlIOBuf's buffer. |
50b80e25 |
666 | |
d4165bde |
667 | Returns 0 on success, -1 on failure. |
50b80e25 |
668 | |
5cb3728c |
669 | =item Eof |
670 | |
671 | IV (*Eof)(pTHX_ PerlIO *f); |
50b80e25 |
672 | |
9d799145 |
673 | Return end-of-file indicator. C<PerlIOBase_eof()> is normally sufficient. |
50b80e25 |
674 | |
d4165bde |
675 | Returns 1 on end-of-file, 0 if not at end-of-file, -1 on error. |
676 | |
5cb3728c |
677 | =item Error |
678 | |
679 | IV (*Error)(pTHX_ PerlIO *f); |
50b80e25 |
680 | |
9d799145 |
681 | Return error indicator. C<PerlIOBase_error()> is normally sufficient. |
50b80e25 |
682 | |
d4165bde |
683 | Returns 1 if there is an error (usually when C<PERLIO_F_ERROR> is set), |
684 | 0 otherwise. |
685 | |
5cb3728c |
686 | =item Clearerr |
687 | |
688 | void (*Clearerr)(pTHX_ PerlIO *f); |
50b80e25 |
689 | |
9d799145 |
690 | Clear end-of-file and error indicators. Should call C<PerlIOBase_clearerr()> |
691 | to set the C<PERLIO_F_XXXXX> flags, which may suffice. |
50b80e25 |
692 | |
5cb3728c |
693 | =item Setlinebuf |
694 | |
695 | void (*Setlinebuf)(pTHX_ PerlIO *f); |
50b80e25 |
696 | |
b76cc8ba |
697 | Mark the stream as line buffered. C<PerlIOBase_setlinebuf()> sets the |
698 | PERLIO_F_LINEBUF flag and is normally sufficient. |
50b80e25 |
699 | |
5cb3728c |
700 | =item Get_base |
701 | |
702 | STDCHAR * (*Get_base)(pTHX_ PerlIO *f); |
50b80e25 |
703 | |
704 | Allocate (if not already done so) the read buffer for this layer and |
d4165bde |
705 | return pointer to it. Return NULL on failure. |
50b80e25 |
706 | |
5cb3728c |
707 | =item Get_bufsiz |
708 | |
709 | Size_t (*Get_bufsiz)(pTHX_ PerlIO *f); |
50b80e25 |
710 | |
9d799145 |
711 | Return the number of bytes that last C<Fill()> put in the buffer. |
50b80e25 |
712 | |
5cb3728c |
713 | =item Get_ptr |
714 | |
715 | STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f); |
50b80e25 |
716 | |
3039a93d |
717 | Return the current read pointer relative to this layer's buffer. |
50b80e25 |
718 | |
5cb3728c |
719 | =item Get_cnt |
720 | |
721 | SSize_t (*Get_cnt)(pTHX_ PerlIO *f); |
50b80e25 |
722 | |
723 | Return the number of bytes left to be read in the current buffer. |
724 | |
5cb3728c |
725 | =item Set_ptrcnt |
726 | |
727 | void (*Set_ptrcnt)(pTHX_ PerlIO *f, |
728 | STDCHAR *ptr, SSize_t cnt); |
50b80e25 |
729 | |
730 | Adjust the read pointer and count of bytes to match C<ptr> and/or C<cnt>. |
731 | The application (or layer above) must ensure they are consistent. |
732 | (Checking is allowed by the paranoid.) |
733 | |
734 | =back |
735 | |
210e727c |
736 | =head2 Implementing PerlIO Layers |
737 | |
2535a4f7 |
738 | If you find the implementation document unclear or not sufficient, |
739 | look at the existing perlio layer implementations, which include: |
740 | |
741 | =over |
742 | |
743 | =item * C implementations |
744 | |
eae154c7 |
745 | The F<perlio.c> and F<perliol.h> in the Perl core implement the |
746 | "unix", "perlio", "stdio", "crlf", "utf8", "byte", "raw", "pending" |
747 | layers, and also the "mmap" and "win32" layers if applicable. |
748 | (The "win32" is currently unfinished and unused, to see what is used |
749 | instead in Win32, see L<PerlIO/"Querying the layers of filehandles">.) |
750 | |
2535a4f7 |
751 | PerlIO::encoding, PerlIO::scalar, PerlIO::via in the Perl core. |
752 | |
753 | PerlIO::gzip and APR::PerlIO (mod_perl 2.0) on CPAN. |
754 | |
755 | =item * Perl implementations |
756 | |
757 | PerlIO::via::QuotedPrint in the Perl core and PerlIO::via::* on CPAN. |
758 | |
759 | =back |
760 | |
210e727c |
761 | If you are creating a PerlIO layer, you may want to be lazy, in other |
762 | words, implement only the methods that interest you. The other methods |
763 | you can either replace with the "blank" methods |
764 | |
765 | PerlIOBase_noop_ok |
766 | PerlIOBase_noop_fail |
767 | |
768 | (which do nothing, and return zero and -1, respectively) or for |
769 | certain methods you may assume a default behaviour by using a NULL |
61bdadae |
770 | method. The Open method looks for help in the 'parent' layer. |
771 | The following table summarizes the behaviour: |
210e727c |
772 | |
773 | method behaviour with NULL |
774 | |
775 | Clearerr PerlIOBase_clearerr |
776 | Close PerlIOBase_close |
61bdadae |
777 | Dup PerlIOBase_dup |
210e727c |
778 | Eof PerlIOBase_eof |
779 | Error PerlIOBase_error |
780 | Fileno PerlIOBase_fileno |
781 | Fill FAILURE |
782 | Flush SUCCESS |
61bdadae |
783 | Getarg SUCCESS |
210e727c |
784 | Get_base FAILURE |
785 | Get_bufsiz FAILURE |
786 | Get_cnt FAILURE |
787 | Get_ptr FAILURE |
61bdadae |
788 | Open INHERITED |
789 | Popped SUCCESS |
790 | Pushed SUCCESS |
210e727c |
791 | Read PerlIOBase_read |
792 | Seek FAILURE |
793 | Set_cnt FAILURE |
794 | Set_ptrcnt FAILURE |
795 | Setlinebuf PerlIOBase_setlinebuf |
796 | Tell FAILURE |
797 | Unread PerlIOBase_unread |
798 | Write FAILURE |
50b80e25 |
799 | |
61bdadae |
800 | FAILURE Set errno (to EINVAL in UNIXish, to LIB$_INVARG in VMS) and |
801 | return -1 (for numeric return values) or NULL (for pointers) |
802 | INHERITED Inherited from the layer below |
803 | SUCCESS Return 0 (for numeric return values) or a pointer |
804 | |
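The NULL-method convention in the table above can be illustrated with a toy
dispatcher. This is a minimal sketch under stated assumptions, not the real
dispatch code: C<ToyFuncs>, C<ToyLayer> and the C<toy_*> functions are
hypothetical names, only two slots are modelled (C<Flush>, whose NULL
behaviour is SUCCESS, and C<Tell>, whose NULL behaviour is FAILURE), and
C<EINVAL> is used as on UNIXish systems.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy vtable with two slots; NOT the real PerlIO_funcs. */
typedef struct ToyLayer ToyLayer;
typedef struct {
    const char *name;
    int  (*Flush)(ToyLayer *l);
    long (*Tell)(ToyLayer *l);
} ToyFuncs;

struct ToyLayer {
    const ToyFuncs *tab;  /* the layer's function table */
    long pos;             /* toy per-layer state */
};

/* A NULL Flush slot counts as SUCCESS: return 0. */
static int toy_flush(ToyLayer *l)
{
    assert(l && l->tab);
    if (l->tab->Flush)
        return l->tab->Flush(l);
    return 0;
}

/* A NULL Tell slot counts as FAILURE: set errno and return -1. */
static long toy_tell(ToyLayer *l)
{
    assert(l && l->tab);
    if (l->tab->Tell)
        return l->tab->Tell(l);
    errno = EINVAL;
    return -1;
}

/* One real method, for a layer that does implement Tell. */
static long counting_tell(ToyLayer *l)
{
    return l->pos;
}

/* A "lazy" layer leaves both slots NULL; a fuller one supplies Tell. */
static const ToyFuncs lazy_tab = { "lazy", NULL, NULL };
static const ToyFuncs full_tab = { "full", NULL, counting_tell };
```

With a layer built on C<lazy_tab>, C<toy_flush> returns 0 while C<toy_tell>
returns -1 and sets C<errno> to C<EINVAL>; with C<full_tab> the same
C<toy_tell> call reaches C<counting_tell> instead.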
50b80e25 |
805 | =head2 Core Layers |
806 | |
807 | The file C<perlio.c> provides the following layers: |
808 | |
809 | =over 4 |
810 | |
811 | =item "unix" |
812 | |
9d799145 |
813 | A basic unbuffered layer which calls Unix/POSIX C<read()>, C<write()>, |
814 | C<lseek()>, C<close()>. Even on platforms that distinguish between |
815 | C<O_TEXT> and C<O_BINARY> this layer is always C<O_BINARY>. |
50b80e25 |
816 | |
817 | =item "perlio" |
818 | |
9d799145 |
819 | A very complete generic buffering layer which provides the whole of the |
820 | PerlIO API. It is also intended to be used as a "base class" for other |
1d11c889 |
821 | layers. (For example its C<Read()> method is implemented in terms of |
822 | the C<Get_cnt()>/C<Get_ptr()>/C<Set_ptrcnt()> methods). |
50b80e25 |
823 | |
9d799145 |
824 | "perlio" over "unix" provides a complete replacement for stdio as seen |
825 | via the PerlIO API. This is the default for USE_PERLIO on systems whose |
1d11c889 |
826 | stdio does not permit perl's "fast gets" access and which do not |
827 | distinguish between C<O_TEXT> and C<O_BINARY>. |
50b80e25 |
828 | |
829 | =item "stdio" |
830 | |
9d799145 |
831 | A layer which provides the PerlIO API via the layer scheme, but |
832 | implements it by calling the system's stdio. This is (currently) the |
833 | default on systems whose stdio provides sufficient access to allow perl's |
834 | "fast gets" access and which do not distinguish between C<O_TEXT> and C<O_BINARY>. |
50b80e25 |
835 | |
836 | =item "crlf" |
837 | |
9d799145 |
838 | A layer derived using "perlio" as a base class. It provides Win32-like |
839 | "\n" to CR,LF translation. It can either be applied above "perlio" or serve |
840 | as the buffer layer itself. "crlf" over "unix" is the default if the system |
841 | distinguishes between C<O_TEXT> and C<O_BINARY> opens. (At some point |
842 | "unix" will be replaced by a "native" Win32 IO layer on that platform, |
843 | as Win32's read/write layer has various drawbacks.) The "crlf" layer is |
844 | a reasonable model for a layer which transforms data in some way. |
50b80e25 |
845 | |
846 | =item "mmap" |
847 | |
9d799145 |
848 | If Configure detects C<mmap()> functions this layer is provided (with |
849 | "perlio" as a "base") which does "read" operations by mmap()ing the |
850 | file. Performance improvement is marginal on modern systems, so it is |
851 | mainly there as a proof of concept. It is likely to be unbundled from |
852 | the core at some point. The "mmap" layer is a reasonable model for a |
853 | minimalist "derived" layer. |
50b80e25 |
854 | |
855 | =item "pending" |
856 | |
9d799145 |
857 | An "internal" derivative of "perlio" which can be used to provide an |
1d11c889 |
858 | C<Unread()> function for layers which have no buffer or cannot be |
859 | bothered. (Basically this layer's C<Fill()> pops itself off the stack |
860 | and so resumes reading from the layer below.) |
50b80e25 |
861 | |
862 | =item "raw" |
863 | |
9d799145 |
864 | A dummy layer which never exists on the layer stack. Instead, when |
86e05cf2 |
865 | "pushed" it actually pops the stack, removing itself. It then calls the |
866 | Binmode function table entry on all the layers in the stack. Normally |
867 | this (via C<PerlIOBase_binmode>) removes any layers which do not have the |
868 | C<PERLIO_K_RAW> bit set. Layers can modify that behaviour by defining |
869 | their own Binmode entry. |
50b80e25 |
870 | |
871 | =item "utf8" |
872 | |
9d799145 |
873 | Another dummy layer. When pushed it pops itself and sets the |
1d11c889 |
874 | C<PERLIO_F_UTF8> flag on the layer which was (and now is once more) |
875 | the top of the stack. |
50b80e25 |
876 | |
877 | =back |
878 | |
9d799145 |
879 | In addition F<perlio.c> also provides a number of C<PerlIOBase_xxxx()> |
880 | functions which are intended to be used in the table slots of classes |
881 | which do not need to do anything special for a particular method. |
50b80e25 |
882 | |
883 | =head2 Extension Layers |
884 | |
1d11c889 |
885 | Layers can be made available by extension modules. When an unknown layer |
886 | is encountered the PerlIO code will perform the equivalent of: |
b76cc8ba |
887 | |
888 | use PerlIO 'layer'; |
889 | |
1d11c889 |
890 | Where I<layer> is the unknown layer. F<PerlIO.pm> will then attempt to: |
b76cc8ba |
891 | |
892 | require PerlIO::layer; |
893 | |
1d11c889 |
894 | If after that process the layer is still not defined then the C<open> |
895 | will fail. |
b76cc8ba |
896 | |
897 | The following extension layers are bundled with perl: |
50b80e25 |
898 | |
899 | =over 4 |
900 | |
b76cc8ba |
901 | =item ":encoding" |
50b80e25 |
902 | |
903 | use Encoding; |
904 | |
1d11c889 |
905 | makes this layer available, although F<PerlIO.pm> "knows" where to |
906 | find it. It is an example of a layer which takes an argument as it is |
907 | called thus: |
50b80e25 |
908 | |
b31b80f9 |
909 | open( $fh, "<:encoding(iso-8859-7)", $pathname ); |
50b80e25 |
910 | |
385e1f9f |
911 | =item ":scalar" |
b76cc8ba |
912 | |
b31b80f9 |
913 | Provides support for reading data from and writing data to a scalar. |
b76cc8ba |
914 | |
385e1f9f |
915 | open( $fh, "+<:scalar", \$scalar ); |
50b80e25 |
916 | |
1d11c889 |
917 | When a handle is so opened, then reads get bytes from the string value |
918 | of I<$scalar>, and writes change the value. In both cases the position |
919 | in I<$scalar> starts as zero but can be altered via C<seek>, and |
920 | determined via C<tell>. |
b76cc8ba |
921 | |
385e1f9f |
922 | Please note that this layer is implied when calling open() thus: |
923 | |
924 | open( $fh, "+<", \$scalar ); |
925 | |
926 | =item ":via" |
b76cc8ba |
927 | |
4f7853f4 |
928 | Provided to allow layers to be implemented as Perl code. For instance: |
929 | |
e934609f |
930 | use PerlIO::via::StripHTML; |
385e1f9f |
931 | open( my $fh, "<:via(StripHTML)", "index.html" ); |
4f7853f4 |
932 | |
e934609f |
933 | See L<PerlIO::via> for details. |
b76cc8ba |
934 | |
935 | =back |
50b80e25 |
936 | |
d4165bde |
937 | =head1 TODO |
938 | |
939 | Things that need to be done to improve this document. |
940 | |
941 | =over |
942 | |
943 | =item * |
944 | |
945 | Explain how to make a valid fh without going through open() (i.e. how to |
946 | apply a layer). For example, if the file is not opened through perl, but |
947 | we want to get back a fh as if it had been opened by Perl. |
948 | |
949 | How does PerlIO_apply_layera fit in, where are its docs, and was it made public? |
950 | |
951 | Currently the example could be something like this: |
952 | |
953 | PerlIO *foo_to_PerlIO(pTHX_ char *mode, ...) |
954 | { |
955 | /* mode is "w", "r", etc. */ |
956 | const char *layers = ":APR"; /* the layer name */ |
957 | PerlIO *f = PerlIO_allocate(aTHX); |
958 | if (!f) { |
959 | return NULL; |
960 | } |
961 | |
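962 | /* PerlIO_apply_layers() returns 0 on success, so test its */ |
963 | /* result rather than f, which is already known to be non-NULL */ |
964 | if (PerlIO_apply_layers(aTHX_ f, mode, layers) == 0) { |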
965 | PerlIOAPR *st = PerlIOSelf(f, PerlIOAPR); |
966 | /* fill in the st struct, as in _open() */ |
967 | st->file = file; |
968 | PerlIOBase(f)->flags |= PERLIO_F_OPEN; |
969 | |
970 | return f; |
971 | } |
972 | return NULL; |
973 | } |
974 | |
975 | =item * |
976 | |
977 | fix/add the documentation in places marked as XXX. |
978 | |
979 | =item * |
980 | |
981 | The handling of errors by the layer is not specified, e.g. when $! |
982 | should be set explicitly and when the error handling should just be |
983 | delegated to the top layer. |
984 | |
985 | Probably give some hints on using SETERRNO() or pointers to where they |
986 | can be found. |
987 | |
988 | =item * |
989 | |
990 | Concrete examples would make the API easier to |
991 | understand. Of course the API reference has to be concise, |
992 | but since there is no second document that is more of a |
993 | guide, it would be easier to start with this document if it |
994 | included examples in the places where things are unclear |
995 | to a person who is not a PerlIO guru (yet). |
996 | |
997 | =back |
998 | |
50b80e25 |
999 | =cut |