Commit | Line | Data |
50b80e25 |
1 | |
2 | =head1 NAME |
3 | |
4 | perliol - C API for Perl's implementation of IO in Layers. |
5 | |
6 | =head1 SYNOPSIS |
7 | |
8 | /* Defining a layer ... */ |
9 | #include <perliol.h> |
10 | |
11 | |
12 | =head1 DESCRIPTION |
13 | |
14 | This document describes the behavior and implementation of the PerlIO abstraction |
15 | described in L<perlapio> when C<USE_PERLIO> is defined (and C<USE_SFIO> is not). |
16 | |
17 | =head2 History and Background |
18 | |
19 | The PerlIO abstraction was introduced in perl5.003_02 but languished as just |
20 | an abstraction until perl5.7.0. However, during that time a number of perl extensions |
21 | switched to using it, so the API is mostly fixed to maintain (source) compatibility. |
22 | |
23 | The aim of the implementation is to provide the PerlIO API in a flexible and |
24 | platform neutral manner. It is also a trial of an "Object Oriented C, with vtables" |
25 | approach which may be applied to perl6. |
26 | |
27 | =head2 Layers vs Disciplines |
28 | |
29 | Initial discussion of the ability to modify IO streams' behaviour used the term |
30 | "discipline" for the entities which were added. This came (I believe) from the use |
31 | of the term in "sfio", which in turn borrowed it from "line disciplines" on Unix |
32 | terminals. However, this document (and the C code) uses the term "layer". |
33 | This is, I hope, a natural term given the implementation, and should avoid connotations |
34 | that are inherent in earlier uses of "discipline" for things which are rather different. |
35 | |
36 | =head2 Data Structures |
37 | |
38 | The basic data structure is a PerlIOl: |
39 | |
40 | typedef struct _PerlIO PerlIOl; |
41 | typedef struct _PerlIO_funcs PerlIO_funcs; |
42 | typedef PerlIOl *PerlIO; |
43 | |
44 | struct _PerlIO |
45 | { |
46 | PerlIOl * next; /* Lower layer */ |
47 | PerlIO_funcs * tab; /* Functions for this layer */ |
48 | IV flags; /* Various flags for state */ |
49 | }; |
50 | |
51 | A PerlIOl * is a pointer to the struct, and the I<application> level PerlIO * |
52 | is a pointer to a PerlIOl * - i.e. a pointer to a pointer to the struct. |
53 | This allows the application level PerlIO * to remain constant while the actual |
54 | PerlIOl * underneath changes. (Compare perl's SV * which remains constant |
55 | while its sv_any field changes as the scalar's type changes.) |
56 | An IO stream is then in general represented as a pointer to this linked-list |
57 | of "layers". |
58 | |
59 | It should be noted that because of the double indirection in a PerlIO *, |
60 | a &(perlio->next) "is" a PerlIO *, and so to some degree at least |
61 | one layer can use the "standard" API on the next layer down. |
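
The double indirection can be illustrated with a small stand-alone sketch. The names used here (C<Layer>, C<Handle>, C<push_layer>, C<pop_layer>, C<next_handle>) are hypothetical stand-ins for PerlIOl and PerlIO, not the real definitions from perliol.h; the point is only that the application-level pointer stays fixed while the layer it leads to changes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for PerlIOl and PerlIO; not the real perliol.h types. */
typedef struct Layer Layer;
typedef Layer *Handle;              /* plays the role of "PerlIO" (a PerlIOl *) */

struct Layer {
    Layer      *next;               /* lower layer */
    const char *name;               /* stands in for the "tab" pointer */
};

/* Push a new layer on top of the stack headed by *f. f itself never changes. */
static int push_layer(Handle *f, const char *name)
{
    Layer *l = malloc(sizeof *l);
    if (!l)
        return -1;
    l->name = name;
    l->next = *f;                   /* old top becomes the layer below */
    *f = l;                         /* the table slot now points at the new top */
    return 0;
}

/* Pop the top layer; the slot falls back to the layer below (or NULL). */
static void pop_layer(Handle *f)
{
    Layer *l = *f;
    if (l) {
        *f = l->next;
        free(l);
    }
}

/* Because Handle is Layer *, the address of a "next" member is a Handle *. */
static Handle *next_handle(Handle *f)
{
    return &(*f)->next;
}
```

A caller would keep a single C<Handle *> (the address of a table slot) for the lifetime of the stream, and could hand C<next_handle(f)> to any routine that expects the same type in order to operate on the layer below.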
62 | |
63 | A "layer" is composed of two parts: |
64 | |
65 | =over 4 |
66 | |
67 | =item 1. The functions and attributes of the "layer class". |
68 | |
69 | =item 2. The per-instance data for a particular handle. |
70 | |
71 | =back |
72 | |
73 | =head2 Functions and Attributes |
74 | |
75 | The functions and attributes are accessed via the "tab" (for table) member of |
76 | PerlIOl. The functions (methods of the layer "class") are fixed, and are defined by the |
77 | PerlIO_funcs type. They are broadly the same as the public PerlIO_xxxxx functions: |
78 | |
79 | struct _PerlIO_funcs |
80 | { |
81 | char * name; |
82 | Size_t size; |
83 | IV kind; |
84 | IV (*Fileno)(PerlIO *f); |
85 | PerlIO * (*Fdopen)(PerlIO_funcs *tab, int fd, const char *mode); |
86 | PerlIO * (*Open)(PerlIO_funcs *tab, const char *path, const char *mode); |
87 | int (*Reopen)(const char *path, const char *mode, PerlIO *f); |
88 | IV (*Pushed)(PerlIO *f,const char *mode,const char *arg,STRLEN len); |
89 | IV (*Popped)(PerlIO *f); |
90 | /* Unix-like functions - cf sfio line disciplines */ |
91 | SSize_t (*Read)(PerlIO *f, void *vbuf, Size_t count); |
92 | SSize_t (*Unread)(PerlIO *f, const void *vbuf, Size_t count); |
93 | SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count); |
94 | IV (*Seek)(PerlIO *f, Off_t offset, int whence); |
95 | Off_t (*Tell)(PerlIO *f); |
96 | IV (*Close)(PerlIO *f); |
97 | /* Stdio-like buffered IO functions */ |
98 | IV (*Flush)(PerlIO *f); |
99 | IV (*Fill)(PerlIO *f); |
100 | IV (*Eof)(PerlIO *f); |
101 | IV (*Error)(PerlIO *f); |
102 | void (*Clearerr)(PerlIO *f); |
103 | void (*Setlinebuf)(PerlIO *f); |
104 | /* Perl's snooping functions */ |
105 | STDCHAR * (*Get_base)(PerlIO *f); |
106 | Size_t (*Get_bufsiz)(PerlIO *f); |
107 | STDCHAR * (*Get_ptr)(PerlIO *f); |
108 | SSize_t (*Get_cnt)(PerlIO *f); |
109 | void (*Set_ptrcnt)(PerlIO *f,STDCHAR *ptr,SSize_t cnt); |
110 | }; |
111 | |
112 | The first few members of the struct give a "name" for the layer, the size to C<malloc> |
113 | for the per-instance data, and some flags which are attributes of the class as a whole |
114 | (such as whether it is a buffering layer), then follow the functions which fall into |
115 | four basic groups: |
116 | |
117 | =over 4 |
118 | |
119 | =item 1. Opening and setup functions |
120 | |
121 | =item 2. Basic IO operations |
122 | |
123 | =item 3. Stdio class buffering options. |
124 | |
125 | =item 4. Functions to support Perl's traditional "fast" access to the buffer. |
126 | |
127 | =back |
128 | |
129 | A layer does not have to implement all the functions, but the whole table has |
130 | to be present. Unimplemented slots can be NULL (which will result in an error |
131 | when called) or can be filled in with stubs to "inherit" behaviour from |
132 | a "base class". This "inheritance" is fixed for all instances of the layer, |
133 | but as the layer chooses which stubs to populate the table, limited |
134 | "multiple inheritance" is possible. |
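
As a rough illustration of the "NULL slot or inherited stub" idea, the following stand-alone sketch uses a much-reduced, hypothetical function table (C<Funcs>, C<Stream>, C<base_fileno>, C<base_eof> and the dispatchers are invented for this example and are not the real PerlIO_funcs or PerlIOBase_xxxx() definitions). One class fills a slot with a "base class" stub, another leaves a slot NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Stream Stream;

/* Hypothetical, much-reduced function table; the real one is PerlIO_funcs. */
typedef struct {
    const char *name;
    int (*Fileno)(Stream *s);
    int (*Eof)(Stream *s);
} Funcs;

struct Stream {
    Stream      *next;      /* lower layer */
    const Funcs *tab;       /* functions for this layer */
    int          fd;        /* only meaningful for the lowest layer here */
    int          at_eof;
};

/* Dispatchers: a NULL slot results in an error (-1) when called. */
static int stream_fileno(Stream *s)
{
    return s->tab->Fileno ? s->tab->Fileno(s) : -1;
}

static int stream_eof(Stream *s)
{
    return s->tab->Eof ? s->tab->Eof(s) : -1;
}

/* "Base class" stubs that a layer may place in its table. */
static int base_fileno(Stream *s)
{
    /* Just ask the next layer down, if there is one. */
    return s->next ? stream_fileno(s->next) : -1;
}

static int base_eof(Stream *s)
{
    return s->at_eof;
}

/* A low-level class with its own Fileno and an inherited Eof. */
static int low_fileno(Stream *s)
{
    return s->fd;
}

static const Funcs low_funcs = { "low", low_fileno, base_eof };

/* A class that inherits Fileno and leaves Eof unimplemented. */
static const Funcs buf_funcs = { "buf", base_fileno, NULL };
```

Since each class picks its stubs slot by slot, two classes can share some inherited behaviour and differ in the rest, which is the limited "multiple inheritance" mentioned above.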
135 | |
136 | =head2 Per-instance Data |
137 | |
138 | The per-instance data are held in memory beyond the basic PerlIOl struct, |
139 | by making a PerlIOl the first member of the layer's struct thus: |
140 | |
141 | typedef struct |
142 | { |
143 | struct _PerlIO base; /* Base "class" info */ |
144 | STDCHAR * buf; /* Start of buffer */ |
145 | STDCHAR * end; /* End of valid part of buffer */ |
146 | STDCHAR * ptr; /* Current position in buffer */ |
147 | Off_t posn; /* Offset of buf into the file */ |
148 | Size_t bufsiz; /* Real size of buffer */ |
149 | IV oneword; /* Emergency buffer */ |
150 | } PerlIOBuf; |
151 | |
152 | In this way (as for perl's scalars) a pointer to a PerlIOBuf can be treated |
153 | as a pointer to a PerlIOl. |
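
A minimal sketch of this "first member" technique, using hypothetical types (C<Base>, C<BufLayer>) rather than the real PerlIOl and PerlIOBuf: C guarantees that a pointer to a struct, suitably converted, points to its first member, so the conversion works in both directions as long as the object really is the derived type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic per-layer header (PerlIOl). */
typedef struct Base {
    struct Base *next;      /* lower layer */
    long         flags;     /* state flags */
} Base;

/* Hypothetical derived layer; the header must be the first member. */
typedef struct {
    Base    base;           /* base "class" info */
    char   *buf;            /* start of buffer */
    size_t  bufsiz;         /* real size of buffer */
} BufLayer;

/* Up-cast: always safe, no pointer adjustment needed. */
static Base *as_base(BufLayer *b)
{
    return &b->base;
}

/* Down-cast: valid only if l really is the header of a BufLayer. */
static BufLayer *as_buf(Base *l)
{
    return (BufLayer *)l;
}
```

Generic code can therefore walk the C<next> chain and inspect C<flags> without knowing which layer class it is looking at, while the layer's own methods cast back to reach their private members.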
154 | |
155 | =head2 Layers in action. |
156 | |
157 |              table           perlio          unix |
158 |          |           | |
159 |          +-----------+    +----------+    +--------+ |
160 | PerlIO ->|           |--->|  next    |--->|  NULL  | |
161 |          +-----------+    +----------+    +--------+ |
162 |          |           |    |  buffer  |    |   fd   | |
163 |          +-----------+    |          |    +--------+ |
164 |          |           |    +----------+ |
165 | |
166 | |
167 | The above attempts to show how the layer scheme works in a simple case. |
168 | The application's PerlIO * points to an entry in the table(s) representing open |
169 | (allocated) handles. For example the first three slots in the table correspond |
170 | to C<stdin>, C<stdout> and C<stderr>. The table in turn points to the current |
171 | "top" layer for the handle - in this case an instance of the generic buffering |
172 | layer "perlio". That layer in turn points to the next layer down - in this |
173 | case the low-level "unix" layer. |
174 | |
175 | The above is roughly equivalent to a "stdio" buffered stream, but with much more |
176 | flexibility: |
177 | |
178 | =over 4 |
179 | |
180 | =item * |
181 | |
182 | If Unix level read/write/lseek is not appropriate for (say) sockets then |
183 | the "unix" layer can be replaced (at open time or even dynamically) with a |
184 | "socket" layer. |
185 | |
186 | =item * |
187 | |
188 | Different handles can have different buffering schemes. The "top" layer |
189 | could be the "mmap" layer if reading disk files was quicker using C<mmap> |
190 | than C<read>. An "unbuffered" stream can be implemented simply by |
191 | not having a buffer layer. |
192 | |
193 | =item * |
194 | |
195 | Extra layers can be inserted to process the data as it flows through. |
196 | This was the driving need for including the scheme in perl5.7.0+ - we needed a mechanism |
197 | to allow data to be translated between perl's internal encoding (conceptually |
198 | at least Unicode as UTF-8), and the "native" format used by the system. |
199 | This is provided by the ":encoding(xxxx)" layer which typically sits above |
200 | the buffering layer. |
201 | |
202 | =item * |
203 | |
204 | A layer can be added that does "\n" to CRLF translation. This layer can be used |
205 | on any platform, not just those that normally do such things. |
206 | |
207 | =back |
208 | |
209 | =head2 Per-instance flag bits |
210 | |
211 | The generic flag bits are a hybrid of O_XXXXX style flags deduced from |
212 | the mode string passed to PerlIO_open() and state bits for typical buffer |
213 | layers. |
214 | |
215 | =over 4 |
216 | |
217 | =item PERLIO_F_EOF |
218 | |
219 | End of file. |
220 | |
221 | =item PERLIO_F_CANWRITE |
222 | |
223 | Writes are permitted, i.e. opened as "w" or "r+" or "a", etc. |
224 | |
225 | =item PERLIO_F_CANREAD |
226 | |
227 | Reads are permitted, i.e. opened "r" or "w+" (or even "a+" - ick). |
228 | |
229 | =item PERLIO_F_ERROR |
230 | |
231 | An error has occurred (for PerlIO_error()). |
232 | |
233 | =item PERLIO_F_TRUNCATE |
234 | |
235 | Truncate file suggested by open mode. |
236 | |
237 | =item PERLIO_F_APPEND |
238 | |
239 | All writes should be appends. |
240 | |
241 | =item PERLIO_F_CRLF |
242 | |
243 | Layer is performing Win32-like "\n" => CR,LF for output and CR,LF => "\n" for |
244 | input. Normally the provided "crlf" layer is the only layer that need bother about |
245 | this. PerlIO_binmode() will mess with this flag rather than add/remove layers |
246 | if the PERLIO_K_CANCRLF bit is set for the layer's class. |
247 | |
248 | =item PERLIO_F_UTF8 |
249 | |
250 | Data written to this layer should be UTF-8 encoded; data provided |
251 | by this layer should be considered UTF-8 encoded. Can be set on any layer |
252 | by the ":utf8" dummy layer. Also set on the ":encoding" layer. |
253 | |
254 | =item PERLIO_F_UNBUF |
255 | |
256 | Layer is unbuffered - i.e. write to next layer down should occur for |
257 | each write to this layer. |
258 | |
259 | =item PERLIO_F_WRBUF |
260 | |
261 | The buffer for this layer currently holds data written to it but not sent |
262 | to next layer. |
263 | |
264 | =item PERLIO_F_RDBUF |
265 | |
266 | The buffer for this layer currently holds unconsumed data read from |
267 | layer below. |
268 | |
269 | =item PERLIO_F_LINEBUF |
270 | |
271 | Layer is line buffered. Write data should be passed to next layer down whenever a |
272 | "\n" is seen. Any data beyond the "\n" should then be processed. |
273 | |
274 | =item PERLIO_F_TEMP |
275 | |
276 | File has been unlink()ed, or should be deleted on close(). |
277 | |
278 | =item PERLIO_F_OPEN |
279 | |
280 | Handle is open. |
281 | |
282 | =item PERLIO_F_FASTGETS |
283 | |
284 | This instance of this layer supports the "fast gets" interface. |
285 | Normally set based on PERLIO_K_FASTGETS for the class and by the |
286 | existence of the function(s) in the table. However a class that |
287 | normally provides that interface may need to avoid it on a |
288 | particular instance. The "pending" layer needs to do this when |
289 | it is pushed above a layer which does not support the interface. |
290 | (Perl's sv_gets() does not expect the stream's fast gets behaviour |
291 | to change during one "get".) |
292 | |
293 | =back |
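
To make the "flags deduced from the mode string" idea concrete, here is a stand-alone sketch of such a conversion. The flag names and numeric values (C<F_CANREAD> and friends) and the function C<mode_to_flags> are hypothetical; the real PERLIO_F_XXXXX values live in perliol.h and the real conversion is done by PerlIOBase_pushed(), which this does not reproduce. In particular, this sketch simply accepts and ignores a "b" suffix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real PERLIO_F_XXXXX values are in perliol.h. */
enum {
    F_CANWRITE = 0x01,
    F_CANREAD  = 0x02,
    F_TRUNCATE = 0x04,
    F_APPEND   = 0x08
};

/* Map an fopen-style mode string to flag bits.
 * Returns -1 for a missing or unrecognised mode. */
static long mode_to_flags(const char *mode)
{
    long flags;

    if (!mode)
        return -1;

    switch (*mode++) {
    case 'r': flags = F_CANREAD;               break;
    case 'w': flags = F_CANWRITE | F_TRUNCATE; break;
    case 'a': flags = F_CANWRITE | F_APPEND;   break;
    default:  return -1;
    }

    for (; *mode; mode++) {
        if (*mode == '+')
            flags |= F_CANREAD | F_CANWRITE;   /* update mode: both directions */
        else if (*mode != 'b')
            return -1;                         /* anything else is rejected here */
    }
    return flags;
}
```

A layer's Read or Write method could then test the relevant bit before doing any work and report an error if the direction is not permitted.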
294 | |
295 | =head2 Methods in Detail |
296 | |
297 | =over 4 |
298 | |
299 | =item IV (*Fileno)(PerlIO *f); |
300 | |
301 | Returns the Unix/Posix numeric file descriptor for the handle. |
302 | Normally PerlIOBase_fileno() (which just asks next layer down) will suffice for this. |
303 | |
304 | =item PerlIO * (*Fdopen)(PerlIO_funcs *tab, int fd, const char *mode); |
305 | |
306 | Should (perhaps indirectly) call PerlIO_allocate() to allocate a slot |
307 | in the table and associate it with the given numeric file descriptor, |
308 | which will be open in a manner compatible with the supplied mode string. |
309 | |
310 | =item PerlIO * (*Open)(PerlIO_funcs *tab, const char *path, const char *mode); |
311 | |
312 | Should attempt to open the given path and if that succeeds then (perhaps indirectly) |
313 | call PerlIO_allocate() to allocate a slot in the table and associate it with the |
314 | layer's information for the opened file. |
315 | |
316 | =item int (*Reopen)(const char *path, const char *mode, PerlIO *f); |
317 | |
318 | Re-open the supplied PerlIO * to connect it to C<path> in C<mode>. Returns a success flag. |
319 | Perl does not use this and L<perlapio> marks it as subject to change. |
320 | |
321 | =item IV (*Pushed)(PerlIO *f,const char *mode,const char *arg,STRLEN len); |
322 | |
323 | Called when the layer is pushed onto the stack. The C<mode> argument may be NULL if this |
324 | occurs post-open. The C<arg> and C<len> will be present if an argument string was |
325 | passed. In most cases this should call PerlIOBase_pushed() to convert C<mode> into |
326 | the appropriate PERLIO_F_XXXXX flags in addition to any actions the layer itself takes. |
327 | |
328 | =item IV (*Popped)(PerlIO *f); |
329 | |
330 | Called when the layer is popped from the stack. A layer will normally be popped after |
331 | Close() is called. But a layer can be popped without being closed if the program |
332 | is dynamically managing layers on the stream. In such cases Popped() should free |
333 | any resources (buffers, translation tables, ...) not held directly in the layer's |
334 | struct. |
335 | |
336 | =item SSize_t (*Read)(PerlIO *f, void *vbuf, Size_t count); |
337 | |
338 | Basic read operation. Returns actual bytes read, or -1 on an error. |
339 | Typically will call Fill and manipulate pointers (possibly via the API). |
340 | PerlIOBuf_read() may be suitable for derived classes which provide "fast gets" methods. |
341 | |
342 | =item SSize_t (*Unread)(PerlIO *f, const void *vbuf, Size_t count); |
343 | |
344 | A superset of stdio's ungetc(). Should arrange for future reads to see the bytes in C<vbuf>. |
345 | If there is no obviously better implementation then PerlIOBase_unread() provides |
346 | the function by pushing a "fake" "pending" layer above the calling layer. |
347 | |
348 | =item SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count); |
349 | |
350 | Basic write operation. Returns bytes written or -1 on an error. |
351 | |
352 | =item IV (*Seek)(PerlIO *f, Off_t offset, int whence); |
353 | |
354 | Position the file pointer. Should normally call its own Flush method and |
355 | then the Seek method of next layer down. |
356 | |
357 | =item Off_t (*Tell)(PerlIO *f); |
358 | |
359 | Return the file pointer. May be based on the layer's cached concept of position to |
360 | avoid overhead. |
361 | |
362 | =item IV (*Close)(PerlIO *f); |
363 | |
364 | Close the stream. Should normally call PerlIOBase_close() to flush itself |
365 | and Close layers below and then deallocate any data structures (buffers, translation |
366 | tables, ...) not held directly in the data structure. |
367 | |
368 | =item IV (*Flush)(PerlIO *f); |
369 | |
370 | Should make the stream's state consistent with layers below. That is, any |
371 | buffered write data should be written, and file position of lower layer |
372 | adjusted for data read from below but not actually consumed. |
373 | |
374 | =item IV (*Fill)(PerlIO *f); |
375 | |
376 | The buffer for this layer should be filled (for read) from layer below. |
377 | |
378 | =item IV (*Eof)(PerlIO *f); |
379 | |
380 | Return end-of-file indicator. PerlIOBase_eof() is normally sufficient. |
381 | |
382 | =item IV (*Error)(PerlIO *f); |
383 | |
384 | Return error indicator. PerlIOBase_error() is normally sufficient. |
385 | |
386 | =item void (*Clearerr)(PerlIO *f); |
387 | |
388 | Clear end-of-file and error indicators. Should call PerlIOBase_clearerr() |
389 | to set the PERLIO_F_XXXXX flags, which may suffice. |
390 | |
391 | =item void (*Setlinebuf)(PerlIO *f); |
392 | |
393 | Mark the stream as line buffered. |
394 | |
395 | =item STDCHAR * (*Get_base)(PerlIO *f); |
396 | |
397 | Allocate (if not already done so) the read buffer for this layer and |
398 | return pointer to it. |
399 | |
400 | =item Size_t (*Get_bufsiz)(PerlIO *f); |
401 | |
402 | Return the number of bytes that last Fill() put in the buffer. |
403 | |
404 | =item STDCHAR * (*Get_ptr)(PerlIO *f); |
405 | |
406 | Return the current read pointer relative to this layer's buffer. |
407 | |
408 | =item SSize_t (*Get_cnt)(PerlIO *f); |
409 | |
410 | Return the number of bytes left to be read in the current buffer. |
411 | |
412 | =item void (*Set_ptrcnt)(PerlIO *f,STDCHAR *ptr,SSize_t cnt); |
413 | |
414 | Adjust the read pointer and count of bytes to match C<ptr> and/or C<cnt>. |
415 | The application (or layer above) must ensure they are consistent. |
416 | (Checking is allowed by the paranoid.) |
417 | |
418 | =back |
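
The way Read, Fill and the "fast gets" accessors fit together can be sketched with a tiny stand-alone buffer. Everything here (C<MiniBuf>, C<mini_read> and the lower-case accessor names) is hypothetical and only loosely modelled on the method descriptions above; it is not the PerlIOBuf implementation. The "layer below" is simulated by an in-memory string, and the buffer is deliberately only four bytes so that refills are easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature buffering layer. */
typedef struct {
    const char *src;        /* simulated layer below */
    size_t      src_len;
    size_t      src_pos;
    char        buf[4];     /* deliberately tiny buffer */
    char       *ptr;        /* current read position in buf */
    char       *end;        /* end of valid data in buf */
} MiniBuf;

static void mini_init(MiniBuf *b, const char *src, size_t len)
{
    b->src = src;
    b->src_len = len;
    b->src_pos = 0;
    b->ptr = b->end = b->buf;           /* buffer starts empty */
}

static char *get_ptr(MiniBuf *b) { return b->ptr; }
static long  get_cnt(MiniBuf *b) { return (long)(b->end - b->ptr); }

/* The caller must pass a consistent ptr/cnt pair; the paranoid check it. */
static void set_ptrcnt(MiniBuf *b, char *ptr, long cnt)
{
    assert(b->end - ptr == cnt);
    b->ptr = ptr;
}

/* Refill the buffer from the layer below; 0 on success, -1 at end of data. */
static int fill(MiniBuf *b)
{
    size_t n = b->src_len - b->src_pos;
    if (n == 0)
        return -1;
    if (n > sizeof b->buf)
        n = sizeof b->buf;
    memcpy(b->buf, b->src + b->src_pos, n);
    b->src_pos += n;
    b->ptr = b->buf;
    b->end = b->buf + n;
    return 0;
}

/* Read expressed purely in terms of fill() and the accessors above. */
static long mini_read(MiniBuf *b, void *vbuf, size_t count)
{
    char  *out = vbuf;
    size_t done = 0;

    while (done < count) {
        long avail = get_cnt(b);
        if (avail <= 0) {
            if (fill(b) != 0)
                break;                  /* nothing more below */
            avail = get_cnt(b);
        }
        size_t take = (size_t)avail;
        if (take > count - done)
            take = count - done;
        memcpy(out + done, get_ptr(b), take);
        set_ptrcnt(b, get_ptr(b) + take, avail - (long)take);
        done += take;
    }
    return (long)done;                  /* bytes actually read */
}
```

Because C<mini_read> touches the buffer only through the accessors, a derived layer that supplies its own accessors and fill routine could reuse the same read loop unchanged.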
419 | |
420 | |
421 | =head2 Core Layers |
422 | |
423 | The file C<perlio.c> provides the following layers: |
424 | |
425 | =over 4 |
426 | |
427 | =item "unix" |
428 | |
429 | A basic non-buffered layer which calls Unix/POSIX read(), write(), lseek(), close(). |
430 | No buffering. Even on platforms that distinguish between O_TEXT and O_BINARY |
431 | this layer is always O_BINARY. |
432 | |
433 | =item "perlio" |
434 | |
435 | A very complete generic buffering layer which provides the whole of PerlIO API. |
436 | It is also intended to be used as a "base class" for other layers. (For example |
437 | its Read() method is implemented in terms of the Get_cnt()/Get_ptr()/Set_ptrcnt() |
438 | methods). |
439 | |
440 | "perlio" over "unix" provides a complete replacement for stdio as seen via PerlIO API. |
441 | This is the default for USE_PERLIO on systems whose stdio does not permit perl's |
442 | "fast gets" access, and which do not distinguish between O_TEXT and O_BINARY. |
443 | |
444 | =item "stdio" |
445 | |
446 | A layer which provides the PerlIO API via the layer scheme, but implements it by calling |
447 | system's stdio. This is (currently) the default on systems whose stdio provides sufficient |
448 | access to allow perl's "fast gets" access and which do not distinguish between O_TEXT and |
449 | O_BINARY. |
450 | |
451 | =item "crlf" |
452 | |
453 | A layer derived using "perlio" as a base class. It provides Win32-like "\n" to CR,LF |
454 | translation. Can either be applied above "perlio" or serve as the buffer layer itself. |
455 | "crlf" over "unix" is the default if system distinguishes between O_TEXT and O_BINARY |
456 | opens. (At some point "unix" will be replaced by a "native" Win32 IO layer on that |
457 | platform, as Win32's read/write layer has various drawbacks.) |
458 | The "crlf" layer is a reasonable model for a layer which transforms data in some way. |
459 | |
460 | =item "mmap" |
461 | |
462 | If Configure detects C<mmap()> functions this layer is provided (with "perlio" as a |
463 | "base") which does "read" operations by mmap()ing the file. Performance improvement |
464 | is marginal on modern systems, so it is mainly there as a proof of concept. |
465 | It is likely to be unbundled from the core at some point. |
466 | The "mmap" layer is a reasonable model for a minimalist "derived" layer. |
467 | |
468 | =item "pending" |
469 | |
470 | An "internal" derivative of "perlio" which can be used to provide Unread() function |
471 | for layers which have no buffer or cannot be bothered. |
472 | (Basically this layer's Fill() pops itself off the stack and so resumes reading |
473 | from layer below.) |
474 | |
475 | =item "raw" |
476 | |
477 | A dummy layer which never exists on the layer stack. Instead when "pushed" it |
478 | actually pops the stack!, removing itself, and any other layers until it reaches |
479 | a layer with the class PERLIO_K_RAW bit set. |
480 | |
481 | =item "utf8" |
482 | |
483 | Another dummy layer. When pushed it pops itself and sets the PERLIO_F_UTF8 flag |
484 | on the layer which was (and now is once more) the top of the stack. |
485 | |
486 | =back |
487 | |
488 | In addition C<perlio.c> also provides a number of PerlIOBase_xxxx() functions |
489 | which are intended to be used in the table slots of classes which do not need |
490 | to do anything special for a particular method. |
491 | |
492 | =head2 Extension Layers |
493 | |
494 | Layers can be made available by extension modules. |
495 | |
496 | =over 4 |
497 | |
498 | =item "encoding" |
499 | |
500 | use Encoding; |
501 | |
502 | makes this layer available. It is an example of a layer which takes an argument, |
503 | as it is called as: |
504 | |
505 | open($fh,"<:encoding(iso-8859-7)",$pathname) |
506 | |
507 | =back |
508 | |
509 | |
510 | =cut |
511 | |
512 | |
513 | |