4 perliol - C API for Perl's implementation of IO in Layers.
8 /* Defining a layer ... */
14 This document describes the behavior and implementation of the PerlIO
15 abstraction described in L<perlapio> when C<USE_PERLIO> is defined (and
18 =head2 History and Background
20 The PerlIO abstraction was introduced in perl5.003_02 but languished as
21 just an abstraction until perl5.7.0. However, during that time a number
22 of perl extensions switched to using it, so the API is mostly fixed to
23 maintain (source) compatibility.
25 The aim of the implementation is to provide the PerlIO API in a flexible
26 and platform-neutral manner. It is also a trial of an "Object Oriented
27 C, with vtables" approach which may be applied to perl6.
29 =head2 Layers vs Disciplines
31 Initial discussion of the ability to modify IO streams' behaviour used
32 the term "discipline" for the entities which were added. This came (I
33 believe) from the use of the term in "sfio", which in turn borrowed it
34 from "line disciplines" on Unix terminals. However, this document (and
35 the C code) uses the term "layer".
37 This is, I hope, a natural term given the implementation, and should
38 avoid connotations that are inherent in earlier uses of "discipline"
39 for things which are rather different.
41 =head2 Data Structures
43 The basic data structure is a PerlIOl:
45 typedef struct _PerlIO PerlIOl;
46 typedef struct _PerlIO_funcs PerlIO_funcs;
47 typedef PerlIOl *PerlIO;
51 PerlIOl * next; /* Lower layer */
52 PerlIO_funcs * tab; /* Functions for this layer */
53 IV flags; /* Various flags for state */
56 A C<PerlIOl *> is a pointer to the struct, and the I<application>
57 level C<PerlIO *> is a pointer to a C<PerlIOl *> - i.e. a pointer
58 to a pointer to the struct. This allows the application level C<PerlIO *>
59 to remain constant while the actual C<PerlIOl *> underneath
60 changes. (Compare perl's C<SV *> which remains constant while its
61 C<sv_any> field changes as the scalar's type changes.) An IO stream is
62 then in general represented as a pointer to this linked-list of
65 It should be noted that because of the double indirection in a C<PerlIO *>,
66 a C<< &(perlio-E<gt>next) >> "is" a C<PerlIO *>, and so, at least to some
67 degree, one layer can use the "standard" API on the next layer down.
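The double indirection can be illustrated with a small, self-contained C sketch that does not use perl's headers. C<Layer> and C<Handle> are hypothetical stand-ins for C<PerlIOl> and C<PerlIO>, and C<push_layer()>, C<pop_layer()> and C<top_is()> are illustrative helpers, not the real C<PerlIO_push()>/C<PerlIO_pop()>. The sketch shows the application's handle staying constant while the top layer changes, and the address of a layer's C<next> member serving as a handle for the layer below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PerlIOl: one node of the layer list. */
typedef struct Layer Layer;
struct Layer {
    Layer      *next;   /* lower layer; NULL at the bottom */
    const char *name;   /* e.g. "perlio" or "unix" */
};

/* Hypothetical stand-in for PerlIO: the application holds a Handle *,
 * i.e. a pointer to the slot that holds the current top layer. */
typedef Layer *Handle;

/* Push: the new layer takes over the slot and links to the old top.
 * The slot's address, and so the application's Handle *, is unchanged. */
static void push_layer(Handle *f, Layer *l) {
    assert(f != NULL && l != NULL);
    l->next = *f;
    *f = l;
}

/* Pop: point the slot back at the layer below and return the old top. */
static Layer *pop_layer(Handle *f) {
    Layer *top = *f;
    if (top != NULL)
        *f = top->next;
    return top;
}

/* Works on any Handle *, including &layer->next, which is how one layer
 * can use the "standard" API on the layer beneath it. */
static int top_is(Handle *f, const char *name) {
    return *f != NULL && strcmp((*f)->name, name) == 0;
}
```

Because a push only rewrites the contents of the slot, a C<Handle *> the application already holds is never invalidated.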
69 A "layer" is composed of two parts:
75 The functions and attributes of the "layer class".
79 The per-instance data for a particular handle.
83 =head2 Functions and Attributes
85 The functions and attributes are accessed via the "tab" (for table)
86 member of C<PerlIOl>. The functions (methods of the layer "class") are
87 fixed, and are defined by the C<PerlIO_funcs> type. They are broadly the
88 same as the public C<PerlIO_xxxxx> functions:
95 IV (*Pushed)(PerlIO *f,const char *mode,SV *arg);
96 IV (*Popped)(PerlIO *f);
97 PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
100 int fd, int imode, int perm,
102 int narg, SV **args);
103 SV * (*Getarg)(PerlIO *f);
104 IV (*Fileno)(PerlIO *f);
105 /* Unix-like functions - cf sfio line disciplines */
106 SSize_t (*Read)(PerlIO *f, void *vbuf, Size_t count);
107 SSize_t (*Unread)(PerlIO *f, const void *vbuf, Size_t count);
108 SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count);
109 IV (*Seek)(PerlIO *f, Off_t offset, int whence);
110 Off_t (*Tell)(PerlIO *f);
111 IV (*Close)(PerlIO *f);
112 /* Stdio-like buffered IO functions */
113 IV (*Flush)(PerlIO *f);
114 IV (*Fill)(PerlIO *f);
115 IV (*Eof)(PerlIO *f);
116 IV (*Error)(PerlIO *f);
117 void (*Clearerr)(PerlIO *f);
118 void (*Setlinebuf)(PerlIO *f);
119 /* Perl's snooping functions */
120 STDCHAR * (*Get_base)(PerlIO *f);
121 Size_t (*Get_bufsiz)(PerlIO *f);
122 STDCHAR * (*Get_ptr)(PerlIO *f);
123 SSize_t (*Get_cnt)(PerlIO *f);
124 void (*Set_ptrcnt)(PerlIO *f,STDCHAR *ptr,SSize_t cnt);
129 The first few members of the struct give a "name" for the layer, the
130 size to C<malloc> for the per-instance data, and some flags which are
131 attributes of the class as a whole (such as whether it is a buffering
132 layer), then follow the functions which fall into four basic groups:
138 Opening and setup functions
146 Stdio class buffering options.
150 Functions to support Perl's traditional "fast" access to the buffer.
154 A layer does not have to implement all the functions, but the whole
155 table has to be present. Unimplemented slots can be NULL (which will
156 result in an error when called) or can be filled in with stubs to
157 "inherit" behaviour from a "base class". This "inheritance" is fixed
158 for all instances of the layer, but as the layer chooses which stubs
159 to use to populate the table, limited "multiple inheritance" is possible.
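As a rough illustration of empty slots and "inherited" stubs, here is a cut-down C model. C<Funcs>, C<Obj>, C<base_fileno()> and the C<call_xxx()> dispatchers are hypothetical; the real C<PerlIO_funcs> has many more slots, and the real base-class stubs are the C<PerlIOBase_xxxx()> functions. In this sketch a NULL slot is simply reported as -1.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj Obj;

/* Hypothetical, cut-down analogue of PerlIO_funcs: a name and two slots. */
typedef struct Funcs {
    const char *name;
    long (*Fileno)(Obj *o);
    long (*Eof)(Obj *o);
} Funcs;

/* Hypothetical analogue of a layer instance. */
struct Obj {
    Obj         *next;  /* layer below */
    const Funcs *tab;   /* class table, shared by every instance */
    int          fd;    /* only meaningful on the bottom layer */
    int          eof;   /* end-of-file indicator */
};

/* Dispatchers: an empty (NULL) slot is treated as an error, -1 here. */
static long call_fileno(Obj *o) {
    return (o != NULL && o->tab->Fileno != NULL) ? o->tab->Fileno(o) : -1;
}
static long call_eof(Obj *o) {
    return (o != NULL && o->tab->Eof != NULL) ? o->tab->Eof(o) : -1;
}

/* "Base class" stubs a layer may put in its table to inherit behaviour. */
static long base_fileno(Obj *o) { return call_fileno(o->next); } /* ask the layer below */
static long base_eof(Obj *o)    { return o->eof; }

/* The bottom layer really owns a file descriptor. */
static long bottom_fileno(Obj *o) { return o->fd; }

static const Funcs bottom_funcs = { "bottom", bottom_fileno, base_eof };
/* Inherits Fileno from the base stub and leaves Eof unimplemented. */
static const Funcs filter_funcs = { "filter", base_fileno, NULL };
```

Each class decides slot by slot whether to implement, inherit or omit a method, which is what makes the limited "multiple inheritance" possible.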
161 =head2 Per-instance Data
163 The per-instance data are held in memory beyond the basic PerlIOl
164 struct, by making a PerlIOl the first member of the layer's struct
169 struct _PerlIO base; /* Base "class" info */
170 STDCHAR * buf; /* Start of buffer */
171 STDCHAR * end; /* End of valid part of buffer */
172 STDCHAR * ptr; /* Current position in buffer */
173 Off_t posn; /* Offset of buf into the file */
174 Size_t bufsiz; /* Real size of buffer */
175 IV oneword; /* Emergency buffer */
178 In this way (as for perl's scalars) a pointer to a PerlIOBuf can be
179 treated as a pointer to a PerlIOl.
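The "first member" technique can be sketched in plain C. C<Base> and C<Buf> are hypothetical analogues of C<struct _PerlIO> and C<PerlIOBuf>. C guarantees that a pointer to a structure, suitably converted, points to its first member, so one allocation can be viewed either way. C<alloc_layer()> only mimics allocating a class-specified per-instance size; it is not a perl API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical analogue of struct _PerlIO (the per-layer header). */
typedef struct Base {
    struct Base *next;
    long         flags;
} Base;

/* Hypothetical analogue of PerlIOBuf: the header MUST come first. */
typedef struct Buf {
    Base    base;    /* base "class" info */
    char   *buf;     /* start of buffer */
    char   *ptr;     /* current position in buffer */
    char   *end;     /* end of valid part of buffer */
    size_t  bufsiz;  /* real size of buffer */
} Buf;

/* Allocate a zeroed instance of a class-specified size, as generic code
 * would when it only knows the size attribute from the class table. */
static Base *alloc_layer(size_t size) {
    assert(size >= sizeof(Base));
    return calloc(1, size);
}

/* Valid only because base is at offset 0 of Buf. */
static Buf *as_buf(Base *b) { return (Buf *)b; }
```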
181 =head2 Layers in action
185 +-----------+ +----------+ +--------+
186 PerlIO ->| |--->| next |--->| NULL |
187 +-----------+ +----------+ +--------+
188 | | | buffer | | fd |
189 +-----------+ | | +--------+
193 The above attempts to show how the layer scheme works in a simple case.
194 The application's C<PerlIO *> points to an entry in the table(s)
195 representing open (allocated) handles. For example, the first three slots
196 in the table correspond to C<stdin>, C<stdout> and C<stderr>. The table
197 in turn points to the current "top" layer for the handle - in this case
198 an instance of the generic buffering layer "perlio". That layer in turn
199 points to the next layer down - in this case the low-level "unix" layer.
201 The above is roughly equivalent to a "stdio" buffered stream, but with
202 much more flexibility:
208 If Unix-level C<read>/C<write>/C<lseek> is not appropriate for (say)
209 sockets, then the "unix" layer can be replaced (at open time or even
210 dynamically) with a "socket" layer.
214 Different handles can have different buffering schemes. The "top"
215 layer could be the "mmap" layer if reading disk files was quicker
216 using C<mmap> than C<read>. An "unbuffered" stream can be implemented
217 simply by not having a buffer layer.
221 Extra layers can be inserted to process the data as it flows through.
222 This was the driving need for including the scheme in perl 5.7.0+ - we
223 needed a mechanism to allow data to be translated between perl's
224 internal encoding (conceptually at least Unicode as UTF-8), and the
225 "native" format used by the system. This is provided by the
226 ":encoding(xxxx)" layer which typically sits above the buffering layer.
230 A layer can be added that does "\n" to CRLF translation. This layer
231 can be used on any platform, not just those that normally do such
236 =head2 Per-instance flag bits
238 The generic flag bits are a hybrid of C<O_XXXXX> style flags deduced
239 from the mode string passed to C<PerlIO_open()>, and state bits for
240 typical buffer layers.
248 =item PERLIO_F_CANWRITE
250 Writes are permitted, i.e. opened as "w" or "r+" or "a", etc.
252 =item PERLIO_F_CANREAD
254 Reads are permitted, i.e. opened as "r" or "w+" (or even "a+" - ick).
258 An error has occurred (for C<PerlIO_error()>).
260 =item PERLIO_F_TRUNCATE
262 Truncate file suggested by open mode.
264 =item PERLIO_F_APPEND
266 All writes should be appends.
270 Layer is performing Win32-like "\n" mapped to CR,LF for output and CR,LF
271 mapped to "\n" for input. Normally the provided "crlf" layer is the only
272 layer that needs to bother about this. C<PerlIO_binmode()> will mess with this
273 flag rather than add/remove layers if the C<PERLIO_K_CANCRLF> bit is set
274 for the layer's class.
278 Data written to this layer should be UTF-8 encoded; data provided
279 by this layer should be considered UTF-8 encoded. Can be set on any layer
280 by the ":utf8" dummy layer. Also set on the ":encoding" layer.
284 Layer is unbuffered - i.e. write to next layer down should occur for
285 each write to this layer.
289 The buffer for this layer currently holds data written to it but not sent
294 The buffer for this layer currently holds unconsumed data read from
297 =item PERLIO_F_LINEBUF
299 Layer is line buffered. Write data should be passed to next layer down
300 whenever a "\n" is seen. Any data beyond the "\n" should then be
305 File has been C<unlink()>ed, or should be deleted on C<close()>.
311 =item PERLIO_F_FASTGETS
313 This instance of this layer supports the "fast C<gets>" interface.
314 Normally set based on C<PERLIO_K_FASTGETS> for the class and by the
315 existence of the function(s) in the table. However a class that
316 normally provides that interface may need to avoid it on a
317 particular instance. The "pending" layer needs to do this when
318 it is pushed above a layer which does not support the interface.
319 (Perl's C<sv_gets()> does not expect the stream's fast C<gets> behaviour
320 to change during one "get".)
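A minimal sketch of the C<PERLIO_F_LINEBUF> behaviour, assuming a hypothetical fixed-size buffer (C<LineBuf>) and an in-memory C<Sink> standing in for the next layer down: whenever a "\n" is written, everything up to and including it is passed down and the remainder stays buffered. Passing down a full buffer regardless of newlines is this sketch's own choice, and none of this is perl's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for "the next layer down". */
typedef struct Sink {
    char   data[256];
    size_t len;
} Sink;

/* Hypothetical line-buffered layer instance. */
typedef struct LineBuf {
    char   buf[64];
    size_t len;     /* bytes currently buffered */
    Sink  *below;
} LineBuf;

/* Pass the first n buffered bytes down and keep the rest. */
static void pass_down(LineBuf *lb, size_t n) {
    assert(n <= lb->len && lb->below->len + n <= sizeof lb->below->data);
    memcpy(lb->below->data + lb->below->len, lb->buf, n);
    lb->below->len += n;
    memmove(lb->buf, lb->buf + n, lb->len - n);
    lb->len -= n;
}

/* Buffer the bytes; each '\n' sends everything up to and including it
 * to the layer below. A full buffer is also passed down. */
static size_t linebuf_write(LineBuf *lb, const char *p, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (lb->len == sizeof lb->buf)
            pass_down(lb, lb->len);
        lb->buf[lb->len++] = p[i];
        if (p[i] == '\n')
            pass_down(lb, lb->len);
    }
    return count;
}
```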
324 =head2 Methods in Detail
328 =item IV (*Pushed)(PerlIO *f,const char *mode, SV *arg);
330 The only absolutely mandatory method. Called when the layer is pushed
331 onto the stack. The C<mode> argument may be NULL if this occurs
332 post-open. The C<arg> will be non-C<NULL> if an argument string was
333 passed. In most cases this should call C<PerlIOBase_pushed()> to
334 convert C<mode> into the appropriate C<PERLIO_F_XXXXX> flags in
335 addition to any actions the layer itself takes. If a layer is not
336 expecting an argument it need neither save the one passed to it, nor
337 provide C<Getarg()> (it could perhaps C<Perl_warn> that the argument
340 =item IV (*Popped)(PerlIO *f);
342 Called when the layer is popped from the stack. A layer will normally
343 be popped after C<Close()> is called. But a layer can be popped
344 without being closed if the program is dynamically managing layers on
345 the stream. In such cases C<Popped()> should free any resources
346 (buffers, translation tables, ...) not held directly in the layer's
347 struct. It should also C<Unread()> any unconsumed data that has been
348 read and buffered from the layer below back to that layer, so that it
349 can be re-provided to whatever is now above.
351 =item PerlIO * (*Open)(...);
353 The C<Open()> method has lots of arguments because it combines the
354 functions of perl's C<open>, C<PerlIO_open>, perl's C<sysopen>,
355 C<PerlIO_fdopen> and C<PerlIO_reopen>. The full prototype is as
358 PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
361 int fd, int imode, int perm,
363 int narg, SV **args);
365 Open should (perhaps indirectly) call C<PerlIO_allocate()> to allocate
366 a slot in the table and associate it with the layers' information for
367 the opened file, by calling C<PerlIO_push>. The I<layers> AV is an
368 array of all the layers destined for the C<PerlIO *>, and any
369 arguments passed to them; I<n> is the index into that array of the
370 layer being called. The macro C<PerlIOArg> will return a (possibly
371 C<NULL>) SV * for the argument passed to the layer.
373 The I<mode> string is an "C<fopen()>-like" string which would match
374 the regular expression C</^[I#]?[rwa]\+?[bt]?$/>.
376 The C<'I'> prefix is used during creation of C<stdin>..C<stderr> via
377 special C<PerlIO_fdopen> calls; the C<'#'> prefix means that this is
378 C<sysopen> and that I<imode> and I<perm> should be passed to
379 C<PerlLIO_open3>; C<'r'> means B<r>ead, C<'w'> means B<w>rite and
380 C<'a'> means B<a>ppend. The C<'+'> suffix means that both reading and
381 writing/appending are permitted. The C<'b'> suffix means the file should
382 be binary, and C<'t'> means it is text. (Under PerlIO, almost all layers
383 should ignore the binary/text distinction and do binary IO; the C<:crlf>
384 layer should be pushed to handle the distinction.)
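The mode grammar above, and the way C<Pushed()> is described as converting C<mode> into C<PERLIO_F_XXXXX> flags, can be sketched as a small parser. The C<F_*> bit values are hypothetical (the real constants come from perl's headers); the read/write/append mapping follows the flag descriptions earlier in this document, with C<'w'> implying truncation as it does for C<fopen()>. C<'b'> and C<'t'> are accepted but otherwise ignored here.

```c
#include <assert.h>

/* Hypothetical bit values standing in for the PERLIO_F_* flags. */
enum {
    F_CANREAD  = 1 << 0,
    F_CANWRITE = 1 << 1,
    F_TRUNCATE = 1 << 2,
    F_APPEND   = 1 << 3
};

/* Returns the flags for a mode matching /^[I#]?[rwa]\+?[bt]?$/,
 * or -1 if the string does not match that pattern. */
static int mode_to_flags(const char *mode) {
    const char *p = mode;
    int flags;
    if (*p == 'I' || *p == '#')          /* stdio-creation or sysopen prefix */
        p++;
    switch (*p++) {
    case 'r': flags = F_CANREAD; break;
    case 'w': flags = F_CANWRITE | F_TRUNCATE; break;
    case 'a': flags = F_CANWRITE | F_APPEND; break;
    default:  return -1;
    }
    if (*p == '+') {                     /* both reading and writing */
        flags |= F_CANREAD | F_CANWRITE;
        p++;
    }
    if (*p == 'b' || *p == 't')          /* binary/text: ignored here */
        p++;
    return (*p == '\0') ? flags : -1;
}
```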
386 If I<old> is not C<NULL> then this is a C<PerlIO_reopen>. Perl itself
387 does not use this (yet?) and semantics are a little vague.
389 If I<fd> is not negative, then it is the numeric file descriptor I<fd>,
390 which will be open in a manner compatible with the supplied mode
391 string; the call is thus equivalent to C<PerlIO_fdopen>. In this case
392 I<narg> will be zero.
394 If I<narg> is greater than zero, then it gives the number of arguments
395 passed to C<open>; otherwise it will be 1 if, for example,
396 C<PerlIO_open> was called. In simple cases C<SvPV_nolen(*args)> is the
399 Having said all that, translation-only layers do not need to provide
400 C<Open()> at all, but rather leave the opening to a lower-level layer
401 and wait to be "pushed". If a layer does provide C<Open()>, it should
402 normally call the C<Open()> method of the next layer down (if any) and
403 then push itself on top if that succeeds.
405 =item SV * (*Getarg)(PerlIO *f);
407 Optional. If present, it should return an SV * representing the string argument
408 passed to the layer when it was pushed. For example, ":encoding(ascii)" would
409 return an SvPV with value "ascii".
411 =item IV (*Fileno)(PerlIO *f);
413 Returns the Unix/POSIX numeric file descriptor for the handle. Normally
414 C<PerlIOBase_fileno()> (which just asks next layer down) will suffice
417 =item SSize_t (*Read)(PerlIO *f, void *vbuf, Size_t count);
419 Basic read operation. Returns actual bytes read, or -1 on an error.
420 Typically it will call C<Fill()> and manipulate pointers (possibly via the API).
421 C<PerlIOBuf_read()> may be suitable for derived classes which provide
424 =item SSize_t (*Unread)(PerlIO *f, const void *vbuf, Size_t count);
426 A superset of stdio's C<ungetc()>. Should arrange for future reads to
427 see the bytes in C<vbuf>. If there is no obviously better implementation
428 then C<PerlIOBase_unread()> provides the function by pushing a "fake"
429 "pending" layer above the calling layer.
431 =item SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count);
433 Basic write operation. Returns bytes written or -1 on an error.
435 =item IV (*Seek)(PerlIO *f, Off_t offset, int whence);
437 Position the file pointer. Should normally call its own C<Flush>
438 method and then the C<Seek> method of the next layer down.
440 =item Off_t (*Tell)(PerlIO *f);
442 Return the file pointer. May be based on the layer's cached concept of
443 position to avoid overhead.
445 =item IV (*Close)(PerlIO *f);
447 Close the stream. Should normally call C<PerlIOBase_close()> to flush
448 itself and close layers below, and then deallocate any data structures
449 (buffers, translation tables, ...) not held directly in the data
452 =item IV (*Flush)(PerlIO *f);
454 Should make stream's state consistent with layers below. That is, any
455 buffered write data should be written, and file position of lower layers
456 adjusted for data read from below but not actually consumed.
457 (Should perhaps C<Unread()> such data to the lower layer.)
459 =item IV (*Fill)(PerlIO *f);
461 The buffer for this layer should be filled (for read) from the layer below.
463 =item IV (*Eof)(PerlIO *f);
465 Return end-of-file indicator. C<PerlIOBase_eof()> is normally sufficient.
467 =item IV (*Error)(PerlIO *f);
469 Return error indicator. C<PerlIOBase_error()> is normally sufficient.
471 =item void (*Clearerr)(PerlIO *f);
473 Clear end-of-file and error indicators. Should call C<PerlIOBase_clearerr()>
474 to clear the relevant C<PERLIO_F_XXXXX> flags, which may suffice.
476 =item void (*Setlinebuf)(PerlIO *f);
478 Mark the stream as line buffered. C<PerlIOBase_setlinebuf()> sets the
479 C<PERLIO_F_LINEBUF> flag and is normally sufficient.
481 =item STDCHAR * (*Get_base)(PerlIO *f);
483 Allocate (if not already done) the read buffer for this layer and
484 return a pointer to it.
486 =item Size_t (*Get_bufsiz)(PerlIO *f);
488 Return the number of bytes that the last C<Fill()> put in the buffer.
490 =item STDCHAR * (*Get_ptr)(PerlIO *f);
492 Return the current read pointer relative to this layer's buffer.
494 =item SSize_t (*Get_cnt)(PerlIO *f);
496 Return the number of bytes left to be read in the current buffer.
498 =item void (*Set_ptrcnt)(PerlIO *f,STDCHAR *ptr,SSize_t cnt);
500 Adjust the read pointer and count of bytes to match C<ptr> and/or C<cnt>.
501 The application (or layer above) must ensure they are consistent.
502 (Checking is allowed by the paranoid.)
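To show how a C<Read()> can be built purely from the snooping methods plus C<Fill()>, here is a self-contained C sketch. C<BufLayer>, C<get_ptr()>, C<get_cnt()>, C<set_ptrcnt()>, C<fill()> and C<buf_read()> are hypothetical single-layer stand-ins for C<Get_ptr()>, C<Get_cnt()>, C<Set_ptrcnt()>, C<Fill()> and C<Read()>; the "layer below" is just an in-memory string, and the buffer is kept tiny to force refills.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer layer over an in-memory "layer below". */
typedef struct BufLayer {
    char        buf[4];        /* deliberately tiny to force refills */
    char       *ptr;           /* current read position */
    char       *end;           /* end of valid data */
    const char *src;           /* stands in for the layer below */
    size_t      srclen, srcpos;
} BufLayer;

static char *get_ptr(BufLayer *b) { return b->ptr; }
static long  get_cnt(BufLayer *b) { return (long)(b->end - b->ptr); }
static void  set_ptrcnt(BufLayer *b, char *ptr, long cnt) {
    assert(ptr + cnt == b->end);   /* the "paranoid" consistency check */
    b->ptr = ptr;
}

/* Fill: refill the whole buffer from below; -1 when input is exhausted. */
static int fill(BufLayer *b) {
    size_t n = b->srclen - b->srcpos;
    if (n == 0)
        return -1;
    if (n > sizeof b->buf)
        n = sizeof b->buf;
    memcpy(b->buf, b->src + b->srcpos, n);
    b->srcpos += n;
    b->ptr = b->buf;
    b->end = b->buf + n;
    return 0;
}

/* Read expressed only through get_cnt/get_ptr/set_ptrcnt plus fill. */
static long buf_read(BufLayer *b, void *vbuf, size_t count) {
    char *out = vbuf;
    size_t done = 0;
    while (done < count) {
        long avail = get_cnt(b);
        if (avail <= 0) {
            if (fill(b) != 0)
                break;
            continue;
        }
        size_t take = (size_t)avail < count - done ? (size_t)avail : count - done;
        char *p = get_ptr(b);
        memcpy(out + done, p, take);
        set_ptrcnt(b, p + take, avail - (long)take);
        done += take;
    }
    return (long)done;
}

static void buf_init(BufLayer *b, const char *src) {
    b->ptr = b->end = b->buf;
    b->src = src;
    b->srclen = strlen(src);
    b->srcpos = 0;
}
```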
509 The file C<perlio.c> provides the following layers:
515 A basic non-buffered layer which calls Unix/POSIX C<read()>, C<write()>,
516 C<lseek()>, C<close()>. No buffering. Even on platforms that distinguish
517 between C<O_TEXT> and C<O_BINARY>, this layer is always C<O_BINARY>.
521 A very complete generic buffering layer which provides the whole of the
522 PerlIO API. It is also intended to be used as a "base class" for other
523 layers. (For example, its C<Read()> method is implemented in terms of
524 the C<Get_cnt()>/C<Get_ptr()>/C<Set_ptrcnt()> methods.)
526 "perlio" over "unix" provides a complete replacement for stdio as seen
527 via the PerlIO API. This is the default for C<USE_PERLIO> when the system's stdio
528 does not permit perl's "fast gets" access and the system does not
529 distinguish between C<O_TEXT> and C<O_BINARY>.
533 A layer which provides the PerlIO API via the layer scheme, but
534 implements it by calling the system's stdio. This is (currently) the default
535 if the system's stdio provides sufficient access to allow perl's "fast gets"
536 access and the system does not distinguish between C<O_TEXT> and C<O_BINARY>.
540 A layer derived using "perlio" as a base class. It provides Win32-like
541 "\n" to CR,LF translation. It can either be applied above "perlio" or serve
542 as the buffer layer itself. "crlf" over "unix" is the default if the system
543 distinguishes between C<O_TEXT> and C<O_BINARY> opens. (At some point
544 "unix" will be replaced by a "native" Win32 IO layer on that platform,
545 as Win32's read/write layer has various drawbacks.) The "crlf" layer is
546 a reasonable model for a layer which transforms data in some way.
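As a model of that kind of transformation, the two directions of the "\n" / CR,LF mapping can be written as plain C functions. C<crlf_encode()> and C<crlf_decode()> are hypothetical helpers that work on whole chunks; in particular, this sketch does not handle a CR at the end of one buffer whose LF only arrives in the next, which a real layer has to deal with.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Output direction: every "\n" becomes CR,LF.
 * The caller must provide room for up to 2 * n bytes in out. */
static size_t crlf_encode(const char *in, size_t n, char *out) {
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n')
            out[o++] = '\r';
        out[o++] = in[i];
    }
    return o;
}

/* Input direction: every CR,LF pair becomes "\n". A CR not followed by
 * LF inside this chunk (including one at the very end) is kept as-is. */
static size_t crlf_decode(const char *in, size_t n, char *out) {
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;   /* drop the CR; the LF is copied next time round */
        out[o++] = in[i];
    }
    return o;
}
```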
550 If Configure detects C<mmap()> functions this layer is provided (with
551 "perlio" as a "base") which does "read" operations by C<mmap()>ing the
552 file. Performance improvement is marginal on modern systems, so it is
553 mainly there as a proof of concept. It is likely to be unbundled from
554 the core at some point. The "mmap" layer is a reasonable model for a
555 minimalist "derived" layer.
559 An "internal" derivative of "perlio" which can be used to provide the
560 C<Unread()> function for layers which have no buffer or cannot be
561 bothered. (Basically this layer's C<Fill()> pops itself off the stack
562 and so resumes reading from layer below.)
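The C<Unread()>-via-"pending" mechanism can be modelled in a few lines of C. Everything here is hypothetical (C<Lay>, C<Handle>, C<unread()>, C<read_byte()>). The sketch simplifies by always pushing the pending layer on top of the stack, and it folds the pending layer's C<Fill()> (pop itself, then resume reading below) into the read loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layer: a byte source (the "real" layer below) or a
 * "pending" layer that only holds bytes given back by unread(). */
typedef struct Lay Lay;
struct Lay {
    Lay    *next;     /* layer below */
    int     pending;  /* 1 if this is a pushed "pending" layer */
    char   *data;     /* owned copy (pending) or borrowed bytes (source) */
    size_t  len, pos; /* valid length and read position */
};
typedef Lay *Handle;

/* unread(): push a "pending" layer holding a copy of the bytes, so the
 * next reads see them before anything from the layers below. */
static long unread(Handle *f, const void *vbuf, size_t count) {
    Lay *p = calloc(1, sizeof *p);
    if (p == NULL)
        return -1;
    p->data = malloc(count > 0 ? count : 1);
    if (p->data == NULL) {
        free(p);
        return -1;
    }
    memcpy(p->data, vbuf, count);
    p->len = count;
    p->pending = 1;
    p->next = *f;
    *f = p;
    return (long)count;
}

/* Read one byte. When a pending layer runs dry, its "fill" step just
 * pops it off the stack, so reading resumes from the layer below. */
static int read_byte(Handle *f) {
    for (;;) {
        Lay *top = *f;
        if (top == NULL)
            return -1;
        if (top->pos < top->len)
            return (unsigned char)top->data[top->pos++];
        if (!top->pending)
            return -1;           /* end of the real data */
        *f = top->next;          /* pop the drained pending layer */
        free(top->data);
        free(top);
    }
}
```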
566 A dummy layer which never exists on the layer stack. Instead, when
567 "pushed" it actually pops the stack(!), removing itself, and any other
568 layers until it reaches a layer with the class C<PERLIO_K_RAW> bit set.
572 Another dummy layer. When pushed it pops itself and sets the
573 C<PERLIO_F_UTF8> flag on the layer which was (and now is once more)
574 the top of the stack.
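The stack manipulation performed by these two dummy layers can be sketched as follows. C<K_RAW>, C<F_UTF8>, the C<Cls>/C<Layer> types, the class names used in the usage example and C<push_raw()>/C<push_utf8()> are all hypothetical, and a real pop would also run each removed layer's C<Popped()> method.

```c
#include <assert.h>
#include <stddef.h>

enum { K_RAW = 1 };   /* hypothetical class bit, cf. PERLIO_K_RAW */
enum { F_UTF8 = 1 };  /* hypothetical instance bit, cf. PERLIO_F_UTF8 */

/* Hypothetical class table and layer instance. */
typedef struct Cls { const char *name; int kind; } Cls;
typedef struct Layer Layer;
struct Layer {
    Layer     *next;   /* layer below */
    const Cls *tab;    /* class */
    int        flags;  /* per-instance flags */
};

/* ":raw" never stays on the stack: it pops layers until the top one's
 * class has K_RAW set (a real pop would also call Popped()). */
static void push_raw(Layer **f) {
    while (*f != NULL && !((*f)->tab->kind & K_RAW))
        *f = (*f)->next;
}

/* ":utf8" likewise never stays; it marks whatever is now on top. */
static void push_utf8(Layer **f) {
    if (*f != NULL)
        (*f)->flags |= F_UTF8;
}
```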
578 In addition F<perlio.c> also provides a number of C<PerlIOBase_xxxx()>
579 functions which are intended to be used in the table slots of classes
580 which do not need to do anything special for a particular method.
582 =head2 Extension Layers
584 Layers can be made available by extension modules. When an unknown layer
585 is encountered the PerlIO code will perform the equivalent of:
589 Where I<layer> is the unknown layer. F<PerlIO.pm> will then attempt to:
591 require PerlIO::layer;
593 If after that process the layer is still not defined then the C<open>
596 The following extension layers are bundled with perl:
604 makes this layer available, although F<PerlIO.pm> "knows" where to
605 find it. It is an example of a layer which takes an argument as it is
608 open($fh,"<:encoding(iso-8859-7)",$pathname)
614 open($fh,"...",\$scalar)
616 When a handle is opened this way, reads get bytes from the string value
617 of I<$scalar>, and writes change the value. In both cases the position
618 in I<$scalar> starts as zero but can be altered via C<seek>, and
619 determined via C<tell>.
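A rough C model of those semantics (reads come from the string, writes change it, and the position starts at zero and moves with C<seek>/C<tell>) is below. C<MemFile> and the C<mem_xxx()> functions are hypothetical; in particular, zero-filling the gap when writing after seeking past the end is a choice of this sketch, not a documented property of the scalar layer.

```c
#include <assert.h>
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file standing in for the scalar's string value. */
typedef struct MemFile {
    char   *data;
    size_t  len;   /* length of the "string" */
    size_t  cap;   /* allocated bytes */
    size_t  pos;   /* current position, starts at zero */
} MemFile;

static long mem_read(MemFile *m, void *vbuf, size_t count) {
    if (m->pos >= m->len)
        return 0;                    /* at or past the end of the string */
    size_t n = m->len - m->pos;
    if (n > count)
        n = count;
    memcpy(vbuf, m->data + m->pos, n);
    m->pos += n;
    return (long)n;
}

/* Writes overwrite and/or extend the value at the current position. */
static long mem_write(MemFile *m, const void *vbuf, size_t count) {
    size_t need = m->pos + count;
    if (need > m->cap) {
        size_t cap = m->cap ? m->cap : 16;
        while (cap < need)
            cap *= 2;
        char *p = realloc(m->data, cap);
        if (p == NULL)
            return -1;
        m->data = p;
        m->cap = cap;
    }
    if (m->pos > m->len)             /* this sketch zero-fills any gap */
        memset(m->data + m->len, 0, m->pos - m->len);
    memcpy(m->data + m->pos, vbuf, count);
    m->pos = need;
    if (need > m->len)
        m->len = need;
    return (long)count;
}

static int mem_seek(MemFile *m, long offset, int whence) {
    long base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long)m->pos; break;
    case SEEK_END: base = (long)m->len; break;
    default: return -1;
    }
    if (base + offset < 0)
        return -1;
    m->pos = (size_t)(base + offset);
    return 0;
}

static long mem_tell(const MemFile *m) { return (long)m->pos; }
```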
621 =item ":Object" or ":Perl"
623 May be provided to allow layers to be implemented as perl code -
624 implementation is being investigated.