X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperliol.pod;h=466959b001002191ee19794689dd819bb152a18f;hb=209071589ddd827372bd46e1358d1d13f6b4dbcb;hp=cde9be54b8a747425e6df8adb5eca29382b15fea;hpb=d4165bded29b540a8716daf95e9a96ed73736060;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perliol.pod b/pod/perliol.pod index cde9be5..466959b 100644 --- a/pod/perliol.pod +++ b/pod/perliol.pod @@ -24,6 +24,57 @@ The aim of the implementation is to provide the PerlIO API in a flexible and platform neutral manner. It is also a trial of an "Object Oriented C, with vtables" approach which may be applied to perl6. +=head2 Basic Structure + +PerlIO is a stack of layers. + +The low levels of the stack work with the low-level operating system +calls (file descriptors in C) getting bytes in and out, the higher +layers of the stack buffer, filter, and otherwise manipulate the I/O, +and return characters (or bytes) to Perl. Terms I and I +are used to refer to the relative positioning of the stack layers. + +A layer contains a "vtable", the table of I/O operations (at C level +a table of function pointers), and status flags. The functions in the +vtable implement operations like "open", "read", and "write". + +When I/O, for example "read", is requested, the request goes from Perl +first down the stack using "read" functions of each layer, then at the +bottom the input is requested from the operating system services, then +the result is returned up the stack, finally being interpreted as Perl +data. + +The requests do not necessarily go always all the way down to the +operating system: that's where PerlIO buffering comes into play. + +When you do an open() and specify extra PerlIO layers to be deployed, +the layers you specify are "pushed" on top of the already existing +default stack. One way to see it is that "operating system is +on the left" and "Perl is on the right". 
+ +What exact layers are in this default stack depends on a lot of +things: your operating system, Perl version, Perl compile time +configuration, and Perl runtime configuration. See L, +L, and L for more information. + +binmode() operates similarly to open(): by default the specified +layers are pushed on top of the existing stack. + +However, note that even as the specified layers are "pushed on top" +for open() and binmode(), this doesn't mean that the effects are +limited to the "top": PerlIO layers can be very 'active' and inspect +and affect layers also deeper in the stack. As an example there +is a layer called "raw" which repeatedly "pops" layers until +it reaches the first layer that has declared itself capable of +handling binary data. The "pushed" layers are processed in left-to-right +order. + +sysopen() operates (unsurprisingly) at a lower level in the stack than +open(). For example in UNIX or UNIX-like systems sysopen() operates +directly at the level of file descriptors: in the terms of PerlIO +layers, it uses only the "unix" layer, which is a rather thin wrapper +on top of the UNIX file descriptors. 
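The stack-of-layers idea described above can be sketched in plain C. This is only a minimal model under stated assumptions: the names `Layer`, `LayerFuncs`, `layer_push`, `stack_read`, `source_read` and `upper_read` are hypothetical and are not part of the real PerlIO headers; a string stands in for the operating system, and a single `read` slot stands in for the full function table.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: this models the idea, not perliol.h. */
typedef struct Layer Layer;

typedef struct {
    const char *name;
    /* Read up to count bytes into buf; return the number of bytes read. */
    size_t (*read)(Layer *self, char *buf, size_t count);
} LayerFuncs;

struct Layer {
    const LayerFuncs *tab;  /* the "vtable" of this layer */
    Layer *next;            /* next layer down, NULL at the bottom */
    const char *src;        /* bottom layer only: stand-in for the OS */
    size_t pos;             /* bottom layer only: current offset in src */
};

/* Bottom layer: plays the role of the operating system read call. */
static size_t source_read(Layer *self, char *buf, size_t count) {
    size_t left = strlen(self->src) - self->pos;
    size_t n = count < left ? count : left;
    memcpy(buf, self->src + self->pos, n);
    self->pos += n;
    return n;
}

/* Filter layer: asks the layer below, then transforms what comes back up. */
static size_t upper_read(Layer *self, char *buf, size_t count) {
    size_t n = self->next->tab->read(self->next, buf, count);
    for (size_t i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    return n;
}

static const LayerFuncs source_funcs = { "source", source_read };
static const LayerFuncs upper_funcs  = { "upper",  upper_read  };

/* Build the bottom of a stack over an in-memory string. */
static Layer make_source(const char *text) {
    Layer l = { &source_funcs, NULL, text, 0 };
    return l;
}

/* Push layer l on top of the existing stack and return the new top. */
static Layer *layer_push(Layer *top, Layer *l, const LayerFuncs *tab) {
    l->tab = tab;
    l->next = top;
    l->src = NULL;
    l->pos = 0;
    return l;
}

/* A read request enters at the top and travels down through the vtables. */
static size_t stack_read(Layer *top, char *buf, size_t count) {
    return top->tab->read(top, buf, count);
}
```

Pushing the hypothetical "upper" layer on top of a source created with `make_source("hello")` and calling `stack_read` on the new top sends the request down to the bottom layer and returns the bytes upper-cased on the way back up, which mirrors the down-then-up flow the text describes.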
+ =head2 Layers vs Disciplines Initial discussion of the ability to modify IO streams behaviour used @@ -87,10 +138,11 @@ same as the public C functions: struct _PerlIO_funcs { + Size_t fsize; char * name; Size_t size; IV kind; - IV (*Pushed)(pTHX_ PerlIO *f,const char *mode,SV *arg); + IV (*Pushed)(pTHX_ PerlIO *f,const char *mode,SV *arg, PerlIO_funcs *tab); IV (*Popped)(pTHX_ PerlIO *f); PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab, AV *layers, IV n, @@ -98,6 +150,7 @@ same as the public C functions: int fd, int imode, int perm, PerlIO *old, int narg, SV **args); + IV (*Binmode)(pTHX_ PerlIO *f); SV * (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags) IV (*Fileno)(pTHX_ PerlIO *f); PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags) @@ -123,9 +176,9 @@ same as the public C functions: void (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt); }; -The first few members of the struct give a "name" for the layer, the -size to C for the per-instance data, and some flags which are -attributes of the class as whole (such as whether it is a buffering +The first few members of the struct give a function table size for +compatibility check "name" for the layer, the size to C for the per-instance data, +and some flags which are attributes of the class as whole (such as whether it is a buffering layer), then follow the functions which fall into four basic groups: =over 4 @@ -322,7 +375,17 @@ to change during one "get".) =over 4 -=item char * name; +=item fsize + + Size_t fsize; + +Size of the function table. This is compared against the value PerlIO +code "knows" as a compatibility check. Future versions I be able +to tolerate layers compiled against an old version of the headers. + +=item name + + char * name; The name of the layer whose open() method Perl should invoke on open(). For example if the layer is called APR, you will call: @@ -332,35 +395,56 @@ open(). 
For example if the layer is called APR, you will call: and Perl knows that it has to invoke the PerlIOAPR_open() method implemented by the APR layer. -=item Size_t size; +=item size + + Size_t size; The size of the per-instance data structure, e.g.: sizeof(PerlIOAPR) -=item IV kind; +If this field is zero then C does not malloc anything +and assumes layer's Pushed function will do any required layer stack +manipulation - used to avoid malloc/free overhead for dummy layers. +If the field is non-zero it must be at least the size of C, +C will allocate memory for the layer's data structures +and link new layer onto the stream's stack. (If the layer's Pushed +method returns an error indication the layer is popped again.) + +=item kind - XXX: explain all the available flags here + IV kind; =over 4 =item * PERLIO_K_BUFFERED +The layer is buffered. + +=item * PERLIO_K_RAW + +The layer is acceptable to have in a binmode(FH) stack - i.e. it does not +(or will configure itself not to) transform bytes passing through it. + =item * PERLIO_K_CANCRLF +Layer can translate between "\n" and CRLF line ends. + =item * PERLIO_K_FASTGETS +Layer allows buffer snooping. + =item * PERLIO_K_MULTIARG Used when the layer's open() accepts more arguments than usual. The extra arguments should come not before the C argument. When this flag is used it's up to the layer to validate the args. -=item * PERLIO_K_RAW - =back -=item IV (*Pushed)(pTHX_ PerlIO *f,const char *mode, SV *arg); +=item Pushed + + IV (*Pushed)(pTHX_ PerlIO *f,const char *mode, SV *arg); The only absolutely mandatory method. Called when the layer is pushed onto the stack. The C argument may be NULL if this occurs @@ -374,7 +458,9 @@ was un-expected). Returns 0 on success. On failure returns -1 and should set errno. -=item IV (*Popped)(pTHX_ PerlIO *f); +=item Popped + + IV (*Popped)(pTHX_ PerlIO *f); Called when the layer is popped from the stack. A layer will normally be popped after C is called. 
But a layer can be popped @@ -385,9 +471,14 @@ struct. It should also C any unconsumed data that has been read and buffered from the layer below back to that layer, so that it can be re-provided to whatever is now above. -Returns 0 on success and failure. +Returns 0 on success and failure. If C returns I then +I assumes that either the layer has popped itself, or the +layer is super special and needs to be retained for other reasons. +In most cases it should return I. + +=item Open -=item PerlIO * (*Open)(...); + PerlIO * (*Open)(...); The C method has lots of arguments because it combines the functions of perl's C, C, perl's C, @@ -417,10 +508,10 @@ special C calls; the C<'#'> prefix means that this is C and that I and I should be passed to C; C<'r'> means B<r>ead, C<'w'> means B<w>rite and C<'a'> means B<a>ppend. The C<'+'> suffix means that both reading and -writing/appending are permitted. The C<'b'> suffix means file should -be binary, and C<'t'> means it is text. (Binary/Text should be ignored -by almost all layers and binary IO done, with PerlIO. The C<:crlf> -layer should be pushed to handle the distinction.) +writing/appending are permitted. The C<'b'> suffix means file should +be binary, and C<'t'> means it is text. (Almost all layers should do +the IO in binary mode, and ignore the b/t bits. The C<:crlf> layer +should be pushed to handle the distinction.) If I is not C then this is a C. Perl itself does not use this (yet?) and semantics are a little vague. @@ -441,9 +532,26 @@ and wait to be "pushed". If a layer does provide C it should normally call the C method of next layer down (if any) and then push itself on top if that succeeds. +If C was performed and open has failed, it must +C itself, since if it's not, the layer won't be removed +and may cause bad problems. + Returns C on failure. -=item SV * (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags) +=item Binmode + + IV (*Binmode)(pTHX_ PerlIO *f); + +Optional. 
Used when C<:raw> layer is pushed (explicitly or as a result +of binmode(FH)). If not present layer will be popped. If present +should configure layer as binary (or pop itself) and return 0. +If it returns -1 for error C will fail with layer +still on the stack. + +=item Getarg + + SV * (*Getarg)(pTHX_ PerlIO *f, + CLONE_PARAMS *param, int flags); Optional. If present should return an SV * representing the string argument passed to the layer when it was @@ -451,25 +559,33 @@ pushed. e.g. ":encoding(ascii)" would return an SvPV with value "ascii". (I and I arguments can be ignored in most cases) -=item IV (*Fileno)(pTHX_ PerlIO *f); +=item Fileno + + IV (*Fileno)(pTHX_ PerlIO *f); Returns the Unix/Posix numeric file descriptor for the handle. Normally C (which just asks next layer down) will suffice for this. -Returns -1 if the layer cannot provide such a file descriptor, or in -the case of the error. +Returns -1 on error, which is considered to include the case where the +layer cannot provide such a file descriptor. + +=item Dup -XXX: two possible results end up in -1, one is an error the other is -not. + PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o, + CLONE_PARAMS *param, int flags); -=item PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags) +XXX: Needs more docs. -XXX: not documented +Used as part of the "clone" process when a thread is spawned (in which +case param will be non-NULL) and when a stream is being duplicated via +'&' in the C. Similar to C, returns PerlIO* on success, C on failure. -=item SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count); +=item Read + + SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count); Basic read operation. @@ -479,7 +595,10 @@ provide "fast gets" methods. Returns actual bytes read, or -1 on an error. -=item SSize_t (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count); +=item Unread + + SSize_t (*Unread)(pTHX_ PerlIO *f, + const void *vbuf, Size_t count); A superset of stdio's C. 
Should arrange for future reads to see the bytes in C. If there is no obviously better implementation @@ -488,27 +607,35 @@ then C provides the function by pushing a "fake" Returns the number of unread chars. -=item SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count); +=item Write + + SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count); Basic write operation. Returns bytes written or -1 on an error. -=item IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence); +=item Seek + + IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence); Position the file pointer. Should normally call its own C method and then the C method of next layer down. Returns 0 on success, -1 on failure. -=item Off_t (*Tell)(pTHX_ PerlIO *f); +=item Tell + + Off_t (*Tell)(pTHX_ PerlIO *f); Return the file pointer. May be based on layers cached concept of position to avoid overhead. Returns -1 on failure to get the file pointer. -=item IV (*Close)(pTHX_ PerlIO *f); +=item Close + + IV (*Close)(pTHX_ PerlIO *f); Close the stream. Should normally call C to flush itself and close layers below, and then deallocate any data structures @@ -517,7 +644,9 @@ structure. Returns 0 on success, -1 on failure. -=item IV (*Flush)(pTHX_ PerlIO *f); +=item Flush + + IV (*Flush)(pTHX_ PerlIO *f); Should make stream's state consistent with layers below. That is, any buffered write data should be written, and file position of lower layers @@ -526,7 +655,9 @@ adjusted for data read from below but not actually consumed. Returns 0 on success, -1 on failure. -=item IV (*Fill)(pTHX_ PerlIO *f); +=item Fill + + IV (*Fill)(pTHX_ PerlIO *f); The buffer for this layer should be filled (for read) from layer below. When you "subclass" PerlIOBuf layer, you want to use its @@ -535,47 +666,66 @@ PerlIOBuf's buffer. Returns 0 on success, -1 on failure. -=item IV (*Eof)(pTHX_ PerlIO *f); +=item Eof + + IV (*Eof)(pTHX_ PerlIO *f); Return end-of-file indicator. C is normally sufficient. 
Returns non-zero on end-of-file, 0 if not at end-of-file, -1 on error. -=item IV (*Error)(pTHX_ PerlIO *f); +=item Error + + IV (*Error)(pTHX_ PerlIO *f); Return error indicator. C is normally sufficient. Returns 1 if there is an error (usually when C is set), 0 otherwise. -=item void (*Clearerr)(pTHX_ PerlIO *f); +=item Clearerr + + void (*Clearerr)(pTHX_ PerlIO *f); Clear end-of-file and error indicators. Should call C to set the C flags, which may suffice. -=item void (*Setlinebuf)(pTHX_ PerlIO *f); +=item Setlinebuf + + void (*Setlinebuf)(pTHX_ PerlIO *f); Mark the stream as line buffered. C sets the PERLIO_F_LINEBUF flag and is normally sufficient. -=item STDCHAR * (*Get_base)(pTHX_ PerlIO *f); +=item Get_base + + STDCHAR * (*Get_base)(pTHX_ PerlIO *f); Allocate (if not already done) the read buffer for this layer and return pointer to it. Return NULL on failure. -=item Size_t (*Get_bufsiz)(pTHX_ PerlIO *f); +=item Get_bufsiz + + Size_t (*Get_bufsiz)(pTHX_ PerlIO *f); Return the number of bytes that the last C put in the buffer. -=item STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f); +=item Get_ptr + + STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f); Return the current read pointer relative to this layer's buffer. -=item SSize_t (*Get_cnt)(pTHX_ PerlIO *f); +=item Get_cnt + + SSize_t (*Get_cnt)(pTHX_ PerlIO *f); Return the number of bytes left to be read in the current buffer. -=item void (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt); +=item Set_ptrcnt + + void (*Set_ptrcnt)(pTHX_ PerlIO *f, + STDCHAR *ptr, SSize_t cnt); Adjust the read pointer and count of bytes to match C and/or C. The application (or layer above) must ensure they are consistent. @@ -583,6 +733,110 @@ The application (or layer above) must ensure they are consistent. =back +=head2 Utilities + +To ask for the next layer down use PerlIONext(PerlIO *f). + +To check that a PerlIO* is valid use PerlIOValid(PerlIO *f). 
(All +this really does is check that the pointer is non-NULL and +that the pointer behind that is non-NULL.) + +PerlIOBase(PerlIO *f) returns the "Base" pointer, or in other words, +the C pointer. + +PerlIOSelf(PerlIO* f, type) returns the PerlIOBase cast to a type. + +Perl_PerlIO_or_Base(PerlIO* f, callback, base, failure, args) either +calls the I from the functions of the layer I (just by +the name of the IO function, like "Read") with the I, or if +there is no such callback, calls the I version of the callback +with the same args, or if the f is invalid, sets errno to EBADF and +returns I. + +Perl_PerlIO_or_fail(PerlIO* f, callback, failure, args) either calls +the I of the functions of the layer I with the I, +or if there is no such callback, sets errno to EINVAL. Or if the f is +invalid, sets errno to EBADF and returns I. + +Perl_PerlIO_or_Base_void(PerlIO* f, callback, base, args) either calls +the I of the functions of the layer I with the I, +or if there is no such callback, calls the I version of the +callback with the same args, or if the f is invalid, sets errno to +EBADF. + +Perl_PerlIO_or_fail_void(PerlIO* f, callback, args) either calls the +I of the functions of the layer I with the I, or if +there is no such callback, sets errno to EINVAL. Or if the f is +invalid, sets errno to EBADF. + +=head2 Implementing PerlIO Layers + +If you find the implementation document unclear or insufficient, +look at the existing PerlIO layer implementations, which include: + +=over + +=item * C implementations + +The F and F in the Perl core implement the +"unix", "perlio", "stdio", "crlf", "utf8", "byte", "raw", "pending" +layers, and also the "mmap" and "win32" layers if applicable. +(The "win32" layer is currently unfinished and unused; to see what is used +instead on Win32, see L .) + +PerlIO::encoding, PerlIO::scalar, PerlIO::via in the Perl core. + +PerlIO::gzip and APR::PerlIO (mod_perl 2.0) on CPAN. 
+
+=item * Perl implementations
+
+PerlIO::via::QuotedPrint in the Perl core and PerlIO::via::* on CPAN.
+
+=back
+
+If you are creating a PerlIO layer, you may want to be lazy, in other
+words, implement only the methods that interest you. The other methods
+you can either replace with the "blank" methods
+
+    PerlIOBase_noop_ok
+    PerlIOBase_noop_fail
+
+(which do nothing, and return zero and -1, respectively) or for
+certain methods you may assume a default behaviour by using a NULL
+method. The Open method looks for help in the 'parent' layer.
+The following table summarizes the behaviour:
+
+    method      behaviour with NULL
+
+    Clearerr    PerlIOBase_clearerr
+    Close       PerlIOBase_close
+    Dup         PerlIOBase_dup
+    Eof         PerlIOBase_eof
+    Error       PerlIOBase_error
+    Fileno      PerlIOBase_fileno
+    Fill        FAILURE
+    Flush       SUCCESS
+    Getarg      SUCCESS
+    Get_base    FAILURE
+    Get_bufsiz  FAILURE
+    Get_cnt     FAILURE
+    Get_ptr     FAILURE
+    Open        INHERITED
+    Popped      SUCCESS
+    Pushed      SUCCESS
+    Read        PerlIOBase_read
+    Seek        FAILURE
+    Set_cnt     FAILURE
+    Set_ptrcnt  FAILURE
+    Setlinebuf  PerlIOBase_setlinebuf
+    Tell        FAILURE
+    Unread      PerlIOBase_unread
+    Write       FAILURE
+
+    FAILURE     Set errno (to EINVAL in UNIXish, to LIB$_INVARG in VMS) and
+                return -1 (for numeric return values) or NULL (for pointers)
+    INHERITED   Inherited from the layer below
+    SUCCESS     Return 0 (for numeric return values) or a pointer
 
 =head2 Core Layers
 
@@ -644,8 +898,11 @@ and so resumes reading from layer below.)
 =item "raw"
 
 A dummy layer which never exists on the layer stack. Instead when
-"pushed" it actually pops the stack(!), removing itself, and any other
-layers until it reaches a layer with the class C bit set.
+"pushed" it actually pops the stack removing itself, it then calls
+Binmode function table entry on all the layers in the stack - normally
+this (via PerlIOBase_binmode) removes any layers which do not have
+C bit set. Layers can modify that behaviour by defining
+their own Binmode entry.
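The behaviour of "raw" described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the code in perlio.c: `K_RAW`, `Layer`, `LayerFuncs`, `crlf_binmode` and `apply_raw` are hypothetical names, and the default rule (keep a layer only if its raw-capable bit is set) is a simplification of what the text attributes to PerlIOBase_binmode.

```c
#include <stddef.h>

#define K_RAW 0x1  /* hypothetical stand-in for the raw-capable kind bit */

typedef struct Layer Layer;

typedef struct {
    const char *name;
    int kind;                     /* class-wide flags such as K_RAW */
    int (*binmode)(Layer *self);  /* optional; NULL selects the default */
} LayerFuncs;

struct Layer {
    const LayerFuncs *tab;
    Layer *next;   /* next layer down, NULL at the bottom */
    int binary;    /* set once the layer has switched to binary mode */
};

/* A layer with its own Binmode entry reconfigures itself and stays put. */
static int crlf_binmode(Layer *self) {
    self->binary = 1;
    return 0;
}

/*
 * Walk the stack from the top. A layer with its own binmode entry decides
 * for itself; otherwise the default keeps it only if K_RAW is set and
 * unlinks ("pops") it if not. Returns 0 on success, or -1 if a layer's
 * binmode entry reports an error, leaving that layer on the stack.
 */
static int apply_raw(Layer **top) {
    Layer **link = top;
    while (*link) {
        Layer *l = *link;
        if (l->tab->binmode) {
            if (l->tab->binmode(l) != 0)
                return -1;
            link = &l->next;
        } else if (l->tab->kind & K_RAW) {
            link = &l->next;
        } else {
            *link = l->next;  /* pop this layer out of the stack */
        }
    }
    return 0;
}
```

With a stack of a non-raw "encoding" layer over a "crlf" layer that supplies `crlf_binmode` over a raw-capable "unix" layer, `apply_raw` unlinks the encoding layer, leaves the crlf layer in place with its `binary` flag set, and keeps the bottom layer untouched.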
=item "utf8" @@ -685,23 +942,31 @@ makes this layer available, although F "knows" where to find it. It is an example of a layer which takes an argument as it is called thus: - open($fh,"<:encoding(iso-8859-7)",$pathname) + open( $fh, "<:encoding(iso-8859-7)", $pathname ); -=item ":Scalar" +=item ":scalar" -Provides support for +Provides support for reading data from and writing data to a scalar. - open($fh,"...",\$scalar) + open( $fh, "+<:scalar", \$scalar ); When a handle is so opened, then reads get bytes from the string value of I<$scalar>, and writes change the value. In both cases the position in I<$scalar> starts as zero but can be altered via C, and determined via C. -=item ":Object" or ":Perl" +Please note that this layer is implied when calling open() thus: + + open( $fh, "+<", \$scalar ); + +=item ":via" + +Provided to allow layers to be implemented as Perl code. For instance: -May be provided to allow layers to be implemented as perl code - -implementation is being investigated. + use PerlIO::via::StripHTML; + open( my $fh, "<:via(StripHTML)", "index.html" ); + +See L for details. =back @@ -768,6 +1033,3 @@ a person who is not a PerlIO guru (yet). =back =cut - - -