X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperldata.pod;h=b7c3b1cecd342daa728d2326164e8122df33e409;hb=533367d84727b326a81c972a3555d6a7847a4558;hp=ad27db163bc8fb23153d79d2edd3a4aa8ea8cf3e;hpb=d55a8828f62418643356fd7e780c23f77dbf7926;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perldata.pod b/pod/perldata.pod index ad27db1..b7c3b1c 100644 --- a/pod/perldata.pod +++ b/pod/perldata.pod @@ -8,9 +8,9 @@ perldata - Perl data types Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes". Normal arrays -are ordered lists indexed by number, starting with 0 and with +are ordered lists of scalars indexed by number, starting with 0 and with negative subscripts counting from the end. Hashes are unordered -collections of values indexed by their associated string key. +collections of scalar values indexed by their associated string key. Values are usually referred to by name, or through a named reference. The first character of the name tells you to what sort of data @@ -109,14 +109,14 @@ list context to each of its arguments. For example, if you say int( ) -the integer operation provides scalar context for the E +the integer operation provides scalar context for the <> operator, which responds by reading one line from STDIN and passing it back to the integer operation, which will then find the integer value of that line and return that. If, on the other hand, you say sort( ) -then the sort operation provides list context for E, which +then the sort operation provides list context for <>, which will proceed to read every line available up to the end of file, and pass that list of lines back to the sort routine, which will then sort those lines and return them as a list to whatever the context @@ -129,7 +129,8 @@ assignment to an array or hash evaluates the righthand side in list context. 
Assignment to a list (or slice, which is just a list anyway) also evaluates the righthand side in list context. -When you use Perl's B<-w> command-line option, you may see warnings +When you use the C pragma or Perl's B<-w> command-line +option, you may see warnings about useless uses of constants or functions in "void context". Void context just means the value has been discarded, such as a statement containing only C<"fred";> or C. It still @@ -165,7 +166,7 @@ references are strongly-typed, uncastable pointers with builtin reference-counting and destructor invocation. A scalar value is interpreted as TRUE in the Boolean sense if it is not -the empty string or the number 0 (or its string equivalent, "0"). The +the null string or the number 0 (or its string equivalent, "0"). The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed. @@ -208,9 +209,9 @@ with a regular expression (as documented in L). unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/; The length of an array is a scalar value. You may find the length -of array @days by evaluating C<$#days>, as in B. Technically -speaking, this isn't the length of the array; it's the subscript -of the last element, since there is ordinarily a 0th element. +of array @days by evaluating C<$#days>, as in B. However, this +isn't the length of the array; it's the subscript of the last element, +which is a different value since there is ordinarily a 0th element. Assigning to C<$#days> actually changes the length of the array. Shortening an array this way destroys intervening values. Lengthening an array that was previously shortened does not recover values @@ -220,7 +221,7 @@ had to break this to make sure destructors were called when expected.) You can also gain some miniscule measure of efficiency by pre-extending an array that is going to get big. You can also extend an array by assigning to an element that is off the end of the array. 
You -can truncate an array down to nothing by assigning the empty list +can truncate an array down to nothing by assigning the null list () to it. The following are equivalent: @whatever = (); @@ -258,7 +259,7 @@ of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. You can preallocate space for a hash by assigning to the keys() function. -This rounds up the allocated bucked to the next power of two: +This rounds up the allocated buckets to the next power of two: keys(%users) = 1000; # allocate 1024 buckets @@ -270,18 +271,25 @@ integer formats: 12345 12345.67 .23E-10 # a very small number - 4_294_967_296 # underline for legibility + 3.14_15_92 # a very important number + 4_294_967_296 # underscore for legibility 0xff # hex + 0xdead_beef # more hex 0377 # octal 0b011011 # binary +You are allowed to use underscores (underbars) in numeric literals +between digits for legibility. You could, for example, group binary +digits by threes (as for a Unix-style mode argument such as 0b110_100_100) +or by fours (to represent nibbles, as in 0b1010_0110) or in other groups. + String literals are usually delimited by either single or double quotes. They work much like quotes in the standard Unix shells: double-quoted string literals are subject to backslash and variable -substitution; single-quoted strings are not (except for "C<\'>" and -"C<\\>"). The usual C-style backslash rules apply for making +substitution; single-quoted strings are not (except for C<\'> and +C<\\>). The usual C-style backslash rules apply for making characters such as newline, tab, etc., as well as some more exotic -forms. See L for a list. +forms. See L for a list. Hexadecimal, octal, or binary, representations in string literals (e.g. '0xff') are not automatically converted to their integer @@ -302,7 +310,8 @@ price is $Z<>100." 
print "The price is $Price.\n"; # interpreted As in some shells, you can enclose the variable name in braces to -disambiguate it from following alphanumerics. You must also do +disambiguate it from following alphanumerics (and underscores). +You must also do this when interpolating a variable into a string to separate the variable name from a following double-colon or an apostrophe, since these would be otherwise treated as a package separator: @@ -323,19 +332,45 @@ C<$days{Feb}> and the quotes will be assumed automatically. But anything more complicated in the subscript will be interpreted as an expression. +A literal of the form C is parsed as a string composed +of characters with the specified ordinals. This provides an alternative, +more readable way to construct strings, rather than use the somewhat less +readable interpolation form C<"\x{1}\x{14}\x{12c}\x{fa0}">. This is useful +for representing Unicode strings, and for comparing version "numbers" +using the string comparison operators, C, C, C etc. +If there are two or more dots in the literal, the leading C may be +omitted. + + print v9786; # prints UTF-8 encoded SMILEY, "\x{263a}" + print v102.111.111; # prints "foo" + print 102.111.111; # same + +Such literals are accepted by both C and C for +doing a version check. The C<$^V> special variable also contains the +running Perl interpreter's version in this form. See L. + The special literals __FILE__, __LINE__, and __PACKAGE__ represent the current filename, line number, and package name at that point in your program. They may be used only as separate tokens; they will not be interpolated into strings. If there is no current package -(due to an empty C directive), __PACKAGE__ is the undefined value. - -The tokens __END__ and __DATA__ may be used to indicate the logical -end of the script before the actual end of file. 
Any following -text is ignored, but may be read via a DATA filehandle: main::DATA -for __END__, or PACKNAME::DATA (where PACKNAME is the current -package) for __DATA__. The two control characters ^D and ^Z are -synonyms for __END__ in the main program, __DATA__ in a separate -module. See L for more description of __DATA__, and +(due to an empty C directive), __PACKAGE__ is the undefined +value. + +The two control characters ^D and ^Z, and the tokens __END__ and __DATA__ +may be used to indicate the logical end of the script before the actual +end of file. Any following text is ignored. + +Text after __DATA__ but may be read via the filehandle C, +where C is the package that was current when the __DATA__ +token was encountered. The filehandle is left open pointing to the +contents after __DATA__. It is the program's responsibility to +C when it is done reading from it. For compatibility with +older scripts written before __DATA__ was introduced, __END__ behaves +like __DATA__ in the toplevel script (but not in files loaded with +C or C) and leaves the remaining contents of the +file accessible via C. + +See L for more description of __DATA__, and an example of its use. Note that you cannot read from the DATA filehandle in a BEGIN block: the BEGIN block is executed as soon as it is seen (during compilation), at which point the corresponding @@ -345,7 +380,8 @@ A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are known as "barewords". As with filehandles and labels, a bareword that consists entirely of lowercase letters risks conflict with future reserved -words, and if you use the B<-w> switch, Perl will warn you about any +words, and if you use the C pragma or the B<-w> switch, +Perl will warn you about any such words. Some people may wish to outlaw barewords entirely. If you say @@ -377,27 +413,27 @@ plain paranoid, you can force the correct interpretation with curly braces as above. 
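As a rough illustration of the brace form referred to above, the sketch below shows braces delimiting a variable name inside a double-quoted string. The variable and its value are illustrative only and do not come from the patch.

```perl
use strict;
use warnings;

my $who = "Larry";   # illustrative value, not from the patch

# Braces end the variable name before trailing word characters.
my $jargon = "${who}speak";        # without braces, Perl would look for $whospeak

# Braces also stop "::" and "'" from being read as package separators.
my $scoped     = "${who}::huh";
my $possessive = "${who}'s here";

print "$jargon\n$scoped\n$possessive\n";
```

Without the braces, each of these strings would refer to some other variable (or a package-qualified one) rather than to C<$who> followed by literal text.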
A line-oriented form of quoting is based on the shell "here-document" -syntax. Following a CE> you specify a string to terminate +syntax. Following a C<< << >> you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item. The terminating string may be either an identifier (a word), or some quoted text. If quoted, the type of quotes you use determines the treatment of the text, just as in regular quoting. An unquoted identifier works like -double quotes. There must be no space between the CE> and -the identifier. (If you put a space it will be treated as a null -identifier, which is valid, and matches the first empty line.) The -terminating string must appear by itself (unquoted and with no -surrounding whitespace) on the terminating line. +double quotes. There must be no space between the C<< << >> and +the identifier, unless the identifier is quoted. (If you put a space it +will be treated as a null identifier, which is valid, and matches the first +empty line.) The terminating string must appear by itself (unquoted and +with no surrounding whitespace) on the terminating line. print <, +the quoted material must come on the lines following the final delimiter. +So instead of + + s/this/<. + +Additionally, the quoting rules for the identifier are not related to +Perl's quoting rules -- C, C, and the like are not supported +in place of C<''> and C<"">, and the only interpolation is for backslashing +the quoting character: + + print << "abc\"def"; + testing... + abc"def + +Finally, quoted strings cannot span multiple lines. The general rule is +that the identifier must be a string literal. Stick with that, and you +should be safe. 
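The here-document rules described above can be sketched as follows. This is a minimal example under stated assumptions: the variable name and the terminator strings (EOT, END) are arbitrary choices made for illustration, not taken from the patch.

```perl
use strict;
use warnings;

my $name = "World";   # illustrative variable, not from the patch

# Unquoted terminator: the body interpolates, as with double quotes.
my $interpolated = <<EOT;
Hello, $name
EOT

# Single-quoted terminator: the body is taken literally.
my $literal = <<'EOT';
Hello, $name
EOT

# A space after << is accepted when the terminator is quoted.
my $spaced = << "END";
Hello, $name
END

print $interpolated, $literal, $spaced;
```

In each case the terminating string sits alone on its line with no surrounding whitespace, which is what the text above requires.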
+ =head2 List value constructors List values are denoted by separating individual values by commas @@ -490,11 +559,20 @@ followed by all the elements returned by the subroutine named SomeSub called in list context, followed by the key/value pairs of %glarch. To make a list reference that does I interpolate, see L. -The empty list is represented by (). Interpolating it in a list +The null list is represented by (). Interpolating it in a list has no effect. Thus ((),(),()) is equivalent to (). Similarly, interpolating an array with no elements is the same as if no array had been interpolated at that point. +This interpolation combines with the facts that the opening +and closing parentheses are optional (except necessary for +precedence) and lists may end with an optional comma to mean that +multiple commas within lists are legal syntax. The list C<1,,3> is a +concatenation of two lists, C<1,> and C<3>, the first of which ends +with that optional comma. C<1,,3> is C<(1,),(3)> is C<1,3> (And +similarly for C<1,,,3> is C<(1,),(,),3> is C<1,3> and so on.) Not that +we'd advise you to use this obfuscation. + A list value may also be subscripted like a normal array. You must put the list in parentheses to avoid ambiguity. For example: @@ -530,7 +608,7 @@ produced by the expression on the right side of the assignment: $x = (($foo,$bar) = f()); # set $x to f()'s return count This is handy when you want to do a list assignment in a Boolean -context, because most list functions return a empty list when finished, +context, because most list functions return a null list when finished, which when assigned produces a 0, which is interpreted as FALSE. The final element may be an array or a hash: @@ -555,8 +633,8 @@ hash. Likewise, hashes included as parts of other lists (including parameters lists and return lists from functions) always flatten out into key/value pairs. That's why it's good to use references sometimes. 
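The flattening behaviour described above may be easier to see in a small sketch. The helper count_args and the sample hash are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

my %color = ('red', 0xf00, 'green', 0x0f0);   # illustrative data

# In list context the hash flattens into its key/value pairs.
my @flat    = ('first', %color, 'last');      # 1 + 4 + 1 elements
my $as_list = count_args(%color);             # two keys plus two values
my $as_ref  = count_args(\%color);            # a single hash reference

print scalar(@flat), " $as_list $as_ref\n";   # prints "6 4 1"
```

Passing a reference keeps the hash intact as one argument, which is the reason the text gives for using references.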
-It is often more readable to use the C<=E> operator between key/value -pairs. The C<=E> operator is mostly just a more visually distinctive +It is often more readable to use the C<< => >> operator between key/value +pairs. The C<< => >> operator is mostly just a more visually distinctive synonym for a comma, but it also arranges for its left-hand operand to be interpreted as a string--if it's a bareword that would be a legal identifier. This makes it nice for initializing hashes: @@ -591,16 +669,16 @@ of how to arrange for an output ordering. =head2 Slices -A common way access an array or a hash is one scalar element at a time. -You can also subscript a list to get a single element from it. +A common way to access an array or a hash is one scalar element at a +time. You can also subscript a list to get a single element from it. $whoami = $ENV{"USER"}; # one element from the hash $parent = $ISA[0]; # one element from the array $dir = (getpwnam("daemon"))[7]; # likewise, but with list A slice accesses several elements of a list, an array, or a hash -simultaneously using a list of subscripts. It's a more convenient -that writing out the individual elements as a list of separate +simultaneously using a list of subscripts. It's more convenient +than writing out the individual elements as a list of separate scalar values. ($him, $her) = @folks[0,-1]; # array slice @@ -624,8 +702,8 @@ The previous assignments are exactly equivalent to ($folks[0], $folks[-1]) = ($folks[0], $folks[-1]); Since changing a slice changes the original array or hash that it's -slicing, a C construct will alter through some--or even -all--of the values of the array or hash. +slicing, a C construct will alter some--or even all--of the +values of the array or hash. foreach (@array[ 4 .. 10 ]) { s/peter/paul/ } @@ -635,13 +713,19 @@ all--of the values of the array or hash. 
s/(\w+)/\u\L$1/g; # "titlecase" words } -You couldn't just loop through C to do this because -that function produces a new list which is a copy of the values, -so changing them doesn't change the original. +A slice of an empty list is still an empty list. Thus: + + @a = ()[1,0]; # @a has no elements + @b = (@a)[0,1]; # @b has no elements + @c = (0,1)[2,3]; # @c has no elements + +But: -As a special rule, if a slice would produce a list consisting entirely -of undefined values, the empty list is produced instead. This makes -it easy to write loops that terminate when an empty list is returned: + @a = (1)[1,0]; # @a has two elements + @b = (1,undef)[1,0,2]; # @b has three elements + +This makes it easy to write loops that terminate when a null list +is returned: while ( ($home, $user) = (getpwent)[7,0]) { printf "%-8s %s\n", $user, $home; @@ -649,7 +733,7 @@ it easy to write loops that terminate when an empty list is returned: As noted earlier in this document, the scalar sense of list assignment is the number of elements on the right-hand side of the assignment. -The empty list contains no elements, so when the password file is +The null list contains no elements, so when the password file is exhausted, the result is 0, not 2. If you're confused about why you use an '@' there on a hash slice @@ -716,6 +800,28 @@ C<*HANDLE{IO}> only works if HANDLE has already been used as a handle. In other words, C<*FH> must be used to create new symbol table entries; C<*foo{THING}> cannot. When in doubt, use C<*FH>. +All functions that are capable of creating filehandles (open(), +opendir(), pipe(), socketpair(), sysopen(), socket(), and accept()) +automatically create an anonymous filehandle if the handle passed to +them is an uninitialized scalar variable. This allows the constructs +such as C and C to be used to +create filehandles that will conveniently be closed automatically when +the scope ends, provided there are no other references to them. 
This +largely eliminates the need for typeglobs when opening filehandles +that must be passed around, as in the following example: + + sub myopen { + open my $fh, "@_" + or die "Can't open '@_': $!"; + return $fh; + } + + { + my $f = myopen("; + # $f implicitly closed here + } + Another way to create anonymous filehandles is with the Symbol module or with the IO::Handle module and its ilk. These modules have the advantage of not hiding different types of the same name
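
The lexical-filehandle behaviour documented in the final hunk can be sketched as below. This is only an illustration under stated assumptions: the helper name open_for_reading, the use of three-argument open, and the File::Temp scratch file are choices made here so the example is self-contained, and are not part of the patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper in the spirit of myopen(): returns a lexical filehandle.
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path)      # $fh is undefined here, so open() creates the handle
        or die "Can't open '$path': $!";
    return $fh;
}

# Write a small scratch file so the example needs no existing file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello\nworld\n";
close($tmp_fh) or die "Can't close '$tmp_path': $!";

my $first_line;
{
    my $in = open_for_reading($tmp_path);
    $first_line = <$in>;
    # $in goes out of scope here; with no other references left, Perl closes it
}
print "first line: $first_line";
```

Because the handle lives in an ordinary scalar, it can be returned from a subroutine and passed around without a typeglob.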