pod/perldata.pod

   1 =head1 NAME
   2
   3 perldata - Perl data types
   4
   5 =head1 DESCRIPTION
   6
   7 =head2 Variable names
   8
   9 Perl has three data structures: scalars, arrays of scalars, and
  10 associative arrays of scalars, known as "hashes".  Normal arrays are
  11 indexed by number, starting with 0.  (Negative subscripts count from
  12 the end.)  Hash arrays are indexed by string.
  13
  14 Scalar values are always named with '$', even when referring to a scalar
  15 that is part of an array.  It works like the English word "the".  Thus
  16 we have:
  17
  18     $days               # the simple scalar value "days"
  19     $days[28]           # the 29th element of array @days
  20     $days{'Feb'}        # the 'Feb' value from hash %days
  21     $#days              # the last index of array @days
  22
  23 but entire arrays or array slices are denoted by '@', which works much like
  24 the word "these" or "those":
  25
  26     @days               # ($days[0], $days[1],... $days[n])
  27     @days[3,4,5]        # same as @days[3..5]
  28     @days{'a','c'}      # same as ($days{'a'},$days{'c'})
  29
  30 and entire hashes are denoted by '%':
  31
  32     %days               # (key1, val1, key2, val2 ...)
  33
  34 In addition, subroutines are named with an initial '&', though this is
  35 optional when it's otherwise unambiguous (just as "do" is often
  36 redundant in English).  Symbol table entries can be named with an
  37 initial '*', but you don't really care about that yet.
  38
  39 Every variable type has its own namespace.  You can, without fear of
  40 conflict, use the same name for a scalar variable, an array, or a hash
  41 (or, for that matter, a filehandle, a subroutine name, or a label).
  42 This means that $foo and @foo are two different variables.  It also
  43 means that C<$foo[1]> is a part of @foo, not a part of $foo.  This may
  44 seem a bit weird, but that's okay, because it is weird.
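
As a minimal sketch of the separate namespaces described above (the variable names and values are made up for illustration), the following declares a scalar, an array, and a hash that all share the name C<foo>:

```perl
use strict;
use warnings;

# Three distinct variables that happen to share the name "foo".
my $foo = 'scalar';
my @foo = ('zero', 'one', 'two');
my %foo = (key => 'value');

# The leading character says what is being fetched:
# '$' for a single value, '@' for several.
my $element = $foo[1];      # element of @foo, unrelated to $foo
my $value   = $foo{key};    # value from %foo, unrelated to $foo
my @slice   = @foo[0, 2];   # slice of @foo
my $last    = $#foo;        # last index of @foo

print "$foo $element $value @slice $last\n";
```

Assigning to C<$foo> in this sketch would leave C<@foo> and C<%foo> untouched.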
  45
  46 Since variable and array references always start with '$', '@', or '%',
  47 the "reserved" words aren't in fact reserved with respect to variable
  48 names.  (They ARE reserved with respect to labels and filehandles,
  49 however, which don't have an initial special character.  You can't have
  50 a filehandle named "log", for instance.  Hint: you could say
  51 C<open(LOG,'logfile')> rather than C<open(log,'logfile')>.  Using uppercase
  52 filehandles also improves readability and protects you from conflict
  53 with future reserved words.)  Case I<IS> significant--"FOO", "Foo" and
  54 "foo" are all different names.  Names that start with a letter or
  55 underscore may also contain digits and underscores.
  56
  57 It is possible to replace such an alphanumeric name with an expression
  58 that returns a reference to an object of that type.  For a description
  59 of this, see L<perlref>.
  60
  61 Names that start with a digit may only contain more digits.  Names
  62 which do not start with a letter, underscore,  or digit are limited to
  63 one character, e.g.  C<$%> or C<$$>.  (Most of these one character names
  64 have a predefined significance to Perl.  For instance, C<$$> is the
  65 current process id.)
  66
  67 =head2 Context
  68
  69 The interpretation of operations and values in Perl sometimes depends
  70 on the requirements of the context around the operation or value.
  71 There are two major contexts: scalar and list.  Certain operations
  72 return list values in contexts wanting a list, and scalar values
  73 otherwise.  (If this is true of an operation it will be mentioned in
  74 the documentation for that operation.)  In other words, Perl overloads
  75 certain operations based on whether the expected return value is
  76 singular or plural.  (Some words in English work this way, like "fish"
  77 and "sheep".)
  78
  79 In a reciprocal fashion, an operation provides either a scalar or a
  80 list context to each of its arguments.  For example, if you say
  81
  82     int( <STDIN> )
  83
  84 the integer operation provides a scalar context for the <STDIN>
  85 operator, which responds by reading one line from STDIN and passing it
  86 back to the integer operation, which will then find the integer value
  87 of that line and return that.  If, on the other hand, you say
  88
  89     sort( <STDIN> )
  90
  91 then the sort operation provides a list context for <STDIN>, which
  92 will proceed to read every line available up to the end of file, and
  93 pass that list of lines back to the sort routine, which will then
  94 sort those lines and return them as a list to whatever the context
  95 of the sort was.
  96
  97 Assignment is a little bit special in that it uses its left argument to
  98 determine the context for the right argument.  Assignment to a scalar
  99 evaluates the righthand side in a scalar context, while assignment to
 100 an array or array slice evaluates the righthand side in a list
 101 context.  Assignment to a list also evaluates the righthand side in a
 102 list context.
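
The effect of the left side of an assignment can be sketched as follows. The array contents are illustrative only, and C<reverse> is used here simply as a built-in whose result depends on context:

```perl
use strict;
use warnings;

my @lines = ('a', 'b', 'c');

my $count   = @lines;    # scalar on the left: scalar context, the length
my ($first) = @lines;    # list on the left: list context, the first element
my @copy    = @lines;    # array on the left: list context, every element

# reverse() reverses a list in list context, but in scalar context
# it concatenates its arguments and reverses the resulting string.
my @list_rev   = reverse('ab', 'cd');
my $scalar_rev = reverse('ab', 'cd');

print "$count $first @copy | @list_rev | $scalar_rev\n";
```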
 103
 104 User-defined subroutines may choose to care whether they are being
 105 called in a scalar or list context, but most subroutines do not
 106 need to care, because scalars are automatically interpolated into
 107 lists.  See L<perlfunc/wantarray>.
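
A subroutine that does care can ask C<wantarray>. The following is only a sketch; the subroutine name C<context> is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical subroutine that reports the context it was called in.
sub context {
    return wantarray ? 'list' : 'scalar';
}

my @in_list   = context();   # array assignment supplies list context
my $in_scalar = context();   # scalar assignment supplies scalar context
my ($first)   = context();   # a parenthesized list on the left is list context

print "$in_list[0] $in_scalar $first\n";
```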
 108
 109 =head2 Scalar values
 110
 111 All data in Perl is a scalar or an array of scalars or a hash of scalars.
 112 Scalar variables may contain various kinds of singular data, such as
 113 numbers, strings, and references.  In general, conversion from one form to
 114 another is transparent.  (A scalar may not contain multiple values, but
 115 may contain a reference to an array or hash containing multiple values.)
 116 Because of the automatic conversion of scalars, operations and functions
 117 that return scalars don't need to care (and, in fact, can't care) whether
 118 the context is looking for a string or a number.
 119
 120 Scalars aren't necessarily one thing or another.  There's no place to
 121 declare a scalar variable to be of type "string", or of type "number", or
 122 type "filehandle", or anything else.  Perl is a contextually polymorphic
 123 language whose scalars can be strings, numbers, or references (which
 124 includes objects).  While strings and numbers are considered pretty
 125 much the same thing for nearly all purposes, references are strongly-typed
 126 uncastable pointers with built-in reference-counting and destructor
 127 invocation.
 128
 129 A scalar value is interpreted as TRUE in the Boolean sense if it is not
 130 the null string or the number 0 (or its string equivalent, "0").  The
 131 Boolean context is just a special kind of scalar context.
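
The rule above can be checked with a short sketch. The sample values are chosen for illustration; note that strings such as C<"0.0"> and C<"00"> are true, because they are neither the null string nor exactly C<"0">:

```perl
use strict;
use warnings;

# The values the rule names as false: the null string, 0, and "0".
my @false = ('', 0, '0');

# Strings that merely look like zero are still true.
my @true = ('0.0', '00', ' ', 'false', 1);

# grep in scalar context returns the number of matching elements.
my $false_count = grep { !$_ } @false;
my $true_count  = grep { $_ } @true;

print "$false_count false, $true_count true\n";
```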
 132
 133 There are actually two varieties of null scalars: defined and
 134 undefined.  Undefined null scalars are returned when there is no real
 135 value for something, such as when there was an error, or at end of
 136 file, or when you refer to an uninitialized variable or element of an
 137 array.  An undefined null scalar may become defined the first time you
 138 use it as if it were defined, but prior to that you can use the
 139 defined() operator to determine whether the value is defined or not.
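
A small sketch of the difference between the two kinds of null scalar, using made-up variable names:

```perl
use strict;
use warnings;

my $value;    # declared but never assigned
my $before = defined($value) ? 'defined' : 'undefined';

$value = '';  # the null string is false, but it is defined
my $after = defined($value) ? 'defined' : 'undefined';

my @list    = (1, 2);
my $missing = defined($list[5]) ? 'defined' : 'undefined';   # past the end

print "$before $after $missing\n";
```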
 140
 141 To find out whether a given string is a valid non-zero number, it's usually
 142 enough to test it against both numeric 0 and also lexical "0" (although
 143 this will cause B<-w> noises).  That's because strings that aren't
 144 numbers count as 0, just as they do in I<awk>:
 145
 146     if ($str == 0 && $str ne "0")  {
 147         warn "That doesn't look like a number";
 148     }
 149
 150 That's usually preferable because otherwise you won't treat IEEE notations
 151 like C<NaN> or C<Infinity> properly.  At other times you might prefer to
 152 use a regular expression to check whether data is numeric.  See L<perlre>
 153 for details on regular expressions.
 154
 155     warn "has nondigits"        if     /\D/;
 156     warn "not a whole number"   unless /^\d+$/;
 157     warn "not an integer"       unless /^[+-]?\d+$/;
 158     warn "not a decimal number" unless /^[+-]?\d+\.?\d*$/;
 159     warn "not a C float"
 160         unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
 161
 162 The length of an array is a scalar value.  You may find the length of
 163 array @days by evaluating C<$#days>, as in B<csh>.  (Actually, it's not
 164 the length of the array, it's the subscript of the last element, since
 165 there is (ordinarily) a 0th element.)  Assigning to C<$#days> changes the
 166 length of the array.  Shortening an array by this method destroys
 167 intervening values.  Lengthening an array that was previously shortened
 168 I<NO LONGER> recovers the values that were in those elements.  (It used to
 169 in Perl 4, but we had to break this to make sure destructors were
 170 called when expected.)  You can also gain some measure of efficiency by
 171 preextending an array that is going to get big.  (You can also extend
 172 an array by assigning to an element that is off the end of the array.)
 173 You can truncate an array down to nothing by assigning the null list ()
 174 to it.  The following are equivalent:
 175
 176     @whatever = ();
 177     $#whatever = $[ - 1;
 178
 179 If you evaluate a named array in a scalar context, it returns the length of
 180 the array.  (Note that this is not true of lists, which return the
 181 last value, like the C comma operator.)  The following is always true:
 182
 183     scalar(@whatever) == $#whatever - $[ + 1;
 184
 185 Version 5 of Perl changed the semantics of $[: files that don't set
 186 the value of $[ no longer need to worry about whether another
 187 file changed its value.  (In other words, use of $[ is deprecated.)
 188 So in general you can just assume that
 189
 190     scalar(@whatever) == $#whatever + 1;
 191
 192 Some programmers choose to use an explicit conversion so nothing's
 193 left to doubt:
 194
 195     $element_count = scalar(@whatever);
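
The relationships described in this section can be sketched as follows, assuming C<$[> is left at its default of 0. The array contents are illustrative only:

```perl
use strict;
use warnings;

my @whatever = (10, 20, 30, 40, 50);

my $last_index = $#whatever;          # subscript of the last element
my $count      = scalar(@whatever);   # number of elements

$#whatever = 1;                       # shorten: elements 2..4 are destroyed
my @short = @whatever;

$#whatever = 3;                       # lengthen: the new elements are undefined
my $recovered = defined($whatever[2]) ? 'yes' : 'no';

@whatever = ();                       # truncate to nothing
my $empty_last = $#whatever;

print "$last_index $count (@short) $recovered $empty_last\n";
```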
 196
 197 If you evaluate a hash in a scalar context, it returns a value which is
 198 true if and only if the hash contains any key/value pairs.  (If there
 199 are any key/value pairs, the value returned is a string consisting of
 200 the number of used buckets and the number of allocated buckets, separated
 201 by a slash.  This is pretty much only useful to find out whether Perl's
 202 (compiled in) hashing algorithm is performing poorly on your data set.
 203 For example, you stick 10,000 things in a hash, but evaluating %HASH in
 204 scalar context reveals "1/16", which means only one out of sixteen buckets
 205 has been touched, and presumably contains all 10,000 of your items.  This
 206 isn't supposed to happen.)
 207
 208 =head2 Scalar value constructors
 209
 210 Numeric literals are specified in any of the customary floating point or
 211 integer formats:
 212
 213     12345
 214     12345.67
 215     .23E-10
 216     0xffff              # hex
 217     0377                # octal
 218     4_294_967_296       # underline for legibility
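
As a quick sketch, the literals above can be checked against their decimal values. Only literals in program text get the hex and octal treatment; a string holding the same characters converts as plain decimal, unless you pass it through the C<oct> or C<hex> function:

```perl
use strict;
use warnings;

my $hex   = 0xffff;           # hex literal
my $octal = 0377;             # octal literal
my $big   = 4_294_967_296;    # underscores are ignored

my $as_decimal = '0377' + 0;  # automatic conversion reads the string as decimal
my $as_octal   = oct('0377'); # explicit conversion from an octal string
my $as_hex     = hex('ffff'); # explicit conversion from a hex string

print "$hex $octal $big $as_decimal $as_octal $as_hex\n";
```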
 219
 220 String literals are usually delimited by either single or double quotes.  They
 221 work much like shell quotes:  double-quoted string literals are subject
 222 to backslash and variable substitution; single-quoted strings are not
 223 (except for "C<\'>" and "C<\\>").  The usual Unix backslash rules apply for making
 224 characters such as newline, tab, etc., as well as some more exotic
 225 forms.  See L<perlop/qq> for a list.
 226
 227 You can also embed newlines directly in your strings, i.e. they can end
 228 on a different line than they begin.  This is nice, but if you forget
 229 your trailing quote, the error will not be reported until Perl finds
 230 another line containing the quote character, which may be much further
 231 on in the script.  Variable substitution inside strings is limited to
 232 scalar variables, arrays, and array slices.  (In other words,
 233 identifiers beginning with $ or @, followed by an optional bracketed
 234 expression as a subscript.)  The following code segment prints out "The
 235 price is $100."
 236
 237     $Price = '$100';    # not interpreted
 238     print "The price is $Price.\n";     # interpreted
 239
 240 As in some shells, you can put curly brackets around the identifier to
 241 delimit it from following alphanumerics.  In fact, an identifier
 242 within such curlies is forced to be a string, as is any single
 243 identifier within a hash subscript.  Our earlier example,
 244
 245     $days{'Feb'}
 246
 247 can be written as
 248
 249     $days{Feb}
 250
 251 and the quotes will be assumed automatically.  But anything more complicated
 252 in the subscript will be interpreted as an expression.
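
A brief sketch of both uses of curly brackets, with hypothetical variable names:

```perl
use strict;
use warnings;

my $fruit = 'apple';
my %days  = (Feb => 28);

my $plural = "${fruit}s";    # the brackets end the name before the "s"

my $quoted = $days{'Feb'};
my $bare   = $days{Feb};     # a single identifier is quoted automatically

my $key  = 'F' . 'eb';
my $expr = $days{$key};      # anything else is an ordinary expression

print "$plural $quoted $bare $expr\n";
```

Without the brackets, C<"$fruits"> would refer to a different variable named C<$fruits>.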
 253
 254 Note that a
 255 single-quoted string must be separated from a preceding word by a
 256 space, since single quote is a valid (though deprecated) character in
 257 an identifier (see L<perlmod/Packages>).
 258
 259 Two special literals are __LINE__ and __FILE__, which represent the
 260 current line number and filename at that point in your program.  They
 261 may only be used as separate tokens; they will not be interpolated into
 262 strings.  In addition, the token __END__ may be used to indicate the
 263 logical end of the script before the actual end of file.  Any following
 264 text is ignored, but may be read via the DATA filehandle.  (The DATA
 265 filehandle may read data only from the main script, but not from any
 266 required file or evaluated string.)  The two control characters ^D and
 267 ^Z are synonyms for __END__ (or __DATA__ in a module; see L<SelfLoader> for
 268 details on __DATA__).
 269
 270 A word that has no other interpretation in the grammar will
 271 be treated as if it were a quoted string.  These are known as
 272 "barewords".  As with filehandles and labels, a bareword that consists
 273 entirely of lowercase letters risks conflict with future reserved
 274 words, and if you use the B<-w> switch, Perl will warn you about any
 275 such words.  Some people may wish to outlaw barewords entirely.  If you
 276 say
 277
 278     use strict 'subs';
 279
 280 then any bareword that would NOT be interpreted as a subroutine call
 281 produces a compile-time error instead.  The restriction lasts to the
 282 end of the enclosing block.  An inner block may countermand this
 283 by saying C<no strict 'subs'>.
 284
 285 Array variables are interpolated into double-quoted strings by joining all
 286 the elements of the array with the delimiter specified in the C<$">
 287 variable ($LIST_SEPARATOR in English), space by default.  The following
 288 are equivalent:
 289
 290     $temp = join($",@ARGV);
 291     system "echo $temp";
 292
 293     system "echo @ARGV";
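
The same behavior can be sketched without running an external command. The array contents and the replacement separator are illustrative only:

```perl
use strict;
use warnings;

my @args = ('a', 'b', 'c');

my $default = "@args";           # joined with the default single space
my $joined  = join($", @args);   # the same thing, spelled out

my $custom;
{
    local $" = ', ';             # change the separator for this block only
    $custom = "@args";
}
my $restored = "@args";          # back to the default outside the block

print "[$default] [$joined] [$custom] [$restored]\n";
```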
 294
 295 Within search patterns (which also undergo double-quotish substitution)
 296 there is a bad ambiguity:  Is C</$foo[bar]/> to be interpreted as
 297 C</${foo}[bar]/> (where C<[bar]> is a character class for the regular
 298 expression) or as C</${foo[bar]}/> (where C<[bar]> is the subscript to array
 299 @foo)?  If @foo doesn't otherwise exist, then it's obviously a
 300 character class.  If @foo exists, Perl takes a good guess about C<[bar]>,
 301 and is almost always right.  If it does guess wrong, or if you're just
 302 plain paranoid, you can force the correct interpretation with curly
 303 brackets as above.
 304
 305 A line-oriented form of quoting is based on the shell "here-doc" syntax.
 306 Following a C<E<lt>E<lt>> you specify a string to terminate the quoted material,
 307 and all lines following the current line down to the terminating string
 308 are the value of the item.  The terminating string may be either an
 309 identifier (a word), or some quoted text.  If quoted, the type of
 310 quotes you use determines the treatment of the text, just as in regular
 311 quoting.  An unquoted identifier works like double quotes.  There must
 312 be no space between the C<E<lt>E<lt>> and the identifier.  (If you put a space it
 313 will be treated as a null identifier, which is valid, and matches the
 314 first blank line.)  The terminating string must appear by itself
 315 (unquoted and with no surrounding whitespace) on the terminating line.
 316
 317         print <<EOF;
 318     The price is $Price.
 319     EOF
 320
 321         print <<"EOF";  # same as above
 322     The price is $Price.
 323     EOF
 324
 325         print <<`EOC`;  # execute commands
 326     echo hi there
 327     echo lo there
 328     EOC
 329
 330         print <<"foo", <<"bar"; # you can stack them
 331     I said foo.
 332     foo
 333     I said bar.
 334     bar
 335
 336         myfunc(<<"THIS", 23, <<'THAT');
 337     Here's a line
 338     or two.
 339     THIS
 340     and here another.
 341     THAT
 342
 343 Just don't forget that you have to put a semicolon on the end
 344 to finish the statement, as Perl doesn't know you're not going to
 345 try to do this:
 346
 347         print <<ABC
 348     179231
 349     ABC
 350         + 20;
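
A here-doc is an ordinary string value, so it can also be assigned. The sketch below, with an illustrative C<$Price>, contrasts a double-quoted terminator with a single-quoted one. The terminating lines start in the first column:

```perl
use strict;
use warnings;

my $Price = '$100';

# Double-quoted terminator: variables are interpolated.
my $interpolated = <<"EOF";
The price is $Price.
EOF

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOF';
The price is $Price.
EOF

print $interpolated, $literal;
```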
 351
 352
 353 =head2 List value constructors
 354
 355 List values are denoted by separating individual values by commas
 356 (and enclosing the list in parentheses where precedence requires it):
 357
 358     (LIST)
 359
 360 In a context not requiring a list value, the value of the list
 361 literal is the value of the final element, as with the C comma operator.
 362 For example,
 363
 364     @foo = ('cc', '-E', $bar);
 365
 366 assigns the entire list value to array foo, but
 367
 368     $foo = ('cc', '-E', $bar);
 369
 370 assigns the value of variable bar to variable foo.  Note that the value
 371 of an actual array in a scalar context is the length of the array; the
 372 following assigns to $foo the value 3:
 373
 374     @foo = ('cc', '-E', $bar);
 375     $foo = @foo;                # $foo gets 3
 376
 377 You may have an optional comma before the closing parenthesis of a
 378 list literal, so that you can say:
 379
 380     @foo = (
 381         1,
 382         2,
 383         3,
 384     );
 385
 386 LISTs do automatic interpolation of sublists.  That is, when a LIST is
 387 evaluated, each element of the list is evaluated in a list context, and
 388 the resulting list value is interpolated into LIST just as if each
 389 individual element were a member of LIST.  Thus arrays lose their
 390 identity in a LIST--the list
 391
 392     (@foo,@bar,&SomeSub)
 393
 394 contains all the elements of @foo followed by all the elements of @bar,
 395 followed by all the elements returned by the subroutine named SomeSub when
 396 it's called in a list context.
 397 To make a list reference that does I<NOT> interpolate, see L<perlref>.
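
The difference between interpolation and references can be sketched as follows. The subroutine C<some_sub> is a hypothetical stand-in for C<&SomeSub>:

```perl
use strict;
use warnings;

my @foo = (1, 2);
my @bar = (3, 4, 5);
sub some_sub { return ('x', 'y') }   # hypothetical stand-in for &SomeSub

my @flat   = (@foo, @bar, some_sub());   # one flat list, no nesting
my @nested = (\@foo, \@bar);             # two elements, each an array reference

my $flat_count   = @flat;
my $nested_count = @nested;
my $inner        = $nested[1][2];        # reach into @bar through its reference

print "$flat_count $nested_count $inner\n";
```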
 398
 399 The null list is represented by ().  Interpolating it in a list
 400 has no effect.  Thus ((),(),()) is equivalent to ().  Similarly,
 401 interpolating an array with no elements is the same as if no
 402 array had been interpolated at that point.
 403
 404 A list value may also be subscripted like a normal array.  You must
 405 put the list in parentheses to avoid ambiguity.  Examples:
 406
 407     # Stat returns list value.
 408     $time = (stat($file))[8];
 409
 410     # SYNTAX ERROR HERE.
 411     $time = stat($file)[8];  # OOPS, FORGOT PARENS
 412
 413     # Find a hex digit.
 414     $hexdigit = ('a','b','c','d','e','f')[$digit-10];
 415
 416     # A "reverse comma operator".
 417     return (pop(@foo),pop(@foo))[0];
 418
 419 Lists may be assigned to if and only if each element of the list
 420 is legal to assign to:
 421
 422     ($a, $b, $c) = (1, 2, 3);
 423
 424     ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
 425
 426 Array assignment in a scalar context returns the number of elements
 427 produced by the expression on the right side of the assignment:
 428
 429     $x = (($foo,$bar) = (3,2,1));       # set $x to 3, not 2
 430     $x = (($foo,$bar) = f());           # set $x to f()'s return count
 431
 432 This is very handy when you want to do a list assignment in a Boolean
 433 context, since most list functions return a null list when finished,
 434 which when assigned produces a 0, which is interpreted as FALSE.
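
A minimal sketch of that idiom follows. The subroutine C<next_pair> and its data are hypothetical; it returns a pair of values until its queue is exhausted, then a null list:

```perl
use strict;
use warnings;

my @queue = ([1, 'one'], [2, 'two'], [3, 'three']);

# Hypothetical iterator: a (number, name) pair, or a null list when finished.
sub next_pair {
    my $pair = shift @queue;
    return $pair ? @$pair : ();
}

my @names;
# The assignment yields 2 while pairs remain and 0 at the end,
# so the loop stops on its own.
while (my ($number, $name) = next_pair()) {
    push @names, $name;
}

# Assigning to an empty list counts the values on the right side.
my $count = () = (3, 2, 1);

print "@names $count\n";
```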
 435
 436 The final element may be an array or a hash:
 437
 438     ($a, $b, @rest) = split;
 439     local($a, $b, %rest) = @_;
 440
 441 You can actually put an array or hash anywhere in the list, but the first one
 442 in the list will soak up all the values, and anything after it will get
 443 a null value.  This may be useful in a local() or my().
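
A short sketch of the soaking-up behavior, with made-up variable names:

```perl
use strict;
use warnings;

my ($first, @rest) = (1, 2, 3);    # $first takes one value, @rest the others
my (@all, $after)  = (1, 2, 3);    # @all takes everything, $after gets nothing

my $after_state = defined($after) ? 'defined' : 'undefined';

print "$first (@rest) (@all) $after_state\n";
```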
 444
 445 A hash literal contains pairs of values to be interpreted
 446 as a key and a value:
 447
 448     # same as map assignment above
 449     %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
 450
 451 While literal lists and named arrays are usually interchangeable, that's
 452 not the case for hashes.  Just because you can subscript a list value like
 453 a normal array does not mean that you can subscript a list value as a
 454 hash.  Likewise, hashes included as parts of other lists (including
 455 parameters lists and return lists from functions) always flatten out into
 456 key/value pairs.  That's why it's good to use references sometimes.
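
The flattening, and the way a reference avoids it, can be sketched as follows. The helper subroutines C<count_args> and C<blue_of> are hypothetical:

```perl
use strict;
use warnings;

my %map = (red => 0x00f, blue => 0x0f0);

# Hypothetical helpers, for illustration only.
sub count_args { return scalar(@_) }
sub blue_of    { my ($ref) = @_; return $ref->{blue} }

my $flattened = count_args(%map, 'extra');    # two keys, two values, one string
my $by_ref    = count_args(\%map, 'extra');   # the hash stays a single argument
my $blue      = blue_of(\%map);               # look up a key through the reference

print "$flattened $by_ref $blue\n";
```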
 457
 458 It is often more readable to use the C<=E<gt>> operator between key/value
 459 pairs.  The C<=E<gt>> operator is mostly just a more visually distinctive
 460 synonym for a comma, but it also quotes its left-hand operand, which makes
 461 it nice for initializing hashes:
 462
 463     %map = (
 464         red   => 0x00f,
 465         blue  => 0x0f0,
 466         green => 0xf00,
 467     );
 468
 469 or for initializing hash references to be used as records:
 470
 471     $rec = {
 472                 witch => 'Mable the Merciless',
 473                 cat   => 'Fluffy the Ferocious',
 474                 date  => '10/31/1776',
 475     };
 476
 477 or for using call-by-named-parameter to complicated functions:
 478
 479    $field = $query->radio_group(
 480                name      => 'group_name',
 481                values    => ['eenie','meenie','minie'],
 482                default   => 'meenie',
 483                linebreak => 'true',
 484                labels    => \%labels
 485    );
 486
 487 Note that just because a hash is initialized in that order doesn't
 488 mean that it comes out in that order.  See L<perlfunc/sort> for examples
 489 of how to arrange for an output ordering.
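
As a minimal sketch, sorting the keys yields a predictable order whatever order the hash itself returns them in:

```perl
use strict;
use warnings;

my %map = (red => 0x00f, blue => 0x0f0, green => 0xf00);

# Order by key name.
my @by_name = sort keys %map;

# Order by numeric value, smallest first.
my @by_value = sort { $map{$a} <=> $map{$b} } keys %map;

print "@by_name | @by_value\n";
```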
 490
 491 =head2 Typeglobs and FileHandles
 492
 493 Perl uses an internal type called a I<typeglob> to hold an entire
 494 symbol table entry.  The type prefix of a typeglob is a C<*>, because
 495 it represents all types.  This used to be the preferred way to
 496 pass arrays and hashes by reference into a function, but now that
 497 we have real references, this is seldom needed.
 498
 499 One place where you still use typeglobs (or references thereto)
 500 is for passing or storing filehandles.  If you want to save away
 501 a filehandle, do it this way:
 502
 503     $fh = *STDOUT;
 504
 505 or perhaps as a real reference, like this:
 506
 507     $fh = \*STDOUT;
 508
 509 This is also the way to create a local filehandle.  For example:
 510
 511     sub newopen {
 512         my $path = shift;
 513         local *FH;  # not my!
 514         open (FH, $path) || return undef;
 515         return \*FH;
 516     }
 517     $fh = newopen('/etc/passwd');
 518
 519 See L<perlref>, L<perlsub>, and L<perlmod/"Symbol Tables"> for more
 520 discussion on typeglobs.  See L<perlfunc/open> for other ways of
 521 generating filehandles.