X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq4.pod;h=6a882c53ff5b2c6718b0d0549ddd9a51b60ed1dc;hb=938c8732ceb115a707f725327a631eb35319ba87;hp=aeb7c14c199e2452b359e3b86577d44ea856ae1a;hpb=76817d6d0100fce867ccaef52d4367ca8ccd3fa2;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index aeb7c14..6a882c5 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 1.22 $, $Date: 2002/05/16 12:44:24 $) +perlfaq4 - Data Manipulation ($Revision: 1.54 $, $Date: 2003/11/30 00:50:08 $) =head1 DESCRIPTION @@ -11,65 +11,63 @@ numbers, dates, strings, arrays, hashes, and miscellaneous data issues. =head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)? -The infinite set that a mathematician thinks of as the real numbers can -only be approximated on a computer, since the computer only has a finite -number of bits to store an infinite number of, um, numbers. - -Internally, your computer represents floating-point numbers in binary. -Floating-point numbers read in from a file or appearing as literals -in your program are converted from their decimal floating-point -representation (eg, 19.95) to an internal binary representation. - -However, 19.95 can't be precisely represented as a binary -floating-point number, just like 1/3 can't be exactly represented as a -decimal floating-point number. The computer's binary representation -of 19.95, therefore, isn't exactly 19.95. - -When a floating-point number gets printed, the binary floating-point -representation is converted back to decimal. These decimal numbers -are displayed in either the format you specify with printf(), or the -current output format for numbers. (See L if you use -print. C<$#> has a different default value in Perl5 than it did in -Perl4. Changing C<$#> yourself is deprecated.) 
- -This affects B computer languages that represent decimal -floating-point numbers in binary, not just Perl. Perl provides -arbitrary-precision decimal numbers with the Math::BigFloat module -(part of the standard Perl distribution), but mathematical operations -are consequently slower. - -If precision is important, such as when dealing with money, it's good -to work with integers and then divide at the last possible moment. -For example, work in pennies (1995) instead of dollars and cents -(19.95) and divide by 100 at the end. - -To get rid of the superfluous digits, just use a format (eg, -C) to get the required precision. -See L. +Internally, your computer represents floating-point numbers +in binary. Digital (as in powers of two) computers cannot +store all numbers exactly. Some real numbers lose precision +in the process. This is a problem with how computers store +numbers and affects all computer languages, not just Perl. + +L show the gory details of number +representations and conversions. + +To limit the number of decimal places in your numbers, you +can use the printf or sprintf function. See the +L<"Floating Point Arithmetic"|perlop> for more details. + + printf "%.2f", 10/3; + + my $number = sprintf "%.2f", 10/3; + +=head2 Why is int() broken? + +Your int() is most probably working just fine. It's the numbers that +aren't quite what you think. + +First, see the above item "Why am I getting long decimals +(eg, 19.9499999999999) instead of the numbers I should be getting +(eg, 19.95)?". + +For example, this + + print int(0.6/0.2-2), "\n"; + +will in most computers print 0, not 1, because even such simple +numbers as 0.6 and 0.2 cannot be presented exactly by floating-point +numbers. What you think in the above as 'three' is really more like +2.9999999999999995559. =head2 Why isn't my octal data interpreted correctly? -Perl only understands octal and hex numbers as such when they occur -as literals in your program. 
Octal literals in perl must start with -a leading "0" and hexadecimal literals must start with a leading "0x". -If they are read in from somewhere and assigned, no automatic -conversion takes place. You must explicitly use oct() or hex() if you -want the values converted to decimal. oct() interprets -both hex ("0x350") numbers and octal ones ("0350" or even without the -leading "0", like "377"), while hex() only converts hexadecimal ones, -with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef". +Perl only understands octal and hex numbers as such when they occur as +literals in your program. Octal literals in perl must start with a +leading "0" and hexadecimal literals must start with a leading "0x". +If they are read in from somewhere and assigned, no automatic +conversion takes place. You must explicitly use oct() or hex() if you +want the values converted to decimal. oct() interprets hex ("0x350"), +octal ("0350" or even without the leading "0", like "377") and binary +("0b1010") numbers, while hex() only converts hexadecimal ones, with +or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef". The inverse mapping from decimal to octal can be done with either the -"%o" or "%O" sprintf() formats. To get from decimal to hex try either -the "%x" or the "%X" formats to sprintf(). +"%o" or "%O" sprintf() formats. This problem shows up most often when people try using chmod(), mkdir(), -umask(), or sysopen(), which by widespread tradition typically take +umask(), or sysopen(), which by widespread tradition typically take permissions in octal. chmod(644, $file); # WRONG chmod(0644, $file); # right -Note the mistake in the first line was specifying the decimal literal +Note the mistake in the first line was specifying the decimal literal 644, rather than the intended octal literal 0644. The problem can be seen with: @@ -77,7 +75,7 @@ be seen with: Surely you had not intended C - did you? 
If you want to use numeric literals as arguments to chmod() et al. then please -try to express them as octal constants, that is with a leading zero and +try to express them as octal constants, that is with a leading zero and with the following digits restricted to the set 0..7. =head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions? @@ -114,7 +112,7 @@ alternation: for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i} - 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7 + 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7 0.8 0.8 0.9 0.9 1.0 1.0 Don't blame Perl. It's the same as in C. IEEE says we have to do this. @@ -122,7 +120,7 @@ Perl numbers whose absolute values are integers under 2**31 (on 32 bit machines) will work pretty much like mathematical integers. Other numbers are not guaranteed. -=head2 How do I convert between numeric representations? +=head2 How do I convert between numeric representations/bases/radixes? As always with Perl there is more than one way to do it. Below are a few examples of approaches to making common conversions @@ -135,22 +133,21 @@ functions is that it works with numbers of ANY size, that it is optimized for speed on some operations, and for at least some programmers the notation might be familiar. -=item B +=over 4 + +=item How do I convert hexadecimal into decimal Using perl's built in conversion of 0x notation: - $int = 0xDEADBEEF; - $dec = sprintf("%d", $int); + $dec = 0xDEADBEEF; Using the hex function: - $int = hex("DEADBEEF"); - $dec = sprintf("%d", $int); + $dec = hex("DEADBEEF"); Using pack: - $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8))); - $dec = sprintf("%d", $int); + $dec = unpack("N", pack("H8", substr("0" x 8 . 
"DEADBEEF", -8))); Using the CPAN module Bit::Vector: @@ -158,17 +155,18 @@ Using the CPAN module Bit::Vector: $vec = Bit::Vector->new_Hex(32, "DEADBEEF"); $dec = $vec->to_Dec(); -=item B +=item How do I convert from decimal to hexadecimal -Using sprint: +Using sprintf: - $hex = sprintf("%X", 3735928559); + $hex = sprintf("%X", 3735928559); # upper case A-F + $hex = sprintf("%x", 3735928559); # lower case a-f -Using unpack +Using unpack: $hex = unpack("H*", pack("N", 3735928559)); -Using Bit::Vector +Using Bit::Vector: use Bit::Vector; $vec = Bit::Vector->new_Dec(32, -559038737); @@ -181,17 +179,15 @@ And Bit::Vector supports odd bit counts: $vec->Resize(32); # suppress leading 0 if unwanted $hex = $vec->to_Hex(); -=item B +=item How do I convert from octal to decimal Using Perl's built in conversion of numbers with leading zeros: - $int = 033653337357; # note the leading 0! - $dec = sprintf("%d", $int); + $dec = 033653337357; # note the leading 0! Using the oct function: - $int = oct("33653337357"); - $dec = sprintf("%d", $int); + $dec = oct("33653337357"); Using Bit::Vector: @@ -200,30 +196,35 @@ Using Bit::Vector: $vec->Chunk_List_Store(3, split(//, reverse "33653337357")); $dec = $vec->to_Dec(); -=item B +=item How do I convert from decimal to octal Using sprintf: $oct = sprintf("%o", 3735928559); -Using Bit::Vector +Using Bit::Vector: use Bit::Vector; $vec = Bit::Vector->new_Dec(32, -559038737); $oct = reverse join('', $vec->Chunk_List_Read(3)); -=item B +=item How do I convert from binary to decimal Perl 5.6 lets you write binary numbers directly with the 0b notation: - $number = 0b10110110; + $number = 0b10110110; -Using pack and ord +Using oct: + + my $input = "10110110"; + $decimal = oct( "0b$input" ); + +Using pack and ord: $decimal = ord(pack('B8', '10110110')); -Using pack and unpack for larger strings +Using pack and unpack for larger strings: $int = unpack("N", pack("B32", substr("0" x 32 . 
"11110101011011011111011101111", -32))); @@ -236,9 +237,13 @@ Using Bit::Vector: $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111"); $dec = $vec->to_Dec(); -=item B +=item How do I convert from decimal to binary + +Using sprintf (perl 5.6+): -Using unpack; + $bin = sprintf("%b", 3735928559); + +Using unpack: $bin = unpack("B*", pack("N", 3735928559)); @@ -251,6 +256,7 @@ Using Bit::Vector: The remaining transformations (e.g. hex -> oct, bin -> hex, etc.) are left as an exercise to the inclined reader. +=back =head2 Why doesn't & work the way I want it to? @@ -261,7 +267,7 @@ C<00110011>). The operators work with the binary form of a number (the number C<3> is treated as the bit pattern C<00000011>). So, saying C<11 & 3> performs the "and" operation on numbers (yielding -C<1>). Saying C<"11" & "3"> performs the "and" operation on strings +C<3>). Saying C<"11" & "3"> performs the "and" operation on strings (yielding C<"1">). Most problems with C<&> and C<|> arise because the programmer thinks @@ -332,14 +338,17 @@ Get the http://www.cpan.org/modules/by-module/Roman module. If you're using a version of Perl before 5.004, you must call C once at the start of your program to seed the random number generator. + + BEGIN { srand() if $] < 5.004 } + 5.004 and later automatically call C at the beginning. Don't -call C more than once--you make your numbers less random, rather +call C more than once---you make your numbers less random, rather than more. Computers are good at being predictable and bad at being random (despite appearances caused by bugs in your programs :-). see the -F artitcle in the "Far More Than You Ever Wanted To Know" -collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of +F article in the "Far More Than You Ever Wanted To Know" +collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of Tom Phoenix, talks more about this. 
John von Neumann said, ``Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.'' @@ -353,9 +362,20 @@ pseudorandom generator than comes with your operating system, look at =head2 How do I get a random number between X and Y? -Use the following simple function. It selects a random integer between -(and possibly including!) the two given integers, e.g., -C +C<rand($x)> returns a number such that +C<< 0 <= rand($x) < $x >>. Thus what you want to have perl +figure out is a random number in the range from 0 to the +difference between your I<X> and I<Y>. + +That is, to get a number between 10 and 15, inclusive, you +want a random number between 0 and 5 that you can then add +to 10. + + my $number = 10 + int rand( 15-10+1 ); + +Hence you derive the following simple function to abstract +that. It selects a random integer between the two given +integers (inclusive). For example: C<random_int_in(50,120)>. sub random_int_in ($$) { my($min, $max) = @_; @@ -367,29 +387,51 @@ C =head1 Data: Dates -=head2 How do I find the week-of-the-year/day-of-the-year? +=head2 How do I find the day or week of the year? -The day of the year is in the array returned by localtime() (see -L): +The localtime function returns the day of the year. Without an +argument localtime uses the current time. - $day_of_year = (localtime(time()))[7]; + $day_of_year = (localtime)[7]; + +The POSIX module can also format a date as the day of the year or +week of the year. + + use POSIX qw/strftime/; + my $day_of_year = strftime "%j", localtime; + my $week_of_year = strftime "%W", localtime; + +To get the day or week of the year for any date, use the Time::Local module to get +a time in epoch seconds for the argument to localtime. + + use POSIX qw/strftime/; + use Time::Local; + my $week_of_year = strftime "%W", + localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) ); + +The Date::Calc module provides two functions to calculate these. 
+ + use Date::Calc qw(Day_of_Year Week_of_Year); + my $day_of_year = Day_of_Year( 1987, 12, 18 ); + my $week_of_year = Week_of_Year( 1987, 12, 18 ); =head2 How do I find the current century or millennium? Use the following simple functions: - sub get_century { + sub get_century { return int((((localtime(shift || time))[5] + 1999))/100); - } - sub get_millennium { + } + sub get_millennium { return 1+int((((localtime(shift || time))[5] + 1899))/1000); - } + } -On some systems, you'll find that the POSIX module's strftime() function -has been extended in a non-standard way to use a C<%C> format, which they -sometimes claim is the "century". It isn't, because on most such systems, -this is only the first two digits of the four-digit year, and thus cannot -be used to reliably determine the current century or millennium. +On some systems, the POSIX module's strftime() function has +been extended in a non-standard way to use a C<%C> format, +which they sometimes claim is the "century". It isn't, +because on most such systems, this is only the first two +digits of the four-digit year, and thus cannot be used to +reliably determine the current century or millennium. =head2 How can I compare two dates and find the difference? @@ -435,58 +477,60 @@ modules. (Thanks to David Cassell for most of this text.) =head2 How do I find yesterday's date? -The C function returns the current time in seconds since the -epoch. Take twenty-four hours off that: +If you only need to find the date (and not the same time), you +can use the Date::Calc module. - $yesterday = time() - ( 24 * 60 * 60 ); + use Date::Calc qw(Today Add_Delta_Days); -Then you can pass this to C and get the individual year, -month, day, hour, minute, seconds values. + my @date = Add_Delta_Days( Today(), -1 ); -Note very carefully that the code above assumes that your days are -twenty-four hours each. For most people, there are two days a year -when they aren't: the switch to and from summer time throws this off. 
-A solution to this issue is offered by Russ Allbery. + print "@date\n"; + +Most people try to use the time rather than the calendar to +figure out dates, but that assumes that your days are +twenty-four hours each. For most people, there are two days +a year when they aren't: the switch to and from summer time +throws this off. Russ Allbery offers this solution. sub yesterday { - my $now = defined $_[0] ? $_[0] : time; - my $then = $now - 60 * 60 * 24; - my $ndst = (localtime $now)[8] > 0; - my $tdst = (localtime $then)[8] > 0; - $then - ($tdst - $ndst) * 60 * 60; - } - # Should give you "this time yesterday" in seconds since epoch relative to - # the first argument or the current time if no argument is given and - # suitable for passing to localtime or whatever else you need to do with - # it. $ndst is whether we're currently in daylight savings time; $tdst is - # whether the point 24 hours ago was in daylight savings time. If $tdst - # and $ndst are the same, a boundary wasn't crossed, and the correction - # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more - # from yesterday's time since we gained an extra hour while going off - # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a - # negative hour (add an hour) to yesterday's time since we lost an hour. - # - # All of this is because during those days when one switches off or onto - # DST, a "day" isn't 24 hours long; it's either 23 or 25. - # - # The explicit settings of $ndst and $tdst are necessary because localtime - # only says it returns the system tm struct, and the system tm struct at - # least on Solaris doesn't guarantee any particular positive value (like, - # say, 1) for isdst, just a positive value. And that value can - # potentially be negative, if DST information isn't available (this sub - # just treats those cases like no DST). 
- # - # Note that between 2am and 3am on the day after the time zone switches - # off daylight savings time, the exact hour of "yesterday" corresponding - # to the current hour is not clearly defined. Note also that if used - # between 2am and 3am the day after the change to daylight savings time, - # the result will be between 3am and 4am of the previous day; it's - # arguable whether this is correct. - # - # This sub does not attempt to deal with leap seconds (most things don't). - # - # Copyright relinquished 1999 by Russ Allbery - # This code is in the public domain + my $now = defined $_[0] ? $_[0] : time; + my $then = $now - 60 * 60 * 24; + my $ndst = (localtime $now)[8] > 0; + my $tdst = (localtime $then)[8] > 0; + $then - ($tdst - $ndst) * 60 * 60; + } + +Should give you "this time yesterday" in seconds since epoch relative to +the first argument or the current time if no argument is given and +suitable for passing to localtime or whatever else you need to do with +it. $ndst is whether we're currently in daylight savings time; $tdst is +whether the point 24 hours ago was in daylight savings time. If $tdst +and $ndst are the same, a boundary wasn't crossed, and the correction +will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more +from yesterday's time since we gained an extra hour while going off +daylight savings time. If $tdst is 0 and $ndst is 1, subtract a +negative hour (add an hour) to yesterday's time since we lost an hour. + +All of this is because during those days when one switches off or onto +DST, a "day" isn't 24 hours long; it's either 23 or 25. + +The explicit settings of $ndst and $tdst are necessary because localtime +only says it returns the system tm struct, and the system tm struct at +least on Solaris doesn't guarantee any particular positive value (like, +say, 1) for isdst, just a positive value. 
And that value can +potentially be negative, if DST information isn't available (this sub +just treats those cases like no DST). + +Note that between 2am and 3am on the day after the time zone switches +off daylight savings time, the exact hour of "yesterday" corresponding +to the current hour is not clearly defined. Note also that if used +between 2am and 3am the day after the change to daylight savings time, +the result will be between 3am and 4am of the previous day; it's +arguable whether this is correct. + +This sub does not attempt to deal with leap seconds (most things don't). + + =head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant? @@ -554,14 +598,6 @@ a subroutine call (in list context) into a string: print "My sub returned @{[mysub(1,2,3)]} that time.\n"; -If you prefer scalar context, similar chicanery is also useful for -arbitrary expressions: - - print "That yields ${\($n + 5)} widgets\n"; - -Version 5.004 of Perl had a bug that gave list context to the -expression in C<${...}>, but this is fixed in version 5.005. - See also ``How can I expand variables in text strings?'' in this section of the FAQ. @@ -572,8 +608,9 @@ matter how complicated. To find something between two single characters, a pattern like C will get the intervening bits in $1. For multiple ones, then something more like C would be needed. But none of these deals with -nested patterns, nor can they. For that you'll have to write a -parser. +nested patterns. For balanced expressions using C<(>, C<{>, C<[> +or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see +L. For other cases, you'll have to write a parser. If you are serious about writing a parser, there are a number of modules or oddities that will make your life a lot easier. 
There are @@ -586,7 +623,7 @@ pull out the smallest nesting parts one at a time: while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { # do something with $1 - } + } A more complicated and sneaky approach is to make Perl's regular expression engine do it for you. This is courtesy Dean Inada, and @@ -641,22 +678,24 @@ done by making a shell alias, like so: See the documentation for Text::Autoformat to appreciate its many capabilities. -=head2 How can I access/change the first N letters of a string? +=head2 How can I access or change N characters of a string? + +You can access the first characters of a string with substr(). +To get the first character, for example, start at position 0 +and grab the string of length 1. -There are many ways. If you just want to grab a copy, use -substr(): - $first_byte = substr($a, 0, 1); + $string = "Just another Perl Hacker"; + $first_char = substr( $string, 0, 1 ); # 'J' -If you want to modify part of a string, the simplest way is often to -use substr() as an lvalue: +To change part of a string, you can use the optional fourth +argument which is the replacement string. - substr($a, 0, 3) = "Tom"; + substr( $string, 13, 4, "Perl 5.8.0" ); -Although those with a pattern matching kind of thought process will -likely prefer +You can also use substr() as an lvalue. - $a =~ s/^.../Tom/; + substr( $string, 13, 4 ) = "Perl 5.8.0"; =head2 How do I change the Nth occurrence of something? @@ -749,20 +788,34 @@ case", but that's not quite accurate. Consider the proper capitalization of the movie I, for example. -=head2 How can I split a [character] delimited string except when inside -[character]? (Comma-separated files) +Damian Conway's L module provides some smart +case transformations: + + use Text::Autoformat; + my $x = "Dr. Strangelove or: How I Learned to Stop ". 
+ "Worrying and Love the Bomb"; + + print $x, "\n"; + for my $style (qw( sentence title highlight )) + { + print autoformat($x, { case => $style }), "\n"; + } + +=head2 How can I split a [character] delimited string except when inside [character]? + +Several modules can handle this sort of parsing---Text::Balanced, +Text::CSV, Text::CSV_XS, and Text::ParseWords, among others. -Take the example case of trying to split a string that is comma-separated -into its different fields. (We'll pretend you said comma-separated, not -comma-delimited, which is different and almost never what you mean.) You -can't use C<split(/,/)> because you shouldn't split if the comma is inside -quotes. For example, take a data line like this: +Take the example case of trying to split a string that is +comma-separated into its different fields. You can't use C<split(/,/)> +because you shouldn't split if the comma is inside quotes. For +example, take a data line like this: SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped" Due to the restriction of the quotes, this is a fairly complex -problem. Thankfully, we have Jeffrey Friedl, author of a highly -recommended book on regular expressions, to handle these for us. He +problem. Thankfully, we have Jeffrey Friedl, author of +I<Mastering Regular Expressions>, to handle these for us. He suggests (assuming your string is contained in $text): @new = (); @@ -775,8 +828,7 @@ suggests (assuming your string is contained in $text): If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, -C<"like \"this\"">. Unescaping them is a task addressed earlier in -this section. +C<"like \"this\"">). Alternatively, the Text::ParseWords module (part of the standard Perl distribution) lets you say: @@ -807,10 +859,10 @@ Or more nicely written as: This idiom takes advantage of the C<foreach> loop's aliasing behavior to factor out common code. 
You can do this -on several strings at once, or arrays, or even the +on several strings at once, or arrays, or even the values of a hash if you use a slice: - # trim whitespace in the scalar, the array, + # trim whitespace in the scalar, the array, # and all the values in the hash foreach ($scalar, @array, @hash{keys %hash}) { s/^\s+//; @@ -819,9 +871,6 @@ values of a hash if you use a slice: =head2 How do I pad a string with blanks or pad a number with zeroes? -(This answer contributed by Uri Guttman, with kibitzing from -Bart Lateur.) - In the following examples, C<$pad_len> is the length to which you wish to pad the string, C<$text> or C<$num> contains the string to be padded, and C<$pad_char> contains the padding character. You can use a single @@ -836,13 +885,16 @@ right with blanks and it will truncate the result to a maximum length of C<$pad_len>. # Left padding a string with blanks (no truncation): - $padded = sprintf("%${pad_len}s", $text); + $padded = sprintf("%${pad_len}s", $text); + $padded = sprintf("%*s", $pad_len, $text); # same thing # Right padding a string with blanks (no truncation): - $padded = sprintf("%-${pad_len}s", $text); + $padded = sprintf("%-${pad_len}s", $text); + $padded = sprintf("%-*s", $pad_len, $text); # same thing - # Left padding a number with 0 (no truncation): - $padded = sprintf("%0${pad_len}d", $num); + # Left padding a number with 0 (no truncation): + $padded = sprintf("%0${pad_len}d", $num); + $padded = sprintf("%0*d", $pad_len, $num); # same thing # Right padding a string with blanks using pack (will truncate): $padded = pack("A$pad_len",$text); @@ -865,19 +917,19 @@ Left and right padding with any character, modifying C<$text> directly: =head2 How do I extract selected columns from a string? Use substr() or unpack(), both documented in L. 
-If you prefer thinking in terms of columns instead of widths, +If you prefer thinking in terms of columns instead of widths, you can use this kind of thing: # determine the unpack format needed to split Linux ps output # arguments are cut columns my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72); - sub cut2fmt { + sub cut2fmt { my(@positions) = @_; my $template = ''; my $lastpos = 1; for my $place (@positions) { - $template .= "A" . ($place - $lastpos) . " "; + $template .= "A" . ($place - $lastpos) . " "; $lastpos = $place; } $template .= "A*"; @@ -915,7 +967,7 @@ be, you'd have to do this: It's probably better in the general case to treat those variables as entries in some special hash. For example: - %user_defs = ( + %user_defs = ( foo => 23, bar => 19, ); @@ -929,7 +981,7 @@ of the FAQ. The problem is that those double-quotes force stringification-- coercing numbers and references into strings--even when you don't want them to be strings. Think of it this way: double-quote -expansion is used to produce new strings. If you already +expansion is used to produce new strings. If you already have a string, why do you need more? If you get used to writing odd things like these: @@ -960,27 +1012,27 @@ that actually do care about the difference between a string and a number, such as the magical C<++> autoincrement operator or the syscall() function. -Stringification also destroys arrays. +Stringification also destroys arrays. @lines = `command`; print "@lines"; # WRONG - extra blanks print @lines; # right -=head2 Why don't my <EHERE documents work? Check for these three things: =over 4 -=item 1. There must be no space after the << part. +=item There must be no space after the EE part. -=item 2. There (probably) should be a semicolon at the end. +=item There (probably) should be a semicolon at the end. -=item 3. You can't (easily) have any space in front of the tag. +=item You can't (easily) have any space in front of the tag. 
=back -If you want to indent the text in the here document, you +If you want to indent the text in the here document, you can do this: # all in one @@ -990,7 +1042,7 @@ can do this: HERE_TARGET But the HERE_TARGET must still be flush against the margin. -If you want that indented also, you'll have to quote +If you want that indented also, you'll have to quote in the indentation. ($quote = <<' FINIS') =~ s/^\s+//gm; @@ -1085,7 +1137,7 @@ with @bad[0] = `same program that outputs several lines`; -The C pragma and the B<-w> flag will warn you about these +The C pragma and the B<-w> flag will warn you about these matters. =head2 How can I remove duplicate elements from a list or array? @@ -1241,8 +1293,8 @@ like this one. It uses the CPAN module FreezeThaw: @a = @b = ( "this", "that", [ "more", "stuff" ] ); printf "a and b contain %s arrays\n", - cmpStr(\@a, \@b) == 0 - ? "the same" + cmpStr(\@a, \@b) == 0 + ? "the same" : "different"; This approach also works for comparing hashes. Here @@ -1252,7 +1304,7 @@ we'll demonstrate two different answers: %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] ); $a{EXTRA} = \%b; - $b{EXTRA} = \%a; + $b{EXTRA} = \%a; printf "a and b contain %s hashes\n", cmpStr(\%a, \%b) == 0 ? "the same" : "different"; @@ -1267,16 +1319,37 @@ an exercise to the reader. =head2 How do I find the first array element for which a condition is true? -You can use this if you care about the index: +To find the first array element which satisfies a condition, you can +use the first() function in the List::Util module, which comes with +Perl 5.8. This example finds the first element that contains "Perl". - for ($i= 0; $i < @array; $i++) { - if ($array[$i] eq "Waldo") { - $found_index = $i; - last; - } - } + use List::Util qw(first); -Now C<$found_index> has what you want. + my $element = first { /Perl/ } @array; + +If you cannot use List::Util, you can make your own loop to do the +same thing. 
Once you find the element, you stop the loop with last. + + my $found; + foreach my $element ( @array ) + { + if( $element =~ /Perl/ ) { $found = $element; last } + } + +If you want the array index, you can iterate through the indices +and check the array element at each index until you find one +that satisfies the condition. + + my( $found, $index ) = ( undef, -1 ); + for( my $i = 0; $i < @array; $i++ ) + { + if( $array[$i] =~ /Perl/ ) + { + $found = $array[$i]; + $index = $i; + last; + } + } =head2 How do I handle linked lists? @@ -1396,65 +1469,79 @@ Here's another; let's compute spherical volumes: $_ *= (4/3) * 3.14159; # this will be constant folded } +which can also be done with map() which is made to transform +one list into another: + + @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii; + If you want to do the same thing to modify the values of the hash, you can use the C<values> function. As of Perl 5.6 the values are not copied, so if you modify $orbit (in this case), you modify the value. for $orbit ( values %orbits ) { - ($orbit **= 3) *= (4/3) * 3.14159; + ($orbit **= 3) *= (4/3) * 3.14159; } - + Prior to perl 5.6 C<values> returned copies of the values, so older perl code often contains constructions such as C<@orbits{keys %orbits}> instead of C<values %orbits> where the hash is to be modified. - + =head2 How do I select a random element from an array? Use the rand() function (see L): - # at the top of the program: - srand; # not needed for 5.004 and later - - # then later on $index = rand @array; $element = $array[$index]; -Make sure you I. -If you are calling it more than once (such as before each -call to rand), you're almost certainly doing something wrong. +Or, simply: + my $element = $array[ rand @array ]; =head2 How do I permute N elements of a list? -Here's a little program that generates all permutations -of all the words on each line of input. 
The algorithm embodied
-in the permute() function should work on any list:
-
-    #!/usr/bin/perl -n
-    # tsc-permute: permute each word of input
-    permute([split], []);
-    sub permute {
-        my @items = @{ $_[0] };
-        my @perms = @{ $_[1] };
-        unless (@items) {
-            print "@perms\n";
-        } else {
-            my(@newitems,@newperms,$i);
-            foreach $i (0 .. $#items) {
-                @newitems = @items;
-                @newperms = @perms;
-                unshift(@newperms, splice(@newitems, $i, 1));
-                permute([@newitems], [@newperms]);
-            }
+Use the List::Permutor module on CPAN. If the list is
+actually an array, try the Algorithm::Permute module (also
+on CPAN). It's written in XS code and is very efficient.
+
+    use Algorithm::Permute;
+    my @array = 'a'..'d';
+    my $p_iterator = Algorithm::Permute->new ( \@array );
+    while (my @perm = $p_iterator->next) {
+        print "next permutation: (@perm)\n";
     }
-    }
 
-Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
-module from CPAN runs at least an order of magnitude faster. If you don't
-have a C compiler (or a binary distribution of Algorithm::Permute), then
-you can use List::Permutor which is written in pure Perl, and is still
-several times faster than the algorithm above.
+For even faster execution, you could do:
+
+    use Algorithm::Permute;
+    my @array = 'a'..'d';
+    Algorithm::Permute::permute {
+        print "next permutation: (@array)\n";
+    } @array;
+
+Here's a little program that generates all permutations of
+all the words on each line of input.
The algorithm embodied
+in the permute() function is discussed in Volume 4 (still
+unpublished) of Knuth's I<The Art of Computer Programming>
+and will work on any list:
+
+    #!/usr/bin/perl -n
+    # Fischer-Krause ordered permutation generator
+
+    sub permute (&@) {
+        my $code = shift;
+        my @idx = 0..$#_;
+        while ( $code->(@_[@idx]) ) {
+            my $p = $#idx;
+            --$p while $idx[$p-1] > $idx[$p];
+            my $q = $p or return;
+            push @idx, reverse splice @idx, $p;
+            ++$q while $idx[$p-1] > $idx[$q];
+            @idx[$p-1,$q]=@idx[$q,$p-1];
+        }
+    }
+
+    permute {print"@_\n"} split;
 
 =head2 How do I sort an array by (anything)?
 
@@ -1497,8 +1584,8 @@ If you need to sort on several fields, the following paradigm is useful.
 This can be conveniently combined with precalculation of keys as given
 above.
 
-See the F<sort> artitcle article in the "Far More Than You Ever Wanted
-To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
+See the F<sort> article in the "Far More Than You Ever Wanted
+To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
 more about this approach.
 
 See also the question below on sorting hashes.
@@ -1561,13 +1648,13 @@ Or use the CPAN module Bit::Vector:
     @ints = $vector->Index_List_Read();
 
 Bit::Vector provides efficient methods for bit vector, sets of small integers
-and "big int" math.
+and "big int" math.
 
 Here's a more extensive illustration using vec():
 
     # vec demo
     $vector = "\xff\x0f\xef\xfe";
-    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
+    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
         unpack("N", $vector), "\n";
     $is_set = vec($vector, 23, 1);
     print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
@@ -1587,7 +1674,7 @@ Here's a more extensive illustration using vec():
     set_vec(0,32,17);
     set_vec(1,32,17);
 
-    sub set_vec {
+    sub set_vec {
         my ($offset, $width, $value) = @_;
         my $vector = '';
         vec($vector, $offset, $width) = $value;
@@ -1604,7 +1691,7 @@ Here's a more extensive illustration using vec():
         print "vector length in bytes: ", length($vector), "\n";
         @bytes = unpack("A8" x length($vector), $bits);
         print "bits are: @bytes\n\n";
-    }
+    }
 
 =head2 Why does defined() return true on empty arrays and hashes?
 
@@ -1671,8 +1758,8 @@ use the keys() function in a scalar context:
 
     $num_keys = keys %hash;
 
-The keys() function also resets the iterator, which means that you may
-see strange results if you use this between uses of other hash operators
+The keys() function also resets the iterator, which means that you may
+see strange results if you use this between uses of other hash operators
 such as each().
 
 =head2 How do I sort a hash (optionally by value instead of key)?
@@ -1707,15 +1794,17 @@ The Tie::IxHash module from CPAN might also be instructive.
 
 =head2 What's the difference between "delete" and "undef" with hashes?
 
-Hashes are pairs of scalars: the first is the key, the second is the
-value. The key will be coerced to a string, although the value can be
-any kind of scalar: string, number, or reference. If a key C<$key> is
-present in the array, C<exists($key)> will return true. The value for
-a given key can be C<undef>, in which case C<$array{$key}> will be
-C<undef> while C<$exists{$key}> will return true. This corresponds to
-(C<$key>, C<undef>) being in the hash.
+Hashes contain pairs of scalars: the first is the key, the
+second is the value. The key will be coerced to a string,
+although the value can be any kind of scalar: string,
+number, or reference. If a key $key is present in
+%hash, C<exists($hash{$key})> will return true. The value
+for a given key can be C<undef>, in which case
+C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
+will return true. This corresponds to (C<$key>, C<undef>)
+being in the hash.
 
-Pictures help... here's the C<%ary> table:
+Pictures help... here's the %hash table:
 
       keys  values
     +------+------+
@@ -1727,16 +1816,16 @@ Pictures help... here's the C<%ary> table:
 
 And these conditions hold
 
-    $ary{'a'}                       is true
-    $ary{'d'}                       is false
-    defined $ary{'d'}               is true
-    defined $ary{'a'}               is true
-    exists $ary{'a'}                is true (Perl5 only)
-    grep ($_ eq 'a', keys %ary)     is true
+    $hash{'a'}                      is true
+    $hash{'d'}                      is false
+    defined $hash{'d'}              is true
+    defined $hash{'a'}              is true
+    exists $hash{'a'}               is true (Perl5 only)
+    grep ($_ eq 'a', keys %hash)    is true
 
 If you now say
 
-    undef $ary{'a'}
+    undef $hash{'a'}
 
 your table now reads:
 
@@ -1751,18 +1840,18 @@ your table now reads:
 
 and these conditions now hold; changes in caps:
 
-    $ary{'a'}                       is FALSE
-    $ary{'d'}                       is false
-    defined $ary{'d'}               is true
-    defined $ary{'a'}               is FALSE
-    exists $ary{'a'}                is true (Perl5 only)
-    grep ($_ eq 'a', keys %ary)     is true
+    $hash{'a'}                      is FALSE
+    $hash{'d'}                      is false
+    defined $hash{'d'}              is true
+    defined $hash{'a'}              is FALSE
+    exists $hash{'a'}               is true (Perl5 only)
+    grep ($_ eq 'a', keys %hash)    is true
 
 Notice the last two: you have an undef value, but a defined key!
 
 Now, consider this:
 
-    delete $ary{'a'}
+    delete $hash{'a'}
 
 your table now reads:
 
@@ -1775,23 +1864,22 @@ your table now reads:
 
 and these conditions now hold; changes in caps:
 
-    $ary{'a'}                       is false
-    $ary{'d'}                       is false
-    defined $ary{'d'}               is true
-    defined $ary{'a'}               is false
-    exists $ary{'a'}                is FALSE (Perl5 only)
-    grep ($_ eq 'a', keys %ary)     is FALSE
+    $hash{'a'}                      is false
+    $hash{'d'}                      is false
+    defined $hash{'d'}              is true
+    defined $hash{'a'}              is false
+    exists $hash{'a'}               is FALSE (Perl5 only)
+    grep ($_ eq 'a', keys %hash)    is FALSE
 
 See, the whole entry is gone!
 
 =head2 Why don't my tied hashes make the defined/exists distinction?
 
-They may or may not implement the EXISTS() and DEFINED() methods
-differently. For example, there isn't the concept of undef with hashes
-that are tied to DBM* files.
This means the true/false tables above
-will give different results when used on such a hash. It also means
-that exists and defined do the same thing with a DBM* file, and what
-they end up doing is not what they do with ordinary hashes.
+This depends on the tied hash's implementation of EXISTS().
+For example, there isn't the concept of undef with hashes
+that are tied to DBM* files. This means that exists() and
+defined() do the same thing with a DBM* file, and what they
+end up doing is not what they do with ordinary hashes.
 
 =head2 How do I reset an each() operation part-way through?
 
@@ -1837,11 +1925,11 @@ it on top of either DB_File or GDBM_File.
 Use the Tie::IxHash from CPAN.
 
     use Tie::IxHash;
-    tie(%myhash, Tie::IxHash);
-    for ($i=0; $i<20; $i++) {
+    tie my %myhash, 'Tie::IxHash';
+    for (my $i=0; $i<20; $i++) {
         $myhash{$i} = 2*$i;
     }
-    @keys = keys %myhash;
+    my @keys = keys %myhash;
     # @keys = (0,1,2,3,...)
 
 =head2 Why does passing a subroutine an undefined element in a hash create it?
@@ -1897,9 +1985,7 @@ this works fine (assuming the files are found):
 
 On less elegant (read: Byzantine) systems, however, you have
 to play tedious games with "text" versus "binary" files. See
-L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
-systems are curses out of Microsoft, who seem to be committed to putting
-the backward into backward compatibility.
+L<perlfunc/"binmode"> or L<perlopentut>.
 
 If you're concerned about 8-bit ASCII data, then see L<perllocale>.
 
@@ -1920,7 +2006,17 @@ Assuming that you don't care about IEEE notations like "NaN" or
     if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
         { print "a C float\n" }
 
-If you're on a POSIX system, Perl's supports the C<POSIX::strtod>
+There are also some commonly used modules for the task.
+L<Scalar::Util> (distributed with 5.8) provides access to perl's
+internal function C<looks_like_number> for determining
+whether a variable looks like a number. L<Data::Types>
+exports functions that validate data types using both the
+above and other regular expressions.
Thirdly, there is
+C<Regexp::Common> which has regular expressions to match
+various types of numbers. Those three modules are available
+from the CPAN.
+
+If you're on a POSIX system, Perl supports the C<POSIX::strtod>
+function. Its semantics are somewhat cumbersome, so here's a C<getnum>
 wrapper function for more convenient access. This function takes a
 string and returns the number it found, or C<undef> for input that
@@ -1938,14 +2034,14 @@ if you just want to say, ``Is this a float?''
             return undef;
         } else {
             return $num;
-        }
-    }
+        }
+    }
 
-    sub is_numeric { defined getnum($_[0]) }
+    sub is_numeric { defined getnum($_[0]) }
 
-Or you could check out the String::Scanf module on CPAN instead. The
-POSIX module (part of the standard Perl distribution) provides the
-C<strtod> and C<strtol> for converting strings to double and longs,
+Or you could check out the L<String::Scanf> module on the CPAN
+instead. The POSIX module (part of the standard Perl distribution) provides
+the C<strtod> and C<strtol> functions for converting strings to doubles and longs,
 respectively.
 
 =head2 How do I keep persistent data across program calls?
@@ -1956,20 +2052,21 @@ or Storable modules from CPAN. Starting from Perl 5.8 Storable is part
 of the standard distribution. Here's one example using Storable's C<store>
 and C<retrieve> functions:
 
-    use Storable;
+    use Storable;
     store(\%hash, "filename");
 
-    # later on...
+    # later on...
     $href = retrieve("filename");        # by ref
     %hash = %{ retrieve("filename") };   # direct to hash
 
 =head2 How do I print out or copy a recursive data structure?
 
 The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
-for printing out data structures. The Storable module, found on CPAN,
-provides a function called C<dclone> that recursively copies its argument.
+for printing out data structures. The Storable module on CPAN (or the
+5.8 release of Perl) provides a function called C<dclone> that recursively
+copies its argument.
 
-    use Storable qw(dclone);
+    use Storable qw(dclone);
     $r2 = dclone($r1);
 
 Where $r1 can be a reference to any kind of data structure you'd like.
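
As an illustration that is not part of the patch itself, here is a minimal sketch of what "recursively copies" means in practice. The structure in $r1 and the names $shallow and $r2 are made up for this example; only dclone() comes from Storable. A plain hash copy still shares the inner array with the original, while the dclone() copy does not:

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical nested structure: a hash holding an array reference.
my $r1 = { name => "orbits", radii => [ 1, 2, 3 ] };

# A shallow copy duplicates only the top-level hash, so the inner
# array is still shared with $r1.
my $shallow = { %$r1 };

# dclone() copies recursively, so nothing is shared with $r1.
my $r2 = dclone($r1);

# Modify the original after both copies were taken.
push @{ $r1->{radii} }, 4;

print "original: @{ $r1->{radii} }\n";
print "shallow:  @{ $shallow->{radii} }\n";
print "deep:     @{ $r2->{radii} }\n";
```

The shallow copy sees the pushed element because it holds the same array reference; the deep copy keeps the three elements it had when it was cloned.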