X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq4.pod;h=22ea1ae5814fb300ba5b04db36d4c2761341caf6;hb=58103a2e295c15d87c7ce0bd8dd83d7e110adac4;hp=7c616aca3d28b7f89e785a792d2ddad199e2f6b1;hpb=49d635f9372392ae44fe4c5b62b06e41912ae0c9;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index 7c616ac..22ea1ae 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 1.37 $, $Date: 2002/11/13 06:04:00 $) +perlfaq4 - Data Manipulation ($Revision: 1.73 $, $Date: 2005/12/31 00:54:37 $) =head1 DESCRIPTION @@ -22,12 +22,30 @@ representations and conversions. To limit the number of decimal places in your numbers, you can use the printf or sprintf function. See the -L for more details. +L<"Floating Point Arithmetic"|perlop> for more details. printf "%.2f", 10/3; - + my $number = sprintf "%.2f", 10/3; - + +=head2 Why is int() broken? + +Your int() is most probably working just fine. It's the numbers that +aren't quite what you think. + +First, see the above item "Why am I getting long decimals +(eg, 19.9499999999999) instead of the numbers I should be getting +(eg, 19.95)?". + +For example, this + + print int(0.6/0.2-2), "\n"; + +will in most computers print 0, not 1, because even such simple +numbers as 0.6 and 0.2 cannot be presented exactly by floating-point +numbers. What you think in the above as 'three' is really more like +2.9999999999999995559. + =head2 Why isn't my octal data interpreted correctly? Perl only understands octal and hex numbers as such when they occur as @@ -43,13 +61,13 @@ The inverse mapping from decimal to octal can be done with either the "%o" or "%O" sprintf() formats. This problem shows up most often when people try using chmod(), mkdir(), -umask(), or sysopen(), which by widespread tradition typically take +umask(), or sysopen(), which by widespread tradition typically take permissions in octal. 
chmod(644, $file); # WRONG chmod(0644, $file); # right -Note the mistake in the first line was specifying the decimal literal +Note the mistake in the first line was specifying the decimal literal 644, rather than the intended octal literal 0644. The problem can be seen with: @@ -57,7 +75,7 @@ be seen with: Surely you had not intended C - did you? If you want to use numeric literals as arguments to chmod() et al. then please -try to express them as octal constants, that is with a leading zero and +try to express them as octal constants, that is with a leading zero and with the following digits restricted to the set 0..7. =head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions? @@ -94,7 +112,7 @@ alternation: for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i} - 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7 + 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7 0.8 0.8 0.9 0.9 1.0 1.0 Don't blame Perl. It's the same as in C. IEEE says we have to do this. @@ -102,7 +120,7 @@ Perl numbers whose absolute values are integers under 2**31 (on 32 bit machines) will work pretty much like mathematical integers. Other numbers are not guaranteed. -=head2 How do I convert between numeric representations? +=head2 How do I convert between numeric representations/bases/radixes? As always with Perl there is more than one way to do it. Below are a few examples of approaches to making common conversions @@ -121,18 +139,15 @@ programmers the notation might be familiar. Using perl's built in conversion of 0x notation: - $int = 0xDEADBEEF; - $dec = sprintf("%d", $int); + $dec = 0xDEADBEEF; Using the hex function: - $int = hex("DEADBEEF"); - $dec = sprintf("%d", $int); + $dec = hex("DEADBEEF"); Using pack: - $int = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8))); - $dec = sprintf("%d", $int); + $dec = unpack("N", pack("H8", substr("0" x 8 . 
"DEADBEEF", -8))); Using the CPAN module Bit::Vector: @@ -142,15 +157,16 @@ Using the CPAN module Bit::Vector: =item How do I convert from decimal to hexadecimal -Using sprint: +Using sprintf: - $hex = sprintf("%X", 3735928559); + $hex = sprintf("%X", 3735928559); # upper case A-F + $hex = sprintf("%x", 3735928559); # lower case a-f -Using unpack +Using unpack: $hex = unpack("H*", pack("N", 3735928559)); -Using Bit::Vector +Using Bit::Vector: use Bit::Vector; $vec = Bit::Vector->new_Dec(32, -559038737); @@ -167,13 +183,11 @@ And Bit::Vector supports odd bit counts: Using Perl's built in conversion of numbers with leading zeros: - $int = 033653337357; # note the leading 0! - $dec = sprintf("%d", $int); + $dec = 033653337357; # note the leading 0! Using the oct function: - $int = oct("33653337357"); - $dec = sprintf("%d", $int); + $dec = oct("33653337357"); Using Bit::Vector: @@ -188,7 +202,7 @@ Using sprintf: $oct = sprintf("%o", 3735928559); -Using Bit::Vector +Using Bit::Vector: use Bit::Vector; $vec = Bit::Vector->new_Dec(32, -559038737); @@ -199,13 +213,18 @@ Using Bit::Vector Perl 5.6 lets you write binary numbers directly with the 0b notation: - $number = 0b10110110; + $number = 0b10110110; + +Using oct: -Using pack and ord + my $input = "10110110"; + $decimal = oct( "0b$input" ); + +Using pack and ord: $decimal = ord(pack('B8', '10110110')); -Using pack and unpack for larger strings +Using pack and unpack for larger strings: $int = unpack("N", pack("B32", substr("0" x 32 . "11110101011011011111011101111", -32))); @@ -220,7 +239,11 @@ Using Bit::Vector: =item How do I convert from decimal to binary -Using unpack; +Using sprintf (perl 5.6+): + + $bin = sprintf("%b", 3735928559); + +Using unpack: $bin = unpack("B*", pack("N", 3735928559)); @@ -316,7 +339,7 @@ Get the http://www.cpan.org/modules/by-module/Roman module. If you're using a version of Perl before 5.004, you must call C once at the start of your program to seed the random number generator. 
- BEGIN { srand() if $[ < 5.004 } + BEGIN { srand() if $] < 5.004 } 5.004 and later automatically call C at the beginning. Don't call C more than once---you make your numbers less random, rather @@ -326,22 +349,33 @@ Computers are good at being predictable and bad at being random (despite appearances caused by bugs in your programs :-). see the F article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of -Tom Phoenix, talks more about this. John von Neumann said, ``Anyone +Tom Phoenix, talks more about this. John von Neumann said, "Anyone who attempts to generate random numbers by deterministic means is, of -course, living in a state of sin.'' +course, living in a state of sin." If you want numbers that are more random than C with C provides, you should also check out the Math::TrulyRandom module from CPAN. It uses the imperfections in your system's timer to generate random numbers, but this takes quite a while. If you want a better pseudorandom generator than comes with your operating system, look at -``Numerical Recipes in C'' at http://www.nr.com/ . +"Numerical Recipes in C" at http://www.nr.com/ . =head2 How do I get a random number between X and Y? -Use the following simple function. It selects a random integer between -(and possibly including!) the two given integers, e.g., -C +C returns a number such that +C<< 0 <= rand($x) < $x >>. Thus what you want to have perl +figure out is a random number in the range from 0 to the +difference between your I and I. + +That is, to get a number between 10 and 15, inclusive, you +want a random number between 0 and 5 that you can then add +to 10. + + my $number = 10 + int rand( 15-10+1 ); + +Hence you derive the following simple function to abstract +that. It selects a random integer between the two given +integers (inclusive), For example: C. 
sub random_int_in ($$) { my($min, $max) = @_; @@ -353,31 +387,45 @@ C =head1 Data: Dates -=head2 How do I find the week-of-the-year/day-of-the-year? +=head2 How do I find the day or week of the year? + +The localtime function returns the day of the year, counting from 0, +as element 7 of its list. Without an argument localtime uses the current time. -The day of the year is in the array returned by localtime() (see -L): + $day_of_year = (localtime)[7]; - $day_of_year = (localtime(time()))[7]; +The POSIX module can also format a date as the day of the year or +week of the year. + + use POSIX qw/strftime/; + my $day_of_year = strftime "%j", localtime; + my $week_of_year = strftime "%W", localtime; + +To get the day or week of the year for any date, use the Time::Local module to get +a time in epoch seconds for the argument to localtime. + + use POSIX qw/strftime/; + use Time::Local; + my $week_of_year = strftime "%W", + localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) ); + +The Date::Calc module provides two functions to calculate these. + + use Date::Calc qw( Day_of_Year Week_of_Year ); + my $day_of_year = Day_of_Year( 1987, 12, 18 ); + my $week_of_year = Week_of_Year( 1987, 12, 18 ); =head2 How do I find the current century or millennium? Use the following simple functions: - sub get_century { + sub get_century { return int((((localtime(shift || time))[5] + 1999))/100); - } - sub get_millennium { - return 1+int((((localtime(shift || time))[5] + 1899))/1000); - } - -You can also use the POSIX strftime() function which may be a bit -slower but is easier to read and maintain. + } - use POSIX qw/strftime/; - - my $week_of_the_year = strftime "%W", localtime; - my $day_of_the_year = strftime "%j", localtime; + sub get_millennium { + return 1+int((((localtime(shift || time))[5] + 1899))/1000); + } On some systems, the POSIX module's strftime() function has been extended in a non-standard way to use a C<%C> format, @@ -388,15 +436,12 @@ reliably determine the current century or millennium. =head2 How can I compare two dates and find the difference? 
-If you're storing your dates as epoch seconds then simply subtract one -from the other. If you've got a structured date (distinct year, day, -month, hour, minute, seconds values), then for reasons of accessibility, -simplicity, and efficiency, merely use either timelocal or timegm (from -the Time::Local module in the standard distribution) to reduce structured -dates to epoch seconds. However, if you don't know the precise format of -your dates, then you should probably use either of the Date::Manip and -Date::Calc modules from CPAN before you go hacking up your own parsing -routine to handle arbitrary date formats. +(contributed by brian d foy) + +You could just store all your dates as a number and then subtract. Life +isn't always that simple though. If you want to work with formatted +dates, the Date::Manip, Date::Calc, or DateTime modules can help you. + =head2 How can I take a string and turn it into epoch seconds? @@ -407,83 +452,56 @@ and Date::Manip modules from CPAN. =head2 How can I find the Julian Day? -Use the Time::JulianDay module (part of the Time-modules bundle -available from CPAN.) - -Before you immerse yourself too deeply in this, be sure to verify that -it is the I Day you really want. Are you interested in a way -of getting serial days so that you just can tell how many days they -are apart or so that you can do also other date arithmetic? If you -are interested in performing date arithmetic, this can be done using -modules Date::Manip or Date::Calc. - -There is too many details and much confusion on this issue to cover in -this FAQ, but the term is applied (correctly) to a calendar now -supplanted by the Gregorian Calendar, with the Julian Calendar failing -to adjust properly for leap years on centennial years (among other -annoyances). 
The term is also used (incorrectly) to mean: [1] days in -the Gregorian Calendar; and [2] days since a particular starting time -or `epoch', usually 1970 in the Unix world and 1980 in the -MS-DOS/Windows world. If you find that it is not the first meaning -that you really want, then check out the Date::Manip and Date::Calc -modules. (Thanks to David Cassell for most of this text.) +(contributed by brian d foy and Dave Cross) + +You can use the Time::JulianDay module available on CPAN. Ensure that +you really want to find a Julian day, though, as many people have +different ideas about Julian days. See +http://www.hermetic.ch/cal_stud/jdn.htm for instance. + +You can also try the DateTime module, which can convert a date/time +to a Julian Day. + + $ perl -MDateTime -le'print DateTime->today->jd' + 2453401.5 + +Or the modified Julian Day + + $ perl -MDateTime -le'print DateTime->today->mjd' + 53401 + +Or even the day of the year (which is what some people think of as a +Julian day) + + $ perl -MDateTime -le'print DateTime->today->doy' + 31 =head2 How do I find yesterday's date? -If you only need to find the date (and not the same time), you -can use the Date::Calc module. +(contributed by brian d foy) - use Date::Calc qw(Today Add_Delta_Days); - - my @date = Add_Delta_Days( Today(), -1 ); - - print "@date\n"; +Use one of the Date modules. The C<DateTime> module makes it simple, and +gives you the same time of day, only the day before. -Most people try to use the time rather than the calendar to -figure out dates, but that assumes that your days are -twenty-four hours each. For most people, there are two days -a year when they aren't: the switch to and from summer time -throws this off. Russ Allbery offers this solution. - - sub yesterday { - my $now = defined $_[0] ? 
$_[0] : time; - my $then = $now - 60 * 60 * 24; - my $ndst = (localtime $now)[8] > 0; - my $tdst = (localtime $then)[8] > 0; - $then - ($tdst - $ndst) * 60 * 60; - } - -Should give you "this time yesterday" in seconds since epoch relative to -the first argument or the current time if no argument is given and -suitable for passing to localtime or whatever else you need to do with -it. $ndst is whether we're currently in daylight savings time; $tdst is -whether the point 24 hours ago was in daylight savings time. If $tdst -and $ndst are the same, a boundary wasn't crossed, and the correction -will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more -from yesterday's time since we gained an extra hour while going off -daylight savings time. If $tdst is 0 and $ndst is 1, subtract a -negative hour (add an hour) to yesterday's time since we lost an hour. - -All of this is because during those days when one switches off or onto -DST, a "day" isn't 24 hours long; it's either 23 or 25. - -The explicit settings of $ndst and $tdst are necessary because localtime -only says it returns the system tm struct, and the system tm struct at -least on Solaris doesn't guarantee any particular positive value (like, -say, 1) for isdst, just a positive value. And that value can -potentially be negative, if DST information isn't available (this sub -just treats those cases like no DST). - -Note that between 2am and 3am on the day after the time zone switches -off daylight savings time, the exact hour of "yesterday" corresponding -to the current hour is not clearly defined. Note also that if used -between 2am and 3am the day after the change to daylight savings time, -the result will be between 3am and 4am of the previous day; it's -arguable whether this is correct. - -This sub does not attempt to deal with leap seconds (most things don't). 
+ use DateTime; + + my $yesterday = DateTime->now->subtract( days => 1 ); + + print "Yesterday was $yesterday\n"; + +You can also use the C<Date::Calc> module using its Today_and_Now +function. + use Date::Calc qw( Today_and_Now Add_Delta_DHMS ); + my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 ); + + print "@date_time\n"; + +Most people try to use the time rather than the calendar to figure out +dates, but that assumes that days are twenty-four hours each. For +most people, there are two days a year when they aren't: the switch to +and from summer time throws this off. Let the modules do the work. =head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant? @@ -511,21 +529,28 @@ C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00 That doesn't mean that Perl can't be used to create non-Y2K compliant programs. It can. But so can your pencil. It's the fault of the user, -not the language. At the risk of inflaming the NRA: ``Perl doesn't -break Y2K, people do.'' See http://language.perl.com/news/y2k.html for +not the language. At the risk of inflaming the NRA: "Perl doesn't +break Y2K, people do." See http://www.perl.org/about/y2k.html for a longer exposition. =head1 Data: Strings =head2 How do I validate input? -The answer to this question is usually a regular expression, perhaps -with auxiliary logic. See the more specific questions (numbers, mail -addresses, etc.) for details. +(contributed by brian d foy) + +There are many ways to ensure that values are what you expect or +want to accept. Besides the specific examples that we cover in the +perlfaq, you can also look at the modules with "Assert" and "Validate" +in their names, along with other modules such as C. + +Some modules have validation for particular types of input, such +as C, C, C, +and C. =head2 How do I unescape a string? -It depends just what you mean by ``escape''. URL escapes are dealt +It depends just what you mean by "escape". URL escapes are dealt with in L. 
Shell escapes with the backslash (C<\>) character are removed with @@ -535,24 +560,69 @@ This won't expand C<"\n"> or C<"\t"> or any other special escapes. =head2 How do I remove consecutive pairs of characters? -To turn C<"abbcccd"> into C<"abccd">: +(contributed by brian d foy) - s/(.)\1/$1/g; # add /s to include newlines +You can use the substitution operator to find pairs of characters (or +runs of characters) and replace them with a single instance. In this +substitution, we find a character in C<(.)>. The memory parentheses +store the matched character in the back-reference C<\1> and we use +that to require that the same thing immediately follow it. We replace +that part of the string with the character in C<$1>. -Here's a solution that turns "abbcccd" to "abcd": + s/(.)\1/$1/g; - y///cs; # y == tr, but shorter :-) +We can also use the transliteration operator, C<tr///>. In this +example, the search list side of our C<tr///> contains nothing, but +the C<c> option complements that so it contains everything. The +replacement list also contains nothing, so the transliteration is +almost a no-op since it won't do any replacements (or more exactly, +replace the character with itself). However, the C<s> option squashes +duplicated and consecutive characters in the string so a character +does not show up next to itself: + + my $str = 'Haarlem'; # in the Netherlands + $str =~ tr///cs; # Now Harlem, like in New York =head2 How do I expand function calls in a string? -This is documented in L. In general, this is fraught with -quoting and readability problems, but it is possible. To interpolate -a subroutine call (in list context) into a string: +(contributed by brian d foy) + +This is documented in L, and although it's not the easiest +thing to read, it does work. In each of these examples, we call the +function inside the braces used to dereference a reference. If we +have more than one return value, we can construct and dereference an +anonymous array. 
In this case, we call the function in list context. + + print "The time values are @{ [localtime] }.\n"; + +If we want to call the function in scalar context, we have to do a bit +more work. We can really have any code we like inside the braces, so +we simply have to end with the scalar reference, although how you do +that is up to you, and you can use code inside the braces. - print "My sub returned @{[mysub(1,2,3)]} that time.\n"; + print "The time is ${\(scalar localtime)}.\n"; -See also ``How can I expand variables in text strings?'' in this -section of the FAQ. + print "The time is ${ my $x = localtime; \$x }.\n"; + +If your function already returns a reference, you don't need to create +the reference yourself. + + sub timestamp { my $t = localtime; \$t } + + print "The time is ${ timestamp() }.\n"; + +The C<Interpolation> module can also do a lot of magic for you. You can +specify a variable name, in this case C<E>, to set up a tied hash that +does the interpolation for you. It has several other methods to do this +as well. + + use Interpolation E => 'eval'; + print "The time values are $E{localtime()}.\n"; + +In most cases, it is probably easier to simply use string concatenation, +which also forces scalar context. + + print "The time is " . localtime . ".\n"; =head2 How do I find matching/nesting anything? @@ -561,22 +631,23 @@ matter how complicated. To find something between two single characters, a pattern like C will get the intervening bits in $1. For multiple ones, then something more like C would be needed. But none of these deals with -nested patterns. For balanced expressions using C<(>, C<{>, C<[> -or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see -L. For other cases, you'll have to write a parser. +nested patterns. For balanced expressions using C<(>, C<{>, C<[> or +C<< < >> as delimiters, use the CPAN module Regexp::Common, or see +L. For other cases, you'll have to write a +parser. 
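For the balanced-delimiter case, a minimal sketch might look like the following. It uses extract_bracketed from Text::Balanced rather than Regexp::Common, a substitution made here only because Text::Balanced ships with perl 5.8 and later. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input: a nested parenthesized group followed by other text.
my $text = "(outer (inner) text) trailing";

# In list context, extract_bracketed returns the extracted substring,
# the remainder of the input, and any skipped prefix.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```

The second argument lists the bracket characters to balance, so passing '()[]{}' instead would be one way to handle mixed nesting.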
If you are serious about writing a parser, there are a number of modules or oddities that will make your life a lot easier. There are the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced; -and the byacc program. Starting from perl 5.8 the Text::Balanced -is part of the standard distribution. +and the byacc program. Starting from perl 5.8 the Text::Balanced is +part of the standard distribution. One simple destructive, inside-out approach that you might try is to pull out the smallest nesting parts one at a time: while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { # do something with $1 - } + } A more complicated and sneaky approach is to make Perl's regular expression engine do it for you. This is courtesy Dean Inada, and @@ -635,7 +706,7 @@ capabilities. You can access the first characters of a string with substr(). To get the first character, for example, start at position 0 -and grab the string of length 1. +and grab the string of length 1. $string = "Just another Perl Hacker"; @@ -645,11 +716,11 @@ To change part of a string, you can use the optional fourth argument which is the replacement string. substr( $string, 13, 4, "Perl 5.8.0" ); - + You can also use substr() as an lvalue. substr( $string, 13, 4 ) = "Perl 5.8.0"; - + =head2 How do I change the Nth occurrence of something? You have to keep track of N yourself. For example, let's say you want @@ -741,10 +812,23 @@ case", but that's not quite accurate. Consider the proper capitalization of the movie I, for example. +Damian Conway's L module provides some smart +case transformations: + + use Text::Autoformat; + my $x = "Dr. Strangelove or: How I Learned to Stop ". + "Worrying and Love the Bomb"; + + print $x, "\n"; + for my $style (qw( sentence title highlight )) + { + print autoformat($x, { case => $style }), "\n"; + } + =head2 How can I split a [character] delimited string except when inside [character]? 
Several modules can handle this sort of parsing---Text::Balanced, -Text::CVS, Text::CVS_XS, and Text::ParseWords, among others. +Text::CSV, Text::CSV_XS, and Text::ParseWords, among others. Take the example case of trying to split a string that is comma-separated into its different fields. You can't use C @@ -754,7 +838,7 @@ example, take a data line like this: SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped" Due to the restriction of the quotes, this is a fairly complex -problem. Thankfully, we have Jeffrey Friedl, author of +problem. Thankfully, we have Jeffrey Friedl, author of I, to handle these for us. He suggests (assuming your string is contained in $text): @@ -780,39 +864,54 @@ There's also a Text::CSV (Comma-Separated Values) module on CPAN. =head2 How do I strip blank space from the beginning/end of a string? -Although the simplest approach would seem to be +(contributed by brian d foy) - $string =~ s/^\s*(.*?)\s*$/$1/; +A substitution can do this for you. For a single line, you want to +replace all the leading or trailing whitespace with nothing. You +can do that with a pair of substitutions. -not only is this unnecessarily slow and destructive, it also fails with -embedded newlines. It is much faster to do this operation in two steps: + s/^\s+//; + s/\s+$//; - $string =~ s/^\s+//; - $string =~ s/\s+$//; +You can also write that as a single substitution, although it turns +out the combined statement is slower than the separate ones. That +might not matter to you, though. -Or more nicely written as: + s/^\s+|\s+$//g; - for ($string) { - s/^\s+//; - s/\s+$//; - } +In this regular expression, the alternation matches either at the +beginning or the end of the string since the anchors have a higher +precedence than the alternation. With the C<g> flag, the substitution +makes all possible matches, so it gets both. 
Remember, the trailing +newline matches the C<\s+>, and the C<$> anchor can match to the +physical end of the string, so the newline disappears too. Just add +the newline to the output, which has the added benefit of preserving +"blank" (consisting entirely of whitespace) lines which the C<^\s+> +would remove all by itself. + + while( <> ) + { + s/^\s+|\s+$//g; + print "$_\n"; + } -This idiom takes advantage of the C loop's aliasing -behavior to factor out common code. You can do this -on several strings at once, or arrays, or even the -values of a hash if you use a slice: +For a multi-line string, you can apply the regular expression +to each logical line in the string by adding the C<m> flag (for +"multi-line"). With the C<m> flag, the C<$> matches I<before> an +embedded newline, so it doesn't remove it. It still removes the +newline at the end of the string. - # trim whitespace in the scalar, the array, - # and all the values in the hash - foreach ($scalar, @array, @hash{keys %hash}) { - s/^\s+//; - s/\s+$//; - } + $string =~ s/^\s+|\s+$//gm; -=head2 How do I pad a string with blanks or pad a number with zeroes? +Remember that lines consisting entirely of whitespace will disappear, +since the first part of the alternation can match the entire string +and replace it with nothing. If you need to keep embedded blank lines, +you have to do a little more work. Instead of matching any whitespace +(since that includes a newline), just match the other whitespace. + + $string =~ s/^[\t\f ]+|[\t\f ]+$//mg; -(This answer contributed by Uri Guttman, with kibitzing from -Bart Lateur.) +=head2 How do I pad a string with blanks or pad a number with zeroes? In the following examples, C<$pad_len> is the length to which you wish to pad the string, C<$text> or C<$num> contains the string to be padded, @@ -828,13 +927,16 @@ right with blanks and it will truncate the result to a maximum length of C<$pad_len>. 
# Left padding a string with blanks (no truncation): - $padded = sprintf("%${pad_len}s", $text); + $padded = sprintf("%${pad_len}s", $text); + $padded = sprintf("%*s", $pad_len, $text); # same thing # Right padding a string with blanks (no truncation): - $padded = sprintf("%-${pad_len}s", $text); + $padded = sprintf("%-${pad_len}s", $text); + $padded = sprintf("%-*s", $pad_len, $text); # same thing - # Left padding a number with 0 (no truncation): - $padded = sprintf("%0${pad_len}d", $num); + # Left padding a number with 0 (no truncation): + $padded = sprintf("%0${pad_len}d", $num); + $padded = sprintf("%0*d", $pad_len, $num); # same thing # Right padding a string with blanks using pack (will truncate): $padded = pack("A$pad_len",$text); @@ -857,19 +959,19 @@ Left and right padding with any character, modifying C<$text> directly: =head2 How do I extract selected columns from a string? Use substr() or unpack(), both documented in L. -If you prefer thinking in terms of columns instead of widths, +If you prefer thinking in terms of columns instead of widths, you can use this kind of thing: # determine the unpack format needed to split Linux ps output # arguments are cut columns my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72); - sub cut2fmt { + sub cut2fmt { my(@positions) = @_; my $template = ''; my $lastpos = 1; for my $place (@positions) { - $template .= "A" . ($place - $lastpos) . " "; + $template .= "A" . ($place - $lastpos) . " "; $lastpos = $place; } $template .= "A*"; @@ -878,50 +980,43 @@ you can use this kind of thing: =head2 How do I find the soundex value of a string? -Use the standard Text::Soundex module distributed with Perl. -Before you do so, you may want to determine whether `soundex' is in -fact what you think it is. Knuth's soundex algorithm compresses words -into a small space, and so it does not necessarily distinguish between -two words which you might want to appear separately. 
For example, the -last names `Knuth' and `Kant' are both mapped to the soundex code K530. -If Text::Soundex does not do what you are looking for, you might want -to consider the String::Approx module available at CPAN. +(contributed by brian d foy) + +You can use the Text::Soundex module. If you want to do fuzzy or close +matching, you might also try the String::Approx, Text::Metaphone, +and Text::DoubleMetaphone modules. =head2 How can I expand variables in text strings? -Let's assume that you have a string like: +Let's assume that you have a string that contains placeholder +variables. $text = 'this has a $foo in it and a $bar'; -If those were both global variables, then this would -suffice: - - $text =~ s/\$(\w+)/${$1}/g; # no /e needed +You can use a substitution with a double evaluation. The +first /e turns C<$1> into C<$foo>, and the second /e turns +C<$foo> into its value. You may want to wrap this in an +C<eval>: if you try to get the value of an undeclared variable +while running under C<use strict>, you get a fatal error. -But since they are probably lexicals, or at least, they could -be, you'd have to do this: - - $text =~ s/(\$\w+)/$1/eeg; - die if $@; # needed /ee, not /e + eval { $text =~ s/(\$\w+)/$1/eeg }; + die if $@; It's probably better in the general case to treat those variables as entries in some special hash. For example: - %user_defs = ( + %user_defs = ( foo => 23, bar => 19, ); $text =~ s/\$(\w+)/$user_defs{$1}/g; -See also ``How do I expand function calls in a string?'' in this section -of the FAQ. - =head2 What's wrong with always quoting "$vars"? The problem is that those double-quotes force stringification-- coercing numbers and references into strings--even when you don't want them to be strings. Think of it this way: double-quote -expansion is used to produce new strings. If you already +expansion is used to produce new strings. If you already have a string, why do you need more? 
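To see the stringification for yourself, here is a small sketch (the variable names are hypothetical): quoting a reference leaves you with an ordinary string that ref() no longer recognizes as a reference.

```perl
use strict;
use warnings;

my @list   = (1, 2, 3);
my $ref    = \@list;    # a real array reference
my $quoted = "$ref";    # the quotes turn it into a plain string

# ref() returns the reference type, or an empty string for a non-reference.
print "ref(\$ref)    is '", ref($ref),    "'\n";
print "ref(\$quoted) is '", ref($quoted), "'\n";
```

Trying to dereference $quoted as an array under use strict is a fatal error, which is why the needless quotes matter.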
If you get used to writing odd things like these: @@ -952,27 +1047,27 @@ that actually do care about the difference between a string and a number, such as the magical C<++> autoincrement operator or the syscall() function. -Stringification also destroys arrays. +Stringification also destroys arrays. @lines = `command`; print "@lines"; # WRONG - extra blanks print @lines; # right -=head2 Why don't my <<HERE documents work? +=head2 Why don't my E<lt>E<lt>HERE documents work? Check for these three things: =over 4 -=item 1. There must be no space after the << part. +=item There must be no space after the E<lt>E<lt> part. -=item 2. There (probably) should be a semicolon at the end. +=item There (probably) should be a semicolon at the end. -=item 3. You can't (easily) have any space in front of the tag. +=item You can't (easily) have any space in front of the tag. =back -If you want to indent the text in the here document, you +If you want to indent the text in the here document, you can do this: # all in one @@ -982,7 +1077,7 @@ can do this: HERE_TARGET But the HERE_TARGET must still be flush against the margin. -If you want that indented also, you'll have to quote +If you want that indented also, you'll have to quote in the indentation. ($quote = <<' FINIS') =~ s/^\s+//gm; @@ -1077,64 +1172,56 @@ with @bad[0] = `same program that outputs several lines`; -The C pragma and the B<-w> flag will warn you about these +The C pragma and the B<-w> flag will warn you about these matters. =head2 How can I remove duplicate elements from a list or array? -There are several possible ways, depending on whether the array is -ordered and whether you wish to preserve the ordering. - -=over 4 - -=item a) - -If @in is sorted, and you want @out to be sorted: -(this assumes all true values in the array) +(contributed by brian d foy) - $prev = "not equal to $in[0]"; - @out = grep($_ ne $prev && ($prev = $_, 1), @in); +Use a hash. When you think the words "unique" or "duplicated", think +"hash keys". 
-This is nice in that it doesn't use much extra memory, simulating -uniq(1)'s behavior of removing only adjacent duplicates. The ", 1" -guarantees that the expression is true (so that grep picks it up) -even if the $_ is 0, "", or undef. +If you don't care about the order of the elements, you could just +create the hash then extract the keys. It's not important how you +create that hash: just that you use C<keys> to get the unique +elements. -=item b) + my %hash = map { $_, 1 } @array; + # or a hash slice: @hash{ @array } = (); + # or a foreach: $hash{$_} = 1 foreach ( @array ); -If you don't know whether @in is sorted: + my @unique = keys %hash; - undef %saw; - @out = grep(!$saw{$_}++, @in); +You can also go through each element and skip the ones you've seen +before. Use a hash to keep track. The first time the loop sees an +element, that element has no key in C<%seen>. The C<next> statement +creates the key and immediately uses its value, which is C<undef>, so +the loop continues to the C<push> and increments the value for that +key. The next time the loop sees that same element, its key exists in +the hash I<and> the value for that key is true (since it's not 0 or +undef), so the next skips that iteration and the loop goes to the next +element. 
-=item c) + my @unique = (); + my %seen = (); -Like (b), but @in contains only small integers: - - @out = grep(!$saw[$_]++, @in); - -=item d) - -A way to do (b) without any loops or greps: - - undef %saw; - @saw{@in} = (); - @out = sort keys %saw; # remove sort if undesired - -=item e) - -Like (d), but @in contains only small positive integers: - - undef @ary; - @ary[@in] = @in; - @out = grep {defined} @ary; + foreach my $elem ( @array ) + { + next if $seen{ $elem }++; + push @unique, $elem; + } -=back +You can write this more briefly using a grep, which does the +same thing. -But perhaps you should have been using a hash all along, eh? + my %seen = (); + my @unique = grep { ! 
$seen{ $_ }++ } @array; =head2 How can I tell whether a certain element is contained in a list or array? +(portions of this answer contributed by Anno Siegel) + Hearing the word "in" is an I<in>dication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't. @@ -1170,27 +1257,34 @@ quite a lot of space by using bit strings instead: Now check whether C<vec($read,$n,1)> is true for some C<$n>. -Please do not use +These methods guarantee fast individual tests but require a re-organization +of the original list or array. They only pay off if you have to test +multiple values against the same array. - ($is_there) = grep $_ eq $whatever, @array; +If you are testing only once, the standard module List::Util exports +the function C<first> for this purpose. It works by stopping once it +finds the element. It's written in C for speed, and its Perl equivalent +looks like this subroutine: -or worse yet + sub first (&@) { + my $code = shift; + foreach (@_) { + return $_ if &{$code}(); + } + undef; + } - ($is_there) = grep /$whatever/, @array; +If speed is of little concern, the common idiom uses grep in scalar context +(which returns the number of items that passed its condition) to traverse the +entire list. This does have the benefit of telling you how many matches it +found, though. -These are slow (checks every element even if the first matches), -inefficient (same reason), and potentially buggy (what if there are -regex characters in $whatever?). If you're only testing once, then -use: + my $is_there = grep $_ eq $whatever, @array; - $is_there = 0; - foreach $elt (@array) { - if ($elt eq $elt_to_find) { - $is_there = 1; - last; - } - } - if ($is_there) { ... } +If you want to actually extract the matching elements, simply use grep in +list context. + + my @matches = grep $_ eq $whatever, @array; =head2 How do I compute the difference of two arrays? 
How do I compute the intersection of two arrays? @@ -1233,8 +1327,8 @@ like this one. It uses the CPAN module FreezeThaw: @a = @b = ( "this", "that", [ "more", "stuff" ] ); printf "a and b contain %s arrays\n", - cmpStr(\@a, \@b) == 0 - ? "the same" + cmpStr(\@a, \@b) == 0 + ? "the same" : "different"; This approach also works for comparing hashes. Here @@ -1244,7 +1338,7 @@ we'll demonstrate two different answers: %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] ); $a{EXTRA} = \%b; - $b{EXTRA} = \%a; + $b{EXTRA} = \%a; printf "a and b contain %s hashes\n", cmpStr(\%a, \%b) == 0 ? "the same" : "different"; @@ -1264,32 +1358,32 @@ use the first() function in the List::Util module, which comes with Perl 5.8. This example finds the first element that contains "Perl". use List::Util qw(first); - + my $element = first { /Perl/ } @array; - + If you cannot use List::Util, you can make your own loop to do the same thing. Once you find the element, you stop the loop with last. my $found; - foreach my $element ( @array ) + foreach ( @array ) { - if( /Perl/ ) { $found = $element; last } + if( /Perl/ ) { $found = $_; last } } If you want the array index, you can iterate through the indices and check the array element at each index until you find one that satisfies the condition. - my( $found, $i ) = ( undef, -1 ); - for( $i = 0; $i < @array; $i++ ) - { - if( $array[$i] =~ /Perl/ ) - { - $found = $array[$i]; - $index = $i; - last; - } - } + my( $found, $index ) = ( undef, -1 ); + for( my $i = 0; $i < @array; $i++ ) + { + if( $array[$i] =~ /Perl/ ) + { + $found = $array[$i]; + $index = $i; + last; + } + } =head2 How do I handle linked lists? @@ -1362,7 +1456,7 @@ If not, you can use a Fisher-Yates shuffle. sub fisher_yates_shuffle { my $deck = shift; # $deck is a reference to an array my $i = @$deck; - while ($i--) { + while (--$i) { my $j = int rand ($i+1); @$deck[$i,$j] = @$deck[$j,$i]; } @@ -1398,17 +1492,17 @@ this until you have rather largish arrays. 
Use C<for>/C<foreach>: for (@lines) { - s/foo/bar/; # change that word - y/XZ/ZX/; # swap those letters + s/foo/bar/; # change that word + tr/XZ/ZX/; # swap those letters } Here's another; let's compute spherical volumes: for (@volumes = @radii) { # @volumes has changed parts - $_ **= 3; - $_ *= (4/3) * 3.14159; # this will be constant folded + $_ **= 3; + $_ *= (4/3) * 3.14159; # this will be constant folded } - + which can also be done with map() which is made to transform one list into another: @@ -1420,7 +1514,7 @@ the values are not copied, so if you modify $orbit (in this case), you modify the value. for $orbit ( values %orbits ) { - ($orbit **= 3) *= (4/3) * 3.14159; + ($orbit **= 3) *= (4/3) * 3.14159; } Prior to perl 5.6 C<values> returned copies of the values, @@ -1432,16 +1526,11 @@ the hash is to be modified. Use the rand() function (see L<perlfunc/rand>): - # at the top of the program: - srand; # not needed for 5.004 and later - - # then later on $index = rand @array; $element = $array[$index]; -Make sure you I<only call srand once per program, if then>. -If you are calling it more than once (such as before each -call to rand), you're almost certainly doing something wrong. +Or, simply: + my $element = $array[ rand @array ]; =head2 How do I permute N elements of a list? @@ -1456,6 +1545,14 @@ on CPAN). It's written in XS code and is very efficient. print "next permutation: (@perm)\n"; } +For even faster execution, you could do: + + use Algorithm::Permute; + my @array = 'a'..'d'; + Algorithm::Permute::permute { + print "next permutation: (@array)\n"; + } @array; + Here's a little program that generates all permutations of all the words on each line of input. The algorithm embodied in the permute() function is discussed in Volume 4 (still @@ -1521,7 +1618,7 @@ If you need to sort on several fields, the following paradigm is useful. This can be conveniently combined with precalculation of keys as given above. 
-See the F<sort> artitcle article in the "Far More Than You Ever Wanted +See the F<sort> article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more about this approach. @@ -1585,13 +1682,13 @@ Or use the CPAN module Bit::Vector: @ints = $vector->Index_List_Read(); Bit::Vector provides efficient methods for bit vector, sets of small integers -and "big int" math. +and "big int" math. Here's a more extensive illustration using vec(): # vec demo $vector = "\xff\x0f\xef\xfe"; - print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ", + print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ", unpack("N", $vector), "\n"; $is_set = vec($vector, 23, 1); print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n"; @@ -1611,7 +1708,7 @@ Here's a more extensive illustration using vec(): set_vec(0,32,17); set_vec(1,32,17); - sub set_vec { + sub set_vec { my ($offset, $width, $value) = @_; my $vector = ''; vec($vector, $offset, $width) = $value; @@ -1628,7 +1725,7 @@ Here's a more extensive illustration using vec(): print "vector length in bytes: ", length($vector), "\n"; @bytes = unpack("A8" x length($vector), $bits); print "bits are: @bytes\n\n"; - } + } =head2 Why does defined() return true on empty arrays and hashes? @@ -1652,19 +1749,15 @@ sorting the keys as shown in an earlier question. =head2 What happens if I add or remove keys from a hash while iterating over it? -Don't do that. :-) +(contributed by brian d foy) -[lwall] In Perl 4, you were not allowed to modify a hash at all while -iterating over it. In Perl 5 you can delete from it, but you still -can't add to it, because that might cause a doubling of the hash table, -in which half the entries get copied up to the new top half of the -table, at which point you've totally bamboozled the iterator code. -Even if the table doesn't double, there's no telling whether your new -entry will be inserted before or after the current iterator position. 
+The easy answer is "Don't do that!" -Either treasure up your changes and make them after the iterator finishes -or use keys to fetch all the old keys at once, and iterate over the list -of keys. +If you iterate through the hash with each(), you can delete the key +most recently returned without worrying about it. If you delete or add +other keys, the iterator may skip or double up on them since perl +may rearrange the hash table. See the +entry for C<each()> in L<perlfunc>. =head2 How do I look up a hash element by value? @@ -1695,33 +1788,56 @@ use the keys() function in a scalar context: $num_keys = keys %hash; -The keys() function also resets the iterator, which means that you may -see strange results if you use this between uses of other hash operators +The keys() function also resets the iterator, which means that you may +see strange results if you use this between uses of other hash operators such as each(). =head2 How do I sort a hash (optionally by value instead of key)? -Internally, hashes are stored in a way that prevents you from imposing -an order on key-value pairs. Instead, you have to sort a list of the -keys or values: +(contributed by brian d foy) + +To sort a hash, start with the keys. In this example, we give the list of +keys to the sort function which then compares them ASCIIbetically (which +might be affected by your locale settings). The output list has the keys +in ASCIIbetical order. Once we have the keys, we can go through them to +create a report which lists the keys in ASCIIbetical order. + + my @keys = sort { $a cmp $b } keys %hash; + + foreach my $key ( @keys ) + { + printf "%-20s %6d\n", $key, $hash{$key}; + } + +We could get more fancy in the C<sort> block though. Instead of +comparing the keys, we can compute a value with them and use that +value as the comparison. + +For instance, to make our report order case-insensitive, we use +the C<\L> sequence in a double-quoted string to make everything +lowercase. 
The C<sort> block then compares the lowercased +values to determine in which order to put the keys. - @keys = sort keys %hash; # sorted by key - @keys = sort { - $hash{$a} cmp $hash{$b} - } keys %hash; # and by value + my @keys = sort { "\L$a" cmp "\L$b" } keys %hash; -Here we'll do a reverse numeric sort by value, and if two keys are -identical, sort by length of key, or if that fails, by straight ASCII -comparison of the keys (well, possibly modified by your locale--see -L<perllocale>). +Note: if the computation is expensive or the hash has many elements, +you may want to look at the Schwartzian Transform to cache the +computation results. - @keys = sort { - $hash{$b} <=> $hash{$a} - || - length($b) <=> length($a) - || - $a cmp $b - } keys %hash; +If we want to sort by the hash value instead, we use the hash key +to look it up. We still get out a list of keys, but this time they +are ordered by their value. + + my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash; + +From there we can get more complex. If the hash values are the same, +we can provide a secondary sort on the hash key. + + my @keys = sort { + $hash{$a} <=> $hash{$b} + or + "\L$a" cmp "\L$b" + } keys %hash; =head2 How can I always keep my hash sorted? @@ -1862,7 +1978,7 @@ it on top of either DB_File or GDBM_File. Use the Tie::IxHash from CPAN. use Tie::IxHash; - tie my %myhash, Tie::IxHash; + tie my %myhash, 'Tie::IxHash'; for (my $i=0; $i<20; $i++) { $myhash{$i} = 2*$i; } @@ -1906,8 +2022,18 @@ in L<perltoot>. =head2 How can I use a reference as a hash key? -You can't do this directly, but you could use the standard Tie::RefHash -module distributed with Perl. +(contributed by brian d foy) + +Hash keys are strings, so you can't really use a reference as the key. +When you try to do that, perl turns the reference into its stringified +form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get back +the reference from the stringified form, at least without doing some +extra work on your own. 
Also remember that hash keys must be unique, but +two different variables can store the same reference (and those variables +can change later). + +The Tie::RefHash module, which is distributed with perl, might be what +you want. It handles that extra work. =head1 Data: Misc @@ -1943,18 +2069,22 @@ Assuming that you don't care about IEEE notations like "NaN" or if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/) { print "a C float\n" } -You can also use the L<Data::Types> module on -the CPAN, which exports functions that validate data types -using these and other regular expressions, or you can use -the C<Regexp::Common> module from CPAN which has regular -expressions to match various types of numbers. - -If you're on a POSIX system, Perl's supports the C<POSIX::strtod> +There are also some commonly used modules for the task. +L<Scalar::Util> (distributed with 5.8) provides access to perl's +internal function C<looks_like_number> for determining +whether a variable looks like a number. L<Data::Types> +exports functions that validate data types using both the +above and other regular expressions. Thirdly, there is +C<Regexp::Common> which has regular expressions to match +various types of numbers. Those three modules are available +from the CPAN. + +If you're on a POSIX system, Perl supports the C<POSIX::strtod> function. Its semantics are somewhat cumbersome, so here's a C<getnum> wrapper function for more convenient access. This function takes a string and returns the number it found, or C<undef> for input that isn't a C float. The C<is_numeric> function is a front end to C<getnum> -if you just want to say, ``Is this a float?'' +if you just want to say, "Is this a float?" 
sub getnum { use POSIX qw(strtod); @@ -1967,12 +2097,12 @@ if you just want to say, ``Is this a float?'' return undef; } else { return $num; - } - } + } + } - sub is_numeric { defined getnum($_[0]) } + sub is_numeric { defined getnum($_[0]) } -Or you could check out the L<String::Scanf> module on the CPAN +Or you could check out the L<String::Scanf> module on the CPAN instead. 
The POSIX module (part of the standard Perl distribution) provides the C<strtod> and C<strtol> for converting strings to double and longs, respectively. @@ -1985,20 +2115,21 @@ or Storable modules from CPAN. Starting from Perl 5.8 Storable is part of the standard distribution. Here's one example using Storable's C<store> and C<retrieve> functions: - use Storable; + use Storable; store(\%hash, "filename"); - # later on... + # later on... $href = retrieve("filename"); # by ref %hash = %{ retrieve("filename") }; # direct to hash =head2 How do I print out or copy a recursive data structure? The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great -for printing out data structures. The Storable module, found on CPAN, -provides a function called C<dclone> that recursively copies its argument. +for printing out data structures. The Storable module on CPAN (or the +5.8 release of Perl) provides a function called C<dclone> that recursively +copies its argument. - use Storable qw(dclone); + use Storable qw(dclone); $r2 = dclone($r1); Where $r1 can be a reference to any kind of data structure you'd like. @@ -2024,8 +2155,8 @@ the PDL module from CPAN instead--it makes number-crunching easy. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington. -All rights reserved. +Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and +other authors as noted. All rights reserved. This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.