From: Jarkko Hietaniemi
Date: Sat, 20 Jan 2001 20:15:30 +0000 (+0000)
Subject: Document and test the new qu operator.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=49cb94c67d828cadfe8cac24ae5955cf752eb2df;p=p5sagit%2Fp5-mst-13.2.git

Document and test the new qu operator.

p4raw-id: //depot/perl@8485
---
diff --git a/MANIFEST b/MANIFEST
index 49813e8..4269c3c 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1557,6 +1557,7 @@ t/op/pat.t		See if esoteric patterns work
 t/op/pos.t		See if pos works
 t/op/push.t		See if push and pop work
 t/op/pwent.t		See if getpw*() functions work
+t/op/qu.t		See if qu works
 t/op/quotemeta.t	See if quotemeta works
 t/op/rand.t		See if rand works
 t/op/range.t		See if .. works
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 82086e3..9228fdb 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -96,8 +96,9 @@ than one place.
 =item Functions for SCALARs or strings
 
 C, C, C, C, C, C, C, C,
-C, C, C, C, C, C, C,
-C, C, C, C, C, C, C
+C, C, C, C, C, C, C,
+C, C, C, C, C, C, C,
+C
 
 =item Regular expressions and pattern matching
 
@@ -3469,10 +3470,12 @@ but is more efficient. Returns the new number of elements in the array.
 
 =item qr/STRING/
 
-=item qx/STRING/
+=item qu/STRING/
 
 =item qw/STRING/
 
+=item qx/STRING/
+
 Generalized quotes. See L.
 
 =item quotemeta EXPR
diff --git a/pod/perlop.pod b/pod/perlop.pod
index 0bb506d..ebe52c5 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -645,6 +645,7 @@ any pair of delimiters you choose.
     Customary  Generic        Meaning        Interpolates
         ''       q{}          Literal        no
         ""      qq{}          Literal        yes
+                qu{}          Literal        yes, Unicode
         ``      qx{}          Command        yes (unless '' is delimiter)
                 qw{}          Word list      no
         //       m{}          Pattern match  yes (unless '' is delimiter)
@@ -1011,6 +1012,44 @@ Options are:
 See L for additional information on valid syntax for STRING, and for a
 detailed look at the semantics of regular expressions.
 
+=item qw/STRING/
+
+Evaluates to a list of the words extracted out of STRING, using embedded
+whitespace as the word delimiters. It can be understood as being roughly
+equivalent to:
+
+    split(' ', q/STRING/);
+
+the difference being that it generates a real list at compile time. So
+this expression:
+
+    qw(foo bar baz)
+
+is semantically equivalent to the list:
+
+    'foo', 'bar', 'baz'
+
+Some frequently seen examples:
+
+    use POSIX qw( setlocale localeconv )
+    @EXPORT = qw( foo bar baz );
+
+A common mistake is to try to separate the words with commas or to
+put comments into a multi-line C-string. For this reason, the
+C pragma and the B<-w> switch (that is, the C<$^W> variable)
+produce warnings if the STRING contains the "," or the "#" character.
+
+=item qu/STRING/
+
+Like L but generates Unicode for characters whose code points are
+128 (0x80) or greater. Such characters can be generated using
+the \xHH (for characters 0x80..0xff, or 128..255) and \x{HHH...}
+notations (for characters 0x100 and above, or 256 and above).
+
+(In qq/STRING/, or "", both \xHH and \x{HHH...} generate
+bytes for the 0x80..0xff range (these bytes are host-dependent),
+and \x{HHH...} can be used to generate Unicode above that range.)
+
 =item qx/STRING/
 
 =item `STRING`
@@ -1092,33 +1131,6 @@ Just understand what you're getting yourself into.
 
 See L<"I/O Operators"> for more discussion.
 
-=item qw/STRING/
-
-Evaluates to a list of the words extracted out of STRING, using embedded
-whitespace as the word delimiters. It can be understood as being roughly
-equivalent to:
-
-    split(' ', q/STRING/);
-
-the difference being that it generates a real list at compile time. So
-this expression:
-
-    qw(foo bar baz)
-
-is semantically equivalent to the list:
-
-    'foo', 'bar', 'baz'
-
-Some frequently seen examples:
-
-    use POSIX qw( setlocale localeconv )
-    @EXPORT = qw( foo bar baz );
-
-A common mistake is to try to separate the words with comma or to
-put comments into a multi-line C-string. For this reason, the
-C pragma and the B<-w> switch (that is, the C<$^W> variable)
-produces warnings if the STRING contains the "," or the "#" character.
-
 =item s/PATTERN/REPLACEMENT/egimosx
 
 Searches a string for a pattern, and if found, replaces that pattern
diff --git a/pod/perlre.pod b/pod/perlre.pod
index c5ecb13..0c38ac7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -179,6 +179,7 @@ In addition, Perl defines the following:
     \X        Match eXtended Unicode "combining character sequence",
               equivalent to C<(?:\PM\pM*)>
     \C        Match a single C char (octet) even under utf8.
+              (Currently this does not work correctly.)
 
 A C<\w> matches a single alphanumeric character or C<_>, not a whole
 word. Use C<\w+> to match a string of Perl-identifier characters (which isn't
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 30a4482..b8bbc57 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -16,7 +16,8 @@ The following areas need further work.
 
 There is currently no easy way to mark data read from a file or other
 external source as being utf8. This will be one of the major areas of
-focus in the near future.
+focus in the near future. Unfortunately, it is unlikely that Perl
+5.6 and earlier will ever gain this capability.
 
 =item Regular Expressions
 
@@ -66,7 +67,8 @@ or from literals and constants in the source text.
 If the C<-C> command line switch is used, (or the ${^WIDE_SYSTEM_CALLS}
 global flag is set to C<1>), all system calls will use the
 corresponding wide character APIs. This is currently only implemented
-on Windows.
+on Windows, as other platforms do not have a unified way of handling
+wide character APIs.
 
 Regardless of the above, the C pragma can always be used to force
 byte semantics in a particular lexical scope. See L.
@@ -127,8 +129,7 @@ attempt to canonicalize variable names for you.)
 
 Regular expressions match characters instead of bytes. For instance,
 "." matches a character instead of a byte. (However, the C<\C> pattern
-is provided to force a match a single byte ("C" in C, hence
-C<\C>).)
+is available to force a match of a single byte ("C" in C, hence C<\C>).)
 
 =item *
 
@@ -216,7 +217,10 @@ And finally, C reverses by character rather than by byte.
 
 =head2 Character encodings for input and output
 
-[XXX: This feature is not yet implemented.]
+This feature is in the process of being implemented.
+
+(For Perl 5.6 and earlier, this support is unlikely to be integrated
+into the core language, and an external module will be required.)
 
 =head1 CAVEATS
 
diff --git a/t/op/qu.t b/t/op/qu.t
new file mode 100644
index 0000000..2800204
--- /dev/null
+++ b/t/op/qu.t
@@ -0,0 +1,24 @@
+print "1..6\n";
+
+my $foo = "foo";
+
+print "not " unless qu(abc$foo) eq "abcfoo";
+print "ok 1\n";
+
+# qu is always Unicode, even in EBCDIC, so \x41 is 'A' and \x{61} is 'a'.
+
+print "not " unless qu(abc\x41) eq "abcA";
+print "ok 2\n";
+
+print "not " unless qu(abc\x{61}$foo) eq "abcafoo";
+print "ok 3\n";
+
+print "not " unless qu(\x{41}\x{100}\x61\x{200}) eq "A\x{100}a\x{200}";
+print "ok 4\n";
+
+print "not " unless join(" ", unpack("C*", qu(\x80))) eq "194 128";
+print "ok 5\n";
+
+print "not " unless join(" ", unpack("C*", qu(\x{100}))) eq "196 128";
+print "ok 6\n";
+
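Tests 5 and 6 in t/op/qu.t expect qu(\x80) and qu(\x{100}) to unpack to the bytes "194 128" and "196 128". These are the two-byte UTF-8 encodings of U+0080 and U+0100. The Perl sketch below checks that arithmetic without using qu// itself. It builds each character with chr() and encodes it with the core utf8::encode function, which assumes perl 5.8 or later on an ASCII-based platform. It then compares the result against a hand-rolled encoder for the 110xxxxx 10xxxxxx form. The helper names two_byte_utf8 and builtin_utf8_bytes are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a code point in the range 0x80..0x7FF
# using the two-byte UTF-8 form 110xxxxx 10xxxxxx.
sub two_byte_utf8 {
    my ($cp) = @_;
    die sprintf("U+%04X is outside the two-byte range\n", $cp)
        unless $cp >= 0x80 && $cp <= 0x7FF;
    # High 5 bits go into the lead byte, low 6 bits into the continuation byte.
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F));
}

# Hypothetical helper: let perl's built-in utf8::encode produce the octets.
sub builtin_utf8_bytes {
    my ($cp) = @_;
    my $s = chr($cp);
    utf8::encode($s);    # converts the character to its UTF-8 octets in place
    return unpack("C*", $s);
}

for my $cp (0x80, 0x100, 0x200) {
    my $by_hand = join(" ", two_byte_utf8($cp));
    my $builtin = join(" ", builtin_utf8_bytes($cp));
    printf "U+%04X => %s (built-in: %s)\n", $cp, $by_hand, $builtin;
}
```

On an ASCII platform, both columns should read "194 128" for U+0080 and "196 128" for U+0100, matching the values that tests 5 and 6 compare against.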