From: Jarkko Hietaniemi
Date: Sun, 16 Dec 2001 16:12:57 +0000 (+0000)
Subject: Disallow also Unicode ranges 0xFDD0..0xFDEF and
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=3d401ffb834607a79a2bd57f63376985af7c39c2;p=p5sagit%2Fp5-mst-13.2.git

Disallow also Unicode ranges 0xFDD0..0xFDEF and
0xFFFE..0xFFFF.  Ranges 0x...FFFE..0x...FFFF in general, and
characters beyond 0x10FFFF should be disallowed, too, but
some tests would need changing, but more importantly some
APIs would need remodeling since one can easily generate
such characters either by bitwise complements, tr complements,
or v-strings.

p4raw-id: //depot/perl@13722
---

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c10d56c..7a7661a 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3603,6 +3603,10 @@ Check the #! line, or manually feed your script into Perl yourself.
 (F) The unexec() routine failed for some reason.  See your local FSF
 representative, who probably put it there in the first place.
 
+=item Unicode character %s is illegal
+
+Certain Unicode characters have been designated off-limits by the
+Unicode standard and should not be generated.
 
 =item Unknown BYTEORDER
 
diff --git a/utf8.c b/utf8.c
index 517b2e3..0979506 100644
--- a/utf8.c
+++ b/utf8.c
@@ -46,12 +46,15 @@ is the recommended Unicode-aware way of saying
 U8 *
 Perl_uvuni_to_utf8(pTHX_ U8 *d, UV uv)
 {
+    if (UNICODE_IS_SURROGATE(uv))
+        Perl_croak(aTHX_ "UTF-16 surrogate 0x%04"UVxf, uv);
+    else if ((uv >= 0xFDD0 && uv <= 0xFDEF) ||
+             (uv == 0xFFFE || uv == 0xFFFF))
+        Perl_croak(aTHX_ "Unicode character 0x%04"UVxf" is illegal", uv);
     if (UNI_IS_INVARIANT(uv)) {
 	*d++ = UTF_TO_NATIVE(uv);
 	return d;
     }
-    if (UNICODE_IS_SURROGATE(uv))
-        Perl_croak(aTHX_ "UTF-16 surrogate 0x%04"UVxf, uv);
 #if defined(EBCDIC)
     else {
 	STRLEN len = UNISKIP(uv);
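
Note on the ranges: the patch rejects only surrogates, 0xFDD0..0xFDEF, and 0xFFFE..0xFFFF, while the commit message says the general 0x...FFFE..0x...FFFF pattern and values beyond the Unicode maximum of 0x10FFFF should eventually be rejected as well. The standalone sketch below shows the difference between the two checks. It is not code from utf8.c: the function names are hypothetical, and uint32_t stands in for Perl's UV purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for UNICODE_IS_SURROGATE: U+D800..U+DFFF. */
bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* The two ranges this patch adds to Perl_uvuni_to_utf8:
 * 0xFDD0..0xFDEF and 0xFFFE..0xFFFF. */
bool is_rejected_by_patch(uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF) ||
           (cp == 0xFFFE || cp == 0xFFFF);
}

/* The broader rule the commit message describes but does not implement
 * (an assumption about how it might look): any code point whose low
 * 16 bits are 0xFFFE or 0xFFFF, plus anything above 0x10FFFF. */
bool is_rejected_broadly(uint32_t cp)
{
    return is_rejected_by_patch(cp) ||
           (cp & 0xFFFE) == 0xFFFE ||
           cp > 0x10FFFF;
}
```

Under these definitions 0x1FFFE passes the patch's check but fails the broader one, which is the gap the commit message leaves open because bitwise complements, tr complements, and v-strings can still produce such values.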