From: Jarkko Hietaniemi
Date: Fri, 8 Dec 2000 16:33:39 +0000 (+0000)
Subject: Do not return the Unicode replacement character if UTF-8
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=28d3d195b04c40a4811db0a74e089d561a53d924;p=p5sagit%2Fp5-mst-13.2.git

Do not return the Unicode replacement character if UTF-8
decoding goes awry, it should be up to the caller to decide.

p4raw-id: //depot/perl@8042
---

diff --git a/pod/perlapi.pod b/pod/perlapi.pod
index 7fdc55e..cf7c5db 100644
--- a/pod/perlapi.pod
+++ b/pod/perlapi.pod
@@ -3279,11 +3279,13 @@ and the pointer C will be advanced to the end of the character.
 If C does not point to a well-formed UTF8 character, the behaviour
 is dependent on the value of C: if it contains UTF8_CHECK_ONLY,
 it is assumed that the caller will raise a warning, and this function
-will set C to C<-1> and return zero. If the C does not
-contain UTF8_CHECK_ONLY, the UNICODE_REPLACEMENT (0xFFFD) will be
-returned, and C will be set to the expected length of the
-UTF-8 character in bytes. The C can also contain various flags
-to allow deviations from the strict UTF-8 encoding (see F).
+will silently just set C to C<-1> and return zero. If the
+C does not contain UTF8_CHECK_ONLY, warnings about
+malformations will be given, C will be set to the expected
+length of the UTF-8 character in bytes, and zero will be returned.
+
+The C can also contain various flags to allow deviations from
+the strict UTF-8 encoding (see F).
 
 	U8* s	utf8_to_uv(STRLEN curlen, STRLEN *retlen, U32 flags)
 
diff --git a/utf8.c b/utf8.c
index e9c4386..8d812ab 100644
--- a/utf8.c
+++ b/utf8.c
@@ -189,11 +189,13 @@ and the pointer C will be advanced to the end of the character.
 If C does not point to a well-formed UTF8 character, the behaviour
 is dependent on the value of C: if it contains UTF8_CHECK_ONLY,
 it is assumed that the caller will raise a warning, and this function
-will set C to C<-1> and return zero. If the C does not
-contain UTF8_CHECK_ONLY, the UNICODE_REPLACEMENT (0xFFFD) will be
-returned, and C will be set to the expected length of the
-UTF-8 character in bytes. The C can also contain various flags
-to allow deviations from the strict UTF-8 encoding (see F).
+will silently just set C to C<-1> and return zero. If the
+C does not contain UTF8_CHECK_ONLY, warnings about
+malformations will be given, C will be set to the expected
+length of the UTF-8 character in bytes, and zero will be returned.
+
+The C can also contain various flags to allow deviations from
+the strict UTF-8 encoding (see F).
 
 =cut
 */
@@ -339,9 +341,9 @@ malformed:
     }
 
     if (retlen)
-        *retlen = expectlen;
+        *retlen = expectlen ? expectlen : len;
 
-    return UNICODE_REPLACEMENT;
+    return 0;
 }
 
 /*