Returns the character value of the first character in the string C<s>
which is assumed to be in UTF8 encoding and no longer than C<curlen>;
-C<retlen> will be set to the length, in bytes, of that character,
-and the pointer C<s> will be advanced to the end of the character.
+C<retlen> will be set to the length, in bytes, of that character.
If C<s> does not point to a well-formed UTF8 character, the behaviour
is dependent on the value of C<flags>: if it contains UTF8_CHECK_ONLY,
it is assumed that the caller will raise a warning, and this function
-will set C<retlen> to C<-1> and return zero. If the C<flags> does not
-contain UTF8_CHECK_ONLY, the UNICODE_REPLACEMENT (0xFFFD) will be
-returned, and C<retlen> will be set to the expected length of the
-UTF-8 character in bytes. The C<flags> can also contain various flags
-to allow deviations from the strict UTF-8 encoding (see F<utf8.h>).
+will silently just set C<retlen> to C<-1> and return zero. If the
+C<flags> does not contain UTF8_CHECK_ONLY, warnings about
+malformations will be given, C<retlen> will be set to the expected
+length of the UTF-8 character in bytes, and zero will be returned.
+
+The C<flags> can also contain various flags to allow deviations from
+the strict UTF-8 encoding (see F<utf8.h>).
=cut */
}
if (retlen)
- *retlen = expectlen;
+ *retlen = expectlen ? expectlen : len;
- return UNICODE_REPLACEMENT;
+ return 0;
}
/*
Returns the character value of the first character in the string C<s>
which is assumed to be in UTF8 encoding; C<retlen> will be set to the
-length, in bytes, of that character, and the pointer C<s> will be
-advanced to the end of the character.
+length, in bytes, of that character.
If C<s> does not point to a well-formed UTF8 character, zero is
returned and retlen is set, if possible, to -1.