From: Jarkko Hietaniemi <jhi@iki.fi>
Date: Sun, 11 Nov 2001 21:09:31 +0000 (+0000)
Subject: BOM, bom, Bom.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=042da322fd0da11e48625ad8cc61f221bb63e7f7;p=p5sagit%2Fp5-mst-13.2.git

BOM, bom, Bom.

p4raw-id: //depot/perl@12946
---
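Reviewer note (not part of the commit): the byte signatures described in
the hunk below can be checked with a small sketch. It is written in
Python only for brevity, and the names BOMS and sniff_bom are
hypothetical, not anything from perl or this patch. It assumes the input
starts at the very beginning of the stream.

```python
import codecs

# Signatures are checked longest-first, because the UTF-32LE BOM
# (FF FE 00 00) begins with the same two bytes as the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

One caveat of this ordering: the bytes FF FE 00 00 could also be a
UTF-16LE BOM followed by U+0000, so the sketch simply prefers the
UTF-32LE reading in that case.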

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 13031ff..e374854 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -711,21 +711,25 @@ is UTF-16, but you don't know which endianness?  Byte Order Marks
 (BOMs) are a solution to this.  A special character has been reserved
 in Unicode to function as a byte order marker: the character with the
 code point 0xFEFF is the BOM.
+
 The trick is that if you read a BOM, you will know the byte order,
 since if it was written on a big endian platform, you will read the
 bytes 0xFE 0xFF, but if it was written on a little endian platform,
 you will read the bytes 0xFF 0xFE.  (And if the originating platform
 was writing in UTF-8, you will read the bytes 0xEF 0xBB 0xBF.)
+
 The way this trick works is that the character with the code point
 0xFFFE is guaranteed not to be a valid Unicode character, so the
 sequence of bytes 0xFF 0xFE is unambiguously "BOM, represented in
-little-endian format" and cannot be "0xFFFE, represented in
-big-endian format".
+little-endian format" and cannot be "0xFFFE, represented in big-endian
+format".
 
 =item UTF-32, UTF-32BE, UTF-32LE
 
 The UTF-32 family is pretty much like the UTF-16 family, except that
-the units are 32-bit, and therefore the surrogate scheme is not needed.
+the units are 32-bit, and therefore the surrogate scheme is not
+needed.  The BOM signatures will be 0x00 0x00 0xFE 0xFF for BE and
+0xFF 0xFE 0x00 0x00 for LE.
 
 =item UCS-2, UCS-4