From glenn@metis.com Tue Dec 22 19:38:53 1992
Received: from SAPIR.METIS.COM by dkuug.dk with SMTP id AA22817
  (5.65c8/IDA-1.4.4j for <i18n@dkuug.dk>); Wed, 23 Dec 1992 06:39:48 +0100
Received: by sapir.metis.com (4.1/METIS-4.10) id AA09301; Wed, 23 Dec 92 00:38:53 EST
Date: Wed, 23 Dec 92 00:38:53 EST
From: glenn@metis.com (Glenn Adams)
Message-Id: <9212230538.AA09301@sapir.metis.com>
To: keld@login.dkuug.dk
Cc: i18n@dkuug.dk, andrew@research.att.com
In-Reply-To: Keld Jørn Simonsen's message of Sun, 20 Dec 92 16:38:08 +0100 <9212201538.AA25436@login.dkuug.dk>
Subject: (i18n.179) plan 9 and 10646
X-Charset: ASCII
X-Char-Esc: 29


   >From: andrew@research.att.com
   >
   >firstly, we can answer what FEFF is. it is not a character as such
   >(in fact, it and FFEF are defined as never being characters).

Actually, FEFF *is* a character.  It is called ZERO WIDTH NO-BREAK SPACE
(ZWNBSP).  It may also be used as a byte order mark (BOM).  The only code
points of the 10646 BMP defined as *not a character* are FFFE and FFFF.
[FFEF in Andrew's message should be read as FFFE, the byte-swapped form
of FEFF.]
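
To make the byte order point concrete, here is a minimal sketch in Python
of how a reader might use FEFF and its byte-swapped form FFFE to guess the
byte order of a 16-bit stream.  The function names are hypothetical, made
up for illustration; nothing here is taken from 10646 or the Unicode
standard beyond the three code values.

```python
BOM = 0xFEFF          # ZERO WIDTH NO-BREAK SPACE, also usable as a byte order mark
SWAPPED_BOM = 0xFFFE  # not a character; seeing it suggests the bytes are swapped


def detect_byte_order(data: bytes) -> str:
    """Guess the byte order of a 16-bit stream from its first two bytes.

    Returns "big", "little", or "unknown" (hypothetical helper).
    """
    if len(data) < 2:
        return "unknown"
    first = int.from_bytes(data[:2], "big")  # read the first unit as big-endian
    if first == BOM:
        return "big"
    if first == SWAPPED_BOM:
        # FFFE is never a character, so the stream is most likely little-endian.
        return "little"
    return "unknown"


def is_not_a_character(code: int) -> bool:
    """True for the two BMP code values defined as not being characters."""
    return code in (0xFFFE, 0xFFFF)
```

Because FFFE can never occur as a character, finding it where FEFF was
expected is unambiguous, which is what makes FEFF workable as a signature.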

I append below the text of the Unicode 1.0.1 technical report that covers
this in a little more detail.

   B. Byte Order Mark
   U+FEFF       ZERO WIDTH NO-BREAK SPACE

   In addition to the meaning of BYTE ORDER MARK, as defined in Volume 1 of 
   the Unicode standard, the code value U+FEFF may now also be used as ZERO 
   WIDTH NO-BREAK SPACE (ZWNBSP). For convenience in discussion, it can 
   also be referred to by this name (which is the ISO 10646/Unicode 1.1 
   name for U+FEFF).

   ZWNBSP behaves like a U+00A0 NO-BREAK SPACE in that it indicates the 
   absence of word boundaries; however, ZWNBSP has no width. For example, 
   this character can be inserted after the fourth character in the text 
   "base+delta" to indicate that there should be no line break between the 
   "e" and the "+" (for more information, see Volume 2, pp. 6-7).
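
The "base+delta" example from the report can be sketched in Python as
follows.  The helper name is hypothetical, and whether a given line
breaker actually honors the character is an assumption about that
software, not something the code checks.

```python
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE


def forbid_break_after(text: str, count: int) -> str:
    """Insert ZWNBSP after the first `count` characters of `text`.

    Hypothetical helper: it only places the character; suppressing the
    line break is up to whatever renders the text.
    """
    return text[:count] + ZWNBSP + text[count:]


# After the fourth character, i.e. between the "e" and the "+".
marked = forbid_break_after("base+delta", 4)
```

The result is one character longer than the input, but since ZWNBSP has
no width it should display the same as the original text.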


Glenn Adams



