From ALB@immedia.ca Sat Sep 20 08:49:00 1994
Received: from Clouso.CRIM.CA by dkuug.dk with SMTP id AA25160
  (5.65c8/IDA-1.4.4j for <i18n@dkuug.dk>); Tue, 20 Sep 1994 15:55:38 +0200
Received: from immedia.ca by clouso.crim.ca (4.1/SMI-4.1)
	id AA28622; Tue, 20 Sep 94 09:55:32 EDT
Return-Path: <ALB@immedia.ca>
Received: by immedia.ca (3.2/2.D)
        id AA27961; 20 Sep 94 13:51:28 -0500
Date: 20 Sep 94 13:49:00 -0500
From: ALB@immedia.ca
Message-Id: <199409201351.AA27961@immedia.ca>
To: i18n@dkuug.dk
Cc: cpwg-mail@revcan.ca
Subject: About the relativity of character identification
X-Charset: ASCII
X-Char-Esc: 29

----------
Copy of a note I just sent to ISO10646 list server, important in my opinion.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>The point is that G WITH CEDILLA is _not_ a valid or descriptive
>name for G MI/KSTINA/JUMS. G WITH COMMA BELOW is.
>
>>If multiple names were
>>allowed then adding a new one would not break any existing code.
>
>I wholeheartedly agree that a registry of aliases should exist.

I could not agree more. I am now revising a Canadian standard which must use a
stable identifier for characters. Because of this business of LETTER or
LIGATURE or JOINED DIGRAPH AE (and perhaps others; what about OE?), which is
not settled yet, we have decided to use Uxxxx[xxxx] identifiers to relate the
names we use in both languages (French and English) to the right characters in
the UCS. Here each x is a hexadecimal digit, and the digits together give the
ISO/IEC 10646 code point, used as a catalog number to preserve coding
independence. Furthermore, as we make our standards in two languages, we
already have at least two sets of names for each character, and we will not
publish double-size documents (think about the discussion on the price of
ISO/IEC 10646) just because only one set is official (which it has no
reasonable right to be). Of course, within a single language, names in ISO
standards must agree between different standards (the names in the English
version of ISO/IEC 10646 must match the names in, say, the English version of
ISO/IEC 8859 [all parts], and the names in the French version of ISO/IEC
10646, when published, must match the names in, say, the current or corrected
French versions of ISO/IEC 8859). Names remain the best anchor point between
different standards, provided that their authority stays inside the documents
themselves, tied to the language in use. For the outside world the anchor
point should be the ISO/IEC 10646 "catalog numbers", which are given by the
UCS coding of the characters.
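
As a rough illustration of the identifier scheme described above, here is a
minimal Python sketch. The function names ucs_id and parse_ucs_id, the NAMES
table, and the English and French names in it are assumptions made for this
example only; they are not quoted from any standard.

```python
import string


def ucs_id(code_point):
    """Format a UCS code point as a Uxxxx or Uxxxxxxxx identifier."""
    # Assumption: code points are limited to the 31-bit UCS-4 range.
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS range")
    # Four hexadecimal digits when the value fits in 16 bits, eight otherwise.
    width = 4 if code_point <= 0xFFFF else 8
    return "U" + format(code_point, "X").zfill(width)


def parse_ucs_id(identifier):
    """Return the code point named by a Uxxxx or Uxxxxxxxx identifier."""
    digits = identifier[1:]
    well_formed = (
        identifier.startswith("U")
        and len(digits) in (4, 8)
        and all(d in string.hexdigits for d in digits)
    )
    if not well_formed:
        raise ValueError("not a Uxxxx[xxxx] identifier: %r" % identifier)
    return int(digits, 16)


# Illustrative bilingual name table keyed by the language-neutral identifier.
# The names themselves are placeholders, since the official ones may change.
NAMES = {
    "U00E6": {
        "en": "LATIN SMALL LETTER AE",
        "fr": "LETTRE MINUSCULE LATINE AE",
    },
}
```

Whatever happens to the English or French name, a document that cites U00E6
keeps pointing at the same character.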

Two other international standards projects that I know of will use
Uxxxx[xxxx] identifiers for another reason: the names are simply too long and
often won't fit on an Internet screen line (this creates syntax problems
everywhere, and in practice long names are not universally usable, according
to everyone I know who has tried). Keld's system of short identifiers
(provided there are also aliases, which Keld has agreed to in principle) also
helps to complement the Uxxxx[xxxx] scheme (but instead of standardizing the
short names, we use them as names of variables whose definitions use
Uxxxx[xxxx]).
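
To make the "short names as variables" idea concrete, here is a small Python
sketch. The SHORT_NAMES table, its two entries, and the resolve function are
hypothetical, chosen only for illustration; they are not meant to reproduce
Keld's actual list of short identifiers.

```python
# Hypothetical short names. Each one is only a variable whose definition is a
# Uxxxx[xxxx] identifier, so the short names need no standardization.
SHORT_NAMES = {
    "ae": "U00E6",
    "g,": "U0123",
}


def resolve(short_name):
    """Return the character a short name stands for, via its identifier."""
    identifier = SHORT_NAMES[short_name]
    # Drop the leading "U" and read the remaining digits as hexadecimal.
    return chr(int(identifier[1:], 16))
```

Two projects could pick different short names for the same character and
still agree, because both definitions reduce to the same identifier.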

Perhaps we could use Uxxxx[xxxx] all the time to make sure that different
naming or identification schemes all fit together. This would also help solve
interlanguage and intercultural problems like that of Latvian-view names of
characters in English versus Turkish-view names, also in English. In Canada we
also have problems with the names of the (unified) Native syllabic characters:
in many Native languages, the names for the same character are quite
different, and we can't even use the transliterated values, as this would mean
many transliterations for one character. If we have to do this, there shall be
aliases, no question about it.
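
A registry of aliases of the kind discussed in this thread could be sketched
in Python as follows. The AliasRegistry class and its methods are assumptions
made for this example, and the two full names are illustrative expansions of
the G WITH CEDILLA and G WITH COMMA BELOW names quoted above.

```python
class AliasRegistry:
    """Map any number of names onto one Uxxxx[xxxx] identifier each."""

    def __init__(self):
        self._by_name = {}

    def add(self, name, identifier):
        # Registering a new alias never disturbs existing names. Only an
        # attempt to rebind a known name to another character is rejected.
        known = self._by_name.setdefault(name, identifier)
        if known != identifier:
            raise ValueError("%r already names %s" % (name, known))

    def lookup(self, name):
        return self._by_name[name]

    def aliases(self, identifier):
        return sorted(n for n, i in self._by_name.items() if i == identifier)


registry = AliasRegistry()
registry.add("LATIN SMALL LETTER G WITH CEDILLA", "U0123")
registry.add("LATIN SMALL LETTER G WITH COMMA BELOW", "U0123")
```

Both names then resolve to U0123, so code written against the older name
keeps working after the more descriptive one is added.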

Alain LaBont<e'>
Secr<e'>tariat du Conseil du tr<e'>sor
Gouvernement du Qu<e'>bec
