From keld@dkuug.dk Wed Feb 27 19:15:11 1991
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
	id AA24106; Wed, 27 Feb 91 19:15:11 +0100
Date: Wed, 27 Feb 91 19:15:11 +0100
From: Keld J|rn Simonsen <keld@dkuug.dk>
Message-Id: <9102271815.AA24106@dkuug.dk>
To: i18n@dkuug.dk, wg14@dkuug.dk
Subject: Alain LaBonte' on ideogram naming
X-Charset: ASCII
X-Char-Esc: 29

I am forwarding this as it did not seem to make it to the wg14 and i18n lists.

keld
---
From ISO10646@JHUVM.BITNET Wed Feb 27 19:00:30 1991
Received: from danpost.uni-c.dk by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA23723; Wed, 27 Feb 91 18:59:17 +0100
Received: from vm.uni-c.dk by danpost.uni-c.dk (5.65/4.7)
	id AA16833; Wed, 27 Feb 91 17:59:28 GMT
Message-Id: <9102271759.AA16833@danpost.uni-c.dk>
Received: from vm.uni-c.dk by vm.uni-c.dk (IBM VM SMTP R1.2.2MX) with BSMTP id 5989; Wed, 27 Feb 91 19:01:02 DNT
Received: from SEARN.SUNET.SE by vm.uni-c.dk (Mailer R2.07) with BSMTP id 8486;
 Wed, 27 Feb 91 19:01:02 DNT
Received: from SEARN.BITNET by SEARN.SUNET.SE (Mailer R2.05) with BSMTP id
 7215; Wed, 27 Feb 91 19:02:12 +0100
Date:         Tue, 26 Feb 91 14:28:00 GMT
Reply-To: Multi-byte Code Issues <ISO10646@JHUVM.BITNET>
Sender: Multi-byte Code Issues <ISO10646@JHUVM.BITNET>
From: ALB <ALB%SEAS@liverpool.ac.uk>
Subject:      Naming of ideograms
To: Multiple recipients <BSMTP@LIST>
In-Reply-To:  Your subject -- Re: (wg14 46) Re: (i18n 80) Re: AT&T Bell Labs
              wishes for shorthand character na
X-Charset: US-DK
X-Char-Esc: 29
Status: RO

As I said earlier, I don't think that naming ideograms would be a great idea
for conventional communication. With such a scheme, retrieving the coding of
a given shape will remain impossible for a foreigner (e.g. a German
wanting to code Kanji, a Japanese wanting to code Hanzi, or a Chinese wanting
to code Hanja), particularly if Han unification is not achieved.

I proposed a method of identifying ideograms (and Western characters as
well) based on the traditional way Chinese characters are looked up in a
dictionary, i.e. independently of phonetics. This paper is
JTC1/SC2/WG3 N125. I will probably rewrite it to improve the naming of Western
characters (and, if I have time, to draw up a list for 6937, if Johan does not
do it before me...). This naming scheme would be based on the decomposition of
radicals into their number of strokes, and structured like a dictionary. From
the shape one would compose the identifier (or a very close approximation that
can be found in an alphanumerically ordered list) and retrieve the coding (or
codings, if Han unification is not achieved). An index giving phonetics in
pinyin, hiragana, or Korean phonetics could also be produced at the same time;
why not?
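
The lookup described above can be sketched as follows. This is only an
illustration under stated assumptions: the identifier layout
(radical number, residual stroke count, sequence number) and the names
make_id, TABLE and lookup are hypothetical and are not taken from N125; the
codings shown are present-day Unicode code points, used here only as sample
data. The point is that an alphanumerically ordered list lets one land on the
exact identifier or on its closest neighbours.

```python
import bisect

def make_id(radical, residual_strokes, seq=1):
    """Build a hypothetical identifier that sorts like a dictionary index."""
    return f"R{radical:03d}-S{residual_strokes:02d}-{seq:02d}"

# Hypothetical table: identifier -> list of codings (several codings per
# identifier would appear if Han unification were not achieved).
TABLE = sorted([
    (make_id(9, 4), ["U+4F11"]),    # radical 9, 4 residual strokes
    (make_id(38, 3), ["U+597D"]),   # radical 38, 3 residual strokes
    (make_id(72, 4), ["U+660E"]),   # radical 72, 4 residual strokes
    (make_id(75, 4), ["U+6797"]),   # radical 75, 4 residual strokes
])
IDS = [ident for ident, _ in TABLE]

def lookup(ident, window=1):
    """Return the entries around the position where ident would sort.

    An exact identifier yields its own entry (plus neighbours); a close
    approximation yields the nearest entries in the ordered list.
    """
    pos = bisect.bisect_left(IDS, ident)
    lo = max(0, pos - window)
    hi = min(len(TABLE), pos + window + 1)
    return TABLE[lo:hi]
```

With this table, lookup(make_id(38, 3), window=0) returns the single exact
entry, while a slightly wrong stroke count such as make_id(72, 5) still
returns the neighbouring entries for radicals 72 and 75.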

Of course, given the coding, the identifier is obvious. If the Japanese wish
so (and the Koreans and Chinese as well), another index could be made by name
in their respective countries (national standards), but it would not be very
useful for international communication.

Also, in N125, I proposed to use Xiàndài Hànyǔ Cídiǎn (Modern
Chinese Character Dictionary) as the basic reference. To avoid diplomatic
problems, it would be more advisable to include the full reference in the
standard, together with the searching method, to make all this independent of
any particular country's usage (the Chinese have added radicals to the
traditional ones). My idea about short ids is twofold: first, to have an
identifier that is shorter than names (some characters are identified by
around 80 characters in 10646) and unambiguous (references to national
standards for ideograms point to a lot of identical names); and second, to
remove any linguistic hegemony in this naming business (the ids I propose
won't have to be translated into languages other than English, which would
decrease efficiency: of course each person will translate the identifier into
his/her own language, but English will not have to be used in other
languages).

Just to give an idea of the naming problem for ideograms: the pinyin syllable
"YI" corresponds to 80 different characters. If tone diacritics are added, the
average would be around 20 characters per case. Such naming would not be
unique; adding numbers to each case would still leave foreigners unable to
find the identifier and coding(s).
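
The collision problem can be seen on a small sample. The sketch below is an
illustration only: READINGS is a hand-picked list of ten common characters
read "yi" (tones written as digits 1-4), not the full set of 80, and the
function name group_by_name is hypothetical. It shows that a phonetic name,
with or without tone, does not single out one character.

```python
from collections import defaultdict

# Small hand-picked sample of characters read "yi"; tone given as a digit.
READINGS = [
    ("一", "yi1"), ("衣", "yi1"), ("医", "yi1"),
    ("移", "yi2"), ("疑", "yi2"),
    ("以", "yi3"), ("已", "yi3"),
    ("意", "yi4"), ("义", "yi4"), ("易", "yi4"),
]

def group_by_name(readings, with_tone):
    """Group characters under the phonetic name they would receive."""
    groups = defaultdict(list)
    for char, reading in readings:
        key = reading if with_tone else reading.rstrip("1234")
        groups[key].append(char)
    return dict(groups)

toneless = group_by_name(READINGS, with_tone=False)
toned = group_by_name(READINGS, with_tone=True)
```

Even in this sample, every toned name still covers more than one character,
so a sequence number would have to be appended, and that number cannot be
guessed from the shape.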
                             Alain LaBonté

