From erik@sran8.sra.co.jp Mon Dec  9 03:29:46 1991
Received: from mcsun.EU.net by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA03007; Mon, 9 Dec 91 03:29:46 +0100
Received: from srawgw.sra.co.jp by mcsun.EU.net with SMTP;
	id AA13142 (5.65a/CWI-2.128); Mon, 9 Dec 1991 03:29:38 +0100
Received: from sranhd.sra.co.jp by srawgw.sra.co.jp (5.64WH/1.4)
	id AA06246; Mon, 9 Dec 91 11:28:26 +0900
Received: from sran8.sra.co.jp by sranhd.sra.co.jp (5.64dw/6.4J.6-BJW)
	id AA01023; Mon, 9 Dec 91 11:30:18 +0900
Received: from localhost by sran8.sra.co.jp (4.0/6.4J.6-SJX)
	id AA09952; Mon, 9 Dec 91 11:30:14 JST
Return-Path: <erik@sran8.sra.co.jp>
Message-Id: <9112090230.AA09952@sran8.sra.co.jp>
Reply-To: erik@sra.co.jp
From: erik@sra.co.jp (Erik M. van der Poel)
To: i18n@dkuug.dk, unicode@sun.com
Subject: Re: Locale specific data manipulation
Date: Mon, 09 Dec 91 11:30:08 +0900
Sender: erik@sran8.sra.co.jp
X-Charset: ASCII
X-Char-Esc: 29

Chang Hyeoungkyu writes:
> 	It is not possible to express Ideographic-to-Phonetic
> 	conversion information in the current "localedef".
> 
> 	We have two choices. The first is to have a locale-specific
> 	function. That is to say, if we call strcoll(), then it should
> 	call strcoll_ja_JP@phonetic_sort().
> 
> 	The second choice is that we extend the "localedef" to include
> 	this kind of information.

No, this is impossible. You cannot, in general, convert from Kanji
(ideographic) to Kana (phonetic), because the same Kanji can have more
than one reading. I'll give a real-life example: both "kokuritsu" and
"kunitachi" are written using the same Kanji. Therefore, you cannot
convert from these Kanji to the correct Kana unless the Kana are also
stored with the Kanji, as you describe below.
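
To make the ambiguity concrete, here is a minimal sketch in C. The
table, the struct and function names, and the ASCII placeholders
"KANJI-A" and "KANJI-B" (standing in for real Kanji strings) are all
invented for illustration; only the readings come from the examples in
this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary row: one written form, one possible reading. */
struct reading_row {
    const char *kanji;  /* ASCII placeholder for the written form */
    const char *kana;   /* romanized reading */
};

/* Toy data: "KANJI-A" has two readings, so it cannot be converted
 * back to a single Kana string without extra information. */
const struct reading_row reading_table[] = {
    { "KANJI-A", "kokuritsu" },
    { "KANJI-A", "kunitachi" },
    { "KANJI-B", "noto" },
};

/* Count how many readings the table lists for a written form.
 * A result greater than 1 means the conversion is ambiguous. */
size_t count_readings(const char *kanji)
{
    size_t n = sizeof reading_table / sizeof reading_table[0];
    size_t count = 0;

    assert(kanji != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(reading_table[i].kanji, kanji) == 0)
            count++;
    }
    return count;
}
```

Whenever count_readings() returns more than 1, no table lookup alone
can pick the right reading; the choice has to come from data stored
with the text.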


> 	In the Japanese case already described, one could retain the
> 	corresponding Kana of Kanji at input conversion time (I saw
> 	this in a letter of Glenn Adams distributed to
> 	unicode@sun.com). Such tagged-data handling would be
> 	easier if we had locale-specific routines.

Yes, of course. But you have to ensure that the correct Kana are
stored with the Kanji. The currently available Kana-Kanji converters
cannot guarantee this, though fixing that is simply a matter of
updating their dictionaries.

I'll give another real-life example. I sometimes write email to a
colleague called Noto. However, Wnn's jserver cannot convert "noto" to
the correct Kanji. Instead, I have to write "noutou", and then convert
the "nou" and "tou" separately to get the right Kanji.

If the software automatically stored the typed "noutou" together with
the Kanji for "Noto", the stored reading would be incorrect.

So I think it would be better to allow the user to input and store
both the Kana and the Kanji directly. The application can then sort
the Kanji by invoking strcoll() on the Kana, using ordinary POSIX
LC_COLLATE techniques. (I have already written a POSIX Kana LC_COLLATE
table. I can't give it to you, however.)
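
A rough sketch of that approach in C, under some assumptions: the
struct name_entry, its fields, and the two functions are hypothetical
names, and the readings would be Kana in whatever encoding the locale
uses (romanized ASCII works as a stand-in when trying it out). The only
library calls are the standard qsort() and strcoll().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: the Kanji as displayed, plus the Kana reading
 * that the user entered and stored alongside it. */
struct name_entry {
    const char *kanji;  /* display form; never examined by the sort */
    const char *kana;   /* reading; used as the sort key */
};

/* qsort() comparator: compare the stored readings only, using the
 * collation rules of the current LC_COLLATE locale. */
int compare_by_kana(const void *a, const void *b)
{
    const struct name_entry *x = a;
    const struct name_entry *y = b;

    return strcoll(x->kana, y->kana);
}

/* Sort the entries in place by their stored readings. */
void sort_by_kana(struct name_entry *entries, size_t count)
{
    assert(entries != NULL);
    qsort(entries, count, sizeof entries[0], compare_by_kana);
}
```

The application would call setlocale(LC_COLLATE, "") once at startup
so that strcoll() picks up the user's collation table. Without that
call the program runs in the "C" locale, where strcoll() orders strings
the same way strcmp() does.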

Hope this answers your (repeated) questions.


Sincerely,

Erik M. van der Poel                                      erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692