From erik@sran8.sra.co.jp Mon Nov 19 07:20:48 1990
Received: from MCSUN.EU.NET by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA17110; Mon, 19 Nov 90 07:20:48 +0100
Received: by mcsun.EU.net with SMTP; Mon, 19 Nov 90 07:24:10 +0100
Received: from srava.sra.co.jp by srawgw.sra.co.jp (5.64WH/1.4)
	id AA23533; Mon, 19 Nov 90 15:23:16 +0900
Received: from sran8.sra.co.jp by srava.sra.co.jp (5.64b/6.4J.6-BJW)
	id AA19238; Mon, 19 Nov 90 15:23:19 +0900
Received: from localhost by sran8.sra.co.jp (4.0/6.4J.6-SJ)
	id AA25707; Mon, 19 Nov 90 15:21:54 JST
Return-Path: <erik@sran8.sra.co.jp>
Message-Id: <9011190622.AA25707@sran8.sra.co.jp>
Reply-To: erik@sra.co.jp
From: Erik M. van der Poel <erik@sra.co.jp>
To: glenn@ila.com
Cc: unicode@sun.com, i18n@dkuug.dk, arnet@hpda.cup.hp.com
Subject: Re: Han Character Code Ordering
Date: Mon, 19 Nov 90 15:21:50 +0900
Sender: erik@sran8.sra.co.jp
X-Charset: ASCII
X-Char-Esc: 29

>    From: Erik M. van der Poel <erik@sra.co.jp>
> 
>    String-based sorting is desirable because of the change in
>    pronunciation of a character when it is combined with other
>    characters. Example:
> 
> 	   KAZE		(1 character)	means "wind"
> 	   TAI FUU	(2 characters)	means "typhoon"
> 
>    Here, the KAZE and FUU are the same character. The implications of
>    this are staggering. Not only do we need a large dictionary with all
>    the different pronunciations, but we may in some cases also need to
>    parse sentences.
> 
> Alternatively, one could retain the yomi at input conversion time and
> annotate jiritsugo accordingly.  The annotation could be retained for
> cases where recovery would be difficult or impossible (unambiguously).
> Unfortunately, this will be impossible for most conversion interfaces
> which remove this structure.  I believe this is a good reason for
> demanding a richer conversion interface.
> 
> Glenn

Yes, I have often thought about this, and it seems like a good idea,
but I see several problems.

1. Existing unannotated text may be difficult to reverse-convert,
   especially when it's ambiguous, as you say. So you can only use your
   idea on newly converted and annotated text.

2. In some cases, it is easier to convert to a particular Kanji by
   entering a different yomi, i.e. you can convert quickly if you enter
   a yomi that is unique, so that you don't have to waste time
   disambiguating. The retained yomi would then not be the reading
   intended in context. Of course, this has a lot to do with the
   limitations of the input conversion systems of today. Nevertheless,
   this may be a problem.

3. If we are going to annotate text, we had better do it consistently, so
   that we can send each other text even if we use different input
   conversion systems. This is not a problem with the idea itself, but
   more with the actual implementation of this idea.
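
To make points 1 and 3 a bit more concrete, here is a minimal sketch in
Python of what annotated text might look like and how a sort could use
it. The Word record, its field names, and the fallback rule for
unannotated text are all assumptions of mine for illustration, not part
of any existing conversion system. The yomi is written in romaji only to
keep this message in ASCII.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    """Hypothetical annotated word: converted text plus retained yomi."""
    surface: str          # text after conversion (Kanji)
    yomi: Optional[str]   # reading kept from input conversion, or None


def sort_key(word):
    # Annotated words sort by their reading. Unannotated (existing) text
    # cannot be sorted that way, so it falls back to code order of the
    # surface form and is grouped after the annotated words.
    if word.yomi is not None:
        return (0, word.yomi)
    return (1, word.surface)


words = [
    Word("\u53f0\u98a8", "taifuu"),  # TAI FUU, "typhoon"
    Word("\u98a8", "kaze"),          # KAZE, "wind"
    Word("\u98a8", None),            # existing text, no annotation
]

ordered = sorted(words, key=sort_key)
for w in ordered:
    print(w.yomi, w.surface.encode("unicode_escape").decode("ascii"))
```

The same character appears twice with different treatment: once sorted
by its retained yomi, and once, without annotation, only by its code.
Comparing romaji strings is just a stand-in here; a real implementation
would need an agreed collation order for the yomi.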


By the way, who or what is "arnet@hpda.cup.hp.com"? It'd be kinda nice
to know who I'm sending mail to! :-)


Erik

