From keld@dkuug.dk Tue Dec 11 15:32:04 1990
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
	id AA01820; Tue, 11 Dec 90 15:32:04 +0100
Date: Tue, 11 Dec 90 15:32:04 +0100
From: Keld J|rn Simonsen <keld@dkuug.dk>
Message-Id: <9012111432.AA01820@dkuug.dk>
To: i18n@dkuug.dk, seki@sysrap.cs.fujitsu.co.jp
Subject: Re:  (i18n 43) Re: Japanese Profile (#420)
Cc: XoTGinter@xopen.co.uk
X-Charset: ASCII
X-Char-Esc: 29

Some comments to Sekiguchi-san's article:

> Answers and comments on the issue, as the ``author'' of the original.
> 
> [In "(i18n 33) Re: Japanese Profile"
>        Keld J|rn Simonsen <keld%dkuug.dk@uunet> writes:]
> 
> > Concerning the Japanese X/open locale:
> 
> This might be a minor issue, but please note that my definition
> is only an example, and it is not intended to be a part of any
> formal specification.  There are people who do not want to call it
> ``Japanese locale'' or ``X/Open locale.''

Clear. The locales that Danish Standards has been providing are
likewise only examples and carry no formal weight.
I do think, though, that we are trying to reach a level that is good
enough for a formal specification. We should therefore discuss the
examples that different sources have been so kind as to provide, so
that they do not conflict with general principles and can eventually
be turned into official locales.

> > X0208 as I know it does not allow to use undefined positions.
> 
> JIS X0208 has no definition on usage of those undefined positions.
> The standard has a KAISETSU (which is JIS counter part of ISO's
> ``informative annex'', I guess) saying: ``Subject to agreement
> between interchanging parties, these (undefined) area may be
> used by assigning characters, temporarily or locally.''

OK. What I have is the older JIS C 6226-1983, in the English version
obtained from ECMA. As far as I know it is technically equivalent to
the older X0208, though it may differ from the 1990 X0208. It says
that undefined positions of this code may not be used.

Another problem with having symbolic names for the undefined
positions is: what do they mean? I think different symbols should not
have the same symbolic name. So if the same code is used on one machine
for a medical term and on the other machine for a chemical term, they
should not have the same name. And if the character is present in X0212
it should use the code from X0212. 
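
To illustrate the naming problem, here is a minimal sketch in Python.
Everything in it is hypothetical: the code position 0x2D21, the
symbolic name <local-1>, and the two "machines" are placeholders made
up for this example and are not taken from any charmap or standard.
It only shows how one could detect a symbolic name that stands for
different symbols on different systems.

```python
# Hypothetical local assignments: code position -> (symbolic name, meaning).
# The position, the name, and the meanings are illustrative placeholders.
machine_a = {0x2D21: ("<local-1>", "medical term")}
machine_b = {0x2D21: ("<local-1>", "chemical term")}


def conflicting_names(*charmaps):
    """Return symbolic names that denote more than one meaning."""
    meanings = {}
    for charmap in charmaps:
        for name, meaning in charmap.values():
            # Collect every meaning seen under each symbolic name.
            meanings.setdefault(name, set()).add(meaning)
    return sorted(name for name, seen in meanings.items() if len(seen) > 1)


conflicts = conflicting_names(machine_a, machine_b)
print(conflicts)
```

A name reported here is one that two parties have used for different
symbols, which is the situation I would like the charmaps to avoid.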

> > A third questionable item is the use of collating, first the 
> > capital letters then the lowercase letters. Is that common Japanese
> > usage?
> 
> Yes and no.
> 
> In real Japanese life, alphabets are collated in Western manner.
> In typical Japanese computers, alphabets in JIS X0208 is collated
> via their internal code value, i.e., all uppercases first, then
> lowercases.
> 
> That's what happening today in Japan.

Yes, I can understand that. This is in line with common UNIX usage.

I think it is debatable if we should follow this historic trend.

The reason the example Danish locale keeps upper- and lowercase
letters together is that the Danish Standard DS 377 prescribes that
ordering. Since we did the work on locales on behalf of Danish
Standards, we were obliged to follow DS 377, and I think we
succeeded in doing so.

An ordinary UNIX user may, however, expect a different ordering,
along the lines that Sekiguchi-san has specified.
Maybe both possibilities should exist.
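
To make the two orderings concrete, here is a small sketch in Python.
It is only an illustration: the word list is made up, and the
tie-break in the second key (uppercase before lowercase when the
letters are otherwise equal) is my assumption, not a rule quoted from
DS 377 or from any of the example locales.

```python
def code_value_key(word):
    # Order by raw code values: in ASCII every uppercase letter
    # sorts before any lowercase letter.
    return [ord(ch) for ch in word]


def dictionary_key(word):
    # Order letters without regard to case first; only on a tie does
    # case matter. Uppercase-first on ties is an assumption here.
    return ([ch.lower() for ch in word], [ch.islower() for ch in word])


words = ["banana", "Apple", "apple", "Banana"]  # made-up sample data

by_code = sorted(words, key=code_value_key)
by_dictionary = sorted(words, key=dictionary_key)

print(by_code)        # ['Apple', 'Banana', 'apple', 'banana']
print(by_dictionary)  # ['Apple', 'apple', 'Banana', 'banana']
```

The first result is what a code-value collation gives; the second
keeps upper- and lowercase forms of the same letter together.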

Keld Simonsen
