From erik@sran8.sra.co.jp Fri Apr 19 19:51:20 1991
Received: from mcsun.EU.net by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA02007; Fri, 19 Apr 91 19:51:20 +0200
Received: from [133.137.4.3] by mcsun.EU.net with SMTP;
	id AA02935 (5.65a/CWI-2.83); Fri, 19 Apr 91 16:43:01 +0200
Received: from srava.sra.co.jp by srawgw.sra.co.jp (5.64WH/1.4)
	id AA27278; Fri, 19 Apr 91 23:42:47 +0900
Received: from sran8.sra.co.jp by srava.sra.co.jp (5.64b/6.4J.6-BJW)
	id AA05229; Fri, 19 Apr 91 23:41:43 +0900
Received: from localhost by sran8.sra.co.jp (4.0/6.4J.6-SJ)
	id AA06101; Fri, 19 Apr 91 23:38:25 JST
Return-Path: <erik@sran8.sra.co.jp>
Message-Id: <9104191438.AA06101@sran8.sra.co.jp>
Reply-To: erik@sra.co.jp
From: Erik M. van der Poel <erik@sra.co.jp>
To: i18n@dkuug.dk
Cc: tut@eng.sun.com
Subject: Re: shortcomings in XPG locale
Date: Fri, 19 Apr 91 23:38:21 +0900
Sender: erik@sran8.sra.co.jp
X-Charset: ASCII
X-Char-Esc: 29

> > > 1. There is no LC_BIDI database to store direction information.
> > > 	Did X/Open ever think about Hebrew and Arabic?
> > 
> > While we're at it, we might as well consider vertical printing, as is
> > sometimes used in Japan. So maybe we should call it LC_DIRECTION, or
> > LC_TEXTDIRECTION, or maybe just include it in LC_CTYPE?
> 
> Yes, LC_DIRECTION sounds like a good name.  And perhaps text direction
> really is related to character type.

Adding a new category may be a good idea if one wants to give the user
and/or the application finer control over individual settings. At the
POSIX level, each category can be set through its own environment
variable. At the C level, setlocale() can be called separately for
each category.

On the other hand, we may not want to add too many new categories,
since a large number of LC_* environment variables may confuse users.
I guess this is a trade-off.

Also, I'm not sure that direction information can simply be put in
LC_CTYPE. POSIX says that LC_CTYPE is for character classification
and case conversion, while C says that LC_CTYPE affects the behavior
of the character handling functions (isalpha, etc.) and the multibyte
functions (mbtowc, etc.). Text direction is not really related to
character type, since Kanji, for example, can be printed horizontally
*or* vertically.

Come to think of it, isn't it strange that LC_CTYPE affects the
behavior of the multibyte functions? This would seem to indicate that
the character encoding depends on the locale. That may be true in
many locales, but in others the user may be using one codeset on the
terminal and a different codeset in files, or even several different
codesets across remote filesystems.

We are trying to achieve codeset independence, but shouldn't this be
separated from the locale model? POSIX has done a lot about this
already, by putting the encoding in the charmap file, and by
specifying character *names* (rather than codes) in the locale
definitions.

On the other hand, the locale setting may be used to determine the
default codeset, assuming a different codeset only when explicitly
told to do so, e.g. in tagged data such as ISO 2022 with its escape
sequences. That is, if the data does not carry a codeset identifier,
you look at the locale setting to decide which codeset to assume.


Erik M. van der Poel                                      erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692