From chk@sol.kaist.ac.kr Wed Nov 13 18:14:38 1991
From: chk@sol.kaist.ac.kr (Chang Hyeoungkyu)
Message-Id: <9111131712.AA08623@sol.kaist.ac.kr>
Subject: Locale Definition
To: XoJIG@xopen.co.uk
Date: Thu, 14 Nov 91 2:12:45 JST
Cc: i18n@dkuug.dk
X-Mailer: ELM [version 2.3 PL0]
X-Charset: ASCII
X-Char-Esc: 29

Dear members,

In the memo included below, I want to comment on the definition of
locale. I also want to argue for the need for locale- and
application-specific locale-sensitive functions.

I'm cc'ing this to the i18n mailing list.

If I'm going in the wrong direction, please let me know.
I would greatly appreciate your comments.

Best Regards,
Chang Hyeoungkyu

-- 
Chang Hyeoungkyu  -  chk@sol.kaist.ac.kr
GUI Consortium / SA Lab., CS Dept., KAIST, KOREA

--------------------------------------------------------------------

Locale
------

The term 'locale' is defined as the combination of language, cultural
data and coded character set by International POSIX and X/Open --
Uniforum Joint Internationalization Group. The POSIX model of
internationalization also allows a specific version of a locale,
modified for an application type. Because locale-sensitive processing
is missing from this definition, the definition implies that such
processing must be done in a locale- and application-neutral way.
That is to say, a single locale-sensitive function should work for
all locales, including locales changed by application modifiers,
simply by switching the locale information data. Examples of
locale-sensitive functions are atof(), printf(), scanf() and
strcoll().
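
The common-function model described above can be sketched with
Python's standard locale module, which wraps the C library calls. This
is only an illustration: the "C" locale is the one locale assumed to
be installed, and the sample strings are arbitrary.

```python
import locale

# Select the locale information data. Only the "C" locale is assumed
# to be available; any other locale name is system dependent.
locale.setlocale(locale.LC_ALL, "C")

# The same two functions are then called whatever the locale is;
# only the data selected by setlocale() differs.
value = locale.atof("3.14")                    # numeric parsing
order = locale.strcoll("Da Silva", "Daniels")  # string comparison
print(value, order < 0)
```

Under this model the code never changes per locale or per
application; only the argument to setlocale() does.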

I don't think that locale-sensitive processing can be neutral with
respect to language, cultural data and application. So I define a
'locale' as a combination of language, cultural data, coded character
set and their processing. By 'processing' I mean only the locale- and
application-sensitive processing. It can be achieved by locale- and
application-specific locale-sensitive functions, whereas International
POSIX and X/Open -- Uniforum Joint Internationalization Group use
common (locale- and application-neutral) locale-sensitive functions.
Below I present examples of collating sequences that cannot be
processed in a language-, cultural-data- and application-neutral way.
The examples are taken from the book 'Digital Guide to Developing
International Software' and from the e-mail 'Guideline for producing
a national POSIX locale' by Mr. Keld Simonsen.

Collating symbols
-----------------

Numbers, punctuation, and additional symbols can be treated in a
variety of ways when producing ordered lists. Software that is to be
used in different application domains may be required to support
several of these treatments. For example, a space between characters
is ignored for some applications but observed for others. If the
space is ignored, the resulting list would be

    Daniels
    Da Silva
    Dauxois

However, if the space is not ignored, the resulting list would be

    Da Silva
    Daniels
    Dauxois
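
The two orderings above can be reproduced with two different sort
keys. This is a minimal sketch, assuming case-insensitive comparison
in code order; the function names are made up for the illustration.

```python
names = ["Dauxois", "Da Silva", "Daniels"]

def key_ignoring_spaces(name):
    # Drop spaces before comparing, so "Da Silva" sorts as "dasilva".
    return name.replace(" ", "").casefold()

def key_observing_spaces(name):
    # Keep spaces; a space has a lower code than any letter.
    return name.casefold()

ignored = sorted(names, key=key_ignoring_spaces)
observed = sorted(names, key=key_observing_spaces)
print(ignored)
print(observed)
```

The point is that the difference lies in the key function, that is,
in the processing, and not only in a table of character weights.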

Collating Danish characters
---------------------------

There are many levels of complexity in the collation of Danish as
defined in the official collating standard DS 377. For example, on the
telephone level, Mc is the same as Mac, numbers are spelled out, and
certain words like ``the'' are ignored or moved to the end. Another
level is the phonetic level (Soundex), which is a little less
complicated. A third level is transcribed characters, as when
librarians see a Greek alpha and order it as a normal ``a''.
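
A rough sketch of a telephone-level key follows. It is not an
implementation of DS 377: the word tables are hypothetical and tiny,
English words are used only because the example above does, and
spelling numbers out digit by digit is a simplifying assumption.

```python
import re

# Hypothetical, deliberately small tables (not taken from DS 377).
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three",
               "4": "four", "5": "five", "6": "six", "7": "seven",
               "8": "eight", "9": "nine"}
MOVED_WORDS = {"the"}

def telephone_key(entry):
    words = entry.casefold().split()
    # Words such as "the" are moved to the end of the entry.
    kept = [w for w in words if w not in MOVED_WORDS]
    moved = [w for w in words if w in MOVED_WORDS]
    result = []
    for word in kept + moved:
        word = re.sub(r"^mc", "mac", word)  # Mc is the same as Mac
        # Spell out each digit (assumption: digit by digit).
        word = re.sub(r"[0-9]", lambda m: DIGIT_WORDS[m.group()], word)
        result.append(word)
    return " ".join(result)

entries = ["MacDonald", "The Hague", "McAdam", "7 Seas"]
print(sorted(entries, key=telephone_key))
```

None of these three rules can be expressed by assigning a weight to
each character; each one rewrites the string before comparison.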

Collating Arabic characters
---------------------------

Arabic is a single-case language, so the problems of collating
uppercase and lowercase characters do not occur. The following
guidelines apply to the Arabic collating sequence:

    -     The Arabic connecting character, the ``tatweel'', has no
	  significance in a word and should be excluded during
	  collation.

    -     Words are first sorted in code order with the Arabic vowel
	  characters excluded.

    -     Groups of words having the same consonants are then sorted
	  in code order including the vowel characters.
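
The three guidelines above can be sketched as a two-part sort key.
This assumes Unicode code points as the "code order" and treats the
marks U+064B to U+0652 as the "vowel characters"; both are
assumptions of the sketch, since the source names no coded character
set.

```python
TATWEEL = "\u0640"
# Assumption: the vowel characters are the marks U+064B..U+0652.
VOWELS = {chr(code) for code in range(0x064B, 0x0653)}

def arabic_key(word):
    # Guideline 1: the tatweel takes no part in collation.
    word = word.replace(TATWEEL, "")
    # Guideline 2: primary key is the word without vowel characters.
    consonants = "".join(ch for ch in word if ch not in VOWELS)
    # Guideline 3: ties are broken by the word including the vowels.
    return (consonants, word)

def arabic_sorted(words):
    return sorted(words, key=arabic_key)
```

The tuple makes the second comparison apply only within a group of
words that have the same consonants, which is the two-pass behaviour
the guidelines describe.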

Collating ideographic characters
--------------------------------

Collating ideographic characters is more complex than collating Latin
characters. The following three different methods of collating are
used:

    -     By radicals

	  Radicals are the root forms of a character that give the
	  character its basic meaning. The radical collating sequence
	  sorts according to the radicals that make up the character.
	  If there is more than one character with the same radical,
	  then these similar characters are further sorted by the
	  number of strokes that make up the character.

    -     By number of strokes

	  Characters are sorted by the number of strokes that make up
	  the character. If more than one character has the same
	  number of strokes, these characters are further sorted by
	  radicals.

    -     By phonetic sequence

	  Characters are sorted according to the sequence in which
	  they appear in a phonetic alphabet. In this phonetic
	  alphabet, the characters are organized according to their
	  romanized (western) spelling.
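
The three methods can be sketched as three sort keys over one small
table. The table is illustrative only: it lists four characters with
their Kangxi radical number, stroke count, and a toneless Pinyin
reading, and the choice of Pinyin as the phonetic alphabet is an
assumption of the sketch.

```python
# character: (radical number, stroke count, romanized reading)
TABLE = {
    "人": (9, 2, "ren"),
    "木": (75, 4, "mu"),
    "林": (75, 8, "lin"),
    "水": (85, 4, "shui"),
}

def by_radical(ch):
    radical, strokes, _ = TABLE[ch]
    return (radical, strokes)   # radical first, then strokes

def by_strokes(ch):
    radical, strokes, _ = TABLE[ch]
    return (strokes, radical)   # strokes first, then radical

def by_phonetic(ch):
    return TABLE[ch][2]         # order of the romanized spelling

chars = list(TABLE)
print(sorted(chars, key=by_radical))
print(sorted(chars, key=by_strokes))
print(sorted(chars, key=by_phonetic))
```

The same four characters come out in three different orders, and
which order is wanted depends on the application as well as the
locale.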

Locale and application specific locale sensitive functions
----------------------------------------------------------

If common locale-sensitive functions are used, each function must
accommodate all locales by switching the locale information data. For
example, the function ``strcoll()'' should be able to compare two
strings in any locale. If this could be done by switching the
collation table for each locale and application combination, we would
have no problem. As the examples above show, however, some locales
need a multi-pass algorithm, and other locale and application
combinations need different processing altogether to collate two
strings. Locale- and application-specific locale-sensitive functions
are the means of giving clients locale- and application-sensitive
processing.
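
One possible shape for such functions is a registry that selects a
comparison routine by locale and application together. This is a
sketch, not an existing interface: the registry, the function names,
and the locale and application names are all hypothetical.

```python
# Hypothetical registry: (locale name, application) -> key function.
COLLATORS = {}

def register(locale_name, application):
    def wrap(key_func):
        COLLATORS[(locale_name, application)] = key_func
        return key_func
    return wrap

@register("en", "default")
def _observe_spaces(text):
    return text.casefold()

@register("en", "directory")
def _ignore_spaces(text):
    return text.replace(" ", "").casefold()

def compare(a, b, locale_name, application):
    # Same calling convention as strcoll(): negative, zero, positive.
    key = COLLATORS[(locale_name, application)]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)

print(compare("Da Silva", "Daniels", "en", "default"))
print(compare("Da Silva", "Daniels", "en", "directory"))
```

Here the client still calls a single entry point, but what is
switched is the function itself rather than only a collation table.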