From keld@dkuug.dk Tue Feb 12 23:22:39 1991
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
	id AA17488; Tue, 12 Feb 91 23:22:39 +0100
Date: Tue, 12 Feb 91 23:22:39 +0100
From: Keld Jørn Simonsen <keld@dkuug.dk>
Message-Id: <9102122222.AA17488@dkuug.dk>
To: i18n@dkuug.dk
Subject: paper presented to WG11
X-Charset: ASCII
X-Char-Esc: 29

Title: A programming-language-independent and character-set-independent string type.
Source: Expert contribution by Keld Simonsen, Danish Standards, to SC22/WG11.
Date: 1991-01-21

POSIX has defined a way to describe many cultural and natural-language
dependencies independently of the character-set encoding. In the current
draft 10 of POSIX.2, this is done in the "localedef" and "charmap"
specifications. The mechanism is based on symbolic character names,
which are used in the strings that specify, for example, day and month
names. The binding to an actual character-set encoding is made in the
"charmap", so the same locale can be used with several different
charmaps.
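As an illustration of how a charmap binds symbolic names to an encoding,
here is a minimal Python sketch that reads a small subset of a
charmap-style file: only "<name> \xHH comment" lines between CHARMAP and
END CHARMAP are handled, while ranges, decimal or octal byte values and a
redefined escape character are not. The function name parse_charmap, the
name <ae>, and a 7-bit Danish ISO 646 charmap placing o-stroke at 0x7C and
ae at 0x7B are assumptions for illustration, not text from the POSIX.2
draft.

```python
# Minimal sketch of a charmap reader. Only a small subset of the charmap
# syntax is handled (an assumption for illustration): lines of the form
#   <name>  \xHH[\xHH...]  optional comment
# between CHARMAP and END CHARMAP. Ranges, decimal/octal byte values and a
# redefined escape character are not supported.

def parse_charmap(text: str) -> dict[str, bytes]:
    """Map each symbolic name (without angle brackets) to its byte sequence."""
    mapping: dict[str, bytes] = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "CHARMAP":
            inside = True
        elif line == "END CHARMAP":
            inside = False
        elif inside and line and not line.startswith("#"):
            name, encoding = line.split()[:2]
            # "\xc3\xb8" -> "c3b8" -> b"\xc3\xb8"
            mapping[name.strip("<>")] = bytes.fromhex(encoding.replace("\\x", ""))
    return mapping


# Same symbolic names, two different code sets (values for illustration;
# the 7-bit Danish ISO 646 positions are an assumption).
LATIN1 = r"""
CHARMAP
<o/>  \xf8  LATIN SMALL LETTER O WITH STROKE
<ae>  \xe6  LATIN SMALL LETTER AE
END CHARMAP
"""

DANISH_646 = r"""
CHARMAP
<o/>  \x7c  LATIN SMALL LETTER O WITH STROKE
<ae>  \x7b  LATIN SMALL LETTER AE
END CHARMAP
"""

latin1 = parse_charmap(LATIN1)
danish = parse_charmap(DANISH_646)
print(latin1["o/"], danish["o/"])  # b'\xf8' b'|'
```

The locale only ever mentions <o/>; which byte that becomes is decided
entirely by the charmap chosen at the time the locale is built.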

The notation used in strings is:

      <name>

that is, a symbolic character name enclosed in angle brackets (the
less-than and greater-than signs). Ordinary characters such as "a"
(Latin small letter a) can be represented by themselves; in fact, every
character of the coded character set in which the locale is encoded can
be represented by itself. Only "odd" characters need the symbolic
notation.
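The notation can be read with a simple scanner. The following Python
sketch splits a general string into ordinary characters and symbolic
names. It assumes that names never contain "<" or ">" and that there is
no escape mechanism for a literal "<"; the function name tokenize and the
symbolic name <ae> (Latin small letter ae) are hypothetical.

```python
# Minimal sketch: split a "general string" into ordinary characters and
# symbolic character names written as <name>.
# Assumptions: names contain no "<" or ">", and there is no escape
# mechanism for a literal "<".

def tokenize(general: str) -> list[tuple[str, str]]:
    """Return ("char", c) for ordinary characters and ("name", n) for <n>."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(general):
        if general[i] == "<":
            end = general.find(">", i + 1)
            if end == -1:
                raise ValueError(f"unterminated symbolic name at offset {i}")
            tokens.append(("name", general[i + 1:end]))
            i = end + 1
        else:
            tokens.append(("char", general[i]))
            i += 1
    return tokens


print(tokenize("f<ae>rge"))
# [('char', 'f'), ('name', 'ae'), ('char', 'r'), ('char', 'g'), ('char', 'e')]
```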

An example of such a "general string" is:

           "s<o/>ndag"

(the Danish word for Sunday, "søndag"), where <o/> denotes "Latin small
letter o with stroke".
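To show the locale/charmap split on this example, the Python sketch below
encodes "s<o/>ndag" with two hypothetical charmaps reduced to dictionaries
that hold only the <o/> entry: one for ISO 8859-1, where o-stroke is 0xF8,
and one for an assumed 7-bit Danish ISO 646 variant that places it at
0x7C. It further assumes that the ordinary characters are ASCII letters
with the same code in both sets. The names encode_general, ISO_8859_1 and
ISO_646_DK are illustrative only.

```python
import re

# Hypothetical charmaps reduced to dictionaries: symbolic name -> bytes.
# Only the entry needed for the example is listed.
ISO_8859_1 = {"o/": b"\xf8"}
# Assumption: a 7-bit Danish ISO 646 variant with o-stroke at 0x7C.
ISO_646_DK = {"o/": b"\x7c"}

# A symbolic name: "<", one or more characters other than angle brackets, ">".
SYMBOL = re.compile(r"<([^<>]+)>")


def encode_general(general: str, charmap: dict[str, bytes]) -> bytes:
    """Encode a general string, resolving each <name> through the charmap.

    Ordinary characters are assumed to be ASCII and to have the same code in
    every charmap used here; an unknown name raises KeyError.
    """
    out = bytearray()
    pos = 0
    for match in SYMBOL.finditer(general):
        out += general[pos:match.start()].encode("ascii")
        out += charmap[match.group(1)]
        pos = match.end()
    out += general[pos:].encode("ascii")
    return bytes(out)


sunday = "s<o/>ndag"
print(encode_general(sunday, ISO_8859_1))  # b's\xf8ndag'
print(encode_general(sunday, ISO_646_DK))  # b's|ndag'
```

The source string stays the same; only the charmap changes, and with it
the bytes produced.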

I am aware of the "general string" (GeneralString) type in the ASN.1
notation used by OSI protocols, and new work relevant to this proposal is
under way there. The SGML text mark-up language also addresses similar
issues.
