From LALOVIC@torolab5.vnet.ibm.com Wed Jan  8 01:57:30 1992
Received: from vnet.ibm.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA02508; Wed, 8 Jan 92 01:57:30 +0100
Message-Id: <9201080057.AA02508@dkuug.dk>
Received: from TOROLAB5 by vnet.ibm.com (IBM VM SMTP V2R2) with BSMTP id 5461;
   Tue, 07 Jan 92 19:57:29 EST
Date: Tue, 7 Jan 92 18:40:15 EST
From: LALOVIC@torolab5.vnet.ibm.com
To: i18n@dkuug.dk, wg14@dkuug.dk
Subject: Re: (XoJIG 413) (i18n.135)
X-Charset: ASCII
X-Char-Esc: 29

          Re: (SC22WG14.165)
           Re: support for symbolic character names

>
> . . . text deleted
>
>The mechanism is to support portability of international C programs.
>Consider a text with my name in it, Keld J|rn Simonsen.
>The | looks fine on my terminal, but probably not on yours.
>It is a lowercase-o-with-stroke. If I want to write a portable
>program, which tests for this letter, then it is very difficult.
>It depends on the execution character set. In the one I am really
>using now (IBM865) it is one value, in ISO8859-1 it is another,
>and in Macintosh it is yet another. If I can use the localedef
>notation, I would just refer to the symbolic character name,
>and the execution locale will do the proper naming for me.

There are two issues here: portability of source code, and
data integrity. I agree with Keld that source code portability
demands a standard way of referring to characters that
are not available in a particular environment. However, the
second issue requires a different approach.

Keld correctly observes that his scheme is in line with
current practice in C (escape sequences such as \t and \n), but
the scheme goes one step further: it assumes that symbolic
character names will be resolved at execution time rather than
at compilation time.

As far as I know, C resolves symbolic character names during
compilation, so the object code can only work correctly if it
is executed in an environment that uses the same code set as
the one for which the source was compiled.

Similarly, a POSIX locale must be compiled by the localedef
utility before it can be used, and the compilation produces an
object that is fixed to a particular code set (the one described
by the charmap).

The current practice, therefore, is not in line with Keld's
assumption that symbolic character names are resolved at
execution time. Because resolving names at execution time would
cost performance, the current practice is unlikely to change,
so other means of ensuring data integrity must be considered.

One possibility is to base the processing environment on a UCS
(Universal Character Set) such as ISO 10646, and to convert at
its boundaries. For example, if a display device cannot show
all the characters in a text, the text can be converted so that
the unavailable characters are replaced with Keld's symbolic
names. The opposite conversion would apply to keyboard entry,
i.e. from symbolic names to UCS code points.

>
>Keld

+----------------------------------------------------------------------+
|  Milos Lalovic                           A3/979/895/TOR              |
|                                          IBM Canada, Inc.            |
|  phone: (416) 448-2276                   895 Don Mills Road          |
|  fax:   (416) 448-2114                   North York, Ontario M3C 1W3 |
|  email: lalovic@torolab5.vnet.ibm.com    CANADA                      |
+----------------------------------------------------------------------+
