From taylor@limbo Tue Jan 14 11:24:33 1992
Received: from uucp-gw-1.pa.dec.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA08055; Tue, 14 Jan 92 11:24:33 +0100
Received: by uucp-gw-1.pa.dec.com; id AA18234; Tue, 14 Jan 92 01:49:33 -0800
Received: by limbo; Mon, 13 Jan 92 23:34:36 pst
Message-Id: <9201140734.AA01852@limbo.intuitive.com>
Subject: Re: Support for symbolic character names
To: wg14@dkuug.dk, i18n@dkuug.dk
Date: Mon, 13 Jan 92 23:34:33 PST
From: Dave Taylor <taylor@limbo.intuitive.com>
Reply-To: Dave Taylor <taylor@limbo.intuitive.com>
Organization: Intuitive Systems, Mountain View, California  +1 (415) 966-1151
X-Mailer: Elm [version 2.02]
X-Charset: ASCII
X-Char-Esc: 29

Regarding the ideas about international character labelling, a
few thoughts.

First off, Unicode and multinational standards of that ilk solve
this problem rather directly.  If "UNICODE 540" is always 'o/' (a
character I cannot duplicate here, alas), then it's always the
same for everyone.

For symbolic names, I suggest further that there be standard header
files that define an English name for each character, so we might
have something like:

	#define LOWER_O_SLASH	540

to define the character mnemonically.  In my upcoming book "Global
Software", I have extensive examples of this type of mnemonic approach
for ISO 8859-1 characters.  It makes the code very clean, and it also
makes the collating and transliteration tables self-explanatory.  Indeed,
one wonders why we don't just have everything defined that way anyway,
so that regular C could contain tests like:

	if (ch == COLON || ch == EXCLAMATION_MARK || ch == ASTERISK)

rather than the much less portable and more cryptic tests like:

	if (ch == ':' || ch == '!' || ch == '*')
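A minimal sketch of the mnemonic style, including a transliteration table of the kind mentioned above, assuming ISO 8859-1 as the character set. All names here (COLON, LOWER_O_SLASH, is_special, transliterate and so on) are hypothetical; the numeric values are the real ISO 8859-1 code points (o-slash is 0xF8 in both ISO 8859-1 and Unicode, rather than the illustrative 540 used earlier). The ASCII replacements are illustrative choices, not a standard.

```c
#include <stddef.h>

/* Hypothetical mnemonics; values are ISO 8859-1 code points. */
#define EXCLAMATION_MARK  0x21
#define ASTERISK          0x2A
#define COLON             0x3A
#define UPPER_O_SLASH     0xD8
#define LOWER_N_TILDE     0xF1
#define LOWER_O_SLASH     0xF8
#define LOWER_U_UMLAUT    0xFC

/* True if ch is one of the punctuation marks from the test above.
   As with <ctype.h>, ch should be an unsigned char value or EOF. */
static int is_special(int ch)
{
    return ch == COLON || ch == EXCLAMATION_MARK || ch == ASTERISK;
}

/* One entry per 8859-1 character that needs an ASCII fallback.
   The mnemonics make the table readable without a code chart. */
struct translit {
    int ch;
    const char *ascii;
};

static const struct translit translit_table[] = {
    { LOWER_O_SLASH,  "oe" },   /* illustrative Danish/Norwegian style */
    { UPPER_O_SLASH,  "OE" },
    { LOWER_N_TILDE,  "n"  },
    { LOWER_U_UMLAUT, "ue" },   /* illustrative German style */
};

/* Return the ASCII replacement for ch, or NULL if none is listed. */
static const char *transliterate(int ch)
{
    size_t i;

    for (i = 0; i < sizeof translit_table / sizeof translit_table[0]; i++)
        if (translit_table[i].ch == ch)
            return translit_table[i].ascii;
    return NULL;
}
```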

Where it's most problematic, btw, is when a programmer wants to
compare a character to the ASCII single-quote character, leading to
an ugly construct we've all seen:

	if (ch == ''')		or 		if (ch == '\'')

the first of which won't even compile, and both of which are pretty
sad solutions to the problem, really.
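With a mnemonic in place, the single-quote comparison needs no escaping at all. A minimal sketch, assuming a hypothetical APOSTROPHE macro bound to the ASCII single quote (0x27, decimal 39) and an illustrative helper, count_apostrophes, that is not part of any standard library:

```c
#include <stddef.h>

/* Hypothetical mnemonic for the ASCII single quote (decimal 39). */
#define APOSTROPHE 0x27

/* Count the single quotes in a NUL-terminated string. */
static size_t count_apostrophes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if ((unsigned char)*s == APOSTROPHE)   /* instead of '\'' */
            n++;
    return n;
}
```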

In any case, I support the approach of having mnemonics, and I
stress that the key to making anything of this ilk work is to have
the *mnemonics* already available on every targeted system.  Perhaps
a publicly available standard for <mnemonics.h> as a system include
file, proposed through X/Open and similar bodies?  Perhaps it could
even be included automatically whenever a program includes <stdio.h>?
Remember, macro definitions don't add one iota of code to the
application, and modern computers should be fast enough that even a
few thousand extra preprocessor definitions would have no noticeable
effect on compile time.
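One way such a header might be laid out, as a sketch only: the file name <mnemonics.h> is the hypothetical one proposed above, the names are illustrative, and the values assume ISO 8859-1. Because every entry is an object-like macro, the preprocessor substitutes it at compile time, so an unused definition leaves nothing in the object code; the include guard makes it harmless for <stdio.h> or any other header to pull it in a second time.

```c
/* mnemonics.h: hypothetical system header of character mnemonics.
   Values are ISO 8859-1 code points; only a few entries are shown. */
#ifndef MNEMONICS_H
#define MNEMONICS_H

/* ASCII range */
#define SPACE             0x20
#define EXCLAMATION_MARK  0x21
#define QUOTATION_MARK    0x22
#define APOSTROPHE        0x27
#define ASTERISK          0x2A
#define COLON             0x3A

/* Upper half of ISO 8859-1 */
#define UPPER_N_TILDE     0xD1
#define UPPER_O_SLASH     0xD8
#define LOWER_N_TILDE     0xF1
#define LOWER_O_SLASH     0xF8

#endif /* MNEMONICS_H */
```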

Before I close this note, I'll admit that the mnemonics I suggest are
defined in English.  Indeed, that's just as ethnocentric as
all the original computer design that -- significantly -- got us into
this mess in the first place.  Mea culpa.  But having the 'standard'
mnemonics in English doesn't preclude localization teams from defining
their own application-specific mappings from the English names to
local-language names within their code.  Perhaps something like:

	#define ENYE	LOWER_N_TILDE

or similar (I know how to pronounce the Spanish name for the 'n' with
a tilde, but don't know how to spell that word.  My apologies!).  Note
that this would not only be just as portable as using the English
(read "standard") mnemonics, but would offer the additional boon of 
being a localized definition that could, among other things, be
globally replaced without any danger to the integrity of the code.
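A minimal sketch of such a localization layer, assuming the hypothetical standard names LOWER_N_TILDE and UPPER_N_TILDE carry their ISO 8859-1 values. ENYE comes from the example above; ENYE_MAYUSCULA and count_enye are illustrative names added here. Because the local names are pure aliases, they expand to exactly the same values as the standard mnemonics.

```c
#include <stddef.h>

/* Standard (English) mnemonics, ISO 8859-1 values. */
#define UPPER_N_TILDE   0xD1
#define LOWER_N_TILDE   0xF1

/* Hypothetical Spanish localization layer: aliases only. */
#define ENYE            LOWER_N_TILDE
#define ENYE_MAYUSCULA  UPPER_N_TILDE

/* Count lowercase and uppercase n-tilde in a NUL-terminated
   ISO 8859-1 string. */
static size_t count_enye(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (ch == ENYE || ch == ENYE_MAYUSCULA)
            n++;
    }
    return n;
}
```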


						-- Dave Taylor

Intuitive Systems				        SunWorld Magazine
Mountain View, CA			        	San Francisco, CA

taylor@intuitive.com         taylor@netcom.com        taylor@sunworld.com

ps: can someone fiddle with the mail headers so we get a standard
    Reply-To: that points to the entire list?  When I composed this
    message I must have spent five minutes trying to puzzle out the
    headers and addresses, and am still not sure it's right...


