From taylor@limbo Thu Jan 16 08:47:40 1992
Received: from uucp-gw-1.pa.dec.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA17495; Thu, 16 Jan 92 08:47:40 +0100
Received: by uucp-gw-1.pa.dec.com; id AA21569; Wed, 15 Jan 92 23:42:36 -0800
Received: by limbo; Wed, 15 Jan 92 23:39:29 pst
Message-Id: <9201160739.AA07631@limbo.intuitive.com>
Subject: Re: Support for symbolic character names
To: keld%dkuug.dk%sun%uunet.UU.NET@Pa.dec.com (Keld J|rn Simonsen)
Date: Wed, 15 Jan 92 23:39:26 PST
From: Dave Taylor <taylor@limbo.intuitive.com>
Cc: i18n@dkuug.dk, wg14@dkuug.dk
In-Reply-To: <9201160451.AA13482@dkuug.dk>; from "Keld J|rn Simonsen" at Jan 16, 92 5:51 am
Reply-To: Dave Taylor <taylor@limbo.intuitive.com>
Organization: Intuitive Systems, Mountain View, California  +1 (415) 966-1151
X-Mailer: Elm [version 2.02]
X-Charset: ASCII
X-Char-Esc: 29

Keld and Teruhiko offer some worthwhile insight into the problem 
of symbolic naming of international characters.  In particular,
it's true that when we expand my scheme into thousands upon 
thousands of characters, it becomes more difficult to believe that
it could be a valid solution!

Nonetheless, I would like to take issue with some of the things
that have gone past:  Teruhiko commented that he felt the use
of symbolic names was more cryptic than explicitly listed ASCII
characters (e.g., "COLON" versus ":").  While I can understand
that the use of English to name values is ethnocentric (a point
that I made in an earlier message), I fear that Teruhiko has
missed one of the more important points of modular programming,
in that use of symbolic values is a win over explicit "magic"
values hidden in the code.

That is, surely we all agree that code of the form:

	#define  BYTES_IN_A_WINDOW_GLYPH	1033

	if (icon_size == BYTES_IN_A_WINDOW_GLYPH)

		....

is clearer and better than:

	if (icon_size == 1033)

		....

If agreed, then the challenge I believe we face is to map this
simple scheme into an internationally applicable solution.  One
interesting possibility is the work that Glenn Adams has done, 
which expands beyond the cryptic "<o/>" to having something
more akin to:

	latin/slashed-o case/lower

Which, while still very wordy, offers, through use of a preprocessor,
the ability to have the solution I suggested earlier in a completely
portable fashion:

	if (ch == 'latin/slashed-o case/lower')

One challenge I still see remaining is how to encode what we
consider 'basic ASCII', the 7-bit subset of 8859-1.  Do we
simply leave them as 'a', 'b', etc., or have the compatible
mapping of "latin/a case/lower", "latin/a case/upper", etc.?
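
As a rough sketch of the table such a preprocessor might consult,
covering both the slashed-o example and the compatible ASCII mapping:
the struct, the function name, and the "case/upper" entry for
slashed-o are hypothetical, and the values assume ISO 8859-1 code
positions.  This is an illustration of the idea, not Glenn's actual
scheme.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name table.  The names follow the "latin/... case/..."
   pattern; the values are ISO 8859-1 code positions (an assumption). */
struct symbolic_char {
    const char *name;
    int         code;
};

static const struct symbolic_char symbolic_chars[] = {
    { "latin/a case/lower",         0x61 },
    { "latin/a case/upper",         0x41 },
    { "latin/slashed-o case/lower", 0xF8 },
    { "latin/slashed-o case/upper", 0xD8 },
};

/* Return the code value for a symbolic name, or -1 if it is unknown. */
int symbolic_char_code(const char *name)
{
    size_t i;
    size_t n = sizeof symbolic_chars / sizeof symbolic_chars[0];

    for (i = 0; i < n; i++) {
        if (strcmp(symbolic_chars[i].name, name) == 0)
            return symbolic_chars[i].code;
    }
    return -1;
}
```

A preprocessor built on this would replace each quoted symbolic name
in the source with the looked-up value before the compiler sees it,
and could reject any name that comes back as -1.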

I must admit that I like Glenn's approach; it provides a clear
and readable notation that offers not just symbolic naming in
code, but modular design too, and readable, maintainable
code.

The down side is one that Keld notes: how do we deal with
strings, rather than individual characters?  Perhaps that's
not much of a problem, though?  Surely we're forgetting that
there are message catalogs around that can be accessed for
complex data items like foreign character set strings.
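
A minimal sketch of the message catalog route, assuming the X/Open
catopen()/catgets() interface from <nl_types.h> is the kind of
catalog meant here.  The set and message numbers, the function name,
and the fallback text are all made up for illustration.

```c
#include <nl_types.h>

/* Hypothetical set and message numbers for one catalog entry. */
#define GREETING_SET  1
#define GREETING_MSG  1

/* Fetch the greeting from an open catalog.  catgets() returns the
   default string (the last argument) when the catalog or the
   message cannot be found, so the code never embeds the localized
   string itself. */
const char *greeting_text(nl_catd catalog)
{
    return catgets(catalog, GREETING_SET, GREETING_MSG, "Good morning");
}
```

The caller would obtain the catalog with catopen() and release it
with catclose(); the source file then holds only the plain-ASCII
fallback, and the strings in other character sets live in the
catalog.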
	
I think that we need to put significant effort into keeping
whatever notation is accepted "clean" and with minimal clutter.
Keld comments:

> There is not much difference between writing
> 
>       if (ch == COLON) ....
> 
> and
>       
>       if (ch == L'\<COLON>') ....

Yet I suggest that there is indeed considerable difference, 
and that the latter is more obfuscated, delving into a popular
bit of C programming, 'magic characters'.  Imagine a foreign
programmer reading a line like:

	if (xyd == L'\<OMEGA>' || xyd == L'\<ALPHA>')

More of the characters here are punctuation (read 'confusing') than aren't.
Why should that matter?  Because the solution that we're trying
to get to here should be one that encourages programmers to write
for the global marketplace, not one that is so confusing, and such
a burden that they complain when management requests a globalized
version...
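
For comparison, a sketch of how the plainer notation could be backed
by an ordinary header of named constants.  The names and the helper
function are hypothetical, and the values assume that wchar_t holds
ISO 10646 / Unicode code values and that ALPHA and OMEGA mean the
lower-case Greek letters; none of that is settled in this discussion.

```c
#include <stddef.h>   /* wchar_t */

/* Hypothetical symbolic names.  The Greek values assume ISO 10646 /
   Unicode code positions for the lower-case letters. */
#define COLON  L':'
#define ALPHA  ((wchar_t)0x03B1)
#define OMEGA  ((wchar_t)0x03C9)

/* The test from the example above, written with plain names. */
int is_alpha_or_omega(wchar_t ch)
{
    return ch == OMEGA || ch == ALPHA;
}
```

The condition then reads with no quoting or escape punctuation at
all, at the cost of maintaining the header for each target encoding.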

And I believe that the current state of internationalized code is
perilously close to the latter state:  C is already pretty
amazingly cryptic, and adding more and more confusing system calls
and library calls does no service to the community or to the
marketplace at large.  But that's another discussion...!

						-- Dave Taylor

Intuitive Systems				        SunWorld Magazine
Mountain View, CA			        	San Francisco, CA

taylor@intuitive.com         taylor@netcom.com        taylor@sunworld.com
