From ALB%SEAS@liverpool.ac.uk Thu Jan  9 02:02:03 1992
Received: from danpost2.uni-c.dk by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA10739; Thu, 9 Jan 92 02:02:03 +0100
Received: from vm.uni-c.dk by danpost2.uni-c.dk (5.65/1.34)
	id AA04049; Thu, 9 Jan 92 01:01:46 GMT
Message-Id: <9201090101.AA04049@danpost2.uni-c.dk>
Received: from vm.uni-c.dk by vm.uni-c.dk (IBM VM SMTP V2R1) with BSMTP id 4262;
   Thu, 09 Jan 92 02:02:10 DNT
Received: from UKACRL.BITNET by vm.uni-c.dk (Mailer R2.07) with BSMTP id 4752;
 Thu, 09 Jan 92 02:02:09 DNT
Received: from RL.IB by UKACRL.BITNET (Mailer R2.07) with BSMTP id 9955; Thu,
 09 Jan 92 00:59:28 GMT
Received: 
           from RL.IB by UK.AC.RL.IB (Mailer R2.07) with BSMTP id 5923; Thu, 09
                Jan 92 00:59:28 GMT
Via:            UK.AC.LIV.IBM;  9 JAN 92  0:59:23 GMT
Received:       from ALB@SEAS by MAILER(4.1.a);  9 Jan 1992 01:01:10 GM
Addressed-To:   I18N@EARN.DK.DKUUG Via MAILER
Addressed-From: ALAIN_LA_BONTE (Alain LaBonte O1 418 644 1835)
Subject:        To the rescue of Keld in Symbolic name usefulness when chars don't exist
Date:           Thu, 9 Jan 1992  00:47 GMT
To: I18N@DKUUG.DK
From: ALB <ALB%SEAS@liverpool.ac.uk>
X-Charset: ASCII
X-Char-Esc: 29

I hope I will scandalize nobody if I say that, for processing characters
correctly, coded character sets are not very useful anyway, except as the
best means to input and output characters with a minimum of agreement
between parties, and even that agreement has proven quite difficult to reach.

In fact, in the Quebec Government, to be able to process textual data
(sort, search, compare, merge) without having to change old programs, sort
programs, vendor-supplied access methods and so on, we store character data
in a form that retains only its meaning as defined by CSA standard Z243.4.1
(on which the POSIX LC_COLLATE structure has been built).
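A minimal sketch of the idea, in Python: each string is turned into a binary key that lists all base-letter weights first, then all accent weights, then case, so that a plain byte-by-byte comparison (the kind an unchanged sort program or access method already performs) gives a sensible order. The weight tables and the functions `decompose` and `sort_key` are hypothetical illustrations, not the actual CSA Z243.4.1 tables, which cover far more characters and rules.

```python
import unicodedata

# Hypothetical weight tables for illustration only; NOT the real
# CSA Z243.4.1 tables, which define many more characters and rules.
BASE_WEIGHT = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
ACCENT_WEIGHT = {
    None: 1,        # no accent sorts first
    "\u0301": 2,    # combining acute
    "\u0300": 3,    # combining grave
    "\u0302": 4,    # combining circumflex
    "\u0308": 5,    # combining diaeresis
    "\u0327": 6,    # combining cedilla
    "\u030C": 7,    # combining caron
}

def decompose(ch):
    """Split one letter into (base, accent mark or None, is_upper).
    Only the first accent is kept in this sketch."""
    parts = unicodedata.normalize("NFD", ch)
    base = parts[0]
    accent = parts[1] if len(parts) > 1 else None
    return base.lower(), accent, base.isupper()

def sort_key(text):
    """Build a binary key: all base weights, then accents, then case."""
    levels = ([], [], [])
    for ch in text:
        base, accent, upper = decompose(ch)
        levels[0].append(BASE_WEIGHT[base])
        levels[1].append(ACCENT_WEIGHT[accent])
        levels[2].append(2 if upper else 1)
    # A 0 byte separates the levels; it is lower than every weight above.
    return bytes(levels[0]) + b"\x00" + bytes(levels[1]) + b"\x00" + bytes(levels[2])
```

With these keys, `sorted(["Côte", "côté", "coté", "cote"], key=sort_key)` gives `["cote", "coté", "Côte", "côté"]`: words that differ only by accents or case stay together, whereas plain code-point order would put "Côte" first because of its capital C.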

As a consequence, even if we cannot, say, input or output a LATIN LETTER C
WITH CARON, we can store the character so that it compares correctly
everywhere, and we can even omit the accent on output simply by temporarily
dropping part of the data, just as you can truncate the mantissa of a
floating-point number and lose only precision, not the essence of the number.
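To illustrate the mantissa analogy, here is a small Python sketch, assuming each stored character keeps its base letter and its accent as separate fields (a hypothetical layout standing in for the binary form). The hypothetical `render` function composes only the accents a given output device supports and otherwise emits the bare letter.

```python
import unicodedata

CARON = "\u030C"   # combining caron
ACUTE = "\u0301"   # combining acute accent

# Hypothetical stored form: (base letter, combining accent or "") per character.
CAPEK = [("C", CARON), ("a", ""), ("p", ""), ("e", ""), ("k", "")]

def render(stored, device_accents):
    """Compose each character for a device, dropping any accent the device
    cannot display: precision is lost, the letter itself is not."""
    out = []
    for base, accent in stored:
        if accent and accent in device_accents:
            # Combine base + accent into one precomposed character if possible.
            out.append(unicodedata.normalize("NFC", base + accent))
        else:
            out.append(base)
    return "".join(out)
```

On a device that knows the caron, `render(CAPEK, {ACUTE, CARON})` yields "Čapek"; on one that only knows the acute, `render(CAPEK, {ACUTE})` yields "Capek", and the stored data itself is unchanged.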

In this case Keld's short ids can be quite useful: once they are broken down
into their components (the letter and its accent, for example), all of the
information is retained in binary, processable form. So far this works even
in so-called 8-bit environments, because we do not store 8-bit characters
but n-bit strings of binary data that represent the character data directly,
in the most efficiently processable form...
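A sketch of what breaking a short id down into binary fields might look like, assuming the short ids are two-character mnemonics of the kind Keld Simonsen proposed (a letter followed by an accent symbol, as later published in RFC 1345). The 16-bit layout, the small accent table and the names `pack` and `unpack` are hypothetical.

```python
# Hypothetical subset of accent symbols in the style of Keld Simonsen's
# mnemonics: ' acute, ! grave, > circumflex, : diaeresis, , cedilla, < caron.
ACCENT_CODES = {"'": 1, "!": 2, ">": 3, ":": 4, ",": 5, "<": 6}
ACCENT_SYMBOLS = {code: sym for sym, code in ACCENT_CODES.items()}

def pack(mnemonic):
    """Decompose a mnemonic such as 'c<' into a hypothetical 16-bit value:
    bits 15-8 base letter (a=1 .. z=26), bits 7-1 accent code, bit 0 case."""
    letter = mnemonic[0]
    accent = ACCENT_CODES[mnemonic[1]] if len(mnemonic) > 1 else 0
    base = ord(letter.lower()) - ord("a") + 1
    upper = 1 if letter.isupper() else 0
    return (base << 8) | (accent << 1) | upper

def unpack(value):
    """Rebuild the mnemonic from the packed value."""
    letter = chr(ord("a") + (value >> 8) - 1)
    if value & 1:
        letter = letter.upper()
    accent = (value >> 1) & 0x7F
    return letter + ACCENT_SYMBOLS.get(accent, "")
```

Here `pack("c<")` is 780, and the integers order as c < c< < C< < d without any table lookup. Masking with `value & 0xFF01` clears the accent field, so `pack("C<") & 0xFF01 == pack("C")`: the accent is dropped and the letter survives.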

I found the other explanation of character data processing simplistic.
Comparisons of characters are not limited to equality, so just storing
characters in an external file in their standardized format does nothing to
make processing consistent. I agree, though, that for fixed messages (error
messages, information messages, prompts, etc.) it is the only way, except
that, as Keld pointed out, such messages should ideally be stored
independently of any given code, so that they can be ported even within the
same country, in the same language, but to another code.
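For fixed messages, a sketch of storing text independently of any code: the message is kept with two-character symbolic names such as `<e'>` and converted only when it is written out in a particular code. The symbol table and `render_message` are hypothetical; the codec names (`latin-1`, `cp850`, `ascii`, `iso8859_2`) are standard Python codecs.

```python
import re
import unicodedata

# Hypothetical table of two-character symbolic names.
SYMBOLS = {"e'": "\u00e9", "e!": "\u00e8", "c,": "\u00e7",
           "c<": "\u010d", "C<": "\u010c"}

def render_message(template, encoding):
    """Replace each <xy> name with its character and encode the result in
    the target code; if that code lacks the character, fall back to the
    unaccented letter."""
    def char_for(match):
        ch = SYMBOLS[match.group(1)]
        try:
            ch.encode(encoding)
            return ch
        except UnicodeEncodeError:
            # NFD puts the base letter first; keep only that.
            return unicodedata.normalize("NFD", ch)[0]
    return re.sub(r"<(..)>", char_for, template).encode(encoding)
```

The same stored message "Qu<e'>bec" comes out as `b"Qu\xe9bec"` in `latin-1`, as `b"Qu\x82bec"` in the PC code page `cp850`, and as `b"Quebec"` in `ascii`: one language, one country, three codes, and no message had to be rewritten.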

For more information about the structure of binary processable text (for
those who have never heard my sempiternal preaching and talks on this
issue), please contact me.

              Alain LaBont/e
              Minist\ere des Communications du Qu/ebec
