From alb@riq.qc.ca Fri Jan 20 20:36:46 1995
Received: from socrate.riq.qc.ca by dkuug.dk with SMTP id AA02145
  (5.65c8/IDA-1.4.4j for <i18n@dkuug.dk>); Fri, 20 Jan 1995 22:02:36 +0100
Received: from slip70 (slip71.riq.qc.ca) by socrate.riq.qc.ca (5.0/SMI-SVR4)
	id AA00505; Fri, 20 Jan 1995 15:36:47 +0500
Date: Fri, 20 Jan 1995 15:36:46 +0500
Message-Id: <9501202036.AA00505@socrate.riq.qc.ca>
X-Sender: alb@riq.qc.ca
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: comp-software-international@SENATOR-BEDFELLOW.MIT.EDU,
        iso10646@jhuvm.hcf.jhu.edu, i18n@dkuug.dk, cpwg-mail@revcan.ca,
        ca-jtc1-sc2@microstar.com, sc22@dkuug.dk, csa-cpl@math.uwaterloo.ca,
        CA-JTC1-SC18-WG9@MICROSTAR.COM
From: alb@riq.qc.ca (Alain LaBont[e'])
Subject: Just another email transporter, but useful!
X-Mailer: <PC Eudora Version 1.4>
Content-Length: 0
X-Charset: ASCII
X-Char-Esc: 29


Some of you already know TRANISCI, the intermediate code I proposed a few years ago. I had in mind that not everyone has decoders such as UUDECODE, BASE64 or HexBin (3 and 4 are incompatible and 4 is only available on Mac!), or MIME-aware email interfaces, and that the current email infrastructure persists in using archaic 7-bit ASCII.

The TRANISCI code is just another code, but it has 2 advantages: first, the encoded message is interpretable by most users without a decoder; second, for most characters (in fact for all graphic characters in the upper half of the Latin-1 repertoire) it uses only 2 USASCII characters (while "quoted printable" uses 3). In short, it is efficient not only for machines but also for humans (some other intermediate codes are slightly more efficient in terms of storage but are totally unreadable by humans without a decoder).
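To make the 2-versus-3 comparison concrete, here is a small Python sketch. It uses the standard quopri module for quoted-printable and takes the TRANISCI digraph /e for e acute from the description further down in this message; the variable names are only illustrative.

```python
import quopri

# e with acute accent as a single 8859-1 (Latin-1) byte
e_acute = "é".encode("latin-1")

# Quoted-printable spends three ASCII characters on it
qp_form = quopri.encodestring(e_acute)

# TRANISCI spends two: announcer "/" followed by the base letter
tranisci_form = "/e"

print(qp_form, len(qp_form))
print(tranisci_form, len(tranisci_form))
```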

The intermediate code also has the potential advantage that it could be mixed with conventions yet to be invented for the Latin script (Latin-2, and so on) and still be readable in USASCII (for those who have that need).

I just made a new version of TRANISCI/TRANI850 (TRANISCJ/TRANI819 for when either the source text or the target text uses full 8859-1). I will send the BASIC programs to anybody interested (send me a mail; there may be a delay due to work overload, but I should answer if I get the message).

Here is a description of the code.

TRANISCI Format description 
*******************************************************************
The following conventions were created by Alain LaBont<e'>, formerly
of Minist<e!>re des Communications du Qu<e'>bec (now working for    |
Secr<e'>tariat du Conseil du tr<e'>sor du Qu<e'>bec), for practical |
purposes, to transport full code page 850 (including full Latin-1   |
Repertoire plus other box drawing characters and a few others) in   |
telecom-safe 7-bit ASCII, in the most efficient way, so that the    |
message is decipherable even without a decoder (a more elegant      |
output should use the appropriate decoder), i.e. even in ASCII.     |

The original coders and decoders are simply BASIC programs that are |
expected to be interpretable on all MS/PC DOS-compatible machines:  |

Coders                                                              |

TRANISCI - Converts IBM-850-coded text input to telecom-proof code  |
TRANISCJ - Converts 8859-1-coded text input to telecom-proof code   |

Decoders                                                            |

TRANI819 - Converts telecom-proof code to 8859-1 text output        |
TRANI850 - Converts telecom-proof code to IBM-850 text output       |

To convert an 850 file to ASCII, use program TRANISCI.BAS.
To reverse the process, i.e. to convert back to 850, use
program TRANI850.BAS. These programs are also available in
ANSI C versions (programs NTRNISCI and NTRNI850). In the BASIC      |
versions, TRANISCI text input is simplistically always on file      |
TEMP1, and the intermediate code file produced is TEMP2. TRANI850   |
reads the intermediate code file on TEMP2 and reconstitutes the     |
original on file TEMP3. The ANSI C version asks the user for the    |
file names.                                                         |

The "telecom-proof" intermediate code being able to transport the   |
full Latin-1 repertoire, programs TRANISCJ and TRANI819 are to      |
TRANISCI and TRANI850 what 8859-1 is to 850, with the advantage     |
that they share the same intermediate code, with round-trip         |
integrity, whatever the original printable characters are.          |
Currently no ANSI C version of TRANISCJ and TRANI819 exists.        |

COPYRIGHT (C) 1992, 1993, 1994, 1995 by Alain LaBont<e'>
Permission is granted to reproduce these conventions and programs
provided this notice is copied too.

The general coding mechanism of TRANISCI uses digraphs, or
occasionally trigraphs, to represent characters which are not in    |
the known invariant and telecom-proof 7-bit ASCII subset.

For example, to represent the Latin letter e with acute accent,
the announcer / is placed in front of e to give /e. To make sure
round-trip character integrity is possible, announcers themselves
are coded. Hence, an original / is coded as +/, which is
guaranteed to be decoded to the original character. This covers
the case where, for example, /e in an original text does not
stand for e acute: it is coded as +/e.
Safe characters (like the unaccented Latin letters) are coded as
they were in the original (no digraph or trigraph used).
In general, too, a coded text, even undecoded, should be readable
in its essential form if it is a text written in a language using
the Latin script.
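The mechanism just described can be sketched in a few lines of Python. This is only an illustration, not the BASIC or ANSI C programs: the function name encode_subset is made up here, only the acute-accent letters are covered, and the codes for the announcers themselves are taken from the tables that follow (reading DASH as the ASCII hyphen, which is an assumption).

```python
# Acute-accent letters: announcer "/" in front of the base letter.
ACUTE = dict(zip("áéíóúýÁÉÍÓÚÝ",
                 ("/" + c for c in "aeiouyAEIOUY")))

# Announcers are themselves coded so that decoding is unambiguous.
# Assumption: "DASH" in the tables means the ASCII hyphen "-".
ANNOUNCER_CODES = {
    "/": "+/", "*": "+*", "<": "+<",
    ">": "/>", "?": ">?", ":": ">>", "+": ">+",
    "-": "<-",
}

def encode_subset(text):
    """Code acute letters and announcers; pass everything else through."""
    out = []
    for ch in text:
        if ch in ANNOUNCER_CODES:
            out.append(ANNOUNCER_CODES[ch])
        elif ch in ACUTE:
            out.append(ACUTE[ch])
        else:
            out.append(ch)  # safe character, coded as itself
    return "".join(out)

print(encode_subset("Québec"))  # e acute becomes /e
print(encode_subset("/e"))      # a literal slash becomes +/
```

A real coder would of course cover every character of the tables, not just this subset.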

*****************************************************************
ANNOUNCERS and the list of coded characters accessed with them:
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER SLASH (/)
*************          ******************************************
aeiouy AEIOUY          WITH ACUTE ACCENT
cC                     WITH CEDILLA
s                      szet (scharfes s)
S                      section sign (en fran<c,>ais: paragraphe)
dD                     WITH STROKE (icelandic eth)
L                      POUND STERLING SIGN
423                    FRACTIONS 1/4, 1/2, 3/4
,                      CEDILLA (stand-alone)
x                      MULTIPLY SIGN
:                      DIVIDE SIGN
-                      BROKEN VERTICAL BAR (not ASCII but 850)
>                      GREATER-THAN SIGN
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER LESS-THAN SIGN (<)
*************          ******************************************
aeiouAEIOU             WITH GRAVE ACCENT
C                      CENT SIGN
-                      DASH
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER GREATER-THAN SIGN (>)
*************          ******************************************
aeiouAEIOU             WITH CIRCUMFLEX ACCENT
j                      INVERTED EXCLAMATION MARK (Spanish)
123                    EXPONENTS 1,2,3
6                      INVERTED QUESTION MARK (Spanish)
0                      DEGREE SIGN
-                      NOT SIGN
=                      OVERLINE
.                      MIDDLE DOT
/                      ACUTE ACCENT (stand-alone)
?                      QUESTION  MARK
:                      TREMA OR UMLAUT (stand-alone)
,                      NO-BREAK-SPACE
)                      ANGLE QUOTATION MARK RIGHT
(                      ANGLE QUOTATION MARK LEFT
>                      COLON
+                      PLUS SIGN
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER COLON (:)
*************          ******************************************
aeiouyAEIOU            WITH DIAERESIS (tr<e'>ma or umlaut)
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER QUESTION MARK (?)
*************          ******************************************
anoANO                 WITH TILDE (Portuguese and Spanish)
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER STAR (*)
*************          ******************************************
aA                     WITH RING ABOVE (Scandinavian)
oO                     SLASHED (Scandinavian)
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER DASH (-)
*************          ******************************************
pP                     ICELANDIC THORNs (also old Anglo-Saxon)
C                      COPYRIGHT SIGN
R                      REGISTERED TRADE MARK SIGN
9                      PILCROW (en fran<c,>ais: alin<e'>a)
Y                      YEN SIGN
ao                     FEMININE/MASCULINE ORDINAL INDICATORs (Port.)
m                      MICRO SIGN
*****************************************************************
CHARACTERS OBTAINED WITH ANNOUNCER PLUS SIGN (+)
*************          ******************************************
o                      INTERNATIONAL MONETARY SYMBOL
-                      PLUS-OR-MINUS SIGN
eE                     AE LIGATUREs (letters in Scandinavia)
/                      SLASH (stand-alone)
*                      ASTERISK
<                      LESS-THAN SIGN
*****************************************************************
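As an illustration of how a decoder might use the tables above, here is a hedged Python sketch. It is not the TRANI850 or TRANI819 program: the names TABLE, _add and decode_subset are invented for this example, only the accented letters and the coded announcers are loaded, the ?? trigraphs are not handled, and DASH is again assumed to be the ASCII hyphen.

```python
TABLE = {}

def _add(announcer, keys, chars):
    """Register digraph announcer+key -> decoded character."""
    for key, ch in zip(keys, chars):
        TABLE[announcer + key] = ch

_add("/", "aeiouyAEIOUY", "áéíóúýÁÉÍÓÚÝ")  # acute
_add("/", "cC", "çÇ")                      # cedilla
_add("<", "aeiouAEIOU", "àèìòùÀÈÌÒÙ")      # grave
_add(">", "aeiouAEIOU", "âêîôûÂÊÎÔÛ")      # circumflex
_add(":", "aeiouyAEIOU", "äëïöüÿÄËÏÖÜ")    # diaeresis
_add("?", "anoANO", "ãñõÃÑÕ")              # tilde
_add("*", "aAoO", "åÅøØ")                  # ring above, slashed o
_add("+", "/*<", "/*<")                    # coded announcers
_add("/", ">", ">")
_add(">", "?>+", "?:+")
_add("<", "-", "-")                        # assumption: DASH is "-"

def decode_subset(coded):
    """Replace known digraphs; copy any other character unchanged."""
    out, i = [], 0
    while i < len(coded):
        pair = coded[i:i + 2]
        if pair in TABLE:
            out.append(TABLE[pair])
            i += 2
        else:
            out.append(coded[i])
            i += 1
    return "".join(out)

print(decode_subset("fran/cais, tr<es, No:el"))
```

Digraphs missing from this partial table (for instance -C for the copyright sign) simply pass through undecoded, which is the kind of readable fallback the scheme is meant to allow.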
ALL OTHER NON-ALPHANUMERIC CHARACTERS (EXCEPT carriage return,
line feed, and hex "1A" [DOS end of file]) are converted by
program TRANISCI as follows:

The resulting code will have as announcer a double question mark
(??) (thus forming a trigraph) for all the following cases:

Control characters (from hex "01" to hex "1F") are converted to
printable by adding hex "20" (ex.: BS [hex "08"] becomes "(").

All others (box drawing characters and characters that are not
part of the repertoire of the Latin Alphabet No. 1) are
reduced by decimal 100 (ex.: the double line cross [decimal
206] becomes "j") except:

DEL (127)               becomes "@"
GULDEN SIGN (159)       becomes "A"
SOFT HYPHEN (240)       becomes "B" (note: SHY is part of 8859-1
                                     but has problems under DOS)
DOUBLE LINE BELOW (242) becomes "C"
CENTRAL SQUARE (254)    becomes "D"
*****************************************************************
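The trigraph arithmetic above can be sketched as follows in Python. The function name trigraph_for is hypothetical, and the function does not check that a code point really belongs to the "all others" group; it only applies the stated rules to whatever 850 code it is given and refuses results that would not be printable ASCII.

```python
# Special cases listed in the table above (decimal code -> final letter).
EXCEPTIONS = {127: "@", 159: "A", 240: "B", 242: "C", 254: "D"}

# Carriage return, line feed and DOS end of file are not converted.
UNCONVERTED = {0x0D, 0x0A, 0x1A}

def trigraph_for(code):
    """Return the ??x trigraph for a code page 850 code point."""
    if code in UNCONVERTED:
        raise ValueError("CR, LF and hex 1A are left as they are")
    if code in EXCEPTIONS:
        return "??" + EXCEPTIONS[code]
    if 0x01 <= code <= 0x1F:
        return "??" + chr(code + 0x20)   # control character rule
    shifted = code - 100                 # rule for all others
    if not 0x21 <= shifted <= 0x7E:
        raise ValueError("code %d is not covered by the ?? rules" % code)
    return "??" + chr(shifted)

print(trigraph_for(0x08))  # hex 08 + hex 20 = hex 28
print(trigraph_for(206))   # 206 - 100 = 106
print(trigraph_for(254))   # listed exception
```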

When an intermediate line would be longer than 80 columns, the
sequence +! is generated at the end of a physical line. The
decoder then reconnects the next line with the previous line in
reconstructing the original data. This avoids loss of data by
some MAILERs which truncate data beyond 80 columns on a line.
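Here is one possible way to sketch the +! continuation in Python. The names tokens, wrap and unwrap are invented for this example. Two choices below are assumptions not stated in the description: a line is never cut in the middle of a digraph or trigraph, and the decoder looks at coded units rather than raw characters, so that a line ending in a coded plus sign followed by "!" (that is, >+!) is not mistaken for a continuation.

```python
ANNOUNCERS = set("/<>:?*-+")
CONTINUATION = "+!"

def tokens(coded):
    """Yield single characters, digraphs and ?? trigraphs in order."""
    i = 0
    while i < len(coded):
        if coded.startswith("??", i):
            n = 3
        elif coded[i] in ANNOUNCERS:
            n = 2
        else:
            n = 1
        yield coded[i:i + n]
        i += n

def wrap(coded, width=80):
    """Split one coded logical line into physical lines of at most width."""
    if len(coded) <= width:
        return [coded]
    lines, current = [], ""
    for tok in tokens(coded):
        # Always leave room for the two-character continuation mark.
        if len(current) + len(tok) + len(CONTINUATION) > width:
            lines.append(current + CONTINUATION)
            current = ""
        current += tok
    lines.append(current)
    return lines

def unwrap(lines):
    """Reconnect physical lines that end with the continuation mark."""
    logical, pending = [], ""
    for line in lines:
        toks = list(tokens(line))
        if toks and toks[-1] == CONTINUATION:
            pending += "".join(toks[:-1])
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        logical.append(pending)
    return logical

long_line = "Qu/ebec " * 20
assert unwrap(wrap(long_line)) == [long_line]
```

Because wrap reserves two columns on every physical line it produces, the last piece may come out shorter than strictly necessary; that keeps the sketch simple.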

The header of the intermediate code (indicating its level) is:      |
TRANISCI <- 1992-01-22 ->TRANI850                                   |


Alain LaBont[e']                               \875, Grande-All[e']e Est, 4C
Service de la prospective et de la francisation \Qu[e']bec, QC, CA 
Secr[e']tariat du Conseil du tr[e']sor           \T[e']l +1 418 643 7229
Gouvernement du Qu[e']bec                         \Fax   +1 418 646 3571
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
<- Qu[e']bec                   fr = "Qu[e'] bec {Quel bec} ! " (Champlain) 
 = Kopek/Kipek/{Kebek}/Gebek   langues algonquiennes = passage r[e']tr[e']ci
 = City of Qu[e']bec           en = capital of Qu[e']bec, 1 540 680 square km
 = Uepishtukuiau               {iu?} {innu, montagnais}
 = Ku[i'] B[e<]i K[e`] Sh[i`]  zh (mand.) = grande ville ayant vaincu le Nord
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
