From mgm@sybase.com  Wed Nov 19 01:32:56 1997
Received: from inergen.sybase.com (inergen.sybase.com [192.138.151.43]) by dkuug.dk (8.6.12/8.6.12) with ESMTP id BAA23418; Wed, 19 Nov 1997 01:32:54 +0100
Received: from smtp1.sybase.com (sybgate.sybase.com [130.214.220.35])
          by inergen.sybase.com (8.8.4/8.8.4) with SMTP
	  id QAA26198; Tue, 18 Nov 1997 16:29:49 -0800 (PST)
Received: from constantine.sybase.com by smtp1.sybase.com (4.1/SMI-4.1/SybH3.5-030896)
	id AA28895; Tue, 18 Nov 97 16:30:37 PST
Received: by constantine.sybase.com (5.x/SMI-SVR4/SybEC3.5)
	id AA23221; Tue, 18 Nov 1997 16:27:47 -0800
Date: Tue, 18 Nov 1997 16:27:47 -0800
From: mgm@sybase.com (Michael G. McKenna)
Message-Id: <9711190027.AA23221@constantine.sybase.com>
To: rosenne@NetVision.net.il, Harald.T.Alvestrand@uninett.no,
        manuel.carrasco@emea.eudra.org
Subject: Re: (i18n.390) RE: Transliteration standards: possible impact on internationalization
Cc: Converse@sesame.demon.co.uk, i18n@dkuug.dk, xojig@xopen.co.uk,
        sc22wg14@dkuug.dk, www-international@w3.org, wgi18n@terena.nl,
        keld@dkuug.dk
X-Sun-Charset: US-ASCII

[Mike]

Unfortunately, the 639 language code does not cover regional
differences, for instance between US English and International
English.  This may not be much of a problem with regard to the target
language, but it may make a difference when choosing the source
language.

In Russian, for instance, is the source language White Russian,
or contemporary Russian?  And what script is it in?
Serbo-Croatian is commonly written in a Latin script in Croatian
areas, but in a Cyrillic script in Serbian areas.

It might look a little ugly, but the X Window System font specifier
strings may be of some use as a starting template.  Perhaps something
like:


t-<sl>-<sd>-<ss>-<tl>-<td>-<ts>

Where:
	sl - source language, using ISO 639
	sd - source dialect, perhaps using ISO 3166 (I know, even this
		has deficiencies)
	ss - source script (we'll need script identifiers)

	tl - target language
	td - target dialect
	ts - target script

Any value can be a default or wild card.  So,
	French transliterated into Hebrew  = t-fr-*-*-iw-*-*
	French transliterated into Russian = t-fr-*-*-ru-*-*
	Russian transliterated into Serbo-Croatian in a Latin script
					= t-ru-*-cy-sh-*-la

		where 	cy = Cyrillic
			la = Latin
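
To make the wildcard idea concrete, here is a rough Python sketch of
how such a tag might be parsed and matched.  Nothing here is
standardized: the names TranslitTag, parse_tag, format_tag and matches
are invented for illustration, and the sketch assumes that no field
contains a hyphen and that "*" is the only wildcard.

```python
from typing import NamedTuple

WILDCARD = "*"  # assumption: "*" means "default or any value"


class TranslitTag(NamedTuple):
    """Hypothetical container for the six proposed fields."""
    source_lang: str     # sl, ISO 639
    source_dialect: str  # sd, perhaps ISO 3166
    source_script: str   # ss, e.g. "cy" or "la"
    target_lang: str     # tl
    target_dialect: str  # td
    target_script: str   # ts


def parse_tag(tag: str) -> TranslitTag:
    """Split 't-<sl>-<sd>-<ss>-<tl>-<td>-<ts>' into its six fields."""
    parts = tag.split("-")
    if len(parts) != 7 or parts[0] != "t":
        raise ValueError(f"not a transliteration tag: {tag!r}")
    return TranslitTag(*parts[1:])


def format_tag(tag: TranslitTag) -> str:
    """Rebuild the string form from the six fields."""
    return "-".join(("t",) + tuple(tag))


def matches(pattern: TranslitTag, candidate: TranslitTag) -> bool:
    """True if every non-wildcard field of pattern equals the candidate's."""
    return all(p == WILDCARD or p == c for p, c in zip(pattern, candidate))
```

For example, parse_tag("t-fr-*-*-iw-*-*") gives a source language of
"fr" and a target language of "iw", with everything else left open.  A
pattern such as t-ru-*-cy-sh-*-* would match a fully specified tag like
t-ru-RU-cy-sh-HR-la (the RU and HR dialect values are only
illustrative), but not one whose source script is "la".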

This may be overkill, but we do need some sort of modifier part for
regional differences.

My $0.02,

	Mike____

> [Carrasco 1]
> > >Transliteration should be coded in RFC 1766 (Mr. Alvestrand ?).
> > >
> > >For example:
> > >
> > >  t-xx
> > >
> > >where
> > >  t   : transliteration
> > >  xx : a 639 language code
> > 
> > [Rosenne]
> > A second argument is needed: the language into which the text is
> > transliterated. Obviously, French transliterated to Hebrew is
> > different from French transliterated into Russian.
> > 
> > [Carrasco 2]
> > 
> > So one needs to code:
> > 
> >  - t   :  transliteration indicator
> >  - ss  :  a 639 language code ; source language
> >           (language transliterated from)
> >  - tt  :  a 639 language code ; target language
> >           (language transliterated into)
> > 
> > Examples:
> >  French transliterated into Hebrew  = t-fr-iw
> >  French transliterated into Russian = t-fr-ru
> > 
> > Questions:
> >   - Any other parameters needed to be coded ?
> >   - Does this break RFC 1766 ?
> > 
> > Regards
> > Tomas
> > 
> 
