From mduerst@ifi.unizh.ch  Mon Mar 11 15:01:39 1996
Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by dkuug.dk (8.6.12/8.6.12) with SMTP id PAA21361 for <I18N@DKUUG.DK>; Mon, 11 Mar 1996 15:01:31 +0100
Message-Id: <199603111401.PAA21361@dkuug.dk>
Received: from ifi.unizh.ch by josef.ifi.unizh.ch 
          id <00728-0@josef.ifi.unizh.ch>; Mon, 11 Mar 1996 15:01:23 +0100
Subject: Re: (Copy) (Copy) iso / dis 8879 Hypertext Markup Language Standard
To: iso10646@listproc.hcf.jhu.edu
Date: Mon, 11 Mar 1996 15:01:22 +0100 (MET)
Cc: I18N@DKUUG.DK
In-Reply-To: <"96-03-11-13:40:23.65*PRECAL"@rulmvs.LeidenUniv.NL> from "Johan van Wingen" at Mar 11, 96 07:40:00 am
X-Mailer: ELM [version 2.4 PL11]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit
Content-Length: 6948
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Sender: mduerst@ifi.unizh.ch

With respect to the message from Turkey forwarded to this list
by Johan van Wingen, I can say the following:

I am not up to date on the details of the 8879-HTML draft,
and so I might be missing something here. However, as one
of the coauthors of draft-ietf-html-i18n-03.txt, the Internet
draft on internationalization issues from the HTML working group
of the IETF, I am quite familiar with the issues of character sets
and encodings in HTML as currently used on the Internet
and standardized by the IETF.

1) ISO 8859-1 is the widely accepted default for HTTP/HTML, and
	it would be completely confusing and useless to try to
	change it to anything else, even if something else might
	cover more languages and be of help especially for Turkey.

2) The default of ISO 8859-1 for HTTP/HTML is more a matter
	of HTTP than of HTML. Using the MIME "charset" parameter
	in the header of the transmitted HTTP message, this can
	be changed to anything else, such as 8859-9. This only
	affects what characters can be transmitted, and is of course
	dependent on the receiver (browser) understanding this
	"charset" (a MIME charset is more like a character encoding
	than a character set!). This functionality as such is already
	quite widely supported by browsers, although I don't know which
	"charset"s in particular are supported.
	(The default "charset" parameter if HTML is transmitted via
	mail is US-ASCII (7-bit).)
	The current i18n draft also gives some additional hints
	on how the "charset" of an HTML text could be specified
	in case the above is not directly usable.

3) RFC 1866 (the current ietf HTML standard) specifies ISO 8859-1
	as the document character set, but says that this will be
	changed in a future version to ISO 10646. This change has
	been made in draft-ietf-html-i18n-03.txt. As this change
	is fully backwards-compatible, it would probably be a good
	idea to introduce it into 8879-HTML, even if none or only
	a few of the other changes from draft-ietf-html-i18n-03.txt
	are adopted. "Latin capital letter i dot" can then be referenced
	as &#304; completely independently of the "charset" chosen
	for transmission.

4) The set of entities for single characters currently defined in HTML,
	such as &uuml;, is rather limited. The relevant ISO authorities
	might consider expanding this list, e.g. in accordance with
	the lists available for SGML. This would allow characters
	such as "Latin capital letter i dot" to be written in a more
	user-friendly fashion. This has to be done in connection with
	making ISO 10646 the document character set.
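
Points 2) and 3) can be illustrated with a minimal sketch in Python.
This is only an illustration under stated assumptions, not part of
either draft: the helper names charset_of and decode_body are
hypothetical, the "charset" is assumed to arrive in a Content-Type
value, and ISO 8859-1 is assumed as the fallback when no "charset" is
given. The sketch shows that the octet 0xDD stands for different
characters under 8859-9 and 8859-1, whereas &#304; resolves to the same
character whatever "charset" is used for transmission.

```python
from email.message import Message
import html


def charset_of(content_type, default="iso-8859-1"):
    """Return the MIME "charset" named in a Content-Type value.

    Falls back to `default` (assumed here to be the HTTP default,
    ISO 8859-1) when the header carries no "charset" parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(raw, content_type):
    """Decode the transmitted octets using the declared "charset"."""
    return raw.decode(charset_of(content_type))


# Octet 0xDD is "Latin capital letter i dot" (U+0130) in ISO 8859-9,
# but "Latin capital letter y acute" (U+00DD) in ISO 8859-1.
body = b"\xddstanbul"
as_latin5 = decode_body(body, "text/html; charset=ISO-8859-9")
as_latin1 = decode_body(body, "text/html")

# A numeric character reference refers to the document character set
# (ISO 10646), so it survives even a pure US-ASCII transmission.
ncr = html.unescape(decode_body(b"&#304;stanbul", "text/html; charset=US-ASCII"))

print(repr(as_latin5), repr(as_latin1), repr(ncr))
```

Here as_latin5 and ncr hold the same text, while as_latin1 starts with
a different character because the receiver assumed the wrong "charset".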

I hope that this helps to clear up the confusion and find a solution
that is acceptable not only to Turkey, but also to non-European
countries with non-Latin writing systems. If I can be of any
additional help, please feel free to contact me.

Regards,	Martin.

----
Dr.sc.  Martin J. Du"rst			    ' , . p y f g c R l / =
Institut fu"r Informatik			     a o e U i D h T n S -
der Universita"t Zu"rich			      ; q j k x b m w v z
Winterthurerstrasse  190			     (the Dvorak keyboard)
CH-8057   Zu"rich-Irchel   Tel: +41 1 257 43 16
 S w i t z e r l a n d	   Fax: +41 1 363 00 35   Email: mduerst@ifi.unizh.ch
----

> ---------------------------- Text of forwarded message -----------------------
>Date:    Fri, 08 Mar 96 16:57 CET
>From:    "Johan van Wingen"                          <PRECAL@RULMVS.LEIDENUNIV.NL>
>To:      Martin Bryan                     <mtbryan@SGML-CEN.DEMON.CO.UK>,
>         "H. Gaylord"                     <GALIARD@LET.RUG.NL>
>Subject: (Copy) iso / dis 8879 Hypertext Markup Language Standard
>CC:      SC22 List                            <SC22@DKUUG.DK>,
>         SC22/WG20 mailing list               <SC22WG20@DKUUG.DK>,
>         SC02 List                            <SC2@DKUUG.DK>
>
>Gentlemen
>Here is at last a voice from Turkey itself. I always maintained that
>The Netherlands had selected 8859-9, not 8859-1 for Government use,
>but that message was generally ignored.
> ---------------------------- Text of forwarded message -----------------------
>Date: Fri, 08 Mar 1996 16:40:33 +0300 (EET)
>From: Umit KARAKAS <karakas8@ETI.CC.HUN.EDU.TR>
>Subject: iso / dis 8879 Hypertext Markup Language Standard
>To: central@ISOCS.ISO.CN
>Cc: karakas@ETI.CC.HUN.EDU.TR, tse-d@SERVIS.MET.TR, postmaster@DIN.DE,
> smazza@ANSI.ORG.EDU
> , ozgit@METU.EDU.TR, yener@METU.EDU.TR,faruk@ECZACIBASI.COM.TR
> , jbettels@GVA05.ENET.DEC.COM,becker.osbunorth@XEROX.COM
> , don_carrol@HPBOI1.DESK.HP.COM, ksoe@DS.DK,edwin-hart@JHUAPL.EDU
> , asmusf@IX.NETCOM.COM, alb@SCT.GOUV.QC.CA,winkler@PO3.BB.UNISYS.COM
> , precal@RULMVS.LEIDENUNIV.NL,
> Ersin TORECI <toreci@ETI.CC.HUN.EDU.TR>, hun.edu.tr@ETI.CC.HUN.EDU.TR
>
>I have deep concern about DRaft (?) standard 8879 . Would you supply me
>e-mail address, fax and postal address for working group members for this
>(8879) standard. The previous standard in this area were ISO8879 Standard
>Generalized Markup Language (SGML), date 1986-10-15, and the previous
>standard is based on various, selectable code tables as
>iso2022,iso4873,iso6737. According to partial information on current
>(draft ? ) HTML the code table is selected as Latin 1.
>Latin 1 is regional code table for most of the West European languages is
>represented in. On the contrary Latin 5 ( iso8859 table 9 , ecma128 Latin
>5 ) is designed for maximum utility latin letters. Latin 1 covers 43
>countries and 17 languages according to geographic selection which is
>west european,
>
>Latin 5 covers 43 countries and 17 languages according to frequency of
>the letter and language. Since HTML is basically network utility, and it
>is used /may be used for WWW pages, this standard may not be based on
>regional standard. Otherwise WWW pages, that advertise new products, may
>not be read in Turkey.   For your information According to our brief
>search depending on UN statistics indicate that Latin Alphabet based
>languages utility frequencies is given below :
>
>                  spoken as first language    spoken as second language
>English                 518,244,999               69,033,265  +++
>Spanish                 304,689,188               17,830,359  +
>Portuguese              151,230,622                  965,712  ++
>Turkish                  99,866,460               30,381,893  +
>French                  104,751,358               10,529,661  ++
>German                   93,849,110                9,341,971  ++
>Italian                  57,944,978                9,056,842  +
>Polish                   37,881,900                1,898,223
>
>Data depends on UN statistics , related years 91 to 94, if recent data
>is unavailable statistics on 91 is accepted.
>
>By choosing latin 1 over Latin 5, or intending to do it 130 million
>customers (who speak Turkish ) is ignored.
>
>Would you send information about the current status about 8879 HTML
>?,  copy of current draft and related iso / sc18 meeting timetable
>
>sincerely
>
>
>Umit Karakas
>
>
>
>

