From jbettels@wadd.enet.dec.com Fri Sep  4 16:45:23 1992
Received: from inet-gw-2.pa.dec.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA12027; Fri, 4 Sep 92 16:45:23 +0200
Received: by inet-gw-2.pa.dec.com; id AA19977; Fri, 4 Sep 92 07:45:20 -0700
Received: by vbormc.vbo.dec.com; id AA08180; Fri, 4 Sep 92 16:40:55 +0200
Message-Id: <9209041440.AA08180@vbormc.vbo.dec.com>
Received: from wadd.enet; by vbormc.enet; Fri, 4 Sep 92 16:40:56 MET DST
Date: Fri, 4 Sep 92 16:40:56 MET DST
From: "Jurgen Bettels, TNSG/ISE  04-Sep-1992 1641" <jbettels@wadd.enet.dec.com>
To: iso10646@jhuvm.bitnet, i18n@dkuug.dk
Cc: jbettels@wadd.enet.dec.com
Apparently-To: i18n@dkuug.dk, iso10646@jhuvm.bitnet
Subject: request for feedback on character set identification proposal
X-Charset: ASCII
X-Char-Esc: 29




                                                           JTC1/SC2/WG3/N 156R
                                                           
                                                           

         Identification of Character Collections and Encodings
         =====================================================

                                92-09-03

                    Jürgen Bettels (Digital Equipment Co.)



1.  Requirements

    There is a growing need to identify character sets using a
    specification mechanism such as ASN.1. This demand has been raised by
    SC18, SC21 and SC22, and also by other groups such as X/Open. A liaison
    statement from SC22 to SC2 has expressed a need for character set
    names. This paper proposes a possible approach to these requirements:
    revising ISO 7350 so that it provides an identification registry able
    to accommodate them.


1.1 Types of Repertoires for Which OIDs Are Required

    It is recognized that ISO 10646 represents a registry of all individual
    characters (it is expected that any character not included in the first
    edition of ISO 10646 will be added in a later revision). Even though ISO
    10646 does enumerate a number of character collections as subsets, it
    is clearly not a registry of all collections currently in use.

    This proposal would allow identifications to be generated for any
    collection of characters and for its encodings.

    It must be possible to generate (and register) identifications for
    character repertoires and encodings from:

	1. ISO standards (including 10646 and its subsets)

	2. national standards

	3. proprietary standards


1.2 Format

    It would be most useful to have all such identifiers collected in one
    document, so that they are easily accessible to users. An alternative
    would be to add OIDs to each character set standard as it is revised.
    This would only address item 1 above, and even then it is unlikely that
    all standards will undergo a revision at which the OID could be added.
    The most flexible solution would therefore be a registry of identifiers
    under a revised ISO 7350, to which new entries can easily be added.


1.3 Content of the registry

    As a minimum, the following must be specified for each registered
    character set and form part of its registry entry:

	- a reference and description of the standard
	- ASN.1 OID for the abstract syntax (i.e. the repertoire)
	- an optional ASN.1 OID for the transfer syntax (i.e. the encoding)
	- object descriptors of the OIDs for human reference and for
	  programming languages

    Note that it is proposed here to separate the repertoire from the
    encoding. This allows more flexibility as multiple encodings exist for
    the same collection.

    There is particular interest in such OIDs from programming languages,
    databases and operating systems (POSIX). For example, SC22 (POSIX) as
    well as X/Open intend to register names for locales. These names will
    contain standard fields specifying language, culture, character set,
    etc. It is expected that the object descriptors for the abstract and
    transfer syntaxes proposed here will be used in these fields.

    As other applications may have additional requirements, it is expected
    that wide circulation and review of this paper will uncover them.


2.0 Creation of the registry

    It is proposed that JTC1/SC2, after the revision of ISO 7350, initially
    populate the registry with a set of the most frequently used character
    set repertoires and encodings.

    Examples of the most urgently needed entries:

    1. all parts of ISO 8859
    2. all collections of ISO 10646
    3. JIS X0208 and JIS X0212
    4. national standard of Korea
    5. national standard of the PRC
    6. national standard of Thailand
    7. regional standard of Taiwan
    8. proprietary standards:
	- selected PC code pages
	- Microsoft Kanji code ("Shift JIS")
	- EUC Japanese, Taiwanese...
    9. other standards

    Note: the registry must be easily updatable and publicly available.


3.0 Examples of OIDs and descriptors:

    a. ISO standard 8859-1:

         		OID					descriptor

       { iso standard 8859 part(1) abstract-syntax(1) }   "ISO 8859 part-1 rep"

       { iso standard 8859 part(1) transfer-syntax(0)}    "ISO 8859 part-1 ENCoding"
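       To make the notation concrete, the sketch below treats the OIDs of
       examples a and b as sequences of arc numbers, with iso = 1 and
       standard = 0 as in ISO 8824. It renders them in dotted form and in
       the Basic Encoding Rules of ISO 8825. Python is used only for
       illustration. The constant and function names are hypothetical, and
       only short-form lengths are handled.

```python
# Sketch only: the example OIDs as arc sequences, shown in dotted notation
# and in BER (ISO 8825) form. All names below are hypothetical.
ISO = 1        # { iso(1) }
STANDARD = 0   # { iso(1) standard(0) }

# Example a: { iso standard 8859 part(1) abstract-syntax(1) } and transfer-syntax(0)
ABSTRACT_8859_1 = (ISO, STANDARD, 8859, 1, 1)
TRANSFER_8859_1 = (ISO, STANDARD, 8859, 1, 0)
# Example b: { iso standard 10646 part(1) transfer-syntaxes(0) two-octet-form(2) }
UCS2_TRANSFER = (ISO, STANDARD, 10646, 1, 0, 2)


def dotted(oid: tuple[int, ...]) -> str:
    """Render an OID in dotted decimal notation, e.g. 1.0.8859.1.1."""
    return ".".join(str(arc) for arc in oid)


def ber_encode_arc(arc: int) -> bytes:
    """Encode one sub-identifier in base 128; all octets but the last have bit 8 set."""
    octets = [arc & 0x7F]
    arc >>= 7
    while arc:
        octets.append(0x80 | (arc & 0x7F))
        arc >>= 7
    return bytes(reversed(octets))


def ber_encode_oid(oid: tuple[int, ...]) -> bytes:
    """Return the tag, length and contents octets of an OBJECT IDENTIFIER value."""
    first, second, *rest = oid
    # The first two arcs share one sub-identifier: 40 * first + second.
    contents = ber_encode_arc(40 * first + second)
    for arc in rest:
        contents += ber_encode_arc(arc)
    if len(contents) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(contents)]) + contents


if __name__ == "__main__":
    for oid in (ABSTRACT_8859_1, TRANSFER_8859_1, UCS2_TRANSFER):
        print(dotted(oid), ber_encode_oid(oid).hex(" "))
```

       With these definitions, the abstract syntax OID of example a is
       1.0.8859.1.1 in dotted form. Its BER form is the octets
       06 05 28 C5 1B 01 01: the arc 8859 needs two base-128 octets.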


    b. Repertoire of 8859-1 and UCS-2 encoding:

       The abstract syntax is the same as in example a. The transfer syntax
       would change to:

       { iso standard 10646 part(1) transfer-syntaxes(0) two-octet-form(2)}  

                                    and

                          "ISO 10646 part-1 form 2"

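       Example b is the case that motivates separating the repertoire from
       the encoding. The sketch below is a rough illustration. It assumes
       Python's built-in "latin-1" codec as a stand-in for the 8859-1
       encoding, and "utf-16-be" for the UCS-2 two-octet form; the two
       agree for characters up to U+FFFF. The same string of 8859-1
       characters then yields two different octet sequences, and both
       decode back to the same text. The function names are hypothetical.

```python
# Sketch only: one repertoire (ISO 8859-1 characters), two transfer syntaxes.
# Assumption: Python's "latin-1" codec stands in for the ISO 8859-1 encoding and
# "utf-16-be" for the UCS-2 two-octet form; the two agree for U+0000..U+FFFF.


def encode_8859_1(text: str) -> bytes:
    """One octet per character; fails for characters outside 8859-1."""
    return text.encode("latin-1")


def encode_ucs2(text: str) -> bytes:
    """Two octets per character, most significant octet first."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the range of the two-octet form")
    return text.encode("utf-16-be")


sample = "Jürgen"                  # characters from the 8859-1 repertoire
one_octet = encode_8859_1(sample)  # b'J\xfcrgen'
two_octet = encode_ucs2(sample)    # b'\x00J\x00\xfc\x00r\x00g\x00e\x00n'

# Different octets, same abstract characters.
assert one_octet != two_octet
assert one_octet.decode("latin-1") == two_octet.decode("utf-16-be") == sample
```

       Only the transfer syntax differs. The abstract syntax, and therefore
       its OID and descriptor, stays the same.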

    c. JIS X0208:

       One possibility would be to use exactly the same OIDs as proposed for
       ISO 2022:

           { iso standard 2022 abstract-syntax(1) reg(...) }

                "ISO 2022 registrations ..."

                        or as synonym

       something like: "JIS X-0208 edition..."


       A better alternative would be to use arcs that point to JIS. Such arcs
       have been defined in ISO 8824 (Annex B) and use the country codes of
       ISO 3166. For this to work, however, the ISO member body (in this case
       JISC) must assign arcs to its standards:

	       { iso member-body Japan(392) JIS X-0208-1990(n) }


    d. PC code pages

       Here we have the same problem as in the JIS X0208 example above.

       Let us assume that the repertoire of the complete set is that of
       ISO 10646. A modified ISO 7350 then has to address the following
       issues:

       - extend the repertoire to 10646 (this would then also be the registry
         for 10646 subsets)

       - register encoding methods as well (for code tables, this means that
         the code table has to be in the registry together with the 10646
         character names or code points)

       - define the format of the registry entry

       - ensure that synonyms are unique

       For example, for a PC code page:

        { iso standard 7350 abstract-syntax(1) reg(...) }          

           "ISO 7350 registrations ..."

	or as synonym:

        " some name for the code table"

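       For code tables, a registry entry would pair each code position with
       a 10646 code point or character name. The minimal Python sketch below
       assumes IBM PC code page 437, through Python's built-in "cp437"
       codec, as the example page. It uses the standard unicodedata module
       for character names. The helper name code_table is hypothetical, and
       the tuple layout is not a proposed registry format.

```python
import unicodedata

# Sketch only: a code table for a single-octet PC code page, giving for each
# octet the ISO 10646 code point and character name. Assumption: code page 437
# via Python's built-in "cp437" codec; code_table is a hypothetical helper.


def code_table(codec_name: str) -> list[tuple[int, int, str]]:
    """Return (octet, 10646 code point, character name) for octets 0x00..0xFF."""
    table = []
    for octet in range(256):
        char = bytes([octet]).decode(codec_name)
        # Control characters have no name in the Unicode character database,
        # so they get a placeholder instead.
        name = unicodedata.name(char, "<control>")
        table.append((octet, ord(char), name))
    return table


if __name__ == "__main__":
    for octet, code_point, name in code_table("cp437")[0x80:0x84]:
        print(f"0x{octet:02X}  U+{code_point:04X}  {name}")
```

       In this table, octet 0x80 of code page 437 maps to U+00C7, LATIN
       CAPITAL LETTER C WITH CEDILLA.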

4.0 Register entry:

    For each collection there shall be an entry for the abstract syntax (the
    character collection) and an (optional) entry for the transfer syntax (the
    encoding) as follows:

Reg. #   | Description   | Specification                     | OID              | Descriptor
         |               |                                   |                  | or synonym
____________________________________________________________________________________________
         |               |                                   |                  |
reg. n   | ...........   | one or several of:                | abstract syntax  | string
         |               | - reference to SC2 standard       |                  |
         |               | - 10646 collection number         |                  |
         |               | - list of 10646 code positions    |                  |
         |               |   or character names              |                  |
         |               |                                   |                  |
reg. m   | ...........   | - reference to the specification  | transfer syntax  | string
         |               |   of the register entry of the    |                  |
         |               |   relevant collection             |                  |
         |               | - code table                      |                  |


    It is possible that particular applications may require additional columns
    of information or provisions for private denominations.
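    The entry layout above can be sketched as a small data structure. The
    Python below is only an illustration. The class names, field names and
    registration numbers 1 to 3 are hypothetical; the OIDs and descriptors
    are the examples of section 3.0. The sketch enforces two rules taken
    from this paper. First, synonyms (descriptors) must be unique. Second,
    a transfer syntax entry must refer to a registered collection.

```python
from dataclasses import dataclass, field
from typing import Union

# Sketch only: class names, field names and registration numbers are
# hypothetical; OIDs and descriptors are the examples of section 3.0.


@dataclass
class AbstractSyntaxEntry:
    """A character collection (repertoire)."""
    reg_no: int
    description: str
    specification: list[str]   # SC2 standard, 10646 collection number, or code positions/names
    oid: tuple[int, ...]
    descriptor: str


@dataclass
class TransferSyntaxEntry:
    """An encoding of a collection that is already registered."""
    reg_no: int
    description: str
    collection_reg_no: int     # reference to the register entry of the collection
    oid: tuple[int, ...]
    descriptor: str
    code_table: dict[int, int] = field(default_factory=dict)  # octet -> 10646 code point


Entry = Union[AbstractSyntaxEntry, TransferSyntaxEntry]


class Registry:
    def __init__(self) -> None:
        self._by_number: dict[int, Entry] = {}
        self._by_descriptor: dict[str, Entry] = {}

    def register(self, entry: Entry) -> None:
        if entry.reg_no in self._by_number:
            raise ValueError(f"registration number {entry.reg_no} already in use")
        # Synonyms (descriptors) must be unique across the registry.
        if entry.descriptor in self._by_descriptor:
            raise ValueError(f"descriptor {entry.descriptor!r} already registered")
        # A transfer syntax must point at a registered collection.
        if isinstance(entry, TransferSyntaxEntry):
            target = self._by_number.get(entry.collection_reg_no)
            if not isinstance(target, AbstractSyntaxEntry):
                raise ValueError("transfer syntax must reference a registered collection")
        self._by_number[entry.reg_no] = entry
        self._by_descriptor[entry.descriptor] = entry

    def lookup(self, descriptor: str) -> Entry:
        return self._by_descriptor[descriptor]

    def encodings_of(self, collection_reg_no: int) -> list[TransferSyntaxEntry]:
        return [e for e in self._by_number.values()
                if isinstance(e, TransferSyntaxEntry)
                and e.collection_reg_no == collection_reg_no]


registry = Registry()
registry.register(AbstractSyntaxEntry(
    1, "ISO 8859-1 repertoire", ["ISO 8859-1"],
    (1, 0, 8859, 1, 1), "ISO 8859 part-1 rep"))
registry.register(TransferSyntaxEntry(
    2, "ISO 8859-1 encoding", 1,
    (1, 0, 8859, 1, 0), "ISO 8859 part-1 ENCoding"))
registry.register(TransferSyntaxEntry(
    3, "ISO 8859-1 repertoire in the UCS-2 form", 1,
    (1, 0, 10646, 1, 0, 2), "ISO 10646 part-1 form 2"))
```

    Each encoding entry points to its collection. As a result,
    encodings_of(1) returns both the 8859-1 encoding and the UCS-2 form
    registered for the 8859-1 repertoire.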



5.0 Conclusions

    As this proposal is considered to address urgent requirements,
    JTC1/SC2/WG3 plans to start work on it as soon as possible. Comments and
    possible additional requirements are invited and should be sent no later
    than

    		31 January 1993
    to:

    		Jürgen Bettels
    		Digital Equipment Co.
    		12, av. des Morgines, CP 176
    		CH-1213 Petit-Lancy 1
    		Switzerland
    		Fax: +41 22 709 41 40
    
    
    or to the convenor of SC2/WG3:

    		Jan van den Beld
    		ECMA	
    		114, rue du Rhone
    		CH-1204 Geneva
    		Switzerland
    		Fax: +41 22 786 52 31

    
    Email recipients of this paper are invited to reply by email instead.



                                                                              
