From rinehuls@access.digex.net  Sat Nov 15 00:33:33 1997
Received: from access2.digex.net (qlrhmEbBUV1EY@access2.digex.net [205.197.245.193]) by dkuug.dk (8.6.12/8.6.12) with ESMTP id AAA21427 for <sc22docs@dkuug.dk>; Sat, 15 Nov 1997 00:33:27 +0100
Received: from localhost (rinehuls@localhost)
          by access2.digex.net (8.8.4/8.8.4) with SMTP
	  id SAA00300 for <sc22docs@dkuug.dk>; Fri, 14 Nov 1997 18:33:22 -0500 (EST)
Date: Fri, 14 Nov 1997 18:33:22 -0500 (EST)
From: "william c. rinehuls" <rinehuls@access.digex.net>
X-Sender: rinehuls@access2.digex.net
Reply-To: "william c. rinehuls" <rinehuls@access.digex.net>
To: sc22docs@dkuug.dk
Subject: SC22 N2612 - Vote Summary on CD 14652 - Cultural Conventions Specifications
Message-ID: <Pine.SUN.3.96.971114174738.20789D-100000@access5.digex.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

________________________ beginning of title page _____________________
ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat:  U.S.A.  (ANSI)

ISO/IEC JTC 1/SC22
N2612

TITLE:
Summary of Voting on Concurrent CD Registration and CD Approval for CD
14652 - Information technology - Specifications for Cultural
Conventions

DATE ASSIGNED:
1997-11-14

SOURCE
Secretariat, ISO/IEC JTC 1/SC22

BACKWARD POINTER:
N/A

DOCUMENT TYPE:
Summary of Voting

PROJECT NUMBER:
JTC 1.22.30.02.03

STATUS:
CD 14652 has been registered.  WG20 is requested to prepare a Disposition
of Comments Report and a recommendation on the further processing of the
CD.

ACTION IDENTIFIER:
FYI to SC22 Member Bodies
ACT to WG20

DUE DATE:
N/A

DISTRIBUTION:
Text

CROSS REFERENCE:
SC22 N2504

DISTRIBUTION FORM:
Def


Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
William C. Rinehuls
8457 Rushing Creek Court
Springfield, VA 22153 USA
Telephone:  +1 (703) 912-9680
Fax:  +1 (703) 912-2973
email:  rinehuls@access.digex.net

______________ end of title page; beginning of overall summary _________

                       SUMMARY OF VOTING ON

Letter Ballot Reference No:  SC22 N2504
Circulated by:               JTC 1/SC22
Circulation Date:            07-22-1997
Closing Date:                11-07-1997

SUBJECT:  Concurrent CD Registration and CD Approval for CD 14652 -
Information technology - Specifications for Cultural Conventions

--------------------------------------------------------------------
The following responses have been received on the subject of CD
registration:

"P" Members supporting registration without comment:    9

"P" Members supporting registration with comment:       0

"P" Members not supporting registration:                3

"P" Members abstaining:                                 4

"P" Members not voting:                                 7

"O" Members supporting registration without comment:    1


The following responses have been received on the subject of CD approval:

"P" Members supporting approval without comment:        7

"P" Members supporting approval with comment:           2

"P" Members not supporting approval:                    3

"P" Members abstaining:                                 4

"P" Members not voting:                                 7

"O" Members supporting approval without comment:        1

------------------------------------------------------------------------
Secretariat Action:

CD 14652 has been registered.  WG20 is requested to prepare a Disposition
of Comments Report and a recommendation on the further processing of the
CD.

The comment accompanying the abstention vote from Austria was:  "Lack of
expert resources."  The comment accompanying the abstention vote from
Germany was:  "There is no national WG20 rapporteur."  The comment
accompanying the abstention vote from Sweden was:  "Expert resources not
available."

_________ end of overall summary; beginning of registration summary ___

                 ISO/IEC JTC1/SC22  LETTER BALLOT SUMMARY
                           Registration Ballot

PROJECT NO:    JTC 1.22.30.02.03

SUBJECT:  Concurrent CD Registration and CD Approval for CD 14652 -
          Information technology - Specifications for Cultural Conventions
          
Reference Document No:  N2504           Ballot Document No:  N2504
Circulation Date:   07-22-1997          Closing Date:  11-07-1997 
                                                              
Circulated To: SC22 P, O, L             Circulated By: Secretariat


                  SUMMARY OF VOTING AND COMMENTS RECEIVED

                      Approve  Disapprove Abstain Comments   Not Voting
'P' Members

Australia               (X)      ( )       ( )       ( )       ( )
Austria                 ( )      ( )       (X)       (X)       ( )
Belgium                 (X)      ( )       ( )       ( )       ( )
Brazil                  ( )      ( )       ( )       ( )       (X)    
Canada                  (X)      ( )       ( )       ( )       ( )
China                   ( )      ( )       (X)       ( )       ( )
Czech Republic          (X)      ( )       ( )       ( )       ( )
Denmark                 (X)      ( )       ( )       ( )       ( )
Egypt                   ( )      ( )       ( )       ( )       (X)
Finland                 (X)      ( )       ( )       ( )       ( )
France                  (X)      ( )       ( )       ( )       ( )
Germany                 ( )      ( )       (X)       (X)       ( )
Ireland                 ( )      ( )       ( )       ( )       (X)
Japan                   ( )      (X)       ( )       (X)       ( )
Netherlands             ( )      (X)       ( )       (X)       ( )
Norway                  (X)      ( )       ( )       ( )       ( )
Romania                 ( )      ( )       ( )       ( )       (X)
Russian Federation      ( )      ( )       ( )       ( )       (X)
Slovenia                ( )      ( )       ( )       ( )       (X)
Sweden                  ( )      ( )       (X)       (X)       ( )
UK                      ( )      ( )       ( )       ( )       (X)
Ukraine                 (X)      ( )       ( )       ( )       ( )
USA                     ( )      (X)       ( )       (X)       ( )

'O' Members Voting

Korea Republic          (X)      ( )       ( )       ( )       ( )
Portugal                ( )      ( )       (X)       ( )       ( )
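
The 'P' member rows of the registration table above can be cross-checked
against the totals given in the overall summary. The following is a minimal
Python sketch of that check. The dictionary is transcribed by hand from the
table, and the names REGISTRATION_VOTES and tally are illustrative only; they
are not part of any SC22 procedure.

```python
from collections import Counter

# 'P' member votes on CD registration, transcribed from the table above.
REGISTRATION_VOTES = {
    "Australia": "approve", "Austria": "abstain", "Belgium": "approve",
    "Brazil": "not voting", "Canada": "approve", "China": "abstain",
    "Czech Republic": "approve", "Denmark": "approve", "Egypt": "not voting",
    "Finland": "approve", "France": "approve", "Germany": "abstain",
    "Ireland": "not voting", "Japan": "disapprove",
    "Netherlands": "disapprove", "Norway": "approve",
    "Romania": "not voting", "Russian Federation": "not voting",
    "Slovenia": "not voting", "Sweden": "abstain", "UK": "not voting",
    "Ukraine": "approve", "USA": "disapprove",
}


def tally(votes):
    """Count how many members cast each kind of vote."""
    return Counter(votes.values())


if __name__ == "__main__":
    for kind, count in sorted(tally(REGISTRATION_VOTES).items()):
        print(f"{kind}: {count}")
```

The counts agree with the overall summary: 9 in favour, 3 against, 4
abstaining and 7 not voting, for 23 'P' members in total.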

__________ end of registration summary; beginning of approval summary __

                 ISO/IEC JTC1/SC22  LETTER BALLOT SUMMARY
                             Approval Ballot

PROJECT NO:    JTC 1.22.30.02.03

SUBJECT:  Concurrent CD Registration and CD Approval for CD 14652 -
          Information technology - Specifications for Cultural Conventions
          
Reference Document No:  N2504           Ballot Document No:  N2504
Circulation Date:   07-22-1997          Closing Date:  11-07-1997 
                                                              
Circulated To: SC22 P, O, L             Circulated By: Secretariat


                  SUMMARY OF VOTING AND COMMENTS RECEIVED

                      Approve  Disapprove Abstain Comments   Not Voting
'P' Members

Australia               (X)      ( )       ( )       ( )       ( )
Austria                 ( )      ( )       (X)       (X)       ( )
Belgium                 (X)      ( )       ( )       ( )       ( )
Brazil                  ( )      ( )       ( )       ( )       (X)    
Canada                  (X)      ( )       ( )       (X)       ( )
China                   ( )      ( )       (X)       ( )       ( )
Czech Republic          (X)      ( )       ( )       ( )       ( )
Denmark                 (X)      ( )       ( )       (X)       ( )
Egypt                   ( )      ( )       ( )       ( )       (X)
Finland                 (X)      ( )       ( )       ( )       ( )
France                  (X)      ( )       ( )       ( )       ( )
Germany                 ( )      ( )       (X)       (X)       ( )
Ireland                 ( )      ( )       ( )       ( )       (X)
Japan                   ( )      (X)       ( )       (X)       ( )
Netherlands             ( )      (X)       ( )       (X)       ( )
Norway                  (X)      ( )       ( )       ( )       ( )
Romania                 ( )      ( )       ( )       ( )       (X)
Russian Federation      ( )      ( )       ( )       ( )       (X)
Slovenia                ( )      ( )       ( )       ( )       (X)
Sweden                  ( )      ( )       (X)       (X)       ( )
UK                      ( )      ( )       ( )       ( )       (X)
Ukraine                 (X)      ( )       ( )       ( )       ( )
USA                     ( )      (X)       ( )       (X)       ( )

'O' Members Voting

Korea Republic          (X)      ( )       ( )       ( )       ( )
Portugal                ( )      ( )       (X)       ( )       ( )

________ end of approval summary ______________________________________
___________ beginning of comments accompanying Canada affirmative vote__

Canadian Comments on ISO/IEC 14652 WD:

1. There is no rationale as to why this standard is required.
   There is rationale in Annex B for the FDCC-set and for the
   various LC_* categories but none for this standard. It would
   be very helpful to add such rationale to understand why this
   standard is necessary and what problems it solves.

Specific Comments:

2. section 3.1.7 - definition of charmap needs to be changed to
   ..." a definition of a mapping between symbolic character names
   and the encoding for a coded character set"

    - The definition for FDCC states that the term replaces the POSIX
      term 'locale', as the new entity is a superset of the locale
      as it is currently used. One may debate the point, but as a
      superset it fails to deal with basic issues of multiple
      concurrent support of differing formats (have 2 or more local
      currency formats as needed in Europe) and calendaring other than
      Gregorian. I would have expected more support from a new standard.

3. section 4 - FDCC-set. The paragraph beginning "..Other category
   names...". This is an unnecessary restriction and one that will
   cause problems with existing implementations. POSIX had no such
   restriction and as a result we have implementations that have
   introduced categories such as LC_TIMEZONE or LC_TOD etc.

   We could say that the six categories are mandatory in the FDCC-set.

   - The proposal states "In the event that some of the information
   for an FDCC-set category, as specified in this standard, is missing
   from the FDCC-set source definition, the behavior of that CATEGORY,
   if it is referenced, is unspecified." This is too restrictive, in
   that the complete category is 'wasted'. Perhaps a word of
   clarification is required. Does the proposed standard really want
   to have the complete category ignored? If so, there is a requirement
   on the object creation mechanism to issue a failure message during
   the compilation step of the 'FDCC-set object'.

4. section 4.1.0.5 - the third paragraph ("The items (2), ....") should
   be moved before item (2).

5. section 4.1.1. The portable character set should be mentioned because
   the next sub-section assumes some defaults that are characters in
   this set (which is also always a part of any charmap?).

6. section 4.1.1.1.
   - are these the only keywords allowed? Other
   keywords are allowed in POSIX. Statement should indicate that this
   list does not preclude others but that these are the minimum that are
   supported.

   - under "upper" etc. it states that "..if this keyword is not
     specified, the uppercase letters A through Z, shall automatically
     belong to this class...". This is fine when the keyword is absent.
     But in the opposite case this means that one can specify a whole
     range of characters under "upper" and exclude the A to Z set.
     This is not what you want. A statement should be made here that
     indicates that either:

      - one must include the portable character set when specifying
        characters in "upper", or
      - that these are automatically included if one does not include
        them in the specification for "upper"

     Of course, this applies to other keywords in this section as well.

   - under "graph", change "printable" to "graphical"

   - the keyword 'digit' ONLY allows the use of digits 0 through 9, but
     does not state whether they can be values in any language.

   - table 1:

         - the intersections of (upper, upper), (lower, lower), etc.,
           should be indicated as N/A (not applicable).

         - upper (row) should not be permitted in lower (column) and vice
           versa
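
   The second alternative described under "upper" in item 6 above
   (automatic inclusion of A through Z) can be pictured with the following
   minimal Python sketch. It is an illustration under stated assumptions,
   not proposed text for the CD: the function name effective_upper is
   hypothetical, and the portable uppercase letters are taken to be the
   26 letters A to Z.

```python
import string

# Assumption: the portable uppercase letters are exactly A through Z.
PORTABLE_UPPER = frozenset(string.ascii_uppercase)


def effective_upper(specified=None):
    """Return the members of the "upper" class.

    If the keyword is absent (specified is None), only A-Z belong to the
    class. If it is present, A-Z are added automatically to whatever the
    FDCC-set source lists, so they can never be excluded.
    """
    if specified is None:
        return set(PORTABLE_UPPER)
    return set(PORTABLE_UPPER) | set(specified)
```

   Under this rule a source that lists only, say, three Danish letters
   under "upper" still yields a class containing A through Z.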

7. section 4.1.1.2:

   - No information is provided on the API that may be able to use such
     information. The term transformation is used as a synonym for
     transliteration. Transliteration should be used and the term
     transform should be avoided, to prevent confusion with other
     functions performing string transforms (UTF-n, layout
     transforms, ...).

   - add new sub-section numbers (4.1.1.2.x) for:

         - transform_start keyword
         - transform_end keyword
         - include keyword
         - default_missing keyword

   - suggest a sub-section for the example. Also, the text at the end
     of the example should be preceded with "...in the example above.."
     or words to that effect.

8. Add section 4.1.1.3 for the "i18n" LC_CTYPE.

9. The LC_CTYPE that is shown should be a model; it does not follow
   the order of the keywords shown in 4.1.1.1. - it should.

  -Also, if one looks under "toupper" in 4.1.1.1, it states that
   "...only characters specified for the keywords lower and upper
    shall be specified". In this definition of LC_CTYPE, "toupper" is
   defined. Unfortunately, the keywords "lower" and "upper" are
   NOT specified!!

  -The LC_CTYPE is incomplete because all the Uxxxx characters are not
   shown. Ideally, this LC_CTYPE should be complete. Failing that, the
   incompleteness should be addressed and acknowledged.

10.section 4.1.2:

  - item (8): needs to be reworded to be clear but at a minimum replace
    "from behind" with "backwards"

  - "...The following keywords ..": states that the keywords are
    described in detail later. This is not wholly true because the
    first two keywords are not detailed later.

  - coll_weight_max: stated that the minimum value is 7 and that this is
    also the default. This is not the case as per the example for
    LC_COLLATE. There this value is 4.

11.section 4.1.2.4: third paragraph, third sentence - "The first operand
   .... this <script_symbol>." Expand the sentence to end with
   "or another "order_start" keyword is encountered".

12.section 4.1.2.5: this really does not belong in the explanation
   of keywords and as such should really appear after 4.1.2.12.

   Further in the example, <LOW>  and <ss> need to be explained.

13.remove sub-section heading 4.1.2.11 because it is not needed.

14.section 4.1.2.13: assumption here is that there is wide demand for
   this function. Most folks do not deal with locale construction
   so the benefit of a 'shorthand' way of changing locale source will be
   lost for the masses. All of this capability does not provide dynamic
   run time overrides, only deals with the current static model of
   previously defined source files. It also presumes the use of rather
   large all encompassing locales.

15.sub-section 4.1.2.13.6: this example shows two LC_COLLATE statements.
   Is this correct? Which one takes precedence?

16.section 4.1.3: the i18n LC_MONETARY category shown:

     - why is the "mon_decimal_point" shown as <U002C>? This is not
       culturally neutral, nor is it the only internationally accepted
       value.

     - the "-1" value for int_frac_digits through to n_sign_posn is
       incorrect in that the value "-1" is not described in the keyword
       text that precedes this definition.

     - why are there no entries for int_p_cs_precedes etc. when these
       are identified as keywords? In the description of the keywords,
       there is no indication as to what will happen when these keywords
       are omitted.

     - how are occurrences of multiple currencies, such as EURO and the
       local country currency, proposed to be handled?

17.section 4.1.4: the i18n LC_NUMERIC category shown:

     - why is the "mon_decimal_point" shown as <U002C>? This is not
       culturally neutral, nor is it the only internationally accepted
       value.

18.section 4.1.5: this is the first time that the word "mandatory" has
   been used for keywords. Does this mean that all other keywords in
   the other categories are optional?

     - abmon and mon keywords: the current restriction of twelve months
       is not correct; it does not allow 13 Hebrew months to be shown.

     - what is the effect when the "am_pm" keyword is an empty string?
       In general, what is the effect of empty keywords?

     - optional keyword support for 'era' and alternate digits is
       rather short sighted in that these are 'mandatory' for
       far-east support.

19.section 4.1.5.1: table 2:

     - %m - change from (01-12) to (01-13)

     - if the timezone information is application defined, per note
       at the end of the table, then %Z should really be removed.
       The better suggestion is that the category be expanded to
       handle timezone and not leave it up to the application.

20.section 4.1.5.2:

    - %Of should be changed as per the new %f in section 4.1.5.1

21.section 6: repertoiremap is incomplete (misses, for example, the
   section between U06AF and U1e00).

___________ end of Canada Comments ___________________________________
__ beginning of Denmark Comments Accompanying Affirmative Vote ______

Hereby the Danish Standards vote on SC22 N2504 - CD 14652

1. The vote for CD registration is "Yes"

2. The vote on the CD ballot is "Yes" with comments.

2.1 There is a need for support of an alternate currency, such
as the EURO. We propose keywords such as the current for international
currency and domestic currency, but with a "2" added to
each of the keywords, and a "currency_rate" keyword for the fixed
currency rate.

2.2 There is a need for equivalencing of weights in the LC_COLLATE 
specification, eg by a "weight_equivalence" keyword.
This is to accommodate different weight naming schemes.

2.3 Some extra keywords in the LC_MESSAGES category should
be added, such as "yesstr" "nostr" and "cancelstr"

2.4 Support for ISO 2022 for extended charmap specifications
is needed.

_______________ end of Denmark Comments ____________________________
_____ beginning of Japan comments accompanying negative vote _______

Japan disapproves document SC22 N2504 (CD 14652) to be registered as
Committee Draft (CD).

Comments
1.  The scope of this project is to specify a specification method for
cultural conventions that goes beyond what POSIX supports.  The draft CD
covers nothing more than POSIX.  Therefore, document N2504 does not
satisfy the project objective.  At the project subdivision ballot, Japan
asked how this differs from the POSIX locale definition method.  The
disposition of that comment is described in SC22 WG20 N269 (Disposition of
comments received on WG20 proposal to subdivide project JTC1.22.30.01.01
to include a project on: Cultural convention specification SC22 N1574).
The disposition of comments commits this project to covering more than
current POSIX does.

If there is no intention to add any more cultural conventions (more than
POSIX) at the first publication, this project should be canceled.

2.  SC22 WG20 N269 lists the candidates for the "extended cultural
conventions" as follows: Data input, Multi-lingual synchronization,
Measuring system, Paper size and Postal address.  Out of these
candidates, Japan recommends adding at least Paper size, Measuring
system and Postal address.  In addition, Japan recommends considering
the addition of "colour specification including colour systems and names
of colours" and "name of person".

3.  When the specification method covers more cultural conventions than
POSIX does, keeping FDCC-sets compatible with POSIX requires an FDCC-set
specification method (a method to specify which FDCCs are included in a
given FDCC-set).  Add a clause on "specification method of FDCC-set".

4.  It is anticipated that more cultural conventions will be added to this
standard in future.  A guideline is needed for specifying new cultural
convention specification methods.  Add a clause for "the guideline".

5.  There are many technical and editorial comments on document N2504.
Those comments are part of the CD ballot.

6.  Confirm whether the definitions of FDCC and FDCC-set are compatible
with TR 11017.  It is very likely that they differ.  If they do, align the
terminology with TR 11017.   (This may resolve most of the above
comments)
------end of registration ballot comments

-----CD BALLOT COMMENTS (Japan)-----

Japan disapproves the document SC22 N2504 (CD 14652) as Committee Draft
(CD) with following comments:


J-1)  General:
The CD text is only a minor enhancement of a POSIX locale specification
method and does not include any new categories which are declared to be
investigated in SC22 WG20 N 269 -- disposition of the comments to NWI
ballots.

This project should be abandoned if it includes no new categories beyond
those in a POSIX locale specification method.  The extension of the
collation method should be moved to ISO/IEC 14651 in that case.

note: this is the same comment as the CD registration ballot.  See the
registration ballot comment for detail.

J-2)   p.2, FOREWORD:

The paragraph
        The Standard uses text from ISO/IEC 9945-2:1993 "Information
        Technology - Portable Operating System Interface (POSIX) Part 2:
        Shell and Utilities". The major differences from this text is
        listed in annex A.
should be removed.


J-3)   p.4, 1.Scope

The sentence
        The specification is compatible with POSIX locale specifications
        (10), and a locale conformant to POSIX specifications will also be
        conformant to the specifications in this Standard, while the
        reverse condition will not hold.
should be changed to
        The specification is upward compatible with POSIX locale
        specifications(10) -- a locale conformant to POSIX specifications
        will also be conformant to the specifications in this Standard,
        while the reverse condition will not hold.

J-4)    p.4, 2. Normative references:

The following references
        (1) ISO 639  Code for the representation of names of languages
        (2) ISO 646  Information technology - ISO 7-bit coded character
                     set for information interchange
        (3) ISO/IEC 2022  Information technology - Character code
                     structure and extension techniques
        (4) ISO 3166 Code for the representation of names of countries
        (7) ISO/IEC 8824 Information technology - Open Systems
                     Interconnection - Specification of Abstract Syntax
                     Notation One (ASN.1)
        (8) ISO/IEC 8825 Information technology - Open System
                     Interconnection - Specification of Basic Encoding
                     Rules for Abstract Syntax Notation One (ASN.1)
        (9) ISO/IEC 9899 Information technology - Programming Language C.
should be removed because those standards are not referenced or referenced
only in informative part (ISO 646).


J-5)   p.6, 3.1.12 collation:

The text
        These rules identify a collation sequence between the collating
        elements, and such additional rules that can be used to order
        strings consisting of multiple collating elements.
should be removed because it is too detailed as a definition and it is
vague -- there is no explanation of which rules are additional.


J-6)   p.6, 3.1.17 affirmative responses:

The definition should be removed because the term is understandable  
without definition.

If it remains, the definition should be changed from:
        An input string that matches one of the responses acceptable to  
        the LC_MESSAGES category keyword "yesexpr", matching an extended
        regular expression in the current FDCC-set.
to:
        A string conforming to the definition of LC_MESSAGES category
        keyword "yesexpr".


J-7)   p.6, 3.1.18 negative response:
(the same comment as 3.1.17 affirmative)


J-8)   p.7, 3.2.1 Format of syntax descriptions:

The text
        The format of each parameter is given by an escape sequence as
        follows:

        %s      specifies a string
        %d      specifies an decimal integer
        %c      specifies a character
        %o      specifies an octal integer
        %x      specifies a hexadecimal integer
        %%      specifies a single %
        \n       specifies an end-of-line

        All other characters in the format string represent themselves.
should be changed to
        The format of each parameter is given by an escape sequence as
        follows:

        %s      specifies a string
        %d      specifies an decimal integer
        %c      specifies a character
        %o      specifies an octal integer
        %x      specifies a hexadecimal integer

        All other characters in the format string except

        %%      specifies a single %
        \n       specifies an end-of-line

        represent themselves.
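
As an illustration only: Python's printf-style % operator happens to read
these escapes the same way, so it can be used to see what a syntax
description denotes. The choice of Python, the simplified two-operand
"order_start" description and the sample arguments are assumptions made
for this sketch; they are not part of the CD.

```python
# Illustrative only: the CD uses these escapes to describe the textual form
# of FDCC-set source lines; it does not define a programming interface.

# A simplified syntax description with two string parameters.
description = "order_start %s;%s\n"
line = description % ("forward", "backward")

# One parameter of each kind: string, decimal, character, octal, hexadecimal,
# followed by a literal percent sign and an end-of-line.
sample = "%s %d %c %o %x %%\n" % ("abc", 10, "z", 8, 255)

if __name__ == "__main__":
    print(repr(line))
    print(repr(sample))
```

Here "%%" and "\n" are the only sequences in the format string that do not
stand for a parameter or for themselves, which is the distinction the
proposed rewording makes explicit.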


J-9)    p.7, 3.2.3 Ellipses:
The definitions here are not consistent with their expression in 5.1
Character set description file (pp.45-46). The text here should be changed
to match POSIX and the explanation in 5.2 should be removed.


J-10)   p.8, 4. FDCC-set:
In the sentence
        This standard defines a normative FDCC-set named "i18n" with
        values for each of the above categories.
the word "normative" is redundant.  It should be removed.


J-11)   p.9, 4.1 FDCC-set Definition, para."The category body ...":

The restriction
        Each keyword within a FDCC-set shall have a unique name (i.e.,
        two categories cannot have a commonly-named keyword);
should be removed because it places a heavy burden on the design of each
category -- even in this draft, the keyword "copy" is defined in more
than two categories.


J-12)   p.9, 4.1 FDCC-set Definition:

The subclauses 4.1.0.1 - 4.1.0.5 are ill-structured because they have no
direct parent subclause 4.1.0.

The content of 4.1.0.5 should be moved before 4.1.0.1 without being put
into a subclause, and a new subclause title "4.1.0 Pre-category lines"
should be introduced before 4.1.0.1.


J-13)   p.9-10, 4.1.0.3 repertoiremap:

Make clear how many repertoiremap specifications are allowed in a FDCC-set.


J-14)   p.10, 4.1.0.4 charmap:

The sentence
        For the actual use of a FDCC-set, at most one charmap may be in
        use, and this may be different from any charmap specified with the
        "charmap" line.
needs more explanation.


J-15)   p.11, 4.1.0.5 Character representation:

Add a new rule for UCS notation, <Uxxxx> and <UXXXXXXXX>,
which looks like a symbolic name but is not defined in a charmap file.


J-16)   p.10, 4.1.0.5 Character representation:

The text
        Individual characters, characters in strings, and collating
        elements shall be represented using symbolic names, as defined
        below. In addition, characters can be represented using the
        characters themselves, or as octal, hexadecimal, or decimal
        constants. When nonsymbolic notation is used, the resultant
        FDCC-set definitions need not be portable between systems.  The
        left angle bracket (<) is a reserved symbol, denoting the start of
        a symbolic name; when used to represent itself it shall be
        preceded by the escape character. The following rules apply to
        character representation:

        (1) ...
is confusing. It should be changed to
        Individual characters, characters in strings, and collating
        elements shall be represented using symbolic names, UCS notation
        or characters themselves, or as octal, hexadecimal, or decimal
        constants as defined below.   When constant notation is used, the
        resultant FDCC-set definitions need not be portable between
        systems.

        (0)
        The left angle bracket (<) is a reserved symbol, denoting the  
        start of a symbolic name; when used to represent itself it shall
        be preceded by the escape character.

        (1) ...


J-17)   p.11, 4.1.0.5 Character representation, (1):

The sentence
        The symbolic name, including the angle brackets, shall exactly
        match a symbolic name defined in a charmap file to be used, and
        shall be replaced by a character value determined from the value
        associated with the symbolic name in the charmap file.
should be changed to
        The symbolic name, including the angle brackets, shall exactly
        match a symbolic name defined in charmap files or repertoire map
        files to be used, and shall be replaced by a character value
        determined from the value associated with the symbolic name in the
        charmap file or a value associated to UCS in repertoire map
        files.


J-18)   p.11, 4.1.0.5 Character representation, (3)-(5):

It is confusing to include concatenated constants in each of the examples without
any definition.  The concatenated constants should be removed from the
examples of (3)-(5) and the explanation for concatenated constants should
be formed as a new rule as follows:

        (6) Multibyte characters can be represented by concatenated
            constants specified in byte order with the last constant
            specifying the least significant byte of the character.
            Concatenated constants can include a mix of the above 
            character representations.
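
The proposed rule (6) can be illustrated with a minimal Python sketch.
It assumes the POSIX constant forms with a backslash as the escape
character: "\d" followed by two or three decimal digits, "\x" followed by
two hexadecimal digits, and a backslash followed by two or three octal
digits. The function name parse_concatenated_constants is hypothetical
and the sample values are chosen only for illustration.

```python
import re

# Assumed constant forms (POSIX style, escape character "\"):
#   \dNNN  decimal, \xHH  hexadecimal, \OOO  octal
_CONSTANT = re.compile(r"\\d(\d{2,3})|\\x([0-9A-Fa-f]{2})|\\([0-7]{2,3})")


def parse_concatenated_constants(text):
    """Turn concatenated constants into the bytes of one character.

    Constants are read left to right, so the last constant supplies the
    least significant byte. Decimal, hexadecimal and octal forms may be
    mixed freely.
    """
    out = bytearray()
    pos = 0
    while pos < len(text):
        match = _CONSTANT.match(text, pos)
        if match is None:
            raise ValueError(f"not a constant at offset {pos}: {text[pos:]!r}")
        dec, hexa, octal = match.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        else:
            value = int(octal, 8)
        if value > 0xFF:
            raise ValueError(f"constant does not fit in one byte: {value}")
        out.append(value)
        pos = match.end()
    return bytes(out)
```

For example, the mixed form \d142\xa1 and the all-octal form \216\241
both denote the same two-byte value, with 0xA1 as the least significant
byte.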


J-19)   p.11, 4.1.0.5 Character representation, end:

The "Editor's note" here makes no sense. It should be removed.


J-20)    p.12, 4.1.1.1 Basic keywords

The specification of digit
      digit   Define the characters to be classified as numeric digits.
      Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 shall be specified,
      and in ascending sequence by numerical value.  If this keyword is
      not specified, the digits 0 through 9, shall automatically belong
      to this class, with application-defined character value.
is ambiguous.   Make clear the relation between the digits 0-9 and
<zero>..<nine> in table 3, the portable character set.


J-21)    p.14-16, 4.1.1.2 Character string transformation:

The concept of character string transformation is not mature yet.  It is of
no use to specify only one transformation without any specific meaning.
It should be removed.


J-22)    p.16, 4.1.1.2 Character string transformation:

The "i18n" FDCC-set is not a matter of 4.1.1.2 Character string
transformation. A new subclause title "4.1.1.3 i18n-LC_CTYPE" should be
added just before the line beginning with "The "i18n" FDCC-set for the
LC_CTYPE ...".


J-23)    p.16, "i18n" FDCC-set

IPA characters should be removed from "toupper" and "alpha"
Presentation form characters should be removed from "alpha"

J-24)    p.23, 4.1.2 LC_COLLATE:

The capabilities text
        (9)     Easy reordering of characters. The "i18n" FDCC-set has a
        collation specification that with just a few modifications can be
        culturally correct for a specific culture.  Here the
        "reorder-after" keyword gives a convenient way to modify a
        FDCC-set.

        (10)    Easy reordering of scripts. The "i18n" FDCC-set gives an
        ordering of the scripts that may not be culturally acceptable in
        certain cultures. The keyword "reorder-script-after" gives a
        convenient way to modify the order of scripts in a FDCC-set.
should be changed to
        (9)     Easy reordering of characters. ISO/IEC 14651 has a
        template for collation specification that with just a few
        modifications can be culturally correct for a specific culture.
        Here the "reorder-after" keyword gives a convenient way to modify
        a FDCC-set template.

        (10)    Easy reordering of scripts. The template in ISO/IEC 14651
        gives an ordering of the scripts that may not be culturally
        acceptable in certain cultures.  The keyword
        "reorder-script-after" gives a convenient way to modify the order
        of scripts in a FDCC-set template.


J-25)   p24, 4.1.2 LC_COLLATE:

Add summaries of toggling keywords -- "define", "ifdef" etc. -- just  
before 4.2.2.1.


J-26)   p.25, 4.1.2.4 "order_start" keyword:

The text here becomes very confusing by mixing <script-symbols> and
<sort-rules>.  Text should be changed based on two syntax forms

        "order_start %s;%s;...;%s\n", <sort-rules>, <sort-rules> ...
and
        "order_start %s;%s;...;%s\n", <script-symbols>, <sort-rules>,
                <sort-rules> ...


J-27)   p.25, 4.1.2.4, "order_start" keywords, directives "forward" and
"backward":

Give a definition of "substring" and add a sentence
      The direction of scanning substrings is towards the logical end of
      the string.
to the explanation of the directives "forward" and "backward".

note: related discussion in the past.
> > Accepted in principle. The "forward" directive has wordings of
> > scanning towards the logical end, while the "backward" directive
> > scans towards the beginning of the string. Or was something else
> > meant, such as scanning for collating elements?
>
> Let's consider an example string "ABC123GHI" where A..Z are "backward"
> and 1..9 are "forward".  In this case, the three substrings "ABC", "123",
> and "GHI" produce the keys "CBA", "123", and "IHG" respectively under the
> current specification, but there is no explanation of how to combine
> those subkeys.
>
> The sentence
>       The direction of scanning substrings is towards the logical
>       end of the string.
> assures that those subkeys are combined in "forward" manner resulting in
> "CBA123IHG".

There is no prescribed resulting string as per the specifications,
this is an implementation detail.
------end of related discussion----
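
The "ABC123GHI" example in the discussion above can be sketched in Python.
This is an illustration only, assuming (as in the example) that letters
are "backward" and digits are "forward", and that subkeys are joined in
the order in which their substrings occur in the string.  The function
names are hypothetical and are not defined by CD 14652.

```python
from itertools import groupby


def direction(ch):
    # Assumption taken from the example: letters scan "backward",
    # everything else (here, the digits) scans "forward".
    return "backward" if ch.isalpha() else "forward"


def combined_key(text):
    # Split the string into maximal runs that share one direction,
    # reverse the "backward" runs, then join the subkeys from the
    # logical start of the string to its logical end.
    subkeys = []
    for run_direction, run in groupby(text, key=direction):
        substring = "".join(run)
        if run_direction == "backward":
            substring = substring[::-1]
        subkeys.append(substring)
    return "".join(subkeys)


print(combined_key("ABC123GHI"))
```

Joining the subkeys in the opposite order would give "IHG123CBA" instead,
which is why the comment asks for the combining direction to be stated.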


J-28)   p.32, 4.1.2.13.3 "elif" keyword:

The definition is incomplete -- the effect of preceding block is not
considered.


J-29)   p.33, 4.1.2.14 "i18n" LC_COLLATE category:

The sentence
        The "i18n" FDCC-set LC_COLLATE category is defined in ISO/IEC 14651
(12). 
should be changed to
        There is no "i18n" FDCC-set LC_COLLATE category. Instead of the
default ordering, the common template for tailoring defined in ISO/IEC
14651 (12) should be used.

note: related discussion in the past.
> > Rejected. The "i18n" FDCC-set will be using the data of IS 14651.
> > When referring to the "i18n" FDCC-set sorting needs to be defined.
>
> At the Quebec meeting, we agreed not to define a "default" ordering
> in IS 14651.

Yes, but that does not influence 14652 in the way you indicate.
One can always modify the i18n fdcc-set for a given culture.
------end of related discussion-----


J-30)   p.36-37, 4.1.3 LC_MONETARY and 4.1.4 LC_NUMERIC:

The default values for "mon_decimal_point" and "decimal_point" should be
changed from <U002C> ( = ',') to <U002E> ( = '.') according to the
standard ISO 6093:1985, Information processing - Representation of
numerical values in character strings for information interchange, which
should be added to 3. Normative references.


J-31)   p.37,  4.1.5  LC_TIME

The following old comments are still open:
- Need a method for specifying date and time conventions for lunar
calendars, in which the year is not 365 days.
- Need a method for specifying date and time conventions in which the week
is not seven days (historical Buddhist calendar).


J-31)   p.40, 4.1.5.2 Modified Field Descriptors:

The value
        d_t_fmt "<%><a><SP><%><F><SP><%><T>" -- 2 1997-10-07 10:00:01
should be changed to
        d_t_fmt "<%><F><(><%><a><)><SP><%><T>" -- 1997-10-07(2) 10:00:01


J-32)   p.41, 5. CHARMAP:

The sentence
        Conforming charmaps shall support the portable character set
        specified in Table 3
is ambiguous.  It should be changed to
        Each charmap shall support the portable character set specified
        in Table 3
or
        A set of charmaps for a FDCC-set shall support the portable
        character set specified in Table 3


J-33)   p.39, Table 2:

The earlier discussion reached the following agreement:
> > > 2. The LC_TIME %f format should return "1" for the first day of the
> > >    week, etc, and "7" for the 7th day of the week. Returning a string
> > >    with a "0" for the first day of the week is misleading, and this is
> > >    not used for indexing in arrays, but for display in strings.
> >
> > Accepted.
>
> The newly introduced escape sequence
>
>       %f      Weekday as a decimal number (0(Monday)-6).
>
> is modified to
>
>       %f      Weekday as a decimal number (the first day 1 - the last
>               day 7).

However, this document has:

>       %f      Weekday as a decimal number (1(Monday) - 7)

Is this agreeable with POSIX?
-------end of the comment on the past discussion-------
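
The three wordings of %f quoted above give different results as soon as
the first day of the week is not Monday.  The Python sketch below is an
illustration only; the function names and the first_weekday parameter are
assumptions made for the example and are not defined by CD 14652 or POSIX.

```python
import datetime


def weekday_zero_based(day):
    # Original draft wording: 0 (Monday) .. 6.
    return day.weekday()


def weekday_monday_one(day):
    # Wording in the balloted document: 1 (Monday) .. 7.
    return day.weekday() + 1


def weekday_from_first_day(day, first_weekday=0):
    # Agreed wording: 1 for the first day of the week .. 7 for the last.
    # first_weekday is a hypothetical parameter giving the culture's
    # first day of the week, counted as Monday=0 .. Sunday=6.
    return (day.weekday() - first_weekday) % 7 + 1


day = datetime.date(1997, 10, 7)  # a Tuesday
print(weekday_zero_based(day))
print(weekday_monday_one(day))
print(weekday_from_first_day(day, first_weekday=6))  # week starting Sunday
```

With Monday as the first day the last two functions agree; they differ
only for cultures whose week starts on another day, which is the case the
agreed wording was meant to cover.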


J-34)  p.46   5.1 Character set description File, para. "The encoding
part...":

If this paragraph remains (we have already requested removal of the
explanation of character representation in 5.1, including this paragraph),
the sentence
          In a portable charmap file, each constant shall represent an 8
          bit byte
should be removed because the concept of "a portable character set file"
is not defined in this draft; only a portable character set is defined.

J-35)  p48-75,  "i18nrep" repertoire

It is not necessary to define such a confusing symbol name set in an
international standard.


J-36)  Conclude the open discussion below.
--------Start of open discussion B-------

>> 4. What does "byte" mean in this standard? Since this standard
>>    does not require any "processor", an individually addressable
>>    unit of storage means nothing.
>
>The standard is meant for processing in IT environments,
>so there is always a processor behind it somewhere.

Since the document you are writing is a "Standard", please do not hide
anything behind the preparation of the standard text.  If a "processor"
exists, you should describe the processor and the conformance of a
"conforming implementation of the processor" in the conformance section.
Also, you need to change the scope and title of the standard.  It is not a
"specification method", but a "language".

Also, please do not specify as a requirement something that is merely
helpful.  You should specify what the mandatory requirements for
conformance are, and what the allowable extensions of the standard are.
Please use "may" for allowable extensions.

I suppose, the conformance clause of your standard could be POSIX or C like
things, if you would like to specify syntax of "cultural convention
language and its compiler".
Please note that I do not say "I agree with the scope change", but just say
please write the standard text appropriately for the scope. Otherwise,
reviewers may be confused.

> I am not sure we need to describe a processor for this,
> but we can discuss it at the next WG20 meeting.

I believe that you are an expert in the C language, right?
Please carefully read the conformance section of the C standard.
The C standard specifies 2 different kinds of conformance. One is a
conforming "processor", and the other is a conforming "application".  As
you may well know, an "implementation" of the C language standard is a
language "processor" for C, I mean a compiler. The C language standard
specifies how a conforming processor shall behave. In addition to that,
the C language standard specifies how a conforming C application shall be
written. That is application conformance.

My point is: what is the purpose of your standard? If you would like to
specify how a cultural convention set shall be specified, I mean
application conformance, then the thing that conforms to your standard is
just a description of cultural conventions, maybe written on paper.
Then you do not need to care about a "processor". But if you would like to
specify limitations and/or parameters for a system that "processes"
cultural convention set description files, then the preparation of your
standard will look like a language standard, and you should specify both
implementation conformance and application conformance.

O.K., you should be familiar with POSIX. The main portion of your standard
comes from the POSIX.2 localedef utility. POSIX also specifies
implementation conformance for the localedef utility itself, and
application conformance for localedef files. The limitation on maximum
bytes is for the localedef utility.

> > Also, please do not specify something that is helpful for something
> > as requirement. You should specify what are mandatory requirements
> > for conformance, and what are allowable extensions of the standard.
> > Please use "may" for what are allowable extensions.
>
> The thing in question was inherited from POSIX.
> We need at least to maintain it for POSIX compatibility.

Do not be afraid. You can simply say that an implementation may extend its
syntax and specify additional things. Then a POSIX-conforming localedef
file becomes conforming to your standard.

Please note that the objective of POSIX compatibility is to make
POSIX-conforming localedef files conform to your standard, not to make
files that conform to your standard into POSIX-conforming localedef files.

> I am not great expert in writing conformance clauses, but I
> hope I am learning. I at this time do not see the big difference
> between a programming language and a specification method.
> The specification method is to be interpreted by some IT system, just
> like a programming language.

Keld, please, please do not say "I'm not expert in writing some part of
standard". If you say so, I need to say you should not be project editor.
A project editor needs to have enough capability in writing standard text,
even if he is not an expert in the subject technology area.

It is a credibility problem for our WG20. Thus, I need to move from open
discussion on the sc22wg20 mailing list to personal mail.

Anyway, you should try. Without having your new draft, we can not discuss
it in the next WG20 meeting. Then we can not send new text to SC22 for CD
ballot.

In order to send new text to SC22 immediately after the next WG20 meeting,
all of the issues should be resolved in the next WG20 meeting, and a
revised version needs to be prepared in the meeting.

-------end of open discussion B-------



 -- Minor editorial --

J-36)   p.9, 4.1.0.2 escape_char:

The sentence
        All examples this standard uses "/" as the escape character,
        except where otherwise noted.
should be changed to
        All examples in this standard uses "/" as the escape character,
        except where otherwise noted.


J-37)   p.11, 4.1.1 LC_CTYPE:
"in clause 3.2.5" should be changed to "subclause 3.2.3".


J-38)   p.13, Table 1:
The line "In    Can also belong to" should be removed.


J-39)   p24, 4.2.2.1 "script" keyword:
"4.2.2.1" should be changed to "4.1.2.1".


J-40)   p24, 4.2.2.3 "collating-symbol" keyword:
"4.2.2.3" should be changed to "4.1.2.3"

_______________ end of Japan comments ___________________________________
______ beginning of Netherlands comments accompanying negative vote ___

The NNI votes NO on CD 14652 in SC22N2504.
These no votes pertain to both the registration vote and the document vote.
The NNI will vote yes on the CD registration when the comments under -1-
and -2- have been properly resolved.
The NNI will vote yes on the CD when the comments under -1-, -2- and 
-3- have been properly resolved.

The NNI has the following comments:

-1- Market relevance

The way this specification has been phrased effectively limits the 
use of this specification to POSIX/Unix and C platforms.
This market is rather small; much larger markets and existing
notations for a Cultural Conventions Specification seem to have 
been ignored.
Even in this small market this document addresses only a minor part 
of 9945-2 and is understood to provide a small improvement to 
that specification.
It is unclear whether the POSIX market will accept such slight 
improvements to this 9945-2 standard in a separate document.

The NNI is of the opinion that WG20 has, and should have, a much 
broader scope than the POSIX platform and considers such a 
specification of limited applicability unacceptable.

The NNI suggests the following course of actions:
(a) WG20 is requested to develop a Platform and Language Independent
    Specification (PLIS) for a Cultural Conventions Specification (CCS-PLIS).
    This CCS-PLIS describes the functionality needed for a CCS without 
    any reference to windows, files, programming languages and 
    other implementation issues.
(b) WG20 is requested to provide implementations of this CCS-PLIS for 
    major platforms, amongst which the Wintel, the Macintosh, the POSIX 
    and mainframe platforms.
(c) The CCS-PLIS for POSIX is to be developed in cooperation with WG15.
It should be noted that the CCS-PLIS should be defined in such a way that
for each of the platforms mentioned above conformance clauses with respect
to the CCS-PLIS can be specified.

-2- Relation to Framework document

The relation between this document and the Cultural Dependent Items 
formulated in the Framework Document is unclear.
The Framework Document mentions that WG20 will deliver, amongst others,
specifications for the following cultural dependent items:
hyphenation of words, word representations of numbers, writing directions,
voice messages and postal addressing formatting.
The NNI had expected that the now presented 14652 document would contain 
such specifications. 
The NNI requests the following information from WG20:
- will these additional items be added to 14652 in the (near) future?
- if so, what will be the life expectancy of the current 14652 document?

-3- Technical comments

(a) The lexical and syntactic structure of the files has been specified 
    incompletely.
    The document cannot be understood without knowledge of 9945-2.
    The document mixes lexical/syntactical structure and 
    semantics of the specification.
    It is requested that a complete syntactical definition is given
    using EBNF (ISO 14977, or a variant thereof) and that a clear
    separation between lexical structure, syntactical structure and
    semantics will be maintained in the document.
(b) The definitions as given in section 3 are unclear, incomplete in 
    some cases, over-complete in other cases and self-contradictory in 
    a few cases.
    It is requested that this section is redeveloped, preferably
    in an axiomatic style.
    The document itself contains much terminology that has not been 
    defined in section 3.
(c) The document seems to mix up the concepts of `value' and `constant'.
(d) The numbering system used is highly inconsistent:
    There are two sections 4.1.2.13.5 and two sections 4.1.2.13.6
    After 4.1 follows 4.1.0.1
    WG20 is requested to debug their documents before presenting them
    to the NBs.
(e) The relationship between this document and CD 14651 is
    unclear: is it for instance possible that a system conforms
    to 14651 and not to 14652? The relationship needs to be
    explained.

__________________ end of Netherlands comments __________________________
_____ beginning of USA comments accompanying negative vote ____________

The US National Body votes to Disapprove the CD Registration and the CD
Ballot for ISO/IEC CD 14652.  See comments below:

General Comments

Re 4.1.1 LC_CTYPE

While the presence of the LC_CTYPE specification in CD 14652 is
understandable, given the fact that CD 14652 is derivative from ISO/IEC
9945-2 (itself derivative from the XPG-4 specification of locale), it is
inappropriate to extend the LC_CTYPE mechanism for dealing with
character properties to cover the repertoire of ISO/IEC 10646.

ISO/IEC 10646 specifies the *Universal Character Set*, and in the
context of the Universal Character Set, character properties of the type
that LC_CTYPE is concerned with are best treated as inherent to the
characters. It would be correct to enumerate these properties in a
standard (perhaps even in 14652, if not 10646 itself), but it is incorrect
to imply, through the general FDCC-set syntax spelled out in 14652, that
it is o.k. to redefine any of these properties in an FDCC-set
definition, the same way that LC_MONETARY or LC_NUMERIC entries can be
tailored for local cultural conventions.

Character properties are *not* subject to local cultural conventions. It
is *not* acceptable to redefine GREEK SMALL LETTER TAU to be uppercase,
or to define CIRCLED DIGIT SIX to be punctuation, for example. Such
definitions do not belong in specifications for *cultural conventions*,
or if character properties must be defined there, they should at least
be clearly earmarked as different from all other categories of an
FDCC-set.

The one obvious exception to this generality is case-mapping.
Case-mapping relations do vary by language (with well-known examples for
Turkish, French, and German). The specification of the LC_CTYPE
"properties" <toupper> and <tolower> should be clearly marked as
exceptional in this way. CD 14652 should give the default case-mapping
values for the "i18n" FDCC-set, as shown, and then specify that these
particular values should be redefined or overridden to obtain correct
cultural specification for case-mapping for Turkish, for French, or
whatever.
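
The "default plus override" approach to case-mapping described above can
be sketched in Python.  This is an illustration only: Python's built-in
str.upper() stands in for the default "i18n" <toupper> table, and the
TURKISH_UPPER table and to_upper function are hypothetical names, not
anything defined by CD 14652.

```python
# Hypothetical Turkish tailoring: only the entries that differ from the
# default mapping are listed.
TURKISH_UPPER = {
    "i": "\u0130",  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\u0131": "I",  # LATIN SMALL LETTER DOTLESS I maps to plain I
}


def to_upper(text, overrides=None):
    # Use the culture-specific override when one exists, otherwise fall
    # back to the default mapping (here, Python's str.upper()).
    overrides = overrides or {}
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


print(to_upper("istanbul"))
print(to_upper("istanbul", TURKISH_UPPER))
```

Every character that is not listed in the override table keeps its
default mapping, so a tailored definition stays short.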

******

Re 3.1.6 FDCC-set

The introduction of this new term seems unnecessary. The concepts
presented in CD 14652 are so closely modeled on the XPG-4 notion of
"locale" (except for the attempt to extend the character set coverage to
10646 and expand the concept of LC_COLLATE), that the new term obscures
rather than clarifies what 14652 is about.  Retention of the term
"locale" or perhaps an adjectivally modified version of the term "locale"
("extended locale" ?) would be preferable.

******

Re 3.2.3 Ellipses

The introduction of distinctions between two-dot, three-dot, and
four-dot ellipses seems overly complex and subject to error in use.
Furthermore, the explanations, both on pages 8 and 41ff are confusing.

If such distinctions between range notations must be maintained, they
should be better described, with clearer examples.

Also, it is generally better practice to simply have a single range
notation for a formal syntax, while maintaining clear syntactic
differentiation of the elements which can form the items at each end of
a range. So if the FDCC-set syntax must distinguish a range of symbols, a
range of decimal values, a range of octal values, a range of hexadecimal
values, and so on, the notation for "symbol", "decimal value", "octal
value", "hexadecimal value", and so on should be unique and mutually
exclusive, so that interpretation of the type of range does not depend
on the number of dots.
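
A single range notation with mutually exclusive endpoint forms could look
like the following Python sketch.  It is an illustration only: the
endpoint spellings (<Uxxxx>, /dNNN, /xNN, /NNN with "/" as the escape
character) are assumptions chosen for the example, and the function names
are hypothetical.

```python
import re

# Each endpoint form is recognisable on its own, so the kind of range
# never depends on how many dots separate the endpoints.
ENDPOINT_FORMS = [
    ("symbol", re.compile(r"<U([0-9A-F]{4})>$"), 16),
    ("hex", re.compile(r"/x([0-9A-Fa-f]+)$"), 16),
    ("decimal", re.compile(r"/d([0-9]+)$"), 10),
    ("octal", re.compile(r"/([0-7]+)$"), 8),
]


def classify(endpoint):
    # Return (kind, numeric value) for one endpoint of a range.
    for kind, pattern, base in ENDPOINT_FORMS:
        match = pattern.match(endpoint)
        if match:
            return kind, int(match.group(1), base)
    raise ValueError("unrecognised endpoint: " + endpoint)


def parse_range(text):
    # One separator ("..") for every kind of range.
    low_text, separator, high_text = text.partition("..")
    if not separator:
        raise ValueError("not a range: " + text)
    low_kind, low = classify(low_text)
    high_kind, high = classify(high_text)
    if low_kind != high_kind:
        raise ValueError("endpoints of different kinds: " + text)
    if low > high:
        raise ValueError("descending range: " + text)
    return low_kind, low, high


print(parse_range("<U0041>..<U005A>"))
print(parse_range("/d65../d90"))
```

A range whose two endpoints are of different kinds is rejected outright
instead of being interpreted according to the number of dots.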

******

Re 4.1.2 LC_COLLATE

The syntax introduced for tailoring a collation sequence definition for
cultural conventions is overly complex. It is very tightly coupled to
the specific way in which a collation is defined in CD 14651, which
itself is in question. A much simpler syntax has been promulgated by the
Java developers to accomplish the same task, and it would be desirable
to examine the alternatives before standardizing an LC_COLLATE syntax of
unnecessary complexity. Unlike most of the rest of the categories
involved in an FDCC-set definition, which merely specify lists of
things, the LC_COLLATE syntax introduces notions of scope, reordering,
and a macro control language. Granted that reordering rules are needed
for defining collations, it is unclear that all of the rest of the
syntax is.

Re B.1.2 LC_COLLATE Rationale

This states "The syntax for the LC_COLLATE category source is the result
of a cooperative effort between representatives for many countries and
organizations working with international issues, such as UniForum,
X/Open, and ISO,..." We believe that this intentionally overstates the
degree of cooperative effort involved and omits the fact that there is a
serious lack of consensus in the international community, both about how
to define the international string ordering and how to specify a syntax
for tailoring it. Major implementors of international string ordering
based on 10646 disagree with the approach taken in these drafts, and the
standard should not paper over those differences with misleading
implications that everyone agrees about how to do it.

p. 74. In the rationale for LC_COLLATE, there is an estimation made that
the standard covers the requirements for European languages, and that it
will extend well to cover Cyrillic and Middle Eastern scripts (see below
for editorial comment), and for the level 3 collation required for
Chinese and Japanese. However, the standard will fail for dealing with
scripts (such as Thai and Lao) that require *reordering* of characters
within a string before calculating weights. That fact should be noted.
Furthermore, the standard deliberately ignores the role of combining
marks in collation. Implementation of 10646 with combining marks is not
well-guided by this standard. It is quite unclear how to modify an
LC_COLLATE definition to take combining marks into account. If combining
marks are out-of-scope for CD 14652, this should be clearly stated and
be consistently carried through. If they are not out-of-scope, then the
tailoring syntax for LC_COLLATE should either account for them, or CD
14652 should state clearly what the alternative approaches involving
tailoring of CHARMAP or REPERTOIREMAP could be, and how they would be
implemented, *with specific examples*.


================================================================

Specific Technical Comments

pp. 12 & 26: <space> and <blank>

It is unclear from either the definitions of <space> and <blank> on page
12, or from the specification of the "i18n" FDCC-set for LC_CTYPE why
certain space characters from the 10646 repertoire are not listed:

U+00A0 NO-BREAK SPACE
U+2007 FIGURE SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE

If having a <nobreak> property precludes a character from being included
in the <blank> or <space> types, that should be spelled out in the
definition of those categories.

*********

pp. 16-20: Bugs in the <toupper> and <tolower> tables

In the toupper table, the entry (<U0258>,<U018E>) is incorrect and
should be removed.

In the toupper table, (<U0275>,<U019F>) should be added.

In the toupper table, (<U1E9B>,<U1E60>) should be added.

In the tolower table, the entry (<U01DD>,<U018E>) has the items
reversed. It should read (<U018E>,<U01DD>).

In the tolower table, (<U019F>,<U0275>) should be added.
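
The corrections listed above can be cross-checked against a Unicode
character database.  The Python sketch below is an illustration only: it
relies on the case mappings built into Python, which come from a later
Unicode version than the 10646 repertoire balloted here, so agreement is
a consistency check and not an authority.  The list and function names
are hypothetical.

```python
# Pairs are written as in the CD tables:
# toupper entries are (lowercase, uppercase),
# tolower entries are (uppercase, lowercase).
TOUPPER_TO_ADD = [("\u0275", "\u019F"), ("\u1E9B", "\u1E60")]
TOUPPER_TO_REMOVE = [("\u0258", "\u018E")]
TOLOWER_CORRECTED = [("\u018E", "\u01DD")]
TOLOWER_TO_ADD = [("\u019F", "\u0275")]


def upper_pair_holds(pair):
    lower, upper = pair
    return lower.upper() == upper


def lower_pair_holds(pair):
    upper, lower = pair
    return upper.lower() == lower


for pair in TOUPPER_TO_ADD:
    print("toupper add", [hex(ord(c)) for c in pair], upper_pair_holds(pair))
for pair in TOUPPER_TO_REMOVE:
    print("toupper remove", [hex(ord(c)) for c in pair], upper_pair_holds(pair))
for pair in TOLOWER_CORRECTED + TOLOWER_TO_ADD:
    print("tolower", [hex(ord(c)) for c in pair], lower_pair_holds(pair))
```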

**********

pp. 20-21: <alpha> specification

The list of <alpha> characters for 10646 differs significantly from that
implemented for the Alphabetic category for Java.  Insistence on
maintaining a distinction, based on principled or unprincipled arguments
about the alphabetic status of this or that character, will lead to
implementation confusion between the Java community and those who
implement based on locales derived from the "i18n" FDCC-set. Given the
importance of Java, and the fact that it has already provided a
widespread, commercially significant answer to the question of which
10646 characters are alphabetic, the <alpha> category in CD 14652 (if
included here at all; see general comments above) should be harmonized
with the Java values.

A major defect in the <alpha> list is the omission of combining
characters from many scripts which clearly have the alphabetic property
(e.g. the combining vowel matras from Indic scripts).  Such omissions
would result in nonsensical specifications of alphabetic spans in such
scripts, if taken seriously.

To simplify correction of the CD 14652 text for the <alpha> property,
here is the suggested list, as implemented in Java (divided into
Alphabetic and Ideographic). (Not all unassigned subranges within these
ranges are separately called out, to make this list shorter.)

#Alphabetic

0041..005A LATIN CAPITAL LETTER A..
           LATIN CAPITAL LETTER Z
0061..007A LATIN SMALL LETTER A..
           LATIN SMALL LETTER Z
00AA       FEMININE ORDINAL INDICATOR
00B5       MICRO SIGN
00BA       MASCULINE ORDINAL INDICATOR
00C0..00D6 LATIN CAPITAL LETTER A WITH GRAVE..
           LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6 LATIN CAPITAL LETTER O WITH STROKE..
           LATIN SMALL LETTER O WITH DIAERESIS
00F8..02B8 LATIN SMALL LETTER O WITH STROKE..
           MODIFIER LETTER SMALL Y
02BB..02C1 MODIFIER LETTER TURNED COMMA..
           MODIFIER LETTER REVERSED GLOTTAL STOP
02E0..02E4 MODIFIER LETTER SMALL GAMMA..
           MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
037A       GREEK YPOGEGRAMMENI
0386       GREEK CAPITAL LETTER ALPHA WITH TONOS
0388..0481 GREEK CAPITAL LETTER EPSILON WITH TONOS..
           CYRILLIC SMALL LETTER KOPPA
0490..0559 CYRILLIC CAPITAL LETTER GHE WITH UPTURN..
           ARMENIAN MODIFIER LETTER LEFT HALF RING
0561..0587 ARMENIAN SMALL LETTER AYB..
           ARMENIAN SMALL LIGATURE ECH YIWN
05D0..05F2 HEBREW LETTER ALEF..
           HEBREW LIGATURE YIDDISH DOUBLE YOD
0621..063A ARABIC LETTER HAMZA..
           ARABIC LETTER GHAIN
0641..0652 ARABIC LETTER FEH..
           ARABIC SUKUN
0670..06D3 ARABIC LETTER SUPERSCRIPT ALEF..
           ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
06D5..06DC ARABIC LETTER AE..
           ARABIC SMALL HIGH SEEN
06E1..06E8 ARABIC SMALL HIGH DOTLESS HEAD OF KHAH..
           ARABIC SMALL HIGH NOON
06ED       ARABIC SMALL LOW MEEM
0901..0939 DEVANAGARI SIGN CANDRABINDU..
           DEVANAGARI LETTER HA
093D..094C DEVANAGARI SIGN AVAGRAHA..
           DEVANAGARI VOWEL SIGN AU
0958..0963 DEVANAGARI LETTER QA..
           DEVANAGARI VOWEL SIGN VOCALIC LL
0981..09B9 BENGALI SIGN CANDRABINDU..
           BENGALI LETTER HA
09BE..09CC BENGALI VOWEL SIGN AA..
           BENGALI VOWEL SIGN AU
09D7..09E3 BENGALI AU LENGTH MARK..
           BENGALI VOWEL SIGN VOCALIC LL
09F0       BENGALI LETTER RA WITH MIDDLE DIAGONAL
09F1       BENGALI LETTER RA WITH LOWER DIAGONAL
0A02..0A39 GURMUKHI SIGN BINDI..
           GURMUKHI LETTER HA
0A3E..0A4C GURMUKHI VOWEL SIGN AA..
           GURMUKHI VOWEL SIGN AU
0A59..0A5E GURMUKHI LETTER KHHA..
           GURMUKHI LETTER FA
0A70..0AB9 GURMUKHI TIPPI..
           GUJARATI LETTER HA
0ABD..0ACC GUJARATI SIGN AVAGRAHA..
           GUJARATI VOWEL SIGN AU
0AE0       GUJARATI LETTER VOCALIC RR
0B01..0B39 ORIYA SIGN CANDRABINDU..
           ORIYA LETTER HA
0B3D..0B4C ORIYA SIGN AVAGRAHA..
           ORIYA VOWEL SIGN AU
0B56..0B61 ORIYA AI LENGTH MARK..
           ORIYA LETTER VOCALIC LL
0B82..0BCC TAMIL SIGN ANUSVARA..
           TAMIL VOWEL SIGN AU
0BD7       TAMIL AU LENGTH MARK
0C01..0C4C TELUGU SIGN CANDRABINDU..
           TELUGU VOWEL SIGN AU
0C55..0C61 TELUGU LENGTH MARK..
           TELUGU LETTER VOCALIC LL
0C82..0CCC KANNADA SIGN ANUSVARA..
           KANNADA VOWEL SIGN AU
0CD5..0CE1 KANNADA LENGTH MARK..
           KANNADA LETTER VOCALIC LL
0D02..0D4C MALAYALAM SIGN ANUSVARA..
           MALAYALAM VOWEL SIGN AU
0D57..0D61 MALAYALAM AU LENGTH MARK..
           MALAYALAM LETTER VOCALIC LL
0E01..0E2E THAI CHARACTER KO KAI..
           THAI CHARACTER HO NOKHUK
0E30..0E3A THAI CHARACTER SARA A..
           THAI CHARACTER PHINTHU
0E40..0E45 THAI CHARACTER SARA E..
           THAI CHARACTER LAKKHANGYAO
0E47       THAI CHARACTER MAITAIKHU
0E4D       THAI CHARACTER NIKHAHIT
0E81..0EAE LAO LETTER KO..
           LAO LETTER HO TAM
0EB0..0EC4 LAO VOWEL SIGN A..
           LAO VOWEL SIGN AI
0ECD       LAO NIGGAHITA
0EDC       LAO HO NO
0EDD       LAO HO MO
0F40..0F81 TIBETAN LETTER KA..
           TIBETAN VOWEL SIGN REVERSED II
0F90..10F6 TIBETAN SUBJOINED LETTER KA..
           GEORGIAN LETTER FI
1100..1FBC HANGUL CHOSEONG KIYEOK..
           GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FBE       GREEK PROSGEGRAMMENI
1FC2..1FCC GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI..
           GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FD0..1FDB GREEK SMALL LETTER IOTA WITH VRACHY..
           GREEK CAPITAL LETTER IOTA WITH OXIA
1FE0..1FEC GREEK SMALL LETTER UPSILON WITH VRACHY..
           GREEK CAPITAL LETTER RHO WITH DASIA
1FF2..1FFC GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..
           GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
207F       SUPERSCRIPT LATIN SMALL LETTER N
2102       DOUBLE-STRUCK CAPITAL C
2107       EULER CONSTANT
210A..2113 SCRIPT SMALL G..
           SCRIPT SMALL L
2115       DOUBLE-STRUCK CAPITAL N
2118..211D SCRIPT CAPITAL P..
           DOUBLE-STRUCK CAPITAL R
2124       DOUBLE-STRUCK CAPITAL Z
2126       OHM SIGN
2128       BLACK-LETTER CAPITAL Z
212A..212D KELVIN SIGN..
           BLACK-LETTER CAPITAL C
212F..2131 SCRIPT SMALL E..
           SCRIPT CAPITAL F
2133..2138 SCRIPT CAPITAL M..
           DALET SYMBOL
2160..2182 ROMAN NUMERAL ONE..
           ROMAN NUMERAL TEN THOUSAND
3041..3094 HIRAGANA LETTER SMALL A..
           HIRAGANA LETTER VU
30A1..30FA KATAKANA LETTER SMALL A..
           KATAKANA LETTER VO
3105..318E BOPOMOFO LETTER B..
           HANGUL LETTER ARAEAE
AC00..D7A3 <Hangul Syllable, First>..
           <Hangul Syllable, Last>
FB00..FB17 LATIN SMALL LIGATURE FF..
           ARMENIAN SMALL LIGATURE MEN XEH
FB1F..FB28 HEBREW LIGATURE YIDDISH YOD YOD PATAH..
           HEBREW LETTER WIDE TAV
FB2A..FD3D HEBREW LETTER SHIN WITH SHIN DOT..
           ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM
FD50..FDFB ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM..
           ARABIC LIGATURE JALLAJALALOUHOU
FE70..FEFC ARABIC FATHATAN ISOLATED FORM..
           ARABIC LIGATURE LAM WITH ALEF FINAL FORM
FF21..FF3A FULLWIDTH LATIN CAPITAL LETTER A..
           FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A FULLWIDTH LATIN SMALL LETTER A..
           FULLWIDTH LATIN SMALL LETTER Z
FF66..FF6F HALFWIDTH KATAKANA LETTER WO..
           HALFWIDTH KATAKANA LETTER SMALL TU
FF71..FF9D HALFWIDTH KATAKANA LETTER A..
           HALFWIDTH KATAKANA LETTER N
FFA0..FFDC HALFWIDTH HANGUL FILLER..
           HALFWIDTH HANGUL LETTER I

#Ideographic

3007       IDEOGRAPHIC NUMBER ZERO
3021..3029 HANGZHOU NUMERAL ONE..
           HANGZHOU NUMERAL NINE
4E00..9FA5 <CJK Ideograph, First>..
           <CJK Ideograph, Last>
F900..FA2D <CJK Compatibility Ideograph, First>..
           <CJK Compatibility Ideograph, Last>
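
A list in the layout used above is straightforward to load and query.
The Python sketch below is an illustration only; it parses a short
excerpt of the list, and the names SAMPLE, parse_ranges and in_ranges are
hypothetical.

```python
import bisect

# Excerpt of the list above, in the same layout.
SAMPLE = """\
0041..005A LATIN CAPITAL LETTER A..
           LATIN CAPITAL LETTER Z
00AA       FEMININE ORDINAL INDICATOR
0901..0939 DEVANAGARI SIGN CANDRABINDU..
           DEVANAGARI LETTER HA
093D..094C DEVANAGARI SIGN AVAGRAHA..
           DEVANAGARI VOWEL SIGN AU
"""


def parse_ranges(text):
    ranges = []
    for line in text.splitlines():
        # Skip blank lines, "#" headings, and indented continuation
        # lines, which only carry the name of the end of a range.
        if not line or line[0].isspace() or line.startswith("#"):
            continue
        code_field = line.split()[0]
        first, _, last = code_field.partition("..")
        low = int(first, 16)
        high = int(last, 16) if last else low
        ranges.append((low, high))
    return sorted(ranges)


def in_ranges(ch, ranges):
    # Binary search for the last range that starts at or before ch.
    code_point = ord(ch)
    index = bisect.bisect_right(ranges, (code_point, 0x10FFFF)) - 1
    return index >= 0 and ranges[index][0] <= code_point <= ranges[index][1]


ALPHA = parse_ranges(SAMPLE)
print(in_ranges("A", ALPHA))
print(in_ranges("\u093E", ALPHA))  # DEVANAGARI VOWEL SIGN AA, a combining matra
print(in_ranges("\u093C", ALPHA))  # DEVANAGARI SIGN NUKTA, outside the listed ranges
```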

**********

pp. 43-70: "i18nrep" repertoire file

This list is arbitrarily chosen, and the principles for characters in it
are unstated. If the repertoire file is not going to correspond to one
of the named and numbered subsets of ISO/IEC 10646 (and Subset 300, the
BMP, would be the obvious choice), then the choice of characters in the
repertoire file *must* be justified in 14652.

On inspection, it is clear that many combining characters from 10646
have been omitted, but this is not done systematically or consistently.
For example, combining characters for U+064B ARABIC FATHATAN .. U+0652
ARABIC SUKUN *are* included. But if so, why not GERESH, etc., for
Hebrew?

On pp. 68-69, the C0 controls are duplicated in this list. They appeared
already (on page 43), with different mnemonics. This calls into question
the meaning of the REPERTOIREMAP file.  Are duplications of characters
allowed, in which case the REPERTOIREMAP file is really a definition of
the mnemonics by which characters can be referred to (e.g. <ESC> and
<EC>), or is it intended to be a listing of the characters in a
repertoire, in which case no duplications should be allowed?

If the intention is actually to define a repertoire, then the C1 control
functions defined on page 69 should be omitted.  These are not specified
by 10646 at all, and it is dangerous in 14652 to try to override the
function of other standards which specify the usage of C1 controls.

If the intention is, rather, to just define a bunch of short mnemonics,
then most of this entire listing is useless and should be omitted.
Introducing mnemonics such as <c*> for GREEK SMALL LETTER XI and <z%>
for CYRILLIC SMALL LETTER ZHE and <K%> for HEBREW LETTER FINAL KAF is
completely confusing. A very small percentage of these mnemonics has
seen widespread use in plaintext reference to accented characters.  The
rest should be completely abandoned in CD 14652 in favor of use of the
hexadecimal value as the unique symbolic identifier for a 10646
character (e.g. <U0436>).
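
A minimal Python sketch of how such identifiers can be derived
mechanically from the code point is given below. It assumes BMP
characters (four hexadecimal digits) and uses Python's unicodedata
module, which reflects the Unicode character database rather than the
text of 10646 itself; the function name symbolic_name is hypothetical.

```python
import unicodedata


def symbolic_name(ch):
    """Format a single BMP character as <Uxxxx> (four uppercase hex digits)."""
    return f"<U{ord(ch):04X}>"


# The three characters cited above as having confusing mnemonics.
for ch in ("\u03BE", "\u0436", "\u05DA"):
    print(symbolic_name(ch), unicodedata.name(ch))
```

No table of mnemonics is needed in this scheme: the identifier is unique
by construction and can be checked against the character name.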

The pejorative and inaccurate note "(not a real character)" should be
dropped from the listing of combining characters on pp. 69-70.
Furthermore, it is completely unexplained why most of these are given
user-defined character values when they are actually encoded characters
in 10646. E.g.

 <"'> <UE003> NON-SPACING ACUTE ACCENT <ISO-IR-103_C2/> (not a real
character)

must be amended to:

      <U0301> COMBINING ACUTE ACCENT

with the correct 10646 encoding and character name.
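
The difference between the two values can be checked with a short
Python sketch. It relies on Python's unicodedata module, which follows
the Unicode character database; treating that as equivalent to 10646 for
these two code points is an assumption of this example.

```python
import unicodedata

private_use = "\uE003"      # value assigned in the draft listing
combining_acute = "\u0301"  # value proposed in the amendment

# U+E003 lies in the private use area and carries no standard meaning.
print(unicodedata.category(private_use))

# U+0301 is an encoded combining mark with a standard name.
print(unicodedata.name(combining_acute))
print(unicodedata.category(combining_acute))

# Because it is a real character, it takes part in normalization.
composed = unicodedata.normalize("NFC", "e" + combining_acute)
print(f"U+{ord(composed):04X}")
```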

Additional technical comments


This proposal is not ready for prime time.  We must coordinate this
standardization effort with Java and Win32 internationalization.  We
need to treat XPG4 as one of the contributing standards, not as the
standard being extended.

A1. The mapping of Unicode character types to POSIX LC_CTYPE attributes
should be specified, but doing this by using the XPG4 LC_CTYPE syntax is
not appropriate.  These character attributes are in general not
culturally specific.  The base POSIX character attributes are also
missing a large number of attributes needed for parsing a larger
character set.

A2. There are a small number of upper/lower case conversions which are
locale dependent.  Even in locales with such modifications (such as
Turkey) it is still necessary to have universal upper/lower functions to
be able to deal with matching of names (such as file names) which are
processed simultaneously in multiple locales.

A3. Cultural cases of differing case mapping should be defined as
exceptions, rather than building up a complete upper/lower table.  The
existing POSIX locales have tended to incompleteness in the case mapping
tables.
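
Points A2 and A3 can be sketched in Python as a universal mapping plus a
small per-locale exception table. The table and function names below are
hypothetical, the Turkish entries are the usual dotted/dotless i
examples, and Python's built-in str.upper and str.lower stand in for the
universal functions.

```python
# Hypothetical exception tables: only the mappings that differ from the
# universal ones are listed, not a complete upper/lower table.
TURKISH_UPPER = {"i": "\u0130"}                # i -> I WITH DOT ABOVE
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> DOTLESS I


def to_upper(text, exceptions=None):
    """Uppercase text, consulting locale exceptions before the universal map."""
    exceptions = exceptions or {}
    return "".join(exceptions.get(ch, ch.upper()) for ch in text)


def to_lower(text, exceptions=None):
    """Lowercase text, consulting locale exceptions before the universal map."""
    exceptions = exceptions or {}
    return "".join(exceptions.get(ch, ch.lower()) for ch in text)


print(to_upper("istanbul"))                 # universal mapping
print(to_upper("istanbul", TURKISH_UPPER))  # Turkish exceptions applied
```

Calling the functions without an exception table gives the universal
behaviour needed for matching names across locales.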

A4. Although the LC_COLLATE syntax is complex, it at least tries to
address the problems of doing override collation from a base collation
order.  This is similar to what Java has done, but in this case the Java
syntax is simpler than the 14652 proposal.  If we limit the scope of
what we expect this locale-based sorting to do, it is a usable
compromise.  Those people who need complex sorting including numeric
ordering, conversion of numerics to names, and phonetic reordering
should expect to use the locale as the basis for information but to
significantly pre-process the data.  For sorting file names a universal
multi-script collation with overrides for various locales is good
enough.
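
The idea of override collation from a base order can be sketched in
Python as follows. This is a deliberately simplified, single-level
illustration: code point order stands in for the base collation, the
names make_key and SPANISH are hypothetical, and neither the LC_COLLATE
syntax nor the Java syntax is reproduced.

```python
def make_key(overrides=None):
    """Build a sort key: base weight is the code point unless overridden."""
    overrides = overrides or {}

    def key(word):
        return [overrides.get(ch, ord(ch)) for ch in word]

    return key


# Hypothetical tailoring: place n-tilde between n and o.
SPANISH = {"\u00F1": ord("n") + 0.5}

words = ["oa", "\u00F1a", "nz"]
print(sorted(words, key=make_key()))         # base order only
print(sorted(words, key=make_key(SPANISH)))  # base order plus overrides
```

Only the characters whose position differs from the base order appear in
the override table, which keeps each locale's tailoring small.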

A5. The two-letter mnemonics used in the i18nrep section are worthless.
I think the best solution is to use the "meaningful" names for the Basic
Latin characters and punctuation, and the Unicode-based names for other
characters.


================================================================

Editorial Comments

pp. 64 ff. "IDEOGRAPHIC" is consistently misspelled in the character
names. If this misspelling has not been caught, then all other character
names should be carefully checked against 10646 to ensure that they are
exactly correct.

Spelling errors:

p. 11, 3rd paragraph "depreciated" --> "deprecated"

p. 42, 2nd paragraph, last line "an" --> "and"

p. 74 "with Slavic or Middle East character sets" should be corrected to
"for Cyrillic or Middle Eastern scripts".

__________________ end of SC22 N2612 _________________________________






