From rinehuls@Radix.Net  Thu Oct 14 17:59:55 1999
Received: from mail1.radix.net (mail1.radix.net [207.192.128.31])
	by dkuug.dk (8.9.2/8.9.2) with ESMTP id RAA17576;
	Thu, 14 Oct 1999 17:59:54 +0200 (CEST)
	(envelope-from rinehuls@Radix.Net)
Received: from saltmine.radix.net (saltmine.radix.net [207.192.128.40])
	by mail1.radix.net (8.9.3/8.9.3) with SMTP id MAA14133;
	Thu, 14 Oct 1999 12:00:55 -0400 (EDT)
Date: Thu, 14 Oct 1999 12:00:52 -0400 (EDT)
From: William Rinehuls <rinehuls@Radix.Net>
Reply-To: William Rinehuls <rinehuls@Radix.Net>
To: sc22info@dkuug.dk
cc: keld simonsen <keld@dkuug.dk>
Subject: SC22 N3024 - Summary of Voting on PDTR 14652: Specification Method for Cultural Conventions
Message-ID: <Pine.SV4.3.96.991013174512.11110C-100000@saltmine.radix.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

_____________________ beginning of title page _______________________
ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat:  U.S.A.  (ANSI)

ISO/IEC JTC 1/SC22
N3024

TITLE:
Summary of Voting on PDTR Ballot for PDTR 14652: Information technology -
Programming languages, their environments and system software interfaces -
Specification Method for Cultural Conventions (Technical Report, Type 1)

DATE ASSIGNED:
1999-10-13

SOURCE:
Secretariat, ISO/IEC JTC 1/SC22

BACKWARD POINTER:
N/A

DOCUMENT TYPE:
Summary of Voting

PROJECT NUMBER:
JTC 1.22.30.02.03

STATUS:
WG20 is requested to prepare a Disposition of Comments Report and make a
recommendation on the further processing of the PDTR.

ACTION IDENTIFIER:
FYI to SC22 Member Bodies
ACT to WG20

DUE DATE:
N/A

DISTRIBUTION:
Text

CROSS REFERENCE:
N2955

DISTRIBUTION FORM:
Def


Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
William C. Rinehuls
8457 Rushing Creek Court
Springfield, VA 22153 USA
Telephone:  +1 (703) 912-9680
Fax:  +1 (703) 912-2973
email:  rinehuls@radix.net

____________ end of title page; beginning of overall summary ___________

                       SUMMARY OF VOTING ON

Letter Ballot Reference No:  SC22 N2955
Circulated by:               JTC 1/SC22
Circulation Date:            1999-07-07
Closing Date:                1999-10-08

SUBJECT:  PDTR Ballot for PDTR 14652:  Information technology -
Programming languages, their environments and system software interfaces -
Specification Method for Cultural Conventions (Technical Report, Type 1)

-----------------------------------------------------------------------
The following responses have been received on the subject of approval:


"P" Members supporting approval
       without comment                   7

"P" Members supporting approval
       with comment                      1

"P" Members not supporting approval      4

"P" Members abstaining                   1

"P" Members not voting                   8

"O" Members supporting approval
       without comment                   1

----------------------------------------------------------------------
Secretariat Action:

The comment accompanying the abstention vote from France was:  "Due to
lack of resources."

The comments accompanying the affirmative vote from Canada and the
comments accompanying the negative votes from Denmark, Germany, Japan and
the United States of America are attached.

WG20 is requested to prepare a Disposition of Comments Report and make a
recommendation on the further processing of the PDTR.


______ end of overall summary; beginning of detail summary _____________

                 ISO/IEC JTC1/SC22  LETTER BALLOT SUMMARY
                                    

PROJECT NO:    JTC 1.22.30.02.03

SUBJECT:  PDTR Ballot for PDTR 14652:  Information technology - Programming
          languages, their environments and system software interfaces -
          Specification Method for Cultural Conventions (Technical Report,
          Type 1)

Reference Document No:  N2955           Ballot Document No:  N2955
Circulation Date:  1999-07-07           Closing Date:  1999-10-08
                                                              
Circulated To: SC22 P, O, L             Circulated By: Secretariat


                  SUMMARY OF VOTING AND COMMENTS RECEIVED

                     Approve  Disapprove Abstain Comments   Not Voting
'P' Members

Austria                ( )       ( )       ( )       ( )       (X)
Belgium                (X)       ( )       ( )       ( )       ( )
Brazil                 ( )       ( )       ( )       ( )       (X)    
Canada                 (X)       ( )       ( )       (X)       ( )
China                  ( )       ( )       ( )       ( )       (X)
Czech Republic         (X)       ( )       ( )       ( )       ( )
Denmark                ( )       (X)       ( )       (X)       ( )
Egypt                  ( )       ( )       ( )       ( )       (X)
Finland                (X)       ( )       ( )       ( )       ( )
France                 ( )       ( )       (X)       (X)       ( )
Germany                ( )       (X)       ( )       (X)       ( )
Ireland                (X)       ( )       ( )       ( )       ( )
Japan                  ( )       (X)       ( )       (X)       ( )
Netherlands            (X)       ( )       ( )       ( )       ( )
Norway                 (X)       ( )       ( )       ( )       ( )
Romania                ( )       ( )       ( )       ( )       (X)
Russian Federation     (X)       ( )       ( )       ( )       ( )
Slovenia               ( )       ( )       ( )       ( )       (X)
UK                     ( )       ( )       ( )       ( )       (X)
Ukraine                ( )       ( )       ( )       ( )       (X)
USA                    ( )       (X)       ( )       (X)       ( )

'O' Members Voting

Korea Republic         (X)       ( )       ( )       ( )       ( )

_________ end of detail summary; beginning of Canada comments _________

              COMMENTS ACCOMPANYING THE CANADA AFFIRMATIVE VOTE ON 
                          SC22 LETTER BALLOT N2955

We are pleased to note that the editor has taken a number of our
suggestions for improvement of the document when it was in its previous 
incarnation, and incorporated them in this Technical Report. Our comments
specific to the current draft follow.


1. There are instances throughout this document where it refers to itself
as a standard - sometimes on the same page! For instance, the introduction
on page iv, line 70 says "This Technical Report" yet page iv, lines 77,
110 (twice) call it a standard. Another one is on page 1, line 133
(calling it a Technical Report) and line 139 (calling it a standard). Some
more are mentioned below. These have to be fixed; our suggestion is that
the entire document be searched and all instances be corrected. 

2. In many places, this document expresses things in a way suggesting
conformance to a standard. This technical report originated as a
standard, but it is no longer one, and the artifacts of that origin
must be removed. It should not use the prescriptive "...shall be used..."
or "...shall contain..." but should simply describe and specify things.
In a number of places the editor has already made the change, but not
consistently throughout the document, and that creates problems. An
example, amongst many throughout this document, can be found on page 7,
line 485:

            "A category source definition shall consist of a category
            header, a category body, and a category trailer. A category
            header shall consist of the character string ......."

            This could be replaced by:

            "A category source definition consists of a category header,
            a category body, and a category trailer. A category header
            consists of the character string ...." 


Specific comments: 

1. page iv line 74, 75 "..., and a way to specify how much is covered and 
the status of it." If this is an indirect reference to LC_IDENTIFICATION,
then it is wide of the mark and needs to be reworded. If it is a
reference to something else, it still fails, as it is not clear from
reading the document what this refers to.

2. page iv line 91 "With those data.." ==> "With that data .."

3. page iv line 103 "..a built-in nature .." does not make sense. Perhaps
it was supposed to be " ..built-in .."

4. pages 2 and 3 Section 3.1 - it would be better to order the terms and
definitions, or at least to group them.

5. page 3 line 268 "..this standard .." ==> "..this TR .."

6. page 3 line 272 "..this standard .." ==> "..this TR.." 

7. page 6 line 462 " ...while it in .." ==> "..while in many cases, it
could ..."

8. page 11 line 691 remove the words "..need not be a national extent .." 
because it does not add anything; the extent is tied to ISO 3166.

9. page 11 line 704 + "If any of the above information is non-existent,
it must be stated; the corresponding string is then the empty string "
should be restated as "If information required for any of the mandatory
keywords above is not available, then the corresponding string is an
empty string." 

10. page 11 line 709+ "Note: Only one language can be addressed with
the concepts of a FDCC-set; to address for example a bilingual culture,
one need to have 2 FDCC-sets" Here, language and culture are being
confused and it would be better to state:  "Note: Only one language per
territory can be addressed with a single FDCC-set; an additional FDCC-set
is required for each additional language for that territory."

11. page 12 line 738 "ISO" ==> incorrect; there is no entry like this in
ISO 3166 today 

12. page 12 line 742+ It may be better to change the "i18n:1999" entries 
to "i18n:2000" now rather than later. Also applies to page 11, line 717.

13. page 13 line 808 states "Basic keywords". Does this imply that the
keywords that relate to transliteration are somehow of the "advanced"
variety? It would be better, if a division is desired, to label these as
"Character Classification Keywords".

14. page 14 line 874+ xdigit. Allowing xdigit to take on script and digit
variants is a bad and dangerous concept, and an unnecessary addition.
The xdigit area should be fenced off to be only 0-9, a..f, and A..F.

15. page 15 line 926+ map. Remove this keyword. This is dangerous
character encoding territory and one that does not belong in LC_CTYPE,
if at all in this TR. Have there ever been any requests to have this type
of stuff included in this category? 

16. page 16 line 967 "A Automatically included; see text" - this is only
for the portable character set and only parts of it are relevant for a
class. Perhaps it would be better to state it as such.

17. page 16 line 976 section 4.3.2 Opposition to having transliteration
specified in this document has been stated before. Having that same
transliteration specification under LC_CTYPE does not make it better; in
fact we are vehemently opposed to having this in LC_CTYPE. Of the two
evils, it is better to have it as a separate category (say LC_XLITERATE)
than bundled into LC_CTYPE.

18. page 19+ line 1108+ section 4.3.3 The comment lines, of the type
"Table 9 Basic Greek", inserted in the "upper", "lower", "alpha", etc.
keywords to break up the sections are a good idea. This practice should
be continued in the "punct", "graph", "toupper", "tolower", etc. keywords.

19. page 19 general Are all of these ranges accurate? Do we need to
continue to specify these or should this TR point elsewhere for this
information?

20. page 22 line 1371 Why are currency signs and miscellaneous symbols
included under "punct"? 

21. page 28 line 1846+ This is a little unclear, particularly the last
sentence, which does not make enough sense for better wording to be
suggested.

22. page 29 line 1877+ A few of the keywords are defined to be optional.
Are all the others mandatory? This should be made clear as was done under
LC_IDENTIFICATION and LC_CTYPE.

23. page 39 line 2082 "COLL_WEIGHT_MAX limit. The minimum value is 7." As
commented on before and accepted last time, the minimum value cannot be 7;
it can be a maximum value of 7 (and that is what the MAX in the
COLL_WEIGHT_MAX is!). As noted previously, 14651 states this value as 4 -
so, since one cannot have a value less than the minimum value, it would be
in error. This has to be a maximum value of 7.

24. page 34 line 2140+ Please use the same font size as the other 
"Example" text.

25. page 34+ line 2171/73 The statement should be amended to use 
"sort-rule" instead of "sort-rules" in both cases. This would then be in
line with the example where each "sort-rule" is either the rule for
forward sorting or backward sorting etc. on a level.

26. page 41 line 2607 The keywords "valid_from" and "valid_to" should be
removed. These are new entries relating to dual currency support. Dual
currencies, in almost all cases, are present only for a relatively short
period of time and as such should not be entertained in this manner. There
are other well-established mechanisms for handling these transient
situations.

27. page 41 line 2622 The "conversion_rate" keyword should be removed. The
reason for removal is that a currency conversion rate is not a constant;
it fluctuates by the second. Conversion rate handling is best done outside
of this category. 

28. page 43 line 2680 int_curr_symbol ==> "int_curr_symbol" for 
consistency

29. page 43+ line 2677+ Group the keywords, rather than having them
scattered as at present, by "int_curr_symbol" and "currency_symbol" usage.
For example, list all of the "int_*" keywords first, followed by all of
the non-"int_*" keywords. The order of the keywords within these groups
should be the same.

30. page 44 line 2773 Replace "..formatted international monetary 
quantity.." with "..formatted monetary quantity using the 
"int_curr_symbol"." for consistency.

31. page 45 line 2791 as above for comment 30.

32. page 45 line 2815 To be culturally neutral, the value for 
"mon_decimal_point" should be an empty string i.e. the same value as the
"mon_thousands_sep" keyword.

33. page 46 line 2871 As above, to be culturally neutral, the value for
the "decimal_point" should be an empty string, as per the value for the
"thousands_sep" keyword.

34. page 47 line 2905+ "The second operand is an integer specifying the
Gregorian date in the format YYYYMMDD". This suggests that one can enter
any date. Is that the intent? If so, what relevance does it have to the
"week" keyword? If not, then this needs to be explained, and the tie-in
with the "week" keyword needs to be elaborated. As it stands, one cannot
determine its usefulness or validity.

35. page 47 line 2907+ "The third operand is an integer specifying the 
weekday number to be contained in the first week of the year". The first
problem with this is that the term "weekday number" has not been defined
up to this point. The second problem concerns the intent and the
statement reflecting that intent. The intent, if we understand this
correctly, is to ascertain how many days are required in a week for it to
be considered the first week of the year. If this is correct, then state
it as such. If not, please elaborate so it can be evaluated.

36. page 48 line 2980+ This keyword seems to be geared towards the notion
of a single timezone per LC_TIME. While it is true that many countries in
the world have a single timezone, there are others where multiple
timezones are used. One such country is Canada. This keyword needs to be
changed to allow specification of multiple timezone usage. 

37. page 49 line 2989+ "<std> and <dst> Indicates no less than three, nor
more than 10 characters that are the designation for the standard <std> or
summer <dst> time zone".

The first problem with this concerns the range of characters allowed
in the name of the timezone; with this restriction as it stands, one
cannot enter every timezone name. For example, one timezone name for
standard time is "Eastern Standard Time", which clearly exceeds the
10-character limit! Restricting names to common abbreviations or acronyms
does not always work either. What is needed is two keywords for each
of <std> and <dst>: the first indicates the full name of the
timezone, and the second indicates the abbreviation or acronym.

The second problem is with the term "summer" for the <dst>. This should
be replaced by "Daylight saving time or summer time". Remember that the
acronym "dst" does stand for "Daylight saving time". 

38. page 49 line 3020+ This deals only with rule-based changes
to/from Daylight saving time. Not all countries in the world that observe
<dst> have rule-based criteria for its observance. The keyword,
or the "timezone" keyword, should indicate that it applies ONLY to
rule-based <dst> observance.

39. page 53 line 3237+ Section 4.9 If this is to be useful, allowance has
to be made for the different cultural conventions used in different parts
of the world. First, not all countries use millimetres. Second, common
country-specific industry standard names should be allowed, such as A4
in the U.K., and "letter" and "legal" in Canada.

The other problem is that the current statements for "height" and "width"
suggest specification of a single size. That is not the common practice.
Multiple paper sizes have to be allowed for - for example, one needs to be
able to specify the dimension of both "letter" and "legal" sizes following
the example above.

40. page 55/56 line 3356+ The keywords "lang_term" and "lang_lib"
specify exactly the same thing: "a three-letter abbreviation of the
language for *** use, according to ISO 639-2." The only difference
between the two statements is the term "terminal" or "library" replacing
"***" for the appropriate keyword. These keywords should be replaced
with a single keyword "lang_ab3" (and the keyword "lang_ab" changed to
"lang_ab2" to indicate the two-letter abbreviations from ISO 639),
following the same convention as adopted for the "country_ab2" and
"country_ab3" keywords in this section.


______ end of Canada comments; beginning of Denmark comments ___________


DENMARK COMMENTS ACCOMPANYING NEGATIVE VOTE ON SC22 LETTER BALLOT N2955

 
We can inform you that Denmark disapproves the balloted text for PDTR
14652. However, if the report type is changed to Type 2 and the following
general and specific observations are implemented, Denmark will change its
vote to approval.

From an overall perspective, the PDTR should be rewritten in order to
achieve the goal of being a useful Technical Report. The rewriting should
pay special attention to the use of 'it', 'its', spoken language forms,
"short-hand" sentences and the like, which make the text hard to read
and, thus, error prone.

All usage of the term 'standard', 'this standard', etc. should be replaced
by 'Technical Report', 'this Technical Report', etc.

Definitions should be sorted alphanumerically and renumbered accordingly.

Definitions (and other material) copied from other 
standards/specifications/TRs should carry a reference to their origin
with a clear indication of differences from the original, if applicable.

The definition of FDCC-set is confusing because it (sort of) defines 
itself. A definition of FDCC would probably help.

The definition of 'cultural convention' as 'A data item for information
technology that may vary dependent on language, territory, or other
cultural habits' is particularly helpful for the reader in identifying
the actual subject of the Technical Report.

Line 307/308 states 'The first eight entries in Table 1 are defined in
ISO/IEC 6429 and others are defined in ISO/IEC 10646-1'. It is unclear if
this indicates that the remaining entries stem from somewhere else, in
which case the text should be adjusted.

Plans for future editions of this TR should be moved to Notes or removed.

The title of subclause 4.1 should be changed to FDCC-set description,
because FDCC-set already is defined in subclause 3.1.6.
 
The use of the word 'shall' seems inconsistent with the concept of a TR.
 
The use of the word 'define' (all forms) in the running text of the TR
should, in general, be replaced by 'specify' or 'describe' (similar
forms). Definitions belong to subclause 3.
 
References to other International Standards should be normalised (e.g.
see <section x.x> in International Standard ISO/IEC yyyyy-z) throughout
the text.

Line 556: The expression 'shall be escaped' is unclear.
 
Line 605: Change 'wth ' to 'with'
 

_____ end of Denmark comments; beginning of Germany comments _______


DIN Vote on SC 22 N 2955

Disapproval with Comments:

Germany is in favour of the transformation of the former FCD 14652.2
into a TR. In Malvern, WG20 opted for this transformation to safeguard
the valuable information inherent in the draft which, while in Germany's
view not ripe for standardization, should not be withheld from the
implementor community.

However, Germany notes that in many places the conversion is as yet incomplete.

  This includes the frequent self-description of the document as a
"standard". A sample of this occurs right on the cover page ("Document
type: International standard"), followed by p. iv ("There are a number
of benefits coming from this standard" and "This standard specifies
..."), p. 1 ("specifications in this Standard") and so forth.

  Germany also disapproves of the LC_CTYPE category of the FDCC set
(mainly section 4.3). No character classifications should be duplicated
between SC22/WG20 and the Unicode Consortium, as inconsistencies are
likely to slip in to the detriment of implementers. 

  The need for a conformance clause is also to be decided upon by the WG.

  Numerous other details need to be settled before the PDTR can be
finalized. Many of those have been brought forth in the national body
comments to the last FCD and should be clarified before any progress
can be made.


___________ end of Germany comments; beginning of Japan comments ________

    
Comments on PDTR 14652

The National Body of Japan disapproves PDTR 14652 for the reason below.

If the comment is satisfactorily resolved, Japan will change its vote to
approval.

	-------

The reason why this document becomes a technical report instead of a
standard should be explained in this document.

Japan proposes to add some paragraphs for this purpose as follows.

	---- beginning of the proposed text ---

This Technical Report presents a trial for defining a general mechanism
to specify cultural conventions. Though its contents were developed in
order to form a standard, it was decided to publish it as a Technical
Report in order to make the information available to the public earlier,
rather than waiting for the issues raised by the National Bodies to be
resolved.

The issues include but are not limited to:

1) Whether the features which have their origin in ISO/IEC 9945-2 --
POSIX Part 2 -- work well after their separation from ISO/IEC 9945-2 or
not.

2) Whether it makes sense or not to have a default value, which may be
considered as a recommendation, for each cultural convention item.

3) Whether each specification form fits world-wide cultural
variations or not.

The preparer of this report, ISO/IEC JTC1 SC22, expects that rapid
progress of internationalization in the field of information technology
will resolve the above-mentioned issues and that this Technical Report
will be used as a basis for a new standard in the near future.


______ end of Japan comments; beginning of USA comments ________________


The US National Body votes to disapprove ISO/IEC PDTR 14652, Information
technology - Programming languages, their environments and system
software interfaces - Specification Method for Cultural Conventions, see
below for comments.


US comments to the PDTR ballot on TR 14652
Documents:  SC22 N2955 (SC22/WG20 N690, L2/99-209)
September 24, 1999


General Comments

The US is in favor of the change of status of this document from a
draft standard to a draft technical report, which seems appropriate
under ISO directives, given the disagreements within the committee
regarding how to proceed and the lack of consensus to make the
document an International Standard. However, the conversion from
a draft standard to a draft technical report does not seem to have
been completed. Two issues stand out:

1. There are a number of places in the document where it refers to
itself as a "standard". These must, of course, all be corrected.
Places we noted were: p. iv, lines 77 and 110 (twice), and
p. 3, line 268. The entire document should be searched to guarantee
that no other instances occur.

2. While it is not unheard of for a Technical Report to contain
a conformance clause, and while doing so makes a certain amount of sense
given the history of this document, it is still unusual. Furthermore,
the language throughout the Technical Report is couched in terms that
express conformance: "...shall be used as...", "...shall contain...",
and so on. The more appropriate rhetorical structure for a Technical
Report is simply to describe and specify things, rather than
strive to sound as if it is a standard even when it isn't. To
pick a typical example from page 7 of the document:

  "The FDCC-set definition text shall contain one or more FDCC-set
   category source definitions, and shall not contain more than one
   definition for the same FDCC-set category."

That is a prescriptive specification, tied to the conformance clause.
The descriptive reformulation would be:

  "The FDCC-set definition contains one or more FDCC-set category
   source definitions, and does not contain more than one definition
   for the same FDCC-set category."

We would prefer that the text be rewritten to remove its prescriptive
character. But failing that, it would make sense to at least add a
prominent paragraph to the Foreword that describes the origin
of the document, explains why it is written *like* a standard, but
is not actually a standard. What is there now is just directives
boilerplate, and it does not explain why the text is still written
in its prescriptive manner and why it still has a conformance clause.

Technical Comments

p. v, line 126

  "A standard set of values for all the categories has been defined
   covering the repertoire of ISO/IEC 10646-1."

  This may not be correct, depending on what is meant by "covering
  the repertoire". LC_COLLATE, by reference to FCD3 14651, does indeed
  provide full coverage for 10646-1 (plus Amendments 1-7), plus
  two characters from Amendment 18. LC_CTYPE also intends to cover
  the same repertoire, although it is difficult to determine whether
  it is actually complete. In any case, PDTR 14652 is insufficiently
  precise in defining the repertoire to be covered, since the
  "repertoire of ISO/IEC 10646-1" is constantly changing.

  It would be helpful to refer to specific versions of the Unicode
  Standard to define repertoire, rather than 10646-1 plus amendments,
  since that would be more precise (and mechanically checkable, since
  a fixed, machine-readable data file exists for each Unicode version).
  For example, 10646-1 plus Amendments 1-7 (or 1-9, for that matter,
  since Amendments 8 and 9 did not add characters), corresponds
  to Unicode 2.0.0. By contrast, the repertoire with the euro sign and
  object replacement character added can be specified exactly as Unicode
  2.1.9, but it does not correspond to any particular configuration of
  10646-1:1993 plus amendments.

p. 1, line 169 in Normative References

  This lists 10646-1:1993 as the normative reference. We think this
  is a mistake. Given the timing of this Technical Report, it is
  clear that the reference should be to 10646-1:2000, the
  second edition. Then the specification of the particular repertoire
  of that standard to be covered should be by reference to
  a particular version (or versions) of the Unicode Standard,
  or by reference to the numbered collections of the standard
  (e.g. collection BMP-AMD.7 is explicitly provided in 10646-1:2000
  to specify the repertoire that matches Unicode 2.0.0).

p. 3, line 270, section 3.2.1 Notation for defining syntax.

  This entire syntax is unhelpful and could easily be dispensed
  with if the thrust of this document were not to maintain
  upward compatibility with POSIX specifications. In a
  descriptive specification of cultural conventions, it is
  far better to simply specify a tagged format of some sort.
  This would allow dispensing with the superfluous C-style
  printf specifiers and "\n" line terminators, etc. Such a
  format would be more useful to other users of the
  cultural conventions, and if properly designed could be
  easily converted to XML, where it could be transmitted
  and verified in standard ways.

  Furthermore, use of the printf argument specifiers in this
  meta-syntax conflicts with the definition of other
  similar format specifiers for date, time, and other
  format specifications within particular FDCC-set categories.
  This is most confusing in a Technical Report. The TR
  should describe the specification for cultural conventions
  in an implementation neutral but clear way -- and then the
  POSIX++ implementation is free to make use of printf-style
  specifiers for its actual implementation on UNIX platforms.

p. 4, line 304, section 3.2.3 Portable character set.

  This is completely unnecessary for the specification of
  cultural conventions. A neutral specification would simply
  make use of the UCS identifications of characters. The
  requirement to specify the Portable character set is a
  POSIX artifact that should itself be restricted to the
  POSIX specifications.

p. 7, line 491ff

  The definition of category body is not consistent with the
  allowance for comment lines at any point.

p. 9, line 585 

  "Concatenated constants can include a mix
  of the above character representations."

  This is just a silly idea. At the very least this should
  be deprecated.

p. 11, line 677ff, 

  Section 4.2 LC_IDENTIFICATION
  "All keywords are mandatory unless otherwise noted, and
  the operands are strings."

  This is a *good* example of how the specification should
  work. This should be carried through for all the other
  FDCC-set categories, so that the printf-style format
  specifications can be dropped.

p. 11, line 717

  "i18n:1999" ==> "i18n:2000"

p. 12, line 738,

  the "i18n" LC_IDENTIFICATION category.

  "ISO" does not match the spec for territory, which
  requires this to be a 2-letter ISO 3166 code.

p. 12, line 740ff. 

  Update all 1999's to 2000's.

p. 13, line 795ff.

  The "double increment hexadecimal symbolic ellipses" are
  a clever but still goofy convention. They make the
  tables for LC_CTYPE less mind-numbing, but a better
  strategy is to define all such property categories by
  reference to a specified level of the UnicodeData.txt
  database, and then to define only the deltas from that
  to satisfy the committee on certain categories that might
  differ in usage from that specified in UnicodeData.txt.
  This would be far, far easier to verify and to
  implement than what is currently provided.

p. 14, line 874ff, xdigit

  It is an incredibly bad idea to allow hex digits to
  have script variants. xdigit should always refer to
  0..9, a..f, A..F, period, full stop. Anything else is
  inviting implementation disasters.

p. 15, line 926 map

  The "map" keyword is an unnecessary extension of the concept
  of an FDCC-set. The FDCC-set is not the place that all
  possible relations between characters are defined. The
  concept of an FDCC-set should be restricted to the
  specification of *cultural* practices for formatting
  quantities, names, addresses, and such, which are
  clearly relevant to localization of software. Mixing it
  all up with an architecturally unsound approach to the
  specification of character encodings and character semantics
  is a very, very, bad idea. This should just be removed.

p. 16, line 976ff,

  section 4.3.2 Character string transliteration

  It is no more acceptable to have a bad transliteration
  specification tucked away inside the LC_CTYPE category
  than it was to have it specified as a separate category.
  In fact, it is even less coherent than before, when
  treated as a part of the LC_CTYPE specifications of
  cultural conventions. This section should be removed, as
  it has nothing to do with the specification of cultural
  conventions.

  By the way, if this is included in the Technical Report, it
  is quite likely that it will be widely ignored by those
  implementing transliteration (outside the POSIX community
  at least), and will just make the committee look silly.

p. 18, line 1091.

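  <U3200>..<UFAFF> is not the correct
  range for ideographic characters in UCS, in whatever
  version. U+3200 begins Enclosed CJK Letters and Months, and
  the range also sweeps in the Hangul syllables, the surrogate
  code points, and the Private Use Area.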

p. 19, line 1108ff "i18n" LC_CTYPE category.

  Once again, for the Technical Report to try to define all this
  is an incredible waste of time and error-prone. Now that
  14652 is a Technical Report and not an International Standard,
  the allergy it has previously shown towards referring to
  the industry standard implementation of these properties
  should be shrugged off in favor of a more useful and
  accurate approach.

  We have not tried to locate *every* error in these tables, but
  some examples will show the problem. 

  For the "upper" category: 01A6 should be added; 01C5, 01C8,
  01CB, 01F2 should be removed (those are titlecase, not uppercase);
  all the IPA extensions should be removed (the IPA small-caps
  letters are notionally lowercase, and should not be included
  in the "upper" category); 03D2..03D4 should be added;
  the range <U03E3>..(2)..<U03EF> has the wrong start point --
  it should be <U03E2>, ... and so on.
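  The code points cited above can be checked mechanically. This
  Python sketch compares each claim with the General Category reported
  by the unicodedata module; it assumes these particular properties are
  the same in Python's Unicode version as in the version the TR
  references.

```python
import unicodedata

# Expected General Category for each code point cited in the comment above.
EXPECTED = {
    0x01A6: "Lu",  # LATIN LETTER YR: belongs in "upper"
    0x01C5: "Lt",  # titlecase digraphs: do not belong in "upper"
    0x01C8: "Lt",
    0x01CB: "Lt",
    0x01F2: "Lt",
    0x0280: "Ll",  # LATIN LETTER SMALL CAPITAL R (IPA): lowercase
    0x03D2: "Lu",
    0x03D3: "Lu",
    0x03D4: "Lu",
    0x03E3: "Ll",  # the draft's start point is a small letter
}

mismatches = [f"U+{cp:04X}" for cp, cat in EXPECTED.items()
              if unicodedata.category(chr(cp)) != cat]

# Every other code point from U+03E2 through U+03EE is a capital letter.
coptic_capitals = all(unicodedata.category(chr(cp)) == "Lu"
                      for cp in range(0x03E2, 0x03EF, 2))

print(mismatches, coptic_capitals)   # [] True
```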
  
  On p. 21, line 1315,
  the CJK unified ideographs range starts
  at the wrong point. It should be 4E00, not 4E01.

  On p. 22, line 1369,
  the cntrl range is <U007F>..<U009F>,
  not <U0077>..<U009F>.

  For the "punct" category on p. 22, this departs very strongly
  from the UnicodeData definition of punctuation. "punct" here
  includes currency signs and miscellaneous symbols as well
  as true punctuation. This should be justified or corrected.
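  To make the departure concrete, this Python sketch splits candidate
  "punct" members by the major class of their Unicode General Category:
  P for punctuation, S for symbols (currency signs are Sc, math symbols
  Sm, other symbols So). The sample characters are illustrative and are
  not taken from the draft's table.

```python
import unicodedata

def split_punct(chars):
    """Separate true punctuation (General Category P*) from symbols (S*)."""
    punctuation, symbols, other = [], [], []
    for ch in chars:
        major = unicodedata.category(ch)[0]
        if major == "P":
            punctuation.append(ch)
        elif major == "S":
            symbols.append(ch)
        else:
            other.append(ch)
    return punctuation, symbols, other

print(split_punct("!,$+\u00a9"))   # (['!', ','], ['$', '+', '©'], [])
```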

LC_CTYPE category (continued)

  By the way, the consistent use of UCS designations to refer
  to characters in the "i18n" LC_CTYPE specification puts the
  lie to the usefulness and requirement for 14652 to define
  and make use of the bizarre repertoiremap of section 6 of
  14652 (p. 62 ff). Consistent use of ASCII characters as
  themselves, a judicious use of a few other symbolic names
  for the few other characters explicitly referred to in
  ways where that would help (e.g. for examples for
  collation, etc.) and use of UCS symbolic names for everything
  else would be far, far preferable for this Technical Report,
  to the way it currently stands.

p. 28, line 1848 

  "This ordering is used by regular expressions...
  also as the collation weight to be used in sorting."

  The intent of this sentence is unclear. What is the
  effect on pattern matching if collation weights *are*
  explicitly specified? This item must be clarified.

p. 29, line 1888 

  "This value is elsewhere referred [to] as
  the COLL_WEIGHT_MAX limit." (also on p. 33, line 2081)

  Where is this elsewhere? Either specify the elsewhere, or
  drop this non sequitur.

p. 31, line 1964 

  "lower than the coded character set value"

  Since this is talking about a *symbolic* ellipsis here,
  shouldn't this be lower in the sequence of symbolic names,
  rather than lower in the *coded character set value*?

p. 31 ff., general

  All this specification of *how* to assign weights and deal
  with the IGNOREs, and so on, should be left to 14651, and
  acquired in 14652 by reference. This discussion here is
  just inviting inconsistencies in implementation. The appropriate
  scope for 14652 is to specify the additions to the LC_COLLATE
  syntax over and above 14651, i.e. specification of limits
  on numbers of levels, introduction of the symbol-equivalence
  keyword, and use of the copy keyword.

p. 32, line 2014. 

  "A <comment_character> occurring where
  the delimiter ";" may occur, terminates the collating
  statement."

  This is a *good* example of the way the entire text of the TR
  should be written. This is descriptive, and does not resort
  to the prescriptive "shall" when describing a format.

p. 34, line 2121 Note.

  This claim is not generally true. It depends on what type of
  decomposition is done. This claim is for canonical decomposition.
  Clarify the text if this note is to remain.

p. 35, line 2203  F and B, and B and P

  The way these "and"s are used here, this claim is very difficult
  to parse and understand. This should be broken out to explicit
  single statements about what terms are mutually exclusive.

p. 36, line 2226

  "...and shall be present in the source FDCC-set
  copied via the "copy" keyword."

  This is not required by 14651, and should not be, because it
  restricts tailorings unnecessarily. The point is that a
  <collating-symbol> has to be defined before it is referred
  to by a reorder-after statement, but there is no reason why
  that definition has to come from a particular FDCC-set copied
  via the "copy" keyword. If this is POSIX-specific stuff, once
  again, the implementation details should be elsewhere, and not
  in the neutral specification of FDCC-sets for cultural
  conventions.

p. 37, line 2293, section 4.4.12.1 section reordering statements

  This syntax extends the syntax of 14651 for this statement type.
  That should be explicitly pointed out and explained, since it
  is not going to be the expectation of those using the TR.

p. 38 ff., "i18n" LC_COLLATE category

  The long list of collating symbols is not needed here. This
  completely duplicates the list present in the 14651 common
  tailorable template, but with the introduction of just
  four collating-symbols: <BLANK>, <CAPITAL-SMALL>, <SMALL-CAPITAL>,
  and <BOTH>. The introduction of <BLANK> is not necessary.
  That is the result of an error in the 14651 table for two
  entries, where "<BLANK>" should be corrected to "<BASE>".
  The other three are simply unexplained additions by the
  editor. And even *if* they were needed for some further
  purpose, the correct specification for 14652 is simply to
  specify the 3 additional collating-symbols, along with
  any desired symbol-equivalences, rather than duplicating
  the rest of the list in 14652.

p. 53, line 3277 ff. "i18n" LC_MESSAGES

  Why are "[+1]" and "[-0]" suggested as the default
  definition for yes and no in the "i18n" LC_MESSAGES category?
  "[1]" and "[0]" might make some sense, though "y" and "n"
  would be better, given the status of English as the
  international language. But at the very least, the
  minus sign on the zero makes no sense and should be
  removed.
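  Assuming these strings are used like POSIX yesexpr and noexpr, i.e.
  as regular expressions matched at the start of the user's reply, a
  quick Python check shows what the defaults accept. A "-" at the start
  of a bracket expression is literal, so "[-0]" accepts a bare minus
  sign as "no". Python's re module stands in for POSIX regular
  expressions here; for bracket expressions this simple the two agree.

```python
import re

def accepts(expr: str, reply: str) -> bool:
    """True if the start of the reply is matched by expr."""
    return re.match(expr, reply) is not None

for reply in ["+", "1", "-", "0", "y", "n"]:
    print(repr(reply), "yes" if accepts("[+1]", reply) else
          "no" if accepts("[-0]", reply) else "neither")
```

  Note that neither "y" nor "n" is recognized by these defaults.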

p. 53, line 3237, Section 4.9 LC_PAPER

  Paper size conventions certainly are cultural conventions,
  but there is a mismatch between what the requirements should
  be for specifying these cultural conventions and what is
  presented here.

  For the U.S., for example, paper size conventions are
  "letter" (8-1/2"x11") and "legal" (8-1/2"x14"), whereas
  for the U.K., normal printing paper is A4 (210mm x 297mm).
  In addition to the metric series defined in DIN 66008,
  there may be other local traditional sizes not defined in
  terms of the metric system, as for example, traditional
  sizes used in book publishing.

  A specification for cultural conventions should allow the
  expression of all this, and not attempt to reduce everything
  to one height and one width expressed in millimeters (which
  won't even be accurate for U.S. paper sizes, by the way).
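  The arithmetic behind that last remark, as a small Python sketch
  using exact fractions (one inch is exactly 25.4 mm):

```python
from fractions import Fraction

MM_PER_INCH = Fraction(254, 10)  # exact by definition

def inches_to_mm(inches):
    return Fraction(inches) * MM_PER_INCH

sizes = {"letter": (Fraction(17, 2), 11),   # 8-1/2" x 11"
         "legal": (Fraction(17, 2), 14)}    # 8-1/2" x 14"

for name, (width, height) in sizes.items():
    w, h = inches_to_mm(width), inches_to_mm(height)
    # exact size in mm, then the nearest whole-millimeter values
    print(name, float(w), float(h), round(w), round(h))
```

  Neither dimension of either size is a whole number of millimeters
  (215.9 x 279.4 and 215.9 x 355.6), so any single integer height and
  width in millimeters is off by 0.1 to 0.4 mm.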

  This FDCC-set category seems to be, instead,
  another implementation-driven category that could be used
  transiently to convey some "locale-based" information to
  a printing process through some API. Effectively, the
  FDCC-set is being conceived of as one gigantic C struct
  for storing anything an application might ever conceivably
  need for cultural adaptability for access to an API, rather than
  as a "specification for cultural conventions", which is
  nominally what this Technical Report is about.

p. 54, line 3262, Section LC_NAME

  This entire specification is very weak and basically useless.
  The list of keywords provided is Western-centric.

  This is a good example of a category that should be
  reorganized to *first* provide a discursive listing of
  name formatting and salutation conventions in different
  cultures, before trying to invent a syntax to make it
  machine-readable. Otherwise this is "standards-fishing"--
  promulgating a standard syntax in the hope that it may
  be sufficient and implementable, but in the absence of
  evidence of use or enough data to demonstrate its appropriateness
  to the field of application.

p. 54, line 3298 %m Middle names

  It is unclear whether this is intended to be one string
  associated with potentially multiple names. If so, this
  should be clarified. Also, what is the relationship between
  the potentially multiple names of %m and the apparent
  single initial letter of %M "Middle initial"?

p. 54, line 3200 %p Profession.

  This is also unclear as to intent. Does this, combined
  with the "i18n" LC_NAME category value for name_fmt
  imply that "Lawyer John C. Smith" and
  "Assistant Manager at McDonald's James T. Peabody" are
  valid formatted names?

p. 54, line 3274 name_gen

  The example given for the use of Japanese "-sama" as a
  salutation is incorrect. While it is true that "-sama" is
  not gender-specific, it is not appropriate "for all
  persons", and certainly not in all contexts. Furthermore,
  it is an honorific, and not a salutation, since it cannot
  be used independently of a name.

p. 55, line 3305 

  "The va[lu]e may be stored in the database
  with the person information."

  What database? Once again, this specification is referring
  to some unclear context outside the scope of the document,
  with the implication that this is designed for some particular
  implementation, rather than standing independently as a
  specification. Either delete the reference to "the database"
  or make it clear in this document what is intended by it.

p. 55, line 3318 name_fmt specification

  This is essentially just a mask definition for a format.
  It would be better stated explicitly as a mask, rather than as a
  printf style format string. And there is nothing gained
  by use of the symbolic name "<p>", rather than just "p";
  this just clutters the specification and makes it hard
  to read.

  The same comment applies to the format masks for
  LC_ADDRESS postal_fmt (p. 56, line 3400), and the
  LC_TELEPHONE tel_int_fmt (p. 57, line 3443).

p. 55, section 4.11 LC_ADDRESS

  There is no justification for why the LC_ADDRESS category,
  which should be focussed on specification of cultural
  conventions for addresses, is cluttered up with a bunch
  of keywords related to ISO 3166 country codes, motor
  vehicle country codes, ISO 2108 ISBN codes, and ISO 639
  language codes. Once again this looks like a case of
  tossing in the kitchen sink, rather than designing the
  category appropriate to its usage. All of these superfluous
  keywords should be removed from the category. If an
  FDCC-set needs to specify any of this information, it
  should be elsewhere, and not just tossed into LC_ADDRESS
  for no apparent reason.

  The Rationale for this category in Annex B does not
  discuss this, either.

p. 57, section 5. CHARMAP

  As we have indicated before, specification of a CHARMAP
  is completely out of scope for cultural conventions.
  This is just a bad design that continues the way POSIX
  deals with implementation of character set encodings.
  There is no good reason for the 14652 Technical Report
  on the specification method for cultural conventions to
  be continuing and expanding on this bad design. It
  should be removed. (See further comments on the Annex B
  Rationale below.)

p. 58, line 3502, <escseq>

  Even with the example now provided on p. 61, this
  explanation for <escseq> is still almost incomprehensible.
  The same applies to the discussion of operands for
  the <include> keyword on p. 59, which also refer to
  "the g-set or c-set to be defined" and the "range
  of characters in the referenced charmap". This is
  another reason why this entire section should be dropped
  from the TR.

p. 60, line 3593 ff.,

  "The encoding part shall be expressed as..."

  Even if this is specified for upward compatibility with
  existing POSIX practice, it is ridiculous in this day
  and age to encourage the use of octal or decimal
  in the specification of character encoding values.
  Hexadecimal is clearly the radix of choice, and should
  be encouraged by the technical report. The others should
  be deprecated.
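  A simplified Python sketch shows that the extra radices only add
  parsing work. It assumes "\" is the escape character and that each
  byte is written as \d plus two or three decimal digits, \x plus two
  hexadecimal digits, or two or three octal digits, as in the POSIX
  charmap format. All three spellings of "A" below yield the same byte.

```python
import re

# One encoded byte: \dNNN (decimal), \xHH (hexadecimal), or \NNN (octal).
_BYTE = re.compile(r"\\(?:d([0-9]{2,3})|x([0-9A-Fa-f]{2})|([0-7]{2,3}))")

def parse_encoding(text: str) -> bytes:
    """Convert a charmap-style encoding string into the bytes it denotes."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = _BYTE.match(text, pos)
        if m is None:
            raise ValueError(f"bad encoding at offset {pos}: {text!r}")
        dec, hexa, octal = m.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        else:
            value = int(octal, 8)
        if value > 0xFF:
            raise ValueError(f"byte value out of range: {m.group(0)!r}")
        out.append(value)
        pos = m.end()
    return bytes(out)

print(parse_encoding(r"\x41"), parse_encoding(r"\d65"), parse_encoding(r"\101"))
# b'A' b'A' b'A'
```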

p. 61, line 3625 ff.,

  "Example of using ISO 2022 techniques"

  These examples do now provide enough information to be
  able to make a guess at how the CHARMAP keyword "<escseq>"
  is intended to be used. But the examples also illustrate
  the reason why this entire CHARMAP section should be
  removed from the Technical Report. This is an implementation
  format for extension of the POSIX architecture for
  character set encoding definitions, rather than any
  information useful for the specification of cultural
  conventions. Anyone wanting to understand the structure
  of these character encodings would be far better off going
  to the easily available (and far more complete and accurate)
  book by Ken Lunde on CJKV Information Processing. And as
  for cultural conventions, this section of CHARMAP (as well
  as the subsequent one on REPERTOIREMAP) just gets in the
  way and confuses the issues.

p. 61, lines 3661, 3662.

  If the CHARMAP section and examples stay in, at the very
  least, the example characters used should be legal,
  assigned characters. The comment says "the character
  codes are only examples", but in fact <U0365> and <U0744>
  are not valid, assigned characters in the version of
  10646-1 normatively referred to by the Technical Report.
  At the least, make the effort to use assigned characters.

p. 62, Section 6 REPERTOIREMAP

  Once again, we object to this ridiculous REPERTOIREMAP,
  which does not belong in 14652. It should be removed.

  The justification on p. 63 keeps getting longer, but is
  no more convincing than before. The *concept* of a
  repertoiremap is prior art, but the wholesale extension
  of that prior art to include thousands of the editor's
  whole cloth inventions can hardly be characterized as
  prior art. It is simply an example of dogged persistence.

  At most, perhaps 100 or so of these symbols beyond
  the range of Latin-1 characters have any usefulness.
  The rest are just the result of a mediocre concept
  extended at least two standard deviations beyond the
  range of reasonableness. Why would anyone want to
  use the symbol "<W*;?J>" instead of "<U1FAF>" to
  refer to U+1FAF GREEK CAPITAL LETTER OMEGA WITH DASIA AND
  PERISPOMENI AND PROSGEGRAMMENI? Or "<_./>//>" instead
  of "<U25E2>" for U+25E2 BLACK LOWER RIGHT TRIANGLE?
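  By contrast, the UCS-based symbols are mechanical in both
  directions. A small Python sketch (helper names hypothetical)
  converts between a character and its <Uxxxx> name and looks up the
  character name with the standard unicodedata module:

```python
import unicodedata

def ucs_symbol(ch: str) -> str:
    """Character -> symbolic name of the form <Uxxxx> (at least 4 hex digits)."""
    return f"<U{ord(ch):04X}>"

def from_ucs_symbol(symbol: str) -> str:
    """Symbolic name <Uxxxx> -> character."""
    if not (symbol.startswith("<U") and symbol.endswith(">")):
        raise ValueError(f"not a UCS symbolic name: {symbol!r}")
    return chr(int(symbol[2:-1], 16))

for sym in ("<U1FAF>", "<U25E2>"):
    ch = from_ucs_symbol(sym)
    print(sym, ucs_symbol(ch), unicodedata.name(ch))
```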

  The silliness of this REPERTOIREMAP, which takes up nearly
  one quarter of a 114-page Technical Report, is illustrated
  by its incompleteness. After inventing 2318 of these
  symbols, the editor apparently ran out of gas, and didn't
  extend them to cover Georgian, Thai, Lao, or any of the
  Indic scripts. He included symbols for compatibility
  Arabic positional shape characters, but not symbols for
  compatibility fullwidth ASCII or halfwidth katakana
  characters. Why? And it is completely unclear how this
  mechanism would be extended, except by more arbitrary
  and nearly random string assignments, to cover the
  repertoire of Unicode 3.0 (with 1165 Yi syllables and
  630 Canadian Aboriginal Syllabics syllables, just to name
  the most problematical).

p. 64, line 3823 - 3848

  The introduction of these "<a8>" .. "<z8>" symbols
  here is problematical. First of all, these are intended
  for use with LC_COLLATE for tailoring of the table from
  14651, but that is not explained here or anywhere.

  Second, the particular assignments of these symbol
  weights to particular Unicode characters are invalid
  in general. There is nothing that ensures that U+0252
  is going to be the "last A" in the common tailorable
  template table. This could be affected either by a
  revision of that table or the addition of new characters
  to 10646 that get incorporated into the table. It is
  inadvisable in any case to tie a hack for tailoring
  Latin characters to particular Unicode values. The
  proper way to do this is to introduce "<a8>", etc.
  as collating-symbols in LC_COLLATE and then introduce
  the appropriate tailoring with reorder-after statements
  based on a particular version of the 14651 table.

p. 92 Annex B (informative) Rationale

  The entire Technical Report now is informative. Restructuring
  the document as a Technical Report should move most of this
  material into the main body of the text. There is no
  rhetorical reason why it should be separated off in
  an annex, since it provides explanatory material that
  is often needed at the point where concepts are introduced.

  In particular, the LC_MONETARY Rationale (B.1.4) is a
  better start toward what the Technical Report *should*
  contain about monetary formatting than the definition
  of the LC_MONETARY category itself.

p. 92, line 6183 

  "an ISO/IEC 10646 system that has
  defined 16-bit bytes may..."

  There is no reason for the document to be obtuse about
  this. No Unicode (or 10646) system implements 16-bit
  characters by defining 16-bit bytes. "Byte" in the
  industry has an unshakeable meaning of an 8-bit quantity.
  It is only character standards diehards who keep insisting
  that "byte" is variable width and refers to the width
  in bits that a character is encoded in. The entire
  industry has chosen the other route, and measures data
  in bytes, which is a fixed-size quantity. The days when
  bytes did differ in size on different machine architectures
  are long gone.

p. 93, Section B.1.3 LC_COLLATE Rationale

  Most of this discussion is out of scope. It is arguing
  about 14651, rather than providing the particulars for
  the rationale for the LC_COLLATE category.

p. 94, line 6286 

  "The syntax for the LC_COLLATE
  category source is the result of a cooperative effort
  between representatives for many countries and organizations
  working with international issues, such as UniForum, X/Open,
  and ISO, ..."

  If this rationale is to contain all the general discussion
  about collation weighting and contents relevant to 14651,
  rather than just the discussion of the specific keywords
  for the LC_COLLATE category in the FDCC-set, then why
  is the Unicode Consortium not on that list? Either add
  it or remove all the out-of-scope discussion of 14651-
  related issues from this rationale.

p. 95, line 6315

  "It is estimated that the Technical
  Report covers the requirements for all European languages,
  and no particular problems are anticipated for Cyrillic
  or Middle Eastern scripts."

  First of all, it is not PDTR 14652 that covers these
  requirements, but FCD3 14651 that does. PDTR 14652 simply
  provides the metasyntactic shell to incorporate the
  14651 framework inside the LC_COLLATE category.

  Secondly, 14651 *does* cover Cyrillic, Arabic, and Hebrew,
  as well as the rest of the scripts included in Unicode 2.0,
  so there is no reason to make this imprecise statement
  about anticipations.

p. 98, Section B.1.3.3

  Sample FDCC-set specification for Danish.

  line 6485: The symbol "<SPECIAL>" is not defined.

  In general, the introduction of the particular symbols
  needed for this Danish collation specification should be
  done *here*, and not in the "i18n" LC_COLLATE category
  definition on p. 38 and in the "i18nrep" REPERTOIREMAP
  on p. 64.

p. 103, line 6753.

  "It is expected that National Standards
  Bodies will provide specifications."

  This seems a rather forlorn hope, given a bad format
  metasyntax in the first place.

  And why insert a plaintive cry for other NBs to do the
  work, instead of just digging into the IBM Green Book
  discussion of Date Format as a starting point?

p. 103, line 6762.

  "The internationalization working
  group is developing an interface..."

  Who? What internationalization working group, affiliated
  with whom? This is not made clear in the document.

p. 104, Section B.2 Character Set Rationale

  This rationale is completely unconvincing as a rationale
  for including the CHARMAP mechanism in the Technical
  Report 14652.

  In particular, the claim that the "charmap was introduced
  to resolve problems with the portability of, especially,
  FDCC-set sources" should have been addressed simply by
  using 10646 (i.e. Unicode) as the *reference* character
  set for the Technical Report. This is the obvious solution,
  as taken by the HTML and XML standards. This approach allows for
  a source document to be represented in any character encoding,
  but it is interpreted *as if* it were converted to the
  reference character set, i.e., the UCS. In fact, most
  implementations will *actually* convert the source document
  to Unicode, to simplify their parsing/lexing engines.
  The same approach should now be taken towards all such
  specifications, including that of 14652, now that the UCS
  is a reality. Continuing to fob off this old model of
  "source portability" from POSIX on new standards and
  technical reports does a disservice to those who are trying
  to understand and implement them. Instead of assisting in
  source portability, it really only succeeds again and again
  in unnecessarily importing all the complexities of legacy
  character encodings into standards and technical reports
  that have nothing to do with the details of character
  encoding.
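  A minimal Python sketch of the reference-character-set model: the
  same (hypothetical) FDCC-set source line is stored in two different
  encodings, and once each is decoded to the UCS the parser sees
  identical text, so no charmap is needed to interpret it. How a
  processor learns which encoding a given file uses is left open here.

```python
# Hypothetical source line containing non-ASCII characters.
line = "% Danish letters: \u00e6 \u00f8 \u00e5"

latin1_bytes = line.encode("iso-8859-1")
utf8_bytes = line.encode("utf-8")

def to_reference(source: bytes, encoding: str) -> str:
    """Interpret source bytes as UCS text; later processing sees only this."""
    return source.decode(encoding)

print(latin1_bytes == utf8_bytes)                      # False: different bytes
print(to_reference(latin1_bytes, "iso-8859-1")
      == to_reference(utf8_bytes, "utf-8"))            # True: same characters
```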

p. 105, Section B.3 Repertoiremap Rationale

  "The repertoiremap was introduced to make FDCC-sets
  independent of the availability of charmaps."

  Once again, this entire argument would be obviated by making
  the UCS the reference character set for the document. All
  the complexity of CHARMAP and REPERTOIREMAP would be
  pushed off to where it belongs, in POSIX implementations
  of the technical report, rather than in the technical report
  itself.


Technical Comments re Section 4.5 LC_MONETARY

We consider the specification in section 4.5 to illustrate the
nature of the technical problems endemic to the TR's entire
approach to the specification of cultural conventions. The
metasyntax provided for LC_MONETARY is designed to be an
extension of existing POSIX implementations of currency formatting.
However, the extensions:

A. Are over-elaborate in terms of particular parameter specifications.

B. Are insufficiently precise to be well-defined or to avoid undecidable
   cases of conflicting parameter specifications.

C. Result in specifications of cultural conventions for monetary
   formatting that are incomprehensible to the human reader, and
   are instead designed to facilitate a programmer's task of
   parsing out parameters and setting a number of boolean settings
   in a localedef implementation.

D. Have a "data normalization" problem forced by insisting that
   a single FDCC-set must have *two* monetary formats specifiable--
   one for the "local" currency and one for the "international"
   currency. This is the Unix euro hack. It seriously complicates
   specification of cultural conventions by pushing the euro
   problem onto the procrustean bed of the single locale. This
   is another sign of implementation considerations driving a
   complex and unclear specification, rather than considerations
   of clear and simple exposition driving the specification.

We consider the third problem to be particularly egregious, since it
means that registered FDCC-sets will be effectively incomprehensible
in the registry and be unmaintainable by visual inspection. This is
likely to result in duplicate, overlapping, and/or mistaken registrations,
where the formatting intent of the human trying to produce these
FDCC-set definitions may not match the behavior of the implementations
using them.

The problems can be illustrated by examining the "i18n" FDCC-set
definition proposed for LC_MONETARY, and then by giving another
example where the mind flows freely, posing an outlandish case
to see what the metasyntax allows (and thereby what it *requires* of
implementations).

LC_MONETARY
% This is the 14652 i18n fdcc-set definition for
% the LC_MONETARY category.
%
int_curr_symbol     ""
currency_symbol     ""
mon_decimal_point   "<,>"
mon_thousands_sep   ""
mon_grouping        -1
positive_sign       ""
negative_sign       ""
int_frac_digits     -1
frac_digits         -1
p_cs_precedes       -1
p_sep_by_space      -1
n_cs_precedes       -1
n_sep_by_space      -1
p_sign_posn         -1
n_sign_posn         -1
%
END LC_MONETARY

O.k., but what does this mean? It apparently is an attempt to provide
a vanilla default that effectively does nothing except specify that
"," is the decimal separator. But rather than being conceived in
terms of a visual international currency format that would make any
sense whatsoever, it is conceived in terms of the implementation of
localedef, with values to match variable initializations in that
program (null strings, unused values undefined). But if you think in
terms of actual formatting recommendations, this implies that
the currency number 1000 would format as:

   "1000"

(or possibly "1000,0" or "1000,00" or ..., since frac_digits is not
specified!)

But the negative value for a currency number -1000 would *also* format as:

   "1000"

(or possibly "1000,0" or "1000,00" or ...)

We don't think it was the intention of the editor of 14652 to recommend
that positive and negative currency values be formatted exactly the
same, but that is the implication of this LC_MONETARY definition. So
an unreasonable default has crept in, simply because thinking in terms
of program parameter initialization for setting values in a cryptic
metasyntax does not lend itself to specification of real formats that
would make sense as defaults or recommendations.
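A minimal Python sketch makes the point. It is a hypothetical
formatter, not the TR's algorithm: it places the sign before the
quantity, omits the currency symbol and grouping, and assumes that -1
for frac_digits means "unspecified" and prints no fractional part.
With the empty positive_sign and negative_sign of the "i18n"
definition, 1000 and -1000 come out identical.

```python
def format_monetary(value: int, positive_sign: str, negative_sign: str,
                    decimal_point: str, frac_digits: int) -> str:
    """Hypothetical formatter for whole currency units: sign, digits, and an
    optional zero fraction. frac_digits == -1 is treated as 'unspecified'."""
    sign = positive_sign if value >= 0 else negative_sign
    text = str(abs(value))
    if frac_digits > 0:
        text += decimal_point + "0" * frac_digits
    return sign + text

i18n = dict(positive_sign="", negative_sign="", decimal_point=",", frac_digits=-1)
print(format_monetary(1000, **i18n), format_monetary(-1000, **i18n))  # 1000 1000
```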

By the way, there is a definite technical problem which *must* be
addressed to even make the specification comprehensible. The meaning
of -1 as an argument value for mon_grouping, int_frac_digits, frac_digits,
p_cs_precedes, and so on through n_sign_posn, is not defined in the TR. So
as it stands, the LC_MONETARY definition is anomalous. It doesn't mean
anything to have minus one fractional digits to the right of a
decimal separator, for example.

Now for the overpermissiveness and imprecision of the metasyntax
suggested. Let's consider the case of Lower Slobovia. Lower Slobovia
used the tugrik as its local currency until the end of 1998, but
had an unplanned currency reform starting earlier this year. Things
being rather chaotic in Lower Slobovia, the currency reform didn't
complete until 1999-09-16, when Lower Slobovia officially declared the
slobovik as its currency and established the conversion rate of
1 slobovik for 9 tugrik, suggested by the court astrologer because
the King of Slobovia has 9 daughters. Now since Lower Slobovia
derives most of its income by rather shady money-laundering, they
have adopted the USD as their international money formatting
convention. This tends to obscure the conversions back and forth
from dollars to sloboviks (or tugriks). Lower Slobovia has chosen
to register the following LC_MONETARY specification to reflect
their local cultural conventions:

LC_MONETARY
% This is the official Lower Slobovian fdcc-set definition for
% the LC_MONETARY category.
%
valid_from          ;"19990916"
valid_to            "19981231";
conversion_rate     1;9
int_curr_symbol     "USD$";"USD!"
currency_symbol     "<tugrik>";"<slobovik>"
% For <tugrik> substitute U+20AE.
% For <slobovik> substitute U+2445 (chosen to honor the
%    King of Lower Slobovia's sartorial style)
mon_decimal_point   "?"
mon_thousands_sep   "6"
% The use of "6" as a thousands separator was dictated by
% the king's astrologer for its felicitous sound, but has
% proven quite profitable when foreign computers unfamiliar
% with our conventions misinterpret it as a digit when
% parsing out tugriks and sloboviks.
mon_grouping        1;2;3;2;1
positive_sign       "-"
negative_sign       "+"
int_frac_digits     6;1
frac_digits         4;9
p_cs_precedes       0
p_sep_by_space      2
n_cs_precedes       0;1
n_sep_by_space      0;1
int_p_cs_precedes   1;0
int_p_sep_by_space  2;0
int_n_cs_precedes   0;1
int_n_sep_by_space  0;2
p_sign_posn         0;1
n_sign_posn         2;4
int_p_sign_posn     1;3
int_n_sign_posn     3
%
END LC_MONETARY

The implication for valid_from and valid_to is that the tugrik was valid
from "the beginning of time" to 19981231, and that the slobovik is valid
from 19990916 to "the end of time". (The metasyntax for valid_from and
valid_to is fuzzy on this point, so this is just a stab at what it might
mean to make use of those implicit infinities when defining more than one
currency in the LC_MONETARY specification.) What an implementation will do
when faced with a currency formatting for a date in between is unclear --
but perhaps that is o.k., since Lower Slobovia was in a state of financial
chaos at the time, anyway.
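One possible reading, sketched in Python under the assumption that an
empty valid_from or valid_to means an open-ended bound: look up the
currency in force on a given date. A date between 1998-12-31 and
1999-09-16 matches neither currency, which is exactly the case the
metasyntax leaves undefined. The table and function are hypothetical.

```python
from datetime import date
from typing import Optional

# (currency, valid_from, valid_to); None stands for an empty value in the source.
CURRENCIES = [
    ("tugrik", None, date(1998, 12, 31)),
    ("slobovik", date(1999, 9, 16), None),
]

def currency_on(day: date) -> Optional[str]:
    """Return the currency valid on `day`, or None if there is none."""
    for name, start, end in CURRENCIES:
        if (start is None or start <= day) and (end is None or day <= end):
            return name
    return None

print(currency_on(date(1998, 6, 1)))    # tugrik
print(currency_on(date(1999, 3, 1)))    # None: the undefined gap
print(currency_on(date(1999, 10, 14)))  # slobovik
```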

O.k., now let's see what happens to the currency values when we
use this definition to describe a money-laundering operation that
had a credit of a little over 2 billion tugrik last year (2,000,600,609
to be exact) and then had to enter a negative value in the books for
the same amount today. When you work it all out, the fully formatted
currency strings come out to be:

Positive amount in 1998:

Local format:     "(260600660066069?0000 <tugrik>)"

Internat format:  "- USD$ 260600660066069?0"

Negative amount today:

Local format:     "<slobovik>+ 2622628869566?555555556"

Internat format:  "2622628869566?6+!USD"
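The integer digit strings above can be reproduced mechanically. This
Python sketch applies mon_grouping 1;2;3;2;1 from the right, reusing
the last group size for any remaining digits, as C's localeconv()
grouping does when the grouping string simply ends (an assumption about
the TR's intent), with "6" as the separator. Signs, symbols, and
fractions are left out.

```python
from typing import Sequence

def group_digits(digits: str, grouping: Sequence[int], sep: str) -> str:
    """Insert sep between digit groups sized from the right by `grouping`;
    once the list runs out, the last size is reused for remaining digits."""
    if not grouping:
        return digits
    groups = []
    sizes = iter(grouping)
    size = None
    rest = digits
    while rest:
        size = next(sizes, size)     # keep the previous size when exhausted
        groups.append(rest[-size:])
        rest = rest[:-size]
    return sep.join(reversed(groups))

tugrik = 2000600609
print(group_digits(str(tugrik), [1, 2, 3, 2, 1], "6"))       # 260600660066069
print(group_digits(str(tugrik // 9), [1, 2, 3, 2, 1], "6"))  # 2622628869566
```

Read naively as a number, the first result is 260,600,660,066,069
rather than 2,000,600,609, which is precisely the misreading the king's
astrologer was counting on.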

Is this absurd? Well, of course. But it is *allowed* by the current
specification. That means, in principle, that implementations of this
specification have to be able to produce garbage like this without
choking on various bizarre combinations of parameters. And legions
of programmers are going to be scratching their heads over such things
as what to do with the fourth character of the int_curr_symbol which
"shall be the character used to separate the international currency
symbol from the monetary quantity" when int_p_cs_precedes specifies
that the currency sign goes to the *right* of the monetary quantity
and int_p_sep_by_space specifies that a *space* is used for
separation. Hmm.

To avoid this kind of nonsense, a usable specification should be
constraining the allowable values for parameters and not allowing any
possible combination in the forlorn hope of not offending the Lower
Slobovians when they finally come to register their cultural conventions
and find that their system wasn't accounted for by some constraint on
allowed values.

It would be far, far better to specify in detail the *actual*
cultural conventions for currency formatting, with a clear method
for *describing* those conventions in detail. The IBM Green Book
(August 1994, pp. 11 - 19) does so already in great detail in a far
more useful format for implementers. The actual number of combinations
in use is far, far less than an unconstrained metasyntax such as
that of section 4.5 of PDTR 14652 allows. A better approach would
be to simply define a hierarchy of:

   A. Unsigned positive numeric formats.

   B. Positive and negative signed numeric formats.

   C. Currency sign placements in A. and B.

   D. A list of known local currency signs with their country
      association and relation to official international banking
      signs.

From that, any competent software designer can create a mechanism
for formatting and parsing currency strings that adapts to
local cultural conventions. And it can be related to the TR's
specification of values for A, B, C, and D above, so that
an implementation of A:2;B:3;C:1;D:Pts can be matched with another
implementation of the same format.
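Here is a minimal Python sketch of how such a descriptor might be handled. It assumes the semicolon-separated `A:2;B:3;C:1;D:Pts` notation used above, which is only an illustration with no defined syntax. The function name `parse_format_descriptor` is hypothetical. Parsing into a mapping lets two implementations compare formats regardless of the order in which the fields are written.

```python
REQUIRED_FIELDS = frozenset("ABCD")


def parse_format_descriptor(descriptor: str) -> dict[str, str]:
    """Parse a hypothetical 'A:2;B:3;C:1;D:Pts' currency-format descriptor."""
    fields = {}
    for item in descriptor.split(";"):
        key, sep, value = item.partition(":")
        key, value = key.strip(), value.strip()
        # Each field must look like '<A|B|C|D>:<non-empty value>'.
        if not sep or key not in REQUIRED_FIELDS or not value:
            raise ValueError(f"malformed field: {item!r}")
        if key in fields:
            raise ValueError(f"duplicate field: {key}")
        fields[key] = value
    missing = REQUIRED_FIELDS - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return fields


# Two spellings of the same format compare equal once parsed.
print(parse_format_descriptor("A:2;B:3;C:1;D:Pts")
      == parse_format_descriptor("D:Pts;C:1;B:3;A:2"))
```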

But PDTR 14652 is attempting something quite different: it is
attempting to *standardize* the Son-of-POSIX specification for
LC_MONETARY to support portable implementations of localedef for
Linux.

It is our opinion that the two goals, clear and effective
exposition of cultural conventions on the one hand and extension of
POSIX mechanisms for portable implementations on UNIX systems on the
other, are effectively at odds, and that the technical quality of
PDTR 14652 is suffering from the mixing of incompatible goals.



Minor Technical Comments re Section 4.5 LC_MONETARY

p. 41, line 2616 and line 2621.
The definitions of "the beginning
of time" and "the end of time" are not clear here. As a cultural
convention, these should just be UNSPECIFIED. It is then up
to implementations to decide what to do about that. A reasonable
option, of course, is to equate UNSPECIFIED time with either the
first possible or last possible "time" value in a machine
implementation of time, but the actual dates associated with those
values are platform-specific.
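The following is a minimal Python sketch of the option just described. It assumes a validity period whose start or end may be left UNSPECIFIED, represented here as `None`. The function name `in_effect` is hypothetical. The sketch maps an unspecified bound to the earliest or latest date its own date type can represent. In Python those are `date.min` (year 1) and `date.max` (year 9999). Other platforms will pick different values, which is why the concrete dates belong to the implementation rather than to the cultural convention.

```python
from datetime import date
from typing import Optional


def in_effect(start: Optional[date], end: Optional[date], when: date) -> bool:
    """True if `when` lies within [start, end]; None means UNSPECIFIED."""
    # Platform-specific choice: an unspecified bound becomes the earliest
    # or latest date this implementation's date type can represent.
    lower = start if start is not None else date.min
    upper = end if end is not None else date.max
    return lower <= when <= upper


print(in_effect(date(1999, 1, 1), None, date(1999, 10, 14)))  # True
```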

p. 41, line 2622ff.
The convention of using two integers
in a specification of a conversion rate also seems implementation-driven
rather than designed for clarity. If a conversion rate is set
at 1.86301 (not at all unusual as a possibility), why should
a cultural specification not be able to use "1.86301" to express that,
rather than "186301;100000"? This has to do with avoiding
floating-point parsing complications on UNIX rather than with clarity
of specification.
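The decimal notation loses nothing, because it maps exactly onto the two-integer form. The following minimal Python sketch converts a string such as "1.86301" into that form using integer arithmetic only, with no floating-point parsing. The function name `rate_to_pair` is hypothetical. It accepts only unsigned decimal strings and returns the numerator and denominator without reducing them.

```python
def rate_to_pair(text: str) -> tuple[int, int]:
    """Convert an unsigned decimal string such as '1.86301' to (186301, 100000)."""
    whole, _, frac = text.strip().partition(".")
    digits = set("0123456789")
    # Reject empty input, extra decimal points, signs and non-ASCII digits.
    if not (whole or frac) or not set(whole + frac) <= digits:
        raise ValueError(f"not an unsigned decimal number: {text!r}")
    # Moving the decimal point: all digits form the numerator,
    # and the denominator is 10 to the power of the fraction length.
    return int(whole + frac), 10 ** len(frac)


print(rate_to_pair("1.86301"))  # (186301, 100000)
```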


Editorial Comments

1. p. iv, line 84 cultural ==> culturally

2. p. iv, line 104 become ==> becomes

3. p. iv, line 116 backwards ==> upward (cf. usage on p. 1, line 137)

4. p. iv, line 117 particulary ==> particularly

5. p. 2, line 221 Change final quotation mark to "."

6. p. 4, line 290 represent ==> represents

7. p. 10, line 652 indicate ==> indicates

8. p. 20, line 1167 does ==> do

9. p. 29, line 1883 reorder-sections-after ==> reorder-section-after
          line 1884 reorder-sections-end   ==> reorder-section-end

  (This mistake is made many times on subsequent pages. The entire
   document should be searched and all instances fixed. See, for
   example, p. 33, lines 2075, 2089; p. 37 multiple times, etc.)

10. p. 29, line 1888 referred as ==> referred to as

11. p. 30, line 1925 replace-after ==> reorder-after
           line 1934 (same mistake -- search entire document)

12. p. 30, line 1941 " specified by its place." Add
   "in the list of collating statements." to the end of
   the sentence.

13. p. 31, line 1962 ellipsises ==> ellipses

14. p. 33, line 2078 & line 2084 col_weight_max ==> coll_weight_max

15. p. 33, line 2097 with the ==> within the

   (Same error on p. 34, lines 2136 and 2157)

16. p. 34, line 2155 collating-symbol-2 ==> collating-symbol-1

17. p. 36, line 2236  on ==> in

18. p. 55, line 3305  vaule ==> value

19. p. 56, line 3391  start ==> starting

20. p. 58, line 3512  added the ==> added to the

21. p. 90, line 6091  done ==> made

22. p. 90, line 6129  introduce ==> introduced

23. p. 91, line 6132  elipsises ==> ellipses

Editorial Comments re Section 4.5 LC_MONETARY

p. 41, line 2601 "is taken" ==> "is implied"

p. 43, line 2680 For consistency, "int_curr_symbol" should be
in double quotes.

_______________________ end of SC22 N3024 ______________________________




