ISO/IEC JTC1 SC22/WG20 N563

MINUTES - DUBLIN, June 15-19, 1998

ISO/IEC JTC 1 SC22/WG20

Meeting #14 - Internationalization

July 20, 1998

 

1. Introduction and announcements by Convenor

New people: Takata, Whistler, Clews, Yamanaka, Küster

finding consensus

additional agenda point for convenor’s report to the SC22 plenary - 15.3, N566

clarification on electronic document distribution: Word 2.0-6.0

Miles Ellis - PDF converter: In future you should connect to com1.etrc.ox.ac.uk using the user name SC22 and the password JTC1SC22 (case sensitive) rather than using anonymous ftp. Otherwise the procedure remains the same except that, since this service is relatively little used I would be grateful if you would let me know when you have sent any file(s) in order to ensure that I run Distiller. If you don’t let me know then I may not realise and may not run Distiller for several days!

Timing: discuss sort after Tuesday to allow Clews to participate

Please update the distribution list

 

2. Introduction of national delegations, liaisons, and cooperations

Clews, John UK Sesame (6/16/98)
Everson, Michael Ireland ETG
Fujimura, Koreaki Japan Electrotechnical Laboratory
Garland, Tom Ireland Sun (98-06-18 only)
Küster, Marc Germany Uni Tübingen
LaBonté, Alain Canada Trésor du Québec
Sherif, Khaled Egypt IBM
Simonsen, Keld Denmark DKUUG
Soor, Baldev Canada IBM
Takata, Masayuki Japan Edogawa University
Whistler, Ken USA Sybase
Winkler, Arnold (USA) Unisys, convener
Yamanaka, Gail USA Oracle

 

3. Appointment of chairperson, secretary, and drafting committee

Chair: Winkler

Secretary: Winkler

Drafting committee: Simonsen, Fujimura, Whistler, Everson

all approved as presented

4. Approval of prior meeting's minutes

544

Minutes - Cairo, November 1997 Winkler 97-11-20 admin

Minutes are approved.

5. Future Meeting Schedule and Plans

#15 October 18-22, 1998 (changed) Tel Aviv Israel
#16 May 3-7, 1999 Malvern USA
#17 tbd Copenhagen Denmark
#18 tbd Quebec Canada

 

6. Recognition of new documents and assignment to agenda items

Nr.

Title Source Date Project

557

Resolutions from the CAW - January 1998 CAW 98-01-22 admin

558

Liaison report to WG15 Keld Simonsen 98-04-23 admin

559

Language Independent Specification techniques Keld Simonsen 98-04-24 22.30.02.03

560

Digital Winter - contribution from Bob Barbour Bob Barbour 98-05-06 admin

561

Final Agenda - Dublin, June 15-19, 1998 Winkler 98-06-15 admin

562

Participants - Dublin, June 15-19, 1998 Winkler 98-06-19 admin

563

Minutes - Dublin, June 15-19, 1998 Winkler 98-06 admin

564

Resolutions - Dublin, June 15-19, 1998 SC22/WG20 98-06-19 admin

565

JTC1 resolutions from Sendai JTC1 N5448 98-06-05 admin

566

Convenor’s report to SC22 plenary Sept. 98 Winkler 98-06-19 admin

567

Summary of voting and comments to FCD 14651 - International string ordering (N2607) SC22 N2719 98-05-18 22.30.02.02

568

Disposition of comments to FCD 14651 ballot     22.30.02.02

569

Summary of voting and comments on FCD 14652 - Specification of cultural conventions SC22 98-06-12 22.30.02.03

570

Disposition of comments to FCD 14652 ballot     22.30.02.03

571

Liaison report from SC2/WG2 Winkler June 1998 admin

572

SC22 chairman’s report on JTC1 plenary and chairmens’ forum in Sendai SC22 N2726 98-06-10 admin

573

Dual currency handling in Locales with respect to the euro Soor, Uma
IBM Canada
98-05-28 22.30.02.03

574

Internationalization in Fortran 2000 SC22/WG5 N1320 June 1998 admin

575

Money-to-string function Keld Simonsen June 1998 22.30.02.03

576

Draft Unicode Technical Report #10
Unicode collation algorithm
Mark Davis
Ken Whistler
97-03-30 22.30.02.02

577

Table of replies and comments to Fast Track ballot on ISO/IEC DIS 15897 (EN 12005) SC22 N2717 98-05-14 22.30.02.03

578

Suggested BNF (Backus-Naur Format) syntax for template tables for ISO 14651 Ken Whistler 98-06-18 22.30.02.02

579

Contributions to the sorting of accents Everson
Melagrakis
98-06-18 22.30.02.02

580

Comments to N573 - dual currency handling in locales with respect to the euro Ienup Sung
Tom Garland
98-06-18 22.30.02.03
22.15435

581

       

582

       

583

       

 

7. Approval of Agenda

Additions:

Convenor’s report to SC22 plenary N566 (15.3)

Approved with addition above

 

8. Liaison Reports

8.1 Additions/deletions/changes to liaisons

Winkler: Reconfirm liaison with SQL - Jim Melton, now SC32/WG3. Jim asked for it.

Resolution: liaison with SC32/WG3, request OK from SC22

8.2 SC22/WG4, COBOL

no report. COBOL wanted to keep WG20 in SC22, fear of disruption of their I18N efforts in the new standard.

8.3 SC22/WG15

Keld: IEEE says that WG15 has no expertise on I18N, but draft 2.b on the POSIX utility standard contains some I18N, coordinated with WG20 work through Keld. Guideline for "National Profiles" will also address I18N. 14766 is the number of this new technical report.

8.4 SC2/WG2

571

Liaison report from SC2/WG2 Winkler June 1998 admin

Action Winkler: Distribute Plane 14 draft from SC2/WG2

8.5 WG14

Keld: C and C++ will hold some meetings together. New C standard will include new identifiers (TR 10176) and dual currency specifications. Dates and time formats are being harmonized between C and C++. Global locale model remains.

No written report

8.6 WG21

Keld: C++ out for FDIS ballot - all I18N from C and object oriented extensions. Conversions from input formats, monetary formats, etc.

no written report

8.7 WG5

574

Internationalization in Fortran 2000 SC22/WG5 N1320 June 1998 admin

Takata: internal draft document in WG5 - will NOT adopt the dynamic locale model. Character "kind" will be used to specify ISO 10646 characters. Also switch from decimal point to comma and back.

8.8 GUIDE/SHARE Europe

dormant, no report

8.9 JTC1/WG5 (now SC35)

Resolution: continue liaison with subject matter and establish liaison with the new SC35 and appropriate working groups. Request SC22 approval.

8.10 CEN TC 304

About sort: Marc Küster: project team on European ordering rules on Subset #2 of 10646, including polytonic Greek, Cyrillic, and all Latins, some symbols. Küster is editor for the sort standard, Everson for the subset standards. Interested in harmonization with 14651 - no competing standards. Updated version will be available soon.

Keld: Reorganization in TC304 in project teams - 15 and counting ....

P1 - sorting

P17 - euro locale related to 14652

P11 - alphabets of European languages

P10 - subsets of 10646

P9 - conversion projects using 2022, and others.

P2 - cultural registry (IS 15897)

8.11 TC37

Keld: IS 12199 (sort) is on hold for alignment with 14651 - upon request of Hjulstad (project leader of CEN sorting standard and editor of 12199).

Action Küster: ask Hjulstad if he wants to be on the WG20 mailing list and invite him to further WG20 meetings.

8.12 ITU-T

no report

8.13 Ada report from Keld:

Some Ada people have done some work together with Keld on an Ada binding for IS 15435.

Action Keld: ask WG9 if they want a formal liaison?

9. Review of prior meetings action items

SD-5

Action item list Winkler 98-05-06 admin

New action for Ken: take over from Kung A9711-8 on word break API.

10. Revision of TR 10176

Reporting that TR is being published and will be available soon. ITTF did not release publication date yet. It is obvious, that a revision of this TR is needed soon, because the repertoire of ISO 10646 is growing rapidly and must be represented fully in Annex A of the TR.

Action Winkler to contact Kido about possible editorship.

Keld said he would be volunteering for editor as previously agreed. Keld also mentioned that WG20 earlier had resolved to revise the TR to include further guidelines for internationalization, but no evidence to that effect could be found in the minutes or disposition of comments.

11. International string ordering ISO/IEC CD 14651

567

Summary of voting and comments to FCD 14651 - International string ordering (N2607) SC22 N2719 98-05-18 22.30.02.02

568

Disposition of comments to FCD 14651 ballot     22.30.02.02

576

Draft Unicode Technical Report #10
Unicode collation algorithm
Mark Davis
Ken Whistler
97-03-30 22.30.02.02

578

Suggested BNF (Backus-Naur Format) syntax for template tables for ISO 14651 Ken Whistler 98-06-18 22.30.02.02

579

Contributions to the sorting of accents Everson
Melagrakis
98-06-18 22.30.02.02

 

The document N567 is incomplete, a part of the UK comments is missing. The convener will reprint the complete document and distribute it in the next mailing as N567R. Action Winkler.

Canonical equivalences are explained by Ken Whistler upon request from Mr. Fujimura. It is not necessary to modify data in order to use combined and decomposed characters for comparison - assign correct weights.
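As an illustration of the point above, canonically equivalent strings can be made to compare equal without modifying the stored data, by deriving the comparison key from a canonical decomposition. The sketch below is not taken from the meeting documents: it uses Python's standard unicodedata module, the helper name canonical_key is hypothetical, and the key is only the decomposed code point sequence, not the weights of 14651.

```python
import unicodedata

def canonical_key(text):
    # Hypothetical helper: decompose canonically (NFD) for comparison only;
    # the stored string itself is left unchanged.
    return unicodedata.normalize("NFD", text)

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False
print(canonical_key(composed) == canonical_key(decomposed))  # True
```

A real implementation would map the decomposed sequence to weights; the point here is only that equivalence is handled at comparison time.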

11.1 Canada

Canada #1:

Alain explains the sort algorithm as a reminder for the new participants. Everson says, that an open set of combining characters must be processable to deal with "weird" languages. These are not defined in any symbolic table.

The Unicode tables specify the allowed decomposition, the weights, and the ordering of the accents.

Discussion about the validity of the canonical decomposition for equivalence - important for the implementation of the sort algorithm, should be transparent for the end user.

Marc: "closed" system is not a solution, any new character would invalidate the default table. Only an open system can cope with additional characters without changing the tables. Implementations will define derived tables which fit into 32 bit (experience from Ken)....

Agreement in principle to use the Unicode method for canonical equivalences. The method must not prevent the construction of storable keys! Marc: we are sorting linguistic entities, normalization is a great advantage. Normalization is outside the scope of the standard. The data have to "behave" as if they were normalized. Unicode tables support canonical equivalences.

Ken: Europe could implement sorting of a sub-repertoire that contains no canonical equivalences. MES-2 has no canonical equivalences, MES-3 does have them all.

Tables must be created for all combination of combining marks (derived table).

Fujimura: what about Korean Jamos? This is currently not mentioned in 14651. Ken: An algorithmic transformation creates the Korean syllables and the weights. The Hangul syllables sort in binary order. Combining Jamo tables are weighted in the order of the Jamos.
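The algorithmic transformation mentioned by Ken is assumed here to be the Hangul syllable arithmetic of the Unicode Standard, in which each precomposed syllable is derived from a leading consonant, a vowel, and an optional trailing consonant. The following is a minimal sketch under that assumption; the function names are hypothetical, and the Jamo indices stand in for weights.

```python
S_BASE = 0xAC00                          # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing consonant"

def decompose_hangul(syllable):
    """Return (leading, vowel, trailing) Jamo indices for one syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    leading, rest = divmod(s_index, V_COUNT * T_COUNT)
    vowel, trailing = divmod(rest, T_COUNT)
    return leading, vowel, trailing

def compose_hangul(leading, vowel, trailing=0):
    """Inverse of decompose_hangul: build the syllable from Jamo indices."""
    return chr(S_BASE + (leading * V_COUNT + vowel) * T_COUNT + trailing)

print(decompose_hangul("\ud55c"))  # (18, 0, 4)
print(compose_hangul(18, 0, 4) == "\ud55c")  # True
```

Because the index triple is a mixed-radix expansion of the code point offset, ordering syllables by their Jamo indices gives the same result as ordering them by code point, which is consistent with the statement that the syllables sort in binary order.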

Canada #2:

agreed

Canada #3:

Ken: what is the exact form of the syntax of the tables in 14651. Issue is how to tie the table to 14652.

Marc: let's discuss matters of principle, before we cycle through all the comments.

Accepted in principle, syntax to be discussed when John Clews is present.

Canada #4:

The question of user tailorability (smalls before upper, etc...) is in an informative annex. How parameters for different collation behavior are presented to the user should not be defined in the standard, should be in the tutorial rather than in a specific annex.

Fujimura: Textual information is acceptable for the toggles, should be in the tutorial.

Soor: Application has to make the distinction about the toggles. Example: word or string ordering is dependent on the use of the ordering algorithm for specific applications.

 

11.2 Denmark

APIs question must be addressed. Editor has to keep that in mind.

Denmark #1:

OK

Denmark #2:

OK

Denmark #3:

Binary strings are often stored in data bases - a warning to the user is in order to make him aware that keys can be locale dependent and would not work correctly in other locales. Add warning !

Denmark #4:

OK

Denmark #5:

"symbol equivalence" for accents and their combinations. This table should be in 14652 (any number of them, including the Canadian). Inclusion in 14651 can blow-up the table, in Vienna we decided to take them out.

Keld requests the tables as a formal annex.

Soor: IBM does not want their hands forced. They would like to use their own symbolics.

Denmark #6:

accepted

Denmark ed #1:

US preference would be that the names match the names of the 10646 characters.

Denmark ed #2

OK

Denmark ed #3:

accepted, if we retain the APIs.

 

11.3 Ireland

agreed to correct errors, add text,

Ireland - English language

Everson will edit for proper English.

 

11.4 Japan:

Japan #1:

Wait for 14652. Just as a warning

Japan #2a:

point is moot if APIs go away

Principle discussion about the APIs: Sweden, USA, Holland, Germany require that APIs be removed from the standard. Keld: APIs can be moved to 15435, they are not necessary to the use of the tables in 14651. It makes tactical sense to remove the APIs and get on with the work. It is better to describe the functions than to require the APIs. Agreement to remove the APIs.

Principle discussion about the tables: Symbolic tables are fine, but "order start" is controversial.

Ken: We have to find a way to use the tables for the functionality of the "order start" statement. Method of tailoring should not be dependent on the order start.

Marc: tailoring must be kept in 14651, no dependency of POSIX is desirable.

Keld: tables were developed in WG20, not POSIX.

Ken: tailoring must be able to be communicated to somebody else for consistent output. Dependency on 14652 is too much for conformance to 14651.

Marc: dependency on 14652 forces POSIX, 14651 should stand independent from 14652, but complete in itself. 14651 should be freestanding - this might mean to put all syntax back into the standard.

Separate 14651 from 14652:

UK - yes;

DK - yes, but should meet 14652; CAN - no separation needed, but for implementations it would be preferable to have a Unicode core in 14651; Keld sees no reason to separate.

USA - 14652 can "inherit" the table from 14651 and add all the other functions necessary, Unicode tailoring is a syntax for numeric or symbolic table, simple syntax can be implemented in 14652 with any required extension;

IE -

J - fully against 14652, but possibly support 14651, wants to have freestanding 14651, syntax in 14652 is only 4 pages, easily moved to 14651;

D - freestanding standard preferred, develop pseudo syntax for describing the functions;

J - repertoiremap is not needed;

Gail: implementers need a behaviour, not the format;

Egypt: freestanding 14651 is preferred.

Alain: create simplified syntax for 14651 that can also be used in 14652.

Ken: reorder after statements for symbols and for collation elements are fundamental to any tailoring.

"reorder script" can create problems if single characters of another script are inserted into a script to be re-ordered. "include" statement is unnecessary, if all tailoring starts from the default table.

Marc: can characters be given a different property (e.g. space as a character)? "redefinition" is a valid concept, should not introduce new syntax. "reorder" could be a "reorder after self" to change the properties - this is the same as redefine.
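The "reorder after" operation discussed above can be pictured as moving one symbol to a new position in an ordered table. The sketch below is only an illustration on a toy list of symbols; it is not the tailoring syntax of 14651 or 14652, and the function name reorder_after is hypothetical.

```python
def reorder_after(order, symbol, anchor):
    """Return a new ordering with `symbol` moved to directly after `anchor`."""
    if symbol == anchor:
        # "reorder after self": the position is unchanged in this sketch;
        # any change of properties is outside its scope.
        return list(order)
    result = [s for s in order if s != symbol]
    result.insert(result.index(anchor) + 1, symbol)
    return result

default_order = ["a", "b", "c", "y", "z"]
tailored = reorder_after(default_order, "y", "a")
print(tailored)  # ['a', 'y', 'b', 'c', 'z']
```

Since every tailoring in this picture starts from the default order, no "include" step is needed, which matches the remark in the discussion.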

Ordering of Arabic presentation forms is needed - problem of charmapping is in 14652, not in the definition of 14651.

Specials must specify halfwidth, circled, etc... Not only 4th level, also earlier levels. Some might be overkill.

Principle discussion about sorting of Arabic presentation forms: Most data are stored in physical order on mainframes - shaping of data results in presentation forms for incorrect sorting. Reverse order (visual order) for presentation forms - this is not in the scope of UCS character sorting. Conversion to Unicode requires field by field reversal of data. A letter with all its presentation forms have the same primary weight. The presentation forms differ in the tertiary weight. The table has all forms of the same letter together, the tailoring defines the scan backward of the presentation forms to achieve reversed order. Reverse scan per block, not scripts.
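The statement that a letter and its presentation forms share a primary weight and differ only at the tertiary level can be sketched with a small weight table. All weight values below are hypothetical and chosen only for illustration; the backward scan of presentation forms mentioned above is not modelled.

```python
# Hypothetical weights: (primary, secondary, tertiary) per character.
WEIGHTS = {
    "\u0628": (100, 1, 1),  # ARABIC LETTER BEH (intrinsic form)
    "\ufe8f": (100, 1, 2),  # ARABIC LETTER BEH ISOLATED FORM
    "\ufe90": (100, 1, 3),  # ARABIC LETTER BEH FINAL FORM
    "\ufe91": (100, 1, 4),  # ARABIC LETTER BEH INITIAL FORM
    "\ufe92": (100, 1, 5),  # ARABIC LETTER BEH MEDIAL FORM
    "\u062a": (101, 1, 1),  # ARABIC LETTER TEH
}

def sort_key(text, levels=3):
    # All level-1 weights first, then all level-2 weights, then level 3.
    return tuple(tuple(WEIGHTS[ch][lvl] for ch in text) for lvl in range(levels))

# Equal at the primary level, distinct once the tertiary level is included.
print(sort_key("\u0628", levels=1) == sort_key("\ufe91", levels=1))  # True
print(sort_key("\u0628") < sort_key("\ufe91") < sort_key("\u062a"))  # True
```

With keys of this shape, all forms of one letter stay together in the sorted output, and a different letter always sorts after them regardless of form.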

The table has to appear in the order, the characters are to be sorted. This contradicts the idea to have duplicate blocks of presentation forms. One section is agreed with intrinsic and presentation forms of Arabic.

What happens with APIs is:

5.1 remains

5.2 away

5.3 warnings

6. revisit conformance

Japan #4:

The word "prehandling" is used differently. In 5.1.1, "preparation of the symbolic table data" should be used.

Japan #6 and #7

Definitions need to be improved.

Japan #8

OK

Japan #9

Use "P"

Japan #10

Rewrite due to prior decisions

Japan #16

OK

Japan #17

Benchmarks for toggles should be made informative - "may be tested ..."

 

11.5 Netherlands

NA: another FCD will be balloted

NB: the structure will be revised

NC: we take note and the document will be completely checked and revised by a native speaker of English.

Alain will write detailed comments on the text.

Principle discussion on definitions - refer back to original standards (character, etc.), for new ones check in the meeting and perhaps new text.

Prehandling N#52-N#61 comments are moot, this goes into an informative annex

N#64 - end goes away due to elimination of APIs.

 

11.6 Sweden

S#1: All APIs and related definitions will be eliminated.

S#2: Reference file formats will be replaced by BNF format, some symbolic data will remain, but will be self contained. A format will be proposed but not mandatory.

S#3: OK

S#4: Normalization will not be mandated, but for level 3 implementation decomposition is needed and Unicode method is recommended. Can be done in a Note with the URL of the Unicode character table.

S#5: rejected

S#6: accepted. Unicode will use a tailored "default" table (from the tailorable template) as their standard.

Action: Ken will submit Unicode data as a WG20 document, preferably electronic.

S#7: Template will reflect logical order, but tailoring will be allowed. Khaled will write a note to Kent about his opinion on order of presentation forms.

S#8: Note the Thai problem, put it into "preparation" (same is true for Tibetan).

S#9: issues for tailoring (1-3), some rejected (4, 5). Discussion with Fujimura, who proposes to remove compatibility characters from the template - after explanation of Unicode the proposed method is accepted.

S#10: Outside the scope of the standard. Left to prehandling, if needed.

S#11: outside the scope of the standard or specific preprocessing (not default)

S#12: Please bring specific examples if required

S#13: accepted

S#14: OK, same as prehandling

S#15: Word vs string ordering. Which spaces are considered? As default we use string ordering, for word ordering, we use tailoring "ignore off for ...". (The table will not contain deliberate errors).

S#16: Mathematical formulation is overkill. The standard will be clarified so that it is clear that the exact representation is not mandated by the standard - therefore a mathematical definition is not needed.

S#17: out of scope

S#18: accepted in principle, except 5.6 and 7, no mathematical description. BNF plus text. Example could be Danish, including reorder after and tailoring of space...

S#19: take note.

No change in title is accepted.

Transliterative collation will not be included, scope will be edited for clarity.

Example: Keld offers to do a Danish example.

Use of notes for non-normative parts - accepted in principle.

Definitions will be revisited.

Section 5: accepted

Symbols Hy etc.. will disappear.

 

11.7 United Kingdom

UK#1A: maintain 14651 in parallel with the development of 10646 and its amendments

UK#1B: Conformance clause will be revised, API section is moot. For conformance, tailoring must be defined, can be zero.

UK#1C: no more relationship to 14652. Abbreviation tables create a maintenance problem - autogenerate (possibly from the 10646 names). Readability of abbreviations is desirable, must be fixed, never change. Format of the autogenerated tables: SUxxxx, AUxxxx, where Uxxxx is the identifier of the character.
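The autogenerated naming scheme above can be sketched as follows. The minutes do not define what the prefixes S and A stand for, so the sketch treats the prefix as an opaque parameter; the helper name symbol_name and the choice of four uppercase hexadecimal digits for xxxx are assumptions.

```python
def symbol_name(prefix, char):
    # Hypothetical helper: prefix + "U" + four-digit hexadecimal code point.
    return f"{prefix}U{ord(char):04X}"

print(symbol_name("S", "a"))       # SU0061
print(symbol_name("A", "\u0301"))  # AU0301
```

Names derived this way are stable as long as the code point is, which addresses the requirement that the abbreviations be fixed and never change.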

Discussion about min, half, cap, sub - should they also be transcribed into a symbolic value - decision is NO.

Unidata.txt defines the primary order of alphabets. From that we can generate basekeys.txt and compkeys.txt with all the weights. symdump.txt is an additional file that supports the sorting of symbols. All files are generated from the unidata.txt that is updated with new additions. UnicodeData-3.0.2.txt (version dependent) feeds into unidata.txt for collation related information.

UK#1D: bindings: moot with APIs gone

UK#1E: remove these user requirements

UK#1F: Definitions: will be revised, field, procedure, precision will go away with the APIs. Glyph goes away also.

Editorial comments from the UK:

 

Scripts need not be defined in the template - tailoring can be defined by "section start" and "section end".

Principle discussion of section (script) definition: the section start and section end statements are part of the tailoring, not part of the template. Could also be done by specification of the first symbol of a section.

BNF needs to be specified, Ken will provide a proposal.

UK#3A1: see script definition above.

UK#3A2: order of scripts - see below the principle discussion.

Principle discussion of script sequence: there seem to be requirements that scripts that behave similarly are sorted so that they follow each other. Everson and Clews would like the template (default) to follow these relationships. There are very differing opinions on this subject with additional problems of scripts to be added later that would break the default sequence and need tailoring of the tables anyway.

Solution for many seems unnecessary - binary sequence is fine as the default. No user input exists on these scripts. One other solution is that Europe will define a standard European profile that would possibly be implemented by the computer companies.

Result: can be re-visited when user input exists. WG20 will keep the UK request in mind in particular for Ethiopic and Georgian scripts.

UK#3A3: agreed

UK#3A4: script codes is moot

UK#3B: handled by tailoring method

UK#3C: solved by numeric symbols

UK#3D: Hebrew precedes Arabic, covered by table. Order of accents needs to be defined.

UK#3De: Greek input from Evangelos was distributed. Problems with Cyrillic letters are noted and need to be discussed in UTC.

UK#3E: accepted

 

11.8 USA comments:

USA#1: No more than 3 customizable levels in the conformance. This is mainly for Java to be able to conform to 14651. Java has no mechanism to tailor the 4th level. The Unicode order will be declared to be conformant.

USA#2: create equivalent results is OK

USA#3: moot

USA#4: OK

USA#5: should include the latest amendments.

USA#6: YES

USA#7: Tables were provided, BNF to follow.

Principle discussion about the sequence of specials: Unicode treats punctuations as ignorable with the symbolic weight in addition. For some 3000 characters work needs to be done - Unicode sorts according to code points. There is very little existing "user requirement"; a standards organization should not mandate what the user requirements are. A maintenance problem exists for any additional specials if they are not in binary order.

Two options are obvious: full binary or full logical (Everson) order. Possible compromise could be to order the specials with existing preferences (about 50 perhaps) and order the rest binary. This would eliminate the maintenance problem.

Unicode order is good; for data in other character sets, a table showing this sequence would be helpful for the users. Alain’s proposal to take into account the Canadian standard (for about 40 specials). The sequence in the Unicode tables is dependent on character properties. It would be unfortunate to mix characters of differing properties arbitrarily. Fundamental problem: compatibility (spacing) accents in Canada will need tailoring.

Proposal Marc: use the 76 specials from Canada as the example of the tailoring from the Unicode sequence to the one compliant to the Canadian standard. Ken and John: specials should not be dealt with in a special way, if needed, this should be done by tailoring. Ken: The reason for the Canadian standard was to order EBCDIC and ASCII data in the same expected way.

Marc: proposed examples for tailoring: Arabic, Danish, Canadian, Specials (IBM)

Ken, Baldev, and Alain will discuss the final order of specials and make a reasonable order. The decision of this group will be reflected in the next draft.

 

Principle discussion about the sequence of accents: Unicode has a different order than 14651, putting all possible accents to the letter a. A limited subset of these accents has been used in the market for some years. The order of the diacritics is very similar to the order of the specials - neutral way to order them in the sequence of Unicode and let people tailor the sequence. Double diacritics have a derived value. Ken would like to minimize the number of diacritics whose weights must be changed to get consensus.

Arabic sequence is also considered - 14651 is correct, Unicode will be adjusted.

For accents: research is needed, list of accents in 14651 must be extended to completely cover all. Minimize changes.

Michael will prepare a paper and discuss it on e-mail.

Solution: take what is in 14651 today and extend it with all other in 10646 order.

Ken Whistler prepared a contribution for the BNF syntax for the generation and tailoring of the tables in 14651 (N578). Comments are appreciated to produce a final document that will become part of the document as normative.

Conformance: pick a repertoire, generate a set of strings. Demonstrate that a sort generates the exact same order as if sorted with the implementation according to the standard. Same for tailoring.
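The conformance check described above amounts to comparing the order produced by an implementation with the order produced by the reference method on the same set of strings. The sketch below is one possible way to express that comparison; the key functions are toy stand-ins, not the 14651 tables, and all names are hypothetical.

```python
def same_order(strings, reference_key, candidate_key):
    """True if both key functions sort the test strings into the same order."""
    return sorted(strings, key=reference_key) == sorted(strings, key=candidate_key)

# Toy reference: compare letters first, then lower case before upper case.
def reference_key(s):
    return (s.casefold(), s.swapcase())

# A candidate that yields the same order by a different route.
def candidate_key(s):
    return (s.casefold(), s.isupper())

# A candidate that sorts by code point only.
def binary_key(s):
    return s

test_strings = ["b", "A", "a", "B"]
print(same_order(test_strings, reference_key, candidate_key))  # True
print(same_order(test_strings, reference_key, binary_key))     # False
```

The same comparison can be repeated with a tailored reference key to check conformance of a tailoring.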

Planned Progression for 14651:

Mid September: new draft available on the restricted web (Alain)
Israel meeting: final review and decision to send to SC22 for FCD ballot (participants in Israel)
End of June: draft disposition of comments (Alain, to all participants of the Dublin meeting)
End of July: comments on DoC to Alain (participants)
End of August: final DoC sent to SC22

FTP://dkuug.dk/jtc1/sc22/incoming for upload of documents, tables, etc.

John Clews described efforts in CEN TC304, TC37, and TC46 for ordering especially in library sorting. ISO 12199 is for Latin only, ISO 999 is a filing standard, TC46/SC2 is working on a combination of transliteration and filing (where TC46/SC2/WG8 deals with the transformation issues).

 

12. Cultural convention-specification standard ISO/IEC CD 14652

569

Summary of voting and comments on FCD 14652 - Specification of cultural conventions SC22 N2732 98-06-12 22.30.02.03

570

Disposition of comments to FCD 14652 ballot     22.30.02.03

575

Money-to-string function Keld Simonsen June 1998 22.30.02.03

577

Table of replies and comments to Fast Track ballot on ISO/IEC DIS 15897 (EN 12005) SC22 N2717 98-05-14 22.30.02.03

580

Comments to N573 - dual currency handling in locales with respect to the euro Ienup Sung
Tom Garland
98-06-18 22.30.02.03
22.15435

Keld explains that this ballot, had it been a DIS ballot, would have passed.

 

12.1 Canadian comments:

Change bars not possible, Keld agrees to produce an editor’s report and line numbers for easier identification of changes. Boxes or other functions for readability will be used where possible.

CAN#1: User cannot necessarily select the cultural specification for a specific application - this is a choice of the system administrator and the application provider. Accepted in principle, some re-write will be done to explain the situation.

CAN#2: see Japanese comments. Remove sentence in the introduction that references the 14651 dependency.

CAN#3a: accepted:... dependent on language, culture, or ...

CAN#3b: accepted to remove

CAN#3c:

CAN#3d: accepted, point to other standard

CAN#3e: accepted

CAN#3f: accepted

CAN#3g: clarify need

CAN#3h: accepted, will be explained

Principle discussion about scope and relationship to POSIX.

J#01: "nothing more than POSIX" Keld says that this is the only project that addresses I18N functionality in POSIX and other systems. New TD on cultural and linguistic adaptability is formed to support the enhancement of I18N in all JTC1 standards. Question is the market relevance of the POSIX format for the future.

Keld: the POSIX format allows processing on IT systems.

Ken: strong feedback from the vendor community outside POSIX - can not "just" be compiled and used.

Gail: vendors want content, not the format

Fujimura: if this is a compilable set of definitions, the standard must contain the conditions of the compilation.

Marc: registry standard EN 12005 in CEN is fast tracked as ISO 15897. Iceland could not create a conformant locale to this EN 12005. Question about the reliability of the information, collected in free format. Keld: errors in the POSIX syntax, needed is only the free-form. Release of copyright is an additional issue. X/Open registry of locales exists but costs money for access. Implementation today (Gnu) is to prove the concept.

Major UNIX vendors and Mac would be potential customers for this standard. This standard would extend the POSIX work with I18N functionality. Fujimura: then the work should not be done in WG20, but in POSIX.

Tom: we should possibly find a language independent way - POSIX exists and works fine for Sun and/or any other POSIX conversant company. Is there another way?

Ken: right way to do the work is to find the significant categories and then seek the best way for specifications for each of the categories. Example: collect all monetary formats, document them and then give information, which are used where. This helps every company that needs this kind of information.

Ken: major change in this standard is the introduction of 10646 (repertoiremap, collation, etc...). For many of these categories, various other specification methods might be better suited.

Arnold gives a little history - POSIX asked WG20 to enhance the I18N functionality, JTC1 rejected the proposed cultural registry standard (due to lack of participating NBs).

Soor: in Cairo, we tried to make the standard more palatable to NON-POSIX systems. Especially Java would be very much in need for the cultural data. We need to pick a format for the specification that is easier to use for more vendors, including the architects of the Java I18N efforts.

Michael and Baldev: let's restrict the work to POSIX and get on with it.

Arnold gives his personal opinion that POSIX is "dead". Sun: POSIX is not dead! We want to come up with specification of cultural conventions. We have to take care of POSIX and also other systems.

Baldev: if 14652 should be valuable for other systems than POSIX, we need to change the format of the specification.

Question: what is more valuable for the IT community - the information or the format in which the information is presented?

Decision: make clear in the scope that this standard is more POSIX related. Keld promises to provide BNF syntax. Will look like BNF syntax used in the POSIX or X/Open standards. Japan can accept the work if new syntax is used and if new functionality is added. Suggests to fall back to CD stage.

A Japanese suggestion to work on registration procedures was countered by the convener, that this work item had been rejected by JTC1 and no new work item has been requested by WG20.

Suggested title change to "Specification method for cultural conventions". Resolution. Action: Include in convener’s report.

J#08 and J#09: accepted

J#10: accepted - ellipsis will be fully defined in the BNF syntax.

J#11: accepted

J#12: use UCS names where ever symbolic names are needed in the document. accepted in principle.

J#13: comment_char - Keld to check, if allowed.

J#14:

The editor will propose a disposition of comments to all the commenting NBs and try to resolve the issues of syntax. Substantive issues such as the contents of new LC_types, must be discussed here. Other issue - paper based vs computer based. DK suggested some new categories. Japan has BIDI issues.

No: spelling, hyphenation, transliteration,

Japan wants:

BI-DI, other time formats (JIS X0301)

USA is opposed to putting more things into FDCC-sets, as this would further split the I18N conformance.

Character properties must not be specified in the cultural convention, but are properties of the character itself. Add no new things that are not needed for POSIX compatibility.

LCC_type has many parameters that are contentious. BIDI is a case in point.

Resolution (proposed by Michael Everson): that the liaison to SC2/WG2 request WG2 to define properties for all characters in ISO 10646.

Action - Winkler: check the possibility of making it a TR.

Fujimura: J19 and CAN #4b - what is an "outdigit"? Keld: Output must be defined in the locale (example: Saudi Arabia uses only Arabic numbers, even in English text). If it is numeric formatting, it should be in the LC_NUMERIC section. Makes sense only with IS 15435. There is discussion whether the specification standard needs extension before the API standard is approved.

Progression for 14652:

Draft disposition of comments by end of July

3 weeks for comments by participants

On September 7 we will have an agreed disposition of comments.

On September 21 a new draft of 14652 will be available

Further discussion in the meeting in Israel.

 

13. Internationalization API standard

Use of X/Open specifications OK, if credited to X/Open. Petr Janecek on 6/4/98 - e-mail followed, announcing that the OpenGroup board will decide in its next meeting.

There are various ways to write the specifications in a language-independent way. Keld gives examples - the group recommends:

procedure setlocale (category, string, localename)

parameters: 1: category (input)

2: string (input)

3: localename (output, results)
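
One possible reading of the recommended form, sketched in Python purely for illustration: the two inputs become ordinary arguments and the output parameter becomes the return value. The wrapper name set_locale is hypothetical and is not part of any draft; it relies on Python's standard locale.setlocale rather than on the API under discussion.

```python
import locale


def set_locale(category: int, string: str) -> str:
    """Sketch of 'procedure setlocale (category, string, localename)'.

    category and string are the two inputs; the third parameter
    (the resulting locale name) is modelled as the return value.
    """
    # locale.setlocale returns the name of the locale now in effect
    # and raises locale.Error if the request cannot be honoured.
    return locale.setlocale(category, string)


# The "C" locale is always available, so this call is safe anywhere.
localename = set_locale(locale.LC_NUMERIC, "C")
print(localename)
```

A language binding with true output parameters would keep all three parameters; returning the result is only the idiomatic choice for this particular language.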

Ken: any API specification is a rather large document; clarity is required, not compactness or brevity.

Relation to 14652 is important. Keld will provide a paper that shows which APIs deal with which 14652 specifications.

Progression of 15435:

New working draft by next meeting in Israel.

Registration request before the end of 1998.

 

14. ISO/IEC 10646 Issues

Input method for 10646 is in IS 14755 - very basic methods (pick from screen, UCS identifiers, etc...).

Ken: Windows NT has a method to pick a character from the charmap.

Alain: This method does not provide any information about the character. The standard mandates that the coding and the name be displayed (in the user's language).
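
A minimal sketch of what showing the coding and the name could look like, assuming that a UCS identifier is entered as hexadecimal digits. The function names are hypothetical, and Python's unicodedata module only carries the English character names, so the requirement to show the name in the user's language is not covered here.

```python
import unicodedata


def from_ucs_identifier(hex_digits: str) -> str:
    """Enter a character by its hexadecimal UCS identifier (assumed form)."""
    return chr(int(hex_digits, 16))


def describe(ch: str) -> str:
    """Return the coding and the character name for one character."""
    code = f"U+{ord(ch):04X}"
    # Only English names are available here; a localized name table
    # would have to come from somewhere else.
    name = unicodedata.name(ch, "<no name>")
    return f"{code} {name}"


print(describe(from_ucs_identifier("00E9")))
```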

Ken: This might delay the implementation for some time (at least this part).

Ken: A major problem: we are now at amendment 27 of 10646, and the amendments need to be added to the tables in various WG20 standards. TR 10176 in particular needs to be re-issued with the extended list. A revision should be done in such a way that the TR points to a dynamic table of identifiers, so that the TR does not need to be changed for every amendment of 10646. The dynamic table needs clear and precise versioning (version number and date at least, and as additional information the included amendments). It also needs a version history.
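
The versioning requirements could be sketched roughly as follows. This is an illustration only: the class names, the code point ranges, the dates, and the amendment numbers are placeholders invented for the example and do not reflect the actual contents of TR 10176 or of any 10646 amendment.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TableVersion:
    version: int                         # version number
    issued: date                         # date of this version
    amendments: Tuple[int, ...]          # 10646 amendments included
    ranges: Tuple[Tuple[int, int], ...]  # inclusive code point ranges


@dataclass
class IdentifierTable:
    history: List[TableVersion] = field(default_factory=list)

    def publish(self, issued, amendments, ranges) -> TableVersion:
        # Each publication gets the next version number and is
        # appended, so earlier versions stay available as history.
        entry = TableVersion(len(self.history) + 1, issued,
                             tuple(amendments), tuple(ranges))
        self.history.append(entry)
        return entry

    def current(self) -> TableVersion:
        return self.history[-1]

    def allowed(self, ch: str, version: Optional[int] = None) -> bool:
        entry = self.current() if version is None else self.history[version - 1]
        return any(lo <= ord(ch) <= hi for lo, hi in entry.ranges)


# Placeholder data, not taken from any standard.
table = IdentifierTable()
table.publish(date(1998, 1, 1), [], [(0x41, 0x5A), (0x61, 0x7A)])
table.publish(date(1998, 6, 1), [1],
              [(0x41, 0x5A), (0x61, 0x7A), (0x3B1, 0x3C9)])
print(table.current().version, table.allowed("\u03b1"))
```

A programming language standard could then cite a specific version number of the table rather than a fixed list, which is the point of making the table dynamic.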

Fujimura: wait for requests from programming language standards writers asking for advice on the use of characters in identifiers.

Ken: cannot agree; the update must be done fast and with the knowledge of WG2 specialists. Language committees should be put on notice that the table of 10646 characters in identifiers is dynamic and changes with each amendment. This has to be considered in programming language standards.

It is important that the implementers and the standards committees work closely together on such issues. Only one list can exist.

Action: WG20 to request liaison with Unicode Consortium in a resolution to SC22 plenary (and convener’s report).

Action: WG20 to request liaison from Unicode Consortium. (Ken).

15. Other business

15.1 Results from JTC1 meeting in Sendai

565

JTC1 resolutions from Sendai JTC1 N5448 98-06-05 admin

572

SC22 chairman’s report on JTC1 plenary and chairmen’s forum in Sendai SC22 N2726 98-06-10 admin

New TD for cultural and linguistic adaptability, consisting of SC2, SC22/WG20, and JTC1 WG5. WG20 remains in SC22. JTC1 WG5 becomes SC35 - secretariat needed.

There will be 2 meetings, one to plan in New York Oct. 2-4, 1998, the other to work out details in Paris on December 2-4, 1998. Nominations for New York: Alain, Keld, Arnold - resolution.

Direction to the participants: Make sure that the voting procedures are in line with the technical contents of the work. Make sure that the agenda is concise and does not allow free discussion. Documents should be requested to be well structured and specifically targeted to the task.

15.2 CAW results from Ottawa

557

Resolutions from the CAW - January 1998 CAW 98-01-22 admin

moot, see 15.1

15.3 Convenor’s report for SC22 plenary

566

Convenor’s report to SC22 plenary Sept. 98 Winkler 98-06-19 admin

Add liaison request between SC22 and Unicode.

Add liaison reconfirmation request with SC32/WG3 on SQL.

Add liaison request between SC22/WG20 and SC35 with the relevant working groups.

16. Review of Priorities and Target Dates

reviewed and approved

17. Review of Action Items from this meeting

reviewed and approved

18. Approval of Resolutions

approved

19. Adjournment

meeting is adjourned.