WG15 Defect Report Ref: 9945-2-24
Topic: tr

This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20

	Class: Defect situation

The standard states what it states, and conforming implementations
must conform to it. However, concerns have been raised about this
wording, and they are being referred to the sponsors of the standard for
consideration as a future amendment.


	Topic:			tr
	Relevant Sections:

Defect Report:

       Component:  tr - Sect
       Submitted by:  Alex White
       Ref. No.:  tr.1
       Proposed Resolution:
       The interpretation request correctly describes what is in
       the standard but this was not what was intended. The
       working group will draft and propose a change to .2b to
       describe what was originally intended.

          In Section - Standard Input {of tr}, the standard
          states that the standard input to tr ``can be any file
          type.'' [Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p.
          483, line 10456]
          However, in Section - Environment Variables {of
          tr}, the standard states that the LC_COLLATE variable
          ``shall determine the behaviour of range expressions and
          equivalence classes.'' [Ibid., p. 483, lines 10499-10500]
          and in Section 4.64.7 - Extended Description {of tr}, the
          standard states that the \octal construct
               [...] can be used to represent characters with
               specific coded values. An octal sequence shall
               consist of a backslash followed by the longest
               sequence of one-, two-, or three-octal-digit
               characters (01234567). The sequence shall cause
               the character whose encoding is represented by the
               one-, two-, or three-digit octal integer to be
               placed into the array.
          [Ibid., p. 484, lines 10525-10530]
          These two statements cause tr to be unusable on any files of
          type other than text. Historically, tr has been used to
          manipulate files containing binary data. For example, the
          perfectly valid and useful constructs:
               tr -d '\200-\377'
          to delete all characters with the top bit on, or even
               tr '\200-\377' '\0-\177'
          to strip the top bit (both useful operations on binary
          files), no longer work.
          For example, in the PC character set, \200 is a C-cedilla,
          and \377 is not defined as a glyph. Therefore, according to
          section, the most likely interpretation is that all
          characters which collate from C-cedilla (probably the letter
          D) through the end will match here. This is clearly
          wrong, not historical practice, and of no use whatsoever.
          May we interpret the standard as permitting octal escape
          sequences used as endpoints of a range to follow byte
          ordering rather than the collating order?
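
The longest-match rule for octal sequences quoted in the submission can
be illustrated with a small sketch. It assumes an ASCII-based encoding
(so that \101 is the letter A); the helper name octal_demo is
hypothetical and not part of the standard, and LC_ALL=C is set only to
keep the example independent of the caller's locale.

```shell
#!/bin/sh
# Hypothetical helper (not from the standard): shows how tr splits '\1012'.
# The octal sequence takes at most three digits, so '\1012' is read as
# the sequence \101 (the letter A, assuming ASCII) followed by a literal '2'.
octal_demo() {
    # string1 is therefore "A2" and string2 is "xy": A -> x, 2 -> y.
    LC_ALL=C tr '\1012' 'xy'
}

# The '1' in the input is untouched because it is not in string1.
printf 'A12\n' | octal_demo    # prints "x1y"
```

If the fourth digit were consumed as part of the octal value, the input
above would not be translated this way, so the output gives a quick check
of how a given tr parses the sequence.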

WG15 response for 9945-2:1993 

The standard is clear in its requirement that octal sequences used as
endpoints in a range be treated as collating elements. The
implementation must follow this requirement. Concern over the wording of
this area of this standard has been forwarded to the sponsors.
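
As a sketch of how a script might still obtain the byte-oriented
behaviour described in the request, the two operations can be run with
the locale pinned to C. This rests on an assumption that should be
checked on the target system: that in the C (POSIX) locale the
collation order of single-byte characters coincides with their coded
values. The function names below are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrappers (names are illustrative, not from the standard).
# Assumption: under LC_ALL=C the range '\200-\377' covers exactly the
# bytes 0x80 through 0xFF, because collation order matches byte order.

# Delete every byte that has the top bit set.
delete_high_bit_bytes() {
    LC_ALL=C tr -d '\200-\377'
}

# Clear the top bit: map bytes 0x80..0xFF onto 0x00..0x7F one for one.
clear_high_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# Bytes \200 and \377 are removed; the ASCII letters pass through.
printf 'a\200b\377c\n' | delete_high_bit_bytes    # prints "abc"

# \301 and \302 lose their top bit and become \101 and \102 (A and B in ASCII).
printf '\301\302\n' | clear_high_bit              # prints "AB"
```

Pinning the locale inside the wrapper, rather than exporting it for the
whole script, keeps the change limited to the tr invocations that need
byte semantics.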

Rationale for Interpretation: