WG15 Defect Report Ref: 9945-2-29
Topic: regular expressions

This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20


	Class: No change


	Topic:			regular expressions
	Relevant Sections:	2.8

Defect Report:
       Please provide an interpretation of the following taken from
       Section 2.8 of ISO/IEC 9945-2:1993.

       I think I know what the specified behavior is for the
       following cases, but maybe I've opened an interesting
       question or two.

       Given a locale in which "ch" is a multiple character
       collating element that collates between "c" and "d", then
		 [[.ch.]]	 matches "ch".

       This makes it pretty clear that
		 [^[.ch.]]	 doesn't match "ch" (and not even
		 just the "c").

       Therefore, consistency argues that
		 [^c]	 matches "ch"
       And, of course,
		 [c]	 doesn't match "ch" (and not even just the
		 "c").

       If we're in agreement so far, then the simple rule is that
       if the string to check against a bracket expression can be
       taken as a multiple character collating element, then the
       matching process must do so.
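
The rule proposed above can be sketched as a toy model. This is not a
real regular expression engine and not text from the standard: the names
MULTI_CHAR_ELEMENTS, next_element, and bracket_matches are hypothetical,
and the locale containing "ch" is assumed purely for illustration.

```python
# Toy model of the requester's proposed rule; not a real regex engine.
# Assumption: a hypothetical locale where "ch" is a multi-character
# collating element.
MULTI_CHAR_ELEMENTS = {"ch"}


def next_element(text, pos=0):
    """Return the collating element starting at pos, preferring the longest."""
    if pos >= len(text):
        return None
    for elem in sorted(MULTI_CHAR_ELEMENTS, key=len, reverse=True):
        if text.startswith(elem, pos):
            return elem
    return text[pos]


def bracket_matches(members, text, negated=False):
    """Return the matched element, or None, under the proposed rule.

    The string is always taken as a multi-character collating element
    when it can be; there is no fallback to its first character.
    """
    elem = next_element(text)
    if elem is None:
        return None
    return elem if (elem in members) != negated else None


print(bracket_matches({"ch"}, "ch"))                # [[.ch.]]
print(bracket_matches({"ch"}, "ch", negated=True))  # [^[.ch.]]
print(bracket_matches({"c"}, "ch", negated=True))   # [^c]
print(bracket_matches({"c"}, "ch"))                 # [c]
```

Under this model the four calls correspond, in order, to the four
bracket expressions listed in the report.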

       I'm pretty sure about the above.  What I'm not so sure about
       is the behavior for character classes.  Take, for example,
		 [[:alpha:]]
       when presented with "ch".  The rationale for POSIX.2
       confirms that ``character classes are not intended to
       include collating elements''.  However, there are still two
       possible answers: "ch" doesn't match, and the "c" of "ch"
       matches.  I like neither of these answers; neither fits my
       intuitive belief that "ch" should match as a unit.  Even
       worse, the nonportable
		 [a-z]	 *does* match the unit "ch"!

       What is actually	specified for [[:alpha:]] here?

WG15 response for 9945-2:1993 

A character class expression is defined in section of the
standard, as a set of characters belonging to a character class, as
defined in the LC_CTYPE category of the current locale.  A range
expression is defined in the same section as a set of collating elements
that fall between two elements in the current collation sequence,

Thus, a collating element ch, which is not a character, would be matched
by the range expression [a-z], but not by the character class (set of
specific characters specified in the locale file) [:alpha:].  [:alpha:]
would match the 'c' and the 'h' individually, for the same reason that
the expression [c] matches the 'c' in ch, but not the collating element
ch.
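
One way to read the response is sketched below as a toy model. It is not
a real regular expression engine: the names COLLATION, ALPHA, candidates,
in_range, and match are hypothetical, the locale is assumed, and the
fallback from the collating element "ch" to the single character 'c' is
an interpretation of the response rather than wording from the standard.

```python
# Toy model of the WG15 response; not a real regex engine.
import string

# Assumption: a hypothetical collation sequence in which "ch" collates
# between "c" and "d".
COLLATION = list(string.ascii_lowercase)
COLLATION.insert(COLLATION.index("d"), "ch")

# Assumption: the LC_CTYPE alpha class holds single characters only.
ALPHA = set(string.ascii_letters)


def candidates(text, pos=0):
    """Collating elements that could start at pos, longest first."""
    if pos >= len(text):
        return []
    multi = [e for e in COLLATION if len(e) > 1 and text.startswith(e, pos)]
    return sorted(multi, key=len, reverse=True) + [text[pos]]


def in_range(lo, hi, elem):
    """Range expression: elements between lo and hi (endpoints included here)."""
    if elem not in COLLATION:
        return False
    return COLLATION.index(lo) <= COLLATION.index(elem) <= COLLATION.index(hi)


def match(predicate, text, pos=0):
    """Return the longest candidate at pos accepted by predicate, else None."""
    for elem in candidates(text, pos):
        if predicate(elem):
            return elem
    return None


print(match(lambda e: in_range("a", "z", e), "ch"))  # [a-z]
print(match(lambda e: e in ALPHA, "ch"))             # [[:alpha:]] at 'c'
print(match(lambda e: e in ALPHA, "ch", 1))          # [[:alpha:]] at 'h'
print(match(lambda e: e == "c", "ch"))               # [c]
```

In this sketch the range test consults the collation sequence, so it can
accept "ch" as a unit, while the class test consults only the set of
characters and therefore accepts 'c' and 'h' one at a time.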

Rationale for Interpretation: