From keld@dkuug.dk Fri Jun 17 09:03:02 1994
Received: by dkuug.dk id AA19315
  (5.65c8/IDA-1.4.4j for i18n@dkuug.dk); Fri, 17 Jun 1994 07:03:07 +0200
Message-Id: <199406170503.AA19315@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Fri, 17 Jun 1994 07:03:02 +0200
In-Reply-To: ALB@immedia.ca
       "(TC304.190) Full-text searching: don't keep it simple and stupid!" (Jun 16, 17:23)
X-Charset: ASCII
X-Char-Esc: 29
Mime-Version: 1.0
Content-Type: Text/Plain; Charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Mnemonic-Intro: 29
X-Mailer: Mail User's Shell (7.2.2 4/12/91)
To: ALB@immedia.ca, bealle@torolab6.vnet.ibm.com, cpwg-mail@revcan.ca,
        paref@vm1.ulaval.ca, umavs@torolab6.vnet.ibm.com
Subject: Re: (TC304.190) Full-text searching: don't keep it simple and stupid!
Cc: i18n@dkuug.dk, sc22wg20@dkuug.dk, tc304@dkuug.dk

ALB@immedia.ca writes:

> Subject  : Full-text search: don't keep it simple and stupid
> 
> >Keld, my company (which produces a full text search product) is
> >attempting to establish character classes for various European
> >languages.  For most such languages, our users prefer that we
> >ignore case and accents.
> >
> >However, Danish seems to have some exceptions to this.  An 'O'
> >with a slash is treated as a separate letter.  Are there others?
> >For example, would users be upset if a search for "angstrom"
> >ignored the ring, or conversely, would they be upset if a search
> >with the ring did NOT find ones without (and vice-versa)?
> >What is normal practice in Denmark?
> 
> Keld answered, legitimately and correctly:
> 
> >In Denmark, the letters O WITH STROKE, AE and A WITH RING are genuine
> >letters and people would be very upset if it is not handled as such.
> 
> Now I think for French (and perhaps German and other languages too), the answer
> is unfortunately not as simple.

I agree with Alain that a number of parameters should be available,
so that different searches (for example, with regard to precision)
are possible.

The point of my comment above was that a cultural requirement is also
needed as a parameter, and that is not listed in Alain's model.
Or maybe you could say it is implicitly included, as the comparison
is based on a sorting algorithm, which may be culturally dependent,
as per the different national POSIX locales available.
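
As a rough illustration of what such a cultural parameter could look
like in a search routine, here is a minimal sketch. It is not Alain's
model and not a POSIX locale implementation; the names DISTINCT_LETTERS,
search_key and matches are hypothetical, the Danish entry follows the
letters named above, and the empty French entry is only an assumption
made for the example.

```python
import unicodedata

# Hypothetical per-culture tables of letters that must stay distinct in a
# search. The "da" entry follows the message above; the "fr" entry is an
# assumption for illustration only.
DISTINCT_LETTERS = {
    "da": set("æøå"),
    "fr": set(),
}


def search_key(text, culture):
    """Fold case and strip accents, except for letters the culture keeps."""
    keep = DISTINCT_LETTERS.get(culture, set())
    out = []
    for ch in unicodedata.normalize("NFC", text).casefold():
        if ch in keep:
            # A genuine letter in this culture: leave it untouched.
            out.append(ch)
            continue
        # Decompose the character and drop combining marks (category Mn).
        # Letters without a Unicode decomposition, such as "ø" and "æ",
        # pass through unchanged for every culture in this sketch.
        out.append("".join(
            c for c in unicodedata.normalize("NFD", ch)
            if unicodedata.category(c) != "Mn"
        ))
    return "".join(out)


def matches(query, text, culture):
    """Substring match on the culture-specific search keys."""
    return search_key(query, culture) in search_key(text, culture)
```

With these tables, matches("angstrom", "Ångström", "fr") finds the word,
while the same call with "da" does not, because the A WITH RING is kept
as a letter of its own. A real product would presumably take such
tables from the national locale definitions rather than hard-code them.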

Keld
