From kuhn@cs.purdue.edu  Fri Oct 11 03:05:13 1996
Received: from arthur.cs.purdue.edu (root@arthur.cs.purdue.edu [128.10.2.1]) by dkuug.dk (8.6.12/8.6.12) with ESMTP id DAA07880; Fri, 11 Oct 1996 03:05:11 +0100
Received: from ector.cs.purdue.edu (root@ector.cs.purdue.edu [128.10.2.10]) by arthur.cs.purdue.edu (8.7.6/PURDUE_CS-1.4) with ESMTP id VAA27621; Thu, 10 Oct 1996 21:05:06 -0500 (EST)
Received: from localhost (kuhn@localhost [127.0.0.1]) by ector.cs.purdue.edu (8.7.6/PURDUE_CS-1.4) with SMTP id VAA00071; Thu, 10 Oct 1996 21:05:03 -0500 (EST)
Message-Id: <199610110205.VAA00071@ector.cs.purdue.edu>
To: "Alain LaBont/e'/" <alb@sct.gouv.qc.ca>
cc: brannon@isocs.iso.ch (Keith B.), sc18wg9@dkuug.dk, sc22wg20@dkuug.dk
Subject: Re: (SC22WG20.1653) Personal contribution to ISO/IEC on some problems with WinWord or RTF formats for ISO/IEC electronic document distribution 
In-reply-to: Your message of "Thu, 10 Oct 1996 16:07:14 -0400."
             <199610102012.VAA00991@dkuug.dk> 
Date: Thu, 10 Oct 1996 21:05:01 -0500
From: kuhn@cs.purdue.edu ("Markus G. Kuhn")

In message <199610102012.VAA00991@dkuug.dk>, "Alain LaBont/e'/" wrote:

> I already signaled, as an editor, problems that were encountered in SC18/WG9
> with incompatibility of different national versions of WinWord format, which
> is one of the generic formats recommended by JTC1 as an acceptable format.

We have had this discussion many times before, so I will
keep my points brief:


File formats for the electronic distribution of ISO documents
-------------------------------------------------------------

WinWord as a document distribution format is *absolutely* unacceptable
for the following reasons:

  - The Microsoft Word BNF ("binary native format", this is the official
    Microsoft term) is not designed to be compatible across

      - Word versions
      - national variants of Word
      - platform variants of Word

    Microsoft has explicitly made no commitment to maintain
    compatibility. They promise only to provide upward-compatibility
    filters in new versions on the same platform and in the same
    national version (see the Word 7.0 BNF documentation, available
    from Microsoft on request).

  - There exist no document viewers for non-MS operating systems. At
    universities and research labs, it is VERY common that the desktop
    machines in daily use by scientists and developers run operating
    systems like Solaris, HP-UX, Linux, IRIX, or Plan 9. If anything
    done by ISO is politically incorrect, then it is requiring
    committee members to purchase otherwise unneeded software products
    from Microsoft just to be able to access electronically
    distributed documents.

  - WinWord requires documents to be reformatted if the output paper
    size is changed, which causes trouble as long as North America
    continues to use the 216x279 mm "U.S. Letter" format instead of
    ISO A4.

  - Because of WinWord macro viruses, in some environments the
    usage of externally supplied WinWord documents is not allowed
    by site security regulations.
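
To make the paper size point concrete, here is a small arithmetic
sketch. The A4 dimensions (210x297 mm) are not quoted in this message
but are the standard values, and the helper names are made up for
illustration only. A fixed-layout page can be shrunk as a whole to fit
the other sheet, whereas a word processor has to recompute line and
page breaks:

```python
# Paper sizes in millimetres as (width, height).
A4 = (210.0, 297.0)         # standard ISO A4, not quoted in the message
US_LETTER = (216.0, 279.0)  # rounded values as quoted in the message


def fit_scale(page, sheet):
    """Largest uniform scale at which `page` fits entirely on `sheet`."""
    return min(sheet[0] / page[0], sheet[1] / page[1])


def common_area(a, b):
    """Largest rectangle that fits on both sheets without any scaling."""
    return (min(a[0], b[0]), min(a[1], b[1]))


if __name__ == "__main__":
    print(f"A4 page on a Letter sheet: scale {fit_scale(A4, US_LETTER):.3f}")
    print(f"Letter page on an A4 sheet: scale {fit_scale(US_LETTER, A4):.3f}")
    print("Area common to both (mm):", common_area(A4, US_LETTER))
```

An A4 page has to shrink by about 6% to fit on Letter, and a Letter
page by about 3% to fit on A4. A text block kept within 210x279 mm
prints unscaled on both.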

WinWord and its BNF are certainly suitable tools for an editor to
prepare and maintain a working draft of a document locally, but so are
many other excellent word processing tools (FrameMaker, Interleaf,
WordPerfect, etc.).

The same applies to RTF.

Clearly the best final-form document distribution format currently
(October 1996) available is Adobe's PDF (Portable Document Format).

  - Numerous *freely* available PDF viewer applications can be obtained
    from various developers, for example at

      <URL:ftp://ftp.cs.wisc.edu/ghost/aladdin/>  (files ghostscript*NNNN*)
      <URL:http://www.adobe.com/acrobat/readstep.html>
      <URL:http://www.contrib.andrew.cmu.edu/usr/dn0o/xpdf/xpdf.html>

  - PDF ensures that the final results are identical on all output
    devices (even if the paper formats differ slightly like A4 vs.
    U.S. letter).

  - PDF files can be created either with the commercial Adobe Acrobat
    software or with the freely available Aladdin Ghostscript 4.01
    software (for URLs, see above).  Many popular word processing
    tools come with drivers that can write PDF files.

  - PDF is a technology very similar to PostScript, with the following
    advantages:

      - PDF has been specifically designed for archiving and distributing
        final-form electronic documents (unlike PostScript and all the
        native word processor file formats).

      - PDF does not have the portability problems that appear when
        printer-specific PostScript commands are used.

      - PDF is a compressed file format, whereas PostScript has to be
        combined with separate compression software, which is
        inconvenient, especially for the inexperienced user.

      - PDF compresses pages individually, which reduces
        memory requirements and increases access speed compared to
        PostScript compressed with some external tool.

      - PDF enforces an overall document structure (pages, etc.),
        while PostScript is a universal programming language with
        even some security problems (although not as serious as the
        WinWord macro virus problem).

      - On modern Web servers (those with the byte-range addressing
        extension), PDF allows readers to browse through individual
        pages of a PDF file without downloading the whole file.

      - PDF is very well integrated with Web browsers such as
        Netscape and Microsoft Internet Explorer. If you select a PDF
        link, either your PDF viewer will be started or you get
        directions on how to install one.  Even my little sister was
        recently able to install a PDF viewer with Netscape under Win95.
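
The points about per-page compression and byte-range access can be
illustrated with a toy sketch. This is NOT the actual PDF file layout:
zlib merely stands in for whatever compression filter a real PDF uses,
and the function names and the index structure are invented for this
example. It only shows why compressing pages separately lets a reader
fetch and decode one page without touching the rest:

```python
import zlib


def pack_pages(pages):
    """Compress each page separately and return (blob, index).

    index[i] is the (offset, length) of compressed page i inside blob.
    """
    blob = bytearray()
    index = []
    for text in pages:
        chunk = zlib.compress(text.encode("utf-8"))
        index.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def range_header(offset, length):
    """HTTP Range header value covering exactly one compressed page."""
    return f"bytes={offset}-{offset + length - 1}"


def read_page(blob, index, n):
    """Decompress page n alone, using only its own byte range."""
    offset, length = index[n]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")


if __name__ == "__main__":
    pages = [f"Page {i}: " + "lorem ipsum " * 200 for i in range(1, 6)]
    blob, index = pack_pages(pages)
    offset, length = index[2]
    print("Request header for page 3 -> Range:", range_header(offset, length))
    print(read_page(blob, index, 2)[:20])
```

With a file compressed as a whole by an external tool, reading page 3
would require decompressing everything before it first.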

For these reasons, practically all electronic component manufacturers'
web sites today offer data sheets in PDF, and this is a document
delivery application *very* comparable to the distribution of ISO
standard drafts and related documents (for one of many examples, look
at <URL:http://www.dalsemi.com/DocControl/PDFs/pdfindex.html>).

The best way to send PDF files by e-mail is to attach them as
MIME-encoded attachments with the "application/pdf" content type
indicator. This way, any decent e-mail software will automatically
start a PDF viewer when the mail is displayed. If your e-mail software
doesn't, you really should look for a new one. If you want to test
this on your system, just ask me for a test e-mail.
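
As a minimal sketch of what such a message looks like, the following
uses Python's standard email package to build a mail with one
"application/pdf" attachment. The addresses, the file name, and the
helper name build_pdf_mail are placeholders chosen for illustration,
and the PDF bytes are a dummy stand-in for a real file:

```python
from email.message import EmailMessage


def build_pdf_mail(sender, recipient, subject, body, pdf_bytes, filename):
    """Return a MIME message carrying pdf_bytes as application/pdf."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # plain-text part shown to the reader
    # The maintype/subtype pair becomes "Content-Type: application/pdf",
    # which is what lets the mail reader pick a PDF viewer.
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename=filename)
    return msg


if __name__ == "__main__":
    # Placeholder bytes; a real call would read them from a PDF file.
    dummy_pdf = b"%PDF-1.1\n"
    mail = build_pdf_mail("editor@example.org", "wg@example.org",
                          "Draft for review", "The draft is attached.",
                          dummy_pdf, "draft.pdf")
    print(mail.as_string())
```

The printed message shows the attachment part with its own
Content-Type header and a base64-encoded body.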

PDF uses the PostScript font technology. High-quality, freely
available replacements for the common standard PostScript fonts now
exist (see the Ghostscript 4.0 distribution, URL above). This
effectively makes both PDF and PostScript non-proprietary
technologies, as you can now work well with these formats without
having any Adobe software.

Summary:

  - PostScript is a nice document distribution format.
  - PDF is better than PostScript.
  - Neither depends on any specific product.
  - WinWord BNF is only suitable for local use by the editor(s)
    of a document (as are a number of other native wordprocessor
    formats) and not for wide distribution to readers.
  - For very simple non-technical texts, HTML and ASCII text are also
    good solutions.

This text is a short summary of what I have learned from several
years of frustrating experience with other document file formats and
electronic document solutions.  It is based on very careful and
in-depth research (e.g., I have the full WinWord BNF documentation) and
a lot of practical experience with various systems in various
environments on various platforms. Please do not dismiss this easily.
I can readily provide further help, references and comments on
alternative file formats if you are interested.

Suggestion: Let's convert Alain's recent drafts to PDF and see whether
we encounter any significant difficulties!
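
For anyone who wants to try this with Ghostscript, here is a rough
sketch of how a PostScript file might be converted. The options shown
are the ones documented for current Ghostscript releases (the pdfwrite
device); whether Aladdin Ghostscript 4.01 accepts exactly the same
command line is an assumption on my part, and the file names and
function names are placeholders:

```python
import subprocess


def gs_pdf_command(ps_path, pdf_path, gs="gs"):
    """Build a Ghostscript command line that writes pdf_path from ps_path."""
    return [
        gs,
        "-dBATCH",                     # exit after the last input file
        "-dNOPAUSE",                   # do not wait for a key press per page
        "-sDEVICE=pdfwrite",           # select the PDF output device
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


def convert(ps_path, pdf_path):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(gs_pdf_command(ps_path, pdf_path), check=True)


if __name__ == "__main__":
    # Only print the command here; call convert() to actually run it.
    print(" ".join(gs_pdf_command("draft.ps", "draft.pdf")))
```

The PostScript input would come from printing the draft to a file with
a PostScript printer driver.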

I was unpleasantly surprised when I first saw the ISO electronic
document distribution guidelines, and I hope to be able to contribute
to a better next version.

Feel free to redistribute or reuse this text.

Markus

P.S.: I did not talk about standardized document preparation tools,
because what I consider to be the ideal solution here (a system based
on James Clark's "Jade" SGML/DSSSL engine) will need at least another
half year of development before reaching maturity. If you want to have
an early look at what might well become the final and ideal SGML-based
solution, check <URL:http://www.jclark.com/>.

-- 
Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA -- email: kuhn@cs.purdue.edu
