From carson@siggraph.org  Fri Mar  6 06:11:17 1998
Received: from siggraph.org (siggraph.org [205.168.252.205]) by dkuug.dk (8.6.12/8.6.12) with ESMTP id GAA00297 for <SC24@dkuug.dk>; Fri, 6 Mar 1998 06:11:09 +0100
Received: from study.huntleigh.net by siggraph.org (SMI-8.6/SMI-SVR4)
	id WAA06027; Thu, 5 Mar 1998 22:09:07 -0700
Message-Id: <3.0.32.19980305221853.00fc66a0@siggraph.org>
X-Sender: carson@siggraph.org
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Thu, 05 Mar 1998 22:19:31 -0700
To: SC24@dkuug.dk, vrml-mpeg4@vrml.org
From: Steve Carson <carson@siggraph.org>
Subject: Text only version of comments on MPEG-4
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

The SC 24 Secretary will submit these comments to SC 29 at the close of
business today, Friday 6 March 1998. Please e-mail any concerns or
corrections to both myself and Dick Puk (carson@siggraph.org and
puk@igraphics.com) as soon as possible. Some formatting will not be correct
in this text version, so an MS Word version follows.

Comments
from
ISO/IEC JTC 1/ SC 24
on
ISO/IEC CD 14496-1 (MPEG-4)
(ISO/IEC JTC 1/ SC 29 N2291)

The following comments were prepared by the national bodies of ISO/IEC JTC
1/ SC 24 and the VRML Consortium through its Category C Liaison with
ISO/IEC JTC 1/SC 24.
General Comments:
1. Since CD 14496-1 (MPEG-4 Systems) makes normative reference to both the
abstract syntax and the semantics of ISO/IEC 14772-1 (VRML) there must be a
normative mapping included within CD 14496-1 (perhaps in a normative annex)
that precisely defines how:
a) the abstract syntax of the CD 14496-1 data stream maps to the abstract
syntax of VRML in sufficient detail to determine that the conformance
requirements of ISO/IEC 14772-1 regarding abstract syntax are being met;
b) nodes that are listed in Clause 7.2 as "common" between CD 14496-1 and
ISO/IEC 14772-1 map;
c) nodes that are unique to CD 14496-1 (and not present in VRML) but that
have functionality related to presentation of and interaction with
information can be realised as conforming extensions of ISO/IEC 14772-1
using the available extension mechanisms (PROTO and EXTERNPROTO).
 The above information should be provided for the 2D, 3D, VRML and complete
profiles as defined in Sub-clauses 7.8.1.1 through 7.8.1.4, but not
necessarily for the audio profile as defined in Sub-clause 7.8.1.5.
2. There is no RER document accompanying this specification as required by
JTC1 Procedures. This document should be created and should accompany the
FCD ballot.
3. It is not clear how the MPEG-4 standard plans to cope with the evolution
of the VRML standard. In consequence, it is likely that incompatibilities
between VRML and MPEG-4 will arise as VRML is extended. For example, what
happens if a VRML extension is similar to, but incompatible with, an
extension defined by CD 14496-1? As the normative parts of this document
refer frequently to the VRML specification, VRML must be a normative
reference. As a consequence, SC29 should investigate how the two should
stay in step. SC24 and the VRML Consortium believe that simply referencing
a specific version of ISO/IEC 14772-1 is an insufficient alignment
strategy, since both VRML and the many commercial products that conform to
it will evolve in an orderly and planned way. A plan for maintaining
alignment should be formulated and published.
4. The way the semantics for the new nodes is described in the MPEG-4
document is muddled with the details of the encoding. The semantics and
abstract syntax of nodes defined in MPEG-4 should be given first with the
encoding of those nodes separately defined later in the standard. This will
make it much easier to distinguish between semantics, syntax, and encoding.
5. New nodes to support 2D have been included in CD 14496-1 to satisfy some
perceived requirements of MPEG-4. Many of these requirements are also
useful for users of VRML. However, the design provided in CD 14496-1 does
not integrate well with the ISO-standardised VRML functionality. The VRML
Consortium has initiated a project to carefully architect 2D functionality
into VRML which not only will satisfy the MPEG-4 requirements but also
satisfy on-going requirements of the VRML Community. A preliminary design
for these nodes is attached as Exhibit A which is considered an inherent
part of these comments. It should be emphasised that this preliminary
design is only an initial draft, but it is the base document for the first
planned amendment to ISO/IEC 14772-1. Work on PDAM 1 to IS 14772-1 is underway
but could not be completed in time for these comments. The intent of the 2D
design presented in Exhibit A is to minimise the number of nodes while still
meeting all MPEG-4 requirements and allowing the specification of a 2D-only
profile for VRML. While the encoding of the new nodes in binary format will
not be part of this amendment, the same techniques used by MPEG-4 for
creating a binary encoding of the 3D nodes should be straightforward.
6. Throughout:  References to IS 14772-1 refer to "Sections". However,
International standards refer to such subdivisions as "Clauses". See Part 3
of the ISO Directives for rules on specifying references to sub-clauses of
other standards.
7. For all Semantic Tables: There is semantic information in the font used
to specify the field/event names as defined in IS 14772-1. This same
semantic information has been lost from the semantic tables contained in
CD14496-1 and should be recovered.
8. For all Semantic Tables: Events (both eventIns and eventOuts) do not
have default values. The entry in the tables for these items should be
blank.
9. Each of the new nodes being introduced which define geometry (e.g.,
Circle and Curve2D) should have accompanying figures illustrating the
geometry.
10. For all Semantic Tables:  A format for this table was agreed at the
Fribourg meeting but was not used in the preparation of CD 14496-1. Since
the details of the agreed format are critical for the binary encoding, an
additional ballot round at CD (not FCD) is requested to provide adequate
review time.
11. It is absolutely crucial that ALL nodes defined in IS 14772-1 be
supported in the first ISO/IEC 14496-1 standard. Since several nodes have
been left out of this specification, it must be considered incomplete.
Unless all nodes are supported, use of VRML content will not be possible
and implementations of IS 14496-1 will not be able to be considered conforming
implementations of IS 14772-1. This would be unfortunate for both MPEG-4
and VRML.
12. A critical part of the VRML functionality has been left out of CD
14496-1. This is the PROTO and EXTERNPROTO mechanism. These nodes are
needed to support effective world authoring. PROTO increases the compression
ratio of a single BIFS stream if similar scene graph parts are repeated
within it. The implementation is not particularly difficult because it is
quite similar to macros in programming languages. EXTERNPROTO increases the
compression ratio of multiple BIFS streams if similar scene graph parts are
repeated across them. Again, the implementation is not particularly
difficult because it is quite similar to macros in programming
languages. The special form of the EXTERNPROTO mechanism which allows
browser-specific implementations need not be supported. The EXTERNPROTO
mechanism is a highly effective means of pre-caching node definitions at
the end user site (i.e., the terminal) so that they need not be downloaded
with every world. The VRML Consortium is even now in the process of
identifying a standard object library which can be downloaded once and then
used by all worlds which wish to access the objects. Such objects include
PROTO definitions, textures, and audio clips. Most VRML content uses the
PROTO and EXTERNPROTO facility to make authoring easier and more efficient.
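
The size argument made for PROTO in comment 12 can be illustrated with a toy model. The sketch below is an assumption-laden illustration only: it compares the length of VRML-style text, not of a BIFS binary stream, and the prototype name Pillar and its position field are hypothetical.

```python
# Toy model: a repeated scene graph part written out in full at every use,
# versus declared once as a prototype and then instantiated by name.
# "Pillar" and its "position" field are hypothetical names for illustration.

DECLARATION = (
    "PROTO Pillar [ field SFVec3f position 0 0 0 ] "
    "{ Transform { translation IS position children "
    "[ Shape { geometry Box { size 1 2 1 } } ] } }"
)


def expanded_use(x: float, z: float) -> str:
    """The full subtree, repeated at every use site."""
    return (
        f"Transform {{ translation {x} 0 {z} children "
        f"[ Shape {{ geometry Box {{ size 1 2 1 }} }} ] }}"
    )


def proto_use(x: float, z: float) -> str:
    """A use site that only names the prototype and passes its argument."""
    return f"Pillar {{ position {x} 0 {z} }}"


positions = [(float(i), float(i % 3)) for i in range(20)]

expanded_scene = "\n".join(expanded_use(x, z) for x, z in positions)
proto_scene = "\n".join([DECLARATION] + [proto_use(x, z) for x, z in positions])

print(len(expanded_scene), len(proto_scene))
```

With twenty repeats the prototype form is shorter because the subtree is stated once; the saving grows with the number of repeats and with the size of the repeated part.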
13. Pixel coordinate addressing is incompatible with VRML and should not be
used. The Image(2D) (sub-clause 7.2.5.2.2.8) and VideoImage(2D) (sub-clause
7.2.5.2.2.24) nodes can define the rendering of their content such that the
content elements are mapped to single pixels. However, the positioning of
these nodes should be in standard transformation units.
14. The portion of the specification concerned with Facial Animation (in
particular, sub-clause 7.2.3.3) effectively fills the need for which it was
designed. However, this design uses one rather narrowly focused animation
technique and does not allow for easy integration with other techniques. It
is difficult to author and places a heavy burden on authors who wish to use
other techniques. Other valid techniques include (but are not limited to)
keyframed CoordinateInterpolators, morph targets, joint animation, texture
map animation, and animated "cut-out" shapes. There are also other
parameter systems for facial animation. It is important that animators who
want to use other techniques not have to carry the burden of FAPs (or any
other high-level technique) by default. A structure such as the one being
developed by the VRML Consortium H-Anim Working Group should be adopted to
allow selection of the most appropriate technique.
Technical Comments:
1. Figure 7-1:  The terms DMIF and CB have not been previously defined.
2. Clause 7.2.1.2: In the second sentence, the term "syntax" is not
correct. Both the syntax and the semantics of the constructs must be
specified for a scene description to be properly presented.
3. Clause 7.2.2.1: the term "attribute" is typically used in computer
graphics to specify modifiers of geometry. It is suggested that the more
general term "property" be used to avoid confusion. Note that, in VRML,
nodes have both geometric and appearance properties. Typically, the
appearance properties take on the term attributes. Thus, bullet item 1 in
this sub-clause becomes more clear. As currently stated, there is no bullet
item indicating that the geometry of a node (or the pixels in an image
texture) are part of the scene description when clearly they are. It is
also unclear whether geometric nodes such as a sphere have "audio/video
properties". Typically, throughout the industry the term "audio/video"
refers to aural and visual (video) streams only.
4. Clause 7.2.2.2, 1st paragraph. This paragraph is inaccurate. VRML
specifies a complete spectrum of audio/video as well as synthetic graphics
elements within a scene description. This paragraph should be reworded as:
"The BIFS scenes are described conforming to the provisions of IS 14772-1
with additional BIFS-specific nodes. The combination provides the following
features:
· 2D only primitives
· 3D only primitives
· A mix of 2D and 3D primitives, in several ways:
  · 2D and 3D complete scenes layered in a 2D space with depth
  · 2D and 3D scenes used as texture maps for 2D or 3D primitives
  · 2D scenes drawn in the local X-Y plane of the local coordinate system in
    a 3D scene"
In addition, the term "primitives" is not used in VRML and should not be
used in BIFS. It is suggested that this term be replaced by "geometric
nodes". Note also that there is nothing in VRML that keeps a properly
formed VRML world from being composed only of aural nodes with no geometric
nodes present.
5. Clause 7.2.2.3: The "2D coordinate system" should not be finite in
extent. In fact, 2D coordinate systems should be considered a degenerate
case of 3D coordinate systems. When a Viewport2D node is specified, any
limitations on a conceptually infinite 2D coordinate system should then be
stated as part of this environmental construct.
6. Clause 7.2.2.3, 1st paragraph:  This paragraph implies that coordinate
units may be non-square. This should, in fact, only happen when non-uniform
scaling is being applied by a Transform node. Conceptually, all coordinate
units should be considered of the same size unless such scaling is being
applied. The mapping of any coordinate system (2D or 3D) to the rendering
surface should assume that no such scaling occurs during this mapping. It
should be up to the implementation of the compositor to adapt for any
non-square pixels in the implementation hardware.
7. There is no clear definition of what should happen when the aspect ratios
of the top Layer2D and the device display area are different. The possible
mappings from the top Layer2D to the device rendering area are:
· fit the longer edge keeping the aspect ratio and allow blank area beside
the shorter edges;
· fit the shorter edge keeping the aspect ratio and allow clipping in the
direction of the longer edge;
· fit both edges without keeping the aspect ratio;
· leave it as it is described in Layer2D;
· allow the aspect ratio and extent of the Layer to be different and
independent of the display surface.
The above mappings should be defined, and should be selectable from the
content.
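
As a rough sketch of how the first four mappings in comment 7 differ, the function below computes the horizontal and vertical scale factors from layer units to display units. The function name and the mode names are hypothetical, and the fifth mapping (an extent independent of the display surface) is not modelled.

```python
def layer_to_display_scale(layer_w, layer_h, disp_w, disp_h, mode):
    """Return (scale_x, scale_y) for mapping a layer onto a display area.

    Mode names are illustrative only:
      "letterbox": keep aspect ratio, whole layer visible, blank bars allowed
      "crop":      keep aspect ratio, display fully covered, clipping allowed
      "stretch":   fit both edges, aspect ratio not kept
      "none":      leave the layer as described, no scaling
    """
    sx = disp_w / layer_w
    sy = disp_h / layer_h
    if mode == "letterbox":
        s = min(sx, sy)  # the smaller factor guarantees nothing is clipped
        return s, s
    if mode == "crop":
        s = max(sx, sy)  # the larger factor guarantees no blank area
        return s, s
    if mode == "stretch":
        return sx, sy
    if mode == "none":
        return 1.0, 1.0
    raise ValueError(f"unknown mode: {mode}")


# A 4:3 layer shown on a 16:9 display area.
for mode in ("letterbox", "crop", "stretch", "none"):
    print(mode, layer_to_display_scale(4.0, 3.0, 16.0, 9.0, mode))
```

The same content therefore renders at three different sizes depending on the mapping, which is why the choice needs to be defined and controllable from the content.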
8. Clause 7.2.2.5: The first row of the table is not correct. IS 14772-1
considers all coordinates to be specified in meters. There should be no
difference between 2D and 3D coordinate units.
9. Clause 7.2.2.8.2: The term "runtime" is nowhere defined.
10. Clause 7.2.2.9: 1024 is an inadequate number of identifiers. It is
quite easy to create complicated worlds that require many more than that
number. Note that there are essentially an infinite number in VRML since
arbitrary combinations of the allowed characters may be used. It is
suggested that a BIFS control node be provided which specifies the number
of bits which are to be used for this purpose. Alternatively, a 32-bit
field would effectively remove this restriction.
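
For reference, a limit of 1024 identifiers corresponds to a 10-bit field. The sketch below shows the arithmetic a configurable bit width would rest on; the function name is hypothetical and nothing here is taken from the CD text.

```python
def id_bits_needed(identifier_count: int) -> int:
    """Smallest field width, in bits, that can number this many identifiers."""
    if identifier_count < 1:
        raise ValueError("at least one identifier is required")
    # Identifiers are numbered 0 .. identifier_count - 1.
    return max(1, (identifier_count - 1).bit_length())


for count in (1024, 1025, 100_000, 2**32):
    print(count, id_bits_needed(count))
```

A world with 100,000 named nodes would need 17 bits under this scheme, so a width declared per stream costs little for small worlds while removing the ceiling for large ones.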
11. Clause 7.2.2.12.1.1: The use in this paragraph of "For instance" is
inappropriate in an ISO/IEC standard. Each such item required by BIFS
should be enumerated.
12. Clause 7.2.2.13.7.1: "ROUTEs" are not nodes. The third action makes no
sense as currently written.
13. Clause 7.2.2.14.1: In the 6th paragraph, it is not clear what the text
"[number 1 through k(1)]" means. It would seem that the range being
indicated has one data type (1) on one end and another data type (k(1)) on
the other. Should this be "1 .. j" or "k(1) .. k(j)"?
14. Clause 7.2.2.14.5:  This concept is useful and
is not limited to 2D. But placing ordering information in the child nodes
leads to inefficient traversals. This information should be in a parent
node explicitly designed for ordering. An OrderedGroup node should be
specified which would behave identically to the Group node (and can contain
both 2D and 3D nodes within the group) but would also specify a drawing
order field.  See Exhibit A for more information.
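
A minimal sketch of the idea in comment 14 follows, assuming the parent holds one ordering value per child. The class and field names are hypothetical and are not taken from Exhibit A.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class OrderedGroup:
    """Hypothetical parent node that owns the drawing order of its children."""

    children: List[Any] = field(default_factory=list)
    order: List[float] = field(default_factory=list)  # one value per child

    def draw_sequence(self) -> List[Any]:
        """Children sorted by ascending order value; ties keep list order."""
        if len(self.order) != len(self.children):
            raise ValueError("order needs exactly one entry per child")
        ranked = sorted(zip(self.order, range(len(self.children))))
        return [self.children[index] for _, index in ranked]


group = OrderedGroup(children=["text", "background", "image"],
                     order=[2.0, 0.0, 1.0])
print(group.draw_sequence())
```

Because the ordering lives in one place, a traversal can sort once at the parent instead of visiting every child to discover its ordering value.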
15. Figure 7-10 is highly confusing and nowhere described. It should both
be referenced from the text and its content described.
16. Clause 7.2.3.1.2: This description does not conform to VRML. VRML
requires that multiple WorldInfo nodes be supported and that they occur
anywhere. These nodes are the means of providing copyright and various
other information which must accompany the world content. It is also used
by many VRML worlds for parametrizing the worlds via PROTOs so that the
parameters can be accessed by Scripts.
17. Clause 7.2.4.1.10: In the 3rd paragraph past the equations, the
reference to ANSI C is inappropriate. In the case of a binary encoding, all
floating point numbers should be encoded as IEEE Floating Point with an
appropriate entry in Clause 2 for the IEEE standard for floating point
numbers. The ANSI C float specification applies only to a string
representation of a floating point number.
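
To make the distinction in comment 17 concrete, the sketch below encodes a value as a 4-byte IEEE 754 single-precision number rather than as a decimal string. The helper names are hypothetical, and the big-endian byte order is an assumption made for the example, not something stated in the CD.

```python
import struct


def encode_float32(value: float) -> bytes:
    """Pack a value as IEEE 754 single precision, big-endian (assumed order)."""
    return struct.pack(">f", value)


def decode_float32(data: bytes) -> float:
    """Unpack 4 bytes produced by encode_float32."""
    return struct.unpack(">f", data)[0]


print(encode_float32(1.0).hex())            # bit pattern of 1.0
print(decode_float32(encode_float32(0.5)))  # 0.5 is exactly representable
print(decode_float32(encode_float32(0.1)))  # 0.1 is rounded to the nearest single
```

The encoded form has a fixed size and a bit layout defined by the IEEE standard, which is what a binary encoding needs to reference normatively.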
18. Clause 7.2.4.1.10: The bullet list should actually be an enumerated list. In
the first bullet item, it is not clear that the encoding used retains all
precision of the associated floating point value. Such precision must not
be lost since these normals are used in the lighting equations which can be
quite sensitive to the values provided.
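
The precision concern in comment 18 can be illustrated with a generic example. The sketch below uses plain uniform quantization of one normal component in [-1, 1] as a stand-in; it is an assumption for illustration and is not the encoding defined in the CD.

```python
def quantize(component: float, bits: int) -> int:
    """Map a component in [-1, 1] to an integer code of the given width."""
    levels = (1 << bits) - 1
    return round((component + 1.0) / 2.0 * levels)


def dequantize(code: int, bits: int) -> float:
    """Map an integer code back to a component in [-1, 1]."""
    levels = (1 << bits) - 1
    return code / levels * 2.0 - 1.0


original = 0.3
for bits in (4, 8, 16):
    restored = dequantize(quantize(original, bits), bits)
    print(bits, restored, abs(restored - original))
```

Under this scheme the worst-case error per component is half a quantization step, 1 / (2**bits - 1), so the number of bits directly bounds how far a decoded normal can drift from the authored one.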
19. Clause 7.2.4.2.3.1: The 3rd sentence should be removed since it adds
nothing to the description and gives the false implication that only 2D
nodes need this facility. In actual fact, VRML requires that nodes in a
group be identified by position in the children list.
20. Clause 7.2.5.1.2.1.2: The differences between AnimationStream and
MovieTexture nodes should be enumerated here. It is difficult to try and
identify these differences from reading the detailed semantics. It is also
not clear what is being streamed by the AnimationStream node.
21. Clause 7.2.5.1.2.1.3: Since MovieTexture nodes are only used for
textures, it is not clear that the same semantics apply to AnimationStream
nodes which seem to be a series of commands which may do alterations to the
scene graph and other actions. If the AnimationStream only applies to
textures, it should be explicitly so stated; otherwise, the comparison with
MovieTexture nodes should be removed and the exact behaviour of
AnimationStream nodes described. The current detailed semantic seems to
intermix the concepts of AnimationStream node and MovieTexture node
indiscriminately.
22. Clause 7.2.5.1.2.10.3, 2nd paragraph:  Character value 13 only refers
to carriage return. It does not imply a linefeed.
23. Clause 7.2.5.1.2.10.3, 3rd paragraph. There is no such thing as a
FontStyle2D node. The correct node type is FontStyle.
24. Clause 7.2.5.1.2.11: While this node is unnecessary, it could be left
in even though more flexible and powerful functionality is already
supported using Script nodes. As it is, the description is unclear as to
exactly what the semantic of this node is. In addition, this node is
misnamed. The term Valuator in international standards when applied to
information presentation refers to an input class which can return a
continuous value usually derived from some physical input device such as a
dial. This node should be either removed (since Script nodes can provide
the same functionality more flexibly) or, at a minimum, renamed to be
something else (e.g., EventMapper).
25. Clause 7.2.5.1.3.2.1: The first field of the VRML node is missing and
must be supported for conformance.
26. Clause 7.2.5.1.3.2.3: The 2nd paragraph mentions a children field but
there is no children field in this node. In any case, that statement is
meaningless since an AudioClip node only accesses a single sound source.
27. Clause 7.2.5.1.3.7.2: This sub-clause contains statements which attempt
to duplicate information in IS 14772-1 but may actually change the semantic
of the node so as to be non-conforming. Only information which restricts
the semantic in a manner unique to MPEG-4 should be included.
28. Clause 7.2.5.1.3.7.2: The question "(CH: What do we do if it is not
available?)" was posed in the text. This is obviously an editorial comment
left over from an early draft. The answer is that this should not occur;
otherwise the world is non-conforming.
29. Clause 7.2.5.1.3.10.3, 2nd paragraph:  A list of valid nodes should be
specified.
30. Clause 7.2.5.1.3.10.4, 2nd paragraph: The text "apparent spatialization
position" should be replaced by "geometry".
31. Clause 7.2.5.2.2.1: This node is unnecessary and should be removed
since it duplicates the functionality of the Background node. It is easy to
define a PROTO which restricts the Background node to the desired
functionality. Note that it has been suggested that if the skyColor field
is set to an empty Color node, a reasonable interpretation would be to define
the sky as being transparent. This is allowable in ISO/IEC 14772-1 and
could be mandated in CD 14496-1.
32. Clause 7.2.5.2.2.6.3: This description discusses width and height
fields when the semantic table only contains a size field. The detailed
semantics should be rewritten appropriately to refer to the size field.
33. Clause 7.2.5.2.2.7: This node is unnecessary and should be removed. The
current Group node is perfectly adequate for the concept.
34. Clause 7.2.5.2.2.8.1: This node is unduly restrictive. There should be
an additional field which allows the position of the image to be specified.
35. Clause 7.2.5.2.2.8.2: This node has nothing to do with the ImageTexture
node since this is not a texture node. References to the ImageTexture node
should be removed from the Detailed Semantics.
36. Clause 7.2.5.2.2.8.3, 2nd paragraph: It is more common for this type of
node to be positioned using the upper left corner.
37. Clause 7.2.5.2.2.8.3: It is not clear from the semantics whether this
node behaves like a Billboard node (always orienting its image to the
screen plane) or like other 2D nodes where the image is painted on the
current z=0 plane and thus is subject to transformation.
38. Clause 7.2.5.2.2.9: This node is unnecessary and should be removed. It
has the same parametrization as the current IndexedFaceSet node. The
dimensionality can easily be determined by the Coordinate node in the coord
field.
39. Clause 7.2.5.2.2.10: This node is unnecessary and should be removed. It
has the same parametrization as the current IndexedLineSet node. The
dimensionality can easily be determined by the Coordinate node in the coord
field.
40. Clause 7.2.5.2.2.11: This node is unnecessary and should be removed.
The current Inline node is perfectly capable of inlining either a 2D or 3D
world.
41. Clause 7.2.5.2.2.12.3, 3rd paragraph:  The list of preferred text
breaking points should include "after hyphens".
42. Clause 7.2.5.2.2.13:  If linewidth is to be specified, it is also
necessary to specify fields defining cap and join style.
43. Clause 7.2.5.2.2.13.2: This node should apply to all nodes which
generate lines. Whether they generate the lines in a single plane or not is
independent of the dimensionality.
44. Clause 7.2.5.2.2.13.3: The lineStyle field provides a very restrictive
set of line styles. There is already an international standard for
specifying line widths. See the specification in IS 9592-1:1997 (PHIGS). A
similar mechanism is specified in IS 8632:1992 (CGM). It is not necessary
to use an indirect specification but the technique for describing an
arbitrary dash pattern is what is needed. It is suggested that a lineStyle
node be created which contains the "dash cycle repeat length" (in meters)
and the list of dash segment lengths specified in arbitrary units relative
to one dash cycle. Then the lineProperties node would contain fields
specifying the Adaptability, Continuity, and Offset along with a field for
the lineStyle.
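
The dash description suggested in comment 44 can be sketched as follows, assuming segments alternate drawn and gap within each cycle, starting with drawn. The function name is hypothetical, and the Adaptability, Continuity, and Offset controls of PHIGS and CGM are not modelled.

```python
def dash_intervals(line_length, cycle_length, segments):
    """Return the drawn (start, end) intervals along a line.

    cycle_length is the dash cycle repeat length in coordinate units.
    segments are lengths in arbitrary units relative to one cycle,
    alternating drawn, gap, drawn, gap within each cycle (assumed).
    """
    if cycle_length <= 0 or not segments:
        raise ValueError("need a positive cycle length and at least one segment")
    if min(segments) < 0 or sum(segments) <= 0:
        raise ValueError("segment lengths must be non-negative and not all zero")
    unit = cycle_length / sum(segments)  # coordinate units per arbitrary unit
    intervals = []
    position = 0.0
    index = 0
    while position < line_length:
        slot = index % len(segments)
        end = min(position + segments[slot] * unit, line_length)
        if slot % 2 == 0 and end > position:  # even slots are drawn
            intervals.append((position, end))
        position = end
        index += 1
    return intervals


# Equal dash and gap, repeating every 0.5 units, along a line of length 1.
print(dash_intervals(1.0, 0.5, [1, 1]))
```

Because the segment list is relative, the same pattern can be reused at any scale by changing only the repeat length.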
45. Clause 7.2.5.2.2.13.3: In the 2nd paragraph after the bullet list, it
is noted that widths are considered geometric entities which are affected
by transformations. This usually prohibits the use of any hardware assisted
wide line capability. In addition, this assumption is computationally
expensive and hence is likely to impact performance. This is especially
true since the width specification is in local coordinate systems which
means that the width can change over the extent of the line. It is
suggested that the width specification instead be considered cosmetic and
unaffected by transformation. This is typically done by allowing an
implementation to choose a nominal width of line to which is applied a
multiplicative linewidth scale factor specified in a field.
46. Clause 7.2.5.2.2.14.3, 1st paragraph: The diffuseColor field should
probably be an emissiveColor field since the diffuseColor field is only
applied when lighting is in effect.
47. Clause 7.2.5.2.2.14.3, 2nd paragraph: The filled field is inappropriate
for a Material node. It should be part of the geometric node definition.
48. Clause 7.2.5.2.2.14.3, 3rd paragraph. The default behavior when this
field is not specified is ill-formed. It forces an implementation to use an
entire pixel for the width thus preventing any effective anti-aliasing from
being applied. Instead a nominal width line should be specified.
49. Clause 7.2.5.2.2.14.3, 4th paragraph. The Shadow node renders a second
polygon under the first, offset by the amount given in the given colour.
This sort of approximate shadow generation has been widely used in
presentation graphics for many years. Unfortunately, it is neither a
powerful nor a useful technique by modern standards and fundamentally
conflicts with shadows from true 3D lighting techniques. While it is true
that some present systems do not implement shadows as part of their
lighting models, the sort of approximation that this node suggests is best
left out of a standard and added manually by the artist using an authoring
system when needed. Note that this shadowing technique would be a good area
for a PROTO implementation. Why is the semantic table specified differently
from the others? An SC24 representative spent considerable time producing
semantic tables in this form by the end of the Fribourg meeting but then
they were not used when all that was needed was to cut and paste them.
Please use one form or the other but not both.
50. Clause 7.2.5.2.2.15.2: This node is not a texture and hence should not
be described in terms of texture nodes. In fact, it is a peculiar form of
geometry which is not well-specified in as much as it is not clear whether
the resultant display is affected by the associated transforms, is
displayed parallel to the screen plane or in the current Z=0 plane, or even
where it is positioned.
51. Clause 7.2.5.2.2.16: This node is unnecessary and should be removed.
The current PlaneSensor node provides all of the functionality.
52. Clause 7.2.5.2.2.17: This node is unnecessary and should be removed.
The PointSet node can be used with the dimensionality determined by the
content of the coord field.
53. Clause 7.2.5.2.2.18: This node is unnecessary and should be removed.
The PositionInterpolator node can provide this functionality.
54. Clause 7.2.5.2.2.21: This node is unnecessary and should be removed.
Shadows should be computed based on the effects of lights in the scene.
VRML allows such shadows to be rendered if a browser wishes. This node
interferes with that at least on a conceptual level.
55. Clause 7.2.5.2.2.23: This node is unnecessary as the Transform node can
be constrained to perform only 2D effects.
56. Clause 7.2.5.2.2.24: This clause seems to duplicate clause 7.2.5.2.2.15
but is more consistently presented. However, it still has the same problems
in its .2 sub-clause.
57. Clause 7.2.5.3.3.13: The semantics of this node are specified in IS
14772-1. Only the restrictions should be specified in this clause.
58. Clause 7.2.5.3.3.15.2: Only the ability to reference BIFS-Updates and
BIFS-Anim should be described here. The second sentence should be replaced
by "The external source may produce BIFS-Updates and BIFS-Anim frames."
59. Clause 7.2.5.3.3.28.2, last paragraph: Here the semantics are undefined
yet in the Group node the semantics are TBD. Both should be specified as
being undefined.
60. Clause 7.2.5.3.3.29.2: This paragraph breaks most current VRML content
and is not conforming to ISO/IEC 14772-1. The semantic should be identical
to that defined in ISO/IEC 14772-1.
61. Clauses 7.2.5.4.2.1 and 7.2.5.4.2.2: These nodes should only control
the layering effect and should be combined into a single layer node.
Ordering effects should be controlled if desired by encapsulating the
children in an OrderedGroup node. There is no need to have two nodes for
this control. There should be a single Layer node which can handle children
of any dimensionality. In particular, each layer should be separately
rendered including resetting the depth buffer between layers. We are still
investigating the issue of possibly separating clipping from layering.
62. Clauses 7.2.5.4.2.1 and 7.2.5.4.2.2:  The size field, including its
implied behaviour, must be better defined. This node would be better termed
a Viewport.
63. Clauses 7.2.5.4.2.3 and 7.2.5.4.2.4:  These should be combined into a
single CompositeTexture node which defines the scene to be used as a
texture. Any valid set of children nodes should be allowed. This
functionality is independent of dimensionality.
64. Clauses 7.2.5.4.2.3 and 7.2.5.4.2.4:  It is not clear why there should
be any restriction on the attachment of sensors to an object to which a
SceneTexture is applied. However, it is reasonable for there to be a
restriction on the interpretation of sensors in the children subtree of the
CompositeTexture node. It is better to allow such sensors to exist but
be ignored than to be prohibited. In this manner, existing worlds can be used
to produce a texture by in-lining them. Note that the children of a
CompositeTexture node are not really part of the geometry of the scene
graph of which the CompositeTexture node is a part. Instead, there is a
separate local scene graph used only to produce the texture represented by
the CompositeTexture node.
65. Clause 7.2.5.4.2.5:  This node is redundant and duplicates the
functionality already provided by the CompositeTexture node. This
unnecessarily increases the footprint of the compositor.
Editorial Comments:
1. Throughout: The presentation of bullet and numbered lists should be made
consistent and should follow the style defined in Part 3 of the ISO
Directives. Note that these directives indicate that enumerated lists
should use lower case letters for the first enumeration level.
2. Throughout:  The Normal style should be set up to provide leading before
and/or after each paragraph so that proper and consistent leading occurs
between paragraphs. Extra empty paragraphs should not be used to provide
leading between paragraphs.
3. Throughout:  The document should be spell checked.
4. Throughout:  The phrase "enables to" is not acceptable English. It is
used many times throughout the document. Each occurrence should be found
and the surrounding text rewritten to remove it.
5. Throughout:  Several different representation styles for the
contractions used for the terms "two-dimensional" and "three-dimensional"
are used. A single, consistent abbreviation format should be chosen for
both abbreviations and then applied throughout.
6. Throughout:  Equations are typically centred on their own lines in
international standards. It is suggested that this be done in this standard
to ensure clarity and proper layout of the equations. See Part 3 of the ISO
Directives for guidance.
7. Table of Contents:  Section 7.2.5.3.3.27 (Semantic Table) is actually a
subsection of the previous section on the Spotlight node.
8. Clause 0.2.2:  The phrase "a known amount of receiver buffers" is poor
English. It is suggested that it be replaced by "a known amount of receiver
buffer resource".
9. Clause 0.2.3:  There is an inappropriate paragraph mark in the middle of
the paragraph.
10. Throughout:  The word "must" should be replaced by "shall" or otherwise
reworded.
11. Clause 0.5.1:  The phrase "must be properly identified" should be
replaced by "require proper identification".
12. Clause 0.5.2:  The second paragraph should have the following text
appended:  "BIFS is an encoding of the elements of IS 14772-1, the Virtual
Reality Modeling Language (VRML) with additional elements and constructs."
13. Clause 0.5.3:  There is an inappropriate paragraph mark in the middle
of the paragraph.
14. Clause 3:  According to Part 3 of the ISO Directives, the contents of
this clause belong in the preceding clause since they are all normative
references.  Clause 3 should then be removed. Also, "DIS 14722-1"
(reference 2) is now ISO/IEC 14772-1. This occurred in December 1997.
15. Clause 3: Reference 7 is inappropriate and should be replaced by a
reference to ISO/IEC 10646 of which it is a part.
16. Clause 2/3: Non-ISO standards being used in MPEG-4 have not been
referenced. These include applicable IETF recommended practices such as the
one for Uniform Resource Locators. See ISO/IEC 14772-1 for proper
references.
17. Clause 7:  Why does this clause start on a new page while previous
clauses do not? The document should be consistent.
18. Clause 7.1.2.2:  The occurrences of the abbreviation (e.g.) differ in
presentation format (i.e., "e.g.:" vs. "e.g. "). A consistent format should
be used.
19. Clause 7.1.2.3: Part 3 of the ISO Directives states that notes should
be in a font size two points smaller than the standard presentation size.
20. Clause 7.2: Why is there so much white space before this sub-clause?
7.2 should immediately follow 7.1.4.3.
21.  Clause 7.2.1.3.3:  The use of 1st person constructs is inappropriate
in an international standard.
22. Figure 7-8: This is a Table not a Figure and should be properly
labelled as such. See Part 3 of the ISO Directives.
23. Clause 7.2.2.6:  The first occurrence of the word "different" should be
"differently".
24. Clause 7.2.2.8.1: The 1st paragraph should be reworded as follows: "For
each of the basic data types, single field and multiple field data types
are defined in IS 14772-1:1997, Clause 5.2. Some further restrictions are
described herein."
25. Clause 7.2.2.13.3:  Is this a "Working Draft" as stated herein or a
"Committee Draft" as stated elsewhere? This is apparently left over from a
previous version.
26. Clause 7.2.2.13.5: The spelling of "audio-visual" varies throughout the
document. A consistent spelling should be used.
27. Clause 7.2.2.13.6:  What does the construct "Sub-clause ..." mean? It
appears that there is some sort of linked and embedded resource that was
not included within the file.
28. Clause 7.2.2.13.8: The phrase "allow to trigger events" is poor
English. It should be replaced by "allow triggering of events".
29. Clause 7.2.2.14: The two bullet items are indented too far.
30. Clause 7.2.2.14.1: The font used for this document does not clearly
differentiate between the numeral "1" and the lower case letter "l" thus
making the sixth paragraph confusing. It is suggested that the example
index be changed from "l" to "j" to avoid this problem.
31. Clause 7.2.2.14.2.1: The table in this sub-clause does not have a table
number or table title. It should be both numbered and titled and then
referenced by number in the text. See Part 3 of the ISO Directives for
table title positioning and format.
32. Clause 7.2.2.16: The term "browser" has not been previously defined.
Since this term is not typically used in MPEG-4, it is suggested that it be
replaced by the term "compositor".
33. Clause 7.2.2.17.2:  The first sentence of this sub-clause is poorly
written. It is suggested that the phrase "enables to change" be replaced by
"supports external changes to".
34. Clause 7.2.2.17.2, 2nd Sentence: The term "aspect" is incorrect. A
better term would be "appearance".
35. Clause 7.2.2.17.2, last Sentence: The term "behaviour" is unclear. It
is suggested that this be replaced by "ROUTEs".
36. Clause 7.2.2.17.2.1: The phrase "time instant in time" should be
"instant in time".
37. Clause 7.2.2.17.2.1: The 2nd sentence is poor English and should be
rewritten as follows: "However, continuous changes of the parameters of the
scene are best provided using the animation scheme described in 7.2.2.17.3."
38. Clause 7.2.2.17.2.1: The text "The Repeat Scene command enables to
repeat all the updates from the last Replace Scene." is poorly written. It
should be replaced by "The Repeat Scene command may be used to replay all
updates since the last Replace Scene command.".
39. Clause 7.2.2.17.2.1, last sentence: The text "identification of node
field" should be "identification of a node field"
40. Clause 7.2.2.17.2.2: The term "BIFS –Update" should be "BIFS-Update".
The entire document should be checked for consistent usage and presentation
of terms.
41. Clause 7.2.2.17.3: The text '" BIFS-Anim "' seems to have unnecessary
spaces within the quotation marks.
42. Clause 7.2.2.17.3.1: The text associated with the enumerated list is
improperly formatted.
43. Clause 7.2.2.17.3.1: When two enumerated lists exist in the same sub
clause, the second list begins its enumeration with the first available
enumerant after the end of the previous list.
44. Clause 7.2.4.1.1: The term "associated to" is poor English. It should
be "associated with".
45. Clause 7.2.4.1.3, 2nd sentence: "ony" should be "only".
46. Clause 7.2.4.1.4: In the 1st sentence, the term "def field" is
confusing. Should this not be "field defined for the node"? Then the
following use of "defined" should be replaced by "provided".
47. Clause 7.2.4.2.5.5: The phrase "consists in" is poor English. It should
be replaced by "consists of".
48. Clause 7.2.4.3: In the penultimate sentence of the 1st paragraph, the
"i.e.," should be removed or the entire parenthetical expression enclosed
in parentheses instead of commas.
49. Clause 7.2.4.3: The 2nd paragraph cannot be understood. It needs
rewriting.
50. Clause 7.2.4.3.2.4:  See Part 3 of the ISO Directives for the proper
format for referencing other parts of the same standard.
51. Clause 7.2.4.3.2.4: The table should have a table title, which is
then used in the reference within the text. See Part 3 of the ISO
Directives for information about table titles.
52. Clause 7.2.5.1.2.1.3:  What is a "VOP"? It is not clear from the
context and is not included in the list of abbreviations.
53. Clause 7.2.5.2.2.1.1: The content of this sub-clause should be kept
together.
Exhibit A
2D in VRML
A Contribution of the VRML Consortium
5 March 1998
Overview
CD 14496-1 (MPEG-4 Systems) adopts the architecture, abstract syntax and
semantics (including node structure) of ISO/IEC 14772-1 (VRML) as the basis
for BIFS. CD 14496-1 also includes additional nodes designed to satisfy
identified requirements of MPEG-4 for 2D representations and for additional
control structures. After review of the CD 14496-1 specification, the VRML
Consortium finds that the functionality inherent in these extensions is of
general utility to the VRML community. This contribution defines a
different 2D architecture that can be viewed as a refinement of the present
CD 14496-1 architecture. This redefinition both satisfies the requirements
of MPEG-4 and more closely matches the VRML architecture. This contribution
is the base document for a future amendment 1 to IS 14772-1. Adopting 2D
functionality and a 2D-only profile of ISO/IEC 14772-1 is a major aspect
of this first amendment.
Note that this amendment will also change ISO/IEC 14772-1 to require
mandatory support for the scripting languages whose interfaces are defined
in Annexes B and C. In fact, this decision was made by the VRML Consortium
in July 1997 but only recently has the state of product offerings matured
sufficiently so that such mandatory support could be required without
severe interoperability problems.
This contribution reduces the number of 2D and 2D/3D integration nodes
without loss of functionality or significant additional overhead of
transmission. It makes the nodes more consistent with the current VRML
nodes and allows for efficient implementation of either a 2D-only subset or
a full 2D/3D set.
Details
New MPEG-4 Nodes for 2D
In sub-clause 7.2.5.2, CD 14496-1 defines these nodes for 2D:
Background2D
Circle
Coordinate2D
Curve2D
DiscSensor
Form
Group2D
Image2D
IndexedFaceSet2D
IndexedLineSet2D
Inline2D
Layout
LineProperties
Material2D
PlaneSensor2D
PointSet2D
Position2DInterpolator
Proximity2DSensor
Rectangle
ShadowProperties
Switch2D
Transform2D
VideoObject2D
Additionally, sub-clause 7.2.5.4 defines several nodes supporting the
integration of 2D and 3D:
Layer2D
Layer3D
Composite2DTexture
Composite3DTexture
CompositeMap
The rationale for a full set of 2D nodes appears to be based on these three
assertions:
1) A full set of nodes allows a separate 2D-only profile.
2) A full set of nodes allows specifying coordinates in 2-space rather than
3-space.
3) A full set of nodes allows optimised transformation and rendering of 2D
shapes.
One of the motivations for these nodes as cited in discussions between the
VRML and MPEG-4 communities is to allow for drawing interaction controls on
the MPEG-4 terminal display. The user can then use this "dashboard" to
interact with the world. In VRML, this is normally done outside the world
using browser controls or using HTML in other frames. We understand that
these options are not considered possible for many uses of MPEG-4.
Regarding the first assertion, it is certainly possible to have a 2D-only
profile comprising a subset of 3D nodes, perhaps even with diminished
functionality (e.g., IndexedFaceSets must have z=0).

The intent of the second assertion can be provided by defining a
Coordinate2D node which has only two coordinate components. This can then
be binary encoded by having a QuantizationParameter specifying that all 3D
coordinates are specified with 2 values (x and y) with an implied z value
of 0. Alternatively, the implied z value could be specified in the
QuantizationParameter for greater flexibility.
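
As an illustration only, the implied-z idea can be sketched in Python. The
function name expand_2d_points and the implied_z parameter are hypothetical
and appear in neither CD 14496-1 nor ISO/IEC 14772-1; the sketch assumes
only that a decoder receives a flat list of x, y pairs and must hand 3D
points to a renderer.

```python
from typing import List, Sequence, Tuple


def expand_2d_points(flat_xy: Sequence[float],
                     implied_z: float = 0.0) -> List[Tuple[float, float, float]]:
    """Turn a flat [x0, y0, x1, y1, ...] list into (x, y, z) points.

    implied_z stands in for a z value that would be carried once (for
    example in a QuantizationParameter) instead of once per point.
    """
    if len(flat_xy) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    return [(flat_xy[i], flat_xy[i + 1], implied_z)
            for i in range(0, len(flat_xy), 2)]
```

With this approach only two values per point are transmitted, and the third
is supplied by the receiver.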

Regarding the third assertion, a 2D-only profile would have sufficient
restrictions and defaults to allow an optimised transformation and renderer
to be created. For instance, a Transform node would be required to have:
    translation	- z must equal 0
    rotation	- vector must be 0 0 1 (rotations must be about z)
    scale 	- z must equal 1
    center 	- z must equal 0
    scaleOrientation	- vector must be 0 0 1
This leaves the "drawingOrder" field as the only remaining difference
between Transform and Transform2D. The manner in which this can be resolved
is described below.
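
The restrictions listed above could be checked mechanically. The following
Python sketch is one possible conformance test, not a normative definition:
the function name is_2d_transform is hypothetical, fields are modelled as
plain tuples, and the default arguments are the Transform defaults of
ISO/IEC 14772-1.

```python
from typing import Sequence

Z_AXIS = (0.0, 0.0, 1.0)


def is_2d_transform(translation: Sequence[float] = (0.0, 0.0, 0.0),
                    rotation: Sequence[float] = (0.0, 0.0, 1.0, 0.0),
                    scale: Sequence[float] = (1.0, 1.0, 1.0),
                    center: Sequence[float] = (0.0, 0.0, 0.0),
                    scale_orientation: Sequence[float] = (0.0, 0.0, 1.0, 0.0)) -> bool:
    """Return True if the Transform fields satisfy the 2D-only restrictions."""
    return (translation[2] == 0.0                       # z must equal 0
            and tuple(rotation[:3]) == Z_AXIS           # rotation about z only
            and scale[2] == 1.0                         # z must equal 1
            and center[2] == 0.0                        # z must equal 0
            and tuple(scale_orientation[:3]) == Z_AXIS) # axis must be 0 0 1
```

A 2D-only implementation that can rely on this check passing is free to use
a 2D affine matrix in place of the general 3D transformation.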
2D Node Disposition
Based on the rationale given above, many 2D nodes can be removed from
sub-clauses 7.2.5.2 and 7.2.5.4 of CD 14496-1 without loss of functionality
or efficiency. The nodes that we believe can be eliminated and the
suggested replacement strategy for each is given below:
Background2D	Use the Background node. A 2D-only profile would only allow
the frontUrl to be specified. MPEG-4 also needs a method of allowing
underlying content to show through, whereas VRML currently forces a sky as
a minimum. The interpretation of the Background node field skyColor when
empty would now be that the "sky" is transparent. This is compatible with
the current IS 14772-1, which does not state an interpretation for the case
of this field being empty.
Coordinate2D	It may be possible to remove this node also by degenerating
the Coordinate node to handle 2D coordinates as well as 3D coordinates.
This can certainly be done using the QuantizationParameter node of the
binary encoding. Whether there is a way of doing the same thing for the UTF-8
encoding is being investigated.
DiscSensor	Use the CylinderSensor, restricted in a 2D-only profile to
provide the functionality of the DiscSensor.
Group2D	Use Group.
IndexedFaceSet2D	Use IndexedFaceSet. The coordinate overhead is taken care
of by using a Coordinate2D node in the coord field, and a 2D-only profile
can specify that the concept of "solid" is ignored.
IndexedLineSet2D	Use IndexedLineSet. The coordinate overhead is taken care
of by using Coordinate2D node in the coord field.
Inline2D	Use Inline
LineProperties	See Material2D
Material2D	This node has a "filled" field, which causes IndexedFaceSet to
be filled or unfilled. But that duplicates the functionality of
IndexedLineSet. A filled flag is needed for Rectangle and Circle (and it is
assumed it applies to Curve2D as well). It is agreed that the Circle,
Rectangle, and Curve primitives need to be filled or unfilled. There are two
alternatives: separate nodes, like IndexedFaceSet and IndexedLineSet, or a
"filled" field in each node. The latter is better. This would be
conceptually similar to the "beginCap" field or the "solid" field, both of
which are contained in the geometric nodes they control.
	The lineStyle is an SFInt32 with 6 line styles (such as solid or
dashed-dotted-dotted). Other line styles which might be needed are not
supported. A better mechanism is to support dash definitions as described
in Technical Comment 44 of the main document. Also, this node defines the
line width, but not the join style and cap style of the lines. This is very
important once lines are wider than about 3 pixels.
	It is agreed that Circle, Rectangle, and Curve primitives are needed.
These primitives also need to be filled or unfilled. There are two
alternatives: separate nodes, like IndexedFaceSet and IndexedLineSet, or a
"filled" field in each node. The latter is better. There seems to be no
good reason for extra nodes, and this field is similar to the "beginCap"
field or the "solid" field, both of which are in the nodes they control.
However, the meaning of "filled" in the case of the Curve node must be
defined.
	Next is the issue of controlling the line style. The idea of a LineStyle
node (even a limited lineStyle field) is a good one, but it should be
referenced by the nodes using it, rather than the Material2D node. This
makes it look like the Text node, which references the FontStyle node.
	So, the following 3 nodes are required:
    Circle {
      field SFFloat radius    1
      field SFNode  lineStyle NULL
      field SFBool  filled    TRUE
    }
    Curve {
      exposedField SFNode  point     NULL
      exposedField SFInt32 fineness  0
      field        SFNode  lineStyle NULL
      field        SFBool  filled    TRUE
    }
    Rectangle {
      field SFVec2f size      2 2
      field SFNode  lineStyle NULL
      field SFBool  filled    TRUE
    }
	Then the Material2D property should be removed and the Material property
be used in its place. For 2D-only profiles (which do not include lighting)
only the emissiveColor and transparency fields would be used.
LineProperties	This node should be renamed LineProperty to fit with the
rest of VRML.
PlaneSensor2D	Use PlaneSensor. Normal restrictions would apply for a
2D-only profile.
PointSet2D	Use PointSet
Position2DInterpolator	Use PositionInterpolator
Proximity2DSensor	Note that this node has NO description. In a 2D profile
without navigation, what use is a ProximitySensor? This node should be
removed. No replacement functionality is needed for a 2D-only profile.
Rectangle	Redesigned as above
ShadowProperties	Is this really necessary? Is it that much more expensive
to reuse the Rectangle, Curve, etc. nodes, transformed and with a
different Material? This would be especially inexpensive if MPEG-4 had the
ability to create a "ShadowedObject" PROTO.
Switch2D	Use Switch
Transform2D	Use Transform (see discussion on drawingOrder below).
If the nodes listed above are eliminated from sub-clauses 7.2.5.2 and
7.2.5.4 of CD 14496-1, the set of remaining 2D nodes is:
Circle
Curve
Form
Layout
LineProperty
Rectangle
VideoObject2D
Image2D
A new node should be added to sub-clause 7.2.5.4 to handle the issue of
drawing order. This would have semantics for both 2D and 3D scenes as
follows:
OrderedGroup {
  eventIn	MFNode	addChildren
  eventIn	MFNode	removeChildren
  exposedField	MFNode	children	[]
  exposedField	MFInt32	order	[]
  field	SFVec3f	bboxCenter	0 0 0
  field	SFVec3f	bboxSize	-1 -1 -1
}
This is simply a group with an extra "order" field. This field specifies
the desired drawing order, with one value per child. Children with the
lowest value are drawn first, highest are last. Children with the same
order value are drawn earliest child first. Any children without a value
are also drawn earliest child first. That makes the default order (empty)
draw children from first to last. For 2D scenes, this simply layers higher
ordered children on top of lower ordered children. For 3D scenes this
properly layers children with identical z values without producing z
tearing. For instance, this would allow a rectangle with a texture of a
painting to be placed on a wall without that rectangle z tearing. There are
well known algorithms for doing this in a z-buffer renderer (in fact OpenGL
has a special extension to handle it). OrderedGroup would perform the same
job as the "drawingOrder" field of Transform2D, but it would do it more
efficiently, and it would be useful in 3D scenes as well.
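
The ordering rule can be sketched in Python as a stable sort. The function
name drawing_order is hypothetical. The text above does not say where
children lacking an order value fall relative to those that have one; this
sketch ASSUMES they are drawn after all valued children, which is only one
possible reading.

```python
from typing import List, Sequence


def drawing_order(num_children: int, order: Sequence[int]) -> List[int]:
    """Return child indices in the sequence in which they would be drawn."""
    valued = [i for i in range(num_children) if i < len(order)]
    # ASSUMPTION: children beyond the end of the order list come last.
    unvalued = [i for i in range(num_children) if i >= len(order)]
    # sorted() is stable, so children sharing an order value keep
    # their earliest-child-first arrangement.
    return sorted(valued, key=lambda i: order[i]) + unvalued
```

An empty order field yields the children in their listed sequence, which
matches the default behaviour described above.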

2D/3D Node Disposition
The Layer2D and Layer3D nodes in sub-clause 7.2.5.4 should be replaced by a
single Layer node. It is not clear why the Layer3D node has "background",
"fog", "navigationInfo", and "viewpoint" fields. These should be handled by
normal VRML semantics of the nodes in the "children" field of the Layer
node. That is really the only difference between these nodes. Also, the
depth field should be replaced by the "order" concept from OrderedGroup. It
is much simpler for a parent node to have knowledge of the desired
rendering order of its children rather than having to traverse each child
twice, once to find out the rendering order and again to do the actual
rendering. So, the following node is defined to replace the Layer2D and
Layer3D nodes in the present subclauses 7.2.5.4.2.1 and 7.2.5.4.2.2:
Layer {
	field	SFNode	child	NULL
	exposedField	MFNode	childrenLayer	[]
	exposedField	SFVec2f	translation	0 0
	exposedField	SFVec2f	size	-1 -1
}
Note that the children MFNode has been replaced with a private "child"
SFNode field. This is because giving this node children would make it like
a Group node, and as such it would need bbox fields and
addChildren/removeChildren events. A single child avoids that. If access to
the children is needed, the "child" field can be a Group node and the
children accessed through it.

The Composite2DTexture and Composite3DTexture nodes in sub-clauses
7.2.5.4.2.3 and 7.2.5.4.2.4 should be combined as well. A better name for
this node would be SceneTexture since it makes it more obvious what it
does. The following defines this node:
SceneTexture {
	field	SFNode	child	NULL
	exposedField	SFVec2f	size	-1 -1
	field	SFBool	repeatS	TRUE
	field	SFBool	repeatT	TRUE
}
It is not clear why the CompositeMap is needed. If it is an optimisation to
place a SceneTexture onto a simple rectangle it seems to be overkill. It is
not clear how this would optimise the rendering of 2D objects into a 3D
scene. In order to maintain correct perspective you would pretty much need
to separately render and texture the result. Even if such an optimisation
were devised, how hard is it to detect that the geometry was a simple
rectangle (or whatever simplified geometric shape the optimisation could be
applied to)? Sub-clause 7.2.5.4.2.5 defining CompositeMap should be removed.
To satisfy the requirement for an orthographic projection, an
OrthoViewpoint node would be added. This node would have the
following definition:
OrthoViewpoint {
	eventIn	SFBool	set_bind
	exposedField	SFFloat	aspectRatio	0
	exposedField	SFFloat	height	2
	exposedField	SFBool	jump	TRUE
	exposedField	SFRotation	orientation	0 0 1 0	# [-1,1],(-∞,∞)
	exposedField	SFVec3f	position	0 0 10	# (-∞,∞)
	field	SFString	description	""
	field	SFBool	adjustViewport	FALSE
	eventOut	SFTime	bindTime
	eventOut	SFBool	isBound
}
This is simply a Viewpoint node with the added "adjustViewport" field, and
the "fieldOfView" field replaced with "aspectRatio" and "height". The
defaults would create a viewpoint where 0,0,0 would be in the centre of the
viewport and the corners of a perfectly square viewport would be (1,1) and
(-1,-1). A non-square viewport would adjust the width (and therefore the
corner coordinates) to be wider or narrower than the specified height
field. In all other ways (binding, description, etc.) this node would
behave just like a Viewpoint node. Note that I have left in the 3D position
and orientation fields. I think this is important for consistency. In a
pure 2D environment their default values would give reasonable results.
Changing the position, and even the orientation in a pure 2D environment
gives very useful effects and their overhead is quite minimal. Moreover,
they are needed for this node to be used in 3D.
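
The relationship between the "height" field and the visible extent can be
sketched in Python. The function name ortho_corners and the use of pixel
dimensions to obtain the viewport shape are assumptions made for
illustration. The "aspectRatio" and "adjustViewport" fields are left out
because their semantics are not spelled out above.

```python
from typing import Tuple

Corner = Tuple[float, float]


def ortho_corners(height: float,
                  viewport_width_px: int,
                  viewport_height_px: int) -> Tuple[Corner, Corner]:
    """Return the (lower-left, upper-right) corners of the visible area."""
    if viewport_width_px <= 0 or viewport_height_px <= 0:
        raise ValueError("viewport dimensions must be positive")
    half_h = height / 2.0
    # The width follows the viewport shape; the height field stays fixed.
    half_w = half_h * (viewport_width_px / viewport_height_px)
    return (-half_w, -half_h), (half_w, half_h)
```

With the default height of 2, a square viewport spans (-1,-1) to (1,1), and
a viewport twice as wide as it is high spans (-2,-1) to (2,1).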
The result of making these changes is a re-written sub-clause 7.2.5.4.2
containing only three new sub-clauses, each defining one of these new nodes:
7.2.5.4.2.1 OrderedGroup
7.2.5.4.2.2 Layer
7.2.5.4.2.3 SceneTexture
Conclusions
The above 2D node architecture provides all the functionality of the
current CD 14496-1 design plus a bit more. It also removes the artificial
separation of 2D and 3D nodes. Removal of this artificial separation is a
primary goal of the VRML Consortium. Note: along with the above redesign,
the QuantizationParameter node would need a few added fields to take into
account optimisations to coordinate and transformation parameters.




Steve Carson
Chair, ISO/IEC JTC 1/SC 24
Computer Graphics and Image Processing
---------------------------------------------------------
Steve Carson                 phone:   +1-505-521-7399
GSC Associates Inc.          fax:     +1-505-521-9321
5272 Redman Road             e-mail:  carson@siggraph.org
Las Cruces, NM 88011 USA
---------------------------------------------------------
