Moving Worlds Design

A Proposal for VRML 2.0

Last modified: January 15, 1996. This document can be found at http://webspace.sgi.com/moving-worlds/Design.html

This document describes the "why" of the Moving Worlds VRML 2.0 proposal-- why design decisions were made, why things were changed from VRML 1.0. It is written for armchair VRML designers and for the people who will be implementing VRML 2.0.

Simplifying the scene structure

There has been a lot of feedback from people implementing VRML 1.0 that the very general scene structure and property inheritance model of VRML 1.0 makes its implementation unnecessarily complex. Many rendering libraries (such as RealityLab, RenderMorphics, IRIS Performer) have a simpler notion of rendering state than VRML 1.0. The mismatch between these rendering libraries and VRML causes performance problems and implementation complexity, and these problems become much worse in VRML 2.0 as we add the ability to change the world over time.

To ensure that VRML 2.0 implementations are low-memory and high performance, the Moving Worlds VRML 2.0 proposal makes two major changes to the basic structure of the node hierarchy:

Shape properties (material, texture, shapeHints) are moved to become an integral part of the shape.
Transformation and Separator nodes are combined, so that a Transform defines a coordinate system relative to its parent.

To make this change, two new nodes are introduced (the Shape and Appearance nodes), several are removed (Translate, Rotate, Scale, Separator, and MatrixTransform), and a few nodes are changed (Transform, IndexedFaceSet); this change has the added benefit of making VRML simpler.

Design

The decisions on how to partition functionality into separate objects were motivated mainly by considerations of what should or should not be individually sharable. Sharing (DEF/USE in VRML 1.0, also known as 'cloning' or 'multiple instancing') is very important, since it allows many VRML scenes to be much smaller on disk (which means much shorter download times) and much smaller in memory.

One extreme would be to allow absolutely ANYTHING in the VRML file to be shared, even individual numbers of a multiple-valued field. Allowing sharing on that fine a level becomes an implementation problem if the values are allowed to change-- and the whole point of behaviors is to allow values in the scene to change. Essentially, some kind of structure must be kept for anything that can be shared that may also later be changed.

I considered allowing any field to be shared, but I believe that even that is too burdensome to implementations, since there may not be a one-to-one mapping between fields in the VRML file and the implementation's in-memory data structures.

VRML 1.0 allows nodes to be shared (via DEF/USE), and allowing sharing of any node seems reasonable, especially since events (the mechanism for changing the scene graph) are routed to nodes and because as much compatibility with VRML 1.0 as possible is one of the goals of the Moving Worlds proposal.

Shape

A new node type is introduced-- the Shape node. It exists only to contain geometry and appearance information, so that geometry+appearance may be easily shared. It contains only two fields; the geometry field must contain a geometry node (IndexedFaceSet, Cube, etc) and the appearance field may contain one or more appearance properties (Material, Texture2, etc):

Shape {
    field SFNode appearance
    field SFNode geometry
}

The three-way decomposition of shapes (Shape/Geometry/Appearance) was chosen to allow sharing of entire shapes, just a shape's geometry, or just the properties. For example, the pieces of a wooden chair and a marble table could be re-used to create a wooden table (shares the texture of the wooden chair and the geometry of the marble table) and/or to create multiple wooden chairs.
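As a rough illustration of the three sharing granularities, here is a minimal Python sketch. It is not VRML and not part of the proposal: the class names mirror the node names, but the fields (texture, name) are simplified stand-ins chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    texture: str        # stand-in for Material/Texture2 properties


@dataclass
class Geometry:
    name: str           # stand-in for an IndexedFaceSet, Cube, ...


@dataclass
class Shape:
    appearance: Appearance
    geometry: Geometry


wood = Appearance(texture="wood")
marble = Appearance(texture="marble")
chair_geom = Geometry(name="chair")
table_geom = Geometry(name="table")

wooden_chair = Shape(wood, chair_geom)
marble_table = Shape(marble, table_geom)

# Share just the properties of one shape and just the geometry of another.
wooden_table = Shape(wooden_chair.appearance, marble_table.geometry)

# Share an entire shape: both entries refer to the same object.
chairs = [wooden_chair, wooden_chair]
```

Nothing is copied in either case; each re-use is one more reference to an existing object, which is the effect DEF/USE is meant to have on file size and memory.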

It is an error to specify the same property more than once in the appearance array; doing so produces undefined results.

Geometry

The existing VRML 1.0 geometry types are modified as necessary to include the geometric information needed to specify them. For example, a vertexData field is added to the IndexedFaceSet node to contain Coordinate3, TextureCoordinate2 and Normal nodes that define the positions, texture coordinates and normals of the IndexedFaceSet's geometry. In addition, the fields of the ShapeHints node are added to IndexedFaceSet.

These changes make it much easier to implement authoring tools that read and edit VRML files, since a Shape has a very well-defined structure with all of the information necessary to edit the shape contained inside of it. They also make VRML "cleaner"-- for example, in VRML 1.0 the only shape that pays attention to the ShapeHints node is the IndexedFaceSet. Therefore, it makes a lot of sense to put the ShapeHints information INSIDE the IndexedFaceSet.

Groups

Shapes and other "Leaf" classes (such as Cameras, Lights, Environment nodes, Info nodes, etc) are collected into a scene hierarchy with group nodes such as Transform and LOD. Group nodes may contain only other group nodes or leaves as children; adding an appearance property or geometry directly to a group node is an error.

VRML 1.0 had a complicated model of transformations; transformations were allowed as children of group nodes and were accumulated across the children. This causes many implementation problems even in VRML 1.0 with LOD nodes that have transformations as children; the addition of behaviors would only make those problems worse.

Allowing at most one coordinate transformation per group node results in much faster and simpler implementations. Deciding which group nodes should have the transformation information built-in is fairly arbitrary; obvious choices would be either "all" or "one". Because we believe that transformations for some of the group nodes (such as LOD) will rarely be useful and maintaining fields with default values for all groups will be an implementation burden, we have chosen "one" and have added the fields of the old VRML 1.0 Transform nodes to the Transform node:

Transform {
    field SFVec3f    translation         0 0 0
    field SFRotation rotation            0 0 1  0
    field SFVec3f    scaleFactor         1 1 1
    field SFRotation scaleOrientation    0 0 1  0
    field SFVec3f    center              0 0 0
    field SFVec2f    textureTranslation  0 0
    field SFFloat    textureRotation     0
    field SFVec2f    textureScaleFactor  1 1
    field SFVec2f    textureCenter       0 0
}

These allow arbitrary translation, rotation and scaling of either coordinates or texture coordinates.

Side note: we are proposing that the functionality of the MatrixTransform node NOT be supported, since most implementations cannot correctly handle arbitrary 4x4 transformation matrices. We are willing to provide code that decomposes 4x4 matrices into the above form, which will take care of most current uses of MatrixTransform. The minority of the VRML community that truly need arbitrary 4x4 matrices can define a MatrixTransform extension with the appropriate field.

Classes

The nodes that can appear in a world are grouped into the following categories:

Groups:: Transform, LOD, Switch, WWWAnchor; Group nodes may ONLY have other groups or leaves as children.
Leaves:: Shape, Lighting, Cameras, Info-type nodes (Info, WorldInfo, etc); Leaf nodes are things that exist in one or more coordinate systems (defined by the groups that they are part of)
Geometry:: IndexedFaceSet, IndexedLineSet, PointSet, Sphere, Cube, etc; Geometry nodes are contained inside Shape nodes. They in turn contain geometric properties
Geometric properties:: Coordinate3, Normal, TextureCoordinate2; Are contained inside geometry.
Appearance properties:: Material, Texture2; Are contained inside Appearance nodes, which are contained within Shapes, and define the shape's appearance.
Geometric Sensors:: ClickSensor, PlaneSensor; Are contained inside Transforms, and generate events with respect to the Transform's coordinate system and geometry.
WWWInline:: WWWInline cuts across all of the above categories (assuming that it is useful to externally reference any of the above).
Nodes:: All of the above, plus TimeSensors and Script nodes, which are not part of the world's transformational hierarchy; Nodes contain data (stored in fields), and may be prototyped and shared.

Why Appearance in a separate node?

Bundling properties into an Appearance node simplifies sharing, decreases file and run-time bloat and mimics modelling paradigms where one creates a palette of appearances ("materials") and then instances them when building geometry. Without Appearances, there is no easy way of creating and identifying a "shiny wood surface" that can be shared by the kitchen chair, the hardwood floor in the den, and the Fender Strat hanging on the wall.

Another major concern of VRML in general and the Appearance node in particular is expected performance of run-time implementations of VRML. It is important for run-time data structures to closely correspond to VRML; otherwise browsers are likely to maintain 2 distinct scene graphs, wasting memory as well as time and effort in keeping the 2 graphs synchronized.

The Appearance node offers 2 distinct advantages for implementations:

Memory is saved in the Shape node since it has only a single pointer to an Appearance node rather than many pointers to individual property nodes. In my experience there are orders-of-magnitude more Shape nodes than Appearance nodes so any memory bloat in Shape is a problem. As VRML expands to include more property nodes, Shape bloat becomes even more of an issue.
The Appearance node facilitates state sorting and other optimizations that are applicable to both hardware and software implementations. For example, an implementation can quickly determine if 2 shapes have the same appearance by checking for pointer equality rather than comparing every property reference of a shape. As another example, the Appearance node offers a good place to maintain a cache which in the case of a software implementation may be a pre-wired path of rendering modules.
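The pointer-equality test and the state sorting it enables might look like the following Python sketch. It is illustrative only: the classes are hypothetical stand-ins, id() plays the role of the pointer value, and count_state_changes is a helper invented here to make the benefit measurable.

```python
class Appearance:
    def __init__(self, name):
        self.name = name


class Shape:
    def __init__(self, name, appearance):
        self.name = name
        self.appearance = appearance


def same_appearance(a, b):
    # Identity test: one comparison, no per-property comparison.
    return a.appearance is b.appearance


def sort_by_state(shapes):
    # Group shapes that share an Appearance object.
    return sorted(shapes, key=lambda s: id(s.appearance))


def count_state_changes(shapes):
    # Number of times a renderer would have to switch appearance
    # if it drew the shapes in the given order.
    changes, current = 0, None
    for s in shapes:
        if s.appearance is not current:
            changes += 1
            current = s.appearance
    return changes


shiny_wood = Appearance("shiny wood")
marble = Appearance("marble")
scene = [Shape("chair", shiny_wood), Shape("table", marble),
         Shape("floor", shiny_wood), Shape("statue", marble)]

unsorted_changes = count_state_changes(scene)
sorted_changes = count_state_changes(sort_by_state(scene))
```

In this toy scene the draw order alternates between the two appearances, so sorting halves the number of state switches.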

Prototypes

There are several different ways of thinking about prototypes:

An extensibility mechanism that allows a new node to be defined in terms of other, pre-defined nodes. Prototypes can replace all of the existing VRML 1.0 extension mechanisms (isA and fields[]).
A protection mechanism that allows an author to limit what can be done to an object. EXTERNPROTO allows authors to specify how their worlds/objects/behaviors can be used within other worlds/objects/behaviors.
An object-definition mechanism that allows somebody to define objects with specific (standardized) operations allowed. Prototypes allow application-specific policies to be imposed on the general scene structure.
A convenience mechanism that allows geometry and/or behavior to be packaged in a convenient way.
An optimization mechanism that allows browsers to reason about which object can or cannot be changed.
A bandwidth-saving mechanism that allows a world structure to be defined once and re-used multiple times.

The prototype declaration

A prototype's interface is declared using one of the following syntaxes:

PROTO name [ field    fieldType name defaultValue
             eventIn  fieldType name
             eventOut fieldType name
           ] { implementation }
EXTERNPROTO name [ field    fieldType name
                   eventIn  fieldType name
                   eventOut fieldType name
                 ] URL(s)

(there may be any number of field/eventIn/eventOut declarations in any order).

A prototype just declares a new kind of node; it does not create a new instance of a node and insert it into the scene graph. That must be done by creating a prototype instance.

First, why do we need to declare a prototype's interface at all? We could just say that any fields, eventIns or eventOuts of the nodes inside the prototype's implementation exposed using the IS construct (see below) are the prototype's interface. As long as the browser knows the prototype's interface it can parse any prototype instances that follow it.

The declarations are necessary for EXTERNPROTO because a browser may not be able to get at the prototype's implementation. Also requiring them for PROTO makes the VRML file more readable (it is much easier to read the PROTO declaration than to look through reams of VRML code for nodes with IS) and makes the syntax more consistent.

Default values must be given for a prototype's fields so that they always have well-defined values (it is possible to instantiate a prototype without giving values for all of its fields, just like any other VRML node). Default values must not be specified for an EXTERNPROTO because the default values for the fields will be defined inside the URL that the EXTERNPROTO refers to.

EXTERNPROTO refers to one or more URLs, with the first URL being the preferred implementation of the prototype and any other URLs defining less desirable implementations. Browsers will have to be able to deal with the possibility that an EXTERNPROTO's implementation cannot be found because none of the URLs are available (or the URL array is empty!); browsers may also decide to "delay-load" a prototype's implementation until it is actually needed (like they do for the VRML 1.0 WWWInline node).

Browsers can properly deal with EXTERNPROTO instances without implementations. Events will never be generated from such instances, of course, so that isn't a problem. The browser can decide to either throw away any events that are routed to such an instance or to queue them up until the implementation does become available. If it decides to queue them up, the results when they're finally processed by the prototype's implementation could be indeterminate IF the prototype generates output events in response to the input events. A really really smart browser could deal with this case by performing event rollback and roll-forward, re-creating the state of the world (actually, only the part of the world that can possibly be influenced by the events generated from the prototype needs to be rolled forward/back) when the events were queued and "re-playing" input events from there.
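One possible shape for this logic is sketched below in Python. Everything here is hypothetical: the class names, the fetch callable (which returns an implementation or None), and the example.com URLs are assumptions made for illustration, and the sketch covers only the simple discard-or-queue choice, not rollback.

```python
class ExternProto:
    def __init__(self, name, urls):
        self.name = name
        self.urls = list(urls)        # preferred implementation first
        self.implementation = None    # stays None until a URL resolves

    def try_load(self, fetch):
        # Try each URL in order of preference; an empty list simply fails.
        for url in self.urls:
            impl = fetch(url)
            if impl is not None:
                self.implementation = impl
                return True
        return False


class ExternProtoInstance:
    def __init__(self, proto, queue_events=True):
        self.proto = proto
        self.queue_events = queue_events
        self.pending = []             # events held until the load succeeds
        self.delivered = []           # events handed to the implementation

    def receive(self, event):
        if self.proto.implementation is not None:
            self.delivered.append(event)
        elif self.queue_events:
            self.pending.append(event)
        # otherwise the event is thrown away

    def flush(self):
        if self.proto.implementation is not None:
            self.delivered.extend(self.pending)
            self.pending.clear()


# Hypothetical URLs; only the second one is reachable.
available = {"http://example.com/fallback.wrl": "fallback-impl"}
proto = ExternProto("Widget", ["http://example.com/preferred.wrl",
                               "http://example.com/fallback.wrl"])
queued = ExternProtoInstance(proto, queue_events=True)
dropping = ExternProtoInstance(proto, queue_events=False)

queued.receive("e1")
dropping.receive("e1")
loaded = proto.try_load(available.get)    # delay-load happens here
queued.flush()
queued.receive("e2")
dropping.receive("e2")
```

The queueing instance ends up seeing both events in order, while the discarding instance only sees the one that arrived after the implementation was loaded.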

The fields of a prototype are internal to it, and a browser needs to know their current and default values only to properly create a prototype instance. Therefore, if the browser cannot create prototype instances (because the prototype implementation is not available) the default values of fields aren't needed. So, EXTERNPROTO provides all the information a browser needs.

The prototype implementation

The prototype's implementation is surrounded by curly braces to separate it from the rest of the world. A prototype's implementation creates a new name scope -- any names defined inside a prototype implementation are available only inside that prototype implementation. In this way a prototype's implementation can be thought of as if it is a completely separate file. Which, of course, is exactly what EXTERNPROTO does.

There's an interesting issue concerning whether or not things defined outside the prototype's implementation can be USEd inside of it. I think that defining prototypes such that they are completely self-contained (except for the information passed in via eventIn or field declarations) is wisest.

The node type of a prototype is the type of the first node of its implementation. So, for example, if a prototype's implementation is:
{ IndexedFaceSet { ... } }
Then the prototype can only be used in the scene wherever an IndexedFaceSet can be used (which is in the geometry field of a Shape node). The extra curly braces allow Scripts, TimeSensors and ROUTES to be part of the prototype's implementation, even though they're "off to the side" of the prototype's scene graph.

The IS syntax for specifying what is exposed inside a prototype's implementation was suggested by Conal Elliott of Microsoft. It was chosen because:

it removed some ambiguities that could arise about what a prototype's field's default values were in a previous syntax
it doesn't require nodes to be given names just to expose them in the prototype
it allows fan-in (one eventIn or field going to multiple nodes in the prototype implementation)
it matches traditional programming languages better

Instantiating a prototype

Once a PROTO or EXTERNPROTO has been declared, a prototype can be instantiated and treated just like any built-in node. In fact, built-in nodes can just be treated as if there is a set of pre-defined PROTO definitions available at start-up in all VRML browsers.

Each prototype instance is independent from all others-- changes to one instance do not affect any other instance. Conceptually, each prototype instance is equivalent to a completely new copy of the prototype implementation.

However, even though prototype instances are conceptually completely separate, they can be implemented so that information is automatically shared between prototype instances. For example, consider this PROTO:

PROTO Foo [ eventIn SFVec3f changeTranslation ] {
    Transform {
        translation IS changeTranslation
        Shape {
           ... geometry+properties stuff...
        }
    }
}

Because the translation of the Transform is the only thing that can possibly be changed, either from a ROUTE or from a Script node, only the Transform needs to be copied. The same Shape node may be shared by all prototype instances.
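A minimal Python sketch of that optimization follows, assuming a browser keeps one template per PROTO. The FooPrototype class and its instantiate method are hypothetical names; the shallow copy stands in for "copy only the Transform".

```python
import copy


class Shape:
    def __init__(self, description):
        self.description = description


class Transform:
    def __init__(self, translation, child):
        self.translation = translation
        self.child = child


class FooPrototype:
    """Holds the single implementation that instances are stamped from."""

    def __init__(self):
        self.template = Transform((0.0, 0.0, 0.0),
                                  Shape("geometry+properties"))

    def instantiate(self):
        # Shallow copy: a new Transform per instance, the same Shape object.
        return copy.copy(self.template)


proto = FooPrototype()
a = proto.instantiate()
b = proto.instantiate()

# What a changeTranslation event would do to one instance.
a.translation = (1.0, 2.0, 3.0)
```

Changing one instance's translation leaves the other untouched, yet both still point at a single Shape.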

Script nodes that contain SFNode/MFNode fields (or may receive SFNode/MFNode events) can be treated in a similar way; for example:

PROTO Foo [ eventIn SFFloat doSomething ] {
   DEF Root Transform {
      ... stuff ...
   }
   DEF MyScript Script {
      eventIn SFFloat doIt IS doSomething
      field SFNode whatToAffect USE Root
        ... other script stuff...
   }
}

In this case, a brand-new copy of everything inside Foo will have to be created for every prototype instance because MyScript may modify the Root Transform or any of its children using the script API. Of course, if some of the Transform's children are prototype instances the browser might be able to optimize them.

Issue: If we can get users to use something like this prototype definition, browsers might have more opportunities for optimization:

# A Transform that cannot be changed:
#
PROTO ConstantTransform [
       field MFNode children
       field SFVec3f translation 0 0 0 ... etc for other fields...
   ] {
       Transform { children IS children
                   translation IS translation  ... etc ...
       }
}

I can imagine variations on the above-- Transforms with transformations that can be changed, but children that can't, transformations that can't but children that can, etc.

Extensibility

By extending the syntax of a URL in an EXTERNPROTO, all of the current and proposed extensibility mechanisms for VRML can be handled (credit for these ideas goes to Mitra).

The idea is to use the URL syntax to refer to an internal or built-in implementation of a node. For example, imagine your system has a Torus geometry node built-in. The idea is to use EXTERNPROTO to declare that fact, like this:

EXTERNPROTO Torus [ field SFFloat bigRadius
                    field SFFloat smallRadius ]
  "internal:Torus"

URLs of the form "internal:name" tell the browser to look for a "native" implementation (perhaps searching for the implementation on disk, etc).

Just as in any other EXTERNPROTO, if the implementation cannot be found the browser can safely parse and ignore any prototype instances.

The 'alternateRep' notion is handled by specifying multiple URLs for the EXTERNPROTO:

EXTERNPROTO Torus [ field SFFloat bigRadius
                    field SFFloat smallRadius ]
  [ "internal:Torus", "http://machine/directory/protofile" ]

So, if a "native" implementation of the Torus can't be found, an implementation is downloaded from the given machine/directory/protofile-- the implementation would probably be an IndexedFaceSet node with a Script attached that computes the geometry of the torus based on bigRadius and smallRadius.

The 'isA' notion of VRML 1.0 is also handled using this mechanism. The ExtendedMaterial example from the VRML 1.0 spec:

ExtendedMaterial {
  fields [ MFString isA, MFFloat indexOfRefraction,
           MFColor ambientColor, MFColor diffuseColor,
           MFColor specularColor, MFColor emissiveColor,
           MFFloat shininess, MFFloat transparency ]
  isA [ "Material" ]
  indexOfRefraction .34
  diffuseColor .8 .54 1
}

becomes:

PROTO ExtendedMaterial [
   field MFFloat indexOfRefraction 0
   field MFColor ambientColor [ 0 0 0 ]
   field MFColor diffuseColor [ .8 .8 .8 ]
     ... etc, rest of fields... ]
{
    Material {
       ambientColor IS ambientColor
       diffuseColor IS diffuseColor
       ... etc ...
    }
}

ExtendedMaterial {
    indexOfRefraction .34
    diffuseColor .8 .54 1
}

This nicely cleans up the rules about whether or not the fields of a new node must be defined only the first time the node appears inside a file or every time the node appears in the file (the PROTO or EXTERNPROTO must appear once before the first node instance). And it makes VRML simpler.

Why Routes?

Several different architectures for applying changes to the scene graph were considered before settling on the ROUTE syntax. This section documents the arguments for and against the alternative architectures.

All-API architecture

One alternative is to try to keep all behaviors out of VRML, and do everything inside the scripting API.

In this model, a VRML file looks very much like a VRML 1.0 file, containing only static geometry. In this case, instead of loading a .wrl VRML file into your browser, you would load some kind of .script file that then referenced a .wrl file and then proceeded to modify the objects in the .wrl file over time. This is similar to conventional programming; the program (script) loads the data file (VRML .wrl file) and then proceeds to make changes to it over time.

One advantage of this approach is that it makes the VRML file format simpler. A disadvantage is that the scripting language may need to be more complex.

The biggest disadvantage, however, is that it is difficult to achieve good optimizability, scalability and composability-- three of our most important goals.

In VRML 1.0, scalability and composability are accomplished using the WWWInline node. In an all-API architecture, some mechanism similar to WWWInline would have to be introduced into the scripting language to allow similar scalability and composability. That is certainly possible, but putting this functionality into the scripting language severely affects the kinds of optimizations that browsers are able to perform today.

For example, the browser can pay attention to the direction that a user is heading and pre-load parts of the world that are in that direction if the browser knows where the WWWInline nodes are. If the WWWInline concept is moved to the scripting language the browser probably will NOT know where they are.

Similarly, a browser can perform automatic behavior culling if it knows which parts of the scene may be affected by a script. For example, imagine a lava lamp sitting on a desk. There is no reason to simulate the motion of the blobs in the lamp if nobody is looking at it-- the lava lamp has a completely self-contained behavior. In an API-only architecture, it would be impossible for the browser to determine that the behavior was self-contained; however, with routes, the browser can easily determine that there are no routes into or out of the lava lamp, and that it can therefore be safely behavior culled. (side note: we do propose flags on Scripts for cases in which it is important that they NOT be automatically culled).
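The self-containment test described here reduces to checking whether any route crosses the boundary of a subgraph. The Python sketch below is one hedged way to write it; the node names, the (source, destination) route representation, and the must_evaluate parameter (standing in for the proposed "do not cull" flag on Scripts) are all assumptions of this example.

```python
def is_self_contained(subgraph_nodes, routes):
    """True if no route enters or leaves the given set of nodes.

    routes is an iterable of (source_node, destination_node) pairs.
    """
    inside = set(subgraph_nodes)
    for src, dst in routes:
        if (src in inside) != (dst in inside):
            return False          # this route crosses the boundary
    return True


def can_cull(subgraph_nodes, routes, visible, must_evaluate=False):
    # A behavior may be skipped only if nobody can perceive it, it has
    # not been flagged as must-run, and it cannot affect anything else.
    return (not visible and not must_evaluate
            and is_self_contained(subgraph_nodes, routes))


lamp = {"LampTimer", "LampScript", "Blob1", "Blob2"}
routes = [("LampTimer", "LampScript"),
          ("LampScript", "Blob1"),
          ("LampScript", "Blob2"),
          ("DoorSensor", "DoorScript")]   # unrelated to the lamp

leaky_routes = routes + [("LampScript", "RoomLight")]
```

With the first route list the lamp can be culled whenever it is not visible; adding a single route from the lamp's script to a light elsewhere in the room makes culling unsafe.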

Another disadvantage to this approach is that it allows only re-use of geometry. Because the behaviors must directly load the geometry, it is impossible to "clone" a behavior and apply it to two different pieces of geometry, or to compose together behavior+geometry that can then be re-used several times in the same scene.

The disconnect between the VRML file and the script file will make revision control painful. When the VRML file is changed, the script may or may not have to be changed-- in general, it will be very difficult for a VRML authoring system to maintain worlds with behaviors. If the VRML authoring system cannot parse the scripting language to find out what it refers to in the VRML file, then it will be impossible for the authoring system to ensure that behaviors will continue to work as the VRML file is edited.

All-VRML architecture

Another alternative is to extend VRML so that it becomes a complete programming language, allowing any behavior to be expressed in VRML.

The main disadvantage to this approach is that it requires inventing Yet Another Scripting Language, and makes implementation of a VRML browser much more complicated. If the language chosen is very different from popular languages, there will be very few people capable of programming it and very little infrastructure (classes, books, etc) to help make it successful.

Writing a VRML authoring system more sophisticated than a simple text editor becomes very difficult if a VRML file may contain the equivalent of an arbitrary program. Creating ANY VRML content becomes equivalent to programming, which will limit the number of people able to create interesting VRML worlds.

The main advantage to an all-VRML architecture is the opportunity for automatic optimizations done by the browser, since the browser knows everything about the world.

Routes and Script nodes architecture

The alternative we chose was to treat behaviors as "black boxes" (Script nodes) with well-defined interfaces (routes and fields).

Treating behaviors as black boxes allows any scripting language to be used (Java, VisualBasic, ML, whatever) without changing the fundamental architecture of VRML. Implementing a browser becomes much easier because only the interface between the scene and the scripting language needs to be implemented, not the entire scripting language.

Expressing the interface to behaviors in the VRML file allows an authoring system to intelligently deal with the behaviors, and allows most world creation tasks to be done with a graphical interface. A programming editor only need appear when a sophisticated user decides to create or modify a behavior (opening up the black box, essentially). The authoring system can safely manipulate the scene hierarchy (add geometry, delete geometry, rename objects, etc) without inadvertently breaking connections to behaviors.

The existing VRML composability and scalability features are retained, and because the possible effects of a behavior on the world are known to the browser, most of the optimizations that can be done in an all-VRML architecture can still be done.

Implementing and Optimizing routes

This section gives some "thumb-nail" design for how a browser might decide to implement routes. It points out some properties of the routes design that are not obvious at first glance and that can make an implementation of routes simple and efficient.

There doesn't need to be any data copying at all as an event "travels" along a route. In fact, the event doesn't need to "travel" at all-- the ROUTE is really just a re-naming from the eventIn to the eventOut that allows the composability, authorability, extensibility and scalability that are major goals of the Moving Worlds design.

The data for an event can be stored at the source of the event-- with the "eventOut". The "eventIn" doesn't need to store any data, because it is impossible to change an "eventIn"-- it can just point to the data stored at the "eventOut". That means that moving an event along a ROUTE can be as cheap as writing a pointer. In fact, in the VERY common case in which there is no "fan-in" (there aren't multiple eventOuts routed into a single eventIn) NO data copying at all need take place-- the eventIn can just point to eventOut since that eventOut will always be the source of its events.
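For the no-fan-in case, the idea can be sketched in a few lines of Python. The EventOut and EventIn classes and the route function are hypothetical names for this illustration; a Python object reference stands in for the pointer.

```python
class EventOut:
    def __init__(self, value=None):
        self.value = value        # the only copy of the event data


class EventIn:
    def __init__(self):
        self.source = None        # reference to an EventOut, not a copy

    @property
    def value(self):
        # Reading the eventIn just follows the reference to the source.
        return self.source.value


def route(event_out, event_in):
    # No fan-in: the eventIn is bound once and never re-pointed.
    event_in.source = event_out


color_out = EventOut([1.0, 0.0, 0.0])
set_diffuse_color = EventIn()
route(color_out, set_diffuse_color)

# Sending an event amounts to writing the value at the source.
color_out.value = [0.0, 1.0, 0.0]
```

After the write, the eventIn sees the new value without anything having been copied to it.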

Exposed fields-- fields that have corresponding eventOuts-- can share their value between the eventOut and the field itself, so very little extra overhead is imposed on "exposed" fields. Highly optimized implementations of nodes with exposed fields could store the data structures needed to support routes separately from the nodes themselves and use a dictionary mapping node pointers to routing structures, adding NO memory overhead for nodes that do not have routes coming into or out of them (which is the common case).
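The dictionary-based layout might be sketched as follows in Python. RouteTable, its methods, and the event names are hypothetical; the point is only that routing data lives in a side table keyed by node, so a node without routes carries no routing storage at all.

```python
class Node:
    __slots__ = ("name",)         # fixed layout: no per-node routing storage

    def __init__(self, name):
        self.name = name


class RouteTable:
    """Side table owned by the browser, keyed by node identity."""

    def __init__(self):
        self._routes = {}         # Node -> [(event_out, dest_node, event_in)]

    def add_route(self, src, event_out, dst, event_in):
        self._routes.setdefault(src, []).append((event_out, dst, event_in))

    def routes_from(self, node):
        # Nodes that were never routed have no entry in the table.
        return self._routes.get(node, [])


timer = Node("Timer")
script = Node("Script")
wall = Node("Wall")               # never routed

table = RouteTable()
table.add_route(timer, "alpha", script, "setAlpha")
```

Only the timer appears in the table; the wall, like most nodes in a typical world, costs nothing extra.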

Because the routing structures are known to the browser, many behavior-culling optimizations are possible. A two-pass notification+evaluation implementation will automatically cull out any irrelevant behaviors without any effort on the part of the world creator. The algorithm works by delaying the execution of behaviors until their results are necessary, as follows:

Imagine a TimeSensor that sends alpha events to a Script that in turn sends setDiffuseColor events to an object, to change the object's color over time. Allocate one bit along each of these routes; a "dirty bit" that determines whether or not changes are happening along that route. The algorithm works as follows:

Time changes. All routes from the TimeSensor are marked dirty, all the way through the routing network (from the TimeSensor, to the Script, to the object whose material we're changing). This "notification" process can stop as soon as a route that has already been marked "dirty" is reached. Most browsers will probably let notification continue up through the children[] MFNode fields of groups; if the notification eventually reaches the root of the scene the browser will know that the scene must be redrawn.
When (or before) the browser redraws the scene, any object that will be drawn that has a route to it with its dirty bit set must be re-evaluated by evaluating whatever is connected to it up-stream. When a route is evaluated, its dirty bit is cleared.

This two-pass "push notification / pull events" algorithm has several nice properties:

If an object cannot be perceived, its behavior does not need to be run. In the example of the changing material, if the object is never drawn, its material will not be needed. The TimeSensor may change, but the route from the TimeSensor to the Script will already be dirty, so notification will stop right away. Note that TimeSensor is carefully defined such that only the last "time changed" event is guaranteed to be available, so you don't even need to queue up all of the "time changed" events. In the case of a ClickSensor or something else where every event might be important, either events will have to be queued up OR scripts will have to be run as soon as there is more than one event waiting.
If the behavior is affecting something like a transformation, then it will automatically always get run IF that transformation is needed to compute the perceptibility of the object. If a maximum possible bounding box can be specified by the world creator, then even behaviors that affect transformations need not get run.
If the behavior is affecting something that cannot be seen, the scene doesn't need to be redrawn. And it won't be, because the notification process will stop at the first "dirty" route and will never reach the root of the scene graph.
If the view changes so that something now CAN be seen, the scene will be redrawn (because the view changed) and portions of the scene that were marked "dirty" that are now visible will be re-evaluated and become "clean".
If you have to shadow things that are changing in your rendering library, the dirty bits will tell you exactly what changed so you can update your rendering library.
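The two-pass algorithm can be sketched in Python as below. This is an illustration under simplifying assumptions, not a browser design: Node, Route, notify, evaluate and set_time are hypothetical names, each behavior is modeled as a plain function of its upstream values, and only the most recent value of a source is kept (as the text describes for TimeSensor).

```python
class Node:
    def __init__(self, name, compute=None):
        self.name = name
        self.compute = compute    # function of upstream values; None for a source
        self.value = None
        self.inputs = []          # incoming routes
        self.outputs = []         # outgoing routes
        self.runs = 0             # how many times the behavior executed


class Route:
    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.dirty = False        # the one "dirty bit" per route
        src.outputs.append(self)
        dst.inputs.append(self)


def notify(node):
    """Pass 1 (push): mark downstream routes dirty, stopping at dirty ones."""
    for r in node.outputs:
        if not r.dirty:
            r.dirty = True
            notify(r.dst)


def evaluate(node):
    """Pass 2 (pull): run upstream behaviors only along dirty routes."""
    dirty = [r for r in node.inputs if r.dirty]
    for r in dirty:
        evaluate(r.src)
        r.dirty = False
    if dirty and node.compute is not None:
        node.value = node.compute(*[r.src.value for r in node.inputs])
        node.runs += 1
    return node.value


def set_time(sensor, t):
    sensor.value = t              # only the latest time is kept
    notify(sensor)


timer = Node("TimeSensor")
script = Node("Script", compute=lambda alpha: (alpha, 0.0, 1.0 - alpha))
material = Node("Material", compute=lambda color: color)
Route(timer, script)
Route(script, material)

# Time changes three times while the object is not being drawn.
for t in (0.25, 0.5, 0.75):
    set_time(timer, t)
runs_while_hidden = script.runs

# The object becomes visible, so the browser pulls its material once.
color = evaluate(material)
```

The script does not run at all while the object is hidden, and runs exactly once, using the latest time, when the material is finally needed. A second call to evaluate with no intervening time change does no work because every route is clean.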

Scene Graph? WHAT Scene Graph?

Moving Worlds has been carefully designed so that a browser will only need to keep the parts of the VRML scene graph that might be changed. There is a tradeoff between world creators who want to have control over the VRML scene graph structure and browser implementors who also want complete control over the VRML scene graph structure; Moving Worlds is designed to compromise between these two, allowing world creators to impose a particular structure on selected parts of the world while allowing browsers to optimize away the rest.

One example of this is the routing mechanism. Consider the following route:

Shape {
    appearance Appearance {
        material DEF M Material { ... }
    }
    geometry Cube { }
}
ROUTE  MyAnimation.color -> M.setDiffuseColor

A browser implementor might decide not to maintain the Material as a separate object, but instead to route all setDiffuseColor events directly to the relevant shape(s). If the Material was used in several shapes then several routes might need to be established where there was one before, but as long as the visual results are the same the browser implementor is free to do that.

There is a potential problem if some Script node has a pointer to or can get a pointer to the Material node. In that case, there _will_ need to be at least a stand-in object for the Material (that forwards events on to the appropriate shapes) IF the Script might directly send events to what it thinks is the Material node. However, Script nodes that do this MUST set the "directOutputs" flag to let the browser know that they might do this. And the browser will know if any Script with that flag set can get access to the Material node, because the only way Scripts can get access to Nodes is via a field, an eventIn, or by looking at the fields of a node to which they already have access.

World creators can help browsers by limiting what Script nodes have access to. For example, a browser will have to maintain just about the entire scene structure of this scene graph:

DEF ROOT Transform {
    children [
          Shape { ... geometry Sphere{ } },
          Transform {
             ... stuff ...
          }
    ]
}
Script {
    directOutputs TRUE
    field SFNode whatToChange USE ROOT
    ...
}

Because the Script has access to the root of the scene, it can get the children of that root node, send them events directly, add children, remove children, etc.

However, this entire scene can be optimized below the Transform, because the browser KNOWS it cannot change:

PROTO ConstTransform [ field MFNode children ] {
    Transform { children IS children }
}
DEF ROOT ConstTransform {
    children [
          Shape { ... geometry Sphere{ } },
          Transform {
             ... stuff ...
          }
    ]
}
Script {
    directOutputs TRUE
    field SFNode whatToChange USE ROOT
    ...
}

Because of the prototype interface, the browser KNOWS that the Script cannot affect anything inside the ConstTransform-- the ConstTransform has NO exposed fields or eventIn's. If the ConstTransform doesn't contain any sources of changes (Sensors or Scripts), then the entire subgraph can be optimized away-- perhaps stored ONLY as a display list for a rendering library, or perhaps collapsed into a "big bag of triangles" (also assuming that there are no LOD's, of course).
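One way to picture the browser's reasoning is as a reachability walk starting from the nodes a directOutputs Script holds. The sketch below is an assumption-laden illustration: SceneNode, its exposed and hidden lists, and must_keep are hypothetical names, node names are assumed to be unique, and only nodes visible through exposed fields are treated as reachable.

```python
class SceneNode:
    """Hypothetical scene node for illustration only."""

    def __init__(self, name, exposed=None, hidden=None):
        self.name = name
        self.exposed = exposed or []   # child nodes visible through exposed fields
        self.hidden = hidden or []     # child nodes behind a PROTO interface


def must_keep(script_fields):
    """Return the names of all nodes a directOutputs Script could reach."""
    seen = set()
    stack = list(script_fields)
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        stack.extend(node.exposed)     # hidden children are never followed
    return seen


def make_children():
    return [SceneNode("Sphere shape"), SceneNode("Inner transform")]


# First scene: a plain Transform exposes its children to the Script.
open_root = SceneNode("ROOT", exposed=make_children())
# Second scene: the ConstTransform prototype exposes nothing.
const_root = SceneNode("ROOT", hidden=make_children())

keep_open = must_keep([open_root])
keep_const = must_keep([const_root])
```

Anything outside the returned set is a candidate for being collapsed or stored in whatever form the rendering library prefers, provided it contains no Sensors or Scripts of its own.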

The other nice thing about all this is that a PROTO or EXTERNPROTO (or WWWInline, which is pretty much equivalent to a completely opaque prototype) can be optimized independently of everything else, and the less control an author gives over how something might be changed, the more opportunities for optimizations.

Transforms, Events, NodeReference

The children of a Transform (or other group node) are kind of strange-- they aren't specified like fields in the VRML 1.0 syntax.

Issue: They could be-- they are functionally equivalent to an MFNode field. For example, this:

# Old syntax?
Transform {
    Transform { ... }
    Transform { ... }
}

is equivalent to the slightly wordier:

# New syntax?
Transform {
  children [
    Transform { ... } ,
    Transform { ... } 
  ]
}

... where "children" is an MFNode field. The issue is whether or not we should keep the VRML 1.0 syntax as a convenient short-hand that means the same as the wordier syntax. The advantages of dropping the short-hand and always requiring the explicit children field are that it would make the VRML file syntax easier to parse and would eliminate some ambiguities that can arise if fields and nodes are allowed to have the same type names. The disadvantages are that it would make VRML files slightly bigger, is less convenient to type in, and is a change from VRML 1.0 syntax.

In any case, to allow grouping nodes to be used as prototypes and to allow them to be seen in the script API, their children must "really" be an MFNode field. So a Transform might be specified as:

PROTO Transform [
    field SFVec3f translation 0 0 0
    eventIn SFVec3f setTranslation
    eventOut SFVec3f translationChanged
        ... etc for the other transformation fields...
    field MFNode children [ ]
    eventIn MFNode setChildren
    eventOut MFNode childrenChanged
] ...

Specifying events corresponding to the children field implies that the children of a Transform can change-- that the structure of the scene can be changed by behaviors.

Setting all of the children of a Transform at once (using setChildren) is inconvenient; although not strictly necessary, the following might be very useful:

    eventIn MFNode addChildren
    eventIn MFNode removeChildren

Sending an addChildren event to the Transform would add all of the children in the message to the Transform's children. Sending a removeChildren event would remove all of the children in the message (little tiny issue: maybe SFNode addChild/removeChild events would be better?).

The Transform node's semantics were carefully chosen such that the order of its children is irrelevant. That allows a lot of potential for implementations to re-order the children either before or during rendering for optimization purposes (for example, draw all texture-mapped children before all non-texture mapped children, or sort the children by which region of space they're in, etc). The addChildren/removeChildren events maintain this property-- anything using them doesn't need to concern itself with the order of the children.
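A minimal sketch of the addChildren/removeChildren semantics, treating the children as an unordered collection. The Transform class here is a hypothetical stand-in, and ignoring a child that is already present (or a removal of a child that is absent) is an assumption that the proposal does not spell out.

```python
class Transform:
    """Hypothetical stand-in for a group node whose child order carries no meaning."""

    def __init__(self, children=()):
        self._children = []
        self.add_children(children)

    def add_children(self, nodes):
        # Assumption: adding a node that is already a child is silently ignored.
        for node in nodes:
            if not any(node is child for child in self._children):
                self._children.append(node)

    def remove_children(self, nodes):
        # Assumption: removing a node that is not a child is silently ignored.
        self._children = [
            child for child in self._children
            if not any(child is node for node in nodes)
        ]

    def children(self):
        # Returned as a list, but callers should not rely on its order.
        return list(self._children)


a, b, c = object(), object(), object()
group = Transform([a, b])
group.add_children([b, c])      # b is already present, c is new
group.remove_children([a])
remaining = group.children()
```

Because membership is tested by identity and nothing depends on position, an implementation would be free to keep the children sorted however suits its renderer.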

A previous version of Moving Worlds had a node called "NodeReference" that was necessary to allow nodes to be inserted as children into the scene. Exposing the children of groups as MFNode fields eliminates the need for something like NodeReference.

Script node: Minimal API

This section describes the API from the point of view of somebody using VRML to create behaviors. At least the following functionality will be necessary:

init/destroy/processEvents

The browser must call the user's init routine before calling processEvents or destroy.

The processEvents routine may be called any time between init and destroy, and will usually process all waiting events and generate events and/or modify the Script node's fields.

The browser must call the destroy routine to allow the script an opportunity to do cleanup. After destroy is called, processEvents must not be called until after another init is done.
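The ordering rules above amount to a small state machine on the browser side. The following is a sketch only: ScriptHost and CountingScript are hypothetical names, and rejecting a second init without an intervening destroy is an assumption beyond what the text requires.

```python
class ScriptHost:
    """Hypothetical browser-side wrapper enforcing init -> processEvents* -> destroy."""

    def __init__(self, script):
        self.script = script
        self.active = False

    def init(self):
        if self.active:
            # Assumption: a second init without destroy is treated as an error.
            raise RuntimeError("init called twice without destroy")
        self.script.init()
        self.active = True

    def process_events(self, events):
        if not self.active:
            raise RuntimeError("processEvents called outside init/destroy")
        return self.script.process_events(events)

    def destroy(self):
        if not self.active:
            raise RuntimeError("destroy called before init")
        self.script.destroy()
        self.active = False


class CountingScript:
    """Trivial user script that records which routines were called."""

    def __init__(self):
        self.log = []

    def init(self):
        self.log.append("init")

    def process_events(self, events):
        self.log.append(f"process {len(events)}")
        return []                # no events generated in this sketch

    def destroy(self):
        self.log.append("destroy")


user_script = CountingScript()
host = ScriptHost(user_script)
host.init()
host.process_events(["e1", "e2"])
host.destroy()
```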

get/set fields

The fields of a script node must be accessible from the API. That implies that the VRML field types (SFFloat, MFFloat, etc) must somehow be exposed in the API.

send/receive events

The processEvents routine must have access to a list of events received from things routed to it. Each event will have:

Name
Type (any of the field types)
Value (same as field value) and API to get/set the event's contents (both get and set for events that will be output, only get for events that come in)
Timestamp
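As an illustration of the event record listed above, incoming events can be modeled as read-only and outgoing events as writable. The InEvent and OutEvent names, and the event names and values used below, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class InEvent:
    """An event received by the Script: contents can be read but not changed."""
    name: str
    type: str          # any of the field types, e.g. "SFFloat"
    value: Any
    timestamp: float


@dataclass
class OutEvent:
    """An event the Script will send: contents can be both read and set."""
    name: str
    type: str
    value: Any
    timestamp: float


# Hypothetical example: turn an incoming fraction into an outgoing color.
incoming = InEvent("setFraction", "SFFloat", 0.5, 12.0)
outgoing = OutEvent("colorChanged", "SFColor", (0.0, 0.0, 0.0), incoming.timestamp)
outgoing.value = (incoming.value, 0.0, 0.0)
```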

Synchronization API

To support scripting languages such as Java which allow the creation of asynchronous processes (threads), some mechanism for synchronizing with the browser when changing the Script's fields and generating events is necessary. At the very least, a mechanism to "bracket" or "bundle up" a set of changes is necessary.
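One possible shape for such a bracketing mechanism is sketched below in Python, using a context manager so that a set of changes is handed to the browser as a single unit. The Browser class, its changes method, and the field names are hypothetical; a real API would depend on the scripting language chosen.

```python
import threading
from contextlib import contextmanager


class Browser:
    """Hypothetical browser that accepts bundled changes from script threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.applied = []            # each entry is one complete bundle

    @contextmanager
    def changes(self):
        batch = []
        yield batch                  # the script fills in its changes here
        # Reached only if the block finished without raising, so a failed
        # bundle is discarded rather than partially applied.
        with self._lock:
            self.applied.append(list(batch))


browser = Browser()


def worker(k):
    with browser.changes() as batch:
        batch.append(("translation", k))
        batch.append(("rotation", k))


threads = [threading.Thread(target=worker, args=(k,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread's two changes arrive together, so the browser never sees a translation from one bundle paired with a rotation from another.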

Script node: Node API

Once a Script node has access to an SFNode or an MFNode value (either from one of the Script's fields, or from an eventIn that sends the script a node), we must decide what operations a script can perform on them. A straw-man proposal:

get/set "exposed" fields

For any field "foo" of the node that has both a "setFoo" eventIn and a "fooChanged" eventOut, allow that field to be directly set and get. There should be API to get the list of exposed fields of a given node.

get list of eventIn/eventOut

Given a node, there should be a way of determining what events it can send and receive.

establish routes

There should be some way of establishing a route from within a Script, assuming the script has somehow gotten access to the nodes on both ends of the route.

"compileVRML"

An API call that allows VRML file-format text contained in a string to be "compiled" into a Node from inside a script, allowing a Script node to receive file-format text from over the network, for example.

communication with the browser

The API must provide methods by which a Script node can communicate with the browser to request operations such as loading a new URL as the world (to allow WWWAnchor-like functionality controlled by a Script), to get the current "simulation" time (which may be different from the current wall-clock time), etc.

Convenience method: search by name/type

Search for nodes by name or by type "under" a given node. Assuming that the children of group nodes are exposed in the API as an MFNode field called "children", this is really a short-hand convenience way of performing something like:

Node search(Node startingNode, ...criteria...) {
   for all fields of startingNode {
      if field type is SFNode {
         Node kid = contents of field
         if kid is NULL, skip to the next field
         if kid matches criteria, return kid
         else {
            Node found = search(kid, criteria)
            if (found != NULL) return found
         }
      }
      else if field type is MFNode, for all values i {
         Node kid = value[i]
         if kid matches criteria, return kid
         else {
            Node found = search(kid, criteria)
            if (found != NULL) return found
         }
      }
   }
   return NULL
}
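For readers who prefer something runnable, the same depth-first search can be sketched in Python. The Node class is a hypothetical stand-in in which an SFNode field holds a Node or None and an MFNode field holds a list of Nodes; the scene is assumed to contain no cycles.

```python
class Node:
    """Hypothetical node: `fields` maps a field name to a Node, a list of Nodes,
    None, or a plain value."""

    def __init__(self, type_name, name=None, **fields):
        self.type_name = type_name
        self.name = name
        self.fields = fields


def search(start, matches):
    """Return the first node under `start` for which matches(node) is true."""
    for value in start.fields.values():
        kids = value if isinstance(value, list) else [value]
        for kid in kids:
            if not isinstance(kid, Node):
                continue             # NULL SFNode or a non-node field value
            if matches(kid):
                return kid
            found = search(kid, matches)
            if found is not None:
                return found
    return None


scene = Node("Transform", name="ROOT", children=[
    Node("Shape",
         appearance=Node("Appearance", material=Node("Material", name="M")),
         geometry=Node("Cube")),
    Node("Transform", children=[Node("Shape", geometry=Node("Sphere"))]),
])

by_name = search(scene, lambda n: n.name == "M")
by_type = search(scene, lambda n: n.type_name == "Sphere")
missing = search(scene, lambda n: n.type_name == "Cone")
```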

Throughout this discussion I'm assuming that access to prototyped nodes is restricted by the prototype's interface. That will allow implementations to know what can and what can't change, which will enable many optimizations.

Materials

The VRML 1.0 material specification is more general than currently supported by most 3D rendering libraries and hardware. It is also fairly difficult to explain and understand; a simpler material model will make VRML 2.0 both easier to understand and easier to implement.

First, the notion of per-vertex or per-face materials/colors should be moved from the Material node down into the geometric shapes that support such a notion (such as IndexedFaceSet). Doing this will make colors more consistent with the other per-vertex properties (normals and texture coordinates) and will make it easier for browsers to ensure that the correct number of colors has been specified for a given geometry, etc.

The new syntax for a geometry such as IndexedFaceSet will be:

IndexedFaceSet {
  exposedField  SFNode  coord             NULL
  exposedField  SFNode  color             NULL
  exposedField  SFNode  normal            NULL
  exposedField  SFNode  texCoord          NULL
  ...
}

A new node, similar to the Normal/TextureCoordinate2 nodes, is needed for the color field. It is often useful to define a single set of colors to function as a "color map" that is used by several different geometries, so the colors are specified in a separate node that can be shared. That node will be:

Color {
    exposedField MFColor rgb [ ]       # List of rgb colors
}

The material parameters in the material node would all be single-valued, and I suggest that the ambientColor term be removed:

Material {
  exposedField SFColor diffuseColor  0.8 0.8 0.8
  exposedField SFColor specularColor 0 0 0
  exposedField SFColor emissiveColor 0 0 0
  exposedField SFFloat shininess     0.2
  exposedField SFFloat transparency  0
}

If multiple colors are given with the geometry, then they either replace the diffuse component of the Material node (if the material field of the Appearance node is not NULL) or act as an "emissive-only" source (if the material field of the Appearance node is NULL).

Issue: The colors in a VRML SFImage field are RGBA-- RGB plus transparency. Perhaps we should allow SFColor/MFColor fields to be specified with 1, 2, 3 or 4 components to be consistent with SFImage. That would get rid of the transparency field of Material, allow transparency per-face or per-vertex, and would allow compact specification of greyscale, greyscale-alpha, RGB, and RGBA colors. However, that might cause problems for the behavior API and would make parsing more complicated.

Simplified Bindings

Another complicated area of VRML 1.0 is the set of possible bindings for normals and materials-- DEFAULT, OVERALL, PER_PART, PER_PART_INDEXED, PER_FACE, PER_FACE_INDEXED, PER_VERTEX, and PER_VERTEX_INDEXED. Not all bindings apply to all geometries, and some combinations of bindings and indices do not make sense.

A much simpler specification is possible that gives equivalent functionality:

IndexedFaceSet {
  ...
  field         MFInt32 coordIndex        [ ]
  field         MFInt32 colorIndex        [ ]
  field         SFBool  colorPerFace      FALSE
  field         MFInt32 normalIndex       [ ]
  field         SFBool  normalPerFace     FALSE
  field         MFInt32 texCoordIndex     [ ]
  ...
}

The existing materialBinding/normalBinding specifications are replaced by simple booleans that specify whether colors or normals should be applied per-vertex or per-face. If indices are specified, then they are used. If they are not specified, then either the vertex indices are used (if per-vertex normals/colors), OR the normals/colors are used in order (if per-face).

In more detail:

If normals/colors are NOT specified, the browser should:
- normals: generate automatically.
- colors: just use Material specification from Appearance (like VRML 1.0 OVERALL)
If normals/colors are specified (non-NULL normal/color field):
- If normal/colorPerFace is TRUE:
  - If Index is not empty, then use indices to choose normals/colors per face
    (equivalent to VRML 1.0 PER_FACE_INDEXED)
  - Else use the normals/colors in order
    (equivalent to VRML 1.0 PER_FACE)
- If normal/colorPerFace is FALSE:
  - If Index is not empty, then use indices to choose normals/colors per vertex
    (equivalent to VRML 1.0 PER_VERTEX_INDEXED)
  - Else choose normals/colors using the coordIndex indices
    (also equivalent to VRML 1.0 PER_VERTEX_INDEXED)
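The rules above for the case where normals/colors are specified can be written out as a short Python sketch. The function names split_faces and resolve are hypothetical, faces are assumed to be terminated by -1 as in coordIndex, and per-face index lists are assumed to hold one entry per face with no -1 separators.

```python
def split_faces(index):
    """Split a -1 terminated index list into one list of indices per face."""
    faces, current = [], []
    for i in index:
        if i == -1:
            faces.append(current)
            current = []
        else:
            current.append(i)
    if current:                      # tolerate a missing final -1
        faces.append(current)
    return faces


def resolve(values, index, per_face, coord_index):
    """Pick colors or normals for each face.

    Returns one value per face if per_face is true, otherwise one list of
    per-vertex values per face.
    """
    faces = split_faces(coord_index)
    if per_face:
        # Use the indices if given, else take the values in order.
        picks = index if index else range(len(faces))
        return [values[picks[f]] for f in range(len(faces))]
    # Per vertex: use the indices if given, else fall back to coordIndex.
    per_vertex = split_faces(index if index else coord_index)
    return [[values[i] for i in face] for face in per_vertex]


coord_index = [0, 1, 2, -1, 2, 1, 3, -1]
colors = ["red", "green", "blue", "white"]

face_in_order = resolve(colors, [], True, coord_index)
face_indexed = resolve(colors, [1, 1], True, coord_index)
vertex_default = resolve(colors, [], False, coord_index)
vertex_indexed = resolve(colors, [3, 3, 3, -1, 0, 0, 0, -1], False, coord_index)
```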

Texture coordinates do not have a PerFace flag, because texture coordinates are always specified per vertex. The rules for texture coordinates are the same as for per-vertex colors/normals: if texCoordIndex is empty, the vertex indices in coordIndex are used.

IndexedLineSet would add color and colorPerSegment fields, with similar rules to IndexedFaceSet. PointSet would need only a color field (OVERALL color if empty, otherwise color-per-point). The shapes that allow PER_PART colors in VRML 1.0 (Cylinder, Cone) would also only need a color field (PER_PART colors if specified, OVERALL otherwise).

Comparison with VRML 1.0: if all of the possibilities are written out, the only binding missing is the VRML 1.0 PER_VERTEX binding, which ignores the Index fields and just takes colors/normals in order for each vertex of each face. For example, in VRML 1.0 if the coordIndex array contained [ 10, 12, 14, -1, 11, 13, 10, -1 ] (two triangles with one shared vertex), then the PER_VERTEX binding is equivalent to a PER_VERTEX_INDEXED binding with indices [ 0, 1, 2, -1, 3, 4, 5, -1 ] -- that is, each non-negative entry in the coordIndex array causes another color/normal to be taken from their respective arrays. VRML 1.0 files with PER_VERTEX bindings that are converted to VRML 2.0 will be somewhat larger, since explicit indices will have to be generated.
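A converter could generate those explicit indices with a few lines of code. The sketch below uses a hypothetical function name, per_vertex_indices, and reproduces the example from the paragraph above.

```python
def per_vertex_indices(coord_index):
    """Build the explicit index list equivalent to a VRML 1.0 PER_VERTEX binding.

    Every vertex entry in coord_index takes the next color/normal in order;
    the -1 face terminators are copied through unchanged.
    """
    indices = []
    next_value = 0
    for entry in coord_index:
        if entry < 0:
            indices.append(-1)
        else:
            indices.append(next_value)
            next_value += 1
    return indices


converted = per_vertex_indices([10, 12, 14, -1, 11, 13, 10, -1])
```

Note that the shared vertex 10 receives two different entries (0 and 5), which is exactly what distinguishes PER_VERTEX from simply reusing coordIndex.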