David R. Nadeau, John L. Moreland
San Diego Supercomputer Center
(SDSC)
Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the
Association for Computing Machinery. To copy otherwise, or to republish,
requires a fee and/or specific permission.
(c) 1995 ACM
The wide applicability of VRML makes efforts to standardize on a few built-in
behaviors impractical. Instead, a general-purpose behavior description mechanism
is needed. Several behavior languages have been proposed, including Perl
[12], TCL [7], and Java [2]. In each case, a behavior script is written
in one of these languages and bound to a VRML file. When the VRML file is
downloaded into the browser, the script is run. Thereafter, as users interact
with the scene, the script responds and alters scene content. Later, when the
user leaves the scene or quits the browser, the behaviors are flushed from the
system.
Implementing such a system requires that several key issues be addressed:
2. When and how are behaviors started, stopped, and flushed from the
system?
3. How do behaviors and the browser communicate? How do behaviors respond
to user interaction and automatic events?
4. In what language are behaviors written?
Issues 1, 2, and 3 are independent of the choice of a behavior language. It
is these issues that are addressed in this paper and by SDSC's prototype
Virtual Reality Behavior System (VRBS) (pronounced "verbs"). Issue 4,
dealing with a language for behaviors, is discussed in [3].
Section 2 below outlines the VRBS system structure. Section 3 discusses the
VRBS WWWScript extension to VRML. Sections 4 through 8 describe and
discuss the VRBS communications protocol. Sections 9 and 10 return to the system
structure. Finally, sections 11 through 14 discuss future directions.
* If separate, which is started up first, how and when are the remaining
components started, how and when are they shutdown, and what happens if
there's a problem?
* Which component does what task?
* Can individual components be replaced with alternative implementations
without affecting the others?
* A behavior interpreter
* One or more VRML world files
* One or more behavior script files
* This paper to appear in the proceedings of VRML 95, the First Annual
Symposium on the Virtual Reality Modeling Language, December 13-15, San Diego,
CA USA. The proceedings are published by ACM.
Abstract
VRML [1] provides constructs for
controlling the shape, appearance, position, size, and orientation of 3D
objects. It does not include constructs to change these features in response to
user interaction or automated actions. Such behaviors are clearly needed.
To provide the flexibility and power needed for complex behaviors, a scripting
language is necessary. Several possible languages have been proposed by the VRML
community. A common denominator, however, is the need for an interprocess
communications protocol that links an executing behavior script to a VRML
browser. This paper discusses the prototype Virtual Reality Behavior System
(VRBS) behavior communications protocol and system structure in development
at SDSC.
1. Introduction
A behavior is the
description of a response to a user interaction or an automatic event (such as a
timer alarm going off). Such a response may cause a scene shape to move, a color
to change, or a light to turn on or off. In more complex applications, behaviors
alter multiple scene nodes, or restructure entire chunks of the world.
1. How are behaviors bound to VRML files? When are they downloaded?
Issue 1 requires a script node extension to the VRML syntax. Issue 2
requires the definition of a behavior system structure and the semantics of
behavior startup and shutdown. Issue 3 requires that a communications protocol
be developed whereby the browser and running behaviors may communicate. Finally,
issue 4 requires the development of a scripting language and an API for
browser-behavior communications.
2. The VRBS System Structure
A system
structure defines system components and how and when they interact. Issues
include:
* Are the browser and language interpreter the same program or separate?
Figure 1 shows the VRBS system structure. The principal components
are:
* A VRML browser
Figure 1. VRBS System Structure
A behavior language API provides functions to the script that allow it to query and change scene content by communicating with the browser. In response, the browser parses behavior requests and updates the scene. For user- or time-triggered actions, a behavior nominates callbacks, asking the browser to notify it for desired events. When an event occurs, the browser sends the event to the interpreter which executes the behavior callback.
The VRBS protocol ties the browser and interpreter together. It defines messages to communicate scene changes and events, as well as messages to control the startup and shutdown of the interpreter and its behaviors.
The VRBS system structure is similar to that common to interactive applications. The use of separate processes allows the independent development of browsing and behavior language functionality:
* The interpreter is ignorant of VRML syntax, Internet protocols, and windowing system details. Its roles are to (1) execute scripts sent to it by the browser, and (2) provide an API so that scripts can send messages to and receive events from the browser.
    WWWScript {
        name ""    # SFString
    }
A WWWScript node, like existing WWWInline nodes, provides the URL of a remote file. Where a WWWInline loads a VRML scene file, a WWWScript loads a VRML behavior file.
As with all URLs, the MIME type or file name extension of a file indicates the type of data being read. As the industry explores different behavior language options, different MIME types will be defined for different languages.
When a WWWScript node is processed by the browser, the script is retrieved from the Internet, its MIME type is checked, and the user's .mailcap file is scanned for the name of an appropriate behavior language interpreter to invoke. In this way the same VRML node syntax can be used to generically reference a behavior of any behavior language.
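The interpreter lookup described above can be sketched as follows. This is a minimal illustration, not the SDSC WebView code: the mailcap entry and the interpreter command `vrbs-perl` are hypothetical, and the parser ignores mailcap features such as wildcards and escaped semicolons.

```python
# Hypothetical mailcap contents; "vrbs-perl" is an invented interpreter name.
MAILCAP_TEXT = """\
# comment lines and blank lines are ignored
image/gif; xv %s

x-script/x-pbs; vrbs-perl %s
"""

def find_interpreter(mailcap_text, mime_type):
    """Return the command template registered for mime_type, or None."""
    for line in mailcap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A mailcap entry has the form "type/subtype; command; optional flags".
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2 and fields[0].lower() == mime_type.lower():
            return fields[1]
    return None

command = find_interpreter(MAILCAP_TEXT, "x-script/x-pbs")
```

A browser would substitute the local script file name for `%s` before invoking the command. A MIME type with no entry yields `None`, in which case the script cannot be run.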
The prototype implementation of WWWScript nodes within SDSC WebView [4] recognizes the MIME extension type:
    x-script/x-pbs
where x-script is a major MIME type for scripts in general, and x-pbs is the specific minor MIME type for behavior scripts written using Perl.
A typical use of WWWScript reads:
    DEF prog WWWScript { name "http://host/file.pbs" }
WWWScript nodes are not restricted to one per scene. When multiple WWWScript nodes are encountered, each node's script is loaded into the same interpreter.
To reduce name space collisions, the behavior language interpreter is provided both the script text, and the name of the VRML WWWScript node (prog in the example above). That node name can be used to group script functions within their own name space. In a Perl implementation, for instance, the node name is used as a Perl package name. The script's code is loaded into that package.
When multiple behavior scripts are loaded simultaneously, each into a different package, scripts may make calls to each other using the package name. This is stylistically similar to the use of a node's name for VRML USE node instancing.
Where an author has omitted a name for a WWWScript node, a unique package name is automatically generated. SDSC WebView, for instance, generates a name of the form b# where # is a unique random number.
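The naming rule above can be sketched as a small registry on the interpreter side. The class name `BehaviorNamespaces` is invented for illustration, and rejecting a duplicate node name is an assumption; the paper does not say how a collision between two explicitly named nodes is handled.

```python
import random

class BehaviorNamespaces:
    """Maps a package name to the script text loaded under it."""

    def __init__(self):
        self.packages = {}

    def load(self, script_text, node_name=None):
        # Use the WWWScript node name when given, otherwise invent one.
        name = node_name or self._generate_name()
        if name in self.packages:
            # Assumption: a repeated name is treated as an error.
            raise ValueError("package %r is already loaded" % name)
        self.packages[name] = script_text
        return name

    def _generate_name(self):
        # Names of the form b#, retried until the number is unused.
        while True:
            candidate = "b%d" % random.randrange(1, 2 ** 31)
            if candidate not in self.packages:
                return candidate

namespaces = BehaviorNamespaces()
named = namespaces.load("sub rotFunc { }", node_name="prog")
unnamed = namespaces.load("sub other { }")
```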
A WWWScript node may occur anywhere within a scene graph, including within a VRML file loaded by a WWWInline. The root world of a tree of WWWInline'd worlds is said to be the owner of the entire behavior set, including those loaded by it directly, those loaded by WWWInline'd files, and so on down the world tree. This serves to flatten the behavior script name space, allowing scripts of WWWInline'd worlds to easily access script functions loaded at other points in the scene graph.
Each root world has its own behavior set, and its own behavior language interpreter. This establishes a natural border between worlds, preventing all the scripts of all the worlds ever loaded into a browser from piling up indefinitely. Instead, a behavior set exists only as long as the root world exists. When the browser flushes the root world from its memory, the associated interpreter and its behaviors are flushed as well.
To load a retrieved script into an interpreter, and later communicate with that script, the browser and interpreter must use a mutually agreed upon communications protocol. The prototype VRBS protocol and its design motivation are discussed in the following sections.
A prime example of this issue is the scene graph. A scene graph is a conceptual data structure expressed by an input VRML file. It is not necessarily a convenient internal implementation data structure for all browsers. A generic behavior protocol cannot, therefore, depend upon such a structure. An abstraction is needed to provide browser implementors the flexibility to define their own efficient and appropriate data structures for their target graphics platform.
The VRBS protocol allows scripts to query and change only those scene nodes given names via the VRML DEF construct. This is stylistically compatible with the VRML USE construct which allows scene nodes with names, and only those nodes, to be repeatedly instanced.
VRBS does not require that there be an internal scene graph representation, only that the named nodes of a VRML scene be accessible within the browser. The browser must, however, retain the semantics of VRML scene node relationships. For instance, consider the following VRML file:
    #VRML V1.0 ascii
    Separator {
        DEF sphereColor Material { diffuseColor 1.0 0.0 0.0 }
        Sphere { }
    }
    Cone { }
A behavior may access the named Material node sphereColor to, say, change it from red to blue. Regardless of the internal browser data structure used, the Sphere should turn blue, and the Cone should not.
For example, the VRBS protocol does not require that the scripting language have a notion of objects. The application of an object-oriented programming style is entirely up to the scripting language. The protocol simply defines messages to notify the interpreter when events have occurred. It is up to the interpreter to decide how it delivers those events and to whom.
The VRBS protocol does assume the language has some support for name space segregation, typically via package, module, or class structures. In the absence of such, it is expected that the language interpreter will process incoming scripts and adjust function and variable names to automatically avoid name space collisions.
The VRBS protocol is entirely string-based. No multi-byte binary data is included, thereby avoiding byte order and floating point issues.
Traditional protocol-building tools, such as XDR [10], build binary packets. While efficient in C, they can be problematic for text-processing style scripting languages. Additionally, they require porting a support library and integrating it into a language interpreter.
To achieve a script-friendly protocol, the VRBS protocol is, again, entirely string-based. Messages are easily handled using simple text operations already available in any language. No modifications or additional support libraries are needed to enable typical existing languages to use the VRBS protocol.
In VRBS, the protocol allows for the definition of new opcodes and arguments. An interpreter may query the protocol version number supported by a browser, as well as the set of extensions supported by that browser. Receipt of a message with an unknown opcode is treated as a non-fatal error, allowing the browser or interpreter to continue on.
Similarly, individual scripts may query the features supported by a browser and compute platform, then adapt accordingly. This allows a script run on, say, a PC to perform differently than one run on an SGI Onyx Reality Engine II.
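The non-fatal treatment of unknown opcodes can be illustrated with a small dispatch table. This is a sketch only: the numeric values for the opcode class and opcodes are hypothetical, since the paper does not list them, and the `dispatch` function is not part of any VRBS API.

```python
def dispatch(handlers, opcode_class, opcode, args, errors):
    """Run the handler for (opcode_class, opcode).

    An unknown pair is recorded in errors and skipped rather than raised,
    so the receiver keeps running.
    """
    handler = handlers.get((opcode_class, opcode))
    if handler is None:
        errors.append((opcode_class, opcode))
        return False
    handler(*args)
    return True

# Hypothetical numeric values; the real ones are not given in the paper.
CL_BEHAVIOR_TO_WORLD = 1
OP_SET_NODE_FIELD = 5

changes = []
handlers = {
    (CL_BEHAVIOR_TO_WORLD, OP_SET_NODE_FIELD):
        lambda node, field, value: changes.append((node, field, value)),
}
errors = []
known = dispatch(handlers, CL_BEHAVIOR_TO_WORLD, OP_SET_NODE_FIELD,
                 ("sphereColor", "diffuseColor", "0.0 0.0 1.0"), errors)
unknown = dispatch(handlers, CL_BEHAVIOR_TO_WORLD, 999, (), errors)
```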
The VRBS protocol distinguishes between two stages of behavior initialization: load and startup.
When a browser encounters a WWWScript node, it retrieves the script and hands it off to an appropriate language interpreter. The interpreter is asked to load the script only, but not to execute it. During the load phase, the interpreter should sanitize the script, watching for illegal or insecure operations. A Perl implementation, for instance, parses an incoming script and blocks all access to eval, require, undef, and so forth. Additionally, it traps calls to open files, allowing it to redirect file access to a user-defined temporary directory.
Once an interpreter has finished loading a new behavior, it notifies the browser. The browser responds by requesting the interpreter to start up the behavior. If any sanitization problems occurred, the browser will not start up the script.
The differentiation between load and startup also allows a browser to pre-load scripts, or maintain them in a local loaded cache. If the language supports script precompilation, the load stage does this work, allowing the start up stage to get straight to executing the behavior.
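A load-stage check of the kind described above might look like the following sketch. It is deliberately naive and is not the SDSC Perl sanitizer: it scans for the three operations named in the text with a regular expression rather than parsing the script, and it treats everything after a `#` as a comment, which is wrong for a `#` inside a string.

```python
import re

# Operations named in the text; a real list would be longer.
BLOCKED = ("eval", "require", "undef")

def sanitize(script_text):
    """Return a list of (line number, blocked word) problems found."""
    problems = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop the comment part (naive)
        for word in BLOCKED:
            if re.search(r"\b%s\b" % re.escape(word), code):
                problems.append((number, word))
    return problems

script = '$angle = 0;\neval "unlink";  # not allowed\n# eval in a comment\n'
problems = sanitize(script)
may_start = not problems  # the browser starts only clean scripts
```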
* The header
* The body
All message header and body components are NULL-terminated ASCII strings. All numbers are expressed in decimal as strings and are constrained to be storable (when converted to binary) within a 32-bit variable.
    Pre-Header
        HeaderNBytes (1 byte)
The pre-header tells a browser or interpreter how many bytes to read from the connection in order to retrieve the entire message header.
Note that, unlike the remainder of a message, the pre-header is binary rather than a NULL-terminated string. This allows a message parser to read exactly 1 byte with a single read call, instead of looping to read a variable-length NULL-terminated string. This helps to keep the number of read calls down and increase the performance of message parsing. By keeping the pre-header to a single byte the protocol also avoids byte order issues.
    Header
        OpcodeClass
        Opcode
        BodyNBytes
The OpcodeClass is a signed integer that indicates the general category of operation to be performed. The Opcode is a signed integer that indicates the specific operation to be performed.
The BodyNBytes is a positive signed integer that indicates the number of bytes that follow as part of the message body.
The pre-header's HeaderNBytes field always gives the actual size of the transferred header, which may be greater than that required for the above three components. This allows future versions of the protocol to add additional header fields (such as a time stamp) without breaking existing applications. Older applications that do not recognize the new fields will silently skip over the extra header fields by using the HeaderNBytes byte count.
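The pre-header and header layout can be sketched as a pair of encode and decode functions. Two points are assumptions rather than statements from the paper: HeaderNBytes is taken to count the header bytes only (not the pre-header byte itself), and the extra header field in the usage example is an invented time stamp.

```python
def encode_header(opcode_class, opcode, body_nbytes, extra_fields=()):
    """Build the 1-byte binary pre-header followed by the string header."""
    fields = [str(opcode_class), str(opcode), str(body_nbytes)]
    fields += [str(field) for field in extra_fields]
    # Every header component is a NULL-terminated ASCII string.
    header = b"".join(field.encode("ascii") + b"\0" for field in fields)
    if len(header) > 255:
        raise ValueError("header does not fit a 1-byte HeaderNBytes")
    return bytes([len(header)]) + header

def decode_header(data):
    """Return (opcode_class, opcode, body_nbytes, bytes_consumed)."""
    header_nbytes = data[0]
    header = data[1:1 + header_nbytes]
    fields = header.split(b"\0")[:-1]  # drop the empty tail after the last NULL
    # Only the first three fields are known; any later ones are skipped.
    opcode_class, opcode, body_nbytes = (
        int(field.decode("ascii")) for field in fields[:3])
    return opcode_class, opcode, body_nbytes, 1 + header_nbytes

plain = encode_header(3, 7, 42)
extended = encode_header(3, 7, 42, extra_fields=["812345678"])
```

Because the decoder advances by HeaderNBytes rather than by the size of the fields it understands, `extended` decodes to the same three values as `plain`, and the body that follows is found at the right offset in both cases.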
For example, the OpSetNodeField opcode requests the browser to change the value of a field of a named node. The message body provides:
    Body (OpSetNodeField)
        worldId
        nodeName
        fieldName
        fieldValue
Each argument is a string. nodeName gives the name of the node to change, while fieldName gives the name of a field in that node (such as "diffuseColor" for a Material node). fieldValue is the new value for the field.
The worldId field is discussed in a later section. It is provided on some, but not all protocol messages.
The body is always BodyNBytes long, which may be larger than that required for the opcode's arguments. Again this allows for protocol changes in the future that may add additional optional arguments to individual opcodes. Older applications will use the BodyNBytes field to silently skip extra data in a message body.
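A message body can be sketched the same way as a run of NULL-terminated strings. The example below uses the four arguments of the node field message described above; the worldId value "17" and the trailing "future-option" argument are invented, and the decoder assumes a well-formed body whose last byte is a NULL.

```python
def encode_body(*args):
    """Join the arguments as NULL-terminated ASCII strings."""
    return b"".join(str(arg).encode("ascii") + b"\0" for arg in args)

def decode_body(body, expected_count):
    """Return the first expected_count arguments, ignoring any extras."""
    fields = body.split(b"\0")[:-1]
    if len(fields) < expected_count:
        raise ValueError("message body is missing arguments")
    return [field.decode("ascii") for field in fields[:expected_count]]

body = encode_body("17", "sphereColor", "diffuseColor", "0.0 0.0 1.0")
newer = encode_body("17", "sphereColor", "diffuseColor", "0.0 0.0 1.0",
                    "future-option")
args = decode_body(body, 4)
args_from_newer = decode_body(newer, 4)
```

An older receiver that knows only four arguments reads the same values from both bodies, which is the forward compatibility the BodyNBytes field is meant to provide.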
    ClBehaviorToWorld      ClBehaviorToBehavior      ClBehaviorToBrowser      ClBehaviorToInterpreter
    ClBrowserToWorld       ClBrowserToBehavior       ClBrowserToBrowser       ClBrowserToInterpreter
    ClInterpreterToWorld   ClInterpreterToBehavior   ClInterpreterToBrowser   ClInterpreterToInterpreter
    ClWorldToWorld         ClWorldToBehavior         ClWorldToBrowser         ClWorldToInterpreter
Table 1. VRBS Opcode Classes
The intent of opcodes within a given opcode class is clear from the name: ClBehaviorToWorld opcodes are messages from a behavior to a world, while ClWorldToBehavior opcodes are the reverse, and so on. This naming structure makes clear the 16 initial possibilities for messages to and from behaviors, browsers, interpreters, and worlds. There is nothing in the protocol, however, that prohibits additional classes. Possible future classes might include those for messages to and from behavior library servers, databases, the window system, and so forth.
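The naming scheme of Table 1 is regular enough to generate. The sketch below builds the 16 class names in the table's order and splits a name back into its source and destination; the helper `split_class` is invented for illustration, and no numeric class values are assigned because the paper does not give them.

```python
SOURCES = ("Behavior", "Browser", "Interpreter", "World")
DESTINATIONS = ("World", "Behavior", "Browser", "Interpreter")

# One class per (source, destination) pair, in the order of Table 1.
OPCODE_CLASSES = ["Cl%sTo%s" % (source, destination)
                  for source in SOURCES for destination in DESTINATIONS]

def split_class(name):
    """Return (source, destination) for a name such as 'ClWorldToBehavior'."""
    if not name.startswith("Cl") or "To" not in name:
        raise ValueError("not an opcode class name: %r" % name)
    source, destination = name[2:].split("To", 1)
    return source, destination
```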
The prototype VRBS protocol defines opcodes for some, but not all of these classes. The ClBrowserToBrowser class, for instance, is currently empty.
Key protocol messages include those in the ClBehaviorToWorld class, which describe operations to be performed upon a node, or queries about a node field's value.
The ClBehaviorToBrowser class includes opcodes to query the browser's abilities, including what VRML node types and node fields it supports, and what viewer types it implements.
The ClWorldToBehavior class supports opcodes to start and stop a world's behavior set, and deliver user- and timer-events to the behavior.
Key administrative tasks are included as opcodes within the ClWorldToInterpreter class. Typical operations include loading a behavior into the interpreter, and telling the interpreter to quit when a world is flushed from a browser's memory.
Table 2 provides a brief list of the VRBS opcode classes and opcodes. The list is provided to illustrate the protocol's style. Opcode argument details are left to the protocol manual [5].
    ClBehaviorToWorld
        OpGetWorldUrl
        OpGetNodeNames
        OpGetNodeBBox
        OpGetNodeType
        OpSetNodeField
        OpGetNodeField
        OpSetNodeField1
        OpGetNodeField1
        OpDeleteNode
        OpInsertNode
        OpReplaceNode
    ClBehaviorToBrowser
        OpGetBrowserHost
        OpGetBrowserPort
        OpGetBrowserName
        OpGetBrowserVersion
        OpGetBrowserFeatures
        OpGetBrowserNodeTypes
        OpGetBrowserNodeFields
        OpGetBrowserViewerTypes
    ClInterpreterToWorld
        OpAddEventInterest
        OpRemoveEventInterest
        OpReady
    ClInterpreterToBrowser
        OpHello
        OpInformation
    ClWorldToBehavior
        OpBehaviorStart
        OpBehaviorStop
        OpBehaviorEvent
    ClWorldToInterpreter
        OpLoadBehavior
        OpLoadBehaviorFile
        OpQuitInterpreter
Table 2. VRBS Opcodes
The VRBS protocol defines that a worldId is a unique identifier for a root world within a browser. It does not define the nature of that identifier. It could be a name, a number, an address, or whatever. For example, SDSC WebView generates a unique random number for each VRML world and treats that as a worldId. worldIds of consecutively loaded worlds may not have any relationship to each other (and do not in SDSC WebView). worldIds may or may not be reused during a browser session (they are not in SDSC WebView).
The VRBS protocol supports the OpAddEventInterest message through which an interpreter requests notification of world events. OpRemoveEventInterest cancels such an interest.
Typical language implementations will cover these opcodes with an API that allows a script to nominate a function to be called each time a desired event arrives at the interpreter. SDSC's prototype Perl implementation, for instance, maintains a set of Perl functions to call on each event type. In any case, an interpreter's response to an incoming event is not defined by the VRBS protocol. Only the names and parameters of events are defined.
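An interpreter-side callback table of the kind described above might be sketched as follows. The class `EventRegistry` and its method names are invented, and sending one OpAddEventInterest per event type (rather than per callback) is an assumption; the protocol defines only the messages, not this bookkeeping.

```python
class EventRegistry:
    """Keeps the script functions to call for each event type."""

    def __init__(self, send_to_browser):
        self._send = send_to_browser  # callable(opcode_name, event_name)
        self._callbacks = {}

    def add_event(self, event_name, callback):
        first = event_name not in self._callbacks
        self._callbacks.setdefault(event_name, []).append(callback)
        if first:
            self._send("OpAddEventInterest", event_name)

    def remove_event(self, event_name, callback):
        callbacks = self._callbacks.get(event_name, [])
        if callback in callbacks:
            callbacks.remove(callback)
            if not callbacks:
                del self._callbacks[event_name]
                self._send("OpRemoveEventInterest", event_name)

    def deliver(self, event_name, data):
        # Called when the browser reports an event to the interpreter.
        for callback in list(self._callbacks.get(event_name, [])):
            callback(data)

sent = []
fired = []

def on_timer(data):
    fired.append(data)

registry = EventRegistry(lambda opcode, event: sent.append((opcode, event)))
registry.add_event("EvTimer", on_timer)
registry.deliver("EvTimer", "tick")
registry.remove_event("EvTimer", on_timer)
registry.deliver("EvTimer", "late tick")  # no callback is left to run
```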
Messages from the browser to the interpreter, such as events, are not Acked by the interpreter. This lack of symmetry prevents deadlock situations. For example, suppose that both the browser and interpreter had to Ack the other on each message. Now, let them both send a message to the other at exactly the same time, and then enter a read waiting for an Ack back. The browser, expecting an Ack from its message to the interpreter, instead gets a new request. Similarly, the interpreter, expecting an Ack from its message to the browser, instead gets a new event. Both shelve the request or event and re-enter a read in hopes of finding an Ack. Since both have put off processing the other's message, neither can send back an Ack and they both hang waiting for the other. To avoid this problem, the browser never waits for an Ack and the interpreter never sends one.
Several alternate protocol designs are possible to avoid this kind of deadlock situation. For instance, browsers and interpreters could avoid blocking on reads and, using a message id, track Acks and messages despite possible interleaved delivery. This kind of approach works, but has several side-effects.
Interleaved message handling breaks the remote procedure call style visible to a behavior script. Ideally, a script simply calls a function in an API. The API function packages a message, sends it off to the browser, awaits a response, then returns a status or query answer to the script just by the function call returning. The remote execution of the call is not visible to the script.
However, requiring interleaved message handling either forces the interpreter to allow message functions to return before getting answers, or requires the interpreter to support multi-threaded execution (so that one script can continue while another waits for a message answer).
In the former case, a script function call that sends a message would return nothing. Some indeterminate time later the answer would arrive back from the browser, be saved within the interpreter, and the script told of its arrival. The script is responsible for getting this late answer and trying to pick up where it left off earlier. This can create very inconvenient program structure, making it problematic to write even simple behaviors.
Supporting multi-threaded script execution is also a problem. It is difficult to implement within a language interpreter, and is not supported by most current scripting languages.
An alternate approach is to skip Acks altogether. Roughly this is the case with the X Window System protocol. Xlib queues packets destined for the server, flushing them periodically or when a client makes a server query. Xlib calls return immediately, without error codes indicating if the server liked, or disliked, what it was sent. Sometime later faulty applications see a protocol error message show up on stderr if there was a problem. Again, this can make authorship of even simple scripts problematic.
The VRBS protocol's non-symmetric handling of Acks eliminates most of these problems. Scripts make API function calls to send messages. The call blocks, waiting for the Ack back from the browser. When received, the call returns the status, error code, or query answer exactly like returning from a function call. Script writing is straightforward with all message handling looking like function calls.
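The blocking call style can be sketched with a socket pair and a stand-in browser thread. The framing here is simplified on purpose and is not the VRBS message format: each request and each Ack is a single NULL-terminated string, and the reply text "ok ..." is invented.

```python
import socket
import threading

def read_string(sock):
    """Read one NULL-terminated ASCII string from the socket."""
    chunks = []
    while True:
        byte = sock.recv(1)
        if not byte:
            raise ConnectionError("peer closed the connection")
        if byte == b"\0":
            return b"".join(chunks).decode("ascii")
        chunks.append(byte)

def call_browser(sock, request):
    """Send a request and block until the browser's Ack arrives."""
    sock.sendall(request.encode("ascii") + b"\0")
    return read_string(sock)

def fake_browser(sock):
    # Stand-in for the browser: answer exactly one request.
    request = read_string(sock)
    sock.sendall(b"ok " + request.encode("ascii") + b"\0")

interpreter_end, browser_end = socket.socketpair()
worker = threading.Thread(target=fake_browser, args=(browser_end,))
worker.start()
answer = call_browser(interpreter_end, "OpGetBrowserName")
worker.join()
interpreter_end.close()
browser_end.close()
```

From the script's point of view, `call_browser` behaves like an ordinary function that returns a value; the round trip to the browser is hidden inside it.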
If the browser quits, messages destined for it from the interpreter have no place to go. The interpreter is expected to detect this and quit as well. Detection of browser, or interpreter, death is left up to the operating system's interprocess communications utilities. For example, on UNIX systems, SDSC WebView uses sockets [10]. A read or write on a socket to a process that has died will return an error, letting the interpreter know the browser has died unexpectedly. The protocol needn't support any explicit process death detection.
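Detection of a dead peer through the socket layer can be sketched as below. This is an illustration of the operating system behavior the text relies on, not WebView code; the function name is invented, and note that the read consumes one byte when the peer is still alive and has sent data.

```python
import socket

def peer_has_died(sock):
    """Return True if a read shows that the other end is gone."""
    try:
        data = sock.recv(1)
    except OSError:
        return True        # the read itself failed
    return data == b""     # an empty read means the peer closed

interpreter_end, browser_end = socket.socketpair()
browser_end.close()        # simulate the browser exiting
browser_gone = peer_has_died(interpreter_end)
interpreter_end.close()
```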
When a browser encounters a WWWScript node, it uses the script's URL to retrieve the script file to local disk and looks up the script's MIME type in the user's .mailcap file. The appropriate entry's language interpreter is invoked, passing it:
The combination of hostname and port number uniquely addresses an open interprocess communications port on an Internet host. The interpreter, upon startup, opens a connection back to the browser, on that host and at that port number. Its opening message to the browser, OpHello, queries the browser's VRBS protocol version and features. From this information the interpreter can adapt to variations in VRBS protocol revisions.
The interpreter then issues an OpReady message back to the browser, passing it the initial worldId, indicating it is ready to receive a behavior script for the world. The browser checks its transfer queue, pulls out the top pending behavior script and, using an OpLoadBehavior message, sends it to the interpreter for loading and sanitization.
When the interpreter finishes with that script, it again sends OpReady, and so on until all behavior scripts pending in the browser have been loaded. At that point the browser issues an OpBehaviorStart for each loaded behavior. This starts execution of the behavior within the interpreter.
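The startup exchange described above can be traced with a small simulation. It only records the order of messages; nothing is sent over a connection, the function name is invented, and the script names are placeholders.

```python
from collections import deque

def startup_trace(pending_scripts):
    """Return the ordered (sender, opcode, script) messages of a startup."""
    queue = deque(pending_scripts)   # the browser's transfer queue
    loaded = []
    trace = [("interpreter", "OpHello", None),
             ("interpreter", "OpReady", None)]
    while queue:
        name = queue.popleft()
        trace.append(("browser", "OpLoadBehavior", name))
        loaded.append(name)
        # The interpreter reports that it can take the next script.
        trace.append(("interpreter", "OpReady", None))
    # Nothing is pending any more, so the browser starts each behavior.
    for name in loaded:
        trace.append(("browser", "OpBehaviorStart", name))
    return trace

trace = startup_trace(["first.pbs", "second.pbs"])
opcodes = [opcode for _, opcode, _ in trace]
```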
Behaviors may be loaded into an interpreter at any time; they needn't be loaded all at once. For instance, suppose that a WWWScript node is a child of a LOD node. The browser might encounter the script, and load it, only when the LOD triggers the child by user proximity.
Browsers also may delay issuing an OpBehaviorStart as they see fit. A browser may, for instance, preload scripts, starting them up only as needed. In the LOD example above, an alternate implementation would load the WWWScript's script before the LOD triggers the child. Only the startup of the script would be delayed until the LOD trigger. The semantics of behavior execution remain the same, but the implementation varies to adapt to different browser styles.
In any case, when a behavior script is finally started, it typically immediately expresses interest in one or more events, nominating callback functions for each. The interpreter sends the browser an OpAddEventInterest message. Thereafter, the browser sends events to the interpreter as they occur. The interpreter makes the callbacks, and the behavior script reacts to the events as it sees fit.
VRML file:
    #VRML V1.0 ascii
    Separator {
        DEF example WWWScript { name "example.pbs" }
        DEF xform Rotation { rotation 0.0 1.0 0.0 0.0 }
        Cube { }
    }
Perl script:
    # Register interest in a Timer
    &AddEvent(
        "rotFunc",      # call rotFunc
        "",             # pass this data
        "EvTimer",      # when timer triggers
        0.1 );          # every 1/10 second

    # Update a rotation each call
    $angle = 0;
    sub rotFunc {
        $rad = ($angle/180.0)*3.1415;
        &SetNodeField(                  # set
            "xform",                    # node
            "rotation",                 # field
            "0.0 1.0 0.0 $rad" );       # to this
        $angle += 10.0;
    }
Consider a VRML file with a behavior script. The script, at startup, gets its worldId, opens a network connection to a remote collaboration server and passes the worldId, browser hostname, and port number to the server. The collaboration server opens a connection back to the browser and starts issuing scene change messages, just as would a behavior running within a local interpreter.
World events may be delivered back to the collaboration server, allowing it to detect user actions. It responds by issuing messages to the user's browser as well as messages to the browsers of all other network users viewing the same world. If one user, say, reaches out and starts a cube spinning in the scene, then via the collaboration server, all users viewing the same world see a cube start spinning.
The simplicity of the VRBS protocol, and its independence from assumptions about the client, allow this kind of flexibility.
Key VRBS features include:
* Independence from browser implementation style.
* Independence from behavior language features.
* Independence from compute architecture attributes.
* Script-friendly protocol.
* Extensible and reasonably secure.
* Supports a traditional event model.
* Higher-level tools and libraries to more easily support common behavior operations.
* A binary variant of the VRBS protocol for use when a scripting language can easily deal with binary data.
This work has been supported through major funding from the National Science Foundation. The opinions, conclusions, or recommendations expressed herein are those of the authors and do not necessarily reflect the views of SDSC, NSF, General Atomics, or their sponsors.
2. Gosling, J.; McGilton, H., "The Java Language Environment: A White Paper," http://java.sun.com/whitePaper/javawhitepaper_1.html.
3. Moreland, J.L.; Nadeau, D.R., "The Virtual Reality Behavior System (VRBS): Using Perl as a Behavior Language for VRML," Proceedings of VRML95, 1995.
4. Nadeau, D.R.; Michaels, C.; Moreland, J.L.; Zlotin, D., "SDSC WebView," http://www.sdsc.edu/projects/vrml/tools/webview/help/webview.html.
5. Nadeau, D.R.; Moreland, J.L., "The Virtual Reality Behavior System (VRBS)," http://www.sdsc.edu/projects/vrml/vrbs/vrbs.html.
6. Nye, A., edited by, Xlib Reference Manual, Volumes Zero, One, and Two, O'Reilly & Associates, Inc., 1992.
7. Ousterhout, J.K., Tcl and the Tk Toolkit, Addison-Wesley, 1994.
8. Shafer, D., The Complete Book Of HyperTalk2, Addison-Wesley, 1991.
9. Silicon Graphics Inc., Open Inventor C++ Reference Manual, Addison-Wesley, 1994.
10. Silicon Graphics Inc., IRIX(TM) Network Programming Guide, chapter 4 (Sockets-based Inter-Process Communications), chapter 8 (XDR and RPC Language Structure), chapter 9 (XDR Programming Notes), and Appendix B (XDR Protocol Specification).
11. Silicon Graphics Inc., "WebSpace... because the world is not flat!," http://www.sgi.com/Products/WebFORCE/WebSpace.
12. Wall, L.; Schwartz, R.L., Programming Perl, O'Reilly & Associates, Inc., 1991.
13. Wernecke, J., The Inventor Mentor, Addison-Wesley, 1994.