Application virtualization, past and future
An introduction to application virtualization
Platform virtualization vs. application virtualization
Virtual machines (VMs), in their first incarnation, were created by IBM in the 1960s as a way to share large and expensive mainframe systems. Although the technique is still applied in current IBM systems, the popular concept of a VM has broadened and been applied to a number of areas outside of virtualization.
The area of virtualization that IBM popularized in the 1960s is known as platform (or system) virtualization. In this form of virtualization, the underlying hardware platform is virtualized to share it with a number of different operating systems and users.
Another application of the VM is to provide the property of machine independence. This form, called application (or process) virtualization, creates an abstracted environment (for an application), making it independent of its physical environment.
Aspects of application virtual machines
In the application virtualization space, VMs are used to provide a hardware-independent environment for the execution of applications. For example, consider Figure 1. At the top is the high-level language, which developers use to construct applications. Through a compilation process, this high-level code is compiled into an intermediate representation called object code. In a non-virtualized environment, this object code (which is machine independent) is compiled into the native machine code for execution on the physical platform. But in an application virtualization environment, the object code is interpreted within an abstract machine to provide the execution. The key advantage here is that the same object code can be executed on any hardware platform that supports the abstract machine (the interpreter).
Figure 1. Application VM for platform independence
In addition to creating a portable environment in which to execute the object code, application virtualization provides an environment in which to isolate the VM from other applications running on the host. This setup has a number of advantages, such as detailed resource management and security.
The object code for a VM is also called bytecode, specifically defining an instruction set that an interpreter executes. The term bytecode evolved from implementations that efficiently implemented their virtual instruction sets as single bytes for simplicity and performance.
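To make the idea of bytecode and its interpreter concrete, here is a minimal sketch in Python of a stack-based abstract machine. The opcode values and the names OP_PUSH, OP_ADD, OP_MUL, OP_HALT, and run are hypothetical, invented for this illustration; they do not correspond to the instruction set of any VM discussed in this article.

```python
# Hypothetical one-byte opcodes, for illustration only (not a real VM's instruction set).
OP_PUSH = 0x01   # followed by one operand byte: push it on the stack
OP_ADD = 0x02    # pop two values, push their sum
OP_MUL = 0x03    # pop two values, push their product
OP_HALT = 0xFF   # stop and return the top of the stack


def run(bytecode: bytes) -> int:
    """Interpret a bytecode program and return the value left on top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to decode
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == OP_PUSH:
            stack.append(bytecode[pc])  # the operand is the byte after the opcode
            pc += 1
        elif opcode == OP_ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == OP_MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == OP_HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {pc - 1}")


# (2 + 3) * 4 encoded as a flat sequence of bytes.
program = bytes([OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT])
result = run(program)
print(result)  # 20
```

The byte sequence in program is the portable artifact: any host that can run this interpreter can execute it unchanged, which is the machine-independence property described above.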
Now, let's look at some of the historical uses for application virtualization and explore some of its modern uses.
Virtual machine history
One of the earliest uses of application virtualization occurred in the 1960s for the Basic Combined Programming Language (BCPL). BCPL was an imperative language developed by Martin Richards at the University of Cambridge and was a precursor to the B language, which evolved into the C language we use today.
Although BCPL was a high-level language (similar to C), the intermediate code that the compiler generated was called O-code (Object code). The O-code could be interpreted on a physical machine (as a VM) or compiled from O-code to the native machine language of the host. This functionality provided a number of advantages in the context of machine independence. First, by abstracting the O-code from the physical machine, it could easily be interpreted on a variety of hosts. Second, the O-code could be compiled to the native machine, which permitted the development of one compiler to generate O-code and then multiple compilers that translate O-code to native machine instructions (a simpler task). This machine independence made the language portable across machines and therefore popular because of its availability.
In the early 1970s, the University of California at San Diego implemented the VM approach for execution of compiled Pascal. They called the intermediate representation p-code. To simplify the development of the Pascal compiler, p-code sought independence from the underlying hardware, relying instead on an abstract pseudo-machine architecture. The Forth language also applied VMs, namely, zero-address or stack-based architectures.
In 1972, Xerox PARC introduced the Smalltalk language, which relied on a VM for execution. Smalltalk was one of the first languages built around the concept of objects. Both Smalltalk and p-code heavily influenced one of the most prominent VM-based languages in existence today: the Java language. Java, developed by Sun Microsystems, first appeared in 1995 and advanced the idea of platform-independent programming through the Java Virtual Machine. Since then, Java technology has become a building block of web applications. From server-side scripts to client-side applets, Java technology raised awareness of VM technologies and introduced newer techniques that bridged interpretation and native execution using just-in-time (JIT) compilation.
Many other languages include the concept of VMs. The Erlang language (developed by Ericsson) uses a VM to execute Erlang bytecodes and also to interpret Erlang from the source's abstract syntax tree. The lightweight Lua language (developed at the Pontifical Catholic University of Rio de Janeiro in Brazil) includes a register-based VM. When a Lua program is executed, it is translated into bytecodes, and then executed in the VM. Later, this article looks at a bytecode standard that can be used for any language.
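The translate-then-execute pattern described for Lua can be observed directly in CPython, the reference implementation of Python. This is an analogy rather than a depiction of Lua's implementation. The sketch below uses only Python built-ins, and the variable names are chosen for this illustration.

```python
source = "total = 6 * 7 + 1"

# Step 1: translate the source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")
raw_bytecode = code.co_code  # the raw instruction bytes for CPython's VM

# Step 2: hand the code object to the VM for execution.
namespace = {}
exec(code, namespace)
total = namespace["total"]

print(type(raw_bytecode).__name__, total)
```

To see the individual instructions rather than raw bytes, the standard library's dis module can disassemble the code object with dis.dis(code). The exact instructions vary between Python versions.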
Virtual machines today
Using VMs to abstract applications from the physical host has a long history, and the approach continues to evolve and find new applications today. Let's look at some of the newer open source solutions that push the concept of VMs into the future.
Dalvik is an open source VM technology developed by Google for the Android operating system. Android is built on a modified Linux kernel and incorporates a software stack for mobile devices (see Figure 2). Unlike many VM technologies that rely on stack-based architectures, the Dalvik VM is a register-based virtual architecture (see Related topics for more information on the architecture and instruction set). Although stack-based architectures are conceptually simple and efficient, they can introduce new inefficiencies, such as larger program sizes (because of stack maintenance).
Figure 2. Simple architecture of a Dalvik software stack
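To illustrate the difference between stack-based and register-based virtual architectures, the following Python sketch computes c = a + b on two toy machines. Both instruction formats and the names run_stack and run_register are hypothetical; neither resembles the actual Dalvik or Java instruction encodings.

```python
def run_stack(program, variables):
    """Stack machine: operands are implicit, taken from and returned to a stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "load":
            stack.append(variables[instr[1]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "store":
            variables[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return variables


def run_register(program, registers):
    """Register machine: each instruction names its destination and its sources."""
    for instr in program:
        op = instr[0]
        if op == "add":
            _, dest, src1, src2 = instr
            registers[dest] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown register op: {op}")
    return registers


# The same computation, c = a + b, expressed for each machine.
stack_program = [("load", "a"), ("load", "b"), ("add",), ("store", "c")]
register_program = [("add", "c", "a", "b")]

stack_result = run_stack(stack_program, {"a": 2, "b": 3})
register_result = run_register(register_program, {"a": 2, "b": 3})
print(len(stack_program), len(register_program))    # 4 1
print(stack_result["c"], register_result["c"])      # 5 5
```

The stack version spends three of its four instructions moving values on and off the stack, which is the stack maintenance overhead mentioned above. The single register instruction carries more operands, so instruction count is only one of several ways to measure program size.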
Because Dalvik is the VM architecture, it relies on a high-level language compiled into the bytecodes that the VM understands. Rather than reinvent the wheel, Dalvik relies on the Java language as the high-level language for application development. Dalvik also relies on a special tool called dx to convert Java class files into Dalvik VM executables. For performance, the VM may further optimize a Dalvik executable (dex), including through JIT compilation, which translates the dex instructions into native instructions for native performance. This process is also known as dynamic translation and is a popular technique for increasing the performance of VMs.
As shown in Figure 2, a Dalvik executable (along with an instance of the VM) is isolated as a single process in Linux user space. The Dalvik VM has been designed to support execution of multiple VMs (in independent processes) simultaneously.
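The dynamic translation idea described for Dalvik can be sketched in Python. A real JIT compiler emits native machine instructions; as a stand-in, this sketch translates a toy stack program into Python source once and lets the host compile it, so later calls skip the per-instruction decoding. The instruction names and the functions translate and interpret are hypothetical and unrelated to the dex format.

```python
def translate(program):
    """Translate a stack program into a host-level function, once.

    Stand-in for JIT compilation: this emits Python source rather than
    native machine code, and the host compiles that source.
    """
    exprs = []  # translation-time stack of expression strings
    for instr in program:
        op = instr[0]
        if op == "push":
            exprs.append(str(instr[1]))
        elif op == "arg":
            exprs.append("x")
        elif op in ("add", "mul"):
            right, left = exprs.pop(), exprs.pop()
            symbol = "+" if op == "add" else "*"
            exprs.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown op: {op}")
    source = f"lambda x: {exprs.pop()}"
    return eval(compile(source, "<translated>", "eval")), source


def interpret(program, x):
    """Baseline interpreter: decodes every instruction on every call."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "arg":
            stack.append(x)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown op: {op}")
    return stack.pop()


# x * x + 1 as a toy stack program.
poly_program = [("arg",), ("arg",), ("mul",), ("push", 1), ("add",)]
translated, translated_source = translate(poly_program)
print(translated_source)                               # lambda x: ((x * x) + 1)
print(interpret(poly_program, 19), translated(19))     # 362 362
```

Both paths produce the same result. The difference is where the decoding cost is paid: on every call in interpret, or once up front in translate.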
The Dalvik VM is not implemented on the standard Java runtime and therefore does not inherit the licenses over it. Instead, Dalvik is a clean-room implementation published under the Apache 2.0 license.
Another interesting open source VM project is Parrot. Parrot is another register-based VM technology that was designed to efficiently execute dynamic languages (languages that perform certain operations at run time that are commonly performed at compile time, such as altering the type system).
Parrot was originally designed as a run time for Perl 6, but it is a flexible environment for execution of bytecodes for many languages (see Figure 3). Parrot supports several input forms, including the Parrot Abstract Syntax Tree (PAST), which is useful for compiler writers; the Parrot Intermediate Representation (PIR), which is a high-level representation that can be written by people or generated automatically by compilers; and the Parrot Assembly (PASM), which sits below the intermediate representation but is useful both for people and for compilers. Each form is translated into Parrot bytecode and executed on the Parrot VM.
Figure 3. Simple architecture of the Parrot VM
Parrot supports a large number of languages, but one aspect that makes it so interesting is its support for both static and dynamic languages, including specific support for functional languages. Listing 1 shows a simple use of PASM. To install Parrot with Ubuntu, simply use apt-get:
sudo apt-get install parrot
The following session illustrates a simple string manipulation program in Parrot. Note that although Parrot implements this code as assembly, it's much more feature rich than the assembly you may be used to. Instructions in Parrot use dest,src syntax, so Listing 1 shows a string register being loaded with text. The length instruction determines the length of the string and loads it into an integer register.
Listing 1. PASM example
$ more test.pasm
set S1, "Parrot"
set S2, "VM"
length I1, S1
print I1
print "\n"
concat S3, S1, S2
print S3
print "\n"
end
$ parrot test.pasm
6
ParrotVM
$
You'll find a rich set of instructions within Parrot (see Related topics for more details). The authors chose richness of features over minimalism, making it easy to code and build compilers for the Parrot VM.
Even with the high-level abstraction that PASM provides, PIR is even more comfortable for high-level programmers. Listing 2 provides an example program written in PIR and executed by the Parrot VM. This example declares a subroutine called square that squares its argument and returns the result. The main subroutine (labeled with :main to tell Parrot to run it first) calls square and prints the result.
Listing 2. PIR example
$ more test.pir
.sub square
  .param int arg
  arg *= arg
  .return(arg)
.end

.sub main :main
  .local int value
  value = square(19)
  print value
  print "\n"
.end
$ parrot test.pir
361
$
Parrot provides a rich application virtualization environment for the development of machine-independent applications that also seek high efficiency. You can find a large number of languages that support compiler front ends designed for Parrot, including C, Lua, Python, Scheme, Smalltalk, and many others.
Other uses of application virtual machines
So far, you've seen the historical uses of application virtualization, including two recent examples. Dalvik is powering application development within current handsets, and Parrot provides an efficient framework for compiler writers for static and dynamic languages. But the concept of application virtualization is being implemented in a number of other areas outside of the approaches explored thus far.
One particularly interesting use is likely running on the computer you're using right now. Systems that use the new Extensible Firmware Interface (EFI), which is a BIOS replacement, can implement firmware drivers in what's called the EFI Byte Code (EBC). The system's firmware includes an interpreter that is invoked when an EBC image is loaded. This concept was also implemented in Open Firmware by Sun Microsystems using Forth (a language that includes its own VM).
In the game world, the use of application virtualization is not new. Many modern games include scripting of nonplayer-character behaviors and other game aspects using languages that execute bytecodes (such as Lua). But the concept of application virtualization in games actually goes back much farther.
Infocom, the company that introduced text-based adventures such as Zork, saw the value in machine independence in 1979. Infocom created a VM called the Z-machine (named after Zork). The Z-machine was a VM that permitted an adventure game to be more easily ported to other architectures. Rather than having to port the entire adventure to a new system, an interpreter would be ported that represented the Z-machine. This functionality simplified the porting process to other systems that may have different language support and entirely different machine architectures. Although Infocom's goal was to ease the pain in porting between the architectures of their day, their work continues to simplify porting and results in making these games accessible to a new generation (even on mobile platforms).
Other game applications of VMs include ScummVM, which provides a VM environment for the Script Creation Utility for Maniac Mansion (SCUMM) scripting language (created in 1987). SCUMM was developed by LucasArts to simplify development of a graphical adventure game. ScummVM is now used to play a large number of text and graphical adventure games on a variety of platforms.
Just as platform (or system) virtualization has changed the way we provision and manage both servers and desktops, application virtualization continues to provide efficient mechanisms to abstract an application from its host system. Given the popularity of this approach, it will be interesting to see an evolution of both software and hardware to make application virtualization even more flexible and efficient.
Related topics
- Browse more articles on virtualization or all of Tim's articles on developerWorks.
- Wikipedia provides a great set of resources to learn more about VMs (both platform and application). Check out the page on virtual machines in addition to a page specifically devoted to p-code machines.
- BCPL, the precursor to the B and C languages, originated in 1967 from Martin Richards. You can read the first BCPL reference manual online as part of Project MAC. You can also download the latest version of BCPL from its home site.
- EFI Byte Code, or EBC, specifies an interpretive layer for portable component drivers. You can learn more about EBC and UEFI in Boot Loaders: Small, Fast System Initialization (Dr. Dobb's, September 2010).
- Although Forth has been around since the 1970s, it continues to find applications as a VM language. You'll find Forth applied in space sciences, embedded systems, BIOSes, and any other application that exists with scarce resources. Learn more about Forth at the Forth Interest Group.
- Follow developerWorks on Twitter, subscribe to a feed of Linux tweets on developerWorks, or follow M. Tim Jones on Twitter.
- Dalvik is the VM environment for the Android operating system. Dalvik was developed by Dan Bornstein and is maintained by Google as part of Android. Learn more about the Dalvik machine through its bytecodes (available in user documentation). You can also learn more about Dalvik in Introduction to Android development (Frank Ableson, developerWorks, May 2009).
- Parrot is a VM designed to efficiently execute static and dynamic languages through a variety of intermediate representations over Parrot bytecode. Parrot is available as open source and can be used with a number of languages. To learn more about Parrot's instruction set, check out the opcodes available within it.
- Application VMs are popular in the game development world. One of the earliest uses was by Infocom in its text adventure games (such as Zork). You can learn more about the Infocom VM, called the Z-machine, as well as interpreters that exist for various platforms. Another application of VMs was the SCUMM, used in graphical adventures by LucasArts. SCUMM has been implemented as open source as ScummVM and is bringing older games back to life on new hardware.