Level: Intermediate
Cameron Laird (claird@phaseit.net), Vice President, Phaseit, Inc.
Rick Kennell, Middleware Engineer and Virtualization Technician, Purdue University
02 Oct 2007
nanoHUB is a virtual computing center created to support
nanotechnology research. It uses open source components to achieve far more powerful
results than previous remote access facilities. This article details specific
configurations and enhancements necessary to make the most of the performance,
security, and usability that common software such as VNC and WebDAV provides.
nanoHUB.org employs a unique middleware system that precisely balances security,
performance, and convenience to support distributed public research on
nanotechnology. Scientists who use this research gateway concentrate on their own
studies rather than computing issues.
As important as nanoHUB's current virtues, though, are the lessons it teaches
about how to configure and enhance well-known open source components to achieve a
much richer virtual computing experience than most developers seem to realize is
possible. Let's look at what it takes to allow researchers like Supriyo Datta,
Director of the NASA/Purdue Institute for Nanoelectronics and Computing, to
report, "With little effort on our part, we can make our simulation tools
available to our colleagues. And with essentially no effort on their part, our
colleagues can be running molecular electronic simulations." As seen in
Figure 1 below, nanoHUB nicely balances ease of use both for
the scientists who publish their works there, and for their colleagues who view
what they've made available.
Figure 1. Screen shot of carbon nanotube simulation
nanoHUB is a Web-based research center
The basic idea is simple: You're a nanotechnology researcher, and you want to
take advantage of the resources of the Network for Computational Nanotechnology
(NCN), a consortium of several universities and other collaborators (see
Resources). Your keyboard and display, on your desktop or
lap, should connect you to the full wealth of data, computing power, simulation
hardware, and applications of NCN, wherever you happen to be and however those
particular resources are physically provisioned at the moment.
It should be simple. It takes a lot of serious development attention to
make it appear so from the perspective of an end user because:
- All communication must be encrypted.
- The firewalls providing protection local to end users have policies that are
generally not understood, even in the minority of cases where the end users
are aware of their existence.
- NCN resource usage must be measured and, in some cases, rationed.
- The technical capabilities of the desktops with which researchers connect
differ wildly.
- Many applications are restricted to particular hardware combinations, in ways
that are hard to generalize.
- nanoHUB, as a whole, must be manageable, scalable, reliable, available,
testable, and recoverable.
- nanoHUB needs to work compatibly with existing security infrastructures; in
other words, it should let end users rely as much as possible on the accounts and
passwords they already know.
While several proprietary and open source tools meet one or two of the
requirements, there are, at best, a lot of configuration complexities along the
way. Where's the convenience for nanoHUB visitors?
That convenience emerges only through a well-balanced mix of customized VNC, SSL,
X11, Apache, and several other open source projects. Consider, for a moment, the
nanotube simulation displayed near the top of this article (see Figure 1). This
image was obtained in a few seconds by clicking on the specific nanoHUB
application (CNTbands 2.0), selecting the appropriate parameters, and
interactively adjusting the 3-D image. No further software is needed, apart from
the Java™ runtime typically available with a Web browser. In particular, a user
does not have to obtain and configure the visualization system. Most researchers
don't know how to use OpenDX or PyMOL, much less how to install them, and they
shouldn't have to.
It's a good thing nanoHUB doesn't require such installations. Nanotube
visualization, like many of the other visualizations nanoHUB supports, is quite
complex and correspondingly slow on a typical desktop computer.
The simulation was originally available as a downloadable application with no
visualization. Downloads have essentially stopped in the last year, because
nanoHUB's ease of use and the effectiveness of hardware-accelerated visualization
entirely outweigh the marginal benefits of researchers being able to compile and
modify the source code themselves. For them, using local resources to visualize
the results is no advantage. For these working scientists, nanoHUB is easier to
use and gives more interesting results than any locally hosted alternative.
Moreover, this remarkable example generalizes. It's traditional in scientific
circles to assume that researchers need "hot" hardware on their desks to see good
graphics. nanoHUB demonstrates over and over that its well-tuned VNC solutions
deliver better graphics performance than such local hardware, even across
mediocre Wide Area Networks (WANs). NanoHUB takes advantage of plenty of
special-purpose facilities hosted at
NCN sites, then relies on VNC's RFB protocol to communicate the visual results to
the desktops where nanoHUB's clients sit. Computing specialists concentrate on
wringing the most from nanoHUB's physical assets, while the nanotechnology
researchers around the world who use nanoHUB turn their focus from computing
issues back to the science at which they're best.
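To make the RFB side of this concrete, the minimal C sketch below encodes the header a VNC server sends when part of its framebuffer changes, following the FramebufferUpdate message of the RFB protocol as specified in RFC 6143: a 4-byte message header, then a 12-byte header per rectangle, all in big-endian order. It is an illustration, not nanoHUB code; the function name `rfb_update_header` is our own, and the pixel data that follows each rectangle header is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RFB_FRAMEBUFFER_UPDATE = 0, RFB_ENCODING_RAW = 0 };

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_u16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Store a 32-bit value in network byte order. */
static void put_u32(uint8_t *p, uint32_t v) {
    put_u16(p, (uint16_t)(v >> 16));
    put_u16(p + 2, (uint16_t)v);
}

/* Encode a FramebufferUpdate message header announcing one rectangle,
 * followed by that rectangle's header. On the wire, the rectangle's
 * pixel data (its format depends on 'encoding') follows these 16 bytes.
 * Returns the number of bytes written. */
size_t rfb_update_header(uint8_t out[16], uint16_t x, uint16_t y,
                         uint16_t w, uint16_t h, int32_t encoding) {
    assert(out != NULL);
    out[0] = RFB_FRAMEBUFFER_UPDATE;       /* message-type */
    out[1] = 0;                            /* padding */
    put_u16(out + 2, 1);                   /* number-of-rectangles */
    put_u16(out + 4, x);                   /* x-position */
    put_u16(out + 6, y);                   /* y-position */
    put_u16(out + 8, w);                   /* width */
    put_u16(out + 10, h);                  /* height */
    put_u32(out + 12, (uint32_t)encoding); /* encoding-type, signed 32-bit */
    return 16;
}
```

With the raw encoding, width times height pixels in the negotiated pixel format follow each rectangle header; compressing encodings such as ZRLE shrink that payload, which is part of what keeps RFB usable across slow links.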
Software refinements: efforts to achieve "it just works"
That same screen shot hints at several other examples of the ways nanoHUB
staffers have solved computing problems so that researchers can push nanoscience
forward. To the right of the Result selector, you see a green download
arrow that points down. If you click on it, a new browser window with a static
image of the simulation appears; from that window, of course, it's straightforward
to put the browser to work to print the image, save it to an external file, and so
on.
A lot goes into these operations. Bear in mind that the button press was seen
only by the computation server, inside the VNC session. To implement the file
download, the server sends a message to the client to ask its Web browser to pull
the file from nanoHUB. The request is routed to the application, where a
Tcl-based Web server supplies the file. File upload is just as convenient. End
users get what they're after without having to think about how a click on an
image visible only through VNC can open a new browser frame.
There's more. Notice what happens if you grab the border of the window and
stretch the application's display: It behaves sensibly. This isn't just resizing a
window, because the dimensions of the serving X11 frame buffer limit any resizing.
Instead, nanoHUB resizes the X11 frame buffer itself, and it allows the viewer's
frame buffer to shrink-wrap the application to an appropriate size. To the
best of our knowledge, this technique has never been documented before for X11
servers. The Xephyr X Server, for example, updates Xnest in several regards, but
it doesn't attempt this rather subtle effect.
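nanoHUB's own resizing code isn't published here, so the following is only a sketch of the standard RFB mechanism a server can use to tell a viewer that the framebuffer itself has changed size: a FramebufferUpdate containing a single pseudo-rectangle with the DesktopSize pseudo-encoding (-223), whose width and height fields carry the new dimensions and which carries no pixel data. The clamping limits and the name `desktop_size_update` are hypothetical, and a server should send this only to viewers that listed -223 in their SetEncodings message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RFB_ENCODING_DESKTOP_SIZE = -223 };

/* Store a 16-bit value in big-endian (network) order, as RFB requires. */
static void be16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static uint16_t clamp16(uint16_t v, uint16_t lo, uint16_t hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Build a 16-byte FramebufferUpdate announcing a new framebuffer size.
 * The requested size is clamped to 1..max (hypothetical server limits).
 * Returns the number of bytes written. */
size_t desktop_size_update(uint8_t out[16], uint16_t req_w, uint16_t req_h,
                           uint16_t max_w, uint16_t max_h) {
    assert(out != NULL && max_w > 0 && max_h > 0);
    uint16_t w = clamp16(req_w, 1, max_w);
    uint16_t h = clamp16(req_h, 1, max_h);
    uint32_t enc = (uint32_t)(int32_t)RFB_ENCODING_DESKTOP_SIZE;
    out[0] = 0;        /* message-type: FramebufferUpdate */
    out[1] = 0;        /* padding */
    be16(out + 2, 1);  /* one (pseudo-)rectangle */
    be16(out + 4, 0);  /* x-position: ignored for DesktopSize */
    be16(out + 6, 0);  /* y-position: ignored for DesktopSize */
    be16(out + 8, w);  /* new framebuffer width */
    be16(out + 10, h); /* new framebuffer height */
    out[12] = (uint8_t)(enc >> 24);
    out[13] = (uint8_t)(enc >> 16);
    out[14] = (uint8_t)(enc >> 8);
    out[15] = (uint8_t)enc;
    return 16;
}
```

The sketch covers only the announcement; actually reallocating the X11 frame buffer, the subtle part described above, is not attempted here.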
The application display also includes a Popout button that unembeds
the VNC window. The benefit here has to do with poor rendering of embedded Java
applets for certain browsers. In particular, end-user movement of the cursor
pointer often leaves mouse trails with Firefox on Mac OS X, or with any
browser running on a dual-screened XP desktop. Unembedding gives a fully
resizable VNC-based window that doesn't exhibit mouse trails.
NanoHUB's implementation has plenty of other desirable properties. It's hosted
entirely on virtual machines, which themselves can of course be snapshotted,
rehosted, backed up, written to DVDs, and so on. If nanoHUB's current home at
Purdue University were somehow destroyed, it could, at least in principle, be
restored to nearly its current state.
The virtualization infrastructure contributes to nanoHUB's testability and
security. It's a straightforward matter to launch "1000 random application
sessions in rapid succession," or "100 sessions could be started simultaneously,"
as the nanoHUB site itself explains. Moreover, the current implementation
"synthesizes a fully private environment for each application session." The deep
manageability of these options multiplies the ability of nanoHUB's technical
team to isolate errors and enhance security.
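A test driver for the "100 sessions started simultaneously" case can be as simple as forking one process per session and counting clean exits. The C sketch below assumes a hypothetical `run_session()` that stands in for creating a private container and launching an application in it; nanoHUB's actual test tooling is not described in this article.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one application session. A real harness
 * would create a private environment, start the tool, and check that
 * its display came up; here every session simply succeeds. */
static int run_session(int id) {
    (void)id;
    return 0; /* exit status 0 means the session succeeded */
}

/* Start up to n sessions at once, one child process each, then reap
 * them all. Returns the number of sessions that exited with status 0. */
int launch_simultaneous(int n) {
    assert(n >= 0);
    int started = 0, ok = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                 /* out of processes: reap what we have */
        if (pid == 0)
            _exit(run_session(i)); /* child: run one session, then exit */
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Waiting for each child before forking the next would turn this into the "rapid succession" variant.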
NanoHUB also leverages standards to simplify other aspects of its management.
Much maintenance is possible through WebDAV. For this purpose, think of WebDAV as
a protocol that extends HTTP's retrieval model to give read and write capabilities
with common browser technology. How can WebDAV access a multi-user file system
like nanoHUB's complex structure? Through a quick and easy mapping, built with
FUSE (a programmable file system), that carries out the Web server's file
accesses as a specific user.
This provides more secure access than a traditional alternative: Run the Web
server as root (!), with instances switching to the privilege of the file
accessor. The latter approach of letting the Web server have root is an
architectural choice made by plenty of other sites. It's satisfying to realize,
though, that the open source ecosystem is now rich enough to provide all sorts of
pieces, such as FUSE, that can be combined to yield better—more
secure, more efficient, more maintainable—answers.
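For readers new to WebDAV, the sketch below shows how it layers onto HTTP (RFC 4918): a PROPFIND request with a Depth: 1 header asks for the properties of a collection and its immediate members, much as a directory listing would. The host and path in the example are hypothetical, and the code only formats the request; it does not reflect how nanoHUB's server is configured.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal PROPFIND body requesting the resources' properties (allprop). */
static const char PROPFIND_ALLPROP[] =
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n"
    "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>\r\n";

/* Format a WebDAV PROPFIND request for a collection and its immediate
 * members (Depth: 1). Returns the request length, or -1 if 'buf' is
 * too small. */
int format_propfind(char *buf, size_t len, const char *host, const char *path) {
    assert(buf != NULL && host != NULL && path != NULL);
    int n = snprintf(buf, len,
                     "PROPFIND %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Depth: 1\r\n"
                     "Content-Type: application/xml; charset=\"utf-8\"\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"
                     "%s",
                     path, host, strlen(PROPFIND_ALLPROP), PROPFIND_ALLPROP);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A compliant server answers with 207 Multi-Status and one response element per member. In the arrangement described above, the file-system accesses behind that answer would be carried out by a FUSE process acting as the requesting user rather than by a root-privileged Web server.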
Modifications to open source components
NanoHUB's users have plenty of familiarity with computers, yet nanoHUB affords
them capabilities qualitatively different from any they have elsewhere: its
combination of reliability, performance, functionality, and security amounts to a
genuine breakthrough.
NanoHUB and its immediate predecessors have been in development for over a decade
now; it's beyond the scope of this article to detail all the refinements that
enable nanoHUB. The modifications to VNC and X11, for example, are numerous,
deeply complicated, and scattered over many source files to address security
considerations that simply didn't arise at the times these products were
originally coded. Also, the release of NCN source is governed by the default rules
of Purdue University, which do not favor open source.
The modifications to the WebDAV FUSE mapper are localized enough to be
comprehensible, even in relative isolation. Most of the FUSE fusexmp.c example
code is option parsing, error handling, and boilerplate. Where the original calls
fuse_main(...), nanoHUB relies on a version that performs the sequence shown in
Listing 1:
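Listing 1. nanoHUB's elaboration of fuse_main()
fuse_setup(...);     /* parse options, mount, create the FUSE session */
umask(0);            /* apply requested file modes unmodified */
setfsuid(...);       /* file-access checks now use the target user's ID */
chroot(...);         /* confine the process to one directory subtree */
setregid(...);       /* give up group privileges */
setreuid(...);       /* give up user privileges for good */
fuse_loop(...);      /* serve file-system requests as that user */
fuse_teardown(...);  /* unmount and release the FUSE session */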
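The idea here is Goldilocks-level security: be rigorous about granting just the
right permissions and access, neither too much nor too little. The privileged
steps (chroot and the identity changes) run first, and the process gives up its
privileges for good before fuse_loop() serves a single request.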
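Because Listing 1 elides every argument, the following is only a runnable sketch of the same privilege-dropping order using standard Linux calls. The jail directory, user ID, and group ID are parameters you would supply; the `setgroups()` call, which discards root's supplementary groups, is our addition and does not appear in Listing 1, and the fork-and-wait wrapper `run_as_user()` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <grp.h>
#include <sys/fsuid.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shed root in the order Listing 1 uses. Returns 0 on success, -1 on
 * failure. Must start as root; there is no way back afterwards. */
int drop_to_user(const char *jail, uid_t uid, gid_t gid) {
    assert(jail != NULL);
    umask(0);      /* new files get exactly the modes requested */
    setfsuid(uid); /* file-system permission checks now use uid */
    if (chroot(jail) != 0 || chdir("/") != 0)
        return -1; /* needs CAP_SYS_CHROOT, so it precedes setreuid */
    if (setgroups(1, &gid) != 0) /* not in Listing 1: drop root's groups */
        return -1;
    if (setregid(gid, gid) != 0) /* group first, while still privileged */
        return -1;
    if (setreuid(uid, uid) != 0) /* real, effective, saved UID become uid */
        return -1;
    return 0;
}

/* Fork, drop privileges in the child, and run work() there so the
 * parent keeps its own credentials. Returns the child's exit status:
 * 0 on success, 1 if the drop failed, else whatever work() returned;
 * -1 if the fork or wait itself fails. */
int run_as_user(const char *jail, uid_t uid, gid_t gid, int (*work)(void)) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (drop_to_user(jail, uid, gid) != 0)
            _exit(1);
        _exit(work ? work() : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One reason to run the drop in a child is that the parent keeps its privileges for setting up further sessions, while the child can never regain them: once setreuid() has replaced the real, effective, and saved user IDs, a later setreuid(0, 0) fails.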
Comparisons
nanoHUB does not share a physical display of a computer like such popular
remote control systems as the Fog Creek Copilot, the Microsoft®
Remote Desktop, the WebEx Desktop, or GoToMyPc; instead, it's reminiscent of
Citrix, in that it hosts many virtual displays on a single server. There are no
licensing fees with nanoHUB, and nanoHUB is also more portable than Citrix, both
for hosting and display. Also, nanoHUB clients are embeddable in Web browsers, as
illustrated above, and connections can be more easily shaped to traverse typical
firewalls.
Over and over, nanoHUB embodies the general theme of being like some more common
technology—a conventional Web application, for
instance—but re-fashioned in its fundamentals to hit a higher
standard of security, performance, or scalability.
Port forwarding
Port forwarding presents an instructive example: nanoHUB uses a connection router
on the Web server for port forwarding. It listens on port 563 (NNTPS) and waits
for a connection. When nanoHUB delivers a VNC client configuration to a user's
browser—securely, through HTTPS—nanoHUB adds a few
parameters that tell the client how to negotiate with the connection router. The
router passively forwards the connection to the appropriate VNC server on a
private internal network.
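The heart of such a userland connection router is a byte relay. The C sketch below assumes the router has already accepted a client on port 563 and connected to the chosen VNC server on the internal network; how it picks that server from the parameters the client sends is not shown, and the names `relay()` and `pump()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Move one read's worth of bytes from 'from' to 'to'. Returns the byte
 * count, 0 at end-of-file, or -1 on error. MSG_NOSIGNAL keeps a closed
 * peer from killing the router with SIGPIPE. */
static ssize_t pump(int from, int to) {
    char buf[16384];
    ssize_t n;
    do {
        n = read(from, buf, sizeof buf);
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return n;
    for (ssize_t off = 0; off < n; ) {
        ssize_t w = send(to, buf + off, (size_t)(n - off), MSG_NOSIGNAL);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += w;
    }
    return n;
}

/* Relay bytes both ways between a client socket and a backend (VNC
 * server) socket until both sides have finished sending. EOF on one
 * side is passed on as a half-close (SHUT_WR) of the other.
 * Returns 0 on a clean finish, -1 on error. */
int relay(int client, int backend) {
    assert(client >= 0 && backend >= 0);
    bool client_open = true, backend_open = true;
    const short ready = POLLIN | POLLHUP | POLLERR;
    while (client_open || backend_open) {
        struct pollfd fds[2] = {
            { .fd = client_open ? client : -1, .events = POLLIN },
            { .fd = backend_open ? backend : -1, .events = POLLIN },
        };
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (client_open && (fds[0].revents & ready)) {
            ssize_t n = pump(client, backend);
            if (n < 0)
                return -1;
            if (n == 0) {
                shutdown(backend, SHUT_WR);
                client_open = false;
            }
        }
        if (backend_open && (fds[1].revents & ready)) {
            ssize_t n = pump(backend, client);
            if (n < 0)
                return -1;
            if (n == 0) {
                shutdown(client, SHUT_WR);
                backend_open = false;
            }
        }
    }
    return 0;
}
```

A production router would run one relay per session (in a thread, a process, or an event loop) and would note when each connection opens and closes, which makes per-session viewing reports cheap to produce.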
Earlier in its history, nanoHUB used iptables for connection routing. At its
best, iptables can be considerably more efficient than userland routing, because
with the latter each packet passes through the kernel twice: from the network up
to the router process, and back down to the other network. Userland routing also
forces at least two context switches.
The kernel overhead turned out to be negligible, though, compared to end-to-end
network latency. More crucial was that nanoHUB's earlier use of iptables was on a
host that required dynamic addition and removal of iptables rules to route a
particular port to another port on an internal network. If that specific host ever
went down, it became colossally difficult to resynchronize with existing sessions.
The table updates also led to races that occasionally lost packets during their
traversal of the frequently changing ruleset.
Userland routing for port forwarding eliminated these problems. Even better, the
connection router provides another layer of security and monitoring; in
particular, it can easily report on exactly when and how long an application has
been viewed. This is crucial both for understanding how nanoHUB benefits end users
and for accounting.
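As an illustration of that accounting, the sketch below formats one viewing interval the way a router might log it when a connection closes: a UTC start time plus a duration in seconds. The record layout and field names are hypothetical; the article does not describe nanoHUB's actual log format.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* One viewing interval observed by the router (hypothetical layout). */
struct view_record {
    const char *session; /* session identifier */
    time_t opened;       /* when the viewer's connection was accepted */
    time_t closed;       /* when either side closed it */
};

/* Format a record as one log line: session, UTC start time in ISO 8601,
 * and viewing time in whole seconds. Returns the line length, or -1 if
 * the record is invalid or 'buf' is too small. */
int format_view_record(char *buf, size_t len, const struct view_record *r) {
    assert(buf != NULL && r != NULL);
    if (r->session == NULL || strlen(r->session) == 0 || r->closed < r->opened)
        return -1;
    struct tm tm;
    char start[32];
    if (gmtime_r(&r->opened, &tm) == NULL ||
        strftime(start, sizeof start, "%Y-%m-%dT%H:%M:%SZ", &tm) == 0)
        return -1;
    int n = snprintf(buf, len, "session=%s start=%s seconds=%.0f",
                     r->session, start, difftime(r->closed, r->opened));
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```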
Passage through a proxy
Unsigned applets
The Java security model has several provisions that bear on networking as it
applies to applets. Fundamentally, an unsigned applet in an end user's browser can
make arbitrary TCP connections only back to the host from which it was loaded. If
the connection must pass through an explicit proxy, the applet must be signed and
run with elevated privileges, as the main text explains.
Signed Java applets involve a cascade of complications that nanoHUB
developers long sought to avoid. One potential loophole in the security model is
that an unsigned applet can make HTTPS connections through a Web proxy, with one
limitation: the client can read from the server only after the outbound half of
the connection has been closed.
This hints at the possibility of an unsigned applet that opens two HTTP
streams: one that never stops writing, and another that writes a header and then
reads continually.
After a great deal of experimentation and research, the conclusion was
that some proxies would eventually constrain the lifetime of these unusual
connections. One of the great difficulties of work in this area is the
prohibitive expense of experimentation with commercial proxies. In any case,
resynchronizing after one or both of the connections were terminated would
introduce more complexities.
The programming team's other attempts to use unsigned applets hit equally
insurmountable obstacles. Signed applets are the right choice for nanoHUB.
nanoHUB's method of firewall transit through a Web proxy is unusual and perhaps
even unique. As the accompanying sidebar details, nanoHUB designers reluctantly
decided to use the standard proxy CONNECT access method with a signed applet. Even
with this decision in place, several months of shakedown went into accommodating
corner cases:
- Non-conforming proxies
- Web browsers with idiosyncratic methods of communicating proxy parameters to
the Java Runtime Environment
- Careful implementation of timeout on network failure in the Java 1.4
language
- Layer-7 filters that mistake nanoHUB connections for file sharing
applications violating copyright laws
- End users behind proxies they don't understand and can't explain or
analyze
nanoHUB has learned to traverse essentially all the proxies it must pass through
to serve its clientele.
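For reference, the standard CONNECT exchange that the signed applet performs through a proxy looks like the following minimal C sketch: the client asks for a tunnel to host:port, and any 2xx status in the proxy's reply means the connection is now a raw byte stream to the target. The host name is hypothetical, the applet itself is written in Java, and this sketch ignores proxy authentication (a 407 reply) as well as the non-conforming proxies listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the request asking an HTTP proxy to open a tunnel to
 * host:port. Returns the request length, or -1 if 'buf' is too small. */
int format_connect(char *buf, size_t len, const char *host, unsigned port) {
    assert(buf != NULL && host != NULL);
    int n = snprintf(buf, len,
                     "CONNECT %s:%u HTTP/1.1\r\n"
                     "Host: %s:%u\r\n"
                     "\r\n",
                     host, port, host, port);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Return true if the proxy's status line reports 2xx, meaning that
 * every byte after the reply headers belongs to the tunnel. */
bool connect_succeeded(const char *status_line) {
    int code = 0;
    if (strncmp(status_line, "HTTP/1.", 7) != 0)
        return false;
    /* Skip the minor version digit, then read the status code. */
    if (sscanf(status_line + 7, "%*d %d", &code) != 1)
        return false;
    return code >= 200 && code < 300;
}
```

A real client must also read and discard the rest of the proxy's reply headers, up to the blank line, before treating the connection as a tunnel.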
Open source science
NanoHUB's reduction of computing friction has yet another consequence: It
radically encourages scientific sharing. For several decades, it has been common
for physical scientists to publish results based on
computations—but those computations themselves were essentially
irreproducible. Programs were managed as proprietary artifacts. Even researchers
who nominally put their calculations in the public domain were rarely effective,
because they lacked expertise in the computing crafts that bear on portability and
deployment. To understand fully a particular scientific conclusion too often has
required time with the specific computers where that conclusion was uniquely
calculated.
NanoHUB changes all that. Its commoditization of so many aspects of
computing, including rendering, display, high-performance calculations, security,
and hosting, makes it more convenient, in many cases, for nanoscientists to
run their programs in sharable ways instead of on the computers in
their own labs.
It's too early to analyze all the consequences. One can hope, though, that greater
openness and visibility in the computational parts of nanoscience will boost the
velocity of research overall. Certainly, rationalization of the
computing infrastructure has been welcome to the numerous researchers and students
already taking advantage of nanoHUB. It appears that nanoHUB's success represents
a virtuous cycle between several distinct elements:
- The active and progressive nanoresearch community
- The benefits open research brings to education
- High-quality security and remoting programs that enable open source-based
collaborative technologies
- A lot of hard detail work
Plenty of challenges remain. For example, Figure 2 represents the fourth energy
level for a quantum dot in a pyramidal geometry. The capped wings bear little
resemblance to ordinary p orbitals, and they form a shape for which few students
have yet developed an intuition. We hope that access to a suitable interactive
visualization engine will promote comprehension of such eigenstates, along with
numerous other aspects of nanoscience.
Figure 2. Screen shot of the fourth energy level for a quantum dot
Conclusions
Over twelve calendar years of thoughtful programming have gone into nanoHUB. X11,
VNC, and other well-known technologies promise remote computing, but several
generations of refinements on the base they provide have made nanoHUB's middleware
as invisible as it has become. Diligence in virtualization at all levels and in
all directions has resulted in software that puts science before the artificial
constraints common in many computing environments. End users needn't learn
computing details to invoke sessions that yield scientifically significant
results. Open source components make great raw materials but, at this point, it
still takes plenty of frontline programming and systems administration to achieve
the seamless computing experience scientists want.
Acknowledgments
As mentioned above, the work of many people over many years has gone into
nanoHUB. Among those whose recent contributions directly improved this article was
Sundar Jeyaraman.
Resources
Learn
- nanoHUB: This site is a "resource for nanoscience and technology ... created by the NSF-funded" NCN. This page explains what nanotechnology is and introduces specific themes of NCN's contribution.
- Middleware: "The Chronology of nanoHUB Middleware" presents the architectural fundamentals behind nanoHUB. Early forms of the middleware were in use by 1995. Development since then has improved performance, security, and reliability, while enhancing maintainability through the use of standard components. Source code for many nanoHUB applications is available.
- TeraGrid '07 conference: Many of the points of the current article were presented under the title "Using nanoHUB.org as a Science Gateway" at TeraGrid '07 on 4 June 2007. The same "utility computing 2.0" spirit behind nanoHUB appeared in an earlier developerWorks article, "Remote computing with a Linux application server farm" (developerWorks, February 2007), which also relied on VNC and related virtualization technologies.
- You can display the nanotube simulation live on your own desktop through these steps:
  - Make sure your browser supports JRE 1.4 or greater.
  - Register with nanoHUB without fee.
  - Log in.
  - Select "My nanoHUB."
  - Among "My Tools," select CNTbands 2.0.
  - An applet will appear in your browser window; select the "Simulate" button, and, in the "Result:" selection, choose "Molecular structure: overall."
- "SSL secures VNC applications" (developerWorks, January 2007): This article explains one of the earlier forms of encryption and authentication used for VNC traffic in nanoHUB and similar applications.
- FUSE: FUSE stands for "filesystem in userspace," that is, a programmable, fully functional file system. Why program a file system? One way to think about this is as "impedance matching": rather than having to force an architectural design onto a foundation it poorly fits, FUSE can slide between the two to make both interfaces "snug."
- "Virtualization" is an important theme that pervades nanoHUB's implementation. VNC and the virtual file systems FUSE makes possible are mentioned above. Xen, introduced in "Could it be time to virtualize?" (developerWorks, October 2006), is a well-known technology for "paravirtualization" of machine instances of Linux® and a few other operating systems. Along with these, and several minor technical pieces, nanoHUB also crucially leverages OpenVZ. OpenVZ is roughly comparable to Xen; while limited to Linux, it boasts higher performance than Xen. OpenVZ doesn't yet have the "name recognition" of Xen, despite coverage in such presentations as "Virtual Linux" (developerWorks, December 2006).
- WebDAV: Everyone knows what the Web's about: your browser here can read documents from there. As originally conceived, though, writing those documents was nearly as important. In the Web's actual history, the writing part was widely neglected for many years. Recently, though, WebDAV has become sufficiently widely adopted that it's now practical to rely on this extension to HTTP to support distributed authoring (the "D" and "A" in "WebDAV"). The Wikipedia entry for WebDAV is more readable than the official WebDAV home page referenced above. developerWorks itself has published dozens of articles that touch on the topic, among which "Web Folders and WebDAV" (developerWorks, January 2004) and "The Future of Distributed Software Development on the Internet" (developerWorks, January 2004) make the best introductions for our purpose.
- "Lightweight Web servers" (developerWorks, July 2007): These servers are among the valuable raw materials for a construction of the complexity of nanoHUB.
- Rethinking the Linux Distribution: This article touches, among other subjects, on the use of existing technology, especially VNC, to integrate native applications within a browser-as-desktop. The focus of the article is on end-user look and feel, and thus complements our attention to security and performance details. Still more specialized effects are available with today's range of X-based window managers, as described in "Can't Get Enough Desktops!" Note that Xephyr is an effective modernization of the Xnest recommended by the latter article. Skippy gives X11-based "task switching" in the manner of Exposé, and even more esoteric interfaces and features become practical. "Rethinking ..." and nanoHUB converge in recognizing the potential to reframe all of these possibilities in terms of centrally managed deployment.
- Another radical complement to infrastructure such as nanoHUB is a "Live CD," such as cl33n, which boots, connects to the public Internet, and launches a browser instance.
- Several earlier articles, including "Open source in the biosciences" (developerWorks, November 2002) and "Collaboratory: An Open Source Teaching ... Facility ...", explore the significance of open computing for science. A monograph on the subject might be forthcoming.
- Citrix: Citrix and many other software and hardware companies sell "thin clients" in one form or another. Among the "thinnest" of these are such products as the ViewSonic ND4210w, a "network display." This particular one is a large-screen LCD suitable for viewing by a research group or class, with built-in media player, Web browser, Flash player, and so on. It can be attached to a network "out of the box," with no PC to configure, and it is instantly ready to display nanoHUB applications. Note that our point is not to endorse any particular product, but to illustrate how nanoHUB combines with other elements to make new levels of functionality, security, and reliability practical.
- Application Virtualization Takes Hold: "Application virtualization" is another label under which automation is currently taking place. When an organization addresses security and manageability concerns by running applications with a minimum of local installation, including changes to the registry, system loadable libraries, and system configuration, the "remoting" VNC and Web applications can legitimately be viewed as varieties of application virtualization.
- AIX® and UNIX®: The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration and expanding your UNIX skills.
- New to AIX and UNIX?: Visit the "New to AIX and UNIX" page to learn more about AIX and UNIX.
- Search the AIX and UNIX library by topic:
- AIX 5L™ Wiki: A collaborative environment for technical information related to AIX.
- Safari bookstore: Visit this e-reference library to find specific technical resources.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
- Podcasts: Tune in and catch up with IBM technical experts.
Get products and technologies
- IBM trial software: Build your next development project with software for download directly from developerWorks.
Discuss
- Participate in the developerWorks blogs and get involved in the developerWorks community.
- Participate in the AIX and UNIX forums:
About the authors
Cameron Laird is a long-time developerWorks contributor and former columnist. He often writes about the open source projects that accelerate development of his employer's applications, focused on reliability and security. He first used AIX twenty years ago, when it was still an experimental product. He's been an enthusiastic consumer of and contributor to a variety of memory debugging tools through that time. You can contact him at claird@phaseit.net.
Rick Kennell is a middleware engineer and virtualization technician for Purdue University.