nanoHUB does remote computing right

Science Gateway uses open source pieces to achieve new milestones in distributed research

Level: Intermediate

Cameron Laird (claird@phaseit.net), Vice President, Phaseit, Inc.
Rick Kennell, Middleware Engineer and Virtualization Technician, Purdue University

02 Oct 2007

nanoHUB is a virtual computing center created to support nanotechnology research. It uses open source components to achieve far more powerful results than previous remote access facilities. This article details the specific configurations and enhancements needed to make the most of the performance, security, and usability that common software such as VNC and WebDAV provides.

nanoHUB.org employs a unique middleware system that precisely balances security, performance, and convenience to support distributed public research on nanotechnology. Scientists who use this research gateway concentrate on their own studies rather than computing issues.

As important as nanoHUB's current virtues, though, are the lessons it teaches about how to configure and enhance well-known open source components to achieve a much richer virtual computing experience than most developers seem to realize is possible. Let's look at what it takes to allow researchers like Supriyo Datta, Director of the NASA/Purdue Institute for Nanoelectronics and Computing, to report, "With little effort on our part, we can make our simulation tools available to our colleagues. And with essentially no effort on their part, our colleagues can be running molecular electronic simulations." As seen in Figure 1 below, nanoHUB balances ease of use for the scientists who publish their work there with ease of use for the colleagues who view what they've made available.


Figure 1. Screen shot of carbon nanotube simulation

nanoHUB is a Web-based research center

The basic idea is simple: You're a nanotechnology researcher, and you want to take advantage of the resources of the Network for Computational Nanotechnology (NCN), a consortium of several universities and other collaborators (see Resources). Your keyboard and display, on your desktop or lap, should connect you to the full wealth of data, computing power, simulation hardware, and applications of NCN, wherever you happen to be and however those particular resources are physically provisioned at the moment.

It should be simple. Making it appear so to an end user takes a lot of serious development attention, because:

  • All communication must be encrypted.
  • The firewalls providing protection local to end users have policies that are generally not understood, even in the minority of cases where the end users are aware of their existence.
  • NCN resource usage must be measured and, in some cases, rationed.
  • The technical capabilities of the desktops with which researchers connect differ wildly.
  • Many applications are restricted to particular hardware combinations and in ways that are hard to generalize.
  • nanoHUB, as a whole, must be manageable, scalable, reliable, available, testable, and recoverable.
  • nanoHUB needs to act compatibly with existing security infrastructures; in other words, it should let end users rely on the accounts and passwords they already know as much as possible.

While several proprietary and open source tools meet one or two of the requirements, there are, at best, a lot of configuration complexities along the way. Where's the convenience for nanoHUB visitors?

That convenience emerges only through a well-balanced mix of customized VNC, SSL, X11, Apache, and several other open source projects. Consider, for a moment, the display near the top of this article of a nanotube simulation (see Figure 1). This image was obtained in a few seconds by clicking on the specific nanoHUB application (CNTbands 2.0), selecting the appropriate parameters, and interactively adjusting the 3-D image. No further software is needed, apart from the typical Java™ runtime interpreter available with a Web browser. In particular, a user does not have to obtain and configure the visualization system. Most researchers don't know how to use OpenDX or PyMol—much less how to install them—and they shouldn't have to.

It's a good thing nanoHUB doesn't require such installations. Nanotube visualization, like many of the other visualizations nanoHUB supports, is quite complex and correspondingly slow on a typical desktop computer.

The simulation was originally available as a downloadable application with no visualization. Downloads have essentially stopped in the last year, because nanoHUB's ease of use and the effectiveness of hardware-accelerated visualization entirely outweigh the marginal benefits of researchers being able to compile and modify the source code themselves. For them, using local resources to visualize the results is no advantage. For these working scientists, nanoHUB is easier to use and gives more interesting results than any locally hosted alternative.

Moreover, this remarkable example generalizes. It's traditional in scientific circles to assume that researchers need "hot" hardware on their desks to see good graphics. nanoHUB demonstrates over and over that its well-tuned VNC solutions deliver better performance, even through mediocre Wide Area Networks (WANs). NanoHUB takes advantage of plenty of special-purpose facilities hosted at NCN sites, then relies on VNC's RFB protocol to communicate the visual results to the desktops where nanoHUB's clients sit. Computing specialists concentrate on wringing the most from nanoHUB's physical assets, while the nanotechnology researchers around the world who use nanoHUB turn their focus from computing issues back to the science at which they're best.

Software refinements: efforts to achieve "it just works"

That same screen shot hints at several other examples of the ways nanoHUB staffers have solved computing problems so that researchers can push nanoscience forward. To the right of the Result selector, you see a green download arrow that points down. If you click on it, a new browser window with a static image of the simulation appears; from that window, of course, it's straightforward to put the browser to work to print the image, save it to an external file, and so on.

A lot goes into these operations: Bear in mind that the button press was only seen on the computation server. To implement the file download, the server sends a message to the client to ask its Web browser to pull the file from nanoHUB. The request is routed to the application where a Tcl-based Web server provides the file. File upload is just as convenient. End users get what they're after, and you don't worry about how an image visible through VNC affects the launch of a new browser frame.

There's more. Notice what happens if you grab the border of the window and stretch the application's display: It behaves sensibly. This isn't just resizing a window, because the dimensions of the serving X11 frame buffer limit any resizing. Instead, nanoHUB resizes the X11 frame buffer itself, and it allows the viewer's frame buffer to shrink-wrap the application to an appropriate size. To the best of our knowledge, this technique has never been documented before for X11 servers. The Xephyr X Server, for example, updates Xnest in several regards, but it doesn't attempt this rather subtle effect.

The application display also includes a Popout button that unembeds the VNC window. The benefit here has to do with poor rendering of embedded Java applets in certain browsers. In particular, moving the pointer often leaves mouse trails with Firefox on Mac OS X, or with any browser running on a dual-screen Windows XP desktop. Unembedding gives a fully resizable VNC-based window that doesn't exhibit mouse trails.

NanoHUB's implementation has plenty of other desirable properties. It's hosted entirely on virtual machines, which themselves can of course be snapshotted, rehosted, backed up, written to DVDs, and so on. If nanoHUB's current home at Purdue University were somehow destroyed, it could, at least in principle, be restored to nearly its current state.

The virtualization infrastructure contributes to nanoHUB's testability and security. It's a straightforward matter to launch "1000 random application sessions in rapid succession," or to start 100 sessions simultaneously, as the nanoHUB site itself explains. Moreover, the current implementation "synthesizes a fully private environment for each application session." The deep manageability of these options multiplies the ability of nanoHUB's technical team to isolate errors and enhance security.

NanoHUB also leverages standards to simplify other aspects of its management. Much maintenance is possible through WebDAV. For this purpose, think of WebDAV as a protocol that extends HTTP's retrieval model to give read and write capabilities with common browser technology. How can WebDAV access a multi-user file system like the computationally complex nanoHUB structure? With a quick and easy mapping from the Web server into a specific user through FUSE, a programmable file system. This provides more secure access than the traditional alternative, which is to run the Web server as root (!), with instances switching to the privilege of the file accessor. Plenty of other sites have made that architectural choice of letting the Web server have root. It's satisfying to realize, though, that the open source ecosystem is now rich enough to provide all sorts of pieces, such as FUSE, that can be combined to yield better (more secure, more efficient, more maintainable) answers.

Modifications to open source components

NanoHUB's users have plenty of familiarity with computers. NanoHUB affords them qualitatively different capabilities than they have anywhere else; its success in combining reliability, performance, functionality, and security achieves breakthroughs that only nanoHUB gives them.

NanoHUB and its immediate predecessors have been in development for over a decade now; it's beyond the scope of this article to detail all the refinements that enable nanoHUB. The modifications to VNC and X11, for example, are numerous, deeply complicated, and scattered over many source files to address security considerations that simply didn't arise at the times these products were originally coded. Also, the release of NCN source is governed by the default rules of Purdue University, which do not favor open source.

The modifications to the WebDAV FUSE mapper are localized enough to be comprehensible, even in relative isolation. Most of the FUSE fusexmp.c example code is option parsing, error handling, and boilerplate. Where the original calls fuse_main(...), nanoHUB relies on a version that performs the following sequence (see Listing 1):


Listing 1. nanoHUB's elaboration of fuse_main()
                
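      fuse_setup(...);      /* mount and create the FUSE session */
      umask(0);             /* apply requested file modes exactly */
      setfsuid(...);        /* set the user ID used for file system checks */
      chroot(...);          /* confine the process to one directory tree */
      setregid(...);        /* drop group privileges first... */
      setreuid(...);        /* ...then user privileges */
      fuse_loop(...);       /* serve file system requests */
      fuse_teardown(...);   /* unmount and release the session */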
    

The idea here is a Goldilocks level of security: to be rigorous about setting just the right permissions and access, neither too much nor too little.
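The ordering in that sequence matters, and a small standalone sketch may make the reason clearer. The helper below is hypothetical (the name confine_and_drop and its parameters are illustrative, not taken from nanoHUB's source); it assumes a Linux or similar POSIX system and leaves out the FUSE calls and setfsuid() entirely. It shows only the general pattern of confining a process first and then giving up group and user privileges, in that order.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not nanoHUB's code: confine the calling process to
 * `jail` (skipped when jail is NULL), then switch to the given user and
 * group. Supplementary groups are not handled in this sketch.
 * Returns 0 on success, -1 on the first failure.
 */
int confine_and_drop(const char *jail, uid_t uid, gid_t gid)
{
    umask(0);  /* create files with exactly the modes callers request */

    if (jail != NULL) {
        /* chroot() needs privilege, so it must happen before the drop. */
        if (chroot(jail) != 0 || chdir("/") != 0) {
            perror("chroot");
            return -1;
        }
    }

    /* Group first: once the user ID is dropped, the process may no
     * longer be allowed to change its group. */
    if (setregid(gid, gid) != 0) {
        perror("setregid");
        return -1;
    }
    if (setreuid(uid, uid) != 0) {
        perror("setreuid");
        return -1;
    }

    /* Verify the result instead of trusting return codes alone. */
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid) {
        return -1;
    }
    return 0;
}
```

A privileged caller would pass a real directory and the target account's IDs. Called without privilege, with a NULL jail and the process's own IDs, the function is effectively a checked no-op, which makes the ordering easy to exercise in a test.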

Comparisons

nanoHUB does not share a computer's physical display, as do such popular remote control systems as Fog Creek Copilot, Microsoft® Remote Desktop, WebEx Desktop, or GoToMyPC; instead, it's reminiscent of Citrix, in that it hosts many virtual displays on a single server. There are no licensing fees with nanoHUB, and nanoHUB is also more portable than Citrix, both for hosting and display. Also, nanoHUB clients are embeddable in Web browsers, as illustrated above, and connections can be more easily shaped to traverse typical firewalls.

Over and over, nanoHUB embodies the general theme of being like some more common technology—a conventional Web application, for instance—but re-fashioned in its fundamentals to hit a higher standard of security, performance, or scalability.

Port forwarding

Port forwarding presents an instructive example: nanoHUB uses a connection router on the Web server for port forwarding. It listens on port 563 (NNTPS) and waits for a connection. When nanoHUB delivers a VNC client configuration to a user's browser—securely, through HTTPS—nanoHUB adds a few parameters that tell the client how to negotiate with the connection router. The router passively forwards the connection to the appropriate VNC server on a private internal network.

Earlier in its history, nanoHUB used iptables for connection routing. At its best, iptables can be considerably more efficient than userland routing, because userland routing sends each packet through the kernel twice: from the network up to the router process, and from the router process back down to the other network. Userland routing also forces at least two context switches.

The kernel overhead turned out to be negligible, though, compared to end-to-end network latency. More crucial was that nanoHUB's earlier use of iptables was on a host that required dynamic addition and removal of iptables rules to route a particular port to another port on an internal network. If that specific host ever went down, it became colossally difficult to resynchronize with existing sessions. The table updates also led to races that occasionally lost packets during their traversal of the frequently changing ruleset.

Userland routing for port forwarding eliminated these problems. Even better, the connection router provides another layer of security and monitoring; in particular, it can easily report on exactly when and how long an application has been viewed. This is crucial both to understand how nanoHUB benefits end users and also for accounting.
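Because every connection passes through a process the site controls, the bookkeeping for "when and how long" can be as simple as a pair of timestamps per connection. The record layout and function names below are hypothetical, chosen only to illustrate the idea; the article does not describe how nanoHUB actually stores its accounting data.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-connection record a router could keep for accounting. */
struct view_record {
    char   session[32];  /* illustrative session label */
    time_t opened;       /* when the viewer connected */
    time_t closed;       /* 0 while the connection is still open */
    long   bytes;        /* bytes forwarded so far */
};

/* Seconds the session has been viewed, measured up to `now` if still open. */
double view_seconds(const struct view_record *r, time_t now)
{
    time_t end = r->closed ? r->closed : now;
    return difftime(end, r->opened);
}

/* Format a one-line usage report; returns snprintf's result. */
int format_view(const struct view_record *r, time_t now, char *out, size_t len)
{
    return snprintf(out, len, "%s viewed %.0f s, %ld bytes",
                    r->session, view_seconds(r, now), r->bytes);
}
```

A router would fill in `opened` when it accepts a connection, add to `bytes` as it forwards data, and set `closed` when either side disconnects.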

Passage through a proxy

Sidebar: Unsigned applets

The Java security model has several provisions that bear on networking as it applies to applets: Fundamentally, an end user's client can only make arbitrary TCP connections back to the domain from which it was loaded. If that involves an explicit proxy, the applet must be signed and run with elevated privileges, as seen below.

Signed Java applets involve a cascade of complications that nanoHUB developers long sought to avoid. One potential loophole in the security model is that an unsigned applet can make HTTPS connections through a Web proxy, with the limitation that the client might read from the server only when the latter closes the outbound half of the connection.

This hints at the possibility of an unsigned applet that opens two HTTP streams—one that would never stop writing and the other to write a header and read continually.

After a great deal of experimentation and research, the conclusion was that some proxies would eventually constrain the lifetime of these unusual connections. One of the great difficulties of work in this area is the prohibitive expense of experimentation with commercial proxies. In any case, resynchronizing after one or both of the connections were terminated would introduce more complexities.

The programming team's other attempts to use unsigned applets hit equally insurmountable obstacles. Signed applets are the right choice for nanoHUB.

nanoHUB's method of firewall transit through a Web proxy is unusual and perhaps even unique. As the accompanying sidebar details, nanoHUB designers reluctantly decided to use the standard proxy CONNECT access method with a signed applet. Even with this decision in place, several months of shakedown went into accommodating corner cases:

  • Non-conforming proxies
  • Web browsers with idiosyncratic methods of communicating proxy parameters to the Java Runtime Environment
  • Careful implementation of timeout on network failure in the Java 1.4 language
  • Layer-7 filters that mistake nanoHUB connections for file sharing applications violating copyright laws
  • End users behind proxies they don't understand and can't explain or analyze

nanoHUB has learned from these cases and now traverses essentially all the proxies necessary to serve its clientele.
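For readers unfamiliar with the CONNECT method, the exchange itself is short: the client asks the proxy to open a tunnel to host:port, and any 2xx status line in reply means that raw bytes may flow. The real nanoHUB client is a signed Java applet; the sketch below uses C only for consistency with Listing 1. Its function names and the host gw.example.org are hypothetical, while port 563 is the router port named earlier. It parses the status line leniently and adds a bounded wait, two of the corner cases listed above.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdio.h>

/* Write a CONNECT request into `out`; returns its length, or -1 if it
 * does not fit. */
int build_connect(char *out, size_t len, const char *host, unsigned port)
{
    int n = snprintf(out, len,
                     "CONNECT %s:%u HTTP/1.1\r\n"
                     "Host: %s:%u\r\n"
                     "\r\n",
                     host, port, host, port);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Extract the status code from a proxy reply; -1 if the status line is
 * malformed. The reason phrase is ignored because proxies vary. */
int connect_status(const char *reply)
{
    int major, minor, code;
    if (sscanf(reply, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* A tunnel exists only if the proxy answered with a 2xx status. */
int tunnel_established(const char *reply)
{
    int code = connect_status(reply);
    return code >= 200 && code < 300;
}

/* Wait up to timeout_ms for data on fd, so that a silent proxy cannot
 * hang the client. Returns 1 if readable, 0 on timeout, -1 on error.
 * (The timeout restarts if a signal interrupts the wait.) */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&p, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc;
}
```

A client would send the request over its connection to the proxy, call wait_readable() before reading the reply, and start the VNC session only when tunnel_established() returns nonzero.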

Open source science

NanoHUB's reduction of computing friction has yet another consequence: It radically encourages scientific sharing. For several decades, it has been common for physical scientists to publish results based on computations, but those computations themselves were essentially irreproducible. Programs were managed as proprietary artifacts. Even researchers who nominally put their calculations in the public domain rarely were effective, because they lacked expertise in the computing crafts that bear on portability and deployment. Fully understanding a particular scientific conclusion has too often required time with the specific computers where that conclusion was uniquely calculated.

NanoHUB changes all that. Its commoditization of so many aspects of computing, including rendering, display, high-performance calculations, security, and hosting, makes it more convenient, in many cases, for nanoscientists to run their programs in sharable ways instead of on the computers in their own labs.

It's too early to analyze all the consequences. One can hope, though, that greater openness and visibility in the computational parts of nanoscience will boost the velocity of research overall. Certainly, rationalization of the computing infrastructure has been welcome to the numerous researchers and students already taking advantage of nanoHUB. It appears that nanoHUB's success represents a virtuous cycle between several distinct elements:

  • The active and progressive nanoresearch community
  • The benefits open research brings to education
  • High-quality security and remoting programs that enable open source-based collaborative technologies
  • A lot of hard detail work

Plenty of challenges remain. For example, Figure 2 represents the fourth energy level for a quantum dot in a pyramidal geometry. The capped wings bear little resemblance to normal P orbitals, and they are a shape for which few students have yet developed an intuition. We hope that access to a suitable interactive visualization engine will promote comprehension of such eigenvalues, along with numerous other aspects of nanoscience.


Figure 2. Screen shot of the fourth energy level for a quantum dot

Conclusions

Over twelve calendar years of thoughtful programming have gone into nanoHUB. X11, VNC, and other well-known technologies promise remote computing, but it has taken several generations of refinements on the base they provide to make nanoHUB's middleware as invisible as it has become. Diligence in virtualization at all levels and in all directions has resulted in software that puts science before the artificial constraints common in many computing environments. End users needn't learn computing details to invoke sessions that yield scientifically significant results. Open source components make great raw materials but, at this point, it still takes plenty of frontline programming and systems administration to achieve the seamless computing experience scientists want.

Acknowledgments

As mentioned above, the work of many people over many years has gone into nanoHUB. Among those whose recent contributions directly improved this article was Sundar Jeyaraman.


Resources

Learn
  • nanoHUB: This site is a "resource for nanoscience and technology ... created by the NSF-funded" NCN. This page explains what nanotechnology is and introduces specific themes of NCN's contribution.

  • Middleware: "The Chronology of nanoHUB Middleware" presents the architectural fundamentals behind nanoHUB. Early forms of the middleware were in use by 1995. Development since then has improved performance, security, and reliability, while enhancing maintainability through the use of standard components. Source code for many nanoHUB applications is available.

  • TeraGrid '07 conference: Many of the points of the current article were presented under the title "Using nanoHUB.org as a Science Gateway" at TeraGrid '07 on 4 June 2007. The same "utility computing 2.0" spirit behind nanoHUB appeared in an earlier article for developerWorks, "Remote computing with a Linux application server farm" (developerWorks, February 2007), which also relied on VNC and related virtualization technologies.

  • You can display the nanotube simulation live on your own desktop through these steps:
    • Make sure your browser supports JRE 1.4 or greater.
    • Register with nanoHUB without fee.
    • Log in.
    • Select "My nanoHUB."
    • Among "My Tools," select CNTbands 2.0.
    • An applet will appear in your browser window; select the "Simulate" button, and, in the "Result:" selection, choose "Molecular structure: overall."

  • "SSL secures VNC applications" (developerWorks, January 2007): This article explains one of the earlier forms of encryption and authentication used for VNC traffic in nanoHUB and similar applications.

  • FUSE: FUSE stands for "filesystem in userspace," that is, a programmable, fully functional file system. Why program a file system? One way to think about this is as "impedance-matching;" rather than having to force an architectural design onto a foundation it poorly fits, FUSE can slide between the two to make both interfaces "snug."

  • "Virtualization" is an important theme that pervades nanoHUB's implementation. VNC and the virtual file systems FUSE makes possible are mentioned above. Xen, introduced in "Could it be time to virtualize?" (developerWorks, October 2006), is a well-known technology for "paravirtualization" of machine instances of Linux® and a few other operating systems. Along with these, and several minor technical pieces, nanoHUB also crucially leverages OpenVZ. OpenVZ is roughly comparable to Xen; while limited to Linux, it boasts higher performance than Xen. OpenVZ doesn't yet have the "name recognition" of Xen, despite coverage in such presentations as "Virtual Linux" (developerWorks, December 2006).

  • WebDAV: Everyone knows what the Web's about: Your browser here can read documents from there.

    As originally conceived, though, writing those documents was nearly as important. In the Web's actual history, the latter part was widely neglected for many years. Recently, though, WebDAV has become sufficiently widely adopted that it's now practical to rely on this extension to HTTP to support distributed authoring (the "D" and "A" in "WebDAV").

    The Wikipedia entry for WebDAV is more readable than the official WebDAV home page referenced above. developerWorks itself has published dozens of articles that touch on the topic, among which "Web Folders and WebDAV" (developerWorks, January 2004) and "The Future of Distributed Software Development on the Internet" (developerWorks, January 2004) make the best introductions for our purpose.

  • "Lightweight Web servers" (developerWorks, July 2007): These servers are among the valuable raw materials for a construction of the complexity of nanoHUB.

  • Rethinking the Linux Distribution: This article touches, among other subjects, on use of existing technology, and especially VNC, to integrate native applications within a browser as desktop. The focus of the article is on end user look and feel, and thus complements our attention to security and performance details. Still more specialized effects are available with today's range of X-based window managers, as described in "Can't Get Enough Desktops!" Note that Xephyr is an effective modernization of the Xnest recommended by the latter article. Skippy gives X11-based "task switching" in the manner of Exposé, and even more esoteric interfaces and features become practical. "Rethinking ..." and nanoHUB converge in recognizing the potential to reframe all of these possibilities in terms of centrally-managed deployment.

  • Another radical complement to infrastructure, such as nanoHUB, is a "Live CD," such as cl33n, which boots, connects to the public Internet, and launches a browser instance.

  • Several earlier articles, including "Open source in the biosciences" (developerWorks, November 2002) and "Collaboratory: An Open Source Teaching ... Facility ...", explore the significance of open computing for science. A monograph on the subject might be forthcoming.

  • Citrix: Citrix and many other software and hardware companies sell "thin clients" in one form or another. Among the "thinnest" of these are such products as the ViewSonic ND4210w, a "network display." This particular one is a large-screen LCD suitable for viewing by a research group or class, with built-in media player, Web browser, Flash player, and so on. It can be attached to a network "out of the box," with no PC to configure, and it is instantly ready to display nanoHUB applications. Note that our point is not to endorse any particular product, but to illustrate how the nanoHUB combines with other elements to make new levels of functionality, security, and reliability practical.

  • Application Virtualization Takes Hold: "Application virtualization" is another label under which automation is currently taking place. When an organization addresses security and manageability concerns by running applications with a minimum of local installation, including changes to the registry, system loadable libraries, and system configuration, the "remoting" VNC and Web applications can legitimately be viewed as varieties of application virtualization.

  • AIX® and UNIX® : The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration and expanding your UNIX skills.

  • New to AIX and UNIX?: Visit the "New to AIX and UNIX" page to learn more about AIX and UNIX.

  • AIX 5L™ Wiki: A collaborative environment for technical information related to AIX.

  • Safari bookstore: Visit this e-reference library to find specific technical resources.

  • developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.

  • Podcasts: Tune in and catch up with IBM technical experts.

Get products and technologies
  • IBM trial software: Build your next development project with software for download directly from developerWorks.



About the authors

Cameron Laird is a long-time developerWorks contributor and former columnist. He often writes about the open source projects that accelerate development of his employer's applications, focused on reliability and security. He first used AIX twenty years ago, when it was still an experimental product. He's been an enthusiastic consumer of and contributor to a variety of memory debugging tools through that time. You can contact him at claird@phaseit.net.


Rick Kennell is a middleware engineer and virtualization technician for Purdue University.





IBM and AIX are registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.