Meet the experts: Mickey Nix on life in the trenches

Ever wonder how IBM implements autonomic computing solutions in the real world?

Level: Introductory

developerWorks staff (dwinfo@us.ibm.com), Editorial Staff, IBM

23 Feb 2005

This question and answer article features Mickey Nix, an IT Architect/Consultant for Autonomic Computing technology at IBM®. developerWorks talked with Mickey about the procedures and practices when implementing and deploying an autonomic computing solution for partners and customers.

developerWorks: Tell me about your job -- what you do at IBM and how did you come to be at IBM?

Mickey Nix: I've been with IBM since 1989. I came in as an OS/2 developer and instructor. I had a lot of different consulting-type jobs in the meantime. In the beginning, mostly, I was consulting on OS/2 and eventually that morphed into e-business, which again morphed into on demand business. I don't want to go into a lot of detail about the task, but I had worked with IBM business development organizations on several leading-edge initiatives.

Customer proof-of-concept?

When you deploy a new or enhanced network for a customer, it is vital that the system be tested, not just to prove to the client that it will work beyond the theoretical, but to also prove to the implementer that it will function as promised.

Customer proof-of-concept is IBM's testing to show a client that the planned implementation will work beyond just the theoretical.

Testing is a critical component of any network deployment. One of the deficiencies that continues to be encountered is in how the customer's networks are put together. Experience has shown that planning, trial, and proof of concept are critical components in the design and implementation of networks because many networking designs should work, but every network has something unique about it.

My primary responsibility for the last year has been engaging with customers and helping them define projects or proofs-of-concept using our autonomic computing technologies. I've worked in the group now for a little over two years. The first year I worked in the business development organization, and my focus was on engaging with business partners that we thought had good synergy with our own IBM product set, who also believed strongly in our autonomic computing mission as we set it forth, and who wanted to partner and find different ways that we could go to market together -- win/win types of situations. That first year I focused on business partners and helping them to integrate autonomic computing technologies. For the last year I've focused on customers. My position title was IT Architect. My role was really that of IT consultant.

dW: Did you like working with partners or customers best?

MN: I like both equally. Before coming to the autonomic computing organization, I spent six or seven years with IBM Developer Relations working primarily with business partners, so I had a lot of experience in working with business partners. Before that, though, I had experience consulting with customers; when I was still an OS/2 programmer, I went out and helped customers with their OS/2 programming.

dW: So this was actually sort of a homecoming, I guess.

MN: This was sort of a homecoming, yes. I can't say I have a preference for one or the other. They have very different needs and requirements.

Business partners are, of course, focused on improving their products. And not just improving their products, but also finding better improved channels to market. The goal is to find ways that they can partner with IBM and leverage our promotional marketing engine as well as our wealth of technical expertise.

Customers, on the other hand, want proven solutions: things that are going to enhance their end-user experience and improve their total cost of ownership almost immediately. These are two different audiences, and each has a different set of challenges. But it certainly keeps you on your toes.

dW: I'm sure it does. You mentioned customer proof-of-concept and that seems to be a central tenet to this work. You want to tell me a little about that?

MN: I guess before I do that I ought to explain the business development organization. When you asked me what I do within the group, we have within our overall computing organization a cross-group effort called offering teams in which we try to define what is the raw demand for self-healing or self-optimizing or self-configuring or self-protecting elements within the Autonomic Computing Toolkit. It's a cross-group effort; we have architects, we have business development specialists, we have strategists. It's kind of a mixed bag.

Over the last year I participated on the self-optimizing and policy offering teams. I've also served for the last year as the Rational brand liaison for our group. One of my responsibilities is to work with the Rational development managers to try to help them understand [the IBM] autonomic computing framework and common components to help them integrate those into their existing products and bring those to market as well.

To understand proof-of-concepts, you have to think back to what our mission really is with the customer. Our mission is multileveled. With level one, we're trying to identify customers that are willing or eager to reduce their total cost of ownership in areas of problem determination and configuration, those that understand that our technologies are new and who want to start to build mindshare and go to market together. I think that building the mindshare, though, is one of the primary keys.

But at the same time, we also begin to provide the customer with solutions that are well-tested, that are proofed, that are ready to be used in a real production environment. We want to enhance their IT-management experience and offer solutions.

A proof-of-concept is the first step that we take with the customer. Once we've gone in and explained our autonomic computing vision and they say they like it, how do we move forward? We come back with the next step, a proof-of-concept or a project in which we can take their existing infrastructure and find a way to use our new autonomic computing technologies and, in some cases, existing IBM software. Primarily, we focus on the technologies that are found in the Autonomic Computing Toolkit around problem determination and solutions installation or self-configuring, and also the Integrated Solutions Console.

dW: You sound like you're about to launch into the process. Go ahead.

MN: The first thing we do with the customers after they've bought into the idea and they want to proceed is go out and conduct a two-day project-readiness workshop. During this two-day project-readiness workshop we try to include both executives and management leads as well as the top technical people within the customer's organization. In the workshop, first of all, they take the lead by explaining to us their IT infrastructure and the solutions that they've deployed, what some of their problems are within their user experience, how they deal with management problems, the aggravations that they have installing applications or servers, or in configuring those servers and applications, and in tuning them the way they want.

We listen to the pain points they're communicating to us first. We take all that in and on a truly dynamic basis, we turn around, in a very short period of time, like 10 minutes, and we present to them an overview of our autonomic computing strategy and the components in the Autonomic Computing Toolkit. We try to sort or tailor it in a way so that it's meaningful to what they've just described to us. So, we customize the next explanation of our technology in a way that's meaningful to what they've described...

dW: ...to what they've just told you...

MN: ...then we try to define some possible scenarios for a project or proof-of-concept. Once we've defined some possible scenarios in an area of problem determination or solution install or whatever, then we drill down even further and try to define some specific use cases that we can implement for them. We have a discussion between ourselves and the company in which we talk about possible solutions and a possible architecture for the solutions, and we try to come away at the end of the day with a pretty good feeling of "here's what we can build into a proof-of-concept" that uses our technologies and is meaningful to this customer.

The jStart Program

The IBM jStart Program works with customers in a hands-on partnership to develop and implement a project that showcases business success by helping you to apply on demand technologies to address strategic initiatives so you can adopt these technologies before your competition.

jStart managers consult with customers to help them define a business problem, create a project plan for either a new or existing project, then build and deploy the solution. In jStart, customers work with IBM development teams; the Almaden, Toronto, and Hursley testing labs; and IBM Global Service specialists.

Overnight we meet and put our heads together on the project; typically it's either myself alone or sometimes I'm joined by the IBM Software Group's jStart team. In most of the engagements, we try to have one person serve as a project manager and the other serve as a technical consultant. In some cases we've had two technical consultants, but we've never had more than three people on these together; it just depends on the size of the customer and the scope of the project.

Then we come back the second day. We try to come back the second day with a detailed explanation of "here's what it is we want to do," "here's what we think the division of responsibility between your company and ours will be," "here are the new assets that we'll have to create," "here are the core components that already exist that we can use," and "here is other IBM software that you may want to consider using for stage two."

With that, if there's agreement, then we go back and build something called a project objectives document, which is kind of a statement of work. Again, we try to formalize the process with the customer buying into what it is we want to do; we give them a sign-up document in which they commit to a 6-to-8-week period of time. Then, during that 6 to 8 weeks, we build whatever assets need to be built.

We go back for a one-, two-, or three-day engagement. We actually install and deploy the solutions and we train the customer's people on how to use those solutions. Then, we get their feedback and see if there's not something else we need to add to the solution to make it complete. That can take as long as one, two, up to six or eight weeks depending on if we have to build assets or not.

When I say "build assets," that means items like adapters that we use to convert native logs to the Common Base Event format. Assets might be correlation schemas that we use with the Log and Trace Analyzer component, or they might be resource models that we use with the Autonomic Management Engine. In the beginning, the assets were viewed as the bigger part or effort of the engagement; now we have a number of assets already built that we are able to reuse, so our engagements go that much faster.

dW: So, you've got a couple of examples of some of your prebuilt assets?

MN: We've built a number of adapters for the Generic Log Adapter. We've built correlation schemas and resource models; we've even built new DB2® assets, including database schemas, so we can save information there and then actually come back and view that information with the Log and Trace Analyzer and a custom parser that we wrote for it. We have quite a number of assets that we've built.

dW: Which components do you generally focus on when you're working with customers?

MN: Our focus or our objective, not to confuse the endpoint, is to find ways to use our autonomic computing technologies in actual customer production environments. For the most part, we are reliant on the technologies that are available to the public in the Autonomic Computing Toolkit. And anybody out there can go to www.ibm.com/developerworks/autonomic and follow the links to the toolkit on that page and download a number of bundles in the toolkit that center around problem determination, solution install, and our Integrated Solutions Console for building a systems administration solution using the Integrated Solutions Console in a Java framework.

Generic Log Adapter for Autonomic Computing

The Generic Log Adapter for Autonomic Computing is a rule-based tool that transforms software log events into the standard situational event formats in the autonomic computing architecture (the Common Base Event format). The adapter is an approach to providing a producer proxy for the early participation of software groups in the autonomic computing architecture.

It consists of two main components: a Rule Builder and configuration tool, and the adapter run-time environment. In addition, the Generic Log Adapter provides a plug-in architecture for customization, with required functions external to the user's software.

The Rule Builder generates parsing rules for a product's log files and is used in the configuration of the Adapter. The rules and the product logs are fed into the adapter run-time environment, which converts the logs into the standard situation formats of the autonomic computing architecture, using appropriate schema, and forwards them to the Log and Trace Analyzer or to any management tools capable of consuming the adapter's output.
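As a rough illustration of the rule-based idea described above, the Python sketch below applies a per-product parsing rule to each log line and emits records in one common shape. It is not the Generic Log Adapter's rule format or API; the two log layouts, the regular expressions, and the field names (`time`, `severity`, `component`, `message`) are assumptions made up for this example.

```python
import re

# Hypothetical parsing rules: one regular expression per product log format.
# Named groups map pieces of each native line onto a common set of fields.
RULES = {
    "webserver": re.compile(
        r"\[(?P<time>[^\]]+)\] \[(?P<severity>\w+)\] (?P<message>.*)"
    ),
    "database": re.compile(
        r"(?P<time>\S+ \S+) (?P<severity>[A-Z]+) (?P<message>.*)"
    ),
}


def normalize(product, line):
    """Apply the product's rule to one native log line.

    Returns a dict in the common shape, or None if the line does not match.
    """
    match = RULES[product].match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["severity"] = event["severity"].lower()
    event["component"] = product
    return event


# Two differently formatted lines end up with the same set of keys.
web = normalize("webserver", "[2005-02-23 10:15:00] [error] connection refused")
db = normalize("database", "2005-02-23 10:14:58 WARNING pool exhausted")
```

Keeping the rules in data rather than in code is the point of the sketch: supporting another product means adding an entry to `RULES`, not changing `normalize`.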

In our engagements, we focus on those problem-determination technologies, and there are a number of important components. One is the Generic Log Adapter to accomplish what we need to in the area of self-healing. We think that it's important to start to encourage the logging of these events in a common and consistent fashion. If you go today and look at one vendor's applications logs, you'll see that how they log events is very different from how another vendor logs events.

dW: I've heard something about this. You've done some work on standardizing event logging, yes?

MN: There were, prior to the formation of the IBM Autonomic Computing [technology], no existing standards for the format of how events should be logged, so one thing we did early on is we actually commissioned a group from research to do a study on both IBM and non-IBM application logs, and did a comparison and contrast to see if there are any similarities. Pretty much what they found was that nobody is logging in the same format or fashion. Even within a single software vendor, if the vendor has multiple applications, chances are really good that those multiple applications all have their own different formats for logging events. Even within single applications, some applications actually have four or five or more logs. The researchers found that even for those logs within the same application, the logging format was different.

The problem was rife because environments today are very complicated. If you take a typical Web application server, you've got Java, of course, you've got an operating system, you've got a database server, you've got an HTTP server, you have a Web applications server, you might have a directory server or a portal server, you probably have firewall components, and you probably have other networked components. It's a varied and complicated environment, and all of those resources have their own way of logging. They all log in different formats.

When it's time to debug a problem, the way a customer's problem-management team works is when there's a problem, an end-user calls and says my application just had such and such happen. Then a help ticket is opened at the helpdesk. The ticket is handed to highly-skilled, highly-trained IT experts (each probably has a focus on a specific resource like databases, Web applications, networks and communication, firewalls, and so on).

All of those people are familiar with their areas and products, but chances are they don't know how to interpret the logs of other products. So one of the difficulties when trying to define the root cause of the problem can be summed up in the query: "I know that my application or my product experienced the problem, but what is the cause of the problem?" Followed closely by "Who do I talk to next?"

In the long run, you have eight or more people in a war room putting their heads together trying to do data correlations (in their heads) and come up with what must be the root cause. When they do, then you have to go out and experiment and try the different situations they've developed.

dW: This must be frustrating for support people?

MN: This is a very time-consuming and costly process, both in terms of money and in terms of people. And this is the pain point that we try to address with the customers. Most customers we talk to can identify with how expensive it is to do problem determination, do the root-cause analysis, and do problem resolution. They're looking for technologies to improve upon that. One of the autonomic computing team's first efforts was to define a common model for logging events. This is called the Common Base Event model, and we've submitted it to OASIS as a standard.

Common Base Event

A small event can change things far beyond the seeming initial circumstance, especially in the complex world of e-business, where multitudes of interconnected systems must work together to perform the housekeeping activities that are necessary to keep a computing system functioning. Nothing is as pervasive in this type of environment, or can have as much of an impact, as the simple, lowly event.

The event encapsulates data sent as the result of an occurrence or situation, and it represents the foundation on which these complex systems communicate. Fundamental aspects of enterprise management and e-business communication, performance monitoring, security and reliability management, order tracking, to name a few, rely on the fidelity and viability of events.

That's why it is imperative to standardize the formatting of events. That's where the Common Base Event, a new standard for events for enterprise and business applications, comes in. The Common Base Event definition for meta-data provides the following, critical information:

  • The identification of the component that is reporting the situation
  • The identification of the component that is affected by the situation (which might be the same as the component that is reporting the situation).
  • A common description of the situation that occurred
  • The content that can be used to correlate situations
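The four kinds of information listed above can be pictured as a small record type. The Python sketch below is a simplified stand-in, not the Common Base Event schema itself: the class name `Event` and the field names `reporter`, `source`, `situation`, and `correlation` are illustrative assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Simplified stand-in for a Common Base Event record."""

    reporter: str        # component that is reporting the situation
    situation: str       # common description of what occurred
    source: str = ""     # component affected by the situation
    correlation: dict = field(default_factory=dict)  # content used to correlate

    def __post_init__(self):
        # The affected component might be the same as the reporting one,
        # so fall back to the reporter when no source is given.
        if not self.source:
            self.source = self.reporter


# A component reporting on itself: source defaults to the reporter.
own = Event(reporter="http-server", situation="connect failed",
            correlation={"thread_id": "42"})

# An adapter reporting on behalf of another component.
proxied = Event(reporter="log-adapter", situation="stopped", source="database")
```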

Knowing that not everybody is going to move to a Common Base Event model overnight, our team also developed an adapter strategy (Generic Log Adapter) to give end users the ability to normalize existing legacy logs in proprietary formats to a Common Base Event format and to then pass those events to designated targets (such as stdout, flat files, agents, databases, or predictive analysis and correlation tools).

Internally at IBM, we're looking across the different software brands to determine how best to implement the Common Base Event format. For new development, we're aiming to get a commitment to using the Common Base Event. But, again knowing that not everyone is going to move to a Common Base Event plan overnight, we encourage the creation of adapter rules for use in the Generic Log Adapter.

dW: And the Common Base Event fits into the Generic Log Adapter component. Tell me more about the component.

MN: The Generic Log Adapter is another part of the Autonomic Computing Toolkit that I go out and work with customers on. There's both a Generic Log Adapter run time and a Generic Log Adapter model builder that can be plugged into the Eclipse framework or Rational® Application Developer. The GLA Model Builder can be used to create adapters that monitor existing logs in their own native text or binary format. In real time, we can actually pick off events as they're added to the logs. The adapter also has the ability to then take those events and parse them into the Common Base Event definition. Once the adapter has converted these events to Common Base Events, it can export them in an XML format to a designated target like a filesystem, a database, or an agent running on the machine that can be accessed remotely by another application.
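The two run-time steps described here, picking off events as they are appended to a log and exporting them as XML, might look roughly like the Python sketch below. It is only a sketch under stated assumptions: the function names `read_new_lines` and `to_xml` are hypothetical, and the XML element and attribute names are invented for illustration and do not follow the Common Base Event schema.

```python
import xml.etree.ElementTree as ET


def read_new_lines(path, offset):
    """Return the complete lines appended to the log since byte position
    `offset`, plus the position to resume from on the next poll."""
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Only consume complete lines; a partial trailing line is left for later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


def to_xml(event):
    """Serialize a normalized event dict as a small XML element.

    Element and attribute names are illustrative only.
    """
    element = ET.Element("Event", severity=event["severity"])
    ET.SubElement(element, "Message").text = event["message"]
    return ET.tostring(element, encoding="unicode")
```

A caller would poll `read_new_lines` on a timer, keep the returned offset between polls, parse each line into an event, and hand the result of `to_xml` to whatever target has been configured.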

dW: That's where self-healing comes along?

MN: Well, this is the beginning of it. The beginning of self-healing is dealing with information or events from multiple unlike resources in a common, consistent fashion. Once we have these events in this Common Base Event format, we can create a single tooling platform that knows how to consume Common Base Events and interpret those events, leading us to the next step, the process of doing predictive analysis. That leads us to being able to correlate those events; that's a great lead-in to the third component of the Autonomic Computing Toolkit, the Log and Trace Analyzer.

dW: That's a plug-in to the Eclipse platform?

MN: The Log and Trace Analyzer is another plug-in to the Eclipse platform. Of course, it (along with the Generic Log Adapter builder) can plug into the Rational Application Developer 6.0. Also, we can just use the Eclipse Hyades environment that can be downloaded for free. (Editor's note: The Hyades environment is an open source, integrated test, trace, and monitoring environment for Automated Software Quality (ASQ) tools. It includes a range of open source reference implementations).

The Log and Trace Analyzer is an example of an event viewer with predictive analysis and correlation capability. With the Log and Trace Analyzer, native logs can be imported into the Log and Trace Analyzer tool and transformed into a Common Base Event format for viewing and for comparison against a collection of symptoms. It gives the end-user in the customer environment the ability to very easily consolidate multiple logs that exist in a native, proprietary format into a single, common format for analysis and correlation in a single tool.

The Log and Trace Analyzer for Autonomic Computing

The Log and Trace Analyzer for Autonomic Computing is a tool that enables and speeds viewing, analysis, and correlation of log files generated by such servers as the WebSphere® Application Server, the Apache HTTP Server, and the DB2 Universal Database™. It makes it easier to debug and resolve problems with multitier systems by converting heterogeneous data into Common Base Event format homogeneous data and by providing tools to visualize and analyze the data. By capturing and correlating events from end-to-end execution in the distributed stack, this tool allows for a more structured analysis of distributed application problems that facilitates the development of autonomic, self-healing, and self-optimizing capabilities.

Everything comes back to the Common Base Event format. From there you have a lot of capabilities you don't have with other tools or in other environments because everything's in the same format. You can view these events in a time-synchronized format just one after the other, and visually see what event occurred when within the total environment, not just in a single product.

Within the Log and Trace Analyzer, events are color-coded by level of severity: informational, warning, and error. A user can sort these events by any element of the Common Base Event structure.

In addition, we can examine events and compare them against a known symptoms database (which is another technology we're working on and that we're making available as part of the Autonomic Computing Toolkit). The Log and Trace Analyzer can also group multiple logs in something called a correlation. Once we've grouped multiple logs in this correlation, we can do a view on that group, and actually see how events in multiple logs are related (or correlated) using a graphical interaction diagram.

dW: It's like a relationship diagram.

MN: It's like a relationship diagram. Basically, each log has its own vertical line and each vertical line represents the events in a time sequence. And there's also a connecting line that shows which log events are related to other log events based on their thread ID or process ID. We can write correlation schemas to show any type of relation regarding elements of the Common Base Event structure.
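Once every log is in one format, both the time-synchronized view and the thread-ID correlation reduce to simple operations. The Python sketch below illustrates the idea only; it is not how the Log and Trace Analyzer or its correlation schemas are implemented. The event keys (`time`, `log`, `thread_id`) are assumptions, and timestamps are assumed to be strings that sort correctly as text.

```python
import heapq
from collections import defaultdict


def merge_by_time(*logs):
    """Merge several event lists (each already in time order) into one
    time-ordered stream across all products."""
    return list(heapq.merge(*logs, key=lambda event: event["time"]))


def correlate(events, key="thread_id"):
    """Group events that share a value for `key`, keeping only the groups
    that span more than one log."""
    groups = defaultdict(list)
    for event in events:
        if key in event:
            groups[event[key]].append(event)
    return {
        value: group
        for value, group in groups.items()
        if len({event["log"] for event in group}) > 1
    }


http_log = [
    {"time": "10:00:01", "log": "http", "thread_id": "7", "msg": "request"},
    {"time": "10:00:05", "log": "http", "thread_id": "9", "msg": "timeout"},
]
db_log = [
    {"time": "10:00:03", "log": "db", "thread_id": "7", "msg": "connection lost"},
]

timeline = merge_by_time(http_log, db_log)
related = correlate(timeline)
```

Here `related` holds a single group, for thread `"7"`, linking the HTTP request to the database event. Those links correspond to the connecting lines in the diagram described above.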

dW: Will the system throw up a red flag for you or is this designed to put the information together so a human being can analyze it?

MN: What I've just described is what we're doing with the customer in the first phase of the project. We try to help our customers evolve toward autonomic computing by moving them along a roadmap. Usually, the goal of our first phase is to help customers get to what we call a predictive level where we have technologies in place to monitor resources and compare against a known body of past experiences or symptoms.

dW: What about this symptoms database?

MN: Using the Log and Trace Analyzer, we have the ability to import or create a symptoms database. In a symptoms database, users can capture information about past events such as what the event is called, a description of the event, and also what recommended action might need to be taken.
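A symptoms database of this kind can be pictured as a list of records, each holding a name, a description, a way to recognize the event, and a recommended action. The Python sketch below assumes symptoms are recognized by a regular expression over the event message; that matching rule, the `Symptom` type, and the sample entry are all hypothetical and are not the toolkit's symptom database format.

```python
import re
from dataclasses import dataclass


@dataclass
class Symptom:
    name: str
    description: str
    pattern: str              # regular expression matched against the message
    recommended_action: str


# A one-entry example database (contents invented for illustration).
SYMPTOMS = [
    Symptom(
        name="db-connection-lost",
        description="The application could not reach the database.",
        pattern=r"connection (refused|lost|reset)",
        recommended_action="Check that the database server is running.",
    ),
]


def diagnose(message, symptoms=SYMPTOMS):
    """Return every known symptom whose pattern matches the event message."""
    return [s for s in symptoms if re.search(s.pattern, message, re.IGNORECASE)]


matches = diagnose("Connection refused by host db1")
```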

Typically, in the second phase of the proof-of-concept we try to make the leap to a closed-loop solution where a tool monitors, analyzes gathered information against a known body of symptoms or policies, builds a plan, and executes upon that plan to fix or tune a resource in the infrastructure. We refer to this as an autonomic manager, and one of the tools we use in our Autonomic Computing Toolkit is called the Autonomic Management Engine.

dW: What does this component do?

The Autonomic Management Engine

The Autonomic Management Engine (AME) is a sample autonomic manager implementation that monitors system resources, sends aggregated events, and performs corrective actions for problems. AME constantly monitors the system looking for events to handle.

MN: Basically the Autonomic Management Engine gives us the ability to monitor resources. One of the key components of the Autonomic Management Engine is something called a resource model. The resource model provides the underlying code for querying the state of specific resources and transforming that information to something that other tools and humans can easily work with; the Autonomic Management Engine uses the Common Information Model to do this. In addition, we provide a Resource Model Builder and a prebuilt Common Base Event resource model as part of the Autonomic Computing Toolkit. With these resource models, we can monitor resources like databases or Web application servers or other components to monitor events that we manage.

Using the Common Base Event resource model with the Autonomic Management Engine, users can monitor Common Base Event logs such as those generated by a Generic Log Adapter run time. Part of the Autonomic Management Engine resource model is a simple VisitTree function written in JavaScript that provides users with the entry point for adding logic to react to specified events in a unique manner. Scripts and executables can be called from this function to fix problems or tune resources in real time.

Again, this is a typical phase two activity in our proof-of-concepts, and we frame this as getting our customers to the adaptive level, a level where autonomic computing software monitors, analyzes, plans, and executes without the intervention of a human. It gives us a mechanism to act on the common problems that we saw after we first introduced the Generic Log Adapter and Log and Trace Analyzer and used the tools for a few weeks. We started to see certain patterns: a database connection problem might lead to an HTTP server problem, and both of those might have really been causing us to drop the network link. We started to see specific patterns of problems.

With this Autonomic Management Engine, we can now take those known patterns and start to code automated solutions for those problems. In some cases, the customers simply want to be able to take those specific patterns or problems and then maybe notify somebody; in other cases they actually want to fix the problem. There are many different combinations of notification and action scenarios. It does give us the ability to create a closed-loop solution.
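The notify-or-fix choice in a closed loop can be sketched in a few lines of Python. This is a minimal illustration of the monitor, analyze, plan, and execute cycle, not the Autonomic Management Engine or its resource model API. The function `run_once`, the `problem` key on each event, and the handler table are assumptions introduced for this example.

```python
def run_once(events, handlers, notify):
    """One pass of a monitor/analyze/plan/execute loop.

    events   -- iterable of event dicts already gathered by monitoring
    handlers -- maps a known problem name to a corrective callable, or to
                None when the customer only wants a notification
    notify   -- callable used to tell a person about a problem
    """
    actions = []
    for event in events:                      # monitor
        problem = event.get("problem")        # analyze (pre-classified here)
        if problem not in handlers:
            continue                          # not a known pattern; skip it
        fix = handlers[problem]               # plan
        if fix is None:
            notify(f"{problem}: {event['message']}")
            actions.append((problem, "notified"))
        else:
            fix(event)                        # execute
            actions.append((problem, "fixed"))
    return actions


restarted, messages = [], []
handlers = {
    "db-down": lambda event: restarted.append(event["resource"]),
    "worm": None,  # notify only; no automated fix
}
events = [
    {"problem": "db-down", "resource": "db1", "message": "no response"},
    {"problem": "worm", "message": "scan detected"},
    {"problem": "unknown", "message": "?"},
]
result = run_once(events, handlers, messages.append)
```

Keeping the handler table separate from the loop is what leaves room for the many combinations of notification and action: a customer who only wants to be told about a problem maps it to `None`, and one who wants it fixed maps it to a script.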

From here, the next step would be to add the capability to govern the actions taken by the autonomic manager using a policy engine and rules. At this point, a customer would have some level of autonomic capability.

dW: You got any real life examples?

MN: I do. We did exactly what I just described with Singlestep Technologies, Inc. and LAN Solutions. One of the goals of our Business Partner Program is to identify vendors whose network management and monitoring solutions have synergy with our own IBM solutions (such as Tivoli Enterprise Console®) and that can take advantage of these autonomic computing common components (such as the Generic Log Adapter, Log and Trace Analyzer, and Autonomic Management Engine). They can utilize those technologies to improve their products and then, in turn, go out and improve the experience that their customers have by using their products.

LAN Solutions is a company in McLean, Virginia that provides network services for a number of small- and medium-sized customers. They actually monitor and manage these customers' networks. One of the solutions we put in place did involve the use of the Autonomic Management Engine in conjunction with Singlestep's Unity and Policyscape technologies.

Unity is an automation engine that can take events from a number of monitoring solutions, and policies can be applied to those events to specify what actions to take when certain events are seen. What Singlestep did was embrace the Common Base Event format and write handlers within their own solutions so that their Unity Automation Engine could communicate with our Autonomic Management Engine. This done, both can provide input to and fix the output from the monitoring engine.

At a really high level, I guess, the best way to explain what we did is to say we put the Autonomic Management Engine in place to watch for a known set of problems that the LAN Solutions customers were having. When those problems were encountered, it actually passed information back to Unity. At the Unity Automation Engine, policies have been described that tell administrators what to do when these specific events are seen by the Autonomic Management Engine.

We also put a solution in place that looks for worm infections. The Autonomic Management Engine again monitored a specific set of resources and Common Base Event logs, looking for a specific set of indicators that told us that somebody had just set a worm loose in the environment and that it was now spreading. The Autonomic Management Engine again reported back to the Unity engine where we had a solution already deployed at the customer site.

As soon as we had it deployed, flipped the switch as it were, within two minutes we were actually seeing the results of the problems appearing up on the major monitors that technicians watch in the war room or console room. (We had actually created a couple of troubleshooter applications, so we could make a couple of problems happen, but before we even got to do our own testing, we were actually seeing the results.) The technicians said, "Well, this is working."

dW: How much effort did that save the company?

MN: Initially, the company said they felt this would probably save their technicians at least 40 percent of their time, specifically with a known problem set. We've been told since then that it's probably closer to 60 or 70 percent.

dW: How does business partner Singlestep feel about this level of performance?

MN: Singlestep is obviously excited about this solution. They're in the process of taking it forward to other customers. They've been a good partner in that regard. We've had a number of good partners; I hate to just single one out.

dW: Give me another one and I'll put at the bottom of this interview "if you want to ask about more partners, call Mickey."

MN: I was going to say if you want to know about public partners, we should have a link on our Web site.

dW: We've got that resource in our Resources section. Do customers and partners all react the same way when you flip the switch and start saving them resources?

MN: When people who have to deal with problems on a daily basis get to use these technologies, they get really excited. Even if you never found a problem for the tools to solve, just the ability to consolidate logs, get the information in a common format, and view it in a single place is going to save at least 50 percent of the time and effort of putting the information together.

dW: I can understand that.

MN: And then you have the ability to inspect the data and do some really sophisticated analysis and correlation. You can even take it to the next level, where you start to automate the process, identify these problems, and actually deal with them in real time. That makes people excited.

dW: It's like adding 50 new sysadmins to your workforce.

MN: I think the main thing is, the customers, and especially the technicians, that have to deal with problems see this as a way for us to free them of a lot of the time that they're having to spend on mundane problem determination and resolution. They get to spend more time on the areas of the IT infrastructure that their college degree has probably prepared them for. They can spend more time on the business at hand and less on the mundane problems.

dW: Is there anything else you want to add?

MN: There is one other technology that I'd like to mention for problem determination, the Rational Agent Controller, which used to be called just the Agent Controller in the Autonomic Computing Toolkit. (We donated a large part of that code to the Hyades Project so now we brand ours as the Rational Agent Controller, but anyway.) It gives us the ability to put an agent on pretty much any machine in a given environment. From another machine, maybe an administrator's workstation, an administrator who is using our Log and Trace Analyzer can establish communication with one of these agents. The Analyzer knows how to ask to import a specific log file. As long as we have plugged the appropriate log file parser into the agent running on the remote machine, that agent can normalize a native log to the CBE format and transfer the log file from that remote machine to the Log and Trace Analyzer.

It's really a neat technology because it doesn't require that we put Generic Log Adapters on every machine or for every log that they're going to monitor. We can put a fairly minimal amount of log adapter or parser code in the agent.
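The idea of plugging a small parser into an agent, rather than deploying a full adapter per log, can be sketched in Python. This is an illustration under stated assumptions, not the Rational Agent Controller or Log and Trace Analyzer interface: the two native log layouts, the `PARSERS` registry, and the output field names are hypothetical, and the common record is a simplified stand-in rather than the actual Common Base Event schema.

```python
import re
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]

# Hypothetical native formats; real logs and real parsers will differ.
SYSLOG_STYLE = re.compile(
    r"^(?P<time>\S+ \S+) (?P<level>[A-Z]+) (?P<host>\S+) (?P<text>.*)$"
)
PIPE_STYLE = re.compile(
    r"^(?P<host>[^|]+)\|(?P<level>[^|]+)\|(?P<time>[^|]+)\|(?P<text>.*)$"
)


def make_parser(pattern: "re.Pattern[str]") -> Callable[[str], Optional[Record]]:
    """Build a parser that maps one native line to the common record."""
    def parse(line: str) -> Optional[Record]:
        match = pattern.match(line.strip())
        if match is None:
            return None  # line does not fit this native format
        return {
            "time": match.group("time"),
            "severity": match.group("level").upper(),
            "source": match.group("host"),
            "message": match.group("text"),
        }
    return parse


# The "plugged in" parsers: one small entry per native log type.
PARSERS: Dict[str, Callable[[str], Optional[Record]]] = {
    "syslog_style": make_parser(SYSLOG_STYLE),
    "pipe_style": make_parser(PIPE_STYLE),
}


def normalize(log_type: str, lines: List[str]) -> List[Record]:
    """Convert native lines to common records, skipping unparseable lines."""
    parser = PARSERS[log_type]
    records = []
    for line in lines:
        record = parser(line)
        if record is not None:
            records.append(record)
    return records


a = normalize("syslog_style", ["2005-02-23 10:15:02 ERROR mail01 Disk quota exceeded"])
b = normalize("pipe_style", ["web02|warn|2005-02-23 10:16:40|Slow response"])
print(a)
print(b)
```

Both inputs come out with the same four keys, so a single viewer can display them side by side; supporting another log type in this sketch means adding one more entry to `PARSERS`.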

dW: So it keeps the overhead down.

MN: Overhead is way down. It's a really neat technology.

dW: Before we wrap this up, what are you doing for NASA?

MN: I went out to NASA to work with a team from Oceanography and Atmospheric Sciences. . .

dW: NOAA? National Oceanic and Atmospheric Administration?

MN: They're at the Goddard Space Flight Center, and they're grabbing information from 16 satellites that orbit the earth on a daily basis and they map every point on the earth at least once during the day.

They have all this information coming in from satellite feeds and they have to take the information and then build the 3D models and the different weather maps and pretty much everything you see when you see a weather map or you see a 3D model of a hurricane. These models actually have multiple layers so that you can see them from the 90-billion mile view or you can click a button and count down to see them from the 1-mile view.

They store that information to make climate models. And they keep that information for 100 years. That's their practice. Petabytes of data.

dW: Petabytes?

MN: Petabytes. Today they store it to tape and have managed this process with several very large storage area networks. I was able to observe these servers and their technicians. Robotic arms put tapes in the storage servers. In some cases the tapes are just defective and in other cases the robotic arm misses the storage server slot. Several types of errors can occur during the storage process.

Their intent was to find a way to monitor those problems and respond to them as automatically as possible because, currently, they have several people that sit at Goddard and watch their computers, just looking for problems. The idea would be to relieve them of this problem, that of managing all this data. This project is to explore avenues to make this work and they've assigned this as a Master's project for a graduate student at the University of Maryland.

dW: Thank you for your time.



Resources



About the author

developerWorks staff



