Part of the enduring appeal of The Good, The Bad, and the Ugly, the best-known of Sergio Leone's "spaghetti western" movies, is the simplicity of its plot. At its heart, it is simply a treasure hunt, which involves three characters we identify as good, bad, and ugly as they try to outwit each other in their search for Confederate gold. Only their combined knowledge will lead them to the prize, which is buried along with a corpse somewhere in the vast graveyard known as "Sad Hill." Toward the end of the film, the "ugly" character named Tuco, played by Eli Wallach, finally arrives at the graveyard, and here, in one of cinema's truly great moments, backed by an unforgettable score, he begins his frantic and haphazard search for the grave of Arch Stanton.
What is impressive about this scene is the sheer scale of the graveyard -- it appears to contain at least 2,000 graves, arranged in a circular fashion. Tuco starts digging at the center of Sad Hill, where the graves are oldest, and works his way outward, initially in concentric circles. But soon his search becomes more haphazard as he despairs at the magnitude of the task. Finally, after a dizzying scene of graves passing by, he comes upon Arch Stanton's grave, only to find it devoid of any gold. Indeed, all that's there is a rotting corpse.
We were reminded of this story in thinking about the main challenge regarding reusable asset consumption (RAC): identifying significant content for a given context. Alone, the main characters in the movie are unable to locate the gold, because there are too many graves in the graveyard; or, in the case of RAC, there are too many "graveyards" (i.e., repositories). We feel too much like Tuco, looking for gold. In the case of reusable assets, how many of us would stop digging and start building our own components?
In software project development, without adequately delimiting the scope of the search we are unable to identify a meaningful set of reusable artifacts. Of course, this dilemma arises in many human endeavors, from building design, to rocket science, to patent law. How do we provide the right assets, the content, to help solve the problem at hand, the context? What we need is a way to automate a context-to-content mapping to suggest the best assets for our architects, engineers, and lawyers.
Automating context-to-content mapping
As a software developer or a consultant on an engagement, you work in a specific context. The context is provided by the scope of the project and by the functional and non-functional requirements for that project. The scope of a project may be determined by the industry you are in -- insurance, finance, telco, etc. -- and by what architectural style you use -- service-oriented architecture (SOA), client/server, distributed, etc.
Now, here is the idea regarding reusable assets: As a practitioner, you would like to be directed to the content relevant to your context. For example, on an insurance project, a functional requirement around creating a claims system can be mapped to reusable software assets such as an insurance Unified Modeling Language (UML) model previously designed for a claims system. A nonfunctional requirement, such as a transactional claims process, can map to another type of reusable software asset such as a software pattern to help make consistent architectural decisions.
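To make the idea concrete, here is a minimal sketch of a context-to-content lookup, assuming that both contexts and assets can be described with simple tags. The `Asset` class, the catalog entries, and the `suggest_assets` function are all hypothetical names invented for illustration; they do not describe any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str        # e.g. "uml-model" or "pattern"
    tags: frozenset  # context tags the asset is relevant to


# Hypothetical catalog; the entries are illustrative only.
CATALOG = [
    Asset("Insurance claims UML model", "uml-model",
          frozenset({"insurance", "claims"})),
    Asset("Transactional process pattern", "pattern",
          frozenset({"transactional"})),
    Asset("Telco billing UML model", "uml-model",
          frozenset({"telco", "billing"})),
]


def suggest_assets(context, catalog=CATALOG):
    """Rank assets by how many context tags they share; drop non-matches."""
    context = set(context)
    scored = [(len(asset.tags & context), asset) for asset in catalog]
    scored = [(score, asset) for score, asset in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable sort, best match first
    return [asset for _, asset in scored]


if __name__ == "__main__":
    for asset in suggest_assets({"insurance", "claims", "transactional"}):
        print(asset.kind, "-", asset.name)
```

Tag overlap is the crudest possible relevance measure. It is enough to show the shape of the mapping, but it says nothing yet about where the assets live, which is the problem discussed next.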
But how do we automate this context-to-content mapping so that software is developed in a consistent manner and reusable assets, such as models and patterns, become easier to consume? The problem with implementing such a vision is that we quickly run into what we term "the multiple buckets" problem.
Imagine that at home you organize all your clothes into buckets. Each bucket has a different type of clothing in it -- one bucket contains all the shoes; another bucket contains all the pants; another has all the jackets; another the dresses. Now let's imagine for a moment that these are very sophisticated buckets. These buckets have the ability to classify all of the items contained within them, as well as to match items together. For example, the shoe bucket has the ability to classify all of the shoes by color and size as well as matching all of the left shoes with their corresponding right shoes. To top it off, these buckets each have a shiny display panel on the front that gives detailed information about any item in the bucket. So with the bucket of shoes you can see from the front panel all the information that you need about a certain shoe, such as when you bought it, how often you have worn it, if it has laces or not, etc.
This is all fine and dandy, but you run into a problem when you are invited to a prestigious conference to give a presentation. This is an important conference and you are one of the guest speakers. You would like to send a message to all your clothes buckets, "Please give me an ensemble to make me look good for this conference presentation!" But your clothes buckets will only look blankly at you, with the shiny digital displays beeping and making you nervous as you frantically root through one bucket after another. All you want is an outfit that will make you presentable at the conference. You don't want to give a talk on enabling asset consumability wearing your favorite Harley Davidson leather jacket along with plaid pants and sneakers. The problem is, each bucket is smart about its own contents but knows nothing about the other buckets.
Repository-based information management systems can be understood by this same multiple-buckets concept. You manage different kinds of assets in separate repositories so that you can take best advantage of their specificity, which works fine when you are only focusing on one kind of asset. But when you group elements of different kinds, as in our dressing analogy, you can't ask each bucket to record information about ensembles that span several buckets. That would soon become a maintenance nightmare: every time you threw away a pair of shoes or a shirt, several buckets would have to be updated.
Another solution could be to have all the elements in a single, generic bucket, but this would sacrifice a lot of flexibility as we would no longer be able to select an element based on one of its specific family characteristics (like shoes without laces, a shirt with short sleeves, etc.).
A better solution is to maintain the different buckets while federating them with a kind of "uber" bucket that manages the clothes' metadata, such as which jacket goes with which pants, which shirt, etc., as well as keeping track of where and in which particular clothes bucket the actual jackets, pants, and shirts are physically located.
This uber bucket would also need to keep track and learn from the choices you make. When you ask the uber bucket for a clothing ensemble to wear to give your presentation at a conference, the uber bucket provides a host of choices in terms of combination of jackets, pants, and shirts. When you finally make a selection deciding on a particular jacket with particular pants with a particular shirt, you want the ability to capture and record this information as well. This kind of traceability becomes critical in any kind of subsequent impact analysis you want to do, and will also help you get better guidance the next time you search for attire for the same kind of event.
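As a thought-experiment aid, the uber bucket described above can be sketched as a small metadata index. It stores no clothes itself, only where each item lives, which items go together, and which choices were made. The class and method names below are hypothetical, and treating "goes with" as symmetric is our own simplifying assumption.

```python
from collections import defaultdict


class UberBucket:
    """Federating index: holds metadata only, not the items themselves."""

    def __init__(self):
        self.location = {}                 # item -> name of the bucket storing it
        self.goes_with = defaultdict(set)  # item -> items it can be worn with
        self.history = []                  # recorded choices: (event, items)

    def register(self, item, bucket):
        self.location[item] = bucket

    def associate(self, a, b):
        # Assumption: compatibility is symmetric.
        self.goes_with[a].add(b)
        self.goes_with[b].add(a)

    def remove(self, item):
        # Throwing an item away means cleaning up in one place only.
        self.location.pop(item, None)
        for other in self.goes_with.pop(item, set()):
            self.goes_with[other].discard(item)

    def record_choice(self, event, items):
        # Traceability: remember what was actually worn to which event.
        self.history.append((event, tuple(items)))

    def past_choices(self, event):
        return [items for e, items in self.history if e == event]


if __name__ == "__main__":
    uber = UberBucket()
    uber.register("blazer", "jackets")
    uber.register("wool trousers", "pants")
    uber.associate("blazer", "wool trousers")
    uber.record_choice("conference", ["blazer", "wool trousers"])
    print(uber.past_choices("conference"))
```

Because associations live only in the index, discarding a jacket never requires touching the pants bucket, and the recorded history is what a later impact analysis or recommendation would draw on.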
To be really usable, this uber bucket would also need to satisfy the following set of non-functional requirements:
- Manage complexity of associations. Back to the clothes buckets. Clearly I need to be able to manage and maintain all the associations between all my clothes -- e.g., which pants go with which jackets, etc. My wardrobe will grow as I get older and, presumably, have more money to buy fancier clothes, so the management of these associations will become more and more important. Furthermore, I also need to keep track of the instance association data as described above.
- Scalability and performance. No matter how the uber bucket is built, it will have to be scalable as the number of buckets (bucket of jackets, shirts, pants, shoes, scarves, etc.) and the number of items in those buckets (the number of jackets I have in the jackets bucket) increase. There is a direct link here between the number of physical items and their associated metadata. Also -- and this is a no-brainer -- the uber bucket will have to perform. That is, when I ask "what is the best ensemble to make me look good for the conference," I would like the answer before I go to the conference.
- Validation as well as constraint and consistency checking. This relates to the old chestnut, "garbage in, garbage out." The uber bucket needs to be self-governing in that it needs a way of validating the information that it contains. Also, it needs to be mindful of the kind of information we are putting in. Since this uber bucket manages clothing metadata, it is pointless to put in metadata about your car or your dog's vaccination schedule.
- Automatic inferencing. Since we are going to all this trouble of collecting all this metadata, the last thing we want is for it to be just sitting there. We want this data to be working for us. So, if I know leather jackets can be worn with denim pants and denim pants can be worn with biker boots, then by inferencing, leather jackets can be worn with biker boots.
- Standards-based. Finally, whatever implementation we come up with needs to be based on standards. This may seem like another no-brainer, but there are a lot of proprietary buckets out there.
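Two of the requirements above, validation and automatic inferencing, lend themselves to a short illustration. The sketch below assumes a fixed vocabulary of clothing kinds and treats "can be worn with" as both symmetric and transitive, exactly as in the leather jacket example. The function names and the vocabulary are hypothetical, and a real implementation would more likely delegate this work to a standards-based reasoner than to hand-written code.

```python
ALLOWED_KINDS = {"jacket", "pants", "shirt", "shoes"}  # assumed vocabulary


def validate(item_kinds):
    """Return the items whose kind falls outside the clothing vocabulary."""
    return sorted(item for item, kind in item_kinds.items()
                  if kind not in ALLOWED_KINDS)


def infer_goes_with(pairs):
    """Symmetric, transitive closure of 'can be worn with' pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    closure = {}
    for start in graph:
        seen, stack = set(), [start]
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            for nxt in graph[node]:
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure


if __name__ == "__main__":
    stated = [("leather jacket", "denim pants"),
              ("denim pants", "biker boots")]
    print(infer_goes_with(stated)["leather jacket"])
    print(validate({"leather jacket": "jacket",
                    "vaccination schedule": "dog-record"}))
```

Only two associations were stated, yet the closure also pairs the leather jacket with the biker boots. Whether every association really should be transitive is a modeling decision; the sketch simply follows the example as given.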
The only thing worse than not being able to ask your uber bucket meaningful questions is having an empty uber bucket. If I am forced to root through each of my buckets and pull the items out, one by one, and hand-populate their metadata into the uber bucket, I will end up hating this system and will not use it. Therefore, in order for this to work I need a way of pre-populating the uber bucket with metadata for the items in all of the buckets I seek to manage. I must also be sure that all the data needed for a search is included in this pre-population effort. Not being able to retrieve attire from my bucket would make it useless. This requires careful definition and description of the clothes being maintained in the buckets.
Finally, and this is where the rubber meets the road, you need a meaningful and imaginative way of presenting this information to the user. In other words, you need to build an easy-to-understand view of the context-to-content mapping into your everyday tooling. In the bucket analogy, the context was "Looking good for a presentation at a conference," and you wanted to know "What is the most appropriate clothing ensemble for this event?" However, depending on the size of your wardrobe this may be a lot of information. You may have fifteen jackets that potentially match numerous pants and shoes, but some shoes do not go with some jackets. Ultimately, the user has to make a choice between all the different possible combinations of pants, jackets, etc., and therefore the information must be presented in a way that makes it as useful as possible to the user. One way of doing this would be to show the combinations and permutations graphically, showing which shirt goes with which pants and which shoes, so that you can make meaningful choices and avoid showing up in a leather jacket with plaid pants and sneakers. This will also enable you, when you choose your favorite jacket, to see at a glance which pants and shoes you can or cannot wear with this jacket.
We have deliberately kept this article at a high, thought-experiment level, which allows us to think through these problems and not be hindered by the arbitrary constraints of any technological implementation. As engineers, we too often think about solutions to problems in terms of the implementation we know. In future articles, we will explore potential implementations of these ideas using semantic Web technology with concrete examples of these ideas in context-to-content scenarios around matching non-functional requirements to software patterns.
In this article we used a context-to-content approach to help address the problems of asset consumability. Inherent in the context-to-content approach is the problem we described in the multiple-buckets analogy, and we looked to overcome this problem by outlining a solution that used an uber bucket to help manage and maintain complex associations between assets. In future articles we will come back and provide a solution to some of the questions we posed.
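Before anything can be drawn, the candidate ensembles have to be computed. The following sketch enumerates one item per bucket and keeps only the combinations whose items are pairwise compatible, optionally restricted to a favorite item. The `ensembles` function, its parameters, and the sample wardrobe are hypothetical, and requiring every pair to be explicitly compatible is our own assumption.

```python
from itertools import combinations, product


def ensembles(buckets, compatible, must_include=None):
    """Yield one-item-per-bucket combinations whose items all go together.

    buckets:      dict mapping bucket name -> list of items
    compatible:   set of frozenset pairs that can be worn together
    must_include: optional item every ensemble has to contain
    """
    names = sorted(buckets)
    for combo in product(*(buckets[name] for name in names)):
        if must_include is not None and must_include not in combo:
            continue
        if all(frozenset(pair) in compatible
               for pair in combinations(combo, 2)):
            yield dict(zip(names, combo))


if __name__ == "__main__":
    wardrobe = {
        "jacket": ["leather jacket", "blazer"],
        "pants": ["plaid pants", "wool trousers"],
        "shoes": ["sneakers", "oxfords"],
    }
    matches = {frozenset(p) for p in [
        ("blazer", "wool trousers"),
        ("blazer", "oxfords"),
        ("wool trousers", "oxfords"),
        ("leather jacket", "sneakers"),
    ]}
    for outfit in ensembles(wardrobe, matches):
        print(outfit)
```

A brute-force product like this grows multiplicatively with the size of each bucket, which is one reason the scalability requirement matters. A graphical view would sit on top of results like these rather than replace the computation.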

Celso Gonzalez has been working on Software Engineering for the last thirteen years. As one of the Rational World Wide Architecture Management leaders, he provides expertise in domains ranging from Business Modeling to J2EE development, including Requirements Management, Architecture, and Design, to IBM customers and internal resources. Lately he has focused on Java EE development and accelerating development using patterns-based engineering. Before joining the Worldwide Community of Practice team he was part of the IBM Rational Unified Process (RUP) development team, where he contributed to the Business Modeling, Requirements, Analysis and Design, and Legacy Evolution areas. He holds degrees in Computer Science, Mathematics and Philosophy.

Dr. Eoin Lane, Senior Solution Engineer, is the lead for harvesting and developing application patterns from key IBM SOA engagements and driving those patterns through the IBM pattern governance process to accelerate adoption. Eoin also specializes in the use of Model Driven Development (MDD), asset-based development, and the Reusable Asset Specification (RAS) to facilitate SOA development.