Comparing and merging UML models in IBM Rational Software Architect: Part 10. Realign your models after migration or transformation

This article addresses issues of identity mutation as the result of migration to new versions of IBM® Rational® Software Architect across multiple streams, successive regeneration of a model through transformation, or successive model imports. It offers solutions that you can incorporate in your day-to-day work for a smooth and repeatable model-driven development workflow.

Kim Letkeman (kletkema@ca.ibm.com), Development Lead, Modeling Compare Support, IBM Rational

Kim joined IBM in 2003 with 24 years in large financial and telecommunications systems development. He is the development lead for the Rational Model-Driven Development Platform. His responsibilities include UML and EMF compare support, integrations with ClearCase, CVS, Jazz and RAM, domain modeling, patterns, transform core technology, transform authoring for both model to text and model to model transformations, and test automation.



19 October 2010

Introduction

This article explores alignment issues between successive generations of a model in which element identities have changed or mutated.

An immutable element identity is critical for at least two reasons:

  • References from other models must have a unique name to serve as a "key" by which to find a specific element.
  • Compare-and-merge operations must be able to match elements in different generations of the same model, regardless of whether the elements have changed name or location.

It is a serious issue when element identity changes. Suddenly, references between models are invalid, so parallel development can no longer rely on strong compare-and-merge support. In other words, if you change the element's key, the tools treat it as a new element. The previous version of the element is seen to have been deleted.
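To make the "key" idea concrete, here is a minimal sketch of a cross-model reference that stores only the identity of its target. The class name, the map-based model, and the ID strings are all hypothetical stand-ins; they are not how Rational Software Architect stores models.

```java
import java.util.Map;
import java.util.Optional;

// Minimal sketch: a model is reduced to a map of element ID -> element name.
class ReferenceDemo {
    // Resolve a reference that another model holds to one of our elements.
    static Optional<String> resolve(Map<String, String> model, String referencedId) {
        return Optional.ofNullable(model.get(referencedId));
    }

    public static void main(String[] args) {
        String reference = "id-001"; // stored in a second, referencing model

        // The element was renamed, but its identity is unchanged.
        Map<String, String> renamed = Map.of("id-001", "CustomerAccount");
        System.out.println(resolve(renamed, reference).isPresent()); // true

        // The model was regenerated, so the same element has a new identity.
        Map<String, String> regenerated = Map.of("id-9f3", "Customer");
        System.out.println(resolve(regenerated, reference).isPresent()); // false
    }
}
```

A rename leaves the reference intact, whereas a regenerated identity leaves it dangling even though the element is conceptually the same.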

Several common procedures or workflows will mutate identities as a normal part of their operations. This introduces a break in the lineage of the models, leading to a lack of comparability. This lack of comparability effectively destroys the modification history, so the workflow becomes unstable.

The following sections address the two IBM® Rational® Software Architect technologies that you can use to work with models of different or broken lineage. The article focuses mainly on the Model ID Alignment technology.


Problems of broken lineage

There are several workflows that might need to address models of broken lineage.

Transformation

The layers in Model-Driven Architecture (MDA), as shown in Figure 1, include computationally independent models, platform independent models, platform-specific models, and software code. The process of moving downwards or upwards from one layer to another layer typically crosses modeling domains. This crossing of domains is often automated using transformation. Adding automation to MDA is typically called model-driven development, or MDD.

Model-driven development (MDD) is a concept whereby a system or application (typically in software, but it can really be anything) is visually modeled by using one or more modeling languages and then implemented with code, often written in a third-generation language that targets a specific runtime environment.

Figure 1. Transformations in Model-Driven Architecture (MDA)
MDA layers with indicated fusion points

When transforming across domains, such as from a process model to a service model or from Java™ code to a UML model, element or model identities can be created or mutated if the two domains do not share identities when representing the same entity. For example, when Java™ code is generated by a Rational Software Architect transformation, element identities can be inserted into the code as comments or as annotations, so the identities can be preserved when code is later refactored. In such a case, it is possible to make changes on either side and to make sense of the changes during both forward and reverse engineering.
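As an illustration of an identity that travels with the code, the following sketch uses a made-up @ModelId annotation. The annotation name and the ID value are assumptions for this example only; the markers that the product actually generates are not shown in this article.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class ModelIdReader {
    // Return the model identity recorded on a class, or null if there is none.
    static String idOf(Class<?> type) {
        ModelId marker = type.getAnnotation(ModelId.class);
        return marker == null ? null : marker.value();
    }

    public static void main(String[] args) {
        System.out.println(idOf(CustomerAccount.class));
        System.out.println(idOf(String.class));
    }
}

// Hypothetical marker annotation that carries the model element's identity.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ModelId {
    String value();
}

// Suppose this class was originally generated as "Account" and later renamed.
// The identity is unaffected by the rename because it lives in the annotation.
@ModelId("_hypothetical-guid-0001")
class CustomerAccount {}
```

Because the identity is read from the marker rather than derived from the class name, a reverse transformation could still match the renamed class to its original model element.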

However, in cases where the code is used to generate and maintain a UML representation, as shown in Figure 2, there will be no identities to preserve. Thus, new identities will be created on each execution of the transformation. If the models are to be useful for comparison purposes, the alignment of the newly generated (source) model to the previously existing (target) model must re-establish element identity.

Figure 2. Successive transformation between domains
Transformation and re-transformation in a stream

Model fusion (described later) is built into many transformations, and it is most often used when user intervention is expected. But if a development or build process must remain purely automatic, then model alignment followed by an automated merge would be preferred over fusion.

Migration

Terminology

The author uses Unified Change Management (UCM) terms such as "stream" as shorthand for grouping artifacts under development according to a common timeline (to designate the same group of files at the same versions). You might prefer to use "branch" or "context" to designate the artifacts that are grouped and managed together as they move forward in time.

When moving models across software versions, the UML metamodel and the notational metamodel both typically change. Examples of identity-mutating migrations are from IBM® Rational Rose® or IBM® Rational® XDE® to Rational Software Architect or from Rational Software Architect Version 6 to Rational Software Architect Version 7. Moving from XDE to Rational Software Architect, for example, changes the underlying metamodel from UML 1 to UML 2. This is a massive change with many changes to elements and many new elements, especially within the notational metamodel.

An example of the scope of the changes when moving to Rational Software Architect software is a customer migration that we performed with 100 individual models in two streams. One was a product variant of the other, as illustrated in Figure 3, but with two streams.

Figure 3. Multiple stream model migration between tooling releases
Migration from Version 1 to 2 on three streams

The number of differences per model between streams averaged 10 in the XDE stream set. After migration, the Rational Software Architect stream set showed an average of 10,000 differences per model. Obviously, because nothing else had changed, each model had almost 10,000 new (and quite superfluous) changes.

This customer's issue identified a critical need for a solution. We developed the Model ID Alignment tool in response.

In Figure 3, all models from Stream 1 are copied to Stream 2 and opened in the new tool, allowing the automated migration to take place. Each software configuration management (SCM) system provides this private working area for the new models in a slightly different way, but in all cases the models will be opened in the new tool and end up in a Version 2 "stream."

Import

A variation on migration is the successive importing of a model from an older version of the software. The example shown in Figure 4 was encountered and solved on a customer site. A model is migrated to Rational Software Architect from Rational Rose software and then maintained in both tools while the overall migration is evaluated and then staged.

Figure 4. Re-import when maintaining models in different tools
Import with fusion after changes in Rose and RSA

Changes made in the Rational Rose stream are to be reimported into the Rational Software Architect stream. But this time, the changes from the incoming model must be merged into the Rational Software Architect version of the model that has changed. The fusion feature works well here, because user intervention will be necessary to figure out potential conflicts between changes based on each of these tools.

Advanced workflow

It is theoretically possible to maintain multiple product variation streams in a single release. It is also theoretically possible to maintain both stream sets in both Rational Rose and Rational Software Architect while migrating. This is an advanced scenario that would use the import workflow described in this section to take changes from Rational Rose into the Rational Software Architect stream. It might then use the migration scenario to propagate the identities across multiple stream sets.

The actual flow is simple:

  1. After importing the model, rename the model file and check out the existing version.
  2. Then combine models with the newly imported model as the source model and the checked-out model as the target. See the section "Solutions in Rational Software Architect," which follows, for further discussion.

Solutions in Rational Software Architect

Rational Software Architect provides two methods to solve the general problem of models that have similar structures, yet have identities that are not aligned: model fusion and model ID alignment.

Model fusion

Fusion is also known as structural merging. Structural merging is driven by the qualified names and metamodel types (a classifier, for example) that represent the structure of a model. Elements are organized into a hierarchy of packages, and everything exists in a name space. A model might consist of a high-level package M with package P and an element, E. Thus, element E would have this unique address: /M/P/E

If you move the element to package P2, the element's address has changed to /M/P2/E. But it is not possible to determine that this change is a move, rather than a delete-and-add or even a rename, because all three operations can produce the same path for element E.
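The ambiguity can be seen in a small sketch that works only with structural addresses. The helper names are hypothetical, and sets of path strings stand in for real models.

```java
import java.util.Set;
import java.util.TreeSet;

class QualifiedNameDemo {
    // Build the structural address of an element from its package path and name.
    static String address(String name, String... packages) {
        return "/" + String.join("/", packages) + "/" + name;
    }

    // Addresses present in 'after' but not in 'before'.
    static Set<String> added(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return result;
    }

    public static void main(String[] args) {
        Set<String> before = Set.of(address("E", "M", "P"));

        // Whether E was moved to P2, or deleted and then re-created in P2,
        // the resulting model contains exactly the same address.
        Set<String> afterMove = Set.of(address("E", "M", "P2"));
        Set<String> afterDeleteAndAdd = Set.of(address("E", "M", "P2"));

        System.out.println(added(before, afterMove));
        System.out.println(added(before, afterDeleteAndAdd));
        System.out.println(afterMove.equals(afterDeleteAndAdd));
    }
}
```

With nothing but addresses to compare, both histories produce the same result, which is why a purely structural merge has to guess.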

Fusion presents the two models side by side and attempts to "guess what happened." Fusion has a manual matching technology built in so that you can change the alignment, based on knowledge of the model and the actual changes. When the models are correctly aligned, you can click the OK button to copy the information (all selected changes) from the source model (new, incoming) to the target model (existing).

This feature is accessed in two ways:

  • The fusion dialog is built into most transformations that cross domains and generate a UML model, for example the Java-to-UML transformation. It is launched after the intermediate (source) model has been created. When the dialog is dismissed without canceling the operation, selected changes from the source model are copied into the target model.
  • The fusion dialog can also be launched ad hoc by selecting a pair of models and selecting the Combine Models command from the context menu, as shown in Figure 5.
Figure 5. Accessing the Combine Models dialog window
Combine Models selected on the drop-down menu

Fusion performs a two-way merge of the structure of the models, matching by identities and qualified names, in that order. In other words, if the element E has the same identity in both source and target models (as shown in Figure 6 on the left and right of the dialog), then no matter where element E ended up in the source model, the fusion dialog will accurately denote the type of change.

Figure 6. The Combine Models dialog window
Visual Combine window for two models
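The matching order described above (identity first, qualified name second) can be sketched as a two-pass loop. This is an illustration under simplifying assumptions, not the product's matching algorithm: the Element class is hypothetical, and metamodel types are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FusionMatchSketch {
    static final class Element {
        final String id;
        final String qualifiedName;
        Element(String id, String qualifiedName) {
            this.id = id;
            this.qualifiedName = qualifiedName;
        }
    }

    // Remove and return the first candidate accepted by the test, or null.
    static Element take(List<Element> candidates, Predicate<Element> test) {
        for (Element c : candidates) {
            if (test.test(c)) {
                candidates.remove(c);
                return c;
            }
        }
        return null;
    }

    // Returns source qualified name -> target qualified name for matched pairs.
    static Map<String, String> match(List<Element> source, List<Element> target) {
        Map<String, String> matches = new LinkedHashMap<>();
        List<Element> remaining = new ArrayList<>(target);
        List<Element> leftovers = new ArrayList<>();

        // Pass 1: match by identity, wherever the element now lives.
        for (Element s : source) {
            Element hit = take(remaining, t -> t.id.equals(s.id));
            if (hit != null) {
                matches.put(s.qualifiedName, hit.qualifiedName);
            } else {
                leftovers.add(s);
            }
        }
        // Pass 2: fall back to the qualified name for whatever is left.
        for (Element s : leftovers) {
            Element hit = take(remaining, t -> t.qualifiedName.equals(s.qualifiedName));
            if (hit != null) {
                matches.put(s.qualifiedName, hit.qualifiedName);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Element> source = List.of(
                new Element("id-1", "/M/P2/E"),  // moved, identity preserved
                new Element("id-9", "/M/P/F"));  // regenerated, identity lost
        List<Element> target = List.of(
                new Element("id-1", "/M/P/E"),
                new Element("id-2", "/M/P/F"));
        System.out.println(match(source, target));
    }
}
```

In this sketch the moved element is matched by its identity, so the change can be reported as a move, while the regenerated element is matched only because its qualified name happens to be unchanged.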

For more information, please read Part 8 of this series: Comparing and merging UML models in IBM Rational Software Architect: Ad-hoc modeling – Fusing two models with diagrams.

Model ID alignment

This function is accessed by selecting the Window > Show view drop-down menu items (see Figure 7).

Figure 7. Accessing the Model ID Alignment view
Model ID Alignment selected in Show View window

Model ID Alignment realigns identities to a known-good baseline model, thereby establishing or restoring the lineage of the new model. It does this by creating a database (essentially a map) for each model root and storing a lookup key for every model element with the associated identity (a globally unique identifier, or GUID). If an element is not found in the database, it is presumed to be new, so it is added to the database.

When a model is aligned with the element identity database, each element in the new model is looked up by a key that is generated from the element's properties and relationships. This key is used to find the identity of its ancestor elements. If an identity is found in the database, it replaces the unaligned identity in the model. If an identity is not found, then the element's existing GUID is used (or a new one if it is missing) and also entered into the database. Thus, each time a model is aligned, the database gets updated to include all elements. In this way, all successive generations can be realigned and lineage can be maintained even when the model is growing and changing in other contexts.

This process is depicted in Figure 8.

Figure 8. Model alignment process
Diagram of how successive model generations are aligned

Figure 8 shows the two-stage process for model alignment on a series of streams that contain successive generations of models. In Stage 1, the model identities (GUIDs) from the first group of migrated models are captured into the database with an encoded look-up key. In Stage 2, in the second stream (see Figure 9), matched element IDs are searched by look-up key so that new GUIDs can replace the existing GUIDs. Look-up keys that are not found result in the new GUID being inserted into the database. Figure 8 shows one more alignment of a stream that descends from the second stream. This is a second run of Stage 2, where another set of look-up keys is used to search for GUIDs. If the models in the third stream contain GUIDs that were added to the models in the second stream, those GUIDs are available for alignment from the prior running of Stage 2. In this way, as many generations as necessary can be migrated and aligned.

Figure 9. Model ID Alignment view, stage selection
Selecting Stage 2 in
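The two stages can be sketched with plain maps. This is a simplified illustration, not the tool's implementation: the real look-up key is generated from an element's properties and relationships, whereas this sketch assumes that a qualified name is a good enough key, and the GUID strings are placeholders.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class IdAlignmentSketch {
    // Stage 1: capture look-up key -> GUID from the baseline model.
    static Map<String, String> buildDatabase(Map<String, String> baseline) {
        return new HashMap<>(baseline);
    }

    // Stage 2: return an aligned copy of the model and grow the database.
    static Map<String, String> align(Map<String, String> model, Map<String, String> database) {
        Map<String, String> aligned = new LinkedHashMap<>();
        for (Map.Entry<String, String> element : model.entrySet()) {
            // Known key: reuse the recorded GUID. Unknown key: keep the
            // element's own GUID and record it for later generations.
            String guid = database.computeIfAbsent(element.getKey(), k -> element.getValue());
            aligned.put(element.getKey(), guid);
        }
        return aligned;
    }

    public static void main(String[] args) {
        Map<String, String> baseline = new LinkedHashMap<>();
        baseline.put("/M/P/E", "guid-A");

        Map<String, String> regenerated = new LinkedHashMap<>();
        regenerated.put("/M/P/E", "guid-X");   // same element, mutated identity
        regenerated.put("/M/P/New", "guid-Y"); // genuinely new element

        Map<String, String> database = buildDatabase(baseline);
        System.out.println(align(regenerated, database));
        System.out.println(database.containsKey("/M/P/New"));
    }
}
```

After the Stage 2 call, the existing element carries its baseline GUID again, and the new element's GUID is in the database, ready for the next stream.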

Example: Model ID Alignment in action

In this section, you create a model and then create successive generations of it to see what a model looks like when it gets misaligned and to see the Model ID Alignment view in action.

Step 1. Create a model that needs realignment

You'll work with a trivial-sized model so that the information is not lost in a sea of details. Figure 10 shows the initial model.

Figure 10. Initial test model
A model with 3 elements and 2 relationships
  1. You will create two copies: one that is already aligned and one that is not.
    1. To create the aligned model, simply copy and paste the actual file. This simulates "checking out" a new version of a model in a software configuration management (SCM) system, such as IBM® Rational® ClearCase® software.
    2. To create the unaligned model, you create a new model with the same name. Then you copy all of the contents under the root node in the original model and paste them under the root node in the new model. This operation reassigns all GUIDs and creates a model that is the same as a model created from running a Java-to-UML transformation a second time.
  2. Activate the Customize View menu item by clicking the downward (expand) arrow at the top-right of the Project Explorer (shown in Figure 11).
  3. When the list of filters is displayed, please scroll down to the bottom and remove the check mark on the filter for UML model files. This will remove the filter on model files and display the model files in the Project Explorer so that you can copy and paste the model. The paste command brings up a dialog window asking you to rename the model (for this example, use test_2).

Figure 11 shows the menu item and the two model files after the copy-paste operation is finished.

Figure 11. Copy-and-paste of model files
Filter menu and two model files
  4. Select the Compare with > Each other command (see menu selections in Figure 12) to launch the comparison editor, which shows the differences in the two models according to their model identities.

Figure 13 shows the result of the comparison.

Figure 12. Compare command for two models
Drop-down menu choices: Compare with > each other
Figure 13. Compare editor results
There are no differences between the selected…

This result confirms that copying the file will simply make another version of the same model, as would checking the file out of ClearCase. Instead, you need a model that looks more like a model that has been transformed from Java and then transformed again later. This second transformation creates a model that looks identical to the output from the first transformation but has no identities in common.

  5. Next, remove the new Main diagram, copy the contents of the model named test, and paste them into test_3.

Figure 14 shows the model named test_3 that was created in Step 5 to simulate this second transformation.

Figure 14. Test_3 model with copied contents
Test_3 shown after paste, beside test and test_2
  6. Now, compare test_2 with test_3, and you should see significant differences (shown in Figure 15).
Figure 15. Comparison of Test with Test_3
Error in test, models from different ancestor

Oops! Creating a new model in this way results in models that are identical in structure but have completely different GUIDs, including that of the main package itself. That disallows identity comparison. Models like these (ones that represent unaligned identities) are normally compared by using Fusion.

So how do I show you here what the differences are? Simple; I cheat. To show you these differences, I copy the model at the file system level again, creating test_4. This time, though, I edit it and remove all of its contents except for the root node, replacing them with the contents of the original file: test. Again, I copy and paste everything under their respective root nodes. This model will be a strange hybrid of the correct model identity and newly generated contents.

This example is not useful as a scenario (it is not likely to come out of any typical workflow), but it shows you the number of superfluous changes that occurred with this simple transformation (see Figure 16).

Figure 16. Identity comparison of Test with Test_4
Test_4 has 10 differences from test

You can now see what happens when you generate a model by using a technique that creates new GUIDs. Even such a small model as this has 10 differences from the original. This model is similar enough in structure (identical, actually) that model alignment should remove all differences. The differences are quite sensible: five additions and five deletions. With mismatched GUIDs, that's the best that identity-based comparisons can do.
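The arithmetic behind that count can be sketched as two set differences. The IDs below are placeholders, and the sketch assumes that the ten differences correspond to five items (the three elements and two relationships of the test model), each reported once as a deletion and once as an addition.

```java
import java.util.HashSet;
import java.util.Set;

class IdentityDiffSketch {
    // Count IDs that are present in 'left' but missing from 'right'.
    static int missingFrom(Set<String> left, Set<String> right) {
        Set<String> only = new HashSet<>(left);
        only.removeAll(right);
        return only.size();
    }

    public static void main(String[] args) {
        // Five items with placeholder IDs, standing in for the test model.
        Set<String> original = Set.of("a1", "a2", "a3", "a4", "a5");
        // Same structure, but every ID was regenerated by the copy and paste.
        Set<String> regenerated = Set.of("b1", "b2", "b3", "b4", "b5");

        int deletions = missingFrom(original, regenerated);
        int additions = missingFrom(regenerated, original);
        // prints "5 deletions, 5 additions"
        System.out.println(deletions + " deletions, " + additions + " additions");
    }
}
```

An identity-based comparison has no way to pair any of the items, so every one of them is counted twice.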

A structural comparison, as shown in Figure 17, shows no differences between test and test_4, exactly as we would expect.

Figure 17. Structure comparison of Test with Test_4
Visual combine shows no differences

Step 2. Build the Model ID mapping database

  1. Going back to Test_3, which was the model that has all different GUIDs, start the realignment with the Stage 1 creation of the database file.
  2. Store this file in a folder anywhere in the network.

Do not treat it as an artifact that can be merged though, because it spans all versions of the files. Allow it to grow in perpetuity as more and more versions of the model are created. The more versions of the model involved, the more GUIDs it incorporates.

Figure 18 shows the full Model ID Alignment view when Stage 1 is selected. It is considerably larger than it appears at first.

  3. The default location for the ID Mapping Database folder is the Rational Software Architect workspace in use at the time. This is not a particularly good location for an enterprise resource such as the mapping database, so change it to something that is appropriate to the workflow.

Tips:
If this database is used for a specific model for alignment after regeneration by transformation, then there is no reason why the database cannot be stored in the same Eclipse project with the model root. However, if the database exists to serve migration, and if there might be more streams to align in the future (if the database participates in an ongoing workflow), then this database is a resource that should be on an enterprise file server that is backed up and kept safe for future use.

For this example, Figure 18 shows the project as the target location, because the database has no reason to exist after the project is removed. Also notice the use of the word "keep" in the file names. This is necessary so that ClearCase (if it is the SCM system in use) does not try to remove the files or put them under source control. That is not important until the alignment is finished and all models have their final names and are in their final locations.

Figure 18. Stage 1 Model ID Alignment view
Stage 1 alignment with model and database path
Button functions
A quick note about the buttons at the right of the large file list window:
  • Add Files allows individual models to be selected.
  • Add All Files is a quick way to select all *.emx files so that a database can be created for every model file in one step.
  • Remove and Remove All are housekeeping functions that allow changes to be made to the dialog contents.
  • Check duplicate IDs is useful to find model corruption in the form of elements that have identical identifiers. Such elements cause great confusion for the software functions, especially for the comparison tool. When duplicate IDs are found, they are repaired automatically.

Figure 19 shows the result of the Duplicate ID check when the model is not corrupted in this way.

Figure 19. Duplicate ID check result
Output shows no duplicate IDs found
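The kind of check that this button performs can be sketched as follows. The article does not describe how the tool repairs duplicates, so the repair strategy here (keep the first holder of an ID and give later holders a fresh UUID) is purely an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

class DuplicateIdSketch {
    // Return each ID that occurs more than once, listed a single time.
    static List<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String id : ids) {
            if (!seen.add(id) && !duplicates.contains(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    // Assumed repair: the first holder keeps the ID, later holders get a new one.
    static List<String> repair(List<String> ids) {
        Set<String> seen = new HashSet<>();
        List<String> repaired = new ArrayList<>();
        for (String id : ids) {
            String unique = id;
            while (!seen.add(unique)) {
                unique = UUID.randomUUID().toString();
            }
            repaired.add(unique);
        }
        return repaired;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("id-1", "id-2", "id-1");
        System.out.println(findDuplicates(ids));
        System.out.println(findDuplicates(repair(ids)).isEmpty());
    }
}
```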

There are three choices in the radio button group that is labeled Save Location of Models with aligned IDs:

  • The first is to overwrite the input file but save a backup.
  • The second is to save a new file with a relevant name.
  • The third is to output all of the files to a new folder.

The third option would make sense for a large group of files being converted, but for this test, the second is the simplest choice. It makes what is happening very clear.

At the bottom of the view is the button labeled Generate ID Mapping Files. Clicking this button causes a progress bar to be displayed (in this case, for a very short time). Then a window shows the log file (looking back at Figure 18, notice that the box for the option to show the log after completion was checked).

Figure 20 shows the result. There are 91 individual elements in this model, and all 91 kept their IDs. This makes sense, of course, because Stage 1 creates the database from the first model's IDs.

Figure 20. Model ID Alignment Stage 1 results
Screen output of alignment results for Stage 1

As Figure 21 shows, the output file and the database have been created.

Figure 21. Files in the project after Stage 1
Input, output, and map files in Project Explorer

Step 3. Align a new model with the mapping database

At this point, the input file (the file in Stream 1) has been crawled to create the identity database. The name of that file as shown in Figure 21 is a bit funky, because it incorporates the identity (a GUID) of the file, along with the file's original name in the map file name.

  1. The output file is identical to the input file, because this was simply Stage 1. It can be discarded.
  2. To run Stage 2 against the new model, which you will remember is test_3.emx, you must set up the GUI by selecting Stage 2 in the Operation field at the top of the view.
  3. You must also add the file name.

This is all shown in Figure 22.

Figure 22. Stage 2 Model ID Alignment view
Stage 2 GUI with alternate database field

A new button has appeared, as highlighted in yellow in Figure 22. You can use this button to tell the tool which database file to use. Because the GUID of the main package is different from that in the earlier version of the model, the automatically generated database file name will not match that created initially.

  4. So, use this button to select the database that was originally created for this file.

The button brings up a simple file selection dialog, as shown in Figure 23.

  5. Select the database file by drilling into the appropriate project or finding the alternate location that might have been chosen in Stage 1.
Figure 23. Select the alternate database
Dialog window with alternate database selected

As Figure 24 shows, the main view's ID Mapping Database Folder field and the trailing Browse button change now to signal the presence of an alternate database file and to provide the ability to remove the alternate mapping file.

Figure 24. Alternate database selected
Alternate db label and remove alt mapping button
  6. Now, click the Align Model Element IDs button at the bottom of the dialog shown in Figure 22 to perform Stage 2 alignment on the new model.
Figure 25. Log file after Stage 2
Log file showing 83 changed IDs

The log file that is displayed following Stage 2's completion (Figure 25) shows that 83 of 91 elements were updated. Comparing test with test_3.7.x.keep.emx (the output file from Stage 2) shows no differences in the data. Because of how this file was created, there is, in fact, a change to the name of the main package, which was created when the test file was created as test_3.

  7. Because test_4 is a hybrid file with the same package name as test but the same data as test_3, it makes sense to run alignment on that file also to see if all changes disappear. And they do, as Figure 26 shows.
Figure 26. Comparison result after aligning test_4
Result: no difference in the output file

At this point, the output file has established perfect lineage with the input file. Had there been a new element in the second stream (test_3.emx), its identity would now be in the database. And had that change been propagated further down the line (say, in the original stream set in IBM® Rational Rose® format), Stage 2 in the third stream would naturally align the new element, along with all the existing elements.

A little discipline is required to plot the specific order of alignment and to ensure that all the files are correctly managed with the appropriate databases, but that is a small price to pay for the power of realignment in a multistream environment.


Automation

There is an API for calling the alignment tool on multiple files at once. This can save considerable time in a two-stream scenario. In a multistream scenario, there is an issue: the API does not accept alternate database paths, which means that each generation is processed with only knowledge of the identities in the previous generation and not with knowledge of all generations of identities. Some combinations of changes might not align reliably in such a case.

If you are transforming or migrating models and want to automate the alignment (for example, to make the alignment part of a build script), the application interface shown in Listing 1 is available for use in a custom plug-in, perhaps called from an Ant script.

Listing 1. Application programming interface for model alignment
/**
* Align the model element IDs in the contributor file with the IDs of the
* matching elements in the base file. The aligned contributor model is
* output to the contributorOutputFile. The base output file is optional. It
* is usually the same as the base file unless the base file has an old format.
* 
* @param baseFile -
* Base input model file or fragment file (i.e., *.emx or *.efx)
* @param contributorFile -
* File to be aligned with the base file (*.emx or *.efx)
* @param baseOutputFile -
* Output of base file in current file format. Can be null.
* @param contributorOutputFile -
* Output of aligned contributor file. Can be the same as
* contributorFile.
* @return true if successful, false otherwise.
* @throws IOException
* @throws InterruptedException
*/
public boolean alignSingleModelFile(File baseFile, File contributorFile,
	File baseOutputFile, File contributorOutputFile)
	throws IOException, InterruptedException {}

Notice that the automation handles all files in a model individually. Fragments are aligned with fragments, and roots are aligned with roots. Every file must have a mate in the other stream. Therefore, automation is for repeatable scenarios where model structure is identical.

These are the parameters:

  • baseFile - The previous generation of the file being aligned. It is used to create the identity database.
  • contributorFile - The file being aligned, which must have the same identity as that of the previous generation. In other words, it must be a hybrid like the hybrid example used here to demonstrate the differences.
  • baseOutputFile - An optional path to write the current file if it was up-versioned for this alignment.
  • contributorOutputFile - A path for the output, which can be the same as the input if you prefer to overwrite.
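A build step that drives this method over two stream folders might look like the following sketch. The ModelAligner interface is a hypothetical wrapper introduced here because the article does not name the class that declares alignSingleModelFile; the folder layout and the pairing of files by name are also assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchAlignSketch {
    // Hypothetical holder for the method in Listing 1; the article does not
    // name the class or interface that actually declares it.
    interface ModelAligner {
        boolean alignSingleModelFile(File baseFile, File contributorFile,
                File baseOutputFile, File contributorOutputFile)
                throws IOException, InterruptedException;
    }

    // Align every model or fragment file in contributorDir against the file
    // of the same name in baseDir. Returns names that failed or had no mate.
    static List<String> alignAll(ModelAligner aligner, File baseDir,
            File contributorDir, File outputDir)
            throws IOException, InterruptedException {
        List<String> problems = new ArrayList<>();
        File[] candidates = contributorDir.listFiles(
                (dir, name) -> name.endsWith(".emx") || name.endsWith(".efx"));
        if (candidates == null) {
            throw new IOException("Not a readable folder: " + contributorDir);
        }
        Arrays.sort(candidates); // deterministic order for build logs
        for (File contributor : candidates) {
            File base = new File(baseDir, contributor.getName());
            if (!base.isFile()) {
                problems.add(contributor.getName()); // no mate in the other stream
                continue;
            }
            File output = new File(outputDir, contributor.getName());
            // baseOutputFile is optional, so null is passed here.
            if (!aligner.alignSingleModelFile(base, contributor, null, output)) {
                problems.add(contributor.getName());
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.out.println("usage: <baseDir> <contributorDir> <outputDir>");
            return;
        }
        // Stub aligner for a dry run; a real build would call the product API.
        ModelAligner dryRun = (base, contributor, baseOut, contributorOut) -> {
            System.out.println("would align " + contributor + " against " + base);
            return true;
        };
        System.out.println("Problems: " + alignAll(dryRun,
                new File(args[0]), new File(args[1]), new File(args[2])));
    }
}
```

Reporting files without a mate, rather than silently skipping them, keeps the build honest about the requirement that both streams have the same physical structure.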

Caveats

This Model ID Alignment tool is not without its limitations, so what follows is a summary of the caveats:

  • Fragmented models are supported in the GUI, but some tests show that the tool does not always find the elements if the structure has changed dramatically (for instance, if Stream 1 model is not fragmented and Stream 2 model is fragmented). It is better that the files have the same physical structure for the alignment process to be considered reliable.
  • The GUI does not automatically find the alternate database file, even if the name of the model files is identical. This forces the manual step of selecting the database every time. Of course, this also guarantees that subtle errors do not creep into the workflow in case there are models with the same names but different content.
  • The automation does not accept an alternate database file, so it cannot be used where the database of identities grows over time. Given that each successive model to align is generally richer, this is not always an issue. But if a model element is removed in an intermediate generation and left inside a later generation, the database created in the alignment of these two cannot contain the correct GUID (no long-term history). Therefore, the element might appear to be new after alignment (which is only correct with respect to the middle stream). If automation is desired, it might be preferable to annotate elements or rename elements to signal deprecation rather than to delete them.
  • The automation does not handle logical models in their complete forms. Instead, all files are aligned individually. Thus, it is imperative that the model structures be identical for automation to work.
  • The automation does not handle files with different root GUIDs. Those must be aligned before calling the alignment API.
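The long-term history caveat can be reproduced in a few lines. As before, this is a sketch with assumptions: qualified names stand in for the real look-up keys, the GUID strings are placeholders, and the pairwise database imitates what the automation has available when it sees only the previous generation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class LineageHistorySketch {
    // Replace GUIDs whose look-up key is known; record the rest as new.
    static Map<String, String> align(Map<String, String> model, Map<String, String> database) {
        Map<String, String> aligned = new LinkedHashMap<>();
        model.forEach((key, guid) ->
                aligned.put(key, database.computeIfAbsent(key, k -> guid)));
        return aligned;
    }

    public static void main(String[] args) {
        Map<String, String> stream1 = Map.of("/M/P/E", "guid-E1", "/M/P/F", "guid-F1");
        Map<String, String> stream2 = Map.of("/M/P/E", "guid-E2"); // F was removed here
        Map<String, String> stream3 = Map.of("/M/P/E", "guid-E3", "/M/P/F", "guid-F3");

        // Cumulative database: grows across every generation (the GUI workflow).
        Map<String, String> cumulative = new HashMap<>(stream1);
        align(stream2, cumulative);
        System.out.println(align(stream3, cumulative).get("/M/P/F")); // guid-F1

        // Pairwise database: built only from the previous, aligned generation.
        Map<String, String> pairwise = new HashMap<>(align(stream2, new HashMap<>(stream1)));
        System.out.println(align(stream3, pairwise).get("/M/P/F")); // guid-F3
    }
}
```

With the cumulative database, element F recovers its original identity; with the pairwise database, it keeps the regenerated identity and therefore looks new relative to the first stream.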
