The Cloud & Smarter Infrastructure (CSI, previously Tivoli) worldwide IT team recently upgraded from Tivoli Service Request Manager (TSRM), Tivoli Asset Management for IT (TAMIT) and Common Configuration Management Database (CCMDB) 7.2.1 to SmartCloud Control Desk (SCCD) 7.5. In addition to the version upgrade, the three products were renamed and brought together as a single product. While the actual upgrade required an outage of less than a day (primarily due to the required database upgrade), the planning and testing that went into it took several months. It also involved the efforts of a large group of people, not only from IT but also from our end users and from Tivoli development and support.
First, a little background on CSI IT’s use of SCCD is in order. TSRM, TAMIT and CCMDB were originally deployed by the Krakow lab in 2007 and Tivoli IT began using them in 2009. The original servers were deployed in the Krakow lab then moved to the Austin lab in 2012. SCCD is used by CSI IT to manage the support processes for all our services including build, configuration management, cloud, asset management, lab, network and security. In addition, we provide end user classifications to manage the workflows for a variety of other groups within CSI and Software Group. As with all of the CSI products we deploy, we believe we provide a great service to the CSI development, test, sales and support teams by being one of the first organizations to deploy new releases and by serving as a reference and demonstration for them.
We began looking at the upgrade at the end of 2012. We took a backup of our production database (approximately 50GB) and restored it on our test server. We then went through the upgrade process several times to work out the kinks, as we ran into several issues with the upgradedb process. The first issue, PMR 64459,820,820, was caused by circular dependencies within the IBM management chain. We synchronize IBM BluePages with the PERSON table, and there are instances, primarily at the executive level, where a person is listed as their own manager. Ultimately, we decided not to update the PERSONANCESTOR table as we don't use it. We also ended up dropping all of our custom functions and triggers before running the upgradedb script, as they were causing problems with the script, and recreated them afterwards. Finally, we had to fix a number of unique constraint violations in some of our tables.
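To illustrate the kind of circular dependency described above, here is a minimal Python sketch that finds people whose management chain loops back on itself, including the case where someone is listed as their own manager. It is not what we ran during the upgrade (we simply skipped updating PERSONANCESTOR). The function name and the person-to-manager dictionary are hypothetical stand-ins, not the actual PERSON table schema.

```python
def find_management_cycles(manager_of):
    """Return the set of people who sit on a loop in the management chain.

    manager_of is a hypothetical mapping of person id -> manager id
    (None or a missing key means the person has no recorded manager).
    """
    in_cycle = set()   # people known to be part of a loop
    cleared = set()    # people known NOT to be part of a loop

    for start in manager_of:
        path = []       # chain walked so far, in order
        position = {}   # person -> index in path
        current = start

        # Walk up the chain until it ends, reaches a person we have
        # already classified, or revisits someone on the current path.
        while (current is not None
               and current not in cleared
               and current not in in_cycle
               and current not in position):
            position[current] = len(path)
            path.append(current)
            current = manager_of.get(current)

        # Revisiting someone on the current path means a loop: everyone
        # from that person onward is part of it.
        if current in position:
            in_cycle.update(path[position[current]:])

        # Everyone else on the path merely reports (directly or
        # indirectly) into a loop, or into a chain that ends normally.
        cleared.update(p for p in path if p not in in_cycle)

    return in_cycle


# Example with made-up ids: an executive listed as their own manager,
# plus a two-person loop.
sample = {
    "exec": "exec",
    "vp": "exec",
    "dev": "vp",
    "a": "b",
    "b": "a",
    "contractor": None,
}
cycles = find_management_cycles(sample)
```

Running a check like this against the synchronized data before an upgrade would show which records need attention, or confirm that skipping the ancestor table is the simpler route.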
Once we had the upgrade ready on our test server, we began getting other groups involved. First, we needed to make sure our integrations with other services still worked. We currently integrate with services like ITSAS, FedDB and TEM (Tivoli Endpoint Manager) for asset and security information and BluePages for user and group information and credentials. Verifying these integrations went smoothly and we moved on to the next phase of testing.
When we moved the servers from Krakow to Austin, we had put together a plan for testing the various services provided by TSRM, TAMIT and CCMDB. In addition to various aspects of service request workflows, such as ticket creation and resolution, it covered asset management and reporting. We were able to use this same test plan to test the upgrade to SCCD. At the same time, we engaged a fairly large (20+) group of end users representing the most active and/or important user groups of the service. We met with them twice weekly over the three weeks leading up to the production upgrade to get their feedback. We also included the SCCD development team so they could hear the feedback directly. As a result of this testing, we identified and resolved a couple of dozen issues. These issues typically fell into one of two categories: database table permissions or display formatting. The permissions issues were related to our database extensions to support our assets and/or workflows. It’s debatable whether these permissions should have been handled by the database upgrade. The display issues were more along the lines of what we expected given the numerous changes made to the UI for SCCD 7.5. We also weren’t ready for the new Asset portlet so we disabled it. Finally, one of our class extensions (for a button to accept or reject a resolved service request from the Start Center and optionally route the user to a survey) wasn’t working so we removed it. (A “better” solution was provided by the development team after the upgrade so we have since added the button back.)
The production upgrade happened on Saturday, March 2nd. The first issue we ran into was the extremely long time it took to deploy our EAR into the application cluster environment. (We only had a single node in our test environment.) We also ran into an issue with all the JSPs and servlets being compiled and destroyed on every request, which made the applications hang. We had previously engaged the 3rd level support team and they were able to help us early Sunday morning. Clearing the WebSphere Application Server's class cache resolved the issue.
Since the upgrade, we’ve run into several issues. First, we’ve seen numerous instances where the browser or Java cache becomes corrupt. We’ve worked with the development team to come up with a process to correctly clear the caches. (Our concern remains, though, as this is an issue we had never experienced before.) Second, we had quite a few people who were unable to log in. It turned out that the SCCD form-based authentication had a problem handling special characters in passwords. (We were able to get a fix pack installed a couple of weeks after the upgrade to resolve this issue.) Third, we have seen that users generating reports can cause high CPU utilization which, in turn, impacts performance for other users. (We did some database performance tuning which seems to have helped but, for the long term, we’re in the process of moving our reports to a separate TCR server with a replica reporting database.)
All in all, our experience upgrading to SCCD 7.5 was positive. We’re planning the upgrade to SCCD 7.5.1 now. We expect the upgrade itself to be much easier but the impact on our end users to be much greater, due to the significant UI changes in 7.5.1. However, we expect the end result to be a much better looking service as well as a much improved one, thanks to some of the new features.