About this blog:
This blog focuses on software quality in general, and on IBM Collaboration Solutions offerings in particular. The author is an IBM employee, but expresses his observations and opinions here as an individual. The purpose of the blog is to nurture a conversation with our customers and partners about continuous improvement of our software-based offerings. ~FTC.
Here is a nice, succinct statement from my colleague Anna Dreyzin on why social business is important to her, all packed into 30 seconds. She talks about listening, engaging and helping, adding value and building reputation, managing feedback, extending reach, increasing engagement and growing advocacy. I agree with all of them, and I would also highlight search. The ability to share and search knowledge across a large team is highly valuable. For an example, see my prior post on How Connections helped Connections.
First of all, a third-digit release is what we refer to as a Modification, or Maintenance, Release. That means a substantial portion of the effort going into the release is aimed at updating currency and compatibility, and generally at making the Sametime 8.5 code stream more consumable and easier to maintain, with a lower Total Cost of Ownership (TCO). So, as standard practice for a maintenance release, we go through Technical Support call records to identify the more frequent call topics that affect customer productivity or satisfaction, and then we target actions to eliminate the need for those Support calls. We do this not just to save ourselves the cost of delivering Technical Support, but primarily to lower TCO for our clients. The fundamental view is that, all other things being equal, a product that needs fewer Support calls is a more consumable product, and is likely to have a lower TCO as well.
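To make the first step concrete, here is a minimal Python sketch of ranking support call topics by frequency. It is an illustration under stated assumptions, not our actual tooling: the record format, a hypothetical (topic, impacts_customer) pair per call, is invented for the example, and real call records carry far more detail.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_call_topics(call_records: Iterable[Tuple[str, bool]],
                    n: int = 3) -> List[Tuple[str, int]]:
    """Rank support call topics by how often they occur.

    Each record is a hypothetical (topic, impacts_customer) pair; only calls
    flagged as affecting customer productivity or satisfaction are counted.
    """
    counts = Counter(topic for topic, impacts_customer in call_records
                     if impacts_customer)
    return counts.most_common(n)

records = [("authentication", True), ("authentication", True),
           ("install", True), ("install", False), ("meetings", True)]
print(top_call_topics(records, n=1))
```

In practice, frequency is only one input; the topics at the top of such a list are candidates for corrective action, not automatic targets.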
So what did we do for quality in Sametime 8.5.2?
First, there are naturally fixes for a number of customer-reported defects. These fixes will further stabilize your system and prevent you from needlessly encountering defects already discovered by other users. One of the continuous improvement goals that each release is required to set is to reduce the backlog of open defects. Most software vendors prefer not to talk about a backlog of defects, but almost all software products have one. The traditional argument for not eliminating the backlog entirely is that fixing the least severe defects has a lower Return on Investment (RoI), for both vendor and customer, than spending the same effort searching for new and more severe defects, or maintaining currency and compatibility. So while backlogs will likely never disappear entirely, we do set goals to reduce them from release to release, and Sametime 8.5.2 takes a substantial chunk out of our backlog. We have also gone over our pre-release test case suites, asking why the customer-reported defects against the prior release escaped our release process. We then make process improvements (update tools and instructions, add test cases, etc.) to make sure the same escape doesn't happen again. We refer to that as a closed-loop process, and again, that's pretty much standard business. Call it process maintenance.
NAT firewall traversal is a key addition allowing much easier use of Sametime in enterprises connecting sites that are each protected with a NAT firewall. A/V performance has been improved. And we now support mobile Sametime on the Android platform; no small point. Another much requested feature is the ability to bump an attendee from a meeting. For more feature information, read John Del Pizzo's summary.
Our review of causal analysis data with the Support Team showed that authentication errors were relatively more time-consuming to troubleshoot than other errors. We found a number of ambiguous error messages that were not as helpful to users as they could be. I'm not surprised here; I find error messages lacking in almost every software product in the industry, from any vendor. But we homed in on 27 specific error messages involved in the Support calls we reviewed, and we made a number of logging improvements that will make these issues much easier to troubleshoot in the future, because we now collect more detailed information in the log.
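The general idea behind a more helpful authentication error can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual Sametime messages: the failure reasons, names and wording below are assumptions. The point is simply that a message naming the user, the directory consulted and the specific reason is far easier to troubleshoot than a generic "authentication failed".

```python
import logging
from enum import Enum

log = logging.getLogger("auth")

class AuthFailure(Enum):
    """Hypothetical set of distinct authentication failure reasons."""
    UNKNOWN_USER = "user not found in directory"
    BAD_PASSWORD = "password rejected by directory"
    DIRECTORY_UNREACHABLE = "directory server did not respond"
    TOKEN_EXPIRED = "single sign-on token expired"

def report_auth_failure(user: str, directory: str, reason: AuthFailure) -> str:
    """Log an authentication failure with enough context to troubleshoot it."""
    message = (f"Authentication failed for '{user}' against {directory}: "
               f"{reason.value}")
    log.warning(message)
    return message

report_auth_failure("jdoe", "ldap.example.com", AuthFailure.BAD_PASSWORD)
```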
On the integrated server console, we have added the ability to monitor the health of all the servers in your Sametime system: Community servers, Meeting servers, Media servers, Muxes, Gateways, etc. This is the first time we're collecting all that information in one place for a Sametime system, and it greatly simplifies life for the system administrator. It also allows issues (like disks or memory running full) to be discovered and corrected before they impact users. The administrator can see which processes are up and running, which might be hung or slow, which resources are heavily utilized, etc. We also added integration points with IBM Tivoli Monitoring so that our clients can build detailed alert and notification systems.
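As a rough illustration of the kind of checks such a console makes possible, here is a minimal Python sketch. It is not how the integrated console or Tivoli Monitoring work: the ServerStatus fields, the heartbeat notion and the thresholds (90% disk or memory, 120 seconds without a heartbeat) are all hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ServerStatus:
    """Hypothetical health snapshot for one server in a Sametime deployment."""
    name: str
    disk_used_pct: float
    mem_used_pct: float
    seconds_since_heartbeat: float

def health_alerts(servers: List[ServerStatus],
                  disk_limit: float = 90.0,
                  mem_limit: float = 90.0,
                  hang_after: float = 120.0) -> List[str]:
    """Flag servers that look hung or are close to running out of resources."""
    alerts = []
    for s in servers:
        if s.seconds_since_heartbeat > hang_after:
            alerts.append(f"{s.name}: no heartbeat for "
                          f"{s.seconds_since_heartbeat:.0f}s, possibly hung")
        if s.disk_used_pct >= disk_limit:
            alerts.append(f"{s.name}: disk {s.disk_used_pct:.0f}% full")
        if s.mem_used_pct >= mem_limit:
            alerts.append(f"{s.name}: memory {s.mem_used_pct:.0f}% used")
    return alerts

fleet = [ServerStatus("community1", 95.0, 40.0, 5.0),
         ServerStatus("meeting1", 50.0, 50.0, 300.0),
         ServerStatus("media1", 10.0, 20.0, 1.0)]
for alert in health_alerts(fleet):
    print(alert)
```

The value of the consolidated view is that one pass over all servers surfaces a filling disk or a silent process before users notice.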
We had noticed that the Community server, in particular, did not provide troubleshooting data as good as that of other components. To improve the First Failure Data Capture (FFDC) posture of Sametime, we have updated the Community server NSD to ensure we now get a full crash stack for the Sametime process. That allows us to compare different events and identify repeat issues by matching crash stack signatures to defects. The updated NSD can also include a Windows core file in the event of a server core dump.
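One common way to match crash stacks to known defects is to normalize the top few frames and hash them into a signature. The Python sketch below illustrates that general technique only; it is an assumption, not a description of what NSD or our internal tools actually do. The frame format, the choice of the top five frames, and the function and defect names are all hypothetical.

```python
import hashlib
import re
from typing import Dict, List, Optional

# Hypothetical normalization: drop hex addresses and "+0x..." offsets,
# which vary between builds and runs, keeping module!function names.
_VOLATILE = re.compile(r"\+?0x[0-9a-fA-F]+")

def stack_signature(frames: List[str], top_n: int = 5) -> str:
    """Return a short hash of the top N normalized frames of a crash stack."""
    normalized = [_VOLATILE.sub("", frame).strip() for frame in frames[:top_n]]
    return hashlib.sha1("\n".join(normalized).encode("utf-8")).hexdigest()[:12]

def match_known_defect(frames: List[str],
                       known: Dict[str, str]) -> Optional[str]:
    """Look up a crash signature in a map of signature -> defect ID."""
    return known.get(stack_signature(frames))

crash_a = ["stserver!HandleLogin+0x1a2", "stserver!Dispatch+0x40"]
crash_b = ["stserver!HandleLogin+0x1b0", "stserver!Dispatch+0x44"]
known_defects = {stack_signature(crash_a): "DEFECT-123"}
print(match_known_defect(crash_b, known_defects))
```

Two crashes that differ only in offsets produce the same signature, which is what lets a repeat issue be recognized without re-analyzing it from scratch.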
Our beta test partners gave us very good feedback on the Sametime 8.5.2 release, not just in terms of seeing the benefits of the improvements we've made, but also in terms of pointing out targets for further improvement. To be clear, the above is not a full account of all the improvements made in release 8.5.2. I just wanted to share some of the good reasons to plan an upgrade to 8.5.2 soon.
IBM announced Sametime 8.5.2 Interim Feature Release 1 this week. Much buzz has circulated about the new features already, and you can read about them in the announcements. But in the vein of my "Why better" postings, including last May's post, Why Sametime 8.5.2 is better, I want to briefly share some of the quality improvements we have worked on for this Interim Feature Release, or IFR 1, as well. Every software release from IBM containing new function must also identify and achieve specific quality improvements. Because this is an interim feature release, the aggregate underlying development effort is smaller than for a full feature release, which means the quality improvements are also fewer, but we were able to take some good steps anyway. The quality focus in this release was on the serviceability attributes, which are the abilities to diagnose and correct problems. We focused on providing more helpful and more meaningful log and error messages in three specific areas: (i) the install experience, (ii) the NAT ICE SDK, and (iii) Meetings.
Within the install, we improved the log and error messages related to validating connections to other servers in the deployment, such as DB2 and LDAP servers. In addition to improving the validation itself, we also surfaced to the user what is being validated, and what the status of the validation is. Moreover, we reviewed the error messages that were not yet externalized, and externalized them where it made sense to give the user more information. This is an ongoing effort that will continue to improve end user messages in future releases.
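The idea of telling the user what is being validated, and with what result, can be sketched as a simple TCP reachability check in Python. This is a minimal illustration under assumptions, not the Sametime installer's logic: a real installer would also authenticate to LDAP or DB2, and the labels, host names and port values shown in the usage line are placeholders.

```python
import socket
from typing import List, Tuple

def validate_endpoints(endpoints: List[Tuple[str, str, int]],
                       timeout: float = 3.0) -> List[Tuple[str, str]]:
    """Try a TCP connection to each (label, host, port) and report status.

    Prints what is being validated before each attempt, then the outcome,
    so the user is never left guessing which dependency failed.
    """
    results = []
    for label, host, port in endpoints:
        print(f"Validating {label} connection to {host}:{port} ...")
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status = "OK"
        except OSError as exc:
            status = f"FAILED ({exc.__class__.__name__}: {exc})"
        print(f"  {label}: {status}")
        results.append((label, status))
    return results

# Usage with placeholder hosts and ports:
# validate_endpoints([("LDAP", "ldap.example.com", 389),
#                     ("DB2", "db2.example.com", 50000)])
```

Catching OSError covers refused connections, timeouts and name resolution failures, and the exception text in the status gives the user a concrete starting point.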
For the network address translation (NAT) interactive connectivity establishment (ICE) software development kit (SDK), which is used, for example, to integrate awareness into other applications, we enabled full-detail prints of IceSession and MediaSession failure traces, as well as of TURN (Traversal Using Relays around NAT) server details when the IceSession is created. We also made a number of improvements to the Logger output from the C++ ICE SDK.
For Meetings, we focused on improving the log messages associated with the AppShare protocol and updated many key messages.
A simultaneous announcement of Sametime Unified Telephony (SUT) 8.5.2 IFR 1 was also made this week. With the new SUT release, we now support virtualization of the SUT server. And we completed a Telephony Control Server (TCS) configuration tool, which can dramatically lower the time needed to configure your solution. We are already receiving very good feedback on this tool. We also coded an automatic restart mechanism in the event of a TURN server crash. Internally, we had also set targets for further expansion and coverage by our automated test suites, especially in the Audio Visual (AV) and Sametime Unified Telephony (SUT) functionality areas, and those targets were also met.
For a discussion of these releases, I recommend listening to Episode 79 of the This Week in Lotus podcast, entitled Why Sametime 8.5.2 IFR 1 definitely ain't no Turkey! Enjoy the new releases taking Video chat and Unified Communications to new heights. To filter the blog and show just the 'why better' entries, click the "better" tag in the line just below the blog entry title.
Continuing in the vein of prior posts with the ‘better’ tag, I want to describe quality improvements made in recent releases. Notes/Domino 8.5.3 has just been released, and you can read about new features in the announcement. There’s plenty to like. The embedded Symphony version has been updated to 3.0, and the embedded Sametime to 8.5.1, both key advances. There are enhancements to XPages and the Domino Designer, and much more. But quality is not just about new features. It’s about all features working well.
The overall quality objectives for the 8.5.3 maintenance release were to significantly reduce the outstanding defect backlog, to improve integration with companion products like Connections, Sametime and Symphony, and to expand test coverage and test automation. The team has delivered on all of those objectives. All major components (Domino, Notes, Designer, Traveler) reduced their deferred defect backlogs by considerable amounts, some by more than half. The vast majority of those defects had not been reported by customers; they were found in house. Removing them eliminates the risk that our customers will run into them. Reducing internal defect backlogs is always an objective for a modification release (a.k.a. maintenance release), but release 8.5.3 has achieved reductions greater than is typical for most maintenance releases.
Security is a high priority for any release from IBM. In Notes/Domino 8.5.3, we moved systematically forward with further detailing of our threat model and the adoption of Rational AppScan Enterprise Edition for testing the full attack surface across the Notes client. Similar efforts were made for Traveler and for iNotes. (Domino did this work previously.) All the components had security testing in the past; what's changing is that we're adding Rational AppScan testing across all of our portfolio. And of course, we resolve all security defects before releasing.
The Domino team also focused on memory-related improvements in release 8.5.3, delivering new NSD macros, and an administrator capability to track and drop 'bad' IMAP sessions, which can cause server crashes. A very key improvement is a substantial reduction in the use of shared 16-bit handles, which will reduce the type of conflicts that can cause hangs or crashes. The aggregate result is an even more stable solution. For the Domino Configuration Tuner (DCT), we continue to deliver additional rules to help you ensure your environment is optimized. If you use DCT, be sure to download new rules regularly. We add new rules at least quarterly, and sometimes monthly. For iNotes, we continue to focus on achieving full parity between the Notes and iNotes client experiences, delivering important improvements to sorting by subject and to auto-processing of calendar entries, as well as the option not to expand personal groups when sending.
Less visible to our customers is the continued progress on test automation. The more of our standard test scenarios we automate, the more time our engineers can devote to specialized, exploratory testing around new features. Some critical areas have doubled the number of automated tests this year, freeing up engineers to expand coverage, all part of our continuous improvement effort.
Release 8.5.3 is the next global deployment candidate for IBM's own internal environment of nearly 400,000 users around the globe. Prior to release, the IBM CIO's Office deployed it to over 4,000 IBM employees, and our Services Division deployed a pre-release build for over 14,000 users. That means over 18,000 people were using it daily before we declared it ready to ship. The CIO servers are primarily AIX and zLinux servers. Although the majority of users run the client on Windows, a few hundred run it on Linux and Mac platforms as well.
In summary, there’s a lot to like about Notes/Domino 8.5.3. I’ve described a few highlights of the quality effort here, but of course the proof is in the pudding, or more accurately, in the released software. Enjoy the new release. As always, feedback is welcome.
Back in April, IBM announced the availability of IBM Connections 3.0.1. It's a "maintenance release", but it contains important new function as well. Does that improve the 'quality' of the product? Well, yes it does. Although we often separate new function from quality improvement, or separate 'what' the software does from 'how well' it does it, the reality is that available function directly impacts fitness for purpose, which is a key component of quality. (See my April 25, 2011 post "What does Quality mean?" on that topic.) So from a quality perspective, we quality folks are excited about the new functionality too, especially:
the Media Gallery for sharing photos and videos,
the moderation capabilities for Community owners to screen content,
the Ideation blog for sharing and voting on ideas within a Community, and
the integration with Enterprise Content Management (ECM) repositories.
In terms of currency and non-functional attributes, we added (i) server-side support for Windows 2008 R2 64-bit, (ii) support for Microsoft Active Directory 2008 as an LDAP directory, (iii) mobile client support for BlackBerry OS 6.0, (iv) database support for Oracle 11g Enterprise Edition Release 2 and Microsoft SQL Server 2008, and, on the security side, (v) support for CA SiteMinder 6.0 and Java SPNEGO for single sign-on.
As usual, our quality program also looks for improvements beyond function, currency and non-functional attributes. One of the most significant non-functional improvements in release 3.0.1 comes from further work on optimizing code performance in the Communities component, where we beat our own goal for the release by a nice margin. And in terms of performance, the Activities and Files components achieved significant improvements in the 3.0.0 release, so users of an environment migrating from release 2.5 to 3.0.1 should see performance improvement in all three: Activities, Communities and Files. The Connectors Help has been updated in release 3.0.1 with a Files connector and an Outlook Social connector. In addition, a variety of internal metrics demonstrate a healthy focus on quality. The defect deferral rate is very low for IBM Connections. Another key quality metric is the release-to-release reduction of the number of support calls per customer, and the number of customer reported defects. Based on early data for the first three months since release 3.0, both metrics are demonstrating continued improvement from release 2.5 to release 3.0. The fix list for Connections 3.0.1 details what fixes for customer reported defects are included in this release. I have no doubt release 3.0.1 will provide even further value to our Connections customers.
And now you can leverage IBM Connections 3.0.1 Portlets for WebSphere Portal to extend the social collaboration experience of IBM Connections, via a WebSphere Portal instance, to your collaboration partners. This kind of integration across the IBM Collaboration Solutions portfolio is a key way we deliver additional value from your investment in our software.
If you have suggestions for improving IBM Connections, the connectors, or the portlets, I'd appreciate it if you would share them in the comments to this blog post. Thanks.
In the field of software quality, we rely extensively on a series of quantitative metrics to inform us of trends and performance. We need to be keenly aware that the very instant we set a target value for a metric, we are driving the behaviors of the people we charge with achieving that target. Through it all, leaders need to ensure everybody stays focused on what's best for our business and for our customers, and that excessive pressure to achieve metrics targets doesn't interfere with that focus.
Here's a real story from the world outside IBM, where the pressure to achieve metrics targets allegedly caused a police force in Brooklyn (NY) to bend the rules and violate the rights of the people they were supposed to protect. Listen to "Act Two" within this segment of the radio show "This American Life". The story fills roughly the last 41 minutes of the 59-minute segment, so fast-forward to about the 18-minute mark. Police Officer Adrian Schoolcraft documented extensive cases of focusing on metrics over mission. He is now involved in a lawsuit against the police force. This is a chilling real-life story.
After listening to this 'extreme' case, I encourage you to ponder your own software-related metrics. Are they as accurate as you think they are? What behaviors are they driving? What are you doing to prevent similar transgressions in your shop? This is worth thinking about on a regular basis for any metrics-driven organization. It certainly has a direct impact on quality. For that reason, I like to validate metrics results with qualitative observations and, where possible, with customer feedback.
This is a fairly new blog, started just one month ago, so the visit count on each successive entry may rise from a combination of interest in the topic and more people becoming aware of the blog's existence. The prior entry "Why Sametime 8.5.2 is better" saw a very positive rise in visits. I cannot be sure how much of that is due to the topic, and how much is due to increased awareness of the blog, but I take it as a vote in favor of sharing that kind of information. Each new release project has to set strategic quality goals, and achieve them before releasing. My thought is that we should be sharing a "What's better" overview with each new release, just as our Marketing colleagues share "What's new" overviews. I plan to experiment a bit with the level of detail, looking for a balance that is neither so detailed it becomes long-winded, nor so sketchy it becomes abstract. Sometimes that means discussing select improvements rather than all improvements in a release, or splitting the material across multiple entries. I attended the May 24th IBM Collaboration Solutions community call this morning, where the audience confirmed their interest in hearing about continuous improvement. So I'll plan more posts in that vein. Happy to share what I can :-)
Enjoy Peter Presnell's insight gathered over a number of migration projects. Replacing an advanced messaging, application development, and web server platform like Domino is far from trivial. And the author has seen how it often goes down. If you care about quality of your environment, take note of this experience. Thanks to Peter for sharing.
Anyone considering a migration project would be wise to very carefully estimate both the full scope and budget of the effort, as well as the ability to continue support for existing usage patterns. Map how your users are actually leveraging the platform they have today. Don't assume you know.
Years ago, one of my first projects for IBM was to run beta tests for departmental cutsheet network printers. I'll never forget one particular customer: a great and very cooperative IT staff, but when they pointed me to their 'main' print queue, I found limited traffic. To make a long story short, they didn't realize many employees had switched their print jobs to alternate queues on alternate server clusters. Keeping up with all areas of usage is no small task. Know your users, especially when planning change projects, whether migrations, expansions, or other.
Since this blog is focused on quality, it seems reasonable to start with a description of what the term means to me, in relation to software. I am NOT asking readers to adopt my definition of quality, or to limit your comments based on my definition. In fact, I would like to hear how you define the term.
What does software quality mean to you?
To me, delivering quality software means simultaneously providing:
Code with the right features (capability) & with the features done right (consumability)
Code which is performance optimized, quick to deploy & integrate, easy to learn, easy to use, easy to maintain, easy to scale, and competitive in terms of total cost of ownership
Code assisted by comprehensive knowledge content delivery, and enablement of staff and customers, throughout the software life cycle
Code with rapid support resolution available globally if and when problems occur
Code supported proactively, by providing what’s needed before customers even recognize the need
And finally, for hosted solutions, code and web delivery operations that meet Service Level Agreement (SLA) commitments for specified performance metrics, typically availability.
This encompasses the fitness for purpose, conformance to requirements, total cost of ownership (TCO), and maintainability and support aspects of quality. Would you modify anything in this definition?
Delivering a highly available service is very different from producing a customer-installable product. The rightful expectation of the Software-as-a-Service (SaaS), or Cloud, subscriber is that the service is available whenever they need it. A good analog is the dial tone on a landline phone. It's just there when you pick up the phone. And if it's not, your first instinct is to check the cords and make sure the phone is plugged in. In developed countries at least, the absence of a dial tone rarely leads you to assume first that the service is down, but rather that you yourself are at fault somehow, e.g. for not plugging in. That same reliability is expected of Cloud systems. But no Cloud vendors are yet as mature as the PBX systems switching our phone lines. All Cloud vendors still have occasional outages. They're short-lived, but still annoying when they happen. And we're all working to eliminate them through root cause analysis, corrective action, and other means. Most of us come from a background of writing on-premises software, since SaaS is still a young and emerging segment. In some cases, that means there are habits we need to unlearn because they don't work well in the SaaS space. Overall, it is useful to discuss not just how to develop and deliver (SaaS) Cloud services, but specifically how that differs from our on-premises experience. I plan to share a series of brief observations illustrating the differences between developing and delivering on-premises software, and developing and delivering corresponding cloud services. I will tag each one with 'cloud_difference' for easy collection with the URL:
Would like to share a customer testimonial regarding LotusLive. We have received some nice press coverage in the past year for a very large LotusLive deal, the largest ever. But as this testimonial shows, LotusLive can add value no matter the size of the subscriber's organization.
What I find so interesting in this testimonial is the transformational power of the solution. This is literally a game changer for the subscriber, catapulting their services to the competitive leading edge. This is what we do best: help customers apply technology to solve business problems. And win.
The Lotus Notes calendar, for example, has built-in information about time zones and the associated Daylight Saving Time (DST) definitions. When those change, we supply an update (a 'fix') to ensure customers can continue to rely on accurate knowledge of time zone differences and of the dates on which DST starts and ends in each time zone. I have noticed how seamlessly Notes handles those differences for me when I schedule meetings with teams around the world, even though I'm not 100% sure exactly what day each country switches to or from DST. I've never had an issue with a misunderstood meeting time. But that seamless performance is only as good as the underlying knowledge of time zone and DST definitions (whether based on the operating system or the application), so when governments decide on changes, sometimes with literally just a few weeks of notice, we do our best to respond quickly to keep our customers' scheduling accurate.
If the change is known far in advance, say over a year, it's pretty easy to apply fixes to change the definitions in the operating system and/or the application. The problem comes when the definition changes on such short notice that there are already appointments in the calendar during the affected time period. For example, when the US accelerated the onset of DST by three weeks in 2007, users who already had calendar appointments in that three-week span needed to decide whether to adjust them or not. There is no easy logic to make that decision. If you had an appointment with your dentist at 10 am standard time, it would naturally shift to 10 am DST after the government decided to apply DST on that date, because you are both in the same, affected time zone. But if you had a conference call with a major customer overseas, whose time zone definition did not change (different government), then the appointment would probably stay at 10 am standard time, which would now be 11 am DST for you. The calendar cannot distinguish which appointments to adjust, and which not to adjust. Human intelligence is still needed. Fortunately :-)
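The dentist-versus-conference-call dilemma can be made concrete with a small Python sketch. It is a simplified illustration, not how the Notes calendar stores or adjusts entries: the two US Eastern rules are modeled as fixed UTC offsets for one date inside the three-week span (March 20, 2007), and the anchored_locally flag is a hypothetical stand-in for exactly the human judgment the calendar cannot make on its own.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern on 2007-03-20, a date inside the three weeks
# that the 2007 US change moved into DST (simplified model, no tz database).
OLD_RULE = timezone(timedelta(hours=-5), "EST")  # rule when the entry was booked
NEW_RULE = timezone(timedelta(hours=-4), "EDT")  # rule after the change

def reinterpret(appt_utc: datetime, anchored_locally: bool) -> datetime:
    """Return an appointment's local time once the new DST rule applies.

    anchored_locally=True: keep the same wall-clock time, because everyone
    involved is in the affected zone (the dentist), so the UTC instant moves.
    anchored_locally=False: keep the same UTC instant, because the other
    party's zone did not change (the overseas call), so the local time moves.
    """
    if anchored_locally:
        return appt_utc.astimezone(OLD_RULE).replace(tzinfo=NEW_RULE)
    return appt_utc.astimezone(NEW_RULE)

# Both entries were booked for 10 am standard time, stored as UTC.
booked = datetime(2007, 3, 20, 10, 0, tzinfo=OLD_RULE).astimezone(timezone.utc)
dentist = reinterpret(booked, anchored_locally=True)
overseas_call = reinterpret(booked, anchored_locally=False)
print(dentist.strftime("%H:%M %Z"), "|", overseas_call.strftime("%H:%M %Z"))
```

The same stored instant yields 10 am for one appointment and 11 am for the other, depending only on a fact about the meeting that the calendar data does not contain.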