About this blog:
This blog focuses on software quality in general, and IBM Collaboration Solutions offerings in particular. The author is an IBM employee, but expresses his observations and opinions here as an individual. The purpose of the blog is to nurture a conversation with our customers and partners about continuous improvement of our software-based offerings. ~FTC.
As a quality engineer, it's important to explore select customer success stories, so we can work to replicate the associated success factors across all deployments. In that vein, I'd like to offer a list of IBM Collaboration Solutions customer success stories for you to enjoy. I’m not including any analysis here; just sharing a collection of testimonials.
These testimonials demonstrate part of the value our software brings to our clients across our portfolio. Enjoy every one. This list actually comes from an internal blog post I wrote last year, so most of the linked videos are about a year old, but I like that the collection covers most of our key on-premises products. I’ll naturally continue to share additional, more recent success stories along the way, as I already have with, for example, Signature Mortgage and Colleagues in Care.
Hoping we don't tempt fate with our timing: on Friday the 13th of this month, we quietly turned on an automated fault analysis capability for Notes System Diagnostics (NSD) files uploaded to our Technical Support file repository, called ECUREP (Enhanced CUstomer REPository). In other words, any time an entitled customer uploads an NSD file, our systems automatically and immediately perform an analysis to determine what type of incident is reflected (crash, hang, out-of-memory condition, user-killed processes, etc.) and, for a crash, whether the crash stack contained in the NSD file matches any known problems in our database. When a customer encounters a problem already seen and solved elsewhere, the system can point to the known defect and the associated technote. When the crash stack in the uploaded NSD file does not match any known problem, no result is returned, but per standard Support process, a new defect is opened to track further analysis. The Support Engineer, together with Development, may then manually apply other internal tools, like MemCheck or Laza, to analyze the incident. Fault Analyzer has shipped with the Domino product since version 7.0 to process data captured with the Automated Data Collection (ADC) feature. Local analysis at the customer site can determine the general disposition, or incident type, but it won’t match the crash stack against our in-house database of known issues. That database consists of all NSD submissions to the ECUREP system, plus similar data captured in IBM’s internal worldwide environment of over 400,000 employees.
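To make the idea of matching a crash stack against known issues a little more concrete, here is a minimal Python sketch. It is not the Fault Analyzer's actual algorithm: the frame format (`module!function+0xOFFSET`), the use of the top few frames as a signature, and the in-memory `known_issues` dictionary are all assumptions for illustration only.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical frame format "module!function+0x1f3"; real NSD output differs.
_OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")

def normalize_frame(frame: str) -> str:
    """Drop the hex offset so the same function compares equal across builds."""
    return _OFFSET.sub("", frame.strip())

def signature(stack: List[str], depth: int = 5) -> Tuple[str, ...]:
    """Use the top `depth` normalized frames as the crash signature."""
    return tuple(normalize_frame(f) for f in stack[:depth])

def lookup(stack: List[str],
           known_issues: Dict[Tuple[str, ...], str],
           depth: int = 5) -> Optional[str]:
    """Return the known defect ID for an exact signature match, else None."""
    return known_issues.get(signature(stack, depth))

# Usage with made-up frames and a made-up defect ID:
known = {("libA!f", "libA!g"): "DEFECT-1"}
print(lookup(["libA!f+0x10", "libA!g+0x2c", "libB!h+0x1"], known, depth=2))  # DEFECT-1
```

No match simply yields `None`, which mirrors the "no result is returned" behavior described above.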
The new automated support analysis leverages the same Fault Analyzer tool available in Domino itself, and runs it against our latest database of known issues. It can handle archive files in zip, tar, tar.gz, tar.bz2, ar, jar, dump and cpio formats up to 225 MB in size. There is no technical reason for the 225 MB cutoff; it's a cautionary, self-imposed limit we have set to avoid slowing down related processes. Once we get a sense of how the analysis system operates, we may change the limit. The system offers several key advantages. From a customer perspective, a first possible answer is returned much faster in cases where the crash stack signature is known. From a vendor perspective, it gives our engineers a quick first analysis of the diagnostic data. The system 'stamps' the results into the Problem Management Record (PMR), which is visible to the entitled customer via the Service Request tool on the Web. This keeps all information related to the customer's issue in one central thread, visible to both the customer and the support representative.
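A sketch of the kind of upload gate described above might look like this. It assumes eligibility is decided from the file-name suffix and the file size in bytes; the function name is hypothetical, and reading "225 MB" as 225 × 1024 × 1024 bytes is also an assumption.

```python
# Archive formats listed in the post; deciding by file-name suffix is an assumption.
ACCEPTED_SUFFIXES = ("zip", "tar", "tar.gz", "tar.bz2", "ar", "jar", "dump", "cpio")
# 225 MB read here as 225 MiB (an assumption); the post calls it a self-imposed cap.
MAX_BYTES = 225 * 1024 * 1024

def eligible_for_auto_analysis(filename: str, size_bytes: int) -> bool:
    """True if the upload has an accepted suffix and is within the size cap."""
    name = filename.lower()
    has_accepted_format = any(name.endswith("." + s) for s in ACCEPTED_SUFFIXES)
    return has_accepted_format and size_bytes <= MAX_BYTES
```

Keeping the cap in a single constant reflects the post's point that the limit is a tunable policy rather than a technical constraint.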
Given that we have just launched this automated use of the Fault Analyzer in our support process, we fully expect to find opportunities to tune and improve the process as we learn from initial submissions and analyses. A key design concern has been, and continues to be, minimizing false positives. Returning an incorrect defect match could waste time for both our customer and our support representative, so to start with we have set match criteria that we believe are specific enough to minimize false positives, and we continue to review and tune the algorithms. As we learn from the initial submissions, we will look for ways to refine the match criteria so that more submissions find a match, but only where we can be confident the identification is sufficiently accurate. Experience from the first couple of weeks shows that fewer than half of the NSD submissions find a match. However, finding matches for 100% of the submissions is not our success criterion for automated fault analysis. New problems, e.g. from interaction with newly released third-party components, will obviously have no match the first time any customer submits them. If we were able to find matches for all submissions, it would mean that all problems were already known. And that would mean either that we were lagging terribly in delivering maintenance releases, or that our customers were terribly back-level in applying maintenance. So to improve fault match identification, we're not focused on achieving matches in a specific percentage of cases, but rather on identifying additional circumstances (crash stack specifics) that allow us to positively match additional known issues, and on extending our logic to cover those circumstances as well.
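The trade-off between match rate and false positives can be illustrated with a small sketch. Here a known issue counts as a match only if it records at least `min_depth` stack frames and those frames equal the top of the submitted stack, and nothing is returned when more than one defect matches. This is a hypothetical policy for illustration, not the match criteria actually used in our support system.

```python
from typing import Dict, Optional, Sequence

def match_known_issue(stack: Sequence[str],
                      known: Dict[str, Sequence[str]],
                      min_depth: int = 4) -> Optional[str]:
    """Return a defect ID only when the match is specific and unambiguous.

    `stack` holds normalized frames, innermost first (an assumption). A known
    issue matches if it records at least `min_depth` frames and they equal the
    top of `stack`. Zero or several matches yield None, to avoid a false positive.
    """
    candidates = [
        defect_id
        for defect_id, frames in known.items()
        if len(frames) >= min_depth
        and tuple(stack[:len(frames)]) == tuple(frames)
    ]
    return candidates[0] if len(candidates) == 1 else None
```

Lowering `min_depth` lets more submissions find a match, but it also lets short, generic signatures collide, which is exactly the tension described above.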
I hope you agree that with the new automated Fault Analysis, we have taken one more step toward providing more efficient support to our customer base. Crashes, hangs and resource exhaustion should be rare events, but when they do occur, rapid problem identification is key to minimizing business impact.
This 11-minute video summarizes the key message in Daniel Pink's "Drive: The Surprising Truth About What Motivates Us". It is insightful, and it offers an interesting explanation of the motivation behind the open source software movement. And, oh by the way, I wish I could draw like that....
Autonomy, Mastery and Purpose describe some - but not all - of the ingredients in a successful breakthrough, whether in product features or in quality improvement. And software development being a team sport, there is a whole extra dimension of interesting motivation issues to manage. Those aside, enjoy this great video.
Back on April 25th 2011, I started my Quality Collaboration blog on Lotus Greenhouse. Due to new authentication requirements implemented in late September, which require a Greenhouse ID to view blog content there, visits to the blog dropped dramatically. As a consequence, I have relocated the blog to the developerWorks site, where you are reading this post. If you read the blog in its prior location and use bookmarks or feeds, please update them to the new URLs. Since the blog on Greenhouse was still relatively young, I moved all the content already posted there into the new developerWorks blog, so everything is available and searchable in one place. All the posts below have been copied over from the Greenhouse blog. All future posts will be added here on developerWorks, not on Greenhouse. The blog itself, regardless of location, is still called the Quality Collaboration blog.
Looking forward to continuing the conversation on software quality in the new location. ~Flemming
Quality is everyone's job. None of us is as strong as all of us. You've heard the catch phrases. In quality software engineering, they are true. One of the purposes of this blog is to open the conversation about quality to all customers and partners, regardless of which generation they belong to and how much experience they have with our software. I once hired an intern who discovered a key flaw in a major test harness less than a week after she started, and this was a harness an experienced team had used for years. A fresh set of eyes can sometimes question things the experienced eye has learned to accept uncritically. Everybody is invited to share insight and ideas via comments on this blog. Here is an interesting article in Forbes magazine by an IBMer explaining the imperative and challenges of networking across generations in the enterprise: http://www.forbes.com/2010/07/14/networking-social-media-employees-leadership-managing-ibm.html. His overview of differences in approach across three generations is illustrative, though obviously not every individual fits their generation's description. In the field of quality, it's particularly important that we don't let generational differences in approach stand in the way. We approach software differently. To me, it's a perfectly normal, rational act to open up a 'manual' to figure out how a particular piece of software works. Most millennials would never dream of doing that. So I need feedback from all generations to ensure we offer a compelling user or administrator experience for each generation.
Because they are web-facing, SaaS systems are more likely targets of hackers. The more complex the solution being offered, the more ports are likely to be open, and the more risks exist. We mitigate security risks in a variety of ways, ranging from security code reviews to vulnerability scanning and more. We have to design and scan against common vulnerabilities like cross-site scripting, cross-site request forgery, SQL injection and insecure direct object references, and also against many less common ones. We naturally use our own Rational AppScan tool for vulnerability scanning, but also other approaches and tools. For obvious reasons, I can’t share a full list. Almost all of these techniques apply equally to on-premises offerings. SaaS differs from on-premises environments in that the vast majority of user traffic traverses the internet, not just a company intranet, which increases the exposure further. And as owners of the production environment, we're responsible for operational security. A top priority in managing a SaaS environment is to keep up to date with security and vulnerability patches. Just like our customers with on-premises offerings, we must set up the necessary processes to keep abreast of available security patches from vendors. We also own the responsibility to run penetration testing against the production environment, where we need to distinguish between destructive and non-destructive testing. It's OK to define a self-owned account in the production environment and attempt to gain unauthorized access to it, but it's not OK if that, or any other penetration test, interrupts service for subscribers. Interruptive or destructive testing must naturally be done against a test environment built to mimic the production environment as closely as possible.
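Of the vulnerabilities listed above, SQL injection has a well-known generic defense: parameterized queries. The Python sketch below uses the standard `sqlite3` module with a hypothetical `users` table. It illustrates the technique in general, not code from our offerings.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, email: str) -> list:
    # The "?" placeholder passes `email` to the driver as a value, never as
    # SQL text, so input such as "x' OR '1'='1" cannot alter the query.
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchall()

# Usage against an in-memory database with a hypothetical schema:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])
print(find_user(conn, "a@example.com"))   # [(1, 'a@example.com')]
print(find_user(conn, "x' OR '1'='1"))    # []
```

Had the query been built by string concatenation, the second call would have returned every row in the table.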
Our teams also do functional security testing, which goes beyond vulnerability and penetration testing, because functional failures in security and privacy functionality, if they existed, could lead to very significant security exposures. A key aspect of security for Cloud solutions is that the security framework must rely more heavily on server-side data security to prevent unauthorized data access, because the client side is usually a browser, over which we have rather limited control. When new browser versions are released, we rarely have a choice of whether or when to support them. Users expect to be able to use the latest version of Firefox, Safari, or whatever browser they prefer, the same day it is released. Security matters so much because security incidents can cause a loss of customer trust, so we have to assume that security flaws will exist in new browser releases, at least until they are patched. Our server-side security has to handle the task and ensure proper security without relying on browser-side functionality. Security testing is a top priority for every release of LotusLive (SmartCloud).
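A minimal sketch of the server-side principle, under stated assumptions: the user's identity is looked up from a server-held session rather than taken from anything the browser sends, and every object access is checked against its owner (which also guards against insecure direct object references). The in-memory `SESSIONS` and `DOCUMENTS` stores and all other names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    owner: str
    body: str

class AccessDenied(Exception):
    pass

# Hypothetical server-side state; in practice a session service and a database.
SESSIONS = {"token-abc": "alice"}
DOCUMENTS = {
    "d1": Document("d1", "alice", "Q3 plan"),
    "d2": Document("d2", "bob", "Payroll"),
}

def get_document(session_token: str, doc_id: str) -> Document:
    """Authorize on the server: identity comes from the session, not the request."""
    user = SESSIONS.get(session_token)
    if user is None:
        raise AccessDenied("not authenticated")
    doc = DOCUMENTS.get(doc_id)
    # Same error for "missing" and "not yours", so document IDs cannot be probed.
    if doc is None or doc.owner != user:
        raise AccessDenied("no such document")
    return doc
```

Because the check lives entirely on the server, a flaw in a newly released browser does not by itself let one user read another user's data.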
To be precise, the browsers currently supported officially by LotusLive include Internet Explorer, Mozilla Firefox and Apple Safari; see the LotusLive system requirements for details. Other browsers like Chrome, Wildfire, and Opera work for many interactions with LotusLive, but are not officially supported.
The Quality of Service (QoS) depends not just on the technical quality of code and documentation, but also on deployment architecture and instructions, health monitoring, operational procedures, integration into hybrid scenarios, and downtime minimization through redundancy, virtualization, and disaster recovery. In the on-premises world, the development team has limited incentive to optimize deployment time. Not that upgrade duration doesn't matter, but other priorities dominate trade-off decisions. In the Cloud world, we look to minimize the service interruptions required to update the environment. Through clustering, service delivery runtime environments, etc., we can ensure system availability through many upgrades, but some changes, such as back-end database schema changes, still require a planned outage. A highly available (HA) system has redundancy built in, so in the event of a failure, the redundant part takes over. When it comes to system maintenance and updates, such an HA system needs to be maintainable in a continuously available (CA) fashion. HA is automated, while CA still requires human practice to ensure the correct steps are executed in the correct sequence during updates. The ease of deploying an update, and the time it takes, is what I refer to as deployability, and it has renewed importance in the cloud. Both the technical complexity of the update and the skills of the deployment team matter. A team that has gone through the same deployment several times can execute it more quickly than a team doing it for the first time, and is less likely to execute steps in the wrong sequence or to miss a check. For that reason, I like to see the production deployment team participate in earlier deployments into the customer acceptance test environment and into the staging environment, ahead of the actual Go Live date in the production environment.
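One common way to keep a redundant cluster continuously available during an update is a rolling update: take one node out of service at a time, update it, and verify it before moving on. The sketch below is a generic illustration; the node dictionaries and the `update_node` and `health_check` hooks are hypothetical stand-ins for real deployment tooling.

```python
from typing import Callable, Dict, List

Node = Dict[str, object]  # hypothetical node record: name, version, healthy

def rolling_update(nodes: List[Node], new_version: str,
                   update_node: Callable[[Node, str], None],
                   health_check: Callable[[Node], bool],
                   min_in_service: int = 1) -> List[object]:
    """Update nodes one at a time so the service stays available.

    Refuses to take a node down if fewer than `min_in_service` other healthy
    nodes would remain, and stops at the first node failing its health check.
    """
    for node in nodes:
        others_healthy = [n for n in nodes if n is not node and n["healthy"]]
        if len(others_healthy) < min_in_service:
            raise RuntimeError(f"updating {node['name']} would leave too few nodes in service")
        node["healthy"] = False  # stand-in for draining the node from the load balancer
        update_node(node, new_version)
        node["healthy"] = health_check(node)
        if not node["healthy"]:
            raise RuntimeError(f"{node['name']} failed its health check after the update")
    return [n["version"] for n in nodes]
```

Encoding the sequence and the checks in automation is one way to get the "correct steps in the correct sequence" that CA updates otherwise rely on human practice for.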
Interestingly, engaging the Web Delivery Operations (production) team in updating pre-release environments has the added benefit of exposing any differences between production and test environments, which we want to eliminate, as discussed in Cloud Difference #4: Test emulates the production environment. Automation of complete, virtualized service deployments is key to minimizing the opportunities for human error during updates. In addition, we require interim builds to be delivered and deployed into test environments using the same techniques that will be used in the production deployment. And in the rare case that a deployment runs into problems, we require a back-out option that allows us to quickly fall back to the last known good configuration and release. An alternative is to deploy on separate hardware and cut traffic over at the load balancers once the new instance is up and running. In many cases this eliminates the concern over deployment time and back-out options, but suitable hardware is not always available because of the associated cost. So a well-tested, well-rehearsed, automated and well-executed deployment remains important.
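The back-out requirement can be sketched as a wrapper that applies a new release and, if it fails verification, re-applies the last known good one. `apply` and `verify` are hypothetical hooks into deployment automation; a real system would also record the incident rather than just print it.

```python
from typing import Callable

def deploy_with_backout(current: str, candidate: str,
                        apply: Callable[[str], None],
                        verify: Callable[[str], bool]) -> str:
    """Deploy `candidate`; on failure, fall back to the last known good release.

    `apply` and `verify` are hypothetical hooks into deployment automation.
    Returns the release left running.
    """
    last_known_good = current
    try:
        apply(candidate)
        if verify(candidate):
            return candidate
    except Exception as exc:  # a failed apply is handled like a failed verification
        print(f"deployment of {candidate} failed: {exc}")
    apply(last_known_good)
    return last_known_good
```

Rehearsing this path in test and staging environments matters as much as rehearsing the forward deployment, since the back-out is typically needed under time pressure.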
PS: To sort the blog and display just the ‘Cloud Difference’ series, click on the “cloud_difference” tag below the title of any post in the series.
Virtualization has become the norm over the past decade, but multi-tenancy is not the norm in typical on-premises environments; it is instead associated with cloud computing. Both are critical because they drive cost advantages and enable the provider to offer a more competitive subscription rate. We have experience hosting single-tenant systems, under monikers like strategic outsourcing, managed operations, and managed service delivery. In terms of cost, there is just no comparison. Most customers come to the cloud to save IT cost: by avoiding the larger up-front license cost, by paying only for what they use in the case of so-called metered services, and through a lower overall total cost. Those savings are driven by multi-tenancy and virtualization. Period. Not every system is inherently architected to be multi-tenant, but the overall cloud solution must be. The BlackBerry Enterprise Server (BES) in our LotusLive (SmartCloud) environment is an example. BES does not currently have a multi-tenant architecture. To offer a cost-competitive BES service to customers wanting to receive mail on their BlackBerry devices, we've implemented a multi-tenant architecture on our side that connects into BES without requiring changes to the BES source code. If cost is the primary objective, there is no substitute for multi-tenancy; it is essential to cost reduction. Needless to say, architecture, design, coding and testing all have to emphasize preventing cross-tenant visibility. Since multi-tenancy is basically new and rarely implemented in on-premises solutions, entire suites of test cases have to be added for cloud solutions to verify the complete separation of tenants. Both design and test need to plan carefully around the multi-tenant architecture.
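Tenant separation is easiest to test when every data access path takes the tenant as an explicit key. The toy in-memory store below is a sketch under that assumption (class, key and tenant names are hypothetical), followed by the kind of cross-tenant visibility check an isolation test suite might contain.

```python
from typing import Dict, List, Optional

class TenantStore:
    """Toy in-memory stand-in for shared storage in which every read and
    write is keyed by tenant, so no call can reach another tenant's rows."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, tenant_id: str, key: str, value: str) -> None:
        self._rows.setdefault(tenant_id, {})[key] = value

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        return self._rows.get(tenant_id, {}).get(key)

    def list_keys(self, tenant_id: str) -> List[str]:
        return sorted(self._rows.get(tenant_id, {}))

def check_tenant_isolation(store: TenantStore) -> None:
    """Cross-tenant visibility check: identical keys must never leak across tenants."""
    store.put("acme", "salaries.xls", "acme data")
    store.put("globex", "salaries.xls", "globex data")
    assert store.get("acme", "salaries.xls") == "acme data"
    assert store.get("globex", "salaries.xls") == "globex data"
    assert store.get("initech", "salaries.xls") is None
    assert store.list_keys("initech") == []
```

Deliberately reusing the same key in two tenants is the point of the test: it catches code paths that look data up by key alone.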
Experience shows that although most customers come to the cloud for cost reasons, they still want to customize the solution as much as they used to with on-premises software. Facebook, My Yahoo and other Web 2.0 applications have set end-user expectations for some level of control and customization of the User Interface. Building in customization options, including basics like themes and skins, can help make a Cloud offering more compelling, but it has to be designed carefully. While we want to be smart and offer general customization options, we also want to be very careful about the incremental cost of supporting custom layout and especially custom functionality. Customization drives up cost, and since the cloud is a cost play in the first place, even small cost increments erode our ability to offer a competitive subscription rate. That's why we have to remain vigilant about cost control in all the choices we make. The cost advantage of cloud offerings stems in large part from multi-tenancy, and the more closely each tenant aligns with a common design, the more we can drive down cost to offer attractive subscription rates. It's important to stay in tune with both existing and prospective subscribers here to ensure we strike the right balance between customization and cost control, as also discussed in Cloud Difference #5: Provider controls the Stack. Avoiding customization cost may sound straightforward, but while we have focused on improving Total Cost of Ownership (TCO) for our on-premises software for years, it is new for most teams to focus specifically on the incremental delivery cost associated with customization.
Developing detailed cost metrics around the delivery operations is a new focus in the cloud, requiring new channels of cooperation between Web Delivery Operations, Development, and Finance, especially to put in place cost models that will allow development teams to evaluate and compare the implications of design alternatives, even before the coding work is done.
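As an illustration of the kind of cost model meant here, the sketch below spreads a shared platform cost across all users and adds a support cost for each customized tenant. The formula and every number in the usage example are hypothetical; real delivery cost models have many more inputs.

```python
def monthly_cost_per_user(shared_cost: float, tenants: int, users_per_tenant: int,
                          customized_tenants: int = 0,
                          cost_per_customization: float = 0.0) -> float:
    """Hypothetical model: shared platform cost plus per-tenant customization
    support cost, spread over every user on the multi-tenant platform."""
    users = tenants * users_per_tenant
    total = shared_cost + customized_tenants * cost_per_customization
    return total / users

# Hypothetical inputs: 100 tenants of 50 users each.
print(monthly_cost_per_user(100_000, 100, 50))             # 20.0
print(monthly_cost_per_user(100_000, 100, 50, 10, 2_000))  # 24.0
```

In this made-up example, customizing one tenant in ten raises the cost per user by 20%, which is the kind of comparison a team could make before committing to a design alternative.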
As cloud system administrators for all subscribers, cloud hosts have a powerful opportunity to direct end users’ eyeballs to anything displayed via the User Interface (UI), such as a survey, and can even vary content with a "question of the month", for example. This is very different from on-premises software, but cloud vendors need to be very careful to resist the temptation to leverage that eyeball ownership. Pop-up surveys served to the wider population of a subscribing company's users would be interruptive and cause dissatisfaction for both the subscribing company and its end users. Instead, we run a range of other programs to collect feedback from our subscribers, all of them based on subscribers opting in to the process themselves. Design Partner Programs, Customer Councils, Communities, Forums, Lab Advocates and more all serve this purpose. Some programs, like the Design Partner Program and the Lab Advocate Program, have limited participation because of the greater time commitment, whereas other programs, like Forums on developerWorks and Communities on Lotus Greenhouse, are open to all subscribers. Most of these programs extend across both our on-premises products and our cloud offerings. But all programs require the licensee or subscriber to agree to participate before we can engage and ask for feedback. That is in fact not a cloud difference, but rather a similarity. Even though UI ownership is different, the priority is still delivering a compelling service without undue interruptions.
Rapid delivery of enhancements and fixes is even more critical in the cloud. We can't carry a significant backlog of technical debt. The cost of switching providers is smaller in the cloud than on-premises, which means subscribers are quicker to switch. I want to be careful how that comes across. I'm not saying it's OK to carry a large backlog of technical debt for on-premises software. All I'm saying is that the consequences of doing so materialize more immediately in the cloud. It's important to drive down technical debt in all its many aspects: addressing warnings from automated code scanning tools, improving unit test code coverage, fixing defects, addressing enhancement requests, reducing complexity, refactoring, and more. For that reason, the IBM quality program has established a Technical Debt Governance model, which teams are now beginning to adopt to strengthen the focus on minimizing technical debt. Sonar is often the tool of choice to create a dashboard detailing technical debt, whether integrated with the IDE (Eclipse or similar) or with the build systems to produce post-build analysis. Allowing developers to see the implications of their code, whether it raises or lowers the overall technical debt, before they even check it into the source code management system is a powerful approach that motivates cleaner coding, more complete testing, and more complete responses to customer needs. In the cloud, it is also important to understand the range of usage models subscribers want, and to design the software to accommodate them all simultaneously, since it's a multi-tenant system. Technical debt includes the continuous adjustments needed to better accommodate the preferred usage patterns as they evolve.
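A sketch of the "does this change raise or lower technical debt" check: compare per-metric counts from scans taken before and after a change. The metric names are hypothetical, and this is not the Sonar API; in practice the numbers would come from whatever analysis tool the team uses.

```python
from typing import Dict

def debt_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-metric change between two scans; a positive value means more debt."""
    metrics = sorted(before.keys() | after.keys())
    return {m: after.get(m, 0) - before.get(m, 0) for m in metrics}

def raises_debt(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """True if any tracked metric got worse."""
    return any(change > 0 for change in debt_delta(before, after).values())

# Hypothetical metric counts from a scan before and after a change:
before = {"scan_warnings": 12, "uncovered_lines": 340}
after = {"scan_warnings": 10, "uncovered_lines": 355}
print(debt_delta(before, after))   # {'scan_warnings': -2, 'uncovered_lines': 15}
print(raises_debt(before, after))  # True
```

Reporting each metric separately, rather than a single score, shows the developer exactly which aspect of the change added debt.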
The effort of moving a particular subscriber company's existing data and users into a hosted cloud solution is referred to as 'onboarding'. It's essentially the cloud version of migrating from one solution to another. But there is one crucial difference: the data cross from one provider to another, typically from the subscribing company's prior on-premises solution to the SaaS vendor's cloud solution. Two very important considerations result. First, the volume of data to be migrated may be so large that a transfer via the internet would take unacceptably long. For that reason, we have other options available, such as shippable physical storage devices. Second, because the data come from another environment, the Cloud vendor can't assume they have been subject to the same level of virus and malware scanning required in the data center environment. For that reason, we scan subscriber data as part of the onboarding process. This is clearly necessary, as we have in fact found instances of 'unwanted' content in subscriber data submitted for onboarding. It is key to use scan engines updated with the latest virus protection information. With these and other safeguards not described here, we can offer a quick, proven onboarding process.
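A back-of-the-envelope calculation shows why network transfer can be impractical for large onboarding jobs. The function assumes decimal units and an 80% link efficiency; both the efficiency figure and the data volume and bandwidth in the example are hypothetical.

```python
def transfer_days(data_terabytes: float, link_mbps: float,
                  efficiency: float = 0.8) -> float:
    """Days needed to copy the data over a network link.

    Decimal units: 1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second.
    `efficiency` is an assumed fraction of nominal bandwidth actually achieved.
    """
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 86_400

# Hypothetical: 10 TB of mail data over a 100 Mbps uplink.
print(round(transfer_days(10, 100), 1))  # 11.6
```

Almost twelve days of saturated uplink, before any retries, is the kind of result that makes a shipped storage device the faster option.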
Post-release quality performance must be gauged sooner: on Day 1, Week 1, and Month 1. In the on-premises segment, we take much longer and typically compare release performance one year after release. We monitor quality performance all year, but the formal release-to-release comparison happens a year after release. We wait that long to ensure a significant percentage of the customer base has upgraded to the new release. That's not necessary in the cloud, where all users upgrade simultaneously, and there is only one release in production at any given time. We can and should compare releases after just one week, and after one month. Do we receive fewer support calls per user? What do subscribers call about? That quicker turnaround allows for much better feedback to the product team, which still has the release project experience fresh in mind and can design more meaningful improvements as part of the continuous improvement effort. The rapid feedback cycle is actually a tremendous advance for teams who used to deliver on-premises software. Getting almost instant feedback, rather than waiting for a year, enables teams to respond far more rapidly to customer needs.
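The Week 1 or Month 1 comparison boils down to normalizing support volume by the user population. The sketch below uses calls per thousand active users; the metric choice and the example figures are hypothetical.

```python
from typing import Tuple

def calls_per_thousand_users(calls: int, users: int) -> float:
    return 1000 * calls / users

def compare_releases(prev: Tuple[int, int],
                     curr: Tuple[int, int]) -> Tuple[float, float, float]:
    """Each argument is (support_calls, active_users) for the same window,
    e.g. week 1 after go-live. Returns both rates and the relative change."""
    before = calls_per_thousand_users(*prev)
    after = calls_per_thousand_users(*curr)
    return before, after, (after - before) / before

# Hypothetical week-1 figures for two consecutive releases:
before, after, change = compare_releases((420, 60_000), (350, 70_000))
print(before, after, f"{change:.0%}")  # 7.0 5.0 -29%
```

Normalizing matters because the subscriber base keeps growing between releases; raw call counts alone could hide an improvement or a regression.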
Users are not all local to your time zone. They may be spread across the globe. Not only that, but - even if they were all local - different subscribers have different usage profiles. Many office workers use their systems primarily 8-5. And software developers all night it seems :-). Retailers often use systems until 9 or 10 pm. Entertainment industry users may need their system well into the night. And realty and mortgage companies tend to use systems intensely on the weekend, when home buyers are most active. We have customers in all of those categories. This continuous system workload is another effect of both multi-tenancy and the global dispersion of the user base. There is never a good time for a maintenance window, not even on the weekend, if ‘maintenance window’ means an outage of the services. We deal with that - as do all cloud service providers - by minimizing the need for maintenance outages, so we can accomplish most updates without an outage, and only need to take the system off-line briefly in case updates need to be made to a database schema. (PS: For larger enterprises, this challenge applies in their on-premises environments as well. Only smaller enterprises operating in one or a few contiguous time zones have a workload variation with slow periods suited for a maintenance outage. In the cloud, there are no slow periods).
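The effect of global dispersion on maintenance windows can be checked with a small sketch: convert each subscriber profile's local usage hours to UTC and see which hours remain idle. The profiles in the example (office hours in London, Tokyo and New York, plus a West Coast retailer open until 10 pm) are hypothetical and ignore daylight saving time.

```python
from typing import Iterable, List, Set, Tuple

def active_hours_utc(local_start: int, local_end: int, utc_offset: int) -> Set[int]:
    """UTC hours covered by a local usage window [start, end), wrapping midnight."""
    length = (local_end - local_start) % 24 or 24  # equal start and end means all day
    return {(local_start - utc_offset + h) % 24 for h in range(length)}

def idle_hours(profiles: Iterable[Tuple[int, int, int]]) -> List[int]:
    """UTC hours in which no (start, end, utc_offset) usage profile is active."""
    busy: Set[int] = set()
    for start, end, offset in profiles:
        busy |= active_hours_utc(start, end, offset)
    return sorted(set(range(24)) - busy)

# Hypothetical profiles, ignoring daylight saving time:
print(idle_hours([(8, 17, 0)]))  # a single London office: UTC hours 0-7 and 17-23 idle
print(idle_hours([(8, 17, 0), (8, 17, 9), (8, 17, -5), (9, 22, -8)]))  # []
```

A single office has plenty of quiet hours, but just four plausible profiles spread across time zones already leave no idle hour at all.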
The discipline of monitoring is not new, but it takes on renewed importance in the cloud. Our on-premises customers monitor their environments too. Some applications have built-in monitoring capabilities, such as Domino Domain Monitoring (DDM) probes. Others may use separate Tivoli monitoring tools like IBM Tivoli Composite Application Manager (ITCAM), open source tools like Nagios, or even third-party monitoring services. The cost reduction focus for cloud offerings drives a need for effective monitoring, as we run systems closer to their capacity limits than is typical in on-premises environments. Monitoring has many targets in a complex environment, such as availability, basic resources (memory, CPU and disk utilization), queue lengths, bottlenecks, application-level parameters, response times, log entries, and more. Building more sophisticated application-level monitoring capabilities for our hosted LotusLive (SmartCloud) environment also benefits our on-premises products, which integrate the same monitoring capabilities to make them available to on-premises customers as well.
We expand monitoring systems intelligently to go beyond mere notification that a threshold has been exceeded, or a specific service is no longer available, to predictive capabilities. In basic resource monitoring of CPU, memory, and disk utilization, for example, alerts can be set for adverse trends of particular resources, which warrant a closer look before a potentially adverse incident occurs in the environment. Similarly, analysis of log entries recorded ahead of observed incidents in both test and production environments can help us develop predictive capability, so the application level monitoring too can be used to alert us before an incident occurs, not just after it has already occurred. That way, preventive action can be taken to avoid the incident. We have built, and continue to extend, a sophisticated set of monitoring mechanisms around our LotusLive (SmartCloud) offerings, which helps us take preemptive action and keep systems operational.
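A minimal sketch of trend-based alerting, assuming periodic (hour, percent used) samples for a resource such as disk: fit a least-squares line and alert if the projected time to reach capacity falls inside a warning horizon. The 48-hour horizon and the sample data are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

def hours_until_full(samples: List[Tuple[float, float]],
                     capacity: float = 100.0) -> Optional[float]:
    """Fit a least-squares line to time-ordered (hour, percent_used) samples and
    project how many hours after the last sample usage reaches `capacity`.
    Returns None when there is no upward trend to extrapolate."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    return (capacity - intercept) / slope - samples[-1][0]

def trend_alert(samples: List[Tuple[float, float]], horizon_hours: float = 48.0) -> bool:
    """Alert before the threshold is crossed, when full is projected within the horizon."""
    eta = hours_until_full(samples)
    return eta is not None and eta <= horizon_hours

# Hypothetical disk samples growing 2 points per hour from 50%:
disk = [(0, 50), (1, 52), (2, 54), (3, 56)]
print(hours_until_full(disk), trend_alert(disk))  # 22.0 True
```

A plain threshold alert at, say, 90% would stay silent here for another 17 hours; the trend alert fires now, while there is still time for preventive action.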
Another key aspect of monitoring for cloud systems is the differentiation between monitoring in the data center and monitoring from representative end user locations to "see what users see". Proxies, edge caching, network routes and latencies, etc, all contribute to a different end user experience than seen when monitoring on the systems themselves. It is essential to do both. After the earthquake in Japan in March of this year (2011), a disk array in a network acceleration router in Tokyo gradually deteriorated over a span of several days. It worked fine immediately after the quake, so it was assumed at first that it was not damaged. Monitoring solely at the data center location would not have helped us identify that the subset of users accessing the system via this router was affected a few days later.
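A sketch of comparing results from several end-user vantage points, assuming each location periodically runs a synthetic transaction and reports a response time in seconds, or `None` for a failure. The thresholds and location names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

def degraded_locations(probes: Dict[str, List[Optional[float]]],
                       max_failure_rate: float = 0.2,
                       latency_factor: float = 2.0) -> List[str]:
    """Flag end-user vantage points that look worse than the rest.

    `probes` maps a location to recent synthetic-transaction results: a response
    time in seconds, or None for a failed request. A location is flagged if its
    failure rate exceeds `max_failure_rate`, or if its median response time is
    more than `latency_factor` times the median of all locations' medians.
    """
    flagged = set()
    medians = {}
    for location, results in probes.items():
        ok = [r for r in results if r is not None]
        failure_rate = 1 - len(ok) / len(results) if results else 1.0
        if failure_rate > max_failure_rate:
            flagged.add(location)
        if ok:
            medians[location] = median(ok)
    if medians:
        baseline = median(medians.values())
        flagged |= {loc for loc, m in medians.items() if m > latency_factor * baseline}
    return sorted(flagged)

# Hypothetical probe results:
print(degraded_locations({
    "Boston": [0.30, 0.40, 0.35],
    "London": [0.40, 0.45, 0.50],
    "Tokyo": [2.10, None, None],
}))  # ['Tokyo']
```

Comparing each location against the others, rather than against a fixed number, is what surfaces a regional problem like the Tokyo router even while the data center itself looks healthy.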
Similar concerns can play out for global enterprises with geographically dispersed users in their on-premises environment. The monitoring challenges are in many ways similar between cloud services and large scale on-premises environments. The main differences derive from the scale and the variability of network routes caused by most of the traffic traversing the internet.
Or more to the point, they are forcefully upgraded to it on day 1, so minimizing defect deferrals is really important. Well, isn't that always important, you might say? In general, yes, of course it is, but the Cloud and On-Premises businesses differ. A mode of cooperation has evolved in the on-premises business where, for better or worse, releases numbered .0 [dot zero] are often not rolled out in enterprise production environments. It's not that we or other software vendors don't stand behind, or thoroughly test, our .0 releases. But many of our enterprise customers want to get their hands on the new functionality in the latest release, so they can set it up in a trial environment internally and start becoming familiar with it. They inherently, though rarely explicitly, accept that while we obviously run the full regression test suites against .0 releases, usage patterns for the new functionality may not be fully known before the software ships, so test coverage may leave gaps, especially in areas of unforeseen configurations and usage. Early use in trial environments allows identification of problematic configurations, which in turn allows us to harden the code by fixing issues in the .1 [dot one] maintenance release. Some would say this should happen via beta testing prior to release, and it does to some extent, but full trial coverage doesn't happen until we put the .0 release out. Customers using the software in a standard configuration with traditional usage patterns are normally fine with the .0 release. It is in the more [IT] innovative enterprises, which push the envelope with unique configurations and usage patterns that leverage the newest functionality, that we find the most .0 defects. As a quality engineer, I strive constantly to improve the development and test processes to reduce defect rates, but I'm not blind to the symbiosis with innovative enterprises at play here in On-Premises environments.
In the Cloud space, the issue of defects in a first release of new function takes on a different level of importance, for at least two major reasons. First, all users upgrade on the same day. It's not just the innovative, bleeding-edge customers, who are willing to encounter and resolve issues in order to engage with new function early, who adopt the release. It is everybody, including lots and lots of users with lesser software skills than the bleeding-edge experimenters. It really requires a different mindset. Average users need rock-solid reliability to do their day job. They perhaps care less about new functionality, but they care much, much more about reliability than the experimenters do. Many Cloud providers, including us, have a legacy background in offering on-premises software products, and for those providers, this is a difference. We need to take to heart that the old balance doesn't apply any more. Release criteria need to be tighter. Defect deferrals fewer. Test coverage wider.
Some Cloud vendors allow multiple releases to be in production simultaneously, but that is not the case for our offerings (LotusLive, SmartCloud).
PS: The blond character above is 'Fletcher', whom I have recruited to illustrate several of the Cloud differences in this intended series. Fletcher is my avatar in an internal comic strip used occasionally in our corner of IBM. I am grateful to my creative colleague, Jennifer Kelley, for coming up with Fletcher.
If a failure, such as a crash or hang, occurs in an on-premises environment, the licensee usually keeps that information out of the public domain in order to protect their enterprise reputation. In the cloud, the situation is quite different. A crash or hang will affect a number of subscribers from multiple companies. The media are quick to report an outage, and the reputational loss is incurred by the Cloud vendor, not by the subscriber. In fact, subscribers even have an interest in the media helping put pressure on the Cloud vendor to stabilize the system and prevent recurrence. For that reason, the vendor needs to be prepared to offer clear, transparent updates in case of an outage incident. It's obviously too late to design customer communication channels once an incident has occurred. Whether by e-mail distribution or a stream of web site postings, it is paramount to offer subscribers insight into the extent and expected duration of an outage. The first things customers look for in an outage are an account of affected users and an estimated time of repair and restart. But a reliable estimate is often not available until the cause of the outage has been understood. In those first minutes (or, heaven forbid, the first hour) of an outage, another key piece of information to share is the extent of the outage. Subscribing companies will be receiving internal phone calls from their end users inquiring about the outage. They may learn from those calls that they have users affected in locations A, B and D, but that doesn't tell them whether or not the system is available from locations C, E and F, nor even whether ALL users at locations A, B and D are affected. Rather than leaving them to find the answer by contacting end users, we need to be able to communicate in a timely manner which users are affected.
For example, in the unlikely event that both the primary and the secondary server of a particular mail cluster are incapacitated, we need to identify the subset of users served by that cluster and communicate who is affected. And sooner rather than later, we need to give an outlook for the time needed to repair and restart. We work hard to avoid ever getting into such a situation, but if we do, communication plans need to be ready in advance.
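Answering "who is affected?" quickly is much easier if the mapping from users to the clusters that serve them is already queryable before anything breaks. The sketch below is a minimal illustration, assuming a hypothetical directory extract in which each user record carries a company, a location, and the serving mail cluster; `USERS` and `affected_by_cluster` are invented names for this example, not part of any LotusLive tooling.

```python
from collections import defaultdict

# Hypothetical directory extract: user -> (company, location, serving mail cluster).
USERS = {
    "alice@acme.example": ("Acme", "A", "mail-c1"),
    "bob@acme.example": ("Acme", "C", "mail-c2"),
    "carol@acme.example": ("Acme", "A", "mail-c1"),
    "dan@globex.example": ("Globex", "B", "mail-c1"),
}


def affected_by_cluster(users, failed_clusters):
    """Group the users served by any failed cluster by company and location."""
    report = defaultdict(lambda: defaultdict(list))
    for user, (company, location, cluster) in users.items():
        if cluster in failed_clusters:
            report[company][location].append(user)
    # Convert to plain dicts with sorted user lists so the report is stable to send out.
    return {
        company: {loc: sorted(names) for loc, names in locations.items()}
        for company, locations in report.items()
    }


print(affected_by_cluster(USERS, {"mail-c1"}))
```

With cluster mail-c1 down, Acme learns that only its location A users are affected (its location C user is served elsewhere), and Globex gets its own, separate list. Each company only needs to see its own entry.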
A whole new aspect native to the Cloud space is deciding what operational information to share with subscribers regarding your environment, how often to share it, and how to share it. This is a customer relationship management issue, and it's important because transparency promotes trust. The points about failures and information sharing in the prior blog post are very much a part of building and maintaining that trust. But in day-to-day operations as well, not just in an outage situation, cloud subscribers need to be able to trust their cloud service provider. They need to be able to work with their provider on a range of topics, getting the information needed to make decisions and to support their end users. If the provider is holding back and not sharing information easily, or not being timely with needed information, there is a gap in trust, or just plain responsiveness, which can prevent a deal. It's important to prove as a provider that we are prepared to share the right information, at the right time, with the right frequency, in the right channel, and in the right language. Information that applies to the entire service, like system availability history, might be shared publicly, while other information, such as usage statistics for a specific company, is naturally restricted to just that subscribing company. Our LotusLive (SmartCloud) system publishes system status information, but not yet history. We have developed a system status dashboard that includes history, but we need to automate a few data flows before we put it into production with daily updates. The onboarding process, one aspect of which was mentioned in Cloud Difference #16, “Onboarding requires malware scan”, is another example of the importance of clear and timely communications.
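One of the data flows behind an availability history is turning a list of recorded outages into a per-period availability figure. The sketch below is an assumption about how such a calculation could look, not a description of our dashboard: `availability_percent` is a hypothetical helper that clips outages to the reporting period and merges overlapping ones, so that two overlapping incident records are not counted as double downtime.

```python
from datetime import datetime


def availability_percent(period_start, period_end, outages):
    """Percentage of [period_start, period_end) during which the service was up.

    outages is a list of (start, end) datetimes; intervals may overlap each other
    or extend beyond the period, and are clipped and merged before counting.
    """
    total = (period_end - period_start).total_seconds()
    # Clip each outage to the reporting period and drop ones that fall outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if min(end, period_end) > max(start, period_start)
    )
    down = 0.0
    merged_start = merged_end = None
    for start, end in clipped:
        if merged_end is None or start > merged_end:
            # Disjoint from the current merged interval: bank it and start a new one.
            if merged_end is not None:
                down += (merged_end - merged_start).total_seconds()
            merged_start, merged_end = start, end
        else:
            # Overlapping outage: extend the current interval instead of double counting.
            merged_end = max(merged_end, end)
    if merged_end is not None:
        down += (merged_end - merged_start).total_seconds()
    return 100.0 * (total - down) / total


day, next_day = datetime(2012, 5, 1), datetime(2012, 5, 2)
outages = [
    (datetime(2012, 5, 1, 1, 0), datetime(2012, 5, 1, 1, 30)),
    (datetime(2012, 5, 1, 1, 15), datetime(2012, 5, 1, 1, 45)),
]
print(availability_percent(day, next_day, outages))  # 45 of 1440 minutes down -> 96.875
```

Calling the same function once per day yields the daily series a public status history would display.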
It sounds simple enough, but if you come from a legacy background of providing on-premises software, chances are these communication channels were not designed with cloud services in mind, and there is work to do in order to adapt and put adequate mechanisms in place.
A key aspect of any software user experience is the response time from submission of a request until the result is returned, usually in the form of an updated User Interface (UI). In the lab, where we develop new releases, we measure response times for a variety of transactions, such as opening a community, downloading a file, or sending an e-mail, under a variety of workloads. These are system response times. For development purposes and comparison between releases, it makes sense to eliminate all other parts of the total end user response time. But what matters to the subscriber is the end user response time, and that includes not just the system response time, but also the time needed to communicate across the network between the system and the user. For a cloud service, because that communication takes place across the internet, response times are less predictable. Depending on the nature of the protocols used in any given transaction, and the network latency between the user and the system, that added piece of response time may vary with end user location. Furthermore, network latency is not a constant; it's not merely a function of the end user's physical and logical location on the network. It also depends on the network load at the time you measure latency, and on any content caching in place. So end user response times depend on at least the type of transaction being submitted, the route taken across the internet, network workloads at the time, caching conditions, and system workload at the time. The cloud service provider has no control over several of these. As a result, it is not possible to offer a map of end user response times based on location. Some cloud service providers offer a speed test tool, which will determine the latency between the user and the service, but the results can only be taken as general guidance.
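To make the dependence on transaction type and location concrete, here is a deliberately simplified model. It is an assumption for illustration, not how any of our measurements are computed: it treats network cost as the number of protocol round trips times the round-trip latency, plus transfer time for the payload over the slowest link, and it ignores jitter, routing changes, caching, and server load variation. `estimate_end_user_ms` is a hypothetical name.

```python
def estimate_end_user_ms(system_ms, round_trips, rtt_ms, payload_kb=0.0, bandwidth_kbps=None):
    """Very rough end-user response time estimate in milliseconds.

    Assumes network cost = round trips x round-trip latency, plus payload transfer
    time over the slowest link. Ignores jitter, routing, caching, and load variation.
    """
    network_ms = round_trips * rtt_ms
    if bandwidth_kbps:
        # kilobytes -> kilobits, divided by kilobits per second -> seconds -> milliseconds
        network_ms += payload_kb * 8 / bandwidth_kbps * 1000
    return system_ms + network_ms


# Same transaction (200 ms on the server, 4 round trips), two user locations.
for label, rtt in [("near the data center", 30), ("another continent", 250)]:
    print(label, estimate_end_user_ms(200, 4, rtt))
```

Even in this toy model, the same 200 ms system response time becomes 320 ms for one user and 1200 ms for another, and a chattier transaction with more round trips would widen that gap further.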
To offer a realistic impression of system response times in a pre-sales situation, the provider needs to offer time-limited trial accounts, which the prospective subscriber can exercise in ways (timing, location, etc.) that emulate the intended use. And for development purposes, we need to test the service from locations representative of various subscribers' conditions. Some of these locations can be simulated with systems that add latency to network communications. The advantage of simulated latency is the ability to control it, so we can compare successive releases running under identical latencies. As discussed in Cloud Difference #19: Monitoring is Central, providers need to "see what users see". That thought surely extends to response times, but due to the nature of the internet and the protocols that route traffic across it, two identical transactions submitted in rapid succession can have different response times because they were routed along different paths to reach the data center. For that reason, end user response times are generally statistical measures: average response times, 90th percentile response times, or similar. We measure from select strategic points around the globe, which gives us a best-case estimate for response times from those locations. But that still doesn’t include what is often referred to as the "last mile": the network segments between the subscriber and the nearest low latency segment of the internet. That last mile may consist of both low bandwidth internet segments and segments within the subscribing company's own intranet, between the end user and their company's web-facing proxy servers or gateway. We can simulate a generic 'last mile', but every company has different network characteristics.
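Because individual measurements vary, response times are reported as statistics over many samples. The sketch below computes the two measures mentioned above, the mean and the 90th percentile, using the nearest-rank percentile method; the sample values are invented for illustration, and other percentile definitions (interpolating ones, for example) give slightly different numbers.

```python
import statistics


def summarize_response_times(samples_ms):
    """Return the mean and the nearest-rank 90th percentile of response time samples."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank method: the smallest sample with at least 90% of samples at or below it.
    rank = -(-90 * n // 100)  # integer ceiling of 0.9 * n
    return {"mean": statistics.mean(ordered), "p90": ordered[rank - 1]}


# Ten runs of the same transaction; one of them took a badly congested route.
samples = [120, 95, 410, 130, 101, 99, 118, 2300, 125, 110]
print(summarize_response_times(samples))
```

Here eight of the ten runs finished in 130 ms or less, yet the single slow run pulls the mean up to about 361 ms, while the 90th percentile (410 ms) is less sensitive to that one outlier. That is why both measures are worth tracking.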
So we address response times at three different locations: (i) at the system itself, (ii) across the network, including a set of strategically placed edge caching servers, and (iii) at end user locations within subscribing companies. For the first two locations, we can and do take action to improve response times, by optimizing our services code and by optimizing caching design and the server network. However, in the "last mile" scenario, we typically do not have access to the subscriber's internal network, so it's the subscriber, rather than the provider, who must take action in cases where significant latencies in the last mile affect end user response times. We have seen examples of subscribing companies' internal environments intercepting packets and interfering with our services and response times. These were all configuration issues that could be worked out, but they explain why the provider usually cannot make a blanket statement as to what end user response times will be. Detailed configurations (security scanning, proxy configuration, bandwidth, etc.) in the subscribing company's network affect end user response times.
Solution performance has two key aspects. One is end user response times, as discussed in the prior post; the other is capacity management, which is the discipline of predicting workloads and sizing systems to ensure reasonable system response times given the number of users accessing the systems. Capacity management differs between on-premises environments and cloud solutions like LotusLive (SmartCloud) in one pivotal way: in the on-premises world, each server is usually looked at in isolation. It serves a specific purpose (file server, database server, mail server, etc.) for a specific set of users with reasonably consistent work patterns. Single Sign-On (SSO) systems notwithstanding, it is often sufficient to size and scale each service independently, and we typically determine or plan for a concurrency rate, or percentage of users active at any given instant, for each individual service. In our LotusLive cloud environment, we need to look at the combined solution of all services for capacity management because of the integration points and dependencies between components. For example, every meeting user goes through the same authentication mechanism used by those accessing file sharing, communities, chat services, and more. And a meeting moderator may be spawning transactions to the Files service by sharing a file in a web conference, even though their end user experience is "conducting a meeting". For that reason, we need to determine concurrency rates across services, not just individually for each service. There is also a need to track resource consumption (CPU, I/O, memory) and analyze trends in real time, in order to predict when additional virtual machines (VMs) need to be spun up. For example, internal operational data give us a pretty clear idea of the CPU consumption level at which service quality starts to deteriorate gradually.
That allows us to spin up an additional image in order to serve a growing workload and avoid degrading service quality. The multi-tenant nature of the cloud solution means that the user base is constantly growing, and is not nearly as stable as in most on-premises environments. Mergers and acquisitions can lead to significant changes in user base and workload in on-premises environments. In the cloud, we live that change every day as new companies sign on to use the services and add potentially large numbers of users to regional data centers. Virtualization technology is what enables us to quickly provision as many servers as needed to service rapid growth in workloads. And good capacity management keeps capacity ahead of the growing demand.
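Trend analysis of this kind can be as simple as fitting a line to recent CPU samples and asking when it will cross the level at which service quality starts to degrade. The sketch below is a minimal illustration using an ordinary least-squares fit; the threshold of 70% and the sample data are hypothetical placeholders, since the actual level comes from our internal operational data, and a real system would also account for daily usage cycles rather than assume a straight-line trend.

```python
def hours_until_threshold(samples, threshold_pct):
    """Fit a least-squares line to (hour, cpu_percent) samples and estimate how many
    hours after the latest sample the trend reaches threshold_pct.

    Returns None when the trend is flat or falling.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need samples taken at two or more distinct times")
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    slope = cov / var_t
    if slope <= 0:
        return None
    intercept = mean_c - slope * mean_t
    crossing_t = (threshold_pct - intercept) / slope
    latest_t = max(t for t, _ in samples)
    # Already at or past the threshold means there is no time left: 0 hours.
    return max(0.0, crossing_t - latest_t)


hourly_cpu = [(0, 40), (1, 45), (2, 50), (3, 55)]  # hypothetical hourly averages
print(hours_until_threshold(hourly_cpu, 70))
```

If the estimate (3 hours here) is shorter than the time it takes to provision and warm up another VM, the time to spin one up is now.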
It's in the nature of a multi-tenant system that the various system logs will contain information pertaining to users from different subscribing companies. This sets up a potential conflict when a subscriber needs a log capture, e.g. to respond to a data subpoena, for troubleshooting, or for other reasons, and for confidentiality reasons we can't disclose information about other tenants. We need to be prepared to quickly scrub a log of information about other tenants' transactions and deliver a subset of the log showing just those entries that are relevant to the customer requesting the log. That subset of the log is what we generally refer to as a journal.
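Producing a journal is essentially a filter over the log, keyed on tenant. The sketch below assumes structured logs in JSON Lines form where every entry carries a "tenant" field; that format and the `extract_journal` name are assumptions for illustration. Lines that cannot be attributed to a tenant are excluded rather than included, erring on the side of confidentiality. Real logs are messier: a single entry can reference users from several companies (a cross-company meeting, say), which calls for field-level redaction on top of this entry-level filtering.

```python
import json


def extract_journal(log_lines, tenant_id):
    """Return the log entries that belong to tenant_id, dropping everything else.

    Assumes JSON Lines logs where each entry carries a "tenant" field. Lines that
    cannot be parsed or attributed are excluded rather than risk disclosing
    another tenant's data.
    """
    journal = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("tenant") == tenant_id:
            journal.append(entry)
    return journal


log = [
    '{"ts": "2012-05-01T09:00:00Z", "tenant": "acme", "event": "login", "user": "alice"}',
    '{"ts": "2012-05-01T09:00:02Z", "tenant": "globex", "event": "login", "user": "dan"}',
    "free-text line with no tenant attribution",
    '{"ts": "2012-05-01T09:01:10Z", "tenant": "acme", "event": "file_upload", "user": "carol"}',
]
for entry in extract_journal(log, "acme"):
    print(entry["ts"], entry["event"], entry["user"])
```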
We also need to specify the retention time for logs. That's a decision on-premises customers make themselves, but for a Cloud system we need to set that value, and set it so that all subscribers are satisfied and compliance requirements are met. There is naturally only one log retention period, and it applies to all subscribers.
Finally, we carefully manage access to production system logs. In the interest of our subscribers, only production system administrators can access these logs. Developers needing to extract log information to troubleshoot on behalf of a customer, even though their work is customer requested, need to go through an exception process, demonstrate their legitimate need for the information, and have an extract provided to them. This is not vastly different from on-premises environments, where log access is also controlled. The key difference in the cloud is that the server logs contain information generated by multiple tenants, and we need a repeatable mechanism to filter logs and provide single-tenant extracts.
When the media ask whether cloud computing is ready for prime time, the key topics are resilience and security. Backup and restore capabilities play an important role for resilience. The ability to recover from adverse events, whether natural disasters, sabotage, disk failures, or other causes, needs to be broader and more granular than it typically is for on-premises customers. The reasons for this include the concentration of a high number of users served from one data center and the multi-tenant nature of the system. This means we need the ability to restore not just the whole data center, but also individual companies, individual servers, or individual users, depending on what parts of the system were affected by the disaster. If a limited disaster has left an individual company or user 'corrupted', we don't want to have to do a system level restore affecting all users and/or all companies in order to recover those who were corrupted. Rather, we want to be able to perform a restore operation for only the affected companies, servers, or users. Per-tenant backup and restore capability is an important idea unique to multi-tenant cloud environments, though it is not generally implemented and automated.
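The difference between a system-level and a per-tenant restore comes down to which parts of the state get rolled back. The in-memory sketch below shows only that selection logic, assuming a hypothetical layout where each tenant's data can be addressed independently in both the live system and the backup; a real system would operate on databases, mail files, and file stores, and the same idea extends down to individual servers or users.

```python
import copy


def restore_tenants(live, backup, affected):
    """Return a new state in which only the affected tenants are rolled back to the
    backup copy; every other tenant keeps its current (newer) data.

    live and backup map tenant id -> that tenant's data.
    """
    missing = [t for t in affected if t not in backup]
    if missing:
        raise ValueError(f"no backup available for: {', '.join(sorted(missing))}")
    restored = copy.deepcopy(live)
    for tenant in affected:
        restored[tenant] = copy.deepcopy(backup[tenant])
    return restored


backup = {"acme": {"mail": ["m1", "m2"]}, "globex": {"mail": ["g1"]}}
live = {"acme": {"mail": None}, "globex": {"mail": ["g1", "g2"]}}  # acme's data got corrupted
print(restore_tenants(live, backup, {"acme"}))
```

Acme gets its backed-up mail back, while Globex keeps the message it received after the backup was taken, which a whole-system restore would have thrown away.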
Enablement materials help the licensee's or subscriber's end users, help desk, and administrators understand how to leverage the capabilities at their fingertips. In the on-premises world, those materials ship with a new release, or are posted to the web shortly after. Yeah, I sometimes battle people's interpretation of 'shortly'. But you know what I mean. Enablement materials are generally launched with the (on-premises) product, which is natural because minor changes in a release can happen very late in the release project cycle, and we want to keep the enablement materials in sync with the code. However, in the enterprise Cloud space we rely on the subscribing company's internal help desk to handle end user calls. That's not in and of itself too different from what we do on-premises; the real difference is that the customer's help desk receives calls from end users on the first day a new feature has gone Live in our hosted production environment. In the on-premises world, there is time to enable the customer's help desk between the day a new release ships and the day the customer upgrades to it. That window shrinks to zero in the cloud world. So to help customers serve their end users well, we need to enable customers' help desks on new releases BEFORE they actually go live. For that reason, we generally provide "What's New" documentation in multiple languages to our subscriber companies several weeks in advance of the Go Live date, giving them the opportunity to ensure their help desk is ready to answer calls on the new release from day 1. The earlier delivery of the enablement materials imposes a freeze on release content; once the enablement materials have been shared with customers, we have to keep release content from changing.
Another aspect of pre-release enablement is our User Acceptance Test environment, which gives select administrators from subscribing companies access to exercise pre-release code and get familiar with new function before it is launched into the production service. The word "select" indicates that we're not pushing administrators to leverage the environment. We're working with those who express the desire to prepare for the new release in greater detail. The administrators who do are often from large enterprises with large numbers of users and an internal help desk that needs to be enabled in advance of the Go Live date.
Point releases concentrate risk. And in the cloud, that's not smart. The point release mindset no longer applies when you operate in the cloud. In the on-premises world, we bundle literally thousands of code changes into a single point release. The aggregated value of the many code changes compensates for the effort an on-premises customer needs to expend in order to update servers, clients, directories, etc. in their global environment. In the cloud, there is little to no effort by the customer to upgrade to the new release. So there is no particular reason to concentrate all the code changes in a point release. Instead, we can spread the risk by deploying in much smaller increments, one component at a time. Architectural separation of the solution components is pivotal to building an environment in which we can update one component without requiring a simultaneous update of another. Since our LotusLive (IBM SmartCloud) system in large part hosts code that was originally developed for on-premises products, where architectural separation of components was a lesser concern, we are now making a series of changes to further separate components. In that process, we are also reducing code complexity, a great benefit to go along with the risk reduction. This all leads to more frequent, and smaller, releases. We are now seeing our collaboration services deploy so-called Tune-Ups roughly monthly. They can contain both fixes and new function. Whenever new function is included, there is a need to enable the subscribers with information about the new functionality and how to use it. That's a good reason to group the changes by component, refreshing one component at a time, rather than issuing a number of scattered updates across all the components. The stream of updates still has to be consumable by the subscribers, who will often want to update their help desks for each change. A random stream of changes would be confusing to both help desks and end users.
Groups of changes around a particular component or theme are much more consumable. This 'grouping' strategy requires a reasonably rapid release cycle, so no single component has too long a cycle to bring updates to the production environment.
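The grouping strategy can be pictured as batching the pending change stream by component, so that each tune-up brings subscribers one coherent set of changes. The sketch below is a hypothetical illustration: the change IDs and component names are invented, and deciding which batch goes first is a separate planning decision that the sketch leaves to simple order of first appearance.

```python
def plan_tune_ups(changes):
    """Group (change_id, component) pairs into one batch per component.

    Components appear in order of their first pending change; each batch is a
    candidate for a separate tune-up, so subscribers see one coherent set of
    changes at a time instead of a scattered stream.
    """
    batches = {}
    for change_id, component in changes:
        batches.setdefault(component, []).append(change_id)
    return list(batches.items())


pending = [
    ("fix-101", "Files"),
    ("feat-7", "Meetings"),
    ("fix-102", "Files"),
    ("feat-8", "Meetings"),
    ("fix-230", "Mail"),
]
for month, (component, ids) in enumerate(plan_tune_ups(pending), start=1):
    print(f"tune-up {month}: {component} -> {ids}")
```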
All that effort for traditional software products over the years to configure test environments that emulate on-premises customer environments, to prioritize test scenarios, to gauge what platforms customers use the most, and to understand their usage patterns: all of that becomes so much simpler in the Cloud space. We own the production environment and know exactly how it is built. There is only one production environment architecture, even if it is duplicated across multiple data centers. We can bring test environments as close as we want to the production environment in terms of topology, configurations, settings, data population, workloads, and usage patterns. There will always be attempts to cut corners and save expense, but net/net, we know EXACTLY what the production environment looks like, and we can even know how it is being used. We just have to peruse monitoring results to find out. Test environment parity is important to ensure our testing is representative of production environment behavior. In our test environments, we continuously act to keep parity as close as possible. We have updated load balancers, anti-virus software, memory configurations and more, to keep our test environments in sync with how the production environment evolves. We have created a process by which the test environment owners are notified in advance when changes to the production environment are being planned. Simply telling them when changes are made is not sufficient, as it may take time to plan similar changes for the test environments. There might be hardware or software licenses to acquire, or there might be schedule conflicts with ongoing test efforts to resolve before we can execute the update. That is why early notification is necessary. This is simply common sense, but also a great advantage for test engineers, who can gain more insight into usage patterns in the Cloud environment than they are used to from the on-premises world.
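Parity tracking can be partly mechanized by diffing configuration snapshots. The sketch below compares two flat dictionaries of settings and reports every difference, including settings present in only one environment. Running it against the planned production configuration, rather than only the current one, is what gives test environment owners the advance notice described above. The setting names and values are hypothetical examples, not our actual configuration.

```python
def parity_drift(production, test):
    """Return {setting: (production_value, test_value)} for every setting that differs
    between two flat configuration snapshots, including settings present in only one."""
    drift = {}
    for key in sorted(production.keys() | test.keys()):
        prod_value, test_value = production.get(key), test.get(key)
        if prod_value != test_value:
            drift[key] = (prod_value, test_value)
    return drift


# Hypothetical snapshots: the production change is planned, not yet made.
planned_production = {"load_balancer_firmware": "2.1", "antivirus_engine": "8.1", "jvm_heap_gb": 8}
test_env = {"load_balancer_firmware": "2.1", "antivirus_engine": "8.0", "jvm_heap_gb": 4, "debug_logging": True}

for setting, (prod, test) in parity_drift(planned_production, test_env).items():
    print(f"{setting}: production={prod!r} test={test!r}")
```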
Although we don't necessarily own the source code for every layer in the stack, we do control what products are used in the stack. Stack management offers us both the opportunity and the responsibility to be smart about our choices for optimum subscriber benefit. For example, our solution involves multiple web server instances, so for the sake of limiting complexity it makes sense to standardize on one particular web server for all instances. Some schools of thought argue that each web server instance is used slightly differently, and that the choice of web server software should consequently differ to achieve the most efficient solution possible. In my view, stability and cost are both top priorities served better by standardizing on one common web server. Reducing complexity leads to higher quality.
This applies not just to the choice of web server software, but more broadly to usage patterns as well. On premises, we often tell customers all the ways they CAN do things, while in the Cloud we may want to focus on the optimal way they SHOULD do things. That's because the cloud is a cost play, and the more variation we support in usage patterns, the higher the cost. That's not to say we need to whittle our options down to one single usage model for all, but we do need to strike a balance that is different from on-premises software. I don't pretend that we'll always know what usage model best serves each customer. We need to offer our users choice. What I am also saying is that we need to take the enormous variability in the on-premises world and help our cloud customers condense that into a reasonably limited set of usage models, because that allows a cost saving. Just imagine the cost if we each drove our own custom-designed and custom-built car. For cost reasons, the market settles on a reasonably limited set of models. And as long as we share the associated cost savings with our customers, we'll be fine. But if we overstep and drive condensation only meant to boost our own margins without passing along savings to the customer, we will stumble in a competitive landscape. The focus must still be on providing a compelling service at an attractive price point.
The prior post stated that "the cloud is a cost play", but that's not really all the cloud is about. The ultra-fast provisioning enabled by virtualization technology can be leveraged to transform the way users are able to work. For example, where most on-premises collaboration environments are fundamentally about internal collaboration, cloud service providers can, if they want, implement ways of allowing guest accounts. The LotusLive, or IBM SmartCloud, service does this. You can invite your suppliers, partners, or customers to collaborate with you without having to pay for an incremental license for their account. That's a pretty powerful expansion of how you collaborate. It can transform your collaboration in ways your on-premises environment usually doesn't. Think about event planning, or an ad hoc analysis, or an acquisition, where you might need to share information with collaboration partners outside your company (conference center staff, lawyers, business partners, etc.) for a limited period of time. You don't typically give them access to your internal collaboration environment. But in a properly prepared cloud environment, it is easy to include them and control what they can and cannot access. Internally, we need to ensure special scrutiny is applied to such differentiating features, which our Sales teams are likely to lead with. In all fairness, there are drawbacks too. For example, we're limited in our ability to migrate subscribers' pre-existing Domino applications into a public cloud, because of the multi-tenant nature of the system. But that's why we offer both hybrid and private (single tenant) clouds as well. What can your cloud provider do for you? The options are many.
Whenever a high severity incident, such as a service outage, occurs in a cloud environment, repairing the system and bringing services back online is the immediate priority. But we also need to identify and eliminate the root cause of the problem. The root cause is the reason the problem was injected into the system. This is not to be confused with the immediate failure cause. Because the first priority is always to return services to normal, we first chase and correct the immediate cause of failure. In other words, we take a 'repair' action. However, that will often not prevent recurrence. For that we need to take a ‘corrective’ action as well. We have to get deeper and understand why the system entered the problem state. A cause is a root cause when elimination of the cause eliminates injection of the problem. That's what separates root causes from all other causes. By understanding and eliminating root causes through corrective actions, we can eliminate entire classes of defects or problems, rather than simply fixing the one defect or problem we discovered. But root causes are also more costly to eliminate, especially when they require a change in human behavior, such as a failure to follow written instructions. Monitoring and correcting human system administrator behavior takes time. That's why in the on-premises world, we tend to do more causal analysis, i.e. identifying clusters of similar problems/defects and targeting actions to reduce their occurrence, rather than doing RCA, i.e. determining the ultimate cause of each individual defect, which is time consuming. That balance between the more affordable causal analysis and the more effective, but costly, root cause analysis shifts toward RCA in the Cloud services space. To meet and exceed SLA targets, we simply cannot allow the same root cause to hit the availability number twice.
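To illustrate the difference in effort: causal analysis largely reduces to counting and ranking clusters of similar defects, which is cheap, while RCA asks "why" about each individual incident and does not reduce to a counting exercise at all. The sketch below shows only the causal analysis side, a simple Pareto ranking of defects by cause category; the category labels and data are hypothetical, and in practice choosing good categories is the hard part.

```python
from collections import Counter


def pareto_causes(defects, share=0.8):
    """Causal analysis sketch: return the most frequent cause categories that together
    account for at least `share` of all defects, most frequent first.

    defects is a list of (defect_id, cause_category) pairs.
    """
    counts = Counter(cause for _, cause in defects)
    total = sum(counts.values())
    selected, covered = [], 0
    for cause, count in counts.most_common():
        selected.append((cause, count))
        covered += count
        if covered >= share * total:
            break
    return selected


defects = [
    (1, "config"), (2, "memory"), (3, "config"), (4, "config"), (5, "docs"),
    (6, "memory"), (7, "config"), (8, "memory"), (9, "timeout"), (10, "config"),
]
print(pareto_causes(defects))
```

The two clusters returned cover 80% of these defects and tell us where to target preventive action, but they say nothing about why any single outage happened, which is the question RCA has to answer.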
Once an incident has occurred, it is relatively more likely to recur, because the triggering condition now exists, unless the root cause is eliminated. Thus, it is imperative to eliminate the root causes of any adverse incidents observed, whether or not they caused an outage. The first instinct of many teams once they understand 'what' went wrong is to add a test case to the pre-release test case suites used to qualify new releases. But defect removal as a strategy is almost always inferior to defect prevention, and certainly more costly. By broadening our understanding from 'what' went wrong to also see 'why' it went wrong, we can take a corrective action that eliminates all the potential future problems sharing the same root cause, the same 'why'. A recent out-of-memory condition I worked with provides an example. Adding tests and throttling workloads to the troubled component might solve an immediate problem, but we need to go upstream in the development process and understand why this out-of-memory condition was not prevented by coding better memory management in the first place. By so doing, we can prevent similar out-of-memory issues in all components across our solution. Root cause analysis views the development process as a software manufacturing engine, and when it turns out a defective product, there must be a flaw in the engine to be corrected. Maniacally identifying and correcting these flaws pays off by tuning our engine to become flawless, efficient, and effective. And in the cloud, that is paramount.
Because of the limited, pre-defined maintenance windows, release schedules are no longer flexible. In the on-premises world, we tend to pile on release content requested by customers, often planning for the maximum content that the initial project schedule can support. As the project progresses, scope changes occur, and ultimately we'll slide the release date by a month or two, if we have to, in order to deliver all the envisioned content with an acceptable level of quality. This works because, until the release is announced, the release date can be changed without much impact to our reputation, or to customers, as they haven't yet made plans based on a release date. But in the Cloud world, missing a targeted maintenance window means having to target a different, future window, potentially bumping another update which had targeted that window. LotusLive currently has two maintenance windows per month, but one is dedicated to security patches and critical Operating System updates, leaving one window per month for deploying new releases. Missing your targeted window and needing to shift to a future window means bumping all other releases one window further out. That's clearly not acceptable to other release streams. Sticking to an inflexible schedule is important. This, by the way, is one reason Agile development works well for cloud services. I want to make a key point here that, in the context of this description, maintenance windows are not necessarily the same as outage windows. Our outage windows are shrinking toward zero, but just because we can update the environment without outages, that doesn't mean we will do so frivolously. We will still organize updates in groups and communicate changes to customers in advance, as described in Cloud Difference #2: Enablement precedes launch. From a release perspective, we will still be dealing with planning toward a particular environment update, and that's what I have called maintenance windows above.
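With only one release window per month, a slip does not stay local. The toy model below is an assumption for illustration, not a description of our scheduling tools: each slot in `schedule` is one monthly release window, the slipped release moves one window later, and each release behind it is pushed out in turn until the cascade reaches a free window (or runs off the end of the plan). The release names are hypothetical.

```python
def bump_release(schedule, slipped):
    """Move `slipped` one window later and push later releases out as needed.

    schedule[i] is the release planned for maintenance window i (None = window free).
    The cascade stops at the first free window; if there is none, a new window is
    appended at the end.
    """
    i = schedule.index(slipped)
    new = list(schedule)
    carried = None  # the slipped release's original window becomes free
    for j in range(i, len(new)):
        new[j], carried = carried, new[j]
        if carried is None:  # a free window absorbed the cascade
            return new
    new.append(carried)
    return new


plan = ["Meetings 2.1", "Files 3.4", "Mail 8.2"]  # one release window per month
print(bump_release(plan, "Meetings 2.1"))
```

In the example, one late release delays every release stream behind it by a month, which is exactly why the schedule has to be treated as fixed.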
I've decided to wrap up the series on cloud differences for now. My next set of thoughts was heading toward the project management intricacies of transitioning an enterprise into a global cloud environment, but I can always return to that some time down the road. For now, I want to remember that my topic is software quality and collaboration, not only in the cloud but on premises as well. We have exciting work going on around social software, mobile devices, unified communications, and exceptional web experiences too. The Cloud Difference series was not an attempt to communicate deep technical substance; it was an outline of some of the many things we think through as we build a successful portfolio of cloud offerings. A skate across the surface, if you will. I created this wordle based on the blog content. In case you're not familiar with wordles, the relative font sizes represent the relative frequencies of each word in the source text.
I hope you'll agree we have a compelling set of offerings for cloud based collaboration. I know the market is quite competitive. And, as this series has hopefully demonstrated, our team is committed to the continuous improvement that will keep our services positioned as market leading. Feedback on the series is welcome :-) Now on to other topics next week.
I’d like to share another LotusLive customer testimonial with you, this one from Colleagues in Care, a non-profit organization of healthcare providers who have worked in Haiti for over ten years. Today I had the privilege of meeting with Drs Kenerson & Hanson, who appear in the video, to discuss how they collaborate in the cloud. LotusLive has a unique guest model that allows subscribers to invite external guests, which is perfect for an organization that relies on large numbers of volunteers, many of whom collaborate for relatively short periods of time. Naturally, we’re looking at ways to further enhance this particular aspect of LotusLive.
It is fascinating how a collaboration process we use every day, and at some level take for granted, can make such a significant contribution when applied to a very real need in a non-profit organization drawing on knowledge from thought leaders around the world. Take a look at the amazing work of Colleagues in Care.
Quality has multiple dimensions, but ease of use is undeniably a big part of how users subjectively evaluate the software they work with. This comparison of IBM Connections and Microsoft SharePoint gives an in-depth illustration of how our teams have worked to make IBM Connections easy and intuitive to use. Collaborating, sharing documents, and becoming a social business are all topics of the day, but as this video demonstrates, to choose the optimal solution you have to go beyond the buzz words and look at how well a solution aligns with desired work patterns and enables productivity. Social tagging is a key aspect of IBM Connections: it helps me find relevant material, saves me time, and spares me from interrupting colleagues with requests. If your team is anything like ours, you have an increasing amount of unstructured information to analyze and drive value from. Without social tagging, and without a single search capability spanning all the content, you couldn't even dream of accomplishing a comprehensive analysis. Forget the buzz words. Witness the power of a well thought out solution that aligns with your needs.
A quick pointer to an interesting discussion started by my colleague, Fernando Salazar, in his blog on the confluence of UCC and Social Business: Unified Communications & Social Business, Part I: Apples & Oranges, or Salad Supreme? As we integrate UCC into our social business solutions, what are the success factors we need to prioritize? Social collaboration has multiple dimensions we need to think through as software
The major dimension that comes to my mind is the distinction between ‘push’, or ‘command’, driven collaboration versus what I call ‘value’ driven collaboration. When I go to social collaboration systems, I go because I expect to find and leverage value there, primarily information I need. Nobody is telling me to go there. If I don’t access a particular community, activity or forum for months, nobody holds me accountable for being a no-show. The value is in my results. But every business also needs a channel for ‘command’ type communications. My manager holds me accountable for being up to date with my e-mail because that’s where ‘command’ communications happen today. As we integrate collaboration, we have to be careful to allow appropriate separation, or filtering, of these types of collaboration. The last thing I want is an overcrowded message stream resembling an overcrowded e-mail inbox. I need filtering that makes it easy and intuitive to separate the ‘command’ and ‘value’ driven forms of collaboration.
The UCC/Social relationship is another interesting dimension, which focuses on whether you need the answers instantly, and whether you know whom to ask. Although technology allows you to ask a group of people the same question, it would clearly be too disruptive if we all sent out multi-person polls every time we needed an answer. When it comes to value driven information exploration, I often go to a social collaboration system without knowing who authored the information I seek. [See my blog post entitled “How Connections helped Connections” for an example.] Yet, when I find the information, I may want to contact the author for additional perspective. UCC is more acceptable (less intrusive) when used for 1:1 communication. It’s also great for many:many collaboration, but that would be for meetings, etc. So the synergy between UCC and Social technologies bridges that spectrum, with Social focused on the many players and UCC focused on fewer players. I may use the social software to search a great many authors, documents and tags, and then use UCC software to gather context, chat with the author, or hold a synchronous meeting with the team using the information.
No doubt we need the explicit communication facilitated by UCC. But as we integrate UCC into the social collaboration models, one of the keys is to pay attention to the different modes of collaboration (1:1, many:many, information exploration, information dissemination, decision making, etc.) and integrate the right technology for the right task in the right place; not just offer ubiquitous presence awareness, or every capability in every place, but the right capability in the right place. This is challenging because social software usage models are not always well defined. Vendors write their software to be configurable and adaptable to appeal to the widest possible set of enterprises, yet often fail to offer their customers more prescriptive guidance on best practices and usage models. This means UCC software has to be very flexible, allowing efficient integration into different usage models. System administrators need to be able to configure which integration points to surface and which to keep dormant, based on their preferences and the trade-offs they’re willing to make between functionality and
What do you see as critical success factors for integrating UCC into Social Business solutions? Please submit your answers via Fernando’s blog.
What does dining on a cruise ship have to do with the quality of Collaboration Solutions? Plenty, if you're willing to translate, as quality spans both the restaurant experience and the software user experience.
My wife and I recently decided an offer for a Spring Break cruise was too good to decline. We were ready for a break, and the price was right. So we boarded in Miami and had a wonderful break cruising the Caribbean. One night we decided to dine in one of the Italian restaurants onboard, the Venetian on the 6th deck aft. As a quality engineer, I always have my antennas out for quality experiences I can learn from, whether good or bad. Here is what I gleaned from Dinner at the Venetian:
There is a dress code, so we show up looking far more formal than we did half an hour ago up on the pool deck. The entrance itself is a staircase leading down from the 7th deck into the restaurant. The steps are white marble with red carpet runners. The decor looks good: carefully decorated tray ceilings in light blue and aquamarine, with golden edges separating them from the white flat ceiling. White ionic columns with flutes painted in gold. The windows have an artistic custom arrangement of glazing bars. A pianist is at work pleasing the auditory senses. The room is non-smoking and appears clean and inviting. It all looks promising.
Software corollary: Expectations are set in many different ways before the user even lays hands on the software. We haven't even looked at the menu yet, but I am already making (preliminary) judgments about whether this was a good choice based on looks and sounds. An important lesson here is that the experience I ultimately have is not just about the software (or food). The total experience involves everything from marketing messages (decor) to the actual software (the food) and the support & services (music, waiters, etc). Quality expectations are higher when the IBM logo is applied.
As we are seated, the pianist hammers out a cacophony of tunes better suited for a British pub or a German bierstube. With my eyes closed, the music paints pictures of steins colliding, foam overflowing, and long benches of people swaying to the music and singing the lyrics they know so well. But as I open my eyes again, I'm back with the Italian decor. The disharmony is jarring to an observer absorbing the details. I glance at the menu, which consists of a static menu plus a regional specialty, which today happens to be Mexican burritos. Excuse me? I thought I came to an Italian restaurant. The intellectual dishonesty, or at least the complete absence of artistic direction, is by now bothering me. What is being billed as Italian is anything but.
Software corollary: The user experience needs to be consistent across all components of the solution. We drive a great program known internally as OneUI, the purpose of which is to align design elements across products. The same action, say uploading a file to a server, should consist of the same steps, and look the same, whether you're working in a Lotus, Tivoli or WebSphere product. OneUI is a key element in how we achieve that.
The menu cards are gigantic, and the text fills only part of the 'real estate' on the page. After making my decision, I cannot lay the card back down on the table: it simply won't fit between the place settings, nor in the empty spot prepared for my plate. The menu card seems to be getting in the way rather than facilitating the dinner.
Software corollary: Don't make aspects of the solution so large that they won't fit. Memory footprint, CPU utilization, and storage requirements need to be commensurate with expectations, especially when the software runs on a platform serving multiple applications. We want our software performance to be CPU-limited rather than memory-limited.
Having decided on the food, I ask for the wine card. Should I really have to ask? Oh, well. Let's move on. Chianti is my favorite wine, made primarily from the Sangiovese grape. It would seem appropriate for an Italian restaurant, but I ended up selecting a non-Italian entree anyway, and my travel companions might not appreciate the rich, high-acidity wine. So I pick out a bottle of Argentine wine made from the Malbec grape, which I have recently become a fan of. The waiter leaves to submit our orders. Upon his return, he announces that they are out of the wine I ordered, but he has brought along another Malbec at a lower price. Hmm, so why advertise a wine you don't have? And when you can't fill an order as agreed, offer a better wine at the same price.
Software corollary: Keep the documentation accurate and up to date. It is frustrating to spend time trying to follow instructions or information, only to find out they don't apply any more.
I let the waiter pour a sample, and I approved it. He then went on to fill the glasses of my three travel companions, but he left me with just the sample. To his credit, he realized quickly and corrected the mistake himself. For some reason, he kept the bottle at a setup table in the distance. He naturally intended to fill our glasses intermittently, but it can be hard to keep up with timing, when you have multiple tables to serve. And sure, we ran out at one point. Had the bottle been placed on our table, it would have been easy to fill our glasses ourselves. Perhaps not as elegant, but easy.
Software corollary: If you're going to take over something that is easy for the user or administrator to do, you must do it exceedingly well, as they will otherwise see you as an unnecessary barrier to efficient operation. I'm thinking of our LotusLive Notes hybrid offering here. The hybrid solution allows you to extend an existing on-premises environment with additional Domino servers in the cloud, managed by IBM. This creates a mixed environment, where the customer already has Domino skills, but doesn't have administrative rights on servers located in the cloud. Nothing wrong with that, but needless to say, our administration has to be world class. After ~20 years with Notes/Domino, we think we know how :-)
Lest you think I complain about everything, let me point out that the chef did a marvelous job. The music and the menu had both made clear to me this would not be an authentic Italian experience anyway, so I settled on a Chicken Tikka Masala rich with Indian peppers. The meat was tasty and juicy, cooked to perfection, and the pepper sauce was perfect, hitting primarily the front of the palate. The vegetables were tender, not mushy, and the presentation was good. All great work from the chef. Alas, they served this well cooked meal on cold plates, causing it to lose temperature relatively quickly. Why diminish such a great main course by cutting corners?
Software corollary: No matter how good your code is, if you run it on a less than satisfactory platform, the overall performance will not be satisfactory.
When traveling with young ones, a trip to the bathroom is inevitably required. No exception this time. As we exit the restroom, a sign encourages us to use a paper towel when turning the door knob. The purpose is noble: to prevent the spread of germs and diseases between passengers. But the encouraged behavior is not thought through. We must pass through two doors about 8 feet apart, the first from the men's room to a corridor also used by the women, and the second from the corridor into the restaurant. If I open the first door and dispose of the paper towel in the bin by the door, I will be using my bare hands on the second door knob, defeating the purpose of the paper towel. If I bring a paper towel with me to the second door and use it there, there is no waste bin in sight as I enter the restaurant. I don't really want to bring restroom paper towels to my table.
Software corollary: Think through user stories and act them out. Be wary of programmatic initiatives that implement the same solution everywhere without adapting to specific circumstances, such as two doors in series.
Back at the table, I can't help noticing the flickering candle, which is an electronic candle. A special flickering bulb inside a tube of frosted glass gives the appearance of real candlelight. Fire is the greatest danger at sea, so I can certainly understand the elimination of open flames at every table. I don't mind the 'fake' candle, until the flickering goes crazy. Whereas most of the candles vary the light intensity by, say, +/- 20% to give the flickering appearance, our candle started varying the light intensity by +/- 100%, making it look more like some kind of alarm clock alerting us to its message, while of course alerting fellow travelers at nearby tables as well. If you're going to use faux candles, at least test that they appear realistic.
Software corollary: We might think of all the instances of our software as the same, but customer experience is always unique to their own instance and can be affected by the specifics of the local environment.
We ordered dessert. When the waiter brought it, he had more desserts than we ordered (maybe some were for another table?), and he had to ask us who had ordered what. A very small inconvenience, but it doesn't build trust when your waiter can't remember what you ordered.
Software corollary: We need to be smart enough to prevent asking our clients the same question twice. When we work in teams, whether as developers, technical sales specialists, support engineers, or in other client facing roles, our internal information sharing mechanisms must be sufficiently effective that information captured once is available to all the players servicing the client. We do this well in some areas, but there are places we can improve.
Overall, I enjoyed that the portion sizes were such that you could have three courses without feeling like a bloated goose afterward. Nice job there.
Software corollary: Don't push every product we have; design a solution that matches needs.
So, am I just a cranky customer expecting top service for a cheap dollar? Not really. I do think customer expectations were met here. This was not horrible considering the favorable price of the cruise, but it wasn't impressive either. It didn't make me favor this cruise line over others for the next time we go. It didn't build loyalty. Notice that none of the things I pointed out here are particularly costly to fix. That's the opportunity we have at every touch point: to impress and delight our customers, and to build loyalty. You can easily translate the restaurant experience into our world of software development. The decor, the dress code, the menu: those are all Marketing. The kitchen staff are the developers providing the product. The waiters are the Services & Support teams. And the maître d' is of course the Quality & Customer Satisfaction Engineer, which is my role.
Later in the cruise, we went back for a second visit to the same restaurant. The pianist had more suitable music for dinner in terms of tempo and mood, but he was still on quite a world tour. We sat down to an old Jewish song, Hevenu shalom aleichem (translation: We brought peace upon you). He transitioned smoothly into Don't Cry for Me Argentina (Lloyd Webber, British) and then on to Autumn Leaves, which is a French song, although for me personally it evokes memories of New York City. On this world tour, his next stop was Speak Softly Love, known as the Love Theme from The Godfather. Aha, I thought, finally something remotely related to the Italian theme here. And then it hit me how insensitive my insistence on Italian music in an Italian restaurant really was. This happened to be Holy Thursday in my calendar, and it was also the beginning of Passover for our Jewish friends. In that light, Hevenu shalom aleichem was indeed an appropriate tune for an eclectic audience on this day.
So the final lesson this quality engineer draws from the Venetian is this: It's not all about me. It's about all of us. We must learn to see past the tip of our own noses and understand the perspectives of our customers & partners. And that's what this blog is all about. Feel free to share your perspective in comments.
Who knew you could learn so much about software from an Italian restaurant at sea :-)
Why do social tools matter in external customer communications? Because, as Andy McAfee says in this interview, “we can hear with much greater fidelity, the voice of the customer”. Right on.
As someone always interested in the customer view of quality, I have worked with surveys for years, discerning trends in quantitative survey results. We always knew whether we were getting ‘better’ or ‘worse’. Did that 3.8 score from last month grow to 3.9, or did it perhaps decline to 3.6? We knew exactly. But did we know what to do in order to improve the score? Not very clearly from simple, numeric scores.
That’s where social tools enable a much richer interaction offering technical details of customer configurations and usage patterns, preferences, new requirements, and reasons for them. One way to engage with us via social tools is to join the IBM Collaboration Solutions Community on Lotus Greenhouse. We have a range of Facebook pages, Twitter handles, blogs and other social tool engagement, but the Community is the most powerful social connection in my opinion, because it lets you interact directly with our engineers and with fellow administrators or users in other enterprises to discuss common interests, share documents and more. Social tools have a central place in engaging your customer base, because the fidelity of the customer’s voice is so much better in those tools than in old school blind surveys.
Congratulations to our newly elected President & CEO, effective January 1st 2012, Ginni Rometty. She understands better than almost anyone the need to constantly reinvent ourselves and our company. To take risks. To grow. To learn. To matter. To contribute. To lead.
Thank you to our outgoing CEO, Sam Palmisano, for an amazing 10 year run at the helm of Big Blue. Having positioned us well in terms of both company performance and succession, I know you won’t mind me saying this, and in fact, I suspect you’ll agree: The best is yet to come!
Today marks 100 years since the founding of our company. Very few companies have made it that far. IBM continues to have a unique vision and impact in the world. Our commitment to quality and customer satisfaction in all that we do is naturally important to me as a quality engineer, but on a broader scale, it's not just about better quality; it's about a better world. Our campaigns call that a "Smarter Planet" these days, but make no mistake: our commitment to a better world has been ingrained in our company DNA for a long, long time. This is not simply the latest marketing campaign; it's who we are. Nothing beats working with a team determined to do better every day. The IBM Centennial Film "Wild Ducks" below captures that spirit really well, while celebrating our visionary clients, who allow us to serve them as the innovators' innovator.
As we celebrate the centennial and the wild ducks who made forward leaps toward a better world, sometimes by leveraging our technology and always with grit and determination, it is especially interesting for us in IBM Collaboration Solutions to see the transformation of collaborative norms toward social business, crowd sourcing, and a truly networked world. And to think about what that might mean for the future sources of ideas and innovation, helping enable every citizen of the world to participate in, and contribute to, networks of enablement, productivity and improvement. Connecting more wild ducks and creating more opportunities. Our technology and our company are committed to helping fight disease and to helping lift fellow humans out of poverty and exclusion every day, to build a better world. To make our world not just smarter, but through that also healthier, cleaner, more peaceful, more just, more inclusive, and more prosperous. Now, that is worth celebrating!
My colleague Jon Mell in his 'Social Collaboration' blog discusses five myths of social software. I'd like to share a story supporting his debunking of Myth #1, "It's all about Facebook", as well as Myth #2, "It's all about Generation Y". One of my many hats is being an IBM "Lab Advocate" for several accounts. Lab Advocates help customers and partners leverage our portfolio of offerings. The relationship is often supported by a non-disclosure agreement that allows the Lab Advocate to disclose future plans and help the customer or partner position themselves to take advantage of new products and releases. One of my accounts is a large Partner covering multiple countries and partnering with multiple hardware and software vendors. We had a competitive situation there last year, in which Lotus Connections and a competing product vied to be their social software solution. The existing environment already had many components from the competing vendor, so proposing Lotus Connections raised eyebrows. It was viewed as a new species. But an internal group had conducted a Proof of Concept (PoC) exercise with release 2.5. They were preparing their report to the executive leadership team when I first met them to help position Connections. Like many "lab rats", I know much more about our own product than I do about the competing product. I reached out to the Connections team, but they were in a critical customer meeting introducing the beta release of version 3.0 at the only time I could schedule the Partner to go over this. So I clearly needed to come up to speed very quickly on how to position Connections competitively against this particular product.
That's when "Connections helped Connections".
Within our own internal (w3) deployment of Connections, I was able to quickly find the Product Management community for Lotus, join it, and locate a competitive comparison between Connections and the competing product, which in turn enabled me to make substantial arguments to the account in favor of our product. This would have taken much, much longer without Connections, and I would not have had my arguments ready in time. Thankfully, our Product Management team had decided to share their insight by posting an excellent write-up comparing the two products. The same outcome would flat-out not have been possible in the competing product, because it doesn't have the concept of joining a community at will. It requires a system administrator to grant access to each 'site', which is inherently less social and takes more time. In addition, search in the competing product is done one site at a time, so if you don't know where to look in their solution, you're up the creek without a paddle. Connections is so much smarter, and so much more social, because it was built as a social software solution from the ground up, rather than being an existing solution re-purposed to become more 'social'.
Because of my intervention and the plans I shared around the upcoming release 3.0, which we eventually launched in November 2010, the Partner decided to continue their release 2.5 PoC into a release 3.0 beta deployment. Did I mention the vast majority of the Partner's software driven revenue is from non-IBM products today? That certainly causes a challenge for us, but it also makes it extra exciting to grow our relationship based on excellent products like Connections, and based on the efforts of passionate colleagues willing to share their insight.
My colleague Ron Denham has put together a quick 4-minute video showing how simple it is to open a FREE Lotus Greenhouse account. I want to share his video for two reasons: (1) a Lotus Greenhouse account lets you try out all our products without purchasing a license, and (2) a Lotus Greenhouse account is needed to comment on entries in Greenhouse blogs. (However, since this was originally posted, my blog has moved to developerWorks.)
If you don't already have a Lotus Greenhouse account, please watch the video, create your account now, and start familiarizing yourself with our products. And commenting on this blog :-) Cheers
The IBM Champions program, mentioned previously in the blog, has completed its first set (2011) of reviews for the Collaboration Solutions area and today announced the 2011 Champions; see the Social Business Insights Blog entry. These 50 individuals are customers and partners (non-IBM employees) who help evangelize our solutions in the field and build the community of users. They deserve our admiration and thanks for their dedication to excellent collaboration solutions, for their insight and expertise, and for their willingness to share their experience with fellow collaboration solutions implementers, users and administrators across the industry, not just within their own organization. As Joyce Davis writes in the blog, the champions "will receive an IBM Champion merchandise package (apparel and some cool gadgets!), increased visibility on IBM sites, invitations and discounts to IBM events, recognition at select conferences, and access to key IBM business executives and technical leaders". Let me add that quality feedback from an IBM Champion carries special weight. We are naturally interested in all feedback on our offerings, but especially interested in feedback from the Champions because of their deep insight into our portfolio. Giving technical feedback is a great place to start growing your relationship with IBM Collaboration Solutions, and one day you too may be named an IBM Champion!
Please see the announcement of the IBM Champion Program, which now covers IBM Collaboration Solutions as well. Through this program, IBM recognizes individuals who help evangelize our solutions in the market. IBM employees are not eligible for this recognition; the Champion award recognizes our customers and partners. (IBM employees are allowed to nominate champions, however.) If you know someone who does an outstanding job evangelizing our solutions, please consider nominating them for this award.
The nomination submission deadline is May 15th 2011. To nominate someone, go to the IBM Champion Program web page and submit your nomination.
You may be familiar with the fact that IBM Software Group has increased the minimum duration of software support from "3+2" to "5+3", or expressed in words: a minimum of 3 [now 5] years of support, with at least 2 [now 3] more years of support available at a charge. This better enables our clients to schedule upgrades at convenient times rather than being forced to upgrade every few years.
An important extension of the IBM Software Support Lifecycle Policy is the statement that we will normally support N-2, meaning the two feature releases prior to the latest release. This is an important part of enabling our clients to stay on the same release for a longer period. The "N" parameter is not necessarily the first digit in the release string, though. Each product may count differently, but I expect that for Collaboration Solutions you'll see us counting each V.R (Version.Release, or the first two digits of the release string) as a feature release. That should mean in the Notes/Domino and Sametime world, for example, that when we launch a release 9.0, we'll still be supporting releases 8.0 and 8.5 in parallel. However, if you step that back once, you might think we should be supporting Sametime releases 7.5 and 8.0 for as long as 8.5 is the latest release. That would be true, except that End of Support had already been announced for release 7.5 before the N-2 policy was published, and that decision is grandfathered.
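To make the counting concrete, here is a minimal Python sketch of the N-2 rule as described above. It assumes releases are counted by V.R (Version.Release) strings and that a release whose End of Support was announced before the policy stays excluded (grandfathered). The function name `supported_releases` and the release lists are hypothetical illustrations, not an IBM tool or an authoritative statement of any product's support status; always check the product-specific information.

```python
def supported_releases(feature_releases, grandfathered_eos=()):
    """Hypothetical helper: releases normally supported under an N-2 policy.

    feature_releases: V.R strings such as "8.5"; order does not matter.
    grandfathered_eos: releases whose End of Support was announced before
    the N-2 policy was published, so they are excluded regardless.
    """
    # Sort numerically by (version, release) so that "10.0" sorts after "9.0".
    ordered = sorted(feature_releases,
                     key=lambda vr: tuple(int(part) for part in vr.split(".")))
    # The latest release (N) plus the two feature releases before it (N-1, N-2).
    window = ordered[-3:]
    excluded = set(grandfathered_eos)
    return [release for release in window if release not in excluded]


# Notes/Domino example from the text: once 9.0 ships, 8.0 and 8.5 remain supported.
print(supported_releases(["7.0", "8.0", "8.5", "9.0"]))  # prints ['8.0', '8.5', '9.0']
# Sametime example: 8.5 is the latest, but 7.5 is grandfathered out.
print(supported_releases(["7.5", "8.0", "8.5"], grandfathered_eos=["7.5"]))  # prints ['8.0', '8.5']
```

Sorting on integer tuples rather than plain strings is the one design choice worth noting: a string sort would place "10.0" before "9.0" and silently pick the wrong window.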
It's important to notice that the linked document is a policy, not a firm commitment. There will be variations, but in most cases we'll follow the standard policy of "5+3" and "N-2". Always check additional information specific to the product you're interested in. Note that this policy does not apply to software produced by the IBM Systems and Technology Group, such as operating systems. See also answers to Frequently Asked Questions linked directly within the Policy document above.
From time to time, I am asked which virtualization environments specific products support. With the increasingly diverse choice of virtualization technologies available, it is natural for IBM to select some that will be supported across our portfolio, ensuring that multi-product solutions can leverage the same virtualization technology. So last year, IBM announced a virtualization support policy aligning virtualization support across our product portfolio. In addition, individual products or product families can decide to support additional environments of their choice, so long as they don't drop support for any of the environments in the common set. To discover any additional virtualization environments supported, check the individual products' documentation. But know that a core set is supported across our portfolio.
Following up on my recent cloud difference series, I wanted to share a pointer to a good blog post by Dustin Amrhein: It’s a bottom up world. Your cloud service needs to be callable through easy-to-use, well-documented APIs. You need to cater well to developers, who are key influencers, and often decision makers, in prospective subscribers’ cloud adoption. Right on.
Don't you just love instructions that are clear, unambiguous, well thought-through, and never self-contradictory? I for one am grateful for the effort of our Information Development colleagues, our technical writers. As designs are modified and optimized during a release project, they have the complex task of keeping the documentation accurate while delivering in time for release. This applies both to Help embedded in our products and to web-based information like wikis. Never underestimate the value of accuracy :-) ~Flemming
Love this example of innovative use of social media. And very fitting to find KLM doing things like this. Years ago, when my oldest kids were 2 and 4 years old, we flew from Copenhagen via Amsterdam to Los Angeles with KLM. Mom was already in the US, so it was just the three of us flying that day. In other words, Dad had his hands full. I can still recall many years later, how the KLM cabin crew served meals on that transatlantic flight. Wheeling the carts down the aisles of the 747, they served all the little kids first, then returned to the first row and served the adults on the second pass. Little hungry kids were satisfied, parents had time to help them before they were served a meal themselves, and by the time the adults ate, the kids were calm. This made so much sense; it made everybody happier, and it didn’t cost KLM one extra dime. This is the kind of thoughtfulness that builds customer loyalty. Yet so few airlines actually do this.
Kudos to KLM for providing flying passengers a quality experience. You are the best! The approach applies universally, whether in air transportation, software, or other fields. Understand your customer’s needs, and you may be surprised at some of the improvements that can be made at very little incremental cost.
A great documentary from my colleague Luis Suarez, fearlessly dumping his e-mail inbox and converting to living social. He makes the point that we will see e-mail gradually transition from a content repository to again being a messaging and notification system. I’d venture that it will evolve even further. E-mail’s grip on my work life stems from the fact that I am held accountable for reading and responding to communications there, whereas in the social tools, engagement has so far been driven primarily by where I expect to find value, rather than by who might be requesting an action or a response from me. In my crystal ball, I see a convergence of the request driven and value driven patterns, or push and pull if you wish, in the social tools. We’re building Activity Streams and the like, which will blend both forms, even if we also build filters to offer different views of the Stream.
The challenge is not to re-invent e-mail in a way that carries the same burdens we deal with today: overload, and parsing through content that is unnecessary (to me). Instead, we need to deliver a social mail and collaboration experience that lets us focus on creating the most value. And – to position myself in that vision – we need to do so with a compelling level of quality, reliability and ease of use.
LotusLive Meetings recently implemented an improvement designed to bring you a better Meeting experience. Did you perhaps notice it already? An option to select how to optimize meeting performance, whether:
for the best image display quality, which you will want when sharing photos, for example, or
for faster performance which will often be sufficient if all you are sharing is slides with text and simple graphs, or if your audience is unusually large.
I use LotusLive Meetings daily in my work. The number of Meeting participants varies greatly. Most of my LotusLive Meetings have just a handful of participants, but occasionally I have 'community' meetings with several hundred participants. And when our General Manager has an internal All Hands call, we have over a thousand people in the same LotusLive meeting.
It is usually the larger meetings that benefit from the optimization for performance. That's not just because of the sheer number of participants, but because the larger the audience, the more likely it is that some participants are connected via low-bandwidth, relatively high-latency links. Meeting performance is the joint experience of all participants, including those on a low-bandwidth connection.
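A back-of-the-envelope model shows why the slowest connection sets the pace. The Python sketch below is hypothetical and is not how LotusLive Meetings actually transmits content. It assumes each participant receives a shared image after one network latency plus transfer time (size divided by bandwidth), and treats the content as delivered only when the last participant has it. The payload sizes and links are invented for illustration.

```python
def delivery_seconds(payload_kb: float, bandwidth_kbps: float, latency_ms: float) -> float:
    """Latency plus transfer time; kilobytes are converted to kilobits (x8)."""
    return latency_ms / 1000 + payload_kb * 8 / bandwidth_kbps


def meeting_ready_seconds(payload_kb: float, participants: list[tuple[float, float]]) -> float:
    """Time until every participant has the content: the slowest one decides."""
    return max(delivery_seconds(payload_kb, bw, lat) for bw, lat in participants)


# Hypothetical audience: (bandwidth in kbit/s, latency in ms)
audience = [(10_000, 20), (2_000, 50), (256, 300)]

print(meeting_ready_seconds(800, audience))  # high-quality image: about 25.3 s
print(meeting_ready_seconds(150, audience))  # performance-optimized: about 5.0 s
```

In this model, adding more fast participants changes nothing, while a single slow link dominates; a larger audience simply makes it more likely that such a link is present.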
How much of a difference has this made for you?
As always, I'm interested in receiving your feedback. Web conference technology needs to enable your productivity, not get in the way. Is LotusLive Meetings boosting your productivity? What improvements would you like us to consider and prioritize? Please submit feedback as comments to this blog post. (You'll need a free Lotus Greenhouse ID).
This is naturally far from the only improvement to LotusLive implemented recently. Just last weekend we added a large number of new features and improvements to the Advanced Collaboration offerings (Engage, Connections, Meetings). More about the improvements in future blog posts.
The LotusLive Mobile Meetings beta application for the Android platform has been added to our LotusLive Mobile program, so the beta is now available on Apple, BlackBerry and Android. For an overview of all platforms, go to the LotusLive Mobile Overview page. If you are a LotusLive Engage/Meetings subscriber and use an Apple iPhone, a RIM BlackBerry, or an Android mobile phone, please consider trying out the Mobile Meetings application.
Ok, I’m not trying to become a KLM evangelist, but you gotta’ admit these guys & gals are serious about driving value from social media. Love it! “Choose your seat based on fellow passengers’ Facebook profiles”. Every flight is a networking opportunity :-) If you could choose your seat neighbor for the next flight, who would you rub shoulders with? Provided they allow public access to their profile, of course….
News tip: Our education team has just released a free, self-paced course on SmartCloud Notes in a hybrid environment. (The course link was originally posted in the blog Apr 26th 2011. I have updated it on May 21st 2013 after a reader notified me it was broken. The change is a result of the rebranding from LotusLive to SmartCloud).
A hybrid environment allows integration between your on-premises Domino systems and the cloud. Replicating your Domino Directory to the cloud provides for a seamless integration between environments. So rather than replacing existing Domino infrastructure with cloud based offerings, you can leverage the cloud based offerings as an extension of your existing on-premises environment. Your Domino administrators continue to administer on-premises Domino servers and applications, while IBM administers and maintains the SmartCloud Notes mail servers in the cloud.
I serendipitously ran into unexpected behavior of my embedded Sametime client on a machine running Notes 8.5.1. I locked out the screen by pressing F5 as I was getting up to go to lunch. After pressing F5, my Notes client will not let me open or send mails, nor will it let me access or write anything in any of the already open Sametime chats. That's expected behavior. But before leaving my desk, I noticed that my manager opened a chat window with a question. Not wanting to let her wait until after lunch, I instinctively typed my answer in the chat window and sent it back successfully. But the client was still 'screen locked', so this was not expected behavior to me.
Now, the Sametime preferences include a section on Auto-Status changes, which determines what happens in response to changes in your calendar, Notes client, or operating system. I don't have the 'Locking Lotus Notes' setting selected. Why? Because it forces the status to Do Not Disturb (DND). When I'm away from the workstation, what I really want is an Away status, so buddies can leave me messages in new chat windows, which I'll see when I return. With DND, no new chats can be initiated. So I don't check 'Locking Lotus Notes'; instead, I set my Sametime status to Away and lock my Notes screen with F5. That's just my personal preference. You can see my Auto-Status preference settings below.
The observation above took place yesterday. This morning, I started working from home, then screen-locked Notes, put the laptop in standby mode, drove to the office, and woke the laptop up to continue. My Notes client was still screen-locked, with the empty password dialog box showing. I recently replaced my Notes workspace directory (\\notes\data\workspace) to eliminate an issue caused by an unreleased plug-in I was working with. This replacement naturally caused me to lose Eclipse-based settings I had saved, such as the geographic location Sametime associates with each wireless access point I use. So when my laptop automatically attached to the wireless network in the office, Sametime popped up a dialog for me to enter my 'new' geographic location. I filled it out and applied it successfully, all while the Notes client was still screen-locked. Again, that doesn't strike me as 'expected' behavior, but our Sametime security architect confirmed for me that this is 'Working as Designed'. Sametime itself, meaning the standalone Connect client, doesn't have the lock-out concept. You're either contactable (logged on as Available or Away) or not contactable (DND or not online). The way it intersects with the Notes client produces the behavior described above.
My observations lead me to two suggestions I'd like to hear your take on:
1. Should there be a choice between Away and DND when you select the 'Locking Lotus Notes' option?
2. In the absence of an Away option, when the client is locked and 'Locking Lotus Notes' is not checked, should the client block responses not just in existing chats, but also in new chats and in dialogs like the one for geographic location, until the password is re-entered?
An important aspect of guiding quality improvement is to ensure we focus on the things that matter to our clients. This topic arose out of my own observations, and out of my own subjective expectations for how I would like to see the product work. So I need your help to find out what our clients think. Is this just a pet peeve of mine, or is this a change we should pursue? How important is this issue to you? Please share your thoughts in comments.
Today, on June 1st 2011, Oracle announced that they have contributed the OpenOffice software to the Apache Software Foundation incubator. For IBM, Oracle's move fits well with our focus on open standards. The near future will reveal which actions will trigger each other like falling dominoes, but one thing is for sure: You can expect to see a variety of organizations joining the OpenOffice project under the auspices of the Apache Software Foundation and contributing to it. You can certainly expect to see such contributions from IBM. In the IBM Collaboration Solutions area, the most immediate connection is with our free IBM Lotus Symphony software, which is based on OpenOffice technology. IBM has already announced publicly our intent to contribute to the project, because we see it facilitating continued development and long-term viability of OpenOffice. You can find the IBM press release here. As the release points out, IBM has a history of contributing to open source projects, including Apache projects, and of helping start the Eclipse Foundation. This is an exciting development for Lotus Symphony and for our clients. And for our Symphony Live code now under development. The added focus on the Open Document Format, and the ODF Toolkit assets under the ODF Toolkit Union, will only benefit users of our software. From a pure quality perspective, the cooperation of a wider set of contributing organizations is, generally speaking, a good thing. Not that open source software systematically outperforms commercial software on quality, but the breadth of involvement and usage generally promotes quality, albeit in a slightly different way. Expect further innovation in the area of personal productivity software.
This blog is now listed on Planet Lotus (http://planetlotus.org). All of you experienced Lotus bloggers know what Planet Lotus is, but I'll bet some readers who are new to the blogosphere are not familiar with it. Planet Lotus is a blog aggregator. It's a web site run by a partner, so it's independent of IBM. When you register with Planet Lotus (see their site for specific instructions and criteria), they will consume your feed and build tables of recent entries from all the registered blogs. That way, you can keep up with almost 300 bloggers in one place. There are several criteria for registration, but the most important one is that your blog content focuses on topics around Lotus software, including LotusLive, or in divisional terms, on IBM Collaboration Solutions. As you can see in my blog, I decided to consume Planet Lotus' feed in return and display the ten most recent headlines in my right sidebar. Isn't technology cool :-) Thank you to Yancy Lent for providing the free Planet Lotus blog aggregator. ~Flemming
Welcome to the Quality Collaboration blog, offering a conversation about continuous quality improvement of offerings from IBM Collaboration Solutions. The purposes of this blog are to:
help our customers (via hints & tips) drive more value from licenses and subscriptions
gather feedback from readers on quality topics
share information about our continuous improvement effort
share customer success stories
develop a conversation with our customers about achieving and maintaining excellence as your preferred provider of software, support and services.
As the Technical Quality Champion for the IBM Collaboration Solutions division, I lead our continuous improvement effort worldwide. I own the quality planning process by which each release project must establish strategic quality improvement goals, certify that it achieved them before release, and measure actual performance against in-market performance goals while the release is supported in the market. My expectation is that a conversation with you via this blog will have value for us both. I plan to share information about our continuous improvement effort and the quality of our solutions. And whether you are a user, administrator, buyer, analyst, partner, student, or other stakeholder relative to our software, I hope to learn more about your view of the quality of our solutions. And I'll throw in varied musings to personalize the blog.
If you think these purposes match your interests, I invite you to revisit often and perhaps subscribe to the feed. Please engage in the conversation by submitting constructive comments to the blog entries. It's all about Quality and Collaboration. If you would like to propose a new main topic, you can do so by sending me a note at email@example.com with the exact term "QC blog" included in the Subject line.
Looking forward to our conversation :-) Best regards, Flemming T Christensen
It's a small step in the greater picture, but an important one. I have had several customers ask for this. The Quickr Connectors now support 64-bit Windows and Office 2010. See Mac Guidera's blog post on the topic, and visit Fix Central to download the connectors as Quickr 126.96.36.199_HF7. Feel free to post feedback regarding the quality of the connectors as comments to this blog entry. We're always looking for ways to serve you better.
The readme file accompanying FP1 could have been clearer. Although the IBM installation manager installs the files onto your server, you still need to commit the changes to the environment. Andy describes how to do this, via the Sametime Cluster Guided activity, or via the WebSphere Integrated Solutions Console.
The Lotus Notes calendar, for example, has built-in information about time zones and the associated Daylight Saving Time (DST) definitions. When those change, we supply an update (a 'fix') to ensure customers can continue to rely on accurate knowledge of time zone differences and of the dates DST starts and ends in each time zone. I have noticed how seamlessly Notes handles those differences for me when I schedule meetings with teams around the world, even though I'm not 100% sure exactly which day each country switches to or from DST. I've never had an issue with a misunderstood meeting time. But that seamless performance is only as good as the underlying knowledge of time zone and DST definitions (whether based on the operating system or the application), so when governments decide on changes, and sometimes that happens with just a few weeks of notice, we do our best to respond quickly to keep our customers' scheduling accurate.
If the change is known far in advance, say over a year, it's pretty easy to apply fixes to change the definitions in the operating system and/or the application. The problem comes when the definition changes on such short notice that a number of appointments are already in the calendar for the affected period. For example, when the US moved the start of DST three weeks earlier in 2007, users who already had calendar appointments in that three-week span needed to decide whether to adjust them or not. There is no easy logic to make that decision. If you had an appointment with your dentist at 10 am standard time, that would naturally shift to 10 am DST after the government decides to apply DST on that date, because you are both in the same, affected time zone. But if you had a conference call with a major customer overseas, whose time zone definition did not change (different government), then the appointment would probably stay at 10 am standard time, which will now be 11 am DST for you. The calendar cannot distinguish which appointments to adjust and which not to adjust. Human intelligence is still needed. Fortunately :-)
I would like to share a customer testimonial regarding LotusLive. We have received some nice press coverage in the past year for a very large LotusLive deal, the largest ever. But as this testimonial shows, LotusLive can add value no matter the size of the subscriber's organization.
What I find so interesting in this testimonial is the transformational power of the solution. This is literally a game changer for the subscriber, catapulting their services to the competitive leading edge. This is what we do best: help customers apply technology to solve business problems. And win.
Delivering a highly available service is way different from producing a customer-installable product. The rightful expectation of the Software-as-a-Service (SaaS), or Cloud, subscriber is that the service is available whenever they need it. A good analog is the dial tone on a land line phone. It's just there when you pick up the phone. And if it's not, your first instinct is to check the cords and make sure the phone is plugged in. In developed countries at least, a missing dial tone rarely makes you assume first that the service is down; rather, you assume that you yourself are somehow at fault, e.g. for not plugging the phone in. That same reliability is expected of Cloud systems. But no Cloud vendors are yet as mature as the PBX systems switching our phone lines. All Cloud vendors still have occasional outages. They're short-lived, but still annoying when they happen. And we're all working to eliminate them through root cause analysis, corrective action, and other means. Most of us come from a background of writing on-premises software, since SaaS is still a young and emerging segment. In some cases, that means there are habits we need to unlearn because they don't work well in the SaaS space. And overall, it is useful to discuss not just how to develop and deliver (SaaS) Cloud services, but specifically how that differs from our on-premises experience. I plan to share a series of brief observations illustrating the differences between developing & delivering on-premises software, and developing & delivering corresponding cloud services. I will tag each one with 'cloud_difference' for easy collection with the URL:
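The dentist-versus-overseas-call distinction can be sketched with Python's standard zoneinfo module, which uses the IANA time zone database (on some platforms the tzdata package must be installed). The two hypothetical helpers below show both choices for a 10 am appointment on March 15th 2007, entered while the calendar still applied the old rules (10 am EST, UTC-5): keep the local wall-clock time, or keep the absolute instant. Choosing the helper is exactly the human decision described above; the sketch does not reflect how Notes stores calendar entries.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # IANA zone that includes the 2007 US DST rules

# The appointment as entered under the old rules: 10:00 EST, i.e. UTC-5.
OLD_EST = timezone(timedelta(hours=-5))
booked = datetime(2007, 3, 15, 10, 0, tzinfo=OLD_EST)


def keep_wall_clock(appt: datetime, zone: ZoneInfo) -> datetime:
    """Same-zone meeting (the dentist): stays at the same local clock time."""
    return appt.replace(tzinfo=zone)


def keep_instant(appt: datetime, zone: ZoneInfo) -> datetime:
    """Meeting anchored to someone whose rules did not change: same absolute moment."""
    return appt.astimezone(zone)


print(keep_wall_clock(booked, NEW_YORK))  # 2007-03-15 10:00:00-04:00
print(keep_instant(booked, NEW_YORK))     # 2007-03-15 11:00:00-04:00
```

Both results are valid under the new rules; nothing in the timestamp itself tells the software which one the user meant.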
Since this blog is focused on quality, it seems reasonable to start with a description of what the term means to me, in relation to software. I am NOT asking readers to adopt my definition of quality, or to limit your comments based on my definition. In fact, I would like to hear how you define the term.
What does software quality mean to you?
To me, delivering quality software means simultaneously providing:
Code with the right features (capability) & with the features done right (consumability)
Code which is performance optimized, quick to deploy & integrate, easy to learn, easy to use, easy to maintain, easy to scale, and competitive in terms of total cost of ownership
Code assisted by comprehensive knowledge content delivery, and enablement of staff and customers, throughout the software life cycle
Code with rapid support resolution available globally if and when problems occur
Code supported proactively, by providing what’s needed before customers even recognize the need
And finally, for hosted solutions, code and web delivery operations that meet Service Level Agreement (SLA) commitments for specified performance metrics, typically availability.
This definition encompasses fitness for purpose, conformance to requirements, total cost of ownership (TCO), and the maintainability and support aspects of quality. Would you modify anything in this definition?
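Because availability is the metric most SLAs specify, it helps to translate a percentage into allowed downtime. A minimal Python sketch, assuming a 30-day month and ignoring any scheduled-maintenance exclusions a real SLA might define; the percentages are examples, not IBM commitments:

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of outage permitted in a period at a given availability percentage."""
    period = days * MINUTES_PER_DAY
    return period * (100 - availability_pct) / 100


for target in (99.0, 99.5, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min per 30 days")
```

At 99.9%, the entire monthly budget is roughly 43 minutes, which is why even short-lived outages matter to a hosted service.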
Enjoy Peter Presnell's insight gathered over a number of migration projects. Replacing an advanced messaging, application development, and web server platform like Domino is far from trivial. And the author has seen how it often goes down. If you care about quality of your environment, take note of this experience. Thanks to Peter for sharing.
Anyone considering a migration project would be wise to very carefully estimate both the full scope and budget of the effort, as well as the ability to continue support for existing usage patterns. Map how your users are actually leveraging the platform they have today. Don't assume you know.
Years ago, one of my first projects for IBM was to run beta tests for departmental cut-sheet network printers. I'll never forget a particular customer: Great and very cooperative IT staff, but when they pointed me to their 'main' print queue, I found limited traffic. To make a long story short, they didn't realize many employees had switched their print jobs to alternate queues on alternate server clusters. Keeping up with all areas of usage is no small task. Know your users, especially when planning change projects, whether migrations, expansions, or other.
This is a fairly new blog, started just one month ago, so the visit count on each successive entry may rise from a combination of interest in the topic and more people becoming aware of the blog's existence. The prior entry "Why Sametime 8.5.2 is better" saw a very positive rise in visits. I cannot be sure how much of that is due to the topic, and how much is due to increased awareness of the blog, but I take it as a vote in favor of sharing that kind of information. Each new release project has to set strategic quality goals, and achieve them before releasing. My thought is that we should be sharing a "What's better" overview with each new release, just as our Marketing colleagues share "What's new" overviews. I plan to experiment a bit with the level of detail, looking for a balance that is neither so detailed that it becomes long-winded, nor so sketchy that it becomes abstract. Sometimes, that means discussing select improvements rather than all improvements in a release. Or splitting the material across multiple entries. This morning I attended the May 24th IBM Collaboration Solutions community call, where the audience confirmed their interest in hearing about continuous improvement. So I'll plan more posts in that vein. Happy to share what I can :-)
In the field of software quality, we rely extensively on a series of quantitative metrics to inform us of trends and performance. We need to be keenly aware that the very instant we set a target value for a metric, we are driving behaviors of the people we charge with achieving the target value. Through it all, leaders need to ensure everybody stays focused on what's best for our business and for our customers, and that excessive pressure to achieve metrics targets doesn't interfere with that focus.
Here's a real story from the world outside IBM, where the pressure to achieve metrics targets allegedly caused a Police Force in Brooklyn (NY) to bend the rules and violate the rights of the people they were supposed to protect. Listen to "Act Two" within this segment of the radio show "This American Life". The story fills roughly the last 41 minutes of the 59-minute segment, so fast-forward to about the 18-minute mark. Police Officer Adrian Schoolcraft documented extensive cases of focusing on metrics over mission. He is now involved in a lawsuit against the Police Force. This is a chilling real life story.
After listening to this 'extreme' case, I encourage you to ponder your own software related metrics. Are they as accurate as you think they are? And what behaviors are they driving? What are you doing to prevent similar transgressions in your shop? This is worth giving some thought to on a regular basis for any metrics driven organization. It certainly has direct impact on quality. For that reason, I like to validate metrics results with qualitative observations, and where possible also customer feedback.
Back in April, IBM announced the availability of IBM Connections 3.0.1. It's a "maintenance release", but it contains important new function as well. Does that improve 'quality' of the product? Well, yes it does. Although we often separate new function from quality improvement, or separate 'what' the software does from 'how well' it does it, reality is that available function directly impacts fitness for purpose, which is a key component of quality. (See my post "What does Quality mean?" from April 25th 2011 on that). So from a quality perspective, we quality folks too are excited about the new functionality, especially:
the Media Gallery for sharing photos and videos,
the moderation capabilities for Community owners to screen content,
the Ideation blog for sharing and voting on ideas within a Community, and
the integration with Enterprise Content Management (ECM) repositories.
In terms of currency and non-functional attributes, we added (i) server side support for Windows 2008 R2 64-bit, (ii) support for Microsoft Active Directory 2008 as an LDAP directory, (iii) mobile client support for BlackBerry OS 6.0, (iv) database support for Oracle 11g Enterprise Edition Release 2 and Microsoft SQL Server 2008, and on the security side (v) support for CA SiteMinder 6.0 and Java SPNEGO for single sign-on.
As usual, our quality program also looks for improvements beyond function, currency and non-functional attributes. One of the most significant non-functional improvements in release 3.0.1 comes from further work on optimizing code performance in the Communities component, where we beat our own goal for the release by a nice margin. And in terms of performance, the Activities and Files components achieved significant improvements in the 3.0.0 release, so users of an environment migrating from release 2.5 to 3.0.1 should see performance improvement in all three: Activities, Communities and Files. The Connectors Help has been updated in release 3.0.1 with a Files connector and an Outlook Social connector. In addition, a variety of internal metrics demonstrate a healthy focus on quality. The defect deferral rate is very low for IBM Connections. Another key quality metric is the release-to-release reduction of the number of support calls per customer, and the number of customer reported defects. Based on early data for the first three months since release 3.0, both metrics are demonstrating continued improvement from release 2.5 to release 3.0. The fix list for Connections 3.0.1 details what fixes for customer reported defects are included in this release. I have no doubt release 3.0.1 will provide even further value to our Connections customers.
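The support-call metric is normalized per customer so that a growing customer base neither masks nor fakes an improvement, and the comparison uses the same window after each release (the early data above covers the first three months). A minimal Python sketch of the release-to-release comparison; the function names and the numbers in the example are hypothetical and are not Connections data:

```python
def calls_per_customer(support_calls: int, customers: int) -> float:
    """Normalize raw call volume by the number of customers on the release."""
    return support_calls / customers


def release_change_pct(previous: float, current: float) -> float:
    """Percent change from the previous release; negative means improvement."""
    return (current - previous) / previous * 100


# Hypothetical: more calls in total, but spread over many more customers.
old = calls_per_customer(support_calls=400, customers=200)  # 2.0 calls per customer
new = calls_per_customer(support_calls=450, customers=300)  # 1.5 calls per customer
print(f"{release_change_pct(old, new):+.1f}%")              # -25.0%
```

Raw call volume rose in this example, yet the normalized metric shows a 25% improvement per customer.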
And now you can leverage IBM Connections 3.0.1 Portlets for WebSphere Portal to extend the social collaboration experience of IBM Connections via a WebSphere Portal instance to your collaboration partners. This kind of integration across the IBM Collaboration Solutions portfolio is a key way we deliver additional value from your investment in our software.
If you have suggestions for improvement to IBM Connections, the connectors, or the portlets, I'd appreciate it if you would share them in comments to this blog post. Thanks.
Continuing in the vein of prior posts with the ‘better’ tag, I want to describe quality improvements made in recent releases. Notes/Domino 8.5.3 has just been released, and you can read about new features in the announcement. There’s plenty to like. The embedded Symphony version has been updated to 3.0, and the embedded Sametime to 8.5.1, both key advances. There are enhancements to XPages and the Domino Designer, and much more. But quality is not just about new features. It’s about all features working well.
The overall quality objectives for the 8.5.3 maintenance release were to significantly reduce the outstanding defect backlog, to improve integration with companion products like Connections, Sametime and Symphony, and to expand test coverage and test automation. The team has delivered on all of those objectives. All major components (Domino, Notes, Designer, Traveler) reduced their deferred defect backlogs by considerable amounts, some by more than half. The vast majority of those defects had not been reported by customers. They were found in house. Removing them eliminates the risk our customers will run into them. Reducing internal defect backlogs is always an objective for a modification release (a.k.a. maintenance release), but release 8.5.3 has achieved reductions that are greater than is typical for most maintenance releases.
Security is a high priority for any release from IBM. In Notes/Domino 8.5.3 we moved systematically forward with further detailing of our threat model and the adoption of Rational AppScan Enterprise Edition for testing of the full attack surface across the Notes client. Similar efforts were made for Traveler and for iNotes. (Domino did this work previously.) All the components had security testing in the past; what’s changing is that we’re adding Rational AppScan testing across all of our portfolio. And of course resolving all security defects before releasing.
The Domino team also focused on memory related improvements in release 8.5.3, delivering new NSD macros, and an administrator capability to track and drop 'bad' IMAP sessions, which can cause server crashes. A very key improvement is a substantial reduction in the use of shared 16-bit handles, which will reduce the type of conflicts that can cause hangs or crashes. The aggregate result is an even more stable solution. For the Domino Configuration Tuner (DCT), we continue to deliver additional rules to help you ensure your environment is optimized. If you use DCT, be sure to download new rules regularly. We add new rules at least quarterly, and sometimes monthly. For iNotes, we continue to focus on achieving full parity between the Notes and iNotes client experiences, delivering important improvements to sorting by subject, to auto-processing of calendar entries, and the option to not expand personal groups when sending.
Less visible to our customers is the continued progress on test automation. The more of our standard test scenarios are automated, the more time our engineers can devote to specialized, exploratory testing around new features. Some critical areas have doubled the number of automated tests this year, freeing up engineers to expand coverage, all part of our continuous improvement effort.
Release 8.5.3 is the next global deployment candidate for IBM’s own internal environment of nearly 400,000 users around the globe. Prior to release, the IBM CIO’s Office deployed it to over 4,000 IBM employees, and our Services Division deployed a pre-release build for over 14,000 users. That means over 18,000 people were using it daily before we declared it ready to ship. The CIO servers are primarily AIX and zLinux servers. Although the majority run the client on Windows, there are a few hundred running on Linux and Mac platforms as well.
In summary, there’s a lot to like about Notes/Domino 8.5.3. I’ve described a few highlights of the quality effort here, but of course the proof is in the pudding, or more accurately, in the released software. Enjoy the new release. As always, feedback is welcome.
IBM announced Sametime 8.5.2 Interim Feature Release 1 this week. Much buzz has already circulated about the new features, and you can read about them in the announcements. But in the vein of my "Why better" postings, including last May's post on Why Sametime 8.5.2 is better, I want to briefly share some of the quality improvements we worked on for this Interim Feature Release (IFR 1) as well. Every IBM software release containing new function must also identify and achieve specific quality improvements. Because the aggregate development effort behind an interim feature release is smaller than for a full feature release, the quality improvements are also fewer, but we were still able to take some good steps. The quality focus in this release was on serviceability, that is, the ability to diagnose and correct problems. We focused on providing more helpful and meaningful log and error messages in three specific areas: (i) the install experience, (ii) the NAT ICE SDK, and (iii) Meetings.
Within the install, we improved the log and error messages related to validating connections to other servers in the deployment, such as DB2 and LDAP servers. In addition to improving the validation itself, we now show the user what is being validated and what the status of the validation is. Moreover, we reviewed error messages that had not yet been externalized and externalized them where doing so gives the user more information. This is an ongoing effort that will continue to improve end-user messages in future releases.
For the network address translation (NAT) interactive connectivity establishment (ICE) software development kit (SDK), used, for example, to integrate awareness into other applications, we enabled full-detail printing of IceSession and MediaSession failure traces, as well as of TURN (Traversal Using Relays around NAT) server details when the IceSession is created. We also made a number of improvements to the Logger output from the C++ ICE SDK.
For Meetings, we focused on improving the log messages associated with the AppShare protocol and updated many key messages.
A simultaneous announcement of Sametime Unified Telephony (SUT) 8.5.2 IFR 1 was also made this week. With the new SUT release, we now support virtualization of the SUT server. We also completed a Telephony Control Server (TCS) configuration tool, which can dramatically reduce the time needed to configure your solution; we are already receiving very good feedback on it. In addition, we implemented an automatic restart mechanism in the event of a TURN server crash. Internally, we had also set targets for further expanding the coverage of our automated test suites, especially in the Audio Visual (AV) and Sametime Unified Telephony (SUT) functional areas, and those targets were met.
For a discussion of these releases, I recommend listening to Episode 79 of the This Week in Lotus podcast, entitled Why Sametime 8.5.2 IFR 1 definitely ain't no Turkey! Enjoy the new releases, which take video chat and Unified Communications to new heights. To filter the blog and show just the 'why better' entries, click the "better" tag in the line just below the blog entry title.
First of all, a third-digit release is what we refer to as a Modification, or Maintenance, Release, which means a substantial portion of the effort going into the release is aimed at updating currency and compatibility, and generally making the Sametime 8.5 code stream more consumable and easier to maintain, with a lower Total Cost of Ownership (TCO). So as a matter of standard business for a maintenance release, we go through Technical Support call records to identify the relatively frequent call topics that affect customer productivity or satisfaction, and then we target actions to eliminate the need for those Support calls. We do this not just to reduce our own cost of delivering Technical Support, but primarily to lower TCO for our clients. The fundamental view is that, all other things being equal, a product that needs fewer Support calls is more consumable and likely to have a lower TCO as well.
So what did we do for quality in Sametime 8.5.2?
First, there are naturally fixes for a number of customer-reported defects. These fixes further stabilize your system and keep you from needlessly encountering defects already discovered by other users. One of the continuous improvement goals that each release is required to set is to reduce the backlog of open defects. Most software vendors prefer not to talk about a defect backlog, but almost all software products have one. The traditional argument for not eliminating the backlog entirely is that fixing the least severe defects yields a lower Return on Investment (RoI), for both vendor and customer, than spending the same effort searching for new and more severe defects, or maintaining currency and compatibility. So while backlogs will likely never disappear entirely, we do set goals to reduce them from release to release, and Sametime 8.5.2 takes a substantial chunk out of our backlog. We have also gone over our pre-release test case suites, asking why the customer-reported defects against the prior release escaped our release process. We then make process improvements (update tools and instructions, add test cases, etc.) to make sure the same escape doesn't happen again. We refer to that as a closed-loop process, and again, that's pretty much standard business. Call it process maintenance.
NAT firewall traversal is a key addition allowing much easier use of Sametime in enterprises connecting sites that are each protected with a NAT firewall. A/V performance has been improved. And we now support mobile Sametime on the Android platform; no small point. Another much requested feature is the ability to bump an attendee from a meeting. For more feature information, read John Del Pizzo's summary.
Our review of causal analysis data with the Support Team showed that authentication errors were more time-consuming to troubleshoot than other errors. We found a number of ambiguous error messages that were not as helpful to users as they could be. I'm not surprised here; I see error messages as lacking in almost every software product in the industry, from any vendor. But we homed in on 27 specific error messages involved in the Support calls we reviewed, and we have made a number of logging improvements that will make these issues much easier to troubleshoot in the future, because we now collect more detailed information in the log.
On the integrated server console, we have added the ability to monitor the health of all the servers in your Sametime system: Community servers, Meeting servers, Media servers, Muxes, Gateways, etc. This is the first time we're collecting all that information in one place for a Sametime system, and it greatly simplifies life for the system administrator. It also allows issues (like disks or memory filling up) to be discovered and corrected before they impact users. The administrator can see which processes are up and running, which might be hung or slow, which resources are heavily utilized, etc. We also added integration points with IBM Tivoli Monitoring so our clients can build detailed alert and notification systems.
We had noticed that the Community server in particular didn't offer troubleshooting data as good as other components did. To improve the First Failure Data Capture (FFDC) posture of Sametime, we have updated the Community server NSD to ensure we now get a full crash stack for the Sametime process. This allows us to compare different events, so we can identify repeat issues by matching crash stack signatures to defects. The updated NSD can also include a Windows core file in the event of a server core.
Our beta test partners gave us very good feedback on the Sametime 8.5.2 release, not just in terms of seeing the benefits of the improvements we've made, but also in terms of pointing out targets for further improvement. To be clear, the above is not a full account of all the improvements made in release 8.5.2. I just wanted to share some of the good reasons to plan an upgrade to 8.5.2 soon.
Nice succinct statement from colleague Anna Dreyzin on why social business is important to her, all packed inside 30 seconds. She talks about listening, engaging and helping, adding value and building reputation, managing feedback, extending reach, increasing engagement and growing advocacy. I agree with them all, and I would also highlight searching. The ability to share and search knowledge across a large team is highly valuable. For an example, see my prior post on How Connections helped Connections.