Common issues and resolutions for Continuous Delivery on Cloud
Rong Zhao (zha
With the rapid development of information and internet technology, traditional development and deployment practices can no longer meet customers’ requirements. Customers are asking for Cloud, with quick development and seamless deployment.
This article introduces common issues and resolutions for Cloud Continuous Delivery and shares general DevOps best practices, including methodologies and technologies drawn from actual Smarter Cities Cloud Offering releases and customer project experience.
Clients now expect a system to go live and provide service as soon as possible, so we adjusted the Development and Operations cycles by combining them. Much of the Operations preparation work has moved into the development cycle. To achieve this, we introduced a template readiness process to ensure all checkpoints are fully implemented before reaching sGA. During the development phase, the test team uses a SoftLayer (SL) environment as the test platform and finalizes the SL template once test exit is achieved. That template, carrying the GA build, is then used for quick deployment when onboarding new customers.
The production release cycle has been effectively shortened to quarterly, bi-monthly, monthly, or even more frequent releases. We continuously deploy the latest release to customer environments so that they are refreshed with the latest features and enhancements.
A product usually depends on other products, including third-party ones. For dependencies between IBM products, we usually base a release on the N-1 release of the product it depends on. For example, suppose products A and B both release monthly in the early days of each month, and A depends on B. We define that A picks up the build that B released in the previous month and integrates it.
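The N-1 rule above can be sketched in a few lines of Python. This is only an illustration: the function name `pick_n_minus_1`, the build labels, and the dates are hypothetical and are not taken from any actual release tooling.

```python
from datetime import date


def pick_n_minus_1(dep_releases, our_release_date):
    """Return the dependency build released in the calendar month before ours.

    dep_releases maps a build label to its release date (hypothetical data).
    """
    year, month = our_release_date.year, our_release_date.month
    # Step back one calendar month, wrapping January to the previous December.
    prev = (year - 1, 12) if month == 1 else (year, month - 1)
    candidates = [
        (released, build)
        for build, released in dep_releases.items()
        if (released.year, released.month) == prev
    ]
    if not candidates:
        raise LookupError("no dependency build was released last month")
    # If several builds shipped last month, take the most recent one.
    return max(candidates)[1]


b_releases = {
    "B-2015.03": date(2015, 3, 4),
    "B-2015.04": date(2015, 4, 3),
    "B-2015.05": date(2015, 5, 5),
}
picked = pick_n_minus_1(b_releases, date(2015, 5, 6))
```

Here product A, releasing in May, integrates B's April build even though B's May build already exists, which keeps A from depending on a build that has had no time in the field.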
A traditional organization is composed of a Development team and an Operations team that work separately on different time frames: in the past, Operations started its work only after Development had completed the release. The Development team has architects, designers, developers and testers.
Since adopting CD, our organization has formed a single DevOps team.
Many operations tasks are discussed and implemented in the early design phase. The architecture and design team incorporates the architecture topology and operations requirements from the SaaS team at an early stage. The design covers how to ensure the solution can support High Availability (HA), security requirements and continuous delivery. The Dev team provides the mechanism for application monitoring, ensures the related testing is completed before GA, and delivers the documents to the Operations team.
The Operations team helps the Dev & Test team understand the concepts and pick up skills, such as how to manage and use SoftLayer, how to ensure security compliance, and what Dev must do to ensure continuous delivery.
IBM Design Thinking provides a scalable way to create great user experiences and a portfolio that leads to a great client experience. Clients today usually want a simple, holistic relationship with their business partner. IBM Design’s mission is to unite IBM so that all work in concert to deliver solutions that delight users individually and deliver increasing value for every IBM capability they consume. IBM Design Thinking helps unite the Product Line Management (PLM), Design, and DevOps teams so that they collectively understand clients’ needs.
IBM Design Thinking provides a powerful way to define and solve problems, with three core practices: Hills, Sponsor Users and Playbacks.
We designed Hills at the beginning of each release or project and had them agreed by Sponsor Users; scheduled Hill playbacks, Playback Zero, and a delivery playback in each iteration; and held continuous client playbacks in every continuous delivery cycle.
To enable the product to support Continuous Delivery:
We have a multi-function team:
We scheduled daily, weekly, and bi-weekly meetings by call or shared screen, and used Sametime group chat, Connections, and RTC to track progress and share information.
With US colleagues, we preferred quick discussions by call; with India colleagues, we preferred Sametime group chat discussions, which were even more effective than calls.
We implemented a full automation framework, which includes:
The automation tools we use include Selenium, Jenkins, Chef, Perl, and Shell. We are now evaluating UDeploy.
We optimize system maintenance windows through the A/B switch method. What do A and B mean here? System A is the production environment, and System B is a copy of System A that is used as staging.
Before the A/B switch, the Ops team first had to apply all maintenance steps in a testing environment; after testing there passed, the Ops team had to redo every step directly on production to apply the update, which caused long downtime.
After adopting the A/B switch method, we separate out the maintenance steps that do not require system downtime and apply them to the B environment first. If all functions pass validation testing, we save the B environment as a template, which serves as B’s baseline for the next month’s maintenance window. Then the maintenance window starts: we apply the fixes that require system downtime and switch B to production, and the maintenance window ends.
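The ordering of the A/B switch can be made concrete with a small Python sketch. It only models the sequence of actions as a log. The `Step` class, the step names, and the `validate` callback are hypothetical placeholders rather than the team's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs_downtime: bool


def plan_maintenance(steps):
    """Split steps into those applied on B ahead of time and those applied in the window."""
    pre_window = [s.name for s in steps if not s.needs_downtime]
    in_window = [s.name for s in steps if s.needs_downtime]
    return pre_window, in_window


def run_ab_switch(steps, validate):
    """Return the ordered list of actions for one maintenance cycle."""
    pre_window, in_window = plan_maintenance(steps)
    log = [f"apply on B: {name}" for name in pre_window]
    if not validate():
        # B failed validation, so production (A) is never touched.
        log.append("validation failed: A stays in production")
        return log
    log.append("save B as template")
    log.append("maintenance window starts")
    log += [f"apply with downtime: {name}" for name in in_window]
    log.append("switch B to production")
    log.append("maintenance window ends")
    return log


steps = [
    Step("install OS patches", needs_downtime=False),
    Step("migrate database schema", needs_downtime=True),
    Step("update application build", needs_downtime=False),
]
actions = run_ab_switch(steps, validate=lambda: True)
```

The point of the split is that only the steps flagged `needs_downtime` fall between "maintenance window starts" and "maintenance window ends", so the window shrinks to the work that truly cannot be done on the staging copy.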
To meet an SLO or SLA, it is very important that operators can detect faults and issues promptly after the production system goes live, so a good monitoring and duty-alerting system is key.
SoftLayer offers both Standard Monitoring Services and Nimsoft Monitoring to ensure users are always aware of any issues with their devices. Standard Monitoring Services include features like Ping, IPMI Statistics and NOC Monitoring, while Nimsoft Monitoring consists of three levels of monitoring, including Basic, Advanced and Premium monitoring.
These monitoring tools can monitor most potential risks or issues in the system, for example CPU, RAM, and swap usage, or whether the server is accessible.
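As a rough illustration of the kind of check such tools perform, the Python sketch below compares one metrics sample against limits. It does not use the SoftLayer or Nimsoft APIs. The metric names and threshold values are assumptions chosen for the example, not values from the monitoring products.

```python
# Hypothetical alert limits, in percent; real limits depend on the workload.
DEFAULT_THRESHOLDS = {"cpu_pct": 90.0, "ram_pct": 90.0, "swap_pct": 50.0}


def find_alerts(sample, thresholds=DEFAULT_THRESHOLDS):
    """Return human-readable alerts for one metrics sample of a server."""
    alerts = []
    # A missing "reachable" key is treated as unreachable, to fail safe.
    if not sample.get("reachable", False):
        alerts.append("server unreachable")
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric} {value} exceeds {limit}")
    return alerts


sample = {"reachable": True, "cpu_pct": 95.0, "ram_pct": 40.0, "swap_pct": 10.0}
alerts = find_alerts(sample)
```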
The notification mail is forwarded to PagerDuty. PagerDuty provides alerting, on-call scheduling, escalation policies and incident tracking to increase uptime of your apps, servers, websites and databases.
PagerDuty can define an on-duty schedule for each operator. Alerts are paged to the right operator periodically via SMS or phone call, and according to the escalation policies an alert is escalated to other people when the primary operator is not available.
The operator can acknowledge the alert first and then troubleshoot the issue on production. Once the issue is fixed, he or she can mark the alert resolved and record the steps taken.
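The escalation and acknowledge/resolve flow described above can be modeled in plain Python. This sketch does not call the PagerDuty API. The `Alert` class, the responder names, and the timeout values are hypothetical, and escalation is simplified to a fixed timeout per level.

```python
from dataclasses import dataclass, field


def current_responder(levels, minutes_unacknowledged):
    """Pick who should be paged for an alert nobody has acknowledged yet.

    levels is an ordered list of (responder, minutes_before_escalating).
    """
    elapsed = 0
    for responder, timeout in levels:
        elapsed += timeout
        if minutes_unacknowledged < elapsed:
            return responder
    # Past the last timeout: keep paging the final level.
    return levels[-1][0]


@dataclass
class Alert:
    summary: str
    status: str = "triggered"
    notes: list = field(default_factory=list)

    def acknowledge(self):
        # Acknowledging stops further escalation while the operator investigates.
        self.status = "acknowledged"

    def resolve(self, steps_taken):
        self.status = "resolved"
        self.notes.append(steps_taken)


policy = [("primary operator", 15), ("secondary operator", 15), ("manager", 30)]
alert = Alert("CPU above threshold on production")
first = current_responder(policy, 0)
escalated = current_responder(policy, 20)
alert.acknowledge()
alert.resolve("restarted the runaway process")
```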
The biggest and hardest part of ensuring CD and DevOps is changing the thinking in the organization. We went through the process of introducing the mechanism to the DET lead, PMOM focal, management, and the Dev & Test team. Along with the mindset change, we keep summarizing lessons learned and keep moving forward with continuous improvement and target adjustments.