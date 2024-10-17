The IBM® Software Site Reliability Engineering (SRE) organization is responsible for the reliability and security of the IBM software as a service (SaaS) and managed services platform, which spans multiple cloud platforms, including IBM Cloud®, AWS, Microsoft® Azure and Google Cloud Platform. The Software SRE team delivers a diverse array of SaaS solutions to hundreds of enterprises, across industries, worldwide.
Due to the breadth and complexity of the platform, many common vulnerabilities and exposures (CVEs) are potentially relevant and might need to be mitigated. In addition, thousands of application certificates must be properly maintained to protect uptime.
It’s the Software SRE team’s job to determine exactly which CVEs need to be mitigated, and which certificates need to be renewed or replaced to support the performance and security of IBM SaaS solutions. Until recently, this meant a lot of manual work. “We’d have a pile of 1,000 to 2,000 CVEs a night,” says Marc Velasco, Site Reliability Engineer for the IBM SaaS platform. “It’s like a haystack of information. And our challenge is, how do we find the needles—the CVEs that we really need to patch?”
Previously, the Software SRE team approached the CVE challenge like many other SRE teams in the industry. They used Twistlock software, part of Palo Alto Networks Prisma Cloud, to report potentially relevant CVEs. Separate teams, each responsible for a specific aspect of the platform, then manually analyzed the CVEs to determine priorities and actions for their area. The teams also had to manually find and remediate any certificates not covered by the organization’s automated certificate management system.
This work consumed a substantial amount of time. With finite resources, the team always sought ways to be more efficient. “There are only so many SREs we can throw at this,” says Velasco. “So, how do we turn all that information into something that’s actionable and prioritized?”
Enter IBM Concert®.
Using the Concert tool, the Software SRE team automates CVE analysis and certificate inventory.
For CVEs, the team feeds scan data from Twistlock into Concert, which generates written summaries of each CVE, including concrete, actionable suggestions for addressing vulnerabilities. It also produces an interactive map that shows how each CVE relates to all areas of the IBM SaaS platform.
“Concert does the cross-reference and gives us the contextual information: Here’s the CVE, here are the risks associated with it, here’s the mitigation, and here’s the applicability of it. That’s really helped,” says Velasco. “We had all these different squads doing that same operation in silos, whereas Concert’s bringing us together, allowing us to aggregate that information.”
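The cross-referencing Velasco describes can be pictured as grouping per-component scan findings by CVE ID, so that each vulnerability is assessed once rather than separately by every squad. The following is a minimal sketch in Python under stated assumptions: the record keys (`cve`, `severity`, `component`), the severity labels, the `aggregate_findings` function and the placeholder CVE IDs are all hypothetical, and the sketch does not represent how Concert or Twistlock structure or process their data.

```python
from collections import defaultdict

# Hypothetical severity ranking; real scanners may use different labels.
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def aggregate_findings(findings):
    """Group per-component findings by CVE ID.

    Each finding is a dict with the hypothetical keys "cve", "severity"
    and "component". Returns (cve, severity, components) tuples sorted by
    highest severity first, then by number of affected components, then
    by CVE ID.
    """
    components = defaultdict(set)
    severity = {}
    for finding in findings:
        cve = finding["cve"]
        components[cve].add(finding["component"])
        # Keep the highest severity reported for this CVE.
        current = severity.get(cve)
        if current is None or SEVERITY_RANK[finding["severity"]] > SEVERITY_RANK[current]:
            severity[cve] = finding["severity"]
    rows = [(cve, severity[cve], sorted(comps)) for cve, comps in components.items()]
    rows.sort(key=lambda row: (-SEVERITY_RANK[row[1]], -len(row[2]), row[0]))
    return rows


if __name__ == "__main__":
    # Placeholder IDs and component names, not real vulnerabilities.
    sample = [
        {"cve": "CVE-0000-0001", "severity": "medium", "component": "billing"},
        {"cve": "CVE-0000-0002", "severity": "critical", "component": "auth"},
        {"cve": "CVE-0000-0001", "severity": "high", "component": "auth"},
    ]
    for cve, sev, comps in aggregate_findings(sample):
        print(cve, sev, comps)
```

Sorting by severity and then by the number of affected components is only one possible prioritization; the source does not say which criteria Concert applies.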
Velasco adds that the team uses the Concert chat feature, powered by the IBM watsonx™ platform, to expand their understanding of the actual risks posed by CVEs. This deeper knowledge allows them to accelerate prioritization and address the most critical items more quickly. “Our SRE teams can ask questions that weren’t possible to answer before: What is our risk posture across the organization, across IBM Software, across the vast array of disparate teams, technologies, and applications? Concert gives me the ability to see, for a given application, specifically what components or packages are really introducing risk—and how much. We can see potential impact throughout the software development lifecycle and production environments, including runtime.”
For certificates, the team now uses Concert to cross-check existing certificates against the list of managed certificates. The solution automatically verifies non-managed items and alerts the team about expired or non-managed certificates.
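Conceptually, this cross-check is a set comparison plus an expiry test. The following is a minimal sketch in Python, assuming each certificate is identified by a fingerprint and carries an expiry timestamp. The `Certificate` record, the `cross_check` function and the sample data are hypothetical illustrations and are not part of Concert.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Hypothetical record for a certificate found on the platform."""
    fingerprint: str
    common_name: str
    not_after: datetime  # expiry time, timezone-aware


def cross_check(discovered, managed_fingerprints, now):
    """Return (unmanaged, expired) lists of certificates.

    unmanaged: certificates whose fingerprint is missing from the managed
    inventory. expired: certificates whose expiry time is at or before
    `now`. A certificate can appear in both lists.
    """
    managed = set(managed_fingerprints)
    unmanaged = [c for c in discovered if c.fingerprint not in managed]
    expired = [c for c in discovered if c.not_after <= now]
    return unmanaged, expired


if __name__ == "__main__":
    now = datetime(2024, 10, 17, tzinfo=timezone.utc)
    discovered = [
        Certificate("aa11", "api.example.com", datetime(2025, 3, 1, tzinfo=timezone.utc)),
        Certificate("bb22", "old.example.com", datetime(2024, 9, 1, tzinfo=timezone.utc)),
        Certificate("cc33", "adhoc.example.com", datetime(2025, 1, 15, tzinfo=timezone.utc)),
    ]
    unmanaged, expired = cross_check(discovered, {"aa11", "bb22"}, now)
    print("unmanaged:", [c.common_name for c in unmanaged])  # ['adhoc.example.com']
    print("expired:", [c.common_name for c in expired])      # ['old.example.com']
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check repeatable when it runs on a schedule or in tests.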
Finally, the Software SRE team uses the workflow management feature of Concert, which integrates with tools like Jira, ServiceNow and Git. The feature helps streamline the assignment and management of tickets, which prompts faster responses where mitigations are needed.
Before adopting Concert, the Software SRE team estimates that it spent nearly 90 person-hours in a typical week triaging, analyzing, and remediating CVEs. Over their first six weeks using Concert, the team eliminated 80 hours of manual work per week on average and completed CVE mitigation processes more than 90% faster than before*.
Certificate inventory management can demand an estimated 4.5 hours per month. In the first month of using Concert, the team completed those processes in about five minutes, which is 98% quicker*.
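The percentages can be checked against the hour and minute figures quoted above. The following is a small sketch in Python; the `percent_reduction` helper is a hypothetical name introduced for illustration, and the inputs are the rounded estimates from the text, with "nearly 90" taken as 90.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Certificate inventory: about 4.5 hours per month before, about 5 minutes after.
cert_before_min = 4.5 * 60  # 270 minutes
cert_after_min = 5
print(f"certificate inventory: {percent_reduction(cert_before_min, cert_after_min):.1f}% less time")
# prints: certificate inventory: 98.1% less time

# CVE work: nearly 90 person-hours per week before, 80 of them eliminated.
cve_before_h = 90
cve_after_h = cve_before_h - 80
print(f"CVE manual effort: {percent_reduction(cve_before_h, cve_after_h):.1f}% fewer hours")
# prints: CVE manual effort: 88.9% fewer hours
```

The second result measures manual person-hours, which is a different quantity from the "more than 90% faster" completion time reported for CVE mitigation, so the two figures need not match.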
And with so much time saved, the team can do more to support IBM SaaS solutions. “The biggest thing is the scalability it brings,” says Velasco. “It allows us to scale our resources, and address more risk more quickly, in a way that we just couldn’t do otherwise. And that means our SREs can focus more on automation and coding to improve the reliability of our hosted services.”
*Data gathered from teams deploying on public clouds with existing managed or SaaS services and existing CVE scanning tools and processes and certificate management tools. Teams reported data from various cloud providers and scanning tools and processes. Data is based on estimates and average certificate analysis volume and average weekly CVE volume and analysis workload.
The IBM Software SRE organization is a global team focused on delivering highly available and scalable production SaaS for IBM software products. The Software SRE team provisions, deploys, monitors and maintains these services, and manages incidents, by standardizing tooling, processes, automation, runbooks and practices. The team works closely with IBM Software development teams to design and implement changes, providing a highly resilient service throughout the software lifecycle.
