September 21, 2022 By Asena Hertz 6 min read

This is the second article of the blog series, “Mastering Cloud Cost Optimization.”

In this article, we will share the frameworks employed by organizations that have been successful in their digital transformation and cloud optimization journeys.

The cloud spend optimization initiative (and challenge)

In 2017, RightScale’s (now Flexera) State of the Cloud report listed “Optimizing existing cloud spend” as the top initiative for cloud users (53%) for the first time, replacing 2016’s top initiative of “Moving more workloads to cloud.”

The cost optimization initiative has remained number one in every report since, and in 2020, 73% of respondents listed it as their top initiative.

So why do organizations still rank cloud cost optimization as both their top cloud initiative and their top challenge? As you probably know from your own experience, it isn’t as easy as it sounds.

In our previous article, “Mastering Cloud Cost Optimization: The Principles,” we covered the main cloud cost optimization challenges and the core principles needed to achieve a well-architected and continuously optimized cloud environment. Before proceeding, make sure to read it for more details and context.

It’s time for action

For years, organizations attempted to achieve cost optimization by focusing on reporting: chargeback reports, long Excel spreadsheets and dashboards full of charts and graphs.

Sadly, this approach rarely works. Staring at data will not reduce a cloud bill, nor will sending reports back and forth.

To be clear, this is not to minimize the importance of cost visibility and reporting. Cost visibility is a critical foundation of any organization’s cloud cost optimization strategy and is required to establish accountability, but it is not enough. To optimize a cloud environment, you must act and execute changes, and that is easier said than done.

To execute actions, one needs to understand what actions to take, when to take them and what the implications of that action will be beyond just cost savings.

We have seen and talked to organizations that created an internal process to identify, analyze and execute cloud optimization actions. Many admitted the process is time-consuming, cumbersome, manual and not scalable, especially in large cloud environments where their efforts had limited impact.

The solution? Automation: the ability to execute optimization actions without any human intervention. From a technical perspective, automation is not hard, especially in public clouds, which offer robust, well-documented APIs. The two main challenges to achieving automated cloud optimization are complexity and trust.
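To illustrate how thin the technical layer really is, here is a minimal sketch of executing a single rightsizing action through a public cloud API, assuming AWS and the boto3 SDK; the instance ID and target type are placeholders:

```python
# Minimal sketch: executing one rightsizing action through a cloud API.
# Assumes AWS and boto3; the instance ID and target type are placeholders.
import boto3

ec2 = boto3.client("ec2")

def resize_instance(instance_id: str, target_type: str) -> None:
    """Stop an EC2 instance, change its instance type and start it again."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The optimization action itself: switch to the recommended instance type.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": target_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

resize_instance("i-0123456789abcdef0", "m5.large")
```

The API calls themselves are the easy part; deciding which workload to resize, to what and when is where the real difficulty lies, as the next sections explain.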

Multidimensional complexities

The complexity refers to the process of generating an accurate, actionable optimization action, such as rightsizing a virtual machine (VM), a Platform-as-a-Service (PaaS) instance or even a container.

For example, to properly resize a single workload, you first need to observe the utilization of its resources across multiple metrics (e.g., CPU, memory, IOPS, network, etc.), including application performance metrics like response times and transaction rates. The next step is to analyze all of that data and determine the best target instance type/SKU out of a massive, ever-growing catalog of configuration options (and prices) offered by the cloud vendor. Then, once the target configuration has been identified, additional constraints must be considered, such as organizational policies, OS driver requirements and storage type support. In short, scaling a workload means weighing multiple dimensions at once: resource utilization, application performance, the vendor catalog and pricing, and organizational and technical constraints.
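To give a sense of what even a drastically simplified version of this analysis involves, the sketch below looks at nothing but CPU utilization from Amazon CloudWatch and a tiny, hypothetical slice of the instance catalog. The candidate types, prices, thresholds and policy are illustrative assumptions; a production-grade engine would also weigh memory, IOPS, network, application response times and many more constraints:

```python
# Illustrative sketch of the analysis behind a single rightsizing decision.
# Assumes AWS and boto3; the candidate catalog, prices, policy and thresholds
# are hypothetical, and only CPU utilization is considered.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical slice of the vendor catalog: type -> (vCPUs, hourly USD price).
CANDIDATES = {"m5.2xlarge": (8, 0.384), "m5.xlarge": (4, 0.192), "m5.large": (2, 0.096)}
ALLOWED_FAMILIES = {"m5"}  # stand-in for an organizational policy constraint

def peak_cpu(instance_id: str, days: int = 14) -> float:
    """Return the highest hourly-average CPU utilization over the lookback window."""
    now = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=now - timedelta(days=days),
        EndTime=now,
        Period=3600,
        Statistics=["Average"],
    )
    return max((point["Average"] for point in stats["Datapoints"]), default=0.0)

def recommend(instance_id: str, current_type: str) -> str:
    """Pick the cheapest allowed type that keeps projected peak CPU under 70%."""
    current_vcpus, _ = CANDIDATES[current_type]
    observed_peak = peak_cpu(instance_id)
    for candidate, (vcpus, price) in sorted(CANDIDATES.items(), key=lambda kv: kv[1][1]):
        projected = observed_peak * current_vcpus / vcpus  # naive linear scaling
        if candidate.split(".")[0] in ALLOWED_FAMILIES and projected < 70:
            return candidate
    return current_type  # no safe, allowed alternative found
```

Even this toy version has to juggle telemetry, a price catalog and policy; multiply that by memory, storage, licensing and application-level metrics, across thousands of workloads, and the need for an automated analytics engine becomes clear.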

The journey to automation requires trust

To get organizations to agree to automate actions, you must earn their trust that the actions are accurate, safe and will not hurt the performance of the applications, especially in production.

Trust requires time and a structured approach; it is a journey with multiple stages, and it is closely aligned with the public cloud maturity model:

  • Start with visibility: The first step is to gain visibility into the entire cloud environment and get a sense of the optimization opportunity at hand. This includes identifying and aggregating all accounts and subscriptions, understanding the overall spend and the commitments made to the cloud providers, and tagging and labeling workloads based on their purpose, owner and environment (e.g., prod, test, dev, etc.). This must be done across all subscriptions and accounts (a tagging-audit sketch follows this list).
  • Tackle the low-hanging fruit first: The area we recommend starting with, mainly because it is the path of least resistance, is terminating unused resources: idle or unnecessary VMs, load balancers, public IPs, unattached volumes and old snapshots (see the cleanup sketch after this list). A significant amount of savings can be gained in this stage.
  • Purchase one-year reservations for production: While you focus on non-prod, we also recommend purchasing one-year reservations for production. The reason is that optimization takes time; there is no way around it. By purchasing one-year reserved instances or savings plans, you can save 30-40% while you hone your more advanced optimization skills on the non-prod estate. The reason for one-year rather than three-year terms is that the goal is to build a more sophisticated optimization plan for production during that first year, which includes scaling the production workloads to their optimal size (i.e., rightsizing) and then buying new reservations based on the optimized instance type/SKU.
  • Implement scheduled suspension: Suspending non-prod workloads after hours can yield instant and rather substantial savings. For example, suspending workloads between 6 PM and 6 AM cuts their compute costs by 50%, and the savings are even higher if you also suspend during weekends and holidays (a scheduling sketch follows this list).
  • Execute IaaS scaling in non-prod environments: At this stage, the savings are noticeable, and many teams are eager to find more. We recommend leveraging the business units’ motivation and tackling the non-prod environments with scale actions. We created a maturity curve that focuses exclusively on this stage because it is critical to the success of the optimization effort:
    • Start with manual action execution: Review every scale action to validate its accuracy and scrutinize it with a handful of stakeholders from the relevant business units (e.g., IT/cloud ops, the application team and finance). Execute the action and validate the impact. Take one step at a time and increase the number of actions executed as confidence grows.
    • Approval workflows: The next step is to implement an approval workflow in your ITSM solution (such as ServiceNow). Optimization scale actions should be routed to the appropriate owner to approve, reject or suggest an adjustment that addresses elements that were not considered or available when the action was generated. For example: “The suggested instance type is not ideal for this workload since we are planning to double the transactions it will process starting next week.”
    • Maintenance/change windows: As for when to execute scale actions, start by defining a weekly change window in which all approved scale actions are executed. Over time, expand the scope and frequency of the change windows. Many of our customers use daily change windows to execute scale actions against non-prod workloads; the most mature have moved to full real-time automation, which is the goal.
    • Purchase reservations for non-prod: After the majority of the long-term non-prod workloads have been optimized to their ideal compute configuration, you can now purchase reservations to obtain additional savings.
    • Focus on production: Now it is time to tackle the production workloads. Apply all the lessons learned from non-prod to production, following the steps above.
  • Enable real-time automation: As mentioned, some of our most mature customers have enabled real-time automation. Some were able to do so faster than others because they had modernized their applications (more on that in the next section).
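The sketches that follow correspond to three of the steps above. First, the visibility step: a hedged example of auditing which EC2 instances are missing the tags needed to establish ownership and accountability. The required tag keys are assumptions, and the same audit would be repeated per account or subscription:

```python
# Sketch of a visibility check: flag EC2 instances that are missing the tags
# needed for ownership and environment accountability. Assumes AWS and boto3;
# the required tag keys are a hypothetical tagging policy.
import boto3

REQUIRED_TAGS = {"owner", "environment", "purpose"}  # hypothetical policy

def untagged_instances(region: str = "us-east-1") -> list[str]:
    ec2 = boto3.client("ec2", region_name=region)
    missing = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tag_keys = {tag["Key"].lower() for tag in instance.get("Tags", [])}
                if not REQUIRED_TAGS.issubset(tag_keys):
                    missing.append(instance["InstanceId"])
    return missing

print(untagged_instances())
```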
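Next, the low-hanging fruit: a sketch that inventories unattached volumes and aging snapshots as deletion candidates. The 180-day cutoff is an assumption, and nothing is deleted automatically here; candidates should still be reviewed before removal:

```python
# Sketch of the low-hanging-fruit sweep: find unattached EBS volumes and
# snapshots older than a cutoff. Assumes AWS and boto3; review before deleting.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=180)  # hypothetical policy

# Volumes in the "available" state have no attachment and are candidates.
unattached = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]

# Snapshots owned by this account that are older than the cutoff.
old_snapshots = [
    snap for snap in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
    if snap["StartTime"] < cutoff
]

print(f"{len(unattached)} unattached volumes, {len(old_snapshots)} old snapshots")
# After review and approval, ec2.delete_volume(VolumeId=...) and
# ec2.delete_snapshot(SnapshotId=...) would reclaim the spend.
```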
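Finally, scheduled suspension: a sketch that stops non-prod instances in the evening and starts them again in the morning. The tag values and schedule are assumptions; in practice, this would run from a scheduler such as a nightly cron job or a serverless function:

```python
# Sketch of scheduled suspension for non-prod workloads. Assumes AWS and boto3
# and that non-prod instances carry an "environment" tag; the schedule itself
# would be driven by an external scheduler (e.g., cron at 6 PM and 6 AM).
import boto3

ec2 = boto3.client("ec2")

def nonprod_instance_ids(states: list[str]) -> list[str]:
    """Return non-prod instance IDs currently in one of the given states."""
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:environment", "Values": ["dev", "test"]},
            {"Name": "instance-state-name", "Values": states},
        ]
    )
    return [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]

def suspend_nonprod() -> None:  # run at 6 PM
    ids = nonprod_instance_ids(["running"])
    if ids:
        ec2.stop_instances(InstanceIds=ids)

def resume_nonprod() -> None:  # run at 6 AM
    ids = nonprod_instance_ids(["stopped"])
    if ids:
        ec2.start_instances(InstanceIds=ids)
```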

Application modernization and cost efficiency

Since scale actions in the cloud are disruptive, not all workloads can be resized frequently; some require a graceful shutdown of specific services as part of the scaling process. When an application is modernized to leverage cloud-native architectures and PaaS services, it unlocks the ability to take optimization actions in real time and leverage automation without any impact on the application.

Therefore, it is critical that organizations — in parallel to their continuous optimization initiatives — invest in application modernization and architect their applications for cost efficiency by leveraging PaaS services and cloud-native technologies such as containers and functions (i.e., serverless).
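As a small, hedged illustration of what modernization unlocks: for a containerized application running as a multi-replica Kubernetes Deployment, a rightsizing action is just a patch to the container’s resource requests and limits, and the platform rolls the change out without taking the service down. The sketch below uses the official Kubernetes Python client; the deployment name, namespace and values are hypothetical:

```python
# Illustrative sketch: "resizing" a containerized workload by patching its
# resource requests/limits. Assumes the official kubernetes Python client and
# a Deployment with a rolling-update strategy; names and values are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web",  # matched by name via strategic merge patch
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "256Mi"},
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                    }
                ]
            }
        }
    }
}

# Kubernetes rolls the change out pod by pod, so a multi-replica service
# keeps serving traffic while it is being rightsized.
apps.patch_namespaced_deployment(name="web", namespace="default", body=patch)
```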

Stay tuned for the next article in this blog series, “Mastering Cloud Cost Optimization: Cloud Cost Models & Discounts Overview.” Leveraging the correct cloud cost model for workloads is one of the most effective methods to reduce cloud costs. The upcoming blog post will provide an overview of the available cost and discount models on the cloud and when to use them.

Learn how to contain cost while preserving performance through automatic continuous cloud optimization with IBM Turbonomic.

Start your journey to assuring app performance at the lowest possible cost. Request your IBM Turbonomic demo today.
