Let’s face it: IT infrastructure is expensive. This is especially true of the infrastructure used for clustered High Performance Computing (HPC) and analytic applications.
For people not familiar with distributed workload management, claims about cost savings can sound fanciful, so in this short post I want to take the time to clearly explain some of the money savers.
Here are ten effective ways that IBM Platform Computing software can help you do more with less:
Running applications more efficiently: While we often focus on clock speed and bandwidth as measures of performance, in distributed applications scheduling latency is often what matters most. This is especially true as computers get faster. If I have a series of tasks that each run for 2 seconds, but I spend a full second getting each task scheduled to a cluster node, every task ties up the node for 3 seconds to do 2 seconds of useful work. I’ve just lost 33% of my productivity and added 50% to my cost. Platform Computing schedulers avoid this pitfall by delivering scheduling latency as low as sub-millisecond levels, removing the “white space” between jobs and keeping infrastructure close to 100% utilized.
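The arithmetic behind the 2-second example can be checked with a few lines of Python. This is an illustrative sketch, not Platform Computing code: the helper `scheduling_overhead` is hypothetical, and it assumes tasks run back to back on one node, each waiting a fixed scheduling delay before it starts.

```python
from typing import Tuple


def scheduling_overhead(task_seconds: float, latency_seconds: float) -> Tuple[float, float]:
    """Hypothetical helper: for tasks run back to back on one node, each
    waiting `latency_seconds` to be scheduled, return
    (fraction of node time lost, fractional increase in cost per unit of work)."""
    cycle = task_seconds + latency_seconds       # wall time the node spends per task
    lost_productivity = latency_seconds / cycle  # share of node time doing no useful work
    added_cost = latency_seconds / task_seconds  # extra node time paid per second of useful work
    return lost_productivity, added_cost


if __name__ == "__main__":
    # Compare the 1-second delay from the text with an assumed 1 ms delay.
    for latency in (1.0, 0.001):
        lost, extra = scheduling_overhead(2.0, latency)
        print(f"latency={latency}s: lost {lost:.2%} of node time, +{extra:.2%} cost")
```

For 2-second tasks with a 1-second delay it reports 33.33% of node time lost and 50.00% added cost; with an assumed 1 ms delay both figures drop to about 0.05%.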
Sharing infrastructure more liberally: Everyone appreciates the value of sharing, but in complex IT environments it is often easier said than done. Technical barriers and concerns about ownership, security and SLAs can lead line-of-business owners to deliberately spend more on replicated infrastructure just to avoid these risks. For example, an organization may deploy one or more Hadoop clusters alongside separate clusters for other distributed workloads because it cannot consolidate those workloads onto a single cluster. It turns out that Platform Computing software is very good at sharing resources among diverse applications. It can protect resource ownership, automate service deployment, and facilitate lending and borrowing among tenants while guaranteeing resource access based on policy.
Harvesting idle compute cycles: Smarter workload scheduling makes clusters more efficient, but what about all of the other resources in the data center? During overnight hours or other periods of low activity, desktops, application servers, hypervisors and other data center assets may sit idle. The resource harvesting capabilities in Platform Computing schedulers let applications tap these idle cycles appropriately, stepping around business-critical workloads and helping organizations get more work done without expanding the dedicated cluster.
Accelerating application development and deployment: Developers know that building distributed applications can be hard. For this reason, they often deploy separate sandbox or test-and-development clusters for the express purpose of developing and debugging new applications. Platform Computing simplifies this process with a freely downloadable Developer Edition that simulates the application behavior of a large-scale production cluster on a standalone computer or VM. This reduces the infrastructure required for development while also shortening development time, a multi-dimensional win: better developer productivity, lower infrastructure requirements and faster time to market for new application services.
Avoiding storage costs from unnecessary data replication: For many organizations, globally distributed data centers are the norm. Teams across these centers frequently need access to remote data and make copies that can be referenced locally. The result is multiple copies of the same dataset, which require additional storage and introduce version-control challenges. Similarly, many organizations deploying Hadoop find themselves constantly copying data in and out of the Hadoop Distributed File System (HDFS), resulting in still more replication. By deploying IBM GPFS, organizations can avoid both of these problems. They can maintain a single global namespace that exploits wide-area data caching to reduce the total storage footprint without compromising performance. GPFS also allows Hadoop and non-Hadoop workloads to share common datasets, avoiding redundant copies of data while accelerating workflows.
Improving availability and guaranteeing SLAs: Downtime can be expensive. In application areas like Electronic Design Automation (EDA) or financial risk, cluster downtime can leave large numbers of professionals sitting idle, with real costs in lost opportunity and productivity. In regulated environments, failing to meet batch windows or reporting deadlines can breach SLAs and trigger additional fees or penalties. Platform Computing schedulers incorporate a whole range of features to ensure that clusters almost never go down and that SLAs can be guaranteed, avoiding downtime-related risks.
Slowing the rate of infrastructure growth: Many of the cost savers discussed above help firms get more done with less infrastructure. As environments evolve year over year, total costs, including equipment, software licenses, power, cooling, facilities and personnel, can be very large. Slowing the annual growth rate by even a small amount has a cumulative effect that can add up to dramatic savings over many years. By slowing infrastructure growth, organizations may be able to defer or avoid major capital investments like data center expansions or new construction.
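To see why a small change in growth rate compounds, here is a hedged sketch with made-up numbers: year-one spend normalized to 1.0, and annual growth of 20% versus 15% over ten years. Neither the rates nor the hypothetical helper `cumulative_spend` come from IBM data.

```python
def cumulative_spend(initial: float, growth_rate: float, years: int) -> float:
    """Hypothetical helper: total spend over `years` when annual spend starts
    at `initial` and grows by `growth_rate` (0.20 means 20%) each year."""
    total = 0.0
    annual = initial
    for _ in range(years):
        total += annual              # pay this year's spend
        annual *= 1.0 + growth_rate  # next year's spend compounds on this year's
    return total


if __name__ == "__main__":
    # Illustrative numbers only: year-one spend normalized to 1.0.
    fast = cumulative_spend(1.0, 0.20, 10)
    slow = cumulative_spend(1.0, 0.15, 10)
    print(f"10-year spend at 20% growth: {fast:.2f}x year-one spend")
    print(f"10-year spend at 15% growth: {slow:.2f}x year-one spend")
    print(f"cumulative saving: {1 - slow / fast:.1%}")
```

With these hypothetical rates, the ten-year totals come to roughly 25.96 and 20.30 times year-one spend, a cumulative saving of about 22% from a five-point cut in the growth rate.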
Reducing management and personnel costs: In any data center, personnel costs are a significant component of overall IT spending. Because Platform Computing software affects operations as well as application development and maintenance, its life-cycle management advantages allow organizations to manage larger environments with fewer personnel. Platform Computing products are frequently deployed in large cluster environments, and extensive work has gone into manageability, diagnosability and sustaining capabilities to maximize management efficiency. This leaves organizations free to redeploy some operations personnel into roles more strategic to the business.
Decreasing training and support costs: Related to the point above, deploying intuitive interfaces and automating processes is essential to driving efficiency. By providing easy-to-use interfaces for in-house and commercial applications that constrain usage based on role, application managers can significantly reduce training and support costs. To the extent that otherwise manual multi-step workflows can be automated and hardened, these costs can be reduced further. Automating workflows and ensuring their resiliency also minimizes problems associated with human factors, avoiding situations where expensive compute capacity is wasted on improperly structured jobs or deadlines are missed because of errors in multi-step processes.
Avoiding hidden or unexpected costs: Distributed environments are complex, and it is hard to foresee every capability that might be needed or to anticipate every risk. The more capable and feature-rich the offering, the less likely it is that sites will encounter downstream issues that disrupt the business and require significant investment to address. Requirements that may emerge downstream include new functionality like user portals, data-aware scheduling or session-oriented scheduling, as well as the need to provision infrastructure on demand or to cost-efficiently handle peak periods by tapping a cloud-based service for overflow capacity. The breadth of the Platform Computing portfolio, along with the wide range of applications it supports, means that downstream surprises resulting in unforeseen costs are less likely to occur.
To learn more about these and other opportunities to save money, check out the exclusive e-book published by IBM Platform Computing. You can also connect with us on Twitter @IBMSDE for frequent updates!
Gord J. Sissons
Product Marketing Manager - IBM Platform Symphony