Amp up your Spark: three keys to successful deployment

Today, the advantages of an integrated nationwide electricity infrastructure are obvious.  But in the early days of electrical power in the late 1800s, electricity was available only through small, isolated networks.  AC and DC grids competed with one another. This mishmash of isolated systems was costly, inefficient and inflexible.

IT directors now often face a similar situation with Apache Spark, which is generating a huge amount of excitement as a framework for powering big data analytics. We are seeing many examples of individual, isolated Spark clusters, as different lines of business or functional groups set up their own infrastructure to learn how to take advantage of the power of Spark.

With its known advantages and increasing maturity, Spark is set to move into the mainstream. But businesses are not always sure how best to take the next step on the path to move Spark from its early adopter stage to production-level deployment. There are three important factors that can help organizations adopt Spark successfully.

Share

Shared infrastructure is akin to an electrical grid that can move power from one region of a country to another in response to demand. By sharing computing infrastructure across applications and business groups, resources that would otherwise sit idle for lack of local demand can be made available to meet other groups’ current workload demands. This approach improves service levels and reduces costs, since the resulting higher utilization allows the same amount of work to be accomplished using fewer resources. In one real-world case, a large corporation is able to run its applications on a shared infrastructure of approximately 16,000 cores. If these applications were in isolated silos, each sized to manage local peak demand, they would need about 28,000 cores. That’s more than a 40 percent reduction in required cores.
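The arithmetic behind that figure, and the reason pooling helps at all, can be sketched in a few lines of Python. The 16,000 and 28,000 core counts come from the case above; the function name and the hourly demand numbers for the two groups are purely hypothetical, chosen only to show that silos must each be sized for their own peak while a shared pool only needs to cover the peak of the combined load.

```python
def core_reduction(siloed_cores: int, shared_cores: int) -> float:
    """Fraction of cores saved by moving from silos to a shared pool."""
    if siloed_cores <= 0:
        raise ValueError("siloed_cores must be positive")
    return (siloed_cores - shared_cores) / siloed_cores


# Figures quoted in the article.
reduction = core_reduction(siloed_cores=28_000, shared_cores=16_000)
print(f"Reduction in cores: {reduction:.1%}")

# Hypothetical demand (cores needed) for two groups over three time slots.
# These numbers are illustrative only and are not taken from the article.
group_a = [8, 2, 3]
group_b = [2, 7, 3]

# Silos: each group provisions for its own peak.
sum_of_peaks = max(group_a) + max(group_b)

# Shared pool: provision for the peak of the combined demand.
peak_of_sum = max(a + b for a, b in zip(group_a, group_b))

print(f"Siloed capacity needed: {sum_of_peaks}")
print(f"Shared capacity needed: {peak_of_sum}")
```

Because the groups in this toy example peak at different times, the shared pool needs less capacity than the silos combined. The more the real workloads peak together, the smaller the saving becomes.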

Integrate

When you deploy Spark, you need other components. Spark requires a resource scheduler to allocate work to available servers. It also requires data management to handle the data that you are going to analyze, as well as the results of that analysis. And you need to monitor and report on the state of the system. This is rather like the different components of the power supply system, such as power generation plants, high-voltage long-distance transmission and local distribution, all of which work together to bring electrical power to the home. One important consideration is how willing (or, indeed, able) your organization is to build a solution from individual components instead of buying a complete solution from a single vendor. Putting together open-source solutions requires significant time and expertise. With an integrated solution from a vendor, you know you have a single point of contact to help get the system up and running, and to keep it running if you have problems.

Adapt

Finally, you need to be sure not to get locked into an inflexible solution.  Think about how power generation has changed over the years, including sources such as coal, hydroelectricity, nuclear, solar, wind and wave energy, all of which have been successfully incorporated into the power grid. Given the rapid pace of Spark development, a system to handle multiple versions of Spark efficiently and flexibly is vital.  And powerful as it is, Spark is not the solution to everything and won’t be around forever. You need an infrastructure that has the flexibility to work with Spark and other solutions, and to handle whatever comes after Spark.

Armed with a good understanding of these issues and how to address them, you will be ready to “amp up your Spark.” For more about deploying Spark in an enterprise environment, please check out this webcast.
