Get Smarter About Apache Spark

We often forget how new Spark is. While it was invented much earlier, Apache Spark only became a top-level Apache project in February 2014 (generally indicating it’s ready for anyone to use), which is just 18 months ago. I might have a toothbrush that is older than Apache Spark!

Since then, Spark has generated tremendous interest because the new data processing platform scales so well, delivers high performance (up to 100 times faster than alternatives), and is more flexible than other options, both open source and commercial. (If you’re interested, see the trends on both Google searches and Indeed job postings.)

Spark gives the Data Scientist, Business Analyst, and Developer a new platform to manage data and build services, because it can compute in real time via in-memory processing. The project is extremely active, with ongoing development and serious investment from IBM and key players in Silicon Valley.

Tips for getting started with Apache Spark

Given Spark’s great potential to revolutionize advanced analytics for big data and modern applications, the IBM Analytics for Apache Spark team is frequently asked for tips on great resources to help people get up to speed on Spark.

Below is our team’s list of recommended resources that we share with you in anticipation of the IBM Analytics for Apache Spark open beta:

You have no idea what Spark is and want to at least be informed

You want to use Spark and want to understand the basics

You are familiar with Spark and want to continue learning

You are already experienced with Spark and want to reach expert level

6 Comments

Tamar Eilam

This is a great list

WhitepeakSoftware

Nice compilation of resources!

A few days back, I started Spark Fundamentals I on bigdatauniversity.com. I downloaded the 5+ GB QSE image but was surprised to find that the Spark service is missing (when I started all services). On digging deeper (when spark-shell failed to start), I found that the Spark binaries are not present in the required folder [the soft link spark-client -> /usr/iop/4.0.0.0/spark is there, but the actual binaries are missing].

I had to waste a lot of time troubleshooting. Since I could not fix the problem with the Spark image (I even tried to install and build Spark out of desperation), I am now trying to see if the alternate Docker images work.

I posted in the help section on bigdatauniversity.com but no one replied. I am surprised there are no forums on bigdatauniversity.com; I searched a lot but could not find any related link. Can you help me out, please? I need to get up to speed with Spark quickly for both professional and academic reasons.

WhitepeakSoftware

Fortunately, I now find that I can get spark-shell up and running with the Docker image, but I would love to get the same on the QSE (BigInsights) image. Apache Ambari simply does not show the Spark service as up, even though I do “start all” from the console or run “restartAll.sh” from the terminal. Looking forward to your input and help!

WhitepeakSoftware

I am surprised by the author’s unresponsiveness to the problem I faced in the bigdatauniversity Spark course. It is the author who suggested the bigdatauniversity course, and when we faced a problem and mentioned it, he was silent. This shows a great lack of responsibility. If you do not know the answer, at least admit it; do not be silent.

Fortunately, I found the answer to my question in the forum. The problem was that I could not locate the forum link.

huangdk

Where can I download the Spark Docker image?

Luis Arellano

Hi huangdk,

IBM’s Spark-as-a-Service is not available as a Docker image; rather, it is a fully multitenant cloud service. You can simply sign up for a free 30-day trial at the following link:

http://www.ibm.com/analytics/us/en/technology/cloud-data-services/spark-as-a-service/

Cheers,
Luis
