
Get Smarter About Apache Spark


We often forget how new Spark is. Although it was created earlier, Apache Spark became a top-level Apache project only in February 2014 (a status that generally indicates a project is ready for anyone to use), just 18 months ago. I might have a toothbrush that is older than Apache Spark!

Since then, Spark has generated tremendous interest because the new data processing platform scales well, delivers high performance (up to 100 times faster than alternatives), and is more flexible than other options, both open source and commercial. (If you’re interested, see the trends on both Google searches and Indeed job postings.)

Spark gives the Data Scientist, Business Analyst, and Developer a new platform to manage data and build services, because it can compute in real time via in-memory processing. The project is extremely active, with ongoing development and serious investment from IBM and key players in Silicon Valley.

Tips for getting started with Apache Spark

Given Spark’s great potential to revolutionize advanced analytics for big data and modern applications, the IBM Analytics for Apache Spark team is frequently asked for tips on great resources to help people get up to speed on Spark.

Below is our team’s list of recommended resources that we share with you in anticipation of the IBM Analytics for Apache Spark open beta:

You have no idea what Spark is and want to at least be informed

You want to use Spark and want to understand the basics

You are familiar with Spark and want to continue learning

You are already experienced with Spark and want to reach expert level

6 Comments


Tamar Eilam

This is a great list


WhitepeakSoftware

Nice compilation of resources!

A few days ago, I started Spark Fundamentals I on bigdatauniversity.com. I downloaded the 5+ GB QSE image but was surprised to find that the Spark service is missing (when I started all services). On digging deeper (when spark-shell failed to start), I found that the Spark binaries are not present in the required folder [the soft link spark-client -> /usr/iop/4.0.0.0/spark is there but the actual binaries are missing].

I wasted a lot of time troubleshooting. When I could not fix the problem with the Spark image (I even tried to install and build Spark out of desperation), I started checking whether the alternate Docker images work.

I posted in the help section on bigdatauniversity.com but no one replied. I am surprised there are no forums on bigdatauniversity.com; I searched a lot but could not find any related link. Can you help me out, please? I need to get up to speed with Spark quickly for both professional and academic reasons.
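For readers who hit the same symptom described in this comment (a soft link that exists while the binaries behind it are missing), the check can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the course material: the function name `check_spark_link` and the relative path `bin/spark-shell` are assumptions, and the location of the spark-client link on the image has to be supplied by the reader.

```python
import os


def check_spark_link(link_path, expected_binary="bin/spark-shell"):
    """Classify a soft link that is supposed to point at a Spark install.

    Returns one of: "not_a_link", "dangling", "missing_binary", "ok".
    `expected_binary` is an assumed relative path, used for illustration.
    """
    # The path is not a soft link at all (or does not exist).
    if not os.path.islink(link_path):
        return "not_a_link"
    # os.path.exists follows the link, so False means the target is gone.
    if not os.path.exists(link_path):
        return "dangling"
    # The target directory exists, but the launcher script is not in it.
    if not os.path.isfile(os.path.join(link_path, expected_binary)):
        return "missing_binary"
    return "ok"
```

A result of "dangling" or "missing_binary" would match the situation reported above, where the link to /usr/iop/4.0.0.0/spark is present but the actual binaries are not.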


WhitepeakSoftware

Fortunately, I now find that I can get spark-shell up and running with the Docker image, but I would love to get the same on the QSE (BigInsights) image. Apache Ambari simply does not show the Spark service as up, even though I do “start all” from the console or run “restartAll.sh” from the terminal. Looking forward to your input and help!


WhitepeakSoftware

I am surprised by the author’s unresponsiveness to the problem I faced in the bigdatauniversity Spark course. It was the author who suggested the bigdatauniversity course, and when we faced a problem and mentioned it, he was silent. This shows a real lack of responsibility. If you do not know the answer, at least admit it; do not be silent.

Fortunately, I found the answer to my question in the forum. The problem was that I could not locate the forum link.


huangdk

Where can I download the Spark Docker image?


Luis Arellano

Hi huangdk,

IBM’s Spark-as-a-Service is not available as a Docker image; it is a fully multitenant cloud service. You simply sign up for a free 30-day trial at the following link:

http://www.ibm.com/analytics/us/en/technology/cloud-data-services/spark-as-a-service/

Cheers,
Luis
