Get Smarter About Apache Spark

We often forget how new Spark is. Although it began as a research project at UC Berkeley in 2009, Apache Spark only became a top-level Apache project in February 2014 (generally a sign that it is ready for anyone to use), just 18 months ago. I might have a toothbrush that is older than Apache Spark!

Since then, Spark has generated tremendous interest because the platform scales so well, delivers high performance (up to 100 times faster than Hadoop MapReduce for in-memory workloads), and is more flexible than the alternatives, both open source and commercial. (If you're interested, see the trends in both Google searches and Indeed job postings.)

Spark gives data scientists, business analysts, and developers a new platform to manage data and build services, because its in-memory processing makes near-real-time computation possible. The project is extremely active, with ongoing development, and has serious investment from IBM and key players in Silicon Valley.

Tips for getting started with Apache Spark

Given Spark's great potential to revolutionize advanced analytics for big data and modern applications, the IBM Analytics for Apache Spark team is frequently asked for tips on resources that help people get up to speed on Spark.

Below is our team's list of recommended resources, organized by experience level, which we are sharing in anticipation of the IBM Analytics for Apache Spark open beta:

You have no idea what Spark is and want to at least be informed

You want to use Spark and want to understand the basics

You are familiar with Spark and want to continue learning

You are already experienced with Spark and want to reach expert level

6 Comments

Tamar Eilam

This is a great list

WhitepeakSoftware

Nice compilation of resources!

A few days back, I started Spark Fundamentals I on bigdatauniversity.com. I downloaded the 5+ GB QSE image but was surprised to find that the Spark service was missing when I started all services. On digging deeper (when the spark-shell failed to start), I found that the Spark binaries are not present in the required folder [the soft link spark-client -> /usr/iop/4.0.0.0/spark is there, but the actual binaries are missing].

I wasted a lot of time troubleshooting. Since I could not fix the problem with the Spark image (out of desperation, I even tried to install and build Spark myself), I am now trying to see whether the alternate Docker images work.

I posted in the help section on bigdatauniversity.com, but no one replied. I am surprised there are no forums on bigdatauniversity.com; I searched a lot but could not find any related link. Can you please help me out? I need to get up to speed with Spark quickly, for both professional and academic reasons.

WhitepeakSoftware

Fortunately, I can now get the spark-shell up and running with the Docker image, but I would love to get the same working on the QSE (BigInsights) image. Apache Ambari simply does not show the Spark service as up, even though I run "start all" from the console or "restartAll.sh" from the terminal. Looking for your input and help!

WhitepeakSoftware

I am surprised by the author's unresponsiveness to the problem I faced in the bigdatauniversity Spark course. The author recommended the bigdatauniversity course, and when we faced a problem and mentioned it, he was silent. This shows a big sense of irresponsibility. If you do not know the answer, at least admit it; do not be silent.

Fortunately, I found the answer to my question in the forum. The problem was that I could not locate the forum link.

huangdk

Where can I download the Spark Docker image?

Luis Arellano

Hi huangdk,

IBM's Spark-as-a-Service is not available as a Docker image; rather, it is a fully multitenant cloud service. You simply sign up for a free 30-day trial at the following link:

http://www.ibm.com/analytics/us/en/technology/cloud-data-services/spark-as-a-service/

Cheers,
Luis
