Can Apache Spark MLlib help you find a cab in NYC?

As humans, grouping and classifying similar objects is a common way we make sense out of an otherwise cluttered world. Your kitchen cabinets are a great example. More than likely, you stack plates with other plates, spoons with other spoons, etc.

Applying this same principle to massive amounts of messy data is a more complex endeavor. Clustering algorithms are one way data scientists and developers group and classify data. K-means clustering is a widely used clustering algorithm with loads of practical applications. For example, you might want to identify neighborhoods in a city that share similar characteristics, or examine GPS data to better understand driver behavior.

The K-means clustering algorithm is supported by Apache Spark’s machine learning library, MLlib. Follow along in this video while IBM’s Dan Kikuchi demos this technology using IBM Analytics for Apache Spark, a managed Spark-as-a-service offering available on Bluemix.

With Jupyter notebooks built right into the Spark service, along with an integrated Object Storage offering, Dan loads NYC taxi data into Spark and uses Spark MLlib's K-means clustering algorithm to determine the top three drop-off spots for New York City cabs.
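To get a feel for what K-means does with drop-off coordinates, here is a minimal sketch in plain Python. It is not the notebook from the video and it does not use Spark: the data is synthetic, the three hotspot coordinates and their drop-off counts are made up for illustration, and the helper names (`farthest_point_init`, `assign`, `kmeans`) are hypothetical. It also treats latitude and longitude as flat x/y values, which is only a rough approximation over a small area. In a real Spark job you would hand this work to MLlib's own `KMeans` implementation instead.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Synthetic (latitude, longitude) drop-offs scattered around three made-up hotspots.
# These numbers are assumptions for the demo, not results from the taxi data set.
rng = random.Random(42)
hotspots = [((40.75, -73.99), 60), ((40.77, -73.87), 40), ((40.64, -73.78), 20)]
dropoffs = [
    (rng.gauss(lat, 0.005), rng.gauss(lon, 0.005))
    for (lat, lon), count in hotspots
    for _ in range(count)
]

centers, clusters = kmeans(dropoffs, k=3)

# Rank clusters by how many drop-offs they attracted, busiest first.
ranked = sorted(zip(centers, clusters), key=lambda pair: len(pair[1]), reverse=True)
for rank, (center, members) in enumerate(ranked, start=1):
    print(f"#{rank}: lat={center[0]:.3f}, lon={center[1]:.3f}, drop-offs={len(members)}")
```

Each cluster center is a candidate "drop-off spot," and sorting the clusters by size gives the top three. The farthest-point seeding is a choice made here to keep the sketch deterministic; MLlib uses its own initialization strategy.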

But don’t stop there! Head over to Bluemix to try IBM Analytics for Apache Spark for free and start experimenting for yourself.
