Spark

End-to-End IoT Data Pipelines in IBM Bluemix: Introducing the Message Hub Object Storage Bridge

With sensors now present in all walks of life, the Internet of Things is on a path to generate the Biggest Big Data our planet has known. How can we successfully harness this ocean of data? We need end-to-end IoT data pipelines that collect, store, and analyze the data. In this post, we introduce the Message Hub Object Storage bridge and show how it can enable just such an end-to-end IoT data pipeline on IBM Bluemix.

Query many data sources as one: IBM Queryplex for data analytics

Queryplex runs advanced analytics (SQL, Python, R, PySpark, etc.) across many devices and data sources as though they were a single consolidated data repository. The technology can be used to erase the data silos of multiple databases (e.g., Oracle, DB2, PostgreSQL, Netezza), or to compute analytics across tens of thousands of distributed Internet of Things devices whose data may be stored in smaller repositories (text files, Excel spreadsheets, Informix, MySQL). Queryplex lets you query many data sources at once with a single statement, whether they are large repositories, small devices, or any combination of them.

Migrate Analytics for Apache Spark Notebooks to the IBM Data Science Experience

Hello IBM Analytics for Apache Spark users: we will be discontinuing support for the Jupyter notebooks on Bluemix as of April 6, 2017. Because you use IBM Analytics for Apache Spark, we'd like you to migrate your notebooks (Spark 1.4.1 and later versions) to the Data Science Experience so you can take advantage of its latest Jupyter Notebooks. Please […]

IBM Analytics for Apache Spark – Personal Plan price and name change

We’re excited to announce new pricing and a name change for the IBM Analytics for Apache Spark Personal Plan. The pricing change is effective November 1, 2016.

Hadoop cluster up and running in minutes!

IBM BigInsights for Apache Hadoop now has a new service plan, Basic (currently in beta). With it, you can have a new Hadoop cluster up and running in under 10 minutes.

Fast-tracking Business Apps with Apache Spark Submit

Until now, IBM's Spark cloud service had a single entry point: Jupyter Notebooks. By adding Spark-submit as a new way to access the service, IBM is enabling remote programmatic access to Spark clusters on the IBM cloud, so that external applications can seamlessly run data processing jobs on Spark. The ability to connect directly from any user-facing application will bring new power and intelligence to day-to-day operations.

Accelerating data science with Jupyter Notebooks and Apache Spark

The newly integrated notebooks within the IBM Analytics for Apache Spark service aim to provide analysts and data scientists with an iterative, flexible environment that supports end-to-end analysis. Melissa Rodriguez Zynda, IBM Offering Manager for Analytics Platform Services, previously worked as a design researcher for IBM Watson Analytics and Social Media Analytics. A designer by training with a background in anthropology, Melissa explains how notebooks transform the way users interact with technologies such as Apache Spark to analyze data rapidly.

Spark-Cloudant Connector for In-Memory Analytics of JSON Data

The Spark-Cloudant connector allows you to apply the power of Apache Spark to all the data you have stored in your NoSQL Cloudant database.

Apache Spark: Upgrade and speed-up your analytics

One of the best things about Apache Spark is that it makes real-time analytics of vast unstructured datasets, such as social media content, feasible and affordable for companies of all sizes. But what are the practicalities of performing this kind of analysis, and how do you get started? Chetna Warade, Developer Advocate at IBM, is a software engineer who works in research and product development. We spoke to Chetna about a recent project that demonstrated the potential of Spark for social media analytics, focusing on the popular "Ask Me Anything" (AMA) section of the social news and entertainment site Reddit.