Spark-Cloudant Connector for In-Memory Analytics of JSON Data

What if you could apply the “Operating System of Big Data Analytics” to all the data you have stored in your NoSQL database? It would be the best of both worlds – IBM Analytics for Apache Spark excels at processing large volumes of data at high speed, and IBM Cloudant enables massive scalability of applications.

This integration is now possible thanks to the Spark-Cloudant connector, available in the cloudant-labs/spark-cloudant project on GitHub, on the Spark Packages site, and through Bluemix when you sign up for a Spark instance. With the Spark-Cloudant connector, you’ll be able to:

  • Load entire databases into a Spark cluster for analysis
  • Read from a Cloudant secondary index to pull a filtered subset or cleansed version of your Cloudant JSON
  • Transform or filter your data and write it back into Cloudant or another data source
  • Conduct federated analytics over disparate data sources such as Cloudant, dashDB and Object Storage
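
As a rough illustration of the first three capabilities, here is a minimal PySpark sketch. It assumes the connector is already on the Spark classpath and that it registers the data source name `com.cloudant.spark` with the option keys `cloudant.host`, `cloudant.username`, `cloudant.password` and `index`, as described in the project README; check these against the version you install. The helper names (`cloudant_options`, `load_cloudant`, `save_cloudant`) and all host, credential and database values are hypothetical placeholders.

```python
def cloudant_options(host, username, password, index=None):
    """Build the option map passed to the connector (key names assumed from the README)."""
    options = {
        "cloudant.host": host,
        "cloudant.username": username,
        "cloudant.password": password,
    }
    if index is not None:
        # Optional: read through a Cloudant index instead of the whole database.
        options["index"] = index
    return options


def load_cloudant(sql_context, database, options):
    """Load a Cloudant database into a Spark DataFrame via the standard reader API."""
    return (
        sql_context.read
        .format("com.cloudant.spark")  # assumed data source name
        .options(**options)
        .load(database)
    )


def save_cloudant(dataframe, database, options):
    """Write a DataFrame back to a Cloudant database via the standard writer API."""
    (
        dataframe.write
        .format("com.cloudant.spark")  # assumed data source name
        .options(**options)
        .save(database)
    )


# Usage inside a Spark notebook or job, where `sqlContext` already exists
# (placeholder values throughout):
#
#   opts = cloudant_options("ACCOUNT.cloudant.com", "USERNAME", "PASSWORD")
#   df = load_cloudant(sqlContext, "my_database", opts)
#   filtered = df.filter(df["some_field"] == "some_value")
#   save_cloudant(filtered, "my_filtered_database", opts)
```

The helpers take the SQL context and DataFrame as arguments rather than creating them, so the same functions work in a notebook where the context is provided for you.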

We’ve already started using the Spark-Cloudant connector – in “Sentiment Analysis of Reddit AMAs,” we show how we conducted a sentiment analysis of an IBM-hosted Ask Me Anything (AMA) on Reddit, using the Spark-Cloudant connector, Simple Data Pipe and Watson Tone Analyzer.

To learn more about getting started with the Spark-Cloudant connector and to see an example of the connector in action, see Introducing Spark-Cloudant, an open source Spark connector for Cloudant data.
