March 17, 2016 | Written by: James Young
What if you could apply the “Operating System of Big Data Analytics” to all the data you have stored in your NoSQL database? You would get the best of both worlds: IBM Analytics for Apache Spark excels at processing large volumes of data at high speed, and IBM Cloudant lets applications scale massively.
This integration is now possible thanks to the Spark-Cloudant connector, which is available from the cloudant-labs/spark-cloudant project on GitHub, from the Spark Packages site, and through Bluemix when you sign up for a Spark instance. With the Spark-Cloudant connector, you’ll be able to:
- Load entire databases into a Spark cluster for analysis
- Read from a Cloudant secondary index to pull a filtered subset or cleansed version of your Cloudant JSON
- Transform or filter your data and write it back into Cloudant or another data source
- Conduct federated analytics over disparate data sources such as Cloudant, dashDB and Object Storage
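The load, index-read and write-back capabilities listed above go through Spark's generic data source API. The following is a minimal sketch of that pattern in PySpark, not the connector's official example. It assumes the data source name `com.cloudant.spark` and the option keys `cloudant.host`, `cloudant.username`, `cloudant.password` and `view` as described in the cloudant-labs/spark-cloudant README; check the README for the version you install, since names and write behavior (for example, whether the target database must already exist) may differ. The helper functions `cloudant_options`, `load_database` and `save_database`, and every account, database, view and field name in the usage comments, are hypothetical.

```python
from typing import Dict, Optional

# Data source name for the spark-cloudant package (assumed from the project README;
# confirm it against the version you install).
CLOUDANT_FORMAT = "com.cloudant.spark"


def cloudant_options(host: str, username: str, password: str,
                     view: Optional[str] = None) -> Dict[str, str]:
    """Build the connector options for one Cloudant account (hypothetical helper).

    `view` optionally names a secondary index such as "_design/<ddoc>/_view/<name>",
    given without the database name, so a read pulls that view's filtered or
    cleansed rows instead of every document in the database.
    """
    opts = {
        "cloudant.host": host,          # e.g. "ACCOUNT.cloudant.com" (placeholder)
        "cloudant.username": username,
        "cloudant.password": password,
    }
    if view is not None:
        opts["view"] = view
    return opts


def load_database(spark, database: str, **conn):
    """Read a Cloudant database (or one of its views) into a Spark DataFrame.

    `spark` is an existing SparkSession, or a SQLContext on Spark 1.x;
    both expose the same `read` API.
    """
    return (spark.read.format(CLOUDANT_FORMAT)
            .options(**cloudant_options(**conn))
            .load(database))


def save_database(df, database: str, **conn) -> None:
    """Write a DataFrame's rows into a Cloudant database as JSON documents."""
    (df.write.format(CLOUDANT_FORMAT)
       .options(**cloudant_options(**conn))
       .save(database))


# Hypothetical usage (account, credentials, database and field names are placeholders):
# conn = dict(host="ACCOUNT.cloudant.com", username="USER", password="PASS")
# flights = load_database(spark, "flights", **conn)
# delayed = flights.filter(flights.delay > 30)
# save_database(delayed, "delayed_flights", **conn)
```

The helpers take an existing Spark session as a parameter rather than creating one, so cluster setup and credential handling stay with the caller, and the same connection settings can be reused for reading and writing.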
We’ve already put the connector to work. In “Sentiment Analysis of Reddit AMAs,” we show how we ran a sentiment analysis of an IBM-hosted Ask Me Anything (AMA) on Reddit using the Spark-Cloudant connector, Simple Data Pipe and Watson Tone Analyzer.
To learn more about getting started with the Spark-Cloudant connector and to see it in action, read “Introducing Spark-Cloudant, an open source Spark connector for Cloudant data.”