By James Kobielus
Open data science is proving to be a seedbed for innovation in the cloud-centric world economy. Innovators in every industry are at the forefront of using the tools and techniques of open data science to build new designs for working and living.
Open data science projects are revolutionizing the fabric of business, development, and IT in industries everywhere. Creativity comes when people from many backgrounds, roles, and skillsets use open-source data-science tools, such as Spark, R, and Hadoop, to develop and deploy new designs for working and living.
Open teaming collaborations are essential for unlocking this data-science creativity. As IBM VP Rob Thomas recently discussed in this blog, open-source data-science tools are essential for making the most of powerful new cloud data services environments such as the Internet of Things (IoT). And for a deep dive on how a new open-source tool called Quarks can help teams leverage Spark to drive algorithmic intelligence to the edge of the IoT, check out this blog I published this past February.
Data science initiatives can foster innovative designs and disruptive applications when teams combine the following roles and skillsets in pursuit of common objectives:
- Data scientists use data science tools for teasing out the insights they’re looking for and for making them actionable immediately through applications, visualizations, and other consumables
- Business analysts use statistical exploration tools to answer domain-specific questions quickly, easily, and without needing IT assistance
- Application developers use algorithmic capabilities to endow their apps with cognitive smarts that learn from fresh data and take actions that are continually optimized in keeping with contextual, predictive, and environmental variables
- Data engineers build data-processing pipelines that leverage machine learning, stream computing, and other capabilities to ingest data from disparate sources, aggregate and cleanse it, and deliver it downstream to smart applications of all sorts.
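The data-engineering role above (ingest from disparate sources, cleanse, aggregate, deliver downstream) can be illustrated with a minimal sketch. This is plain Python rather than Spark, and the function names, the `sensor` and `reading` fields, and the sample records are all hypothetical, chosen only to show the shape of such a pipeline.

```python
from collections import defaultdict
from statistics import mean


def ingest(*sources):
    """Merge records from several sources into one stream."""
    for source in sources:
        yield from source


def cleanse(records):
    """Drop records with a missing reading and normalize sensor names."""
    for rec in records:
        if rec.get("reading") is None:
            continue
        yield {
            "sensor": rec["sensor"].strip().lower(),
            "reading": float(rec["reading"]),
        }


def aggregate(records):
    """Average the readings per sensor."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["sensor"]].append(rec["reading"])
    return {sensor: mean(values) for sensor, values in grouped.items()}


def run_pipeline(*sources):
    """Chain the stages; the result is what a downstream app would consume."""
    return aggregate(cleanse(ingest(*sources)))


# Hypothetical records from two disparate sources.
source_a = [
    {"sensor": "Pump-1 ", "reading": "10.0"},
    {"sensor": "pump-2", "reading": None},
]
source_b = [
    {"sensor": "pump-1", "reading": 14.0},
    {"sensor": "pump-2", "reading": 3.0},
]

summary = run_pipeline(source_a, source_b)
print(summary)
```

Each stage is a generator or a small function, so a team could replace any one of them (for example, swapping the in-memory lists for a streaming source) without touching the others.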
Open analytics tools provide a critical enabler for decentralized teams to develop innovative applications in a complex world. The pivotal importance of Spark and R in these efforts stems from the fact that they:
- Facilitate the democratization of self-service data analytics development across enterprises and communities, especially when these programming tools are accessible from within teams’ primary development workbench
- Enable distributed teams to address bigger data-centric problems and reap commensurately larger business results more rapidly, especially when accessed in a shared public-cloud service
- Accelerate development of high-performance analytic apps flexibly and easily, especially when used with browser-based notebooks that support code, text, interactive visualization, math, and media
- Provide a unified execution model for big data processing and analytics capabilities all in one environment, especially when deployed in conjunction with Hadoop, NoSQL databases, and other cloud-based data platforms
- Reduce the amount of code and number of tools needed to combine a deep stack of cognitive capabilities in a single app, especially when used in conjunction with rich libraries of machine learning, streaming analytics, graph computing, natural language processing, and other algorithms
- Allow teams to refine analytic applications interactively and iteratively, especially when used in conjunction with data and model governance features that are integrated into the data lakes around which the data-science development lifecycle revolves.
A common theme in open data science initiatives is the use of Spark to develop applications that deliver predictive, real-time, and/or machine learning capabilities to the point of action. Leveraging these and other capabilities, IBM customers are using Spark for such applications as:
- Anomaly detection in cybersecurity and anti-fraud,
- Real-time recommendation engines in e-commerce,
- Predictive maintenance in the Internet of Things,
- Targeted offers in outbound marketing,
- Customer experience optimization in mobile apps,
- Predictive merchandising through in-store beacons,
- Real-time performance insights in competitive athletics,
- Automated pattern detection in the physical sciences, and
- Churn reduction in customer relationship management
Furthermore, R&D communities around the world are experimenting with a dizzying range of new apps that address opportunities to bring Spark, R, and open data science into every sphere of our lives. As I discussed in this post several months ago, the Spark Technology Center has many projects in development, such as these:
- RedRock: This is a Spark app that lets the user act on real-time, data-driven insights discovered from Twitter. It transforms a huge volume of Twitter data into an easy-to-digest set of visualizations accessible to a general audience.
- Bluemix Genomics: This Spark app enables scientists to understand how genetics contribute to complex disease. It enables processing and analysis of massive amounts of genome data.
- AMBER Alert Aid: This Spark app enables broadcasting of the most serious missing children cases through AMBER Alert. It uses the analytic capabilities of Spark to find vehicles described in AMBER Alert reports in car traffic video feeds.
- SETI + Spark Explore Space: This is a Spark app for analyzing 100 million radio events that have been collected over several years in order to identify faint signals indicative of intelligent extraterrestrial life. It uses sophisticated mathematical models and machine-learning algorithms to separate terrestrial interference from signals truly of interest.
- Tone Analyzer with Watson + Spark + Twitter: This is a Spark app for sifting in real time through Twitter data to gauge customer emotions on multiple tone dimensions, ranging from anger to cheerfulness to openness.
- Search by Selfie: This is a Spark app for real-time facial detection, recognition, and intelligence in customer engagement scenarios. It puts instant and continual facial recognition within reach of business users outside large-scale enterprises, such as retailers, event planners, or security staff, with potential applications for missing persons as well. It enables capture of a photo, extraction of key features, transformation of those features to normalize the data of the faces, and training of facial-recognition models in Spark.
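The last two steps described for Search by Selfie, normalizing extracted features and training a recognition model, can be sketched in a few lines. This is not the app's actual method and does not use Spark: it is plain Python, it assumes the feature vectors have already been extracted from photos, and it swaps in a simple nearest-centroid model. The names `normalize`, `train`, and `recognize`, and the sample vectors, are hypothetical.

```python
import math


def normalize(vector):
    """Scale a feature vector to unit length so faces are comparable."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def train(labeled_features):
    """Compute one centroid per person from normalized feature vectors."""
    sums, counts = {}, {}
    for label, vector in labeled_features:
        unit = normalize(vector)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
            counts[label] = 0
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def recognize(model, vector):
    """Return the label whose centroid is closest to the normalized vector."""
    unit = normalize(vector)
    return min(model, key=lambda label: math.dist(model[label], unit))


# Hypothetical two-dimensional features; real face features would be far larger.
training_set = [
    ("alice", [3.0, 4.0]),
    ("alice", [6.0, 8.0]),
    ("bob", [4.0, -3.0]),
]
model = train(training_set)
print(recognize(model, [30.0, 41.0]))
```

Normalizing before training means two photos of the same face taken at different scales map to nearly the same point, which is the purpose of the normalization step the description mentions.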
On June 6, IBM will share important announcements for helping customers use Spark, R, and open data science to drive business innovations in the cloud. At the Apache Spark Maker Community Event, IBM will host a stimulating evening of keen interest to data scientists, data application developers, and data engineers. The event will feature special announcements, a keynote, and maker awards. Leading industry figures who have already committed to participate include John Akred, CTO, Silicon Valley Data Science; Ritika Gunnar, Vice President of Offering Management, IBM Analytics; Todd Holloway, Director of Content Science and Algorithms, Netflix; and Matthew Conley, Data Scientist, Tesla Motors.
Please register for the in-person event:
Or, if you can’t make it to the in-person event, please register to watch a livestream of the event: