IBM presents an all new Open Platform for Apache Hadoop on Intel and Power platforms and BigInsights 4.1 on Intel

BigInsights 4.1 represents our investment in the open platform through features in security, performance, and governance for our enterprise customers.

New features for Version 4.1

The open source projects that were available in the previous version have been updated to the following versions:

Hadoop 2.7.1
HBase 1.1.1
Hive 1.2.1
Knox 0.6.0
Oozie 4.2.0
Pig 0.15.0
Slider 0.80.0
Solr 5.1.0
Spark 1.4.1
Sqoop 1.4.6
ZooKeeper 3.4.6
Apache Kafka 0.8.2.1, a high-throughput distributed messaging system, is included.
The Teradata Connector for Hadoop 1.4 (Command Line Edition) for Apache Hadoop 2.7 is supported.

Spark 1.4 is the first release to package SparkR, an R binding for Spark based on Spark’s new DataFrame API. SparkR gives R users access to Spark’s scale-out parallel runtime along with all of Spark’s input and output formats. It also supports calling directly into Spark SQL. The R programming guide has more information on how to get up and running with SparkR.

Security

In addition to LDAP, Knox includes PAM support.
Kerberos setup includes automatic and manual support.

BigInsights Big R with machine learning updates

With the new release, you will see new algorithms, including Decision Trees, Random Forests, and Stepwise Regression. These algorithms enable R users to use existing R functions on a Hadoop cluster.
Big R has expanded its library of machine learning algorithms to provide a richer set of classification, regression, factorization, feature extraction, and survival analysis capabilities as follows:

bigr.pca: Principal Component Analysis (feature extraction and dimensionality reduction)
bigr.als: Alternating Least Squares (matrix factorization for making recommendations given a dataset of users vs. products)
bigr.kaplan.meier: Kaplan-Meier survival models
bigr.coxph: Cox Proportional Hazard Regression
bigr.step.lm: Step-wise Linear Regression
bigr.step.glm: Step-wise Generalized Linear Models
bigr.dtree: Decision Tree classifier
bigr.randomForest: Random Forest classifier

The bigr.transform() method, which performs a variety of transformations required for machine learning (such as missing value imputation, recoding, dummy-coding, scaling, and binning), has been re-implemented inside the SystemML engine. This significantly improves performance and reduces HDFS space requirements.
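
For readers unfamiliar with the transformations named above, here is a minimal sketch of what each one does to a small column of data. It is plain Python for illustration only: it is not the Big R or SystemML API, and every function name in it (impute_mean, recode, dummy_code, scale, bin_equal_width) is hypothetical.

```python
from statistics import mean, pstdev

# Illustrative helpers only; these names are NOT part of Big R or SystemML.

def impute_mean(values):
    """Missing value imputation: replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def recode(values):
    """Recoding: map each distinct category to a 1-based integer code."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes

def dummy_code(codes, n_categories):
    """Dummy-coding: turn integer codes into 0/1 indicator columns."""
    return [[1 if c == k else 0 for k in range(1, n_categories + 1)]
            for c in codes]

def scale(values):
    """Scaling: shift to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def bin_equal_width(values, n_bins):
    """Binning: assign each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]

ages = [20, None, 40, 60]
colors = ["red", "blue", "red", "green"]

ages_filled = impute_mean(ages)
color_codes, mapping = recode(colors)
color_dummies = dummy_code(color_codes, len(mapping))
ages_scaled = scale(ages_filled)
ages_binned = bin_equal_width(ages_filled, 2)
```

In this toy data the missing age is filled with the mean of the other three, and the three colors become three indicator columns. A distributed engine applies the same ideas column by column across a cluster.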

PMML models are now supported for bigr.svm, bigr.glm, and bigr.mlogit.

BigInsights BigSheets updates

Service install enhancements:
Improved and expanded prerequisite checks.
BigSheets service user is now a passwordless user.
Enhancements to the JSON Object Reader.
Support for reading CMX compressed files.
Support for uploading and installing custom BigSheets readers and functions.
API support to allow Text Analytics Web Tooling to publish extractors to BigSheets.
Various enhancements to publishing BigSheets workbooks to Big SQL and the Hive Metastore.

BigInsights Big SQL updates

The following enhancements have been made for Big SQL:

Support for new analytic stored procedures:

Metadata management
By using the data mining algorithms, you can generate analytics models, which are also known as data mining models. To manage these models, the metadata management component provides the needed environment and stored procedures.

K-means clustering
The K-means algorithm is the most widely used clustering algorithm that uses an explicit distance measure to partition the data set into clusters.
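
As a rough illustration of the idea (not the Big SQL stored procedure, whose interface is not shown in this post), the sketch below partitions 2-D points with Euclidean distance. The function name kmeans, the choice of the first k points as starting centroids, and the sample data are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal K-means sketch using Euclidean distance.

    Assumption: the first k points serve as the initial centroids, which
    keeps the example deterministic. Real implementations usually use a
    smarter initialization.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

With this sample data, the three points near the origin end up in one cluster and the three points near (8.5, 9) in the other.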

Naive Bayes
The Naive Bayes classification algorithm is a probabilistic classifier. It is based on probability models that incorporate strong independence assumptions.
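
The independence assumption means the classifier scores a class by multiplying the class prior with one per-feature probability at a time. The sketch below shows this for categorical features with add-one (Laplace) smoothing. It is an illustration only, not the Big SQL procedure; the function names train and predict and the toy weather data are assumptions.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, label)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the class with the highest log-probability score."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for j, v in enumerate(row):
            count = feature_counts[(j, label)][v]
            # Each feature contributes independently (the "naive" assumption);
            # add-one smoothing avoids a zero probability for unseen values.
            score += math.log((count + 1) / (n + len(values[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train(rows, labels)
sunny_hot = predict(model, ("sunny", "hot"))
overcast_cool = predict(model, ("overcast", "cool"))
```

Working in log space avoids numeric underflow when many features are multiplied together.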

Association rules
By using association rules mining, you can discover interesting and useful relations between items in a large-scale transaction table. You can identify strong rules between related items by using different measures of interestingness.
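
Three common measures of interestingness are support, confidence, and lift. The sketch below computes them for single-item rules of the form "A implies B" over a small list of transactions. It is a brute-force illustration, not the Big SQL procedure; the function name pair_rules, the thresholds, and the sample baskets are assumptions.

```python
from itertools import permutations

def pair_rules(transactions, min_support, min_confidence):
    """Find rules a -> b between single items that pass both thresholds.

    support    = share of transactions that contain both a and b
    confidence = share of transactions containing a that also contain b
    lift       = confidence divided by the overall share containing b
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b})
        support = both / n
        if support < min_support:
            continue
        confidence = both / count({a})
        lift = confidence / (count({b}) / n)
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence, lift))
    return rules

transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
rules = pair_rules(transactions, min_support=0.4, min_confidence=0.7)
```

A lift above 1 suggests the two items occur together more often than they would if they were independent. A lift below 1 suggests the opposite. Production miners such as Apriori or FP-Growth prune the search space instead of checking every pair.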

Tech Preview of Big SQL Head Node High Availability.
Tech Preview of Big SQL integrated with YARN by using Slider.

BigInsights Text Analytics updates

Enhancements to Text Analytics include CSV download and user-initiated save points for the Information Extraction Web Tooling (IEWT).
These features have been added to the text analytics web tool:

Export results to CSV
Create snapshots of projects
Per-project customization for pre-built extractors
Support for multiple languages and English parts of speech
Complete support for scalar functions when creating columns
Ability to publish extractors into BigSheets functions
Support for documents with no file extension

BigInsights Enterprise Module updates

Enhancements include integration with Apache Ambari, and HDFS transparency for applications that use IBM Spectrum Scale capabilities.

To get started, download the 4.1 release of the IBM Open Platform with Apache Hadoop.

You can read about the download and installation process by visiting the Knowledge Center link.

As always, we are happy to hear your feedback. Please send your comments and suggestions to the user group or through our community forums.

Hadoop Dev, Technical Blog Post