
IBM presents an all new Open Platform for Apache Hadoop on Intel and Power platforms and BigInsights 4.1 on Intel - Hadoop Dev

Technical Blog Post


Body

IBM presents an all-new Open Platform for Apache Hadoop on Intel and Power platforms, and BigInsights 4.1 on Intel. BigInsights 4.1 represents our investment in the open platform through security, performance, and governance features for our enterprise customers.

New features for Version 4.1 

This release updates the open source projects that were available in the previous version and adds new components:

  • Hadoop 2.7.1
  • HBase 1.1.1
  • Hive 1.2.1
  • Knox 0.6.0
  • Oozie 4.2.0
  • Pig 0.15.0
  • Slider 0.80.0
  • Solr 5.1.0
  • Spark 1.4.1
  • Sqoop 1.4.6
  • ZooKeeper 3.4.6
  • Apache Kafka 0.8.2.1, a high-throughput distributed messaging system, is included.
  • The Teradata Connector for Hadoop 1.4 (Command Line Edition) for Apache Hadoop 2.7 is supported.

Spark 1.4 is the first release to package SparkR, an R binding for Spark based on Spark’s new DataFrame API. SparkR gives R users access to Spark’s scale-out parallel runtime along with all of Spark’s input and output formats. It also supports calling directly into Spark SQL. The R programming guide has more information on how to get up and running with SparkR.

Security

  • In addition to LDAP, Knox includes PAM support.
  • Kerberos setup includes automatic and manual support.

BigInsights Big R with machine learning updates 

With the new release, you will see new algorithms, including Decision Trees, Random Forests, and Stepwise Regression. These algorithms enable R users to use existing R functions on a Hadoop cluster.
Big R has expanded its library of machine learning algorithms to provide a richer set of classification, regression, factorization, feature extraction, and survival analysis capabilities as follows:

  • bigr.pca: Principal Component Analysis (feature extraction and dimensionality reduction)
  • bigr.als: Alternating Least Squares (matrix factorization for making recommendations given a dataset of users vs. products)
  • bigr.kaplan.meier: Kaplan-Meier survival models
  • bigr.coxph: Cox Proportional Hazard Regression
  • bigr.step.lm: Step-wise Linear Regression
  • bigr.step.glm: Step-wise Generalized Linear Models
  • bigr.dtree: Decision Tree classifier
  • bigr.randomForest: Random Forest classifier

The bigr.transform() method performs a variety of transformations required for machine learning, such as missing value imputation, recoding, dummy-coding, scaling, and binning. It has been re-implemented inside the SystemML engine, which significantly improves performance and reduces HDFS space requirements.
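To make the transformation names above concrete, here is a minimal Python sketch of what each one does to a single column. This is not the Big R API or its SystemML implementation; every function name here is hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

# Illustrative only: these helpers are hypothetical and are NOT part of Big R.

def impute_mean(values):
    """Missing value imputation: replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def recode(values):
    """Recoding: map each distinct category to a 1-based integer code."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes

def dummy_code(values):
    """Dummy-coding: build one 0/1 indicator column per category."""
    return {cat: [1 if v == cat else 0 for v in values]
            for cat in sorted(set(values))}

def scale(values):
    """Scaling: shift to mean 0 and divide by the population standard deviation.
    Assumes the column is not constant (standard deviation > 0)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def bin_equal_width(values, num_bins):
    """Binning: assign each value to one of num_bins equal-width bins (1-based).
    Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width) + 1, num_bins) for v in values]
```

In practice these steps are usually chained, for example imputing missing values before scaling a numeric column and dummy-coding the categorical ones.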

PMML is now supported for bigr.svm, bigr.glm, and bigr.mlogit models.

BigInsights BigSheets updates 

  • Service install enhancements
  • Improved and expanded prerequisite checks
  • BigSheets service user is now a passwordless user.
  • Enhancements to the JSON Object Reader.
  • Support for reading CMX compressed files.
  • Support for uploading and installing custom BigSheets readers and functions.
  • API support to allow Text Analytics Web Tooling to publish extractors to BigSheets.
  • Various enhancements to publishing BigSheets workbooks to Big SQL and the Hive Metastore.

BigInsights Big SQL updates 

The following enhancements have been made for Big SQL:

  • Support for new analytic stored procedures:

Metadata management 
By using the data mining algorithms, you can generate analytics models, which are also known as data mining models. To manage these models, the metadata management component provides the needed environment and stored procedures. 

K-means clustering 
The K-means algorithm is the most widely used clustering algorithm that uses an explicit distance measure to partition the data set into clusters.
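As a rough illustration of how K-means partitions data with an explicit distance measure, here is a minimal Python sketch using Euclidean distance. It is not the Big SQL stored procedure; the function name, parameters, and the choice of Euclidean distance are assumptions made for this example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Hypothetical helper, not the Big SQL procedure.
    points and centroids are sequences of equal-length numeric tuples."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, clusters
```

The result depends on the initial centroids, which is why this sketch takes them as an argument rather than picking them at random.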

Naive Bayes 
The Naive Bayes classification algorithm is a probabilistic classifier. It is based on probability models that incorporate strong independence assumptions.
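The independence assumption means the classifier scores a class by multiplying the class prior with a separate probability for each feature value. A minimal Python sketch for categorical features follows. It is not the Big SQL implementation; the function names and the add-one (Laplace) smoothing are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Hypothetical helper: count classes and per-class feature values."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(label, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the label with the highest log-probability score for row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, v in enumerate(row):
            # Each feature contributes independently (the "naive" assumption).
            # Add-one smoothing keeps unseen values from zeroing the product.
            count = feature_counts[(label, i)][v]
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.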

Association rules 
By using association rules mining, you can discover interesting and useful relations between items in a large-scale transaction table. You can identify strong rules between related items by using different measures of interestingness. 
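Three commonly used measures of interestingness are support, confidence, and lift. The Python sketch below computes them for a single candidate rule. It is not the Big SQL stored procedure, the function names are hypothetical, and the source does not say which measures Big SQL itself uses.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items.
    transactions is a list of sets."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_measures(transactions, antecedent, consequent):
    """Hypothetical helper: measures for the rule antecedent -> consequent.
    Assumes the antecedent and the consequent each occur at least once."""
    a, c = set(antecedent), set(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    confidence = s_ac / s_a   # how often the consequent appears given the antecedent
    lift = confidence / s_c   # above 1 means a positive association
    return {"support": s_ac, "confidence": confidence, "lift": lift}
```

A rule is typically considered strong only when it clears minimum thresholds on more than one of these measures, since high confidence alone can simply reflect a very common consequent.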

  • Tech Preview of Big SQL Head Node High Availability.
  • Tech Preview of Big SQL integrated with YARN by using Slider.

BigInsights Text Analytics updates

Enhancements to Text Analytics include CSV download and user-initiated save points for Information Extraction Web Tooling (IEWT).
These features have been added to the text analytics web tool:

  • Export results to CSV
  • Create snapshots of projects
  • Per-project customization for pre-built extractors
  • Support for multiple languages and English parts of speech
  • Complete support for scalar functions when creating columns
  • Ability to publish extractors into BigSheets functions
  • Support for documents with no file extension

BigInsights Enterprise Module updates

Enhancements include integration with Apache Ambari, and HDFS transparency for applications that use IBM Spectrum Scale capabilities.

To get started, download the 4.1 release of the IBM Open Platform with Apache Hadoop.

You can read about the download and installation process in the Knowledge Center.

As always, we are happy to hear your feedback. Please send your comments and suggestions to the user group or through our community forums.
