Product Readmes
Abstract
Learn what new functionality has been added with rollup patches or fix packs since the release of InfoSphere Information Server, Version 11.7.1.
This document covers the following releases, starting with the most recent:
- Fix Pack 1
- Service Pack 2
- Service Pack 1
- DataStage Flow Designer patch July 2019
- 11.7.1
Content
General
New in 11.7.1 Fix Pack 1
- The Information Server Web (Administration) Console was replaced with a new implementation that contains the following functionality:
- User and group management
- Domain management
- Session management
- Limited reporting functionality through the user interface
- Schedule monitoring functionality through the API
- Microservices tier:
- Authentication and secure connections are now enabled by default for Kafka and Solr services that are running on the microservices tier. Also, write access to the Zookeeper service requires authentication and proper ACLs are enforced on Kafka and Solr znodes.
- SELinux in enforcing mode is now supported.
- The microservices tier installer no longer creates Kubernetes resource definition files on the disk. Any specific modifications to the Kubernetes cluster state should be made with kubectl CLI commands, such as kubectl edit (a short sketch follows this list).
- IBM InfoSphere Information Server, Version 11.7.1, Fix Pack 1 does not support upgrade if Watson Knowledge Catalog is installed. For more information, see Remove Watson Knowledge Catalog from an Information Server 11.7.1.0 installation.
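As an example of the kubectl workflow referred to above, a minimal sketch of inspecting and editing cluster state. The deployment name and namespace are placeholders; substitute the names used by your microservices tier:

  # List the deployments in the microservices tier namespace
  kubectl get deployments -n <namespace>

  # Open a resource definition in your editor; kubectl applies the
  # change to the cluster when you save and exit
  kubectl edit deployment <deployment-name> -n <namespace>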
Watson Knowledge Catalog
New in 11.7.1
- Watson Knowledge Catalog is an extension to Information Governance Catalog that provides self-service access to data assets for knowledge workers who need to use those data assets to gain insights.
Information Governance Catalog New
New in 11.7.1 Fix Pack 1
- When you run a quick scan, the discovery jobs and the results are now available on the new 'Quick scan results' tab. When you review the results, you can edit term assignments for the discovered assets. Quick scan also supports external Hadoop through the JDBC connector, and a new data source for the JDBC connector: Postgres.
- Virtual column management is supported. Virtual columns combine data from multiple columns and are used in analysis.
- Managing rule set definitions and rule sets is supported. They are used to capture how a record within a data source conforms to multiple data rules. Each time you run a rule set, the run record is saved and you can review the run history for each rule set.
- You can import and export data rules, rule sets, data rule definitions, and rule set definitions with a file in the XML format.
- You can set the validity benchmark for data rules and rule sets. It is used to identify a minimum level of acceptability for the data or a tolerance for some level of exceptions.
- You can detect biases in your data when you run a data quality analysis. Bias occurs when one column influences another column in a way it shouldn't. Note: This is a technology preview and is not yet supported for use in production environments.
- Managing data rules is supported. You can create, update, and delete the rules. You can also run them and work with the results and the run history.
- When you bind data to quality and data rules, you can use global variables and literals.
- You can configure workspace settings globally for all workspaces and individually for each workspace.
- You can edit data set details and analysis results.
- When you run column analysis, you can select individual columns to analyze.
- After you review analysis results for a data set, you can mark it as reviewed to let other users know.
- You can run new types of analysis: relationship and overlap analysis. The results are displayed in the entity relationship diagram (ERD).
- When you search for data sets to add to a workspace, you can display available catalog data sets in a hierarchical format.
- Quick scan is officially supported. You can run it with new data set types: Hive, Oracle, and SQL Server, with the JDBC connector. Additionally, when you assign terms, you can choose to enable a machine learning model to get more accurate results.
- While you browse the list of available data connections, you can check what assets are included in the file-based connections. You can also preview the content of the files or add them to workspaces.
- When you search for assets in the catalog, you can download the results to a CSV file.
- Creating queries is available in Catalog > Queries, where you are redirected to Information Governance Catalog classic.
- A new Quality tab is added with data quality-related features:
- Managing data rule definitions is possible in the Information Governance Catalog New UI. Data rule definitions are used to develop rule logic to analyze data. They consist of a condition and an action, and can be bound to physical data in quality and data rules. You can create, edit, delete, copy, and publish data rule definitions in Information Governance Catalog New.
- You can organize data rule definitions in folders.
- Creating quality rules is supported.
- Workspace and data set views were added. You can add data sets to workspaces and create new workspaces. In each data set, you can review the detailed status of the quality of your data and download this information in a CSV file.
- You can run column, data quality, and primary key analysis in a data set view.
- You can create SQL virtual tables to join data from various tables with the use of an SQL statement.
- For each quality dimension, you can download a CSV file with the assets that violate the dimension.
New in 11.7.1
- When you search for assets and enter a keyword, the results are now sorted by best match. The searched keyword is highlighted in the drop-down list of results.
- Integration of Information Governance Catalog with Watson Knowledge Catalog is supported. For details, see the offering plans.
- A new Quality tab is added with data quality-related features (Technical Preview only; do not use without SP1)
- Automation rules can be automatically created by the system. Based on common patterns in existing automation rules, new rules are suggested. After you review them, you can accept them and use them like the automation rules that you created yourself.
- When you create a customization profile, you can now select tabs like Queries, Monitoring, and so on, and also asset actions to be available to selected users.
- You can import glossary assets when workflow is enabled. You can choose to either ignore workflow and make the changes public immediately after the import, or apply the workflow process to each change to a catalog.
- When workflow is enabled, you can view asset changes history in the Activities panel. The panel also supports adding comments.
- After you run a discovery, you can discover selected assets again and choose to analyze all metadata or only metadata that changed since the last analysis.
- When you configure a discovery, apart from schemas, you can also choose database tables as the discovery root.
- Usability was enhanced: you can now search for assets in one place, on the Catalog tab; it is no longer possible to search for assets from the home page. The relationship graph is also easier to access: it is available on the asset details page and in the search results list.
- In the asset details page, you can display data in a tabular format.
- You can run a quick scan to quickly analyze data in large data sets. Quick scan is much faster than the traditional discovery, because metadata is not imported. The first 1000 rows of each data set are analyzed. Note: This is a technology preview and is not yet supported for use in production environments.
Information Governance Catalog
New in 11.7.1 Service Pack 1
- When assets are synchronized automatically or manually by using istool graphbatchload, you can configure asset flow relationships to be synchronized, as sketched below. For more information, see the technote. Note: This is a technology preview and is not yet supported for use in production environments.
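A minimal sketch of a manual synchronization run. Only the command name istool graphbatchload comes from this readme; the connection parameters follow common istool conventions and are assumptions, so check the technote for the actual syntax:

  # Manually trigger a graph batch load on the services tier
  # (parameter names are assumptions based on istool conventions)
  istool graphbatchload -dom services-host:9446 -u isadmin -p <password>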
New in 11.7.1
- In the Advanced Search Options panel, you can define the default list of asset properties to search for.
- The database alias is included in lineage. The details page also contains information about the source database.
- REST API enhancements:
- When you export lineage to a CSV file by using the REST API, you can include information about truncated nodes (see the sketch after this list).
- You can export an XML file with flows between bundle assets and migrate them to another instance of Information Governance Catalog.
- Error handling was enhanced for various operations to better explain what caused issues.
- New asset type and properties are supported: data_file_folder_nobucket, defined_on_database_column, and referenced_by_columns.
- New request parameters are supported: referencePageSize and propertyPathDelimiter.
- When assets are synchronized by using OMRS, their details page contains information about the cohort from which they were synchronized.
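A hedged sketch of the kind of REST call involved in the CSV lineage export mentioned above. The igc-rest/v1 base path is the catalog's REST root; the endpoint segment and the query parameter are hypothetical placeholders, not documented names:

  # Hypothetical example: <lineage-export-endpoint> and the query
  # parameter are placeholders, not documented API names
  curl -k -u isadmin:<password> \
    "https://services-host:9446/ibm/iis/igc-rest/v1/<lineage-export-endpoint>?includeTruncatedNodes=true" \
    -o lineage.csv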
Information Server Enterprise Search
New in 11.7.1
- The search performance is improved.
- When you filter asset types in the relationship graph, only the types that can be displayed in the graph are available on the list.
InfoSphere Information Analyzer
New in 11.7.1 Fix Pack 1
- New, more intuitive UI navigation and many other general UI usability improvements.
- Enhanced business term assignments allow a view of both published and unpublished term assignments, candidates, and rejected terms.
- Visualization of trends in data quality for each data asset for a given time interval.
- Faster relationship analysis and overlap analysis with new settings for including or excluding columns by name, column type, or first N columns.
- Improved relationship analysis screen with a more intuitive way to Run analysis and Customize display of results.
- Data Rule binding enhanced to support drag and drop, which significantly speeds the process.
- Google BigQuery is certified with the native connector.
- Azure Data Lake Storage (ADLS Gen2) is certified with the ADLS Gen2 connector. ORC, Avro, and Parquet files, and files with the extensions .csv and .txt, are certified.
- Teradata is certified with the JDBC connector.
New in 11.7.1 Service Pack 2
- The IAAdmin -runTasks command, which is used to run analysis tasks from the command line, supports the new options -runSynchronously, -pollInterval, and -limitLog. They are used to track and manage the job run status (see the sketch after this list).
- When IBM Information Server Enterprise Search is installed, the Information Analyzer thin client is deprecated. All features are now available in Information Governance Catalog New, in the Quality tab.
- Cassandra is supported with JDBC connector.
- DynamoDB is supported with both JDBC and ODBC connectors.
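A sketch of a synchronous run that combines the new options with common IAAdmin connection parameters (-user, -password, -host, -port); the task content file and the option values are example placeholders:

  # Run analysis tasks and block until they complete, polling every
  # 30 seconds and limiting the amount of log output retrieved
  IAAdmin -user isadmin -password <password> -host services-host -port 9446 \
    -runTasks -content runTasks.xml \
    -runSynchronously -pollInterval 30 -limitLog 100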
New in 11.7.1 Service Pack 1
- New IAAdmin options -setKeyColumnFilter, -getKeyColumnFilter, and -unsetKeyColumnFilter are supported. Use these options to filter out columns from being analyzed during primary key analysis (including compound keys), relationship analysis, and overlap analysis. You can apply the filter globally for all projects or for selected projects only. A sketch follows.
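A sketch of the new options; the -content argument and the filter pattern are assumptions to show the shape of the calls, so verify the exact syntax in the documentation:

  # Set a global filter that excludes matching columns from primary key,
  # relationship, and overlap analysis (pattern syntax is an assumption)
  IAAdmin -user isadmin -password <password> -host services-host -port 9446 \
    -setKeyColumnFilter -content "AUDIT_*"

  # Review the current filter, or remove it
  IAAdmin -user isadmin -password <password> -host services-host -port 9446 -getKeyColumnFilter
  IAAdmin -user isadmin -password <password> -host services-host -port 9446 -unsetKeyColumnFilter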
New in 11.7.1
- When you use the 'matches data class' check in data rule definition logic, you can refer to both predefined and custom data classes.
- When configuring automatic term assignment, you can use a new algorithm parameter for the linguistic name matching service. This parameter defines a list of settings that are used by the service. You can choose to ignore vowels in terms during matching and control the usage of the existing replacements and WordsToIgnore parameters. Additionally, the machine learning service supports more languages, for example Chinese; however, the pre-processing is limited.
- The performance of analyzing data was improved.
InfoSphere ISALite
New in 11.7.1 Fix Pack 1
Several fixes and new diagnostic tests have been added to the General Health Checker (HC), Collector, and Prerequisite Checker:
- HC: The Linux test now reports ulimit values found in /etc/systemd/system.conf
- HC: Fixed an issue in the AIX "Verify processes and ports used" test that reported "Unable to find the PID"
- HC: New ISD tests report the runtime properties of the deployed ISD services
- HC: The new test that reports ISD connections now provides a WARNING if no connections are found
- HC: Fixed an issue where invocation of istool.sh reported an OutOfMemory error but the test still passed
- HC: Moved the Agent Handlers check to its own test for clarity
- HC: Upgraded the health checker to use JDBC 4.0 when it is found installed
- HC: The test for kernel limits now shows a description of the kernel parameters
- HC: New test reports all ports used by processes, with their descriptions
- HC: Fixed an issue where the tests that validate ZooKeeper, Kafka, and Solr were skipped if the ZooKeeper server was remote
- HC: Added a link to the new configuration documentation in the Kafka tests
- PX Runtime: Added an HC test to verify the limits and environment of the PXRuntime process
- PX Runtime: Added an HC test to report on jobs stored, run, or compiled in the new PX stack
- PX Runtime: Added an HC test to verify and report on projects in the new PXEngine stack
- PX Runtime: Enhanced the Collector to collect the new PXEngine stack logs
- Collector: Removed collection of unneeded .xml files from InformationServer\logs
- PreReq: Upgraded the Prerequisite Checker to 11.7.1.1 specs
New in 11.7.1 Service Pack 1
- Added DataStage test cases in the ISALite General Health Checker that test and report the environment and system limits for the currently running dsrpcd (DS Engine) process. The tests highlight invalid configuration parameters, such as the Max Open Files and Max Processes that are allowed for the dsrpcd process, and report warnings if configuration parameters are set too low.
- Added a test case in the ISALite General Health Checker that verifies the file system Inodes usage for each mounted drive. This test provides warnings when file system resources are getting low.
- The XMeta Diagnostic test has been enhanced to include a FASTPASS option as an alternative to the existing FULL test option, which often runs for a long time. The FASTPASS option quickly analyzes subsets of the database repository and reports broken references and database corruptions. It provides options to automatically fix the repository discrepancies or to create an SQL script that you can run to fix the repository at a later time.
- The ISALite runUGDiagnostic test has been enhanced to report additional diagnostic information in the UG diagnostic report.
- The ISALite Information Server collector now includes the file listing of additional folders and subfolders of the IBM Information Server installation.
IBM DataStage Flow Designer
New in 11.7.1 Fix Pack 1
- You can use a new job generation feature that automates and simplifies data movement, taking data from a source X and moving it to a target Y. Rather than having to craft a job by using the DataStage Edition canvas and palette, dragging connectors and stages to build it, you can use the new job template to follow a simple series of steps and generate a parallel job. You can then use this parallel job to move the data. Also, if you are an administrator, you can define target rule sets that capture best practices, and have users apply those rule sets to their job templates and the jobs generated from them.
- You can define and edit column metadata. For example, for a column you can specify what delimiter character should separate text strings or you can set null field values. With the ability to edit column metadata you get more fine-tuned processing of your jobs and more useful data transformation.
- You can create a container that has its own job flow and share and use it in other jobs.
- New stages:
- Complex flat file
- Slowly changing dimension
- New connectors:
- You can use the z/OS file connector.
- You can use the SAP OData connector.
- You can use the following steps in the Hierarchical stage:
- JSON composer step
- JSON parser step
- REST step
- Test assembly
- Details inspector
- Create/view contract libraries
- Administration
- Transformer stage support for tabs such as build, surrogate key, and triggers.
New in 11.7.1 Service Pack 2
- New connectors and operators are supported:
- Classic Federation
- Distributed Transactions
- Informix Load
- ISD Input
- ISD Output
- Pivot
- Sybase 12 Load
- Sybase IQ
- Multi-cloud data integration
- Local containers
New in patch_July2019_DFD_all_11710
- New connectors and stages are supported:
- Azure Storage
- Surrogate Key Generator
- Java Integration Stage
- Modify
- Checksum
- Cassandra
- Cloud Object Storage
- Google Cloud Storage
- WriteRangeMap
- WaveGenerator
- Bloom Filter
- Lookup File Set
- Informix Enterprise
- Sybase Enterprise
- Combine Records
- Make Subrecord
- Promote Subrecord
- Split Subrecord
- Make Vector
- Split Vector
- External Filter
- Generic Stage
- DRS Connector
- BigQuery Connector
- Added Git support for Spark jobs. The supported Git repositories are GitHub, Bitbucket, Microsoft Team Foundation Server, and GitLab.
- Added Git support for GitLab and Microsoft Team Foundation Server.
- Added REST API support for importing and publishing assets.
- You can import assets in the UI.
- Temporary files that are generated by stages such as sort and lookup on HDFS no longer use the default replication factor of the Hadoop cluster. The files have their replication factor set to 1, because they are temporary and do not need to be replicated. This reduces network and disk I/O in the cluster for these types of files and provides up to a 5% performance improvement in jobs that heavily use them (a verification sketch follows this list).
- New documentation about setting up a non-root administrator is available.
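To confirm the replication behavior described above on your own cluster, hdfs dfs -stat with the %r format prints a file's replication factor; the path is a placeholder for wherever your jobs write scratch files:

  # Should print 1 for the temporary files described above
  hdfs dfs -stat %r /tmp/<datastage-scratch-file>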
New in 11.7.1
- New stages are supported: Column Import, Column Export, Column Generator, and External Target.
- New connectors are supported: HBASE, Salesforce, and Snowflake.
- Reject is supported in the lookup and relational connectors: Db2, JDBC, ODBC, Oracle, Teradata, and SQL Server.
- You can create, update, and delete table definitions.
- Scheduling jobs is supported. You can also display the schedule information in the jobs dashboard.
- You can compare jobs from Xmeta and GitHub.
- Exporting jobs from IBM DataStage Flow Designer to a file system is supported.
- On the job canvas, the run status is refreshed automatically during job execution.
- When you view the details card for a connector, you can choose job parameters from a list.
- New operational APIs are supported for compiling and running jobs, and for retrieving the status of a job.
- The REST API supports loading jobs from GitHub and importing them into InfoSphere Information Server.
IBM InfoSphere DataStage on Spark
New in 11.7.1
- Cloud Object Storage connector is supported.
- String Parameters in Spark jobs are supported.
IBM InfoSphere DataStage and BigIntegrate Dockers
New in 11.7.1
- IBM DataStage Dockers can be deployed on Google Cloud and OpenShift.
- Upgrading InfoSphere DataStage Dockers is supported on an Amazon AWS cluster and a Microsoft Azure cluster.
- You can configure the number of compute pods of IBM DataStage Dockers to scale up or scale down by using IBM DataStage Flow Designer.
- The IBM DataStage Docker base components were upgraded to the following versions:
- Ansible 2.7.8
- Kubernetes 1.13.2
- Docker 18.06.1
- Calico 3.3.1
- Docker Registry 2.7.1
Connectivity
New in 11.7.1 Fix Pack 1
- Amazon S3 Connector
- Support for additional configuration for Parquet, ORC and Avro file formats
- Azure Connector
- Support for additional configuration for Parquet, ORC and Avro file formats
- Cassandra Connector
- Support for Azure Cosmos DB
- Db2 Connector
- Db2 v11.5 support
- Support for external tables in reading and writing modes
- Support for direct insert when using external-table-based write modes
- Support for an environment variable to control the timeout to start the INSERT SQL statement when using external tables
- Support for a new environment variable to alter the timeout (value in seconds)
- Support for local transaction mode (non-XA) for the Distributed Transaction stage
- Improved temporary work table creation with indexes and distribution keys
- File Connector:
- Support for Numeric and Decimal data types in the Parquet file format
- Support for Connection time High Availability
- Support for writing into an existing partitioned table
- Support for reading and writing data, metadata import of files with Parquet 1.10
- FTP Enterprise Stage
- Support for OpenSFTP on the Windows platform
- Support for a customizable "tmp" directory
- Hive Connector
- Write mode property can now be set to Insert, Update, or Delete
- JDBC Connector certifications
- Support for BigQuery (certified with Progress Datadirect driver)
- Support for Impala with Kudu with Cloudera JDBC/ODBC driver
- Kafka connector:
- Default values for Kafka Client Classpath are now changed
- Support for a new environment variable (CC_KAFKA_LOG4J_LOG_LEVEL) to control the logging level from inside Kafka (see the sketch after this list)
- Netezza Connector
- Support for new environment variable to control intermediate statistics execution
- IBM Performance Server v11 support
- Oracle Connector
- Support for Oracle Autonomous Data Warehouse Cloud (ADWC)
- Support for Oracle 19c
- Salesforce Connector
- Support for API 47.0
- Snowflake Connector
- Support for update, delete, and upsert write modes when using Merge SQL
- Support for additional configuration for external locations (Azure, Google Cloud Storage)
- Teradata Connector
- New property to enable Unicode Passthrough
- XML Connector (Hierarchical data)
- Added support to handle choice discriminators
- Added support to handle references to abstract types in the recurring XSD Structures
- Added support to handle Large XSDs
- New connectors are supported: Redshift and ADLS (Azure Data Lake Storage)
- The following new Hadoop distributions are supported:
- CDH 6.1, 6.2, 6.3
- CDH 5.16.2
- HDP 3.1.4, 3.1.5
- CDP (Unity) 7.0.3
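As an example of the Kafka connector logging variable mentioned in the list above, a minimal sketch, assuming the variable is exported in the environment that the job runs in (for example, in dsenv) and that standard log4j level names apply:

  # Raise the Kafka client library log level for the Kafka connector
  export CC_KAFKA_LOG4J_LOG_LEVEL=DEBUG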
New in 11.7.1
- Google Cloud Storage connector is supported.
- SSL is supported for Cassandra connector.
- Hive connector enhancements:
- Supports SparkSQL.
- Supports Zookeeper.
- Certified connectivity with Impala using external JDBC Simba driver.
- PrestoDB connectivity is certified via JDBC connector using external Simba JDBC driver.
- Data masking stage supports the latest ODPP 11.3 FP7 libraries.
- Netezza connector supports Operational Metadata.
- Db2 connector supports Db2 version 7.3 on iSeries.
- Kafka connector supports Kafka 2.0.
- The following Hadoop distributions are supported:
- CDH 5.16
- CDH 6.0.1
- HDP 3.0.1
- HDP 3.1
- MapR 6.1
- Google BigQuery connector is supported. It supports metadata import.
- Cloud Object Storage connector supports metadata import and lineage.
- Azure Storage connector supports SAS tokens.
- JDBC and ODBC connectors are certified on AWS Aurora Postgres and Oracle 18c.
- Salesforce connector supports API 44 and the Salesforce Bulk API (no PK chunking).
- Greenplum 5.14 is supported.
- File connector supports the following features:
- Write operation into an existing Hive table in File connector.
- Added partitions to existing partitioned table for Parquet format.
- Inference of the delimited format files.
- The following connectors support the WHERE and other clauses: ODBC, Greenplum, Oracle, Teradata, Db2, and Netezza.
- Incremental import is supported for Db2 for z/OS.
Managing metadata
- Import from tools by using OPENIGC and MITI bridges.
- Support for correct names for Db2 and Oracle in the JDBC connector.
New in 11.7.1 Service Pack 1
Updated support for Meta Integration Technologies (MITI) bridges to version 10.0.1.
InfoSphere Information Server on Hadoop
New in 11.7.1
- You can use HDFS as a scratch disk. It is especially useful when you have limited local space.
- Cloudera Manager generates and manages Kerberos for the BigIntegrate service in the BigIntegrate Docker installation path.
- You can set a new configuration variable, APT_YARN_MAX_AM_POOL_SIZE, to define the maximum number of Application Masters in the pool (see the sketch after this list).
- You can set a new configuration variable, APT_YARN_PX_BINARY_VISIBILITY, to define the YARN resource localization mode that is used for the parallel engine (PX) binaries.
- The Hive connector path is added to the CLASSPATH by default when you install IBM BigIntegrate.
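A minimal sketch of setting the two YARN variables described above, for example in dsenv; both values are illustrative assumptions (YARN localization visibility is typically PUBLIC, PRIVATE, or APPLICATION):

  # Cap the number of YARN Application Masters kept in the pool
  export APT_YARN_MAX_AM_POOL_SIZE=4

  # Set the YARN resource localization mode for the PX binaries
  export APT_YARN_PX_BINARY_VISIBILITY=PUBLIC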
Deprecated features
Features deprecated in 11.7.1 Fix Pack 1
In an effort to improve our offerings for the future, we are deprecating the following features:
- DataStage Server Canvas
- Balanced Optimization
- FastTrack
- Information Server Manager
We will continue to support these features, but will not accept enhancement requests. In addition, some of these features might be removed in a future release.
The capabilities that these components provide remain important: advanced capabilities for job runtime optimization, design acceleration, and advanced CI/CD are all strategic aspects of IBM Cloud Pak for Data DataStage.
Features deprecated in 11.7.1 Service Pack 2
- When IBM Information Server Enterprise Search is installed, the Information Analyzer thin client is deprecated. All features are now available in Information Governance Catalog New, in the Quality tab.
Features deprecated in 11.7.1
- Data masking rules are deprecated in Information Governance New. Instead, use data protection rules in Watson Knowledge Catalog.
Document Information
Modified date:
17 June 2020
UID
ibm10878853