Security in IBM® Watson Explorer Content Analytics

You can configure multiple layers of security in Watson Explorer Content Analytics to protect sources from unauthorized searching and restrict administrative functions to specific users.

Users can query a wide range of data sources. To ensure that only authorized users can query content and that only authorized users can access the Watson Explorer Content Analytics administration console, the system coordinates and enforces security at several levels.

Web application server
The first level of security is the web application server: either the embedded web application server or the global security settings of WebSphere® Application Server. You can configure the system to use an LDAP registry so that only registered users can log in to applications or the administration console. You can also configure the system to use an LTPA key file to provide single sign-on (SSO) authentication for application users.

The procedures for setting up security controls differ depending on whether your applications run on the embedded web application server or on WebSphere Application Server. If you use the embedded web application server, you can use the administration console to configure SSO authentication and communication through secure transport protocols.

System-level security
At the system level, you can assign users to administrative roles and authenticate users who administer the system. When a user logs in to the administration console, only the functions and collections that the user is authorized to administer are available to that user. You can also assign privileges to users and groups to control application functions. For example, you can limit the ability to export documents from an application to specific users.
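The role and privilege model described above can be pictured with the following minimal Python sketch. It is not the product's API or data model: the names AdminRole, console_view, and has_privilege, and the idea of a role as a set of console functions over a set of collections, are hypothetical and chosen only to show how the console could expose just the functions and collections a user is authorized to administer, and how a privilege such as exporting documents could be granted to a user or to a group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminRole:
    """Hypothetical role model: a role grants console functions over named collections."""
    name: str
    functions: frozenset[str]
    collections: frozenset[str]


def console_view(roles: list[AdminRole]) -> dict[str, set[str]]:
    """Return, per collection, the console functions that the user's roles make available."""
    view: dict[str, set[str]] = {}
    for role in roles:
        for collection in role.collections:
            view.setdefault(collection, set()).update(role.functions)
    return view


def has_privilege(user: str, groups: set[str], grants: dict[str, set[str]], privilege: str) -> bool:
    """True if the user, or any group the user belongs to, holds the privilege.

    `grants` maps a principal name (user or group) to its set of privileges.
    """
    principals = {user} | groups
    return any(privilege in grants.get(p, set()) for p in principals)


if __name__ == "__main__":
    roles = [
        AdminRole("collection_admin", frozenset({"crawl", "index"}), frozenset({"hr_docs"})),
        AdminRole("monitor", frozenset({"view_status"}), frozenset({"hr_docs", "kb"})),
    ]
    # hr_docs gets crawl, index, and view_status; kb gets only view_status.
    print(console_view(roles))
    grants = {"analysts": {"export"}}  # only the "analysts" group may export
    print(has_privilege("alice", {"analysts"}, grants, "export"))  # True
    print(has_privilege("bob", set(), grants, "export"))  # False
```

Granting privileges to groups rather than to individual users keeps the grant table small; a user inherits a privilege through any group that holds it.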

You can also configure credentials that enable crawlers to access the data sources that you include in collections. Other system components also need these credentials. For example, to verify that users are authorized to see documents in the search results, the search servers can use the credentials to connect to a data source and check the current access control lists.

Collection-level security
When you create a collection, you can enable security at the collection level. You cannot change this setting after the collection is created. If you do not enable collection-level security, you cannot later specify document-level security controls.
When collection-level security is enabled:
  • The global analysis processes apply different rules for indexing duplicate documents.
  • You can configure options to enforce document-level security.
  • You can enforce security by mapping applications (not individual users) to the collections that they can access. You then use standard access control mechanisms to permit or deny users access to applications.
  • You can configure the system to use the identity management component, which enables application users to be authenticated without configuring an application profile.
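Mapping applications, rather than individual users, to collections means that access is resolved in two steps: standard access control decides whether a user may use an application, and the application's mapping decides which collections it can search. The following Python sketch illustrates that flow under stated assumptions; the dictionaries APP_COLLECTIONS and APP_ACCESS, the "user:" and "group:" principal prefixes, and the function names are hypothetical and do not reflect the product's configuration format.

```python
# Hypothetical configuration: which collections each application may search ...
APP_COLLECTIONS: dict[str, set[str]] = {
    "hr_portal": {"hr_docs"},
    "support_app": {"kb", "tickets"},
}
# ... and which users or groups may use each application.
APP_ACCESS: dict[str, set[str]] = {
    "hr_portal": {"group:hr"},
    "support_app": {"group:support", "user:alice"},
}


def can_use_app(principals: set[str], app: str) -> bool:
    """Standard access control: is any of the user's principals allowed to use the app?"""
    return bool(principals & APP_ACCESS.get(app, set()))


def searchable_collections(principals: set[str], app: str) -> set[str]:
    """Collections reachable through `app`; empty if the user may not use the app.

    Users are never mapped to collections directly: access flows
    user -> application -> collections.
    """
    if not can_use_app(principals, app):
        return set()
    return set(APP_COLLECTIONS.get(app, set()))


if __name__ == "__main__":
    print(searchable_collections({"user:alice"}, "support_app"))  # kb and tickets
    print(searchable_collections({"user:alice"}, "hr_portal"))  # set()
```

Because collections are attached to applications, changing who can search a collection only requires changing who can use the application.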

Enabling collection-level security involves a trade-off with search quality. Less information is indexed for each document, so some queries return fewer results.

Document-level security
When you configure crawlers for a collection, you can enable document-level security. For example, you can specify options to associate security tokens with data as the data is collected by crawlers. Your applications can use these tokens, which are stored with documents in the index, to pre-filter the results and ensure that only users with the correct credentials are able to query the data and view documents.
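Pre-filtering with security tokens can be sketched as follows. This is a minimal Python illustration, not the product's token format or query API: the IndexedDoc class, the prefilter function, and the token strings are hypothetical, and the sketch assumes an allow-only model in which sharing any one token with a document grants access. Real access control lists can also carry deny entries, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    # Security tokens captured by the crawler at crawl time and stored in the index.
    acl_tokens: frozenset[str]


def prefilter(hits: list[IndexedDoc], user_tokens: set[str]) -> list[IndexedDoc]:
    """Keep only hits whose stored tokens overlap the user's tokens.

    Assumes an allow-only model: sharing any one token grants access.
    """
    return [doc for doc in hits if doc.acl_tokens & user_tokens]


if __name__ == "__main__":
    hits = [
        IndexedDoc("d1", frozenset({"group:hr"})),
        IndexedDoc("d2", frozenset({"group:finance"})),
        IndexedDoc("d3", frozenset({"user:bob", "group:hr"})),
    ]
    print([d.doc_id for d in prefilter(hits, {"user:alice", "group:hr"})])  # ['d1', 'd3']
```

Because the filter uses only tokens already stored in the index, it is fast, but it reflects access controls as they were when the document was crawled.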

For certain types of data sources, you can configure options to validate a user's login credentials against current access controls during query processing. This extra layer of post-filtering ensures that a user's privileges are validated in real time with the data source. It protects against cases in which a user's credentials change after a document and its security tokens are indexed.
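Post-filtering re-checks each candidate result against the data source at query time. The following minimal Python sketch shows the idea; the AccessCheck callback, postfilter, and make_live_acl_check are hypothetical names, and the in-memory ACL snapshot stands in for what would, in a real deployment, be a call to the repository's own access-control interface.

```python
from typing import Callable, Iterable

# A callback that asks the data source, at query time, whether `user` may read `doc_id`.
# Hypothetical stand-in for the repository's real access-control check.
AccessCheck = Callable[[str, str], bool]


def postfilter(candidates: Iterable[str], user: str, check_access: AccessCheck) -> list[str]:
    """Re-validate each pre-filtered hit against the data source's current ACLs."""
    return [doc_id for doc_id in candidates if check_access(user, doc_id)]


def make_live_acl_check(live_acls: dict[str, set[str]]) -> AccessCheck:
    """Build a check backed by an in-memory snapshot of the source's current ACLs."""
    def check(user: str, doc_id: str) -> bool:
        return user in live_acls.get(doc_id, set())
    return check


if __name__ == "__main__":
    # The index still says bob may read d1 and d2, but bob was removed
    # from d2's access list after it was crawled.
    live = {"d1": {"bob"}, "d2": {"alice"}}
    print(postfilter(["d1", "d2"], "bob", make_live_acl_check(live)))  # ['d1']
```

Each candidate costs a round trip to the data source, which is why this check typically runs only on results that have already passed pre-filtering.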

The anchor text processing phase of global analysis normally associates text that appears in one document (the source document) with another document (the target document) in which that text does not necessarily appear. When you configure a Web crawler, you can specify whether you want to exclude the anchor text from the index if the link connects to a document that the Web crawler is not allowed to crawl.
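The anchor text option can be illustrated with a short Python sketch. The Link class, the anchor_text_by_target function, and the exclude_uncrawlable flag are hypothetical stand-ins for the Web crawler setting; the sketch only shows how anchor text found in a source document is attached to its target, and how that text is dropped when the crawler was not allowed to crawl the target.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str       # document that contains the link
    target: str       # document the link points to
    anchor_text: str  # visible text of the link, found in the source document


def anchor_text_by_target(
    links: list[Link], crawlable: set[str], exclude_uncrawlable: bool
) -> dict[str, list[str]]:
    """Attach each link's anchor text to its target document.

    If `exclude_uncrawlable` is set, anchor text for links whose target the
    crawler was not allowed to crawl is dropped instead of indexed.
    """
    by_target: dict[str, list[str]] = defaultdict(list)
    for link in links:
        if exclude_uncrawlable and link.target not in crawlable:
            continue
        by_target[link.target].append(link.anchor_text)
    return dict(by_target)


if __name__ == "__main__":
    links = [
        Link("a.html", "b.html", "quarterly report"),
        Link("a.html", "restricted.html", "internal plan"),
    ]
    print(anchor_text_by_target(links, {"a.html", "b.html"}, exclude_uncrawlable=True))
    # {'b.html': ['quarterly report']}
```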

Encryption
To protect sensitive data, encryption is used to encode the authentication data portion of all messages that are transmitted through the system. The password for the default Watson Explorer Content Analytics administrator is stored in an encrypted format. Passwords that users specify in user profiles and passwords that are stored by the system (in configuration files, the internal databases, and so on) are also encrypted. Encryption incurs little overhead because only the authentication IDs and passwords are encrypted.

Security for your collections extends beyond the authentication and access control mechanisms that the system can use to protect indexed content. Safeguards also exist to prevent a malicious or unauthorized user from gaining access to data while it is in transit. For example, the search servers use protocols such as Transport Layer Security (TLS), Secure Sockets Layer (SSL), Secure Shell (SSH), and Hypertext Transfer Protocol Secure (HTTPS) to communicate with the master server and your applications.

For increased security, ensure that the server hardware is appropriately isolated and protected from unauthorized intrusion. A firewall can protect the servers from intrusion through other parts of your network. Also, ensure that no unnecessary ports are open on the servers: configure the system to listen for requests only on ports that are explicitly assigned to Watson Explorer Content Analytics activities and applications.