Interaction with Standard Solr

This page lists differences between standalone Solr and the Solr-as-a-Service functionality provided by the Retrieve and Rank service. Not all standard Solr operations and options are supported by the Retrieve and Rank service.

Sizing your Retrieve and Rank cluster

Important:

  • The units and measurements discussed in this section apply only to the Retrieve (Solr) component of the Retrieve and Rank service. They do not apply to rankers. Rankers have different requirements, which are listed in Preparing training data. Similarly, Retrieve and Rank units refer only to instances of the IBM Watson Retrieve and Rank service; currently, no other Watson Developer Cloud services use the unit/cluster sizing pattern.
  • You might need to experiment with different cluster sizes before determining the optimal size for your solution. As a general rule, your indexed documents can use approximately half the amount of disk space available to the cluster. For example, if you have a one-unit cluster with 32 GB of storage, your search index can be approximately 16 GB in size. Larger indexes require larger clusters.

A Retrieve and Rank cluster consists of one to seven units on Bluemix. In Retrieve and Rank terms, a unit is an allocation of RAM and storage as follows:

  • 4 GB of RAM
  • 32 GB of disk storage

The term cluster size in Retrieve and Rank refers to the number of units in a specific Retrieve and Rank cluster. The cluster size, in units, determines the amount of resources available to the Retrieve (Solr) component.

Creating a Retrieve and Rank cluster larger than the free cluster requires you to sign up for a paid service plan. For information on pricing, see the Retrieve and Rank Pick a plan page on Bluemix.

Note:

  • The maximum number of units in a cluster is currently 7.
  • When adding documents to your collection, you can use approximately half of the cluster's storage capacity. For example, a cluster with the maximum of 7 units has a storage capacity of approximately 224 GB (7 × 32 GB) and can handle an index size of approximately 112 GB.
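
The sizing rules above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, the per-unit figures come from the unit definition in this section, and the total-RAM figure assumes that RAM scales linearly with the number of units, which this page does not state explicitly.

```python
import math

RAM_GB_PER_UNIT = 4      # per-unit RAM, from the unit definition above
DISK_GB_PER_UNIT = 32    # per-unit disk storage, from the unit definition above
MAX_UNITS = 7            # current maximum cluster size
USABLE_FRACTION = 0.5    # rule of thumb: the index can use about half the disk


def cluster_capacity(units):
    """Return (ram_gb, disk_gb, approx_max_index_gb) for a paid cluster."""
    if not 1 <= units <= MAX_UNITS:
        raise ValueError(f"cluster size must be between 1 and {MAX_UNITS} units")
    disk_gb = units * DISK_GB_PER_UNIT
    # Assumption: RAM is additive across units.
    return units * RAM_GB_PER_UNIT, disk_gb, disk_gb * USABLE_FRACTION


def units_for_index(index_gb):
    """Estimate the smallest cluster size whose usable disk fits index_gb."""
    usable_per_unit = DISK_GB_PER_UNIT * USABLE_FRACTION
    units = max(1, math.ceil(index_gb / usable_per_unit))
    if units > MAX_UNITS:
        raise ValueError("index is too large for the current 7-unit maximum")
    return units
```

Because the "half the disk" figure is a general rule rather than a guarantee, treat the result as a starting point and confirm it against real usage statistics.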

Warning: The free cluster that you can create to test the Retrieve and Rank demo application is a single reduced-size unit with a maximum of 50 MB of disk storage and no guaranteed amount of RAM. The free cluster is meant only to run the demonstration application or small proof-of-concept applications. It cannot be used as a unit in a paid Retrieve and Rank cluster, and it is not intended for production use.

Getting usage statistics for your Solr cluster

You can obtain disk-usage and memory-usage statistics for your Solr cluster by using the following cURL command:

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/sc1ca23733_faa8_49ce_b3b6_dc3e193264c6/stats"

Common uses for cluster statistics include:

  • Verifying the capacity of a new or resized cluster (see Resizing your Solr cluster for information on resizing a cluster).
  • Checking the remaining capacity on a cluster to which you've added a new set of documents.
  • Checking the remaining capacity on a cluster that seems to be getting less responsive. It might be that the remaining capacity is low and you need to resize (expand) your cluster.
  • Standard monitoring of infrastructure components.
  • Troubleshooting a problematic cluster.
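
For monitoring scripts, the same statistics call can be made from code. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the sketch assumes that the service accepts HTTP basic authentication (as `curl -u` implies) and returns a JSON body. The fields in the response are not described on this page, so the sketch returns the decoded body without interpreting it.

```python
import base64
import json
import urllib.request

API_BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"


def build_stats_request(cluster_id, username, password):
    """Build (but do not send) a GET request for a cluster's usage statistics."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(f"{API_BASE}/solr_clusters/{cluster_id}/stats")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_stats(cluster_id, username, password):
    """Send the request and return the decoded body (assumed to be JSON)."""
    request = build_stats_request(cluster_id, username, password)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it possible to inspect the URL and headers without contacting the service.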

Resizing your Solr cluster

You can resize your Solr cluster by adding more units to it by using the following cURL command:

curl -X PUT -H "Content-Type: application/json" -u "{username}":"{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/{solr_cluster_id}/cluster_size" -d '{ "cluster_size": "{new_cluster_size}" }'

For example, to resize a one-unit cluster to two units, use the following command:

curl -X PUT -H "Content-Type: application/json" -u "{username}":"{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/{solr_cluster_id}/cluster_size" -d '{ "cluster_size": "2" }'

The resize operation currently has the following limitations:

  • You cannot resize a free cluster to a higher-capacity paid cluster. This support is expected in an upcoming release.
  • You cannot resize a paid cluster to a free cluster. No support is planned for this operation.
  • CAUTION: You can resize a cluster downward; however, be extremely careful when planning and executing such an operation. Obtain usage statistics at every step of the operation to ensure that your downsized cluster can handle the load of your current cluster.
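
The resize request described in this section can also be issued from code. The following Python sketch is an illustration under stated assumptions, not a definitive client: the function names are hypothetical, the 1 to 7 range check reflects the current maximum cluster size, and `is_downsize` is a simple guard to help a caller notice a downward resize before sending it.

```python
import base64
import json
import urllib.request

API_BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"
MAX_UNITS = 7  # current maximum cluster size


def build_resize_request(cluster_id, new_size, username, password):
    """Build (but do not send) a PUT request that asks for a new cluster size."""
    if not 1 <= new_size <= MAX_UNITS:
        raise ValueError(f"cluster size must be between 1 and {MAX_UNITS} units")
    # The cURL examples send the size as a JSON string, so do the same here.
    body = json.dumps({"cluster_size": str(new_size)}).encode("utf-8")
    url = f"{API_BASE}/solr_clusters/{cluster_id}/cluster_size"
    request = urllib.request.Request(url, data=body, method="PUT")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/json")
    return request


def is_downsize(current_size, new_size):
    """Return True when the requested size is smaller than the current one."""
    return new_size < current_size
```

To send the request, pass the result to `urllib.request.urlopen`. If `is_downsize` returns True, check the cluster's usage statistics first.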

If you start a resize operation and then want to cancel it (for example, if you realize that you issued the resize operation against the wrong cluster), you can attempt to do so by issuing another resize request with the cluster size set to the cluster's initial value. The service has no dedicated cancel operation, and because resizing is asynchronous, an attempt to revert a resize that has already begun might or might not be successful.

For example, to attempt to cancel the resize operation listed above, issue the following cURL command:

curl -X PUT -H "Content-Type: application/json" -u "{username}":"{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/sc1ca23733_faa8_49ce_b3b6_dc3e193264c6/cluster_size" -d '{ "cluster_size": "1" }'

To determine the success or failure of an attempt to cancel a resize operation, use the method described in Checking the status of a cluster-resize operation.

Checking the status of a cluster-resize operation

Resizing a cluster is an asynchronous operation. To check the status of a resize operation, including when it has completed, use the following cURL command:

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/{solr_cluster_id}/cluster_size"

The output of the command includes the cluster ID, the current cluster size, the target cluster size, and a status message.
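
Because the operation is asynchronous, a script typically polls the status until the resize has finished. The following Python sketch shows one way to structure that loop. The function name and both callbacks are hypothetical: `fetch_status` is whatever function issues the status request above, and `is_finished` decides whether a response means "done". The exact field names in the response are not given on this page, so the sketch does not assume any.

```python
import time


def wait_for_resize(fetch_status, is_finished, interval_seconds=30.0,
                    max_polls=20, sleep=time.sleep):
    """Poll fetch_status() until is_finished(status) is true.

    Returns the last status seen, or raises TimeoutError after max_polls
    unsuccessful checks.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if is_finished(status):
            return status
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"resize not finished after {max_polls} status checks")
```

A reasonable `is_finished` might compare the current and target cluster sizes reported in the response. The same loop can be used to find out whether an attempt to revert a resize took effect.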

High availability

Many organizations require high availability for mission-critical solutions. The Retrieve and Rank service automatically provides high availability for Solr and the overall service, with no configuration or customization needed on your end. Each Retrieve and Rank cluster runs in Solr's high-availability mode, meaning that each cluster is part of an active-active pair. During normal processing, both clusters in the pair actively process requests. If one of the clusters in the pair stops working for any reason, the remaining active cluster continues processing to provide uninterrupted service.

Error messages

All of the methods provided by the Retrieve and Rank service interact with the Solr service to a greater or lesser degree. If Solr throws an error when you use these methods, the Retrieve and Rank service returns the error to you or your application directly, without any parsing. This pass-through lets standard Solr logs and applications continue to process the errors.

For information about creating Solr applications with standard error handling, see the Apache Solr Reference Guide.

Incompatible Solr operations

Most of the Solr APIs are supported by the Retrieve and Rank service. However, some operations are incompatible. When an operation is not allowed, the service returns an error response.

Blocked API calls

The following REST API calls are blocked when you use Java to issue a call to a Solr API through the Retrieve and Rank service:

  • Content stream operations that use the stream, stream.url, or stream.file parameters.
  • All calls to the Solr Collections API except CREATE, DELETE, LIST, and RELOAD.
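
An application can check for these restrictions before sending a request. The following Python sketch is a hypothetical client-side pre-check, not the service's own logic. It tests only the three parameter names and four Collections API actions listed above, and it assumes that Collections API calls are identified by a path ending in `/admin/collections`, as in standard Solr.

```python
BLOCKED_STREAM_PARAMS = {"stream", "stream.url", "stream.file"}
ALLOWED_COLLECTION_ACTIONS = {"CREATE", "DELETE", "LIST", "RELOAD"}


def blocked_reason(path, params):
    """Return a reason string if the call would be blocked, else None."""
    used = BLOCKED_STREAM_PARAMS.intersection(params)
    if used:
        return "content stream parameter(s): " + ", ".join(sorted(used))
    if path.rstrip("/").endswith("/admin/collections"):
        action = str(params.get("action", "")).upper()
        if action not in ALLOWED_COLLECTION_ACTIONS:
            return f"Collections API action not allowed: {action or '(none)'}"
    return None
```

For example, `blocked_reason("/solr/admin/collections", {"action": "SPLITSHARD"})` returns a reason string, while the same call with `{"action": "LIST"}` returns `None`.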

Elements that are not allowed in uploaded Solr configurations

Uploading a Solr configuration file that includes any of the following elements causes the upload to fail:

  • Property enableRemoteStreaming set to true:

    <requestParsers enableRemoteStreaming="true" />

  • Any elements of the form:

    <updateHandler><listener ... /></updateHandler>

  • Configs that do not include the <updateLog /> element.

  • Overwritten updateLog directory; for example:

    <updateLog><str name="dir">${OVERWRITTEN}</str></updateLog>

    The value must be ${solr.ulog.dir:}.

  • Overwritten Solr data directory; for example:

    <dataDir>${OVERWRITTEN}</dataDir>

    The value must be ${solr.data.dir:}.

  • Any <jmx ... /> elements.

  • The use of AdminHandler; for example:

    <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />

  • The use of ReplicationHandler; for example:

    <requestHandler name="/replication/" class="solr.ReplicationHandler" />
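
Checking a solrconfig.xml file locally before you upload it can save a failed round trip. The following Python sketch mirrors the rules listed above by using the standard `xml.etree.ElementTree` module. The function name is hypothetical, and the checks are an approximation of the list in this section, not the validation that the service actually performs.

```python
import xml.etree.ElementTree as ET

BLOCKED_HANDLER_CLASSES = {"solr.admin.AdminHandlers", "solr.ReplicationHandler"}


def find_config_problems(solrconfig_xml):
    """Return a list of reasons why uploading this solrconfig.xml might fail."""
    root = ET.fromstring(solrconfig_xml)
    problems = []

    for parsers in root.iter("requestParsers"):
        if parsers.get("enableRemoteStreaming", "").lower() == "true":
            problems.append("enableRemoteStreaming is set to true")

    if root.find(".//updateHandler/listener") is not None:
        problems.append("<updateHandler> contains a <listener> element")

    update_logs = list(root.iter("updateLog"))
    if not update_logs:
        problems.append("missing <updateLog/> element")
    for update_log in update_logs:
        # An explicit dir is accepted only if it keeps the default property.
        for entry in update_log.findall("str[@name='dir']"):
            if (entry.text or "").strip() != "${solr.ulog.dir:}":
                problems.append("updateLog dir must be ${solr.ulog.dir:}")

    for data_dir in root.iter("dataDir"):
        if (data_dir.text or "").strip() != "${solr.data.dir:}":
            problems.append("dataDir must be ${solr.data.dir:}")

    if next(root.iter("jmx"), None) is not None:
        problems.append("<jmx> elements are not allowed")

    for handler in root.iter("requestHandler"):
        if handler.get("class") in BLOCKED_HANDLER_CLASSES:
            problems.append(f"request handler class not allowed: {handler.get('class')}")

    return problems
```

An empty list means that none of the listed patterns were found. It does not guarantee that the service accepts the file.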

Elements that are not allowed when you update a Solr configuration

If you include any of the following elements when you update a configuration, the operation fails:

  • set-user-property, unset-user-property, update-listener, add-listener, delete-listener.
  • set-property or unset-property for the following options: requestDispatcher.requestParsers.enableRemoteStreaming, jmx.agentId, jmx.serviceUrl, jmx.rootName.
  • add-requesthandler, update-requesthandler, or delete-requesthandler with either of the following classes: solr.admin.AdminHandlers, solr.ReplicationHandler.
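
The rules above can be applied to a configuration-update payload before it is sent. The following Python sketch is a hypothetical pre-check. It assumes the payload is a JSON object that maps command names to arguments in the style of the standard Solr Config API: `set-property` takes an object of property names and values, `unset-property` takes a property name or a list of names, and request-handler commands take an object that can carry a `class` entry.

```python
BLOCKED_COMMANDS = {
    "set-user-property", "unset-user-property",
    "update-listener", "add-listener", "delete-listener",
}
BLOCKED_PROPERTIES = {
    "requestDispatcher.requestParsers.enableRemoteStreaming",
    "jmx.agentId", "jmx.serviceUrl", "jmx.rootName",
}
HANDLER_COMMANDS = {"add-requesthandler", "update-requesthandler", "delete-requesthandler"}
BLOCKED_HANDLER_CLASSES = {"solr.admin.AdminHandlers", "solr.ReplicationHandler"}


def rejected_commands(update):
    """Return the sorted names of commands in the payload that would fail."""
    rejected = set()
    for command, payload in update.items():
        if command in BLOCKED_COMMANDS:
            rejected.add(command)
        elif command in ("set-property", "unset-property"):
            # Normalize the argument to a collection of property names.
            names = [payload] if isinstance(payload, str) else list(payload)
            if BLOCKED_PROPERTIES.intersection(names):
                rejected.add(command)
        elif command in HANDLER_COMMANDS:
            if isinstance(payload, dict) and payload.get("class") in BLOCKED_HANDLER_CLASSES:
                rejected.add(command)
    return sorted(rejected)
```

An empty result means that no listed restriction matched. The service might still reject the update for other reasons.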

Solr features that are disabled by the service

The Retrieve and Rank service uses a precustomized and preconfigured version of Solr. As a result, the following common Solr features are intentionally disabled by the service:

  • Sharding
  • Certain configuration options in the Solr schema.xml and solrconfig.xml files, as listed previously in this document.