Handling Enqueueing Errors

Enqueueing a crawl-urls node to the crawler returns a crawler-service-enqueue-response element. This element summarizes the status of the enqueue request as a whole, and also reports the status of each crawl-url, crawl-delete, or index-atomic element that was associated with that request.

The crawler-service-enqueue-response element provides a number of attributes, each of which reports on a different aspect of the enqueue operation:

Note: See the documentation for the crawl-url or crawl-delete elements for detailed information about the synchronization properties for an enqueue operation. Complete information for all Watson Explorer Engine elements is available in the Watson Explorer Engine Schema Reference manual.

Enqueue requests that raise exceptions, such as those for which the error attribute is present on the returned crawler-service-enqueue-response node, are obvious indications that the enqueue operation did not succeed. The exception handler for this case should examine the value of the error attribute to determine how to proceed. Unless the value of the error attribute is invalid, you will usually want to re-send the data (after making sure that the crawler is able to receive enqueue requests). An error value of invalid means that the XML of the enqueue request itself is malformed, in which case resending the same request will not help.
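The retry decision described above can be sketched as a small helper. Only the invalid value is taken from this document; the "busy" value used in the example is hypothetical, so consult the Watson Explorer Engine Schema Reference for the actual set of error values:

```java
public class EnqueueRetryPolicy {
    /**
     * Decides whether an enqueue request should be re-sent, based on the
     * error attribute of the crawler-service-enqueue-response node.
     * Returns true if re-sending the request may succeed.
     */
    public static boolean shouldRetry(String errorValue) {
        if (errorValue == null) {
            return false;  // no error attribute: the request was accepted
        }
        // "invalid" means the request XML itself is malformed, so
        // re-sending the identical payload cannot succeed.
        return !"invalid".equals(errorValue);
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(null));       // false: nothing to do
        System.out.println(shouldRetry("busy"));     // true: hypothetical transient error
        System.out.println(shouldRetry("invalid"));  // false: fix the XML first
    }
}
```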

Next, check the value of the n-failed attribute. If it is greater than 0, iterate over the nodes that you attempted to enqueue to identify the ones that failed.

To get accurate and detailed information about the source of an enqueue problem, you must check more than the error and n-failed attributes of a crawler-service-enqueue-response node. For example, the n-failed attribute is not incremented for problems such as the following:

Applications should therefore still iterate through all of the crawl-delete, crawl-url, and index-atomic nodes in a crawler-service-enqueue-response node and check the state attribute of each to identify any top-level operations that are not set to pending or success. The application should then examine the error attribute of each such node to determine the actual result of that individual enqueue operation.
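As an illustration of this traversal, the following self-contained sketch walks a raw crawler-service-enqueue-response document with the standard Java DOM API and flags any child element whose state attribute is neither pending nor success. The sample response, its URLs, and the error value conversion are invented for illustration; in a real application you would traverse the generated binding objects instead of raw XML, as in the C# and Java examples later in this section.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class EnqueueResponseCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical response: both URLs were counted as enqueued
        // (n-failed="0"), but the second one still ended up in an
        // error state.
        String xml =
              "<crawler-service-enqueue-response n-success=\"2\" n-failed=\"0\">"
            + "<crawl-url url=\"http://example.com/a\" state=\"success\"/>"
            + "<crawl-url url=\"http://example.com/b\" state=\"error\""
            + " error=\"conversion\"/>"
            + "</crawler-service-enqueue-response>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));

        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element e = (Element) n;
            String state = e.getAttribute("state");
            // Any state other than pending or success warrants a closer
            // look at the error attribute of that node.
            if (!"pending".equals(state) && !"success".equals(state)) {
                System.out.println(e.getAttribute("url") + " failed: "
                        + e.getAttribute("error"));
            }
        }
    }
}
```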

Tip: As mentioned previously, when enqueueing data with a synchronization mode of to-be-crawled or stronger, there is a guarantee that if the data can be indexed, it will be. This does not mean that errors cannot occur later in the process without being reported in the enqueue response. For example, when enqueueing in the enqueued mode (the recommended mode, which guarantees a low-latency response), a conversion error could occur later in the process. In this type of error case, however, it would not be wise to re-enqueue the data, because it would most likely fail to be processed the second time as well.

Errors for which you do not receive a synchronous response do not require any immediate action. You can subsequently examine and report them by querying the system logs.

The enqueueing functions provide an exception-on-failure argument that can be set to true to cause an exception to be thrown if any of the enqueued URLs cannot be processed. In general, it is preferable to leave this option set to false so that you receive a response that can be traversed and analyzed.

Each crawl-url or crawl-delete element that is enqueued is returned without its data but with status information. See the Watson Explorer Engine Schema Reference for information about these and other elements in the Watson Explorer Engine schema.

XML message:

    <crawler-service-enqueue-response n-success="0" n-failed="2">
      <crawl-url url="http://vivisimo.com" siphoned="duplicate"
                 hops="0" vertex="5" priority="0" />
    </crawler-service-enqueue-response>

In C#:

    try
    {
        // Enqueue the request and inspect the returned response rather
        // than relying solely on exceptions.
        SearchCollectionEnqueueResponse scer = port.SearchCollectionEnqueue(sce);
        if (scer != null && scer.crawlerserviceenqueueresponse.nfailedSpecified
              && scer.crawlerserviceenqueueresponse.nfailed > 0)
        {
            // Report every crawl-url or crawl-delete that the crawler
            // siphoned (that is, rejected) instead of accepting.
            foreach (Object o in scer.crawlerserviceenqueueresponse.Items)
            {
                if (o is crawlurl)
                {
                    crawlurl cu = (crawlurl)o;
                    if (cu.siphonedSpecified)
                        System.Console.WriteLine(cu.url + " failed: " + cu.siphoned);
                }
                if (o is crawldelete)
                {
                    crawldelete cd = (crawldelete)o;
                    if (cd.siphonedSpecified)
                        System.Console.WriteLine(cd.url + " failed: " + cd.siphoned);
                }
            }
        }
    }
    catch (System.Exception ex)
    {
        handleException(ex);
    }

In Java:

    SearchCollectionEnqueueResponse scer =
        port.searchCollectionEnqueue(sce);
    CrawlerServiceEnqueueSuccess cses =
        scer.getCrawlerServiceEnqueueSuccess();
    if (cses.getNFailed() > 0) {
        // Report every crawl-url or crawl-delete that the crawler
        // siphoned (that is, rejected) instead of accepting.
        for (Object o : cses.getCrawlUrlOrCrawlDelete()) {
            if (o.getClass() == CrawlUrl.class) {
                CrawlUrl cu = (CrawlUrl) o;
                if (cu.getSiphoned() != null)
                    System.out.println(cu.getUrl() + " failed: "
                            + cu.getSiphoned());
            } else if (o.getClass() == CrawlDelete.class) {
                CrawlDelete cd = (CrawlDelete) o;
                if (cd.getSiphoned() != null)
                    System.out.println(cd.getUrl() + " failed: "
                            + cd.getSiphoned());
            }
        }
    }