crawler-status

A node used to communicate the crawler's current state and crawl progress.

Attributes

  • from-cache (May only be: from-cache) - Flag indicating that the status was served from an earlier cached copy. This occurs if the status is fetched while the crawler is saving its state to the crawler database.
  • version (Text) - Version of the crawler binary.
  • id (Text) - A unique identifier for this particular status node.
  • stopping-time (xs:long) - Time at which the stop command was received as seconds since the epoch.
  • expected-stop-time (xs:long) - Time at which the crawler's idle time is expected to end as seconds since the epoch.
  • time (xs:long, required) - Time at which this status node was produced by the crawler as seconds since the epoch.
  • host (Text) - Usage: This functionality is deprecated.
  • n-input (xs:unsignedLong default: 0) - Total number of unique URLs input to the crawler for the purpose of indexing.
  • n-output (xs:unsignedLong default: 0) - Total number of unique URLs successfully crawled and indexed.
  • n-errors (xs:unsignedLong default: 0) - Total number of unique URLs that resulted in a fetch error or a conversion error.
  • n-error-rows (xs:unsignedLong default: 0) - TBD: number of rows in 'errors' table in log.sqlt.
  • n-http-errors (xs:unsignedLong default: 0) - Total number of unique URLs that resulted in an HTTP fetch error.
  • n-http-location (xs:unsignedLong default: 0) - Total number of unique URLs that resulted in an HTTP redirection.
  • n-filtered (xs:unsignedLong default: 0) - Total number of unique URLs that were filtered by the crawler conditional settings.
  • n-robots (xs:unsignedLong default: 0) - Total number of unique URLs that were not crawled due to the server's robots.txt file.
  • n-pending (Integer default: 0) - Total number of unique URLs input for indexing that are still being processed by the crawler.
  • n-pending-internal (Integer default: 0) - Total number of unique URLs, crawl-delete nodes, and native file export requests that are still being processed by the crawler.
  • n-awaiting-gate (Integer default: 0) - Total number of crawl-urls or crawl-deletes that are waiting to be processed because the crawler is currently processing another node with the same url or vertex attribute.
  • n-awaiting-input (Integer default: 0) - Total number of crawl-urls or crawl-deletes that are waiting for initial processing by the crawler.
  • n-offline-queue (xs:unsignedLong default: 0) - Total number of crawl-urls or crawl-deletes that are waiting in the offline queue for processing.
  • n-awaiting-index-input (Integer default: 0) - Total number of crawl-urls or crawl-deletes that are waiting to be sent to the indexer.
  • n-awaiting-index-reply (Integer default: 0) - Total number of crawl-urls or crawl-deletes that have been sent to the indexer but have not yet been confirmed as indexed.
  • conversion-time (Integer default: 0) - Total time spent converting data, in milliseconds.
  • n-sub (Integer default: 0) - Total number of crawl-datas processed by the crawler.
  • n-bytes (Decimal number default: 0) - Total size of all resources crawled, in bytes. Value includes URLs that were crawled from cache.
  • n-dl-bytes (Decimal number default: 0) - Total size of all resources crawled, in bytes. Value excludes URLs that were crawled from cache.
  • n-redirect (Integer default: 0) - Total number of redirected URLs processed by the crawler.
  • n-duplicates (Integer default: 0) - Total number of exact duplicate URLs processed by the crawler.
  • n-deleted (Integer default: 0) - Total number of URLs deleted by the crawler.
  • n-cache-complete (Integer default: 0) - Total number of URLs crawled from the cache.
  • converted-size (Decimal number default: 0) - Total size of all converted data in bytes. Value excludes URLs that were crawled from cache.
  • elapsed (Integer default: 0) - Total elapsed time for this crawl in seconds. On resume, all previous crawl times are included.
  • this-elapsed (Integer default: 0) - Total elapsed time for this crawl in seconds. On resume, all previous crawl times are excluded.
  • upgrade-schema (May only be: upgrade-schema) - When set, this flag indicates that the crawler is in the process of updating its log schema as part of a backward compatibility procedure.
  • sanitize-rebase (May only be: sanitize-rebase) - When set, this flag indicates that the crawler is in the process of sanitizing records obtained from another crawler as a result of a successful rebase request.
  • request-rebase (May only be: request-rebase) - When set, this flag indicates that the crawler is in the process of requesting a rebase from a remote collection.
  • copy-rebase (May only be: copy-rebase) - When set, this flag indicates that the crawler is in the process of copying files in order to service a rebase request from a remote collection.
  • receive-rebase (May only be: receive-rebase) - When set, this flag indicates that the crawler is in the process of receiving files from a remote collection as part of the rebase operation.
  • resume (May only be: resume) - When set, this flag indicates that the crawler is in the process of a resume operation.
  • complete (Any of: complete, aborted, unexpected, docs-limit, urls-limit, input-urls-limit, time-limit) - When set, this flag indicates that the crawler has finished crawling the seed URLs. The reason is stored as the attribute's value:
    • complete: the seeds have been completely crawled. This is independent of the work done after crawling the seed URLs, such as processing externally enqueued URLs.
    • aborted: the crawl was aborted.
    • unexpected: the seed was completely crawled, but the crawler received additional work in the form of an enqueue.
    • docs-limit: the crawler exceeded the maximum number of documents to be crawled. Deprecated.
    • urls-limit: the crawler exceeded the maximum number of completed URLs.
    • input-urls-limit: the crawler exceeded the maximum number of input URLs.
    • time-limit: the crawler exceeded the maximum crawling time.
  • idle (May only be: idle) - When set, this flag indicates that the crawler is waiting for additional work in an idle state.
  • final (May only be: final) - When set, this flag indicates that this is the last value of the crawler-status node that will be recorded on this run. Additional status requests will result in this node being returned until the crawler is restarted.
  • performing-vacuum (May only be: performing-vacuum) - When set, this flag indicates that the crawler is performing a requested vacuum operation to compact its database. The vacuum operation may take a long time to perform, so enqueue operations should be suspended until it is finished.
  • error - Usage: Internal
  • config-md5
  • service-status (Any of: stopped, running) - Provides a simple way to determine if the service is running or stopped.
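Because most counter attributes default to 0 and flag attributes are simply absent when unset, a consumer should read attributes defensively. A minimal sketch of reading a status snapshot with Python's standard library (the sample document and all attribute values below are invented for illustration, not taken from a real crawl):

```python
import xml.etree.ElementTree as ET

# Hypothetical crawler-status snapshot; attribute values are illustrative only.
SAMPLE = """\
<crawler-status version="1.0" id="example-1" time="1700000000"
                n-input="120" n-output="100" n-errors="5"
                n-pending="15" idle="idle"/>
"""

status = ET.fromstring(SAMPLE)

def counter(name):
    # Counter attributes default to 0 when absent.
    return int(status.get(name, "0"))

n_input = counter("n-input")
done = counter("n-output") + counter("n-errors")   # finished, for better or worse
pending = counter("n-pending")

# Flag attributes (idle, final, complete, ...) are present only when set.
is_idle = status.get("idle") is not None

print("progress: %d/%d (%d pending)" % (done, n_input, pending))
print("idle:", is_idle)
```

Testing for attribute presence (rather than a particular value) keeps the check valid for every flag attribute, since each flag's value equals its name.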

Children

  • Use these in the listed order. The sequence may not repeat.
    • converter-timings: (At most 1) - Container for all the timing status.
    • crawl-thread: (Zero or more) - A node that indicates the state of a crawler thread.
    • crawl-remote-status: (Zero or more) - A node indicating the status of a distributed search collection that this collection is requesting or serving.
    • crawl-client-status: (Zero or more) - A node indicating the status of the distributed search clients.
    • crawler-status: (Zero or more) - A node used to communicate the crawler's current state and crawl progress.
    • crawl-hops-output: (At most 1) - Contains hop statistics for all URLs completely processed by the crawler.
    • crawl-hops-input: (At most 1) - Contains hop statistics for all URLs currently being processed by the crawler.
    • queues: (At most 1) - Container for detailed request queue status information.
    • crawl-remote-all-status: (At most 1) - A container for nodes describing the state of a distributed search collection's clients and servers.
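Since the children appear in a fixed, non-repeating sequence, repeated children such as crawl-thread can be collected directly by tag name. A short sketch (the sample document and its state attribute values are hypothetical, used only to show the access pattern):

```python
import xml.etree.ElementTree as ET

# Hypothetical status document with children in the documented order.
SAMPLE = """\
<crawler-status time="1700000000">
  <converter-timings/>
  <crawl-thread state="idle"/>
  <crawl-thread state="fetching"/>
  <queues/>
</crawler-status>
"""

status = ET.fromstring(SAMPLE)

# findall returns the crawl-thread children in document order.
threads = status.findall("crawl-thread")
states = [t.get("state") for t in threads]

print("%d threads: %s" % (len(threads), states))
```

The same pattern applies to the other repeatable children (crawl-remote-status, crawl-client-status, crawler-status); the at-most-one children can be fetched with find(), which returns None when the child is absent.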