crawler-status
A node used to communicate the crawler's current state and crawl progress.
Attributes
- from-cache (May only be: from-cache) - Flag indicating that the status was served from an earlier cached copy. This occurs if the status is fetched while the crawler is saving its state to the crawler database.
- version (Text) - Version of the crawler binary.
- id (Text) - A unique identifier for this particular status node.
- stopping-time (xs:long) - Time at which the stop command was received as seconds since the epoch.
- expected-stop-time (xs:long) - Time at which the crawler's idle time is expected to end as seconds since the epoch.
- time (xs:long, required) - Time at which this status node was produced by the crawler as seconds since the epoch.
- host (Text) - Usage: This functionality is deprecated
- n-input (xs:unsignedLong default: 0) - Total number of unique URLs input to the crawler for the purpose of indexing.
- n-output (xs:unsignedLong default: 0) - Total number of unique URLs successfully crawled and indexed.
- n-errors (xs:unsignedLong default: 0) - Total number of unique URLs that resulted in a fetch error or a conversion error.
- n-error-rows (xs:unsignedLong default: 0) - TBD: number of rows in 'errors' table in log.sqlt.
- n-http-errors (xs:unsignedLong default: 0) - Total number of unique URLs that resulted in an HTTP fetch error.
- n-http-location (xs:unsignedLong default: 0) - Total number of unique URLs that resulted in an HTTP redirection.
- n-filtered (xs:unsignedLong default: 0) - Total number of unique URLs that were filtered by the crawler conditional settings.
- n-robots (xs:unsignedLong default: 0) - Total number of unique URLs that were not crawled due to the server's robots.txt file.
- n-pending (Integer default: 0) - Total number of unique URLs input for indexing that are still being processed by the crawler.
- n-pending-internal (Integer default: 0) - Total number of unique URLs, crawl-delete nodes, and native file export requests that are still being processed by the crawler.
- n-awaiting-gate (Integer default: 0) - Total number of crawl-urls or crawl-deletes that are waiting to be processed because the crawler is currently processing another node with the same url or vertex attribute.
- n-awaiting-input (Integer default: 0) - Total number of crawl-urls or crawl-deletes that are waiting for initial processing by the crawler.
- n-offline-queue (xs:unsignedLong default: 0) - Total number of crawl-urls or crawl-deletes that are waiting in the offline queue for processing.
- n-awaiting-index-input (Integer default: 0) - Total number of crawl-urls or crawl-deletes that are waiting to be sent to the indexer.
- n-awaiting-index-reply (Integer default: 0) - Total number of crawl-urls or crawl-deletes that have been sent to the indexer but have not yet been confirmed as indexed.
- conversion-time (Integer default: 0) - Total time spent converting data as milliseconds.
- n-sub (Integer default: 0) - Total number of crawl-datas processed by the crawler.
- n-bytes (Decimal number default: 0) - Total size of all resources crawled as bytes. Value includes URLs that were crawled from cache.
- n-dl-bytes (Decimal number default: 0) - Total size of all resources crawled as bytes. Value excludes URLs that were crawled from cache.
- n-redirect (Integer default: 0) - Total number of redirected URLs processed by the crawler.
- n-duplicates (Integer default: 0) - Total number of exact duplicate URLs processed by the crawler.
- n-deleted (Integer default: 0) - Total number of URLs deleted by the crawler.
- n-cache-complete (Integer default: 0) - Total number of URLs crawled from the cache.
- converted-size (Decimal number default: 0) - Total size of all converted data in bytes. Value excludes URLs that were crawled from cache.
- elapsed (Integer default: 0) - Total elapsed time for this crawl in seconds. On resume, all previous crawl times are included.
- this-elapsed (Integer default: 0) - Total elapsed time for this crawl in seconds. On resume, all previous crawl times are excluded.
- upgrade-schema (May only be: upgrade-schema) - When set, this flag indicates that the crawler is in the process of updating its log schema as part of a backward compatibility procedure.
- sanitize-rebase (May only be: sanitize-rebase) - When set, this flag indicates that the crawler is in the process of sanitizing records obtained from another crawler as a result of a successful rebase request.
- request-rebase (May only be: request-rebase) - When set, this flag indicates that the crawler is in the process of requesting a rebase from a remote collection.
- copy-rebase (May only be: copy-rebase) - When set, this flag indicates that the crawler is in the process of copying files in order to service a rebase request from a remote collection.
- receive-rebase (May only be: receive-rebase) - When set, this flag indicates that the crawler is in the process of receiving files from a remote collection as part of the rebase operation.
- resume (May only be: resume) - When set, this flag indicates that the crawler is in the process of a resume operation.
- complete (Any of: complete, aborted, unexpected, docs-limit, urls-limit, input-urls-limit, time-limit) - When set, this flag indicates that the crawler has finished crawling the seed URLs. The reason is stored as the value for the attribute:
  - complete: the seeds have been completely crawled. This is independent of the work done after crawling the seed URLs, such as processing externally enqueued URLs.
  - aborted: the crawl was aborted.
  - unexpected: the seed was completely crawled, but the crawler received additional work in the form of an enqueue.
  - docs-limit: the crawler exceeded the maximum number of documents to be crawled. Deprecated.
  - urls-limit: the crawler exceeded the maximum number of completed URLs.
  - input-urls-limit: the crawler exceeded the maximum number of input URLs.
  - time-limit: the crawler exceeded the maximum crawling time.
- idle (May only be: idle) - When set, this flag indicates that the crawler is waiting for additional work in an idle state.
- final (May only be: final) - When set, this flag indicates that this is the last value of the crawler-status node that will be recorded on this run. Additional status requests will result in this node being returned until the crawler is restarted.
- performing-vacuum (May only be: performing-vacuum) - When set, this flag indicates that the crawler is performing a requested vacuum operation to compact its database. The vacuum operation may take a long time to perform, so enqueue operations should be suspended until it is finished.
- error - Usage: Internal
- config-md5
- service-status (Any of: stopped, running) - Provides a simple way to determine if the service is running or stopped.
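As a rough illustration of reading these attributes, the sketch below parses an invented sample status node with Python's standard XML parser. The attribute names come from the list above; the sample XML, its values, and the progress calculation are assumptions for demonstration only.

```python
import xml.etree.ElementTree as ET

# Invented sample -- a real status node carries many more attributes.
sample = (
    '<crawler-status version="1.0" time="1700000000" '
    'n-input="1000" n-output="940" n-errors="50" n-pending="10"/>'
)

status = ET.fromstring(sample)

# The numeric counters default to 0 when an attribute is absent.
n_input = int(status.get("n-input", "0"))
n_output = int(status.get("n-output", "0"))
n_errors = int(status.get("n-errors", "0"))

# Fraction of input URLs already resolved (indexed or failed).
done = (n_output + n_errors) / n_input if n_input else 0.0
print(f"{done:.0%} of input URLs resolved, {status.get('n-pending', '0')} pending")
```

Since every counter documented above has a default of 0, defensive `get(..., "0")` lookups keep the sketch safe against attributes omitted from a given status node.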
Children
- Use these in the listed order. The sequence may not repeat.
- converter-timings: (At most 1) - Container for all the timing status.
- crawl-thread: (Zero or more) - A node that indicates the state of a crawler thread.
- crawl-remote-status: (Zero or more) - A node indicating the status of a distributed search collection that this collection is requesting or serving.
- crawl-client-status: (Zero or more) - A node indicating the status of the distributed search clients.
- crawler-status: (Zero or more) - A node used to communicate the crawler's current state and crawl progress.
- crawl-hops-output: (At most 1) - Contains hop statistics for all URLs completely processed by the crawler.
- crawl-hops-input: (At most 1) - Contains hop statistics for all URLs currently being processed by the crawler.
- queues: (At most 1) - Container for detailed request queue status information.
- crawl-remote-all-status: (At most 1) - A container for nodes describing the state of a distributed search collection's clients and servers.
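A minimal sketch of consuming the child nodes, again against invented sample XML (the element names are from the list above; the `id` attribute on crawl-thread is assumed for illustration):

```python
import xml.etree.ElementTree as ET

# Invented sample showing children in the documented order:
# converter-timings and queues appear at most once, crawl-thread may repeat.
sample = """
<crawler-status time="1700000000">
  <converter-timings/>
  <crawl-thread id="0"/>
  <crawl-thread id="1"/>
  <queues/>
</crawler-status>
"""

status = ET.fromstring(sample)

# Collect the repeating crawl-thread children.
threads = status.findall("crawl-thread")
print(f"{len(threads)} crawler threads reported")
```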