crawl-delete

A node used to remove a URL or set of URLs from the index.

Attributes

  • url (Text) - The URL to delete from the search collection. This attribute is incompatible with the vse-key and vertex attributes.
  • light-crawler (May only be: light-crawler) - A flag that forces the crawler to send the delete to the indexer, even if the delete does not correspond to a URL known to the crawler. The presence of this flag triggers special delete handling in the indexer.
  • normalized (May only be: normalized) - A flag indicating that the url attribute has already been normalized. If this is not present, the crawler will attempt to normalize that value.
  • vse-key (Text) - A vse-key to delete from the search collection. The crawler will attempt to delete all documents corresponding to this key. This attribute is incompatible with the url and vertex attributes.
  • only-input - Temporary attribute used to indicate the crawl-delete was never logged to the authority table. Usage: Internal
  • index-atomically - Attribute used to indicate the crawl-delete is part of an atomic operation. Usage: Internal
  • was-atomic (May only be: was-atomic) - A flag indicating that the vse-key delete was processed as an index-atomic node. Usage: Internal
  • vertex (xs:unsignedInt) - The vertex corresponding to the URL to delete from the search collection. This attribute is incompatible with the url and vse-key attributes.
  • vse-key-normalized (May only be: vse-key-normalized)
  • indexer-generated (May only be: indexer-generated) - Flag used to differentiate crawl-deletes sent from the indexer (in response to a vse-key) from ordinary crawl-deletes. Usage: Internal
  • enqueue-id (Text) - Unique string that identifies a particular enqueue.
  • enqueue-id-for-audit-log (Text) - String that will be used to identify this enqueue in the audit-log instead of the value of the enqueue-id attribute. Usage: Internal
  • recursive (May only be: recursive) - Flag used to indicate whether this delete should also delete all references to its children. If a child has another parent that is not a child of the root, that child will not be deleted.
  • originator (Text) - Unique string that identifies the update originator.
  • state (Any of: pending, success, aborted, error) - The state of this crawl-delete. Possible values:
    • pending: the resource is currently in the crawler pipeline.
    • success: the resource exited the crawler pipeline and processing did not result in an error or warning.
    • aborted: this vse-key crawl-delete could not be processed due to an error.
    • error: the resource exited the crawler pipeline but no data was successfully deleted. If the enqueue did not reach the indexer, the siphoned attribute will indicate why. Otherwise the log child will indicate the errors.
  • siphoned (Any of: terminated, nonexistent, rebasing, replaced, input-full, needed-gatekeeper, aborted, duplicate, remote-conflict, unknown) - Indicates that the crawler encountered an obstacle that prevented the crawl-delete from meeting its requested synchronization:
    • terminated: The crawl-delete could not be processed because the crawler was stopped after the enqueue entered the pipeline.
    • nonexistent: The crawl-delete does not correspond to any crawl-urls.
    • rebasing: The crawl-delete could not be processed because the crawler is attempting a rebase operation.
    • replaced: The enqueue was replaced with a newer one.
    • input-full: The enqueue could not be processed because the input queue is full.
    • needed-gatekeeper: The enqueue was the child of an index-atomic node but would have needed to be placed in the gatekeeper to proceed.
    • aborted: The enqueue was aborted as part of a transaction.
    • duplicate: The crawl-delete cannot be processed because the crawler is performing work on the target url.
    • remote-conflict: The crawl-delete could not be processed because the collection has a more recent update for this URL, either from the collection itself or another distributed indexing node.
    • unknown: The requested synchronization could not be met for an unknown reason.
  • notify-id (Integer) - Used by the crawler to associate enqueued crawl-deletes with a notification ID. This ID is used to return the outcome of the enqueue to the caller. Usage: Internal
  • reply-id (Integer) - Usage: Internal
  • synchronization (Any of: none, enqueued, to-be-indexed, indexed, indexed-no-sync default: enqueued) - Indicates at which point the crawler should return success for a crawl-delete.
    • none: immediately after receiving the enqueue.
    • enqueued: after verifying the delete target exists.
    • to-be-indexed: after the crawl-delete has been successfully sent to the indexer.
    • indexed: after the crawl-delete has successfully removed the item from the index.
  • gatekeeper-action (Any of: reject, replace, add-to-queue) - Indicates the action that the gatekeeper will take if it encounters this crawl-delete while another crawl-delete sharing the url attribute is in the crawler's pipeline.
    • reject: the gatekeeper will reject this crawl-delete and prevent it from entering the pipeline. This is the default behavior for crawl-deletes enqueued as children of an index-atomic in the non-distributed case.
    • replace: the gatekeeper will reject all crawl-deletes currently in its queue that share the value of this crawl-delete's url attribute, replacing them with this single crawl-delete. This is the default behavior.
    • add-to-queue: the gatekeeper will add this crawl-delete to the tail of its queue. This is the default behavior for crawl-deletes sent to a distributed indexing client as children of an index-atomic node.
    Usage: Internal
  • force-indexed-sync - Flag used to force the indexer to acknowledge document changes in the audit log only after indexing is complete and the changes will be reflected in search results. Usage: Internal
  • enqueued-offline (May only be: enqueued-offline) - Flag that indicates that the crawl-delete was enqueued offline.
  • orphaned-atomic (May only be: orphaned-atomic) - Flag that indicates this crawl-delete could not be indexed atomically due to a system error. As a result, this delete had no effect on the index. Usage: Internal
  • priority (Integer default: 0) - An integer indicating the priority of this crawl-delete relative to other crawl-urls and crawl-deletes in the crawler's queues. A larger value indicates a higher priority.
  • input-priority (Integer) - Stores the actual priority on URLs enqueued by the vse-key operation. Set internally by the crawler for temporary use. Usage: Internal
  • indexer-id (Text) - A non-negative integer used to track deletes sent to the indexer. Usage: Internal
  • remote-collection (Text) - The name of the collection that this remote update originated from. Usage: Internal
  • remote-packet-id (Integer) - Temporary attribute used to keep track of an update that will eventually be added to the journal. Usage: Internal
  • remote-counter (xs:unsignedInt) - Remote update's counter value. Used to ensure updates are applied sequentially. Usage: Internal
  • remote-time (xs:unsignedInt) - Time at which the resource was fetched on the remote server as seconds since the epoch. Usage: Internal
  • remote-delete-time (xs:unsignedInt) - Time at which the vse-key delete used to create this delete was recorded on the remote server as seconds since the epoch. Usage: Internal
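
The attribute rules above can be illustrated with a minimal Python sketch that builds a crawl-delete node using the standard xml.etree.ElementTree module. The helper name build_crawl_delete and the example URL are hypothetical and not part of the product. Requiring exactly one target attribute is an assumption drawn from the documented incompatibility among url, vse-key, and vertex. How the resulting node is submitted to the crawler is not shown.

```python
import xml.etree.ElementTree as ET

# The three target attributes are documented as mutually incompatible.
TARGET_ATTRS = ("url", "vse-key", "vertex")

# Allowed values of the synchronization attribute, as listed above.
SYNC_VALUES = ("none", "enqueued", "to-be-indexed", "indexed", "indexed-no-sync")


def build_crawl_delete(target_attr, target_value, synchronization="enqueued",
                       priority=0, flags=()):
    """Build a crawl-delete element (hypothetical helper, for illustration).

    Taking a single target attribute means url, vse-key, and vertex
    can never be combined on the same node.
    """
    if target_attr not in TARGET_ATTRS:
        raise ValueError("target_attr must be one of %s" % (TARGET_ATTRS,))
    if synchronization not in SYNC_VALUES:
        raise ValueError("unknown synchronization value: %r" % synchronization)

    node = ET.Element("crawl-delete")
    node.set(target_attr, str(target_value))
    node.set("synchronization", synchronization)
    node.set("priority", str(priority))
    # Flag attributes ("May only be: <name>") carry their own name as value.
    for flag in flags:
        node.set(flag, flag)
    return node


node = build_crawl_delete(
    "url",
    "http://example.com/page.html",  # placeholder URL
    synchronization="indexed",
    flags=("recursive",),
)
print(ET.tostring(node, encoding="unicode"))
```

Only the publicly described attributes are set here; attributes marked "Usage: Internal" are left for the crawler and indexer to manage.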

Children

  • Use these in the listed order. The sequence may not repeat.
    • crawl-delete: (Zero or more) - A node used to remove a URL or set of URLs from the index.
    • log: (At most 1) - Tag in which the log nodes are collected.
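
The child ordering described above (zero or more crawl-delete nodes, then at most one log node) can be checked with a short Python sketch. The function name children_are_valid and the sample documents are hypothetical. The check covers only direct children, and it is an illustration rather than the validation the crawler itself performs.

```python
import xml.etree.ElementTree as ET


def children_are_valid(node):
    """Return True if the direct children are zero or more crawl-delete
    elements followed by at most one log element (hypothetical helper)."""
    tags = [child.tag for child in node]
    # Skip the leading run of nested crawl-delete children.
    i = 0
    while i < len(tags) and tags[i] == "crawl-delete":
        i += 1
    # Whatever remains must be nothing, or exactly one log element.
    rest = tags[i:]
    return rest == [] or rest == ["log"]


# Nested deletes first, then the log: matches the listed order.
good = ET.fromstring(
    '<crawl-delete url="http://example.com/">'
    '<crawl-delete url="http://example.com/a"/>'
    '<crawl-delete url="http://example.com/b"/>'
    '<log/>'
    '</crawl-delete>'
)

# The log appears before a nested delete: violates the listed order.
bad = ET.fromstring(
    '<crawl-delete url="http://example.com/">'
    '<log/>'
    '<crawl-delete url="http://example.com/a"/>'
    '</crawl-delete>'
)

print(children_are_valid(good))
print(children_are_valid(bad))
```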