How the Notes crawler determines the number of documents crawled

When you monitor the Notes crawler, the number that is shown in the Documents Crawled field on the Details for Notes Views and Folders page might seem greater than the number of documents in the Lotus Notes® database.

An attachment to a Lotus Notes document is considered a separate document from its parent document. If the attachment is an archive file (such as .zip, .tar, or .gz), the files in the archive are also counted separately.

On the Details for Notes Views and Folders page, the value that is shown for the number of documents crawled is equal to the output of the following command:

esadmin Notes_crawler_session_ID getCrawlSpaceStatusDetail -ts target_Server

The documents crawled value is the sum of the following numbers, and this sum does not necessarily match the number of documents that are stored in the source database:
  • The number of documents that are accessed
  • The number of attachments
  • The number of files in archive files
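
The sum above can be illustrated with a minimal sketch. The classes and the function name below are hypothetical and exist only for this example; they are not part of the crawler or of any Lotus Notes API. The sketch assumes that an archive attachment is counted once as an attachment and that each file inside it is counted in addition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attachment:
    name: str
    # Files inside the attachment if it is an archive file; empty otherwise.
    archive_members: List[str] = field(default_factory=list)


@dataclass
class NotesDocument:
    title: str
    attachments: List[Attachment] = field(default_factory=list)


def documents_crawled(docs: List[NotesDocument]) -> int:
    """Sum the three numbers that make up the documents crawled value."""
    accessed = len(docs)
    attachments = sum(len(d.attachments) for d in docs)
    archived_files = sum(
        len(a.archive_members) for d in docs for a in d.attachments
    )
    return accessed + attachments + archived_files


# Hypothetical database with two Notes documents.
sample = [
    NotesDocument("Report", [
        Attachment("summary.pdf"),
        Attachment("data.zip", ["a.csv", "b.csv", "c.csv"]),
    ]),
    NotesDocument("Memo"),
]

# 2 documents + 2 attachments + 3 files in the archive
print(documents_crawled(sample))  # prints 7
```

In this example the database holds two documents, but the value reported would be seven, which is why the field can seem greater than the number of documents in the database.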

In addition, some documents might be found by the crawler and then dropped (for example, corrupted documents are dropped). In this case, you can confirm which documents were crawled by looking at the crawler audit log files in the ES_NODE_ROOT/logs/audit/ directory.

If a document is not accessible by the crawler user, it is not crawled and does not appear in the crawler audit log.

When determining whether a document or attachment was updated, the Notes crawler uses the following logic:
  • When an attachment to a Notes® document is updated, the modified date of the Notes document is always updated too. The Notes crawler assumes that both the body text of the document and the attachment are updated, and they are counted as updated documents individually.
  • If only the body text of a Notes document is updated, both the body text and the attachment are considered to be updated documents.
  • If an attachment is an archive file, updated files in the archive file are counted individually.
  • If only one of the files in an attached archive file is updated, the body text of the Notes document and the changed file in the archive file are counted as updated documents individually. Other files in the archive file are handled as unchanged documents.
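
The rules above can be sketched as a small function. This is one possible reading of the rules, not the crawler's actual implementation: the function name, the data layout, and the item naming are hypothetical. The sketch assumes that any change marks the body text as updated, that every non-archive attachment is then counted as updated, and that only the changed files in an archive file are counted. The rules do not say how a document with several attachments is handled, so that behavior in the sketch is an assumption.

```python
from typing import Dict, List, Set

BODY = "body"


def updated_documents(attachments: Dict[str, List[str]],
                      changed: Set[str]) -> List[str]:
    """Return the items counted as updated for one Notes document.

    attachments maps each attachment name to the files it contains
    (an empty list means the attachment is not an archive file).
    changed holds the items whose content changed: BODY, an
    attachment name, or "archive_name/file_name".
    """
    if not changed:
        return []  # nothing changed, so nothing is counted as updated

    # Any change updates the modified date, so the body text is counted.
    counted = [BODY]
    for name, members in attachments.items():
        if not members:
            # Non-archive attachment: counted together with the body text.
            counted.append(name)
        else:
            # Archive file: only the files that changed are counted.
            counted.extend(
                f"{name}/{m}" for m in members if f"{name}/{m}" in changed
            )
    return counted


# Only one file in an attached archive file changed.
print(updated_documents({"files.zip": ["a.txt", "b.txt"]},
                        {"files.zip/b.txt"}))
# prints ['body', 'files.zip/b.txt']
```

Calling the function with `{"report.pdf": []}` and either `{"report.pdf"}` or `{"body"}` returns the same two items, which mirrors the first two rules: the body text and the attachment are counted together regardless of which one changed.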