Advanced

These options are in the Advanced subsection.

  • Start-up timeout - Specifies the number of seconds that the collection-service will wait for the indexer to start successfully. If the indexer does not start within the specified amount of time, the collection-service will kill the indexer process. This timeout exists to prevent an indexer that is unresponsive during service start-up from effectively deadlocking the collection-service, which would cause subsequent instructions to the collection-service to wait indefinitely. This behavior has only been observed on systems with environmental problems or suffering from extreme resource scarcity.
  • Shutdown when idle (indexing) - Specifies the number of seconds that must elapse after the last update, merge, or reconstruction operation completes before this option is satisfied. The service will shut down only if both this option and the searching idle-exit option are satisfied.
  • # builder threads - Specifies the number of builder threads. Each thread builds an independent index, and these indices are then merged together to create the final index.
  • # reconstructor threads - Specifies the number of reconstructor threads. Each thread collects the document data necessary for the reconstruction process. The default is the number of cores, up to a maximum of eight.
  • Database Synchronous Mode - Specifies the synchronous mode for the SQLite database that contains some document data. This option has no effect on index synchronization; it applies only to the database that contains this additional information.
    • FULL : The storage engine will pause at critical moments to make sure that data has actually been written to the disk surface before continuing. This ensures that if the operating system crashes or if there is a power failure, the database will be uncorrupted after rebooting. This option is very safe, but it is also slow.
    • NORMAL : The storage engine will still pause at the most critical moments, but less often than in FULL mode. There is a very small (though non-zero) chance that a power failure at just the wrong time could corrupt the database in NORMAL mode. In practice, however, you are more likely to suffer a catastrophic disk failure or some other unrecoverable hardware fault.
    • OFF : The storage engine continues without pausing as soon as it has handed data off to the operating system. If the indexer crashes, the data will be safe, but the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface. This option is very fast, at the cost of data integrity.
  • Temp. block size - Specifies the block size of any intermediate indices that are created. If you specify a value that is not an exact multiple of 512 bytes, the size is rounded up to the next multiple of 512 bytes. A large block size can make I/O more efficient both when creating intermediate indices and when performing the merge operation. The system will need at least this much memory for each intermediate index that is being merged.
  • Vocabulary words / Instance words - These options control the amount of memory that a thread will use while indexing. You should not need to modify these options.
  • Maximum merge - The maximum number of intermediate indices to merge in a single operation. A value of 500 should be correct for any relatively modern system. However, if you are running on a system that does not allow very many open files, you may need to reduce this number. If you have a system that allows a very large number of open files and your merging stage requires multiple smaller merges, you may choose to increase this value.
  • Arenas enabled - Arenas allow multiple collections' worth of data to coexist in a single collection. This option can only be enabled or disabled before any data is stored in the collection.
  • Fast document loading - Enables the indexer to populate its internal document data structures faster.
  • Fast reconstructor startup - Enables the indexer to start up the reconstructor and load its indices faster if the configuration has not changed and wildcard dictionaries do not need to be generated.
  • Preload database - If on, the indexer will preload its database to prime the disk cache. This could significantly improve service startup time.
  • URL equivalents - URLs can be rewritten at indexing time. This is particularly useful if you crawl a file system that is also available as a web site. Each URL is checked against all of the Replace... entries, and the longest matching prefix is selected (in case of a tie, the first of the longest matching entries is used). This prefix is replaced by the corresponding With... entry.
  • Lexical analysis logging configuration - The XML configuration for lexical analysis log4j logging. Logging is enabled when any priority value is set to something other than OFF. Enabling logging will degrade performance. By default, logging is disabled.
  • Lexical analysis JVM arguments - Space-separated list of JVM command-line arguments.
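
Three of the rules described above (the 512-byte rounding for Temp. block size, the default for # reconstructor threads, and the longest-prefix replacement for URL equivalents) can be illustrated with a short Python sketch. This is not the indexer's actual code: the function names, the rule representation as (Replace, With) pairs, and the example URLs are all hypothetical and are used here only to make the described behavior concrete.

```python
import os

BLOCK_ALIGNMENT = 512  # bytes, as stated in the Temp. block size description


def round_up_block_size(requested_bytes):
    """Round a requested block size up to the next multiple of 512 bytes."""
    if requested_bytes <= 0:
        raise ValueError("block size must be positive")
    # Ceiling division, then scale back up to a byte count.
    return -(-requested_bytes // BLOCK_ALIGNMENT) * BLOCK_ALIGNMENT


def default_reconstructor_threads(cores=None):
    """Default described above: the number of cores, up to eight."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(cores, 8))


def apply_url_equivalents(url, rules):
    """Replace the longest matching prefix of url.

    rules is a list of (replace_prefix, with_prefix) pairs in
    configuration order (a hypothetical representation of the
    Replace... / With... entries).
    """
    best = None
    for replace_prefix, with_prefix in rules:
        if url.startswith(replace_prefix):
            # Strictly longer wins, so on a tie the first entry is kept.
            if best is None or len(replace_prefix) > len(best[0]):
                best = (replace_prefix, with_prefix)
    if best is None:
        return url  # no entry matches; the URL is left unchanged
    return best[1] + url[len(best[0]):]


if __name__ == "__main__":
    # Hypothetical example entries; not taken from any real configuration.
    rules = [
        ("file:///srv/www/", "http://www.example.com/"),
        ("file:///srv/www/docs/", "http://docs.example.com/"),
    ]
    print(round_up_block_size(1000))
    print(default_reconstructor_threads(16))
    print(apply_url_equivalents("file:///srv/www/docs/a.html", rules))
```

In the URL example, both entries match the crawled file URL, but the second entry has the longer prefix, so its With... value is the one applied.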