Level: Intermediate
Sam Alexander, Tools Developer, Lotus
01 Apr 2003
This article looks at performance improvements in Discovery Server 2.0.1. It discusses tests on the Notes spider, Full Text Indexer, and Metrics, and how the server performs under a peak load of users logging in and conducting searches.
If you're familiar with Lotus Discovery
Server, you know about the many features it offers to help solve
your knowledge management needs. These features include (among
others) affinity generation between subject matter and people
experts, Sametime awareness, a fast and powerful search engine, and
a browseable Knowledge Map of your corporate content.
Impressive, but of limited use if these features are slow and
tedious. So the next obvious question is: How does Discovery Server
perform?
This article helps answer that question
by providing an overview of several significant performance
improvements in Discovery Server 2.0.1. We discuss the results of
performance tests we conducted on the Notes spider, Full Text
Indexer, and Metrics. We also take a look at how the server
performs under a peak load of users logging in and conducting
searches. Our goal is to provide yet another reason to feel
confident about deploying Discovery Server in your
organization.
The article assumes that you are
familiar with Discovery Server basics.
Notes spider performance
Notes spiders have been optimized in
Discovery Server 2.0.1. They now achieve higher document throughput
to the work queues. As you will soon see, this optimization has
also enabled higher throughput for other tasks that read from these
queues, including Metrics and the Full Text Indexer. The spiders
must process your data before the other tasks can do their work, so
their performance is crucial. When the spiders run faster, other
tasks perform faster, so data is available to Discovery Server
users more quickly.
This section examines Notes spider
performance improvements in Discovery Server 2.0.1. In our tests,
we compared Discovery Server 2.0.1 to 2.0, spidering the same
content on the same computer.
Test setup
Our hardware for this test was an IBM
Netfinity with the following specifications:
| Processor type | PIII |
| Number of processors | Four |
| Processor speed | 550 MHz |
| Memory | 2.3 GB |
| Disk (onboard) | 4/18.2 GB |
We configured Discovery Server to run
eight Notes spider threads and two File System spider threads. The
following table provides information on the data spidered for this
test:
| Notes data spidered | 1.1 GB |
| File system data | 65 MB |
| Average Notes document size | 20 KB |
| Number of Notes spiders | Eight |
| Number of File System spiders | Two |
Test results
The following graph shows our test
results:
Figure 1. Discovery Server 2.0.1 vs 2.0 Notes Spider test results
As you can see, our results indicate a
significant improvement in spider performance in Discovery Server
2.0.1. In our test, the Notes spiders processed data at a rate of
approximately 5 GB per day. This is equivalent to approximately
400,000 documents daily, a 240 percent increase in Notes
spider performance between 2.0 and 2.0.1!
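As a back-of-the-envelope check, the figures above can be turned into an implied baseline. The following Python sketch is an illustration, not part of the original tests. It assumes that a "240 percent increase" means the 2.0.1 rate is 3.4 times the 2.0 rate, and that the rate holds steady over the whole run. The function names are hypothetical.

```python
def baseline_from_increase(new_rate, percent_increase):
    """Return the old rate, given the new rate and the percent increase over the old rate."""
    return new_rate / (1 + percent_increase / 100)


def hours_to_process(data_gb, rate_gb_per_day):
    """Return the hours needed to process data_gb at a steady daily rate."""
    return data_gb / rate_gb_per_day * 24


# Figures reported for the Notes spider in Discovery Server 2.0.1
NEW_GB_PER_DAY = 5.0
NEW_DOCS_PER_DAY = 400_000
PERCENT_INCREASE = 240  # assumption: read as "new = old * 3.4"

old_gb_per_day = baseline_from_increase(NEW_GB_PER_DAY, PERCENT_INCREASE)
old_docs_per_day = baseline_from_increase(NEW_DOCS_PER_DAY, PERCENT_INCREASE)
hours_for_test_set = hours_to_process(1.1, NEW_GB_PER_DAY)  # 1.1 GB of Notes test data

print(f"Implied 2.0 rate: {old_gb_per_day:.2f} GB/day, {old_docs_per_day:,.0f} docs/day")
print(f"Time to spider 1.1 GB at the 2.0.1 rate: {hours_for_test_set:.1f} hours")
```

These derived values depend entirely on the stated reading of "240 percent increase"; the measured 2.0 numbers are the ones shown in Figure 1.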
As mentioned earlier, faster spidering
means faster performance for other Discovery Server tasks. This
includes the Full Text Indexer, Metrics, and Search, whose
performance tests we discuss in the next sections.
Full Text Indexer performance
The Notes spider populates the work
queue with data for other tasks. The Full Text Indexer monitors
this work queue for data to be added to the index. With the
increased Notes spider throughput in Discovery Server 2.0.1, the
Full Text Indexer receives data to process at a higher rate. As a
result, Full Text Indexer throughput has also improved.
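The relationship between the spider and the indexer follows the general producer-consumer pattern. The following Python sketch illustrates that pattern only; it is not Discovery Server's implementation. The function names, the document format, and the sentinel value are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker: tells the consumer no more data is coming


def spider(documents, work_queue):
    """Producer: put each spidered document on the work queue."""
    for doc in documents:
        work_queue.put(doc)
    work_queue.put(SENTINEL)


def indexer(work_queue, index):
    """Consumer: take documents off the queue and add their words to the index."""
    while True:
        doc = work_queue.get()  # blocks until the producer supplies data
        if doc is SENTINEL:
            break
        for word in doc["text"].lower().split():
            index.setdefault(word, set()).add(doc["id"])


documents = [
    {"id": 1, "text": "Discovery Server performance"},
    {"id": 2, "text": "Notes spider performance"},
]
work_queue = queue.Queue()
index = {}

consumer = threading.Thread(target=indexer, args=(work_queue, index))
producer = threading.Thread(target=spider, args=(documents, work_queue))
consumer.start()
producer.start()
producer.join()
consumer.join()

print(sorted(index["performance"]))
```

Because the consumer blocks on an empty queue, it can never run faster than the producer feeds it, which is why a faster spider can raise indexer throughput.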
Using the same hardware configuration
and test data described in the previous section, our performance
testing shows the following results:
Figure 2. Discovery Server 2.0.1 vs 2.0 Full Text Indexer test results
The results are similar to the Notes
spider tests above. The Full Text Indexer can also process
approximately 5 GB of data per day. The Indexer in this test
indexed approximately 400,000 documents daily, a 240 percent
increase in Indexer performance between 2.0 and 2.0.1.
Metrics performance
The Metrics subsystem consists of
Metrics collection and Metrics calculation. Metrics collection is
responsible for gathering usage statistics from your spidered data
sources. Metrics calculation then analyzes these
statistics.
Metrics has also directly benefited
from the improved spider performance:
Figure 3. Discovery Server 2.0.1 vs 2.0 Metrics test results
In our tests (running the same
configurations described in the previous sections), Metrics
processed 1.8 GB of data per day, or roughly 138,000 documents.
Compared to 2.0, this represents a 75 percent increase in
performance.
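To put the three daily rates side by side, the following Python sketch computes the average document size each rate implies and the Metrics rate as a share of the spider rate. It is an illustration only, and it assumes decimal units (1 GB = 1,000,000 KB). Because the published rates are rounded, the implied sizes are rough values and are not the 20 K average listed in the test data table.

```python
KB_PER_GB = 1_000_000  # assumption: decimal units

# (GB per day, documents per day) as reported for Discovery Server 2.0.1
rates = {
    "Notes spider": (5.0, 400_000),
    "Full Text Indexer": (5.0, 400_000),
    "Metrics": (1.8, 138_000),
}


def avg_doc_kb(gb_per_day, docs_per_day):
    """Return the average document size in KB implied by a daily rate."""
    return gb_per_day * KB_PER_GB / docs_per_day


avg_sizes = {task: avg_doc_kb(gb, docs) for task, (gb, docs) in rates.items()}
metrics_share = rates["Metrics"][1] / rates["Notes spider"][1]

for task, size in avg_sizes.items():
    print(f"{task}: about {size:.1f} KB per document")
print(f"Metrics handles about {metrics_share:.0%} of the spider's daily document rate")
```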
Search performance
Several enhancements and bug fixes have
been made to search performance in Discovery Server 2.0.1,
including servlet optimizations and connection pooling
optimizations. These have improved the server's ability to
handle heavy user load.
To determine the extent of these
improvements, we performed the following test.
Test setup
We configured an IBM Netfinity 7100
with four 700 MHz CPUs and 2 GB of RAM, running Windows NT 4.0 and
Discovery Server 2.0.1. We used Mercury LoadRunner to build and run
the test and to analyze transaction times. We used Perfmon to
analyze CPU utilization, memory, and other statistics.
For this test, we spidered and indexed
300,000 Notes documents with and without attachments. Although
there are several types of searches a user can perform, this test
focused on the Documents About search. We used the standard Kmap
user interface. No Windows NT optimizations were made. Other
Discovery Server tasks were idle during the search testing. The
test focused on the performance of KmapServlet, which coordinates
the search on the user's behalf.
We created the following user
scenario:
- Authenticate with Discovery Server.
- Perform five Documents About searches using a random search term from a list of 450 terms.
- Pause a random period of time (between 30 and 90 seconds) between each search. This simulates think time.
- Log out or close the browser window.
- Repeat the preceding steps.
We conducted one-hour peak workload
tests in which 250 simultaneous users performed the preceding
sequence of tasks.
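The user scenario above can be sketched for a single virtual user as follows. This is a minimal Python illustration, not the LoadRunner script used in the test. The `FakeSession` class, its method names, and the placeholder search terms are hypothetical, and the sketch pauses after every search, which is one reading of "between each search".

```python
import random

# Placeholder for the list of 450 search terms (the real terms are not given)
SEARCH_TERMS = [f"term{i}" for i in range(450)]


class FakeSession:
    """Hypothetical stand-in for a Discovery Server session; it only records calls."""

    def __init__(self):
        self.log = []

    def authenticate(self):
        self.log.append("login")

    def documents_about(self, term):
        self.log.append(f"search:{term}")

    def logout(self):
        self.log.append("logout")


def run_virtual_user(session, iterations, sleep, rng):
    """Run the scenario: log in, do five searches with think time, log out, repeat."""
    for _ in range(iterations):
        session.authenticate()
        for _ in range(5):
            session.documents_about(rng.choice(SEARCH_TERMS))
            sleep(rng.uniform(30, 90))  # think time in seconds
        session.logout()


# Record the pauses instead of sleeping so the sketch finishes immediately
pauses = []
session = FakeSession()
run_virtual_user(session, iterations=2, sleep=pauses.append, rng=random.Random(42))
print(len(session.log), "actions,", len(pauses), "pauses")
```

Passing `time.sleep` as the `sleep` argument would make the pauses real. Running 250 such users at once would need threads or separate processes, which a load testing tool handles for you.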
Results
The following table summarizes the
results of a one-hour, 250-user peak workload test:
| Concurrent users | Documents in index | Total searches per hour | Average response time (sec) | CPU usage (percent) | http CPU usage (percent) | ncmserve CPU usage (percent) |
|---|---|---|---|---|---|---|
| 250 | 300,000 | 5,000 | 2.6 | 47 | 29 | 15 |
The table shows that the server sustained 250
concurrent users in this test. This is a significant improvement
over 2.0, which sometimes experienced stability issues above 100
concurrent users in this testing scenario. At this workload, we
measured a 2.6 second average response time for a Documents About
search against an index of 300,000 documents. CPU utilization
remained healthy at 47 percent. In short, 2.0.1 can support more
concurrent users than 2.0 in this test scenario.
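The table figures also allow a rough estimate of how many searches the server was handling at any instant. The following Python sketch applies Little's law (items in a system = arrival rate × time in the system). Little's law is brought in here for illustration and is not part of the original analysis. The sketch assumes the load was spread evenly over the hour.

```python
# Figures from the results table
CONCURRENT_USERS = 250
SEARCHES_PER_HOUR = 5_000
AVG_RESPONSE_SEC = 2.6

searches_per_sec = SEARCHES_PER_HOUR / 3600
searches_per_user_per_hour = SEARCHES_PER_HOUR / CONCURRENT_USERS

# Little's law: average searches in progress = arrival rate * response time
searches_in_flight = searches_per_sec * AVG_RESPONSE_SEC

print(f"Throughput: {searches_per_sec:.2f} searches/sec")
print(f"Per user: {searches_per_user_per_hour:.0f} searches/hour")
print(f"Average searches in progress: {searches_in_flight:.1f}")
```

Under these assumptions, only a handful of the 250 logged-in users have a search in progress at any moment; the rest are in think time or logging in and out.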
Two significant Discovery Server
processes involved in search are http and ncmserve. The http
process is the Domino Web server; it consumed 29 percent of CPU.
Used by most Discovery Server tasks, ncmserve manages reading and
writing to the DB2 database server. It is also responsible for ACL
filtering: if a user doesn't have the ACL privileges to see
a document, that document does not appear in the results. The
ncmserve process used 15 percent of CPU in this test.
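The idea behind ACL filtering of search results can be shown in a few lines. The following Python sketch is a simplified illustration and does not reflect how ncmserve works internally. The function name and the data layout are hypothetical, and a document with no ACL entry is assumed to be hidden from everyone.

```python
def filter_by_acl(results, user, acl):
    """Keep only the documents whose ACL lists the user as a reader."""
    return [doc_id for doc_id in results if user in acl.get(doc_id, set())]


# Hypothetical ACL data: document ID -> set of users allowed to read it
acl = {
    "doc1": {"alice", "bob"},
    "doc2": {"bob"},
    "doc3": {"alice"},
}
raw_results = ["doc1", "doc2", "doc3", "doc4"]  # doc4 has no ACL entry

visible = filter_by_acl(raw_results, "alice", acl)
print(visible)  # ['doc1', 'doc3']
```

Real Notes ACLs also involve groups, roles, and access levels, which this sketch leaves out.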
It is important to note that
performance can be further improved with alternative
configurations. Machine specifications, the number of documents in
the index, user behavior, and other factors can affect
performance. It is also possible
to achieve higher throughput by off-loading tasks to other
machines.
Discovery Server 2.0.1: It's all about performance!
A lot of work has been done in
Discovery Server 2.0.1 to increase performance. We've shown
you significant performance increases in tasks including the
Notes spider, Full Text Indexer, and Metrics. We've also
shown samples of data throughput, and we've given you an idea
of search performance under peak load.
Performance is a consideration for all
applications in your organization. With this information, you
should feel confident about the performance of Discovery Server
2.0.1.
About the author
Sam Alexander works for the IBM Lotus
Product Introduction Engineering team. As a tools developer, he
helps develop software tools and recommends methodology used in
performance testing, data collection, and data analysis. Working
with the Discovery Server Performance Team, he recently developed
performance tests and analyzed results for the Lotus Discovery
Server 2.0 and 2.0.1 search functionality. Outside of work, Sam is
earning a master's degree in Computer Science from Boston
University's Metropolitan College. Originally from North Carolina,
Sam enjoys exploring New England and running local 5K and 10K road
races.