Best practices for large Lotus Notes mail files
Many companies have found that their employees' Lotus Notes mail files continue to grow, and are looking for advice in controlling the performance costs associated with these larger mail files. As a result, IBM conducted studies of the characteristics of large and active mail files. These mail files contained large numbers of documents, and documents with attachments. We also looked at the impact related to frequency of searching, full-text indexing, and whether mail documents are filed or kept in the Inbox.
To understand how you can make the most efficient use of your Lotus Domino investment, we studied the characteristics of Domino running with large mail files under a wide variety of conditions. We tried different hardware models, different operating systems, and multiple versions of Domino (including Domino 5, Domino 6, and beta versions of Domino 7). This article discusses our results. We show how large mail files can affect server performance, and offer tips for reducing database and Inbox size, the cost of full-text indexes, and other ideas for users, administrators, and planners.
Several recommendations in this article resulted in CPU improvements of over 30 percent when implemented in our test environment. Bear in mind, however, that these tests were based on simulated user workloads. You may see different results in your own environment. Although this article focuses on mail databases, most of our recommendations apply to other databases as well. Our benchmark testing was performed on various iSeries servers, but most of the findings can be applied to other platforms as well.
This article assumes you are familiar with Notes and Domino.
We used multiple iSeries server models for the tests described in this article. These servers included 830-2349 (8-way 540 MHz), 840-2461 (8 processors enabled), 840-2461 (24 processors enabled), and i5 POWER5 520 (two-way) and 550 (four-way) models. We also tested various Domino versions, including Domino 5, 6, and 7.0 (beta) code. The recommendations included at the end of this article apply to all of these Domino releases.
Load was generated by up to 16 PC clients running Server.Load. This free tool is included with the Domino Administrator client. We also used the NotesBench tool for some of the testing. The workload we selected was the built-in R6 Mail Routing script. This script is based on usage patterns of mail users from many Domino customers. Each PC simulated 500 to 1000 users. Server.Load reported average response times experienced by the simulated users. In all cases, load on the server was such that response times never exceeded one second, and were generally much less (the heaviest user test with 2 GB mail files and everything in the Inbox had response times of 750 msec; the lightest users had 11 msec response times).
For most tests, one new user started every second in groups of 1000. It took 17 minutes for all 1000 users to log on. We then took a 13 minute "break" before starting the next group of 1000. Actual usage patterns (for instance, Monday morning spikes) may be even more aggressive. After the ramp-up period, each workload was maintained at "steady state" for at least two hours.
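The ramp-up timing described above can be checked with a little arithmetic. The following Python sketch is illustrative only: the function name and its parameters are assumptions made for this example and are not part of Server.Load. It shows that 1000 users starting at one per second take about 16.7 minutes (rounded to 17 in the text), and where the next group would begin after a 13 minute break.

```python
def ramp_up_schedule(groups, users_per_group=1000, users_per_second=1, break_minutes=13):
    """Return (start_minute, end_minute) of the logon window for each group."""
    logon_minutes = users_per_group / users_per_second / 60
    schedule = []
    start = 0.0
    for _ in range(groups):
        end = start + logon_minutes
        schedule.append((round(start, 1), round(end, 1)))
        # The next group starts after the break that follows this logon window.
        start = end + break_minutes
    return schedule


print(ramp_up_schedule(2))  # [(0.0, 16.7), (29.7, 46.3)]
```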
Before each run, new copies of all the mail files were created in order to minimize run variability. For most of the tests, mail files were populated with documents that were all 25 KB in size. This was based on studying IBM internal users, as well as a variety of customers. In comparison, databases of the same size with fewer, larger documents required much less CPU (more on this later in this article).
We created mail files of different sizes using Server.Load and the built-in NRPC Mail Initialization workload. As mentioned, most of the testing was done with mail files built of 25 KB documents, varying the number of documents from 100 to 50,000, as described later in this article. Some tests used mail files that were all the same size, such as 700 MB, but were built with varying numbers of documents in the Inbox. Other tests compared various sizes of mail files, such as 400 MB versus 1 GB, while keeping the number of documents included in the mail file constant.
System performance data was monitored using iSeries Performance Tools (5722PT1), which monitors utilization for system resources including CPU, disk, memory, and network.
This section discusses the results of our testing. When reviewing this data, be sure to observe the mail file characteristics being presented, including document sizes, number of documents, size of the mail file, and what percentage of the documents were in the Inbox.
Mail file size
First, let's look at the performance effect of mail file size. We built successively larger mail files by increasing the number of documents. As the number of documents increased, mail file size increased from 20 MB to 2000 MB (see figure 1).
Figure 1. Mail file size
As figure 1 shows, CPU usage increase vs. mail file size was nearly linear (both steady-state CPU percent and peak CPU percent). Mail files were all built of 25 KB documents.
Based on this test, we concluded that large mail files require more memory, especially when starting user sessions and during failover. Reducing mail file size improves performance of view indexing, compact, backup and restore, and transaction logging. To measure this effect, we tested various mail file sizes ranging from 15 MB up to 2 GB. We did not observe a sharp increase in steady-state CPU as file sizes got larger, at least up to 2 GB mail files. The peak CPU seen during ramp-up increased at a faster rate than steady-state CPU, but again we saw no sharp increase or "knee" in the curve. Ramp-up simulates many users signing on during a short time period, for example on a Monday morning or during a failover. In figure 1, for example, increasing average mail file size from 500 MB to 1000 MB increased CPU by 12 percent at steady state, but by 55 percent during ramp-up. These tests were run with all documents in the Inbox.
Managing documents in the Inbox
In our next test, we started up 2000 users over a period of 80 minutes, followed by a steady-state period of 90 minutes. In one test, documents were left in the Inbox, while in the second test, we limited the number of documents in the Inbox to 25 percent of the total document count in the mail file. We then compared the CPU usage of both groups over time (see figure 2).
Figure 2. Timeline of leaving documents in Inbox
We found that when Inboxes were limited to only 25 percent of total document count, peak CPU usage was 50 percent lower, and steady-state CPU was 12 percent lower, compared to allowing all new documents to remain in the Inbox.
We then performed a similar test in which we compared CPU usage between one group of users with 1000 documents in each user's Inbox, and another group with 5000 documents in each user's Inbox. Each group consisted of 5000 users; for each user the index and All Documents view were built. Figure 3 shows the results of this test.
Figure 3. Number of documents in Inbox
In figure 3, one pair of groups used 400 MB mail files, and the other pair used 1000 MB mail files. In each case, the CPU savings for managing the Inbox exceeded 5 percent. Now let's compare the peak CPU usage for these same two pairs of groups (see figure 4).
Figure 4. Peak CPU usage
As figure 4 shows, keeping the number of documents in the Inbox to a maximum of 1000 reduced peak CPU usage significantly for both pairs of groups (50 percent compared to 72 percent for 400 MB mail files, and 62 percent compared to 75 percent for 1000 MB mail files). This performance benefit from reduced Inbox size was even greater for peak CPU than steady-state CPU.
Finally, let's look at the effect of managing the number of documents in the Inbox on response time (see figure 5).
Figure 5. Response time
Keeping the Inbox at 1000 or fewer documents resulted in dramatically faster response times to end users -- 23 msec compared to 37 msec for 400 MB mail files, and 24 msec compared to 40 msec for 1000 MB mail files.
From these tests, we found that reducing the number of documents in the Inbox improves response time to users, and reduces resources needed on the server (CPU, memory, disk I/O, and time to start or failover a server). The effect is most dramatic at "first touch." Reducing the Inbox by 75 percent virtually eliminated the ramp-up spike. In figure 2, for example, peak CPU dropped 50 percent, while steady-state CPU improved 12 to 30 percent. And as figures 3, 4, and 5 show, average response times for a 400 MB mail file improved from 37 msec to 23 msec when the Inbox was kept at 1000 documents or fewer.
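To put the peak CPU and response time figures quoted above on a common footing, the relative improvement can be computed directly from the reported numbers. This is a minimal Python sketch; the helper name is an assumption for illustration, and the input values are only those stated in the text for figures 4 and 5.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# (label, value with 5000-document Inbox, value with 1000-document Inbox)
measurements = [
    ("Peak CPU percent, 400 MB mail files", 72, 50),
    ("Peak CPU percent, 1000 MB mail files", 75, 62),
    ("Response time in msec, 400 MB mail files", 37, 23),
    ("Response time in msec, 1000 MB mail files", 40, 24),
]

for label, large_inbox, small_inbox in measurements:
    print(f"{label}: {relative_reduction(large_inbox, small_inbox):.1f}% lower")
```

For the 400 MB mail files, this works out to roughly 31 percent lower peak CPU and roughly 38 percent lower response time with the smaller Inbox.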
Fixed and mixed mail file sizes
We also compared results with a test where mail file sizes were mixed to simulate typical customer data, but found that the results were not different enough to justify the extra difficulty in setting up each test. While ramp-up peaks were sharper with mixed mail file sizes, steady-state CPU was almost identical (see figure 6).
Figure 6. Fixed and mixed mail file sizes
Our mix included mail files ranging from 26 MB to 2000 MB, with a weighted average size of 700 MB. The ramp-up period for mixed mail file sizes had slightly sharper peaks, but steady state was essentially the same.
Index and search
Next, we measured the effects of full-text index and search on CPU usage. Search workload consisted of one search of the Inbox every 15 minutes for each user. Figure 7 shows the results of our test.
Figure 7. Full-text index and search
As you can see, there is little CPU penalty to maintaining the index (1 percent in our test case), but a heavy search cost of 20 percent CPU usage.
Based on this test, we concluded that if databases are searched regularly, it is generally better to create a full-text index. Maintaining a full-text index costs little in terms of performance, and there are many options available to minimize impact. If a database is not indexed, the server will create one "on the fly" when needed, which can be very expensive. This can be disabled by setting FT_FLY_INDEX_OFF=1 in the Notes.ini file.
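If you want to confirm whether a server already has this setting, a small script can scan the text of its Notes.ini file. The Python sketch below is an assumption-laden illustration, not an IBM tool: the function name and the sample file contents (including the Directory path) are hypothetical, and treating the setting name as case-insensitive is our assumption. Only the FT_FLY_INDEX_OFF=1 setting itself comes from the article.

```python
def fly_indexing_disabled(ini_text):
    """Return True if FT_FLY_INDEX_OFF=1 appears in the given Notes.ini text."""
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, section headers such as [Notes], and lines without a value.
        if not line or line.startswith("[") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip().upper() == "FT_FLY_INDEX_OFF":
            return value.strip() == "1"
    return False


# Hypothetical Notes.ini contents, for illustration only.
sample = """[Notes]
Directory=/domino/data
FT_FLY_INDEX_OFF=1
"""
print(fly_indexing_disabled(sample))  # True
```

In practice you would read the text from the server's actual Notes.ini file, whose location depends on your installation.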
Document size and count
We tested the impact of mail document size and count on CPU usage. In this test, each group of 2000 simulated users had mail files 700 MB in size. The difference was that users in one group had mail files containing 28,000 documents of 25 KB each, while the other group had mail files containing 7000 documents of 100 KB each. Our results are displayed in figure 8.
Figure 8. Document size and count
We found that mail files of the same size can use significantly different resources, depending on the average size and number of documents. In our test, the group with fewer documents in their mail files consumed considerably less CPU resources than the group with 28,000 mail documents.
This test indicates that the number of documents in a mail file, and especially the number of documents in the Inbox, has greater impact on performance than the file size itself. Of course, these two variables are typically related in practice. Nonetheless, for a given mail file size, we observed that users with many small documents use more resources than those with fewer large documents.
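The two configurations in this test follow from simple division of mail file size by document size. The Python sketch below reproduces the document counts, assuming 1 MB = 1000 KB (which is what the article's figures imply) and ignoring any database overhead such as views and design elements. The function name is hypothetical.

```python
def document_count(mail_file_mb, document_kb):
    """Approximate number of documents that fill a mail file of the given size."""
    return mail_file_mb * 1000 // document_kb


print(document_count(700, 25))   # 28000
print(document_count(700, 100))  # 7000
```

Both groups occupy the same 700 MB, yet one holds four times as many documents, which is the variable our results point to as the main driver of CPU cost.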
Finally, we compared CPU usage required for various numbers of Domino partitions with a fixed number of users. The following table summarizes our results:
| Total number of mail users | Number of Domino partitions | CPU usage percent (steady state) | Memory (faults per second / pages per second) | Average response time (milliseconds) | Average disk usage percent |
The performance information shown in the preceding table was collected using a beta version of Domino 7, and is subject to change. Performance enhancements in Domino 7 enable a single Domino partition to support higher numbers of users than was possible with Domino 6. The data represents average utilization and response times during 5-hour steady state intervals. Note that with a single Domino partition, the CPU usage is noticeably lower than with two or four partitions. The memory and disk statistics also reflect the fact that less memory was required with fewer Domino partitions. The slight difference in response time of 1 msec would not be perceivable, and is likely due to measurement variability. The fact that the response time for the single partition was not lower than the other tests, even though the memory and disk statistics were improved, may suggest that some additional contention has been introduced with that excessive number of users in a single partition.
Each new server requires additional system resources. Mail delivery is likely to take longer because it is more likely that messages are now on a different server. Domino 7 has further improved scalability on all platforms, allowing you to support more users on each Domino server. For our tests, consolidating users from four servers onto a single server resulted in a 10 percent CPU improvement overall, and a reduced memory footprint.
Based on our test results, we can offer several recommendations for improved Domino mail performance. One of the easiest ways of improving Domino performance is to manage the Inbox. Our tests show that reducing your Inbox size (filing documents into a folder within the same mail file) improves response times to the end users, steady-state CPU on the server, and especially peak CPU. Peak CPU in this case is associated with users opening their mail for the first time (following session timeout). So you should advise your users to file documents from their Inbox to other folders, to keep it as small as possible (preferably under 1000 documents). This will result in better response times, as well as improved resource utilization. You can also write an agent to automatically "trim" Inboxes. For example, this sidefile contains the code for a sample agent that removes the oldest documents from your Inbox (although they will still be available from the All Documents view). The agent can be set up to run periodically outside of business hours, such as on the weekend.
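The selection logic behind such a trimming agent is straightforward: sort the Inbox by date and pick everything beyond the newest N documents. The Python sketch below illustrates only that logic under stated assumptions. It is not the sample agent from the sidefile, the function name and the tuple layout are hypothetical, and a real agent would run against the Notes object model and remove the selected documents from the Inbox folder rather than delete them.

```python
from datetime import date


def documents_to_trim(inbox, keep=1000):
    """Return IDs of the oldest documents beyond the newest `keep` entries.

    `inbox` is a list of (document_id, delivered_date) tuples.
    """
    newest_first = sorted(inbox, key=lambda entry: entry[1], reverse=True)
    return [doc_id for doc_id, _ in newest_first[keep:]]


# Tiny illustrative Inbox; keep=2 stands in for the suggested limit of 1000.
inbox = [("A", date(2005, 1, 10)), ("B", date(2005, 3, 2)), ("C", date(2004, 12, 1))]
print(documents_to_trim(inbox, keep=2))  # ['C']
```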
In addition to Inbox maintenance, we suggest the following tips for Notes users, Domino administrators, and system planners.
Notes mail users
These recommendations apply to each Notes mail user:
- Archiving. Notes and Domino include client-based and server-based archiving capabilities. Archiving can be set up to run automatically or manually; Notes and Domino handle the rest. Older documents will be moved to the archive database, so important information will not be lost. Check with your administrators to see if other archiving solutions are available.
- Client currency. Upgrading to the latest available version of Notes supported by your Domino server will generally give improved performance. Often, performance improvements are made in the newer Domino releases, which can only be fully leveraged by accompanying changes in the client code.
- Use the right tools. Consider whether or not other document management solutions (such as QuickPlace or Domino Document Manager) might help minimize multiple copies of the same information, and facilitate access to the information to a wider audience.
There are many options available to help administrators manage email bloat:
- Mail file quotas. You can control mail file size by setting quotas on databases. A quota is the largest size you will allow a database to grow. You can establish different quotas for different categories of users. For example, researchers might require larger quotas than sales staff. It is very easy to set database quotas from the Domino Administration client (choose Tools - Database - Quotas). You will need to establish a regular compact schedule to achieve the benefit from quotas and other database management techniques.
- Memory tuning. Large mail files use more memory. Memory bottlenecks can cause increased CPU, disk I/O, and slower response times. For our testing on iSeries, we were able to dramatically improve Domino response times by making sure the machine pool was large enough to keep page faulting under five faults per second. This is especially significant, since the default threshold for OS/400 auto performance adjuster is 10 faults per second in the machine pool. Reducing the memory faulting in the Base pool also has a significant benefit. Use other memory tuning mechanisms as applicable to your operating systems.
- Domino server currency. Keep current with Domino releases, as performance improvements are generally included with each release. You should also consider the extent to which new features will be used and allow for any additional overhead associated with them. Check the Notes/Domino product page for information explaining the performance and other benefits of each Domino release.
- Client currency. As mentioned previously, keeping users current with Notes client versions can result in significant performance benefits. IBM Lotus Notes Smart Upgrade makes this easy to administer.
- Full-text indexes. Create full-text indexes on databases that are searched regularly. Maintaining an index is almost always much less expensive than creating an index "on the fly" (see figure 7 for an example).
- Users per server. On iSeries servers, it is common practice to run multiple Domino partitions (DPARS) within a single instance of the operating system or Logical Partition (LPAR). Generally, the number of users on a given server is determined by the backup strategy and other administrative factors, such as physical location of the users, storage requirements, time to complete maintenance tasks, and failure tolerance. In terms of performance and utilization of resources, it is usually better to consolidate onto as few DPARS as possible.
- Archiving. Administrators can archive databases by running the compact task with the archiving option (-a or -A). There are a number of products available that provide advanced archiving features for your Notes/Domino data.
- End-to-end monitoring. Effective monitoring of your Domino environment should include not only statistics of individual systems and Domino servers, but also performance of networking components and storage subsystems (for example, SAN components if applicable).
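As an illustration of the mail file quotas tip above, the following Python sketch shows how per-category quotas might be compared against current mail file sizes when planning a quota policy. Everything in it is hypothetical: the category names, quota values, user names, and sizes are made up for the example. Domino enforces quotas itself, so a script like this would only be useful for reporting before quotas are applied.

```python
# Hypothetical quotas in MB per user category (illustrative values only).
QUOTAS_MB = {"research": 1000, "sales": 400}


def over_quota(mail_files, quotas=QUOTAS_MB):
    """Return (user, size_mb, quota_mb) for each mail file above its category quota."""
    return [
        (user, size_mb, quotas[category])
        for user, category, size_mb in mail_files
        if size_mb > quotas[category]
    ]


# Each entry is (user, category, current mail file size in MB); all values are made up.
mail_files = [("alice", "research", 850), ("bob", "sales", 520), ("carol", "sales", 390)]
print(over_quota(mail_files))  # [('bob', 520, 400)]
```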
Sizing large-scale Domino environments can be a complex process. The performance impact of various workload characteristics and their interrelationships is amplified in such environments. Based on our study, we recommend that customers seek assistance from IBM/Lotus to help plan and size Domino iSeries environments that have one or more of the following characteristics:
- 5,000 or more mail users
- Mail file sizes of 500 MB or more
- Implementations involving 10 or more Domino partitions or servers
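The three criteria above can be expressed as a simple check. This Python sketch is only an illustration: the function name and parameters are assumptions, and using the average mail file size as the test for the second criterion is our simplification of "mail file sizes of 500 MB or more."

```python
def sizing_flags(mail_users, avg_mail_file_mb, partitions_or_servers):
    """Return the characteristics from the list above that an environment meets."""
    flags = []
    if mail_users >= 5000:
        flags.append("5,000 or more mail users")
    if avg_mail_file_mb >= 500:
        flags.append("mail file sizes of 500 MB or more")
    if partitions_or_servers >= 10:
        flags.append("10 or more Domino partitions or servers")
    return flags


print(sizing_flags(mail_users=6000, avg_mail_file_mb=300, partitions_or_servers=4))
# ['5,000 or more mail users']
```

A non-empty result suggests the environment falls into the category where sizing assistance is recommended.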
When sizing, make sure you take into consideration the extra cost of large databases and frequency of full-text searches. Use multiple tools when sizing; for example, Tivoli Analyzer for Domino, NotesBench data, existing server performance, IBM eServer Workload Estimator, and Domino Mail & Calendar User (MCU) rating metrics for all iSeries server models found in the iSeries Performance Capabilities Reference.
We hope you found the test data and recommendations offered in this article useful. Mail is, of course, one of the most important and heavily used features of Notes and Domino. Ensuring that your mail users enjoy good performance goes a long way to keeping everybody in your Notes community happy -- and makes your life easier!
- See the article series on Lotus Domino 7 performance, "Lotus Domino 7 server performance, Part 1: Lotus Notes client workloads," and "Lotus Domino 7 server performance, Part 2: Lotus Domino 7 performance in production at IBM on pSeries servers," to see how Domino 7 performance stacks up against Domino 6.5.
- For information on tuning Domino on the iSeries platform, see the IBM Redbooks, "Domino 6 for iSeries Best Practices Guide" and "Domino for iSeries Sizing and Performance Tuning."
- For more information about all the features in Lotus Notes 7, see the developerWorks: Lotus article, "New features in Lotus Notes and Domino Designer 7.0."
- And for more information about new features in Lotus Domino 7, see the article, "New features in Lotus Domino 7.0."
- See the Lotus Notes/Domino product page for more information about the Lotus Notes and Domino products.