There is already a good deal of documentation that deals with general performance, but it's predicated largely on the underlying assumption that Lotus Domino performance is based solely on the number of transactions. Experience has indicated that the challenges associated with performance have shifted to a different type of usage that does not necessarily follow the traditional assumptions.
One of the main problem areas is that the size and amount of mail being transferred and stored are both much greater than what was once considered the norm. In many environments, Lotus Domino has become the focus for distribution, document management, and storage in addition to the role of traditional messaging. In essence, the complexity of the Lotus Domino environment has increased to keep up with the needs of our customers. As a result, mail files are bigger now than ever, and we can safely assume that they will continue to grow.
Some customers are struggling with overall growth of mail files as they try to determine what to do with the data and how the growth is affecting performance. There might be a single user (or group of users) with a 10 GB mail file, or there can be a large number of large mail files, with each scenario driven by the needs of the business. There is limited information about the effects this growth can have on system performance, so administrators can take remedial actions without a good understanding of how performance will be affected.
Some of the common questions that arise are these:
- What do larger mail file sizes mean in terms of current and optimal tuning configuration?
- Is it true that a couple of large mail files can dramatically affect performance for the other users?
- How do larger mail file sizes affect system resources?
This article explores these and other questions by testing various configurations and examining how they affect overall performance.
Methodology and background
We test many of the standard methods of dealing with performance issues to demonstrate the benefit that such configurations can be expected to yield. Although the testing addresses many scenarios, it's important to note that the parameters in the environment are rigid and for the most part uniform. You must use caution when considering a direct translation to any particular environment or actual production load.
NOTE: You should not expect changes in tuning to yield dramatic increases in performance. The essence of performance tuning is to obtain greater efficiency of resources. This tuning is an iterative process in which large gains are generally obvious and most easily found; the more difficult and complex tuning tends to yield less value. First and foremost, any gains are constrained by the ability of the underlying resources to provide the capacity for more work. The goal here is not to provide a definitive guideline but rather to introduce a new perspective on scenarios that are becoming increasingly common.
Hardware and software
Testing is done on a pSeries® 630 with AIX® 5.3 and four PowerPC®_POWER4™ processors running at 1453 MHz. The system has 8 GB of RAM. The data and Domino directories were tested on 20 SSA160 physical disk drives, each 18 GB, attached to an IBM SSA® 160 SerialRAID adapter (4109100) (see table 1).
Table 1. Hardware specifications
|CPU||4 x PowerPC_POWER4 @ 1453 MHz|
|Ethernet Adapter||10/100 Mbps Ethernet PCI Adapter II|
|Storage||20 x SSA160 physical disk drives 10000 rpm RAID 0 / maximum interdisk policy with parallel scheduling|
Table 2 outlines the software details of our testing.
Table 2. Software specifications
|Lotus Domino 32-bit||8.0|
|Lotus Domino 64-bit||8.0.1|
Goals and approach
The goal of the testing is to measure the effects of various configuration changes as we stress the server with a high concentration of concurrent activity.
Some basic test loads are run to get an idea of the effect of a more traditional load on a server as opposed to a load with larger databases. After that load, various tuning changes are implemented individually to see whether some of those changes could make a difference in performance.
Some of the areas of interest, besides changes in input/output (I/O) characteristics, are the effects of the following:
- The number of documents
- Transaction logging
- Multiple Lotus Domino servers processing the same load
- Document size
- Changes to physical memory configuration and utilization
Server.Load is implemented to generate the loads to the server. A customized script based on the built-in Notes® Remote Procedure Call (NRPC) routines is used to mimic mail user activity, the categories of which are:
- Opening the database
- Opening and updating a view or folder
- Opening, creating, deleting, and sending email and calendar entries
- Scheduling a calendar activity
NOTE: The test loads that come with Server.Load are designed to approximate "normal" usage patterns. Our testing was done using scripts specifically designed to overwhelm the system resources. The values in this test do not represent normal usage; instead, they reflect usage as if each of the users were trying to affect the server as much as they could.
Table 3 shows a sequential performance measure of the I/O subsystem as tested using the dd command. The dd command is a UNIX® command that lets you manipulate the I/O stream using file copy type of operations. This command does not mirror actual usage on a Lotus Domino server because the I/O pattern for a Lotus Domino database is historically expected to be multi-threaded and random. It does, however, provide an idea of base performance of the underlying hardware. The same test was run using a RAM disk file system to show the difference in speed.
Table 3. I/O subsystem test results
|RAM disk||242 MB/sec|
|JFS2||20 MB/sec write and 39 MB/sec read|
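The kind of sequential measurement summarized in table 3 can be approximated without dd. The following is a minimal Python sketch, not the procedure used in these tests: the function name, the 1 MB block size, and the default file size are all assumptions made for illustration. It writes a temporary file sequentially, forces it to the device, reads it back, and reports MB per second for each pass.

```python
import os
import tempfile
import time

BLOCK_SIZE = 1024 * 1024  # 1 MB per write/read call (an assumed block size)


def sequential_throughput(directory, total_mb=64):
    """Return (write_mb_per_sec, read_mb_per_sec) for one sequential pass.

    Hypothetical helper for illustration; it is not part of Lotus Domino,
    Server.Load, or AIX.
    """
    block = b"\0" * BLOCK_SIZE
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Sequential write, flushed and synced so the timing includes the I/O.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_secs = max(time.perf_counter() - start, 1e-9)

        # Sequential read of the same file. The operating system may serve
        # this from its file cache, so the read figure can be optimistic.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(BLOCK_SIZE):
                pass
        read_secs = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return total_mb / write_secs, total_mb / read_secs
```

Calling `sequential_throughput("/some/filesystem")` once against a disk-backed directory and once against a RAM-backed one would give a rough comparison of the same kind as table 3. As with dd, a single-threaded sequential pass does not mirror the multi-threaded, random pattern expected of a Lotus Domino database.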
Our two areas of concern are the response time and resource usage; hence, our final goals are to maximize the resources available and decrease the response time to users.
The two operations that consistently consumed almost all the transaction time in our tests were the UPDATE_NOTE and Notes Indexing Facility (NIF) operations (OPEN_COLLECTION, UPDATE_COLLECTION, and so on). The transaction times for these operations made up the bulk of total transaction time during all our tests.
Not surprisingly, we see that transaction times go up with the number of users, while transactions processed per minute go down. In essence, more users means more activity, which translates into slower performance (see figure 1).
Figure 1. Transaction times and numbers of users
Current documentation and experience indicate that mail performance tends to depend on the factors that affect inbox operation, which makes sense as this is the most commonly used NIF (aka views/folders) structure. Resource utilization and operation time are especially sensitive when the NIF structures are larger.
Figure 2 shows the results of the performance impact of 250 users with varying numbers of documents. These results fall in line with historical findings. Although it might not be surprising, the key is to recognize that the underlying size of the database, or even the view, can and will increase the amount of resources necessary to process a transaction.
Figure 2. Performance and number of documents
You can compare this result to the scenario in which the actual number of transactions or users increases. Interestingly, when we look at just the number of transactions per minute, increasing the number of documents by 1000 percent had a greater negative effect than increasing the number of users by 1000 percent.
This is a broad comparison, but it does provide a sense that the number of documents affects performance in the same manner as the number of users. Both have a negative effect on time and resources.
Here are some factors that were found to not have a significant performance effect:
- The number of documents not in the inbox (for example, in other folders or the All Documents view)
- The size of the documents in storage
- To a lesser degree, the size of the documents being used
NOTE: The size of the document being used for testing did not generate significant differences in average response times, but we did see greater variance in results. The lack of a significant effect could be explained by a lack of concurrency in the test. The results raise the possibility that the large-document effect on performance is measured more accurately by testing concurrency rather than database size.
Tests were rerun with more than 500,000 documents (up to 250 KB in size) in the database. These documents were added to the databases that we used in the earlier tests, in the All Documents view but not in the Inbox view. Results of the tests did not show a significant difference in performance. This result appears to indicate that during "normal" user activity (that is, the operations in the test script), the size or number of documents in the database did not affect performance meaningfully, as long as the NIF structures remained the same size.
The configurations below were chosen to see how they would affect large inboxes:
- Different-sized inboxes. We used this configuration as a baseline to show how the size of an inbox could affect performance as we got closer to the architectural limit.
Results indicated an exponential growth pattern; that is, as the number of documents grew to 80,000, the increase in response time grew exponentially.
- 64-bit Domino. 64-bit architecture is often touted as having significant performance advantages over 32-bit. Because the AIX operating system was already running a 64-bit kernel, though, this configuration was a test to see whether just the change in Lotus Domino code to 64-bit would make a difference.
Our findings showed that, although overall response time increased and OPEN_COLLECTION performance declined, there was a slight improvement in UPDATE_NOTE times.
- Multiple Lotus Domino servers. Lotus Domino scales well but also does a good job of using system resources, notably, memory. The goal of having two Lotus Domino servers was to test whether performance using the same hardware resources changes when you decrease the load on each Lotus Domino server. Theoretically, this configuration would test whether having additional Lotus Domino servers allows system resources to be used more efficiently.
This efficiency did, in fact, appear to be the case as OPEN_COLLECTION times were lower, while UPDATE_NOTE times did not appear to suffer.
- j2_nBufferPerPagerDevice. This configuration is an AIX file-system-level setting that adjusts the amount of memory used for file-system buffer. It can sometimes be an area that needs to be tuned if there is enough I/O to overwhelm the buffer. Although testing did not indicate much need to adjust this setting, it was increased from the default of 512 K to determine if there would be a negative effect from setting it to a relatively high value.
This configuration yielded a marginal performance improvement; however, there did not appear to be any negative consequences to setting the value at four times the default.
- j2_minPageReadAhead. This configuration is another AIX file-system setting. It controls how many pages the system reads ahead for any read operation. The goal is to consolidate the number of read operations that the system must do. If we see significant improvement in performance due to this setting, it tells us that I/O was more efficient when more data was read at the same time. This result generally points to a more sequential read I/O pattern.
It was found that OPEN_COLLECTION performance improved significantly and continued to obtain gains from higher read-ahead values, though there appeared to be a cost to the UPDATE_NOTE times. UPDATE_NOTE operations did suffer some additional latency but, interestingly enough, this pattern did not increase with higher read-ahead values.
- Transaction logging. Transaction logging writes all transactions out to a separate group of files. This configuration is especially useful for database integrity and server restarts after an outage. Because of this behavior, the server can be less dependent on the performance of the disks that house the databases because transactions can be flushed independently of when the activity occurs. This test should measure how the feature handles the load caused by large databases.
The overall effect appeared to be only slightly positive, though it should be noted that the test conditions were not conducive to a high number of concurrent transactions. Because of the nature of what transaction logging does, you might expect that low concurrency of the test scenario would yield only a limited benefit.
- Two people using the All Documents view. Every production environment can be expected to have someone using the system in a way that could be exaggerated or otherwise harmful. In this test, two users had their databases populated with 500,000 documents and they worked out of that view. The goal here was to document the additive impact of this type of usage.
Interestingly, over three separate runs, the results pointed to only a slight increase in NIF operation times (less than 5 percent). The net effect was that Lotus Domino was able to handle minor amounts of abnormal usage without a major effect on the rest of the users.
- Buffer pool manipulation. Buffer pool usually makes up the largest percentage of shared memory and is often pointed to as an area to be tuned. These tests documented the effect of increasing the values from the default of 512 MB.
The results showed limited benefit here, possibly due to the relatively low number of concurrent transactions in the test. During the test run, the actual amount of buffer pool used did not exceed 1 GB, even though the amount that the server could use was significantly above that. This finding tends to reinforce the notion that buffer pool was not much of a factor here but could be a factor in an environment with a much higher number of concurrent transactions.
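The inbox-size finding in the list above describes response time growing exponentially with the number of documents. One way to check whether a set of measurements follows that pattern is to fit a straight line to the logarithm of the response times. The sketch below is illustrative only: the function name is hypothetical, and the data points are synthetic values generated from a known exponential, not measurements from these tests.

```python
import math


def fit_exponential(xs, ys):
    """Least-squares fit of y = a * exp(b * x), done on ln(y) versus x."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    b = sxy / sxx
    a = math.exp(mean_log - b * mean_x)
    return a, b


# Synthetic, made-up values (NOT results from this article): response times
# generated from a known exponential so that the fit can be verified.
inbox_sizes = [10_000, 20_000, 40_000, 80_000]
response_ms = [50.0 * math.exp(0.00004 * n) for n in inbox_sizes]

a, b = fit_exponential(inbox_sizes, response_ms)

# If growth is exponential, response time doubles every ln(2) / b documents.
doubling_docs = math.log(2) / b
print(f"a = {a:.3f} ms, b = {b:.6f} per document")
print(f"response time doubles roughly every {doubling_docs:.0f} documents")
```

With real measurements, a poor straight-line fit on the log scale would suggest that the growth is not exponential, for example linear or polynomial instead.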
Configuration test conclusions
Figure 3 illustrates the results of our configuration tests.
Figure 3. Configuration test results
The data shows that significant performance benefits were found in the following areas:
- Spreading the load over more servers. Because physical resources were the same, having two servers might have led to greater efficiency of the resources. This finding might be explained through better use of physical memory, whereby the Lotus Domino cache of database structures significantly outperformed the file system's cache.
- Increasing the sequential nature of I/O usage by increasing the minimum read-ahead value. This benefit was documented by the j2_minPageReadAhead values. The tests indicated significantly better performance for OPEN_COLLECTION but worse performance for UPDATE_NOTE transactions. The overall effect, though, was positive.
This finding is an interesting result, especially when we compare it with increasing the expected randomness of I/O by tuning the j2_nPagesPerWriteBehindCluster. What we found there was slightly better performance for the UPDATE_NOTE transaction, but worse performance for the UPDATE_COLLECTION operations.
The apparent conclusion is that systems with larger views or folders have different I/O characteristics than what is traditionally considered the norm and thus might benefit greatly from tuning for more sequential I/O. On the other hand, systems with smaller documents and view or folders might gain by tuning for more random I/O. (Note that page size on this system was 4 K, so an eight-page read-ahead was 32 K.)
- Tests using 64-bit Lotus Domino fared slightly better with respect to UPDATE_NOTE transactions but significantly worse overall. Although testing the larger-database scenario showed a small decrease in performance, general testing indicated that loads that stressed concurrent activity fared comparably, if not better.
- Buffer pool tests in our test case were hampered somewhat by the larger inboxes. Because of the large inboxes for each user, available disk space limited the amount of concurrency that we were able to test. Thus the number of concurrent transactions was relatively low. Because of that, the net effect of having more buffer pool was limited.
Having said that, however, we were still able to see some benefits by increasing the buffer pool. It should be noted that there was not a significant performance difference between the 1024 MB buffer pool and the 1536 MB buffer pool because the smaller number of concurrent transactions did not generate a need for the additional buffer pool.
From past documentation and testing, we know that performance can indeed suffer after 1024 MB. (See the Appendix for more information on buffer pool tests with greater concurrency.)
- Transaction logging appeared to have a small but significant effect on overall performance and for the two major transaction types. The results indicate that there is a beneficial performance effect from having transaction logging enabled. The question that arises is this one: If logging can have a positive performance effect, can we see greater improvement through tuning?
It's been previously documented that transaction logging has some overhead associated with it that can negatively affect performance when it is used. But in a stress-test situation in which the database size was more of a consideration than the concurrent number of transactions, we see that it can have a positive effect on performance.
These conclusions appear to confirm that the effect of large documents on performance is vastly different from that of a larger number of transactions (that is, users). As such, the tuning and the configuration for each situation need to be appropriate.
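The page-size note in the read-ahead finding above (4 K pages, so an eight-page read-ahead covers 32 K) is simple arithmetic, sketched here in Python. The helper name is hypothetical, and every read-ahead value other than eight pages is an arbitrary example rather than a setting used in these tests.

```python
PAGE_SIZE_BYTES = 4 * 1024  # 4 K pages, as stated for the test system


def read_ahead_kb(pages, page_size=PAGE_SIZE_BYTES):
    """Return the amount of data covered by a read-ahead of `pages` pages, in K."""
    return pages * page_size // 1024


# Eight pages is the value mentioned in the text; the others are examples only.
for pages in (2, 8, 16, 32):
    print(f"{pages:>2} pages -> {read_ahead_kb(pages)} K")
```

The larger the read-ahead, the more data each read operation brings in, which helps only if the following requests actually want the adjacent pages. That is why a benefit from this setting points to a more sequential read pattern.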
A little used and relatively new feature is the RAM disk, which allows for the use of physical RAM as the storage for a file system. Of course, it is not persistent, but it does offer fast I/O for any activities that might be considered transient.
Some of the areas with which we experimented were storage of the Lotus Domino Directory (highly experimental), storage of transaction logs, and usage of the view_rebuild_dir Notes.ini parameter.
It was found that there was not a significant change in performance in any of the categories when used in testing. A few reasons were these:
- The amount of concurrency was never high enough.
- Lotus Domino did a good job of using physical memory for operations that needed fast I/O.
- The tests were rigid and limited; they never came upon certain events that could affect performance.
For example, during the course of a stress test, there was limited, if any, need to have a view be rebuilt. Although this need can occur on a regular basis in production, it doesn't happen often in the course of a one-hour test.
Testing the rebuild times did, however, show dramatic improvement when the view_rebuild_dir parameter pointed to a RAM disk. Figure 4 compares the rebuild times of a 4 GB database.
Figure 4. Comparison of rebuild times
At this point, the only conclusion we can draw is that RAM disk can dramatically affect view rebuilds, when they occur. View rebuild conditions had to be artificially created to be observed so, while rebuilds can occur in regular production, it's not something that would occur during normal test loads.
Extraordinarily large number of documents
One of the limiting factors, of course, is that the size of the inbox is to some degree limited by the folder structure itself. This factor was the reason for keeping the maximum size of the inbox tested at 80,000 documents.
In a regular production environment, though, there can be some users who work out of views or applications residing on the server. We know that Lotus Domino handles this type of load well, but does performance differ under different configurations? Figure 5 shows the changes in response time for various functions when the server was populated with databases that had 400,000 documents in a view.
Figure 5. Response times for various configuration changes
Surprisingly, one of the best performance results came from having the transaction logs held in RAM (see the yellow line in figure 5). This result was unexpected because in the tests with 80,000 documents, the use of transaction logging made only a small difference; moreover, for larger databases and views tested, the effect of having transaction logging was noticeably negative.
Yet, by putting logs on a fast independent device (in this case, a RAM-based file system) we saw a noticeable improvement in performance. You might suspect that the transaction logging device, even with the limited number of transactions, added a bottleneck to performance when large views were used.
Past documentation has indicated the need for the fastest logging disks possible, so we reran the tests using a RAM disk file system, to see if doing so made a difference. There was not a significant performance effect, which was most likely due to the lack of a bottleneck on the logging disk because the level of concurrency was fairly low.
Also of note, having the view_rebuild_dir Notes.ini parameter use a RAM disk showed an overall gain.
A few inferences we might draw from this set of results are these:
- The use of larger views can cause noticeable resource issues in the logging device and the view rebuild directory. The effect of changes in configuration or capacity had a substantial effect on performance.
- The speed of a transaction logging device becomes more of a factor under heavy view or folder activities. In other words, having databases with large folders and views magnifies the transaction logging disk's effect on performance.
- Databases that use larger views might benefit from using the fastest I/O possible for their view_rebuild_dir file system.
Regarding the network compression statistics, we ran the tests using FTP as a control, to determine how the network was affected by a transfer of data. We found that during our sample we were able to achieve approximately 10 MB per second. Because of the nonlinear and noncontinuous activity of Lotus Domino, we normally would expect to get worse total throughput.
The key element we are concerned with, though, is the average read rate, and in our base testing we obtained 6.3 MB per second, with about 4.3 MB per second of that being read operations. Those numbers might indicate that the tests were bottlenecked by the network.
The tests above appear to indicate the effect of compression did not significantly influence the response times, although it did change the behavior somewhat. Hence, we tend to believe that the network bandwidth was not a significant factor in the tests.
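The network figures quoted above can be put side by side with a little arithmetic. This sketch uses only the numbers given in the text; the variable names are illustrative, and the link ceiling assumes that the 10/100 Mbps adapter from table 1 was running at 100 Mbps, which the article does not state.

```python
FTP_CONTROL_MB_S = 10.0   # approximate FTP throughput from the control run
DOMINO_TOTAL_MB_S = 6.3   # total throughput observed in the base test
DOMINO_READ_MB_S = 4.3    # portion of that total that was read operations

# Assumption: the adapter negotiated 100 Mbps. 8 bits per byte gives the
# theoretical ceiling in MB per second, before any protocol overhead.
LINK_CEILING_MB_S = 100 / 8

utilization = DOMINO_TOTAL_MB_S / FTP_CONTROL_MB_S
read_share = DOMINO_READ_MB_S / DOMINO_TOTAL_MB_S

print(f"link ceiling (assumed 100 Mbps): {LINK_CEILING_MB_S} MB/sec")
print(f"Domino load vs. FTP control:     {utilization:.0%}")
print(f"read share of Domino traffic:    {read_share:.0%}")
```

Running at well over half of what a continuous FTP transfer achieved is close enough to raise the question of a network bottleneck, which is why the compression tests were used to check whether bandwidth was actually limiting response times.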
Also, these tests were run with even less concurrency than the tests above, which had 80,000 documents. To measure the effect of large numbers of documents, we had to further sacrifice the amount of concurrency for database size. This limitation should be kept in mind with respect to any tunables or configurations that did not have the expected effect on performance.
Many other configurations could be considered for future study in this area. For example, write I/O patterns, network constraints, and greater concurrency with larger databases are just a few areas that would merit further investigation, given the time and resources. While the results in this article can't be considered definitive, a key point is that large database issues can be complex. They behave quite differently with respect to traditional assumptions about database performance and, as such, the symptoms and management of large database servers must be treated differently as well.
To recap our findings:
- The size of a database can easily have a greater effect on performance than the number of users; however, that is not necessarily the case. The greatest impact on performance is the size of the view or folder in use, specifically, the number of documents in a view or folder being used, assuming that transaction types and amounts remain the same.
- The size or the total number of documents in the database did not prove to be an accurate predictor of the performance impact, under normal test loads. The data shows exponential growth rates for UPDATE_NOTE and NIF operation times as the number of documents in the inbox grew.
Note, however, that a normal test load did not include database activities that could use significantly more resources, regardless of the size of a specific view. (For example, view rebuilds and maintenance operations can use a significant amount of additional resources to complete the same task.) Although total database size is not the greatest predictor of performance, it can still play a role.
- The nature of the performance impact of concurrent users proved to be quite different from that of large databases. We saw that the nature of I/O appeared to indicate a sequential pattern, the inference being that differences in tuning might make it advantageous to group similar users together to take advantage of configuration differences.
- The way in which physical memory is used can greatly affect performance. Configuration changes to the buffer pool, the use of multiple servers, and even the use of RAM disk showed that changes in how memory was configured could help overall performance. It could also hurt performance, although in general we found that the use of file-system cache in our tests tended to have less value, if there was an alternate method for using physical memory to relieve I/O operations.
- The relationship between concurrency and the size of a database continues to be almost impossible to define accurately. To some degree, concurrency and size complement each other in their use of resources and overall effect on performance. For example, an administrator might decide that having a large database mixed in with smaller ones can hide the performance impact. On the other hand, because large databases might use the resources of a server more intensively and differently, it can be beneficial to isolate them from other databases.
- Because a large database behaves differently and as such can react differently to tuning and configuration, it raises the question of whether large databases should exist separately so as to take advantage of those tunings or configurations. There does seem to be compelling evidence to indicate that there might be value there. We hope that by more clearly defining the nature of large databases, we can improve our grasp on how to manage performance.
The figures that follow show the effect of buffer pool size when the amount of concurrency was increased with smaller databases. Note that 1000 users does not mean more concurrency, but rather more queued requests. The tests that follow were able to achieve a greater number of concurrent transactions during the test runs with 500 users.
Figure A1. update_note times
Figure A2. open_collection times and buffer pool sizes
We can see from the figures that there appears to be a significant benefit to managing the buffer pool size when there's a high degree of concurrency. Compare this finding to the testing that we saw with larger databases in which the benefit was marginal. In fact, during the larger-database tests, the actual allocation of the buffer pool never exceeded 1 GB. This finding demonstrates the effect of concurrent transactions on the buffer pool and how changes in the buffer pool affect server performance.
Although the buffer pool did not exceed 1 GB in the large-database tests, the smaller database and greater concurrency tests used every megabyte possible. This usage is why the results were so much more pronounced in these tests. Note, however, that as the buffer pool exceeded 1024 MB, the effect on performance actually became negative.
Past testing and general experience appear to indicate that there's an optimum buffer pool size; if that amount is exceeded, then the additional cost associated with the larger buffer pool is more than the value obtained. There isn't a hard and fast number that has been documented because it can vary with the type of workload, but as a rule of thumb, having more than 1 GB does not appear to be worthwhile.
Last, the amount of improvement seen in OPEN_COLLECTION times was proportionally much greater for the smaller number of users. This result leads us to believe that the buffer pool was more effective when the disks were less taxed. One reason might be that the buffer pool manages throughput to the disks; however, if the disks were to become the bottleneck, then any additional value you would expect to see from configuring the buffer pool would be diminished.
- Read the IBM Support Technote, "Compact performance affected by file caching system."
- Read the IBM Support Technote, "Knowledge Collection/Troubleshooting Guide: Known causes of slow database."
- Read the IBM Support Technote, "How to optimize database performance using Database properties."
- Read the IBM Support Technote, "Domino Server Performance Troubleshooting Cookbook."
- Read the IBM Redbooks® Paper, "Sizing Large Scale Domino Workloads on iSeries."
- Read the IBM Redbooks publication, "Understanding IBM eServer pSeries Performance and Sizing."
- Read the IBM Redpaper, "Domino for IBM eServer xSeries and BladeCenter Sizing and Performance Tuning."
- Read the developerWorks® Lotus article, "Lotus Notes/Domino 7 application performance."
- Read the developerWorks Lotus article, "Best practices for large Lotus Notes mail files."