One of the saddest sights we know is a beautiful application that is so slow it's unusable -- all the long hours and hard work wasted because users are frustrated by slow response times. Over the past 12 years, we've spent a lot of time researching and testing Domino applications and functionality to understand how and where features can best be used to optimize performance. We started supporting and developing Domino applications in the early 1990s, and quickly became fascinated with application performance. It seemed to us then (as it still does today) that many of the problems we perceive as server performance problems are actually application performance problems. The solutions, therefore, are often found within the application rather than on the server.
In this two-part article series, we will share some of what we've learned with you. This series covers three areas of application performance: database properties, document collections, and views. In part 1, we will discuss database properties and document collections. In each case, we will point out areas that are most significant and provide concise, real-world examples to help you understand what to do in your own applications. We'll use examples from many applications; you'll probably find that at least one of them closely matches something that you do or that you use. Our goal is to help you build applications that are as fast as they are beautiful.
This article assumes that you're an experienced Notes/Domino application developer.
There are a handful of database properties that are relevant to the performance of your application.
If you check the "Don't maintain unread marks" box, unread marks will not be tracked in your application regardless of the settings you have for each view. We used client_clock to track the time spent opening a database, and what we saw surprised us. For a large application (say 20 GB with 200,000 documents), our Notes client could open the database in about five seconds without unread marks, including network traffic. With unread marks turned on, we had to wait an additional six seconds or more. This additional time was spent in GET_UNREAD_NOTE_TABLE and RCV_UNREAD. With unread marks turned off, these calls aren't made.
In a smaller database (less than 1 GB), we saw savings of maybe 0.5 seconds with unread marks turned off. Of course, it was faster to open that database with or without unread marks compared to the larger database. So you should consider whether or not your application needs the unread marks feature before you roll it out into production.
The "Optimize document table map" property has not changed for the past few releases of Lotus Notes/Domino. This feature is designed to speed up view indexing for applications with structures that resemble the Domino Directory. (In other words, they contain many documents that use one form and a small number of documents using a different form. Think of Person documents versus Server documents in the Domino Directory.)
The idea is that, instead of checking every document note to see whether or not it should be included in a view index, we make two passes. The first pass merely checks to see if the correct form name is associated with that document. The second pass, if needed, checks for the various other conditions that must be met to include this document note in the view index.
Note: Currently, this feature does not appear to improve indexing times, not even for Domino Directories.
The "Don't overwrite free space" property has not changed for the past few releases of Lotus Notes/Domino. If you uncheck this box, then whenever documents are deleted, Lotus Notes will actually overwrite the bits of data instead of merely deleting the pointer to that data. The goal is to make the data unrecoverable. You would only use this feature if you feared for the physical safety of your hard disk. For virtually every application, this extra physical security is unwarranted and is merely an extra step when data is deleted.
The "Maintain LastAccessed property" setting has not changed for the past few releases of Lotus Notes/Domino. If you check this box, Notes will track the last time a Notes client opened each document in the database. Lotus Notes always tracks the last save, of course, in the $UpdatedBy field, but this feature tracks the last read as well. (It does not track Web browser reads, however.)
We have not seen this feature used by developers other than in knowledge base applications, where data is archived if it has not been read within a certain number of months or years.
We've looked at customer code for many years in agents, in views, in form field formulas, and so on. In our experience, frontend performance problems tend to be more troublesome than backend performance problems for a number of reasons:
- Backend processes are typically monitored more rigorously.
- Backend processes frequently do not have to worry about network traffic.
- Frontend problems can be confusing to decipher. Users often are not sure which actions are relevant, causing them to report unimportant and even unrelated actions to your Support desk.
But regardless of where the code comes from, if we find that something is slow and we open up the code to examine it, we will likely find the following as a common denominator:
- The code establishes certain criteria from context, such as user's name, status of document that user is in, today's date, and so on.
- The code gets a collection of documents from this or another database.
- The code reads from, and/or writes to, these documents.
From performing tests over many years, we have found that typically, the first step is very fast and not worth optimizing, at least not until bigger battles have been fought and won. The third step is often slow, but unfortunately it is not very elastic. That is, you are unlikely to find that your code is inefficiently reading information from or saving information to a set of documents. For example, if you are trying to save today's date to a field called DateToday, you would likely use one of the following methods:
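' Method 1: extended class syntax
Set doc = dc.GetFirstDocument
Do While Not (doc Is Nothing)
    doc.DateToday = Today
    Call doc.Save(True, False)    ' without a save, the new value is never written
    Set doc = dc.GetNextDocument(doc)
Loop

' Method 2: ReplaceItemValue
Set doc = dc.GetFirstDocument
Do While Not (doc Is Nothing)
    Call doc.ReplaceItemValue("DateToday", Today)
    Call doc.Save(True, False)
    Set doc = dc.GetNextDocument(doc)
Loop

' Method 3: StampAll writes the value to every document in the collection and saves each one
Call dc.StampAll("DateToday", Today)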
In our testing, we've never found a difference in performance between the first two of the three preceding examples. Using the extended class syntax, doc.DateToday = Today, appears to be just as fast as using doc.ReplaceItemValue ("DateToday", Today). In theory, we should see some performance difference because in one case, we are not explicitly telling Lotus Notes that we will update a field item, so Lotus Notes should spend a bit longer figuring out that DateToday is, in fact, a field. However, practical tests show no difference.
The dc.StampAll method is faster if you are updating many documents with a single value as in the preceding example. There were some point releases in which a bug made this method much slower, so if you're not using the latest and greatest, please confirm this is working optimally (either with testing or by checking the fix list). But as of Lotus Notes/Domino 6.5 and 7, this is once again fast. However, there are often so many checks to perform against the data or variable data to write to the documents that dc.StampAll is not always a viable option. We would put it into the category of a valuable piece of information that you may or may not be able to use in a particular application.
As for deciding which of the three steps we should focus on, our experience says that the second step (getting a collection of documents) is the one. It turns out that this is often, by far, the largest chunk of time used by code and fortunately, the most compressible. This was the focus of our testing and will be discussed in the remainder of this section.
Our testing methodology was to create a large database with documents of fairly consistent size (roughly 2K) and with the same number of fields (approximately 200). We made sure that the documents had some carefully programmed differences, so that we could perform lookups against any number of documents. Specifically, we made sure that we could do a lookup against 1, 2, 3, ... 9, 10 documents; and also 20, 30, 40, ... 90, 100; and also 200, 300, 400, ... 900, 1000; and so on. This gave us a tremendous number of data points and allowed us to verify that we were not seeing good performance across only a narrow band. For example, db.search is an excellent performer against a large subset of documents in a database, but a poor performer against a small subset. Without carefully testing against the entire spectrum, we might have been misled as to its performance characteristics.
We ran tests for many hours at a time, writing out the results to text files which we would then import into spreadsheets and presentation programs for the purpose of charting XY plots. After many such iterations and after trying databases that were small (10K documents) and large (4 million documents), we came up with a set of guidelines that we think are helpful to the application developer.
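A minimal sketch of how a single data point of this kind might be captured in LotusScript, assuming a hypothetical lookup view named "LookupByStatus" whose first sorted column contains a hypothetical key "Open". This is not the harness we used; it only illustrates timing the call that builds the collection, separately from any reading or writing.

```lotusscript
Sub TimeLookup
    ' Hypothetical names: the view "LookupByStatus" and the key "Open"
    ' are placeholders, not part of the original test application.
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim view As NotesView
    Dim dc As NotesDocumentCollection
    Dim startTime As Single
    Dim elapsed As Single

    Set db = session.CurrentDatabase
    Set view = db.GetView("LookupByStatus")
    If view Is Nothing Then
        Print "Lookup view not found"
        Exit Sub
    End If

    startTime = Timer    ' seconds elapsed since midnight
    Set dc = view.GetAllDocumentsByKey("Open", True)
    elapsed = Timer - startTime

    ' Report the method, the collection size, and the elapsed seconds
    Print "GetAllDocumentsByKey," & Cstr(dc.Count) & "," & Format$(elapsed, "0.000")
End Sub
```

Because Timer counts from midnight, a run that spans midnight yields a negative value, so long test sessions would need to account for that. Repeating the same measurement for each method and each collection size gives the rows needed for an XY plot.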
The fastest way to get a collection of documents for reading or writing is to use either db.ftsearch or view.GetAllDocumentsByKey. It turns out that other methods (see the following list) may be close for some sets of documents (discussed later in this article), but nothing else can match these methods for both small and large collections. We list the methods with a brief explanation here and go into more detail later.
- view.GetAllDocumentsByKey gets a collection of documents based upon a key in a view, then iterates through that collection using set doc = dc.GetNextDocument (doc).
- db.ftsearch gets a collection of documents based upon full-text search criteria in a database, then iterates through that collection using set doc = dc.GetNextDocument (doc).
- view.ftsearch gets a collection of documents based upon full-text search criteria, but constrains the results to documents that already appear in a view. It then iterates through the collection using set doc = dc.GetNextDocument (doc).
- db.search gets a collection of documents based upon a non-full-text search of documents in a database, then iterates through the collection using set doc = dc.GetNextDocument (doc).
- view.GetAllEntriesByKey gets a collection of view entries in a view, then either reads directly from column values or gets a handle to the backend document through the view entry. It then iterates through the collection using set entry = nvc.GetNextEntry (entry).
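The five methods listed above can be sketched side by side as follows. The view name "LookupByStatus", the Status field, and the key "Open" are hypothetical placeholders, and the full-text calls assume the database is already full-text indexed. Note that view.FTSearch does not return a collection object; it filters the view and returns the number of matching documents.

```lotusscript
Sub CollectionMethods
    ' Hypothetical names throughout: the view "LookupByStatus", the field
    ' Status, and the key "Open" are placeholders for illustration.
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim view As NotesView
    Dim dc As NotesDocumentCollection
    Dim vec As NotesViewEntryCollection
    Dim entry As NotesViewEntry
    Dim doc As NotesDocument
    Dim matches As Long

    Set db = session.CurrentDatabase
    Set view = db.GetView("LookupByStatus")

    ' 1. Keyed lookup against the first sorted column of the view
    Set dc = view.GetAllDocumentsByKey("Open", True)

    ' 2. Full-text search of the whole database (0 = no explicit cap)
    Set dc = db.FTSearch("[Status] = Open", 0)

    ' 3. Full-text search limited to documents in the view; the view is
    '    filtered in place, so clear the filter when finished
    matches = view.FTSearch("[Status] = Open", 0)
    Call view.Clear

    ' 4. Formula search of the whole database, with no cutoff date
    Set dc = db.Search({Status = "Open"}, Nothing, 0)

    ' 5. View entries; each entry exposes column values and the document
    Set vec = view.GetAllEntriesByKey("Open", True)
    Set entry = vec.GetFirstEntry
    Do While Not (entry Is Nothing)
        Set doc = entry.Document
        Set entry = vec.GetNextEntry(entry)
    Loop
End Sub
```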
If you have a small collection of documents (for example, 10 or so) and a small database (for instance, 10,000 documents), many different methods will yield approximately the same performance, and they'll all be very fast. This is what you would call the trivial case, and unless this code is looping many times (or is used frequently in your application), you might leave this code intact and move on to bigger problems.
However, you still may find small differences, and if you need to get many collections of documents, then even saving a fraction of a second each time your code runs will become meaningful. Additionally, if your application is large (or growing), you'll find the time differences can become substantial.
Here are two customer examples. The first involves scheduled agents that run very frequently (every few minutes or whenever documents have been saved or modified), iterate through every new document to get search criteria, and then perform searches based on those criteria. If 10 new documents are processed, then 10 searches are performed -- and if 100 new documents are processed, then 100 searches are performed. For this customer, if we could shave 0.5 second off the time to get a collection of documents, that savings was multiplied by 10 or 100, and then multiplied again by the frequency of the execution of the agent. It could easily save many minutes per hour during busy traffic times of the day, which is meaningful. The second is a principal form with a PostOpen or QuerySave event that runs this code. If you have hundreds of edits per hour (or more), this 0.5 second saved will be multiplied to become a noticeable savings.
When we explained to colleagues or customers why some of these methods are faster or easier to use than other methods, we often engaged in a spirited debate, complete with "on the one hand" and "on the other hand" arguments. To our great satisfaction, the deeper we've taken these arguments, the clearer the issues have become. We will attempt to invoke that same spirit in this article with two mythical debating opponents, Prometheus ("Pro" to his friends) and his skeptical colleague Connie (a.k.a. "Con").
Prometheus: view.GetAllDocumentsByKey looks very fast. I think I'm sold on using it wherever I can.
Connie: All well and good, my friend, but what if you're looking up data in the Domino Directory? You can't get permission to create new views there easily.
Pro: Great point. OK, in applications where I control the lookup database, that's where I'll use this method.
Con: Oh? And if you end up creating an additional 10 views in that database, is it still a good method? Think of all the additional view indexing required.
Pro: That might appear to be a nuisance, but if I build streamlined views, they will likely index in less than 100 milliseconds every 15 minutes when the UPDATE task runs -- more if required by lookups. Surely we can spare a few hundred milliseconds every few minutes?
Con: How do you streamline these views? Is that hard? Will it require much upkeep?
Pro: Not at all. To streamline a lookup view, you first make the selection criteria as refined as possible. This reduces the size of the view index and therefore, the time to update the index and perform your lookup. Then, think about how you'll do your lookups against this view. If you're going to get all documents, consider simply using a single sorted column with a formula of "1." Then it's trivial to get all the documents in the view. If you need many different fields of information, consider making a second column that concatenates those data points into a single lookup. One lookup is much faster than several, even if the total data returned is the same.
Con: OK, I might be sold on that method. But you have also touted db.ftsearch as being very fast, and I'm not sure I'm ready to use that method. It seems like it requires a lot of infrastructure.
Pro: It is true that to use db.ftsearch reliably in your code, you'll need to both maintain a full-text index and also make sure that your Domino server's configuration includes FT_MAX_SEARCH_RESULTS=n, where n is a number larger than the largest collection size your code will need to return. Without it, you are limited to 5,000 documents.
Con: And what happens if the full-text index isn't updated fast enough?
Pro: In that case, your code can include db.UpdateFTIndex to update the index.
Con: My testing indicates that this can be quite time consuming, far outweighing any performance benefits you get from using db.ftsearch in the first place. And what happens if the full-text index hasn't even been created?
Pro: If the database has fewer than 5,000 documents, a temporary full-text index will be created on-the-fly for you.
Con: I have two problems with that. First, a temporary full-text index is very inefficient because it gets dumped after my code runs. Second, 5,000 documents isn't a very high threshold. Sounds like that would only be some mail files in my organization. What if there are more than 5,000 documents in the database?
Pro: In that case, using db.UpdateFTIndex (True) will create a permanent full-text index.
Con: OK, but creating a full-text index for a larger database can be very time consuming. I also know that the full-text index will only be created if the database is local to the code -- that is, on the same server as the code that is executing.
Pro: True enough. Fortunately, Lotus Notes/Domino 7 has some improved console logging as well as the ability to use Domino Domain Monitoring (DDM) to more closely track issues such as using ftsearch methods against databases with no full-text index. Here are a couple of messages you might see on your console log. As you can see, they are pretty clear:
Agent Manager: Full text operations on database 'xyz.nsf' which is not full text indexed. This is extremely inefficient.
mm/dd/yyyy 04:04:34 PM Full Text message: index of 10000 documents exceeds limit (5000), aborting: Maximum allowable documents exceeded for a temporary full text index
Con: While I'm at it, I see that you haven't said much positive about view.ftsearch, view.GetAllEntriesByKey, or db.search. And I think I know why. The first two are fast under some conditions, but if the view happens to be structured so that your lookup data is indexed towards the bottom, they can be very slow. And db.search tends to be very inefficient for small document collections.
Pro: All those points are true. However, db.search is very effective at time/date sensitive searches, where you would not want to build a view with time/date formulas and where you might not want to have to maintain a full-text index to use the db.ftsearch method. Also, if you are performing lookups against databases not under your control and if those databases are not already full-text indexed, it is possible that db.search is your only real option for getting a collection of documents.
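One way to act on this exchange is to check for a full-text index before choosing a method, and to fall back to db.search when none exists. The following is a minimal sketch under that assumption; the function name GetOpenDocs and the Status field are hypothetical.

```lotusscript
Function GetOpenDocs(db As NotesDatabase) As NotesDocumentCollection
    ' Hypothetical helper: the function name and the Status field are
    ' illustrative assumptions, not taken from a real application.
    If db.IsFTIndexed Then
        ' An index already exists, so a full-text search is a good fit
        Set GetOpenDocs = db.FTSearch("[Status] = Open", 0)
    Else
        ' No index: use a formula search rather than relying on a
        ' temporary index or paying for UpdateFTIndex inside the lookup
        Set GetOpenDocs = db.Search({Status = "Open"}, Nothing, 0)
    End If
End Function
```

The two branches are not guaranteed to return identical results, because a full-text query matches words while a selection formula compares whole field values. Code that depends on exact matches should be tested against both paths.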
Here are some charts to help quantify the preceding points made by Pro. These charts show how long it takes to simply get a collection of documents. Nothing is read from these documents and nothing is written back to them. This is a test application in our test environment, so the absolute numbers should be taken with a grain of salt. However, the relationships between the various methods should be consistent with what you would find in your own environment.
In Figure 1, db.ftsearch and view.GetAllDocumentsByKey are virtually indistinguishable from each other, both being the best performers. Call that a tie for first place. A close third would be view.GetAllEntriesByKey, while view.ftsearch starts out performing very well, but then rapidly worsens as the number of documents hits 40 or so.
Figure 1. Document collections, optimized views (up to 100 documents)
In Figure 2, the only difference worth noting from Figure 1 is that db.search looks better and better as the number of documents increases. It turns out that at approximately 5 to 10 percent of the documents in a database, db.search will be just as fast as the front runners. As we saw in Figure 1, view.ftsearch is getting worse and worse as the document collection size increases.
Figure 2. Document collections, optimized views (100 to 1,000 documents)
In Figure 3, the views are no longer optimized to put the results towards the top. That is, if we are getting a collection of only a few documents, then in our test environment, we can try to skew the results by making sure those few documents are towards the top or bottom of the lookup view. In Figures 1 and 2, those documents tended to be towards the top of the view, but in Figure 3, those documents are at the bottom. For three of the methods, this is immaterial (db.search, db.ftsearch, and view.GetAllDocumentsByKey). However, for view.ftsearch and view.GetAllEntriesByKey, this switch is catastrophic in terms of performance. The scale on Figures 2 and 3 had to be changed -- instead of the Y-axis going up to one second, it has to go up to six seconds!
Figure 3. Document collections, non-optimized views
Whenever feasible, use view.GetAllDocumentsByKey to get a collection of documents. In conjunction with this method, streamline your lookup views so that they are as small and efficient as possible. Part 2 of this article series has some tips for doing this.
If your lookups need to go against rich text fields, or if your database is already full-text indexed, db.ftsearch is an excellent performer and well worth considering. Be sure that your results will always number fewer than 5,000 documents, or use the Notes.ini parameter FT_MAX_SEARCH_RESULTS=n (where n is the maximum number of documents that can be returned) to guarantee that you do not lose data integrity due to this limit.
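As a defensive measure, code can also flag a result set that reaches the cap, since that may mean matching documents were left out. The sketch below is an assumption-laden illustration: the function name FTSearchChecked and the maxResults parameter are hypothetical, and the caller is expected to pass the cap in effect (5,000 by default, or the value of FT_MAX_SEARCH_RESULTS if it has been set).

```lotusscript
Function FTSearchChecked(db As NotesDatabase, query As String, maxResults As Long) As NotesDocumentCollection
    ' Hypothetical helper: the name and the maxResults parameter are
    ' assumptions. Pass the effective cap: 5,000 by default, or the
    ' value of FT_MAX_SEARCH_RESULTS if that parameter has been set.
    Dim dc As NotesDocumentCollection

    Set dc = db.FTSearch(query, 0)
    If dc.Count >= maxResults Then
        ' Reaching the cap suggests the collection may be incomplete
        Print "Warning: " & Cstr(dc.Count) & " results returned; the collection may be truncated"
    End If
    Set FTSearchChecked = dc
End Function
```

A count that equals the cap is only a hint, not proof, of truncation, so the warning is best treated as a prompt to review the limit rather than as an error.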
This concludes the first part of our examination of Notes/Domino 7 application performance. In part 2, we discuss how you can build high-performing views. See you there!
Read part two of this article series, "Lotus Notes/Domino 7 application performance: Part 2: Optimizing database views."
For information on troubleshooting application performance in Lotus Notes/Domino, see the developerWorks Lotus two-part article series, "Troubleshooting application performance: Part 1: Troubleshooting techniques and code tips" and "Troubleshooting application performance: Part 2: New tools in Lotus Notes/Domino 7."
In addition, the two-part article series, "Application Performance Tuning, Part 1" and "Application Performance Tuning, Part 2," offers valuable information about tuning application performance in Lotus Notes/Domino 5 and 6.
The article, "New features in Lotus Notes and Domino Designer 7.0," describes new features introduced in Lotus Domino Designer 7.0.