Daniel González
JSONStore is a lightweight, document-oriented storage system that is included as a feature of IBM® Worklight Foundation and enables persistent storage of JSON documents. Documents in an application are available in JSONStore even when the device running the application is offline. This persistent, always-available storage can give customers, employees, or partners access to documents when, for example, the device has no network connection.
For more general information regarding JSONStore, see the user documentation.
Because JSONStore manages files on the device, performance is one of developers' main concerns. In this article I discuss how the JSONStore API performs, which factors affect performance the most, common performance problems encountered while using JSONStore, and best practices for getting the best performance possible.
JSONStore has few parameters that can be tuned for performance, so the ones that can be changed deserve attention. Used well, they help you avoid running out of memory in your application and make it as fast as possible. For each method, I discuss what affects it and how to adjust its parameters and options to optimize it.
For these benchmarks, I used a simple hybrid Worklight V6.2 application that uses the JSONStore API and records how long each operation takes. If you want to try the application yourself, see the Appendix at the end of this article for the download and for instructions on how to use it. The Appendix also contains benchmarks that were run on several Android, iOS, and Windows devices.
To keep the discussion simple, this article covers only the benchmarks that I got on a Nexus 5. Similar performance can be seen on other devices: the trends for each API method are similar on all devices, but the actual times depend heavily on your device's hardware.
Disclaimer: I am providing the application used to generate the benchmarks discussed here, as well as the results, as is, without warranty or support of any kind, express or implied. Also keep in mind that your numbers might be different from mine. This application does the bare minimum needed to use JSONStore: it has minimal UI and user interaction, and nothing else is going on in the background. If you compare these results to your application, keep in mind that yours will likely be doing many more things than this one and will probably perform differently. These benchmarks are meant to give you an idea of the general performance of the different methods in JSONStore, not a guarantee that you will get exactly these numbers in your application. Finally, the data used in these tests was created by randomly combining different names, numbers, and addresses to resemble the data you might be using. It is not real data.
JSONStore API performance
Each method in the JSONStore API takes different parameters and behaves differently depending on them. This section discusses the performance of each method in the JSONStore API, which parameters affect it the most, and how to avoid performance problems.
In this discussion it is assumed that you are familiar with JSONStore and understand what the different terms (such as store, collection, document, search field, etc.) mean. For more information, see the user documentation.
Initialize collections
Here we see the performance of the JSONStore init method when initializing one collection with several search fields. Note that the numbers on the horizontal axis refer to the size the collection will eventually reach; each data point is the time it takes to initialize one collection. The time is practically constant: about 100 ms without encryption and about 700 ms with encryption. Encryption therefore affects the init method significantly, making it about seven times slower (a 600% increase). This happens because init has to use the given password to generate the security artifacts that are later used to encrypt and decrypt the documents in the stores (for more details about these artifacts, see the user documentation). As a result, if you initialize multiple encrypted stores at the same time, each one takes extra time because each one generates its own security artifacts. These times add up quickly, so initialize only the stores that you need at any given time to avoid the additional delay.
The other parameter that affects the init method's performance is the number of collections being initialized: the more collections you initialize, the longer it takes. Because each collection takes a roughly constant time to initialize, the total execution time grows linearly with the number of collections.
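The linear relationship described above can be turned into a quick back-of-the-envelope estimate. This sketch hard-codes the approximate per-collection times measured on the Nexus 5 (about 100 ms unencrypted, about 700 ms encrypted). Following the observation above, it assumes every collection costs roughly the same. The constant and function names are our own, and your device's numbers will differ. It also shows that going from 100 ms to 700 ms is a 600% increase, or seven times the original time.

```typescript
// Approximate per-collection init() times measured on a Nexus 5 in this
// article. Treat them as placeholders and measure on your own devices.
const INIT_MS_PER_COLLECTION = { plain: 100, encrypted: 700 } as const;

type InitMode = keyof typeof INIT_MS_PER_COLLECTION;

// Linear model: each collection costs roughly the same, so the total grows
// with the number of collections initialized.
function estimateInitMs(collections: number, mode: InitMode): number {
  return collections * INIT_MS_PER_COLLECTION[mode];
}

// Relative increase of `after` over `before`, in percent.
function percentIncrease(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

console.log(estimateInitMs(5, "plain")); // 500
console.log(estimateInitMs(5, "encrypted")); // 3500
console.log(percentIncrease(INIT_MS_PER_COLLECTION.plain, INIT_MS_PER_COLLECTION.encrypted)); // 600
```

With these figures, initializing five encrypted collections up front costs about 3.5 s. Initializing only the one you need right now costs about 0.7 s.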
Add documents
Here we can see the execution times when adding documents to a collection. Encryption does not affect performance considerably; it adds only a small percentage to the execution time. The number of documents being added, on the other hand, has a significant effect: the time goes from about 5 s for 2000 documents to about 16 s for 8000 documents, an increase of 220%.
However, execution time isn't the only concern: you can easily run out of memory in your application if you try to add too many documents at a time. Each application can access only a certain amount of memory, which is determined by the OS and by the device's hardware. How many documents you can add at a time is also limited by how big your data is; the bigger your documents, the fewer you can keep in memory. For example, when we ran the tests on an HTC Inspire running Android 2.3.3, it started reaching the application's memory limit when adding 2000 and 4000 documents (822 KB and 1.6 MB, respectively), whereas the Nexus 5 had no problem adding 8000 documents (4.1 MB). This means that you have to decide which devices you want to support and plan accordingly; older and cheaper devices tend to have less memory than newer, higher-end ones. Consult the OS documentation to learn how to determine how much memory is available to your application and what the best practices for memory management are.
If your application is running out of memory, the best practice is to split the documents into several chunks and add only a certain number at a time (say, 500 documents per call). The right chunk size depends on the devices you are targeting, the average size of the documents you are adding, and how much else your application holds in memory at a given time, so try several chunk sizes to find the best one for your case. Chunks that are too small make adding unnecessarily slow, because each add call has some overhead. Chunks that are too large make it much slower and risk running the application out of memory. For example, if you want to add 5000 documents, passing all of them at once might be inefficient, but adding them 10 at a time might be extremely slow too; a sweet spot might be 500 or 1000 documents at a time. A moderate chunk size is also a good idea if you are retrieving the data from a server or reading it from disk, since splitting it into too many chunks means more network calls or file I/O operations, which you also want to minimize.
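Here is a minimal sketch of adding documents in chunks. `AddableCollection` is a hypothetical stand-in for the collection accessor (in a hybrid app, the object returned by `WL.JSONStore.get(...)`). It models only an `add` method that returns a promise resolving to the number of documents added. `chunk` and `addInChunks` are illustrative helpers, and 500 is only a starting point to tune.

```typescript
// Hypothetical stand-in for a JSONStore collection accessor. Only add() is
// modeled: it takes an array of documents and resolves to how many were added.
interface AddableCollection<T> {
  add(docs: T[]): Promise<number>;
}

// Split an array into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Add documents one chunk at a time. Each add() is awaited before the next
// one starts, so only one chunk is being processed at any moment.
async function addInChunks<T>(
  collection: AddableCollection<T>,
  docs: T[],
  chunkSize = 500, // a starting point only; tune it for your devices and data
): Promise<number> {
  let added = 0;
  for (const part of chunk(docs, chunkSize)) {
    added += await collection.add(part);
  }
  return added;
}
```

With 1200 documents and a chunk size of 500, `addInChunks` makes three `add` calls of 500, 500, and 200 documents. In this sketch the full `docs` array is still in memory. If the data comes from a server or from disk, fetching it chunk by chunk as well keeps memory use lower.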
Also, high memory usage at any given time is not a good idea. For example, Android manages memory by using a garbage collector to remove objects that are no longer in use. If you are using most of your memory, the garbage collector has to run constantly to try to free space in what little is left, and running it that often makes your application even slower, since your application can be paused while memory is freed. This is another reason to avoid having too many documents in memory at the same time.
Another factor not pictured in this graph is the number of search fields you are indexing and how big each of them is, which affects performance when adding documents. Every time you add a document, JSONStore has to read each indexed search field and cache it, so the more it has to cache, the longer each document takes. Also, if the indexed fields are large, copying them into the cache affects both performance and memory, as explained before. So make sure that you only use search fields that you really need.
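As a sketch of what "only the search fields you really need" looks like, here is a collections definition in the shape that JSONStore's init method takes, with a `searchFields` map of field names to types. The `people` collection and its fields are hypothetical. The `CollectionDefinition` type and the list of search field types are our own annotations, not part of the JSONStore API, so check the documentation for your version.

```typescript
// Search field types as we understand JSONStore to accept them; verify for your version.
type SearchFieldType = "string" | "integer" | "number" | "boolean";

// Our own annotation for a collection definition (not a JSONStore type).
interface CollectionDefinition {
  searchFields: Record<string, SearchFieldType>;
}

// Hypothetical "people" collection: index only the fields the app queries on.
// Large fields such as free-text notes can stay in the document without being
// indexed, so they are not cached for every document added.
const collections: { people: CollectionDefinition } = {
  people: {
    searchFields: { name: "string", age: "integer" },
  },
};

// In a hybrid app, an object like this is what you pass to
// WL.JSONStore.init(collections, options).
console.log(Object.keys(collections.people.searchFields));
```

Every extra entry in `searchFields` is more work for each document added, so each one should correspond to a query your application actually runs.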
Find with queries
(Note: For this method and some of the following ones, there are two charts. The first covers collections of up to 8000 documents, and the second covers collections of up to 100,000 documents. The first, smaller one is provided to give a clearer picture of the trend between the encrypted and unencrypted cases.)
For find with queries, these two graphs show how long it took to execute a find with a query in collections of different sizes. The encryption overhead isn't very significant until you get to larger collection sizes (a difference of about 9% [1.53 s vs 1.67 s] in the 6000 documents case vs. a difference of about 50% [18.8 s vs 28.1 s] in the 100k documents case). This overhead exists because JSONStore has to decrypt the documents to be able to inspect them, so the more encrypted documents it has to handle, the longer it takes.
However, the biggest factor that influences find with a query is collection size: JSONStore has to search the whole collection to find all documents that match the query, so a bigger collection makes find slower. For example, a query takes 1.67 s on 6000 encrypted documents versus 28.1 s on 100k encrypted documents, a 1583% increase, compared with the increase of up to 50% caused by encryption.
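The percentages in the last two paragraphs can be checked with a few lines of arithmetic. The timings are the ones quoted above. `percentIncrease` is a helper for illustration that measures each increase relative to the smaller, baseline value.

```typescript
// Relative increase of `after` over `before`, in percent.
function percentIncrease(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Find-with-query timings in seconds (Nexus 5), as quoted above.
const at6000 = { plain: 1.53, encrypted: 1.67 };
const at100k = { plain: 18.8, encrypted: 28.1 };

const small = percentIncrease(at6000.plain, at6000.encrypted); // encryption cost, 6000 docs
const large = percentIncrease(at100k.plain, at100k.encrypted); // encryption cost, 100k docs
const size = percentIncrease(at6000.encrypted, at100k.encrypted); // collection-size cost, encrypted

console.log(small.toFixed(1), large.toFixed(1), size.toFixed(0)); // 9.2 49.5 1583
```

Collection size dominates: encryption adds at most about 50%, while growing from 6000 to 100k documents adds about 1583%.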
This data shows that it is usually a bad idea to have large collections and stores on the device, so this should be avoided as much as possible. This Offline Patterns blog post goes into more details as to what the best practices are regarding what data should be kept in the device and what should not.
However, if you still need to have a large database locally, there are more things you can do to make find with queries faster. The first is to use the limit and offset options provided for this method. A limit is especially useful if you only want a fixed number of results (say, the first 5, or you know that the query has only one result), since otherwise JSONStore has to look at every document in the collection to get all documents that match the query. With 100k or 250k documents, it has to look at all of them before returning the results. When you specify a limit, JSONStore stops as soon as it has found that many documents, which makes searching large databases much faster on average. The offset is used to go through the results in chunks: the first 100 results, the next 100, and so on. Limit and offset are also necessary if your query returns too many results, because, as discussed for the add method, the memory available to the application is limited by the device's hardware and OS, so retrieving too many results at once could make your application slow or run out of memory.
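Here is a sketch of paging through query results with limit and offset, so that only one page is in memory at a time. `QueryableCollection` and `forEachPage` are hypothetical names. The interface models only the `find(query, { limit, offset })` shape discussed above, and the page size is an assumption you should tune.

```typescript
// Options modeled after the limit/offset options described above.
interface FindOptions {
  limit?: number;
  offset?: number;
}

// Hypothetical stand-in for a collection accessor; only find(query, options) is modeled.
interface QueryableCollection<Q, D> {
  find(query: Q, options?: FindOptions): Promise<D[]>;
}

// Walk through every match one page at a time, so only one page of results
// is held in memory at once. Resolves to the number of results processed.
async function forEachPage<Q, D>(
  collection: QueryableCollection<Q, D>,
  query: Q,
  pageSize: number,
  handlePage: (page: D[]) => void | Promise<void>,
): Promise<number> {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  let offset = 0;
  for (;;) {
    const page = await collection.find(query, { limit: pageSize, offset });
    if (page.length === 0) break;
    await handlePage(page);
    offset += page.length;
    if (page.length < pageSize) break; // a short page means there are no more results
  }
  return offset;
}
```

If you only need the first few matches, a single call such as `find(query, { limit: 5 })` is enough; the loop is for walking through large result sets. The sketch assumes the collection is not modified while paging. Adding or removing matching documents between pages would shift the offsets.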
For Worklight V6.2 and above, the filter option is also useful for managing memory, since it lets you retrieve only the search fields that you are interested in instead of the whole document. This helps if your documents have large fields that could cause memory problems. You can include the _id field assigned by JSONStore in the filter, so that if you later need the whole document, you can retrieve it by using find by id, which is much faster than find with queries.
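Here is a sketch of the two-step pattern: fetch only `_id` and a small field with `filter`, then load the full document with find by id when it is actually needed. `FilterableCollection`, `listThenOpen`, the `name` field, and the shape of the filtered rows are assumptions for illustration. Check the documentation for the exact shape that the filter option returns in your Worklight version.

```typescript
// A filtered result row: only the requested fields, keyed by name (assumed shape).
type Row = Record<string, unknown>;

// Hypothetical stand-in for a collection accessor with the two calls used here.
interface FilterableCollection<Q, T> {
  find(query: Q, options: { filter: string[]; limit?: number }): Promise<Row[]>;
  findById(ids: number[]): Promise<Array<{ _id: number; json: T }>>;
}

// Step 1: fetch a lightweight list (_id and name only), e.g. for a list view.
// Step 2: load the full document only for the entry that is actually opened.
async function listThenOpen<Q, T>(
  collection: FilterableCollection<Q, T>,
  query: Q,
  pickIndex: number,
): Promise<T | undefined> {
  const rows = await collection.find(query, { filter: ["_id", "name"], limit: 100 });
  const picked = rows[pickIndex];
  if (picked === undefined) return undefined;
  const [doc] = await collection.findById([Number(picked._id)]);
  return doc?.json;
}
```

Large fields travel only once, for the single document that is opened, instead of once per row in the list.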
Another best practice, aside from the results for find with queries: when you want to count how many documents match a given query, do not run a find with the query and then count the results, as that can be a big performance hit. Worklight V6.1 introduced an option to call count with a query. It is faster than getting all the results from find, and it also saves the memory those results would occupy, since count with a query returns only the number of documents that match the query. This is also helpful when using limit and offset, since it tells you the maximum number of results to page through for a given query.
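Here is a sketch of using count with a query to plan paging instead of loading every match. `CountableCollection`, `pageOffsets`, and `planPages` are hypothetical helpers. The interface models only a `count(query)` method that resolves to the number of matching documents.

```typescript
// Hypothetical stand-in for a collection accessor; only count(query) is
// modeled, resolving to the number of documents that match the query.
interface CountableCollection<Q> {
  count(query: Q): Promise<number>;
}

// Offsets to pass as { limit: pageSize, offset } to find, one per page.
function pageOffsets(total: number, pageSize: number): number[] {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const offsets: number[] = [];
  for (let offset = 0; offset < total; offset += pageSize) {
    offsets.push(offset);
  }
  return offsets;
}

// Ask only for the number of matches (no documents are loaded), then plan the pages.
async function planPages<Q>(
  collection: CountableCollection<Q>,
  query: Q,
  pageSize: number,
): Promise<number[]> {
  const total = await collection.count(query);
  return pageOffsets(total, pageSize);
}
```

For 250 matches and a page size of 100, the plan is three pages at offsets 0, 100, and 200. No documents were loaded to work that out.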
Find by id
Find by id is the fastest way of searching for documents in collections; if you have the _id that was assigned by JSONStore, you can find a document or group of documents much quicker by using this method than by using find with query.
For find by id, the charts show that execution time grows linearly with collection size, but that growth is fairly tame. When searching for one of the last documents in the collection, it goes from about 50 ms in a 1000 documents encrypted collection to almost 2.5 s in a 100,000 documents encrypted collection. In percentage terms that is a large increase (roughly 50 times the original time), but it is still much faster than the equivalent using find with queries, as can be seen in the following graphs:
Encryption has a bigger effect on find by id, especially as the collection grows; it goes from a difference of 280 ms in the 8000 documents case (120 ms unencrypted versus 400 ms encrypted) to a difference of 1.5 s in the 100,000 case. As mentioned before, this is still much faster than the equivalent scenario using find with queries.
There is not much that can be tweaked to get better performance from find by id. The only best practice is that, if you need to find more than one document at the same time, you should pass all of their _ids in an array to a single find by id call instead of calling find by id once per document.
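Here is a sketch contrasting the two approaches. `ByIdCollection` is a hypothetical interface modeling a `findById` that accepts an array of `_id`s and resolves to the matching documents as `{ _id, json }` objects. The helper names are our own.

```typescript
// A stored document as returned by find calls: JSONStore's _id plus the JSON data.
interface StoredDoc<T> {
  _id: number;
  json: T;
}

// Hypothetical stand-in: findById takes an array of _ids and resolves to the matches.
interface ByIdCollection<T> {
  findById(ids: number[]): Promise<StoredDoc<T>[]>;
}

// Preferred: one call for all the _ids.
function findManyById<T>(collection: ByIdCollection<T>, ids: number[]): Promise<StoredDoc<T>[]> {
  return collection.findById(ids);
}

// Avoid: one call per _id, paying the per-call overhead ids.length times.
async function findOneByOne<T>(collection: ByIdCollection<T>, ids: number[]): Promise<StoredDoc<T>[]> {
  const results: StoredDoc<T>[] = [];
  for (const id of ids) {
    results.push(...(await collection.findById([id])));
  }
  return results;
}
```

Both return the same documents, but the batched version makes one call where the other makes one call per `_id`.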
Count
Without encryption, collection size does not affect count considerably. Count grows linearly with collection size, going from about 20 ms in the 1000 documents unencrypted case to about 400 ms in the 100,000 documents case, which means it grows by about 3.8 ms per 1000 documents.
With encryption, however, count takes longer with larger collection sizes because of the performance hit from encryption. It goes from 20 ms in the 1000 documents encrypted case to 1.8 s in the 100,000 documents encrypted case, meaning that it grows by about 18 ms per 1000 documents. This rate is 368% higher than the rate in the unencrypted case.
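The per-1000-document rates and the 368% figure follow from a linear fit between the two endpoints. `msPer1000Docs` is a hypothetical helper, and the inputs are the approximate count timings quoted above.

```typescript
// Average growth in ms per 1000 documents between two measurements,
// assuming growth is linear between them.
function msPer1000Docs(msA: number, docsA: number, msB: number, docsB: number): number {
  return ((msB - msA) / (docsB - docsA)) * 1000;
}

// Approximate count() timings on a Nexus 5, as quoted above.
const plainRate = msPer1000Docs(20, 1000, 400, 100000);
const encryptedRate = msPer1000Docs(20, 1000, 1800, 100000);

// How much faster the encrypted time grows, relative to the unencrypted rate.
const rateIncrease = (encryptedRate / plainRate - 1) * 100;

console.log(plainRate.toFixed(2), encryptedRate.toFixed(2), rateIncrease.toFixed(0)); // 3.84 17.98 368
```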
If you want a quicker way to count how many documents are in a given collection, you can keep a variable for each collection that you want to track, adding the number of documents added and subtracting the number of documents removed. Keep in mind, however, that count is still fast for databases that are not extremely large (it takes less than 500 ms for collections of fewer than about 20k documents).
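Here is a sketch of the running-counter approach. `MutableCollection` is a hypothetical interface whose `add` and `remove` resolve to the number of documents affected. `CountedCollection` is an illustrative wrapper, not part of JSONStore, and its starting value would typically come from a single count call at startup.

```typescript
// Hypothetical stand-in for a collection accessor: add and remove resolve
// to the number of documents affected.
interface MutableCollection<T> {
  add(docs: T[]): Promise<number>;
  remove(ids: number[]): Promise<number>;
}

// Keeps a running document count next to the collection, so reading the
// count is a variable lookup instead of a count() call.
class CountedCollection<T> {
  private inner: MutableCollection<T>;
  private total: number;

  constructor(inner: MutableCollection<T>, initialCount: number) {
    this.inner = inner;
    this.total = initialCount; // e.g. the result of one count() call at startup
  }

  get size(): number {
    return this.total;
  }

  async add(docs: T[]): Promise<number> {
    const added = await this.inner.add(docs);
    this.total += added;
    return added;
  }

  async remove(ids: number[]): Promise<number> {
    const removed = await this.inner.remove(ids);
    this.total -= removed;
    return removed;
  }
}
```

The counter is accurate only if every add and remove goes through the wrapper. Changes made directly on the collection make it drift, so an occasional real count call is a reasonable safeguard.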
Remove and replace documents
For remove, we can see that performance is somewhat similar to find by id in terms of how collection size and encryption affect it. This is because remove receives an _id or an array of _ids, finds the documents with them, and removes the documents found, so the only difference from find by id is the extra removal step. That step removes the documents from disk, and the extra file I/O makes remove take longer than a plain find by id.
Collection size does not affect remove much, except in the case of encryption; as with the other methods, encryption has a bigger impact on larger collections. In the unencrypted case, remove grows from about 50 ms for 1000 documents to about 2.2 s for 100,000 documents, or about 21.7 ms more per 1000 documents. In the encrypted case, it goes from about 50 ms for 1000 documents to 7.1 s, a rate of about 71.2 ms per 1000 documents, which is a 228% increase over the unencrypted rate.
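The remove rates come from the same kind of two-point linear fit. `msPer1000Docs` is an illustrative helper that assumes linear growth between the 1000 and 100,000 document measurements quoted above.

```typescript
// Average growth in ms per 1000 documents between two measurements (linear assumption).
function msPer1000Docs(msA: number, docsA: number, msB: number, docsB: number): number {
  return ((msB - msA) / (docsB - docsA)) * 1000;
}

// Approximate remove() timings on a Nexus 5, as quoted above.
const removePlain = msPer1000Docs(50, 1000, 2200, 100000);
const removeEncrypted = msPer1000Docs(50, 1000, 7100, 100000);
const removeRateIncrease = (removeEncrypted / removePlain - 1) * 100;

console.log(removePlain.toFixed(1), removeEncrypted.toFixed(1), removeRateIncrease.toFixed(0)); // 21.7 71.2 228
```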
Replace performs similarly to find by id and remove, but much closer to find by id than to remove, since it does not remove documents from the collection but simply updates them. It is mostly affected by encryption: the difference is 1.6 s (2.1 s encrypted vs 0.5 s unencrypted) in the 100k documents case, whereas for 8000 documents the difference is 260 ms (100 ms unencrypted versus 360 ms encrypted). This further reinforces that encryption affects large collections the hardest, as mentioned before.
Close all stores and destroy all stores
The closeAll() method executes in constant time (there is some variability in the graph, but it is less than 15 ms). Neither encryption nor collection size affects the performance of this method: even with encryption, the longest it takes to close the accessor to the store is 20 ms.
Similarly, destroy executes at a constant rate, taking slightly longer for the encrypted case on average, but for all collection sizes, regardless of encryption, it took less than 40 ms to execute destroy.
Summary of factors that affect performance
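Based on the discussion of the performance of the JSONStore API methods above, here is a summary of the main factors that affect performance when using JSONStore:
- Encryption: it makes init about seven times slower, and its overhead on find with queries, find by id, count, remove, and replace grows with collection size.
- Collection size: the larger the collection, the slower find with queries becomes; find by id, count, remove, and replace also slow down, especially with encryption.
- Number of documents added at a time: adding more documents per call takes longer and uses more memory, which can run the application out of memory on devices with less memory.
- Number and size of search fields: each indexed search field has to be read and cached for every document added.
- Number of collections initialized: init time grows linearly with the number of collections.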
Best practices for performance
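Based on the previous discussion, here are some best practices to follow when using JSONStore in your hybrid or native application:
- Initialize only the stores and collections that you need at any given time, especially when they are encrypted.
- Add documents in chunks (for example, 500 or 1000 at a time), tuned to the devices you support and the size of your documents.
- Index only the search fields that you really need.
- Keep collections as small as possible, and keep on the device only the data that needs to be there.
- Use limit and offset with find with queries, and (in Worklight V6.2 and above) the filter option to retrieve only the fields you need.
- Use count with a query (Worklight V6.1 and above) instead of counting the results of a find.
- Pass an array of _ids to find by id instead of finding documents one at a time.
- Keep memory usage low to avoid frequent garbage collection.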
In this article I explained, method by method, the different factors that affect JSONStore's performance, and explained the best practices that should be used to get the most out of JSONStore. I also provided the results and the application used, so that you can play around with it and see how it works for you on your devices.
Appendix
Here is the application that was used. It is a Worklight hybrid application, and it contains the sample data used for the tests, as well as the code with the different tests. Some setup is required for each of the different operating systems, as described below.
Setting up the application for Android
Setting up the application for iOS
Setting up the application for Windows
Here are the benchmark results from running the application on a Nexus 5, an iPhone 5, and an HTC W8, all inside a single zip file.