If you read my recent post, Backing Your Guests With Huge Pages, you might remember me mentioning that the ability to back your KVM guests with hugepages is on its way to OpenStack. This functionality is planned for the Juno release. If you are just learning about hugepages, as I am, you may be wondering how this functionality could benefit you.
What This Means for OpenStack
OpenStack Compute, also known as Nova, is a feature-rich service that streamlines the creation, use, and management of virtual machines. That being said, it still cannot expose all of the functionality that your hypervisor provides. The hugepages blueprint, among others, brings OpenStack closer to providing you with all of the options you could use to tweak your guest machines.
It should be noted that the hugepages blueprint is targeted towards hypervisors that use libvirt.
General Benefits of Hugepages
The advantages of hugepages are not widely applicable; they show up only in very specific workloads. Nevertheless, your workload may be one that does benefit from hugepages, so they are worth knowing about.
Hugepages have several interesting properties. Two of these properties can be quite useful, and though I mentioned them in my previous post, allow me to expand upon each.
Property: Each hugepage is, as the name implies, large. Its size is a multiple of the standard page size.
Benefits: It goes without saying that you can fit more in a hugepage than in a standard page. What you might not have considered is how this improves the performance of TLB (translation lookaside buffer) lookups. Each TLB entry maps one page, so with hugepages fewer entries are needed to cover the same amount of memory. Address translations then miss the TLB less often, and finding an address is faster on average.
Drawbacks and Other Observations: You should only use hugepages if your workload will actually use the space in them. Otherwise you will end up with memory that is reserved but largely unused (internal fragmentation). It should also be noted that the speedup seen in TLB lookups will not be the same on every machine. Let's take a look at an example comparing two hypothetical machines, A and B. For simplicity, let's assume that both machines have the same amount of memory and the same standard page size, while the hugepage size is larger on Machine A. On Machine A, 10 hugepages take up 40% of memory. On Machine B, 10 hugepages take up only 20% of memory. This means that covering the same amount of memory requires more TLB entries on Machine B than on Machine A. Therefore, the benefits of hugepages would be more pronounced on Machine A than on Machine B.
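To make the TLB coverage argument concrete, here is a minimal Python sketch of the arithmetic. The page sizes (4 KiB standard, 2 MiB huge) and the 1 GiB region are assumptions chosen for illustration and are not taken from any particular machine. The function names are hypothetical, and the sketch assumes one TLB entry per page.

```python
def tlb_entries_needed(region_bytes, page_bytes):
    """Number of pages (one TLB entry each) needed to cover a memory region."""
    # Ceiling division: a partially used page still needs its own entry.
    return -(-region_bytes // page_bytes)


def entries_to_cover_all_memory(hugepages, percent_of_memory):
    """If `hugepages` pages cover `percent_of_memory` percent of RAM,
    return how many such pages would cover all of it."""
    return -(-(hugepages * 100) // percent_of_memory)


KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3

# Assumed sizes, for illustration only.
region = 1 * GIB
print(tlb_entries_needed(region, 4 * KIB))  # 262144 entries with standard pages
print(tlb_entries_needed(region, 2 * MIB))  # 512 entries with hugepages

# Machines A and B from the example above.
print(entries_to_cover_all_memory(10, 40))  # Machine A: 25
print(entries_to_cover_all_memory(10, 20))  # Machine B: 50
```

Machine B needs twice as many entries as Machine A to cover the same memory, which is why the benefit is expected to be smaller there.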
Property: Hugepages are pinned in memory. This means they are never swapped out of RAM to disk.
Benefits: The overhead of paging in and out is effectively eliminated for data held in hugepages. For workloads that would otherwise be swapped, this can provide a significant boost in performance.
Drawbacks and Other Observations: Main memory can fill up pretty quickly, and if you aren't swapping pages in and out, your perceived memory size will be much smaller. Place that on top of underutilized hugepages and you have a real disaster on your hands! That being said, you do not have to use only hugepages on your host machine, nor would I necessarily recommend that you do. Your host can have a portion of memory backed by hugepages, leaving the rest to be backed by normal pages.
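If you want to see how much of a Linux host's memory is set aside for hugepages, the kernel reports it in /proc/meminfo. The following is a rough Python sketch that parses that format. The helper names are hypothetical, and the sample text is made-up data for illustration, not output from a real machine.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])
    return fields


def hugepage_summary(fields):
    """Summarize the hugepage pool; sizes are in kB, as in /proc/meminfo."""
    page_kb = fields["Hugepagesize"]
    total = fields["HugePages_Total"]
    free = fields["HugePages_Free"]
    reserved_kb = total * page_kb
    return {
        "reserved_kb": reserved_kb,
        "in_use_kb": (total - free) * page_kb,
        "share_of_ram": reserved_kb / fields["MemTotal"],
    }


# Made-up sample: 8 GiB of RAM with 1024 hugepages of 2 MiB each.
SAMPLE = """\
MemTotal:        8388608 kB
HugePages_Total:    1024
HugePages_Free:      768
Hugepagesize:       2048 kB
"""

print(hugepage_summary(parse_meminfo(SAMPLE)))
```

On a real Linux host you would pass the contents of /proc/meminfo instead of SAMPLE. A large gap between reserved_kb and in_use_kb is a sign that the hugepage pool is underutilized.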
It is difficult to come up with use cases that are general enough to apply to a wide audience, yet specific enough to be put into practice. So while the following section may be helpful, my advice is that you keep the properties and benefits mentioned above in the back of your head and consider how they apply to new workloads. Keep in mind that these are all strictly theoretical and should be taken with a grain of salt.
In broad terms, what you are looking for are workloads that frequently access large chunks of data. For example, perhaps a chunk of code needs to be accessed very frequently but is too large to fit in a single standard page. While you could of course spread this chunk of code across several pages, placing it in a hugepage instead could improve performance.
In a more specific case, imagine you have a database of books. Each record has a title field, an author field, etc. However, this database is unique in that each record has a field that contains the entire text of the book. For simplicity, assume that each book's text will use most of a hugepage but is unable to fit in a standard page. You write an application to utilize this database and want to bring the text of several books into memory so you can search them for a certain phrase.
Let's consider solving this problem with standard pages. Searching can take a bit of time, and in that time a chunk of one of the books may get paged out to disk by the operating system. Once you reach that chunk of text in your search, you need to go out to the disk, find the page, and bring it back into memory. Additionally, the text of a book is quite large and will need to be spread across several standard pages, each needing its own TLB entry.
By instead using hugepages, you reduce the overhead of checking the TLB, as each book only requires one entry. You are also assured that the book's full text stays in memory, which means you won't incur the overhead of paging.
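The trade-off in the book example can be put into numbers with a short Python sketch. All sizes here are assumptions for illustration: a 1.5 MiB book text, 4 KiB standard pages, and 2 MiB hugepages. The function name is hypothetical.

```python
def page_footprint(data_bytes, page_bytes):
    """Return (pages needed, unused bytes in the last page) for one item."""
    pages = -(-data_bytes // page_bytes)  # ceiling division
    unused = pages * page_bytes - data_bytes
    return pages, unused


KIB, MIB = 1024, 1024 ** 2

# Assumed sizes, for illustration only.
book_text = 3 * MIB // 2      # a 1.5 MiB book
standard_page = 4 * KIB
hugepage = 2 * MIB

print(page_footprint(book_text, standard_page))  # (384, 0)
print(page_footprint(book_text, hugepage))       # (1, 524288)

# A small 10 KiB record wastes almost the entire hugepage.
print(page_footprint(10 * KIB, hugepage))        # (1, 2086912)
```

Under these assumptions the book needs 384 standard pages but only one hugepage, at the cost of half a MiB left unused. The last line shows why hugepages are a poor fit for data that does not fill them.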
Finally, I'd like to direct you to a success story involving hugepages.
Once again, expect to see hugepage support for KVM guests in OpenStack's Juno release. Also look out for a post on using this feature on Power8 machines around release time. There, I'll be posting step-by-step instructions for creating guests with hugepages in OpenStack. If you are interested in keeping up with the development of this feature, or simply want more information, I highly suggest that you look through the blueprint.
Hopefully this post helped you further understand this interesting feature. Who knows, maybe the next time you need to optimize your machine for a certain problem, hugepages will come to your rescue?