Memory overlays and addressing errors are difficult problems to diagnose and service. Growing software size and complexity further complicate the situation. Many software components in the AIX® operating system share the kernel address space.
POWER6™ processors and AIX Version 6.1 now have storage protection keys which the kernel extensions and device drivers can use to improve the reliability and serviceability of the system. In this article, learn about the new storage protection mechanism, and how to take advantage of storage protection keys to improve the Reliability, Availability, and Serviceability (RAS) characteristics of an existing device driver or kernel extension.
Storage protection keys and their support by the AIX kernel are new capabilities introduced with POWER6 processors and AIX Version 6.1. In this article, storage protection keys are also called storage keys or keys. Keys provide a context-sensitive storage protection mechanism for virtual memory pages. Software might use keys to protect multiple classes of data individually and to control access to data on a per-context basis. This differs from the older page protection mechanism, which is global in nature.
Storage keys are available in both kernel-mode and user-mode application binary interfaces (ABIs). In kernel-mode ABIs, storage key support is known as kernel keys. In user space, storage keys are called user keys. A kernel extension might have to concern itself with both types of keys.
A memory protection domain generally uses multiple storage protection keys to achieve additional protection. AIX Version 6.1 divides the system into four memory protection domains:
- Kernel public
- Kernel data that is available without restriction to the kernel and its extensions, such as stack, bss, data, and areas allocated from the kernel or pinned storage heaps.
- Kernel private
- Data that is largely private within the AIX kernel proper, such as the structures representing a process.
- Kernel extension
- Data that is used primarily by kernel extensions, such as file system buf structures.
- User
- Data in an application address space that might be using key protection to control access to its own data.
One purpose of the various domains, except for kernel public, is to protect data in a domain from coding accidents in another domain. To a limited extent, you can also protect data within a domain from other subcomponents of that domain. When coding storage protection into a kernel extension, you can achieve some or all of the following RAS benefits:
- Protect data in user space from accidental overlay by your extension.
- Respect private user key protection used by an application to protect its own private data.
- Protect kernel private data from accidental overlay by your extension.
- Protect your private kernel extension data from accidental overlay by the kernel, by other kernel extensions, and even by subcomponents of your own kernel extension.
Storage protection keys are not meant to be used as a security mechanism. Keys are used following a set of voluntary protocols by which cooperating subsystem designers can better detect, and subsequently repair, programming errors.
Degrees of storage key protection
A kernel extension might support storage protection keys to varying degrees, depending on its unique requirements. These degrees are:
- Key-unsafe kernel extension
- A key-unsafe kernel extension does not contain any explicit support for storage protection keys. Extensions in this class are legacy code (older code written without regard to storage key protection), which needs unrestricted access to all memory.
It is the kernel's responsibility to ensure that legacy code continues to function as it did on prior AIX releases and on hardware without storage key support, even though such code might access kernel private data.
- Key-safe kernel extension
- A key-safe kernel extension manages its access to memory, respecting the boundaries of the kernel and user domains. It does not directly reference either kernel private data structures or user space addresses. To become key-safe, the extension must explicitly select the existing memory domains which it intends to access. This protects the rest of the system from errors in the key-safe module.
- Key-protected kernel extension
- A key-protected kernel extension goes beyond key safe; it identifies and protects its own private data, as well as data in other domains from accidental access. This can be done by using a private kernel key, or by taking advantage of a shared key that you are already using.
A kernel extension that is either key-safe or key-protected is called key-aware. To make a kernel extension key-aware, you must understand the kernel's use of keys. To make a kernel extension key-protected, you must also define its private or semi-private data and how it uses keys to protect that data. A semi-private key might be used to share data among several related kernel extensions, while a private key would be for the exclusive use of a single extension.
Hardware storage protection with keys is supported beginning with POWER6 processors. The hardware architecture added two new items:
- Each virtual memory page has a storage protection key associated with it. This key is an integer between zero and 31 (the architected maximum; the maximum is smaller on current machines).
- Each processor has a 64-bit Authority Mask Register (AMR). The AMR comprises 32 pairs of bits, one pair for each storage protection key. Each pair contains a read and a write access bit that determines if the processor can read or write into virtual memory pages protected with the associated storage protection key.
If a program does not have sufficient access with the AMR to the key associated with the data being referenced, a data storage interrupt (DSI) exception occurs.
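The access check can be pictured with a small user-space model. This is only a sketch: the position of each bit pair, the "set bit grants access" polarity, and the names `sim_amr_t`, `sim_amr_grant`, and `sim_access_ok` are assumptions made for illustration, not the hardware encoding or an AIX interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NUM_KEYS 32u   /* architected maximum number of keys, per the text */
#define SIM_READ     0x1u  /* hypothetical meaning of the bits within a pair */
#define SIM_WRITE    0x2u

typedef uint64_t sim_amr_t; /* 32 pairs of bits, one pair per key */

/* Return a copy of amr with the given rights added for one key. */
static sim_amr_t sim_amr_grant(sim_amr_t amr, unsigned key, unsigned rights)
{
    assert(key < SIM_NUM_KEYS);
    return amr | ((sim_amr_t)(rights & 0x3u) << (2u * key));
}

/* True if amr holds every right in 'wanted' for the key on the page.
 * A false result models the case that raises a DSI on real hardware. */
static bool sim_access_ok(sim_amr_t amr, unsigned page_key, unsigned wanted)
{
    assert(page_key < SIM_NUM_KEYS);
    unsigned held = (unsigned)((amr >> (2u * page_key)) & 0x3u);
    return (held & wanted) == wanted;
}
```

In this model, a context granted only read access to key 5 passes the check for `SIM_READ` on a key-5 page but fails it for `SIM_WRITE`.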
There are relatively few hardware keys available. The kernel provides an abstraction called kernel keys, which map many-to-one onto hardware keys. The limited number of hardware keys means that some degree of false sharing of data is unavoidable. Because many kernel keys might map to a single hardware key, users of such kernel keys are not protected from each other.
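To see why sharing a hardware key removes protection between kernel keys, consider the following sketch. The modulo mapping and the function names are hypothetical; the text does not describe how the kernel actually assigns kernel keys to hardware keys.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical many-to-one mapping: fold the kernel key number onto
 * however many hardware keys the machine provides. */
static unsigned sim_kkey_to_hkey(unsigned kkey, unsigned num_hkeys)
{
    assert(num_hkeys > 0u);
    return kkey % num_hkeys;
}

/* Two kernel keys are protected from each other only when they
 * land on different hardware keys. */
static bool sim_keys_isolated(unsigned a, unsigned b, unsigned num_hkeys)
{
    return sim_kkey_to_hkey(a, num_hkeys) != sim_kkey_to_hkey(b, num_hkeys);
}
```

With eight hardware keys in this model, kernel keys 3 and 11 both map to hardware key 3, so holding either one implies access to data protected by the other.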
Key-unsafe kernel extension support
To continue running in a key-protected environment, legacy kernel extensions receive special support from the kernel. Any extension converted to use keys is still in an environment with a mixture of key-aware and key-unsafe functions. A key-aware kernel extension might call a service that is in a key-unsafe kernel extension.
When a key-unsafe function is called, the kernel must, in effect, transparently insert special glue code into the call stack between the calling function and the key-unsafe called function. This is done automatically, but it's worth understanding the mechanism, since the inserted glue code is visible in stack callback traces.
When legacy code is called, either directly by calling an exported function or indirectly by using a function pointer, the kernel must:
- Save the caller's current key access rights (held in the AMR).
- Save the caller's link register (LR).
- Replace the current AMR value with one granting broad data access rights.
- Proceed to call the key-unsafe function, with the link register set, so that the called function returns to the next step below.
- Restore the original caller's AMR and LR values.
- Return to the original caller.
The standard C call stack must show no evidence of this tampering, so an additional stack resource is required. The new resource is called the AMR or context stack. The current context stack pointer is maintained in the active kmstsave structure, which holds the machine state for the thread or interrupt context. Use the mst kernel debugger command to display this information. The context stack is automatically pinned for key-unsafe kernel processes. The setjmpx and longjmpx kernel services maintain the AMR and the context stack pointer.
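The save, widen, call, and restore sequence can be sketched in portable C. This is a user-space model under stated assumptions: `sim_amr`, `sim_ctx_stack`, and `sim_call_legacy` are hypothetical names, an all-ones value stands in for "broad data access rights", and the link register handling is omitted because it has no portable C equivalent.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_CTX_DEPTH 16
#define SIM_AMR_BROAD UINT64_MAX  /* stand-in for broad data access rights */

typedef uint64_t sim_amr_t;
typedef int (*sim_legacy_fn)(int);

static sim_amr_t sim_amr;                       /* models the current AMR */
static sim_amr_t sim_ctx_stack[SIM_CTX_DEPTH];  /* models the context stack */
static int sim_ctx_top;

/* Call a key-unsafe function: save the caller's rights on the context
 * stack, widen access, run the function, then restore the rights. */
static int sim_call_legacy(sim_legacy_fn fn, int arg)
{
    assert(sim_ctx_top < SIM_CTX_DEPTH);
    sim_ctx_stack[sim_ctx_top++] = sim_amr;  /* save caller's rights */
    sim_amr = SIM_AMR_BROAD;                 /* grant broad access */
    int rc = fn(arg);                        /* run the legacy code */
    sim_amr = sim_ctx_stack[--sim_ctx_top];  /* restore caller's rights */
    return rc;
}

/* Example legacy function: returns arg only if it ran with broad access. */
static int sim_legacy_probe(int arg)
{
    return (sim_amr == SIM_AMR_BROAD) ? arg : -1;
}
```

Because the saved value lives on a separate stack, the ordinary C frames of the caller and the called function are unchanged by the gate.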
When a context stack frame needs to be logically inserted between standard stack
frames, the affected function (actually, the function's traceback table) is
flagged with an indicator. The debugger recognizes this, and it is able to provide
you with a complete display for the stack trace. The inserted routine is named
hkey_legacy_gate. A similar mechanism is applied at
many of the exported entry points into the kernel, where you might observe the use
of kernel_add_gate and
kernel_replace_gate.
This processing adds overhead when an exported key-unsafe function is called, but only when the called function is external to the calling module. Exported functions are represented by function descriptors that are modified by the loader to enable the AMR changing service to front-end exported services. Intra-module calls do not rely on function descriptors for direct calls and, therefore, are not affected.
All indirect function pointer calls in a key-aware kernel extension go through special kernel-resident glue code that performs the automatic AMR manipulations described above. If you call out this way to key-unsafe functions, the glue code recognizes the situation and takes care of it for you. Hence, a key-aware kernel extension must be compiled with the -qnoinlglue option, so that pointer glue code is not inlined.
The kernel's data is classified into kernel keys according to intended use. A kernel key is a software key that allows the kernel to create data protection classes, regardless of the number of hardware keys available. A kernel keyset is a representation of a collection of kernel keys and the desired read or write access to them. Remember, several kernel keys might share a given hardware key. Most kernel keys are for use only within the kernel (the full list is in sys/skeys.h). Table 1 shows the kernel keys that are likely to be useful to kernel extension developers.
Table 1. Useful kernel keys
| Kernel key | Description |
|---|---|
| KKEY_PUBLIC | This kernel key is always necessary for access to a program's stack, bss, and data regions. Data allocated from the pinned_heap and the kernel_heap is also public. |
| KKEY_BLOCK_DEV | This kernel key is required for block device drivers. Their buf structs must be either public or in this key. |
| KKEY_COMMO | This kernel key is required for communication drivers. CDLI structures must be either public or in this key. |
| KKEY_NETM | This kernel key is required for network and other drivers to reference memory allocated by net_malloc. |
| KKEY_USB | This kernel key is required for USB device drivers. |
| KKEY_GRAPHICS | This kernel key is required for graphics device drivers. |
| KKEY_DMA | This kernel key is required for DMA information (DMA handles and EEH handles). |
| KKEY_TRB | This kernel key is required for timer services (struct trb). |
| KKEY_IOMAP | This kernel key is required for access to I/O-mapped segments. |
| KKEY_FILE_SYSTEM | This kernel key is required to access vnodes and gnodes (vnop callers). |
Because the full list of keys might evolve over time, the only safe way to pick up the set of keys necessary for a typical kernel extension is to use one of the predefined kernel keysets, as shown in Table 2 below.
Table 2. Predefined kernel keysets
| Kernel keyset | Description |
|---|---|
| KKEYSET_KERNEXT | The minimal set needed by a kernel extension. |
| KKEYSET_COMMO | Keys needed for a communications or network driver. |
| KKEYSET_BLOCK | Keys needed for a block device driver. |
| KKEYSET_GRAPHICS | Keys needed for a graphics device driver. |
| KKEYSET_USB | Keys needed for a USB device driver. |
See sys/skeys.h for a complete list of the predefined kernel keysets. These keysets provide read and write access to the data protected by their keys. If you want just read access to keys, those sets are named by simply appending _READ (as in KKEYSET_KERNEXT_READ).
It is acceptable, though not required, to remove unwanted keys from copies of the sets above. For example, KKEY_TRB might be unnecessary because your code does not use timer services. Building a keyset from scratch by explicitly adding just the kernel keys you need is not recommended. The kernel's use of keys is likely to evolve over time, which could make your keyset insufficient in the future. Any new kernel keys defined that kernel extensions might need will be added to the basic predefined keysets above, ensuring that you'll hold these new keys automatically.
The mechanism described earlier that grants broad data access rights to a key-unsafe kernel extension is a type of protection gate called an implicit protection gate. If you make a kernel extension key-aware, you must add explicit protection gates, typically at all entry and exit points of your module. The gates are to ensure that your module has access to the data it requires, and does not have access to data it does not require.
Without a gate at (or near) an entry point, code would run with whichever keys the caller happened to hold. This is something that should not be left to chance. Part of making a kernel extension key-aware is determining the storage protection key requirements of its various entry points and controlling them with explicit protection gates. There are two kinds of gates to choose from at an entry point:
- Add gate
- Allows you to augment the caller's keyset with your own.
Choose an add gate for a service where your caller passes pointers to data that could be in an arbitrary key. Since this data might be protected with a key that is not known to you, it is important that you retain the caller's keys to ensure you can reference the data, while perhaps adding additional keys so that you can also reference any private data that you need.
- Replace gate
- You switch to your own defined set of keys.
Choose a replace gate at an entry point that stands on its own, such as a callback that you have registered with the device switch table. Your parameters are implicit, and you need to pick up the known keys that are necessary to reference this data.
The replace gate is also important in relinquishing the caller's keys; typically the kernel is your caller in these situations, and retaining access to kernel internal data would be inappropriate. The predefined kernel keysets described above should form the basis of the typical replace gate.
In both cases, the protection gate service returns the original AMR value, so you can restore it at your exit points.
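The difference between the two gates, and the role of the returned value, can be modeled in a few lines. This is a user-space sketch, not the AIX services: `sim_gate_add`, `sim_gate_replace`, and `sim_gate_restore` are hypothetical names, and the model assumes that a set bit means the access right is held.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sim_hkeyset_t;

static sim_hkeyset_t sim_amr;  /* models the AMR; set bit = right held (assumed) */

/* Add gate: keep the caller's rights and add ours. Returns the prior value. */
static sim_hkeyset_t sim_gate_add(sim_hkeyset_t mine)
{
    sim_hkeyset_t old = sim_amr;
    sim_amr = old | mine;
    return old;
}

/* Replace gate: drop the caller's rights and hold only ours. */
static sim_hkeyset_t sim_gate_replace(sim_hkeyset_t mine)
{
    sim_hkeyset_t old = sim_amr;
    sim_amr = mine;
    return old;
}

/* Restore gate: put back the value saved at the entry point. */
static void sim_gate_restore(sim_hkeyset_t old)
{
    sim_amr = old;
}

/* Example entry point: replace on entry, restore on exit.
 * Returns the rights that were in effect inside the body. */
static sim_hkeyset_t sim_entry_point(sim_hkeyset_t my_hset)
{
    sim_hkeyset_t old = sim_gate_replace(my_hset);
    sim_hkeyset_t inside = sim_amr;  /* the body would run with these rights */
    sim_gate_restore(old);
    assert(sim_amr == old);
    return inside;
}
```

Every exit path of a gated function needs the matching restore; in this sketch a single exit keeps that easy to verify.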
If a directly called service is not passed any parameters pointing to data that might be in an arbitrary key, a replace gate should be used in preference to an add gate, as this is a stronger form of protection. Generally, calls within your module do not need gates, unless you want to change protection domains within your module as part of a multiple keys component design.
Protection gates might be placed anywhere in your program flow, but it's often simplest to identify and place gates at all the externally visible entry points into your module. However, there is one common exception. You can defer the gate briefly while taking advantage of your caller's keys to copy potentially private data being passed into public storage, and then switch to your own keyset with a replace gate. This technique yields stronger storage protection than the simpler add gate at the entry point. When using this approach, you must restore the caller's keys before public data can be copied back through a parameter pointer. If you need both the caller's and your own keys simultaneously, you must use an add gate.
To identify the entry points of your kernel extension, be sure to consider the following typical entry points:
- Device switch table callbacks, such as:
open, close, read, write, ioctl, strategy, select, print, dump, mpx, revoke
The config entry point for a device driver typically does not have a protection gate, but it includes the initialization necessary for protection gates, heaps, and so on for subsequent use by other entry points. The entry point configuring device instances would typically have protection gates.
- Struct ndd callbacks used in network drivers, such as:
ndd_open, ndd_close, ndd_output, ndd_ctl, nd_receive, nd_status, ndd_trace
- Trb (timer event) handler
- Watchdog handler
- Enhanced I/O error handling (EEH) handlers
- Interrupt handler (INTCLASSx, iodone, and offlevel)
- Environmental and power warning (EPOW) handler
- Exported system calls
- Dynamic reconfiguration (DR) and high-availability (HA) handlers
- Shutdown notification handler
- RAS callback
- Dump callback (for example, as set up using dmp_add or dmp_ctl(DMPCTL_ADD,…))
- Streams callback functions
- Process state change notification (proch) handlers
- Function pointers passed outside of your module
You generally need protection gates only to set up access rights to non-parameter private data that your function references. It is the responsibility of called programs to ensure the ability to reference any of their own private data. When your caller's access rights are known to be sufficient, protection gates are not needed.
Once you've determined the kernel key requirements for a protection gate, you need to construct a hardware keyset (hkeyset_t) to pass to a protection gate service. In this multi-step operation, you:
- Build a kernel keyset (kkeyset_t), typically based on one of the exported kernel keysets such as KKEYSET_KERNEXT.
- Perhaps add or remove individual kernel key access rights.
- Convert this kernel keyset to a hardware keyset.
Doing this work once, in advance, minimizes overhead at your protection gate. The creation of kernel and hardware keysets must be done in fully enabled process mode.
The services provided for constructing a kernel keyset and the equivalent hardware keyset are:
- kkeyset_create
- Creates an empty kernel keyset into which you subsequently place the required
kernel keys whose access rights you need. For example:
#include <sys/skeys.h>
kerrno_t kerrno;
kkeyset_t kset = KKEYSET_INVALID;
kerrno = kkeyset_create(&kset);
- kkeyset_add_set
- The recommended next step, typically for future use in a replace gate, is to
add all the key access rights from one of the exported kernel keysets to your
new kernel keyset. If you do not need to add or remove any individual kernel
keys to or from your keyset, this completes the construction of your kernel
keyset. For example:
kerrno = kkeyset_add_set(kset, KKEYSET_KERNEXT);
- kkeyset_add_key
- If you need access to specific private data, add the required kernel key
to your keyset. You have the choice of adding the key with read, write, or read
and write access (KA_READ, KA_WRITE, or KA_RW flag passed with the kernel key
desired). For example:
kerrno = kkeyset_add_key(kset, KKEY_BLOCK_DEV, KA_RW);
- kkey_assign_private
- With a key-protected driver, you might want to choose at least one unique private kernel key, and protect your data with this key (or keys). Kernel extensions that are packaged with the base operating system can register keys with unique names, visible in sys/skeys.h. The kernel also recognizes 32 predefined private keys. It is best to let the system pick one for you by calling the kkey_assign_private service.
kkey_t my_key;
kerrno = kkey_assign_private("myID",
                             0,        /* instance # */
                             0,        /* always 0 */
                             &my_key);
This service hashes the string passed as its first parameter to determine a private kernel key number, and then folds in the instance number. If you want two private keys, you should call this service twice with the same ID string: first with an instance number of zero, and then with an instance number of one. To the extent possible, kkey_assign_private tries to return consecutive key instances that are most likely to map to distinct hardware keys.
- kkeyset_remove_key
- Use this service to remove a well-known kernel key that you do not need from
your keyset. For example, if your extension does not use timer services:
kerrno = kkeyset_remove_key(kset, KKEY_TRB, KA_RW);
- kkeyset_remove_set
- This service removes all the keys in one keyset from another.
kerrno = kkeyset_remove_set(kset1, kset2);
- kkeyset_to_hkeyset
- This service converts the access rights defined in a kernel keyset into a
hardware keyset in the same format as the AMR. You'll use this hardware keyset
later in your protection gates, so it should be in globally visible public
memory that is typically contained in your program's bss or data region.
hkeyset_t my_hset;
kerrno = kkeyset_to_hkeyset(kset, &my_hset);
- kkeyset_delete
- Once you've created a hardware keyset, there is no need to retain the kernel
keyset used in its construction. This service releases resources allocated by
kkeyset_create.
kerrno = kkeyset_delete(kset);
The data inside a kernel keyset is kernel private data, protected by KKEY_KKEYSET. The services above are examples of a key-protected design, with protection gates granting access to KKEY_KKEYSET. The contents of a kernel keyset are deliberately opaque to you, and you are not expected to access this data other than through the services provided.
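The overall flow of these services (start from a base set, adjust individual keys, then convert) can be illustrated with a user-space model. Everything here is an assumption made for the sketch: the real kkeyset_t is opaque, so the array representation, the `sim_` names, the key counts, and the modulo mapping onto hardware keys are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_NUM_KKEYS 64u  /* assumed number of kernel keys */
#define SIM_NUM_HKEYS 8u   /* assumed number of hardware keys */
#define SIM_KA_READ   0x1u
#define SIM_KA_WRITE  0x2u
#define SIM_KA_RW     (SIM_KA_READ | SIM_KA_WRITE)

typedef struct { uint8_t rights[SIM_NUM_KKEYS]; } sim_kkeyset_t;
typedef uint64_t sim_hkeyset_t;

/* Start with an empty keyset. */
static void sim_kset_init(sim_kkeyset_t *ks) { memset(ks, 0, sizeof(*ks)); }

/* Add rights for one kernel key. */
static void sim_kset_add_key(sim_kkeyset_t *ks, unsigned kkey, unsigned rights)
{
    assert(kkey < SIM_NUM_KKEYS);
    ks->rights[kkey] |= (uint8_t)(rights & SIM_KA_RW);
}

/* Remove rights for one kernel key. */
static void sim_kset_remove_key(sim_kkeyset_t *ks, unsigned kkey, unsigned rights)
{
    assert(kkey < SIM_NUM_KKEYS);
    ks->rights[kkey] &= (uint8_t)~(rights & SIM_KA_RW);
}

/* Merge every right held in src into dst. */
static void sim_kset_add_set(sim_kkeyset_t *dst, const sim_kkeyset_t *src)
{
    for (unsigned k = 0; k < SIM_NUM_KKEYS; k++)
        dst->rights[k] |= src->rights[k];
}

/* Convert to an AMR-style value: each kernel key contributes its rights
 * to the bit pair of the hardware key it maps to (mapping is assumed). */
static sim_hkeyset_t sim_kset_to_hset(const sim_kkeyset_t *ks)
{
    sim_hkeyset_t hs = 0;
    for (unsigned k = 0; k < SIM_NUM_KKEYS; k++) {
        unsigned hkey = k % SIM_NUM_HKEYS;
        hs |= (sim_hkeyset_t)ks->rights[k] << (2u * hkey);
    }
    return hs;
}
```

The conversion is the expensive step in this model, which is one way to picture why the text recommends doing it once in advance and keeping only the resulting hardware keyset.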
Use the following services to code your actual protection gates. You must use the XLC compiler to correctly compile functions calling these services, which are implemented using inline assembler language specified with #pragma.
- hkeyset_add
- This service implements an add protection gate, where the access rights in the
hardware keyset you provide augment those already present in the AMR.
hkeyset_t old_hset;
old_hset = hkeyset_add(my_hset);
- hkeyset_replace
- This service provides a replace protection gate, where the access rights in
the hardware keyset you provide replace those in the AMR.
old_hset = hkeyset_replace(my_hset);
- hkeyset_restore
- This service complements the two above, and it is used at the exit points of
your function to restore the AMR to the value it had on entry, using the value
that the above services returned.
hkeyset_restore(old_hset);
As with the kernel keyset, the services creating and using hardware keysets deliberately hide the internal format of the hardware keyset from you (although the hkeyset_t is actually a long (64-bit) integer that you own and could examine). The best way to display hardware and kernel keysets is with the kernel debugger.
There is no remove gate service. Remember that multiple kernel keys can share a single hardware key, and removing one hardware key from the AMR has the (likely unfortunate) side effect of simultaneously removing many kernel key access rights. The only correct way to omit a kernel key from a hardware keyset is to omit it from the kernel keyset used to construct that hardware keyset.
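The side effect described above can be demonstrated with a small model. The modulo mapping, the eight-key hardware limit, and all `sim_` names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NUM_HKEYS 8u   /* assumed number of hardware keys */
#define SIM_RW        0x3u /* read and write bits of one pair */

typedef uint64_t sim_hkeyset_t;

/* Hypothetical many-to-one mapping of kernel keys to hardware keys. */
static unsigned sim_hkey_of(unsigned kkey) { return kkey % SIM_NUM_HKEYS; }

/* Build a hardware keyset granting read/write for each listed kernel key. */
static sim_hkeyset_t sim_hset_from_kkeys(const unsigned *kkeys, unsigned n)
{
    assert(n == 0u || kkeys != 0);
    sim_hkeyset_t hs = 0;
    for (unsigned i = 0; i < n; i++)
        hs |= (sim_hkeyset_t)SIM_RW << (2u * sim_hkey_of(kkeys[i]));
    return hs;
}

/* True if hs grants read/write to data protected by the kernel key. */
static bool sim_can_access(sim_hkeyset_t hs, unsigned kkey)
{
    return ((hs >> (2u * sim_hkey_of(kkey))) & SIM_RW) == SIM_RW;
}

/* What a "remove gate" would have to do: clear the hardware key
 * behind one kernel key, whatever else shares it. */
static sim_hkeyset_t sim_naive_remove(sim_hkeyset_t hs, unsigned kkey)
{
    return hs & ~((sim_hkeyset_t)SIM_RW << (2u * sim_hkey_of(kkey)));
}
```

In this model kernel keys 1 and 9 share a hardware key, so removing key 9 from the hardware keyset also removes access to key 1. Rebuilding the hardware keyset from a kernel keyset that lists only key 1 keeps that access intact, although data under key 9 remains reachable because of the shared hardware key.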
The kernel debugger has some new and changed commands to help you with storage protection keys, as shown in Table 3.
Table 3. New and changed commands
| Command | Description |
|---|---|
| kkeymap | Displays the available hardware keys and the kernel keys that map to each. |
| kkeymap <decimal kernel key number> | Displays the mapping of the specified kernel key to a hardware key (-1 indicates that the kernel key is not mapped). |
| hkeymap <decimal hardware key number> | Displays all the kernel keys that map to the specified hardware key. |
| kkeyset <address of kkeyset_t> | Displays the kernel key access rights represented by a kernel keyset. The operand of this command is the address of the pointer to the opaque kernel keyset, not the kernel keyset structure itself. |
| hkeyset <64 bit hex value> | Displays the hardware key accesses represented by this value if used in the AMR, and a sampling of the kernel keys involved. |
| dr amr | Displays the current AMR and the access rights it represents. |
| dr sp | Includes the AMR value. |
| mr amr | Allows modification of the AMR. |
| dk 1 <eaddr> | Displays the hardware key of the resident virtual page containing eaddr. |
| mst | Displays the AMR and context stack values. A storage key protection exception is indicated in excp_type as DSISR_SKEY. |
| iplcb | Displays the ibm,processor-storage-keys property of a CPU, which indicates the number of hardware keys supported. This is in the /cpus/PowerPC section of the device tree. |
| vmlog | Shows an exception value of EXCEPT_SKEY for a storage key violation. |
| pft | Displays the hardware key (labeled hkey) value for the page. |
| pte | Displays the hardware key (labeled sk) value for the page. |
| scb | Displays the hardware key default set for the segment. |
| heap | Displays the kernel and hardware keys associated with an xmalloc heap. |
If you want to make a key-safe kernel extension, you should:
- Decide which exported kernel keyset, if any, should be the basis for your module's keyset.
- Optionally, remove any unnecessary keys from your copy of this kernel keyset.
- Convert the kernel keyset to a hardware keyset.
- Place add or replace gates at or near all entry points (except driver initialization) as needed to gain the specific data access rights required by each entry point.
You'll no longer have direct access to user space by default, and you might need to address that issue at a later time.
- Place your restore gates at or near exit points.
- Link your extension with the new -bRAS flag to identify it to the system as RAS-aware.
- Do not specify inline pointer glue, -qinlglue, as mentioned earlier.
If you are compiling with the v9 or later xlC compiler, you must specify -qnoinlglue because the default has changed.
Your initialization or configuration entry point cannot, of course, start off with a protection gate whose underlying hardware keyset it must first compute. Only after setting up the necessary hardware keysets can you implement your protection gates. The computation of these keysets should be done only once (for example, when the first instance of an adapter is created). These are global resources used by all instances of the driver. Until you can use your new protection gates, you must be sure to reference only data that is unprotected, such as your stack, bss, and data regions.
If this is particularly difficult for some reason, you can statically initialize your hardware keyset to HKEYSET_GLOBAL. That initial value allows your protection gates to work even before you have constructed your kernel and hardware keysets, although they would grant the code following them global access to memory until after the hardware keysets have been properly initialized. If your extension accepts and queues data for future asynchronous access, you might also need to use HKEYSET_GLOBAL, but only if this data is allowed to be arbitrarily key-protected by your callers. Use of the global keyset should be strictly minimized.
If you want to be certain that a hardware keyset is not used unexpectedly, statically initialize it to HKEYSET_INVALID. A replace gate with this hardware keyset would revoke all access to memory and cause a DSI almost immediately.
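The effect of the two static initial values can be pictured as follows. This is a model only: the real values of HKEYSET_INVALID and HKEYSET_GLOBAL are not given in the text, so the sketch simply assumes "no rights" and "all rights" bit patterns, and every `sim_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sim_hkeyset_t;

#define SIM_HKEYSET_INVALID ((sim_hkeyset_t)0)   /* model: no access at all */
#define SIM_HKEYSET_GLOBAL  (~(sim_hkeyset_t)0)  /* model: access to every key */

/* Public and statically initialized, so a gate that runs too early
 * fails loudly instead of silently using a half-built keyset. */
static sim_hkeyset_t sim_my_hset = SIM_HKEYSET_INVALID;
static bool sim_keysets_ready;

/* One-time setup, as a configuration entry point might perform when
 * the first adapter instance is created. Later calls are ignored. */
static void sim_config_init(sim_hkeyset_t computed)
{
    if (!sim_keysets_ready) {
        sim_my_hset = computed;
        sim_keysets_ready = true;
    }
}

/* Would a replace gate using sim_my_hset allow read and write
 * access to pages carrying this hardware key? */
static bool sim_gate_allows(unsigned hkey)
{
    assert(hkey < 32u);
    return ((sim_my_hset >> (2u * hkey)) & 0x3u) == 0x3u;
}
```

Starting from the invalid value trades convenience for early detection: any entry point that runs before configuration is denied access to every key in this model.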
Your protection gates protect the kernel's and other modules' data from many of the accidental overlays that might originate in your extension. It should not be necessary to change any of the logic of your module to become key-safe. But your module's own data remains unprotected. The next step is to protect your kernel extension.
Key-protected kernel extension
Making a kernel extension fully key-protected adds more steps to the port. You now must also:
- Analyze your private data, and decide which of your structures can be key-protected.
You might decide that your internal data objects can be partitioned into multiple classes, according to the internal subsystems that reference them, and use more than one private key to achieve this.
- Consider that data allocated for you by special services might require you to hold specific keys.
- Construct hardware keysets as necessary for your protection gates.
- Consider using read-only access rights for extra protection. For example, you might switch to read-only access for private data being made available to an untrusted function.
- Allocate one, or more, private kernel keys to protect your private data.
- Construct a heap (preferably, or use another method for allocating storage) protected by each kernel key you allocate, and substitute this heap, or heaps, consistently in your existing xmalloc and xmfree calls.
When substituting, pay particular attention that you replace use of kernel_heap with a pageable heap, and use of pinned_heap with a pinned one. Also be careful to always free allocated storage back to the heap from which it was allocated. You can use malloc and free as a shorthand for xmalloc and xmfree from the kernel_heap, so be sure to also check for these.
- Understand the key requirements of services you call. Some might only work if you pass them public data.
You need to collect individual global variables into a single structure that can be xmalloc'd with key protection. Only the pointer to the structure and the hardware keyset necessary to access the structure need to remain public.
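A minimal sketch of that restructuring follows. It runs in user space, so calloc and free stand in for xmalloc and xmfree on a keyed heap, and a plain 64-bit integer stands in for the hkeyset_t. The structure fields and all `sim_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Formerly separate global variables, gathered into one structure so
 * that they can live in a single key-protected allocation. */
struct sim_drv_private {
    int      open_count;
    long     bytes_transferred;
    uint32_t flags;
};

/* Only these two items remain in public storage (data or bss). */
static struct sim_drv_private *sim_drv_data;  /* pointer to the private block */
static uint64_t sim_drv_hset;                 /* stand-in for the hkeyset_t */

/* Allocate the private block once. Returns 0 on success, -1 on failure.
 * calloc stands in for an allocation from a keyed heap. */
static int sim_drv_init(uint64_t hset)
{
    assert(sim_drv_data == NULL);
    sim_drv_data = calloc(1, sizeof(*sim_drv_data));
    if (sim_drv_data == NULL)
        return -1;
    sim_drv_hset = hset;
    return 0;
}

/* Release the private block at termination or unconfiguration. */
static void sim_drv_term(void)
{
    free(sim_drv_data);
    sim_drv_data = NULL;
}
```

Existing references such as `open_count` become `sim_drv_data->open_count`, which is usually a mechanical change.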
The private key or keys you allocate share hardware keys with other kernel keys, and perhaps even with each other. This affects the granularity of protection that you can achieve, but it does not affect how you design and write your code. You should write your code with only its kernel keys in mind, and do not concern yourself with mapping kernel keys to hardware keys for testing purposes. Using multiple keys requires additional protection gates, which might not be justifiable in performance sensitive areas.
When protecting your own data, it is generally easiest to create a shared keyed heap for use with xmalloc. You can then allocate memory that is protected by a selected key easily without having to deal with issues caused by varying page sizes. As with kernel keysets, your heap or heaps are a global resource that need only be created once during initialization or configuration.
Using a created shared heap is just as easy as using the existing
kernel_heap and pinned_heap,
which return unprotected (KKEY_PUBLIC) memory. You do not need to write special
code to check if hardware keys are supported. If they are not, the heaps you
create just return public storage.
The attributes of the heap you create are specified by filling out a heapattr_t, which is a structure passed as a parameter to heap_create. Using the private key determined in the kkey_assign_private example, you might create a pair of heaps: one to allocate pinned objects and the other to allocate pageable ones. For example:
#include <sys/malloc.h>
heapattr_t heapattr;
heapaddr_t my_pinned_heap = HPA_INVALID_HEAP;
heapaddr_t my_pageable_heap = HPA_INVALID_HEAP;
bzero(&heapattr, sizeof(heapattr));
heapattr.hpa_eyec = EYEC_HEAPATTR;
heapattr.hpa_version = HPA_VERSION;
heapattr.hpa_flags = HPA_PINNED | HPA_SHARED;
heapattr.hpa_kkey = my_key;
kerrno = heap_create(&heapattr, &my_pinned_heap);
heapattr.hpa_flags = HPA_PAGED | HPA_SHARED;
kerrno = heap_create(&heapattr, &my_pageable_heap);
The heaps created above are both shared heaps, meaning that all heaps created this way actually allocate from the same underlying memory pools in the kernel.
An alternative, which is a little more complicated, is to use the private heap.
For a private heap, specify the HPA_PRIVATE flag instead of HPA_SHARED. In this
case, you're reserving a block of contiguous storage expressly for your
allocations. The block of storage is allocated for you, with the size being
whatever you specify in the hpa_heapsize field. The
minimum supported size for a private heap is 8M. Optionally, you can impose a
smaller limit on how much memory can be allocated from your heap using
hpa_limit. This would, for example, let you vary how
much storage is to be available from your private heap as a function of load,
number of processors online, and so on. The limit might be changed, as follows.
kerrno = heap_modify(my_pinned_heap,
XMHM_HEAP_LIMIT,
new_limit);
With a private heap, you have additional flexibility. All your data will be close together in storage, easier to find in a system dump, and perhaps less likely to be inadvertently overlaid by another part of the kernel. To set the heap size efficiently, you do have to know how much storage will be required. Using a private heap might also increase the memory footprint of the system.
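The relationship between the reserved size and the adjustable limit can be sketched as simple accounting. This is not how the kernel allocator works internally: the structure, the `sim_` names, the reading of "8M" as 8 MiB, and the rule that a limit may not exceed the heap size are all assumptions of the model.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MIN_PRIVATE_HEAP ((size_t)8 * 1024 * 1024)  /* "8M", read as 8 MiB */

struct sim_private_heap {
    size_t heapsize;  /* contiguous storage reserved for the heap */
    size_t limit;     /* how much of it may currently be allocated */
    size_t used;      /* bytes handed out so far */
};

/* Returns 0 on success, -1 if the size or limit is not acceptable. */
static int sim_heap_create(struct sim_private_heap *h, size_t heapsize, size_t limit)
{
    assert(h != NULL);
    if (heapsize < SIM_MIN_PRIVATE_HEAP || limit > heapsize)
        return -1;
    h->heapsize = heapsize;
    h->limit = limit;
    h->used = 0;
    return 0;
}

/* Raise or lower the limit, for example as load changes. */
static int sim_heap_set_limit(struct sim_private_heap *h, size_t new_limit)
{
    assert(h != NULL);
    if (new_limit > h->heapsize)
        return -1;
    h->limit = new_limit;
    return 0;
}

/* Account for an allocation; fails once the limit would be exceeded. */
static int sim_heap_alloc(struct sim_private_heap *h, size_t nbytes)
{
    assert(h != NULL);
    if (nbytes > h->limit || h->used > h->limit - nbytes)
        return -1;
    h->used += nbytes;
    return 0;
}
```

The model shows the trade-off named above: the full heap size is reserved up front, while the limit decides how much of that reservation callers can actually consume at a given time.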
Keyed heaps and affinity-aware heaps are mutually exclusive. When
xmalloc_srad is passed a keyed heap and kernel keys are
enabled, it returns memory that does not have particular affinity to the specified
srad. On systems that do not support keys,
xmalloc_srad continues to support affinity. (Allocations
of public storage from kernel_heap and
pinned_heap always support affinity.)
Private heaps might be garbage collected. Storage allocated from any heap is a real resource, so remember to clean up any created heaps during termination or unconfiguration. All storage allocated from a heap should be freed and the heap destroyed with:
kerrno = heap_destroy(my_pinned_heap, 0);
You must have current read and write access to the key associated with your heap
when calling xmalloc or xmfree. It's not necessary to hold this key to use
heap_create, heap_modify, or
heap_destroy.
Creating a keyed ldata storage pool
An ldata pool, like an xmalloc heap, might have a storage protection key associated with it.
int ldata_create(size_t element_size,
                 long initcount,
                 long maxcount,
                 kkey_t kkey,
                 ldata_t *ldatap);
In previous releases, the fourth parameter was required to be zero. It now specifies a kernel key to apply to storage that is later allocated from the pool. If you want to continue to have public storage in your pool, you might specify KKEY_PUBLIC for the kkey parameter, or simply leave it zero.
A keyed ldata pool remains affinity-aware.
You must have current read and write access to the key associated with your ldata
pool when calling ldata_alloc or
ldata_free. It is not necessary to hold this key when
calling the other ldata services:
ldata_create, ldata_destroy, and
ldata_grow.
While the xmalloc heaps described above are probably the more convenient approach, your kernel extension might already be using an allocated memory segment. You can set a default storage key on your segment with:
kerrno_t vm_setseg_kkey(vmid_t sid, kkey_t kkey);
This service does not set a key on pages already allocated in the segment. It sets a default key for future allocations. Therefore, it is best to use this service immediately after creating a segment, before any pages exist in it. Only working storage and rmmap segments can be used, but they might contain pages of any supported size.
You can also directly set the key on a page-aligned block made up of a whole number of pages in a working storage segment with:
kerrno_t vm_protect_kkey(void *addr,
                         size_t nbytes,
                         kkey_t kkey,
                         ulong flags);
The flags parameter would usually be zero, requiring write access to the memory being protected. To bypass this requirement, specify the VMPK_NO_CHECK_AUTHORITY flag. Memory can be unprotected by specifying the KKEY_PUBLIC public key. Remember that the AIX kernel supports multiple page sizes.
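Because a key applies to whole pages, it can help to check a range against the page size actually backing it before asking for protection. The following is a minimal user-space sketch of that arithmetic, not an AIX service: page_range_t, enclosing_pages, and is_whole_pages are hypothetical helpers, and the page size is passed in by the caller as an assumption about the segment.

```c
#include <stddef.h>
#include <stdint.h>

/* A page-aligned range: start address and length in bytes. */
typedef struct {
    uintptr_t start;
    size_t    nbytes;
} page_range_t;

/* Expand [addr, addr + len) outward to page boundaries.
 * pagesize must be a power of two. */
static page_range_t enclosing_pages(uintptr_t addr, size_t len, size_t pagesize)
{
    page_range_t r;
    uintptr_t mask = (uintptr_t)pagesize - 1;
    uintptr_t end  = (addr + len + mask) & ~mask;   /* round end up */

    r.start  = addr & ~mask;                        /* round start down */
    r.nbytes = (size_t)(end - r.start);
    return r;
}

/* Returns 1 if the range starts on a page boundary and covers
 * a whole number of pages of the given size, 0 otherwise. */
static int is_whole_pages(uintptr_t addr, size_t len, size_t pagesize)
{
    uintptr_t mask = (uintptr_t)pagesize - 1;

    return len != 0 && (addr & mask) == 0 && (len & mask) == 0;
}
```

A range that is a whole number of 4K pages is not necessarily a whole number of 64K pages. Rounding outward, as enclosing_pages does, would also change the key of any unrelated data sharing those pages, so dedicating whole pages to the protected data is usually the safer design.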
You should not use vm_protect_kkey to change the key
of xmalloc’d or ldata_alloc’d pages. When such pages are xmfree’d or ldata_free’d
and subsequently reallocated, they might cause storage key violations in unrelated
code, which is extremely difficult to debug.
It is best to use the services specifically designed to let the kernel access user space, such as copyin, copyout, uiomove, and so on. They have their own protection gates that enable appropriate access to user memory. If kernel extension code directly attaches to a user segment, it must acquire the appropriate user protection keys to avoid a DSI, since user space access is not included in the AMR by default. It is important to remember that user space applications might be using protection keys. You do not want your kernel extension to access protected user memory in violation of the application's design.
To add the current active user keyset to the kernel hardware keyset already in the AMR and then restore your state afterwards, use:
kerror = hkeyset_update_userkeys(&old_hset);
kerror = hkeyset_restore_userkeys(old_hset);
The above ensures that you have the current active user keyset represented in the AMR. Key protection is part of the virtual memory mapping; the user keys are only effective if user space is addressed with the natural mapping.
These services can only be used by process mode code, since the active user keyset only applies to the currently running thread.
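The save, widen, and restore pattern behind these two calls can be illustrated with a simplified user-space model. This is a sketch under stated assumptions: the toy_ names are hypothetical, and the keyset is modeled as one "access granted" bit per hardware key, which is simpler than the real AMR encoding.

```c
#include <stdint.h>

/* Simplified keyset: bit n set means access to hardware key n. */
typedef uint64_t toy_hkeyset_t;

static toy_hkeyset_t toy_amr;          /* stands in for the thread's AMR */
static toy_hkeyset_t toy_user_keyset;  /* stands in for the active user keyset */

/* Models hkeyset_update_userkeys: save the AMR, then add user keys. */
static void toy_update_userkeys(toy_hkeyset_t *old)
{
    *old = toy_amr;
    toy_amr |= toy_user_keyset;
}

/* Models hkeyset_restore_userkeys: put the saved AMR back. */
static void toy_restore_userkeys(toy_hkeyset_t old)
{
    toy_amr = old;
}

static int toy_can_access(unsigned hkey)
{
    return (int)((toy_amr >> hkey) & 1u);
}

/* Reads an object guarded by hkey; -1 models a storage key violation. */
static int toy_read_user(unsigned hkey, const int *obj, int *out)
{
    if (!toy_can_access(hkey))
        return -1;
    *out = *obj;
    return 0;
}

/* Kernel-extension style access: widen the keyset only around the access. */
static int toy_driver_peek(unsigned hkey, const int *obj, int *out)
{
    toy_hkeyset_t old;
    int rc;

    toy_update_userkeys(&old);
    rc = toy_read_user(hkey, obj, out);
    toy_restore_userkeys(old);
    return rc;
}
```

In the model, a key that is not in the active user keyset stays inaccessible even inside toy_driver_peek, which mirrors the point that the extension should see no more of user memory than the application's own keyset allows.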
The cross memory descriptor has been enhanced to contain an AMR value, allowing a keyset to be bound to a buffer and protecting memory references made by the bottom half of a device driver. The protection is applied by:
- xmemin
- xmemout
- xmemzero
- xmemdma64
- xlate_pin
- Related services
You can add the desired hardware keyset to a cross memory descriptor with:
kerrno_t xmsethkeyset(struct xmem *, hkeyset_t, 0);
However, if you perform a temporary attach with
vm_att, xm_mapin, or
xm_att, you must separately activate your keyset for
the pages they have attached. You can use xmgethkeyset
to obtain the cross memory descriptor's keyset, and activate it with a normal
replace gate.
You can also place the current AMR in the cross memory descriptor automatically at xmattach time by specifying the new SYS_ADSPACE_ASSIGN_KEYSET segflag parameter. The existing SYS_ADSPACE segflag results in global access to memory, and the USER_ADSPACE segflag automatically uses the active user keyset.
When storage keys are not in use
You do not need to design two paths through your kernel extension to handle cases
where kernel storage keys either exist, or do not exist, on a system. All the
key-related services handle this for you. When a protection gate is called on a
system without kernel keys enabled, the calling code is dynamically patched to a
No OPeration (NOP) instruction, which essentially eliminates the overhead of the
coded gate. The return value from hkeyset_add, for
example, is then not the previous AMR value that it would be if kernel keys were
enabled. There's no harm in passing this value to
hkeyset_restore, since that call is patched out as well and is never made.
You can use the __KKEY_ENABLED() macro in sys/systemcfg.h to test whether kernel keys are enabled at run time, if necessary.
Protection gates add overhead to the functions that contain them, whether they're the implicit gates used by key-unsafe extensions, or the explicit gates you use to make your extension key-aware.
If you make a key-safe extension by just adding the minimal entry and exit point protection gates, it might actually run a little bit faster than it otherwise would on a keys-enabled system, since explicit gates do not use the context stack. You must trade off granularity of protection against overhead as you move into the key-protected realm, though. Adding protection gates, for example, within a loop for precise access control to some private object might result in unacceptable overhead. Try to avoid such situations, where possible, in the framework of your specific key-protected design.
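The cost difference between gating inside a loop and gating around it can be made concrete by counting gate executions in a user-space model. The toy_gate_add and toy_gate_restore functions below are hypothetical stand-ins for a protection gate pair; they only increment a counter and do not touch any keyset.

```c
#include <stddef.h>

static unsigned long gate_calls;   /* counts modeled gate executions */

/* Stand-ins for an add/restore protection gate pair (hypothetical). */
static unsigned toy_gate_add(void)
{
    gate_calls++;
    return 0;                      /* pretend "previous keyset" */
}

static void toy_gate_restore(unsigned old)
{
    (void)old;
    gate_calls++;
}

/* Fine-grained: opens and closes access around every element. */
static long sum_gate_per_element(const int *priv, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned old = toy_gate_add();
        sum += priv[i];
        toy_gate_restore(old);
    }
    return sum;
}

/* Coarse-grained: one gate pair around the whole loop. */
static long sum_gate_hoisted(const int *priv, size_t n)
{
    long sum = 0;
    unsigned old = toy_gate_add();

    for (size_t i = 0; i < n; i++)
        sum += priv[i];
    toy_gate_restore(old);
    return sum;
}
```

Both functions compute the same result, but the per-element version executes two gates per iteration while the hoisted version executes two in total. The trade-off is that the hoisted version holds access to the private key for the full duration of the loop.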
In general, you don't need to change the functional logic of a kernel extension to make it key-safe, or even key-protected, other than adding protection gates and reorganizing and moving data to key-protected pages. However, there are some restrictions and observations worth noting:
- You must hold the key necessary to access protected memory when calling xmalloc to allocate it or xmfree to release it.
- You must hold the key necessary to access protected memory when calling ldata_alloc to allocate it or ldata_free to release it.
- If 16M or larger pages are configured for the kernel heap, kernel keys are disabled.
- Most existing interfaces can accept parameters in key-protected storage, since
you hold the access rights to your protected storage at the time of the call.
Because a function call does not affect the AMR, these access rights are simply
inherited by the called function.
  Some interfaces, however, place storage that you pass to them onto linked lists, such as when registering for a future callback. It is always safe to pass unprotected storage to such routines, although many have been updated to deal with the problem.
- Callers of xmemdma64 that do not set either XMEM_ACC_CHK or XMEM_WRITE_ONLY disable both page and storage key protection.
- If you start a kproc from a key-aware kernel extension, the kproc is assumed
to be user-key aware. The kproc's initial function, set up with
initp, is called with only the kernel public key held, not with the keyset of the creator of the process. This is also true for the initial function set up with kthread_start.
- If you pin or unpin your kernel stack, you must also pin or unpin the context
stack. Use the
pin_context_stack and unpin_context_stack kernel services. Key-aware kprocs are started with their context stack unpinned.
- If you switch to a pre-pinned kernel stack, you must also pin the context stack. You cannot switch context stacks.
- The vm_release kernel service resets a page's key back to its segment's default.
- Any storage driver ioctl, such as scsidisk_ioctl, called from kernel space must be passed an arg structure that is either public or in KKEY_BLOCK_DEV.
- Key-aware code must conform to the requirements imposed on kernel data. In
many cases, there is a well-known kernel key associated with such data. You must
hold this key when accessing protected objects passed to you by kernel callbacks.
You must not protect objects that you allocate with any key other than the one expected by the services you use. You must not return public items for subsequent allocation by a subsystem's protected allocator. The following data objects must either be in the key stated or public:
  | Kernel key | Structure name | Function interfaces |
  |---|---|---|
  | KKEY_TRB | trb | talloc, ... |
  | KKEY_DMA | d_handle | d_map_init, ... |
  | | eeh_handle | eeh_init, … |
  | KKEY_NETM | mbuf | m_get, … |
  | KKEY_BLOCK_DEV | buf | bread, bwrite, clrbuf, iodone, iowait, geterror, devstrat, ddstrategy, … |
  | | dkstat | iostadd, iostdel |
  | KKEY_COMMO | ndd | ns_attach, ndd_open, … |
  | KKEY_IOMAP | (iomap’d segment) | |
  | KKEY_FILE_SYSTEM | vnode | vfs_vget, vnop_open, … |
  | | gnode | gn_opencnt, … |
  | | file | fp_open, fp_close, … |
- Some data objects can only be public:
  - The struct cblock passed to or returned by:
    - putcf
    - putcb
    - getcf
  - The struct clist passed to:
    - getcs
    - putcs
    - getc
    - putc
    - getcb
    - putcb
    - getcx
    - putcx
    - getcbp
    - putcbp
    - putcfl
  - The struct cfgncb passed to:
    - cfgnadd
    - cfgndel
  - The dr_dma_handler_t passed to:
    - dr_register_dma_mapper
    - dr_register_dma_mapperx
    - dr_unregister_dma_mapper
  - The key argument passed to:
    - kext_service_register
    - kext_service_unregister
    - kext_service_request
  - The struct busprt passed to:
    - reg_display_acc
    - unreg_display_acc
    - grant_display_owner
    - revoke_display_owner
- Key-safe path control modules (PCMs) must have access to KKEY_BLOCK_DEV.
- The shutdown_notify_t used to register a shutdown notification routine might be key-protected with any key.
- The struct watchdog used to register a watchdog timer might be key-protected with any key.
- The struct intr used to register an interrupt handler might be key-protected with any key.
- Event list headers might be key-protected with any key.
- The mstsave structures are public.
- Be aware of and use special-purpose allocators. For example, use talloc to allocate a struct trb, and do not xmalloc one yourself.
Be sure to test modified code in both key-enabled and key-disabled environments.
Remember that two kernel keys might map to the same hardware key on machines of today, but the kernel keys might map to unique hardware keys on a future machine. If the number of supported hardware keys increases, always re-test your key-protected code.
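Why shared hardware keys can hide a missing protection gate is easier to see in a small model. The sketch below is an illustration only: the toy_ names, the count of eight kernel keys, and the modulo folding of kernel keys onto hardware keys are all assumptions, not the mapping AIX actually uses.

```c
#define TOY_NKKEYS 8   /* kernel keys in this model (hypothetical count) */

/* Kernel key to hardware key mapping table. */
typedef struct {
    unsigned hkey_of[TOY_NKKEYS];
} toy_keymap_t;

/* Shared mapping: kernel keys are folded onto nhkeys hardware keys. */
static void toy_map_shared(toy_keymap_t *m, unsigned nhkeys)
{
    for (unsigned k = 0; k < TOY_NKKEYS; k++)
        m->hkey_of[k] = k % nhkeys;
}

/* Exclusive mapping: every kernel key gets its own hardware key. */
static void toy_map_exclusive(toy_keymap_t *m)
{
    for (unsigned k = 0; k < TOY_NKKEYS; k++)
        m->hkey_of[k] = k;
}

/* The hardware compares hardware keys only, so access is granted
 * whenever the held kernel key and the page's kernel key map to
 * the same hardware key. Returns 1 if access is granted. */
static int toy_access_ok(const toy_keymap_t *m,
                         unsigned held_kkey, unsigned page_kkey)
{
    return m->hkey_of[held_kkey] == m->hkey_of[page_kkey];
}
```

With four hardware keys in this model, kernel keys 1 and 5 share a hardware key, so code holding only key 1 can touch a page protected with key 5 and the missing gate goes unnoticed. Under the exclusive mapping the same access fails, which is the kind of error that surfaces when more hardware keys become available.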
If you control your test machine, you can temporarily modify the mapping of kernel keys to hardware keys to make one kernel key the exclusive user of its hardware key. Testing with your new keys like this is an excellent way to find out if you missed placing any protection gates, since it eliminates all false sharing of your key. Using an exclusive key mapping is for testing purposes only. IBM does not support customers running in this mode. But here is how to adjust the mapping:
- If you have not already booted the kernel with the
-I flag to cause the system to enter the kernel debugger immediately upon reboot, do so.
- Determine the numerical value of the kernel key of interest. One way to do
this is to use
printf to display it during your code's initialization. Such printfs will only be effective once you have booted with the debugger enabled.
  Another possibility is to place a kdb breakpoint so that you can display the key value, for example, the value returned by a call to
kkey_assign_private. kkey_assign_private will always return the same kernel key when called with the same parameters.
- Reboot. Then, at the initial kdb prompt (this is the only valid time to do
this), change the value of the
skey_xkey integer from -1 to the hexadecimal value of your kernel key.
  For example, if you're using KKEY_PRIVATE1, which is kernel key number 43 (decimal), carry out the following dialogue at the initial kdb prompt. You must press Enter after typing each of your parts of this dialogue.
  - At the kdb prompt,
KDB(0)>, enter mw skey_xkey.
  - After kdb displays
skey_xkey+000000: FFFFFFFF =, enter 2B.
  - After kdb displays
skey_xkey+000004: 00000000 =, enter a dot (.).
  - Enter g at the kdb prompt.
  KDB(0)> mw skey_xkey
  skey_xkey+000000: FFFFFFFF = 2B
  skey_xkey+000004: 00000000 = .
  KDB(0)> g
  After the system boots, display the current mapping of kernel keys to hardware keys with the kdb
kkeymap command. You can activate kdb by typing ^\ (ctrl-backslash) on the machine's virtual console (/dev/vty0) and resume normal operation with g as above. Or, you can use kdb at the command line for this purpose. The selection of an exclusive key this way is only effective for the duration of the current boot.
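Since the key number is typically seen in decimal but must be typed into kdb in hexadecimal, a tiny conversion helper can save a mistake. The function name kkey_to_kdb_hex is hypothetical, and this is plain user-space C rather than anything kdb itself provides.

```c
#include <stdio.h>

/* Formats a kernel key number as uppercase hexadecimal, the form
 * entered at the kdb prompt. Returns the number of characters
 * written (excluding the terminating NUL), or a negative value
 * on an output error. */
static int kkey_to_kdb_hex(unsigned kkey, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%X", kkey);
}
```

For kernel key number 43, the helper produces the string 2B, matching the value entered in the dialogue above.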
In an emergency, if you cannot boot with kernel keys enabled, you'll need a way to reboot so you can fix a problem and try again. Some possibilities are:
- If you have booted the kernel with the
-I flag as described above, you can modify skey_kmode to zero at the initial kdb prompt. This disables kernel keys for the duration of the current boot, and might bypass a key-related error.
- If you have prepared alternate bootable media, as with the
mkcd command, you can boot from it to recover.
- You can boot from original install media (including an available NIM server) in maintenance mode to recover.
Adding storage protection to a kernel extension or device driver can enhance the reliability of both the extension and the AIX kernel by potentially detecting memory overlays that are difficult to debug.
Learn
- Storage Protection Keys on AIX Version 5.3:
This paper talks about protection keys used in application space.
Larry Brenner is a senior IBM AIX kernel developer in Austin, Texas, where he has worked for 12 years. He is team leader for the storage protection keys implementations in AIX releases 5.3 and 6. You can contact him at lbrenner@us.ibm.com.