2024-08-22

As we’ve seen, implementing our own version of the AddVectoredExceptionHandler API isn’t too involved. More importantly, it didn’t really require us to interact with the kernel, aside from calling NtProtectVirtualMemory to change memory protections on the .mrdata section of NTDLL. Since all the information the process uses when calling Vectored Exception Handlers is stored within the process itself, the handler list makes a great target for a threadless process injection technique.

What is threadless process injection? Ceri Coburn covered it in their 2023 talk at BSides Cymru, “Needles Without the Thread.” Funnily enough, this talk came out just before I was due to give a talk at an internal IBM conference demonstrating my new injection technique that didn’t require an execution primitive.

To summarize, traditional process injection techniques require a way to:

Allocate memory in the remote process

Write your code into the allocated memory

Protect the memory in the remote process so that it’s executable

Execute your code in the remote process

We can mix and match these primitives to get different techniques, and some techniques do not need all the steps. For example, if you allocate memory in the remote process as RWX, then you do not need to change the protection later. Or if you call NtMapViewOfSection then your memory is allocated and written into the remote process in the same step. But the one thing all traditional process injection techniques do require is a primitive for execution. This is typically CreateRemoteThread/QueueUserAPC/SetThreadContext (or their Nt function equivalents). As a result, these execution primitives are heavily scrutinized by security products for malicious use. Calling an execution primitive targeting unbacked memory in a remote process is a great way to get your beacon caught.

So, how about we skip the execution primitive entirely? With Vectored Exception Handlers, it works as follows:

Identify the VEH list in our local process, since the address will be the same in the remote process.

Allocate/Write/Protect our shellcode into the remote process with your primitives of choice.

Allocate space for a new Vectored Exception Handler struct in the remote process.

Call EncodeRemotePointer to get an encoded pointer for the address where you wrote your shellcode.

Allocate space for a pointer and an int in the remote process (we need these for the two reserved attributes of the VEH entry).

Update the new VEH entry with valid Flink/Blink attributes, update the pointer and update the two reserved attributes to point to the memory you allocated previously.

Check the IsUsingVEH bit in the remote process Process Environment Block (PEB) and set it, if necessary.

Set a PAGE_GUARD trap on a region of memory that will be executed by the process.
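To make the Flink/Blink bookkeeping concrete, here is a minimal sketch of a circular doubly linked list in plain Python. It is only a toy model of the linking logic: the `ListEntry` class and the `insert_head`, `insert_tail` and `walk` helpers are hypothetical names, and the node carries a label instead of the real VEH entry fields.

```python
class ListEntry:
    """Toy stand-in for a list node: a label plus forward/backward links."""

    def __init__(self, name):
        self.name = name
        # An empty circular list is a head whose links point at itself.
        self.flink = self
        self.blink = self


def insert_tail(head, entry):
    """Link entry in as the last node, just before the head."""
    last = head.blink
    entry.flink = head
    entry.blink = last
    last.flink = entry
    head.blink = entry


def insert_head(head, entry):
    """Link entry in as the first node, just after the head."""
    first = head.flink
    entry.flink = first
    entry.blink = head
    first.blink = entry
    head.flink = entry


def walk(head):
    """Return the labels of all nodes in forward order, excluding the head."""
    names = []
    node = head.flink
    while node is not head:
        names.append(node.name)
        node = node.flink
    return names


veh_list = ListEntry("head")
insert_tail(veh_list, ListEntry("existing_handler"))
insert_head(veh_list, ListEntry("new_handler"))
print(walk(veh_list))
```

Inserting at the head mirrors registering a handler with a nonzero First argument to AddVectoredExceptionHandler, which asks for the handler to be called before the existing ones. Both neighbours must be updated together with the new entry, otherwise a forward or backward walk of the list breaks.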

The last step is the critical one that allows us to bypass the need for an execution primitive by triggering an exception in the remote process. There are a few ways to go about this but a PAGE_GUARD trap is, in my opinion, the best way. I’ve implemented injection techniques for both new and existing processes using PAGE_GUARD traps.

If you are spawning a new process, then you can spawn the process in a suspended state and set a trap at the entry point for the process. Typically, spawning a process in a suspended state and manipulating it will get you tagged for process hollowing behavior. However, since we are not writing to any .text sections or using any execution primitives, we shouldn’t get hit with this detection. But as always, test this in your lab.

Injecting into a running process is a bit more involved, but I’ve found the easiest way is:

Choose a thread in the process.

Suspend the thread.

Get the thread context.

Set a PAGE_GUARD trap at the RIP of the thread.

Resume the thread.

This technique can be a bit unstable if you are executing straight shellcode since it hijacks the thread, which can crash the process. I’ve found it more reliable to add some bootstrap shellcode that implements a proper Vectored Exception Handler that creates a new thread for your shellcode and then returns code execution back to the thread as normal. This local thread creation will not be subject to the same scrutiny as a remote thread creation.
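The decision logic of a well-behaved handler can be modelled without any Windows code. The sketch below is a plain Python model, not the bootstrap shellcode itself: the `OneShotFilter` class is a hypothetical name, and the `action` callback stands in for whatever the handler does when it fires. The three constants are the documented Windows values.

```python
# Documented Windows values.
STATUS_GUARD_PAGE_VIOLATION = 0x80000001
STATUS_ACCESS_VIOLATION = 0xC0000005
EXCEPTION_CONTINUE_EXECUTION = -1  # resume the faulting thread
EXCEPTION_CONTINUE_SEARCH = 0      # pass the exception to the next handler


class OneShotFilter:
    """Model of a handler that reacts to one guard-page exception only."""

    def __init__(self, action):
        self.action = action  # hypothetical callback, an assumption of this sketch
        self.fired = False

    def __call__(self, exception_code):
        # Anything that is not our trap, or any repeat, goes to the next handler.
        if exception_code != STATUS_GUARD_PAGE_VIOLATION or self.fired:
            return EXCEPTION_CONTINUE_SEARCH
        self.fired = True
        self.action()
        # The system clears PAGE_GUARD on first access, so resuming is safe.
        return EXCEPTION_CONTINUE_EXECUTION


events = []
handler = OneShotFilter(lambda: events.append("ran"))
results = [
    handler(STATUS_ACCESS_VIOLATION),
    handler(STATUS_GUARD_PAGE_VIOLATION),
    handler(STATUS_GUARD_PAGE_VIOLATION),
]
print(results, events)
```

Returning EXCEPTION_CONTINUE_SEARCH for everything else matters for stability: the process keeps its normal exception handling, and unrelated faults are not swallowed by the injected handler.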

The last consideration for either technique is that whenever an exception occurs in the process, your VEH will be called and your shellcode will execute again. This can result in a whole bunch of beacons being created in one process, ultimately crashing it. I’ve found the solution to this problem is either the bootstrap shellcode mentioned above, which can check that the exception is a PAGE_GUARD trap, or removing your Vectored Exception Handler from within your newly spawned beacon. This can be done by executing a BOF to walk the VEH list, identify your handler (an encoded pointer to unbacked memory) and remove it through manual manipulation, or by simply calling RemoveVectoredExceptionHandler on it.
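The “unbacked memory” test is also what makes such a handler easy to spot, whether you are cleaning up after yourself or hunting for it as a defender. The sketch below shows only the classification step in plain Python, assuming the handler pointers have already been decoded and the loaded module ranges are known. The function names, module names and every address are made up for illustration.

```python
from bisect import bisect_right


def build_index(modules):
    """modules: iterable of (name, base, size). Returns ranges sorted by base."""
    return sorted((base, base + size, name) for name, base, size in modules)


def owning_module(index, address):
    """Return the name of the module containing address, or None if unbacked."""
    bases = [base for base, _end, _name in index]
    i = bisect_right(bases, address) - 1
    if i >= 0:
        base, end, name = index[i]
        if base <= address < end:
            return name
    return None


def unbacked_handlers(index, handler_addresses):
    """Return the handler addresses that fall outside every known module."""
    return [a for a in handler_addresses if owning_module(index, a) is None]


# Hypothetical module layout and decoded handler pointers.
modules = [
    ("app.exe", 0x7FF600000000, 0x20000),
    ("ntdll.dll", 0x7FFA10000000, 0x1F0000),
]
handlers = [0x7FF600001500, 0x1D4A2B30000, 0x7FFA10012340]

index = build_index(modules)
print([hex(a) for a in unbacked_handlers(index, handlers)])
```

A handler that resolves to no module is not proof of injection on its own, since JIT engines and similar runtimes can legitimately register handlers in private memory, but it is a short list worth reviewing.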