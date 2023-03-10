
While next-generation AI and machine-learning components of security solutions continue to enhance behavior-based detection capabilities, at their core many still rely on signature-based detections. Cobalt Strike, a popular Command and Control (C2) framework used by both threat actors and red teams since its debut, continues to be heavily signatured by security solutions.
To keep Cobalt Strike operationally viable, we on the IBM X-Force Red Adversary Simulation team have in the past invested significant research and development effort in customizing Cobalt Strike with internal tooling. Some of our Cobalt Strike-specific internal tools have public versions, such as “InlineExecute-Assembly”, “CredBandit”, and “BokuLoader”. In the last two years, given the over-signaturing of Cobalt Strike, we have restricted its use to simulating less sophisticated threat actors, and instead leverage other third-party and in-house C2 frameworks when performing more advanced red team exercises.
Through research and development efforts, we have found better operational success in advanced red team exercises with:
However, a large number of threat actors still leverage pirated copies of Cobalt Strike, and it remains important to be able to simulate them. Red teams willing to put in the research and development effort may still find operational success with Cobalt Strike while simulating these adversaries. Additionally, Cobalt Strike is a great learning tool, which can be leveraged by newcomers to get hands-on experience with a C2 framework through red team training courses.
As we continue to expand our C2 capabilities, we are sharing some insight into how we have built on the Cobalt Strike framework in the past, specifically by developing custom reflective loaders. This post is also intended to help defenders understand how Cobalt Strike works so that they can create more robust detections.
This blog post is the first of a series that serves as a primer, covering the basics of developing a Cobalt Strike reflective loader. As we progress through this series, we will build upon this foundation and reference this post.
By the end of this series, we aim to create a reflective loader that integrates with Cobalt Strike’s existing evasion features and even enhances them with advanced techniques not currently present in the tool. Future posts will delve deeper into the development of specific evasion features and how to implement them into our Cobalt Strike reflective loader.
To kick things off, this post will cover:
As we explore Cobalt Strike’s reflective loading through the lens of an offensive security tool developer, we’ll highlight opportunities for detections and evasions. Some development aspects will be omitted or simplified, and we encourage you to fill in the gaps by debugging existing reflective loader projects, rebuilding them from scratch, or seeking out training.
The Cobalt Strike C2 implant, known as Beacon, is a Windows Dynamic-Link Library (DLL), and the modular capability of using our own DLL loader in Cobalt Strike is known as the User-Defined Reflective Loader (UDRL).
Typically, the built-in Windows DLL Loader is responsible for loading DLLs into a process’s virtual memory space. The Windows DLL Loader exists primarily within user space, although it does cross over into kernel space when mapping DLLs from disk.
Using the Windows DLL Loader presents a few drawbacks when used during adversary simulations:
Therefore, using the Windows DLL Loader for loading our beacon DLL is not an ideal solution. To overcome these challenges, we load the beacon DLL from memory with a reflective loader.
The three main detection points reflective loading avoids are:
Reflective loading can be thought of as simply loading a raw DLL directly from memory, as opposed to loading it from the file system.
Reflective loading and the built-in Windows DLL Loader both serve the same purpose of loading a DLL from raw file format into the virtual memory space of a process. However, reflective loading has a key advantage over the Windows DLL Loader in that it doesn’t require the DLL file to exist on the file system. This in-memory loading allows for an unlimited number of chain loading phases, as the C2 implant DLL can be hidden within layers of encryption and encoding within the memory of the process.
A key concept to understand when loading a DLL is that the DLL is formatted differently on disk than in memory. The main differences between the DLL in raw file format versus virtual address format are:
Raw File Format:
Virtual Address Format:
By examining an HTTP beacon DLL in the PE-Bear tool by Aleksandra Doniec, we see the differences between the raw and virtual addressing for each section of the DLL:
Table listing raw and virtual addresses of each section of the beacon DLL.
This HTTP/S beacon DLL is
PE-Bear provides a visual representation of our beacon DLL as it exists in raw file format versus virtual address space format:
Visual representation of beacon DLL in raw format (left) versus virtual format (right)
While not the wisest move to perform during an adversary simulation, dropping a raw beacon DLL with no obfuscation to disk and loading it with the Windows DLL Loader is a great way to demystify both beacon and DLL loading. Essentially, beacon is just a DLL. The Windows DLL Loader and a reflective loader just load a DLL into a process.
To load the beacon DLL with the Windows DLL Loader, we perform the following steps:
Use the LoadLibrary API to load our beacon DLL from disk.
First, we disable all of the Malleable PE options which make our beacon DLL unloadable by the Windows DLL Loader. To do this, we modify our Malleable C2 profile and disable Malleable PE evasion options located in the stage block:
Malleable C2 profile stage block modified to disable Cobalt Strike evasion features.
After modifying the profile, we restart the Cobalt Strike Team Server, supplying our
We connect to the Team Server with the Cobalt Strike client. Then we create a
Screenshot of creating a “raw stageless” beacon DLL from the Cobalt Strike Client
Using the below code, we create a C program named
Windows C code to load the beacon DLL from disk using the Windows DLL Loader.
We use the
As part of the loading process, the Windows DLL Loader will initialize our beacon DLL by calling its entry point with
After the Windows DLL Loader has loaded and initialized our beacon DLL to the virtual memory space of our process, we will need to again call the virtual beacon DLL’s entry point with the argument
Our program must know our virtual beacon DLL’s entry point to execute our virtual beacon DLL. This can be done dynamically within the program by parsing the virtual beacon DLL’s headers for the entry point Relative Virtual Address (RVA), or we can quickly look at what it is and hardcode the value.
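As a rough illustration of the dynamic option, the sketch below reads the AddressOfEntryPoint field from a PE header in Python. It relies only on field offsets defined by the PE format. The helper names (`read_entry_point_rva`, `build_demo_headers`) and the demo value 0x1234 are invented for this example and are not beacon's real values.

```python
import struct


def read_entry_point_rva(image: bytes) -> int:
    """Return AddressOfEntryPoint (an RVA) from a PE image's headers."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional_header = e_lfanew + 4 + 20
    # AddressOfEntryPoint sits 16 bytes into the optional header.
    (entry_rva,) = struct.unpack_from("<I", image, optional_header + 16)
    return entry_rva


def build_demo_headers(entry_rva: int) -> bytes:
    """Build a tiny synthetic header (hypothetical values) for the demo."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)       # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"
    optional_header = 0x80 + 4 + 20
    struct.pack_into("<H", buf, optional_header, 0x20B)  # PE32+ magic
    struct.pack_into("<I", buf, optional_header + 16, entry_rva)
    return bytes(buf)


demo_entry_rva = read_entry_point_rva(build_demo_headers(0x1234))
print(hex(demo_entry_rva))
```

In a loader written in C, the same lookup would be done with pointer arithmetic on the module base address, and the entry point address is that base plus the RVA.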
For our proof-of-concept we will manually discover and hardcode our beacon DLL’s entry point RVA into our program. Using PE-Bear we discover that the RVA to beacon’s entry point is
Screenshot of finding the beacon DLL entry point RVA using PE-Bear
The
With our code ready to go, we compile our C program into a Windows executable:
Command used to compile our program.
When our beacon DLL and our executable beacon loader program are placed in the same directory, the Windows DLL Loader is able to discover our DLL as it performs its loading routine.
We place both
Beacon DLL and loader program placed in the same directory.
From our Windows desktop, we double-click our loadBeaconDLL.exe program and establish an active beacon connection to our Team Server.
Successful connection to the C2 Team Server from the beacon DLL loaded using the Windows DLL Loader.
Cobalt Strike uses a modified version of the Reflective Loader project by Stephen Fewer. This legendary in-memory DLL loader is over a decade old and has been used in Metasploit and other notable offensive security tools.
Over the years the Cobalt Strike reflective loader has been enhanced to handle all the Malleable PE evasion features Cobalt Strike has to offer. The major disadvantage to using a custom User-Defined Reflective Loader (UDRL) is that Malleable PE evasion features may or may not be supported out-of-the-box.
Some evasion features are fully implemented when using a UDRL, being patched into the beacon DLL by Cobalt Strike’s Malleable PE engine on beacon payload creation. However, currently features like
must be handled by the UDRL, while others like
and
can be handled by beacon with proper UDRL integration.
The original Reflective Loader project requires compiling the
Then another project is responsible for:
Diagram of the original reflective loader, loading a DLL to virtual memory.
An alternative method is prepending the reflective loader to the DLL. This allows any unmanaged DLL to be loaded and does not require compiling the DLL from source code. This is a robust reflective loading method capable of loading any PE file (EXE or DLL).
Diagram of a reflective loader prepended to a DLL, loading a DLL to virtual memory.
Cobalt Strike’s implementation of reflective loading uses a hybrid of the above two methods. This reflective loading method may be familiar to those with knowledge of how Metasploit’s Meterpreter does reflective loading.
Like the original reflective loader method, the
When a UDRL is loaded into Cobalt Strike, and an operator generates a beacon payload from the Cobalt Strike client, Cobalt Strike’s Malleable PE engine patches in the reflective loader shellcode at the raw file offset of the
When the Malleable PE engine completes the patching of the raw beacon DLL, the raw beacon DLL is given to the operator in an executable shellcode-like format.
Diagram of the Cobalt Strike reflective loader, loading the beacon DLL to virtual memory.
Looking at the initial bytes in the PE-Bear disassembler we can see that the beacon DLL itself is executable:
The call reflective loader stub shown as executable assembly operation codes.
The initial bytes
After executing optionally prepended
We confirm that the raw file offset for the
Screenshot of using PE-Bear to determine the raw file offset of the ReflectiveLoader export.
As it exists within the export directory, the address for the
To discover the raw file offset of the
The virtual and raw addresses for the
Raw and virtual addresses of the .text section of the beacon DLL.
The difference between the two is
We can confirm this in PE-Bear by right-clicking the
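The conversion described above, subtracting the difference between a section's virtual and raw addresses from an RVA, can be sketched as a small helper. The section values and the export RVA below are placeholders chosen for illustration, not the values from the beacon DLL shown in PE-Bear.

```python
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int   # RVA of the section once loaded
    virtual_size: int
    raw_offset: int        # PointerToRawData in the file
    raw_size: int


def rva_to_raw_offset(rva: int, sections: List[Section]) -> int:
    """Translate an RVA to a raw file offset using the section table."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            offset_in_section = rva - s.virtual_address
            if offset_in_section >= s.raw_size:
                raise ValueError("RVA has no backing bytes in the file")
            return s.raw_offset + offset_in_section
    raise ValueError("RVA is not inside any section")


# Hypothetical layout: .text starts at RVA 0x1000 but at file offset 0x400,
# so the difference between the two is 0xC00.
demo_sections = [Section(".text", 0x1000, 0x30000, 0x400, 0x30000)]
demo_export_rva = 0x18000  # placeholder RVA of an exported function
print(hex(rva_to_raw_offset(demo_export_rva, demo_sections)))
```

With these placeholder numbers the helper subtracts 0xC00 from the RVA, which mirrors the manual calculation performed with the PE-Bear section table.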
In summary, the Cobalt Strike reflective loading process flow is:
Diagram showing the main phases of how Cobalt Strike performs reflective loading of the beacon DLL.
Since our reflective loader is executed before the beacon DLL is loaded, the reflective loader code needs to be pure shellcode.
The easiest way of making complex shellcode is to write it in C with no external dependencies. Then the C file is compiled to an object file. Everything must be included in the .text section of the object file. Finally, we rip out the .text section.
Cobalt Strike’s Malleable PE engine will handle the work of getting the shellcode from our reflective loader object file and patching it into the raw beacon DLL at the raw file offset of the
Aggressor script to write reflective loader shellcode into the raw beacon DLL leveraging Cobalt Strike.
Our UDRL Aggressor script has Cobalt Strike write in our reflective loader shellcode by performing these steps:
The <a href="https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics_aggressor-scripts/as-resources_functions.htm#extract_reflective_loader">extract_reflective_loader</a> Cobalt Strike Aggressor function will parse our UDRL object file from the
The <a href="https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics_aggressor-scripts/as-resources_functions.htm#setup_reflective_loader">setup_reflective_loader</a> Cobalt Strike Aggressor function will use the Malleable PE engine to discover the raw file offset of our
Cobalt Strike has done the work for us of extracting the .text section from our reflective loader object file, patching in our reflective loader shellcode, and calling our reflective loader with the call reflective loader stub located in the beacon DLL header.
These are the phases we must develop to reflectively load beacon:
There are several different methods we can use to discover the address for the raw beacon DLL in memory. Some methods are:
When using a method that hunts backward, we need to first get the current address of our thread’s Instruction Pointer (
Intel x64 assembly code to get the raw beacon DLL base address from the RDI register.
The original reflective loader project hunts backward for the MZ and PE headers. These headers have become detection points. To overcome this Cobalt Strike added the
The Cobalt Strike documentation states that the
When configured the
These bytes must be somewhat unique, or the reflective loader won’t be able to find them. Additionally, the bytes for the MZ header must be no-operation and executable. They cannot be values like
After discovering this potential detection point I developed a different, but similar method to find the raw beacon DLL’s base address. This method uses an egg hunter capable of searching backward from
The address
Since we don’t have easy access to the Java Malleable PE engine, the
Aggressor script to write an egg into the raw beacon DLL and display the changes in the Cobalt Strike script console.
The UDRL code must know the egg value written to the raw beacon DLL by the UDRL script. With the egg known, the egg hunter searches backward for two instances of the egg, as seen in the code below:
Intel x64 assembly code for an egg hunter which searches backward for two instances of a 64-bit egg.
Now that the MZ and PE headers are no longer used, we can nop them out in the UDRL Aggressor script:
Aggressor script to mask MZ, PE, and unused bytes of the DOS banner located in the raw beacon DLL’s headers.
There is also another, Cobalt Strike-specific way to discover the raw beacon DLL’s base address. As we saw above, the initial bytes in the call reflective loader stub store the raw beacon DLL’s base address in the
To examine this further in the debugger, we generate a beacon, prepend a breakpoint (
X64dbg screenshot of stepping through call reflective loader stub to see that the raw beacon DLL base address is saved in the RDI register before calling the reflective loader.
Below is a working example of how to get the raw beacon DLL’s base address from the call reflective loader stub:
Inline-assembly C code to get the raw beacon DLL base address from the RDI register.
With the base address of the raw beacon DLL, we can now get the values we need to load beacon into the virtual address space of the process.
The table below lists the values we need from the raw beacon DLL’s headers, the locations where we find them, and their types:
Table listing values from the raw beacon DLL header which are useful for loading the beacon DLL.
Not all contents of the headers are required for loading the beacon DLL. Required values can be repacked or obfuscated. Values not required can be removed or randomized.
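To make the header walk concrete, here is a minimal Python sketch that collects fields a loader commonly needs from a PE32+ header: the entry point RVA, ImageBase, SizeOfImage, the import and base relocation directories, and the section table. The offsets come from the PE format. The function names and every value in the synthetic demo header are assumptions made for this example and do not reproduce the table above.

```python
import struct


def read_loader_fields(image: bytes) -> dict:
    """Collect header fields a loader typically needs from a PE32+ image."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    coff = e_lfanew + 4                      # COFF header follows "PE\0\0"
    (number_of_sections,) = struct.unpack_from("<H", image, coff + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", image, coff + 16)
    opt = coff + 20                          # optional header start
    (entry_rva,) = struct.unpack_from("<I", image, opt + 16)
    (image_base,) = struct.unpack_from("<Q", image, opt + 24)
    (size_of_image,) = struct.unpack_from("<I", image, opt + 56)
    # Data directories start at opt + 112; each is (RVA, Size), 8 bytes.
    import_dir = struct.unpack_from("<II", image, opt + 112 + 1 * 8)
    reloc_dir = struct.unpack_from("<II", image, opt + 112 + 5 * 8)
    sections = []
    table = opt + size_of_optional_header    # section table start
    for i in range(number_of_sections):
        off = table + i * 40                 # each section header is 40 bytes
        name = image[off:off + 8].rstrip(b"\0").decode("ascii")
        vsize, vaddr, rsize, roff = struct.unpack_from("<IIII", image, off + 8)
        sections.append((name, vaddr, vsize, roff, rsize))
    return {
        "entry_rva": entry_rva,
        "image_base": image_base,
        "size_of_image": size_of_image,
        "import_dir": import_dir,
        "reloc_dir": reloc_dir,
        "sections": sections,
    }


def build_demo_image() -> bytes:
    """Synthetic one-section header; all values are hypothetical."""
    buf = bytearray(0x400)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    coff, opt = 0x84, 0x98
    struct.pack_into("<H", buf, coff + 2, 1)         # NumberOfSections
    struct.pack_into("<H", buf, coff + 16, 0xF0)     # SizeOfOptionalHeader
    struct.pack_into("<H", buf, opt, 0x20B)          # PE32+ magic
    struct.pack_into("<I", buf, opt + 16, 0x1234)    # AddressOfEntryPoint
    struct.pack_into("<Q", buf, opt + 24, 0x180000000)  # ImageBase
    struct.pack_into("<I", buf, opt + 56, 0x5000)    # SizeOfImage
    struct.pack_into("<II", buf, opt + 120, 0x3000, 0x50)   # import dir
    struct.pack_into("<II", buf, opt + 152, 0x4000, 0x20)   # reloc dir
    sec = opt + 0xF0
    buf[sec:sec + 8] = b".text\0\0\0"
    struct.pack_into("<IIII", buf, sec + 8, 0x1800, 0x1000, 0x1800, 0x400)
    return bytes(buf)


demo_fields = read_loader_fields(build_demo_image())
print(demo_fields["size_of_image"], demo_fields["sections"])
```

Because a loader only reads a handful of fields, this also shows why the remaining header bytes can be randomized or removed without breaking loading.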
Once we know the
SizeOfImagef
Memory for the virtual beacon DLL can be allocated in several ways, and each way yields a different type of memory. The methods supported by Cobalt Strike’s default reflective loader are:
Table showing Cobalt Strike memory allocation options for the virtual beacon DLL.
This can be taken a step further with a UDRL. The NTAPI versions of these functions can be used instead. Going further still, the NTAPI functions could be called via direct or indirect system calls, which may or may not help with bolstering evasion capabilities.
When the allocator method is set to
Code sample from BokuLoader project showing a direct system call is used to allocate memory for the virtual beacon DLL.
The below image shows a code example of using the HellsGate and HalosGate methods to determine the system call numbers:
Code sample from BokuLoader project showing how system calls are discovered from the process.
Now that we have allocated memory for our virtual beacon DLL, we need to copy beacon’s sections from their raw file offsets, as they exist in the raw beacon DLL, to the allocated memory at their relative virtual offsets.
If we allocated our memory with
READWRITE
section and its size. Before calling the entry point of the virtual beacon DLL we will need to change the memory protections of the
section to executable.
Allocating our memory with
makes the reflective loading process easier but increases chances of detection by security solutions.
Below is a simplified code example, from the BokuLoader project, which demonstrates this:
Code sample from BokuLoader project showing sections copied from the raw beacon DLL to the virtual beacon DLL.
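For readers who want to see the copy step without a debugger, the following Python sketch simulates it on plain byte buffers. It is not the BokuLoader code. The function name `map_sections`, the tuple layout, and the toy offsets are all assumptions made for this illustration.

```python
def map_sections(raw: bytes, size_of_image: int, sections) -> bytearray:
    """Copy each section from its raw file offset to its virtual offset.

    `sections` holds (name, virtual_address, raw_offset, raw_size) tuples.
    """
    virtual = bytearray(size_of_image)  # zero-filled, like fresh pages
    for name, virtual_address, raw_offset, raw_size in sections:
        data = raw[raw_offset:raw_offset + raw_size]
        if virtual_address + len(data) > size_of_image:
            raise ValueError(f"section {name} does not fit in the image")
        virtual[virtual_address:virtual_address + len(data)] = data
    return virtual


# Toy "raw file": 0x10 bytes of header, then two tightly packed sections.
demo_raw = bytes(0x10) + b"CODE" + b"DATA"
demo_layout = [
    (".text", 0x20, 0x10, 4),  # raw offset 0x10 -> virtual offset 0x20
    (".data", 0x30, 0x14, 4),  # raw offset 0x14 -> virtual offset 0x30
]
demo_virtual = map_sections(demo_raw, 0x40, demo_layout)
print(demo_virtual[0x20:0x24], demo_virtual[0x30:0x34])
```

Note that the headers are never copied in this sketch, so the first bytes of the virtual image stay zeroed.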
Some evasion features regarding loading sections are:
In the public BokuLoader project, the headers for the beacon DLL are not copied from the raw beacon DLL to the virtual beacon DLL. Currently the first
Another possible evasion opportunity is having the UDRL Aggressor script encrypt the sections. The sections could be decrypted in memory by the UDRL, using a key shared between the UDRL and the UDRL Aggressor script.
The x64 HTTP/S beacon relies on four DLLs to function properly. If these DLLs are not currently loaded into the process, our reflective loader will need to load them.
The four DLLs are listed in the HTTP/S beacon DLL’s import directory:
Screenshot from PE-Bear listing DLLs from the beacon DLL’s import directory.
The built-in Cobalt Strike reflective loader uses the kernel32.LoadLibraryA API for DLL loading.
DLL loading can be achieved in a variety of different ways, with different operational security considerations. Some methods are:
If the DLL already exists in the process, then the above Windows APIs can still be used to get the DLL base addresses, although this may trigger unwanted detection alerts.
Alternatively, the PEB holds a pointer to the
<a title="https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb_ldr_data" href="https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb_ldr_data">_PEB_LDR_DATA</a>
struct. Within, there is a linked list of all the DLLs loaded in the process and their related information (
). BokuLoader leverages this to discover the DLL information, avoiding unnecessary API calls.
If the DLL does not exist in the
Nested reflective loading cannot easily be used to load DLL dependencies because reflective loaders generally do not register the DLL to the process. Code external to the DLL cannot properly use a reflectively loaded DLL. The DarkLoadLibrary project may be capable of properly loading a DLL into memory without triggering a kernel image load event.
Code sample from BokuLoader project showing how loaded DLLs’ base addresses can be resolved by walking the InMemoryOrderModuleList.
With the required DLLs loaded into the process, the APIs listed in the import directory must be resolved. The API addresses will then need to be written to the virtual beacon DLL’s Import Address Table (IAT). This way beacon knows what address to jump to when it needs to call APIs such as
The import entry will either need to be resolved via the ordinal or name string.
In the image below, we see that the Cobalt Strike beacon DLL uses a combination of ordinals and name strings for import entries:
Screenshot from PE-Bear showing some import entries for beacon DLL must be resolved by ordinal.
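How a loader tells the two kinds of import entry apart can be shown in a few lines. In a 64-bit PE, the high bit of each 8-byte import lookup entry marks an import by ordinal. Otherwise the entry holds the RVA of a hint/name record. The helper names and demo values below are assumptions made for this sketch.

```python
import struct

IMAGE_ORDINAL_FLAG64 = 1 << 63  # high bit set means "import by ordinal"


def classify_import_entry(entry: int):
    """Return ("ordinal", n) or ("name_rva", rva) for a 64-bit import entry."""
    if entry & IMAGE_ORDINAL_FLAG64:
        return ("ordinal", entry & 0xFFFF)     # ordinal is the low 16 bits
    return ("name_rva", entry & 0x7FFFFFFF)    # RVA of the hint/name record


def read_import_name(virtual_image: bytes, name_rva: int) -> str:
    """Read the ASCII name from a hint/name record in a mapped image."""
    start = name_rva + 2                       # skip the 2-byte hint
    end = virtual_image.index(b"\0", start)
    return virtual_image[start:end].decode("ascii")


# Hypothetical mapped image with one hint/name record at RVA 0x40.
demo_image = bytearray(0x100)
record = struct.pack("<H", 0x1A2) + b"VirtualAlloc\0"
demo_image[0x40:0x40 + len(record)] = record

print(classify_import_entry(0x8000000000000073))
print(read_import_name(bytes(demo_image), classify_import_entry(0x40)[1]))
```

Either result is then handed to whatever address resolution routine the loader uses, by ordinal in the first case and by name string in the second.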
The built-in Cobalt Strike reflective loader uses the
Some evasion methods to resolve API addresses are:
GetProcAddress
NTDLL.LdrGetProcedureAddress
BokuLoader uses a custom code implementation of
GetProcAddress
The
is capable of handling both name strings and ordinals as well. If the returned address for the Import Entry is a forwarder to another DLL, BokuLoader defaults to the
to resolve the forwarder.
While writing the IAT, hooking can be implemented by writing the virtual addresses of hook functions we have implemented rather than the intended API’s virtual address. As long as the expected output is returned to beacon when the address in the IAT is called, we can execute additional code before returning to beacon. Future posts and public BokuLoader releases will demonstrate how we can leverage IAT hooking for advanced evasion features.
With a recent release, the public BokuLoader project supports the
Malleable PE feature from the Cobalt Strike C2 profile with a custom implementation. By modifying the masking key in the
BokuLoader.cna
Regarding operational security, it is important to know that pattern matching engines are capable of brute-forcing single-byte XOR masks. Future posts will demonstrate how we can create our own Malleable PE engine using Cobalt Strike’s Aggressor scripting functionality to obfuscate beacon to overcome pattern matching.
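From the defender's side, brute-forcing a single-byte XOR mask takes only 256 attempts. The sketch below tries every key and looks for a known plaintext pattern. The function name and the sample string are assumptions for illustration, and real pattern matching engines are considerably more optimized.

```python
from typing import Optional


def find_single_byte_xor_key(data: bytes, needle: bytes) -> Optional[int]:
    """Try all 256 single-byte keys; return the first that reveals `needle`."""
    for key in range(256):
        unmasked = bytes(b ^ key for b in data)
        if needle in unmasked:
            return key
    return None


# Demo: mask a well-known PE string with a hypothetical key of 0x5A.
plain = b"This program cannot be run in DOS mode."
masked = bytes(b ^ 0x5A for b in plain)

recovered_key = find_single_byte_xor_key(masked, b"DOS mode")
print(hex(recovered_key))
```

The search space is tiny, which is why a single-byte mask on its own should not be expected to defeat signature matching.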
The beacon DLL has many relocations, listed in its Base Relocation Table, which must be resolved and written into the virtual beacon DLL before it is executed.
In PE-Bear we can see that the beacon DLL by default has the image base address of
Screenshot from PE-Bear showing image base address of the beacon DLL.
Before we start writing relocations, we need to calculate the delta between the base address of our virtual beacon DLL and the hardcoded base address.
For example, let’s pretend the base address for our virtual beacon DLL is
Next, to determine the virtual address for each relocation entry in the Base Relocation Table, we add the base address delta to the entry’s hardcoded address. The result is the address as it exists within our virtual beacon DLL.
In the below image we can see that beacon’s relocation entries are stored in little-endian format, so their bytes appear in reverse order:
Screenshot from PE-Bear showing some relocation entries exist in little-endian format.
The hardcoded address for this relocation entry is
We add this address to the base address delta, to get the virtual address for the relocation as it exists in the virtual beacon DLL:
For each relocation entry we will need to check that the type is
<a title="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format" href="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format">IMAGE_REL_BASED_DIR64 (0xA)</a>
. If it is not, we skip writing the relocation.
Once we determine the virtual address of the relocation as it exists within the virtual beacon DLL, we write it to the memory space which holds the hardcoded relocation entry address.
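The relocation steps above can be simulated on a byte buffer. This Python sketch parses one base relocation block, skips entries that are not IMAGE_REL_BASED_DIR64, and adds the base address delta to each 64-bit little-endian value. It is an illustration rather than the BokuLoader doRelocations code, and the base addresses and offsets in the demo are hypothetical.

```python
import struct

IMAGE_REL_BASED_DIR64 = 0xA
MASK64 = 0xFFFFFFFFFFFFFFFF


def apply_relocation_block(image: bytearray, block: bytes, delta: int) -> int:
    """Apply one base relocation block to a mapped image; return patch count."""
    # Block header: RVA of the 4 KB page, then the total block size in bytes.
    page_rva, block_size = struct.unpack_from("<II", block, 0)
    patched = 0
    for pos in range(8, block_size, 2):
        (entry,) = struct.unpack_from("<H", block, pos)
        reloc_type = entry >> 12        # high 4 bits: relocation type
        offset = entry & 0x0FFF         # low 12 bits: offset within the page
        if reloc_type != IMAGE_REL_BASED_DIR64:
            continue                    # e.g. padding entries of type 0
        target = page_rva + offset
        (value,) = struct.unpack_from("<Q", image, target)  # little-endian
        struct.pack_into("<Q", image, target, (value + delta) & MASK64)
        patched += 1
    return patched


# Hypothetical numbers: the DLL prefers one base but was mapped at another.
preferred_base = 0x180000000
actual_base = 0x1F0000000
demo_delta = actual_base - preferred_base

demo_mapped = bytearray(0x2000)
struct.pack_into("<Q", demo_mapped, 0x1010, preferred_base + 0x1234)
demo_block = struct.pack("<II", 0x1000, 12) + struct.pack(
    "<HH", (IMAGE_REL_BASED_DIR64 << 12) | 0x010, 0)  # one DIR64 + padding

demo_patched = apply_relocation_block(demo_mapped, demo_block, demo_delta)
print(demo_patched, hex(struct.unpack_from("<Q", demo_mapped, 0x1010)[0]))
```

A full loader repeats this for every block in the relocation directory, advancing by each block's size until the directory is exhausted.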
If you are interested in learning more about how to do PE relocations, check out the doRelocations function code in the public BokuLoader project. Before releasing this blog post, I changed the relocations code from assembly to hopefully human-readable C code, to assist others wanting to know the technical details of how this is done.
Executing beacon can be broken down into three steps:
If the memory we allocated for our virtual beacon DLL is
If we allocated our virtual beacon memory as non-executable (
In the public BokuLoader project, memory protection changes are performed by direct system calls to
Code sample from BokuLoader project demonstrating changing the .text section of the virtual beacon DLL to executable.
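When a loader chooses final protections per section, one common approach is to derive them from each section's Characteristics flags. The sketch below shows such a mapping using the constant values documented for the PE format and the Windows memory protection constants. The mapping table is an assumption for illustration and may differ from what BokuLoader or Cobalt Strike does.

```python
# Section characteristic flags from the PE format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Windows memory protection constants.
PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40

# (execute, read, write) -> page protection; write implies read here.
_PROTECTIONS = {
    (False, False, False): PAGE_NOACCESS,
    (False, True, False): PAGE_READONLY,
    (False, False, True): PAGE_READWRITE,
    (False, True, True): PAGE_READWRITE,
    (True, False, False): PAGE_EXECUTE,
    (True, True, False): PAGE_EXECUTE_READ,
    (True, False, True): PAGE_EXECUTE_READWRITE,
    (True, True, True): PAGE_EXECUTE_READWRITE,
}


def protection_for(characteristics: int) -> int:
    """Map a section's Characteristics field to a page protection value."""
    key = (
        bool(characteristics & IMAGE_SCN_MEM_EXECUTE),
        bool(characteristics & IMAGE_SCN_MEM_READ),
        bool(characteristics & IMAGE_SCN_MEM_WRITE),
    )
    return _PROTECTIONS[key]


# Typical characteristic values for common sections.
print(hex(protection_for(0x60000020)))  # code, execute, read
print(hex(protection_for(0xC0000040)))  # initialized data, read, write
```

Under this mapping a typical .text section ends up executable and readable but not writable, which avoids leaving read-write-execute memory behind.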
The
For the virtual beacon DLL to run properly, it must first be initialized by calling the virtual beacon DLL’s entry point. The first argument is the base address of the virtual beacon DLL. The second argument is the
Code sample from BokuLoader project initializing the virtual beacon DLL.
After initializing the virtual beacon DLL, we can either return the entry point of virtual beacon to the call reflective loader stub, or we can call virtual beacon DLL’s entry point in our UDRL with the
Unlike a typical DLL, where the first argument to
<a href="https://learn.microsoft.com/en-us/windows/win32/dlls/dllmain">DLLMAIN</a>
would be the base address of the virtual DLL, beacon expects the base address of the raw beacon DLL. If this is not supplied, some Malleable PE evasion features may fail.
Code sample from BokuLoader project showing two different ways to execute the virtual beacon DLL.
Hopefully this blog post helps both red teams and blue teams better understand Cobalt Strike and the reflective loading process. There are still tons of evasion opportunities that can be implemented through reflective loading. With a deeper understanding of these concepts, organizations can better prepare themselves for a successful defense against cyber threats.
Future posts in this series will focus on integrating UDRL with current Cobalt Strike evasion features, dive into undocumented evasion features already present in the public BokuLoader, as well as advanced features that have not yet been released to the public. Stay tuned for more in-depth information and techniques to learn how to take your Cobalt Strike game to the next level with UDRL development!