A guide to using shims to deal with incompatible runtime environments



Software development is no longer straightforward. Advances in computing have made the field far more complex, and there is now greater emphasis on collaboration and on seamless integration between products that support multiple platforms and architectures. Products increasingly expose web-based APIs for integration with other products, which reduces the complexity that arises from incompatibilities between systems.

But for products that are designed as heavyweight, native applications and depend on the operating system architecture, integrating applications of different architectures (for example, a 32-bit application with a 64-bit one) is complex and tricky. Developers like to work in their favorite integrated development environments (IDEs) and perform all operations from the same IDE interface, so it becomes essential to integrate the various products that interact with the IDE.


With the prevalence of 64-bit operating systems, many applications have become 64-bit compliant, but a large group of existing native applications still run in 32-bit compatibility mode on 64-bit operating systems. Integrating 64-bit applications with 32-bit applications raises numerous compatibility challenges: the size of data types (integers, for example), the size of the address space and pointers (32-bit or 64-bit), the binary format, runtime dependencies, interfaces to APIs (for example, C++ STL), semantics of APIs, and so forth. A 64-bit application cannot load 32-bit libraries, so no direct integration is possible with the exposed APIs of the 32-bit application. Incompatibility problems also arise whenever shared objects (executables and libraries) in the same process space make different assumptions about any of these issues, even if they all have the same architecture (64-bit only).
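To make the data-size issue concrete, the following minimal sketch prints the widths of a few basic types and defines a message header from fixed-width types. The names `wire_header`, `check_wire_layout`, and `print_type_sizes` are illustrative assumptions, not part of the original code. The reported sizes depend on the compiler and target: `long` is typically 4 bytes in a 32-bit Linux build and 8 bytes in a 64-bit Linux build, while it stays 4 bytes on 64-bit Windows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical message header that crosses the process boundary.
 * Fixed-width fields have the same size in 32-bit and 64-bit builds. */
struct wire_header {
    int32_t function_id;
    int32_t argc;
};

/* Verify that the header has no padding, so both sides agree on its size. */
void check_wire_layout(void)
{
    assert(sizeof(struct wire_header) == 2 * sizeof(int32_t));
}

/* Print the widths that commonly differ between 32-bit and 64-bit builds. */
void print_type_sizes(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("sizeof(struct wire_header) = %zu\n", sizeof(struct wire_header));
}
```

Compiling the same file once as 32-bit and once as 64-bit (for example, with a compiler's `-m32` and `-m64` options, where available) shows which widths change and which do not.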


To solve the problem of runtime incompatibility, we need to maintain two separate runtime environments (thus, two separate processes) with an IPC (interprocess communication) mechanism between them. This allows any potential incompatibilities to co-exist, as long as each process is internally consistent.

Process separation can be achieved with a shim library that is as small as possible and has few external dependencies.

Shims are typically a thin layer used to resolve the compatibility issues when integrating two applications. A shim is a small dynamic library that transparently intercepts an API and, if required, redirects the operation elsewhere.

Figure 1. Shim library concept
Different-sized gears fitting together

This shim library creates a subprocess that performs the actual operations requested through the calling application (a 64-bit application) by calling into the third-party integrated application (32-bit application). This subprocess does not need to be compatible with the runtime environment of the calling application. The shim library will use an interprocess communication (IPC) mechanism to communicate with the subprocess. The calling application calls a function in the shim library. The shim library forwards the function call information to the subprocess and waits for the subprocess to return the results, which are then returned to the caller.

Figure 2. Shim library using IPC mechanism to communicate with the subprocess
Communication between 32- and 64-bit process, IPC

We discuss two approaches that can be used to demonstrate the power of shim libraries. The goal of this article is to provide a guide to help you get started with creating a shim to integrate the 64-bit application with an existing 32-bit application. We also give you an overall idea of how to resolve incompatibilities with an application library that can't be loaded by the current process.

Case 1. Solution for UNIX or Linux operating systems

If the operating system is UNIX or Linux, many forms of IPC are available, such as shared memory, message queues, and remote procedure calls (RPC). We prefer the simplest form of IPC, the pipe, which is easy to use and effective. A pipe provides a one-way flow of data and has two file descriptors associated with it, one for reading and one for writing. The 64-bit client application loads the shim library, which forks and creates a new process: a 32-bit application compiled with the 32-bit libraries of the other application. The shim becomes part of the 64-bit process, and the two processes communicate through the pipe, as depicted in Figure 3.
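As a minimal illustration of the one-way nature of a pipe, the sketch below writes a short message into the write end and reads it back from the read end within a single process. The function name `pipe_echo` is a hypothetical helper introduced only for this example, and it assumes the message is small enough to fit in the pipe buffer.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write msg into a fresh pipe and read it back through the other end.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_echo(const char *msg, char *out, size_t out_size)
{
    int fds[2];     /* fds[0] is the read end, fds[1] is the write end */
    size_t len;
    ssize_t n;

    assert(msg != NULL && out != NULL);
    len = strlen(msg);
    if (len > out_size || pipe(fds) < 0) {
        return -1;
    }

    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                      /* no more data will be written */

    n = read(fds[0], out, out_size);    /* data flows only from fds[1] to fds[0] */
    close(fds[0]);
    return n;
}
```

Because data flows in one direction only, two-way communication between the shim and the server needs two such pipes.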

We define the steps and the code skeleton that you need to get started writing the client and server code. The shim library is part of the 64-bit application, so we are calling it the client here.

Client-side steps

Step 1. Define the pipes for two way communication

Figure 3. Shim library on UNIX and Linux systems, using pipes
IPC communication using pipes

The 64-bit process is the client and also the parent process here. The 64-bit application loads the 64-bit shim library into its address space. In the shim library implementation, we define the two pipes that will be used for two-way communication. The 64-bit client process writes into one pipe, which serves as input for the server process. The 32-bit server process writes its output (the result) into the other pipe, which the 64-bit process reads at the other end.

Step 2. Spawn the new process

After we define the pipes, we use the fork system call to spawn a new child process. Although it is a child process spawned by a 64-bit process, it is, in essence, a different process. We can use execl to transform it into a new 32-bit process that has its own address space, which gives us process separation. As you will see in later steps and examples, we pass the file descriptors of the pipes created in Step 1 to the 32-bit child process and use them for two-way communication between the parent and child processes.

Step 3. Initialize and close the unneeded ends of the pipes

When the new child process is spawned, it gets a copy of the pipes. Therefore, we need to close the unneeded ends of the pipes in both the parent and child processes.
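Steps 1 through 3 can be sketched as a small, self-contained round trip. In this sketch the child does not call execl; it simply doubles an integer and sends it back, so that the pipe handling can be seen in isolation. The function name `round_trip` and the pipe names `to_child` and `to_parent` are assumptions made for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send value to a forked child over one pipe and read value * 2 back
 * over a second pipe. Returns 0 on success, -1 on failure. */
int round_trip(int value, int *result)
{
    int to_child[2];    /* parent writes, child reads */
    int to_parent[2];   /* child writes, parent reads */
    int status = -1;
    pid_t pid;

    assert(result != NULL);
    if (pipe(to_child) < 0) {
        return -1;
    }
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: keep the read end of to_child and the write end of to_parent. */
        int v;
        close(to_child[1]);
        close(to_parent[0]);
        if (read(to_child[0], &v, sizeof v) != (ssize_t)sizeof v) {
            _exit(1);
        }
        v *= 2;
        if (write(to_parent[1], &v, sizeof v) != (ssize_t)sizeof v) {
            _exit(1);
        }
        _exit(0);
    }

    /* Parent: keep the write end of to_child and the read end of to_parent. */
    close(to_child[0]);
    close(to_parent[1]);
    if (write(to_child[1], &value, sizeof value) == (ssize_t)sizeof value &&
        read(to_parent[0], result, sizeof *result) == (ssize_t)sizeof *result) {
        status = 0;
    }
    close(to_child[1]);
    close(to_parent[0]);
    (void)waitpid(pid, NULL, 0);    /* reap the child */
    return status;
}
```

Closing the unused ends matters for more than tidiness: a reader sees end-of-file only after every copy of the write end has been closed, so a forgotten descriptor can leave the other process blocked in read.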

Step 4. Load the server program in the child address space

The 64-bit process is called the client here, because it is the initiating process and also used by the end user. The 32-bit child process is called the server process, because it runs in the background, receives the function call requests from the 64-bit client process, processes calls, and returns the results.

Pseudo code example of parent-child communication and synchronization using pipes:

In this section, we show an example that provides pseudo code and explains how the end-to-end communication works.

Listing 1. Initiating the server process
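/* Define the pipes. Needs <stdio.h> and <unistd.h>. */

#define PREAD  0    /* index of the read end returned by pipe() */
#define PWRITE 1    /* index of the write end returned by pipe() */

int pinf[2] = {-1, -1};     /* stdin of server */
int poutf[2] = {-1, -1};    /* stdout of server */
pid_t pid;
char infd_s[16];            /* large enough for any descriptor number */
char outfd_s[16];

/* Create the pipes. */

if (pipe(pinf) < 0) {
    sprintf(msg_string, "Unable to create pipe pinf to subprocess\n");
    print_msg(msg_string);
    goto fail;
}

if (pipe(poutf) < 0) {
    sprintf(msg_string, "Unable to create pipe poutf to subprocess\n");
    print_msg(msg_string);
    goto fail;
}

/* Spawn a new process for the server. */

pid = fork();
if (pid < 0) {
    sprintf(msg_string, "Unable to fork subprocess\n");
    print_msg(msg_string);
    goto fail;
}

/* Close the unneeded ends of the pipes. */

/* Parent: writes to pinf, reads from poutf. */
if (pid > 0) {
    close(pinf[PREAD]);
    pinf[PREAD] = -1;

    close(poutf[PWRITE]);
    poutf[PWRITE] = -1;
    goto done;
}

/* Child: reads from pinf, writes to poutf. */
if (pid == 0) {
    close(pinf[PWRITE]);
    pinf[PWRITE] = -1;

    close(poutf[PREAD]);
    poutf[PREAD] = -1;

    /* Pass the descriptor numbers to the server as strings. */
    snprintf(infd_s, sizeof(infd_s), "%d", pinf[PREAD]);
    snprintf(outfd_s, sizeof(outfd_s), "%d", poutf[PWRITE]);

    (void)execl(server_path, server_path, infd_s, outfd_s, (char *)NULL);
    _exit(127);     /* reached only if execl fails */
}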

Here, server_path is the path to the executable server process.
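Because the descriptor numbers travel to the server as command-line strings, both sides need to convert them carefully. The following sketch shows one possible pair of helpers; `format_fd_arg` and `parse_fd_arg` are hypothetical names, and checking the descriptor with `fcntl(fd, F_GETFD)` is an added safety measure rather than something the original listing does.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Client side: render a descriptor number as text for execl.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_fd_arg(int fd, char *out, size_t out_size)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, out_size, "%d", fd);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Server side: convert text such as "5" back to an open descriptor.
 * Returns the descriptor, or -1 if the text is not a valid number or
 * the descriptor is not open in this process. */
int parse_fd_arg(const char *text)
{
    char *end = NULL;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        return -1;                      /* not a clean decimal number */
    }
    if (value < 0 || value > INT_MAX) {
        return -1;
    }
    if (fcntl((int)value, F_GETFD) == -1) {
        return -1;                      /* not an open descriptor */
    }
    return (int)value;
}
```

This works only because descriptors that are not marked close-on-exec stay open, with the same numbers, across execl.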

Now that you have seen how to create the pipe and how to launch the server, we will move to the core business: How to call server functions and read the result. The most important thing to remember is that the communication between the client and the server must be synchronized.

The basic flow is that when the 64-bit client application wants to call a function in the 32-bit application, it will call a function in the shim library. The shim library will dispatch the call to the 32-bit process through the pipe. The 32-bit server process receives the function request and interacts with the 32-bit third-party application's core libraries to process it, and it writes the result back to the pipe, which is read by the shim library and passed back to the 64-bit application.

Therefore, for each function call in the server, we will have a corresponding function in the shim library and a unique function identifier in the 32-bit server process.

Detailed steps

As an example, assume that the client needs to call a function with the following signature:

int foo(arg1, arg2, arg3)

For each function exposed by the server, keep a corresponding enumeration value. For example:

typedef enum {
    FOO     /* one value for each function exposed by the server */
} function_identifier;

To call a function, there are multiple steps:

Step 1. Send a header packet

The send_header_msg function in Listing 2 shows an example. The packet contains:

  • The function identifier
  • The number of arguments that follow

Step 2. Send the arguments

The send_arg_msg function in Listing 2 shows an example. Each argument is sent in two steps:

  1. Send the size of the argument message.
  2. Send the argument string.
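The header and argument framing described in Steps 1 and 2 can be sketched as follows. This is not the article's implementation: it swaps plain int for int32_t so that the 32-bit and 64-bit sides agree on field widths by construction, and it adds a `write_all` loop because write may transfer fewer bytes than requested. The names `write_all` and `send_request` are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a header (function id, argument count), then each argument as a
 * 32-bit length followed by the string bytes without the terminating NUL.
 * args is a NULL-terminated array. Returns 0 on success, -1 on error. */
int send_request(int fd, int32_t function_id, const char *const *args)
{
    int32_t header[2];
    int32_t argc = 0;
    int32_t i;

    assert(args != NULL);
    while (args[argc] != NULL) {
        argc++;
    }

    header[0] = function_id;
    header[1] = argc;
    if (write_all(fd, header, sizeof header) < 0) {
        return -1;
    }

    for (i = 0; i < argc; i++) {
        int32_t len = (int32_t)strlen(args[i]);
        if (write_all(fd, &len, sizeof len) < 0 ||
            write_all(fd, args[i], (size_t)len) < 0) {
            return -1;
        }
    }
    return 0;
}
```

Sending the length before the bytes lets the receiver allocate the right buffer and know exactly how much to read, without scanning for a terminator.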

Step 3. Read the result

After the server processes the function, it writes the result to the pipe. The client process then reads the result from the pipe.
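Reading the result deserves the same care as writing the request, because read can return fewer bytes than requested or report that the server has closed its end. The sketch below shows one way to handle both cases; `read_all` and `read_result` are hypothetical helper names, and the 32-bit result type is an assumption chosen to keep the field width identical on both sides.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly len bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or if the pipe closes early. */
int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0) {
            return -1;              /* the other side closed the pipe */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Block until the server has written a complete 32-bit result. */
int read_result(int fd, int32_t *result)
{
    assert(result != NULL);
    return read_all(fd, result, sizeof *result);
}
```

Because the client blocks in read until the result arrives, each call through the shim is synchronous, which provides the synchronization between client and server mentioned earlier.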

Listing 2. Calling function
/* String is assumed to be a typedef for char *. */

void send_header_msg(int function_id, int argc) {
    int write_status = 0;
    int msg_ints[2];
    msg_ints[0] = function_id;
    msg_ints[1] = argc;

    write_status = write(pinf[PWRITE], &msg_ints[0], 2 * sizeof(int));

    if (write_status == -1) {
        print_error("Can't write to the pipe \n");
    }
}

void send_arg_msg(String msg) {
    int write_status = 0;
    int msg_size;
    if (msg != NULL) {
        msg_size = strlen(msg);
        /* Write the string argument size. */
        write_status = write(pinf[PWRITE], &msg_size, sizeof(int));

        if (write_status == -1) {
            print_error("Can't write to the pipe \n");
            return;
        }
        /* Write the string argument. */
        write_status = write(pinf[PWRITE], msg, msg_size);

        if (write_status == -1) {
            print_error("Can't write to the pipe \n");
        }
    }
}

void call_function(int function_id, String *args) {
    int i;
    int argc = 0;

    /* Count the arguments; the array is NULL-terminated. */
    while (args[argc] != NULL) {
        argc++;
    }

    /* Send the header packet. */
    send_header_msg(function_id, argc);

    /* Send the parameters. */
    for (i = 0; i < argc; i++) {
        send_arg_msg(args[i]);
    }
}

void client_function() {
    String *args;
    int result;
    /* initialize the args */
    call_function(FOO, args);

    /* Read the result. */
    if (read(poutf[PREAD], &result, sizeof(int)) == -1) {
        print_error("Error reading pipe \n");
    }
}



Server-side steps

Here, the 32-bit process is called the server. It is compiled and built with the 32-bit libraries of the 32-bit application with which we are integrating. The server process keeps running all of the time, reads the pipe, and accepts function requests.

Depending on the function ID and the number of arguments, the server process keeps reading the pipe until all of the arguments of the functions are read. Then it processes the function request and writes the result to the other pipe, which is read by the client.
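The request handling just described can be sketched as a function that serves exactly one request. Everything named here is an assumption for illustration: `serve_one`, `read_exact`, the identifier `FN_TOTAL_LENGTH`, the stand-in function `total_length`, and the limits `MAX_ARGS` and `MAX_ARG_LEN`. A real server would dispatch into the 32-bit application's libraries instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum { FN_TOTAL_LENGTH = 1 };                   /* hypothetical function id */
enum { MAX_ARGS = 64, MAX_ARG_LEN = 4096 };     /* arbitrary sanity limits */

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0) return -1;
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Stand-in for a real server function: total length of all arguments. */
static int32_t total_length(char **args)
{
    int32_t total = 0;
    for (; *args != NULL; args++) total += (int32_t)strlen(*args);
    return total;
}

/* Serve one request. Returns 1 if a request was handled, 0 if the client
 * closed the pipe (or no header could be read), and -1 on a protocol error. */
int serve_one(int infd, int outfd)
{
    int32_t header[2], argc, i, result = -1;
    char **args;
    int ok = 1;

    if (read_exact(infd, header, sizeof header) < 0) return 0;
    argc = header[1];
    if (argc < 0 || argc > MAX_ARGS) return -1;

    args = calloc((size_t)argc + 1, sizeof *args);  /* NULL-terminated */
    if (args == NULL) return -1;

    for (i = 0; i < argc && ok; i++) {
        int32_t len;
        if (read_exact(infd, &len, sizeof len) < 0 || len < 0 || len > MAX_ARG_LEN) {
            ok = 0;
            break;
        }
        args[i] = calloc((size_t)len + 1, 1);       /* zero-filled, so NUL-terminated */
        if (args[i] == NULL || read_exact(infd, args[i], (size_t)len) < 0) ok = 0;
    }

    if (ok) {
        switch (header[0]) {
        case FN_TOTAL_LENGTH: result = total_length(args); break;
        default:              result = -1; break;   /* unknown function id */
        }
        if (write(outfd, &result, sizeof result) != (ssize_t)sizeof result) ok = 0;
    }

    for (i = 0; i < argc; i++) free(args[i]);
    free(args);
    return ok ? 1 : -1;
}
```

A server main function would call `serve_one` in a loop until it returns 0 or -1. Validating the argument count and lengths before allocating keeps a corrupted header from triggering a huge allocation.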

Listing 3. Structure of the server process
int infd = -1;      /* pipe end the server reads requests from */
int outfd = -1;     /* pipe end the server writes results to */

void get_msg_data_with_args(char **args, int msg_hdr_data[MSG_HEADER_SIZE]);

int main(int argc, char *argv[])
{
    int op;
    int i;
    int read_status = 0;
    int shutdown_received = 0;
    char **args;
    int retVal;
    int header_msg[MSG_HEADER_SIZE];
    int write_status;

    /* The client passes the pipe descriptors as argv[1] and argv[2]. */
    if (argc == 3) {
        sscanf(argv[1], "%d", &infd);
        sscanf(argv[2], "%d", &outfd);
        if (infd < 0 || outfd < 0) {
            return 1;
        }
    } else {
        return 1;
    }

    /* Wait for writes from library and dispatch on commands received. */

    while (shutdown_received == 0) {
        argc = 0;

        /* Read the header packet */

        read_status = read(infd, &header_msg[0], sizeof(int) * MSG_HEADER_SIZE);

        if (read_status <= 0) {
            /* The client closed the pipe, or the read failed. */
            shutdown_received = 1;
            continue;
        }

        if (read_status == (int)(MSG_HEADER_SIZE * sizeof(int))) {
            op = header_msg[0];     /* The first field is the function id */
            switch (op) {
              case FOO:
                argc = header_msg[1]; /* The second field is the number of arguments */
                args = (char **)malloc((argc + 1) * sizeof(char *));
                /* Read all the arguments */
                get_msg_data_with_args(args, header_msg);
                args[argc] = NULL;
                /* Process the request, dispatch to proper function */
                retVal = foo(args);
                /* write the result */
                write_status = write(outfd, &retVal, sizeof(int));
                /* Release the argument storage */
                for (i = 0; i < argc; i++) {
                    if (args[i] != NULL) {
                        free(args[i]);
                    }
                }
                free(args);
                break;
            }
        }
    }
    return 0;
}

void get_msg_data_with_args(
    char **args,
    int msg_hdr_data[MSG_HEADER_SIZE])
{
    int i;
    int arg_len = 0;
    int argc = msg_hdr_data[1];

    for (i = 0; i < argc; i++) {
        /* Read string length */
        read(infd, &arg_len, sizeof(int));
        args[i] = (char *)malloc(sizeof(char) * (arg_len + 1));
        memset(args[i], '\0', sizeof(char) * (arg_len + 1));
        /* Read the string */
        read(infd, args[i], arg_len);
    }
}


Case 2. Solution for Windows operating systems

If the integration is developed only for the Microsoft Windows platform, then COM (Component Object Model) technology is a natural choice, because it provides a framework and also handles the IPC transparently.

The shim library that is loaded by the 64-bit Windows Explorer becomes the COM client. We write a 32-bit out-of-process COM server, which is compiled and linked with the 32-bit libraries and handles all of the client calls.

The client just needs an interface pointer, and then it invokes methods of the COM object through the pointer.

Overview of the steps

  1. Define interfaces and the coClass that implements the interface: Write an Interface Definition Language (IDL) file that contains the interface definition, the coClass, and their GUIDs.
  2. Compile the IDL.
  3. Generate proxy and stub DLL.
  4. Write code for the COM executable (.exe file).
  5. Register the .exe, proxy, and stub DLL.

Detailed steps

This section describes the steps mentioned above in more detail.

Step 1. Define interfaces and the coClass which implements the interface

We start by identifying all functionality that will be needed for integration, and then define the interface. The interface of a COM object declares the signature of the methods that will be used by the clients to communicate with the COM object.

All COM interfaces inherit from the IUnknown interface, and each interface or coClass has a GUID, which is a 128-bit number. Use guidgen.exe to generate the GUIDs.

Listing 4. IDL file example
import "unknwn.idl";

// Interface IMyCustomInterface
[
	object,
	uuid(<GUID generated with guidgen.exe>),
	helpstring("My custom Interface")
]
interface IMyCustomInterface: IUnknown
{
	HRESULT foo([in, string] char* szIn,
					[out] long* szOut
					) ;
	HRESULT bar([out] int* szOut);
};

// Component 1
[
	uuid(<GUID generated with guidgen.exe>)
]
coclass CoClass1
{
	[default] interface IMyCustomInterface;
};

Step 2. Compile the IDL

It is simple to compile it by using Microsoft Visual Studio and following these steps:

  1. Create an empty DLL project, and add the IDL file to the folder.
  2. Set the preprocessor definitions:
    • _WIN32_WINNT=0x0500
  3. Add rpcrt4.lib to the linker's "Additional Dependencies" setting.
  4. Create a definition file for your project, and set it as the linker module definition file.
  5. Compile IDL and files generated out of IDL to create a DLL file.

Step 3. Generate proxy and stub DLL

For communication between the client and the server, the parameters passed to a function must be moved from the address space of the client to the address space of the component. This is called marshalling.

The server side must then unmarshal the data sent from the client. In the other direction, the data that the component sends back must be marshalled on the server side and unmarshalled by the client.

In COM, this can be achieved by proxy and stub DLLs.

Proxy: A component that acts like another component. The proxy must be a DLL, because it needs access to the address space of the client so that it can marshal data passed to the interface functions.
Stub: A DLL that sits in the address space of the server and unmarshals the data sent by the client.

In this case, the proxy is a 64-bit DLL, because the client is a 64-bit application, whereas the stub is compiled as 32-bit, because it is part of the 32-bit out-of-process COM server.

Step 4. Write the COM.exe

The coClass must implement all of the methods in the IUnknown interface and the IMyCustomInterface interface.

Implement a class factory for each of the coClasses. If there are multiple COM classes, using a class factory template keeps the implementation small and also helps with maintenance. The class factory should implement all of the IClassFactory methods.

IClassFactory::LockServer is important in case of an out-of-process COM server.

Initially, when the client calls CoCreateInstance, the module count for the COM object is incremented. When Release is called, the module count drops to 0, and the server shuts itself down. When the client needs the COM object again, the server must be restarted. This can be a performance bottleneck, so the client needs to cache the class factory pointer and keep the server from shutting down. To achieve this, the client can call IClassFactory::LockServer(TRUE), which locks the COM server application open in memory and keeps it available. The server can then shut itself down only when it receives the corresponding unlock call: IClassFactory::LockServer(FALSE).

The reference counter is not useful, and AddRef and Release return only constants.

WinMain content structure

  1. CoInitialize
  2. Create an instance of the component's class factory
  3. CoRegisterClassObject
  4. CoResumeClassObjects
  5. CoRevokeClassObject
  6. CoUninitialize

Next, we explain the overall flow and then the required registry keys to make the integration work:

  • Our COM server runs as an EXE on the local machine, and the client runs as a different EXE.
  • The client calls CoCreateInstance to create an instance of the component in the local server.
  • The COM runtime looks for the server's CLSID key in the registry.
  • The file path at LocalServer32 denotes where the component is stored.
  • If the local server is not already running, the COM runtime launches it.
  • When the local server is launched, execution begins at WinMain, which enters the main STA with a default call to CoInitialize.
  • The local server creates an instance of the component's class factory and registers it with the API CoRegisterClassObject.
  • The class factory is a COM object that lives in the local server's main STA.
  • The COM runtime then pulls the pointer to the appropriate class factory off the process's registration table.
  • It is encapsulated by the CoGetClassObject and the functions that wrap it, and it invokes IClassFactory::CreateInstance.

Registry keys for Interface for 64-bit proxy




Registry keys for Interface for 32-bit stub




Registry keys for 64-bit proxy



Registry keys for 32-bit stub



Registry key for COM class






Both of the preceding solutions have been successfully implemented, and we've observed the following benefits:

  • The implementation is lightweight and has resolved the runtime incompatibilities on Windows, UNIX, and Linux operating systems.
  • It is a simple, transparent, and generic approach to implement IPC through the pipes mechanism.
  • Developers can stay within their own IDEs to perform operations supported by the third-party tools.
