Controlling symbol visibility for shared libraries
Part 1 - Introduction to symbol visibility
What are symbols and symbol visibility
A symbol is one of the basic terms used when talking about object files, linking, and so on. In C/C++, a symbol is the entity that corresponds to most user-defined variables and function names, mangled with namespace, class/struct names, and so on. For example, a C/C++ compiler may generate symbols in an object file when a programmer defines non-static global variables or non-static functions. The linker uses these symbols to decide whether different modules (object files, dynamic shared libraries, executables) share the same data or code.
Though both variables and functions may be shared among modules, variable sharing is more common among object files. For example, a programmer may declare a variable in a.c:
extern int shared_var;
And, define it in b.c:
int shared_var;
Thus, the symbol shared_var appears in both compiled objects, a.o and b.o, and after the linker's resolution the symbol in a.o shares the address of the one in b.o. However, it is rare for variables to be shared among shared libraries and executables. For such modules, it is far more common to make only functions visible to others. Such functions are sometimes called an API, as the module is deemed to provide these interfaces for others to call into. We also say that such symbols are exported, since they are visible to other modules. Notice that such visibility only takes effect at dynamic linking time, since shared libraries are commonly loaded as part of the memory image when the program runs. Therefore, symbol visibility is an attribute of all global symbols for dynamic linking.
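The declaration and definition pair above can be sketched in a single C++ file so that it compiles on its own. In the article's scenario the two halves live in a.c and b.c; the reader function, the usage function, and the initial value 42 are assumptions added purely for illustration.

```cpp
#include <cassert>

// Declaration only (the a.c side): names the variable without defining it,
// so the compiled object would carry an undefined reference to shared_var.
extern int shared_var;

// Code that uses the declaration; the linker binds it to the definition.
int read_shared_var() { return shared_var; }

// Definition (the b.c side): storage is allocated here.
// The initial value 42 is illustrative only.
int shared_var = 42;

// Usage check: the reader sees writes made through the single definition.
void usage_example() {
    shared_var = 7;
    assert(read_shared_var() == 7);
}
```

Because there is exactly one definition, every reference to shared_var ends up at the same address, which is the sharing that the linker's resolution provides across object files.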
Why control symbol visibility
On different platforms, the XL C/C++ compiler might or might not export all the symbols in a module by default. For example, when creating Executable and Linking Format (ELF) shared libraries on the IBM PowerLinux™ platform, all symbols are exported by default. In contrast, when creating an XCOFF library on AIX running on the POWER platform, the current XL C/C++ compiler may export no symbols at all without the assistance of a tool. There are also other ways that let a programmer determine symbol visibility one by one. (We introduce them in the next part of this series.) In general, however, exporting all the symbols in a module is not recommended. Programmers should export only the symbols that are needed. This benefits not only library security but also dynamic linking time.
When programmers choose to export all symbols, there is a high risk of symbol collisions at link time, especially when modules are developed by different programmers. Because a symbol is a low-level concept, it carries no scope. As soon as you link against a library that uses the same symbol names as your own code, the library's symbols might accidentally override yours when the linker resolves them (hopefully with a warning or an error message). In most cases, the library designer never expected such symbols to be used from outside. Therefore, exporting only a limited set of carefully chosen, meaningful symbol names helps a lot with such issues.
For C++ programming, there is a growing demand for performance. However, because of dependencies on other libraries and the use of C++ features such as templates, the compiler and linker tend to generate and use a huge number of symbols. Exporting all of them can therefore slow down the program and consume a large amount of memory. Exporting a limited number of symbols reduces the loading and linking time for dynamic shared libraries. Furthermore, it enables additional compiler optimizations, which means that more efficient code can be generated.
These drawbacks of exporting all symbols explain why controlling symbol visibility is important. In this article, we present solutions for keeping the symbols in a dynamic shared object (DSO) under control. Users can choose among different ways to solve the same problem, and we also suggest which one is preferable on a specific platform.
Ways to control symbol visibility
In the discussions below, we will make use of the following C++ code snippet:
Listing 1. a.C
int myintvar = 5;

int func0 () {
  return ++myintvar;
}

int func1 (int i) {
  return func0() * i;
}
In a.C, we define one variable, myintvar, and two functions, func0 and func1. By default, when creating a shared library on the AIX platform, the compiler and linker, along with the CreateExportList tool, make all three symbols visible. We can check this in the Loader Symbol Table Information with the dump binary tool:
$ xlC -qpic a.C -qmkshrobj -o libtest.a
$ dump -Tv libtest.a

***Loader Symbol Table Information***
[Index]  Value       Scn    IMEX  Sclass  Type    IMPid     Name
[0]      0x20000280  .data  EXP   RW      SECdef  [noIMid]  myintvar
[1]      0x20000284  .data  EXP   DS      SECdef  [noIMid]  func0__Fv
[2]      0x20000290  .data  EXP   DS      SECdef  [noIMid]  func1__Fi
Here, “EXP” means that the symbol is exported. The function names func0 and func1 are mangled according to the C++ mangling rules (although the original names are not hard to guess). The -T option of the dump tool shows the Loader Symbol Table Information, which is used by the dynamic linker. In this case, all the symbols in a.C are exported. But from the perspective of a library writer, we may want to export only func1. The global symbol myintvar and the function func0 only keep or change internal state, that is, they are used locally. Thus, making them invisible is important for the library writer.
There are at least three ways to achieve this goal: using the static keyword, defining the GNU visibility attribute, and using an export list. Each has its own capabilities and, possibly, drawbacks. We look at each of them in turn.
1. Using the static keyword
The static keyword in C/C++ is an overloaded keyword, as it can specify both the scope and the storage of a variable. In terms of scope, it disables external linkage for the symbol in a file. This means that a symbol declared with the static keyword can never be linked against from another file, because the compiler does not leave any information about this symbol for the linker. It is a language-level control and the simplest way to hide a symbol.
Let us add the static keyword to the above case:
Listing 2. a.C (with the static keyword)
static int myintvar = 5;

static int func0 () {
  return ++myintvar;
}

int func1 (int i) {
  return func0() * i;
}
When we generate the shared library and look at the Loader Symbol Table Information again, the result is as expected:
$ xlC -qpic a.C -qmkshrobj -o libtest.a
$ dump -Tv libtest.a

***Loader Symbol Table Information***
[Index]  Value       Scn    IMEX  Sclass  Type    IMPid     Name
[0]      0x20000284  .data  EXP   DS      SECdef  [noIMid]  func1__Fi
Now, only func1 is exported, as the output shows. However, although the static keyword can hide a symbol, it also imposes an extra rule: the variable or function can only be used within the file where it is defined. Thus, suppose a second file, b.C, declares:
extern int myintvar;
and you build libtest.a from both a.o and b.o. The linker then displays an error message stating that myintvar, referenced in b.C, cannot be resolved, because the linker cannot find a definition anywhere else. That breaks the data and code sharing inside the same module, which programmers generally require. The static keyword is therefore better suited to controlling the visibility of variables and functions within a file than to controlling the visibility of low-level symbols. In fact, most programmers do not rely on the static keyword to control symbol visibility. We therefore turn to the second method.
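Before moving on, it may help to note that C++ offers a second language-level way to get the same internal linkage: an unnamed namespace. The following is a minimal sketch of the running example rewritten that way; the usage function is an assumption added for illustration, and the same file-scope limitation described above applies.

```cpp
#include <cassert>

namespace {
// Names in an unnamed namespace have internal linkage, just like names
// declared static, so no other file can link against them.
int myintvar = 5;

int func0() { return ++myintvar; }
}  // namespace

// func1 keeps external linkage and stays the only linkable symbol.
int func1(int i) { return func0() * i; }

// Usage check: each call bumps the hidden counter before multiplying.
void usage_example() {
    int first = func1(1);
    int second = func1(1);
    assert(second == first + 1);
}
```

Either form hides myintvar and func0 from the linker, so neither helps when another source file of the same library needs to reach them.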
2. Defining the visibility attribute (GNU only)
The next candidate for controlling symbol visibility is the visibility attribute. The ELF application binary interface (ABI) defines the visibility of symbols. Generally, it defines four classes, but in most cases only two of them are commonly used:
- STV_DEFAULT: Symbols defined with it will be exported. In other words, it declares that symbols are visible everywhere.
- STV_HIDDEN: Symbols defined with it will not be exported and cannot be used from other objects.
Notice that this attribute is a GNU C/C++ extension. Thus, PowerLinux customers can currently use it as a GNU attribute on symbols. Here is an example for our case:
int myintvar __attribute__ ((visibility ("hidden")));

int __attribute__ ((visibility ("hidden"))) func0 () {
  return ++myintvar;
}
...
To define a GNU attribute, write __attribute__ followed by the content in double parentheses. You specify the visibility of a symbol as visibility("hidden"). In the above case, we mark myintvar and func0 with hidden visibility. This prevents them from being exported from the library, but they can still be shared among source files. In fact, hidden symbols do not appear in the dynamic symbol table, but they remain in the symbol table for static linking purposes. That is well-defined behavior and achieves our goal. It is clearly better than the static keyword solution.
Notice that declaring a variable as static while also giving it the visibility attribute might confuse the compiler. As a result, the compiler displays a warning message.
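In practice, the attribute is often wrapped in macros so that the same source still compiles with a compiler that does not understand the GNU syntax. The sketch below applies this to the running example; the macro names LIBTEST_API and LIBTEST_LOCAL and the usage function are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical macro names. On GNU-compatible compilers they expand to the
// visibility attribute; elsewhere they expand to nothing.
#if defined(__GNUC__)
#define LIBTEST_API   __attribute__((visibility("default")))
#define LIBTEST_LOCAL __attribute__((visibility("hidden")))
#else
#define LIBTEST_API
#define LIBTEST_LOCAL
#endif

// Hidden: usable from any source file of the library, but not exported.
LIBTEST_LOCAL int myintvar = 5;

LIBTEST_LOCAL int func0() { return ++myintvar; }

// Default visibility: the one symbol the library is meant to export.
LIBTEST_API int func1(int i) { return func0() * i; }

// Usage check: visibility does not change behavior inside the module.
void usage_example() {
    int first = func1(1);
    int second = func1(1);
    assert(second == first + 1);
}
```

GCC also provides the -fvisibility=hidden option, which makes hidden the default so that only symbols explicitly marked with default visibility are exported.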
The ELF ABI also defines other visibility modes:
- STV_PROTECTED: The symbol is visible outside the current executable or shared object, but it may not be overridden. In other words, if a protected symbol in a shared library is referenced by other code in the same shared library, that code always references the symbol in the shared library, even if the executable defines a symbol with the same name.
- STV_INTERNAL: The symbol is not accessible outside the current executable or shared library.
Notice that the XL C/C++ compiler does not currently support this method, not even on the PowerLinux platform. But there is still another way out.
3. Using the export list
The above two solutions take effect at the source-code level and only require the compiler to implement the functionality. However, because symbol visibility mainly matters in dynamic linking, it is essential that users can also tell the linker to perform similar work. The linker's solution is the export list.
An export list can be generated automatically by the compiler (or by related tools, such as CreateExportList) when the shared library is created. It can also be written manually by the developer. An export list is passed to the linker as input through a linker option. However, because the compiler driver takes care of such routine work, programmers seldom need to deal with the detailed options.
The idea of the export list is to tell the linker explicitly, through an external file, which symbols can be exported from the object files. In the GNU toolchain, such an external file is called an “export map”. We can write an export map for our case:
{
  global: func1;
  local: *;
};
The above description tells the linker that only the func1 symbol is to be exported, and that all other symbols (matched by *) are local. The programmer can also explicitly list func0 or myintvar as local symbols (local: func0; myintvar;), but the catch-all (*) is obviously more convenient. Generally speaking, using the catch-all to mark all symbols as local and picking out only the ones that need to be exported is highly recommended because it is safer: it prevents users from forgetting to keep some symbols local and avoids listing a symbol in both lists, which may cause unexpected behavior. Note that the linker matches these names against the symbol names stored in the object files, so for a function compiled as C++ the entry has to match the mangled name (or the demangled signature has to be placed in an extern "C++" block of the export map); the plain name func1 matches as written when the function has C linkage.
To generate a DSO with this method, the programmer passes the export map file with the --version-script linker option:
$ gcc -shared -o libtest.so a.C -fPIC -Wl,--version-script=exportmap
Read the ELF object file with the readelf binary utility and its -s option:
$ readelf -s libtest.so
The output shows that only func1 is globally visible for this module (see the entries in the .dynsym section), and that the other symbols are hidden as local.
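One detail is worth making explicit for the export map shown earlier. GNU ld compares the names in the map with the names stored in the object file, and a C++ compiler stores mangled names there. A minimal sketch, assuming we are free to give func1 C linkage, keeps its symbol name unmangled so that the entry func1 matches exactly; the usage function is an addition for illustration.

```cpp
#include <cassert>

// Internal state of the library; the catch-all "local: *;" entry of the
// export map keeps these two symbols out of the dynamic symbol table.
int myintvar = 5;

int func0() { return ++myintvar; }

// C linkage disables C++ name mangling for this function, so the object
// file contains the plain symbol name "func1", as written in the map.
extern "C" int func1(int i) { return func0() * i; }

// Usage check: callers use func1 exactly as before.
void usage_example() {
    int first = func1(1);
    int second = func1(1);
    assert(second == first + 1);
}
```

If C linkage is not acceptable, for example for overloaded functions, the alternatives are to list the mangled name in the export map or to put the demangled signature inside an extern "C++" block there.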
For the IBM AIX OS linker, a similar export list is provided. To be exact, the export list is called the export file on AIX.
Writing an export file is simple. The programmer just lists the symbols that need to be exported in the export file. In our case, it is as simple as shown below:
func1__Fi // symbol name
Thus, when we specify the export file with a linker option, the only symbol we want to export is added into the “loader symbol table” for XCOFF, while the others are kept as un-exported.
On AIX 6.1 and later versions, the programmer may also append a visibility attribute to each symbol in the export file to describe its visibility. The AIX linker accepts four such visibility attribute types:
- export: Symbol is exported with the global export attribute.
- hidden: Symbol is not exported.
- protected: Symbol is exported but cannot be rebound (preempted), even if runtime linking is being used.
- internal: Symbol is not exported. The address of the symbol must not be provided to other programs or shared objects, but the linker does not verify this.
The distinction between export and hidden is obvious. However, the distinction between export and protected is subtle. We discuss symbol preemption in more detail in the next section.
In any case, the four keywords above are available in the export file. Appending one of them (separated by a blank) to a symbol name gives finer-grained control over symbol visibility. In our case, we can specify symbol visibility (on AIX 6.1 and later versions) as shown below:
func1__Fi export
func0__Fv hidden
myintvar hidden
This informs the linker that only func1__Fi (that is, func1) is exported, and that the others are not.
You may notice that the symbols listed in the export file are all mangled names (the GNU export map in the earlier example used the plain name func1). Mangled names are not very friendly, because the programmer may not know the mangling rules, but they do help the linker resolve names quickly. To close this gap, AIX provides a tool to help the programmer.
In short, if the programmer specifies the -qmkshrobj option when invoking the XL C/C++ compiler, the compiler driver invokes the CreateExportList tool after the compiler has successfully generated the object file. The tool automatically generates an export file that holds the mangled symbol names, and the compiler driver then passes that export file to the linker to process the symbol visibility settings. Consider this example. If we invoke:
$ xlC -qpic a.C -qmkshrobj -o libtest.a
The libtest.a library is generated with all the symbols exported (this is the default). Though this does not achieve our goal, at least the whole process is transparent to the programmer. The programmer can also choose to keep the export file that the CreateExportList utility generates, which makes it possible to modify the export file manually. For example, if the export file is to be named exportfile, then -qexpfile=exportfile is the option you need to pass to the XL C/C++ compiler driver.
$ xlC -qmkshrobj -o libtest.a a.o -qexpfile=exportfile
In this case, the export file lists all the symbols, as shown below:
func0__Fv
func1__Fi
myintvar
Based on our requirement, we can either simply remove the lines for myintvar and func0, or append the hidden visibility keyword to them. We then save the export file and use the linker option -bE:exportfile to pass the refined export file back.
$ xlC -qmkshrobj -o libtest.a a.o -bE:exportfile
That completes all the steps. Now the generated DSO has only func1__Fi (that is, func1) exported:
$ dump -Tv libtest.a

***Loader Symbol Table Information***
[Index]  Value       Scn    IMEX  Sclass  Type    IMPid     Name
[0]      0x20000284  .data  EXP   DS      SECdef  [noIMid]  func1__Fi
Alternatively, the programmer can use the CreateExportList utility to explicitly generate the export file, as shown below:
$ CreateExportList exportfile a.o
In our case, this produces exactly the same export file as the approach above.
For the new format on AIX 6.1 and later versions, appending a symbol visibility keyword to each symbol one by one might require considerable effort. However, there are plans for the XL C/C++ compiler to make this easier for programmers. (Related information will be provided in the next part of this series.)
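Until such support exists, the keyword-appending step is easy to script. The following is a minimal sketch of a hypothetical helper, not part of any AIX or XL C/C++ tooling: it takes the symbol names from a generated export file and appends export or hidden to each one, following the "symbol keyword" line format described above. The function names are assumptions made for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns one export-file line per symbol, marking the
// symbols found in `exported` as "export" and all others as "hidden".
std::vector<std::string> annotate_export_list(
    const std::vector<std::string>& symbols,
    const std::set<std::string>& exported) {
    std::vector<std::string> lines;
    lines.reserve(symbols.size());
    for (const std::string& symbol : symbols) {
        const char* keyword = exported.count(symbol) ? "export" : "hidden";
        lines.push_back(symbol + " " + keyword);
    }
    return lines;
}

// Usage check with the three symbols of the running example.
void usage_example() {
    std::vector<std::string> symbols;
    symbols.push_back("func0__Fv");
    symbols.push_back("func1__Fi");
    symbols.push_back("myintvar");
    std::set<std::string> exported;
    exported.insert("func1__Fi");

    std::vector<std::string> lines = annotate_export_list(symbols, exported);
    assert(lines.size() == 3);
    assert(lines[1] == "func1__Fi export");
}
```

Writing the returned lines back to the export file and passing it with -bE:exportfile would then follow the same steps as the manual edit.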
With the export list solution, all the information is kept in the export list, and programmers do not need to change the source files. This separates code development from library development. However, the approach has a drawback. Because the source files are unmodified, the binary code that the compiler generates might not be optimal: lacking visibility information, the compiler misses the chance to optimize symbols that are not exported. This can either increase the size of the generated binary or slow down symbol resolution. However, it is not a major issue for most applications.
The following table summarizes and compares the above solutions.
Table 1. Comparison of each solution
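Solution | Advantage | Disadvantage |
---|---|---|
static keyword | Simplest approach; language-level control | Restricts the symbol to file scope, so it cannot be shared among source files of the same module |
export list | No source changes needed; separates code development from library development | Compiler lacks visibility information, so the generated code might not be optimal; AIX export files use mangled names |
Specify visibility attribute | Hidden symbols can still be shared among source files; well-defined behavior | GNU extension; not supported by the XL C/C++ compiler at the time of writing |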
Symbol preemption
As mentioned above, there is a subtle distinction between the visibility keywords export and protected, and it concerns symbol preemption. Symbol preemption occurs when a symbol address resolved at link time is replaced with another symbol address resolved at runtime (note that runtime linking is optional on AIX). Conceptually, runtime linking resolves undefined and non-deferred symbols in shared modules after program execution has begun. It is a mechanism for providing runtime definitions (function definitions that are not available at link time) and symbol rebinding capabilities. On AIX, a program is able to use the runtime linking facility when the main program is linked with the -brtl flag or when preloaded libraries are specified with the LDR_CNTRL environment variable. Linking with -brtl adds a reference to the dynamic linker to the program, and the program's startup code (/lib/crt0.o) calls it when the program begins to run. Shared object input files are listed as dependents in the program loader section in the same order as they are specified on the command line. When the program begins to run, the system loader loads these shared objects so that their definitions are available to the dynamic linker.
This ability to redefine items in shared objects at runtime is called symbol preemption. On AIX, symbol preemption is only possible when runtime linking is used. Imports bound to a module at link time can be rebound to another module at runtime. Whether a local definition can be preempted by an imported instance depends on the way the module was linked. However, a non-exported symbol can never be preempted at runtime. When the runtime loader loads a component, all the symbols within the component that have default visibility are subject to preemption by symbols of the same name in components that are already loaded. Note that because the main program image is always loaded first, none of the symbols it defines can be preempted (redefined).
A protected symbol is exported but is not preemptible. In contrast, a symbol with the export attribute is exported and can be preempted (if runtime linking is used).
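The binding rules described above can be captured in a small toy model. The sketch below is a deliberately simplified simulation, not the AIX loader's actual algorithm: the types, the resolve function, and the scenario builder are all assumptions made for illustration. It models a list of modules in load order and decides which module's definition a reference binds to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Visibility { Export, Protected, Hidden };

struct Symbol {
    std::string name;
    Visibility visibility;
};

struct Module {
    std::string name;
    std::vector<Symbol> symbols;
};

// Returns the name of the module whose definition a reference made from
// loaded[caller] binds to, or an empty string if nothing defines it.
std::string resolve(const std::vector<Module>& loaded, std::size_t caller,
                    const std::string& symbol) {
    // Protected or hidden definitions in the calling module bind locally
    // and can never be preempted.
    for (const Symbol& s : loaded[caller].symbols) {
        if (s.name == symbol && s.visibility != Visibility::Export) {
            return loaded[caller].name;
        }
    }
    // Otherwise the first exported definition in load order wins, so an
    // earlier module (such as the main program) preempts later ones.
    for (const Module& m : loaded) {
        for (const Symbol& s : m.symbols) {
            if (s.name == symbol && s.visibility != Visibility::Hidden) {
                return m.name;
            }
        }
    }
    return "";
}

// Builds a two-module scenario: a main program loaded first and a library.
// main_exports_all models whether the main program exports its symbols.
std::vector<Module> sample_scenario(bool main_exports_all) {
    Visibility main_vis =
        main_exports_all ? Visibility::Export : Visibility::Hidden;
    Module main_program{"main", {{"func_DEFAULT", main_vis},
                                 {"func_PROC", main_vis}}};
    Module library{"libtest.so", {{"func_DEFAULT", Visibility::Export},
                                  {"func_PROC", Visibility::Protected},
                                  {"invoke", Visibility::Export}}};
    return {main_program, library};
}

// Usage check: a call made from inside the library (module index 1).
void usage_example() {
    std::vector<Module> loaded = sample_scenario(true);
    assert(resolve(loaded, 1, "func_DEFAULT") == "main");      // preempted
    assert(resolve(loaded, 1, "func_PROC") == "libtest.so");   // protected
}
```

When the main program does not export its symbols (sample_scenario(false)), both references stay inside the library, which mirrors the rule that a symbol must itself be exported to preempt another.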
For default symbols, there is a difference between Linux® and AIX. The GNU compilers and ELF file format define a default visibility, which is used for symbols that are exported and preemptible. This is similar to the exported visibility defined on AIX.
The following code takes the AIX platform as an example.
Listing 3. func.C
#include <stdio.h>

void func_DEFAULT(){
  printf("func_DEFAULT in the shared library, Not preempted\n");
}

void func_PROC(){
  printf("func_PROC in the shared library, Not preempted\n");
}
Listing 4. invoke.C
extern void func_DEFAULT();
extern void func_PROC();

void invoke(){
  func_DEFAULT();
  func_PROC();
}
Listing 5. main.C
#include <stdio.h>

extern void func_DEFAULT();
extern void func_PROC();
extern void invoke();

int main(){
  invoke();
  return 0;
}

void func_DEFAULT(){
  printf("func_DEFAULT redefined in main program, Preempted ==> EXP\n");
}

void func_PROC(){
  printf("func_PROC redefined in main program, Preempted ==> EXP\n");
}
In the above listings, we define func_DEFAULT and func_PROC in both func.C and main.C. They have the same names but different behaviors. The function invoke from invoke.C calls func_DEFAULT and func_PROC in sequence. We use the following export list to control whether and how the symbols are exported.
Listing 6. exportlist
func_DEFAULT__Fv export
func_PROC__Fv protected
invoke__Fv
If you are using a linker version earlier than AIX 6.1, you may use a blank space instead of export, and the symbolic keyword instead of the protected keyword. The commands for building the libtest.so library and the main executable are listed below:
# Generate position-independent code suitable for use in shared libraries.
$ xlC -c func.C invoke.C -qpic
# Generate the shared library; exportlist is used to control symbol visibility.
$ xlC -G -o libtest.so func.o invoke.o -bE:exportlist
$ xlC -c main.C
# -brtl enables runtime linkage.
$ xlC main.o -L. -ltest -brtl -bexpall -o main
Basically, we construct libtest.so from func.o and invoke.o. We use exportlist to mark func_DEFAULT from func.C and invoke from invoke.C as exported symbols, and func_PROC from func.C as exported but protected. Thus, libtest.so has two plain exported symbols and one protected symbol. For the main program, we export all the symbols from main.C and link it to libtest.so. Notice that we use the -brtl flag to enable runtime linking for libtest.so.
The next step is to invoke the main program.
$ ./main
func_DEFAULT redefined in main program, Preempted ==> EXP
func_PROC in the shared library, Not preempted
Here we see something interesting: the func_DEFAULT that runs is the version from main.C, while the func_PROC that runs is the version from libtest.so (func.C). The func_DEFAULT symbol is preempted: the local version from libtest.so (we call it local because the calling function, invoke, comes from invoke.C, which is in the same module as func_DEFAULT from func.C) is replaced by the one from another module. However, the same does not happen to func_PROC, which is given protected visibility in the export file.
Notice that a symbol that preempts others must itself be exported. Suppose we remove the -bexpall option when building the main executable. The output is then as shown below:
# -brtl enables runtime linkage.
$ xlC main.o -L. -ltest -brtl -o main
$ ./main
func_DEFAULT in the shared library, Not preempted
func_PROC in the shared library, Not preempted
Here no preemption happens. All the symbols keep the versions defined in their own module.
In fact, to check whether a symbol is exported or protected, we can use the dump utility:
$ dump -TRv libtest.so

libtest.so:

***Loader Section***

***Loader Symbol Table Information***
[Index]  Value       Scn    IMEX  Sclass  Type    IMPid          Name
[0]      0x00000000  undef  IMP   DS      EXTref  libc.a(shr.o)  printf
[1]      0x2000040c  .data  EXP   DS      SECdef  [noIMid]       func_DEFAULT__Fv
[2]      0x20000418  .data  EXP   DS      SECdef  [noIMid]       func_PROC__Fv
[3]      0x20000424  .data  EXP   DS      SECdef  [noIMid]       invoke__Fv

***Relocation Information***
Vaddr       Symndx      Type     Relsect  Name
0x2000040c  0x00000000  Pos_Rel  0x0002   .text
0x20000410  0x00000001  Pos_Rel  0x0002   .data
0x20000418  0x00000000  Pos_Rel  0x0002   .text
0x2000041c  0x00000001  Pos_Rel  0x0002   .data
0x20000424  0x00000000  Pos_Rel  0x0002   .text
0x20000428  0x00000001  Pos_Rel  0x0002   .data
0x20000430  0x00000000  Pos_Rel  0x0002   .text
0x20000434  0x00000003  Pos_Rel  0x0002   printf
0x20000438  0x00000004  Pos_Rel  0x0002   func_DEFAULT__Fv
0x2000043c  0x00000006  Pos_Rel  0x0002   invoke__Fv
This is the output for libtest.so. We find that func_DEFAULT__Fv and func_PROC__Fv are both exported. However, func_PROC__Fv does not have any relocations. This means that the loader has no way to replace the address of func_PROC in the TOC table, and the address of func_PROC in the TOC table is where the function invoke transfers control to. Therefore, func_PROC cannot be preempted, and we can conclude that it is protected.
In practice, symbol preemption is rarely used. It makes it possible to replace a symbol dynamically at run time, but it also opens security holes. If you do not want key symbols in your library to be preempted (but still need to export them for use), make them protected for safety.
Acknowledgments
We would like to thank Dr. Jinsong Ji for reviewing this article and providing valuable suggestions.