What is a compiler?


Authors

Josh Schneider

Staff Writer

IBM Think

Ian Smalley

Staff Editor

IBM Think


A compiler is a type of computer program that converts code from one programming language (the source language) into another programming language (the target language).

Compilers are used to transform high-level source code into low-level target code (such as assembly language, object code or machine code) while preserving the program functionality.

A critical tool for modern, practical computer programming, compilers enable programmers to work in human-readable high-level code and then convert their source code into executable target code. Compilers also help software developers create efficient executable programs with improved security, stability and portability, because the compilation process surfaces many errors before a program ever runs.

Although all compilers convert high-level code into low-level, executable code, different types of compilers are used for different programming languages and applications. For example, a cross-compiler is used to produce code for a different type of CPU or operating system than the one on which it is running.

When the ideal compiler isn’t available, or hasn’t yet been built, a temporary bootstrap compiler is used to compile a more permanent compiler that is better optimized for a specific programming language.

A brief list of other related software includes:

  • Decompilers work like reverse compilers and convert low-level code into high-level languages.
  • Source-to-source compilers (or transpilers) convert high-level code into other high-level languages.
  • Language rewriters convert formal code expressions into different forms without changing the language.
  • Compiler-compilers are used to make generic and reusable compilers or compiler components that can be adapted for project-specific purposes.


How compilers work

In practice, using a compiler can be as simple as entering a command into a command line on any Linux (or equivalent) system, specifying the compiler executable and the source files to be compiled. This command instructs the system to process the source code, compiling it into target machine code and producing the object files needed to build an executable program.

Open-source compilers like the GNU Compiler Collection (GCC)—a robust compiler suite that supports C, C++ and several other languages—or the alternative Clang are available on repositories like GitHub. Other compilers can be freely installed or purchased from a wide array of distributors. They can also be built into popular integrated development environments (IDEs), which bundle various utilities for software development, including text editors, API documentation and debugging tools.

Regardless of the specific compiler being employed, the process of compiling code involves passing the source code through various levels of analysis, optimization and ultimately code generation. Source code passes through the different analytical layers sequentially and is evaluated through each step in the process.

If the compiler recognizes any issues with the original source code, it might return an error message, prompting developers to address identified errors before proceeding with compiling the rest of the code. Generally, compilers proceed through the following steps:

  1. Lexical analysis: The first step of compiling passes the source code through the compiler’s lexer, a program that transforms characters into meaningful units of language, such as keywords, identifiers and operators. These units are known collectively as tokens. This step prepares the source code for what follows by converting its meaningful elements into tokens the compiler can work with.
  2. Syntax analysis: The second step in the compilation process sends the tokens from the lexer to the compiler’s parser. A parser is a program that checks code for syntactical errors and ensures that the source code properly follows the rules of the source language. If the parser detects no errors during parsing, it generates an abstract representation of the overall code structure called an Abstract Syntax Tree (AST).
  3. Semantic analysis: After verifying the code syntax, a compiler carries out a semantic analysis on the parsed code to deduce the intended function of the source code. In this step, the compiler performs checks for logical errors like undeclared variables or incorrect operator usage.
  4. Optimization: While not strictly required to produce functioning code, optimization is a step common to many compilers that improves the overall performance of the compiled code. Optimization can identify and remove unnecessary code, resulting in faster, more efficient and more stable programs, and can shorten the final debugging process.
  5. Code generation: In the final step of the process, the compiler converts the AST into machine-readable code. The final output of code generation is an assembly language code that can then be converted into binary code and executed by the computer system. 

Three-stage compiler structure

Some compilers might not adhere strictly to the preceding structure. However, while some compilers might contain more or fewer steps, all phases of compilation can be ascribed to one of three stages: a front end, a middle end and a back end.

This three-stage structure enables compilers to take a modular approach. It allows combining multiple front ends for different languages with back ends for different CPUs, all while sharing the optimization capabilities of various applicable middle ends.

The three stages of a compiler entail the following distribution:

  1. Front end: The front-end of a compiler includes aspects of lexical analysis, syntax analysis and semantic analysis. This stage verifies syntax and semantics according to the rules of the source language and can identify and pinpoint errors in the source code. Assuming no errors are found, the front end of the compiler converts the source code into an intermediate representation (IR)—a temporary lower-level conversion of the source code—for the middle end. 
  2. Middle end: The middle-end stage of a compiler performs various code optimizations on the IR, independent of whichever CPU architecture the overall compilation process is targeting. Because these optimizations are applied before the target machine code is generated, the compiler can make generalized improvements that benefit code performance regardless of the specific source language or hardware architecture.
  3. Back end: The back-end stage uses the output from the middle-end stage and might perform other CPU-specific optimizations and conversions. In this final stage of the compilation process, the compiler outputs target-dependent assembly code, including register allocations and instruction scheduling. The back-end stage typically results in machine code specialized for target operating systems and hardware. 

Benefits of using a compiler

While compilers are not explicitly necessary for producing workable code, the wide variety and complexity of both coding languages and machine environments make compilers a practical necessity for creating executable software. These are the four main benefits of using software compilers.

Facilitate high-level language coding

High-level programming languages use syntax and keywords that are closer to spoken languages, making them much easier for developers to use. Compilers convert this human-readable code into the more complex machine code needed to run optimized software applications.

Some examples of high-level languages include the following languages:

  • Python (used for web development, data science and others)
  • Java™ (used for Android development, enterprise applications and others)
  • C++ (used for game development, operating systems and others)
  • JavaScript (used for dynamic and interactive web development)
  • PHP (used for server-side scripting in web development)
  • C# (used for Windows applications, Unity engine game development)

Reduce repetition

Compilers help improve efficiency by converting high-level code into executable machine code. The compiler’s output is stored as an executable file (such as a .exe file on Windows), which the computer can then run directly. Thanks to the compiler, writing an executable program becomes a one-time effort.

Once completed, the compiled code can be executed as many times as necessary. This helps programs run faster and more efficiently, because the work of translation is done once, ahead of time, rather than repeated every time the program runs.

Improve portability

Not all systems can run all types of programming code. Compilers are used to convert the types of code developers prefer to use into the types of code that systems require to operate. In this way, compilers improve program portability by converting software into a wide variety of compatible languages that can be easily stored, transferred and executed in various operating systems and hardware architectures.

Promote overall optimization

During the compiling process, compilers can be used to identify and address software errors and flaws, resulting in more stable and better-optimized programs. Compilers can also help improve software security by preventing memory-related errors, such as buffer overflows, and generate warnings if potential memory issues are detected. 

Compilers versus interpreters

While compilers are used to convert source code into executable machine code, interpreters are another type of program that can provide similar functionality, but through a different mechanism.

Instead of converting the source code, interpreters either directly execute source code or use an intermediate code known as bytecode, a low-level, platform-independent representation of the source code. Bytecode serves as an intermediary between human-readable source code and machine code, designed for execution by a virtual machine (VM) instead of directly on a computer’s hardware. 

Theoretically, any programming language can be executed with either a compiler or an interpreter. However, individual programming languages tend to be better suited to either compilation or interpretation.

In practice, the distinction between compiler languages and interpreter languages can sometimes blur—just as the distinction between compilers and interpreters themselves—as both types of programs can feature overlapping functionalities. While some languages are more commonly compiled and some more commonly interpreted, it is possible to write a compiler for a language that is commonly interpreted and vice versa.

High-level languages are typically created with a type of conversion—either compilation or interpretation—in mind, but these are more suggestions than hard limitations. For example, BASIC is often referred to as an interpreted language and C a compiled language, but there exist compilers for BASIC just as there are C interpreters. 

The primary difference between interpreters and compilers lies in timing and optimization. Both types of programs attempt to convert source code into target code that is first functional and then optimized.

Depending on the operating environment, compiled or interpreted code might be better suited to efficiently run with considerations made for hardware capability, memory and storage capacity. Depending on the constraints of any specific program, application and hardware, either compilation, interpretation or a combination of both might yield the best results. 

As such, interpretation cannot stand in for compilation entirely, but it can move compilation duties to the background through a gradual conversion process. Compilers employ an ahead-of-time (AOT) conversion strategy that converts source code into target code entirely before creating an executable file.

Interpreters, alternatively, either run code directly as an application requires it or use bytecode as an intermediary that a virtual machine executes. In this way, interpreters might provide some speedups or flexibility, but at some point a set of directly executed machine instructions must still be produced at the end of the execution stack.

In some instances, when lightweight efficiency is a priority, special interpreters can be preferable over compilers for their ability to perform just-in-time (JIT) conversion. JIT is a strategy that compiles pieces of source code into target code in a memory buffer for immediate execution. JIT interpretation compiles code on demand, combining the one-time compilation efficiency of a traditional compiler with the flexibility to repeatedly execute code—often faster than standard bytecode interpreters.

However, as modern trends toward JIT compilation increase along with situationally dependent bytecode interpretation, many compilers are being designed to offer both compilation and interpretation features. This overlap further blurs the lines between these two categories. 
