HØit Nr. 1-97
Previous article Next article TOC: Nr. 1, 1997 Previous Issue Next Issue About HØit

How Difficult is it to Write a Bare Bones Compiler?


Roland Olsson

Introduction

A full-blown modern compiler package typically weighs in at 50 Mb or more. When being confronted with such a complex product, many computer science students feel that the task of writing a compiler is insurmountable. If you are also awed by the complexity of a compiler, I hope that this article will convince you that the core of a compiler is easy to write and that most of the typical 50 Mb or so is ''bells-and-whistles''. If all fat, for example the GUI and the libraries, is shedded, the remaining bare bones of a compiler is on the order of a few hundred Kb.

As an example of a bare bones compiler, I will use the ADATE-ML compiler that I wrote during the summer of 1996. This compiler converts a small, purely functional part of the language Standard ML directly to machine code for INTEL x86 processors, but is particularly optimized for the Pentium Pro. The ADATE-ML compiler is written in a mix of C and ML using Standard ML of New Jersey development tools, for example ML-LEX and ML-YACC.

Why Write Your Own Compiler?

In general, two important reasons to brew a compiler in your own office is that you have a special, perhaps self-designed, language for which there are no compilers or that you have special target hardware for which no compilers exist.

My reasons are as follows. The ADATE-ML compiler is embedded in the ADATE program synthesis system, which needs to compile, link, load and execute several hundred programs per second. There are at least four reasons that no generally known ML compiler, such as Standard ML of New Jersey or Harlequin ML Works, can be used instead of my own:

  1. The ADATE system needs special feedback from the execution of a program to guide subsequent program transformations. Examples of such feedback are the number of times each alternative in a case-expression is chosen and the number of function calls. It would be easy to insert special statistics gathering code into a synthesized program before it is compiled, but execution would then be quite a lot slower compared with letting the compiler directly emit machine code for this purpose.
  2. The ADATE-ML language is quite small and not a pure subset of Standard ML. For example, it contains three different kinds of special DON'T KNOW constants meaning that the result of a computation is unknown. These constants could be converted to Standard ML exceptions, but there are advantages with giving them special treatment in the compiler.
  3. The compilation, link and load time for the ADATE-ML compiler is at least 10 times shorter than for the compilers above. Since this time is one of the main speed bottlenecks in the ADATE system, compilation time efficiency is essential.
  4. The compiler, the linker and the loader need to be callable as functions from within the ADATE system. It would be quite time consuming to go through the operating system each time one of these functions is called.

The Main Parts of a Compiler

The source code of a program is typically converted to machine code in several stages. For example, the compiler phases that gradually transform the input program may be as follows.
  1. Lexical and syntax analysis. This phase builds an internal representation, usually some form of tree, of the source code. Also, it checks that the source is grammatically correct with respect to a context-free grammar for the programming language.
  2. Typing. The type phase checks the type correctness of the source code, but may also need to infer types. The type phase is quite simple for languages with almost non-existing type systems such as C++ but somewhat more difficult for a language like Standard ML with polymorphic, algebraic data types requiring unification based type inference.
  3. Code generation. In a simple compiler, the typed syntax tree may be directly converted to assembly language.
  4. Assembling, linking and loading. This phase converts the assembly code to machine code using absolute addresses and puts the machine code in its proper place in memory.
For several of these phases, there are automation tools that emit ML or C code for the corresponding parts of the compiler. The most well-known such tools are LEX and YACC which automate the development of code for the first phase, that is, lexical and syntax analysis. Therefore, the first phase becomes rather trivial once the grammar of the input language is known. Using LEX and YACC, this phase can be completed in one or two days for a small language. There are also tools to automate the code generation and to ease retargeting the compiler to another processor. One example of such a tool is ML-TWIG, which is a code-generator generator emitting ML code.

As an example of code for these four phases, the interested reader may look at the source code of the ADATE ML compiler, which is available at http://www-ia.hiof.no/~rolando/adate-ml-compiler .

This directory contains code for the four compiler phases as follows:

  1. Lexical and syntax analysis. Files: ML.lex, ML.grm.
  2. Typing. File: type.sml.
  3. Code generation. Files: compile.sml, super_combs.sml.
  4. Assembling, linking and loading. Files: execute.c, assemble_link_and_load.sml.

Together, these files contain 2633 lines including comments and occupy about 80 Kb of disc space, which isn't a lot for a native code compiler with quite short compilation times and that also emits reasonably efficient code.

Producing Efficient Code

Writing a highly optimizing compiler often requires many times as much effort as writing a bare bones compiler with few optimizations.

Generally, one needs intimate knowledge of both the language to be compiled and the target CPU. CPU specific optimizations are becoming more and more important with each new processor generation. In addition to being super-scalar, the newest top-of-the-line CPUs have branch prediction coupled with speculative execution, which means that they execute instructions following a conditional branch before knowing whether or not the branch will be taken.

In current CPUs, three, four or even five instructions can execute in the same clock cycle. This instruction level parallelism is likely to increase in a few years, for example when INTEL's IA-64 architecture appears in 1998 or 1999. Full utilization of the parallelism in the CPUs of the future will demand smarter compilers.

Here is a list of some optimizations in order of specificity with respect to a given processor:

  • Common subexpression elimination. The aim with this transformation is to compute the same subexpression only once.
  • Tail recursion optimization. This optimization can replace some kinds of function calls with jumps.
  • Beta-expansion. Inlining is another name for this optimization, which replaces a function call with an appropriately instantiated version of the definition of the function.
  • Code flow optimization. One goal with this optimization is to arrange conditional jumps so that the most frequent outcome is "jump not taken". The motivation is not to invalidate speculative prefetch and execution of instructions.
  • Alignment. This low-level optimization puts blocks of machine code, that follow for example jumps and function calls, on 32 byte boundaries, say. It is employed by practically all compilers, even the most stupid ones.

Since the goal of the ADATE-ML compiler primarily was fast compilation, it only employs the code flow and the alignment optimizations. Given this low degree of optimization, it is amazing that the ADATE-ML compiler in many cases emits more efficient code than huge optimizing compilers such as Standard ML of New Jersey and Harlequin ML Works.

I will shortly present two benchmarks comparing ADATE-ML and Standard ML of New Jersey, but before reading them, please remember that the waters of benchmarking are quite murky and misleading.

The first benchmark is a program that simplifies a polynomial represented as a list of coefficient - exponent pairs. The second benchmark computes a list of lists containing all permutations of a given list. These two benchmarks are available in http://www-ia.hiof.no/~rolando/adate-ml-compiler as bm_simplify.sml and bm_perms.sml. Both of these benchmarks are written in the ADATE subset of ML.

The execution times for simplifying a rather long polynomial and computing all the 120 permutations of a list with five elements are shown in the following table. The times are in seconds for running each program 10000 times on a 200 MHz Pentium Pro machine.

simplifypermutations
ADATE-ML 0.3 1.24 1.47
SML-NJ 109.111.58 2.79

Thus, it seems to be possible to obtain competitive execution times even without time-consuming high-level optimizations as long as the low-level code generator is reasonably good. This conclusion is supported by recent comp.lang.ml articles written by Xavier Leroy at INRIA, who also has developed an ML compiler without a full range of high level optimizations.

However, if long compilation times are acceptable, a code generator optimized for the Pentium Pro together with a full range of high-level optimizations may be able to improve the times above by a factor of two or more. One example of an amazingly effective technique, that has become quite popular during 1996, is profiling based optimization. With this technique, the program is first compiled and executed to gather statistics. The collected statistics is then used to guide optimizations during a second compilation.

Previous article Next article TOC: Nr. 1, 1997 Previous Issue Next Issue About HØit
HØit Nr. 1-97


Copyright: 1996, Høgskolen i Østfold. Last Update: 28.06.97, Thomas Malt.