How Difficult is it to Write a Bare Bones Compiler?Roland Olsson
IntroductionA full-blown modern compiler package typically weighs in at 50 Mb or more. When being confronted with such a complex product, many computer science students feel that the task of writing a compiler is insurmountable. If you are also awed by the complexity of a compiler, I hope that this article will convince you that the core of a compiler is easy to write and that most of the typical 50 Mb or so is ''bells-and-whistles''. If all fat, for example the GUI and the libraries, is shedded, the remaining bare bones of a compiler is on the order of a few hundred Kb.As an example of a bare bones compiler, I will use the ADATE-ML compiler that I wrote during the summer of 1996. This compiler converts a small, purely functional part of the language Standard ML directly to machine code for INTEL x86 processors, but is particularly optimized for the Pentium Pro. The ADATE-ML compiler is written in a mix of C and ML using Standard ML of New Jersey development tools, for example ML-LEX and ML-YACC.
Why Write Your Own Compiler?In general, two important reasons to brew a compiler in your own office is that you have a special, perhaps self-designed, language for which there are no compilers or that you have special target hardware for which no compilers exist.My reasons are as follows. The ADATE-ML compiler is embedded in the ADATE program synthesis system, which needs to compile, link, load and execute several hundred programs per second. There are at least four reasons that no generally known ML compiler, such as Standard ML of New Jersey or Harlequin ML Works, can be used instead of my own:
The Main Parts of a CompilerThe source code of a program is typically converted to machine code in several stages. For example, the compiler phases that gradually transform the input program may be as follows.
As an example of code for these four phases, the interested reader may look at the source code of the ADATE ML compiler, which is available at http://www-ia.hiof.no/~rolando/adate-ml-compiler . This directory contains code for the four compiler phases as follows:
Together, these files contain 2633 lines including comments and occupy about 80 Kb of disc space, which isn't a lot for a native code compiler with quite short compilation times and that also emits reasonably efficient code.
Producing Efficient CodeWriting a highly optimizing compiler often requires many times as much effort as writing a bare bones compiler with few optimizations.Generally, one needs intimate knowledge of both the language to be compiled and the target CPU. CPU specific optimizations are becoming more and more important with each new processor generation. In addition to being super-scalar, the newest top-of-the-line CPUs have branch prediction coupled with speculative execution, which means that they execute instructions following a conditional branch before knowing whether or not the branch will be taken. In current CPUs, three, four or even five instructions can execute in the same clock cycle. This instruction level parallelism is likely to increase in a few years, for example when INTEL's IA-64 architecture appears in 1998 or 1999. Full utilization of the parallelism in the CPUs of the future will demand smarter compilers. Here is a list of some optimizations in order of specificity with respect to a given processor:
Since the goal of the ADATE-ML compiler primarily was fast compilation, it only employs the code flow and the alignment optimizations. Given this low degree of optimization, it is amazing that the ADATE-ML compiler in many cases emits more efficient code than huge optimizing compilers such as Standard ML of New Jersey and Harlequin ML Works. I will shortly present two benchmarks comparing ADATE-ML and Standard ML of New Jersey, but before reading them, please remember that the waters of benchmarking are quite murky and misleading. The first benchmark is a program that simplifies a polynomial represented as a list of coefficient - exponent pairs. The second benchmark computes a list of lists containing all permutations of a given list. These two benchmarks are available in http://www-ia.hiof.no/~rolando/adate-ml-compiler as bm_simplify.sml and bm_perms.sml. Both of these benchmarks are written in the ADATE subset of ML. The execution times for simplifying a rather long polynomial and computing all the 120 permutations of a list with five elements are shown in the following table. The times are in seconds for running each program 10000 times on a 200 MHz Pentium Pro machine.
Thus, it seems to be possible to obtain competitive execution times even without time-consuming high-level optimizations as long as the low-level code generator is reasonably good. This conclusion is supported by recent comp.lang.ml articles written by Xavier Leroy at INRIA, who also has developed an ML compiler without a full range of high level optimizations. However, if long compilation times are acceptable, a code generator optimized for the Pentium Pro together with a full range of high-level optimizations may be able to improve the times above by a factor of two or more. One example of an amazingly effective technique, that has become quite popular during 1996, is profiling based optimization. With this technique, the program is first compiled and executed to gather statistics. The collected statistics is then used to guide optimizations during a second compilation.
Copyright: 1996, Høgskolen i Østfold. Last Update: 28.06.97, Thomas Malt. |