C, Lambda Calculus and Compiler Verification

A study in Haskell of purely-functional techniques
for a formal specification of imperative programming languages
and an epistemically-sound verification of their compilers

Formal verification of a compiler is a long-standing problem in computer science and, although recent years have seen substantial achievements in the area, most of the proposed solutions do not scale very well with the complexity of modern software development environments. In this thesis, I present a formal semantic model of the popular C programming language described in the ANSI/ISO/IEC Standard 9899:1990, in the form of a mapping of C programs to computable functions expressed in a suitable variant of lambda calculus. The specification is formulated in a highly readable functional style and is accompanied by a complete Haskell implementation of the compiler, covering all aspects of the translation from a parse tree of a C program down to the actual sequence of executable machine instructions, resolving issues of separate compilation, allowing for optimising program transformations and providing rigorous guarantees of the implementation’s conformance to a formal defiition of the language.

In order to achieve these goals, I revisit the challenge of compiler verification from its very philosophical foundations. Beginning with the basic epistemic notions of knowledge, correctness and proof, I show that a fully rigorous solution of the problem requires a constructive formulation of the correctness criteria in terms of the translation process itself, in contrast with the more popular extensional approaches to compiler verification, in which correctness is generally defined as commutativity of the system with respect to a formal semantics of the source and target languages, effectively formalising various aspects of the compiler independently of each other and separately from its eventual implementation. I argue that a satisfactory judgement of correctness should always constitute a direct formal description of the job performed by the software being judged, instead of an axiomatic definition of some abstract property such as commutativity of the translation system, since the later approach fails to establish a crucial causal connection between a judgement of correctness and a knowledge of it.

The primary contribution of my PhD thesis is the new notion of linear correctness, which strives to provide a constructive formulation of a compiler’s validity criteria by deriving its judgement directly from a formal description of the language translation process itself. The approach relies crucially on a denotational semantic model of the source and target languages, in which the domains of program meanings are unified with the actual intermediate program representation of the underlying compiler implementation. By defining the concepts of a programming language, compiler and compiler correctness in category theoretic terms, I show that every linearly correct compiler is also valid in the more traditional sense of the word. Further, by presenting a complete verified translation of the standard C programming language within this framework, I demonstrate that linear correctness is a highly effective approach to the problem of compiler verification and that it scales particularly well with the complexity of modern software development environments.

You can obtain the final version of my thesis from:

http://www.jantar.org/papers/zadarnowski11phd.pdf

If you would like to cite my work in your publication, please use the following BibTeX entry:

@phdthesis{zadarnowski11phd,
    author = {Zadarnowski, Patryk},
    title = {C, Lambda Calculus and Compiler Verification:
             A study in Haskell of purely-functional techniques
             for a formal specification of imperative programming
             languages and an epistemically-sound verification
             of their compilers},
    school = {University of New South Wales},
    month = {January},
    year = {2011},
 }

Last but not least, all Haskell source code for the actual C compiler described within the thesis can be downloaded from the following address:

http://www.jantar.org/phd/dissertation.tar.gz

The tarball contains Haskell source code for a complete back-end of a standards-conforming C compiler which translates a parse tree of a C translation-unit into an abstract representation of an MMIX object file described in Chapters 5 and 6 of my thesis, respectively. Since, at the moment, the compiler does not include a front-end (i.e., a lexical analyser and parser) you will need to construct the parse tree of your source program manually. For a good example of doing so, please check out the two samples included in the files C/Example.lhs and MMIX/Example.lhs.

In order to use the toolkit, you will need a Haskell 98 interpreter (preferably GHCi) that supports the recursive do notation and relaxed polymorphic recursion extensions, as enabled through the RecursiveDo and RelaxedPolyRec LANGUAGE pragmas. The source code has been confirmed to work with GHCi version 6.10.4.

Alternatively, you can check out my FunCC project, in which I will continue development of the compiler described in my thesis, eventually including a complete preprocessor, parser, support for multiple realistic targets such as X86 and ARM, as well as a bagfull of useful optimisations beyond the simple code transformations described in Lambda/Simplifier.lhs.

Copyright

You are free and encouraged to use the above source for any purpose you seem fit, including development of your own linearly correct compilers, subject only to the following slightly-modified BSD licence:

Copyright © 2003-2011, Patryk Zadarnowski. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.