Formal verification of a compiler is a long-standing problem in computer science and, although recent years have seen substantial achievements in the area, most of the proposed solutions do not scale well with the complexity of modern software development environments. In this thesis, I present a formal semantic model of the popular C programming language described in the ANSI/ISO/IEC Standard 9899:1990, in the form of a mapping of C programs to computable functions expressed in a suitable variant of lambda calculus. The specification is formulated in a highly readable functional style and is accompanied by a complete Haskell implementation of the compiler, covering all aspects of the translation from a parse tree of a C program down to the actual sequence of executable machine instructions. The implementation resolves issues of separate compilation, allows for optimising program transformations and provides rigorous guarantees of conformance to a formal definition of the language.
In order to achieve these goals, I revisit the challenge of compiler verification from its philosophical foundations. Beginning with the basic epistemic notions of knowledge, correctness and proof, I show that a fully rigorous solution to the problem requires a constructive formulation of the correctness criteria in terms of the translation process itself. This stands in contrast with the more popular extensional approaches to compiler verification, in which correctness is generally defined as commutativity of the system with respect to a formal semantics of the source and target languages, effectively formalising various aspects of the compiler independently of one another and separately from its eventual implementation. I argue that a satisfactory judgement of correctness should always constitute a direct formal description of the job performed by the software being judged, rather than an axiomatic definition of some abstract property such as commutativity of the translation system, since the latter approach fails to establish a crucial causal connection between a judgement of correctness and knowledge of it.
The primary contribution of this thesis is the new notion of linear correctness, which strives to provide a constructive formulation of a compiler’s validity criteria by deriving its judgement directly from a formal description of the language translation process itself. The approach relies crucially on a denotational semantic model of the source and target languages, in which the domains of program meanings are unified with the actual intermediate program representation of the underlying compiler implementation. By defining the concepts of a programming language, compiler and compiler correctness in category-theoretic terms, I show that every linearly correct compiler is also valid in the more traditional sense of the word. Further, by presenting a complete verified translation of the standard C programming language within this framework, I demonstrate that linear correctness is a highly effective approach to the problem of compiler verification and that it scales particularly well with the complexity of modern software development environments.
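To give a flavour of the kind of mapping the abstract describes, the following minimal Haskell sketch denotes a toy fragment of C-like expressions as store-transforming functions, i.e. as lambda terms of the metalanguage. The expression type, the store representation and every name below are illustrative assumptions of this sketch, not the intermediate representation actually used in the thesis.

import qualified Data.Map as Map

-- A toy fragment of C expressions: literals, variables, assignment
-- and addition.  (Illustrative only; not the thesis's AST.)
data Expr
  = Lit Int
  | Var String
  | Assign String Expr
  | Add Expr Expr

-- The store maps variable names to integer values.
type Store = Map.Map String Int

-- Each expression denotes a state transformer: a function from the
-- current store to a pair of result value and updated store.  Read as
-- a term of the metalanguage, this is exactly a lambda term.
denote :: Expr -> Store -> (Int, Store)
denote (Lit n)      s = (n, s)
denote (Var x)      s = (Map.findWithDefault 0 x s, s)
denote (Assign x e) s = let (v, s') = denote e s
                        in  (v, Map.insert x v s')
denote (Add a b)    s = let (v1, s1) = denote a s
                            (v2, s2) = denote b s1
                        in  (v1 + v2, s2)

-- Example: the C expression "x = 1 + 2", evaluated in the empty store.
example :: (Int, Store)
example = denote (Assign "x" (Add (Lit 1) (Lit 2))) Map.empty

For instance, example yields the value 3 together with a store in which x is bound to 3.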
@phdthesis{zadarnowski00thesis,
  author = {Zadarnowski, Patryk},
  title  = {C, Lambda Calculus and Compiler Verification: A study in Haskell of purely-functional techniques for a formal specification of imperative programming languages and an epistemically-sound verification of their compilers},
  school = {University of New South Wales},
  month  = {January},
  year   = {2011},
}
To date, measurement-based WCET analysis and static analysis have largely been seen as being at odds with each other. We argue that they should be considered complementary, and that the combination of the two represents a promising approach. In this paper we discuss in some detail how we aim to improve on our probabilistic measurement-based technique by adding static cache analysis. Specifically, we plan to make use of recent advances in the functional-languages research community. The objective of this paper is not to present finished or almost-finished work; rather, we hope to trigger discussion and solicit feedback from the community, in order to avoid pitfalls experienced by others and to help focus our research.
@inproceedings{petters07potoroo,
  author    = {Petters, Stefan M. and Zadarnowski, Patryk and Heiser, Gernot},
  title     = {Measurements or Static Analysis or Both?},
  booktitle = {Proceedings of the 7th International Workshop on Worst-Case Execution Time (WCET) Analysis},
  editor    = {Rochange, Christine},
  year      = {2007},
  month     = {July},
  pages     = {5--11},
  location  = {Pisa, Italy},
  publisher = {IRIT, France},
}
The recent advent of load/store architectures, combined with the growing disproportion between register and memory access times, has exerted increasing pressure on compiler designers to expend substantial effort on the register allocation phase of compilation, promoting it from its original role as an optional optimizing pass to an intrinsic part of the compilation process and an important factor in determining the global structure of a compiler. Modern compilers show a strong tendency to divide cleanly into a portable symbolic phase encompassing the bulk of the transformations, and an unportable register-allocated phase comprising architecture-specific transformations such as instruction scheduling and cache-related optimizations, connected to the portable phase through register allocation. In this report, we review the past and present register allocation methodologies employed by compiler designers, with an emphasis on their influence on the design of intermediate program representations.
@misc{zadarnowski03ra,
  author       = {Zadarnowski, Patryk},
  title        = {Intermediate Program Representations for Register Allocation},
  howpublished = {Unpublished manuscript, University of New South Wales},
  month        = {December},
  year         = {2003},
}
The static single assignment (SSA) form is central to a range of optimisation algorithms relying on data flow information and, hence, to the correctness of compilers employing those algorithms. It is well known that the SSA form is closely related to lambda terms (i.e., functional programs) and, considering the large amount of energy expended on theories and frameworks for formal reasoning in the lambda calculus, it seems only natural to leverage this connection to improve our ability to reason about compiler optimisations. In this paper, we discuss a new formalisation of the mapping from SSA programs to a restricted form of lambda terms called administrative normal form (ANF). We conjecture that this connection improves our ability to reason about SSA-based optimisation algorithms, and provide a first data point by presenting an ANF variant of a well-known SSA-based conditional constant propagation algorithm.
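To make the SSA/ANF correspondence concrete, here is a toy example of our own devising (not taken from the paper): a counting loop in SSA form, followed by its rendering as an ANF-style Haskell term in which each basic block becomes a local function and each phi node becomes a parameter of that function.

-- SSA form of a counting loop (pseudocode):
--
--   entry:  x0 := 0
--           goto loop(x0)
--   loop:   x1 := phi(x0, x2)
--           if x1 >= 10 then goto exit(x1)
--           x2 := x1 + 1
--           goto loop(x2)
--   exit:   return x1
--
-- The same program as an ANF-style term: blocks are local functions,
-- phi nodes are their parameters, and every intermediate value is
-- let-bound before use.
countToTen :: Int
countToTen = entry
  where
    entry   = loop 0                  -- x0 := 0; goto loop(x0)
    loop x1 =                         -- x1 is the phi node of "loop"
      let t = x1 >= 10
      in  if t
            then exit x1
            else let x2 = x1 + 1
                 in  loop x2          -- goto loop(x2)
    exit x1 = x1                      -- return x1

The key observation is that a jump such as goto loop(x2) becomes the tail call loop x2, so standard reasoning techniques for lambda terms apply directly to the control-flow graph.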
@inproceedings{chakravarty03perspective,
  author    = {Chakravarty, Manuel M. T. and Keller, Gabriele and Zadarnowski, Patryk},
  title     = {A Functional Perspective on SSA Optimisation Algorithms},
  booktitle = {COCV '03: Compiler Optimization Meets Compiler Verification},
  series    = {Electronic Notes in Theoretical Computer Science},
  editor    = {Knoop, Jens and Zimmermann, Wolf},
  year      = {2003},
  month     = {April},
  location  = {Warsaw, Poland},
  volume    = {82},
  issue     = {2},
  pages     = {347--361},
  publisher = {Elsevier Science},
}
This thesis describes the design of a system-level instruction set simulator, a software tool that emulates enough software-visible aspects of a computer system to allow direct execution of an unmodified operating system in a controlled, transparent environment. Instruction set architecture (ISA) simulation is an established technique, used for debugging, profiling and validation of computer systems since the introduction of the EDSAC instruction set in 1949. To date, all such simulators have been designed in an ad hoc fashion, targeted at a particular combination of target architecture, operating system and simulation technique. This approach leaves little room for code reuse, and is unsatisfactory because instruction set simulators represent a substantial code investment, yet are typically employed in research or development environments as short-lived, project-specific tools. There is therefore a clear need for a robust framework for sharing and reusing simulator components. Sulima is the result of my attempt to design such a framework, in which system-level simulators are constructed from opaque, autonomous modules, each representing a distinct component of a complete hardware system. For example, one can design modules simulating a particular processor, peripheral device, memory configuration or floating-point unit at the desired level of accuracy and performance. In this thesis, I also discuss issues associated with the design of a simulator based on dynamic translation of instruction sets, along with a novel technique that uses the Sulima framework to efficiently distribute a simulation of a multiprocessor system across multiple simulator hosts. Finally, I describe the implementation of a fairly sophisticated simulator for a complete MIPS64-based system.
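As a purely hypothetical illustration of the framework idea, here is how the bus-facing interface of an opaque simulator module might be sketched in Haskell; this is not Sulima's actual interface or implementation language, and all names below are invented for the sketch.

{-# LANGUAGE ExistentialQuantification #-}

import Data.Word (Word64, Word8)

-- One simulated hardware component: opaque internal state plus a
-- narrow, bus-facing interface.  (Hypothetical; not Sulima's API.)
data Module = forall s. Module
  s                              -- current internal state (opaque)
  (s -> Word64 -> (Word8, s))    -- load:  read one byte at an address
  (s -> Word64 -> Word8 -> s)    -- store: write one byte at an address
  (s -> s)                       -- tick:  advance one clock cycle

-- A trivial RAM module backed by an association list.
ram :: Module
ram = Module []
  (\s a   -> (maybe 0 id (lookup a s), s))
  (\s a v -> (a, v) : filter ((/= a) . fst) s)
  id

-- Advancing the whole system one cycle ticks every module in turn.
step :: [Module] -> [Module]
step = map (\(Module s ld st tk) -> Module (tk s) ld st tk)

The point of the existential quantification is that a simulation loop can drive any mixture of processor, device and memory modules through the same narrow interface without ever seeing their internal state, which is the essence of the opaque, autonomous modules described above.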
@misc{zadarnowski00sulima,
  author       = {Zadarnowski, Patryk},
  title        = {The Design and Implementation of an Extendible Instruction Set Simulator},
  howpublished = {Undergraduate thesis, University of New South Wales},
  month        = {November},
  year         = {2000},
}