# **Peter Müller (Ed.)**

# **Programming Languages and Systems**

**29th European Symposium on Programming, ESOP 2020 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020 Dublin, Ireland, April 25–30, 2020, Proceedings**

# Lecture Notes in Computer Science 12075

Founding Editors

Gerhard Goos, Germany
Juris Hartmanis, USA

# Editorial Board Members

Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Gerhard Woeginger, Germany
Moti Yung, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407

Peter Müller (Ed.)

# Programming Languages and Systems

29th European Symposium on Programming, ESOP 2020 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020 Dublin, Ireland, April 25–30, 2020 Proceedings

Editor
Peter Müller
ETH Zurich
Zurich, Switzerland

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-44913-1 ISBN 978-3-030-44914-8 (eBook)
https://doi.org/10.1007/978-3-030-44914-8

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

# ETAPS Foreword

Welcome to the 23rd ETAPS! This was the first time that ETAPS took place in Ireland, in its beautiful capital, Dublin.

ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming language development, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized conference program enables researchers to participate in an exciting event, with the opportunity to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe. Also, for the second time, an ETAPS Mentoring Workshop was organized. This workshop is intended to help students early in their careers with advice on research, career, and life in the fields of computing covered by the ETAPS conferences.

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem Visser (Stellenbosch University). Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine Learning and Formal Methods. On behalf of the ETAPS 2020 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of Limerick and Lero. ETAPS 2020 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Tiziana Margaria (general chair, UL and Lero), Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque (Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago).

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida (London).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoyed ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020

Marieke Huisman
ETAPS SC Chair
ETAPS e.V. President

# Preface

Welcome to the European Symposium on Programming (ESOP 2020)! The 29th edition of this conference series was initially planned to be held April 27–30, 2020, in Dublin, Ireland, but was then moved to fall 2020 due to the COVID-19 outbreak. ESOP is one of the European Joint Conferences on Theory and Practice of Software (ETAPS). It is devoted to fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

This volume contains 27 papers, which the Program Committee (PC) selected among 87 submissions. Each submission received between three and six reviews. After an author response period, the papers were discussed electronically among the PC members and external reviewers. The one paper for which the PC chair had a conflict of interest was kindly handled by Sasa Misailovic.

Submissions authored by a PC member were held to slightly higher standards: they received at least four reviews, had an external reviewer, and were accepted only if they were not involved in comparisons of relative merit with other submissions. We accepted two out of four PC submissions.

The final program includes a keynote by Işıl Dillig on "Formal Methods for Evolving Database Applications."

Any conference depends first and foremost on the quality of its submissions. I would like to thank all the authors who submitted their work to ESOP 2020! I am truly impressed by the members of the PC. They produced insightful and constructive reviews, contributed very actively to the online discussions, and were extremely helpful. It was an honor to work with all of you! I am also grateful to the external reviewers, who provided their expert opinions and helped tremendously to reach well-informed decisions. I would like to thank everybody who contributed to the organization of ESOP 2020, especially the ESOP 2020 Steering Committee and its chair Peter Thiemann as well as the ETAPS 2020 Steering Committee and its chair Marieke Huisman, who provided help and guidance on numerous occasions. Finally, I'd like to thank Linard Arquint and Vasileios Koutavas for their help with the proceedings.

February 2020

Peter Müller

# Organization

# Program Committee

Elvira Albert, Universidad Complutense de Madrid, Spain
Sophia Drossopoulou, Imperial College London, UK
Jean-Christophe Filliâtre, LRI, CNRS, France
Arie Gurfinkel, University of Waterloo, Canada
Jan Hoffmann, Carnegie Mellon University, USA
Ranjit Jhala, University of California at San Diego, USA
Woosuk Lee, Hanyang University, South Korea
Rustan Leino, Amazon Web Services, USA
Rupak Majumdar, MPI-SWS, Germany
Roland Meyer, Technische Universität Braunschweig, Germany
Antoine Miné, LIP6, Sorbonne Université, France
Sasa Misailovic, University of Illinois at Urbana-Champaign, USA
Peter Müller, ETH Zurich, Switzerland
Toby Murray, University of Melbourne, Australia
David Naumann, Stevens Institute of Technology, USA
Zvonimir Rakamaric, University of Utah, USA
Francesco Ranzato, University of Padova, Italy
Sukyoung Ryu, KAIST, South Korea
Ilya Sergey, Yale-NUS College and National University of Singapore, Singapore
Alexandra Silva, University College London, UK
Nikhil Swamy, Microsoft Research, USA
Sam Tobin-Hochstadt, Indiana University Bloomington, USA
Caterina Urban, Inria Paris, France
Viktor Vafeiadis, MPI-SWS, Germany

# Additional Reviewers

Amtoft, Torben; Arenas, Puri; Balabonski, Thibaut; Bernardy, Jean-Philippe; Bierman, Gavin; Blanchet, Bruno; Bonchi, Filippo; Bonelli, Eduardo; Botbol, Vincent; Bourke, Timothy; Brady, Edwin; Brunet, Paul; Caires, Luís; Charguéraud, Arthur; Chini, Peter; Chudnov, Andrey; Correas Fernández, Jesús; Costea, Andreea; Cousot, Patrick; Crole, Roy; Cusumano-Towner, Marco; Dagand, Pierre-Evariste; Dahlqvist, Fredrik; Dang, Hai; Danielsson, Nils Anders; Das, Ankush; Enea, Constantin; Finkbeiner, Bernd; Fromherz, Aymeric; Fuhs, Carsten; Genaim, Samir; Genitrini, Antoine; Ghica, Dan; Gordillo, Pablo; Gordon, Colin S.; Haas, Thomas; Hage, Jurriaan; He, Shaobo; Heljanko, Keijo; Jourdan, Jacques-Henri; Kahn, David; Kang, Jeehoon; Kuderski, Jakub; Lahav, Ori; Laurent, Olivier; Lee, Dongkwon; Lee, Wonyeol; Lesani, Mohsen; Levy, Paul Blain; Lindley, Sam; Martin-Martin, Enrique; Mohan, Anshuman; Mordido, Andreia; Morris, J. Garrett; Muller, Stefan; Ngo, Minh; Oh, Hakjoo; Ouadjaout, Abdelraouf; Ouederni, Meriem; Palamidessi, Catuscia; Pearlmutter, Barak; Peters, Kirstin; Pham, Long; Poli, Federico; Polikarpova, Nadia; Pottier, François; Rival, Xavier; Román-Díez, Guillermo; Sammartino, Matteo; Sasse, Ralf; Scalas, Alceste; Scherer, Gabriel; Sieczkowski, Filip; Sivaramakrishnan, Kc; Staton, Sam; Stutsman, Ryan; Tan, Yong Kiam; van den Brand, Mark; Vákár, Matthijs; Wang, Di; Wang, Meng; Wehrheim, Heike; Weng, Shu-Chun; Wies, Thomas; Wijesekera, Duminda; Wolff, Sebastian; Zufferey, Damien

# Formal Methods for Evolving Database Applications (Abstract of Keynote Talk)

#### Işıl Dillig

University of Texas at Austin, USA isil@cs.utexas.edu

Many database applications undergo significant schema changes during their life cycle due to performance or maintainability reasons. Examples of such schema changes include denormalization, splitting a single table into multiple tables, and consolidating multiple tables into a single table. Even though such schema refactorings are quite common in practice, programmers need to spend significant time and effort to re-implement parts of the code base that are affected by the schema change. Furthermore, it is not uncommon to introduce bugs during this code transformation process.

In this talk, I will present our recent work on using formal methods to simplify the schema refactoring process for evolving database applications. Specifically, I will first propose a definition of equivalence between database applications that operate over different schemas. Building on this definition, I will then present a fully automated technique for proving equivalence between a pair of applications. Our verification technique is capable of automatically synthesizing bisimulation invariants between two database applications and uses the inferred bisimulation invariant to automatically prove equivalence.

In the next part of the talk, I will explain how to leverage this verification technique to completely automate the code migration process. Specifically, given an original database application P over schema S and a new schema S′, I will discuss a practical program synthesis technique that can be used to generate a new program P′ over schema S′ such that P and P′ are provably equivalent. In particular, I will first present a method for generating a program sketch of the new version; then, I will describe a novel synthesis algorithm that efficiently explores the space of all programs that are in the search space of the generated sketch.

Finally, I will describe experimental results on a suite of schema refactoring benchmarks, including real-world database applications written in Ruby-on-Rails. I will also outline remaining challenges in this area and motivate future research directions relevant to research in programming languages and formal methods.

# Contents





# Trace-Relating Compiler Correctness and Secure Compilation

Carmine Abate<sup>1</sup>, Roberto Blanco<sup>1</sup>, Ştefan Ciobâcă<sup>2</sup>, Adrien Durier<sup>1</sup>, Deepak Garg<sup>3</sup>, Cătălin Hriţcu<sup>1</sup>, Marco Patrignani<sup>4,5</sup>, Éric Tanter<sup>6,1</sup>, and Jérémy Thibault<sup>1</sup>

<sup>1</sup>Inria Paris, France <sup>2</sup>UAIC Iaşi, Romania <sup>3</sup>MPI-SWS, Saarbrücken, Germany <sup>4</sup>Stanford University, Stanford, USA <sup>5</sup>CISPA, Saarbrücken, Germany <sup>6</sup>University of Chile, Santiago, Chile

Abstract. Compiler correctness is, in its simplest form, defined as the inclusion of the set of traces of the compiled program into the set of traces of the original program, which is equivalent to the preservation of all trace properties. Here traces collect, for instance, the externally observable events of each execution. This definition requires, however, the set of traces of the source and target languages to be exactly the same, which is not the case when the languages are far apart or when observations are fine-grained. To overcome this issue, we study a generalized compiler correctness definition, which uses source and target traces drawn from potentially different sets and connected by an arbitrary relation. We set out to understand what guarantees this generalized compiler correctness definition gives us when instantiated with a non-trivial relation on traces. When this trace relation is not equality, it is no longer possible to preserve the trace properties of the source program unchanged. Instead, we provide a generic characterization of the target trace property ensured by correctly compiling a program that satisfies a given source property, and dually, of the source trace property one is required to show in order to obtain a certain target property for the compiled code. We show that this view on compiler correctness can naturally account for undefined behavior, resource exhaustion, different source and target values, sidechannels, and various abstraction mismatches. Finally, we show that the same generalization also applies to many secure compilation definitions, which characterize the protection of a compiled program against linked adversarial code.

# 1 Introduction

Compiler correctness is an old idea [37, 40, 41] that has seen a significant revival in recent times. This new wave was started by the creation of the CompCert verified C compiler [33] and continued by the proposal of many significant extensions and variants of CompCert [8, 9, 12, 23, 29, 30, 42, 52, 56, 57, 61] and the success of many other milestone compiler verification projects, including Vellvm [64], Pilsner [45], CakeML [58], CertiCoq [4], etc. Yet, even for these verified compilers, the precise statement of correctness matters. Since proof assistants are used to conduct the verification, an external observer does not have to understand the proofs in order to trust them, but one still has to deeply understand the statement that was proved. And this is true not just for correct compilation, but also for secure compilation, which is the more recent idea that our compilation chains should do more to also ensure security of our programs [3, 26].

Basic Compiler Correctness. The gold standard for compiler correctness is *semantic preservation*, which intuitively says that the semantics of a compiled program (in the target language) is compatible with the semantics of the original program (in the source language). For practical verified compilers, such as CompCert [33] and CakeML [58], semantic preservation is stated extrinsically, by referring to *traces*. In these two settings, a trace is an ordered sequence of events—such as inputs from and outputs to an external environment—that are produced by the execution of a program.

A basic definition of compiler correctness can be given by the set inclusion of the traces of the compiled program into the traces of the original program. Formally [33]:

#### Definition 1.1 (Basic Compiler Correctness (CC)). *A compiler* ↓ *is* correct *iff* ∀W. ∀t. W↓ ⇝ t ⇒ W ⇝ t, *where* W ⇝ t *means that program* W *can produce trace* t.

This definition says that for any whole<sup>1</sup> source program W, if we compile it (denoted W↓), execute it with respect to the semantics of the target language, and observe a trace t, then the original W can produce *the same* trace t with respect to the semantics of the source language.<sup>2</sup> This definition is simple and easy to understand, since it only references a few familiar concepts: a compiler between a source and a target language, each equipped with a trace-producing semantics (usually nondeterministic).
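To make the quantifier structure concrete, here is a minimal executable sketch (in Python, not the paper's formalism or its Coq development; all names are illustrative) that models a nondeterministic semantics as a finite set of traces and checks CC as set inclusion:

```python
# Toy model of Definition 1.1, assuming finite trace sets. A "semantics"
# here is just the set of traces a program can produce (traces are tuples
# of events), and a compiler is correct iff every trace of the compiled
# program is also a trace of the original program.

def is_correct(source_traces: set, target_traces: set) -> bool:
    """CC: for all t, if the compiled program produces t, so does the source."""
    return target_traces <= source_traces

# A compiler that merely resolves nondeterminism (fewer traces) is correct ...
assert is_correct({("in", "out1"), ("in", "out2")}, {("in", "out1")})
# ... but one that introduces a new observable trace is not.
assert not is_correct({("in", "out1")}, {("in", "out1"), ("in", "out2")})
```

Note that the empty target semantics is vacuously correct under this definition, which is one reason practical statements refine it further.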

Beyond Basic Compiler Correctness. This basic compiler correctness definition assumes that any trace produced by a compiled program can be produced by the source program. This is a very strict requirement, and in particular implies that the source and target traces are drawn from the same set and that the same source trace corresponds to a given target trace. These assumptions are often too strong, and hence in practice verified compiler efforts use different formulations of compiler correctness:


<sup>1</sup> For simplicity, for now we ignore separate compilation and linking, returning to it in §5.

<sup>2</sup> Typesetting convention [47]: we use a blue, sans-serif font for source elements, an **orange**, **bold** font for **target** ones, and a *black*, *italic* font for elements common to both languages.

<sup>3</sup> Stated at the top of the CompCert file driver/Complements.v and discussed by Regehr [53].

Trace-Relating Compiler Correctness. Generalized formalizations of compiler correctness like the ones above can be naturally expressed as instances of a uniform definition, which we call *trace-relating compiler correctness*. This generalizes basic compiler correctness by (a) considering that source and target traces belong to *possibly distinct* sets Trace<sup>S</sup> and **Trace<sup>T</sup>**, and (b) being parameterized by an arbitrary *trace relation* ∼.

Definition 1.2 (Trace-Relating Compiler Correctness (CC<sup>∼</sup>)). *A compiler* ↓ *is* correct *with respect to a trace relation* ∼ ⊆ Trace<sup>S</sup> × **Trace<sup>T</sup>** *iff*

∀W. ∀**t**. W↓ ⇝ **t** ⇒ ∃s ∼ **t**. W ⇝ s.

This definition requires that, for any target trace **t** produced by the compiled program W↓, there exists a source trace s that can be produced by the original program W and is *related* to **t** according to ∼ (i.e., s ∼ **t**). By choosing the trace relation appropriately, one can recover the different notions of compiler correctness presented above:

**Basic CC** Take s ∼ **t** to be s = **t**. Trivially, the basic CC of Definition 1.1 is CC<sup>=</sup>.

**CompCert** Undefined behavior is modeled in CompCert as a trace-terminating event Goes\_wrong that can occur in any of its languages (source, target, and all intermediate languages), so for a given phase (or composition thereof), we have Trace<sup>S</sup> = **Trace<sup>T</sup>**. Nevertheless, the relation between source and target traces with which to instantiate CC<sup>∼</sup> to obtain CompCert's current theorem is:

s ∼ **t** ≡ s = **t** ∨ (∃m ≤ **t**. s = m·Goes\_wrong).

A compiler satisfying CC<sup>∼</sup> for this trace relation can turn a source trace ending in undefined behavior <sup>m</sup>·Goes\_wrong (where "·" is concatenation) either into the same trace in the target (first disjunct), or into a target trace that starts with the prefix m but then continues *arbitrarily* (second disjunct, "≤" is the prefix relation).

**CakeML** Here, target traces are sequences of symbols from an alphabet **Σ<sup>T</sup>** that has a specific trace-terminating event, **Resource**\_**limit**\_**hit**, which is not available in the source alphabet Σ<sup>S</sup> (i.e., **Σ<sup>T</sup>** = Σ<sup>S</sup> ∪ {**Resource**\_**limit**\_**hit**}). Then, the compiler correctness theorem of CakeML can be obtained by instantiating CC<sup>∼</sup> with the following ∼ relation:

s ∼ **t** ≡ s = **t** ∨ (∃m ≤ s. **t** = m·**Resource**\_**limit**\_**hit**).

The resulting CC<sup>∼</sup> instance relates a target trace ending in **Resource**\_**limit**\_**hit** after executing m to a source trace that first produces m and then continues in a way given by the semantics of the source program.
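The two relations above can be sketched on finite traces; this toy model (Python, with illustrative event names, not the paper's development) checks the CC<sup>∼</sup> condition by searching for a related source trace for every target trace:

```python
# Traces are tuples of events. GOES_WRONG and RESOURCE_LIMIT_HIT are
# illustrative stand-ins for the trace-terminating events discussed above.

GOES_WRONG = "Goes_wrong"
RESOURCE_LIMIT_HIT = "Resource_limit_hit"

def is_prefix(m, t):
    return t[:len(m)] == m

def compcert_rel(s, t):
    # s ~ t iff s = t, or s ends in undefined behavior after a prefix of t
    return s == t or (s[-1:] == (GOES_WRONG,) and is_prefix(s[:-1], t))

def cakeml_rel(s, t):
    # s ~ t iff s = t, or t stops with resource exhaustion after a prefix of s
    return s == t or (t[-1:] == (RESOURCE_LIMIT_HIT,) and is_prefix(t[:-1], s))

def cc_sim(rel, source_traces, target_traces):
    # CC~: every target trace has some related source trace
    return all(any(rel(s, t) for s in source_traces) for t in target_traces)

# A source trace ending in undefined behavior may compile to anything after "a":
assert cc_sim(compcert_rel, {("a", GOES_WRONG)}, {("a", "b", "c")})
# A target may stop early with resource exhaustion:
assert cc_sim(cakeml_rel, {("a", "b", "c")}, {("a", RESOURCE_LIMIT_HIT)})
```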

Beyond undefined behavior and resource exhaustion, there are many other practical uses for CC∼: in this paper we show that it also accounts for differences between source and target values, for a single source output being turned into a series of target outputs, and for side-channels.

On the flip side, the compiler correctness statement and its implications can be more difficult to understand for CC<sup>∼</sup> than for CC<sup>=</sup>. The full implications of choosing a particular ∼ relation can be subtle. In fact, using a bad relation can make the compiler correctness statement trivial or unexpected. For instance, it should be easy to see that if one uses the total relation, which relates all source traces to all target ones, the CC<sup>∼</sup> property holds for every compiler, yet it might take one a bit more effort to understand that the same is true even for the following relation:

> s ∼ **t** ≡ ∃W. W ⇝ s ∧ W↓ ⇝ **t**.

Reasoning About Trace Properties. To understand more about a particular CC<sup>∼</sup> instance, we propose to also look at how it preserves *trace properties*—defined as sets of allowed traces [31]—from the source to the target. For instance, it is well known that CC<sup>=</sup> is equivalent to the preservation of all trace properties (where W |= π reads "W satisfies π" and stands for ∀t. W ⇝ t ⇒ t ∈ π):

CC<sup>=</sup> ≡ ∀π ∈ 2<sup>Trace</sup>. ∀W. W |= π ⇒ W↓ **|=** π.
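A finite sanity check of this equivalence (an illustrative Python sketch, not part of the paper's development), assuming traces drawn from a small universe: trace inclusion holds exactly when every property satisfied by the source is also satisfied by the compiled program.

```python
from itertools import chain, combinations

def powerset(xs):
    # all subsets of xs, i.e., all trace properties over the universe xs
    xs = list(xs)
    return [set(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def satisfies(traces, prop):
    # W |= pi iff every trace of W is in pi
    return traces <= prop

def preserves_all_props(src, tgt, universe):
    # every property satisfied by the source is satisfied by the target
    return all(satisfies(tgt, p) for p in powerset(universe) if satisfies(src, p))

universe = {"t1", "t2", "t3"}
src, tgt = {"t1", "t2"}, {"t1"}
assert (tgt <= src) == preserves_all_props(src, tgt, universe)  # both hold
src, tgt = {"t1"}, {"t1", "t3"}
assert (tgt <= src) == preserves_all_props(src, tgt, universe)  # both fail
```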

However, to the best of our knowledge, similar results have not been formulated for trace relations beyond equality, when it is no longer possible to preserve the trace properties of the source program unchanged. For trace-relating compiler correctness, where source and target traces can be drawn from different sets and related by an arbitrary trace relation, there are two crucial questions to ask:

1. If the source program W satisfies a source property π<sup>S</sup>, which target property is the compiled program W↓ guaranteed to satisfy?
2. If we want the compiled program W↓ to satisfy a target property **π<sup>T</sup>**, which source property must the original program W satisfy?
Far from being mere hypothetical questions, they can help the developer of a verified compiler to better understand the compiler correctness theorem they are proving, and we expect that any user of such a compiler will need to ask either one or the other if they are to make use of that theorem. In this work we provide a simple and natural answer to these questions, for any instance of CC∼. Building upon a bijection between relations and Galois connections [5, 20, 43], we observe that any trace relation ∼ corresponds to two *property mappings* τ˜ and σ˜, which are functions mapping source properties to target ones (τ˜ standing for "to target") and target properties to source ones (σ˜ standing for "to source"):

τ˜(π<sup>S</sup>) = {**t** | ∃s. s ∼ **t** ∧ s ∈ π<sup>S</sup>};  σ˜(**π<sup>T</sup>**) = {s | ∀**t**. s ∼ **t** ⇒ **t** ∈ **π<sup>T</sup>**}.

The *existential image* of ∼, τ˜, answers the first question above by mapping a given source property π<sup>S</sup> to the target property that contains all target traces for which *there exists a related source trace* that satisfies π<sup>S</sup>. Dually, the *universal image* of ∼, σ˜, answers the second question by mapping a given target property **π<sup>T</sup>** to the source property that contains all source traces for which *all related target traces* satisfy **π<sup>T</sup>**. We introduce two new correct compilation definitions in terms of *trace property preservation* (TP): TP<sup>τ˜</sup> quantifies over all source trace properties and uses τ˜ to obtain the corresponding target properties. TP<sup>σ˜</sup> quantifies over all target trace properties and uses σ˜ to obtain the corresponding source properties. We prove that these two definitions are equivalent to CC<sup>∼</sup>, yielding a novel trinitarian view of compiler correctness (Figure 1).
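On finite sets, the two property mappings can be computed directly from the relation; the following sketch (Python, toy data, illustrative names) mirrors the displayed definitions:

```python
def tau(rel, pi_s):
    # existential image: target traces with SOME related source trace in pi_s
    return {t for (s, t) in rel if s in pi_s}

def sigma(rel, pi_t, source_traces):
    # universal image: source traces whose EVERY related target trace is in pi_t
    return {s for s in source_traces
            if all(t in pi_t for (s2, t) in rel if s2 == s)}

# One source trace "s1" may be related to two target observations:
rel = {("s1", "t1a"), ("s1", "t1b"), ("s2", "t2")}
assert tau(rel, {"s1"}) == {"t1a", "t1b"}
assert sigma(rel, {"t1a", "t1b"}, {"s1", "s2"}) == {"s1"}
```

Here tau answers "what does compilation guarantee?" for the property {"s1"}, while sigma answers "what must the source satisfy?" for the target property {"t1a", "t1b"}.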

Fig. 1: The equivalent compiler correctness definitions forming our trinitarian view.

#### Contributions.


The paper closes with discussions of related (§6) and future work (§7). An online appendix contains omitted technical details: https://arxiv.org/abs/1907.05320.

The traces considered in our examples are structured, usually as sequences of events. We notice however that unless explicitly mentioned, all our definitions and results are more general and make no assumption whatsoever about the structure of traces. Most of the theorems formally or informally mentioned in the paper were mechanized in the Coq proof assistant and are marked with . This development has around 10k lines of code, is described in the online appendix, and is available at the following address: https://github.com/secure-compilation/different\_traces.

# 2 Trace-Relating Compiler Correctness

In this section, we start by generalizing the trace property preservation definitions at the end of the introduction to TP<sup>σ</sup> and TP<sup>τ</sup> , which depend on two *arbitrary* mappings σ and τ (§2.1). We prove that, whenever σ and τ form a Galois connection, TP<sup>σ</sup> and TP<sup>τ</sup> are equivalent (Theorem 2.4). We then exploit a bijective correspondence between trace relations and Galois connections to close the trinitarian view (§2.2), with two main benefits: first, it helps us assess the meaningfulness of a given trace relation by looking at the property mappings it induces; second, it allows us to construct new compiler correctness definitions starting from a desired mapping of properties. Finally, we generalize the classic result that compiler correctness (i.e., CC=) is enough to preserve not just trace properties but also all subset-closed hyperproperties [14]. For this, we show that CC<sup>∼</sup> is also equivalent to subset-closed hyperproperty preservation, for which we also define both a version in terms of σ˜ and a version in terms of τ˜ (§2.3).

# 2.1 Property Mappings

As explained in §1, trace-relating compiler correctness CC<sup>∼</sup>, by itself, lacks a crisp description of which trace properties are preserved by compilation. Since even the syntax of traces can differ between source and target, one can either look at trace properties of the source (but then one needs to interpret them in the target), or at trace properties of the target (but then one needs to interpret them in the source). Formally, we need two property mappings, τ : 2<sup>TraceS</sup> → 2<sup>TraceT</sup> and σ : 2<sup>TraceT</sup> → 2<sup>TraceS</sup>, which lead us to the following generalization of trace property preservation (TP).

Definition 2.1 (TP<sup>σ</sup> and TP<sup>τ</sup>). *Given two property mappings* τ : 2<sup>TraceS</sup> → 2<sup>TraceT</sup> *and* σ : 2<sup>TraceT</sup> → 2<sup>TraceS</sup>*, for a compilation chain* ·↓ *we define:*

TP<sup>τ</sup> ≡ ∀π<sup>S</sup>. ∀W. W |= π<sup>S</sup> ⇒ W↓ **|=** τ(π<sup>S</sup>);  TP<sup>σ</sup> ≡ ∀**π<sup>T</sup>**. ∀W. W |= σ(**π<sup>T</sup>**) ⇒ W↓ **|=** **π<sup>T</sup>**.

For an arbitrary source program W, τ interprets a source property π<sup>S</sup> as the *target guarantee* for W↓. Dually, σ defines a *source obligation* sufficient for the satisfaction of a target property **π<sup>T</sup>** after compilation. Ideally:

τ(σ(**π<sup>T</sup>**)) ⊆ **π<sup>T</sup>**, i.e., compiling any program that meets the source obligation for **π<sup>T</sup>** should actually guarantee **π<sup>T</sup>**; and σ(τ(π<sup>S</sup>)) ⊇ π<sup>S</sup>, i.e., the source obligation for the target guarantee τ(π<sup>S</sup>) should require no more than π<sup>S</sup> itself.
These requirements are satisfied when the two maps form a *Galois connection* between the posets of source and target properties ordered by inclusion. We briefly recall the definition and the characteristic property of Galois connections [16, 38].

Definition 2.2 (Galois connection). *Let* (X, ≤) *and* (Y, ≤) *be two posets. A pair of maps* α : X → Y*,* γ : Y → X *is a Galois connection* iff *it satisfies the* adjunction law*:* ∀x ∈ X. ∀y ∈ Y. α(x) ≤ y ⇐⇒ x ≤ γ(y)*.* α *(resp.* γ*) is the lower (upper) adjoint or abstraction (concretization) function and* Y *(*X*) the abstract (concrete) domain.*

We will often write α : (X, ≤) ⇆ (Y, ⊑) : γ to denote a Galois connection, or simply α : X ⇆ Y : γ, or even α ⇆ γ when the posets involved are clear from context.

Lemma 2.3 (Characteristic property of Galois connections). *If* α : (X, ≤) ⇆ (Y, ⊑) : γ *is a Galois connection, then* α *and* γ *are monotone and they satisfy the following properties:*

i) ∀x ∈ X. x ≤ γ(α(x));  ii) ∀y ∈ Y. α(γ(y)) ⊑ y. *If* X, Y *are complete lattices, then* α *is continuous, i.e.,* ∀F ⊆ X. α(⊔F) = ⊔α(F)*.*
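Definition 2.2 and Lemma 2.3 can be sanity-checked exhaustively on a small concrete connection. Here we use the textbook pair α(n) = ⌈n/2⌉ and γ(m) = 2m on (ℕ, ≤), which is our own illustrative example, not one from the paper:

```python
import math

alpha = lambda n: math.ceil(n / 2)   # lower adjoint (abstraction)
gamma = lambda m: 2 * m              # upper adjoint (concretization)

N = range(60)
# Adjunction law (Definition 2.2): alpha(x) <= y  <=>  x <= gamma(y)
assert all((alpha(x) <= y) == (x <= gamma(y)) for x in N for y in N)
# Lemma 2.3 i):  x <= gamma(alpha(x))
assert all(x <= gamma(alpha(x)) for x in N)
# Lemma 2.3 ii): alpha(gamma(y)) <= y
assert all(alpha(gamma(y)) <= y for y in N)
# Monotonicity of both maps
assert all(alpha(x) <= alpha(x + 1) for x in N)
assert all(gamma(y) <= gamma(y + 1) for y in N)
```

The same exhaustive style is reused below with ⊆ as the order on trace properties.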

If two property mappings, τ and σ, form a Galois connection on trace properties ordered by set inclusion, Lemma 2.3 (with α = τ and γ = σ) tells us that they satisfy the ideal conditions we discussed above, i.e., τ(σ(πT)) ⊆ πT and σ(τ(πS)) ⊇ πS.⁴

The two ideal conditions on τ and σ are sufficient to show the equivalence of the criteria they define, respectively TP<sup>τ</sup> and TP<sup>σ</sup>.

Theorem 2.4 (TP^τ and TP^σ coincide). *Let* τ : 2^TraceS ⇆ 2^TraceT : σ *be a Galois connection, with* τ *and* σ *the lower and upper adjoints (resp.). Then* TP^τ ⇐⇒ TP^σ*.*

#### 2.2 Trace Relations and Property Mappings

We now investigate the relation between CC∼, TP^τ, and TP^σ. We show that for a trace relation and its corresponding Galois connection (Lemma 2.7), the three criteria are equivalent (Theorem 2.8). This equivalence offers interesting insights for both the verification and the design of a correct compiler. For a CC∼ compiler, the equivalence makes explicit both the guarantees one has after compilation (τ˜) and the source proof obligations that ensure the satisfaction of a given target property (σ˜). Conversely, a compiler designer might first determine the target guarantees the compiler itself must provide, i.e., τ, and then prove an equivalent statement, CC∼, for which more convenient proof techniques exist in the literature [7, 58].

Definition 2.5 (Existential and Universal Image [20]). *Given two sets* X *and* Y *and a relation* ∼ ⊆ X × Y*, define its existential (or direct) image* τ˜ : 2^X → 2^Y *and its universal image* σ˜ : 2^Y → 2^X *as follows:* τ˜ = λπ ∈ 2^X. {y | ∃x. x ∼ y ∧ x ∈ π};  σ˜ = λπ ∈ 2^Y. {x | ∀y. x ∼ y ⇒ y ∈ π}.
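The two images read off directly as set comprehensions. The sketch below mirrors Definition 2.5 (it is not the paper's Coq encoding) and shows how strict the universal image is when the relation is not a function:

```python
def exist_image(rel, prop):
    # tilde-tau(pi) = {y | exists x. x ~ y and x in pi}
    return frozenset(y for (x, y) in rel if x in prop)

def univ_image(rel, prop, X):
    # tilde-sigma(pi) = {x | forall y. x ~ y implies y in pi}
    return frozenset(x for x in X
                     if all(y in prop for (x2, y) in rel if x2 == x))

# x = 1 relates to two targets; the universal image demands *all* of them.
rel = {(0, "zero"), (1, "one"), (1, "uno")}
assert exist_image(rel, {1}) == frozenset({"one", "uno"})
assert univ_image(rel, {"one"}, {0, 1}) == frozenset()           # "uno" missing
assert univ_image(rel, {"one", "uno"}, {0, 1}) == frozenset({1})
```

An element enters the universal image only when *every* related target is covered, which is exactly what makes σ˜ a source obligation rather than a mere translation.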

When trace relations are considered, the existential and universal images can be used to instantiate Definition 2.1 leading to the trinitarian view already mentioned in §1.

Theorem 2.6 (Trinitarian View). *For any trace relation* ∼ *and its existential and universal images* τ˜ *and* σ˜*, we have:* TP^τ˜ ⇐⇒ CC∼ ⇐⇒ TP^σ˜*.*

This result relies both on Theorem 2.4 and on the fact that the existential and universal images of a trace relation form a Galois connection. Below we further generalize this result (Theorem 2.8), relying on a bijective correspondence between trace relations and Galois connections on properties.

Lemma 2.7 (Trace relations ≅ Galois connections on trace properties). *The function* ∼ ↦ (τ˜, σ˜) *that maps a trace relation to its existential and universal images is a bijection between trace relations* 2^(TraceS × TraceT) *and Galois connections on trace properties* 2^TraceS ⇆ 2^TraceT*. Its inverse is* τ ⇆ σ ↦ ∼ˆ*, where* s ∼ˆ t ≡ t ∈ τ({s})*.*

⁴ While target traces are often *"more concrete"* than source ones, trace properties 2^Trace (which in Coq we represent as the function type Trace→Prop) are contravariant in Trace, and thus target properties correspond to the *abstract domain*.

*Proof.* Gardiner et al. [20] show that the existential image is a functor from the category of sets and relations to the category of predicate transformers, mapping a set X to 2^X and a relation ∼ ⊆ X × Y to τ˜ : 2^X → 2^Y. They also show that this functor is an isomorphism (hence bijective) when one considers only monotonic predicate transformers that have a (unique) upper adjoint. The universal image σ˜ of ∼ is the unique upper adjoint of τ˜, hence ∼ ↦ (τ˜, σ˜) is itself bijective.
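Both directions of Lemma 2.7 can be checked exhaustively on a small finite relation: the two images satisfy the adjunction law of Definition 2.2 (with ⊆ as the order), and the relation is recovered from its existential image on singletons. A brute-force sketch:

```python
from itertools import combinations

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def exist_image(rel, prop):
    return frozenset(y for (x, y) in rel if x in prop)

def univ_image(rel, prop, X):
    return frozenset(x for x in X
                     if all(y in prop for (x2, y) in rel if x2 == x))

X, Y = {0, 1, 2}, {"a", "b"}
rel = frozenset({(0, "a"), (1, "a"), (1, "b")})

# Adjunction law: tilde-tau(pi_S) ⊆ pi_T  <=>  pi_S ⊆ tilde-sigma(pi_T)
for pi_s in powerset(X):
    for pi_t in powerset(Y):
        assert (exist_image(rel, pi_s) <= pi_t) == (pi_s <= univ_image(rel, pi_t, X))

# Inverse direction of the bijection:  s ~ t  iff  t in tilde-tau({s})
recovered = frozenset((x, y) for x in X for y in Y
                      if y in exist_image(rel, {x}))
assert recovered == rel
```

The adjunction holds for *any* relation here, which is the finite shadow of the general categorical argument in the proof.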

The bijection just introduced allows us to generalize Theorem 2.6 and switch between the three views of compiler correctness described earlier at will.

Theorem 2.8 (Correspondence of Criteria). *For any trace relation* ∼ *and corresponding Galois connection* τ ⇆ σ*, we have:* TP^τ ⇐⇒ CC∼ ⇐⇒ TP^σ*.*

*Proof.* For a trace relation ∼ and the Galois connection τ˜ ⇆ σ˜, the result follows from Theorem 2.6. For a Galois connection τ ⇆ σ and ∼ˆ, use Lemma 2.7 to conclude that the existential and universal images of ∼ˆ coincide with τ and σ, respectively; the goal then follows from Theorem 2.6.

We conclude by explicitly noting that sometimes the lifted properties may be trivial: the target guarantee can be the true property (the set of all traces), or the source obligation the false property (the empty set of traces). This might be the case when source observations abstract away too much information (§3.2 presents an example).

#### 2.3 Preservation of Subset-Closed Hyperproperties

A CC<sup>=</sup> compiler ensures the preservation not only of trace properties, but also of all subset-closed hyperproperties, which are known to be preserved by refinement [14]. An example of a subset-closed hyperproperty is *noninterference* [14]; a CC<sup>=</sup> compiler thus guarantees that if W is noninterfering with respect to the inputs and outputs in the trace then so is W↓. To be able to talk about how (hyper)properties such as noninterference are preserved, in this section we propose another trinitarian view involving CC<sup>∼</sup> and preservation of subset-closed hyperproperties (Theorem 2.11), slightly weakened in that source and target property mappings will need to be closed under subsets.

First, recall that a program satisfies a hyperproperty when its complete set of traces, which from now on we will call its *behavior*, is a member of the hyperproperty [14].

Definition 2.9 (Hyperproperty Satisfaction). *A program* W *satisfies a hyperproperty* H*, written* W |= H*, iff* beh(W) ∈ H*, where* beh(W) = {t | W ⇝ t} *is the set of traces* W *can produce.*

Hyperproperty preservation is a strong requirement in general. Fortunately, many interesting hyperproperties are *subset-closed* (SCH for short), which simplifies their preservation since it suffices to show that the behaviors of the compiled program refine the behaviors of the source one, which coincides with the statement of CC<sup>=</sup>.

To talk about hyperproperty preservation in the trace-relating setting, we need an interpretation of source hyperproperties into the target and vice versa. The one we consider builds on top of the two trace property mappings τ and σ, which are naturally lifted to hyperproperty mappings. This way we are able to extract two hyperproperty mappings from a trace relation similarly to §2.2:

Definition 2.10 (Lifting property mappings to hyperproperty mappings). *Let* τ : 2^TraceS → 2^TraceT *and* σ : 2^TraceT → 2^TraceS *be arbitrary property mappings. The images of* HS ∈ 2^(2^TraceS) *and* HT ∈ 2^(2^TraceT) *under* τ *and* σ *are, respectively:* τ(HS) = {τ(πS) | πS ∈ HS};  σ(HT) = {σ(πT) | πT ∈ HT}.

Formally we are defining two new mappings, this time on hyperproperties, but by a small abuse of notation we still denote them by τ and σ.

Interestingly, it is not possible to apply the argument used for CC= to show that a CC∼ compiler guarantees W↓ |= τ˜(HS) whenever W |= HS. This is in fact not true, because direct images do not necessarily preserve subset-closure [36, 44]. To fix this we close the images of τ˜ and σ˜ under subsets (written Cl⊆) and obtain:

Theorem 2.11 (Preservation of Subset-Closed Hyperproperties). *For any trace relation* ∼ *and its existential and universal images lifted to hyperproperties,* τ˜ *and* σ˜*, and for* Cl⊆(H) = {π | ∃π′ ∈ H. π ⊆ π′}*, we have:* SCHP^(Cl⊆∘τ˜) ⇐⇒ CC∼ ⇐⇒ SCHP^(Cl⊆∘σ˜)*, where*

SCHP^(Cl⊆∘τ˜) ≡ ∀W. ∀HS ∈ SCH_S. W |= HS ⇒ W↓ |= Cl⊆(τ˜(HS));

SCHP^(Cl⊆∘σ˜) ≡ ∀W. ∀HT ∈ SCH_T. W |= Cl⊆(σ˜(HT)) ⇒ W↓ |= HT.

Theorem 2.11 makes us aware of the potential loss of precision when we are interested in preserving subset-closed hyperproperties through compilation. In §4 we focus on a security-relevant subset-closed hyperproperty, noninterference, and show that such a loss of precision can be understood as a declassification of noninterference.
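The failure of direct images to preserve subset-closure, which is the reason Cl⊆ appears in Theorem 2.11, already shows up in a two-element example. This finite sketch uses our own names, not the paper's Coq definitions:

```python
from itertools import combinations

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def exist_image(rel, prop):
    return frozenset(y for (x, y) in rel if x in prop)

def cl_subset(H):
    # Cl_subset(H) = {pi | exists pi' in H. pi subset-of pi'}
    return frozenset(s for pi in H for s in powerset(pi))

def subset_closed(H):
    return cl_subset(H) == frozenset(H)

rel = frozenset({(0, "a"), (0, "b")})              # one source trace, two targets
H_S = frozenset({frozenset(), frozenset({0})})     # a subset-closed hyperproperty
assert subset_closed(H_S)

lifted = frozenset(exist_image(rel, pi) for pi in H_S)   # lifting (Definition 2.10)
assert not subset_closed(lifted)        # {"a"} is missing: subset-closure is lost
assert subset_closed(cl_subset(lifted))
```

The lifted image is {∅, {a, b}}: it skips the intermediate property {a}, so closing under subsets is exactly the repair the theorem applies.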

# 3 Instances of Trace-Relating Compiler Correctness

The trace-relating view of compiler correctness above can serve as a unifying framework for studying a range of interesting compilers. This section provides several representative instantiations of the framework: source languages with undefined behavior that compilation can turn into arbitrary target behavior (§3.1), target languages with resource exhaustion that cannot happen in the source (§3.2), changes in the representation of values (§3.3), and differences in the granularity of data and observable events (§3.4).

#### 3.1 Undefined Behavior

We start by expanding upon the discussion of undefined behavior in §1. We first study the model of CompCert, where source and target alphabets are the same, including the event for undefined behavior. The trace relation weakens equality by allowing undefined behavior to be replaced with an arbitrary sequence of events.

*Example 3.1 (CompCert-like Undefined Behavior Relation).* Source and target traces are sequences of events drawn from Σ, where Goes_wrong ∈ Σ is a terminal event that represents an undefined behavior. We then use the trace relation from the introduction:

s ∼ t ≡ s = t ∨ ∃m ≤ t. s = m · Goes_wrong.

Each trace of a target program produced by a CC<sup>∼</sup> compiler is either also a trace of the original source program or it has a finite prefix that the source program also produces, immediately before encountering undefined behavior. As explained in §1, one of the correctness theorems in CompCert can be rephrased as this variant of CC∼.

We proved that the property mappings induced by the relation can be written as:

σ˜(πT) = {s | s ∈ πT ∧ ∀m. s ≠ m·Goes_wrong} ∪ {m·Goes_wrong | ∀t. m ≤ t ⇒ t ∈ πT};
τ˜(πS) = {t | t ∈ πS} ∪ {t | ∃m ≤ t. m·Goes_wrong ∈ πS}.

These two mappings explain what a CC∼ compiler ensures for the ∼ relation above. The target-to-source mapping σ˜ states that, to prove that a compiled program has a property πT using source-level reasoning, one has to prove that any trace produced by the source program either is a target trace satisfying πT or ends in undefined behavior, where in the latter case *any continuation* of the trace substituted for the undefined behavior must satisfy πT. The source-to-target mapping τ˜ states that by compiling a program satisfying a property πS we obtain a program whose traces either satisfy the same property or extend a source trace that ends in undefined behavior.

These definitions can help us reason about programs. For instance, σ˜ specifies that, to prove that an event does not happen in the target, it is not enough to prove that it does not happen in the source: it is also necessary to prove that the source program does not have any undefined behavior (second disjunct). Indeed, if it had undefined behavior, its continuations could exhibit the unwanted event.
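On a finite universe of short traces, the induced mappings can be computed by brute force directly from the definition of the relation and checked against the closed forms above. The event names and trace lengths below are illustrative choices of ours:

```python
from itertools import product

EV, UB = ("out!", "write!"), "Goes_wrong"
plain = [p for n in range(3) for p in product(EV, repeat=n)]   # traces, len <= 2
universe = plain + [m + (UB,) for m in plain]                  # UB is terminal

def prefixes(t):
    return [t[:i] for i in range(len(t) + 1)]

def related(s, t):
    # s ~ t  iff  s = t, or s is a prefix of t capped by Goes_wrong
    return s == t or any(s == m + (UB,) for m in prefixes(t))

def univ_image(prop):   # sigma-tilde, computed from first principles
    return frozenset(s for s in universe
                     if all(t in prop for t in universe if related(s, t)))

def exist_image(prop):  # tau-tilde
    return frozenset(t for t in universe if any(related(s, t) for s in prop))

# To rule out "write!" in the target, the source must also be UB-free:
no_write = frozenset(t for t in plain if "write!" not in t)
assert univ_image(no_write) == no_write       # every UB-ending trace drops out

# A source trace ending in UB licenses arbitrary target continuations:
img = exist_image(frozenset({("out!", UB)}))
assert ("out!", "write!") in img and ("write!",) not in img
```

The first assertion is the "second disjunct" at work: traces ending in Goes_wrong are excluded from the source obligation because some continuation performs the forbidden event.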

This relation can be easily generalized to other settings. For instance, consider the setting in which we compile down to a low-level language like machine code. Target traces can now contain new events that cannot occur in the source: indeed, in modern architectures like x86, a compiler typically uses only a fraction of the available instruction set. Some instructions might even perform dangerous operations, such as writing to the hard drive. Formally, the source and target do not have the same events any more. Thus, we consider a source alphabet ΣS = Σ ∪ {Goes_wrong} and a target alphabet ΣT = Σ ∪ Σ′. The trace relation is defined in the same way, and we obtain the same property mappings as above, except that target traces now contain more events (some of which may be dangerous), so the arbitrary continuations of target traces become more interesting. For instance, consider a new event that represents writing data to the hard drive, and suppose we want to prove that this event cannot happen for a compiled program. Proving this property requires exactly proving that the source program exhibits no undefined behavior [11]. More generally, what one can prove about target-only events is either that they cannot appear (because there is no undefined behavior) or that any of them can appear (in the case of undefined behavior).

In §5.2 we study a similar example, showing that even in a safe language, linked adversarial contexts can cause dangerous target events that have no source correspondent.

#### 3.2 Resource Exhaustion

Let us return to the discussion about resource exhaustion in §1.

*Example 3.2 (Resource Exhaustion).* We consider traces made of events drawn from ΣS in the source, and from ΣT = ΣS ∪ {Resource_Limit_Hit} in the target. Recall the trace relation for resource exhaustion:

s ∼ t ≡ s = t ∨ ∃m ≤ s. t = m · Resource_Limit_Hit.

Formally, this relation is similar to the one for undefined behavior, except this time it is the target trace that is allowed to end early instead of the source trace.

The induced trace property mappings σ˜ and τ˜ are the following:

σ˜(πT) = {s | s ∈ πT} ∩ {s | ∀m ≤ s. m·Resource_Limit_Hit ∈ πT};
τ˜(πS) = πS ∪ {m·Resource_Limit_Hit | ∃s ∈ πS. m ≤ s}.

These capture the following intuitions. The target-to-source mapping σ˜ states that, to prove a property of the compiled program, one has to show that the traces of the source program satisfy two conditions: (1) they must also satisfy the target property; and (2) the target property must allow every one of their prefixes to be terminated by a resource-exhaustion error. This is rather restrictive: no property that prevents resource exhaustion can be proved using source-level reasoning. Indeed, if πT does not allow resource exhaustion, then σ˜(πT) = ∅. This is to be expected, since resource exhaustion is simply not accounted for at the source level. The other mapping τ˜ states that a compiled program produces traces that either belong to the same properties as the traces of the source program or end early due to resource exhaustion.
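The emptiness of the source obligation is easy to observe in a brute-force finite model (event names and trace lengths are our illustrative choices): any target property that excludes Resource_Limit_Hit yields σ˜(πT) = ∅, because every source trace is related to the immediate-exhaustion trace.

```python
from itertools import product

EV, RLH = ("out!",), "Resource_Limit_Hit"
source = [p for n in range(3) for p in product(EV, repeat=n)]   # len <= 2
target = source + [m + (RLH,) for m in source]

def prefixes(t):
    return [t[:i] for i in range(len(t) + 1)]

def related(s, t):
    # s ~ t  iff  s = t, or t stops early with RLH at some prefix of s
    return s == t or any(t == m + (RLH,) for m in prefixes(s))

def univ_image(prop):   # sigma-tilde, computed from first principles
    return frozenset(s for s in source
                     if all(t in prop for t in target if related(s, t)))

no_exhaustion = frozenset(t for t in target if RLH not in t)
assert univ_image(no_exhaustion) == frozenset()    # nothing provable in the source

everything = frozenset(target)
assert univ_image(everything) == frozenset(source)
```

Even the empty source trace relates to (Resource_Limit_Hit,), so condition (2) above fails for every source trace as soon as the target property forbids exhaustion.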

In this example, safety properties [31] are mapped (in both directions) to other safety properties. This can be desirable for a relation: since safety properties are usually easier to reason about, one interested only in safety properties at the target can reason about them using source-level reasoning tools for safety properties.

The compiler correctness theorem in CakeML is an instance of CC∼ for the ∼ relation above. We have also implemented two small compilers that are correct for this relation; the full details can be found in the Coq development in the supplementary materials. The first compiler goes from a simple expression language (similar to the one in §3.3 but without inputs) to the same language, except that execution is bounded by some amount of fuel: each execution step consumes some fuel, and execution immediately halts when the fuel runs out. The compiler is the identity.

The second compiler is more interesting: we proved this CC∼ instance for a variant of a compiler from a WHILE language to a simple stack machine by Xavier Leroy [35]. We enriched the two languages with outputs and modified the semantics of the stack machine so that it falls into an error state if the stack reaches a certain size. The proof uses a standard forward simulation, modified to account for failure.

We conclude this subsection by noting that the resource exhaustion relation and the undefined behavior relation from the previous subsection can easily be combined. Indeed, given a relation ∼UB and a relation ∼RE defined as above on the same sets of traces, we can build a new relation ∼ that allows both refinement of undefined behavior and resource exhaustion by taking their union: s ∼ t ≡ s ∼UB t ∨ s ∼RE t. A compiler that is CC∼UB or CC∼RE is trivially CC∼, though the converse is not true.

#### 3.3 Different Source and Target Values

We now illustrate trace-relating compilation for a translation mapping source-level booleans to target-level natural numbers. Given the simplicity of this compiler, most of the details of the formalization are deferred to the online appendix.

The source language is a pure, statically typed expression language whose expressions e include naturals n, booleans b, conditionals, arithmetic and relational operations, boolean inputs in_b, and natural inputs in_n. A trace s consists of a list of inputs is paired with a result r, which can be a natural, a boolean, or an error. Well-typed programs never produce an error. Types ty are either N (naturals) or B (booleans); typing is standard. The source language has a standard big-step operational semantics (e ⇝ ⟨is, r⟩) describing how an expression e generates a trace ⟨is, r⟩. The target language is analogous, except that it is untyped, only has naturals n, and its only inputs are naturals in_n. The semantics of the target language is also given in big-step style. Since we only have naturals and all expressions operate on them, no error result is possible in the target.

The compiler is homomorphic, translating a source expression to the same target expression; the only differences are natural numbers (and conditionals), as noted below.

true↓ = 1    false↓ = 0    in_b↓ = in_n    in_n↓ = in_n
(e1 ≤ e2)↓ = if e1↓ ≤ e2↓ then 1 else 0
(if e1 then e2 else e3)↓ = if e1↓ ≤ 0 then e3↓ else e2↓

When compiling an *if-then-else*, the target condition e1↓ ≤ 0 is used to check that e1 is false, and therefore the *then* and *else* branches of the source are swapped in the target.
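The translation can be replayed in a few lines. The AST encoding, the evaluator names, and the `ifle` target conditional are modeling choices of ours, not the paper's formalization; inputs are omitted, so correctness degenerates to relating the single result of each side.

```python
def eval_s(e):
    # Source: naturals, booleans, e1 <= e2, if-then-else (inputs omitted).
    tag = e[0]
    if tag in ("nat", "bool"):
        return e[1]
    if tag == "le":
        return eval_s(e[1]) <= eval_s(e[2])
    if tag == "if":
        return eval_s(e[2]) if eval_s(e[1]) else eval_s(e[3])

def eval_t(e):
    # Target: only naturals; ("ifle", c1, c2, a, b) runs a if c1 <= c2, else b.
    tag = e[0]
    if tag == "nat":
        return e[1]
    if tag == "ifle":
        return eval_t(e[3]) if eval_t(e[1]) <= eval_t(e[2]) else eval_t(e[4])

def compile_expr(e):
    tag = e[0]
    if tag == "nat":
        return e
    if tag == "bool":
        return ("nat", 1 if e[1] else 0)      # true maps to 1, false to 0
    if tag == "le":                           # (e1 <= e2) becomes if .. then 1 else 0
        return ("ifle", compile_expr(e[1]), compile_expr(e[2]),
                ("nat", 1), ("nat", 0))
    if tag == "if":                           # branches swapped: test e1-compiled <= 0
        return ("ifle", compile_expr(e[1]), ("nat", 0),
                compile_expr(e[3]), compile_expr(e[2]))

def related_result(r_s, r_t):
    # Value relation: n ~ n;  true ~ any n > 0;  false ~ 0
    if isinstance(r_s, bool):
        return r_t > 0 if r_s else r_t == 0
    return r_s == r_t

examples = [
    ("nat", 42),
    ("le", ("nat", 3), ("nat", 5)),
    ("if", ("le", ("nat", 5), ("nat", 3)), ("nat", 1), ("nat", 2)),
]
assert all(related_result(eval_s(e), eval_t(compile_expr(e))) for e in examples)
```

Note the branch swap in the `if` case: the target tests "condition ≤ 0", i.e., "condition is false", so the else-branch comes first.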

Relating Traces. We relate basic values (naturals and booleans) in a non-injective fashion: a natural is related to itself, false is related to 0, and true is related to every non-zero natural. We then extend the relation to lists of inputs pointwise (Rules Empty and Cons) and lift it to traces by additionally relating results (Rules Nat and Bool).
Property mappings. The property mappings σ˜ and τ˜ induced by the trace relation ∼ defined above capture the intuition behind encoding booleans as naturals:

– the source-to-target mapping allows true to be encoded by any non-zero number;

– the target-to-source mapping requires that **0** be replaceable by *both* 0 and false.

Compiler correctness. With the relation above, the compiler is proven to satisfy CC∼.

Theorem 3.3 (·↓ is correct). ·↓ *is* CC∼*.*

Simulations with different traces. The difficulty in proving Theorem 3.3 arises from the trace-relating compilation setting. For compilation chains that have the same source and target traces, it is customary to prove compiler correctness using a forward simulation (i.e., a simulation between the source and target transition systems); then, using determinacy [18, 39] of the target language and input totality [19, 63] (aka receptiveness) of the source, this forward simulation is flipped into a backward simulation (a simulation between the target and source transition systems), as described by Beringer et al. [7] and Leroy [34]. This flipping is useful because forward simulations are often much easier to prove (by induction on the transitions of the source) than backward ones, as is the case here.

We first give the main idea of the flipping proof when the inputs are the same in the source and the target [7, 34]. We consider only inputs, as they are the most interesting case: with determinacy, nondeterminism can only occur on inputs. Given a forward simulation R and a target program WT that simulates a source program WS, WT is able to perform an input iff WS is: otherwise, if for instance WS performed an output, by forward simulation WT would also perform an output, which is impossible because of determinacy. By input totality of the source, WS must be able to perform the exact same input as WT; using forward simulation and determinacy, the resulting programs must again be related.

However, our trace relation is not injective (both 0 and false are mapped to **0**), therefore these arguments do not apply: not all possible inputs of target programs are accounted for in the forward simulation. We thus have to strengthen the forward simulation assumption, requiring the following additional property to hold, for any source program <sup>W</sup><sup>S</sup> and target program **<sup>W</sup><sup>T</sup>** related by the forward simulation <sup>R</sup>.

$$\forall W_S\, W_T\, i_S\, i_T.\ W_S \mathrel{R} W_T \wedge W_S \xrightarrow{i_S} W_S' \wedge W_T \xrightarrow{i_T} W_T' \wedge i_S \sim i_T \Longrightarrow W_S' \mathrel{R} W_T'$$

We say that a forward simulation for which this property holds is *flippable*. For our example compiler, a flippable forward simulation works as follows: whenever a boolean input occurs in the source, the target program must perform every strictly positive input n (and not just 1, as suggested by the compiler). Using this property, determinacy of the target, input totality of the source, as well as the fact that any target input has an inverse image through the relation, we can show that the forward simulation can be turned into a backward one: starting from WS R WT and an input iT2, we show that there are iS1 and iT1 as in the diagram above, using the same arguments as when the inputs are the same; because the simulation is flippable, we can close the diagram and obtain the existence of an adequate iS2. From this we obtain CC∼.

In fact, we have proven a completely general 'flipping theorem' with this flippable hypothesis on the forward simulation. We have also shown that if the relation ∼ defines a bijection between the inputs of the source and the target, then any forward simulation is flippable, hence recovering the usual proof technique [7, 34] as a special case. This flipping theorem is further discussed in the online appendix.

#### 3.4 Abstraction Mismatches

We now consider how to relate traces where a single source action is compiled to multiple target ones. To illustrate this, we take a pure, statically typed source language that can output (nested) pairs of arbitrary size, and a pure, *untyped* target language where sent values have a fixed size. Concretely, the source is analogous to the language of §3.3, except that it does not have inputs or booleans, and it has an expression send e, which can emit a (nested) pair e of values in a single action. That is, given that e reduces to a pair, e.g., ⟨⟨v1, v2⟩, v3⟩, the expression send ⟨⟨v1, v2⟩, v3⟩ emits the action ⟨⟨v1, v2⟩, v3⟩. That expression is compiled into a sequence of individual sends in the target language, send v1; send v2; send v3, since in the target, send e sends the value that e reduces to, but the language has no pairs.

Due to space constraints we omit the full formalization of these simple languages and of the homomorphic compiler ((·)↓ : e → e). The only interesting bit is the compilation of the send · expression, which relies on the gensend(·) function below. That function takes a source expression of a given type and returns a sequence of target send · instructions that send each element of the expression.

$$\mathtt{gensend}(\vdash \mathtt{e} : \tau) = \begin{cases} \mathtt{send}\ \mathtt{e}{\downarrow} & \text{if } \tau = \mathtt{N} \\ \mathtt{gensend}(\vdash \mathtt{e}.1 : \tau');\ \mathtt{gensend}(\vdash \mathtt{e}.2 : \tau'') & \text{if } \tau = \tau' \times \tau'' \end{cases}$$

Relating Traces. We start with the trivial relation between numbers: <sup>n</sup> <sup>∼</sup><sup>0</sup> **<sup>n</sup>**, i.e., numbers are related when they are the same. We cannot build a relation between single actions since a single source action is related to multiple target ones. Therefore, we define a relation between a source action M and a target trace **t** (a list of numbers), inductively on the structure of M (which is a pair of values, and values are natural numbers or pairs).

(Trace-Rel-N-N) if n ∼0 n and n′ ∼0 n′, then ⟨n, n′⟩ ∼ n · n′;  (Trace-Rel-N-M) if n ∼0 n and M ∼ t, then ⟨n, M⟩ ∼ n · t;  (Trace-Rel-M-N) if M ∼ t and n ∼0 n, then ⟨M, n⟩ ∼ t · n;  (Trace-Rel-M-M) if M ∼ t and M′ ∼ t′, then ⟨M, M′⟩ ∼ t · t′.

A pair of naturals is related to the two actions that send each element of the pair (Rule Trace-Rel-N-N). If a pair is made of sub-pairs, we require all such sub-pairs to be related (Rules Trace-Rel-N-M to Trace-Rel-M-M). We build on these rules to define the s ∼ t relation between source and target traces, for which the compiler is correct (Theorem 3.4). Trivially, traces are related when they are both empty. Alternatively, given related traces, we can concatenate a source action and a second target trace, provided that they are related (Rule Trace-Rel-Single).

ε ∼ ε;  (Trace-Rel-Single) if s ∼ t and M ∼ t′, then s · M ∼ t · t′.

Theorem 3.4 ((·)↓ is correct). (·)↓ *is* CC∼*.*

With our trace relation, the trace property mappings capture the following intuitions: – The target-to-source mapping states that a source property can reconstruct a target action as it sees fit. For example, the trace 4 · 6 · 5 · 7 is related to ⟨4, 6⟩ · ⟨5, 7⟩ and to ⟨⟨4, 6⟩, ⟨5, 7⟩⟩ (and many more variations). This gives freedom to the source implementation of a target behavior, which follows from the non-injectivity of ∼.⁵

– The source-to-target mapping "forgets" about the way pairs are nested, but is faithful w.r.t. the values vi contained in a message. Notice that source safety properties are always mapped to target safety properties. For instance, if πS ∈ SafetyS prescribes that some bad number is never sent, then τ˜(πS) prescribes that the same number is never sent in the target, and τ˜(πS) ∈ SafetyT. Of course, if πS ∈ SafetyS prescribes that a particular nested pairing like ⟨⟨4, 6⟩, ⟨5, 7⟩⟩ never happens, then τ˜(πS) is still a target safety property, but the trivial one, τ˜(πS) = ⊤ ∈ SafetyT: the forbidden nesting flattens to the same target trace as permitted nestings.
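The essence of the abstraction mismatch, namely that one source action flattens to many target sends and differently nested pairs flatten identically, fits in a few lines. In this sketch of ours, Python tuples stand for source pairs:

```python
def flatten(v):
    # Leaves of the pair tree, left to right: the sends gensend would emit.
    if isinstance(v, int):
        return [v]
    left, right = v
    return flatten(left) + flatten(right)

def related(M, t):
    # M ~ t  iff  t is exactly the flattened leaf sequence of the action M
    return flatten(M) == list(t)

assert flatten(((4, 6), (5, 7))) == [4, 6, 5, 7]
# Non-injectivity: differently nested source actions yield the same target trace.
assert related(((4, 6), (5, 7)), [4, 6, 5, 7])
assert related((4, (6, (5, 7))), [4, 6, 5, 7])
```

Because both nestings relate to the same flat trace, a target property about the flat sequence cannot distinguish them, which is exactly the freedom (and the loss of precision) described above.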

# 4 Trace-Relating Compilation and Noninterference Preservation

When source and target observations are drawn from the same set, a correct compiler (CC<sup>=</sup>) is enough to ensure the preservation of all subset-closed hyperproperties, in particular of *noninterference* (NI) [22], as also mentioned at the beginning of §2.3. In the

⁵ Making ∼ injective is a matter of adding open and close parenthesis actions in target traces.

scenario where target observations are strictly more informative than source observations, the best guarantee one may expect from a correct trace-relating compiler (CC∼) is a *weakening* (or *declassification*) of target noninterference that matches the noninterference property satisfied in the source. To formalize this reasoning, this section applies the trinitarian view of trace-relating compilation to the general framework of abstract noninterference (ANI) [21].

We first define NI and explain the issue of preserving source NI via a CC<sup>∼</sup> compiler. We then introduce ANI, which allows characterizations of various forms of noninterference, and formulate a general theory of ANI preservation via CC∼. We also study how to deal with cases such as undefined behavior in the target. Finally, we answer the dual question, i.e., which source NI should be satisfied to guarantee that compiled programs are noninterfering with respect to target observers.

Intuitively, NI requires that publicly observable outputs do not reveal information about private inputs. To define this formally, we need a few additions to our setup. We indicate the (disjoint) *input* and *output* projections of a trace t as t° and t• respectively.⁶ Denote with [t]_low the equivalence class of a trace t, obtained using a standard low-equivalence relation that relates low (public) events only if they are equal, and ignores any difference between private events. Then, NI for source traces can be defined as:

$$\mathrm{NI}_{\mathrm{S}} = \{\pi_{\mathrm{S}} \mid \forall \mathrm{s}_1 \mathrm{s}_2 \in \pi_{\mathrm{S}}.\ [\mathrm{s}_1^{\circ}]_{\mathit{low}} = [\mathrm{s}_2^{\circ}]_{\mathit{low}} \Rightarrow [\mathrm{s}_1^{\bullet}]_{\mathit{low}} = [\mathrm{s}_2^{\bullet}]_{\mathit{low}}\}\,.$$

That is, source NI comprises the sets of traces that have equivalent low output projections as long as their low input projections are equivalent.
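On a finite set of traces, membership in NI_S can be checked directly from this definition. A minimal sketch, with our own hypothetical encoding of a trace as an (inputs, outputs) pair of tuples of ('lo'|'hi', value) events:

```python
def low(events):
    """[.]_low on a projection: keep public values, ignore private events."""
    return [v for (lvl, v) in events if lvl == 'lo']

def in_NI(pi):
    """pi ∈ NI: low-equivalent inputs imply low-equivalent outputs."""
    return all(low(o1) == low(o2)
               for (i1, o1) in pi for (i2, o2) in pi
               if low(i1) == low(i2))

# Same public input, but the output reveals the secret:
leak = {((('lo', 0), ('hi', 1)), (('lo', 1),)),
        ((('lo', 0), ('hi', 2)), (('lo', 2),))}
# Same public input, identical public output regardless of the secret:
ok = {((('lo', 0), ('hi', 1)), (('lo', 0),)),
      ((('lo', 0), ('hi', 2)), (('lo', 0),))}
assert not in_NI(leak)
assert in_NI(ok)
```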

Trace-Relating Compilation and Noninterference. When additional observations are possible in the target, it is unclear whether a noninterfering source program is compiled to a noninterfering target program or not, and if so, whether the notion of NI in the target is the expected or desired one. We illustrate this issue considering a scenario where target traces extend source ones by exposing the execution time. While source noninterference NI<sup>S</sup> requires that private inputs do not affect public outputs, **NI<sup>T</sup>** additionally requires that the execution time is not affected by private inputs.

To model the scenario described, let Trace_S denote the set of traces in the source, and Trace_T = Trace_S × N_ω be the set of target traces, where N_ω ≜ N ∪ {ω}. Target traces have two components: a source trace, and a natural number that denotes the time spent to produce the trace (ω if infinite). Notice that if two source traces s₁, s₂ are low-equivalent, then {s₁, s₂} ∈ NI_S and {(s₁, 42), (s₂, 42)} ∈ NI_T, but {(s₁, 42), (s₂, 43)} ∉ NI_T and {(s₁, 42), (s₂, 42), (s₁, 43), (s₂, 43)} ∉ NI_T.
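Timing-sensitive NI on such finite example sets can be checked mechanically. In this sketch we simply stipulate that s1 and s2 are low-equivalent (in both their inputs and their source-level outputs), so only the time component can tell them apart:

```python
# Target traces are (source_trace, time); s1 and s2 are assumed low-equivalent
# (same public inputs and outputs), so only the time can distinguish them.
low_class = {'s1': 'L', 's2': 'L'}

def in_NI_T(pi):
    """Timing-sensitive NI: low-equivalent traces must agree on the time."""
    return all(m == n
               for (a, m) in pi for (b, n) in pi
               if low_class[a] == low_class[b])

assert in_NI_T({('s1', 42), ('s2', 42)})
assert not in_NI_T({('s1', 42), ('s2', 43)})
assert not in_NI_T({('s1', 42), ('s2', 42), ('s1', 43), ('s2', 43)})
```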

Consider the following straightforward trace relation, which relates a source trace to any target trace whose first component is equal to it, irrespective of execution time:

$$\mathbf{s} \sim \mathbf{t} \quad \equiv \quad \exists \mathbf{n} . \,\mathbf{t} = (\mathbf{s}, \mathbf{n}).$$

A compiler is CC∼ if any trace that can be exhibited in the target can be simulated in the source in some amount of time. For such a compiler, Theorem 2.11 says that if W satisfies NI_S, then W↓ satisfies Cl⊆ ∘ τ̃(NI_S), which however is strictly weaker than NI_T, as it contains, e.g., {(s₁, 42), (s₂, 42), (s₁, 43), (s₂, 43)}, and one cannot conclude that W↓ is noninterfering in the target. It is easy to prove that

⁶ Here we only require the projections to be disjoint. Depending on the scenario and the attacker model, the projections might record information such as the ordering of events.

$$\mathrm{Cl}_{\subseteq} \circ\, \tilde{\tau}(\mathrm{NI}_{\mathrm{S}}) = \mathrm{Cl}_{\subseteq}\left(\{\pi_{\mathrm{S}} \times \mathbb{N}_{\omega} \mid \pi_{\mathrm{S}} \in \mathrm{NI}_{\mathrm{S}}\}\right) = \{\pi_{\mathrm{S}} \times I \mid \pi_{\mathrm{S}} \in \mathrm{NI}_{\mathrm{S}} \wedge I \subseteq \mathbb{N}_{\omega}\}\,,$$

the first equality coming from τ̃(π_S) = π_S × N_ω, and the second from NI_S being subset-closed. As we will see, this hyperproperty *can* be characterized as a form of NI, which one might call *timing-insensitive noninterference*, and ensured only against attackers that cannot measure execution time. For this characterization, and to describe different forms of noninterference as well as formally analyze their preservation by a CC∼ compiler, we rely on the general framework of *abstract noninterference* [21].
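Concretely, τ̃ for the timing relation pairs every source trace with every possible execution time, so any subset of such an image — including sets that leak through timing — lands in the subset closure. A toy sketch with stand-in trace names:

```python
def tau_timing(pi_S, times):
    """tau-tilde for the timing relation: pair each source trace with any time."""
    return {(s, n) for s in pi_S for n in times}

pi_S = {'s1', 's2'}                  # a (noninterfering) source property
image = tau_timing(pi_S, {42, 43})
bad = {('s1', 42), ('s2', 42), ('s1', 43), ('s2', 43)}
# bad ⊆ tau(pi_S), so bad ∈ Cl⊆(tau(NI_S)) even though it leaks timing:
assert bad <= image
```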

Abstract Noninterference. ANI [21] is a generalization of NI whose formulation relies on abstractions (in the abstract interpretation sense [16]) in order to encompass arbitrary variants of NI. ANI is parameterized by an *observer abstraction* ρ, which denotes the distinguishing power of the attacker, and a *selection abstraction* φ, which specifies when to check NI, and therefore captures a form of declassification [54].⁷ Formally:

$$\mathit{ANI}_{\phi}^{\rho} = \{\pi \mid \forall \mathrm{t}_1 \mathrm{t}_2 \in \pi.\ \phi(\mathrm{t}_1^{\circ}) = \phi(\mathrm{t}_2^{\circ}) \Rightarrow \rho(\mathrm{t}_1^{\bullet}) = \rho(\mathrm{t}_2^{\bullet})\}\,.$$

By picking φ = ρ = [·]_low, we recover the standard noninterference defined above, where NI must hold for all low inputs (i.e., no declassification of private inputs), and the observational power of the attacker is limited to distinguishing low outputs.

The observational power of the attacker can be weakened by choosing a more liberal relation for ρ. For instance, one may limit the attacker to observe the *parity* of output integer values. Another way to weaken ANI is to use φ to specify that noninterference is only required to hold for a subset of low inputs.

To be formally precise, φ and ρ are defined over sets of (input and output projections of) traces, so when we write φ(t) above, this should be understood as convenient notation for φ({t}). Likewise, φ = [·]_low should be understood as φ = λπ. ⋃_{t∈π} [t]_low, i.e., the powerset lifting of [·]_low. Additionally, φ and ρ are required to be upper-closed operators (uco)—i.e., monotonic, idempotent and extensive—on the poset that is the powerset of (input and output projections of) traces ordered by inclusion [21].
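For finite trace sets, ANI can be checked directly from its definition. The sketch below (our own encoding: traces as (low input, output) pairs) instantiates ρ with the parity observer mentioned above and φ with the identity, i.e., no declassification:

```python
def ani(pi, inp, out, phi, rho):
    """pi ∈ ANI_phi^rho: phi-equal inputs imply rho-equal outputs.
    phi and rho act on sets, matching the powerset lifting in the text."""
    return all(rho({out(t1)}) == rho({out(t2)})
               for t1 in pi for t2 in pi
               if phi({inp(t1)}) == phi({inp(t2)}))

parity = lambda s: {v % 2 for v in s}     # weakened observer rho: parity only
ident = lambda s: set(s)                  # selection phi: all inputs checked

# Same low input, outputs 4 and 6: indistinguishable for a parity attacker.
assert ani({(0, 4), (0, 6)}, lambda t: t[0], lambda t: t[1], ident, parity)
# Outputs 4 and 5 differ in parity, so interference is observable.
assert not ani({(0, 4), (0, 5)}, lambda t: t[0], lambda t: t[1], ident, parity)
```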

Trace-Relating Compilation and ANI for Timing. We can now reformulate our example with observable execution times in the target in terms of ANI. We have NI_S = ANI_{φ_S}^{ρ_S} with φ_S = ρ_S = [·]_low. In this case, we can formally describe the hyperproperty that a compiled program W↓ satisfies whenever W satisfies NI_S as an instance of ANI: Cl⊆ ∘ τ̃(NI_S) = ANI_{φ_T}^{ρ_T},

for φ_T = φ_S and ρ_T(π_T) = {(s, n) | ∃(s₁, n₁) ∈ π_T. [s•]_low = [s₁•]_low}.

The definition of φ_T tells us that the trace relation does not affect the selection abstraction. The definition of ρ_T characterizes an observer that cannot distinguish execution times for noninterfering traces (notice that n₁ in the definition of ρ_T is discarded). For instance, ρ_T({(s, n₁)}) = ρ_T({(s, n₂)}), for any s, n₁, n₂. Therefore, in this setting, we know explicitly through ρ_T that a CC∼ compiler degrades source noninterference to target *timing-insensitive* noninterference.

Trace-Relating Compilation and ANI in General. While the particular φ_T and ρ_T above can be discovered by intuition, we want to know whether there is a systematic way of obtaining them in general. In other words, for *any* trace relation ∼ and *any*

⁷ ANI includes a third parameter η, which describes the maximal input variation that the attacker may control. Here we omit η (i.e., take it to be the identity) in order to simplify the presentation.

notion of source NI, what property is guaranteed on noninterfering source programs by any CC<sup>∼</sup> compiler?

We can now answer this question generally (Theorem 4.1): any source notion of noninterference expressible as an instance of ANI is mapped to a corresponding instance of ANI in the target, whenever source traces are an abstraction of target ones (i.e., when ∼ is a total and surjective map). For this result we consider trace relations that can be split into input and output trace relations (denoted ∼ = ⟨∼°, ∼•⟩) such that s ∼ t ⇐⇒ s° ∼° t° ∧ s• ∼• t•. The trace relation ∼ corresponds to a Galois connection ⟨τ̃, σ̃⟩ between the sets of trace properties, as described in §2.2. Similarly, the pair ∼° and ∼• corresponds to a pair of Galois connections, ⟨τ̃°, σ̃°⟩ and ⟨τ̃•, σ̃•⟩, between the sets of input and output properties. In the timing example, time is an output, so the input relation ∼° is equality, and ∼• is defined as s• ∼• t• ≡ ∃n. t• = (s•, n).

Theorem 4.1 (Compiling ANI). *Assume that source and target traces are related via* ∼ ⊆ Trace_S × Trace_T*, with* ∼ = ⟨∼°, ∼•⟩*, such that* ∼° *and* ∼• *are both total maps from target to source traces, and* ∼° *is surjective. Assume that* ·↓ *is a* CC∼ *compiler, and* φ_S ∈ uco(2^{Trace°_S})*,* ρ_S ∈ uco(2^{Trace•_S})*.*

*If* W *satisfies* ANI_{φ_S}^{ρ_S}*, then* W↓ *satisfies* ANI_{φ_T^#}^{ρ_T^#}*, where* φ_T^# *and* ρ_T^# *are defined as:*

$$\phi_{\mathbf{T}}^{\#} = g^{\circ} \circ \phi_{\mathrm{S}} \circ f^{\circ}; \qquad \rho_{\mathbf{T}}^{\#} = g^{\bullet} \circ \rho_{\mathrm{S}} \circ f^{\bullet}, \quad\textit{where}$$

$$f^{\circ}(\pi_{\mathbf{T}}^{\circ}) = \{\mathrm{s}^{\circ} \mid \exists \mathbf{t}^{\circ} \in \pi_{\mathbf{T}}^{\circ}.\ \mathrm{s}^{\circ} \stackrel{\circ}{\sim} \mathbf{t}^{\circ}\}; \qquad g^{\circ}(\pi_{\mathrm{S}}^{\circ}) = \{\mathbf{t}^{\circ} \mid \forall \mathrm{s}^{\circ}.\ \mathrm{s}^{\circ} \stackrel{\circ}{\sim} \mathbf{t}^{\circ} \Rightarrow \mathrm{s}^{\circ} \in \pi_{\mathrm{S}}^{\circ}\}$$

*(and both* f• *and* g• *are defined analogously).*
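The maps f° and g° are the two adjoints of a Galois connection induced by the input relation; for a finite relation, the adjunction f(π_T) ⊆ π_S ⇔ π_T ⊆ g(π_S) can be checked exhaustively. A sketch over toy sets of our own choosing, using a timing-style relation:

```python
def f(rel, pi_T):
    """Existential image: source elements related to some element of pi_T."""
    return {s for (s, t) in rel if t in pi_T}

def g(rel, pi_S, T):
    """Universal preimage: targets all of whose related sources lie in pi_S."""
    return {t for t in T
            if all(s in pi_S for (s, t2) in rel if t2 == t)}

S = {'a', 'b'}
T = {(s, n) for s in S for n in (1, 2)}          # timing-style targets (s, n)
rel = {(s, (s, n)) for (s, n) in T}              # s ~ (s, n)

for pi_T in [set(), {('a', 1)}, T]:
    for pi_S in [set(), {'a'}, S]:
        # the adjunction law that makes (f, g) a Galois connection:
        assert (f(rel, pi_T) <= pi_S) == (pi_T <= g(rel, pi_S, T))
```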

For the example above we recover the definitions we justified intuitively, i.e., φ_T^# = g° ∘ φ_S ∘ f° = φ_T and ρ_T^# = g• ∘ ρ_S ∘ f• = ρ_T. Moreover, we can prove that if ∼• is also surjective, then ANI_{φ_T^#}^{ρ_T^#} ⊆ Cl⊆ ∘ τ̃(ANI_{φ_S}^{ρ_S}). Therefore, the derived guarantee ANI_{φ_T^#}^{ρ_T^#} is at least as strong as the one that follows by just knowing that the compiler ·↓ is CC∼.

Noninterference and Undefined Behavior. As stated above, Theorem 4.1 does not apply to several scenarios from §3, such as undefined behavior (§3.1), since in those cases the relation ∼• is not a total map. Nevertheless, we can still exploit our framework to reason about the impact of compilation on noninterference.

Let us consider ∼ = ⟨∼°, ∼•⟩, where ∼° is any total and surjective map from target to source inputs (e.g., equality) and ∼• is defined as s• ∼• t• ≡ s• = t• ∨ ∃m• ≤ t•. s• = m• · Goes_wrong. Intuitively, a CC∼ compiler guarantees that no interference can be observed by a target attacker that cannot exploit undefined behavior to learn private information. This intuition can be made formal by the following theorem.
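The output relation for undefined behavior can likewise be checked on finite traces. In this sketch (event names and the Goes_wrong encoding are ours), a source output trace relates to a target one when they are equal, or when the source cuts the target at some prefix and ends in Goes_wrong:

```python
GOES_WRONG = 'Goes_wrong'

def rel_out(s, t):
    """s• ~• t•: equal traces, or s• = m• · Goes_wrong for some prefix m• of t•."""
    if list(s) == list(t):
        return True
    return (bool(s) and s[-1] == GOES_WRONG and
            any(list(s[:-1]) == list(t[:k]) for k in range(len(t) + 1)))

assert rel_out(['o1', 'o2'], ['o1', 'o2'])
# the source stops with Goes_wrong where the target ran into undefined behavior:
assert rel_out(['o1', GOES_WRONG], ['o1', 'ub_event', 'o2'])
assert not rel_out(['o1', 'o3'], ['o1', 'o2'])
```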

Theorem 4.2 (Relaxed Compiling ANI). *Relax the assumptions of Theorem 4.1 by allowing* ∼• *to be* any *output trace relation. If* W *satisfies* ANI_{φ_S}^{ρ_S}*, then* W↓ *satisfies* ANI_{φ_T^#}^{ρ_T^#}*, where* φ_T^# *is defined as in Theorem 4.1, and* ρ_T^# *is such that:*

$$\forall \mathrm{s}\, \mathbf{t}.\ \mathrm{s}^{\bullet} \stackrel{\bullet}{\sim} \mathbf{t}^{\bullet} \Rightarrow \rho_{\mathbf{T}}^{\#}(\mathbf{t}^{\bullet}) = \rho_{\mathbf{T}}^{\#}(\tilde{\tau}^{\bullet}(\rho_{\mathrm{S}}(\mathrm{s}^{\bullet})))\,.$$

Technically, instead of giving us a *definition* of ρ_T^#, the theorem gives a *property* of it. The property states that, given a target output trace t•, the attacker cannot distinguish it from any other target output trace produced by other possible compilations (τ̃•) of the source trace s it relates to, up to the observational power of the source-level attacker ρ_S. Therefore, given a source attacker ρ_S, the theorem characterizes a *family* of attackers that cannot observe any interference for a correctly compiled noninterfering program. Notice that the target attacker ρ_T^# = λ_. ⊤ satisfies the premise of the theorem, but defines a trivial hyperproperty, so that we cannot prove in general that ANI_{φ_T^#}^{ρ_T^#} ⊆ Cl⊆ ∘ τ̃(ANI_{φ_S}^{ρ_S}). The same ρ_T^# = λ_. ⊤ shows that the family of attackers described in Theorem 4.2 is nonempty, and this ensures the existence of a most powerful attacker among them [21], whose explicit characterization we leave for future work.

From Target NI to Source NI. We now explore the dual question: under what hypotheses does trace-relating compiler correctness alone allow target noninterference to be reduced to source noninterference? This is of practical interest, as one would be able to protect from target attackers by ensuring noninterference in the source. This task can be made easier if the source language has some static enforcement mechanism [1, 36].

Let us consider the languages from §3.4, extended with inputting of (pairs of) values. It is easy to show that the compiler described in §3.4 is still CC∼. Assume that we want to satisfy a given notion of target noninterference after compilation, i.e., W↓ |= ANI_{φ_T}^{ρ_T}. Recall that the observational power of the target attacker, ρ_T, is expressed as a property of sequences of values. To express the same property (or attacker) in the source, we have to abstract the way pairs of values are nested. For instance, the source attacker should not distinguish ⟨⟨v₁, v₂⟩, v₃⟩ and ⟨v₁, ⟨v₂, v₃⟩⟩. In general (i.e., when ∼° is not the identity), this argument is valid only when φ_T can be represented in the source. More precisely, φ_T must consider as equivalent all target inputs that are related to the same source one, because in the source it is not possible to have a finer distinction of inputs. This intuitive correspondence can be formalized as follows:

Theorem 4.3 (Target ANI by source ANI). *Let* φ_T ∈ uco(2^{Trace°_T})*,* ρ_T ∈ uco(2^{Trace•_T}) *and* ∼• *a total and surjective map from source outputs to target ones, and assume that*

$$\forall \mathrm{s}\, \mathbf{t}.\ \mathrm{s}^{\circ} \stackrel{\circ}{\sim} \mathbf{t}^{\circ} \Rightarrow \phi_{\mathbf{T}}(\mathbf{t}^{\circ}) = \phi_{\mathbf{T}}(\tilde{\tau}^{\circ}(\mathrm{s}^{\circ}))\,.$$

*If* ·↓ *is a* CC∼ *compiler and* W *satisfies* ANI_{φ_S^#}^{ρ_S^#}*, then* W↓ *satisfies* ANI_{φ_T}^{ρ_T}*, for*

$$\phi_{\mathrm{S}}^{\#} = \tilde{\sigma}^{\circ} \circ \phi_{\mathbf{T}} \circ \tilde{\tau}^{\circ}; \qquad \rho_{\mathrm{S}}^{\#} = \tilde{\sigma}^{\bullet} \circ \rho_{\mathbf{T}} \circ \tilde{\tau}^{\bullet}\,.$$

To wrap up the discussion about noninterference, the results presented in this section formalize and generalize some intuitive facts about compiler correctness and noninterference. Of course, they all place some restrictions on the shape of the noninterference instances that can be considered, because compiler correctness alone is in general not a strong enough criterion for dealing with many security properties [6, 17].

# 5 Trace-Relating Secure Compilation

So far we have studied compiler correctness criteria for whole, standalone programs. However, in practice, programs do not exist in isolation, but in a context where they interact with other programs, libraries, etc. In many cases, this context cannot be assumed to be benign and could instead behave maliciously to try to disrupt a compiled program.

Hence, in this section we consider the following *secure compilation* scenario: a source program is compiled and linked with an arbitrary target-level context, i.e., one that may not be expressible as the compilation of a source context. Compiler correctness does not address this case, as it does not consider arbitrary target contexts, looking instead at whole programs (empty context [33]) or well-behaved target contexts that behave like source ones (as in compositional compiler correctness [27, 30, 45, 57]).

To account for this scenario, Abate et al. [2] describe several secure compilation criteria based on the preservation of classes of (hyper)properties (e.g., trace properties, safety, hypersafety, hyperproperties) against arbitrary target contexts. For each of these criteria, they give an equivalent "property-free" criterion, analogous to the equivalence between TP and CC=. For instance, their *robust* trace property preservation criterion (RTP) states that, for any trace property π, if a source *partial* program P plugged into any context C_S satisfies π, then the compiled program P↓ plugged into any target context C_T satisfies π. Their criterion equivalent to RTP is RTC, which states that for any trace produced by the compiled program, when linked with any target context, there is a source context that produces the same trace. Formally (writing C[P] for the whole program that results from linking partial program P with context C, and C[P] ⇝ t to say that it produces trace t), they define:

RTP ≡ ∀P. ∀π. (∀C_S. ∀t. C_S[P] ⇝ t ⇒ t ∈ π) ⇒ (∀C_T. ∀t. C_T[P↓] ⇝ t ⇒ t ∈ π);

RTC ≡ ∀P. ∀C_T. ∀t. C_T[P↓] ⇝ t ⇒ ∃C_S. C_S[P] ⇝ t.

In the following we adopt the notation P |=_R π to mean "P *robustly* satisfies π," i.e., P satisfies π irrespective of the contexts it is linked with. Thus, we write more compactly:

RTP ≡ ∀π. ∀P. P |=_R π ⇒ P↓ |=_R π.
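On a finite toy model, robust satisfaction is a direct check: model a (compiled) partial program as a map from contexts to the trace sets of the linked whole programs. All names below are made up for illustration:

```python
def robustly_satisfies(beh, pi):
    """P |=_R pi: the traces produced under every context lie inside pi."""
    return all(traces <= pi for traces in beh.values())

pi = {'t'}
src_beh = {'C1': {'t'}, 'C2': {'t'}}            # behaviours of C[P]
tgt_beh = {'D1': {'t'}, 'D2': {'t', 'bad'}}     # D2 is an adversarial target context

assert robustly_satisfies(src_beh, pi)
assert not robustly_satisfies(tgt_beh, pi)      # robust satisfaction fails in the target
```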

All the criteria of Abate et al. [2] share this flavor of stating the existence of some source context that simulates the behavior of any given target context, with some variations depending on the class of (hyper)properties under consideration. All these criteria are stated in a setting where source and target traces are the same. In this section, we extend their results to our trace-relating setting, obtaining trinitarian views for secure compilation. Despite the similarities with §2, more challenges show up, in particular when considering the robust preservation of proper subclasses of trace properties. For example, after application of σ̃ or τ̃, a property may no longer be a safety property, a crucial point for the equivalence with the property-free criterion for safety properties by Abate et al. [2]. We solve this by interpreting the class of safety properties as an *abstraction* of the class of all trace properties induced by a closure operator (§5.1). The remaining subsections provide example compilation chains satisfying our trace-relating secure compilation criteria for trace properties (§5.2) and for safety and hypersafety properties (§5.3).

#### 5.1 Trace-Relating Secure Compilation: A Spectrum of Trinities

In this subsection we generalize many of the criteria of Abate et al. [2] using the ideas of §2. Before discussing how we solve the challenges for classes such as safety and hypersafety, we show the simple generalization of RTC to the trace-relating setting (RTC∼) and its corresponding trinitarian view (Theorem 5.1):

Theorem 5.1 (Trinity for Robust Trace Properties). *For any trace relation* ∼ *and induced property mappings* τ̃ *and* σ̃*, we have:* RTP^τ̃ ⇐⇒ RTC∼ ⇐⇒ RTP^σ̃*, where*

RTC∼ ≡ ∀P ∀C_T ∀t. C_T[P↓] ⇝ t ⇒ ∃C_S ∃s ∼ t. C_S[P] ⇝ s;

RTP^τ̃ ≡ ∀P ∀π_S ∈ 2^{Trace_S}. P |=_R π_S ⇒ P↓ |=_R τ̃(π_S);

RTP^σ̃ ≡ ∀P ∀π_T ∈ 2^{Trace_T}. P |=_R σ̃(π_T) ⇒ P↓ |=_R π_T.

Abate et al. [2] propose many more equivalent pairs of criteria, each preserving a different class of (hyper)properties, which we briefly recap now. For trace properties, they also have criteria that preserve safety properties plus their version of liveness properties. For hyperproperties, they have criteria that preserve hypersafety properties, subset-closed hyperproperties, and arbitrary hyperproperties. Finally, they define *relational* hyperproperties, which are relations between the behaviors of multiple programs for expressing, e.g., that a program always runs faster than another. For relational hyperproperties, they have criteria that preserve arbitrary relational properties, relational safety properties, relational hyperproperties and relational subset-closed hyperproperties. Roughly speaking, the security guarantees offered by robust preservation of trace properties concern only protecting the integrity of the program from the context; the guarantees of hyperproperties also cover data confidentiality, and the guarantees of relational hyperproperties even cover code confidentiality. Naturally, these stronger guarantees are increasingly harder to enforce and prove.

While we have lifted the most significant criteria from Abate et al. [2] to our trinitarian view, due to space constraints we provide the formal definitions only for the two most interesting criteria. We summarize the generalizations of many other criteria in Figure 2, described at the end. Omitted definitions are available in the online appendix.

Beyond Trace Properties: Robust Safety and Hyperproperty Preservation. We detail robust preservation of safety properties and of arbitrary hyperproperties since they are both relevant from a security point of view and their generalization is interesting.

Theorem 5.2 (Trinity for Robust Safety Properties). *For any trace relation* ∼ *and for the induced property mappings* τ̃ *and* σ̃*, we have:* RTP^{Safe∘τ̃} ⇐⇒ RSC∼ ⇐⇒ RSP^σ̃*, where*

RSC∼ ≡ ∀P ∀C_T ∀t ∀m ≤ t. C_T[P↓] ⇝ t ⇒ ∃C_S ∃t' ≥ m ∃s ∼ t'. C_S[P] ⇝ s;

RTP^{Safe∘τ̃} ≡ ∀P ∀π_S ∈ 2^{Trace_S}. P |=_R π_S ⇒ P↓ |=_R (Safe ∘ τ̃)(π_S);

RSP^σ̃ ≡ ∀P ∀π_T ∈ Safety_T. P |=_R σ̃(π_T) ⇒ P↓ |=_R π_T.

There is an interesting asymmetry between the last two characterizations above, which we now explain in more detail. RSP^σ̃ quantifies over target safety properties, while RTP^{Safe∘τ̃} quantifies over *arbitrary* source properties, but imposes the composition of τ̃ with Safe, which maps an arbitrary target property π_T to the target safety property that best over-approximates π_T⁸ (an analogous *closure* was needed for subset-closed hyperproperties in Theorem 2.11). More precisely, Safe is a closure operator on target properties, with Safety_T = {Safe(π_T) | π_T ∈ 2^{Trace_T}}. The mappings

$$\mathit{Safe} \circ \tilde{\tau} : 2^{\mathrm{Trace}_{\mathrm{S}}} \rightleftharpoons \mathrm{Safety}_{\mathbf{T}} : \tilde{\sigma}$$

determine a Galois connection between source trace properties and target safety properties, and ensure the equivalence RTP^{Safe∘τ̃} ⇐⇒ RSP^σ̃. This argument generalizes to arbitrary closure operators on target properties and on hyperproperties, as long as the corresponding class is a subclass of subset-closed hyperproperties, and

⁸ *Safe*(π_T) = ⋂{S_T | π_T ⊆ S_T ∧ S_T ∈ Safety_T} is the topological closure in the topology of Clarkson and Schneider [14], where safety properties coincide with the closed sets.

explains all but one of the asymmetries in Figure 2, the one that concerns the robust preservation of arbitrary hyperproperties:

Theorem 5.3 (Weak Trinity for Robust Hyperproperties). *For a trace relation* ∼ ⊆ Trace_S × Trace_T *and induced property mappings* σ̃ *and* τ̃*,* RHC∼ *is equivalent to* RHP^τ̃*; moreover, if* ⟨τ̃, σ̃⟩ *is a Galois insertion (i.e.,* τ̃ ∘ σ̃ = id*),* RHC∼ *implies* RHP^σ̃*, while if* ⟨σ̃, τ̃⟩ *is a Galois reflection (i.e.,* σ̃ ∘ τ̃ = id*),* RHP^σ̃ *implies* RHC∼*, where*

RHC∼ ≡ ∀P ∀C_T ∃C_S ∀t. C_T[P↓] ⇝ t ⇐⇒ (∃s ∼ t. C_S[P] ⇝ s);

RHP^τ̃ ≡ ∀P ∀H_S. P |=_R H_S ⇒ P↓ |=_R τ̃(H_S);

RHP^σ̃ ≡ ∀P ∀H_T. P |=_R σ̃(H_T) ⇒ P↓ |=_R H_T.

This trinity is *weak* since extra hypotheses are needed to prove some implications. While the equivalence RHC∼ ⇐⇒ RHP^τ̃ holds unconditionally, the other two implications hold only under distinct, stronger assumptions. For RHP^σ̃ it is still possible and correct to deduce a source obligation for a given target hyperproperty H_T when no information is lost in the composition τ̃ ∘ σ̃ (i.e., the two maps are a Galois *insertion*). On the other hand, RHP^τ̃ is a consequence of RHP^σ̃ when no information is lost in composing in the other direction, σ̃ ∘ τ̃ (i.e., the two maps are a Galois *reflection*).

Navigating the Diagram. For a given trace relation ∼, Figure 2 orders the generalized criteria according to their relative strength. If a trinity implies another (denoted by ⇒), then the former provides stronger security for a compilation chain than the latter.

As mentioned, some property-full criteria regarding proper subclasses (i.e., subset-closed hyperproperties, safety, hypersafety, 2-relational safety and 2-relational hyperproperties) quantify over arbitrary (relational) (hyper)properties and compose τ̃ with an additional operator. We have already presented the Safe operator; the other operators are Cl⊆, HSafe, and 2rSafe, which approximate the image of τ̃ with a subset-closed hyperproperty, a hypersafety property, and a 2-relational safety property, respectively.
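Of these operators, Safe has a simple concrete reading: on finite traces, Safe(π_T) is the set of traces all of whose prefixes can be extended to a trace in π_T. A sketch over a tiny hand-picked universe of traces-as-tuples:

```python
def prefixes(t):
    return [t[:k] for k in range(len(t) + 1)]

def safe_closure(pi, universe):
    """Smallest safety property (within `universe`) containing pi."""
    ok_prefixes = {p for t in pi for p in prefixes(t)}
    return {t for t in universe
            if all(p in ok_prefixes for p in prefixes(t))}

U = {('a',), ('a', 'b'), ('a', 'c'), ('b',)}
pi = {('a', 'b')}
# ('a',) extends to a pi-trace, so the closure adds it; ('a','c') and ('b',)
# each contain a bad prefix and stay excluded:
assert safe_closure(pi, U) == {('a',), ('a', 'b')}
```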

As a reading aid, we use a shaded blue background when quantifying over arbitrary trace properties, red when quantifying over arbitrary subset-closed hyperproperties, and green for arbitrary 2-relational properties.

We now describe how to interpret the acronyms in Figure 2. All criteria start with R meaning they refer to robust preservation. Criteria for relational hyperproperties—here only arity 2 is shown—contain 2r. Next, criteria names spell the class of hyperproperties they preserve: H for hyperproperties, SCH for subset-closed hyperproperties, HS for hypersafety, T for trace properties, and S for safety properties. Finally, property-free criteria end with a C while property-full ones involving σ˜ and τ˜ end with P. Thus, *robust (*R*) subset-closed hyperproperty-preserving (*SCH*) compilation (*C*)* is RSCHC∼, *robust (*R*) two-relational (*2r*) safety-preserving (*S*) compilation (*C*)* is R2rSC∼, etc.

#### 5.2 Instance of Trace-Relating Robust Preservation of Trace Properties

This subsection illustrates trace-relating secure compilation when the target language has strictly more events than the source that target contexts can exploit to break security.

Source and Target Languages. The source and target languages used here are nearly identical expression languages, borrowing from the syntax of the source language of §3.3. Both languages add *sequencing* of expressions, two kinds of *output events*, and

Fig. 2: Hierarchy of trinitarian views of secure compilation criteria preserving classes of hyperproperties and the key to read each acronym. Shorthands 'Ins.' and 'Refl.' stand for Galois Insertion and Reflection. The symbol denotes trinities proven in Coq.

the expressions that generate them: out_S n and **out_S n**, usable in source and target respectively, and **out_T n**, usable only in the target, which is the only difference between source and target. The extra events in the target model the fact that the target language has an increased ability to perform certain operations, some of them potentially dangerous (such as writing to the hard drive), which cannot be performed by the source language, and against which source-level reasoning can therefore offer no protection.

Both languages and compilation chains now deal with partial programs, contexts and linking of those two to produce whole programs. In this setting, a whole program is the combination of a *main expression* to be evaluated and a set of *function definitions* (with distinct names) that can refer to their argument symbolically and can be called by the main expression and by other functions. The set of functions of a whole program is the union of the functions of a partial program and a context; the latter also contains the main expression. The extensions of the typing rules and the operational semantics for whole programs are unsurprising and therefore elided. The trace model also follows closely that of §3.3: it consists of a list of *regular events* (including the new outputs) terminated by a *result event*. Finally, a partial program and a context can be linked into a whole program when their functions satisfy the requirements mentioned above.

Relating Traces. In the present model, source and target traces differ only in the fact that the target draws (regular) events from a strictly larger set than the source, i.e., **Σ<sub>T</sub>** ⊃ Σ<sub>S</sub>. A natural relation between source and target traces maps to a given target trace **t** the source trace that erases from **t** those events that exist only at the target level. Let **t**|<sub>Σ<sub>S</sub></sub> denote the trace **t** filtered to retain only those elements included in the alphabet Σ<sub>S</sub>. We define the trace relation as:

$$\mathbf{s} \sim \mathbf{t} \quad \equiv \quad \mathbf{s} = \mathbf{t}|_{\Sigma_{\mathbf{S}}}.$$

In the opposite direction, a source trace s is related to many target traces, as target-only events can be inserted at any point in s. The mappings induced by ∼ are:

$$\tilde{\tau}(\pi_{\mathbf{S}}) = \{\, \mathbf{t} \mid \exists \mathbf{s}.\; \mathbf{s} = \mathbf{t}|_{\Sigma_{\mathbf{S}}} \wedge \mathbf{s} \in \pi_{\mathbf{S}} \,\}; \qquad \tilde{\sigma}(\pi_{\mathbf{T}}) = \{\, \mathbf{s} \mid \forall \mathbf{t}.\; \mathbf{s} = \mathbf{t}|_{\Sigma_{\mathbf{S}}} \Rightarrow \mathbf{t} \in \pi_{\mathbf{T}} \,\}.$$

That is, the target guarantee of a source property is that the target has the same source-level behavior, sprinkled with arbitrary target-level behavior. Conversely, the source-level obligation of a target property is the aggregate of those source traces all of whose target-level enrichments are in the target property.
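To make the projection and the relation concrete, they can be sketched in a few lines. This is an illustrative toy encoding; the type `Ev` and the function names are ours, not from the paper's Coq development.

```haskell
-- Target events: source-level outputs plus target-only outputs.
data Ev = OutS Int | OutT Int deriving (Eq, Show)

-- An event is a source event iff it exists at the source level.
isSource :: Ev -> Bool
isSource (OutS _) = True
isSource (OutT _) = False

-- t|_{Sigma_S}: erase the events that exist only at the target level.
project :: [Ev] -> [Ev]
project = filter isSource

-- s ~ t  iff  s = t|_{Sigma_S}
related :: [Ev] -> [Ev] -> Bool
related s t = s == project t
```

Under this sketch, `related [OutS 1, OutS 2] [OutS 1, OutT 7, OutS 2]` holds: the target trace is the source trace sprinkled with target-level behavior.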

Since R<sub>S</sub> and **R<sub>T</sub>** are very similar, it is simple to prove that the identity compiler (·↓) from R<sub>S</sub> to **R<sub>T</sub>** is secure according to the trace relation ∼ defined above.

Theorem 5.4 (·↓ is Secure). ·↓ *is* RTC<sup>∼</sup>.

#### 5.3 Instances of Trace-Relating Robust Preservation of Safety and Hypersafety

To provide examples of cross-language trace-relations that preserve safety and hypersafety properties, we show how existing secure compilation results can be interpreted in our framework. This indicates how the more general theory developed here can already be instantiated to encompass existing results, and that existing proof techniques can be used in order to achieve the secure compilation criteria we define.

For the preservation of safety, Patrignani and Garg [50] study a compiler from a typed, concurrent WHILE language to an untyped, concurrent WHILE language with support for memory capabilities. As in §3.3, their source has bools and nats while their target only has **nat**s. Additionally, their source has an ML-like memory (where the domain is locations) while their target has an assembly-like memory (where the domain is natural numbers **n**). Their traces capture context-program interactions, and as such they are concatenations of call and return actions with parameters, which can include booleans as well as locations. Because of the aforementioned differences, they need a cross-language relation to relate source and target actions.

Besides defining a relation on traces (i.e., an instance of ∼), they also define a relation between source and target safety properties. They provide an instantiation of τ that maps all safe source traces to the related target ones. This ensures that no additional target trace is introduced in the target property, and source safety properties are mapped to target safety ones by τ . Their compiler is then proven to generate code that respects τ , so they achieve a variation of RTP*Safe*◦τ˜.

Concerning the preservation of hypersafety, Patrignani and Garg [49] consider compilers in a reactive setting where traces are sequences of input (α?) and output (α!) actions. In their setting, traces differ between source and target, so they define a cross-language relation on actions that is total on the source actions and injective. Additionally, their set of target output actions is strictly larger than the source one, as it includes a special action √, which is how compiled code must respond to invalid target inputs (i.e., receiving a **bool** when a **nat** was expected). Starting from the relation on actions, they define **TPC**, which is an instance of what we call τ. Informally, given a set of source traces, **TPC** generates all target traces that are related (pointwise) to a source trace. Additionally, it generates all traces with interleavings of undesired inputs α? followed by √, as long as removing α?√ leaves a trace that relates to the source trace. **TPC** preserves hypersafety across languages, i.e., it is an instance of RSCHP<sub>*HSafe*◦τ˜</sub>, mapping source hypersafety to target hypersafety (and safety to safety).

# 6 Related Work

We already discussed how our results relate to some existing work in correct compilation [33, 58] and secure compilation [2, 49, 50]. We also already mentioned that most of our definitions and results make no assumptions about the structure of traces. One result that relies on the structure of traces is Theorem 5.2, which involves some *finite prefix* m, suggesting that traces should be some sort of sequences of events (or states), as is customary when one wants to refer to safety properties [14]. It is, however, sufficient to fix a topology on properties in which safety properties coincide with closed sets [46]. Even for reasoning about safety, hypersafety, or arbitrary hyperproperties, traces can therefore be values, sequences of program states or of input/output events, or even the recently proposed *interaction trees* [62]. In the latter case, we believe that the compilation from IMP to ASM proposed by Xia et al. [62] can be seen as an instance of HC<sup>∼</sup>, for the relation they call "trace equivalence."

Compilers Where Our Work Could Be Useful. Our work should be broadly applicable to understanding the guarantees provided by many verified compilers. For instance, Wang et al. [61] recently proposed a CompCert variant that compiles all the way down to machine code, and it would be interesting to see if the model at the end of §3.1 applies there too. This and many other verified compilers [12, 29, 42, 56] beyond CakeML [58] deal with resource exhaustion and it would be interesting to also apply the ideas of §3.2 to them. Hur and Dreyer [27] devised a correct compiler from an ML language to assembly using a cross-language logical relation to state their CC theorem. They do not have traces, though were one to add them, the logical relation on values would serve as the basis for the trace relation and therefore their result would attain CC∼.

Switching to more informative traces capturing the interaction between the program and the context is often used as a proof technique for secure compilation [2, 28, 48]. Most of these results consider a cross-language relation, so they probably could be proved to attain one of the criteria from Figure 2.

Generalizations of Compiler Correctness. The compiler correctness definition of Morris [41] was already general enough to account for trace relations, since it considered a translation between the semantics of the source program and that of the compiled program, which he called "decode" in his diagram, reproduced in Figure 3 (left). And even some of the more recent compiler correctness definitions preserve this kind of flexibility [51]. While CC<sup>∼</sup> can be seen as an instance of a definition by Morris [41], we are not aware of any prior work that investigated the preservation of properties when the "decode translation" is neither the identity nor a bijection, and source properties need to be re-interpreted as target ones and vice versa.

Correct Compilation and Galois Connections. Melton et al. [38] and Sabry and Wadler [55] expressed a strong variant of compiler correctness using the diagram of Figure 3 (right). They require that compiled programs *parallel* the computation steps of the original source programs, which can be proven by showing the existence of a *decompilation* map # that makes the diagram commute, or, equivalently, the existence of an adjoint for ↓ (W↓ ≤ **W** ⟺ W ≤ **W**<sup>#</sup>, for the step orderings of both source and target). The

Fig. 3: Morris's [41] (left) and Melton et al.'s [38] and Sabry and Wadler's [55] (right) compiler correctness diagrams

"parallel" intuition can be formalized as an instance of CC∼. Take source and target traces to be finite or infinite sequences of program states (maximal trace semantics [15]), and relate them exactly like Melton et al. [38] and Sabry and Wadler [55].

Translation Validation. Translation validation is an important alternative to proving that all runs of a compiler are correct. A variant of CC<sup>∼</sup> for translation validation can simply be obtained by specializing the definition to a particular W, and one can obtain again the same trinitarian view. Similarly for our other criteria, including our extensions of the secure compilation criteria of Abate et al. [2], which Busi et al. [10] seem to already be considering in the context of translation validation.

# 7 Conclusion and Future Work

We have extended the property preservation view of compiler correctness to arbitrary trace relations, and we believe that this will be useful for understanding the guarantees various compilers provide. An open question is whether, given a compiler, there exists a most precise relation ∼ for which that compiler is correct. As mentioned in §1, every compiler is CC<sup>∼</sup> for some ∼, but under which conditions is there a most precise relation? In practice, more precision may not always be better, though, as it may be at odds with compiler efficiency and may not align with more subjective notions of usefulness, leading to tradeoffs in the selection of suitable relations. Finally, another interesting direction for future work is studying whether the connection with Galois connections makes it easier to compose trace relations for different purposes, say, for a compiler whose target language has undefined behavior, resource exhaustion, and side-channels. In particular, are there ways to obtain complex relations by combining simpler ones in a way that eases the compiler verification burden?

Acknowledgements. We thank Akram El-Korashy and Amin Timany for participating in an early discussion about this work and the anonymous reviewers for their valuable feedback. This work was in part supported by the European Research Council under ERC Starting Grant SECOMP (715753), by the German Federal Ministry of Education and Research (BMBF) through funding for the CISPA-Stanford Center for Cybersecurity (FKZ: 13N1S0762), by DARPA grant SSITH/HOPE (FA8650-15-C-7558) and by UAIC internal grant 07/2018.

# Bibliography


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Runners in action

Danel Ahman and Andrej Bauer

Faculty of Mathematics and Physics University of Ljubljana, Slovenia

Abstract. Runners of algebraic effects, also known as comodels, provide a mathematical model of resource management. We show that they also give rise to a programming concept that models top-level external resources and allows programmers to modularly define their own intermediate "virtual machines". We capture the core ideas of programming with runners in an equational calculus λcoop, which we equip with a sound and coherent denotational semantics that guarantees the linear use of resources and the execution of finalisation code. We accompany λcoop with examples of runners in action, a prototype language implementation in OCaml, and a Haskell library based on λcoop.

Keywords: Runners, comodels, algebraic effects, resources, finalisation.

# 1 Introduction

Computational effects, such as exceptions, input-output, state, nondeterminism, and randomness, are an important component of general-purpose programming languages, whether they adopt functional, imperative, object-oriented, or other programming paradigms. Even pure languages exhibit computational effects at the top level, so to speak, by interacting with their external environment.

In modern languages, computational effects are often structured using *monads* [22,23,36], or *algebraic effects and handlers* [12,28,30]. These mechanisms excel at implementing computational effects within the language itself. For instance, the familiar implementation of mutable state in terms of state-passing functions requires no native state, and can be implemented either as a monad or using handlers. One is naturally drawn to using these techniques also for dealing with actual effects, such as manipulation of native memory and access to hardware. These are represented inside the language as algebraic operations (as in Eff [4]) or as a monad (in the style of Haskell's IO), but treated specially by the language's top-level runtime, which invokes the corresponding operating system functionality. While this approach works in practice, it has some unfortunate downsides too, namely *lack of modularity and linearity*, and *excessive generality*.

Lack of modularity is caused by having the external resources hard-coded into the top-level runtime. As a result, changing which resources are available and how they are implemented requires modifications of the language implementation. Additional complications arise when a language supports several operating systems and hardware platforms, each providing their own, different feature set. One wishes that the ingenuity of the language implementors were better supported by a more flexible methodology with a sound theoretical footing.

Excessive generality is not as easily discerned, because generality of programming concepts makes a language expressive and useful, such as general algebraic effects and handlers enabling one to implement timeouts, rollbacks, stream redirection [30], async & await [16], and concurrency [9]. However, the flip side of such expressive freedom is the lack of any guarantees about how external resources will actually be used. For instance, consider a simple piece of code, written in Eff-like syntax, which first opens a file, then writes to it, and finally closes it:

```
let fh = open "hello.txt" in write (fh, "Hello, world."); close fh
```
What this program actually does depends on how the operations open, write, and close are handled. For all we know, an enveloping handler may intercept the write operation and discard its continuation, so that close never happens and the file is not properly closed. Telling the programmer not to shoot themselves in the foot by avoiding such handlers is not helpful, because the handler may encounter an external reason for not being able to continue, say a full disk.

Even worse, external resources may be misused accidentally when we combine two handlers, each of which works as intended on its own. For example, if we combine the above code with a non-deterministic choose operation, as in

```
let fh = open "greeting.txt" in
let b = choose () in
(if b then write (fh, "hello") else write (fh, "good bye"));
close fh
```

and handle it with the standard non-determinism handler

```
handler { return x → [x], choose () k → return (append (k true) (k false)) }
```
then the resulting program attempts to close the file twice, as well as write to it twice, because the continuation k is invoked twice when handling choose. Of course, with enough care all such situations can be dealt with, but that is beside the point. It is worth sacrificing some amount of the generality of algebraic effects and monads in exchange for predictable and safe usage of external computational effects, so long as the vast majority of common use cases are accommodated.
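The double-close scenario is easy to reproduce in miniature. The sketch below is a free-monad style rendering of the example (the constructor and function names are hypothetical, standing in for the Eff operations); the handler's clause for Choose invokes the continuation twice, and the log of external effects shows close being performed twice.

```haskell
-- A tiny free-monad encoding of the example's operations (names ours).
data Prog a
  = Return a
  | Write String (Prog a)     -- write to the (already opened) file
  | Close (Prog a)            -- close the file
  | Choose (Bool -> Prog a)   -- non-deterministic choice

-- choose a greeting, write it, close the file.
example :: Prog ()
example =
  Choose (\b -> Write (if b then "hello" else "good bye") (Close (Return ())))

-- The standard non-determinism handler: the continuation of Choose is
-- invoked twice, once with true and once with false.  Here we record
-- the external effects each branch would perform.
handle :: Prog a -> [String]
handle (Return _)  = []
handle (Write s k) = ("write " ++ s) : handle k
handle (Close k)   = "close" : handle k
handle (Choose k)  = handle (k True) ++ handle (k False)
```

Running `handle example` produces the log `["write hello","close","write good bye","close"]`: the file ends up written to and closed twice, exactly the misuse described above.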

Contributions. We address the described issues by showing how to design a programming language based on *runners of algebraic effects*. We review runners in §2 and recast them as a programming construct in §3. In §4, we present λcoop, a calculus that captures the core ideas of programming with runners. We provide a coherent and sound denotational semantics for λcoop in §5, where we also prove that well-typed code is properly finalised. In §6, we show examples of runners in action. The paper is accompanied by a prototype language Coop and a Haskell library Haskell-Coop, based on λcoop, see §7. The relationship between λcoop and existing work is addressed in §8, and future possibilities discussed in §9.

The paper is also accompanied by an online appendix (https://arxiv.org/ abs/1910.11629) that provides the typing and equational rules we omit in §4.

Runners are *modular* in that they can be used not only to model the top-level interaction with the external environment, but programmers can also use them to define and nest their own intermediate "virtual machines". Our runners are *effectful*: they may handle operations by calling further outer operations, and raise exceptions and send signals, through which exceptional conditions and runtime errors are communicated back to user programs in a safe fashion that preserves the linear usage of external resources and ensures their proper finalisation.

We achieve *suitable generality* for handling of external resources by showing how runners provide implementations of algebraic operations together with a natural notion of finalisation, and a strong guarantee that in the absence of external kill signals the finalisation code is executed exactly once (Thm. 7). We argue that for most purposes such discipline is well worth having, and giving up the arbitrariness of effect handlers is an acceptable price to pay. In fact, as will be apparent in the denotational semantics, runners are simply a restricted form of handlers, which apply the continuation at most once in a tail call position.

Runners guarantee *linear usage of resources* not through a linear or uniqueness type system (such as in the Clean programming language [15]) or a syntactic discipline governing the application of continuations in handlers, but rather by a design based on the linear state-passing technique studied by Møgelberg and Staton [21]. In this approach, a computational resource may be implemented without restrictions, but is then guaranteed to be used linearly by user code.

# 2 Algebraic effects, handlers, and runners

We begin with a short overview of the theory of algebraic effects and handlers, as well as runners. To keep focus on how runners give rise to a programming concept, we work naively in set theory. Nevertheless, we use category-theoretic language as appropriate, to make it clear that there are no essential obstacles to extending our work to other settings (we return to this point in §5.1).

#### 2.1 Algebraic effects and handlers

There is by now no lack of material on the algebraic approach to structuring computational effects. For an introductory treatment we refer to [5], while of course also recommending the seminal papers by Plotkin and Power [25,28]. The brief summary given here only recalls the essentials and introduces notation.

An *(algebraic) signature* is given by a set Σ of *operation symbols*, and for each op ∈ Σ an *operation signature* op : A<sub>op</sub> ⇝ B<sub>op</sub>, where A<sub>op</sub> and B<sub>op</sub> are called the *parameter* and *arity* set. A Σ*-structure* M is given by a carrier set |M|, and for each operation symbol op ∈ Σ, a map op<sub>M</sub> : A<sub>op</sub> × (B<sub>op</sub> ⇒ |M|) → |M|, where ⇒ is set exponentiation. The *free* Σ*-structure* Tree<sub>Σ</sub>(X) over a set X is the set of well-founded trees generated inductively by

– return x ∈ Tree<sub>Σ</sub>(X), for every x ∈ X, and

– op(a, κ) ∈ Tree<sub>Σ</sub>(X), for every op ∈ Σ, a ∈ A<sub>op</sub>, and κ : B<sub>op</sub> → Tree<sub>Σ</sub>(X).

We are abusing notation in a slight but standard way, by using op both as the name of an operation and as a tree-forming constructor. The elements of Tree<sub>Σ</sub>(X) are called *computation trees*: a leaf return x represents a pure computation returning a value x, while op(a, κ) represents an effectful computation that calls op with parameter a and continuation κ, which expects a result from B<sub>op</sub>.
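Such computation trees are ordinary inductive datatypes. As an illustration (ours, not from the paper), here is the free Σ-structure for a signature with a single operation flip, whose parameter set is a singleton and whose two-element arity set indexes the continuation, together with a helper listing the leaves of a tree:

```haskell
-- Computation trees over a signature with one operation flip,
-- parameter set 1 (so no parameter is stored) and arity set
-- {false, true}, which indexes the continuation.
data Tree x = Return x | Flip (Bool -> Tree x)

-- op(a, kappa), twice nested: flip two coins and count the trues.
twoFlips :: Tree Int
twoFlips = Flip (\b1 -> Flip (\b2 -> Return (fromEnum b1 + fromEnum b2)))

-- The leaves of a tree, in order.  This is purely structural:
-- no effect actually happens when inspecting the tree.
leaves :: Tree x -> [x]
leaves (Return x) = [x]
leaves (Flip k)   = leaves (k False) ++ leaves (k True)
```

Here `leaves twoFlips` is `[0, 1, 1, 2]`, one leaf per path through the two operation calls.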

An *algebraic theory* T = (Σ<sub>T</sub>, Eq<sub>T</sub>) is given by a *signature* Σ<sub>T</sub> and a set of *equations* Eq<sub>T</sub>. The equations Eq<sub>T</sub> express computational behaviour via interactions between operations, and are written in a suitable formalism, e.g., [30]. We explain these by way of examples, as the precise details do not matter for our purposes. Let **0** = {} be the empty set and **1** = {⋆} the standard singleton.

*Example 1.* Given a set C of possible states, the theory of C*-valued state* has two operations, whose somewhat unusual naming will become clear later on,

$$\mathsf{getenv} : \mathbb{1} \leadsto C, \qquad \mathsf{setenv} : C \leadsto \mathbb{1}$$

and the equations (where we elide appearances of ⋆):

$$\begin{aligned} \mathsf{getenv}(\lambda c.\,\mathsf{setenv}(c,\kappa)) &= \kappa, \qquad \mathsf{setenv}(c,\mathsf{getenv}\,\kappa) = \mathsf{setenv}(c,\kappa\,c),\\ \mathsf{setenv}(c,\mathsf{setenv}(c',\kappa)) &= \mathsf{setenv}(c',\kappa). \end{aligned}$$

For example, the second equation states that reading state right after setting it to c gives precisely c. The third equation states that setenv overwrites the state.
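These equations can be checked directly against the C-valued state monad that the theory presents (see the state monad later in this section). The quick Haskell sanity check below interprets getenv and setenv as generic effects on state-passing functions; the encoding is ours.

```haskell
-- The C-valued state monad St_C X = (C => X * C).
type St c x = c -> (x, c)

-- getenv's continuation receives the current state, unchanged.
getenv :: (c -> St c x) -> St c x
getenv kappa = \c -> kappa c c

-- setenv discards the incoming state and continues with the new one.
setenv :: c -> St c x -> St c x
setenv c' kappa = \_ -> kappa c'
```

All three equations then hold by unfolding the definitions; for instance, `setenv c (getenv kappa)` and `setenv c (kappa c)` denote the same state-passing function.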

*Example 2.* Given a set of exceptions E, the algebraic theory of E*-many exceptions* is given by a single operation raise : E ⇝ **0**, and no equations.

A T*-model*, also called a T*-algebra*, is a Σ<sub>T</sub>-structure which satisfies the equations in Eq<sub>T</sub>. The *free* T*-model* over a set X is constructed as the quotient

$$\mathrm{Free}_{\mathcal{T}}(X) = \mathrm{Tree}_{\Sigma_{\mathcal{T}}}(X)/{\sim}$$

by the Σ<sub>T</sub>-congruence ∼ generated by Eq<sub>T</sub>. Each op ∈ Σ<sub>T</sub> is interpreted in the free model as the map (a, κ) ↦ [op(a, κ)], where [−] denotes the ∼-equivalence class.

Free<sub>T</sub>(−) is the functor part of a *monad* on sets, whose *unit* at a set X is

$$X \xrightarrow{\ \mathsf{return}\ } \mathrm{Tree}_{\Sigma_{\mathcal{T}}}(X) \xrightarrow{\ [-]\ } \mathrm{Free}_{\mathcal{T}}(X).$$

The *Kleisli extension* for this monad is then the operation which lifts any map f : X → Free<sub>T</sub>(Y) to the map f<sup>†</sup> : Free<sub>T</sub>(X) → Free<sub>T</sub>(Y), given by

$$f^{\dagger}[\mathsf{return}\,x] \stackrel{\text{def}}{=} f\,x, \qquad f^{\dagger}[\mathsf{op}(a,\kappa)] \stackrel{\text{def}}{=} [\mathsf{op}(a, f^{\dagger} \circ \kappa)].$$

That is, f<sup>†</sup> traverses a computation tree and replaces each leaf return x with f x.

The preceding construction of free models and the monad may be retrofitted to an algebraic signature Σ, if we construe Σ as an algebraic theory with no equations. In this case ∼ is just equality, and so we may omit the quotient and the pesky equivalence classes. Thus the carrier of the free Σ-model is the set of well-founded trees Tree<sub>Σ</sub>(X), with the evident monad structure.
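In this equation-free case, Tree<sub>Σ</sub>(−) with the Kleisli extension above is precisely the free monad familiar from functional programming. A minimal sketch, with the signature packaged as a Haskell functor (this encoding, and the example signature `Out`, are ours):

```haskell
-- The free monad on a signature functor `sig`: Return is the unit,
-- and the Kleisli extension grafts f x onto each leaf `Return x`.
data Tree sig x = Return x | Op (sig (Tree sig x))

kleisli :: Functor sig => (x -> Tree sig y) -> Tree sig x -> Tree sig y
kleisli f (Return x) = f x
kleisli f (Op node)  = Op (fmap (kleisli f) node)

-- An example signature with one operation: an integer output event
-- whose arity set is a singleton, so it stores a single continuation.
data Out k = Out Int k

instance Functor Out where
  fmap g (Out n k) = Out n (g k)

-- Collect the outputs along a tree together with its final result.
collect :: Tree Out x -> ([Int], x)
collect (Return x)     = ([], x)
collect (Op (Out n k)) = let (ns, x) = collect k in (n : ns, x)
```

For instance, `kleisli (\x -> Op (Out x (Return (x + 1)))) (Op (Out 1 (Return 2)))` collects to `([1, 2], 3)`: the leaf `Return 2` was replaced by a further output.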

A fundamental insight of Plotkin and Power [25,28] was that many computational effects may be adequately described by algebraic theories, with the elements of free models corresponding to effectful computations. For example, the monads induced by the theories from Examples 1 and 2 are respectively isomorphic to the usual *state monad* St<sub>C</sub> X = (C ⇒ X × C) and the *exceptions monad* Exc<sub>E</sub> X = X + E.

Plotkin and Pretnar [30] further observed that the universal property of free models may be used to model a programming concept known as *handlers*. Given a T-model M and a map f : X → |M|, the universal property of the free T-model gives us a unique T-homomorphism f<sup>‡</sup> : Free<sub>T</sub>(X) → |M| satisfying

$$f^{\ddagger}[\mathsf{return}\,x] = f\,x, \qquad f^{\ddagger}[\mathsf{op}(a,\kappa)] = \mathsf{op}_{\mathcal{M}}(a, f^{\ddagger} \circ \kappa).$$

A handler for a theory T in a language such as Eff amounts to a model M whose carrier |M| is the carrier Free<sub>T′</sub>(Y) of the free model for some other theory T′, while the associated handling construct is the induced T-homomorphism Free<sub>T</sub>(X) → Free<sub>T′</sub>(Y). Thus handling transforms computations with effects T into computations with effects T′. There is however no restriction on how a handler implements an operation; in particular, it may use its continuation in an arbitrary fashion. We shall put the universal property of free models to good use as well, while making sure that continuations are always used affinely.

#### 2.2 Runners

Much like monads, handlers are useful for simulating computational effects, because they allow us to transform T -computations to T <sup>1</sup> -computations. However, eventually there has to be a "top level" where such transformations cease and actual computational effects happen. For these we need another concept, known as *runners* [35]. Runners are equivalent to the concept of *comodels* [27,31], which are "just models in the opposite category", although one has to apply the motto correctly by using powers and co-powers where seemingly exponentials and products would do. Without getting into the intricacies, let us spell out the definition.

Definition 1. A *runner* R for a signature Σ is given by a carrier set |R| together with, for each op ∈ Σ, a *co-operation* op̄<sub>R</sub> : A<sub>op</sub> → (|R| ⇒ B<sub>op</sub> × |R|).

Runners are usually defined to have co-operations in the equivalent uncurried form op̄<sub>R</sub> : A<sub>op</sub> × |R| → B<sub>op</sub> × |R|, but that is less convenient for our purposes.

Runners may be defined more generally for theories T, rather than just signatures, by requiring that the co-operations satisfy Eq<sub>T</sub>. We shall have no use for these, although we expect no obstacles in incorporating them into our work.

A runner tells us what to do when an effectful computation reaches the top-level runtime environment. Think of |R| as the set of configurations of the runtime environment. Given the current configuration c ∈ |R|, the operation op(a, κ) is executed as the corresponding co-operation op̄<sub>R</sub> a c, whose result (b, c′) ∈ B<sub>op</sub> × |R| gives the result b of the operation and the next runtime configuration c′. The continuation κ b then proceeds in runtime configuration c′.

It is not too difficult to turn this idea into a mathematical model. For any X, the co-operations induce a Σ-structure M with |M| = St<sub>|R|</sub> X = (|R| ⇒ X × |R|) and operations op<sub>M</sub> : A<sub>op</sub> × (B<sub>op</sub> ⇒ St<sub>|R|</sub> X) → St<sub>|R|</sub> X given by

$$\mathsf{op}_{\mathcal{M}}(a,\kappa) \stackrel{\text{def}}{=} \lambda c.\,\kappa\,(\pi_1(\overline{\mathsf{op}}_{\mathcal{R}}\,a\,c))\,(\pi_2(\overline{\mathsf{op}}_{\mathcal{R}}\,a\,c)).$$

We may then use the universal property of the free Σ-model to obtain a Σ-homomorphism r<sub>X</sub> : Tree<sub>Σ</sub>(X) → St<sub>|R|</sub> X satisfying the equations

$$\mathbf{r}_X(\mathsf{return}\,x) = \lambda c.\,(x,c), \qquad \mathbf{r}_X(\mathsf{op}(a,\kappa)) = \mathsf{op}_{\mathcal{M}}(a, \mathbf{r}_X \circ \kappa).$$

The map r<sub>X</sub> precisely captures the idea that a runner *runs computations* by transforming (static) computation trees into state-passing maps. Note how, in the above definition of op<sub>M</sub>, the continuation κ is used in a controlled way, as it appears precisely once, as the head of the outermost application. In terms of programming, this corresponds to linear use in a tail-call position.
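Specialised to the theory of state, the homomorphism r<sub>X</sub> unfolds to a direct interpreter that threads the runtime configuration through the tree, invoking each continuation exactly once, in tail position. The sketch below is ours; the concrete signature fixes the carrier to be the state type itself.

```haskell
-- Computation trees over the state signature, with carrier |R| = c:
-- getenv's co-operation returns the configuration, setenv's replaces it.
data Tree c x
  = Return x
  | GetEnv (c -> Tree c x)   -- getenv : 1 ~> C
  | SetEnv c (Tree c x)      -- setenv : C ~> 1

-- r_X : Tree_Sigma(X) -> St_{|R|} X.  Each continuation is used
-- exactly once, in tail position.
run :: Tree c x -> c -> (x, c)
run (Return x)    c = (x, c)
run (GetEnv k)    c = run (k c) c
run (SetEnv c' k) _ = run k c'

-- Read the configuration, double it, and return the old value.
prog :: Tree Int Int
prog = GetEnv (\c -> SetEnv (2 * c) (Return c))
```

Here `run prog 5` yields `(5, 10)`: the static tree has become a state-passing map.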

Runners are less ad hoc than they may seem. First, notice that op<sub>M</sub> is just the composition of the co-operation op̄<sub>R</sub> with the state monad's Kleisli extension of the continuation κ, and so is the standard way of turning *generic effects* into Σ-structures [26]. Second, the map r<sub>X</sub> is the component at X of a monad morphism r : Tree<sub>Σ</sub>(−) → St<sub>|R|</sub>. Møgelberg and Staton [21], as well as Uustalu [35], showed that the passage from a runner R to the corresponding monad morphism r forms a one-to-one correspondence between the former and the latter.

As defined, runners are too restrictive a model of top-level computation, because the only effect available to co-operations is state, but in practice the runtime environment may also signal errors and perform other effects, by calling its own runtime environment. We are led to the following generalisation.

Definition 2. For a signature Σ and monad T, a T*-runner* R for Σ, or just an *effectful runner*, is given by, for each op ∈ Σ, a *co-operation* op̄<sub>R</sub> : A<sub>op</sub> → T B<sub>op</sub>.

The correspondence between runners and monad morphisms still holds.

Proposition 3. *For a signature* Σ *and a monad* T*, the monad morphisms* Tree<sub>Σ</sub>(−) → T *are in one-to-one correspondence with* T*-runners for* Σ*.*

*Proof.* This is an easy generalisation of the correspondence for ordinary runners. Let us fix a signature Σ, and a monad T with unit η and Kleisli extension (−)<sup>†</sup>.

Let R be a T-runner for Σ. For any set X, R induces a Σ-structure M with |M| = T X and op<sub>M</sub> : A<sub>op</sub> × (B<sub>op</sub> ⇒ T X) → T X defined as op<sub>M</sub>(a, κ) = κ<sup>†</sup>(op̄<sub>R</sub> a). As before, the universal property of the free model Tree<sub>Σ</sub>(X) provides a unique Σ-homomorphism r<sub>X</sub> : Tree<sub>Σ</sub>(X) → T X, satisfying the equations

$$\mathbf{r}_X(\mathsf{return}\,x) = \eta_X(x), \qquad \mathbf{r}_X(\mathsf{op}(a,\kappa)) = \mathsf{op}_{\mathcal{M}}(a, \mathbf{r}_X \circ \kappa).$$

The maps r<sub>X</sub> collectively give us the desired monad morphism r induced by R.

Conversely, given a monad morphism θ : Tree<sub>Σ</sub>(−) → T, we may recover a T-runner R for Σ by defining the co-operations as op̄<sub>R</sub> a = θ<sub>B<sub>op</sub></sub>(op(a, λb. return b)). It is not hard to check that we have described a one-to-one correspondence. □
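As a concrete case of Proposition 3, take T to be the exceptions monad Exc<sub>E</sub> X = X + E (Haskell's Either): a T-runner implements each operation by a co-operation that may fail, and the induced monad morphism runs whole trees. The signature, the operation, and all names below are illustrative, not from the paper.

```haskell
-- A signature with a single integer-division operation, and
-- T X = Either String X as the exceptions monad (E = String).
data Tree x = Return x | Div (Int, Int) (Int -> Tree x)

-- The co-operation for the operation: it may fail instead of answering.
coDiv :: (Int, Int) -> Either String Int
coDiv (_, 0) = Left "division by zero"
coDiv (a, b) = Right (a `div` b)

-- The induced monad morphism r_X : Tree(X) -> T X: each operation node
-- is the Kleisli extension of its continuation applied to the co-operation.
run :: Tree x -> Either String x
run (Return x) = Right x
run (Div p k)  = coDiv p >>= run . k
```

Here `run (Div (10, 2) Return)` is `Right 5`, while any division by zero aborts the whole run with `Left "division by zero"`.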

# 3 Programming with runners

If ordinary runners are not general enough, the effectful ones are too general: parameterised by arbitrary monads T, they do not combine easily and they lack a clear notion of resource management. Thus, we now engineer more specific monads whose associated runners can be turned into a programming concept. While we give up complete generality, the monads presented below are still quite versatile, as they are parameterised by arbitrary algebraic signatures Σ, and so are extensible and support various combinations of effects.

#### 3.1 The user and kernel monads

Effectful source code running inside a runtime environment is just one example of a more general phenomenon in which effectful computations are enveloped by a layer that provides supervised access to external resources: a user process is controlled by a kernel, a web page by a browser, an operating system by hardware or a virtual machine, etc. We shall adopt the parlance of software systems, and refer to the two layers generically as *user* and *kernel* code. Since the two kinds of code need not, and will not, use the same effects, each will be described by its own algebraic theory and compute in its own monad.

We first address the kernel theory. Specifically, we look for an algebraic theory such that effectful runners for the induced monad satisfy the following desiderata:


The totality of external resources available to user code appears as a stateful external environment, even though it has no direct access to it. Thus, kernel computations should carry state. We achieve this by incorporating into the kernel theory the operations getenv and setenv, and equations for state from Example 1.

Apart from managing state, kernel code should have access to further effects, which may be true external effects, or some outer layer of runners. In either case, we should allow the kernel code to call operations from a given signature Σ.

Because kernel computations ought to be able to signal failure, we should include an exception mechanism. In practice, many programming languages and systems have two flavours of exceptions, variously called recoverable and fatal, checked and unchecked, exceptions and errors, etc. One kind, which we call just *exceptions*, is raised by kernel code when a situation requires special attention by user code. The other kind, which we call *signals*, indicates an unrecoverable condition that prevents normal execution of user code. These correspond precisely to the two standard ways of combining exceptions with state, namely the coproduct and the tensor of algebraic theories [11]. The coproduct simply adjoins exceptions $\mathsf{raise} : E \rightsquigarrow \mathbf{0}$ from Example 2 to the theory of state, while the tensor extends the theory of state with signals $\mathsf{kill} : S \rightsquigarrow \mathbf{0}$, together with the equations

$$\mathsf{getenv}(\lambda c.\,\mathsf{kill}\ s) = \mathsf{kill}\ s, \qquad\qquad \mathsf{setenv}(c, \mathsf{kill}\ s) = \mathsf{kill}\ s. \tag{1}$$

These equations say that a signal discards state, which makes it unrecoverable.

To summarise, the *kernel theory* $\mathcal{K}_{\Sigma,E,S,C}$ contains operations from a signature $\Sigma$, as well as state operations $\mathsf{getenv} : \mathbf{1} \rightsquigarrow C$, $\mathsf{setenv} : C \rightsquigarrow \mathbf{1}$, exceptions $\mathsf{raise} : E \rightsquigarrow \mathbf{0}$, and signals $\mathsf{kill} : S \rightsquigarrow \mathbf{0}$, with equations for state from Example 1, equations (1) relating state and signals, and for each operation $\mathsf{op} \in \Sigma$, the equations

$$\begin{aligned} \mathsf{getenv}(\lambda c.\,\mathsf{op}(a, \kappa\ c)) &= \mathsf{op}(a, \lambda b.\,\mathsf{getenv}(\lambda c.\,\kappa\ c\ b)),\\ \mathsf{setenv}(c, \mathsf{op}(a, \kappa)) &= \mathsf{op}(a, \lambda b.\,\mathsf{setenv}(c, \kappa\ b)), \end{aligned}$$

expressing that external operations do not interact with kernel state. It is not difficult to see that KΣ,E,S,C induces, up to isomorphism, the *kernel monad*

$$\mathsf{K}_{\Sigma,E,S,C}\, X \;\stackrel{\text{def}}{=}\; C \Rightarrow \mathrm{Tree}_{\Sigma}\big(((X + E) \times C) + S\big).$$
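For intuition, the kernel monad can be sketched in Python by taking $\Sigma$ empty, so that it collapses to $C \Rightarrow (((X + E) \times C) + S)$; the tagged-tuple encoding and helper names below are our assumptions, not the paper's. The sketch shows in code that signals discard the kernel state, as in equations (1).

```python
# Kernel monad at an empty signature: K X = C -> (((X + E) x C) + S),
# encoded as a function from the current state to a tagged outcome.
def ret(x):     return lambda c: ("return", x, c)
def raise_(e):  return lambda c: ("raise", e, c)
def kill(s):    return lambda c: ("kill", s)       # no final state: unrecoverable
def getenv():   return lambda c: ("return", c, c)
def setenv(v):  return lambda c: ("return", (), v)

def bind(k, f):
    """Kleisli sequencing: raises and kills short-circuit past f."""
    def go(c):
        out = k(c)
        if out[0] == "return":
            return f(out[1])(out[2])
        return out
    return go

# setenv(c, kill s) = kill s : state written before a signal is lost.
prog = bind(setenv(99), lambda _: kill("crash"))
```

Running `prog` at any initial state yields only the signal, with the freshly written state `99` nowhere to be seen, which is precisely what makes signals unrecoverable.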

How about user code? It can of course call operations from a signature $\Sigma$ (not necessarily the same as the one for kernel code), and because we intend it to handle exceptions, it might as well have the ability to raise them. However, user code knows nothing about signals and kernel state. Thus, we choose the *user theory* $\mathcal{U}_{\Sigma,E}$ to be the algebraic theory with operations $\Sigma$, exceptions $\mathsf{raise} : E \rightsquigarrow \mathbf{0}$, and no equations. This theory induces the *user monad* $\mathsf{U}_{\Sigma,E}\, X \stackrel{\text{def}}{=} \mathrm{Tree}_{\Sigma}(X + E)$.

#### 3.2 Runners as a programming construct

In this section, we turn the ideas presented so far into programming constructs. We strive for a realistic result, but when faced with several design options, we prefer simplicity and semantic clarity. We focus here on translating the central concepts, and postpone various details to §4, where we present a full calculus.

We codify the idea of user and kernel computations by having syntactic categories for each of them, as well as one for values. We use letters M, N to indicate user computations, K, L for kernel computations, and V , W for values.

User and kernel code raise exceptions with operation raise, and catch them with exception handlers based on Benton and Kennedy's *exceptional syntax* [7],

$$\text{try } M \text{ with } \{ \text{return } x \mapsto N, \dots, \text{raise } e \mapsto N\_e, \dots \},$$

and analogously for kernel code. The familiar binding construct $\mathsf{let}\ x = M\ \mathsf{in}\ N$ is simply shorthand for $\mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\ x \mapsto N, \dots, \mathsf{raise}\ e \mapsto \mathsf{raise}\ e, \dots\}$.
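The handling construct and the let-as-sugar reading can be sketched on operation trees in Python (an illustration; the tree encoding and helper names are ours):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    value: Any

@dataclass
class Raise:
    exc: str

@dataclass
class Op:
    name: str
    arg: Any
    cont: Callable[[Any], Any]

def try_with(m, on_return, handlers):
    """try m with {return x -> on_return(x), raise e -> handlers[e]}:
    intercepts both returned values and raised exceptions."""
    if isinstance(m, Return):
        return on_return(m.value)
    if isinstance(m, Raise):
        return handlers.get(m.exc, m)   # unhandled exceptions re-raise
    return Op(m.name, m.arg, lambda b: try_with(m.cont(b), on_return, handlers))

def let(m, f):
    """let x = m in f(x): the shorthand, re-raising every exception."""
    return try_with(m, f, {})
```

Note how `let` is literally `try` with every exception re-raised, and how handlers commute past operation nodes, which is the essence of exceptional syntax.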

As a programming concept, a runner R takes the form

$$\{ (\mathsf{op}\, x \mapsto K\_{\mathsf{op}})\_{\mathsf{op}\in\Sigma} \}\_C,$$

where each $K_{\mathsf{op}}$ is a kernel computation, with the variable $x$ bound in $K_{\mathsf{op}}$, so that each clause $\mathsf{op}\ x \mapsto K_{\mathsf{op}}$ determines a co-operation for the kernel monad. The subscript $C$ indicates the type of the state used by the kernel code $K_{\mathsf{op}}$.

The corresponding elimination form is a handling-like construct

$$\mathsf{using}\ R \mathbin{@} V\ \mathsf{run}\ M\ \mathsf{finally}\ F,\tag{2}$$

which uses the co-operations of runner R "at" initial kernel state V to run user code M, and finalises its return value, exceptions, and signals with F, see (3) below. When user code M calls an operation op, the enveloping run construct runs the corresponding co-operation Kop of R. While doing so, Kop might raise exceptions. But not every exception makes sense for every operation, and so we assign to each operation op a set of exceptions Eop which the co-operations implementing it may raise, by augmenting its operation signature with Eop, as

$$\mathsf{op} : A\_{\mathsf{op}} \leadsto B\_{\mathsf{op}} \mid E\_{\mathsf{op}} \mathsf{.}$$

An exception raised by the co-operation $K_{\mathsf{op}}$ propagates back to the operation call in the user code. Therefore, an operation call should have not only a continuation $x.\,M$ receiving a result, but also continuations $N_e$, one for each $e \in E_{\mathsf{op}}$,

$$\textsf{op}\left(V,\left(x\,.M\right),\left(N\_{e}\right)\_{e\in E\_{\textsf{op}}}\right).$$

If $K_{\mathsf{op}}$ returns a value $b \in B_{\mathsf{op}}$, the execution proceeds as $M[b/x]$, and as $N_e$ if $K_{\mathsf{op}}$ raises an exception $e \in E_{\mathsf{op}}$. In examples, we use the generic versions of operations [26], written $\mathsf{op}\ V$, which pass on return values and re-raise exceptions.

One can pass exceptions back to operation calls also in a language with handlers, such as Eff, by changing the signatures of operations to $A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} + E_{\mathsf{op}}$, and implementing the exception mechanism by hand, so that every operation call is followed by a case distinction on $B_{\mathsf{op}} + E_{\mathsf{op}}$. One is reminded of how operating system calls communicate errors back to user code as exceptional values.

A co-operation Kop may also send a signal, in which case the rest of the user code M is skipped and the control proceeds directly to the corresponding case of the finalisation part F of the run construct (2), whose syntactic form is

$$\{\mathsf{return}\ x \mathbin{@} c \mapsto N, \dots, \mathsf{raise}\ e \mathbin{@} c \mapsto N_e, \dots, \mathsf{kill}\ s \mapsto N_s, \dots\}. \tag{3}$$

Specifically, if M returns a value v, then N is evaluated with x bound to v and c to the final kernel state; if M raises an exception e (either directly or indirectly via a co-operation of R), then $N_e$ is executed, again with c bound to the final kernel state; and if a co-operation of R sends a signal s, then $N_s$ is executed.
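A minimal Python interpreter for the run construct (our sketch; the outcome encoding, the toy `write` co-operation, and its 10-character quota are assumptions) ties together co-operations, exception continuations, and finalisation:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class Return:
    value: Any

@dataclass
class Raise:
    exc: str

@dataclass
class Op:
    name: str
    arg: Any
    cont: Callable[[Any], Any]                               # x. M
    exc_conts: Dict[str, Any] = field(default_factory=dict)  # N_e, e in E_op

def run(m, runner, state, fin):
    """'using runner @ state run m finally fin': co-operations return
    ('return', b, c), ('raise', e, c), or ('kill', s); fin holds the
    finalisation clauses of form (3)."""
    while isinstance(m, Op):
        out = runner[m.name](m.arg, state)
        if out[0] == "return":
            _, b, state = out
            m = m.cont(b)
        elif out[0] == "raise":
            _, e, state = out
            m = m.exc_conts[e]
        else:                                   # a signal skips the rest of m
            return fin["kill"][out[1]]
    if isinstance(m, Return):
        return fin["return"](m.value, state)
    return fin["raise"][m.exc](state)

# Toy runner: the kernel state is the text written so far; writing past a
# 10-character quota raises the (hypothetical) QuotaExceeded exception.
def write_coop(text, buf):
    if len(buf) + len(text) > 10:
        return ("raise", "QuotaExceeded", buf)
    return ("return", (), buf + text)

prog = Op("write", "hello", lambda _:
           Op("write", " world!!!!", lambda _: Return("ok"),
              exc_conts={"QuotaExceeded": Return("truncated")}))

result = run(prog, {"write": write_coop}, "", {
    "return": lambda x, c: (x, c),
    "raise": {"QuotaExceeded": lambda c: ("failed", c)},
    "kill": {},
})
```

The second write overruns the quota, so its `QuotaExceeded` continuation takes over while the kernel state keeps only the first write, and finalisation then receives both the value and the final state.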

*Example 4.* In anticipation of setting up the complete calculus we show how one can work with files. The language implementors can provide an operation open which opens a file for writing and returns its file handle, an operation close which closes a file handle, and a runner fileIO that implements writing. Let us further suppose that fileIO may raise an exception QuotaExceeded if a write exceeds the user disk quota, and send a signal IOError if an unrecoverable external error occurs. The following code illustrates how to guarantee proper closing of the file:

```
using fileIO @ (open "hello.txt") run
 write "Hello, world."
finally {
 return x @ fh ↦ close fh,
 raise QuotaExceeded @ fh ↦ close fh,
 kill IOError ↦ return () }
```
Notice that the user code does not have direct access to the file handle. Instead, the runner holds it in its state, where it is available to the co-operation that implements write. The finalisation block gets access to the file handle upon successful completion and upon a raised exception, so it can close the file; but when a signal occurs, the finalisation cannot close the file, nor should it attempt to do so.

We also mention that the code "cheats" by placing the call to open in a position where a value is expected. We should have let-bound the file handle returned by open outside the run construct, which would make it clear that opening the file happens *before* this construct (and that open is *not* handled by the finalisation), but would also expose the file handle. Since there are clear advantages to keeping the file handle inaccessible, a realistic language should accept the above code and hoist computations from value positions automatically.

# 4 A calculus for programming with runners

Inspired by the semantic notion of runners and the ideas of the previous section, we now present a calculus for programming with co-operations and runners, called λcoop. It is a low-level fine-grain call-by-value calculus [19], and as such could inspire an intermediate language that a high-level language is compiled to.

#### 4.1 Types

The types of λcoop are shown in Fig. 1. The *ground types* contain *base types*, and are closed under finite sums and products. These are used in operation signatures and as types of kernel state. (Allowing arbitrary types in either of these entails substantial complications that can be dealt with but are tangential to our goals.) Ground types can also come with corresponding constant symbols $f$, each associated with a fixed *constant signature* $f : (A_1, \dots, A_n) \to B$.

We assume a supply of operation symbols $\mathcal{O}$, exception names $\mathcal{E}$, and signal names $\mathcal{S}$. Each operation symbol $\mathsf{op} \in \mathcal{O}$ is equipped with an *operation signature* $A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}}\ !\ E_{\mathsf{op}}$, which specifies its parameter type $A_{\mathsf{op}}$ and arity type $B_{\mathsf{op}}$, and the exceptions $E_{\mathsf{op}}$ that the corresponding co-operations may raise in runners.

The *value types* extend ground types with two function types, and a type of runners. The *user function type* $X \to Y\ !\ (\Sigma, E)$ classifies functions taking arguments of type $X$ to computations classified by the *user (computation) type* $Y\ !\ (\Sigma, E)$, i.e., those that return values of type $Y$, and may call operations $\Sigma$ and raise exceptions $E$. Similarly, the *kernel function type* $X \to Y ↯ (\Sigma, E, S, C)$ classifies functions taking arguments of type $X$ to computations classified by the *kernel (computation) type* $Y ↯ (\Sigma, E, S, C)$, i.e., those that return values of type $Y$, and may call operations $\Sigma$, raise exceptions $E$, send signals $S$, and use state of type $C$. We note that the ingredients for user and kernel types correspond precisely to the parameters of the user monad $\mathsf{U}_{\Sigma,E}$ and the kernel monad $\mathsf{K}_{\Sigma,E,S,C}$ from §3.1. Finally, the *runner type* $\Sigma \Rightarrow (\Sigma', S, C)$ classifies runners that implement co-operations for the operations $\Sigma$ as kernel computations which use operations $\Sigma'$, send signals $S$, and use state of type $C$.

```
Ground type          A, B, C ::= b | unit | empty | A × B | A + B
Constant signature   f : (A₁, …, Aₙ) → B

Signature            Σ ::= {op₁, op₂, …, opₙ} ⊆ 𝒪
Exception set        E ::= {e₁, e₂, …, eₙ} ⊆ ℰ
Signal set           S ::= {s₁, s₂, …, sₙ} ⊆ 𝒮
Operation signature  op : A_op ⇝ B_op ! E_op

Value type           X, Y, Z ::= A                ground type
                               | X × Y            product type
                               | X + Y            sum type
                               | X → Y ! U        user function type
                               | X → Y ↯ K        kernel function type
                               | Σ ⇒ (Σ′, S, C)   runner type

User (computation) type    X ! U  where U = (Σ, E)
Kernel (computation) type  X ↯ K  where K = (Σ, E, S, C)
```

Fig. 1. The types of λcoop.

#### 4.2 Values and computations

The syntax of terms is shown in Fig. 2. The usual fine-grain call-by-value stratification of terms into pure values and effectful computations is present, except that we further distinguish between *user* and *kernel* computations.

**Values** Among the values are variables, constants for ground types, and constructors for sums and products. There are two kinds of functions, for abstracting over user and kernel computations. A *runner* is a value of the form

$$\{ (\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op}\in\Sigma} \}_C.$$

It implements co-operations for operations op as kernel computations Kop, with x bound in Kop. The type annotation C specifies the type of the state that Kop uses. Note that C ranges over ground types, a restriction that allows us to define a naive set-theoretic semantics. We sometimes omit these type annotations.

**User and kernel computations** The user and kernel computations both have pure computations, function application, exception raising and handling, standard elimination forms, and operation calls. Note that the typing annotations on some of these differ according to their mode. For instance, a user operation call is annotated with the result type $X$, whereas the annotation $X \mathbin{@} C$ on a kernel operation call also specifies the kernel state type $C$.

```
Values

V, W ::= x                                      variable
       | f(V₁, …, Vₙ)                           ground constant
       | ()                                     unit
       | (V, W)                                 pair
       | inl_{X,Y} V  |  inr_{X,Y} V            injection
       | fun (x : X) ↦ M                        user function
       | funK (x : X) ↦ K                       kernel function
       | {(op x ↦ K_op)_{op∈Σ}}_C               runner

User computations

M, N ::= return V                               value
       | V W                                    application
       | try M with {return x ↦ N,
                     (raise e ↦ N_e)_{e∈E}}     exception handler
       | match V with {(x, y) ↦ M}              product elimination
       | match V with {}_X                      empty elimination
       | match V with {inl x ↦ M, inr y ↦ N}    sum elimination
       | op_X(V, (x. M), (N_e)_{e∈E_op})        operation call
       | raise_X e                              raise exception
       | using V @ W run M finally F            running user code
       | kernel K @ W finally F                 switch to kernel mode

F ::= {return x @ c ↦ N, (raise e @ c ↦ N_e)_{e∈E}, (kill s ↦ N_s)_{s∈S}}

Kernel computations

K, L ::= return_C V                             value
       | V W                                    application
       | try K with {return x ↦ L,
                     (raise e ↦ L_e)_{e∈E}}     exception handler
       | match V with {(x, y) ↦ K}              product elimination
       | match V with {}_{X@C}                  empty elimination
       | match V with {inl x ↦ K, inr y ↦ L}    sum elimination
       | op_X(V, (x. K), (L_e)_{e∈E_op})        operation call
       | raise_{X@C} e                          raise exception
       | kill_{X@C} s                           send signal
       | getenv_C (c. K)                        get kernel state
       | setenv(V, K)                           set kernel state
       | user M with {return x ↦ K,
                      (raise e ↦ L_e)_{e∈E}}    switch to user mode
```

Fig. 2. Values, user computations, and kernel computations of λcoop.

The binding construct $\mathsf{let}_{X!E}\ x = M\ \mathsf{in}\ N$ is not part of the syntax, but is an abbreviation for $\mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\ x \mapsto N, (\mathsf{raise}\ e \mapsto \mathsf{raise}_X\ e)_{e \in E}\}$, and there is an analogous one for kernel computations. We often drop the annotation $X!E$.

Some computations are specific to one or the other mode. Only the kernel mode may send a signal with kill, and manipulate state with getenv and setenv, but only the user mode has the run construct from §3.2. Finally, each mode has the ability to "context switch" to the other one. The kernel computation

```
user M with {return x ↦ K, (raise e ↦ L_e)_{e∈E}}
```
runs a user computation M and handles the returned value and leftover exceptions with kernel computations K and Le. Conversely, the user computation

```
kernel K @ W finally {return x @ c ↦ M, (raise e @ c ↦ N_e)_{e∈E}, (kill s ↦ N_s)_{s∈S}}
```
runs kernel computation K with initial state W, and handles the returned value, and leftover exceptions and signals with user computations M, $N_e$, and $N_s$.
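The `kernel … finally` construct can be sketched over the same kind of state-passing kernel computations as before (a Python illustration; the outcome encoding and helper names are our assumptions):

```python
# Kernel computations as functions from state to a tagged outcome, and the
# user-mode construct 'kernel K @ W finally {...}' dispatching on the three
# possible kernel outcomes.
def ret(x):        return lambda c: ("return", x, c)
def kill(s):       return lambda c: ("kill", s)
def getenv(K):     return lambda c: K(c)(c)        # getenv (c. K)
def setenv(v, K):  return lambda _c: K(v)          # setenv (V, K)

def kernel_finally(k, w, fin):
    """Run kernel computation k at initial state w, then hand its result,
    exception, or signal to the matching finally clause."""
    out = k(w)
    if out[0] == "return":
        return fin["return"](out[1], out[2])
    if out[0] == "raise":
        return fin["raise"][out[1]](out[2])
    return fin["kill"][out[1]]

# increment the kernel state, then report the new value to user mode
k = getenv(lambda c: setenv(c + 1, getenv(lambda c2: ret(c2))))
```

The dispatch mirrors the finalisation form (3): values and exceptions come paired with a final state, while a signal arrives without one.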

#### 4.3 Type system

We equip λcoop with a type system akin to type and effect systems for algebraic effects and handlers [3,7,12]. We are experimenting with resource control, so it makes sense for the type system to tightly control resources. Consequently, our effect system does not allow effects to be implicitly propagated outwards.

In §4.1, we assumed that each operation op P O is equipped with some fixed operation signature op : Aop - Bop ! Eop. We also assumed a fixed constant signature f : pA1,...,Anq Ñ B for each ground constant f. We consider this information to be part of the type system and say no more about it.

Values, user computations, and kernel computations each have a corresponding *typing judgement* form and a *subtyping relation*, given by

$$\begin{aligned} &\Gamma \vdash V : X, \qquad &&\Gamma \vdash M : X\ !\ \mathcal{U}, \qquad &&\Gamma \vdash K : X ↯ \mathcal{K},\\ &X \sqsubseteq Y, \qquad &&X\ !\ \mathcal{U} \sqsubseteq Y\ !\ \mathcal{U}', \qquad &&X ↯ \mathcal{K} \sqsubseteq Y ↯ \mathcal{K}', \end{aligned}$$

where Γ is a *typing context* $x_1 : X_1, \dots, x_n : X_n$. The effect information is an over-approximation, i.e., M and K employ *at most* the effects described by $\mathcal{U}$ and $\mathcal{K}$. The complete rules for these judgements are given in the online appendix. We comment here only on the rules that are peculiar to λcoop, see Fig. 3.

Subtyping of ground types Sub-Ground is trivial, as it relates only equal types. Subtyping of runners Sub-Runner and kernel computations Sub-Kernel requires equality of the kernel state types $C$ and $C'$ because state is used invariantly in the kernel monad. We leave it for future work to replace $C = C'$ with a *lens* [10] from $C'$ to $C$, i.e., maps $C' \to C$ and $C' \times C \to C'$ satisfying state

$$\textsc{Sub-Ground}\ \frac{}{A \sqsubseteq A} \qquad \textsc{Sub-Runner}\ \frac{\Sigma_1' \subseteq \Sigma_1 \quad \Sigma_2 \subseteq \Sigma_2' \quad S \subseteq S' \quad C = C'}{\Sigma_1 \Rightarrow (\Sigma_2, S, C) \sqsubseteq \Sigma_1' \Rightarrow (\Sigma_2', S', C')}$$

$$\textsc{Sub-Kernel}\ \frac{X \sqsubseteq X' \quad \Sigma \subseteq \Sigma' \quad E \subseteq E' \quad S \subseteq S' \quad C = C'}{X ↯ (\Sigma, E, S, C) \sqsubseteq X' ↯ (\Sigma', E', S', C')}$$

$$\textsc{TyValue-Runner}\ \frac{\big(\Gamma, x : A_{\mathsf{op}} \vdash K_{\mathsf{op}} : B_{\mathsf{op}} ↯ (\Sigma', E_{\mathsf{op}}, S, C)\big)_{\mathsf{op} \in \Sigma}}{\Gamma \vdash \{(\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C : \Sigma \Rightarrow (\Sigma', S, C)}$$

$$\textsc{TyUser-Try}\ \frac{\Gamma \vdash M : X\ !\ (\Sigma, E) \qquad \Gamma, x : X \vdash N : Y\ !\ (\Sigma, E') \qquad \big(\Gamma \vdash N_e : Y\ !\ (\Sigma, E')\big)_{e \in E}}{\Gamma \vdash \mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\ x \mapsto N, (\mathsf{raise}\ e \mapsto N_e)_{e \in E}\} : Y\ !\ (\Sigma, E')}$$

$$\textsc{TyUser-Run}\ \frac{\begin{array}{c} F = \{\mathsf{return}\ x \mathbin{@} c \mapsto N, (\mathsf{raise}\ e \mathbin{@} c \mapsto N_e)_{e \in E}, (\mathsf{kill}\ s \mapsto N_s)_{s \in S}\}\\ \Gamma \vdash V : \Sigma \Rightarrow (\Sigma', S, C) \qquad \Gamma \vdash W : C \qquad \Gamma \vdash M : X\ !\ (\Sigma, E)\\ \Gamma, x : X, c : C \vdash N : Y\ !\ (\Sigma', E') \qquad \big(\Gamma, c : C \vdash N_e : Y\ !\ (\Sigma', E')\big)_{e \in E} \qquad \big(\Gamma \vdash N_s : Y\ !\ (\Sigma', E')\big)_{s \in S} \end{array}}{\Gamma \vdash \mathsf{using}\ V \mathbin{@} W\ \mathsf{run}\ M\ \mathsf{finally}\ F : Y\ !\ (\Sigma', E')}$$

$$\textsc{TyUser-Op}\ \frac{\mathcal{U} = (\Sigma, E) \qquad \mathsf{op} \in \Sigma \qquad \Gamma \vdash V : A_{\mathsf{op}} \qquad \Gamma, x : B_{\mathsf{op}} \vdash M : X\ !\ \mathcal{U} \qquad \big(\Gamma \vdash N_e : X\ !\ \mathcal{U}\big)_{e \in E_{\mathsf{op}}}}{\Gamma \vdash \mathsf{op}_X(V, (x.\,M), (N_e)_{e \in E_{\mathsf{op}}}) : X\ !\ \mathcal{U}}$$

$$\textsc{TyKernel-Op}\ \frac{\mathcal{K} = (\Sigma, E, S, C) \qquad \mathsf{op} \in \Sigma \qquad \Gamma \vdash V : A_{\mathsf{op}} \qquad \Gamma, x : B_{\mathsf{op}} \vdash K : X ↯ \mathcal{K} \qquad \big(\Gamma \vdash L_e : X ↯ \mathcal{K}\big)_{e \in E_{\mathsf{op}}}}{\Gamma \vdash \mathsf{op}_X(V, (x.\,K), (L_e)_{e \in E_{\mathsf{op}}}) : X ↯ \mathcal{K}}$$

$$\textsc{TyUser-Kernel}\ \frac{\begin{array}{c} F = \{\mathsf{return}\ x \mathbin{@} c \mapsto N, (\mathsf{raise}\ e \mathbin{@} c \mapsto N_e)_{e \in E}, (\mathsf{kill}\ s \mapsto N_s)_{s \in S}\}\\ \Gamma \vdash K : X ↯ (\Sigma, E, S, C) \qquad \Gamma \vdash W : C \qquad \Gamma, x : X, c : C \vdash N : Y\ !\ (\Sigma, E')\\ \big(\Gamma, c : C \vdash N_e : Y\ !\ (\Sigma, E')\big)_{e \in E} \qquad \big(\Gamma \vdash N_s : Y\ !\ (\Sigma, E')\big)_{s \in S} \end{array}}{\Gamma \vdash \mathsf{kernel}\ K \mathbin{@} W\ \mathsf{finally}\ F : Y\ !\ (\Sigma, E')}$$

$$\textsc{TyKernel-User}\ \frac{\mathcal{K} = (\Sigma, E', S, C) \qquad \Gamma \vdash M : X\ !\ (\Sigma, E) \qquad \Gamma, x : X \vdash K : Y ↯ \mathcal{K} \qquad \big(\Gamma \vdash L_e : Y ↯ \mathcal{K}\big)_{e \in E}}{\Gamma \vdash \mathsf{user}\ M\ \mathsf{with}\ \{\mathsf{return}\ x \mapsto K, (\mathsf{raise}\ e \mapsto L_e)_{e \in E}\} : Y ↯ \mathcal{K}}$$

Fig. 3. Selected typing and subtyping rules.

equations analogous to Example 1. It has been observed [24,31] that such a lens in fact amounts to an ordinary runner for C-valued state.
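To illustrate the observation, the following Python sketch (ours, with an assumed record-shaped $C'$) packages a lens as a runner for $C$-valued state: its co-operations interpret getenv and setenv over the larger state $C'$.

```python
# A lens from C' to C: view : C' -> C and update : C' x C -> C'.
# Packaged as a runner, its co-operations interpret C-valued getenv/setenv
# in the state monad C' -> (- x C').
view   = lambda big: big["count"]                    # C' -> C
update = lambda big, small: {**big, "count": small}  # C' x C -> C'

lens_runner = {
    "getenv": lambda _arg: (lambda big: (view(big), big)),
    "setenv": lambda small: (lambda big: ((), update(big, small))),
}

# Running 'setenv (getenv () + 1)' through the lens touches only the focus.
def tick(big):
    c, big = lens_runner["getenv"](None)(big)
    _, big = lens_runner["setenv"](c + 1)(big)
    return big
```

The rest of the record passes through untouched, which is exactly the behaviour the lens laws (analogues of the state equations of Example 1) guarantee.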

The rules TyUser-Op and TyKernel-Op govern operation calls, where we have a success continuation which receives a value returned by a co-operation, and exceptional continuations which receive exceptions raised by co-operations.

The rule TyUser-Run requires that the runner V implements *all* the operations M can use, meaning that operations are *not* implicitly propagated outside a run block (which is different from how handlers are sometimes implemented). Of course, the co-operations of the runner may call further external operations, as recorded by the signature $\Sigma'$. Similarly, we require the finally block F to intercept all exceptions and signals that might be produced by the co-operations of V or the user code M. Such strict control is exercised throughout. For example, in TyUser-Run, TyUser-Kernel, and TyKernel-User we catch all the exceptions and signals that the code might produce. One should judiciously relax these requirements in a language that is presented to the programmer, and allow re-raising and re-sending clauses to be automatically inserted.

#### 4.4 Equational theory

We present λcoop as an *equational calculus*, i.e., the interactions between its components are described by equations. Such a presentation makes it easy to reason about program equivalence. There are three equality judgements

$$\Gamma \vdash V \equiv W : X, \qquad \Gamma \vdash M \equiv N : X\ !\ \mathcal{U}, \qquad \Gamma \vdash K \equiv L : X ↯ \mathcal{K}.$$

It is presupposed that we only compare well-typed expressions with the indicated types. For the most part, the context and the type annotation on judgements will play no significant role, and so we shall drop them whenever possible.

We comment on the computational equations for constructs characteristic of λcoop, and refer the reader to the online appendix for other equations. When read left-to-right, these equations explain the operational meaning of programs.

Of the three equations for run, the first two specify that returned values and raised exceptions are handled by the corresponding clauses,

$$\begin{aligned} \mathsf{using}\ V \mathbin{@} W\ \mathsf{run}\ (\mathsf{return}\ V')\ \mathsf{finally}\ F &\equiv N[V'/x, W/c],\\ \mathsf{using}\ V \mathbin{@} W\ \mathsf{run}\ (\mathsf{raise}\ e)\ \mathsf{finally}\ F &\equiv N_e[W/c], \end{aligned}$$

where $F \stackrel{\text{def}}{=} \{\mathsf{return}\ x \mathbin{@} c \mapsto N, (\mathsf{raise}\ e \mathbin{@} c \mapsto N_e)_{e \in E}, (\mathsf{kill}\ s \mapsto N_s)_{s \in S}\}$. The third equation below relates running an operation $\mathsf{op}$ with executing the corresponding co-operation $K_{\mathsf{op}}$, where $R$ stands for the runner $\{(\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op}\in\Sigma}\}_C$:

$$\begin{aligned} &\mathsf{using}\ R \mathbin{@} W\ \mathsf{run}\ \mathsf{op}_X(V, (x.\,M), (N'_{e'})_{e'\in E_{\mathsf{op}}})\ \mathsf{finally}\ F \equiv\\ &\qquad \mathsf{kernel}\ K_{\mathsf{op}}[V/x] \mathbin{@} W\ \mathsf{finally}\\ &\qquad\quad \{\mathsf{return}\ x \mathbin{@} c' \mapsto (\mathsf{using}\ R \mathbin{@} c'\ \mathsf{run}\ M\ \mathsf{finally}\ F),\\ &\qquad\quad\ (\mathsf{raise}\ e' \mathbin{@} c' \mapsto (\mathsf{using}\ R \mathbin{@} c'\ \mathsf{run}\ N'_{e'}\ \mathsf{finally}\ F))_{e'\in E_{\mathsf{op}}},\\ &\qquad\quad\ (\mathsf{kill}\ s \mapsto N_s)_{s\in S}\} \end{aligned}$$

Because Kop is kernel code, it is executed in kernel mode, whose finally clauses specify what happens afterwards: if Kop returns a value, or raises an exception, execution continues with a suitable continuation, with R wrapped around it; and if Kop sends a signal, the corresponding finalisation code from F is evaluated.

The next bundle describes how kernel code is executed within user code:

$$\begin{aligned} \mathsf{kernel}\ (\mathsf{return}_C\ V) \mathbin{@} W\ \mathsf{finally}\ F &\equiv N[V/x, W/c],\\ \mathsf{kernel}\ (\mathsf{raise}_{X@C}\ e) \mathbin{@} W\ \mathsf{finally}\ F &\equiv N_e[W/c],\\ \mathsf{kernel}\ (\mathsf{kill}_{X@C}\ s) \mathbin{@} W\ \mathsf{finally}\ F &\equiv N_s,\\ \mathsf{kernel}\ (\mathsf{getenv}_C(c.K)) \mathbin{@} W\ \mathsf{finally}\ F &\equiv \mathsf{kernel}\ K[W/c] \mathbin{@} W\ \mathsf{finally}\ F,\\ \mathsf{kernel}\ (\mathsf{setenv}(V, K)) \mathbin{@} W\ \mathsf{finally}\ F &\equiv \mathsf{kernel}\ K \mathbin{@} V\ \mathsf{finally}\ F. \end{aligned}$$

We also have an equation stating that an operation called in kernel mode propagates out to user mode, with its continuations wrapped in kernel mode:

$$\begin{aligned} &\mathsf{kernel}\ \mathsf{op}_X(V, (x.\,K), (L_{e'})_{e'\in E}) \mathbin{@} W\ \mathsf{finally}\ F \equiv\\ &\qquad \mathsf{op}_X\big(V, (x.\,\mathsf{kernel}\ K \mathbin{@} W\ \mathsf{finally}\ F), (\mathsf{kernel}\ L_{e'} \mathbin{@} W\ \mathsf{finally}\ F)_{e'\in E}\big). \end{aligned}$$

Similar equations govern execution of user computations in kernel mode.

The remaining equations include standard βη-equations for exception handling [7], deconstruction of products and sums, algebraicity equations for operations [33], and the equations of the kernel theory from §3.1, describing how getenv and setenv work, and how they interact with signals and other operations.

# 5 Denotational semantics

We provide a coherent denotational semantics for λcoop, and prove it sound with respect to the equational theory given in §4.4. Having eschewed all forms of recursion, we may afford to work simply over the category of sets and functions, while noting that there is no obstacle to incorporating recursion at all levels and switching to domain theory, similarly to the treatment of effect handlers in [3].

#### 5.1 Semantics of types

The meaning of terms is most naturally defined by structural induction on their typing derivations, which however are not unique in λcoop due to subsumption rules. Thus we must worry about devising a *coherent* semantics, i.e., one in which all derivations of a judgement get the same meaning. We follow prior work on the semantics of effect systems for handlers [3], and proceed by first giving a *skeletal* semantics of λcoop in which derivations are manifestly unique because the effect information is unrefined. We then use the skeletal semantics as the frame upon which rests a refinement-style coherent semantics of the effectful types of λcoop.

The *skeletal* types are like λcoop's types, but with all effect information erased. In particular, the ground types A, and hence the kernel state types C, do not change as they contain no effect information. The skeletal value types are

$$P, Q ::= A \mid \mathsf{unit} \mid \mathsf{empty} \mid P \times Q \mid P + Q \mid P \to Q\,! \mid P \to Q ↯ C \mid \mathsf{runner}\ C.$$

The skeletal versions of the user and kernel types are $P\,!$ and $P ↯ C$, respectively. It is best to think of the skeletal types as ML-style types which implicitly over-approximate effect information by "any effect is possible", an idea which is mathematically expressed by their semantics, as explained below.

First of all, the semantics of ground types is straightforward. One only needs to provide sets denoting the base types b, after which the ground types receive the standard set-theoretic meaning, as given in Fig. 4.

Recall that $\mathcal{O}$, $\mathcal{S}$, and $\mathcal{E}$ are the sets of all operations, signals, and exceptions, and that each $\mathsf{op} \in \mathcal{O}$ has a signature $\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}}\ !\ E_{\mathsf{op}}$. Let us additionally assume that there is a distinguished operation $℧ \in \mathcal{O}$ with signature $℧ : \mathbf{1} \rightsquigarrow \mathbf{0}\ !\ \mathbf{0}$ (otherwise we adjoin it to $\mathcal{O}$). It ensures that the denotations of skeletal user and kernel types are *pointed* sets, while operationally $℧$ indicates a *runtime error*.

Next, we define the *skeletal user and kernel monads* as

$$\begin{aligned} \mathsf{U}^{\mathfrak{s}}X & \stackrel{\text{def}}{=} \mathsf{U}\_{\mathcal{O},\mathcal{E}}X = \text{Tree}\_{\mathcal{O}}\left(X + \mathcal{E}\right),\\ \mathsf{K}\_{C}^{\mathfrak{s}}X & \stackrel{\text{def}}{=} \mathsf{K}\_{\mathcal{O},\mathcal{E},\mathcal{S},C}X = \left(C \Rightarrow \text{Tree}\_{\mathcal{O}}\left(\left(X + \mathcal{E}\right) \times C + \mathcal{S}\right)\right), \end{aligned}$$

and $\mathrm{Runner}^{\mathrm{s}}\,C$ as the set of all *skeletal runners* $R$ *(with state $C$)*, which are families of co-operations $\{\mathsf{op}^R : [\![A_{\mathsf{op}}]\!] \to \mathsf{K}_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}\, [\![B_{\mathsf{op}}]\!]\}_{\mathsf{op}\in\mathcal{O}}$. Note that $\mathsf{K}_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}$ is a coproduct [11] of the monads $C \Rightarrow \mathrm{Tree}_{\mathcal{O}}(- \times C + \mathcal{S})$ and $\mathrm{Exc}_{E_{\mathsf{op}}}$, and thus the skeletal runners are the effectful runners for the former monad, so long as we read the effectful signatures $\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}}\ !\ E_{\mathsf{op}}$ as ordinary algebraic ones $\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} + E_{\mathsf{op}}$. While there is no semantic difference between the two readings, there is one of intention: $\mathsf{K}_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}\, [\![B_{\mathsf{op}}]\!]$ is a kernel computation that (apart from using state and sending signals) returns values of type $B_{\mathsf{op}}$ and raises exceptions $E_{\mathsf{op}}$, whereas $C \Rightarrow \mathrm{Tree}_{\mathcal{O}}(([\![B_{\mathsf{op}}]\!] + E_{\mathsf{op}}) \times C + \mathcal{S})$ returns values of type $B_{\mathsf{op}} + E_{\mathsf{op}}$ and raises no exceptions. We prefer the former, as it reflects our treatment of exceptions as a control mechanism rather than exceptional values.

These ingredients suffice for the denotation of skeletal types as sets, as given in Fig. 4. The user and kernel skeletal types are interpreted using the respective skeletal monads, and hence the two function types as Kleisli exponentials.

We proceed with the semantics of effectful types. The *skeleton* of a value type $X$ is the skeletal type $X^{\mathsf{s}}$ obtained by removing all effect information, and similarly for user and kernel types, see Fig. 5. We interpret a value type $X$ as a subset $\llbracket\llbracket X \rrbracket\rrbracket \subseteq \llbracket X^{\mathsf{s}} \rrbracket$ of the denotation of its skeleton, and similarly for user and kernel computation types. In other words, we treat the effectful types as *refinements* of their skeletons. For this, we define the operation $(X_0, X_1) \Rightarrow (Y_0, Y_1)$, for any $X_0 \subseteq X_1$ and $Y_0 \subseteq Y_1$, as the set of maps $X_1 \to Y_1$ that restrict to $X_0 \to Y_0$:

$$(X_0, X_1) \Rightarrow (Y_0, Y_1) \stackrel{\text{def}}{=} \{ f : X_1 \to Y_1 \mid \forall x \in X_0 .\, f(x) \in Y_0 \}.$$
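For finite sets, membership in this refinement exponential is a directly checkable condition; the following illustrative helper (ours, not part of the paper's development) tests whether a map restricts:

```python
# (X0, X1) => (Y0, Y1): a map f : X1 -> Y1 belongs to the refinement
# exponential exactly when it sends every element of X0 into Y0.
def restricts(x0, y0, f):
    return all(f(x) in y0 for x in x0)
```

For example, successor restricts `({0, 1}, ints) => ({1, 2, 3}, ints)` but not `({0, 1}, ints) => ({1}, ints)`.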

Next, observe that the user and kernel monads preserve subset inclusions, in the sense that $\mathsf{U}_{\Sigma,E} X \subseteq \mathsf{U}_{\Sigma',E'} X'$ and $\mathsf{K}_{\Sigma,E,S,C} X \subseteq \mathsf{K}_{\Sigma',E',S',C} X'$ whenever $\Sigma \subseteq \Sigma'$, $E \subseteq E'$, $S \subseteq S'$, and $X \subseteq X'$. In particular, we always have $\mathsf{U}_{\Sigma,E} X \subseteq \mathsf{U}^{\mathsf{s}} X$ and $\mathsf{K}_{\Sigma,E,S,C} X \subseteq \mathsf{K}^{\mathsf{s}}_{C} X$. Finally, let $\mathsf{Runner}^{\Sigma,\Sigma',S}_{C} \subseteq \mathsf{Runner}^{\mathsf{s}}_{C}$ be the subset of those runners $R$ whose co-operations for $\Sigma$ factor through $\mathsf{K}_{\Sigma',E_{\mathsf{op}},S,C}$, i.e., $\overline{\mathsf{op}}_R : \llbracket A_{\mathsf{op}} \rrbracket \to \mathsf{K}_{\Sigma',E_{\mathsf{op}},S,C} \llbracket B_{\mathsf{op}} \rrbracket \subseteq \mathsf{K}_{\mathcal{O},E_{\mathsf{op}},S,C} \llbracket B_{\mathsf{op}} \rrbracket$, for each $\mathsf{op} \in \Sigma$.

#### Ground types

$$\llbracket b \rrbracket \stackrel{\text{def}}{=} \cdots \qquad \llbracket \mathsf{unit} \rrbracket \stackrel{\text{def}}{=} \mathbf{1} \qquad \llbracket \mathsf{empty} \rrbracket \stackrel{\text{def}}{=} \mathbf{0} \qquad \llbracket A \times B \rrbracket \stackrel{\text{def}}{=} \llbracket A \rrbracket \times \llbracket B \rrbracket \qquad \llbracket A + B \rrbracket \stackrel{\text{def}}{=} \llbracket A \rrbracket + \llbracket B \rrbracket$$

#### Skeletal types

$$\begin{aligned} \llbracket P \times Q \rrbracket & \stackrel{\text{def}}{=} \llbracket P \rrbracket \times \llbracket Q \rrbracket & \llbracket P \to Q\,! \rrbracket & \stackrel{\text{def}}{=} \llbracket P \rrbracket \Rightarrow \llbracket Q\,! \rrbracket \\ \llbracket P + Q \rrbracket & \stackrel{\text{def}}{=} \llbracket P \rrbracket + \llbracket Q \rrbracket & \llbracket P \to Q ↯ C \rrbracket & \stackrel{\text{def}}{=} \llbracket P \rrbracket \Rightarrow \llbracket Q ↯ C \rrbracket \\ \llbracket \mathsf{runner}\; C \rrbracket & \stackrel{\text{def}}{=} \mathsf{Runner}^{\mathsf{s}}_{\llbracket C \rrbracket} & \llbracket P\,! \rrbracket & \stackrel{\text{def}}{=} \mathsf{U}^{\mathsf{s}} \llbracket P \rrbracket \qquad \llbracket P ↯ C \rrbracket \stackrel{\text{def}}{=} \mathsf{K}^{\mathsf{s}}_{\llbracket C \rrbracket} \llbracket P \rrbracket \end{aligned}$$

$$\llbracket x_1 : P_1, \ldots, x_n : P_n \rrbracket \stackrel{\text{def}}{=} \llbracket P_1 \rrbracket \times \cdots \times \llbracket P_n \rrbracket$$

Fig. 4. Denotations of ground and skeletal types.

The semantics of effectful types is given in Fig. 5. From a category-theoretic viewpoint, it assigns meaning in the category $\mathrm{Sub}(\mathbf{Set})$, whose objects are subset inclusions $X_0 \subseteq X_1$, and whose morphisms from $X_0 \subseteq X_1$ to $Y_0 \subseteq Y_1$ are those maps $X_1 \to Y_1$ that restrict to $X_0 \to Y_0$. The interpretations of products, sums, and function types are precisely the corresponding category-theoretic notions $\times$, $+$, and $\Rightarrow$ in $\mathrm{Sub}(\mathbf{Set})$. Even better, the pairs of submonads $\mathsf{U}_{\Sigma,E} \subseteq \mathsf{U}^{\mathsf{s}}$ and $\mathsf{K}_{\Sigma,E,S,C} \subseteq \mathsf{K}^{\mathsf{s}}_{C}$ are the "$\mathrm{Sub}(\mathbf{Set})$-variants" of the user and kernel monads. Such an abstract point of view drives the interpretation of terms, given below, and additionally suggests how our semantics can be set up over a category other than $\mathbf{Set}$. For example, if we replace $\mathbf{Set}$ with the category $\mathbf{Cpo}$ of ω-complete partial orders, we obtain the domain-theoretic semantics of effect handlers from [3], which models recursion and operations whose signatures contain arbitrary types.

#### 5.2 Semantics of values and computations

To give semantics to λcoop's terms, we introduce *skeletal typing* judgements

$$
\Gamma \vdash^{\mathsf{s}} V : P, \qquad\qquad \Gamma \vdash^{\mathsf{s}} M : P\,!, \qquad\qquad \Gamma \vdash^{\mathsf{s}} K : P ↯ C,
$$

which assign skeletal types to values and computations. In these judgements, Γ is a *skeletal context* which assigns skeletal types to variables.

The rules for these judgements are obtained from λcoop's typing rules, by *excluding* subsumption rules and by relaxing restrictions on effects. For example, the skeletal versions of the rules TyValue-Runner and TyKernel-Kill are

$$\frac{\left(\Gamma, x : A_{\mathsf{op}} \vdash^{\mathsf{s}} K_{\mathsf{op}} : B_{\mathsf{op}} ↯ C\right)_{\mathsf{op} \in \Sigma}}{\Gamma \vdash^{\mathsf{s}} \{(\mathsf{op}\,x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C : \mathsf{runner}\; C} \qquad\qquad \frac{s \in \mathcal{S}}{\Gamma \vdash^{\mathsf{s}} \mathsf{kill}_{X ↯ C}\; s : X^{\mathsf{s}} ↯ C}$$

The relationship between effectful and skeletal typing is summarised as follows:

Proposition 5. *(1) Skeletal typing derivations are unique. (2) If* $X \subseteq Y$*, then* $X^{\mathsf{s}} = Y^{\mathsf{s}}$*, and analogously for subtyping of user and kernel types. (3) If* $\Gamma \vdash V : X$*, then* $\Gamma^{\mathsf{s}} \vdash^{\mathsf{s}} V : X^{\mathsf{s}}$*, and analogously for user and kernel computations.*

#### Skeletons

$$A^{\mathsf{s}} \stackrel{\text{def}}{=} A \qquad \left(\Sigma \Rightarrow (\Sigma', S, C)\right)^{\mathsf{s}} \stackrel{\text{def}}{=} \mathsf{runner}\; C \qquad (X \times Y)^{\mathsf{s}} \stackrel{\text{def}}{=} X^{\mathsf{s}} \times Y^{\mathsf{s}}$$

$$\left(X \to Y \mathbin{!} U\right)^{\mathsf{s}} \stackrel{\text{def}}{=} X^{\mathsf{s}} \to (Y \mathbin{!} U)^{\mathsf{s}} \qquad (X + Y)^{\mathsf{s}} \stackrel{\text{def}}{=} X^{\mathsf{s}} + Y^{\mathsf{s}}$$

$$\left(X \to Y ↯ K\right)^{\mathsf{s}} \stackrel{\text{def}}{=} X^{\mathsf{s}} \to (Y ↯ K)^{\mathsf{s}} \qquad \left(X \mathbin{!} (\Sigma, E)\right)^{\mathsf{s}} \stackrel{\text{def}}{=} X^{\mathsf{s}}\,!$$

$$\left(x_1 : X_1, \ldots, x_n : X_n\right)^{\mathsf{s}} \stackrel{\text{def}}{=} \left(x_1 : X_1^{\mathsf{s}}, \ldots, x_n : X_n^{\mathsf{s}}\right) \qquad \left(X ↯ (\Sigma, E, S, C)\right)^{\mathsf{s}} \stackrel{\text{def}}{=} X^{\mathsf{s}} ↯ C$$

#### Denotations

$$\begin{aligned} \llbracket\llbracket A \rrbracket\rrbracket & \stackrel{\text{def}}{=} \llbracket A \rrbracket & \llbracket\llbracket \Sigma \Rightarrow (\Sigma', S, C) \rrbracket\rrbracket & \stackrel{\text{def}}{=} \mathsf{Runner}^{\Sigma,\Sigma',S}_{\llbracket C \rrbracket} \\ \llbracket\llbracket X \times Y \rrbracket\rrbracket & \stackrel{\text{def}}{=} \llbracket\llbracket X \rrbracket\rrbracket \times \llbracket\llbracket Y \rrbracket\rrbracket & \llbracket\llbracket X + Y \rrbracket\rrbracket & \stackrel{\text{def}}{=} \llbracket\llbracket X \rrbracket\rrbracket + \llbracket\llbracket Y \rrbracket\rrbracket \\ \llbracket\llbracket X \to Y \mathbin{!} U \rrbracket\rrbracket & \stackrel{\text{def}}{=} (\llbracket\llbracket X \rrbracket\rrbracket, \llbracket X^{\mathsf{s}} \rrbracket) \Rightarrow (\llbracket\llbracket Y \mathbin{!} U \rrbracket\rrbracket, \llbracket (Y \mathbin{!} U)^{\mathsf{s}} \rrbracket) \\ \llbracket\llbracket X \to Y ↯ K \rrbracket\rrbracket & \stackrel{\text{def}}{=} (\llbracket\llbracket X \rrbracket\rrbracket, \llbracket X^{\mathsf{s}} \rrbracket) \Rightarrow (\llbracket\llbracket Y ↯ K \rrbracket\rrbracket, \llbracket (Y ↯ K)^{\mathsf{s}} \rrbracket) \\ \llbracket\llbracket X \mathbin{!} (\Sigma, E) \rrbracket\rrbracket & \stackrel{\text{def}}{=} \mathsf{U}_{\Sigma,E} \llbracket\llbracket X \rrbracket\rrbracket & \llbracket\llbracket X ↯ (\Sigma, E, S, C) \rrbracket\rrbracket & \stackrel{\text{def}}{=} \mathsf{K}_{\Sigma,E,S,\llbracket C \rrbracket} \llbracket\llbracket X \rrbracket\rrbracket \\ \llbracket\llbracket x_1 : X_1, \ldots, x_n : X_n \rrbracket\rrbracket & \stackrel{\text{def}}{=} \llbracket\llbracket X_1 \rrbracket\rrbracket \times \cdots \times \llbracket\llbracket X_n \rrbracket\rrbracket \end{aligned}$$

Fig. 5. Skeletons and denotations of types.

*Proof.* We prove (1) by induction on skeletal typing derivations, and (2) by induction on subtyping derivations. For (1), we further use the occasional type annotations, and the absence of skeletal subsumption rules. For (3), suppose that $\mathcal{D}$ is a derivation of $\Gamma \vdash V : X$. We may translate $\mathcal{D}$ to its *skeleton* $\mathcal{D}^{\mathsf{s}}$, deriving $\Gamma^{\mathsf{s}} \vdash^{\mathsf{s}} V : X^{\mathsf{s}}$, by replacing typing rules with the matching skeletal ones, skipping subsumption rules due to (2). Computations are treated similarly. □

To ensure semantic coherence, we first define the *skeletal semantics* of skeletal typing judgements, $\llbracket \Gamma \vdash^{\mathsf{s}} V : P \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket P \rrbracket$, $\llbracket \Gamma \vdash^{\mathsf{s}} M : P\,! \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket P\,! \rrbracket$, and $\llbracket \Gamma \vdash^{\mathsf{s}} K : P ↯ C \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket P ↯ C \rrbracket$, by induction on their (unique) derivations.

Provided maps $\llbracket A_1 \rrbracket \times \cdots \times \llbracket A_n \rrbracket \to \llbracket B \rrbracket$ denoting the ground constants $\mathsf{f}$, values are interpreted in the standard way, using the bi-cartesian closed structure of sets, except for a runner $\{(\mathsf{op}\,x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C$, which is interpreted at an environment $\gamma \in \llbracket \Gamma \rrbracket$ as the skeletal runner $\{\overline{\mathsf{op}} : \llbracket A_{\mathsf{op}} \rrbracket \to \mathsf{K}_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},\llbracket C \rrbracket} \llbracket B_{\mathsf{op}} \rrbracket\}_{\mathsf{op} \in \mathcal{O}}$, given by

$$\overline{\mathsf{op}}\,a \stackrel{\text{def}}{=} \left(\text{if } \mathsf{op} \in \Sigma \text{ then } \rho\big(\llbracket \Gamma, x : A_{\mathsf{op}} \vdash^{\mathsf{s}} K_{\mathsf{op}} : B_{\mathsf{op}} ↯ C \rrbracket(\gamma, a)\big) \text{ else } \mho\right).$$

Here the map $\rho : \mathsf{K}^{\mathsf{s}}_{\llbracket C \rrbracket} \llbracket B_{\mathsf{op}} \rrbracket \to \mathsf{K}_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},\llbracket C \rrbracket} \llbracket B_{\mathsf{op}} \rrbracket$ is the skeletal kernel theory homomorphism characterised by the equations

$$\begin{aligned} \rho(\mathsf{return}\,b) &= \mathsf{return}\,b, & \rho(\mathsf{op}'(a', \kappa, (\nu_e)_{e \in E_{\mathsf{op}'}})) &= \mathsf{op}'(a', \rho \circ \kappa, (\rho(\nu_e))_{e \in E_{\mathsf{op}'}}),\\ \rho(\mathsf{getenv}\,\kappa) &= \mathsf{getenv}(\rho \circ \kappa), & \rho(\mathsf{raise}\,e) &= (\text{if } e \in E_{\mathsf{op}} \text{ then } \mathsf{raise}\,e \text{ else } \mho),\\ \rho(\mathsf{setenv}(c, \kappa)) &= \mathsf{setenv}(c, \rho \circ \kappa), & \rho(\mathsf{kill}\,s) &= \mathsf{kill}\,s. \end{aligned}$$

The purpose of $\mho$ in the definition of $\overline{\mathsf{op}}$ is to model a runtime error when the runner is asked to handle an unexpected operation, while $\rho$ makes sure that $\overline{\mathsf{op}}$ raises at most the exceptions $E_{\mathsf{op}}$, as prescribed by the signature of $\mathsf{op}$.
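The action of the coercion ρ can be sketched on a first-order representation of skeletal kernel computations, where a dedicated constructor stands in for the distinguished runtime-error operation (a simplification of ours; in the semantics it is an operation like any other):

```python
# Skeletal kernel computations as a small syntax tree; RuntimeError_
# (underscore to avoid the Python builtin) models the runtime error.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Ret: value: Any
@dataclass
class Raise: exc: str
@dataclass
class Kill: sig: str
@dataclass
class RuntimeError_: pass
@dataclass
class Getenv: cont: Callable[[Any], Any]
@dataclass
class Setenv:
    state: Any
    cont: Any

def rho(e_op, k):
    """Keep only exceptions listed in e_op; others become runtime errors.
    Returns, kills, and state operations pass through unchanged."""
    if isinstance(k, (Ret, Kill, RuntimeError_)):
        return k
    if isinstance(k, Raise):
        return k if k.exc in e_op else RuntimeError_()
    if isinstance(k, Getenv):
        return Getenv(lambda c: rho(e_op, k.cont(c)))
    return Setenv(k.state, rho(e_op, k.cont))
```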

User and kernel computations are interpreted as elements of the corresponding skeletal user and kernel monads. Again, most constructs are interpreted in a standard way: returns as the units of the monads; the operations raise, kill, getenv, setenv, and ops as the corresponding algebraic operations; and match statements as the corresponding semantic elimination forms. The interpretation of exception handling offers no surprises, e.g., as in [30], as long as we follow the strategy of treating unexpected situations with the runtime error $\mho$.

The most interesting part of the interpretation is the semantics of

$$\Gamma \vdash^{\mathsf{s}} (\mathsf{using}\; V \mathbin{@} W \;\mathsf{run}\; M \;\mathsf{finally}\; F) : Q\,!,\tag{4}$$

where $F \stackrel{\text{def}}{=} \{\mathsf{return}\,x \mathbin{@} c \mapsto N, (\mathsf{raise}\,e \mathbin{@} c \mapsto N_e)_{e \in E}, (\mathsf{kill}\,s \mapsto N_s)_{s \in S}\}$. At an environment $\gamma \in \llbracket \Gamma \rrbracket$, $V$ is interpreted as a skeletal runner with state $\llbracket C \rrbracket$, which induces a monad morphism $r : \mathrm{Tree}_{\mathcal{O}}(-) \to (\llbracket C \rrbracket \Rightarrow \mathrm{Tree}_{\mathcal{O}}(- \times \llbracket C \rrbracket + \mathcal{S}))$, as in the proof of Prop. 3. Let $f : \mathsf{K}^{\mathsf{s}}_{\llbracket C \rrbracket} \llbracket P \rrbracket \to (\llbracket C \rrbracket \Rightarrow \mathsf{U}^{\mathsf{s}} \llbracket Q \rrbracket)$ be the skeletal kernel theory homomorphism characterised by the equations

$$\begin{aligned} f(\mathsf{return}\,p) &= \lambda c.\, \llbracket \Gamma, x : P, c : C \vdash^{\mathsf{s}} N : Q\,! \rrbracket(\gamma, p, c), \\ f(\mathsf{op}(a, \kappa, (\nu_e)_{e \in E_{\mathsf{op}}})) &= \lambda c.\, \mathsf{op}(a, \lambda b.\, f(\kappa\,b)\,c, (f(\nu_e)\,c)_{e \in E_{\mathsf{op}}}), \\ f(\mathsf{raise}\,e) &= \lambda c.\, (\text{if } e \in E \text{ then } \llbracket \Gamma, c : C \vdash^{\mathsf{s}} N_e : Q\,! \rrbracket(\gamma, c) \text{ else } \mho), \\ f(\mathsf{kill}\,s) &= \lambda c.\, (\text{if } s \in S \text{ then } \llbracket \Gamma \vdash^{\mathsf{s}} N_s : Q\,! \rrbracket\,\gamma \text{ else } \mho), \\ f(\mathsf{getenv}\,\kappa) &= \lambda c.\, f(\kappa\,c)\,c, \qquad f(\mathsf{setenv}(c', \kappa)) = \lambda c.\, f(\kappa)\,c'. \end{aligned} \tag{5}$$

The interpretation of (4) at $\gamma$ is $f\big(r_{\llbracket P \rrbracket + E}(\llbracket \Gamma \vdash^{\mathsf{s}} M : P\,! \rrbracket\,\gamma)\big)\,(\llbracket \Gamma \vdash^{\mathsf{s}} W : C \rrbracket\,\gamma)$, which reads: map the interpretation of $M$ at $\gamma$ from the skeletal user monad to the skeletal kernel monad using $r$ (which models the operations of $M$ by the co-operations of $V$), and from there using $f$ to a map $\llbracket C \rrbracket \Rightarrow \mathsf{U}^{\mathsf{s}} \llbracket Q \rrbracket$, which is then applied to the initial kernel state, namely, the interpretation of $W$ at $\gamma$.

We interpret the context switch $\Gamma \vdash^{\mathsf{s}} \mathsf{kernel}\; K \mathbin{@} W \;\mathsf{finally}\; F : Q\,!$ at an environment $\gamma \in \llbracket \Gamma \rrbracket$ as $f(\llbracket \Gamma \vdash^{\mathsf{s}} K : P ↯ C \rrbracket\,\gamma)\,(\llbracket \Gamma \vdash^{\mathsf{s}} W : C \rrbracket\,\gamma)$, where $f$ is the map (5). Finally, the user context switch is interpreted much like exception handling.

We now define a coherent semantics of λcoop's typing derivations by passing through the skeletal semantics. Given a derivation $\mathcal{D}$ of $\Gamma \vdash V : X$, its skeleton $\mathcal{D}^{\mathsf{s}}$ derives $\Gamma^{\mathsf{s}} \vdash^{\mathsf{s}} V : X^{\mathsf{s}}$. We identify the denotation of $V$ with the skeletal one,

$$\llbracket\llbracket \Gamma \vdash V : X \rrbracket\rrbracket \stackrel{\text{def}}{=} \llbracket \Gamma^{\mathsf{s}} \vdash^{\mathsf{s}} V : X^{\mathsf{s}} \rrbracket : \llbracket \Gamma^{\mathsf{s}} \rrbracket \to \llbracket X^{\mathsf{s}} \rrbracket.$$

All that remains is to check that $\llbracket\llbracket \Gamma \vdash V : X \rrbracket\rrbracket$ restricts to $\llbracket\llbracket \Gamma \rrbracket\rrbracket \to \llbracket\llbracket X \rrbracket\rrbracket$. This is accomplished by induction on $\mathcal{D}$. The only interesting step is subsumption, which relies on the further observation that $X \subseteq Y$ implies $\llbracket\llbracket X \rrbracket\rrbracket \subseteq \llbracket\llbracket Y \rrbracket\rrbracket$. Typing derivations for user and kernel computations are treated analogously.

#### 5.3 Coherence, soundness, and finalisation theorems

We are now ready to prove a theorem that guarantees execution of finalisation code. But first, let us record the fact that the semantics is coherent and sound.

Theorem 6 (Coherence and soundness). *The denotational semantics of* λcoop *is coherent, and it is sound for the equational theory of* λcoop *from §4.4.*

*Proof.* Coherence is established by construction: any two derivations of the same typing judgement have the same denotation, because both are (the same) restriction of the skeletal semantics. For soundness, one unfolds the denotations of the left- and right-hand sides of the equations from §4.4 and compares them; some cases rely on suitable substitution lemmas. □

To set the stage for the finalisation theorem, let us consider the computation $\mathsf{using}\; V \mathbin{@} W \;\mathsf{run}\; M \;\mathsf{finally}\; F$, well-typed by the rule TyUser-Run from Fig. 3. At an environment $\gamma \in \llbracket\llbracket \Gamma \rrbracket\rrbracket$, the finalisation clauses $F$ are captured semantically by the *finalisation map* $\phi_\gamma : (\llbracket\llbracket X \rrbracket\rrbracket + E) \times \llbracket\llbracket C \rrbracket\rrbracket + S \to \llbracket\llbracket Y \mathbin{!} (\Sigma', E') \rrbracket\rrbracket$, given by

$$\begin{aligned} \phi_\gamma(\iota_1(\iota_1\,x, c)) & \stackrel{\text{def}}{=} \llbracket\llbracket \Gamma, x : X, c : C \vdash N : Y \mathbin{!} (\Sigma', E') \rrbracket\rrbracket(\gamma, x, c),\\ \phi_\gamma(\iota_1(\iota_2\,e, c)) & \stackrel{\text{def}}{=} \llbracket\llbracket \Gamma, c : C \vdash N_e : Y \mathbin{!} (\Sigma', E') \rrbracket\rrbracket(\gamma, c),\\ \phi_\gamma(\iota_2(s)) & \stackrel{\text{def}}{=} \llbracket\llbracket \Gamma \vdash N_s : Y \mathbin{!} (\Sigma', E') \rrbracket\rrbracket\,\gamma. \end{aligned}$$

With φ in hand, we may formulate the finalisation theorem for λcoop, stating that the semantics of using V @ W run M finally F is a computation tree all of whose branches end with finalisation clauses from F. Thus, unless some enveloping runner sends a signal, finalisation with F is guaranteed to take place.

Theorem 7 (Finalisation). *A well-typed* run *factors through finalisation:*

$$\llbracket\llbracket \Gamma \vdash (\mathsf{using}\; V \mathbin{@} W \;\mathsf{run}\; M \;\mathsf{finally}\; F) : Y \mathbin{!} (\Sigma', E') \rrbracket\rrbracket\,\gamma = \phi_\gamma^\dagger\,t,$$

*for some* $t \in \mathrm{Tree}_{\Sigma'}\big((\llbracket\llbracket X \rrbracket\rrbracket + E) \times \llbracket\llbracket C \rrbracket\rrbracket + S\big)$*, where* $\phi_\gamma^\dagger$ *is the Kleisli extension of* $\phi_\gamma$*.*

*Proof.* We first prove that $f\,u\,c = \phi_\gamma^\dagger(u\,c)$ holds for all $u \in \mathsf{K}_{\Sigma',E,S,\llbracket\llbracket C \rrbracket\rrbracket} \llbracket\llbracket X \rrbracket\rrbracket$ and $c \in \llbracket\llbracket C \rrbracket\rrbracket$, where $f$ is the map (5). The proof proceeds by computational induction on $u$ [29]. The finalisation statement is then just the special case with $u \stackrel{\text{def}}{=} r_{\llbracket\llbracket X \rrbracket\rrbracket + E}(\llbracket\llbracket \Gamma \vdash M : X \mathbin{!} (\Sigma, E) \rrbracket\rrbracket\,\gamma)$ and $c \stackrel{\text{def}}{=} \llbracket\llbracket \Gamma \vdash W : C \rrbracket\rrbracket\,\gamma$. □
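The shape of the finalisation theorem can be seen in miniature: the Kleisli extension of the finalisation map replaces every leaf of a residual computation tree by the corresponding finalisation clause, leaving the tree's operation nodes intact. The encoding below (tagged tuples for leaves, strings for clause bodies) is ours, purely illustrative:

```python
# Residual trees: leaves carry ("val", x, c), ("exc", e, c) or
# ("sig", s); nodes are outstanding external operations.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Leaf:
    value: Any

@dataclass
class Node:
    op: str
    cont: Callable[[Any], Any]

def kleisli(phi, t):
    """phi-dagger: graft phi(leaf) at every leaf of the tree t."""
    if isinstance(t, Leaf):
        return phi(t.value)
    return Node(t.op, lambda b: kleisli(phi, t.cont(b)))

def phi(leaf):
    """Finalisation clauses, one per kind of leaf."""
    tag = leaf[0]
    if tag == "val":
        return Leaf(f"return {leaf[1]} (state {leaf[2]})")
    if tag == "exc":
        return Leaf(f"handled {leaf[1]} (state {leaf[2]})")
    return Leaf(f"signal {leaf[1]}")
```

Every branch of `kleisli(phi, t)` ends in a clause of `phi`, which is exactly the factorisation property the theorem asserts.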

# 6 Runners in action

Let us show examples that demonstrate how runners can be usefully combined to provide flexible resource management. We implemented these and other examples in the language Coop and a library Haskell-Coop, see §7.

To make the code more understandable, we do not adhere strictly to the syntax of λcoop, e.g., we use the generic versions of effects [26], as is customary in programming, and effectful initialisation of kernel state as discussed in §3.2.

*Example 8 (Nesting).* In Example 4, we considered a runner fileIO for basic file operations. Let us suppose that fileIO is implemented by immediate calls to the operating system. Sometimes, we might prefer to accumulate writes and commit them all at once, which can be accomplished by interposing between fileIO and user code the following runner accIO, which accumulates writes in its state:

{ write s' → let s = getenv () in setenv (concat s s') }_string

By *nesting* the runners, and calling the outer write (the one of fileIO) only in the finalisation code for accIO, the accumulated writes are committed all at once:

```
using fileIO @ (open "hello.txt") run
 using accIO @ (return "") run
   write "Hello, world."; write "Hello, again."
 finally { return x @ s → write s; return x }
finally { return x @ fh → ... , raise QuotaExceeded @ fh → ... , kill IOError → ... }
```
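Collapsing runners to ordinary higher-order functions (and ignoring exceptions and signals), the nesting pattern can be sketched in Python: the inner runner accumulates writes in its state, and its finaliser commits them with a single outer write. The names `with_acc_io` and `outer_write` are hypothetical, not Coop syntax:

```python
def with_acc_io(outer_write, body):
    """Run body under an accumulating write runner; commit on finalise."""
    state = {"acc": ""}                # accIO's kernel state

    def write(s):                      # accIO's co-operation
        state["acc"] += s

    result = body(write)
    outer_write(state["acc"])          # finaliser: one outer write
    return result
```

Running the two writes from the example above produces exactly one "outer" call, carrying the concatenation of both strings.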
*Example 9 (Instrumentation).* Above, accIO implements the same signature as fileIO and thus intercepts operations without the user code being aware of it. This kind of invisibility can be more generally used to implement *instrumentation*:

```
using { ..., op x → let c = getenv () in setenv (c+1); op x, ... }_int @ (return 0) run
 M
finally { return x @ c → report_cost c; return x, ... }
```
Here the interposed runner implements all operations of some enveloping runner, by simply forwarding them, while also measuring computational cost by counting the total number of operation calls, which is then reported during finalisation.
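In ordinary code, the counting idea reduces to a forwarding wrapper whose hidden state is bumped on every call; the `instrument` helper below is our own sketch, not Coop syntax:

```python
def instrument(op):
    """Wrap op so every call is counted; return the wrapper and a
    cost-reporting thunk (the analogue of the finalisation clause)."""
    count = {"n": 0}

    def wrapped(x):
        count["n"] += 1
        return op(x)

    return wrapped, lambda: count["n"]
```

User code calls `wrapped` exactly as it would call `op`, unaware of the interposed counting, which mirrors the invisibility of the interposed runner.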

*Example 10 (ML-style references).* Continuing with the theme of nested runners, they can also be used to implement abstract and safe interfaces to low-level resources. For instance, suppose we have a low-level implementation of a memory heap that potentially allows unsafe memory access, and we would like to implement ML-style references on top of it. A good first attempt is the runner

```
{ ref x → let h = getenv () in
           let (r,h') = malloc h x in
           setenv h'; return r,
 get r → let h = getenv () in memread h r,
 put (r, x) → let h = getenv () in memset h r x }_heap
```
which has the desired interface, but still suffers from three deficiencies that can be addressed with further language support. First, *abstract types* would let us hide the fact that references are just memory locations, so that the user code could never devise invalid references or otherwise misuse them. Second, our simple typing discipline forces all references to hold the same type, but in reality we want them to have different types. This could be achieved through quantification over types in the low-level implementation of the heap, as we have done in the Haskell-Coop library using Haskell's forall. Third, user code could hijack a reference and misuse it out of the scope of the runner, which is difficult to prevent. In practice the problem does not occur because, so to speak, the runner for references is at the very top level, from which user code cannot escape.
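The low-level heap that the co-operations `ref`, `get`, and `put` delegate to can be sketched in state-passing style; `malloc`, `memread`, and `memset` below mirror the operations used in the runner, with dictionaries standing in for memory (an illustration of ours, with no claim about the actual low-level layer):

```python
def malloc(heap, x):
    """Allocate a fresh location holding x; return it with the new heap."""
    r = heap["next"]
    cells = dict(heap["cells"])
    cells[r] = x
    return r, {"next": r + 1, "cells": cells}

def memread(heap, r):
    """Read location r; None models an invalid (dangling) reference."""
    return heap["cells"].get(r)

def memset(heap, r, x):
    """Write x to location r, functionally updating the heap."""
    cells = dict(heap["cells"])
    cells[r] = x
    return {"next": heap["next"], "cells": cells}
```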

*Example 11 (Monotonic state).* Nested runners can also implement access restrictions to resources, with applications in security [8]. For example, we can restrict the references from the previous example to be used *monotonically*, by associating a preorder with each reference, which assignments then have to obey. This idea is similar to how monotonic state is implemented in the F\* language [2], except that we make dynamic checks where F\* statically uses dependent types.

While we could simply modify the previous example, it is better to implement a new runner which is nested inside the previous one, so that we obtain a modular solution that works with *any* runner implementing operations ref, get, and put:

```
{ mref x rel → let r = ref x in
                let m = getenv () in
                setenv (add m (r,rel)); return r,
 mget r → get r,
 mput (r, y) → let x = get r in
                 let m = getenv () in
                 match (sel m r) with
                 | inl rel → if (rel x y) then put (r, y)
                                          else raise MonotonicityViolation
                 | inr () → kill NoPreorderFound }_map(ref, intRel)
```
The runner's state is a map from references to preorders on integers. The co-operation mref x rel creates a new reference r initialised with x (by calling ref of the outer runner), and then adds the pair (r, rel) to the map stored in the runner's state. Reading is delegated to the outer runner, while assignment first checks that the new value is larger than the old one, according to the associated preorder. If the preorder is respected, the runner proceeds with the assignment (again delegated to the outer runner), otherwise it reports a monotonicity violation. We may not assume that every reference has an associated preorder, because user code could pass to mput a reference that was created earlier, outside the scope of the runner. If this happens, the runner simply kills the offending user code with a signal.
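The dynamic check performed by mput can be sketched directly: the runner's state maps each reference to its current value and preorder, a violation corresponds to raising an exception, and an unknown reference to sending a signal. The tagged return values below stand in for actual effects; the encoding is ours:

```python
def mput(store, r, y):
    """Monotonic update: succeed only if the stored preorder relates
    the old value to the new one."""
    entry = store.get(r)
    if entry is None:
        return ("kill", "NoPreorderFound")        # unknown reference
    x, rel = entry
    if rel(x, y):
        new = dict(store)
        new[r] = (y, rel)
        return ("ok", new)
    return ("raise", "MonotonicityViolation")     # order not respected
```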

*Example 12 (Pairing).* Another form of modularity is achieved by *pairing* runners. Given two runners $\{(\mathsf{op}\,x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma_1}\}_{C_1}$ and $\{(\mathsf{op}'\,x \mapsto K_{\mathsf{op}'})_{\mathsf{op}' \in \Sigma_2}\}_{C_2}$, e.g., for state and file operations, we can use them side by side by combining them into a single runner with operations $\Sigma_1 + \Sigma_2$ and kernel state $C_1 \times C_2$, as follows (the co-operations $\mathsf{op}'$ of the second runner are treated symmetrically):

```
{ op x → let (c,c') = getenv () in
           user
             kernel (Kop x) @ c finally {
               return y @ c'' → return (inl (inl y, c'')),
               (raise e @ c'' → return (inl (inr e, c'')))_(e ∈ E_op),
               (kill s → return (inr s))_(s ∈ S_1) }
           with {
             return (inl (inl y, c'')) → setenv (c'', c'); return y,
             return (inl (inr e, c'')) → setenv (c'', c'); raise e,
             return (inr s) → kill s},
 op' x → ... , ... }_(C1 × C2)
```
Notice how the inner kernel context switch passes to the co-operation Kop only its part of the combined state, and how it returns the result of Kop in a reified form (which requires treating exceptions and signals as values). The outer user context switch then receives this reified result, updates the combined state, and forwards the result (return value, exception, or signal) in unreified form.
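The state-splitting discipline of pairing can be sketched with two combinators that run a co-operation on one component of the product state C1 × C2, thread the other component through unchanged, and let signals abort the whole computation. The tagged-tuple encoding is ours:

```python
# A co-operation here is a function from its state component to either
# ("ok", value, new_state) or a signal ("sig", s).
def on_first(coop):
    def paired(state):
        c1, c2 = state
        res = coop(c1)
        if res[0] == "ok":
            _, value, c1_new = res
            return ("ok", value, (c1_new, c2))
        return res                 # a signal aborts the whole pair
    return paired

def on_second(coop):
    def paired(state):
        c1, c2 = state
        res = coop(c2)
        if res[0] == "ok":
            _, value, c2_new = res
            return ("ok", value, (c1, c2_new))
        return res
    return paired
```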

# 7 Implementation

We accompany the theoretical development with two implementations of λcoop: a prototype language Coop [6], and a Haskell library Haskell-Coop [1].

Coop, implemented in OCaml, demonstrates what a more fully-featured language based on λcoop might look like. It implements a bi-directional variant of λcoop's type system, extended with type definitions and algebraic datatypes, to provide algorithmic typechecking and type inference. The operational semantics is based on the computation rules of the equational theory from §4.4, but extended with general recursion, pairing of runners from Example 12, and an interface to the OCaml runtime called *containers*—these are essentially top-level runners defined directly in OCaml. They are a modular and systematic way of offering several possible top-level runtime environments to the programmer.

The Haskell-Coop library is a shallow embedding of λcoop in Haskell. The implementation closely follows the denotational semantics of λcoop. For instance, user and kernel monads are implemented as corresponding Haskell monads. Internally, the library uses the Freer monad of Kiselyov [14] to implement free model monads for given signatures of operations. The library also provides a means to run user code via Haskell's top-level monads. For instance, code that performs input-output operations may be run in Haskell's IO monad.

Haskell's advanced features make it possible to use Haskell-Coop to implement several extensions to examples from §6. For instance, we implement ML-style state that allows references holding arbitrary values (of different types), and state that uses Haskell's type system to track which references are alive. The library also provides pairing of runners from Example 12, e.g., to combine state and input-output. We also use the library to demonstrate that *ambient functions* from the Koka language [18] can be implemented with runners by treating their binding and application as co-operations. (These are functions that are bound dynamically but evaluated in the lexical scope of their binding.)

# 8 Related work

Comodels and (ordinary) runners have been used as a natural model of stateful top-level behaviour. For instance, Plotkin and Power [27] have given a treatment of operational semantics using the tensor product of a model and a comodel. Recently, Katsumata, Rivas, and Uustalu have generalised this interaction of models and comodels to monads and comonads [13]. An early version of Eff [4] implemented *resources*, which were a kind of stateful runners, although they lacked satisfactory theory. Uustalu [35] has pointed out that runners are the additional structure that one has to impose on state to run algebraic effects statefully. Møgelberg and Staton's [21] linear-use state-passing translation also relies on equipping the state with a comodel structure for the effects at hand. Our runners arise when their setup is specialised to a certain Kleisli adjunction.

Our use of kernel state is analogous to the use of parameters in parameter-passing handlers [30]: their return clause also provides a form of finalisation, as the final value of the parameter is available. There is, however, no guarantee that finalisation happens, because handlers need not use the continuation linearly.

The need to tame the excessive generality of handlers, and willingness to give it up in exchange for efficiency and predictability, has recently been recognised by Multicore OCaml's implementors, who have observed that in practice most handlers resume continuations precisely once [9]. In exchange for impressive efficiency, they require continuations to be used linearly by default, whereas discarding and copying must be done explicitly, incurring additional cost. Leijen [17] has extended handlers in Koka with a finally clause, whose semantics ensures that finalisation happens whenever a handler discards its continuation. Leijen also added an initially clause to parameter-passing handlers, which is used to compute the initial value of the parameter before handling, but that gets executed again every time the handler resumes its continuation.

# 9 Conclusion and future work

We have shown that effectful runners form a mathematically natural and modular model of resources, modelling not only the top level external resources, but allowing programmers to also define their own intermediate "virtual machines". Effectful runners give rise to a bona fide programming concept, an idea we have captured in a small calculus, called λcoop, which we have implemented both as a language and a library. We have given λcoop an algebraically natural denotational semantics, and shown how to program with runners through various examples.

We leave combining runners and general effect handlers for future work. As runners are essentially affine handlers, inspired by Multicore OCaml we also plan to investigate efficient compilation for runners. On the theoretical side, by developing semantics in a Sub(Cpo)-enriched setting [32], we plan to support recursion at all levels and remove the distinction between ground and arbitrary types. Finally, by using proof-relevant subtyping [34] and synthesis of lenses [20], we plan to upgrade subtyping from a simple inclusion to relating types by lenses.

*Acknowledgements* We thank Daan Leijen for useful discussions about initialisation and finalisation in Koka, as well as ambient values and ambient functions. We thank Guillaume Munch-Maccagnoni and Matija Pretnar for discussing resources and potential future directions for λcoop. We are also grateful to the participants of the NII Shonan Meeting "Programming and reasoning with algebraic effects and effect handlers" for feedback on an early version of this work.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 834146.

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-17-1-0326.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **On the Versatility of Open Logical Relations***-*

# **Continuity, Automatic Differentiation, and a Containment Theorem**

Gilles Barthe<sup>1,4</sup>, Raphaëlle Crubillé<sup>4</sup>, Ugo Dal Lago<sup>2,3</sup>, and Francesco Gavazzo<sup>2,3,4</sup>

<sup>1</sup> MPI for Security and Privacy, Bochum, Germany
<sup>2</sup> University of Bologna, Bologna, Italy
<sup>3</sup> INRIA Sophia Antipolis, Sophia Antipolis, France
<sup>4</sup> IMDEA Software Institute, Madrid, Spain

**Abstract.** Logical relations are one among the most powerful techniques in the theory of programming languages, and have been used extensively for proving properties of a variety of higher-order calculi. However, there are properties that cannot be immediately proved by means of logical relations, for instance program continuity and differentiability in higher-order languages extended with real-valued functions. Informally, the problem stems from the fact that these properties are naturally expressed on terms of non-ground type (or, equivalently, on open terms of base type), and there is no apparent good definition for a base case (i.e. for closed terms of ground types). To overcome this issue, we study a generalization of the concept of a logical relation, called *open logical relation*, and prove that it can be fruitfully applied in several contexts in which the property of interest is about expressions of first-order type. Our setting is a simply-typed λ-calculus enriched with real numbers and real-valued first-order functions from a given set, such as the one of continuous or differentiable functions. We first prove a containment theorem stating that for any collection of real-valued firstorder functions including projection functions and closed under function composition, any well-typed term of first-order type denotes a function belonging to that collection. Then, we show by way of open logical relations the correctness of the core of a recently published algorithm for forward automatic differentiation. Finally, we define a refinement-based type system for local continuity in an extension of our calculus with conditionals, and prove the soundness of the type system using open logical relations.

**Keywords:** Lambda Calculus · Logical Relations · Continuity Analysis · Automatic Differentiation

<sup>-</sup> The second and fourth authors are supported by the ANR project 16CE250011 REPAS, the ERC Consolidator Grant DIAPASoN – DLV-818616, and the MIUR PRIN 201784YSZ5 ASPRA.

# **1 Introduction**

Logical relations have been extremely successful as a way of proving equivalence between concrete programs as well as correctness of program transformations. In their "unary" version, they are also a formidable tool to prove termination of typable programs, through the so-called *reducibility* technique. The class of programming languages in which these techniques have been instantiated includes not only higher-order calculi with simple types, but also calculi with recursion [3,2,23], various kinds of effects [14,12,25,36,10,11,34], and concurrency [56,13].

Without any aim to be precise, let us see how reducibility works, in the setting of a simply typed calculus. The main idea is to define, by induction on the structure of types, the concept of a well-behaved program, where in the base case one simply makes reference to the underlying notion of observation (e.g. being strongly normalizing), while the more interesting case is handled by stipulating that reducible higher-order terms are those that map reducible terms to reducible terms, thereby exploiting the inductive nature of simple types. One can even go beyond the basic setting of simple types, and extend reducibility to, e.g., languages with recursive types [23,2] or even untyped languages [44] by means of techniques such as step-indexing [3].
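In symbols, the recipe just described can be rendered as follows (our own formulation, for a base type ι with strong normalization as the observation):

$$\operatorname{Red}\_{\iota} = \{ t \mid t \text{ is strongly normalizing} \}, \qquad \operatorname{Red}\_{\tau\_1 \to \tau\_2} = \{ t \mid \forall s \in \operatorname{Red}\_{\tau\_1}.\ t\,s \in \operatorname{Red}\_{\tau\_2} \}.$$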

The same kind of recipe works in a relational setting, where one wants to *compare* programs rather than merely *prove properties* about them. Again, two terms are equivalent at base types if they have the same observable behaviour, while at higher types equivalent terms are those that map equivalent arguments to equivalent results.

There are cases, however, in which the property one observes, or the property on which the underlying notion of program equivalence or correctness is based, is formulated for types which are *not* ground (or, equivalently, it is formulated for open expressions). As an example, one could be interested in proving that in a higher-order type system all *first-order* expressions compute numerical functions of a specific kind, for example, continuous or differentiable ones. We call such properties *first-order properties*<sup>5</sup>. As we will describe in Section 3 below, logical relations do not seem to be applicable *off-the-shelf* to these cases. Informally, this is due to the fact that we cannot start by defining a base case for ground types and then build the relation inductively.

In this paper, we show that logical relations and reducibility can deal with first-order properties in a compositional way without altering their nature. The main idea behind the resulting definition, known as *open logical relations* [59], consists in parameterizing the set of related terms of a certain type (or the underlying reducibility set) on a *ground environment*, thereby turning it into a set of pairs of *open terms*. As a consequence, one can define the target first-order property in a natural way.

<sup>5</sup> To avoid misunderstandings, we emphasize that we use first-order properties to refer to properties of expressions of first-order types—and not in relation with definability of properties in first-order predicate logic.

Generalizations of logical relations to open terms have been used by several authors, and in several (oftentimes unrelated) contexts (see, for instance, [15,39,47,30,53]). In this paper, we show how open logical relations constitute a powerful technique to systematically prove first-order properties of programs. In this respect, the paper's technical contributions are applications of open logical relations to three distinct problems.


Due to space constraints, many details have to be omitted, but they can be found in the extended version of this work [7].

# **2 The Playground**

In order to facilitate the communication of the main ideas behind open logical relations and their applications, this paper deals with several vehicle calculi. All such calculi can be seen as derived from a unique calculus, denoted by Λ<sup>×,→,R</sup>, which thus provides the common ground for our inquiry. The calculus Λ<sup>×,→,R</sup> is obtained by adding to the simply typed λ-calculus with product and arrow types (which we denote by Λ<sup>×,→</sup>) a ground type R for real numbers and constants r of type R, for each real number r.

Given a collection F of real-valued functions, i.e. functions f : R<sup>n</sup> → R (with n ≥ 1), we endow Λ<sup>×,→,R</sup> with an operator f, for each f ∈ F, whose intended meaning is that whenever t<sub>1</sub>, ..., t<sub>n</sub> compute real numbers r<sub>1</sub>, ..., r<sub>n</sub>, then f(t<sub>1</sub>, ..., t<sub>n</sub>) computes f(r<sub>1</sub>, ..., r<sub>n</sub>). We call the resulting calculus Λ<sup>×,→,R</sup><sub>F</sub>. Depending on the application we are interested in, we will take as F specific collections of real-valued functions, such as continuous or differentiable functions.

The syntax and static semantics of Λ<sup>×,→,R</sup><sub>F</sub> are defined in Figure 1, where f : R<sup>n</sup> → R belongs to F. The static semantics of Λ<sup>×,→,R</sup><sub>F</sub> is based on judgments of the form Γ ⊢ t : τ, which have the usual intended meaning. We adopt standard syntactic conventions as in [6], notably the so-called variable convention. In particular, we denote by FV(t) the collection of free variables of t and by s[t/x] the capture-avoiding substitution of the expression t for all free occurrences of x in s.

$$\tau ::= \mathsf{R} \mid \tau \times \tau \mid \tau \to \tau \qquad \qquad \Gamma ::= \cdot \mid x : \tau, \Gamma$$

$$t ::= x \mid \underline{r} \mid \underline{f}(t, \ldots, t) \mid \lambda x.t \mid t\, t \mid (t, t) \mid t.1 \mid t.2$$

$$\begin{array}{llll} \overline{\Gamma, x : \tau \vdash x : \tau} & \overline{\Gamma \vdash \underline{r} : \mathsf{R}} & \frac{\Gamma \vdash t\_{1} : \mathsf{R} \quad \cdots \quad \Gamma \vdash t\_{n} : \mathsf{R}}{\Gamma \vdash \underline{f}(t\_{1}, \ldots, t\_{n}) : \mathsf{R}} & \frac{\Gamma, x : \tau\_{1} \vdash t : \tau\_{2}}{\Gamma \vdash \lambda x.t : \tau\_{1} \to \tau\_{2}} \\\\ \frac{\Gamma \vdash s : \tau\_{1} \to \tau\_{2} \quad \Gamma \vdash t : \tau\_{1}}{\Gamma \vdash s\,t : \tau\_{2}} & \frac{\Gamma \vdash t\_{1} : \tau \quad \Gamma \vdash t\_{2} : \sigma}{\Gamma \vdash (t\_{1}, t\_{2}) : \tau \times \sigma} & \frac{\Gamma \vdash t : \tau\_{1} \times \tau\_{2}}{\Gamma \vdash t.i : \tau\_{i}}\ (i \in \{1, 2\}) \end{array}$$

Fig. 1: Static semantics of Λ<sup>×,→,R</sup><sub>F</sub>.

We do not confine ourselves to a fixed operational semantics (e.g. a call-by-value operational semantics), but take advantage of the simply-typed nature of Λ<sup>×,→,R</sup><sub>F</sub> and opt for a set-theoretic denotational semantics. The category of sets and functions being cartesian closed, the denotational semantics of Λ<sup>×,→,R</sup><sub>F</sub> is standard and associates to any judgment x<sub>1</sub> : τ<sub>1</sub>, ..., x<sub>n</sub> : τ<sub>n</sub> ⊢ t : τ a function [x<sub>1</sub> : τ<sub>1</sub>, ..., x<sub>n</sub> : τ<sub>n</sub> ⊢ t : τ] : ∏<sub>i</sub>[τ<sub>i</sub>] → [τ], where [τ]—the *semantics* of τ—is defined as follows:

$$[\mathsf{R}] = \mathbb{R}; \qquad [\tau\_1 \to \tau\_2] = [\tau\_2]^{[\tau\_1]}; \qquad [\tau\_1 \times \tau\_2] = [\tau\_1] \times [\tau\_2].$$

Due to space constraints, we omit the definition of [Γ ⊢ t : τ] and refer the reader to any textbook on the subject (such as [43]).
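Although we omit the formal definition, the set-theoretic interpretation can be sketched concretely as an interpreter. The following Python fragment is purely illustrative and not part of the formal development; the tuple-tagged encoding of terms and all constructor names are our own.

```python
# A sketch of the set-theoretic semantics of Λ^{×,→,R}_F, with terms encoded
# as tagged tuples; operators f ∈ F are represented directly as Python functions.
import math

def interp(t, env):
    """Map a term and an environment (variable name -> semantic value) to a value."""
    tag = t[0]
    if tag == "var":            # x
        return env[t[1]]
    if tag == "lit":            # the constant r
        return t[1]
    if tag == "op":             # f(t1, ..., tn), with f ∈ F
        f, args = t[1], t[2]
        return f(*(interp(a, env) for a in args))
    if tag == "lam":            # λx.t, interpreted as a set-theoretic function
        x, body = t[1], t[2]
        return lambda v: interp(body, {**env, x: v})
    if tag == "app":            # s t
        return interp(t[1], env)(interp(t[2], env))
    if tag == "pair":           # (t1, t2)
        return (interp(t[1], env), interp(t[2], env))
    if tag == "proj":           # t.i
        return interp(t[1], env)[t[2] - 1]
    raise ValueError(tag)

# [x : R ⊢ (λy. sin(x) + y) 1 : R], as a function of the value of x
t = ("app",
     ("lam", "y", ("op", lambda a, b: a + b,
                   [("op", math.sin, [("var", "x")]), ("var", "y")])),
     ("lit", 1.0))
```

For instance, `interp(t, {"x": 0.0})` computes sin(0) + 1 = 1.0, matching the intended denotation at the environment mapping x to 0.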

# **3 A Fundamental Gap**

In this section, we will look informally at a problem which, apparently, cannot be solved using vanilla reducibility or logical relations. This serves both as a motivating example and as a justification of some of the design choices we had to make when designing open logical relations.

Consider the simply-typed λ-calculus Λ<sup>×,→</sup>, the prototypical example of a well-behaved higher-order functional programming language. As is well known, Λ<sup>×,→</sup> is strongly normalizing and the technique of logical relations can be applied on-the-nose. The proof of strong normalization for Λ<sup>×,→</sup> is structured around the definition of a family of reducibility sets of *closed* terms {*Red*<sub>τ</sub>}<sub>τ</sub>, indexed by types. At any atomic type τ, *Red*<sub>τ</sub> is defined as the set of terms (of type τ) having the property of interest, i.e. as the collection of strongly normalizing terms. The set *Red*<sub>τ₁→τ₂</sub>, instead, contains those terms which, when applied to a term in *Red*<sub>τ₁</sub>, return a term in *Red*<sub>τ₂</sub>. Reducibility sets are *afterwards* generalised to open terms, and finally all typable terms are shown to be reducible.

Let us now consider the calculus Λ<sup>×,→,R</sup><sub>F</sub>, where F contains the addition and multiplication functions only. This language has already been considered in the literature, under the name of *higher-order polynomials* [22,40], which are crucial tools in higher-order complexity theory and resource analysis. Now, let us ask ourselves the following question: can we say anything about the nature of those functions R<sup>n</sup> → R which are denoted by (closed) terms of type R<sup>n</sup> → R? Of course, all the polynomials on the real field can be represented, but can we go beyond, thanks to higher-order constructions? The answer is negative: terms of type R<sup>n</sup> → R represent all *and only* the polynomials [5,17]. This result is an instance of the general containment theorem mentioned at the end of Section 1.

Let us now focus on proofs of this containment result. It turns out that proofs from the literature are not compositional, and rely on "heavyweight" tools, including strong normalization of Λ<sup>×,→</sup> and soundness of the underlying operational semantics. In fact, proving the result using usual reducibility arguments would not be immediate, precisely because there is no obvious choice for the base case. If, for example, we define *Red*<sub>R</sub> as the set of terms strongly normalizing to a numeral, *Red*<sub>Rⁿ→R</sub> as the set of polynomials, and any other reducibility set as usual, we soon get into trouble: indeed, we would like the two sets of functions

$$\operatorname{Red}\_{\mathbb{R}\times\mathbb{R}\to\mathbb{R}} \qquad \text{and} \qquad \operatorname{Red}\_{\mathbb{R}\to\left(\mathbb{R}\to\mathbb{R}\right)}$$

to denote *essentially* the same set of functions, modulo the adjunction between R<sup>2</sup> → R and R → (R → R). But this is clearly not the case: just consider the function f in R → (R → R) defined as follows:

$$f(x) = \begin{cases} \lambda y.y & \text{if } x \ge 0\\ \lambda y.y + 1 & \text{if } x < 0. \end{cases}$$

Clearly, f maps any *fixed* real number to a polynomial, but the corresponding uncurried function from R<sup>2</sup> to R is far from being a polynomial. In other words, reducibility seems apparently inadequate to capture situations like the one above, in which the "base case" is not the one of *ground* types, but rather the one of *first-order* types.
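To make the failure concrete, here is a small Python sketch (our own, purely for illustration) of the function f above:

```python
def f(x):
    # The counterexample from the text: each section f(x) is a polynomial in y,
    # but the uncurried map (x, y) -> f(x)(y) is not even continuous at x = 0.
    if x >= 0:
        return lambda y: y          # λy.y
    else:
        return lambda y: y + 1.0    # λy.y + 1

# Every fixed x yields a polynomial of degree at most 1 in y ...
assert f(1.0)(3.0) == 3.0
assert f(-1.0)(3.0) == 4.0
# ... yet along y = 0 the uncurried function jumps at x = 0:
assert f(1e-9)(0.0) == 0.0 and f(-1e-9)(0.0) == 1.0
```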

Before proceeding any further, it is useful to fix the boundaries of our investigation. We are interested in proving that (the semantics of) programs of first-order type R<sup>n</sup> → R enjoy first-order properties, such as continuity or differentiability, under their standard interpretation in calculus and real analysis. More specifically, our results do not cover notions of continuity and differentiability studied in fields such as (exact) real-number computation [57] or computable analysis [58], which have a strong domain-theoretical flavor, nor higher-order generalizations of continuity and differentiability (see, e.g., [26,27,32,29]). We leave for future work the study of open logical relations in these settings. What this paper aims to provide is a family of *lightweight* techniques that can be used to show that practical properties of interest of real-valued functions are guaranteed to hold when programs are written taking advantage of higher-order constructors. We believe that the three case studies we present in this paper both point to the practical scenarios we have in mind and witness the versatility of our methodology.

# **4 Warming Up: A Containment Theorem**

In this section we introduce open logical relations in their unary version (i.e. open logical predicates). We do so by proving the following Containment Theorem.

**Theorem 1 (Containment Theorem).** *Let* F *be a collection of real-valued functions including projections and closed under function composition. Then, any* Λ<sup>×,→,R</sup><sub>F</sub> *term* x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R *denotes a function (from* R<sup>n</sup> *to* R*) in* F*. That is,* [x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R] ∈ F*.*

As already remarked in previous sections, notable instances of Theorem 1 are obtained by taking F as the collection of continuous functions, or as the collection of polynomials.

Our strategy to prove Theorem 1 consists in defining a logical predicate, denoted by F, ensuring that the denotation of programs of first-order type is in F, and that this property is hereditarily preserved at higher-order types. However, F being a property of real-valued functions—and the denotation of an *open* term of the form x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R being such a function—we shall work with open terms with free variables of type R and parametrize the candidate logical predicate by types *and* environments Θ containing such variables.

This way, we obtain a family of logical predicates F<sup>Θ</sup><sub>τ</sub> acting on terms of the form Θ ⊢ t : τ. As a consequence, when considering the ground type R and an environment Θ = x<sub>1</sub> : R, ..., x<sub>n</sub> : R, we obtain a predicate F<sup>Θ</sup><sub>R</sub> on expressions Θ ⊢ t : R, which naturally correspond to functions from R<sup>n</sup> to R, for which belonging to F is indeed meaningful.

**Definition 1 (Open Logical Predicate).** *Let* Θ = x<sub>1</sub> : R, ..., x<sub>n</sub> : R *be a fixed environment. We define the type-indexed family of predicates* F<sup>Θ</sup><sub>τ</sub> *by induction on* τ *as follows:*

$$\begin{split} t \in \mathcal{F}\_{\mathbb{R}}^{\Theta} &\iff (\Theta \vdash t : \mathbb{R} \land [\Theta \vdash t : \mathbb{R}] \in \mathfrak{F}) \\ t \in \mathcal{F}\_{\tau\_{1} \to \tau\_{2}}^{\Theta} &\iff (\Theta \vdash t : \tau\_{1} \to \tau\_{2} \land \forall s \in \mathcal{F}\_{\tau\_{1}}^{\Theta} . \ ts \in \mathcal{F}\_{\tau\_{2}}^{\Theta}) \\ t \in \mathcal{F}\_{\tau\_{1} \times \tau\_{2}}^{\Theta} &\iff (\Theta \vdash t : \tau\_{1} \times \tau\_{2} \land \forall i \in \{1,2\} . \ t . i \in \mathcal{F}\_{\tau\_{i}}^{\Theta}) . \end{split}$$

*We extend* F<sup>Θ</sup><sub>τ</sub> *to the predicate* F<sup>Γ,Θ</sup><sub>τ</sub>*, where* Γ *ranges over arbitrary environments (possibly containing variables of type* R*), as follows:*

$$t \in \mathcal{F}\_{\tau}^{\Gamma,\Theta} \iff (\Gamma, \Theta \vdash t : \tau \land \forall \gamma.\ \gamma \in \mathcal{F}\_{\Theta}^{\Gamma} \implies t\gamma \in \mathcal{F}\_{\tau}^{\Theta}).$$

*Here,* γ *ranges over substitutions*<sup>6</sup> *and* γ ∈ F<sup>Γ</sup><sub>Θ</sub> *holds if the support of* γ *is* Γ *and* γ(x) ∈ F<sup>Θ</sup><sub>τ</sub> *for any* (x : τ) ∈ Γ*.*

Notice that Definition 1 ensures that first-order real-valued functions are in F, and asks for this property to be hereditarily preserved at higher-order types. Lemma 1 states that these conditions are indeed sufficient to guarantee that any Λ<sup>×,→,R</sup><sub>F</sub> term Θ ⊢ t : R denotes a function in F.

**Lemma 1 (Fundamental Lemma).** *For all environments* Γ, Θ *as above, and for any expression* Γ, Θ ⊢ t : τ*, we have* t ∈ F<sup>Γ,Θ</sup><sub>τ</sub>*.*

*Proof.* By induction on t, observing that F<sup>Θ</sup><sub>τ</sub> is closed under denotational semantics: if s ∈ F<sup>Θ</sup><sub>τ</sub> and [Θ ⊢ t : τ] = [Θ ⊢ s : τ], then t ∈ F<sup>Θ</sup><sub>τ</sub>. The proof follows the same structure as that of Lemma 3, and thus we omit the details here.

Finally, a straightforward application of Lemma 1 gives the desired result, namely Theorem 1.

# **5 Automatic Differentiation**

In this section, we show how we can use open logical relations to prove the correctness of (a fragment of) the automatic differentiation algorithm of [50] (suitably adapted to our calculus).

*Automatic differentiation* [8,9,35] (AD, for short) is a family of techniques to efficiently compute the *numerical* (as opposed to *symbolic*) derivative of a computer program denoting a real-valued function. Roughly speaking, AD acts on the code of a program by letting variables incorporate values for their derivatives, and operators propagate derivatives according to the *chain rule* of differential calculus [52]. Due to its vast applications in machine learning (backpropagation [49] being an example of an AD technique) and, most notably, in deep learning [9], AD is rapidly becoming a topic of interest in the programming language theory community, as witnessed by the new line of research called *differentiable programming* (see, e.g., [28,50,16,1] for some recent results on AD and programming language theory developed in the latter field).

AD comes in several modes, the two most important ones being the *forward mode* (also called *tangent mode*) and the *backward mode* (also called *reverse mode*). These can be seen as different ways to compute the chain rule: the former traverses it from inside to outside, while the latter goes from outside to inside.

<sup>6</sup> We write tγ for the result of applying γ to variables in t.

Here we are concerned with forward mode AD. More specifically, we consider the forward mode AD algorithm recently proposed in [50]. The latter is based on a source-to-source program transformation extracting from a program t a new program Dt whose evaluation simultaneously gives the result of computing t and its derivative. This is achieved by augmenting the code of t so as to handle *dual numbers*<sup>7</sup>.

The transformation roughly goes as follows: an expression s of type R is transformed into a dual number, i.e. an expression of type R × R whose first component gives the original value of s and whose second component gives the derivative of s. Real-valued function symbols are then extended to handle dual numbers by applying the chain rule, while the other constructors of the language are extended pointwise.
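Operationally, this is the familiar dual-number implementation of forward-mode AD. The following Python class is our own runtime analogue of the idea (not the source-to-source transformation of [50] itself); it shows how operators propagate derivative components via the chain rule:

```python
import math

class Dual:
    """A dual number (v, d): v is the value, d the derivative component."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, other):
        # sum rule: (s + t)' = s' + t'
        return Dual(self.v + other.v, self.d + other.d)
    def __mul__(self, other):
        # product rule: (s * t)' = s' * t + s * t'
        return Dual(self.v * other.v, self.d * other.v + self.v * other.d)

def dsin(t):
    # a real-valued operator extended to dual numbers via the chain rule
    return Dual(math.sin(t.v), math.cos(t.v) * t.d)

# Derivative of x * x + sin(x) at x = 2: seed the variable with derivative 1.
x = Dual(2.0, 1.0)
r = x * x + dsin(x)
# r.v holds the value; r.d holds the derivative 2*2 + cos(2).
```

The seeding of the variable with derivative 1 corresponds to the substitution of dual expressions performed below by the operation deriv.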

The algorithm of [50] has been studied by means of benchmarks and, to the best of the authors' knowledge, the only proof of its correctness available in the literature<sup>8</sup> has been given, at the time of writing, by Huot et al. [37]. However, the latter proof relies on *denotational* semantics, and no *operational* proof of correctness has been given so far. Differentiability being a first-order concept, open logical relations are thus a perfect candidate for such a job.

*An AD Program Transformation* In the rest of this section, given a differentiable function f : R<sup>n</sup> → R, we denote by ∂<sub>x</sub>f : R<sup>n</sup> → R its partial derivative with respect to the variable x. Let D be the collection of (real-valued) differentiable functions, and let us fix a collection F of real-valued functions such that, for any f ∈ D, both f and ∂<sub>x</sub>f belong to F. We also assume F to contain functions for real number arithmetic. Notice that since ∂<sub>x</sub>f is not necessarily differentiable, in general ∂<sub>x</sub>f ∉ D.

We begin by recalling how the program transformation of [50] works on Λ<sup>×,→,R</sup><sub>D</sub>, the extension of Λ<sup>×,→,R</sup> with operators for functions in D. In order to define the derivative of a Λ<sup>×,→,R</sup><sub>D</sub> expression, we first define an intermediate *program transformation* D : Λ<sup>×,→,R</sup><sub>D</sub> → Λ<sup>×,→,R</sup><sub>F</sub> such that:

$$
\Gamma \vdash t : \tau \implies \mathsf{D}\Gamma \vdash \mathsf{D}t : \mathsf{D}\tau.
$$

The action of D on types, environments, and expressions is defined in Figure 2. Notice that t is an expression in Λ<sup>×,→,R</sup><sub>D</sub>, whereas Dt is an expression in Λ<sup>×,→,R</sup><sub>F</sub>.

Let us comment on the definition of D, beginning with its action on types. Following the rationale behind forward-mode AD, the map D associates to the type

<sup>7</sup> We represent dual numbers [21] as pairs of the form (x, x′), with x, x′ ∈ R. The first component, namely x, is subject to the usual real number arithmetic, whereas the second component, namely x′, obeys first-order differentiation arithmetic. Dual numbers are usually presented, in analogy with complex numbers, as formal sums of the form x + x′ε, where ε is an abstract number (an infinitesimal) subject to the law ε<sup>2</sup> = 0.

<sup>8</sup> However, we remark that formal approaches to *backward* automatic differentiation for higher-order languages have been recently proposed in [1,16] (see Section 7).

$$\mathsf{DR} = \mathsf{R} \times \mathsf{R} \qquad \qquad \mathsf{D}(\cdot) = \cdot$$

$$\begin{array}{ll} \mathsf{D}(\tau\_{1} \times \tau\_{2}) = \mathsf{D}\tau\_{1} \times \mathsf{D}\tau\_{2} \quad \mathsf{D}(\tau\_{1} \to \tau\_{2}) = \mathsf{D}\tau\_{1} \to \mathsf{D}\tau\_{2} & \mathsf{D}(x : \tau, \Gamma) = \mathsf{d}x : \mathsf{D}\tau, \mathsf{D}\Gamma \\\\ \mathsf{D}\underline{r} = (\underline{r}, \underline{0}) \quad \mathsf{D}(\underline{f}(t\_{1}, \ldots, t\_{n})) = (\underline{f}(\mathsf{D}t\_{1}.1, \ldots, \mathsf{D}t\_{n}.1), \sum\_{i=1}^{n} \underline{\partial\_{x\_{i}} f}(\mathsf{D}t\_{1}.1, \ldots, \mathsf{D}t\_{n}.1) \* \mathsf{D}t\_{i}.2) \\\\ \mathsf{D}x = \mathsf{d}x \quad \mathsf{D}(\lambda x.t) = \lambda \mathsf{d}x.\mathsf{D}t \quad \mathsf{D}(st) = (\mathsf{D}s)(\mathsf{D}t) \quad \mathsf{D}(t.i) = \mathsf{D}t.i \quad \mathsf{D}(t\_{1}, t\_{2}) = (\mathsf{D}t\_{1}, \mathsf{D}t\_{2}) \end{array}$$

Fig. 2: Intermediate transformation D

R the product type R × R, the first and second components of its inhabitants being the original expression and its derivative, respectively. The action of D on non-basic types is straightforward and is designed so that the automatic differentiation machinery can handle higher-order expressions in such a way as to guarantee correctness at real-valued function types.

The action of D on the usual constructors of the λ-calculus is pointwise, although it is worth noticing that D associates to any variable x of type τ a new variable, which we denote by dx, of type Dτ. As we are going to see, if τ = R, then dx acts as a placeholder for a dual number.

More interesting is the action of D on real-valued constructors. To any numeral r, D associates the pair Dr = (r, 0), the derivative of a constant being zero. Let us now inspect the action of D on an operator f associated to f : R<sup>n</sup> → R (we treat f as a function in the variables x<sub>1</sub>, ..., x<sub>n</sub>). The interesting part is the second component of D(f(t<sub>1</sub>, ..., t<sub>n</sub>)), namely

$$\sum\_{i=1}^{n} \underline{\partial\_{x\_i} f}(\mathsf{D}t\_1.1, \ldots, \mathsf{D}t\_n.1) \* \mathsf{D}t\_i.2$$

where ∑<sub>i=1</sub><sup>n</sup> and ∗ denote the operators (of Λ<sup>×,→,R</sup><sub>F</sub>) associated to summation and (binary) multiplication (for readability we omit the underline notation), and ∂<sub>xᵢ</sub>f is the operator (of Λ<sup>×,→,R</sup><sub>F</sub>) associated to the partial derivative ∂<sub>xᵢ</sub>f of f in the variable x<sub>i</sub>. It is not hard to recognize that the above expression is nothing but an instance of the *chain rule*.

Finally, we notice that if Γ ⊢ t : τ is a (derivable) judgment in Λ<sup>×,→,R</sup><sub>D</sub>, then DΓ ⊢ Dt : Dτ is indeed a (derivable) judgment in Λ<sup>×,→,R</sup><sub>F</sub>.

*Example 1.* Let us consider the binary function f(x<sub>1</sub>, x<sub>2</sub>) = sin(x<sub>1</sub>) + cos(x<sub>2</sub>). For readability, we overload the notation, writing f in place of f (and similarly for ∂<sub>xᵢ</sub>f). Given expressions t<sub>1</sub>, t<sub>2</sub>, we compute D(sin(t<sub>1</sub>) + cos(t<sub>2</sub>)). Recall that

$$\begin{split} \partial\_{x\_1} f(x\_1, x\_2) &= \cos(x\_1) \text{ and } \partial\_{x\_2} f(x\_1, x\_2) = -\sin(x\_2). \text{ We have:} \\ \mathsf{D}(\sin(t\_1) + \cos(t\_2)) \\ &= (\sin(\mathsf{D}t\_1.1) + \cos(\mathsf{D}t\_2.1), \partial\_{x\_1} f(\mathsf{D}t\_1.1, \mathsf{D}t\_2.1) \ast \mathsf{D}t\_1.2 + \partial\_{x\_2} f(\mathsf{D}t\_1.1, \mathsf{D}t\_2.1) \ast \mathsf{D}t\_2.2) \\ &= (\sin(\mathsf{D}t\_1.1) + \cos(\mathsf{D}t\_2.1), \cos(\mathsf{D}t\_1.1) \ast \mathsf{D}t\_1.2 - \sin(\mathsf{D}t\_2.1) \ast \mathsf{D}t\_2.2). \end{split}$$

As a consequence, we see that D(λx.λy. sin(x) + cos(y)) is

λdx.λdy.(sin(dx.1) + cos(dy.1), cos(dx.1) ∗ dx.2 − sin(dy.1) ∗ dy.2).
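The pair computed in Example 1 can be checked numerically. The sketch below (our own encoding of Dt as a Python function on pairs, for illustration only) compares the second component against both the closed-form partial derivative and a finite-difference approximation:

```python
import math

def Dt(dx, dy):
    # Dt for t = sin(x) + cos(y), as computed in Example 1; dx, dy are dual pairs
    return (math.sin(dx[0]) + math.cos(dy[0]),
            math.cos(dx[0]) * dx[1] - math.sin(dy[0]) * dy[1])

x, y, h = 0.3, 1.2, 1e-6
# derivative w.r.t. x: substitute dual_x(x) = (x, 1) and dual_x(y) = (y, 0)
val, dval = Dt((x, 1.0), (y, 0.0))
fd = ((math.sin(x + h) + math.cos(y)) - (math.sin(x) + math.cos(y))) / h

assert abs(dval - math.cos(x)) < 1e-12   # ∂x(sin x + cos y) = cos x
assert abs(dval - fd) < 1e-4             # agrees with finite differences
```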

We now aim to define the derivative of an expression x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R with respect to a variable x (of type R). In order to do so, we first associate to any variable y : R its dual expression dual<sub>x</sub>(y) : R × R, defined as:

$$\mathsf{dual}\_x(y) = \begin{cases} (y, \underline{1}) & \text{if } x = y \\ (y, \underline{0}) & \text{otherwise.} \end{cases}$$

Next, we define for x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R the derivative deriv(x, t) of t with respect to x as:

$$\mathsf{deriv}(x,t) = \mathsf{D}t[\mathsf{dual}\_x(x\_1)/\mathsf{d}x\_1, \ldots, \mathsf{dual}\_x(x\_n)/\mathsf{d}x\_n].2$$

Let us clarify this passage with a simple example.

*Example 2.* Let us compute the derivative of x : R, y : R ⊢ t : R, where t = x ∗ y. We first compute Dt, obtaining:

$$\mathsf{d}x : \mathsf{R} \times \mathsf{R}, \mathsf{d}y : \mathsf{R} \times \mathsf{R} \vdash ((\mathsf{d}x.1) \ast (\mathsf{d}y.1), (\mathsf{d}x.1) \ast (\mathsf{d}y.2) + (\mathsf{d}x.2) \ast (\mathsf{d}y.1)) : \mathsf{R} \times \mathsf{R}.$$

Observing that dual<sub>x</sub>(x) = (x, 1) and dual<sub>x</sub>(y) = (y, 0), we obtain the desired derivative as x : R, y : R ⊢ Dt[dual<sub>x</sub>(x)/dx, dual<sub>x</sub>(y)/dy].2 : R. Indeed, we have:

$$\begin{aligned} & [ x: \mathbb{R}, y: \mathbb{R} \vdash \mathsf{D}t[\mathsf{dual}\_x(x)/\mathsf{d}x, \mathsf{dual}\_x(y)/\mathsf{d}y].2 : \mathbb{R} ] \\ &= [ x: \mathbb{R}, y: \mathbb{R} \vdash (x \* y, x \* 0 + 1 \* y).2 : \mathbb{R} ] \\ &= [ x: \mathbb{R}, y: \mathbb{R} \vdash y : \mathbb{R} ] = \partial\_x [ x: \mathbb{R}, y: \mathbb{R} \vdash x \* y : \mathbb{R} ]. \end{aligned}$$
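The derivation of Example 2 can be replayed mechanically. In the following sketch (our hypothetical encoding, with `deriv_x` playing the role of deriv(x, t)), we substitute the dual expressions and project the second component:

```python
def Dt(dx, dy):
    # Dt for t = x * y: ((dx.1)*(dy.1), (dx.1)*(dy.2) + (dx.2)*(dy.1))
    return (dx[0] * dy[0], dx[0] * dy[1] + dx[1] * dy[0])

def deriv_x(x, y):
    # deriv(x, t): substitute dual_x(x) = (x, 1), dual_x(y) = (y, 0), take .2
    return Dt((x, 1.0), (y, 0.0))[1]

# ∂x(x * y) = y, independently of the value of x
assert deriv_x(2.0, 5.0) == 5.0
assert deriv_x(-7.0, 5.0) == 5.0
```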

*Remark 1.* For Θ = x<sub>1</sub> : R, ..., x<sub>n</sub> : R we have Θ ⊢ dual<sub>y</sub>(x<sub>i</sub>) : DR and Θ ⊢ Ds[dual<sub>y</sub>(x<sub>1</sub>)/dx<sub>1</sub>, ..., dual<sub>y</sub>(x<sub>n</sub>)/dx<sub>n</sub>] : Dτ, for any variable y and any Θ ⊢ s : τ.

*Open Logical Relations for AD* We have claimed that the operation deriv performs automatic differentiation of Λ<sup>×,→,R</sup><sub>D</sub> expressions. By that we mean that, once applied to expressions of the form x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R, the operation deriv can be used to compute the derivative of [x<sub>1</sub> : R, ..., x<sub>n</sub> : R ⊢ t : R]. We now show how we can prove such a statement using open logical relations, this way providing a proof of correctness of our AD program transformation.

We begin by defining a logical relation R between Λ<sup>×,→,R</sup><sub>D</sub> and Λ<sup>×,→,R</sup><sub>F</sub> expressions. We design R in such a way that (i) t R Dt, and (ii) if t R s and t inhabits a first-order type, then s indeed corresponds to the derivative of t. While (ii) essentially holds by definition, (i) requires some effort to be proved.

**Definition 2 (Open Logical Relation).** *Let* Θ = x<sub>1</sub> : R, ..., x<sub>n</sub> : R *be a fixed, arbitrary environment. Define the family of relations* (R<sup>Θ</sup><sub>τ</sub>)<sub>Θ,τ</sub> *between* Λ<sup>×,→,R</sup><sub>D</sub> *and* Λ<sup>×,→,R</sup><sub>F</sub> *expressions by induction on* τ *as follows:*

$$\begin{split} &t\,\mathcal{R}\_{\mathbb{R}}^{\Theta}s \iff \begin{cases} \Theta \vdash t : \mathbb{R} \wedge \mathbb{D}\Theta \vdash s : \mathbb{R} \times \mathbb{R} \\ \forall y : \mathbb{R} \\ \left[\Theta \vdash s[\mathtt{dual}\_{y}(x\_{1})/\mathtt{dx}\_{1}, \ldots, \mathtt{dual}\_{y}(x\_{n})/\mathtt{dx}\_{n}].1 : \mathbb{R}\right] = \left[\Theta \vdash t : \mathbb{R}\right] \\ \left[\Theta \vdash s[\mathtt{dual}\_{y}(x\_{1})/\mathtt{dx}\_{1}, \ldots, \mathtt{dual}\_{y}(x\_{n})/\mathtt{dx}\_{n}].2 : \mathbb{R}\right] = \partial\_{y}[\Theta \vdash t : \mathbb{R}] \\ t\,\mathcal{R}\_{\tau\_{1}\to\tau\_{2}}^{\Theta}s \iff \left\{\begin{array}{l} \Theta \vdash t : \tau\_{1} \to \tau\_{2} \wedge \mathbb{D}\Theta \vdash s : \mathbb{D}\tau\_{1} \to \mathbb{D}\tau\_{2} \\ \forall p,q \;\, p\,\mathcal{R}\_{\tau\_{1}}^{\Theta}q \implies t p \,\mathcal{R}\_{\tau\_{2}}^{\Theta}sq \end{array} \right. \\ &t\,\mathcal{R}\_{\tau\_{1}\times\tau\_{2}}^{\Theta}s \iff \left\{\begin{array}{l} \Theta \vdash t : \tau\_{1} \times \tau\_{2} \wedge \mathbb{D}\Theta \vdash s : \mathbb{D}\tau\_{1} \times \mathbb{D}\tau\_{2} \\ \forall i \in \{1,2\} \; \, t.\, i\,\mathcal{R}\_{\tau\_{i}}^{\Theta}s .i \end{array} \right. \end{split}$$

*We extend* $\mathcal{R}^{\Theta}_{\tau}$ *to the family* $(\mathcal{R}^{\Gamma,\Theta}_{\tau})_{\Gamma,\Theta,\tau}$*, where* $\Gamma$ *ranges over arbitrary environments (possibly containing variables of type* $\mathsf{R}$*), as follows:*

$$t\,\mathcal{R}^{\Gamma,\Theta}_{\tau}\,s \iff (\Gamma,\Theta \vdash t : \tau) \land (\mathsf{D}\Gamma,\mathsf{D}\Theta \vdash s : \mathsf{D}\tau) \land (\forall \gamma,\delta.\ \gamma\,\mathcal{R}^{\Gamma}_{\Theta}\,\delta \implies t\gamma\,\mathcal{R}^{\Theta}_{\tau}\,s\delta)$$

*where* γ*,* δ *range over substitutions, and:*

$$\gamma\,\mathcal{R}^{\Gamma}_{\Theta}\,\delta \iff (\mathsf{supp}(\gamma) = \Gamma) \land (\mathsf{supp}(\delta) = \mathsf{D}\Gamma) \land (\forall (x : \tau) \in \Gamma.\ \gamma(x)\,\mathcal{R}^{\Theta}_{\tau}\,\delta(\mathsf{d}x)).$$

Obviously, Definition 2 satisfies condition (ii) above. What remains to be done is to show that it satisfies condition (i) as well. In order to prove this result, we first need to show that the logical relation respects the denotational semantics of $\Lambda^{\times,\to,\mathsf{R}}_{\mathsf{D}}$.

**Lemma 2.** *Let* $\Theta = x_1 : \mathsf{R}, \ldots, x_n : \mathsf{R}$*. Then, the following hold:*

$$\begin{aligned} t'\,\mathcal{R}^{\Theta}_{\tau}\,s \wedge [\Theta \vdash t : \tau] = [\Theta \vdash t' : \tau] &\implies t\,\mathcal{R}^{\Theta}_{\tau}\,s \\ t\,\mathcal{R}^{\Theta}_{\tau}\,s' \wedge [\mathsf{D}\Theta \vdash s' : \mathsf{D}\tau] = [\mathsf{D}\Theta \vdash s : \mathsf{D}\tau] &\implies t\,\mathcal{R}^{\Theta}_{\tau}\,s. \end{aligned}$$

*Proof.* A standard induction on τ .

We are now ready to state and prove the main result of this section.

**Lemma 3 (Fundamental Lemma).** *For all environments* $\Gamma, \Theta$ *and for any expression* $\Gamma, \Theta \vdash t : \tau$*, we have* $t\,\mathcal{R}^{\Gamma,\Theta}_{\tau}\,\mathsf{D}t$*.*

*Proof.* We prove the following statement, by induction on t:

$$\forall t.\ \forall \tau.\ \forall \Gamma, \Theta.\ (\Gamma, \Theta \vdash t : \tau \implies t\,\mathcal{R}^{\Gamma,\Theta}_{\tau}\,\mathsf{D}t).$$

We show only the most relevant cases. Suppose t is a variable x. We distinguish whether x belongs to Γ or Θ.

1. Suppose $(x : \mathsf{R}) \in \Theta$. We have to show $x\,\mathcal{R}^{\Gamma,\Theta}_{\mathsf{R}}\,\mathsf{d}x$, i.e.

$$\begin{aligned} \left[\Theta \vdash \mathsf{d}x[\mathtt{dual}_y(x)/\mathsf{d}x].1 : \mathsf{R}\right] &= \left[\Theta \vdash x : \mathsf{R}\right] \\ \left[\Theta \vdash \mathsf{d}x[\mathtt{dual}_y(x)/\mathsf{d}x].2 : \mathsf{R}\right] &= \partial_y\left[\Theta \vdash x : \mathsf{R}\right] \end{aligned}$$

for any variable $y$ (of type $\mathsf{R}$). The first identity obviously holds, as

$$\left[\Theta \vdash \mathsf{d}x[\mathtt{dual}_y(x)/\mathsf{d}x].1 : \mathsf{R}\right] = \left[\Theta \vdash \mathsf{d}x[(x, b)/\mathsf{d}x].1 : \mathsf{R}\right] = \left[\Theta \vdash x : \mathsf{R}\right],$$

where $b \in \{0, 1\}$. For the second identity we distinguish whether $y = x$ or $y \neq x$. In the former case we have $\mathtt{dual}_y(x) = (x, 1)$, and thus:

$$\left[\Theta \vdash \mathsf{d}x[\mathtt{dual}_y(x)/\mathsf{d}x].2 : \mathsf{R}\right] = \left[\Theta \vdash \underline{1} : \mathsf{R}\right] = \partial_y\left[\Theta \vdash y : \mathsf{R}\right].$$

In the latter case we have $\mathtt{dual}_y(x) = (x, 0)$, and thus:

$$\left[\Theta \vdash \mathsf{d}x[\mathtt{dual}_y(x)/\mathsf{d}x].2 : \mathsf{R}\right] = \left[\Theta \vdash \underline{0} : \mathsf{R}\right] = \partial_y\left[\Theta \vdash x : \mathsf{R}\right].$$

2. Suppose $(x : \tau) \in \Gamma$. We have to show $x\,\mathcal{R}^{\Gamma,\Theta}_{\tau}\,\mathsf{d}x$, i.e. $\gamma(x)\,\mathcal{R}^{\Theta}_{\tau}\,\delta(\mathsf{d}x)$, for all substitutions $\gamma, \delta$ such that $\gamma\,\mathcal{R}^{\Gamma}_{\Theta}\,\delta$. Since $x$ belongs to $\Gamma$, we are trivially done.

Suppose t is λx.s, so that we have

$$\frac{\Gamma, \Theta, x:\tau\_1 \vdash s:\tau\_2}{\Gamma, \Theta \vdash \lambda x.s:\tau\_1 \to \tau\_2}$$

for some types $\tau_1, \tau_2$. As $x$ is bound in $\lambda x.s$, without loss of generality we can assume $(x : \tau_1) \notin \Gamma \cup \Theta$. Let $\Delta = \Gamma, x : \tau_1$, so that we have $\Delta, \Theta \vdash s : \tau_2$, and thus $s\,\mathcal{R}^{\Delta,\Theta}_{\tau_2}\,\mathsf{D}s$, by the induction hypothesis. By definition of the open logical relation, we have to prove that for arbitrary $\gamma, \delta$ such that $\gamma\,\mathcal{R}^{\Gamma}_{\Theta}\,\delta$, we have

$$
\lambda x.s\gamma \,\mathcal{R}^{\Theta}_{\tau_1 \to \tau_2}\, \lambda \mathsf{d}x.(\mathsf{D}s)\delta,
$$

i.e. $(\lambda x.s\gamma)p\,\mathcal{R}^{\Theta}_{\tau_2}\,(\lambda \mathsf{d}x.(\mathsf{D}s)\delta)q$, for all $p\,\mathcal{R}^{\Theta}_{\tau_1}\,q$. Let us fix a pair $(p, q)$ as above. By Lemma 2, it is sufficient to show $(s\gamma)[p/x]\,\mathcal{R}^{\Theta}_{\tau_2}\,((\mathsf{D}s)\delta)[q/\mathsf{d}x]$. Let $\gamma', \delta'$ be the substitutions defined as follows:

$$\gamma'(y) = \begin{cases} p & \text{if } y = x \\ \gamma(y) & \text{otherwise} \end{cases} \qquad \delta'(y) = \begin{cases} q & \text{if } y = \mathbf{d}x \\ \delta(y) & \text{otherwise} \end{cases}$$

It is easy to see that $\gamma'\,\mathcal{R}^{\Delta}_{\Theta}\,\delta'$, so that by $s\,\mathcal{R}^{\Delta,\Theta}_{\tau_2}\,\mathsf{D}s$ (recall that the latter follows by the induction hypothesis) we infer $s\gamma'\,\mathcal{R}^{\Theta}_{\tau_2}\,(\mathsf{D}s)\delta'$, by the very definition of the open logical relation. As a consequence, the thesis is proved if we show

$$(s\gamma)[p/x] = s\gamma'; \qquad\qquad ((\mathsf{D}s)\delta)[q/\mathsf{d}x] = (\mathsf{D}s)\delta'.$$

The above identities hold if $x \notin FV(\gamma(y))$ and $\mathsf{d}x \notin FV(\delta(\mathsf{d}y))$, for any $(y : \tau) \in \Gamma$. This is indeed the case, since $\gamma(y)\,\mathcal{R}^{\Theta}_{\tau}\,\delta(\mathsf{d}y)$ implies $\Theta \vdash \gamma(y) : \tau$ and $\mathsf{D}\Theta \vdash \delta(\mathsf{d}y) : \mathsf{D}\tau$, and $x \notin \Theta$ (and thus $\mathsf{d}x \notin \mathsf{D}\Theta$).

A direct application of Lemma 3 allows us to conclude the correctness of the program transformation $\mathsf{D}$. In fact, given a first-order term $\Theta \vdash t : \mathsf{R}$, with $\Theta = x_1 : \mathsf{R}, \ldots, x_n : \mathsf{R}$, by Lemma 3 we have $t\,\mathcal{R}^{\Theta}_{\mathsf{R}}\,\mathsf{D}t$, and thus

$$
\partial_y \left[\Theta \vdash t : \mathsf{R}\right] = \left[\Theta \vdash \mathsf{D}t[\mathtt{dual}_y(x_1)/\mathsf{d}x_1, \ldots, \mathtt{dual}_y(x_n)/\mathsf{d}x_n].2 : \mathsf{R}\right],
$$

for any real-valued variable $y$, meaning that $\mathsf{D}t$ indeed computes the partial derivative of $t$.

**Theorem 2.** *For any term* $\Theta \vdash t : \mathsf{R}$ *as above, the term* $\mathsf{D}\Theta \vdash \mathsf{D}t : \mathsf{D}\mathsf{R}$ *computes the partial derivative of* $t$*, i.e., for any variable* $y$ *we have*

$$\partial_y \left[\Theta \vdash t : \mathsf{R}\right] = \left[\Theta \vdash \mathsf{D}t[\mathtt{dual}_y(x_1)/\mathsf{d}x_1, \ldots, \mathtt{dual}_y(x_n)/\mathsf{d}x_n].2 : \mathsf{R}\right].$$
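To make Theorem 2 concrete, here is a minimal dual-number sketch of forward-mode differentiation in Python. This is our illustration, with hypothetical names `Dual` and `dual`; it is not the paper's transformation $\mathsf{D}$ itself, but it mirrors the idea that substituting $\mathtt{dual}_y(x_i)$ for $\mathsf{d}x_i$ and reading the second projection computes the partial derivative with respect to $y$.

```python
# Minimal dual-number sketch of forward-mode differentiation (our
# illustration; `Dual` and `dual` are hypothetical names, not the paper's D).
class Dual:
    """A pair (value, tangent), playing the roles of the two projections."""
    def __init__(self, val, der):
        self.val, self.der = val, der

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def dual(x, is_y):
    """dual_y(x): pair x with 1 if x is the differentiation variable y, else 0."""
    return Dual(x, 1.0 if is_y else 0.0)

# t(x1, x2) = x1 * x2 + x1; taking the '.2' projection means reading .der
def t(x1, x2):
    return x1 * x2 + x1

# partial derivative w.r.t. x1 at (3, 5) is x2 + 1 = 6
d = t(dual(3.0, True), dual(5.0, False))
print(d.val, d.der)   # 18.0 6.0
```

Seeding the tangent component of exactly one variable to $1$ mirrors the substitution $\mathtt{dual}_y(x_i)/\mathsf{d}x_i$ in the statement of the theorem.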

# **6 On Refinement Types and Local Continuity**

In Section 4, we exploited open logical relations to establish a containment theorem for the calculus $\Lambda^{\times,\to,\mathsf{R}}_{F}$, i.e. the calculus $\Lambda^{\times,\to,\mathsf{R}}$ extended with real-valued functions belonging to a set $F$ including projections and closed under function composition. Since the collection $C$ of (real-valued) *continuous* functions satisfies both constraints, Theorem 1 allows us to conclude that all first-order terms of $\Lambda^{\times,\to,\mathsf{R}}_{C}$ represent continuous functions.

The aim of the present section is to develop a framework for proving continuity properties of programs in a calculus that goes *beyond* $\Lambda^{\times,\to,\mathsf{R}}_{C}$. More specifically, (i) we do not restrict our analysis to calculi having operators representing only continuous real-valued functions, but consider operators for arbitrary real-valued functions, and (ii) we add to our calculus an if-then-else construct whose static semantics is captured by the following rule:

$$\frac{\Gamma \vdash t : \mathsf{R} \qquad \Gamma \vdash s : \tau \qquad \Gamma \vdash p : \tau}{\Gamma \vdash \texttt{if } t \texttt{ then } s \texttt{ else } p : \tau}$$

The intended dynamic semantics of the term if $t$ then $s$ else $p$ is the same as that of $s$ whenever $t$ evaluates to any real number $r \neq 0$, and the same as that of $p$ if $t$ evaluates to $0$.

Notice that the crux of the problem we aim to solve is the presence of the if-then-else construct. Indeed, independently of point (i), such a construct breaks the global continuity of programs, as illustrated in Figure 3a. As a consequence we are forced to look at *local* continuity properties instead: for instance, we can say that the program of Figure 3a is continuous both on $\mathbb{R}_{<0}$ and on $\mathbb{R}_{\geq 0}$. Observe that guaranteeing local continuity allows us (up to a certain point) to recover the ability of approximating the output of a program by approximating its input. Indeed, if a program $t : \mathsf{R} \times \cdots \times \mathsf{R} \to \mathsf{R}$ is *locally continuous* on a subset $X$ of $\mathbb{R}^n$, then the value of $ts$ (for some input $s$) can be approximated

Fig. 3: Simply typed first-order programs with branches

by passing as argument to $t$ a family $(s_n)_{n \in \mathbb{N}}$ of approximations of $s$, *as long as* both $s$ and all the $(s_n)_{n \in \mathbb{N}}$ are indeed elements of $X$. Notice that the continuity domains we are interested in are not necessarily open sets: we could for instance be interested in functions that are continuous on the unit circle, i.e. on the set of points $\{(a, b) \mid a^2 + b^2 = 1\} \subseteq \mathbb{R}^2$. For this reason we will work with the notion of *sequential* continuity, instead of the usual topological notion of continuity. It must be observed, however, that these two notions coincide as soon as the continuity domain $X$ is actually an open set.

**Definition 3 (Sequential Continuity).** *Let* $f : \mathbb{R}^n \to \mathbb{R}$*, and let* $X$ *be any subset of* $\mathbb{R}^n$*. We say that* $f$ *is* (sequentially) continuous *on* $X$ *if for every* $x \in X$*, and for every sequence* $(x_n)_{n \in \mathbb{N}}$ *of elements of* $X$ *such that* $\lim_{n \to \infty} x_n = x$*, it holds that* $\lim_{n \to \infty} f(x_n) = f(x)$*.*
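Sequential continuity can be probed numerically, one approximating sequence at a time; the sketch below is ours (the helper `seq_continuous_at` is hypothetical), and such a test is of course only a necessary check, not a proof. The function used is the one of Figure 3a, which is continuous on $\mathbb{R}_{\geq 0}$ but not on all of $\mathbb{R}$.

```python
def seq_continuous_at(f, x, seq, tol=1e-6):
    """Check lim f(x_n) = f(x) along one approximating sequence `seq`
    (a necessary, not sufficient, numerical probe of Definition 3)."""
    fx = f(x)
    return all(abs(f(xn) - fx) < tol for xn in seq[-5:])

# The function of Figure 3a: f(x) = -x if x < 0, x + 1 otherwise.
f = lambda x: -x if x < 0 else x + 1

xs_right = [1.0 / 2**n for n in range(1, 40)]   # sequence in R_{>=0} tending to 0
xs_left = [-1.0 / 2**n for n in range(1, 40)]   # sequence in R_{<0} tending to 0

print(seq_continuous_at(f, 0.0, xs_right))  # True: within R_{>=0}, f is continuous at 0
print(seq_continuous_at(f, 0.0, xs_left))   # False: lim f(x_n) = 0, but f(0) = 1
```

The second call illustrates why the sequence is required to live inside the continuity domain $X$: the left-hand sequence leaves $\mathbb{R}_{\geq 0}$, and the limit of the images no longer matches $f(0)$.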

In [18], Chaudhuri et al. introduced a logical system designed to guarantee local continuity properties of programs in an *imperative* (first-order) programming language with conditional branches and loops. In this section, we develop a similar system in the setting of a *higher-order functional language* with an if-then-else construct, and we use open logical relations to prove the soundness of our system. This witnesses, in yet another situation, the versatility of open logical relations. Compared to [18], we somehow generalize from a result on programs built only from first-order constructs and primitive functions to a containment result for programs built using higher-order constructs as well.

We mention, however, that although our system is inspired by the work of Chaudhuri et al., there are significant differences between the two, even at the first-order level. The consequences these differences have on the expressive power of our system are twofold:

• On the one hand, while inferring continuity on some domain $X$ for a program of the form if $t$ then $s$ else $p$, we have more flexibility than [18] in choosing the domains of continuity of $s$ and $p$. To be more concrete, let us consider the program $\lambda x.(\texttt{if } (x > 0) \texttt{ then } 0 \texttt{ else } (\texttt{if } x = 4 \texttt{ then } 1 \texttt{ else } 0))$, which is continuous on $\mathbb{R}$ even though the second branch is continuous on $\mathbb{R}_{\leq 0}$, but not on $\mathbb{R}$. We are able to show in our system that this program is indeed continuous on *the whole* domain $\mathbb{R}$, while Chaudhuri et al. cannot do the same in their system for the corresponding imperative program: they require the domain of continuity of *each* of the two branches to *coincide* with the domain of continuity of the whole program.

• On the other hand, the system of Chaudhuri et al. allows one to express continuity along a restricted set of variables, which we cannot do. To illustrate this, let us look at the program $\lambda x, y.\texttt{if } (x = 0) \texttt{ then } (3 * y) \texttt{ else } (4 * y)$: along the variable $y$, this program is continuous on the whole of $\mathbb{R}$. Chaudhuri et al. are able to express and prove this statement in their system, while we can only say that for every real $a$, this program is continuous on the domain $\{a\} \times \mathbb{R}$.
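The first of the two example programs above can also be checked operationally. In this Python rendering (ours), the discontinuous inner branch is only ever reached for $x \leq 0$, where it is constantly $0$, so the whole program is the constant-zero function and hence continuous on all of $\mathbb{R}$.

```python
# Python rendering (ours) of
#   λx. if (x > 0) then 0 else (if x = 4 then 1 else 0)
def prog(x):
    if x > 0:
        return 0
    # inner branch: discontinuous at 4 in isolation, but only reached for x <= 0
    return 1 if x == 4 else 0

samples = [-10.0, -4.0, 0.0, 1e-9, 4.0, 100.0]
print([prog(x) for x in samples])   # [0, 0, 0, 0, 0, 0]
```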

For the sake of simplicity, we slightly restrict our calculus; the ideas we present here would still be valid in a more general setting, but that would make the presentation and proofs more involved. As usual, let $F$ be a collection of real-valued functions. We consider the restriction of the calculus $\Lambda^{\times,\to,\mathsf{R}}_{F}$ obtained by considering types of the form

$$\tau ::= \mathsf{R} \mid \rho; \qquad\qquad \rho ::= \rho_1 \times \cdots \times \rho_n \times \underbrace{\mathsf{R} \times \cdots \times \mathsf{R}}_{m\text{ times}} \to \tau;$$

only. For the sake of readability, we employ the notation $(\rho_1, \ldots, \rho_n, \mathsf{R}, \ldots, \mathsf{R}) \to \tau$ in place of $\rho_1 \times \cdots \times \rho_n \times \mathsf{R} \times \cdots \times \mathsf{R} \to \tau$. We also overload the notation and keep indicating the resulting calculus as $\Lambda^{\times,\to,\mathsf{R}}_{F}$. Nonetheless, the reader should keep in mind that from now on, whenever referring to a $\Lambda^{\times,\to,\mathsf{R}}_{F}$ term, we are tacitly referring to a term typable according to the restricted type system, but which can indeed contain conditionals.

Since we want to be able to talk about *composition properties* of locally continuous programs, we actually need to talk not only about the points where a program is continuous, but also about the *image* of this continuity domain. In higher-order languages, a well-established framework for the latter kind of specification is that of *refinement types*, first introduced in [31] in the context of ML types: the basic idea is to annotate an existing type system with logical formulas, with the aim of being more precise about the underlying program's behavior than with simple types. Here, we are going to adapt this framework by replacing the image annotations provided by standard refinement types with *continuity annotations*.

#### **6.1 A Refinement Type System Ensuring Local Continuity**

Our refinement type system is developed on top of the simple type system of Section 2 (actually, on the simplification of that system considered in this section). We first need to introduce a set of logical formulas which talk about $n$-tuples of real numbers, and which we use as annotations in our refinement types. We consider a set $\mathcal{V}$ of logical variables, and we construct formulas as follows:

$$\begin{array}{lclcl}\psi,\phi\in\mathcal{L} & ::= & \top & \mid & (e\leq e) & \mid & \psi\wedge\phi & \mid & \neg\psi, \\\\ e\in\mathcal{E} & ::= & \alpha & \mid & a & \mid & f(e,\ldots,e) & \quad \text{with } \alpha\in\mathcal{V}, a\in\mathbb{R}, f:\mathbb{R}^{n}\to\mathbb{R}.\end{array}$$

Recall that with the connectives in our logic we are able to encode logical disjunction and implication; as customary, we write $\phi \Rightarrow \psi$ for $\neg\phi \vee \psi$. A *real assignment* is a partial map $\sigma : \mathcal{V} \to \mathbb{R}$. When $\sigma$ has finite support, we sometimes specify $\sigma$ by writing $(\alpha_1 \mapsto \sigma(\alpha_1), \ldots, \alpha_n \mapsto \sigma(\alpha_n))$. We write $\sigma \models \phi$ when $\sigma$ is defined on the variables occurring in $\phi$, and moreover the real formula obtained by replacing the logical variables of $\phi$ along $\sigma$ is true. We write $\models \phi$ when $\sigma \models \phi$ always holds, independently of $\sigma$.

We can associate to every formula the subset of $\mathbb{R}^n$ consisting of all points where the formula holds: more precisely, if $\phi$ is a formula, and $X = \alpha_1, \ldots, \alpha_n$ is a list of logical variables such that $\mathrm{Vars}(\phi) \subseteq X$, we call the *truth domain of* $\phi$ *w.r.t.* $X$ the set:

$$\mathrm{Dom}(\phi)^{X} = \{(a_1, \ldots, a_n) \in \mathbb{R}^n \mid (\alpha_1 \mapsto a_1, \ldots, \alpha_n \mapsto a_n) \models \phi\}.$$
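Membership in a truth domain is just predicate evaluation; the sketch below is ours, with the hypothetical choice of representing a formula as a Python predicate over an assignment from logical variables to reals.

```python
def in_truth_domain(phi, variables, point):
    """Does (a_1, ..., a_n) belong to Dom(phi)^{alpha_1 ... alpha_n}?
    `phi` is a predicate over an assignment {logical variable: real}."""
    sigma = dict(zip(variables, point))   # (alpha_1 -> a_1, ..., alpha_n -> a_n)
    return phi(sigma)

# phi = (0 <= a1) AND NOT (a2 <= 1), built from the connectives of the logic
phi = lambda s: (0 <= s["a1"]) and not (s["a2"] <= 1)

print(in_truth_domain(phi, ["a1", "a2"], (2.0, 3.0)))   # True
print(in_truth_domain(phi, ["a1", "a2"], (2.0, 0.5)))   # False
```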

We are now ready to define the language of refinement types, which can be seen as simple types annotated by logical formulas. The type $\mathsf{R}$ is annotated by logical *variables*: this way we obtain *refinement real types* of the form $\{\alpha \in \mathsf{R}\}$. The crux of our refinement type system consists in the annotations we put *on the arrows*. We introduce two distinct refined arrow constructs, depending on the shape of the target type: more precisely, we annotate the arrow of a type $(T_1, \ldots, T_n) \to \mathsf{R}$ with *two* logical formulas, while we annotate $(T_1, \ldots, T_n) \to H$ (where $H$ is a higher-order type) with only *one* logical formula. This way, we obtain refined arrow types of the form $(T_1, \ldots, T_n) \xrightarrow{\psi \leadsto \phi} \{\alpha \in \mathsf{R}\}$ and $(T_1, \ldots, T_n) \xrightarrow{\psi} H$: in both cases the formula $\psi$ specifies the continuity domain, while the formula $\phi$ is an *image annotation* used only when the target type is ground. The intuition is as follows: a program of type $(H_1, \ldots, H_m, \{\alpha_1 \in \mathsf{R}\}, \ldots, \{\alpha_n \in \mathsf{R}\}) \xrightarrow{\psi \leadsto \phi} \{\alpha \in \mathsf{R}\}$ uses its real arguments continuously on the domain specified by the formula $\psi$ (w.r.t. $\alpha_1, \ldots, \alpha_n$), and this domain is sent into the domain specified by the formula $\phi$ (w.r.t. $\alpha$). Similarly, a program of the type $(T_1, \ldots, T_n) \xrightarrow{\psi} H$ has its real arguments used in a continuous way on the domain specified by $\psi$, but it is not possible anymore to specify an image domain, because $H$ is higher-order.

The general form of our refined types is thus as follows:

$$\begin{array}{lcl} T & ::= & H \;\mid\; F; \qquad\qquad F ::= \{\alpha \in \mathsf{R}\}; \\ H & ::= & (H_1, \ldots, H_m, F_1, \ldots, F_n) \xrightarrow{\psi} H \;\mid\; (H_1, \ldots, H_m, F_1, \ldots, F_n) \xrightarrow{\psi \leadsto \phi} F; \end{array}$$

with $n + m > 0$, $\mathrm{Vars}(\phi) \subseteq \{\alpha\}$ and $\mathrm{Vars}(\psi) \subseteq \{\alpha_1, \ldots, \alpha_n\}$ when $F = \{\alpha \in \mathsf{R}\}$ and $F_i = \{\alpha_i \in \mathsf{R}\}$, and where the $(\alpha_i)_{1 \leq i \leq n}$ are distinct. We take refinement types up to renaming of logical variables. If $T$ is a refinement type, we write $\overline{T}$ for the simple type we obtain by forgetting about the annotations in $T$.

*Example 3.* We illustrate in this example the intended meaning of our refinement types.

• We first look at how to refine $\mathsf{R} \to \mathsf{R}$: those are types of the form $\{\alpha_1 \in \mathsf{R}\} \xrightarrow{\phi_1 \leadsto \phi_2} \{\alpha_2 \in \mathsf{R}\}$. The intended inhabitants of these types are the programs $t : \mathsf{R} \to \mathsf{R}$ such that i) $t$ is continuous on the truth domain of $\phi_1$; and ii) $t$ sends the truth domain of $\phi_1$ into the truth domain of $\phi_2$. As an example, $\phi_1$ could be $(\alpha_1 < 3)$, and $\phi_2$ could be $(\alpha_2 \geq 5)$. An example of a program having this type is $t = \lambda x.(5 + \underline{f}(x))$, where $f : \mathbb{R} \to \mathbb{R}$ is defined as
$$f(a) = \begin{cases} \frac{1}{3 - a} & \text{when } a < 3 \\ 0 & \text{otherwise}, \end{cases}$$
and moreover we assume that $\{f, +\} \subseteq F$.

• We look now at the possible refinements of $\mathsf{R} \to (\mathsf{R} \to \mathsf{R})$: those are of the form $\{\alpha_1 \in \mathsf{R}\} \xrightarrow{\theta_1} (\{\alpha_2 \in \mathsf{R}\} \xrightarrow{\theta_2 \leadsto \theta_3} \{\alpha_3 \in \mathsf{R}\})$. The intended inhabitants of these types are the programs $t : \mathsf{R} \to (\mathsf{R} \to \mathsf{R})$ whose interpretation function $(x, y) \in \mathbb{R}^2 \mapsto t(x)(y)$ sends $\mathrm{Dom}(\theta_1)^{\alpha_1} \times \mathrm{Dom}(\theta_2)^{\alpha_2}$ continuously into $\mathrm{Dom}(\theta_3)^{\alpha_3}$. As an example, consider $\theta_1 = (\alpha_1 < 1)$, $\theta_2 = (\alpha_2 \leq 3)$, and $\theta_3 = (\alpha_3 > 0)$. An example of a program having this type is $\lambda x_1.\lambda x_2.\underline{f}(x_1 * x_2)$, where we take $f$ as above.

A refined typing context $\Gamma$ is a list $x_1 : T_1, \ldots, x_n : T_n$, where each $T_i$ is a refinement type. In order to express continuity constraints, we need to *annotate* typing judgments with logical formulas, similarly to what we do for arrow types. More precisely, we consider two kinds of refined typing judgments: one for terms of ground type, and one for terms of higher-order type:

$$
\Gamma \overset{\psi}{\vdash}\_{\mathbf{r}} t : H; \qquad \Gamma \overset{\psi \leadsto \phi}{\vdash}\_{\mathbf{r}} t : F.
$$

#### **6.2 Basic Typing Rules**

We first consider refinement typing rules for the fragment of our language which excludes conditionals: they are given in Figure 4. We illustrate them by way of a series of examples.

*Example 4.* We first look at the typing rule var-F: if $\theta$ implies $\theta'$, then the variable $x$ (which, in semantic terms, projects the context $\Gamma$ onto one of its components) sends the truth domain of $\theta$ continuously into the truth domain of $\theta'$. Using this rule we can, for instance, derive the following judgment:

$$x: \{\alpha \in \mathbb{R}\}, y: \{\beta \in \mathbb{R}\} \stackrel{(\alpha \ge 0 \land \beta \ge 0) \leadsto (\alpha \ge 0)}{\vdash_{\mathbf{r}}} x: \{\alpha \in \mathbb{R}\}. \tag{1}$$

*Example 5.* We now look at the Rf rule, that deals with functions from F. Using this rule, we can show that:

$$x: \{\alpha \in \mathbb{R}\}, y: \{\beta \in \mathbb{R}\} \stackrel{(\alpha \ge 0 \land \beta \ge 0) \leadsto (\gamma \ge 0)}{\vdash_{\mathbf{r}}} \underline{\min}(x, y): \{\gamma \in \mathbb{R}\}.\tag{2}$$

Before giving the refined typing rule for the if-then-else construct, we also illustrate with an example how the rules in Figure 4 allow us to exploit, compositionally, the continuity information we have about functions in $F$.

$$\frac{}{\Gamma, x : H \overset{\psi}{\vdash_{\mathbf{r}}} x : H}\ \textsf{var-H} \qquad\qquad \frac{\models \theta \Rightarrow \theta'}{\Gamma, x : \{\alpha \in \mathsf{R}\} \overset{\theta \leadsto \theta'}{\vdash_{\mathbf{r}}} x : \{\alpha \in \mathsf{R}\}}\ \textsf{var-F}$$

$$\frac{\begin{array}{c} f \in F \text{ is continuous on } \mathrm{Dom}(\theta'_1 \wedge \ldots \wedge \theta'_n)^{\alpha_1 \ldots \alpha_n} \qquad f(\mathrm{Dom}(\theta'_1 \wedge \ldots \wedge \theta'_n)^{\alpha_1 \ldots \alpha_n}) \subseteq \mathrm{Dom}(\theta')^{\beta} \\ \Gamma \overset{\theta \leadsto \theta'_i}{\vdash_{\mathbf{r}}} t_i : \{\alpha_i \in \mathsf{R}\} \end{array}}{\Gamma \overset{\theta \leadsto \theta'}{\vdash_{\mathbf{r}}} \underline{f}(t_1, \ldots, t_n) : \{\beta \in \mathsf{R}\}}\ \textsf{R}_f$$

$$\frac{\Gamma, x_1 : T_1, \ldots, x_n : T_n \overset{\psi\langle\eta\rangle}{\vdash_{\mathbf{r}}} t : T \qquad \models \psi_1 \wedge \psi_2 \Rightarrow \psi}{\Gamma \overset{\psi_2}{\vdash_{\mathbf{r}}} \lambda(x_1, \ldots, x_n).t : (T_1, \ldots, T_n) \overset{\psi_1\langle\eta\rangle}{\to} T}\ \textsf{abs}$$

$$\frac{\begin{array}{c} (\Gamma \overset{\phi}{\vdash_{\mathbf{r}}} s_i : H_i)_{1 \leq i \leq m} \qquad \Gamma \overset{\phi}{\vdash_{\mathbf{r}}} t : (H_1, \ldots, H_m, F_1, \ldots, F_n) \overset{\theta\langle\eta\rangle}{\to} T \\ \models \theta_1 \wedge \ldots \wedge \theta_n \Rightarrow \theta \qquad (\Gamma \overset{\phi \leadsto \theta_j}{\vdash_{\mathbf{r}}} p_j : F_j)_{1 \leq j \leq n} \end{array}}{\Gamma \overset{\phi\langle\eta\rangle}{\vdash_{\mathbf{r}}} t(s_1, \ldots, s_m, p_1, \ldots, p_n) : T}\ \textsf{app}$$

The formula $\psi\langle\eta\rangle$ should be read as $\psi$ when $T$ is a higher-order type, and as $\psi \leadsto \eta$ when $T$ is a ground type.

Fig. 4: Typing Rules

*Example 6.* Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined as:
$$f(x) = \begin{cases} -x & \text{if } x < 0 \\ x + 1 & \text{otherwise}. \end{cases}$$

Observe that we can actually regard $f$ as represented by the program in Figure 3a; for the time being, however, we consider it as a primitive function in $F$, since we have not yet introduced the typing rule for the if-then-else construct. Consider the program:

$$t = \lambda(x, y). \underline{f}(\underline{\min}(x, y)).$$

We see that $t : \mathbb{R}^2 \to \mathbb{R}$ is continuous on the set $\{(x, y) \mid x \geq 0 \wedge y \geq 0\}$, and that, moreover, the image of $t$ on this set is contained in $[1, +\infty)$. Using the rules in Figure 4, the fact that $f$ is continuous on $\mathbb{R}_{\geq 0}$, and the fact that $\min$ is continuous on $\mathbb{R}^2$, we see that our refined type system allows us to prove $t$ to be continuous on the considered domain, i.e.:

$$\vdash\_{\mathbf{r}} t : (\{\alpha \in \mathbb{R}\}, \{\beta \in \mathbb{R}\}) \stackrel{(\alpha \ge 0 \land \beta \ge 0) \leadsto (\gamma \ge 1)}{\rightarrow} \{\gamma \in \mathbb{R}\} .$$
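The refined judgment of Example 6 can be probed numerically; the Python sketch below is ours. On the quadrant $x \geq 0 \wedge y \geq 0$ the first branch of $f$ never fires, so $t(x, y) = \min(x, y) + 1$: the image lies in $[1, +\infty)$, matching the annotation $(\gamma \geq 1)$, and the restriction of $t$ to the quadrant is continuous.

```python
# Numerical probe (ours) of Example 6: t(x, y) = f(min(x, y)).
f = lambda a: -a if a < 0 else a + 1
t = lambda x, y: f(min(x, y))

# On {(x, y) | x >= 0 and y >= 0} we have min(x, y) >= 0, hence
# t(x, y) = min(x, y) + 1 >= 1: the image annotation (gamma >= 1) holds.
grid = [(i * 0.5, j * 0.5) for i in range(10) for j in range(10)]
assert all(t(x, y) >= 1 for (x, y) in grid)

# f alone is discontinuous at 0, but min(x, y) never goes below 0 on the
# quadrant, so the restriction of t there is (x, y) |-> min(x, y) + 1.
print(t(0.0, 3.0), t(2.0, 2.0))   # 1.0 3.0
```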

#### **6.3 Typing Conditionals**

We now look at the rule for the if-then-else construct: as can be seen in the two programs in Figure 3, the use of conditionals *may* or *may not* induce discontinuity points. The crux here is the behaviour of the two branches at the *discontinuity points of the guard function*. In the two programs represented in Figure 3, we see that the only discontinuity point of the guard is at $x = 0$. However, in Figure 3b the two branches return the same value at $0$, and the resulting program is thus continuous at $x = 0$, while in Figure 3a the two branches do not coincide at $0$, and the resulting program is discontinuous at $x = 0$. We can generalize this observation: for the program if $t$ then $s$ else $p$ to be continuous, we need the branches $s$ and $p$ to be continuous respectively on the domain where $t$ is $1$ and on the domain where $t$ is $0$, and moreover we need $s$ and $p$ to be continuous *and to coincide* at the points where $t$ is not continuous. Similarly to the logical system designed by Chaudhuri et al. [18], the coincidence of the branches at the discontinuity points is expressed as a set of logical rules by way of *observational equivalence*. It should be observed that such an equivalence check is less problematic for first-order programs than for higher-order ones (the authors of [18] are able to actually check observational equivalence through an SMT solver). On the other hand, various notions of equivalence which are included in contextual equivalence and sometimes coincide with it (e.g., applicative bisimilarity, denotational semantics, or logical relations themselves) have been developed for higher-order languages, and this is starting to give rise to actual automatic tools for deciding contextual equivalence [38].

We give in Figure 5 the typing rule for conditionals. The conclusion of the rule guarantees the continuity of the program if $t$ then $s$ else $p$ on a domain specified by a formula $\theta$. The premises of the rule ask for formulas $\theta_q$ for $q \in \{t, s, p\}$ that specify continuity domains for the programs $t, s, p$, and ask also for two additional formulas $\theta_{(t,0)}$ and $\theta_{(t,1)}$ that specify domains where the value of the guard $t$ is $0$ and $1$, respectively. The target formula $\theta$ and the formulas $(\theta_q)_{q \in \{t,s,p,(t,1),(t,0)\}}$ are related by two side-conditions. Side-condition (1) consists of the following four distinct requirements, which must hold for every point $a$ in the truth domain of $\theta$: i) $a$ is in the truth domain of at least one of the two formulas $\theta_s$, $\theta_p$; ii) if $a$ is not in $\theta_{(t,1)}$ (i.e., we have no guarantee that $t$ will return $1$ at point $a$, meaning that the program $p$ *may* be executed) then $a$ must be in the continuity domain of $p$; iii) a condition symmetric to the previous one, replacing $1$ by $0$, and $p$ by $s$; iv) at all points of possible discontinuity of the guard (i.e. the points $a$ at which $\theta_t$ does not hold) both $s$ and $p$ must be continuous, and as a consequence both $\theta_s$ and $\theta_p$ must hold there. Side-condition (2) uses *typed contextual equivalence* $\equiv_{\mathrm{ctx}}$ between terms to express that the two programs $s$ and $p$ must coincide on all inputs at which $\theta_t$ does not hold, i.e. which are not in the continuity domain of $t$. Observe that typed contextual equivalence here is defined with respect to the system of *simple types*.

**Notation 1.** *We use the following notations in Figure 5. When* $\Gamma$ *is a typing environment, we write* $G\Gamma$ *and* $H\Gamma$ *for the ground and higher-order parts of* $\Gamma$*, respectively. Moreover, suppose we have a ground refined typing environment* $\Theta = x_1 : \{\alpha_1 \in \mathsf{R}\}, \ldots, x_n : \{\alpha_n \in \mathsf{R}\}$*: we say that a logical assignment* $\sigma$ *is* compatible with $\Theta$ *when* $\{\alpha_i \mid 1 \leq i \leq n\} \subseteq \mathrm{supp}(\sigma)$*. When this is the case, we build in a natural way the* substitution associated to $\sigma$ along $\Theta$ *by taking* $\sigma_{\Theta}(x_i) = \sigma(\alpha_i)$*.*

$$\frac{\begin{array}{c} \Gamma \overset{\theta_t \leadsto (\beta = 0 \vee \beta = 1)}{\vdash_{\mathbf{r}}} t : \{\beta \in \mathsf{R}\} \qquad \Gamma \overset{\theta_{(t,0)} \leadsto (\beta = 0)}{\vdash_{\mathbf{r}}} t : \{\beta \in \mathsf{R}\} \qquad \Gamma \overset{\theta_{(t,1)} \leadsto (\beta = 1)}{\vdash_{\mathbf{r}}} t : \{\beta \in \mathsf{R}\} \\ \Gamma \overset{\theta_s\langle\eta\rangle}{\vdash_{\mathbf{r}}} s : T \qquad \Gamma \overset{\theta_p\langle\eta\rangle}{\vdash_{\mathbf{r}}} p : T \end{array}}{\Gamma \overset{\theta\langle\eta\rangle}{\vdash_{\mathbf{r}}} \texttt{if } t \texttt{ then } s \texttt{ else } p : T}\ \textsf{If}\ (1), (2)$$

Again, the formula $\psi\langle\eta\rangle$ should be read as $\psi$ when $T$ is a higher-order type, and as $\psi \leadsto \eta$ when $T$ is a ground type. The side-conditions (1), (2) are given as:

1. $\models \theta \Rightarrow (\theta_s \vee \theta_p) \wedge (\theta_{(t,1)} \vee \theta_p) \wedge (\theta_{(t,0)} \vee \theta_s) \wedge (\theta_t \vee (\theta_s \wedge \theta_p))$.
2. For all logical assignments $\sigma$ compatible with $G\Gamma$, $\sigma \models \theta \wedge \neg\theta_t$ implies $H\Gamma \vdash s\sigma_{G\Gamma} \equiv_{\mathrm{ctx}} p\sigma_{G\Gamma}$.

Fig. 5: Typing Rules for the if-then-else construct

*Example 7.* Using our if-then-else typing rule, we can indeed type the program in Figure 3b as expected:

$$\vdash_{\mathsf{r}} \lambda x.\, \texttt{if } x < 0 \texttt{ then } 1 \texttt{ else } x + 1 : \{\alpha \in \mathbb{R}\} \xrightarrow{\top \leadsto \top} \{\beta \in \mathbb{R}\}.$$
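To see concretely why this program is continuous despite its discontinuous guard, one can check that both branches agree at the critical point $x = 0$. The sketch below is our own numerical illustration, not part of the formal development:

```python
# Illustrative check (ours, not from the paper): the program of Figure 3b,
# lambda x. if x < 0 then 1 else x + 1, is continuous at the guard boundary
# x = 0 because both branches evaluate to 1 there.
def program(x):
    return 1.0 if x < 0 else x + 1.0

# sample the function on both sides of the boundary
left = [program(-eps) for eps in (1e-2, 1e-4, 1e-8)]
right = [program(+eps) for eps in (1e-2, 1e-4, 1e-8)]
# both sequences approach program(0) == 1.0
```

By contrast, replacing the `else` branch by, say, `x + 2` would introduce a jump at `x = 0`, and the refinement type above would no longer be derivable.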

#### **6.4 Open Logical Predicates for Refinement Types**

Our goal in this section is to prove the correctness of our refinement type system, which we state below.

**Theorem 3.** *Let* t *be any program such that:*

$$x\_1 : \{\alpha\_1 \in \mathbb{R}\}, \ldots, x\_n : \{\alpha\_n \in \mathbb{R}\} \stackrel{\theta \leadsto \theta'}{\vdash\_{\mathbf{r}}} t : \{\beta \in \mathbb{R}\}.$$

*Then it holds that:*

$$\left[ x_1 : \mathbb{R}, \ldots, x_n : \mathbb{R} \vdash t : \mathbb{R} \right] \text{ is continuous on } \mathit{Dom}(\theta)_{\alpha_1, \ldots, \alpha_n}.$$
As a first step, we show that our if-then-else rule is reasonable, i.e. that it behaves well with primitive functions in F. More precisely, if we suppose that the functions <sup>f</sup>, <sup>g</sup><sup>0</sup>, <sup>g</sup><sup>1</sup> are such that the premises of the if-then-else rule hold, then the program if <sup>f</sup>(x<sup>1</sup>, ... , <sup>x</sup>n) then <sup>g</sup><sup>1</sup>(x<sup>1</sup>, ... , <sup>x</sup>n) else <sup>g</sup><sup>0</sup>(x<sup>1</sup>, ... , <sup>x</sup>n) is indeed continuous in the domain specified by the conclusion of the rule. This is precisely what we prove in the following lemma.

**Lemma 4.** *Let* $f, g_0, g_1 : \mathbb{R}^n \to \mathbb{R}$ *be functions in* $F$*, and* $\Theta = x_1 : \{\alpha_1 \in \mathsf{R}\}, \ldots, x_n : \{\alpha_n \in \mathsf{R}\}$*. We denote by* $\overline{\alpha}$ *the list of logical variables* $\alpha_1, \ldots, \alpha_n$*. We consider logical formulas* $\theta$ *and* $\theta_f, \theta_{(f,0)}, \theta_{(f,1)}, \phi_{g_0}, \phi_{g_1}$ *that have their logical variables in* $\overline{\alpha}$*, and such that:*

*1.* $f$ *is continuous on* $\mathit{Dom}(\theta)_{\overline{\alpha}}$*, with* $f(\mathit{Dom}(\theta_f)_{\overline{\alpha}}) \subseteq \{0, 1\}$ *and* $f(\mathit{Dom}(\theta_{(f,b)})_{\overline{\alpha}}) \subseteq \{b\}$ *for* $b \in \{0, 1\}$*.*

*2.* $g_0$ *and* $g_1$ *are continuous on* $\mathit{Dom}(\phi_{g_0})_{\overline{\alpha}}$ *and* $\mathit{Dom}(\phi_{g_1})_{\overline{\alpha}}$ *respectively, and* $(\alpha_1 \mapsto a_1, \ldots, \alpha_n \mapsto a_n) \models \theta \wedge \neg\theta_f$ *implies* $g_0(a_1, \ldots, a_n) = g_1(a_1, \ldots, a_n)$*;*

*3.* $\models \theta \Rightarrow (\phi_{g_1} \vee \phi_{g_0}) \wedge (\theta_{(f,0)} \vee \phi_{g_1}) \wedge (\theta_{(f,1)} \vee \phi_{g_0}) \wedge (\theta_f \vee (\phi_{g_0} \wedge \phi_{g_1}))$*.*

*Then it holds that:*

$$\left[\overline{\Theta} \vdash \texttt{if } \underline{f}(x_1, \ldots, x_n) \texttt{ then } \underline{g_1}(x_1, \ldots, x_n) \texttt{ else } \underline{g_0}(x_1, \ldots, x_n) : \mathbb{R}\right]$$

*is continuous on* $\mathit{Dom}(\theta)_{\overline{\alpha}}$*.*

*Proof.* The proof can be found in the extended version [7].

Similarly to what we did in Section 4, we are going to show Theorem 3 by way of a logical predicate. Recall that the logical predicate we defined in Section 4 actually consists of *three* kinds of predicates—all defined in Definition 1 of Section 4: $\mathcal{F}^\Theta_\tau$, $\mathcal{F}^\Theta_\Gamma$, $\mathcal{F}^{\Theta,\Gamma}_\tau$, where $\Theta$ ranges over ground typing environments, $\Gamma$ ranges over arbitrary environments, and $\tau$ is a type. The first predicate $\mathcal{F}^\Theta_\tau$ contains admissible terms $t$ such that $\Theta \vdash t : \tau$, the second predicate $\mathcal{F}^\Theta_\Gamma$ contains admissible substitutions $\gamma$ that associate to every $(x : \tau)$ in $\Gamma$ a term of type $\tau$ under the typing context $\Theta$, and the third predicate $\mathcal{F}^{\Theta,\Gamma}_\tau$ contains admissible terms $t$ such that $\Gamma, \Theta \vdash t : \tau$.

Here, we need to adapt the three kinds of logical predicates to a refinement scenario: first, we replace $\tau$ and $\Theta, \Gamma$ with refinement types and refined typing contexts respectively. Moreover, for technical reasons, we also need to *generalize* our typing contexts, by allowing them to be annotated with any subset of $\mathbb{R}^n$ instead of restricting ourselves to those subsets generated by logical formulas. Due to this further complexity, we split our definition of logical predicates into two: we first define the counterpart of the ground typing context predicate $\mathcal{F}^\Theta_\tau$ in Definition 4, then the counterpart of the predicate for substitutions $\mathcal{F}^\Theta_\Gamma$ and the counterpart of the predicates $\mathcal{F}^{\Theta,\Gamma}_\tau$ for higher-order typing environments in Definition 5.

Let us first see how we can adapt the predicates $\mathcal{F}^\Theta_\tau$ to our refinement types setting. Recall that in Section 4, we defined the predicate $\mathcal{F}^\Theta_{\mathsf{R}}$ as the collection of terms $t$ such that $\Theta \vdash t : \mathsf{R}$, and whose semantics $\left[\Theta \vdash t : \mathsf{R}\right]$ belongs to $F$. As we are interested in local continuity properties, we need to build a predicate expressing local continuity constraints. Moreover, in order to be consistent with our two arrow constructs and our two kinds of typing judgments, we actually need to consider *two* kinds of logical predicates, depending on whether the target type we consider is a real type or a higher-order type. We thus introduce the following logical predicates:

$$\mathcal{C}(\Theta, X \leadsto \phi, F); \qquad \mathcal{C}(\Theta, X, H);$$

where $\Theta$ is a ground typing environment, $X$ is a subset of $\mathbb{R}^n$, $\phi$ is a logical formula, and, as usual, $F$ ranges over the real refinement types, while $H$ ranges over the higher-order refinement types. As expected, $X$ and $\phi$ are needed to encode continuity constraints inside our logical predicates.

**Definition 4.** *Let* $\Theta$ *be a* ground *typing context of length* $n$*,* $F$ *and* $H$ *refined ground type and higher-order type, respectively. We define families of predicates on terms* $\mathcal{C}(\Theta, Y \leadsto \phi, F)$ *and* $\mathcal{C}(\Theta, Y, H)$*, with* $Y \subseteq \mathbb{R}^n$ *and* $\phi$ *a logical formula, as specified in Figure 6.*

• For $F = \{\alpha \in \mathsf{R}\}$ we take:
$$\mathcal{C}(\Theta, Y \leadsto \psi, F) := \{ t \mid x_1 : \mathsf{R}, \ldots, x_n : \mathsf{R} \vdash t : \mathsf{R},\ \left[t\right](Y) \subseteq \mathit{Dom}(\psi)_{\overline{\alpha}} \wedge \left[t\right] \text{ continuous over } Y \}.$$
• If $H$ is an arrow type of the form $H = (H_1, \ldots, H_m, \{\alpha_1 \in \mathsf{R}\}, \ldots, \{\alpha_p \in \mathsf{R}\}) \xrightarrow{\psi\langle\eta\rangle} T$:
$$\begin{aligned} \mathcal{C}(\Theta, Y, H) := \{ t \mid\ & x_1 : \mathsf{R}, \ldots, x_n : \mathsf{R} \vdash t : \overline{H},\ \forall Z,\ \forall \mathbf{s} = (s_1, \ldots, s_m) \text{ with } s_i \in \mathcal{C}(\Theta, Z, H_i), \\ & \forall \mathbf{p} = (p_1, \ldots, p_p),\ \forall \psi_j \text{ with } \models \psi_1 \wedge \ldots \wedge \psi_p \Rightarrow \psi, \text{ and } p_j \in \mathcal{C}(\Theta, Z \leadsto \psi_j, \{\alpha_j \in \mathsf{R}\}), \\ & \text{it holds that } t(\mathbf{s}, \mathbf{p}) \in \mathcal{C}(\Theta, (Y \cap Z)\langle\eta\rangle, T) \}, \end{aligned}$$
where as usual we should read $\psi\langle\eta\rangle = \psi$, $(Y \cap Z)\langle\eta\rangle = Y \cap Z$ when $T$ is higher-order, and $\psi\langle\eta\rangle = \psi \leadsto \eta$, $(Y \cap Z)\langle\eta\rangle = (Y \cap Z) \leadsto \eta$ when $T$ is an annotated real type.

Fig. 6: Open Logical Predicates for Refinement Types.

*Example 8.* We illustrate Definition 4 on some examples. We denote by $B^\circ$ the open unit ball in $\mathbb{R}^2$, i.e. $B^\circ = \{(a, b) \in \mathbb{R}^2 \mid a^2 + b^2 < 1\}$. We consider the ground typing context $\Theta = x_1 : \{\alpha_1 \in \mathsf{R}\}, x_2 : \{\alpha_2 \in \mathsf{R}\}$.


We first consider the term:

$$t = \lambda w.\, \underline{f}(w, x_1^2 + x_2^2) \qquad \text{where } f(w, a) = \frac{w}{1 - a} \text{ if } a < 1;\ 0 \text{ otherwise.}$$

Looking at Figure 6, we see that it is enough to check that for any $Y \subseteq \mathbb{R}^2$ and any $s \in \mathcal{C}(\Theta, Y \leadsto (\beta_1 \ge 0), \{\beta_1 \in \mathsf{R}\})$, it holds that:

$$t\,s \in \mathcal{C}(\Theta, B^{\circ} \cap Y \leadsto (\beta_2 \ge 0), \{\beta_2 \in \mathbb{R}\}).$$
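The behaviour of the term above can also be sketched numerically. The following is our own illustration (the function names are ours), showing that $f$ is well defined and finite wherever $x_1^2 + x_2^2$ stays inside the open unit ball:

```python
# Illustrative sketch (ours, not from the paper's development):
# f(w, a) = w / (1 - a) when a < 1, and 0 otherwise, is continuous as long
# as a = x1^2 + x2^2 stays bounded away from 1, e.g. on the open unit ball.
def f(w, a):
    return w / (1.0 - a) if a < 1 else 0.0

def t(x1, x2, w):
    # the term t applied at the point (x1, x2) with argument w
    return f(w, x1 * x1 + x2 * x2)
```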

Our overall goal—in order to prove Theorem 3—is to show the counterpart of the Fundamental Lemma from Section 4 (i.e. Lemma 1), which states that the logical predicate $\mathcal{F}^\Theta_{\mathsf{R}}$ contains all well-typed terms. This lemma only talks about the logical predicates for *ground typing contexts*, so we can state it as of now, but its proof relies on having all *three* predicates at our disposal. Observe that from there, Theorem 3 follows just from the definition of the logical predicates on base types. Similarly to what we did for Lemma 1 in Section 4, *proving* it requires defining the logical predicates for *substitutions* and *higher-order typing contexts*. We do this in Definition 5 below. As before, they consist of an adaptation to our refinement types framework of the open logical predicates $\mathcal{F}^\Theta_\Gamma$ and $\mathcal{F}^{\Theta,\Gamma}_\tau$ of Section 4: as usual, we need to add continuity annotations, and distinguish whether the target type is a ground type or a higher-order type.

**Notation 2.** *We first introduce the following notation: let* $\Gamma$*,* $\Theta$ *be two ground non-refined typing environments of length* $m$ *and* $n$ *respectively, with disjoint supports. Let* $\gamma : \mathrm{supp}(\Gamma) \to \{t \mid \Theta \vdash t : \mathsf{R}\}$ *be a substitution. We write* $\left[\gamma\right]$ *for the real-valued function:*

$$\begin{aligned} \left[\gamma\right] : \mathbb{R}^n &\to \mathbb{R}^{n+m} \\ a &\mapsto (a, \left[\gamma(x\_1)\right](a), \dots, \left[\gamma(x\_m)\right](a)) \end{aligned}$$

**Definition 5.** *Let* $\Theta$ *be a ground typing environment of length* $n$*, and* $\Gamma$ *an arbitrary typing environment. We write* $n$ *and* $m$ *for the lengths of* $\Theta$ *and* $G\Gamma$*, respectively.*

• *Let* $Z \subseteq \mathbb{R}^n$ *and* $W \subseteq \mathbb{R}^{n+m}$*. The predicate* $\mathcal{C}(\Theta, Z \leadsto W, \Gamma)$ *contains the substitutions* $\gamma$ *such that:*
	- $\forall (x : H) \in H\Gamma$*,* $\gamma(x) \in \mathcal{C}(\Theta, Z, H)$*,*
	- $\left[\gamma_{|G\Gamma}\right] : \mathbb{R}^n \to \mathbb{R}^{n+m}$ *sends continuously* $Z$ *into* $W$*;*

• *Let* $W \subseteq \mathbb{R}^{n+m}$*,* $\psi$ *a logical formula, and* $F$ *a ground refined type. We define:*

$$\begin{aligned} \mathcal{C}((\Gamma; \Theta), W &\leadsto \psi, F) := \{ t \mid \overline{\Gamma, \Theta} \vdash t : \mathbb{R} \\ \wedge \forall X &\subseteq \mathbb{R}^n, \forall \gamma \in \mathcal{C}(\Theta, X \leadsto W, \Gamma), \; t\gamma \in \mathcal{C}(\Theta, X \leadsto \psi, F) \}. \end{aligned}$$

• *Let* $W \subseteq \mathbb{R}^{n+m}$*, and* $H$ *a higher-order refined type. We define:*

$$\begin{aligned} \mathcal{C}((\varGamma, \Theta), W, H) &:= \{ t \mid \overline{\varGamma, \Theta} \vdash t : \overline{H} \\ \wedge \forall X \subseteq \mathbb{R}^n, \forall \gamma \in \mathcal{C}(\Theta, X \leadsto W, \varGamma). \ t\gamma \in \mathcal{C}(\Theta, X, H) \}. \end{aligned}$$

*Example 9.* We illustrate Definition 5 on an example. We consider the same context $\Theta$ as in Example 8, i.e. $\Theta = x_1 : \{\alpha_1 \in \mathsf{R}\}, x_2 : \{\alpha_2 \in \mathsf{R}\}$, and we take $\Gamma = x_3 : \{\alpha_3 \in \mathsf{R}\}, z : H$, with $H = \{\beta_1 \in \mathsf{R}\} \xrightarrow{(\beta_1 \ge 0) \leadsto (\beta_2 \ge 0)} \{\beta_2 \in \mathsf{R}\}$. We are interested in the following logical predicate for substitutions:

$$\mathcal{C}(\Theta, B^{\circ} \leadsto \{ (v, |v|) \mid v \in B^{\circ} \}, \Gamma)$$

where the norm of the couple $(a, b)$ is taken as $|(a, b)| = \sqrt{a^2 + b^2}$. We are going to build a substitution $\gamma : \{x_3, z\} \to \Lambda^{\times, \to, \mathsf{R}}_F$ that belongs to this set. We take:

• $\gamma(z) = \lambda w.\, \underline{f}(w, x_1^2 + x_2^2)$ where $f(w, a) = \frac{w}{1 - a}$ if $a < 1$; $0$ otherwise.

• $\gamma(x_3) = \underline{(\sqrt{\cdot})}(x_1^2 + x_2^2)$.

We can check that the requirements of Definition 5 indeed hold for $\gamma$.
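To make the membership claim concrete, here is a small numerical spot-check of our own (the helper `gamma_x3` is our name for the interpretation of $\gamma(x_3)$): the substitution sends each point $(x_1, x_2)$ of the open unit ball to $(x_1, x_2, |(x_1, x_2)|)$, which indeed lies in $W = \{(v, |v|) \mid v \in B^{\circ}\}$.

```python
from math import sqrt, isclose

# Hypothetical spot-check of Example 9 on sample points (ours, not the
# paper's): gamma(x3) computes the norm of (x1, x2).
def gamma_x3(x1, x2):
    return sqrt(x1 * x1 + x2 * x2)

samples = [(0.0, 0.0), (0.3, -0.4), (-0.6, 0.5)]
# image of each sample point under [gamma restricted to the ground part]
images = [(x1, x2, gamma_x3(x1, x2)) for (x1, x2) in samples]
```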


**Lemma 5 (Fundamental Lemma).** *Let* $\Theta$ *be a ground typing context, and* $\Gamma$ *an arbitrary typing context; thus* $\Gamma$ *can contain both ground type variables and non-ground type variables.*


*Proof Sketch.* The proof is by induction on the derivation of the refined typing judgment. Along the way, we need to show that our logical predicates play well with the underlying denotational semantics, but also with the logic. The details can be found in the extended version [7].

From there, we can finally prove the main result of this section, i.e. Theorem 3, which states the correctness of our refinement type system. Indeed, Theorem 3 follows as a corollary of Lemma 5: it is enough to look at the definition of the logical predicate for first-order programs to conclude the correctness of our type system.

# **7 Related Work**

Logical relations are certainly one of the most well-studied concepts in higher-order programming language theory. In their unary version, they have been introduced by Tait [54], and further exploited by Girard [33] and Tait [55] himself in giving strong normalization proofs for second-order type systems. The relational counterpart of realizability, namely logical relations proper, has been introduced by Plotkin [48], and further developed along many different axes, in particular towards calculi with fixpoint constructs or recursive types [3,4,2], probabilistic choice [14], or monadic and algebraic effects [34,11,34]. Without any hope of being comprehensive, we may refer to Mitchell's textbook on programming language theory for an account of the earlier, classic definitions [43], or to the aforementioned papers for more recent developments.

Extensions of logical relations to open terms have been introduced by several authors [39,47,30,53,15] and were explicitly referred to as *open logical relations* in [59]. However, to the best of the authors' knowledge, all the aforementioned works use open logical relations for specific purposes, and do not investigate their applicability as a general methodology.

Special cases of our Containment Theorem can be found in many papers, typically as auxiliary results. As already mentioned, an example is that of higher-order polynomials, whose first-order terms are proved to compute proper polynomials in many ways [40,5], none of them in the style of logical relations. The Containment Theorem itself can be derived from a previous result by Lafont [41] (see also Theorem 4.10.7 in [24]). Contrary to that result, however, our proof of the Containment Theorem is entirely syntactic and consists of a straightforward application of open logical relations.

Algorithms for automatic differentiation have recently been extended to higher-order programming languages [50,46,51,42,45], and have been investigated from a semantic perspective in [16,1], relying on insights from linear logic and denotational semantics. In particular, the work of Huot et al. [37] provides a denotational proof of correctness of the program transformation of [50] that we have studied in Section 5.

Continuity and robustness analysis of imperative first-order programs by way of program logics is the topic of study of a series of papers by Chaudhuri and co-authors [19,18,20]. None of them, however, deal with higher-order programs.

# **8 Conclusion and Future Work**

We have shown how a mild variation on the concept of a logical relation can be fruitfully used for proving both predicative and relational properties of higher-order programming languages, when such properties have a first-order, rather than a ground, "flavor". As such, the added value of this contribution is not so much in the technique itself as in showing how extremely useful it is in heterogeneous contexts, thus witnessing the versatility of logical relations.

The three case studies, and in particular the correctness of automatic differentiation and refinement type-based continuity analysis, are given as proofs of concept, but this does not mean they do not deserve to be studied in more depth. An example of an interesting direction for future work is the extension of our correctness proof from Section 5 to backward-propagation differentiation algorithms. Another one consists in adapting the refinement type system of Section 6.1 to deal with differentiability. That would of course require a substantial change in the typing rule for conditionals, which should check not only continuity, but also differentiability at the critical points. It would also be interesting to implement the refinement type system using standard SMT-based approaches. Finally, the authors plan to investigate extensions of open logical relations to non-normalizing calculi, as well as to non-simply typed calculi (such as calculi with polymorphic or recursive types).

# **References**

1. Abadi, M., Plotkin, G.D.: A simple differentiable programming language. PACMPL **4**(POPL), 38:1–38:28 (2020)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Constructive Game Logic**<sup>⋆</sup>

Brandon Bohrer<sup>1</sup> and André Platzer<sup>1,2</sup>

<sup>1</sup> Computer Science Department, Carnegie Mellon University, Pittsburgh, USA {bbohrer,aplatzer}@cs.cmu.edu

<sup>2</sup> Fakultät für Informatik, Technische Universität München, München, Germany

Abstract. Game Logic is an excellent setting to study proofs-about-programs via the interpretation of those proofs as programs, because constructive proofs for games correspond to effective winning strategies to follow in response to the opponent's actions. We thus develop *Constructive Game Logic*, which extends Parikh's Game Logic (GL) with constructivity and with first-order programs *à la* Pratt's first-order dynamic logic (DL). Our major contributions include: 1. a novel realizability semantics capturing the adversarial dynamics of games, 2. a natural deduction calculus and operational semantics describing the computational meaning of strategies via proof-terms, and 3. theoretical results including soundness of the proof calculus w.r.t. realizability semantics, progress and preservation of the operational semantics of proofs, and Existential Properties enabling the extraction of computational artifacts from game proofs. Together, these results provide the most general account of a Curry-Howard interpretation for any program logic to date, and the first at all for Game Logic.

Keywords: Game Logic, Constructive Logic, Natural Deduction, Proof Terms

# 1 Introduction

Two of the most essential tools in the theory of programming languages are *program logics*, such as Hoare calculi [29] and dynamic logics [45], and the *Curry-Howard correspondence* [17,31], wherein propositions correspond to types, proofs to functional programs, and proof term normalization to program evaluation. Their intersection, the Curry-Howard interpretation of program logics, has received surprisingly little study. We undertake such a study in the setting of Game Logic (GL) [38], because this leads to novel insights, because the Curry-Howard correspondence can be explained particularly intuitively for games, and because our first-order GL is a superset of common logics such as first-order Dynamic Logic (DL).

Constructivity and program verification have met before: Higher-order constructive logics [16] obey the Curry-Howard correspondence and are used to

<sup>⋆</sup> This research was sponsored by the AFOSR under grant number FA9550-16-1-0288. The authors were also funded by the NDSEG Fellowship and Alexander von Humboldt Foundation, respectively.

develop verified functional programs. Program logics are also often embedded in constructive proof assistants such as Coq [48], inheriting constructivity from their metalogic. Both are excellent ways to develop verified software, but we study something else.

We study the computational content of a program logic *itself*. Every fundamental concept of computation is expected to manifest in all three of logic, type systems, and category theory [27]. Because dynamic logics (DLs) such as GL have shown that program execution is a first-class construct in modal logic, the theorist has an imperative to explore the underlying notion of computation by developing a constructive GL with a Curry-Howard interpretation.

The computational content of a proof is especially clear in GL, which generalizes DL to programmatic models of zero-sum, perfect-information games between two players, traditionally named Angel and Demon. Both normal-play and misère-play games can be modeled in GL. In classical GL, the diamond modality $\langle\alpha\rangle\phi$ and box modality $[\alpha]\phi$ say that Angel and Demon respectively have a strategy to ensure $\phi$ is true at the end of $\alpha$, which is a model of a game. The difference between classical GL and CGL is that classical GL allows proofs that exclude the middle, which correspond to strategies which branch on undecidable conditions. CGL proofs can branch only on decidable properties, thus they correspond to strategies which are *effective* and can be executed by computer. Effective strategies are crucial because they enable the synthesis of code that implements a strategy. Strategy synthesis is itself crucial because even simple games can have complicated strategies, and synthesis provides assurance that the implementation correctly solves the game. A GL strategy resolves the choices inherent in a game: a diamond strategy specifies every move made by the Angel player, while a box strategy specifies the moves the Demon player will make.

In developing *Constructive Game Logic* (CGL), adding constructivity is a deep change. We provide a natural deduction calculus for CGL equipped with proof terms and an operational semantics on the proofs, demonstrating the meaning of strategies as functional programs and of winning strategies as functional programs that are guaranteed to achieve their objective no matter what counterstrategy the opponent follows. While the proof calculus of a constructive logic is often taken as ground truth, we go a step further and develop a realizability semantics for CGL as programs performing winning strategies for game proofs, then prove the calculus sound against it. We adopt realizability semantics in contrast to the winning-region semantics of classical GL because it enables us to prove that CGL satisfies novel properties (Section 8). The proof of our Strategy Property (Theorem 2) constitutes an (on-paper) algorithm that computes a player's (effective) strategy from a proof that they can win a game. This is the key test of constructivity for CGL, which would not be possible in classical GL. We show that CGL proofs have *two* computational interpretations: the operational semantics interpret an arbitrary proof (strategy) as a functional program which reduces to a normal-form proof (strategy), while realizability semantics interpret Angel strategies as programs which defeat arbitrary Demonic opponents.

While CGL has ample theoretical motivation, the practical motivations from synthesis are also strong. A notable line of work on dGL extends first-order GL to hybrid games to verify safety-critical adversarial cyber-physical systems [42]. We have designed CGL to extend smoothly to hybrid games, where synthesis provides the correctness demanded by safety-critical systems and the synthesis of correct monitors of the external world [36].

# 2 Related Work

This work is at the intersection of game logic and constructive modal logics. Individually, they have a rich literature, but little work has been done at their intersection. Of these, we are the first for GL and the first with a proofs-as-programs interpretation for a full first-order program logic.

*Games in Logic.* Parikh's propositional GL [38] was followed by coalitional GL [39]. A first-order version of GL is the basis of differential game logic dGL [42] for hybrid games. GLs are unique in their clear delegation of strategy to the *proof* language rather than the *model* language, crucially allowing succinct game specifications with sophisticated winning strategies. Succinct specifications are important: specifications are *trusted* because proving the *wrong theorem* would not ensure correctness. Relatives without this separation include Strategy Logic [15], Alternating-Time Temporal Logic (ATL) [5], CATL [30], Ghosh's SDGL [24], Ramanujam's structured strategies [46], dynamic-epistemic logics [6,10,49], evidence logics [9], and Angelic Hoare logic [35].

*Constructive Modal Logics.* A major contribution of CGL is our constructive semantics for games, not to be confused with game semantics [1], which are used to give programs semantics *in terms of* games. We draw on work in semantics for constructive modal logics, of which two main approaches are intuitionistic Kripke semantics and realizability semantics.

An overview of intuitionistic Kripke semantics is given by Wijesekera [52]. Intuitionistic Kripke semantics are parameterized over worlds, but in contrast to classical Kripke semantics, possible worlds represent what is currently *known* of the state. Worlds are preordered by $w_1 \ge w_2$ when $w_1$ contains at least the knowledge in $w_2$. Kripke semantics were used in Constructive Concurrent DL [53], where both the world and knowledge of it change during execution. A key advantage of realizability semantics [37,33] is their explicit interpretation of constructivity as computability by giving a *realizer*, a program which witnesses a fact. Our semantics combine elements of both: strategies are represented by realizers, while the game state is a Kripke world. Constructive set theory [2] aids in understanding which set operations are permissible in constructive semantics.

Modal semantics have also exploited mathematical structures such as: i) Neighborhood models [8], topological models for spatial logics [7], and temporal logics of dynamical systems [20]. ii) Categorical [3], sheaf [28], and pre-sheaf [23] models. iii) Coalgebraic semantics for classical Propositional Dynamic Logic

(PDL) [19]. While games are known to exhibit algebraic structure [25], such laws are not essential to this work. Our semantics are also notable for the seamless interaction between a constructive Angel and a classical Demon.

CGL is first-order, so we must address the constructivity of operations that inspect game state. We consider rational numbers so that equality is decidable, but our work should generalize to constructive reals [11,13].

Intuitionistic modalities also appear in dynamic-epistemic logic (DEL) [21], but that work is interested primarily in proof-theoretic semantics while we employ realizability semantics to stay firmly rooted in computation. Intuitionistic Kripke semantics have also been applied to multimodal System K with iteration [14], a weak fragment of PDL.

*Constructivity and Dynamic Logic.* With CGL, we bring to fruition several past efforts to develop constructive dynamic logics. Prior work on PDL [18] sought an Existential Property for Propositional Dynamic Logic (PDL), but questioned the practicality of its own implication introduction rule, whose side condition is non-syntactic. One of our results is a first-order Existential Property, which Degen cited as an open problem beyond the methods of their day [18]. To our knowledge, only one approach [32] considers Curry-Howard or functional proof terms for a program logic. While their work is a notable precursor to ours, their logic is a weak fragment of PDL without tests, monotonicity, or unbounded iteration, while we support not only PDL but the much more powerful first-order GL. Lastly, we are preceded by Constructive Concurrent Dynamic Logic [53], which gives a Kripke semantics for Concurrent Dynamic Logic [41], a proper fragment of GL. Their work focuses on an epistemic interpretation of constructivity, algebraic laws, and tableaux. We differ in our use of realizability semantics and natural deduction, which were essential to developing a Curry-Howard interpretation for CGL. In summary, we are justified in claiming to have the first Curry-Howard interpretation with proof terms and Existential Properties for an *expressive* program logic, the first constructive game logic, and the only one with first-order proof terms.

While constructive natural deduction calculi map most directly to functional programs, proof terms can be generated for any proof calculus, including a well-known interpretation of classical logic as continuation-passing style [26]. Proof terms have been developed [22] for a Hilbert calculus for dL, a dynamic logic (DL) for hybrid systems. Their work focuses on a provably correct interchange format for classical dL proofs, not constructive logics.

# 3 Syntax

We define the language of CGL, consisting of terms, games, and formulas. The simplest terms are *program variables* x, y ∈ V where V is the set of variable identifiers. Globally-scoped mutable program variables contain the state of the game, also called the *position* in game-theoretic terminology. All variables and terms are rational-valued (Q); we also write <sup>B</sup> for the set of Boolean values {0, <sup>1</sup>} for false and true respectively.

Definition 1 (Terms). *A* term f,g *is a rational-valued computable function over the game state. We give a nonexhaustive grammar of terms, specifically those used in our examples:*

$$f, g \;::=\; q \mid x \mid f + g \mid f \cdot g \mid f / g \mid f \bmod g$$

*where* $q \in \mathbb{Q}$ *is a rational literal,* $x$ *a program variable,* $f + g$ *a sum,* $f \cdot g$ *a product. Division-with-remainder is intended for use with integers, but we generalize the standard notion to support rational arguments. Quotient* $f/g$ *is integer even when* $f$ *and* $g$ *are non-integer, and thus leaves a rational remainder* $f \bmod g$*. Divisors* $g$ *are assumed to be nonzero.*
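Under one natural reading of this convention (a floored-division sketch of our own; the text above does not fix a particular rounding), quotient and remainder can be computed on rationals so that $f = (f/g) \cdot g + (f \bmod g)$ holds:

```python
from fractions import Fraction
from math import floor

# Sketch (ours) of the generalized division-with-remainder described above:
# the quotient is an integer even for rational arguments, and the remainder
# is rational, so that f == quot(f, g) * g + rem(f, g).
def quot(f, g):
    return Fraction(floor(f / g))   # integer quotient of two rationals

def rem(f, g):
    return f - quot(f, g) * g       # rational remainder

f, g = Fraction(7, 2), Fraction(3, 2)   # 7/2 divided by 3/2
# quot(f, g) == 2 and rem(f, g) == 1/2, since 7/2 == 2 * 3/2 + 1/2
```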

A game in CGL is played between a constructive player named Angel and a classical player named Demon. Our usage of the names Angel and Demon differs subtly from traditional GL usage for technical reasons. Our Angel and Demon are asymmetric: Angel is "our" player, who must play constructively, while the "opponent" Demon is allowed to play classically because our opponent need not be a computer. At any time some player is *active*, meaning their strategy resolves all decisions, and the opposite player is called *dormant*. Classical GL identifies Angel with active and Demon with dormant; the notions are distinct in CGL.

Definition 2 (Games). *The set of* games α, β *is defined recursively as such:*

$$\alpha, \beta \;::=\; ?\phi \mid x := f \mid x := {*} \mid \alpha \cup \beta \mid \alpha; \beta \mid \alpha^{*} \mid \alpha^{d}$$

In the *test game* ?φ, the active player wins if they can exhibit a constructive proof that formula φ currently holds. If they do not exhibit a proof, the dormant player wins by default and we informally say the active player "broke the rules". In deterministic assignment games x := f, neither player makes a choice, but the program variable x takes on the value of a term f. In nondeterministic assignment games x := ∗, the active player picks a value for x : Q. In the choice game α ∪ β, the active player chooses whether to play game α or game β. In the sequential composition game α; β, game α is played first, then β from the resulting state. In the repetition game α<sup>∗</sup>, the active player chooses after each repetition of α whether to continue playing, but loses if they repeat α infinitely. Notably, the exact number of repetitions can depend on the dormant player's moves, so the active player need not know, let alone announce, the exact number of iterations in advance. In the dual game α<sup>d</sup>, the active player becomes dormant and vice versa, then α is played. We parenthesize games with braces {α} when necessary. Sequential composition and nondeterministic choice both associate to the right, i.e., α ∪ β ∪ γ ≡ {α ∪ {β ∪ γ}}. This does not affect their semantics, as both operators are associative, but aids in reading proof terms.

Definition 3 (CGL Formulas). *The set of* CGL formulas φ *(also* ψ, ρ*) is given recursively by the grammar:*

$$\phi \;::=\; \langle \alpha \rangle \phi \mid [\alpha] \phi \mid f \sim g$$

The defining constructs in CGL (and GL) are the modalities ⟨α⟩φ and [α]φ. These mean that the active or dormant Angel (i.e., constructive) player, respectively, has a constructive strategy to play α and achieve postcondition φ. This paper does not develop the modalities for active and dormant Demon (i.e., classical) players because by definition those cannot be synthesized to executable code. We assume the presence of interpreted comparison predicates ∼ ∈ {≤, <, =, ≠, >, ≥}.

The standard connectives of first-order constructive logic can be derived from games and comparisons. Verum (tt) is defined as 1 > 0 and falsum (ff) as 0 > 1. Conjunction φ ∧ ψ is defined as ⟨?φ⟩ψ, disjunction φ ∨ ψ as ⟨?φ ∪ ?ψ⟩tt, implication φ → ψ as [?φ]ψ, universal quantification ∀x φ as [x := ∗]φ, and existential quantification ∃x φ as ⟨x := ∗⟩φ. As usual in logic, equivalence φ ↔ ψ can also be defined as (φ → ψ) ∧ (ψ → φ). As usual in constructive logics, negation ¬φ is defined as φ → ff, and inequality is defined by f ≠ g ≡ ¬(f = g). We will use the derived constructs freely but present semantics and proof rules only for the core constructs to minimize duplication. Indeed, it will aid in understanding the proof term language to keep the definitions above in mind, because the proof terms for many first-order programs follow those from first-order constructive logic.
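Collected in one display, the derived connectives just described are:

```latex
\begin{align*}
\mathit{tt} &\equiv 1 > 0 & \mathit{ff} &\equiv 0 > 1\\
\phi \wedge \psi &\equiv \langle ?\phi \rangle \psi & \phi \vee \psi &\equiv \langle ?\phi \cup {?\psi} \rangle \mathit{tt}\\
\phi \to \psi &\equiv [?\phi]\psi & \neg \phi &\equiv \phi \to \mathit{ff}\\
\forall x\, \phi &\equiv [x := {*}]\phi & \exists x\, \phi &\equiv \langle x := {*} \rangle \phi\\
\phi \leftrightarrow \psi &\equiv (\phi \to \psi) \wedge (\psi \to \phi) & f \neq g &\equiv \neg(f = g)
\end{align*}
```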

For convenience, we also write derived operators where the dormant player is given control of a single choice before returning control to the active player. The *dormant choice* α ∩ β, defined {α<sup>d</sup> ∪ β<sup>d</sup>}<sup>d</sup>, says the dormant player chooses which branch to take, but the active player is in control of the subgames. We write φ<sup>y</sup><sub>x</sub> (likewise for α and f) for the *renaming* of x for y and vice versa in formula φ, and write φ<sup>f</sup><sub>x</sub> for the *substitution* of term f for program variable x in φ, if the substitution is admissible (Def. 9 in Section 6).

#### 3.1 Example Games

We demonstrate the meaning and usage of the CGL constructs via examples, culminating in the two classic games of Nim and cake-cutting.

*Nondeterministic Programs.* Every (possibly nondeterministic) program is also a one-player game. For example, the program n := 0; {n := n + 1}<sup>∗</sup> can nondeterministically set n to any natural number: Angel has a choice whether to continue after every repetition of the loop, but is not allowed to continue forever. Conversely, games are like programs where the environment (Demon) is adversarial, and the program (Angel) strategically resolves nondeterminism to overcome the environment.

*Demonic Counter.* Angel's choices often must be *reactive* to Demon's choices. Consider the game c := 10; {c := c − 1 ∩ c := c − 2} ∗ ; ?0 ≤ c ≤ 2 where Demon repeatedly decreases c by 1 or 2, and Angel chooses when to stop. Angel only wins because she can pass the test 0 ≤ c ≤ 2, which she can do by simply repeating the loop until 0 ≤ c ≤ 2 holds. If Angel had to decide the loop duration in advance, Demon could force a rules violation by "guessing" the duration and changing his choices of c := c − 1 vs. c := c − 2.
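Angel's reactive strategy can be sketched in Python (our choice of illustration language); the move sequence `demon_moves` is a stand-in for Demon's adversarial choices:

```python
import itertools

def play(demon_moves):
    # Angel's strategy for  c := 10; {c := c-1 ∩ c := c-2}*; ?(0 <= c <= 2):
    # keep repeating the loop until 0 <= c <= 2 holds, reacting to Demon.
    c = 10
    moves = iter(demon_moves)
    while not (0 <= c <= 2):
        c -= next(moves)  # Demon removes 1 or 2 each iteration
    return c

# Whatever sequence Demon plays, Angel passes the final test:
for ms in itertools.product([1, 2], repeat=10):
    assert 0 <= play(ms) <= 2
```

By contrast, a strategy that fixed the number of iterations up front could be defeated by Demon choosing his subtractions to land c outside [0, 2] at that predetermined moment.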

*Coin Toss.* Games are perfect-information and do not possess randomness in the probabilistic sense, only (possibilistic) nondeterminism. This standard limitation is shown by attempting to express a coin-guessing game:

$$\{\mathsf{coin} := 0 \cap \mathsf{coin} := 1\}; \{\mathsf{guess} := 0 \cup \mathsf{guess} := 1\}; \text{?} \mathsf{guess} = \mathsf{coin}$$

The Demon player sets the value of a tossed coin, but does so adversarially, not randomly, since strategies in CGL are *pure* strategies. The Angel player has perfect knowledge of coin and can set guess equivalently, thus easily passing the test guess = coin, unlike a real coin toss. Partial information games are interesting future work that could be implemented by limiting the variables visible in a strategy.

*Nim.* Nim is the standard introductory example of a discrete, 2-player, zero-sum, perfect-information game. We consider misère play (last player loses) for a version of Nim that is also known as the *subtraction game*. The constant Nim defines the game.

$$\begin{aligned} \text{Nim} = \big\{ & \{\{c := c - 1 \cup c := c - 2 \cup c := c - 3\};\ ?c > 0\}; \\ & \{\{c := c - 1 \cup c := c - 2 \cup c := c - 3\};\ ?c > 0\}^{d} \big\}^{*} \end{aligned}$$

The game state consists of a single counter c containing a natural number, which each player chooses (∪) to reduce by 1, 2, or 3 (c := c − k). The counter is nonnegative, and the game repeats as long as Angel wishes, until some player empties the counter, at which point that player is declared the loser (?c > 0).

Proposition 1 (Dormant winning region). *Suppose* c ≡ 1 (mod 4)*. Then the dormant player has a strategy to ensure* c ≡ 1 (mod 4) *as an invariant. That is, the following* CGL *formula is valid (true in every state):*

$$c > 0 \to c \bmod 4 = 1 \to [\text{Nim}^{*}]\ c \bmod 4 = 1$$

This implies the dormant player wins the game because the active player violates the rules once c = 1 and no move is valid. We now state the winning region for an active player.

Proposition 2 (Active winning region). *Suppose* c mod 4 ∈ {0, 2, 3} *initially, and the active player controls the loop duration. Then the active player can achieve* c ∈ {2, 3, 4}*:*

$$c > 0 \to c \bmod 4 \in \{0, 2, 3\} \to \langle \text{Nim}^{*} \rangle\ c \in \{2, 3, 4\}$$

At that point, the active player will win in one move by setting c = 1 which forces the dormant player to set c = 0 and fail the test ?c > 0.
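The dormant player's invariant strategy behind Proposition 1 can be sketched in Python (our illustration language); the pairing of moves into rounds and the helper names are ours:

```python
def dormant_reply(k):
    # After the active player removes k in {1, 2, 3}, the dormant
    # player removes 4 - k, so each round removes 4 in total and
    # c ≡ 1 (mod 4) is restored.
    return 4 - k

def play_round(c, k):
    c -= k                  # active player's move, must keep c > 0
    c -= dormant_reply(k)   # dormant player's reply
    return c

c = 13                      # 13 ≡ 1 (mod 4)
for k in [3, 1, 2]:         # arbitrary active-player moves
    c = play_round(c, k)
    assert c % 4 == 1       # invariant maintained after every round
assert c == 1               # any further active move empties the counter
```

From c = 1 every legal move by the active player fails the test ?c > 0, matching the endgame described above.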

*Cake-cutting.* Another classic 2-player game, from the study of equitable division, is the cake-cutting problem [40]: the active player cuts the cake in two, then the (initially-)dormant player gets first choice of a piece. This protocol is optimal for splitting the cake in the sense that the active player is incentivized to split the cake evenly, else the dormant player could take the larger piece. Cake-cutting is also a simple use case for fractional numbers. The constant CC defines the cake-cutting game. Here x is the relative size (from 0 to 1) of the first piece, y is the size of the second piece, a is the size of the active player's piece, and d is the size of the dormant player's piece.

$$\begin{aligned} \mathrm{CC} = {} & x := {*};\ ?(0 \le x \le 1);\ y := 1 - x; \\ & \{a := x; d := y \,\cap\, a := y; d := x\} \end{aligned}$$

The game is played only once. The active player picks the division of the cake, which must be a fraction 0 ≤ x ≤ 1. The dormant player then picks which slice goes to whom.

The active player has a tight strategy to achieve a 0.5 cake share, as stated in Proposition 3.

#### Proposition 3 (Active winning region). *The following formula is valid:*

$$\langle \mathrm{CC} \rangle\, a \ge 0.5$$

The dormant player also has a computable strategy to achieve exactly 0.5 share of the cake (Proposition 4). Division is fair because each player has a strategy to get their fair 0.5 share.

#### Proposition 4 (Dormant winning region). *The following formula is valid:*

$$[\mathrm{CC}]\, d \ge 0.5$$

*Computability and Numeric Types.* Perfect fair division is only achieved for a, d ∈ Q because rational equality is decidable. Trichotomy (a < 0.5 ∨ a = 0.5 ∨ a > 0.5) is a tautology, so the dormant player's strategy can inspect the active player's choice of a. Notably, we intend to support constructive reals in future work, for which exact equality is not decidable and trichotomy is not an axiom. Future work on real-valued CGL will need to employ approximate comparison techniques as is typical for constructive reals [11,13,51]. The examples in this section have been proven [12] using the calculus defined in Section 5.
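Both winning strategies can be sketched in Python (our illustration language); the helper names and the even cut are ours, but the dormant player's choice relies exactly on the decidable rational comparison discussed above:

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def active_cut():
    # Active player's strategy (Proposition 3): cut evenly, x := 1/2.
    return HALF

def dormant_choice(x, y):
    # Dormant player's strategy (Proposition 4): take the larger piece
    # as d. Branch 0 plays a := x; d := y, branch 1 plays a := y; d := x.
    # Computable because rational comparison is decidable.
    return 0 if y >= x else 1

x = active_cut()
y = 1 - x
a, d = (x, y) if dormant_choice(x, y) == 0 else (y, x)
assert a >= HALF and d >= HALF   # both players secure a 0.5 share
```

If the active player deviated from the even cut, `dormant_choice` would pick the larger piece, leaving the active player with less than 0.5; the even cut is therefore the active player's only strategy achieving Proposition 3.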

# 4 Semantics

We now develop the semantics of CGL. In contrast to classical GL, whose semantics are well-understood [38], the major semantic challenge for CGL is capturing the competition between a *constructive* Angel and *classical* Demon. We base our approach on realizability semantics [37,33], because this approach makes the relationship between constructive proofs and programs particularly clear, and generating programs from CGL proofs is one of our motivations.

Unlike previous applications of realizability, games feature two agents, and one could imagine a semantics with two realizers, one for each of Angel and Demon. However, we choose to use only one realizer, for Angel, which captures the fact that only Angel is restricted to a computable strategy, not Demon. Moreover, a single realizer makes it clear that Angel cannot inspect Demon's strategy, only the game state, and also simplifies notations and proofs. Because Angel is computable but Demon is classical, our semantics has the flavor both of realizability semantics and of a traditional Kripke semantics for programs.

The semantic functions employ *game states* ω ∈ S, where we write S for the set of all states. We additionally write ⊤ and ⊥ (not to be confused with formulas tt and ff) for the pseudo-states indicating that Angel or Demon, respectively, has won the game early by forcing the other to fail a test. Each ω ∈ S maps each x ∈ V to a value ω(x) ∈ Q. We write ω<sup>v</sup><sub>x</sub> for the state that agrees with ω except that x is assigned value v, where v ∈ Q.

Definition 4 (Arithmetic term semantics). *A term* f *is a computable function of the state, so the interpretation* [[f]]ω *of term* f *in state* ω *is* f(ω)*.*

### 4.1 Realizers

To define the semantics of games, we first define realizers, the programs which implement strategies. The language of realizers is a higher-order lambda calculus where variables can range over game states, numbers, or realizers which realize a given proposition φ. Gameplay proceeds in continuation-passing style: invoking a realizer returns another realizer which performs any further moves. We describe the typing constraints for realizers informally, and say a is a ⟨α⟩φ-realizer (a ∈ ⟨α⟩φ R**z**) if it provides strategic decisions exactly when ⟨α⟩φ demands them.

Definition 5 (Realizers). *The syntax of realizers* a, b, c ∈ R**z** *(where* R**z** *is the set of all realizers) is defined coinductively:*

$$\begin{aligned} a, b, c ::={} & x \mid () \mid (a, b) \mid \pi_L(a) \mid \pi_R(a) \mid (\lambda \omega : \mathfrak{S}.\ a(\omega)) \mid (\Lambda x : \mathbb{Q}.\ a) \\ & \mid (\Lambda x : \phi\ \mathcal{R}\mathbf{z}.\ a) \mid a\ v \mid a\ b \mid a\ \omega \mid \mathrm{if}\ (f(\omega))\ a\ \mathrm{else}\ b \end{aligned}$$

where x is a program (or realizer) variable and f is a term over the state ω. The Roman a, b, c should not be confused with the Greek α, β, γ, which range over games. Realizers have access to the game state ω, expressed by lambda realizers (λω : S. a(ω)) which, when applied in a state ν, compute the realizer a with ν substituted for ω. State lambdas λ are distinguished from propositional and first-order lambdas Λ. The unit realizer () makes no choices and is understood as a unit tuple. Units () realize f ∼ g because *rational* comparisons, in contrast to real comparisons, are decidable. Conditional strategic decisions are realized by if (f(ω)) a else b for computable function f : S → B, and execute a if f returns truth, else b. Realizer (λω : S. f(ω)) is a ⟨α ∪ β⟩φ-realizer if f(ω) ∈ ({0} × ⟨α⟩φ R**z**) ∪ ({1} × ⟨β⟩φ R**z**) for all ω. The first component determines which branch is taken, while the second component is a continuation which must be able to play the corresponding branch. Realizer (λω : S. f(ω)) can also be a ⟨x := ∗⟩φ-realizer, which requires f(ω) ∈ Q × (φ R**z**) for all ω. The first component determines the value of x while the second component demonstrates the postcondition φ. The pair realizer (a, b) ∈ R**z** × R**z** realizes both Angelic tests ⟨?φ⟩ψ and dormant choices [α ∪ β]φ.

A dormant realizer waits and remembers the active Demon's moves, because they typically inform Angel's strategy once Angel resumes action. The first-order realizer (Λx : Q. b) is a [x := ∗]φ-realizer when b<sup>v</sup><sub>x</sub> is a φ-realizer for every v ∈ Q; Demon tells Angel the desired value of x, which informs Angel's continuation b. The higher-order realizer (Λx : φ R**z**. b) realizes [?φ]ψ when b<sup>c</sup><sub>x</sub> realizes ψ for every φ-realizer c. Demon announces the realizer for φ, which Angel's continuation b may inspect. Tuples are inspected with projections πL(a) and πR(a). A lambda is inspected by applying arguments a ω for state lambdas, a v for first-order, and a b for higher-order. Realizers for sequential compositions ⟨α; β⟩φ (likewise [α; β]φ) are ⟨α⟩⟨β⟩φ-realizers: first α is played, and in every case the continuation must play β before showing φ. Realizers for repetitions α<sup>∗</sup> are streams containing α-realizers, possibly infinite by virtue of coinductive syntax. The active loop realizer ind(x. a) is the least fixed point of the equation b = [b/x]a, i.e., x is a recursive call which must be invoked only in accordance with some well-order. We realize dormant loops with gen(a, x.b, x.c), coinductively generated from initial value a, update b, and post-step c, with variable x for the current generator value.

Active loops must terminate, so ⟨α<sup>∗</sup>⟩φ-realizers are constructed inductively using any well-order on states. Dormant loops must be played as long as the opponent wishes, so [α<sup>∗</sup>]φ-realizers are constructed coinductively, with the invariant that φ has a realizer at every iteration.

#### 4.2 Formula and Game Semantics

A state ω paired with a realizer a that continues the game is called a *possibility*. A *region* (written X, Y, Z) is a set of possibilities. We write [[φ]] ⊆ φ R**z** × S for the region which realizes formula φ. A formula φ is *valid* iff some a uniformly realizes every state, i.e., {a} × S ⊆ [[φ]]. A sequent Γ ⊢ φ is *valid* iff the formula ⋀Γ → φ is valid, where ⋀Γ is the conjunction of all assumptions in Γ.

The game semantics are region-oriented, i.e., they process possibilities in bulk, though Angel commits to a strategy from the start. The region X⟨α⟩ : ℘(R**z** × S) is the union of all end regions of game α which arise when active Angel commits to an element of X, then Demon plays adversarially. In X[[α]] : ℘(R**z** × S), Angel is the *dormant* player, but it is still Angel who commits to an element of X and Demon who plays adversarially. Recall that pseudo-states ⊤ and ⊥ represent early wins by Angel and Demon, respectively. The definitions below implicitly assume ⊥, ⊤ ∉ X; they extend to the case ⊥ ∈ X (likewise ⊤ ∈ X) using the equations (X ∪ {⊥})[[α]] = X[[α]] ∪ {⊥} and (X ∪ {⊥})⟨α⟩ = X⟨α⟩ ∪ {⊥}. That is, if Demon has already won by forcing an Angel violation initially, any remaining game can be skipped with an immediate Demon victory, and vice versa. The game semantics exploit the *Angelic* projections Z⟨0⟩, Z⟨1⟩ and *Demonic* projections Z[0], Z[1], which represent binary decisions made by a constructive Angel and a classical Demon, respectively. The Angelic projections, which are defined Z⟨0⟩ = {(πR(a), ω) | πL(a)(ω) = 0, (a, ω) ∈ Z} and Z⟨1⟩ = {(πR(a), ω) | πL(a)(ω) = 1, (a, ω) ∈ Z}, filter by which branch Angel chooses with πL(a)(ω) ∈ B, then project onto the remaining strategy πR(a). The Demonic projections, which are defined Z[0] ≡ {(πL(a), ω) | (a, ω) ∈ Z} and Z[1] ≡ {(πR(a), ω) | (a, ω) ∈ Z}, contain the same states as Z, but project the realizer to tell Angel which branch Demon took.

Definition 6 (Formula semantics). [[φ]] ⊆ R**z** × S *is defined as:*

$$\begin{aligned} ((), \omega) &\in [\![f \sim g]\!] &&\text{iff } [\![f]\!]\omega \sim [\![g]\!]\omega \\ (a, \omega) &\in [\![\langle\alpha\rangle\phi]\!] &&\text{iff } \{(a, \omega)\}\langle\alpha\rangle \subseteq ([\![\phi]\!] \cup \{\top\}) \\ (a, \omega) &\in [\![[\alpha]\phi]\!] &&\text{iff } \{(a, \omega)\}[\![\alpha]\!] \subseteq ([\![\phi]\!] \cup \{\top\}) \end{aligned}$$

Comparisons f ∼ g defer to the term semantics, so the interesting cases are the game modalities. Both [α]φ and ⟨α⟩φ ask whether Angel wins α by following the given strategy, and differ only in whether Demon or Angel is the active player; thus in both cases *every* Demonic choice must satisfy Angel's goal, and early Demon wins are counted as Angel losses.

Definition 7 (Angel game forward semantics). *We inductively define the region* X⟨α⟩ : ℘(R**z** × S) *in which* α *can end when active Angel plays* X*:*

$$\begin{aligned}
X\langle ?\phi \rangle &= \{(\pi_R(a), \omega) \mid (\pi_L(a), \omega) \in [\![\phi]\!] \text{ for some } (a, \omega) \in X\} \\
&\quad \cup \{\bot \mid (\pi_L(a), \omega) \notin [\![\phi]\!] \text{ for all } (a, \omega) \in X\} \\
X\langle x := f \rangle &= \{(a, \omega_x^{[\![f]\!]\omega}) \mid (a, \omega) \in X\} \\
X\langle x := {*} \rangle &= \{(\pi_R(a), \omega_x^{\pi_L(a)(\omega)}) \mid (a, \omega) \in X\} \\
X\langle \alpha; \beta \rangle &= (X\langle\alpha\rangle)\langle\beta\rangle \\
X\langle \alpha \cup \beta \rangle &= X_{\langle 0\rangle}\langle\alpha\rangle \cup X_{\langle 1\rangle}\langle\beta\rangle \\
X\langle \alpha^{*} \rangle &= \bigcap \{Z_{\langle 0\rangle} \subseteq \mathcal{R}\mathbf{z} \times \mathfrak{S} \mid X \cup (Z_{\langle 1\rangle}\langle\alpha\rangle) \subseteq Z\} \\
X\langle \alpha^{d} \rangle &= X[\![\alpha]\!]
\end{aligned}$$

Definition 8 (Demon game forward semantics). *We inductively define the region* X[[α]] : ℘(R**z** × S) *in which* α *can end when dormant Angel plays* X*:*

$$\begin{aligned}
X[\![?\phi]\!] &= \{(a\,b, \omega) \mid (a, \omega) \in X, (b, \omega) \in [\![\phi]\!] \text{ for some } b \in \mathcal{R}\mathbf{z}\} \\
&\quad \cup \{\top \mid (a, \omega) \in X \text{ but no } (b, \omega) \in [\![\phi]\!]\} \\
X[\![x := f]\!] &= \{(a, \omega_x^{[\![f]\!]\omega}) \mid (a, \omega) \in X\} \\
X[\![x := {*}]\!] &= \{(a\,r, \omega_x^{r}) \mid (a, \omega) \in X, r \in \mathbb{Q}\} \\
X[\![\alpha; \beta]\!] &= (X[\![\alpha]\!])[\![\beta]\!] \\
X[\![\alpha \cup \beta]\!] &= X_{[0]}[\![\alpha]\!] \cup X_{[1]}[\![\beta]\!] \\
X[\![\alpha^{*}]\!] &= \bigcap \{Z_{[0]} \subseteq \mathcal{R}\mathbf{z} \times \mathfrak{S} \mid X \cup (Z_{[1]}[\![\alpha]\!]) \subseteq Z\} \\
X[\![\alpha^{d}]\!] &= X\langle\alpha\rangle
\end{aligned}$$

Angelic tests ?φ end in the current state ω with remaining realizer πR(a) if Angel can realize φ with πL(a), else end in ⊥. Angelic deterministic assignments consume no realizer and simply update the state, then end. Angelic nondeterministic assignments x := ∗ ask the realizer πL(a) to compute a new value for x from the current state. Angelic compositions α; β first play α, then β from the resulting state using the resulting continuation. Angelic choice games α ∪ β use the Angelic projections to decide which branch is taken according to πL(a). The realizer πR(a) may be reused between α and β, since πR(a) could just invoke πL(a) if it must decide which branch has been taken. This definition of Angelic choice (corresponding to constructive disjunction) captures the reality that realizers in CGL, in contrast with most constructive logics, are entitled to observe a game state, but they must do so in computable fashion.

*Repetition Semantics.* In any GL, the challenge in defining the semantics of repetition games α<sup>∗</sup> is that the number of iterations, while finite, can depend on both players' actions and is thus not known in advance, whereas the DL-like semantics of α<sup>∗</sup> as the finite reflexive, transitive closure of α gives an advance-notice semantics. Classical GL provides the no-advance-notice semantics as a fixed point [38], and we adopt the fixed-point semantics as well. The Angelic choice whether to stop (Z⟨0⟩) or iterate the loop (Z⟨1⟩) is analogous to the case for α ∪ β.

*Duality Semantics.* To play the dual game α<sup>d</sup>, the active and dormant players switch roles, then play α. In *classical* GL, this characterization of duality is interchangeable with the definition of α<sup>d</sup> as the game that Angel wins exactly when it is impossible for Angel to lose. The characterizations are *not* interchangeable in CGL because the Determinacy Axiom (all games have winners) of GL is not valid in CGL:

*Remark 1 (Indeterminacy).* The classically equivalent determinacy axiom schemata ¬⟨α⟩¬φ → [α]φ and ⟨α⟩¬φ ∨ [α]φ of classical GL are not valid in CGL, because they imply double negation elimination.

*Remark 2 (Classical duality).* In classical GL, Angelic dual games are characterized by the axiom schema ⟨α<sup>d</sup>⟩φ ↔ ¬⟨α⟩¬φ, which is not valid in CGL. It is classically interdefinable with ⟨α<sup>d</sup>⟩φ ↔ [α]φ.

The determinacy axiom is not valid in CGL, so we take ⟨α<sup>d</sup>⟩φ ↔ [α]φ as primary.

#### 4.3 Demonic Semantics

Demon wins a Demonic test by presenting a realizer b as evidence that the precondition holds. If he cannot present a realizer (i.e., because none exists), then the game ends in ⊤, so Angel wins by default. Else Angel's higher-order realizer a consumes the evidence of the precondition, i.e., Angelic strategies are entitled to depend (computably) on *how* Demon demonstrated the precondition. Angel can check that Demon passed the test by executing b. The Demonic repetition game α<sup>∗</sup> is defined as a fixed point [42] with Demonic projections. Computationally, a winning invariant for the repetition is the witness of its winnability.

The remaining cases are innocuous by comparison. Demonic deterministic assignments x := f deterministically store the value of f in x, just as Angelic assignments do. In Demonic nondeterministic assignment x := ∗, Demon chooses to set x to *any* value. When Demon plays the choice game α ∪ β, Demon chooses classically between α and β. The dual game α<sup>d</sup> is played by Demon becoming dormant and Angel becoming active in α.

*Semantics Examples.* The realizability semantics of games are subtle on a first read, so we provide examples of realizers. In these examples, the state argument ω is implicit, and we refer to ω(x) simply as x for brevity.

Recall that [?φ]ψ and φ → ψ are equivalent. For any φ, the identity function (Λx : φ R**z**. x) is a φ → φ-realizer: for every φ-realizer x which Demon presents, Angel can present the same x as evidence of φ. This confirms expected behavior per propositional constructive logic: the identity function is the proof of self-implication.

In the example formula ⟨{x := ∗}<sup>d</sup>; {x := x ∪ x := −x}⟩ x ≥ 0, Demon gets to set x, then Angel decides whether to negate x in order to make it nonnegative. It is realized by Λx : Q. ((if (x < 0) 1 else 0), ()): Demon announces the value of x, then Angel's strategy is to check the sign of x, taking the right branch when x is negative. Each branch contains a deterministic assignment which consumes no realizer, then the postcondition x ≥ 0 has the trivial realizer ().
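This realizer can be transcribed into Python (our illustration language) to check the strategy; the encoding of branches as 0 and 1 follows the semantics of Angelic choice:

```python
def angel(x):
    # Λx:Q. ((if (x < 0) 1 else 0), ()): branch 0 keeps x, branch 1
    # plays x := -x; () realizes the postcondition x >= 0.
    branch = 1 if x < 0 else 0
    return (branch, ())

def play(x):
    # Execute the chosen branch of {x := x ∪ x := -x}.
    branch, _ = angel(x)
    return -x if branch == 1 else x

for x in [-5, 0, 7]:
    assert play(x) >= 0   # the postcondition holds for any Demon move
```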

Consider the formula ⟨{x := x + 1}<sup>∗</sup>⟩ x > y, where Angel's winning strategy is to repeat the loop until x > y, which will occur as x increases. The realizer is ind(w. (if (x > y) (0, ()) else (1, w), ())), which says that Angel stops the loop if x > y and proves the postcondition with a trivial strategy. Else Angel continues the loop, whose body consumes no realizer, and supplies the inductive call w to continue the strategy inductively.
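A Python transcription (our illustration language) of this inductive realizer, with the loop driver made explicit:

```python
def loop_realizer(state):
    # ind(w. (if (x > y) (0, ()) else (1, w), ())):
    # 0 = stop the loop, 1 = take another iteration.
    if state["x"] > state["y"]:
        return (0, ())              # stop; () realizes x > y
    return (1, loop_realizer)       # continue; the recursive call is 'w'

def run(state):
    # Drive the game: each continued iteration plays the body x := x + 1.
    r = loop_realizer
    while True:
        choice, cont = r(state)
        if choice == 0:
            return state
        state["x"] += 1
        r = cont

final = run({"x": 0, "y": 3})
assert final["x"] > final["y"]
```

Termination matches the well-order requirement on ind: the quantity y − x decreases at every continued iteration (for integer-valued starting states).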

Consider the formula [?x > 0; {x := x + 1}<sup>∗</sup>] ∃y (y ≤ x ∧ y > 0) for a subtle example. Our strategy for Angel is to record the initial value of x in y, then maintain a proof that y ≤ x as x increases. This strategy is represented by Λw : (x > 0) R**z**. gen((x, ((), w)), z.(πL(z), ((), πR(πR(z)))), z.z). That is, initially Demon announces a proof w of x > 0. Angel specifies the initial element of the realizer stream by witnessing ∃y (y ≤ x ∧ y > 0) with c<sub>0</sub> = (x, ((), w)), where the first component instantiates y = x, the trivial second component indicates that y ≤ x holds trivially (since y = x), and the third component reuses w as a proof of y > 0. Demon can choose to repeat the loop arbitrarily. When Demon demands the kth repetition, z is bound to c<sub>k−1</sub> to compute c<sub>k</sub> = (πL(z), ((), πR(πR(z)))), which plays the next iteration. That is, at each iteration Angel witnesses ∃y (y ≤ x ∧ y > 0) by assigning the same value (stored in πL(z)) to y, reproving y ≤ x with (), then reusing the proof (stored in πR(πR(z))) that y > 0.

# 5 Proof Calculus

Having settled on the meaning of a game in Section 4, we proceed to develop a calculus for proving CGL formulas syntactically. The goal is twofold: the practical motivation, as always, is that when verifying a concrete example, the realizability semantics provide a notion of ground truth but are impractical for proving large formulas. The theoretical motivation is that we wish to expose the computational interpretation of the modalities ⟨α⟩φ and [α]φ as the types of the players' respective winning strategies for a game α that has φ as its goal condition. Since CGL is constructive, such a strategy constructs a proof of the postcondition φ.

To study the computational nature of proofs, we write proof terms explicitly: the main proof judgement Γ ⊢ M : φ says proof term M is a proof of φ in context Γ, or equivalently a proof of sequent (Γ ⊢ φ). We write M, N, O (sometimes A, B, C) for arbitrary proof terms, and p, q, r, s, g for *proof variables*, that is, variables that range over proof terms of a given proposition. In contrast to the assignable *program variables*, the proof variables are given their meaning by substitution and are scoped locally, not globally. We adapt propositional proof terms such as pairing, disjoint union, and lambda-abstraction to our context of game logic. To support first-order games, we include first-order proof terms and new terms for the game-specific features: dual, assignment, and repetition games.

We now develop the calculus by starting with standard constructs and working toward the novel constructs of CGL. The assumptions p in Γ are named, so that they may appear as variable proof terms p. We write Γ<sup>y</sup><sub>x</sub> and M<sup>y</sup><sub>x</sub> for the renaming of program variable x to y and vice versa in context Γ or proof term M, respectively. Proof rules for state-modifying constructs explicitly perform renamings, which both ensures they are applicable as often as possible and also ensures that references to proof variables support an intuitive notion of lexical scope. Likewise Γ<sup>f</sup><sub>x</sub> and M<sup>f</sup><sub>x</sub> are the substitutions of term f for program variable x. We use distinct notation to substitute proof terms for proof variables while avoiding capture: [N/p]M substitutes proof term N for proof variable p in proof term M. Some proof terms, such as pairs, prove both a diamond formula and a box formula. We write ⟨M, N⟩ and [M, N] respectively to distinguish the terms, or ⟨[M, N]⟩ to treat them uniformly. Likewise we abbreviate ⟨[α]⟩φ when the same rule works for both diamond and box modalities, using [⟨α⟩]φ to denote its dual modality. The proof terms ⟨x := f<sup>y</sup><sub>x</sub> in p. M⟩ and [x := f<sup>y</sup><sub>x</sub> in p. M] introduce an auxiliary ghost variable y for the old value of x, which improves completeness without requiring manual ghost steps.

The propositional proof rules of CGL are given in Fig. 1. Formula [?φ]ψ is constructive implication, so rule [?]E with proof term M N eliminates M by supplying an N that proves the test condition. Lambda terms (λp : φ. M) are introduced by rule [?]I by extending the context Γ. While this rule is standard, it is worth emphasizing that here p is a *proof variable* for which a proof term (like N in [?]E) may be substituted, and that the *game state* is untouched by [?]I. Constructive disjunction (between the branches ⟨α⟩φ and ⟨β⟩φ) is the choice ⟨α ∪ β⟩φ. The introduction rules for injections are ⟨∪⟩I1 and ⟨∪⟩I2, and case analysis is performed with rule ⟨∪⟩E, with two branches that prove a common consequence


Fig. 1. CGL proof calculus: Propositional rules

from each disjunct. The cases ⟨?φ⟩ψ and [α ∪ β]φ are conjunctive. Conjunctions are introduced by ⟨?⟩I and [∪]I as pairs, and eliminated by ⟨?⟩E1, ⟨?⟩E2, [∪]E1, and [∪]E2 as projections. Lastly, rule hyp says formulas in the context hold by assumption.

We now begin considering non-propositional rules, starting with the simplest ones. The majority of the rules in Fig. 2, while thoroughly useful in proofs,

$$
\langle * \rangle\mathrm{C}\;\frac{\Gamma \vdash A : \langle \alpha^{*} \rangle\phi \quad \Gamma, s{:}\phi \vdash B : \psi \quad \Gamma, g{:}\langle \alpha \rangle\langle \alpha^{*} \rangle\phi \vdash C : \psi}{\Gamma \vdash \mathsf{case}^{*}\ A\ \mathsf{of}\ s \Rightarrow B \mid g \Rightarrow C : \psi}
\qquad
\mathrm{M}\;\frac{\Gamma \vdash M : \langle \alpha \rangle\phi \quad \Gamma^{\vec{y}}_{\mathrm{BV}(\alpha)}, p{:}\phi \vdash N : \psi}{\Gamma \vdash M \circ_{p} N : \langle \alpha \rangle\psi}
$$

$$
[*]\mathrm{E}\;\frac{\Gamma \vdash M : [\alpha^{*}]\phi}{\Gamma \vdash [\mathsf{unroll}\ M] : \phi \wedge [\alpha][\alpha^{*}]\phi}
\qquad
[*]\mathrm{R}\;\frac{\Gamma \vdash M : \phi \wedge [\alpha][\alpha^{*}]\phi}{\Gamma \vdash [\mathsf{roll}\ M] : [\alpha^{*}]\phi}
$$

$$
\langle * \rangle\mathrm{S}\;\frac{\Gamma \vdash M : \phi}{\Gamma \vdash \langle \mathsf{stop}\ M \rangle : \langle \alpha^{*} \rangle\phi}
\qquad
\langle * \rangle\mathrm{G}\;\frac{\Gamma \vdash M : \langle \alpha \rangle\langle \alpha^{*} \rangle\phi}{\Gamma \vdash \langle \mathsf{go}\ M \rangle : \langle \alpha^{*} \rangle\phi}
$$

Fig. 2. CGL proof calculus: Some non-propositional rules

are computationally trivial. The repetition rules ([∗]E, [∗]R) fold and unfold the notion of repetition as iteration. The rolling and unrolling terms are named in analogy to the *iso-recursive* treatment of recursive types [50], where an explicit operation is used to expand and collapse the recursive definition of a type.

Rules ⟨∗⟩C, ⟨∗⟩S, and ⟨∗⟩G are the destructor and injectors for ⟨α∗⟩φ, which are similar to those for ⟨α ∪ β⟩φ. The duality rules (⟨[d]⟩I) say the dual game is proved by proving the game where roles are reversed. The sequencing rules (⟨[;]⟩I) say a sequential game is played by playing the first game with the goal of reaching a state where the second game is winnable.
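The iso-recursive analogy can be sketched as follows. The lazy-pair encoding is our own illustration, not the paper's: roll packages evidence for φ together with a delayed tail, and unroll forces exactly one layer, so a coinductively defined proof can be built without divergence.

```python
# Illustrative sketch (hypothetical encoding): [alpha*]phi behaves like the
# iso-recursive type  nu X. phi /\ [alpha]X.  A proof is a rolled pair whose
# second component is delayed, so unrolling one layer never forces the
# whole (possibly infinite) unfolding.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Roll:                      # rule [*]R: phi /\ [alpha][alpha*]phi => [alpha*]phi
    now: object                  # evidence for phi in the current state
    later: Callable[[], object]  # delayed evidence for [alpha][alpha*]phi

def unroll(m: Roll):             # rule [*]E: one explicit unfolding step
    return m.now, m.later()

# An "always zero" invariant proof, defined coinductively: its tail is
# built on demand, so constructing it terminates.
def always_zero() -> Roll:
    return Roll(now=0, later=always_zero)

now, rest = unroll(always_zero())
assert now == 0 and isinstance(rest, Roll)
```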

Among these rules, monotonicity M is especially computationally rich. The notation Γ^y_BV(α) says that in the second premiss, the assumptions in Γ have all bound variables of α (written BV(α)) renamed to fresh variables y for completeness. In practice, Γ usually contains some assumptions on variables that are not bound, which we wish to access without writing them explicitly in φ. Rule M is used to execute programs right-to-left, giving shorter, more efficient proofs. It can also be used to derive the Hoare-logical sequential composition rule, which is frequently used to reduce the number of case splits. Note that like every GL, CGL is subnormal, so the modal modus ponens axiom K and Gödel generalization (or necessitation) rule G are not sound, and M takes over much of the role they usually serve. On the surface, M simply says games are monotonic: a game's goal proposition may freely be replaced with a weaker one. From a computational perspective, Section 7 will show that rule M can be (lazily) eliminated. Moreover, M is an *admissible* rule, one whose instances can all be derived from existing rules. When proofs are written right-to-left with M, the normalization relation translates them to left-to-right normal proofs. Note also that in checking M∘pN, the context Γ has the bound variables of α renamed freshly to some y within N, as required to maintain soundness across execution of α.

Next, we consider *first-order* rules, i.e., those which deal with first-order programs that modify *program* variables. The first-order rules are given in Fig. 3. In ⟨:∗⟩E, FV(ψ) are the *free variables* of ψ, the variables which can influence its meaning. Nondeterministic assignment provides quantification over rational-


Fig. 3. CGL proof calculus: first-order games

valued *program* variables. Rule [:∗]I is universal, with proof term (λx : ℚ. M). While this notation is suggestive, the difference vs. the function proof term (λp : φ. M) is essential: the proof term M is checked (resp. evaluated) in a state where the program variable x has changed from its initial value. For soundness, [:∗]I renames x to fresh program variable y throughout context Γ, written Γ^y_x. This means that M can freely refer to all facts of the full context, but they now refer to the state as it was before x received a new value. Elimination [:∗]E then allows instantiating x to a term f. Existential quantification is introduced by ⟨:∗⟩I whose proof term ⟨f^y_x :∗ p. M⟩ is like a dependent pair plus bound renaming of x to y. The witness f is an arbitrary computable term, as always. We write ⟨f :∗ M⟩ for short when y is not referenced in M. It is eliminated in ⟨:∗⟩E by unpacking the pair, with side condition x ∉ FV(ψ) for soundness. The assignment rules ⟨[:=]⟩I do not quantify, per se, but always update x to the value of the term f, and in doing so introduce an assumption that x and f (suitably renamed) are now equal. In ⟨:∗⟩I and ⟨[:=]⟩I, program variable y is fresh.

$$
\langle * \rangle\mathrm{I}\;\frac{\Gamma \vdash A : \varphi \quad p{:}\varphi,\ q{:}\mathcal{M}_{0} = \mathcal{M} \vdash B : \langle \alpha \rangle(\varphi \wedge \mathcal{M}_{0} \succ \mathcal{M}) \quad p{:}\varphi,\ q{:}\mathcal{M} = \mathbf{0} \vdash C : \phi}{\Gamma \vdash \mathsf{for}(p : \varphi(\mathcal{M}) = A;\ q.\ B;\ C)\{\alpha\} : \langle \alpha^{*} \rangle\phi}\ (\mathcal{M}_{0}\ \text{fresh})
$$

$$
[*]\mathrm{I}\;\frac{\Gamma \vdash M : J \quad p{:}J \vdash N : [\alpha]J \quad p{:}J \vdash O : \phi}{\Gamma \vdash (M\ \mathsf{rep}\ p : J.\ N\ \mathsf{in}\ O) : [\alpha^{*}]\phi}
\qquad
\mathrm{FP}\;\frac{\Gamma \vdash A : \langle \alpha^{*} \rangle\phi \quad \Gamma, s{:}\phi \vdash B : \psi \quad \Gamma, g{:}\langle \alpha \rangle\psi \vdash C : \psi}{\Gamma \vdash \mathit{FP}(A,\ s.\ B,\ g.\ C) : \psi}
$$

$$
\mathrm{split}\;\frac{}{\Gamma \vdash \mathsf{split}\ [f, g]\ () : f \leq g \vee f > g}
$$

Fig. 4. CGL proof calculus: loops

The looping rules in Fig. 4, especially ⟨∗⟩I, are arguably the most sophisticated in CGL. Rule ⟨∗⟩I provides a strategy to repeat a game α until the postcondition φ holds. This is done by exhibiting a convergence predicate ϕ and termination metric M with terminal value **0** and well-ordering ≻. Proof term A shows ϕ holds initially. Proof term B guarantees that M decreases with every iteration, where M₀ is a fresh metric variable which is equal to M in the antecedent of B and is never modified. Proof term C allows any postcondition φ which follows from convergence, ϕ ∧ M = **0**. Proof term for(p : ϕ(M) = A; q. B; C){α} suggests the computational interpretation as a for loop: proof A shows the convergence predicate holds in the initial state, B shows that each step reduces the termination metric while maintaining the predicate, and C shows that the postcondition follows from the convergence predicate upon termination. The game α repeats until convergence is reached (M = **0**). By the assumption that metrics are well-founded, convergence is guaranteed in finitely (but arbitrarily) many iterations.
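As a hedged illustration of this computational reading (the state encoding and all names below are our own, not the paper's), the rule behaves like a for loop guarded by the metric:

```python
# Illustrative sketch of the computational reading of rule <*>I: repeat a
# game step while the termination metric M is nonzero, maintaining the
# convergence predicate phi.

def for_loop(state, phi, metric, step, finish):
    """phi: invariant check; metric: well-founded measure (here a natural
    number); step: one round of alpha that must keep phi and strictly
    decrease the metric; finish: derive the postcondition from phi /\ M = 0."""
    assert phi(state)                    # proof term A: phi holds initially
    while metric(state) != 0:
        m0 = metric(state)               # ghost variable M0 remembers M
        state = step(state)              # proof term B covers this branch
        assert phi(state) and metric(state) < m0   # phi /\ M0 > M
    return finish(state)                 # proof term C covers this branch

# Example: drive x down to 0; the postcondition x == 0 holds at exit.
result = for_loop(
    state={"x": 5},
    phi=lambda s: s["x"] >= 0,
    metric=lambda s: s["x"],
    step=lambda s: {"x": s["x"] - 1},
    finish=lambda s: s["x"] == 0,
)
assert result is True
```

Well-foundedness of the metric is exactly what guarantees the `while` loop exits after finitely many iterations.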

A naïve, albeit correct, reading of rule ⟨∗⟩I says M is literally some term f. If lexicographic or otherwise non-scalar metrics should be needed, it suffices to interpret ϕ and M₀ ≻ M as formulas over several scalar variables.

Rule FP says ⟨α∗⟩φ is a least pre-fixed-point. That is, to show that a formula ψ holds now, we show that ψ is a pre-fixed-point (it follows both from φ and from ⟨α⟩ψ); since ⟨α∗⟩φ is the least such, ψ must hold. Rule [∗]I is the well-understood induction rule for loops, which applies as well to repeated games. Premiss O ensures [∗]I supports any provable postcondition, which is crucial for eliminating M in Lemma 7. The elimination form for [α∗]φ is simply [∗]E. Like any program logic, reasoning in CGL consists of first applying program-logic rules to decompose a program until the program has been entirely eliminated, then applying first-order logic principles at the leaves of the proof. The *constructive* theory of rationals is undecidable because it can express the undecidable [47] *classical* theory of rationals. Thus facts about rationals require proof in practice. For the sake of space and since our focus is on program reasoning, we defer an axiomatization of rational arithmetic to future work. We provide a (non-effective!) rule FO which says valid first-order formulas are provable.

$$
\mathrm{FO}\;\frac{\Gamma \vdash M : \rho}{\Gamma \vdash \mathsf{FO}[\phi](M) : \phi}\quad (\text{exists } a \text{ s.t. } \{a\} \times \mathfrak{S} \subseteq [\![\rho \to \phi]\!],\ \rho, \phi \text{ first-order})
$$

An effective special case of FO is split (Fig. 4), which says all term comparisons are decidable. Rule split can be generalized to decide termination metrics (M = **0** ∨ M ≻ **0**). Rule iG says the value of term f can be remembered in fresh ghost variable x:

$$
\mathrm{iG}\;\frac{\Gamma, p : x = f \vdash M : \phi}{\Gamma \vdash \mathsf{Ghost}[x = f](p.\ M) : \phi}\quad (x \text{ fresh except free in } M,\ p \text{ fresh})
$$

Rule iG can be defined using arithmetic and quantifiers:

$$
\mathsf{Ghost}[x = f](p.\ M) \equiv (\lambda x : \mathbb{Q}.\ (\lambda p : (x = f).\ M))\ f\ (\mathsf{FO}[f = f]())
$$

*What's Novel in the* CGL *Calculus?* CGL extends first-order reasoning with game reasoning (sequencing [32], assignments, iteration, and duality). The combination of first-order reasoning with game reasoning is synergistic: for example, repetition games are known to be more expressive than repetition systems [42]. We give a new natural-deduction formulation of monotonicity. Monotonicity is admissible, and normalization translates monotonicity proofs into monotonicity-free proofs. In doing so, normalization shows that right-to-left proofs can be (lazily) rewritten as left-to-right proofs. Additionally, first-order games are rife with changing state, and soundness requires careful management of the context Γ. The extended version [12] uses our calculus to prove the example formulas.

# 6 Theory: Soundness

Full versions of proofs outlined in this paper are given in the extended version [12]. We have introduced a proof calculus for CGL which can prove winning strategies for Nim and CC. For any new proof calculus, it is essential to convince ourselves of its soundness, which can be done within several prominent schools of thought. In proof-theoretic semantics, for example, the proof rules are taken as the ground truth, but are validated by showing the rules obey expected properties such as harmony or, for a sequent calculus, cut-elimination. While we will investigate proof terms separately (Section 8), we are already equipped to show soundness by direct appeal to the realizability semantics (Section 4), which we take as an independent notion of ground truth. We show soundness of CGL proof rules against the realizability semantics, i.e., that every provable natural-deduction sequent is valid. An advantage of this approach is that it explicitly connects the notions of provability and computability! We build up to the proof of soundness by proving lemmas on structurality, renaming, and substitution.

Lemma 1 (Structurality). *The structural rules W, X, and C are admissible, i.e., the conclusions are provable whenever the premisses are provable.*

$$
\mathrm{W}\;\frac{\Gamma \vdash M : \phi}{\Gamma, p : \psi \vdash M : \phi}\qquad
\mathrm{X}\;\frac{\Gamma, p : \phi, q : \psi \vdash M : \rho}{\Gamma, q : \psi, p : \phi \vdash M : \rho}\qquad
\mathrm{C}\;\frac{\Gamma, p : \phi, q : \phi \vdash M : \rho}{\Gamma, p : \phi \vdash [p/q]M : \rho}
$$

*Proof summary.* Each rule is proved admissible by induction on M. Observe that the only premisses regarding Γ are of the form Γ(p) = φ, which are preserved under weakening. Premisses are trivially preserved under exchange because contexts are treated as sets, and preserved modulo renaming by contraction, as it suffices to have *any* assumption of a given formula, regardless of its name. The context Γ is allowed to vary in applications of the inductive hypothesis, e.g., in rules that bind program variables. Some rules discard Γ in checking the subterms inductively, in which case the IH need not be applied at all.

Lemma 2 (Uniform renaming). *Let* M^y_x *be the renaming of program variable* x *to* y *(and vice versa) within* M, *even when neither* x *nor* y *is fresh. If* Γ ⊢ M : φ *then* Γ^y_x ⊢ M^y_x : φ^y_x*.*

*Proof summary.* Straightforward induction on the structure of M. Renaming within proof terms (whose definition we omit as it is quite tedious) follows the usual homomorphisms, from which the inductive cases follow. In the case that M is a proof variable z, then Γ^y_x(z) = Γ(z)^y_x, from which the case follows. The interesting cases are those which modify program variables, e.g., ⟨z := f^w_z in p. M⟩. The bound variable z is renamed to z^y_x, while the auxiliary variable w is α-varied if necessary to maintain freshness. Renaming then happens recursively in M.
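Uniform renaming can be pictured as a transposition of the two variables. The following sketch, with an ad-hoc nested-tuple term representation of our own choosing, shows why the operation needs no freshness assumption: swapping is its own inverse.

```python
# Sketch of uniform renaming as a transposition: x and y are swapped
# everywhere, so the operation is an involution even when neither variable
# is fresh. The nested-tuple term encoding is hypothetical.

def rename(term, x, y):
    """Swap program variables x and y throughout term."""
    if isinstance(term, str):
        return y if term == x else x if term == y else term
    return tuple(rename(t, x, y) for t in term)

t = ("assign", "x", ("plus", "x", "y"))
assert rename(t, "x", "y") == ("assign", "y", ("plus", "y", "x"))
assert rename(rename(t, "x", "y"), "x", "y") == t        # involution
```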

Substitution will use proofs of coincidence and bound effect lemmas.

Lemma 3 (Coincidence). *Only the free variables of an expression influence its semantics.*

Lemma 4 (Bound effect). *Only the bound variables of a game are modified by execution.*

*Summary.* By induction on the expression, in analogy to [43].

Definition 9 (Term substitution admissibility). *For simplicity, we say* φ^f_x *(likewise for context* Γ, *term* f, *game* α, *and proof term* M*) is admissible if* φ *binds neither* x *nor free variables of* f*.*

The latter condition can be relaxed in practice [44] to requiring φ does not mention x under bindings of free variables.

Lemma 5 (Arithmetic-term substitution). *If* Γ ⊢ M : φ *and the substitutions* Γ^f_x, M^f_x, *and* φ^f_x *are admissible, then* Γ^f_x ⊢ M^f_x : φ^f_x*.*

*Summary.* By induction on M. Admissibility holds recursively, and so can be assumed at each step of the induction. For non-atomic M that bind no variables, the proof follows from the inductive hypotheses. For M that bind variables, we appeal to Lemma 3 and Lemma 4.

Just as arithmetic terms are substituted for program variables, proof terms are substituted for proof variables.

Lemma 6 (Proof term substitution). *Let* [N/p]M *substitute* N *for* p *in* M, *avoiding capture. If* Γ, p : ψ ⊢ M : φ *and* Γ ⊢ N : ψ *then* Γ ⊢ [N/p]M : φ*.*

*Proof.* By induction on M, appealing to renaming, coincidence, and bound effect. When substituting N for p into a term that binds program variables such as ⟨z := f^y_z in q. M⟩, we avoid capture by renaming within occurrences of N in the recursive call, i.e., [N/p]⟨z := f^y_z in q. M⟩ = ⟨z := f^y_z in q. [N^z_y/p]M⟩, preserving soundness by Lemma 2.
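The capture-avoidance discipline is the standard lambda-calculus one. A minimal sketch over a simplified term language of our own (not the paper's proof terms) shows the renaming step in the binder case:

```python
# Minimal sketch of capture-avoiding substitution [N/p]M: when
# substitution passes under a binder whose variable occurs free in N,
# the binder is renamed first (the Lemma 2 analogue).

from dataclasses import dataclass
import itertools

@dataclass(frozen=True)
class V:  # variable
    n: str

@dataclass(frozen=True)
class L:  # binder, e.g. a lambda or an assignment scope
    n: str
    b: object

def free(m):
    if isinstance(m, V):
        return {m.n}
    return free(m.b) - {m.n}

_fresh = itertools.count()

def subst(n_term, p, m):
    """[n_term / p] m, avoiding capture."""
    if isinstance(m, V):
        return n_term if m.n == p else m
    if m.n == p:                       # p is shadowed: stop
        return m
    if m.n in free(n_term):            # rename the binder first
        y = f"{m.n}_{next(_fresh)}"
        m = L(y, subst(V(y), m.n, m.b))
    return L(m.n, subst(n_term, p, m.b))

# Substituting x for p under a binder on x renames the binder,
# so the free x is not captured.
out = subst(V("x"), "p", L("x", V("p")))
assert isinstance(out, L) and out.n != "x" and out.b == V("x")
```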

Soundness of the proof calculus exploits renaming and substitution.

Theorem 1 (Soundness of proof calculus). *If* Γ ⊢ M : φ *then* (Γ ⊢ φ) *is valid. As a special case, for the empty context* ·, *if* · ⊢ M : φ *then* φ *is valid.*

*Proof summary.* By induction on M. The modus ponens case (A B) reduces to Lemma 6. Cases that bind program variables, such as assignment, hold by Lemma 5 and Lemma 2. Rule W is employed when substituting under a binder.

We have now shown that the CGL proof calculus is sound, the *sine qua non* condition of any proof system. Because soundness was w.r.t. a realizability semantics, we have shown CGL is constructive in the sense that provable formulas correspond to realizable strategies, i.e., imperative programs executed in an adversarial environment. We will revisit constructivity again in Section 8 from the perspective of proof terms as *functional* programs.

# 7 Operational Semantics

The Curry-Howard interpretation of games is not complete without exploring the interpretation of proof simplification as normalization of functional programs. To this end, we now introduce a structural operational semantics for CGL proof terms. This semantics provides a view complementary to the realizability semantics: not only do provable formulas correspond to realizers, but proof terms can be directly executed as functional programs, resulting in a *normal* proof term. The chief subtlety of our operational semantics is that in contrast to realizer execution, proof simplification is a static operation, and thus does not inspect game state. Thus the normal form of a proof which branches on the game state is, of necessity, also a proof which branches on the game state. This static-dynamic phase separation need not be mysterious: it is analogous to the monadic phase separation between a functional program which returns an imperative command vs. the execution of the returned command. While the primary motivation for our operational semantics is to complete the Curry-Howard interpretation, proof normalization is also helpful when implementing software tools which process proof artifacts, since code that consumes a normal proof is in general easier to implement than code that consumes an arbitrary proof.

The operational semantics consist of two main judgments: M normal says that M is a normal form, while M → M′ says that M reduces to term M′ in one step of evaluation. A normal proof is allowed a case operation at the top level, either case A of ℓ ⇒ B | r ⇒ C or case∗ A of s ⇒ B | g ⇒ C. Normal proofs M without state-casing are called *simple*, written M simp. The requirement that cases are top-level ensures that proofs which differ only in where the case was applied share a common normal form, and ensures that β-reduction is never blocked by a case interceding between introduction-elimination pairs. Top-level case analyses are analogous to case-tree normal forms in lambda calculi with coproducts [4]. Reduction of proof terms is eager.

Definition 10 (Normal forms). *We say* M *is* simple, *written* M simp, *if eliminators occur only under binders. We say* M *is* normal, *written* M normal, *if* M simp *or* M *has shape* case A *of* ℓ ⇒ B | r ⇒ C *or* case∗ A *of* s ⇒ B | g ⇒ C *where* A *is a term such as* (split [f, g] ()) *that inspects the state. Subterms* B *and* C *need not be normal since they occur under the binding of* ℓ *or* r *(resp.* s *or* g*).*

That is, a normal term has no top-level β-redexes, and state-dependent cases are top-level. We consider rules [∗]R, [:∗]I, [?]I, and ⟨[:=]⟩I binding. Rules such as ⟨∗⟩I have multiple premisses but bind only one. While [∗]R does not introduce a proof variable, it is considered binding to prevent divergence, in keeping with a coinductive understanding of formula [α∗]φ. If we did not care whether terms diverge, we could have made [∗]R non-binding.
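The commuting conversions that establish this discipline push eliminators through state-dependent cases. A small sketch with hypothetical Python stand-ins for proof terms:

```python
# Illustrative sketch of a commuting conversion: an eliminator applied to
# a state-dependent case is rewritten so the case moves to the top level
# and the eliminator is duplicated into both branches.

from dataclasses import dataclass

@dataclass(frozen=True)
class Case:          # case A of l => B | r => C   (inspects game state)
    scrut: object
    left: object
    right: object

@dataclass(frozen=True)
class Proj1:         # pi_1 M
    arg: object

def commute(m):
    """One commuting-conversion step:
    pi_1 (case A of l => B | r => C) -> case A of l => pi_1 B | r => pi_1 C."""
    if isinstance(m, Proj1) and isinstance(m.arg, Case):
        c = m.arg
        return Case(c.scrut, Proj1(c.left), Proj1(c.right))
    return m            # no conversion applies

t = Proj1(Case("split [x,0] ()", "B", "C"))
n = commute(t)
assert isinstance(n, Case) and n.left == Proj1("B") and n.right == Proj1("C")
```

After the conversion, the state inspection sits at the top of the term, and the projection can later β-reduce inside each branch.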

For the sake of space, this section focuses on the β-rules (Fig. 5). The full calculus, given in the extended version [12], includes structural and commuting-conversion rules, as well as what we call *monotonicity conversion* rules: a proof term M∘pN is simplified by structural recursion on M. The capture-avoiding substitution of M for p in N is written [M/p]N (Lemma 6). The propositional cases λφβ, λβ, caseβL, caseβR, π₁β, and π₂β are standard reductions for applications, cases, and projections. Projection terms π₁M and π₂M should not be confused with projection realizers πL(a) and πR(a). Rule unpackβ makes the witness of an existential available in its client as a ghost variable.

Rules FPβ, repβ, and forβ reduce introductions and eliminations of loops. Rule FPβ, which reduces a proof *FP*(A, s. B, g. C), says that if α∗ has already terminated according to A, then B proves the postcondition. Else the inductive step C applies, but every reference to the IH g is transformed to a recursive application of FP. If A uses only ⟨∗⟩S and ⟨∗⟩G, then *FP*(A, s. B, g. C) reduces to a simple term; if A uses ⟨∗⟩I, then *FP*(A, s. B, g. C) reduces to a case. Rule repβ says loop induction (M rep p : J. N in O) reduces to a delayed pair

λφβ: (λp : φ. M) N → [N/p]M
λβ: (λx : ℚ. M) f → M^f_x
π₁β: π₁⟨[M, N]⟩ → M
π₂β: π₂⟨[M, N]⟩ → N
caseβL: ⟨[case ⟨[ℓ · A]⟩ of ℓ ⇒ B | r ⇒ C]⟩ → [A/ℓ]B
caseβR: ⟨[case ⟨[r · A]⟩ of ℓ ⇒ B | r ⇒ C]⟩ → [A/r]C
unrollβ: [unroll [roll M]] → M
unpackβ: unpack(⟨f^y_x :∗ q. M⟩, py. N) → (Ghost[x = f^y_x](q. [M/p]N))^x_y
FPβ: *FP*(D, s. B, g. C) → case∗ D of s ⇒ B | g ⇒ [(g∘z *FP*(z, s. B, g. C))/g]C
repβ: (M rep p : J. N in O) → [roll [[M/p]O, ([M/p]N)∘q (q rep p : J. N in O)]]
forβ: for(p : ϕ(M) = A; q. B; C){α} → case split [M, **0**] of ℓ ⇒ ⟨stop [(A, ℓ)/(p, q)]C⟩ | r ⇒ Ghost[M₀ = M](rr. ⟨go (([(A, rr)/(p, q)]B)∘t ⟨for(p : ϕ(M) = π₁t; q. B; C){α}⟩)⟩)

Fig. 5. Operational semantics: β-rules

of the "stop" and "go" cases, where the "go" case first shows [α]J, for loop invariant J, then expands J → [α∗]φ in the postcondition. Note that the laziness of [roll] is essential for normalization: when (M rep p : J. N in O) is understood as a coinductive proof, it is clear that normalization would diverge if repβ were applied indefinitely. Rule forβ for for(p : ϕ(M) = A; q. B; C){α} checks whether the termination metric M has reached terminal value **0**. If so, the loop stops, and A proves it has converged. Else, we remember M's value in a ghost term M₀ and go forward, supplying A and rr to satisfy the preconditions of inductive step B, then execute the loop for(p : ϕ(M) = π₁t; q. B; C){α} in the postcondition. Rule forβ reflects the fact that the exact number of iterations is state-dependent.

We discuss the structural, commuting conversion, and monotonicity conversion rules for left injections as an example, with the full calculus in [12]. Structural rule ⟨[·]⟩S evaluates term M under an injector. Commuting conversion rule ⟨[·]⟩C normalizes an injection of a case to a case with injectors on each branch. Monotonicity conversion rule ⟨[·]⟩∘ simplifies a monotonicity proof of an injection to an injection of a monotonicity proof.

⟨[·]⟩S: if M → M′ then ⟨[ℓ · M]⟩ → ⟨[ℓ · M′]⟩
⟨[·]⟩C: ⟨[ℓ · case A of p ⇒ B | q ⇒ C]⟩ → case A of p ⇒ ⟨[ℓ · B]⟩ | q ⇒ ⟨[ℓ · C]⟩
⟨[·]⟩∘: ⟨[ℓ · M]⟩∘pN → ⟨[ℓ · (M∘pN)]⟩

Fig. 6. Operational semantics: structural, commuting conversion, monotonicity rules

# 8 Theory: Constructivity

We now complete the study of CGL's constructivity. We validate the operational semantics on proof terms by proving that progress and preservation hold, and thus the CGL proof calculus is sound as a type system for the functional programming language of CGL proof terms.

Lemma 7 (Progress). *If* · ⊢ M : φ, *then either* M *is normal or* M → M′ *for some* M′*.*

*Summary.* By induction on the proof term M. If M is an introduction rule, then by the inductive hypotheses the subterms are well-typed. If they are all simple, then M simp. If some subterm (not under a binder) steps, then M steps by a structural rule. Else some subterm is an irreducible case expression not under a binder, and it lifts by a commuting conversion rule. If M is an elimination rule, structural and commuting conversion rules are applied as above. Else by Def. 10 the subterm is an introduction rule, and M reduces with a β-rule. Lastly, if M has form A∘xB and A simp, then by Def. 10 A is an introduction form, thus reduced by some monotonicity conversion rule.

Lemma 8 (Preservation). *Let* →∗ *be the reflexive, transitive closure of the* → *relation. If* · ⊢ M : φ *and* M →∗ M′, *then* · ⊢ M′ : φ*.*

*Summary.* Induct on the derivation M →∗ M′, then induct on M → M′. The β cases follow by Lemma 6 (for base constructs), and by Lemma 6 and Lemma 2 (for assignments). C-rules and ∘-rules lift across binders, soundly by W. S-rules are direct by the IH.

We gave two understandings of proofs in CGL, as imperative strategies and as functional programs. We now give a final perspective: CGL proofs support synthesis in principle, one of our main motivations. Formally, the Existential Property (EP) and Disjunction Property (DP) justify synthesis [18] for existentials and disjunctions: whenever an existential or disjunction has a proof, then we can compute some instance or disjunct that has a proof. We state and prove an EP and DP for CGL, then introduce a Strategy Property, their counterpart for synthesizing strategies from game modalities. It is important to our EP that terms are arbitrary computable functions, because more simplistic term languages are often too weak to witness the existentials they induce.

*Example 1 (Rich terms help).* Formulas over polynomial terms can have nonpolynomial witnesses.

Let φ ≡ (x = y ∧ x ≥ 0) ∨ (x = −y ∧ x < 0). Then f = |x| witnesses ∃y : ℚ. φ.

Lemma 9 (Existential Property). *If* Γ ⊢ M : (∃x : ℚ. φ) *then there exist a term* f *and a realizer* b *such that for all* (a, ω) ∈ [[Γ]], *we have* (b a, ω^{f(ω)}_x) ∈ [[φ]]*.*

*Proof.* By Theorem 1, the sequent (Γ ⊢ ∃x : ℚ. φ) is valid. Since (a, ω) ∈ [[Γ]], by the definition of sequent validity there exists a common realizer c such that (c a, ω) ∈ [[∃x : ℚ. φ]]. Now let f = πL(c a) and b = πR(c a), and the result is immediate by the semantics of existentials.

Disjunction strategies can depend on the state, so naïve DP does not hold.

*Example 2 (Naïve DP).* When Γ ⊢ M : (φ ∨ ψ) there need not be an N such that Γ ⊢ N : φ or Γ ⊢ N : ψ.

Consider φ ≡ x > 0 and ψ ≡ x < 1. Then · ⊢ split [x, 0] () : (φ ∨ ψ), but neither x < 1 nor x > 0 is valid, let alone provable.

Lemma 10 (Disjunction Property). *When* Γ ⊢ M : φ ∨ ψ, *there exist a realizer* b *and a computable* f *s.t. for every* ω *and* a *such that* (a, ω) ∈ [[Γ]], *either* f(ω) = 0 *and* (πL(b), ω) ∈ [[φ]], *or* f(ω) = 1 *and* (πR(b), ω) ∈ [[ψ]]*.*

*Proof.* By Theorem 1, the sequent Γ ⊢ φ ∨ ψ is valid. Since (a, ω) ∈ [[Γ]], by the definition of sequent validity there exists a common realizer c such that (c a, ω) ∈ [[φ ∨ ψ]]. Now let f = πL(c a) and b = πR(c a), and the result is immediate by the semantics of disjunction.
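The extraction in this proof can be sketched concretely. The encoding below, realizers as Python functions returning a tag and branch evidence, is illustrative only; the disjunction x > 0 ∨ x < 1 echoes Example 2.

```python
# Hedged sketch of the Disjunction Property: from a realizer for
# phi \/ psi we extract a decision function f (the left projection, which
# may inspect the state omega) and branch evidence b (the right projection).

def realizer_for_disjunction(omega):
    # realizer for x > 0 \/ x < 1, built in the spirit of split [x, 0]:
    # pick the left disjunct when x > 0, the right one otherwise
    if omega["x"] > 0:
        return (0, "proof of x > 0")   # tag 0 selects phi
    return (1, "proof of x < 1")       # tag 1 selects psi

def extract(realizer, omega):
    tag, evidence = realizer(omega)    # f(omega) is the tag, b the evidence
    return tag, evidence

assert extract(realizer_for_disjunction, {"x": 5}) == (0, "proof of x > 0")
assert extract(realizer_for_disjunction, {"x": -1}) == (1, "proof of x < 1")
```

The chosen disjunct genuinely depends on the state ω, which is exactly why the naïve DP of Example 2 fails while this state-indexed version holds.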

Following the same approach, we generalize to a Strategy Property. In CGL, strategies are represented by realizers, which implement every computation made throughout the game. Thus, to show provable games have computable winning strategies, it suffices to exhibit realizers.

Theorem 2 (Active Strategy Property). *If* Γ ⊢ M : ⟨α⟩φ, *then there exists a realizer* b *such that for all* ω *and realizers* a *with* (a, ω) ∈ [[Γ]], *we have* {(b a, ω)}⟨⟨α⟩⟩ ⊆ [[φ]] ∪ {⊤}*.*

Theorem 3 (Dormant Strategy Property). *If* Γ ⊢ M : [α]φ, *then there exists a realizer* b *such that for all* ω *and realizers* a *with* (a, ω) ∈ [[Γ]], *we have* {(b a, ω)}[[α]] ⊆ [[φ]] ∪ {⊤}*.*

*Summary.* From proof term M and Theorem 1, we have a realizer for formula ⟨α⟩φ or [α]φ, respectively. We proceed by induction on α: the realizer b a contains all realizers applied in the inductive cases, composed with their continuations that prove φ in each base case.

While these proofs, especially EP and DP, are short and direct, we note that this is by design: the challenge in developing CGL is not so much the proofs of this section, rather these proofs become simple because we adopted a realizability semantics. The challenge was in developing the semantics and adapting the proof calculus and theory to that semantics.

# 9 Conclusion and Future Work

In this paper, we developed a Constructive Game Logic CGL, from syntax and realizability semantics to a proof calculus and operational semantics on the proof terms. We developed two understandings of proofs as programs: semantically, every proof of game winnability corresponds to a realizer which computes the game's winning strategy, while the language of proof terms is also a functional programming language where proofs reduce to their normal forms according to the operational semantics. We completed the Curry-Howard interpretation for games by showing Existential, Disjunction, and Strategy properties: programs can be synthesized that decide which instance, disjunct, or move is taken in existentials, disjunctions, and games. In summary, we have developed the most comprehensive Curry-Howard interpretation of any program logic to date, for a much more expressive logic than prior work [32]. Because CGL contains constructive Concurrent DL and first-order DL as strict fragments, we have provided a comprehensive Curry-Howard interpretation for them in one fell swoop. The key insights behind CGL should apply to the many dynamic and Hoare logics used in verification today.

Synthesis is the immediate application of CGL. Motivations for synthesis include security games [40], concurrent programs with demonic schedulers (Concurrent Dynamic Logic), and control software for safety-critical cyber-physical systems such as cars and planes. In general, any kind of software program which must operate correctly in an adversarial environment can benefit from game logic verification. The proofs of Theorem 2 and Theorem 3 constitute an (on-paper) algorithm which performs synthesis of guaranteed-correct strategies from game proofs. The first future work is to implement this algorithm in code, providing much-needed assurance for software which is often mission-critical or safety-critical. This paper focused on discrete CGL with one numeric type simply because any further features would distract from the core contributions. Real applications come from many domains which add features around this shared core.

The second future work is to extend CGL to hybrid games, which provide compelling applications from the domain of adversarial cyber-physical systems. This future work will combine the novel features of CGL with those of the classical logic dGL. The primary task is to define a constructive semantics for differential equations and to give constructive interpretations to the differential equation rules of dGL. Previous work on formalizations of differential equations [34] suggests differential equations can be treated constructively. In principle, existing proofs in dGL might happen to be constructive, but this does not obviate the present work. On the contrary, once a game logic proof is shown to fall in the constructive fragment, our work gives a correct synthesis guarantee for it too!

# References



# **Optimal and Perfectly Parallel Algorithms for On-demand Data-flow Analysis**<sup>∗</sup>

Krishnendu Chatterjee<sup>1</sup>, Amir Kafshdar Goharshady<sup>1</sup>, Rasmus Ibsen-Jensen<sup>2</sup>, and Andreas Pavlogiannis<sup>3</sup>

> <sup>1</sup> IST Austria, Klosterneuburg, Austria, {krishnendu.chatterjee, amir.goharshady}@ist.ac.at
> <sup>2</sup> University of Liverpool, Liverpool, United Kingdom, r.ibsen-jensen@liverpool.ac.uk
> <sup>3</sup> Aarhus University, Aarhus, Denmark, pavlogiannis@cs.au.dk

**Abstract.** Interprocedural data-flow analyses form an expressive and useful paradigm of numerous static analysis applications, such as live variables analysis, alias analysis and null pointers analysis. The most widely-used framework for interprocedural data-flow analysis is *IFDS*, which encompasses distributive data-flow functions over a finite domain. *On-demand* data-flow analyses restrict the focus of the analysis on specific program locations and data facts. This setting provides a natural split between (i) an *offline (or preprocessing) phase*, where the program is partially analyzed and analysis summaries are created, and (ii) an *online (or query) phase*, where analysis queries arrive on demand and the summaries are used to speed up answering queries.

In this work, we consider on-demand IFDS analyses where the queries concern program locations of the same procedure (aka same-context queries). We exploit the fact that flow graphs of programs have low treewidth to develop faster algorithms that are *space and time optimal* for many common data-flow analyses, in both the preprocessing and the query phase. We also use treewidth to develop query solutions that are *embarrassingly parallelizable*, i.e. the total work for answering each query is split to a number of threads such that each thread performs only a constant amount of work. Finally, we implement a static analyzer based on our algorithms, and perform a series of on-demand analysis experiments on standard benchmarks. Our experimental results show a drastic speed-up of the queries after only a lightweight preprocessing phase, which significantly outperforms existing techniques.

**Keywords:** Data-flow analysis, IFDS, Treewidth

<sup>∗</sup>The research was partly supported by Austrian Science Fund (FWF) Grant No. NFN S11407-N23 (RiSE/SHiNE), FWF Schrödinger Grant No. J-4220, Vienna Science and Technology Fund (WWTF) Project ICT15-003, Facebook PhD Fellowship Program, IBM PhD Fellowship Program, and DOC Fellowship No. 24956 of the Austrian Academy of Sciences (ÖAW). A longer version of this work is available at [17].

# **1 Introduction**

*Static data-flow analysis.* Static program analysis is a fundamental approach for both analyzing program correctness and performing compiler optimizations [25,39,44,64,30]. Static data-flow analyses associate with each program location a set of data-flow facts which are guaranteed to hold under all program executions, and these facts are then used to reason about program correctness, report erroneous behavior, and optimize program execution. Static data-flow analyses have numerous applications, such as in pointer analysis (e.g., points-to analysis and detection of null-pointer dereferencing) [46,57,61,62,66,67,69], in detecting privacy and security issues (e.g., taint analysis, SQL injection analysis) [3,37,31,33,47,40], as well as in compiler optimizations (e.g., constant propagation, reaching definitions, register allocation) [50,32,55,13,2].

*Interprocedural analysis and the IFDS framework.* Data-flow analyses fall in two large classes: *intraprocedural* and *interprocedural*. In the former, each procedure of the program is analyzed in isolation, ignoring the interaction between procedures which occurs due to parameter passing/return. In the latter, all procedures of the program are analyzed together, accounting for such interactions, which leads to results of increased precision, and hence is often preferable to intraprocedural analysis [49,54,59,60]. To filter out false results, interprocedural analyses typically employ call-context sensitivity, which ensures that the underlying execution paths respect the calling context of procedure invocations. One of the most widely used frameworks for interprocedural data-flow analysis is the framework of Interprocedural Finite Distributive Subset (IFDS) problems [50], which offers a unified formulation of a wide class of interprocedural data-flow analyses as a reachability problem. This elegant algorithmic formulation of data-flow analysis has been a topic of active study, allowing various subsequent practical improvements [36,45,8,3,47,56] and implementations in prominent static analysis tools such as Soot [7] and WALA [1].

*On-demand analysis.* Exhaustive data-flow analysis is computationally expensive and often unnecessary. Hence, a topic of great interest in the community is that of *on-demand* data-flow analysis [4,27,36,51,48,68,45]. On-demand analyses have several applications, such as (quoting from [36,48]) (i) narrowing down the focus to specific points of interest, (ii) narrowing down the focus to specific data-flow facts of interest, (iii) reducing work in preliminary phases, (iv) sidestepping incremental updating problems, and (v) offering demand analysis as a user-level operation. On-demand analysis is also extremely useful for speculative optimizations in just-in-time compilers [24,43,5,29], where dynamic information can dramatically increase the precision of the analysis. In this setting, it is crucial that the on-demand analysis runs fast, to incur as little overhead as possible.

*Example 1.* As a toy motivating example, consider the partial program shown in Figure 1, compiled with a just-in-time compiler that uses speculative optimizations. Whether the compiler must compile the expensive function h depends on whether x is null in line 6. Performing a null-pointer analysis from the entry of

```
1  void f(int b){
2    int *x = NULL, *y = NULL;
3    if(b > 1)
4      y = &b;
5    g(x, y);
6    if(x == NULL)
7      h();
8  }
9  void g(int *&x, int *y){
10   x = y;
11 }
12 void h(){
13   // An expensive
14   // function
15 }
```
Fig. 1: A partial C++ program.

f reveals that x might be null in line 6. Hence, if the decision to compile h relies only on an offline static analysis, h is always compiled, even when not needed.

Now consider the case where the execution of the program is in line 4, and at this point the compiler decides on whether to compile h. It is clear that given this information, x cannot be null in line 6 and thus h does not have to be compiled. As we have seen above, this decision cannot be made based on offline analysis. On the other hand, an *on-demand* analysis starting from the current program location will correctly conclude that x is not null in line 6. Note however, that this decision is made by the compiler during runtime. Hence, such an on-demand analysis is useful only if it can be performed extremely fast. It is also highly desirable that the time for running this analysis is predictable, so that the compiler can decide whether to run the analysis or simply compile h proactively.

The techniques we develop in this paper answer the above challenges rigorously. Our approach exploits a key structural property of flow graphs of programs, called treewidth.

*Treewidth of programs.* A very well-studied notion in graph theory is the concept of *treewidth* of a graph, which is a measure of how similar a graph is to a tree (a graph has treewidth 1 precisely if it is a tree) [52]. On one hand the treewidth property provides a mathematically elegant way to study graphs, and on the other hand there are many classes of graphs which arise in practice and have constant treewidth. The most important example is that the flow graphs of goto-free programs in many classic programming languages have constant treewidth [63]. The low treewidth of flow graphs has also been confirmed experimentally for programs written in Java [34], C [38], Ada [12] and Solidity [15].

Treewidth has important algorithmic implications, as many graph problems that are hard to solve in general admit efficient solutions on graphs of low treewidth. In the context of program analysis, this property has been exploited to develop improvements for register allocation [63,9] (a technique implemented in the Small Device C Compiler [28]), cache management [18], on-demand algebraic path analysis [16], on-demand *intraprocedural* data-flow analysis of concurrent programs [20] and data-dependence analysis [14].

*Problem statement.* We focus on on-demand data-flow analysis in IFDS [50,36,48]. The input consists of a supergraph G of n vertices, a data-fact domain D and a data-flow transformer function M. Edges of G capture control-flow within each procedure, as well as procedure invocations and returns. The set D defines the domain of the analysis, and contains the data facts to be discovered by the analysis for each program location. The function M associates with every edge (u, v) of G a data-flow transformer M(u, v): 2^D → 2^D. In words, M(u, v) defines the set of data facts that hold at v in some execution that transitions from u to v, given the set of data facts that hold at u.
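As an illustration, many classical transformers have a gen/kill form, which is distributive over union; below is a minimal Python sketch (the fact names and helper are illustrative, not from the paper):

```python
# Distributive transformer M(u, v): 2^D -> 2^D in gen/kill form.
# A function S -> (S - kill) | gen distributes over union.
def make_transformer(gen, kill):
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda facts: (frozenset(facts) - kill) | gen

# Illustrative domain D = {"x_null", "y_null"} for a null-pointer analysis.
# Edge for a statement `x = NULL`: generates "x_null", kills nothing.
m_x_null = make_transformer(gen={"x_null"}, kill=set())
# Edge for `y = &b`: y is now definitely non-null, so "y_null" is killed.
m_y_addr = make_transformer(gen=set(), kill={"y_null"})

assert m_x_null(set()) == {"x_null"}
assert m_y_addr({"x_null", "y_null"}) == {"x_null"}
# Distributivity over union: f(A | B) == f(A) | f(B)
A, B = {"x_null"}, {"y_null"}
assert m_y_addr(A | B) == m_y_addr(A) | m_y_addr(B)
```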

On-demand analysis brings a natural separation between (i) an *offline (or preprocessing) phase*, where the program is partially analyzed, and (ii) an *online (or query) phase*, where on-demand queries are handled. The task is to preprocess the input in the offline phase, so that in the online phase, the following types of on-demand queries are answered efficiently:


*Previous results.* The on-demand analysis problem admits a number of solutions that lie in the preprocessing/query spectrum. On the one end, the preprocessing phase can be disregarded, and every on-demand query be treated anew. Since each query starts a separate instance of IFDS, the time to answer it is O(n · |D|^3), for both pair and single-source queries [50]. On the other end, all possible queries can be pre-computed and cached in the preprocessing phase in time O(n^2 · |D|^3), after which each query costs time proportional to the size of the output (i.e., O(1) for pair queries and O(n · |D|) for single-source queries). Note that this full preprocessing also incurs a cost of O(n^2 · |D|^2) in space for storing the cache table, which is often prohibitive. On-demand analysis was more thoroughly studied in [36]. The main idea is that, instead of pre-computing the answer to all possible queries, the analysis results obtained by handling each query are memoized in a cache table, and are used for speeding up the computation of subsequent queries. This is a heuristic-based approach that often works well in practice; however, the only guarantee provided is that of *same-worst-case-complexity*, which states that in the worst case the algorithm uses O(n^2 · |D|^3) time and O(n^2 · |D|^2) space, similarly to the complete preprocessing case. This guarantee is inadequate for runtime applications such as the example of Figure 1, as it would require either (i) to run a full analysis, or (ii) to run a partial analysis which might wrongly conclude that h is reachable, and thus compile it. Both cases incur a large runtime overhead, either because we run a full analysis, or because we compile an expensive function.

*Our contributions.* We develop algorithms for on-demand IFDS analyses that have strong worst-case time complexity guarantees and thus lead to more predictable performance than mere heuristics. The contributions of this work are as follows:


Recently, we exploited the low-treewidth property of programs to obtain faster algorithms for algebraic path analysis [16] and intraprocedural reachability [21]. Data-flow analysis can be reduced to these problems. Hence, the algorithms in [16,21] can also be applied to our setting. However, our new approach has two important advantages: (i) we show how to answer queries in a perfectly parallel manner, and (ii) reducing the problem to algebraic path properties and then applying the algorithms in [16,21] yields O(n · |D|^3) preprocessing time and O(n · log n · |D|^2) space, and has pair and single-source query times O(|D|) and O(n · |D|^2), respectively. Hence, our space usage and query times are better by a factor of

<sup>§</sup> Note that we count the input itself as part of the space usage.

log n¶. Moreover, when considering the complexity w.r.t. n, i.e. considering D to be a constant, these results are optimal w.r.t. both time and space. Hence, no further improvement is possible.

*Remark.* Note that our approach does not apply to arbitrary CFL reachability in constant treewidth. In addition to the treewidth, our algorithms also exploit specific structural properties of IFDS. In general, small treewidth alone does not improve the complexity of CFL reachability [14].

# **2 Preliminaries**

*Model of computation.* We consider the standard RAM model with word size W = Θ(log n), where n is the size of our input. In this model, one can store W bits in one word (aka "word tricks") and arithmetic and bitwise operations between pairs of words can be performed in O(1) time. In practice, word size is a property of the machine and not the analysis. Modern machines have words of size at least 64. Since the size of real-world input instances never exceeds 2^64, the assumption of word size W = Θ(log n) is well-realized in practice and no additional effort is required by the implementer to account for W in the context of data-flow analysis.
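As a sketch of the "word tricks" idea, a subset of {0, ..., n−1} can be packed into ⌈n/W⌉ machine words so that set union, intersection, and difference cost O(n/W) word operations; the Python version below uses arbitrary-precision integers in place of an explicit word array (a C implementation would use a `uint64_t` array):

```python
# "Word tricks": store a subset of {0, ..., n-1} as a bit vector so that
# set union/intersection/difference are bitwise OR/AND/AND-NOT on words.
def to_bits(elems):
    bits = 0
    for e in elems:
        bits |= 1 << e
    return bits

def from_bits(bits):
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}

a = to_bits({1, 3, 5})
b = to_bits({3, 4})
assert from_bits(a | b) == {1, 3, 4, 5}   # union
assert from_bits(a & b) == {3}            # intersection
assert from_bits(a & ~b) == {1, 5}        # difference
```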

*Graphs.* We consider directed graphs G = (V, E) where V is a finite set of vertices and E ⊆ V × V is a set of directed edges. We use the term graph to refer to directed graphs and will explicitly mention if a graph is undirected. For two vertices u, v ∈ V, a path P from u to v is a finite sequence of vertices P = (w_i)_{i=0}^{k} such that w_0 = u, w_k = v and for every i < k, there is an edge from w_i to w_{i+1} in E. The length |P| of the path P is equal to k. In particular, for every vertex u, there is a path of length 0 from u to itself. We write P : u ⇝ v to denote that P is a path from u to v, and u ⇝ v to denote the existence of such a path, i.e. that v is reachable from u. Given a set V′ ⊆ V of vertices, the induced subgraph of G on V′ is defined as G[V′] = (V′, E ∩ (V′ × V′)). Finally, the graph G is called *bipartite* if the set V can be partitioned into two sets V_1, V_2, so that every edge has one end in V_1 and the other in V_2, i.e. E ⊆ (V_1 × V_2) ∪ (V_2 × V_1).
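The definitions above can be made concrete in a few lines; a minimal sketch (illustrative, not from the paper) of reachability u ⇝ v and the induced subgraph G[V′]:

```python
from collections import deque

# Directed graph as adjacency sets; u ~> v reachability by BFS,
# and the induced subgraph G[V'] = (V', E ∩ (V' × V')).
def reachable(E, u, v):
    seen, queue = {u}, deque([u])          # length-0 path: u ~> u
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for x in E.get(w, ()):
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return False

def induced(E, Vp):
    return {u: {v for v in E.get(u, ()) if v in Vp} for u in Vp}

E = {1: {2}, 2: {3}, 3: set(), 4: {1}}
assert reachable(E, 1, 3) and not reachable(E, 3, 1)
assert induced(E, {1, 2, 4}) == {1: {2}, 2: set(), 4: {1}}
```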

#### **2.1 The IFDS Framework**

IFDS [50] is a ubiquitous and general framework for interprocedural data-flow analyses that have finite domains and distributive flow functions. It encompasses a wide variety of analyses, including truly-live variables, copy constant propagation, possibly-uninitialized variables, secure information-flow, and gen/kill or bitvector problems such as reaching definitions, available expressions and live variables [50,7]. IFDS obtains *interprocedurally precise* solutions. In contrast to intraprocedural analysis, in which precise denotes "meet-over-all-paths", interprocedurally precise solutions only consider valid paths, i.e. paths in which when

<sup>¶</sup>This improvement is due to the differences in the preprocessing phase. Our algorithms for the query phase are almost identical to our previous work.

a function reaches its end, control returns back to the site of the most recent call [58].

*Flow graphs and supergraphs.* In IFDS, a program with k procedures is specified by a *supergraph*, i.e. a graph G = (V, E) consisting of k flow graphs G_1, ..., G_k, one for each procedure, and extra edges modeling procedure-calls. Flow graphs represent procedures in the usual way, i.e. they contain one vertex v_i for each statement i and there is an edge from v_i to v_j if statement j may immediately follow statement i in an execution of the procedure. The only exception is that a procedure-call statement i is represented by two vertices, a *call* vertex c_i and a *return-site* vertex r_i. The vertex c_i only has incoming edges, and the vertex r_i only has outgoing edges. There is also a *call-to-return-site* edge from c_i to r_i. The call-to-return-site edges are included for passing intraprocedural information, such as information about local variables, from c_i to r_i. Moreover, each flow graph G_l has a unique *start* vertex s_l and a unique *exit* vertex e_l.

The supergraph G also contains the following edges for each procedure-call i with call vertex c_i and return-site vertex r_i that calls a procedure l: (i) an interprocedural *call-to-start* edge from c_i to the start vertex s_l of the called procedure, and (ii) an interprocedural *exit-to-return-site* edge from the exit vertex e_l of the called procedure to r_i.

*Example 2.* Figure 2 shows a simple C++ program on the left and its supergraph on the right. Each statement i of the program has a corresponding vertex v_i in the supergraph, except for statement 7, which is a procedure-call statement and hence has a corresponding call vertex c_7 and return-site vertex r_7.

Fig. 2: A C++ program (left) and its supergraph (right).

*Interprocedurally valid paths.* Not every path in the supergraph G can potentially be realized by an execution of the program. Consider a path P in G and let P′ be the sequence of vertices obtained by removing every v_i from P, i.e. P′ only consists of c_i's and r_i's. Then, P is called a *same-context valid path* if P′ can be generated from S in the following grammar:

$$\begin{array}{lcl} S & \to & c\_i \; S \; r\_i \; S \quad \text{for a procedure-call statement } i \\ & \mid & \varepsilon \end{array}$$

Moreover, P is called an *interprocedurally valid path* or simply *valid* if P′ can be generated from the nonterminal S′ in the following grammar:

$$\begin{array}{lcl} S' & \to & S' \; c\_i \; S \quad \text{for a procedure-call statement } i \\ & \mid & S \end{array}$$

For any two vertices u, v of the supergraph G, we denote the set of all interprocedurally valid paths from u to v by IVP(u, v) and the set of all same-context valid paths from u to v by SCVP(u, v). Informally, a valid path starts from a statement in a procedure p of the program and goes through a number of procedure-calls while respecting the rule that whenever a procedure ends, control should return to the return-site in its parent procedure. A same-context valid path is a valid path in which every procedure-call ends and hence control returns back to the initial procedure p in the same context.
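Membership in these grammars amounts to a stack discipline on the call and return symbols: S generates fully matched sequences, while S′ additionally allows unmatched pending calls. A minimal sketch (illustrative, not from the paper) that classifies a path's call/return sequence:

```python
# Classify the call/return sequence P' of a path.
# Input is a list like [("call", 7), ("ret", 7), ("call", 9)]:
# same-context valid  <=>  generated by S  (fully balanced),
# valid               <=>  generated by S' (unmatched calls may remain).
def classify(seq):
    stack = []
    for kind, i in seq:
        if kind == "call":
            stack.append(i)
        else:  # a return must match the most recent pending call
            if not stack or stack.pop() != i:
                return "invalid"
    return "same-context valid" if not stack else "valid"

assert classify([("call", 7), ("ret", 7)]) == "same-context valid"
assert classify([("call", 7), ("call", 9)]) == "valid"     # pending calls
assert classify([("call", 7), ("ret", 9)]) == "invalid"    # mismatched
```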

*IFDS [50].* An IFDS problem *instance* is a tuple I = (G, D, F, M, ⊓) where:


Let P = (w_i)_{i=0}^{k} be a path in G, e_i = (w_{i−1}, w_i) and m_i = M(e_i). In other words, the e_i's are the edges appearing in P and the m_i's are their corresponding distributive flow functions. The *path function* of P is defined as: pf_P := m_k ∘ ··· ∘ m_2 ∘ m_1, where ∘ denotes function composition. The solution of I is the collection of values {MVP_v}_{v ∈ V}:

$$\mathsf{MVP}\_v := \bigcap\_{P \in \mathsf{IVP}(s\_{\mathit{main}}, v)} \mathsf{pf}\_P(D).$$

Intuitively, the solution is defined by taking *meet-over-all-valid-paths*. If the meet operator is union, then MVP_v is the set of data flow facts that *may* hold at v, when v is reached in *some* execution of the program. Conversely, if the meet operator is intersection, then MVP_v consists of data flow facts that *must* hold at v in *every* execution of the program that reaches v. Similarly, we define the same-context solution of I as the collection of values {MSCP_v}_{v ∈ V_main} defined as follows:

$$\mathsf{MSCP}\_v := \bigcap\_{P \in \mathsf{SCVP}(s\_{\mathit{main}}, v)} \mathsf{pf}\_P(D). \tag{1}$$

The intuition behind MSCP is similar to that of MVP, except that in MSCP<sup>v</sup> we consider *meet-over-same-context-paths* (corresponding to runs that return to the same stack state).

*Remark 1.* We note two points about the IFDS framework:


*Succinct representations.* A distributive function f: 2^D → 2^D can be succinctly represented by a relation R_f ⊆ (D ∪ {**0**}) × (D ∪ {**0**}) defined as:

$$\begin{array}{c} R\_f := \{ (\mathbf{0}, \mathbf{0}) \} \\ \cup \{ (\mathbf{0}, b) \mid b \in f(\emptyset) \} \\ \cup \ \{ (a, b) \mid b \in f(\{ a \}) - f(\emptyset) \}. \end{array}$$

Given that f is distributive over union, we have f({d_1, ..., d_k}) = f({d_1}) ∪ ··· ∪ f({d_k}). Hence, to specify f it is sufficient to specify f(∅) and f({d}) for each d ∈ D. This is exactly what R_f does. In short, we have: f(∅) = {b ∈ D | (**0**, b) ∈ R_f} and f({d}) = f(∅) ∪ {b ∈ D | (d, b) ∈ R_f}. Moreover, we can represent the relation R_f as a bipartite graph H_f in which each part consists of the vertices D ∪ {**0**} and R_f is the set of edges. For brevity, we define D* := D ∪ {**0**}.

Fig. 3: Succinct representation of several distributive functions.

*Example 3.* Let D = {a, b}. Figure 3 provides several examples of bipartite graphs representing distributive functions.
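A minimal sketch (illustrative, not from the paper) of building R_f from f and reading f back off R_f, shown for the function f(x) = x ∪ {a} over D = {a, b}; the encoding of **0** as the string "0" is an implementation choice:

```python
# Succinct representation R_f of a distributive f: 2^D -> 2^D,
# and evaluation of f back from R_f ("0" encodes the extra element 0).
ZERO = "0"

def represent(f, D):
    R = {(ZERO, ZERO)}
    R |= {(ZERO, b) for b in f(frozenset())}
    for a in D:
        R |= {(a, b) for b in f(frozenset({a})) - f(frozenset())}
    return R

def evaluate(R, xs):
    out = {b for (a, b) in R if a == ZERO and b != ZERO}   # f(∅)
    for d in xs:
        out |= {b for (a, b) in R if a == d}
    return out

D = {"a", "b"}
f = lambda s: frozenset(s) | {"a"}          # f(x) = x ∪ {a}
Rf = represent(f, D)
for xs in [set(), {"a"}, {"b"}, {"a", "b"}]:
    assert evaluate(Rf, xs) == set(f(xs))
```

Note that the identity pair (a, a) is absent from R_f here, since a ∈ f(∅) already accounts for it, exactly as in the definition.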

*Bounded Bandwidth Assumption.* Following [50], we assume that the bandwidth in function calls and returns is bounded by a constant. In other words, there is a small constant b, such that for every edge e that is a call-to-start or exit-to-return-site edge, every vertex in the graph representation H_{M(e)} has degree b or less. This is a classical assumption in IFDS [50,7] and models the fact that every parameter in a called function depends only on a few variables of the caller (and conversely, every returned value depends only on a few variables of the called function).

*Composition of distributive functions.* Let f and g be distributive functions and R_f and R_g their succinct representations. It is easy to verify that g ∘ f is also distributive, hence it has a succinct representation R_{g∘f}. Moreover, we have R_{g∘f} = R_f ; R_g = {(a, b) | ∃c. (a, c) ∈ R_f ∧ (c, b) ∈ R_g}.


Fig. 4: Obtaining H_{g∘f} (right) from H_f and H_g (left).

*Example 4.* In terms of graphs, to compute H_{g∘f}, we first take H_f and H_g, then contract corresponding vertices in the lower part of H_f and the upper part of H_g, and finally compute reachability from the topmost part to the bottommost part of the resulting graph. Consider f(x) = x ∪ {a}, g(x) = {a} for x ≠ ∅ and g(∅) = ∅; then (g ∘ f)(x) = {a} for all x ⊆ D. Figure 4 shows the contraction of corresponding vertices in H_f and H_g (left) and using reachability to obtain H_{g∘f} (right).
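The relational join R_f ; R_g can be sketched directly and checked on the functions of Example 4; as before, encoding **0** as the string "0" is an implementation choice:

```python
# Composition of succinct representations: R_{g∘f} = R_f ; R_g,
# i.e. a relational join (reachability through the middle layer).
def compose(Rf, Rg):
    return {(a, b) for (a, c) in Rf for (c2, b) in Rg if c == c2}

def evaluate(R, xs):   # read a function back off its representation
    out = {b for (a, b) in R if a == "0" and b != "0"}
    for d in xs:
        out |= {b for (a, b) in R if a == d}
    return out

# Example 4: f(x) = x ∪ {a}; g(x) = {a} for x ≠ ∅, g(∅) = ∅; D = {a, b}.
Rf = {("0", "0"), ("0", "a"), ("b", "b")}
Rg = {("0", "0"), ("a", "a"), ("b", "a")}
Rgf = compose(Rf, Rg)
# g∘f is the constant function {a} on every input:
for xs in [set(), {"a"}, {"b"}, {"a", "b"}]:
    assert evaluate(Rgf, xs) == {"a"}
```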

*Exploded supergraph.* Given an IFDS instance I = (G, D, F, M, ∪) with supergraph G = (V, E), its *exploded supergraph* Ḡ is obtained by taking |D*| copies of each vertex in V, one corresponding to each element of D*, and replacing each edge e with the graph representation H_{M(e)} of the flow function M(e). Formally, Ḡ = (V̄, Ē) where V̄ = V × D* and

$$\overline{E} = \left\{ ((u, d\_1), (v, d\_2)) \mid e = (u, v) \in E \land (d\_1, d\_2) \in R\_{M(e)} \right\}.$$

A path P̄ in Ḡ is (same-context) valid if the path P in G, obtained by ignoring the second component of every vertex in P̄, is (same-context) valid. As shown in [50], for a data flow fact d ∈ D and a vertex v ∈ V, we have d ∈ MVP_v iff there is a valid path in Ḡ from (s_main, d′) to (v, d) for some d′ ∈ D ∪ {**0**}. Hence, the IFDS problem is reduced to reachability by valid paths in Ḡ. Similarly, the same-context IFDS problem is reduced to reachability by same-context valid paths in Ḡ.
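The explosion step itself is mechanical; a minimal sketch (with an illustrative two-edge instance, not from the paper) of building Ē from E and the representation relations:

```python
# Exploded supergraph: each edge e = (u, v) of the supergraph is replaced
# by a copy of its representation relation R_{M(e)} over D* = D ∪ {0}.
def explode(E, M):
    """E: set of edges (u, v); M: maps each edge to its relation R_f."""
    return {((u, d1), (v, d2)) for (u, v) in E for (d1, d2) in M[(u, v)]}

# Tiny illustrative instance with D = {"a"}; "0" is the extra element.
E = {(1, 2), (2, 3)}
M = {(1, 2): {("0", "0"), ("0", "a")},      # edge generating fact a
     (2, 3): {("0", "0"), ("a", "a")}}      # identity-like edge
Ebar = explode(E, M)
assert ((1, "0"), (2, "a")) in Ebar
assert ((2, "a"), (3, "a")) in Ebar
```

Reachability from (1, "0") to (3, "a") in this exploded graph then witnesses that fact a holds at vertex 3, mirroring the reduction described above.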

*Example 5.* Consider a null pointer analysis on the program in Figure 2. At each program point, we want to know which pointers can potentially be null. We first model this problem as an IFDS instance. Let D = {x̄, ȳ}, where x̄ is the data flow fact that x might be null and ȳ is defined similarly. Figure 5 shows the same program and its exploded supergraph.

At point 8, the values of both pointers x and y are used. Hence, if either of x or y is null at 8, a null pointer error will be raised. However, as evidenced by the two valid paths shown in red, both x and y might be null at 8. The pointer y might be null because it is passed to the function f by value (instead of by reference) and keeps its local value in the transition from c_7 to r_7, hence the edge ((c_7, ȳ), (r_7, ȳ)) is in Ḡ. On the other hand, the function f only initializes y, which is its own local variable, and does not change x (which is shared with main).

Fig. 5: A Program (left) and its Exploded Supergraph (right).

#### **2.2 Trees and Tree Decompositions**

*Trees.* A rooted tree T = (V_T, E_T) is an undirected graph with a distinguished "root" vertex r ∈ V_T, in which there is a unique path P : u ⇝ v between every pair {u, v} of vertices. We refer to the number of vertices in V_T as the *size* of T. For an arbitrary vertex v ∈ V_T, the *depth* of v, denoted by d_v, is defined as the length of the unique path P : r ⇝ v. The *depth* or *height* of T is the maximum depth among its vertices. A vertex u is called an *ancestor* of v if u appears in P : r ⇝ v. In this case, v is called a *descendant* of u. In particular, r is an ancestor of every vertex and each vertex is both an ancestor and a descendant of itself. We denote the set of ancestors of v by A↑_v and its descendants by D↓_v. It is straightforward to see that for every 0 ≤ d ≤ d_v, the vertex v has a unique ancestor with depth d. We denote this ancestor by a^d_v. The ancestor p_v = a^{d_v − 1}_v of v at depth d_v − 1 is called the *parent* of v, and v is a *child* of p_v. The subtree T↓_v corresponding to v is defined as T[D↓_v] = (D↓_v, E_T ∩ 2^{D↓_v}), i.e. the part of T that consists of v and its descendants. Finally, a vertex v ∈ V_T is called a *leaf* if it has no children. Given two vertices u, v ∈ V_T, the *lowest common ancestor* lca(u, v) of u and v is defined as argmax_{w ∈ A↑_u ∩ A↑_v} d_w. In other words, lca(u, v) is the common ancestor of u and v with maximum depth, i.e. the one farthest from the root.

**Lemma 1 ([35]).** *Given a rooted tree* T *of size* n*, there is an algorithm that preprocesses* T *in* O(n) *and can then answer lowest common ancestor queries, i.e. queries that provide two vertices* u *and* v *and ask for* lca(u, v)*, in* O(1)*.*
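The O(n)/O(1) structure of Lemma 1 [35] is involved; as a simpler stand-in (an assumption of this sketch, not the structure used in the paper), binary lifting answers lca queries in O(log n) after O(n log n) preprocessing, which conveys the same interface:

```python
# Lowest common ancestors by binary lifting: O(n log n) preprocessing
# and O(log n) per query -- a simpler stand-in for the O(n)/O(1)
# structure of Lemma 1.
def build(parent, depth):
    n = len(parent)
    LOG = max(1, n.bit_length())
    up = [parent[:]]                       # up[j][v] = 2^j-th ancestor
    for j in range(1, LOG):
        up.append([up[j - 1][up[j - 1][v]] for v in range(n)])

    def lca(u, v):
        if depth[u] < depth[v]:
            u, v = v, u
        for j in reversed(range(LOG)):     # lift u to v's depth
            if depth[u] - (1 << j) >= depth[v]:
                u = up[j][u]
        if u == v:
            return u
        for j in reversed(range(LOG)):     # lift both just below the lca
            if up[j][u] != up[j][v]:
                u, v = up[j][u], up[j][v]
        return up[0][u]

    return lca

# Tree: 0 is the root (its own parent); 1, 2 children of 0; 3, 4 of 1.
parent = [0, 0, 0, 1, 1]
depth  = [0, 1, 1, 2, 2]
lca = build(parent, depth)
assert lca(3, 4) == 1 and lca(3, 2) == 0 and lca(1, 3) == 1
```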

*Tree decompositions [52].* Given a graph G = (V, E), a *tree decomposition* of G is a rooted tree T = (B, E_T) such that:


The *width* of the tree decomposition T = (B, E_T) is defined as the size of its largest bag minus 1. The *treewidth* tw(G) of a graph G is the minimal width among its tree decompositions. A vertex v ∈ V appears in a connected subtree of T, so there is a unique bag b with the smallest possible depth such that v ∈ V(b). We call b the *root bag* of v and denote it by rb(v).

Fig. 6: A Graph G (left) and its Tree Decomposition T (right).

It is well known that flow graphs of programs typically have small treewidth [63]. For example, programs written in Pascal, C, and Solidity have treewidth at most 3, 6 and 9, respectively. This property has also been confirmed experimentally for programs written in Java [34], C [38] and Ada [12]. The challenge is thus to exploit treewidth for faster interprocedural on-demand analyses. The first step in this approach is to compute tree decompositions of graphs. As the following lemma states, tree decompositions of low-treewidth graphs can be computed efficiently.

**Lemma 2 ([11]).** *Given a graph* G *with constant treewidth* t*, a binary tree decomposition of size* O(n) *bags, height* O(log n) *and width* O(t) *can be computed in linear time.*

*Separators [26].* The key structural property that we exploit in low-treewidth flow graphs is a separation property. Let A, B ⊆ V. The pair (A, B) is called a *separation* of G if (i) A ∪ B = V, and (ii) no edge connects a vertex in A − B to a vertex in B − A or vice versa. If (A, B) is a separation, the set A ∩ B is called a *separator*. The following lemma states such a separation property for low-treewidth graphs.

**Lemma 3 (Cut Property [26]).** *Let* T = (B, E_T) *be a tree decomposition of* G = (V, E) *and* e = {b, b′} ∈ E_T*. If we remove* e*, the tree* T *breaks into two connected components,* T_b *and* T_{b′}*, respectively containing* b *and* b′*. Let* A = ⋃_{t∈T_b} V(t) *and* B = ⋃_{t∈T_{b′}} V(t)*. Then* (A, B) *is a separation of* G *and its corresponding separator is* A ∩ B = V(b) ∩ V(b′)*.*

*Example 6.* Figure 6 shows a graph and one of its tree decompositions of width 2. In this example, we have rb(v5) = b1, rb(v3) = b2, rb(v4) = b3, and rb(v7) = b4. For the separator property of Lemma 3, consider the edge {b2, b4}. By removing it, T breaks into two parts, one containing the vertices A = {v1, v2, v3, v4, v5} and the other containing B = {v2, v6, v7}. We have A ∩ B = {v2} = V(b2) ∩ V(b4). Also, any path from B − A = {v6, v7} to A − B = {v1, v3, v4, v5} or vice versa must pass through {v2}. Hence, (A, B) is a separation of G with separator V(b2) ∩ V(b4) = {v2}.

# **3 Problem definition**

We consider same-context IFDS problems in which the flow graphs G_i have a treewidth of at most t for a fixed constant t. We extend the classical notion of a same-context IFDS solution in two ways: (i) we allow arbitrary start points for the analysis, i.e. we do not limit our analyses to same-context valid paths that start at s_main; and (ii) instead of a one-shot algorithm, we consider a two-phase process in which the algorithm first preprocesses the input instance and is then provided with a series of queries to answer. We formalize these points below. We fix an IFDS instance I = (G, D, F, M, ∪) with exploded supergraph G = (V, E).

*Meet over same-context valid paths.* We extend the definition of MSCP by specifying a start vertex u and an initial set Δ of data flow facts that hold at u. Formally, for any vertex v that is in the same flow graph as u, we define:

$$\mathsf{MSCP}\_{u,\Delta,v} := \bigcap\_{P \in \mathsf{SCP}(u,v)} \mathsf{pf}\_P(\Delta). \tag{2}$$

The only difference between (2) and (1) is that in (1), the start vertex u is fixed as smain and the initial data-fact set Δ is fixed as D, while in (2), they are free to be any vertex/set.

*Reduction to reachability.* As explained in Section 2.1, computing MSCP is reduced to reachability via same-context valid paths in the exploded supergraph G. This reduction does not depend on the start vertex and initial data flow facts. Hence, for a data flow fact d ∈ D, we have d ∈ MSCPu,Δ,v iff in the exploded supergraph G the vertex (v, d) is reachable via same-context valid paths from a vertex (u, δ) for some δ ∈ Δ ∪ {**0**}. Hence, we define the following types of queries:

*Pair query.* A pair query provides two vertices (u, d1) and (v, d2) of the exploded supergraph G and asks whether they are reachable by a same-context valid path. Hence, the answer to a pair query is a single bit. Intuitively, if d<sup>2</sup> = **0**, then the query is simply asking if v is reachable from u by a same-context valid path in G. Otherwise, d<sup>2</sup> is a data flow fact and the query is asking whether d<sup>2</sup> ∈ MSCPu,{d1}∩D,v.

*Single-source query.* A single-source query provides a vertex (u, d1) and asks for all vertices (v, d2) that are reachable from (u, d1) by a same-context valid path. Assuming that u is in the flow graph G<sup>i</sup> = (Vi, Ei), the answer to the single source query is a sequence of |Vi|·|D<sup>∗</sup>| bits, one for each (v, d2) ∈ V<sup>i</sup> × D∗, signifying whether it is reachable by same-context valid paths from (u, d1). Intuitively, a single-source query asks for all pairs (v, d2) such that (i) v is reachable from u by a same-context valid path and (ii) d<sup>2</sup> ∈ MSCPu,{d1}∩D,v ∪ {**0**}.

*Intuition.* The intuition behind such queries is as follows: since the functions in F are distributive over ∪, we have MSCP_{u,Δ,v} = ∪_{δ∈Δ} MSCP_{u,{δ},v}; hence, MSCP_{u,Δ,v} can be computed by O(|Δ|) single-source queries.

# **4 Treewidth-based Data-flow Analysis**

#### **4.1 Preprocessing**

The original solution to the IFDS problem, as first presented in [50], reduces the problem to reachability over a newly constructed graph. We follow a similar approach, except that we exploit the low-treewidth property of our flow graphs at every step. Our preprocessing is described below. It starts with computing constant-width tree decompositions for each of the flow graphs. We then use standard techniques to make sure that our tree decompositions have a nice form, i.e. that they are balanced and binary. Then comes a reduction to reachability, which is similar to [50]. Finally, we precompute specific useful reachability information between vertices in each bag and its ancestors. As it turns out in the next section, this information is sufficient for computing reachability between any pair of vertices, and hence for answering IFDS queries.

*Overview.* Our preprocessing consists of the following steps:

(1) For each flow graph G_i, compute a tree decomposition T_i of constant width (Lemma 2).
(2) Using standard techniques, make each T_i binary and balanced.
(3) Preprocess each T_i for answering lowest-common-ancestor queries (Lemma 1).
(4) Compute a graph Ĝ such that there is a path from (u, d1) to (v, d2) in Ĝ iff there is a *same-context valid path* from (u, d1) to (v, d2) in G. So, this step reduces the problem of reachability via same-context valid paths in G to simple reachability in Ĝ.
(5) Compute local reachability information, i.e. reachability between pairs of vertices appearing in the same bag.
(6) Compute reachability information between each vertex of a bag and the vertices appearing in the ancestors of that bag.


Steps (1)–(3) above are standard and well-known processes. We now provide details of steps (4)–(6). To skip the details and read about the query phase, see Section 4.3 below.

#### **Step (4): Reduction to Reachability**

In this step, our goal is to compute a new graph Gˆ from the exploded supergraph G such that there is a path from (u, d1) to (v, d2) in Gˆ iff there is a same-context valid path from (u, d1) to (v, d2) in G. The idea behind this step is the same as that of the *tabulation algorithm* in [50].

*Summary edges.* Consider a call vertex c<sup>l</sup> in G and its corresponding return-site vertex rl. For d1, d<sup>2</sup> ∈ D∗, the edge ((cl, d1),(rl, d2)) is called a *summary edge* if there is a same-context valid path from (cl, d1) to (rl, d2) in the exploded supergraph G. Intuitively, a summary edge summarizes the effects of procedure calls (same-context interprocedural paths) on the reachability between c<sup>l</sup> and rl. From the definition of *summary edges*, it is straightforward to verify that the graph Gˆ obtained from G by adding every summary edge and removing every interprocedural edge has the desired property, i.e. a pair of vertices are reachable in Gˆ iff they are reachable by a same-context valid path in G. Hence, we first find all summary edges and then compute Gˆ. This is shown in Algorithm 1.

We now describe what Algorithm 1 does. Let s_p be the start point of a procedure p. A *shortcut edge* is an edge ((s_p, d1),(v, d2)) such that v is in the same procedure p and there is a same-context valid path from (s_p, d1) to (v, d2) in G. The algorithm creates an empty graph H = (V, E′). Note that H is implicitly represented by only saving E′. It also creates a queue Q of edges to be added to H (initially Q = E) and an empty set S which will store the summary edges. The goal is to construct H such that it contains (i) the *intraprocedural* edges of G, (ii) summary edges, and (iii) shortcut edges.

It constructs H one edge at a time. While there is an unprocessed intraprocedural edge e = ((u, d1),(v, d2)) in Q, it chooses one such e and adds it to H (lines 5–10). Then, if (u, d1) is reachable from (s_p, d3) via a same-context valid path, then by adding the edge e, the vertex (v, d2) also becomes reachable from (s_p, d3). Hence, it adds the shortcut edge ((s_p, d3),(v, d2)) to Q, so that it is later added to the graph H. Moreover, if u is the start s_p of the procedure p and v is its end e_p, then for every call vertex c_l calling the procedure p and its respective return-site r_l, we can add summary edges that summarize the effect of calling p (lines 14–19). Finally, lines 20–24 compute Ĝ as discussed above.

**Algorithm 1:** Computing Ĝ in Step (4)

```
 1  Q ← E;
 2  S ← ∅;
 3  E′ ← ∅;
 4  while Q ≠ ∅ do
 5      Choose e = ((u, d1),(v, d2)) ∈ Q;
 6      Q ← Q − {e};
 7      if (u, v) is an interprocedural edge, i.e. a call-to-start or exit-to-return-site edge then
 8          continue;
 9      p ← the procedure s.t. u, v ∈ V_p;
10      E′ ← E′ ∪ {e};
11      foreach d3 s.t. ((s_p, d3),(u, d1)) ∈ E′ do
12          if ((s_p, d3),(v, d2)) ∉ E′ ∪ Q then
13              Q ← Q ∪ {((s_p, d3),(v, d2))};
14      if u = s_p and v = e_p then
15          foreach (c_l, d3) s.t. ((c_l, d3),(u, d1)) ∈ E do
16              foreach d4 s.t. ((v, d2),(r_l, d4)) ∈ E do
17                  if ((c_l, d3),(r_l, d4)) ∉ E′ ∪ Q then
18                      S ← S ∪ {((c_l, d3),(r_l, d4))};
19                      Q ← Q ∪ {((c_l, d3),(r_l, d4))};
20  Ĝ ← G;
21  foreach e = ((u, d1),(v, d2)) ∈ E do
22      if u and v are not in the same procedure then
23          Ĝ ← Ĝ − {e};
24  Ĝ ← Ĝ ∪ S;
```
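
For concreteness, the worklist computation above can be sketched in Python. This is an illustrative reimplementation, not the paper's exact pseudocode: it propagates shortcut edges in both directions so the result does not depend on worklist order, and the input encodings (`proc_of`, `start`, `exit_`, `call_ret`) are assumptions of the sketch.

```python
from collections import deque

def compute_ghat(E, proc_of, start, exit_, call_ret):
    # E: exploded-supergraph edges ((u, d1), (v, d2));
    # proc_of: vertex -> procedure; start/exit_: procedure -> vertex;
    # call_ret: call vertex c -> (return-site r, callee procedure)
    Q = deque(E)
    seen = set(E)           # every edge ever enqueued
    H = set()               # E': intraprocedural + shortcut + summary edges
    S = set()               # summary edges

    def enqueue(e):
        if e not in seen:
            seen.add(e)
            Q.append(e)

    while Q:
        (u, d1), (v, d2) = e = Q.popleft()
        if proc_of[u] != proc_of[v]:
            continue                      # drop interprocedural edges
        p = proc_of[u]
        sp, ep = start[p], exit_[p]
        H.add(e)
        # shortcut edges: keep H closed under composing paths from sp
        for (x, d3), (y, d4) in list(H):
            if x == sp and (y, d4) == (u, d1):
                enqueue(((sp, d3), (v, d2)))   # extend sp ~> u by e
            if u == sp and (x, d3) == (v, d2):
                enqueue(((sp, d1), (y, d4)))   # extend e by v ~> y
        # summary edges: a path sp ~> ep summarises a whole call to p
        if u == sp and v == ep:
            for (c, dc), (x, dx) in E:         # matching call edges
                if (x, dx) == (u, d1) and call_ret.get(c, ('', ''))[1] == p:
                    r = call_ret[c][0]
                    for (y, dy), (rr, dr) in E:  # matching return edges
                        if (y, dy) == (v, d2) and rr == r:
                            s = ((c, dc), (r, dr))
                            S.add(s)
                            enqueue(s)
    # Ghat: intraprocedural edges of E plus all summary edges
    ghat = {e for e in E if proc_of[e[0][0]] == proc_of[e[1][0]]} | S
    return ghat, S
```

On a toy instance where `main` calls an identity procedure `p`, the sketch produces exactly one summary edge, from the call vertex to its return-site, and drops the call/return edges from Ĝ.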

*Correctness.* As argued above, every edge that is added to H is either intraprocedural, a summary edge, or a shortcut edge. Moreover, all such edges are added to H, because H is constructed one edge at a time and every time an edge e is added to H, all the summary/shortcut edges that might occur as a result of adding e to H are added to the queue Q and hence later to H. Therefore, Algorithm 1 correctly computes summary edges and the graph Ĝ.

*Complexity.* Note that the graph H has at most O(|E| · |D∗|²) edges. The addition of each edge corresponds to one iteration of the while loop at line 4 of Algorithm 1. Moreover, each iteration takes O(|D∗|) time, because the loop at line 11 iterates over at most |D∗| possible values of d3 and the loops at lines 15 and 16 have a constant number of iterations due to the bounded bandwidth assumption (Section 2.1). Since |D∗| = O(|D|) and |E| = O(n), the total runtime of Algorithm 1 is O(n · |D|³). For a more detailed analysis, see [50, Appendix].

#### **Step (5): Local Preprocessing**

In this step, we compute the set R_local of local reachability edges, i.e. edges of the form ((u, d1),(v, d2)) such that u and v appear in the same bag b of a tree decomposition T_i and (u, d1) ⇝ (v, d2) in Ĝ. We write (u, d1) ⇝_local (v, d2) to denote ((u, d1),(v, d2)) ∈ R_local. Note that Ĝ has no interprocedural edges. Hence, we can process each T_i separately. We use a divide-and-conquer technique similar to the kernelization method used in [22] (Algorithm 2).

Algorithm 2 processes each tree decomposition T_i separately. When processing T, it chooses a leaf bag b_l of T and computes all-pairs reachability on the induced subgraph H_l = Ĝ[V(b_l) × D∗], consisting of the vertices that appear in b_l. Then, for each pair of vertices (u, d1) and (v, d2) s.t. u and v appear in b_l and (u, d1) ⇝ (v, d2) in H_l, the algorithm adds the edge ((u, d1),(v, d2)) to both R_local and Ĝ (lines 7–9). Note that this does not change reachability relations in Ĝ, given that the vertices connected by the new edge were already connected by a path. Then, if b_l is not the only bag in T, the algorithm recursively calls itself on the tree decomposition T − b_l, i.e. the tree decomposition obtained by removing b_l (lines 10–11). Finally, it repeats the reachability computation on H_l (lines 12–14). The running time of the algorithm is O(n · |D∗|³).

#### **Algorithm 2:** Local Preprocessing in Step (5)

```
 1  R_local ← ∅;
 2  foreach T_i do
 3      computeLocalReachability(T_i);
 4  Function computeLocalReachability(T):
 5      Choose a leaf bag b_l of T;
 6      b_p ← parent of b_l;
 7      foreach u, v ∈ V(b_l), d1, d2 ∈ D∗ s.t. (u, d1) ⇝ (v, d2) in Ĝ[V(b_l) × D∗] do
 8          Ĝ ← Ĝ ∪ {((u, d1),(v, d2))};
 9          R_local ← R_local ∪ {((u, d1),(v, d2))};
10      if b_p ≠ null then
11          computeLocalReachability(T − b_l);
12      foreach u, v ∈ V(b_l), d1, d2 ∈ D∗ s.t. (u, d1) ⇝ (v, d2) in Ĝ[V(b_l) × D∗] do
13          Ĝ ← Ĝ ∪ {((u, d1),(v, d2))};
14          R_local ← R_local ∪ {((u, d1),(v, d2))};
```

*Example 7.* Consider the graph G and tree decomposition T given in Figure 6 and let D∗ = {**0**}, i.e. let Ĝ and Ḡ be isomorphic to G. Figure 7 illustrates the steps taken by Algorithm 2. In each step, a bag is chosen and a local all-pairs reachability computation is performed over the bag. Local reachability edges are added to R_local and to Ĝ (if they are not already in Ĝ).

We now prove the correctness and establish the complexity of Algorithm 2.

*Correctness.* We prove that when computeLocalReachability(T) ends, the set R_local contains all the local reachability edges between vertices that appear in the same bag in T. The proof is by induction on the size of T. If T consists of a single bag, then the local reachability computation on H_l (lines 7–9) fills R_local correctly. Now assume that T has n bags. Let H_{−l} = Ĝ[⋃_{b_i∈T, i≠l} V(b_i) × D∗]. Intuitively, H_{−l} is the part of Ĝ that corresponds to the other bags of T, i.e. every bag except the leaf bag b_l. After the local reachability computation at lines 7–9, (v, d2) is reachable from (u, d1) in H_{−l} iff it is reachable in Ĝ. This is because (i) the vertices of H_l and H_{−l} form a separation of Ĝ with separator (V(b_l) ∩ V(b_p)) × D∗ (Lemma 3) and (ii) all reachability information in H_l is now replaced by direct edges (line 8). Hence, by the induction hypothesis, line 11 finds all the local reachability edges for T − b_l and adds them to both R_local and Ĝ. Therefore, after line 11, for every u, v ∈ V(b_l), we have (u, d1) ⇝ (v, d2) in H_l iff (u, d1) ⇝ (v, d2) in Ĝ. Hence, the final all-pairs reachability computation of lines 12–14 adds all the local edges in b_l to R_local.

*Complexity.* Algorithm 2 performs at most two local all-pairs reachability computations over the vertices appearing in each bag, i.e. O(t · |D∗|) vertices. Each such computation can be performed in O(t³ · |D∗|³) using standard reachability algorithms. Given that the T_i's have O(n) bags overall, the total runtime of Algorithm 2 is O(n · t³ · |D∗|³) = O(n · |D∗|³). Note that the treewidth t is a constant and hence the factor t³ can be removed.
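
The leaf-peeling recursion of Algorithm 2 can equivalently be mimicked by a bottom-up closure pass followed by a top-down closure pass over a rooted decomposition. The following sketch assumes |D∗| = 1 (so exploded vertices are plain graph vertices) and a hypothetical dict-based encoding of bags and tree shape.

```python
def local_reachability(bags, children, root, edges):
    G = set(edges)           # grows as local reachability edges are added
    R = set()                # the computed R_local

    def closure(verts):
        # all-pairs reachability inside the subgraph induced by `verts`
        pairs = set()
        for s in verts:
            stack, seen = [s], {s}
            while stack:
                x = stack.pop()
                for (a, b) in G:
                    if a == x and b in verts and b not in seen:
                        seen.add(b)
                        stack.append(b)
            pairs |= {(s, t) for t in seen if t != s}
        return pairs

    def add_closure(b):
        found = closure(bags[b])
        G.update(found)      # turn reachability into direct edges
        R.update(found)

    def up(b):               # leaves first: the first closure per bag
        for c in children.get(b, []):
            up(c)
        add_closure(b)

    def down(b):             # unwinding: the second closure per bag
        add_closure(b)
        for c in children.get(b, []):
            down(c)

    up(root)
    down(root)
    return R
```

On bags {b1: {1,2,3}, b2: {2,3,4}} with edges 2→4 and 4→3, the up pass inside b2 yields the local pair (2, 3), which the closure of b1 then also sees, even though vertex 4 never appears in b1.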

#### **Step (6): Ancestors Reachability Preprocessing**

This step aims to find reachability relations between each vertex of a bag and the vertices that appear in the ancestors of that bag. As in the previous case, we compute a set R_anc and write (u, d1) ⇝_anc (v, d2) if ((u, d1),(v, d2)) ∈ R_anc.

This step is performed by Algorithm 3. For each bag b and vertex (u, d) such that u ∈ V(b) and each 0 ≤ j ≤ d_b, where d_b is the depth of b, we maintain two sets: F(u, d, b, j) and F′(u, d, b, j), each containing vertices whose first coordinate appears in the ancestor of b at depth j. Intuitively, the vertices in F(u, d, b, j) are reachable from (u, d). Conversely, (u, d) is reachable from the vertices in F′(u, d, b, j). At first, all F and F′ sets are initialized to ∅. We process each tree decomposition T_i in a top-down manner and perform the following actions at each bag:


Fig. 7: Local Preprocessing (Step 5) on the graph and decomposition of Figure 6

After the execution of Algorithm 3, we have (v, d2) ∈ F(u, d1, b, j) iff (i) (v, d2) is reachable from (u, d1) and (ii) u ∈ V(b) and v ∈ V(a_b^j), i.e. v appears in the ancestor of b at depth j. Conversely, (u, d1) ∈ F′(v, d2, b, j) iff (i) (v, d2) is reachable from (u, d1) and (ii) v ∈ V(b) and u ∈ V(a_b^j). Algorithm 3 has a runtime of O(n · |D|³ · log n). See [17] for detailed proofs. In the next section, we show that this runtime can be reduced to O(n · |D|³) using word tricks.

#### **4.2 Word Tricks**

We now show how to reduce the time complexity of Algorithm 3 from O(n · |D∗|³ · log n) to O(n · |D∗|³) using word tricks. The idea is to pack the F and F′ sets of Algorithm 3 into words, i.e. represent them by a binary sequence.



Given a bag b, we define δ<sup>b</sup> as the sum of sizes of all ancestors of b. The tree decompositions are balanced, so b has O(log n) ancestors. Moreover, the width is t, hence δ<sup>b</sup> = O(t · log n) = O(log n) for every bag b. We perform a top-down pass of each tree decomposition T<sup>i</sup> and compute δ<sup>b</sup> for each b.

For every bag b, u ∈ V(b) and d1 ∈ D∗, we store F(u, d1, b, −) as a binary sequence of length δ_b · |D∗|. The first |V(b)| · |D∗| bits of this sequence correspond to F(u, d1, b, d_b). The next |V(b_p)| · |D∗| bits correspond to F(u, d1, b, d_b − 1), and so on. We use a similar encoding for F′. Using this encoding, Algorithm 3 can be rewritten with word tricks and bitwise operations as follows:

**–** Lines 5–6 copy F(u, d, bp, −) into F(u, d, b, −). However, we have to shift and align the bits, so these lines can be replaced by

$$F(u,d,b,-) \leftarrow F(u,d,b\_p,-) \ll |V(b)| \cdot |D^\*|;$$


$$F(u, d\_1, b, -) \leftarrow F(u, d\_1, b, -) \text{ OR } F(v, d\_2, b, -);$$

**–** Computations on F′ can be handled similarly.

Note that we do not need to compute R_anc explicitly, given that our queries can be written in terms of the F and F′ sets. It is easy to verify that using these word tricks, every W operations in lines 6, 7, 13 and 14 are replaced by one or two bitwise operations on words. Hence, the overall runtime of Algorithm 3 is reduced to O((n · |D∗|³ · log n) / W) = O(n · |D∗|³).
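
The packing can be illustrated with Python ints standing in for machine words (the sizes below are hypothetical; real word tricks operate on fixed W = Θ(log n)-bit words):

```python
DSTAR = 3        # assumed |D*| for this illustration
BAG_B = 2        # |V(b)|: vertices in the current bag b

def bit(vertex_idx, fact_idx):
    """Position of (vertex, fact) inside one packed F(u, d, b, -) sequence."""
    return 1 << (vertex_idx * DSTAR + fact_idx)

# F(u, d, b_p, -): the parent's packed reachability sequence
f_parent = bit(0, 1) | bit(1, 2)
# Inheriting the parent's sets into the child is a single shift: the
# child's own |V(b)| * |D*| bits come first, then the ancestors' bits.
f_child = bit(0, 0) | (f_parent << (BAG_B * DSTAR))
# Merging another vertex's sets (the OR update) is a single bitwise OR.
f_child |= bit(1, 1)
```

A per-bit loop over the same data would cost W operations per word; the shift and OR above replace each such loop with O(1) word operations.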

#### **4.3 Answering Queries**

We now describe how to answer pair and single-source queries using the data saved in the preprocessing phase.

*Answering a Pair Query.* Our algorithm answers a pair query from a vertex (u, d1) to a vertex (v, d2) as follows:

(1) If u and v are not in the same flow graph, return false.
(2) Otherwise, let b_u = rb(u) and b_v = rb(v) be the root bags of u and v, and compute the bag b = lca(b_u, b_v) using Lemma 1.
(3) Return true iff there exists a vertex (w, d3) with w ∈ V(b) such that (u, d1) ⇝_anc (w, d3) and (w, d3) ⇝_anc (v, d2).

*Correctness.* If there is a path P : (u, d1) ⇝ (v, d2), then we claim P must pass through a vertex (w, d3) with w ∈ V(b). If b = b_u or b = b_v, the claim is obviously true. Otherwise, consider the path P′ : b_u ⇝ b_v in the tree decomposition T_i. This path passes through b (by the definition of b). Let e = {b, b′} be an edge of P′. Applying the cut property (Lemma 3) to e proves that P must pass through a vertex (w, d3) with w ∈ V(b′) ∩ V(b). Moreover, b is an ancestor of both b_u and b_v, hence we have (u, d1) ⇝_anc (w, d3) and (w, d3) ⇝_anc (v, d2).

*Complexity.* Computing the LCA takes O(1) time. Checking all possible vertices (w, d3) takes O(t · |D∗|) = O(|D|) time. This runtime can be decreased to O(⌈|D| / log n⌉) by word tricks.

*Answering a Single-source Query.* Consider a single-source query from a vertex (u, d1) with u ∈ V_i. We can answer this query by performing |V_i| × |D∗| pair queries, i.e. by performing one pair query from (u, d1) to (v, d2) for each v ∈ V_i and d2 ∈ D∗. Since |D∗| = O(|D|), the total complexity is O(|V_i| · |D| · ⌈|D| / log n⌉) for answering a single-source query. Using a more involved preprocessing method, we can slightly improve this time to O(|V_i| · |D|² / log n). See [17] for more details. Based on the results above, we now present our main theorem:

**Theorem 1.** *Given an IFDS instance* I = (G, D, F, M, ∪)*, our algorithm preprocesses* I *in time* O(n · |D|³) *and can then answer each pair query and single-source query in time*

$$O\left(\left\lceil \frac{|D|}{\log n} \right\rceil\right) \quad \text{and} \quad O\left(\frac{n \cdot |D|^2}{\log n}\right), \quad respectively.$$

#### **4.4 Parallelizability and Optimality**

We now turn our attention to parallel versions of our query algorithms, as well as cases where the algorithms are optimal.

*Parallelizability.* Assume we have k threads at our disposal.


With word tricks, parallel pair and single-source queries require O(⌈|D| / (k · log n)⌉) and O(n · |D|² / (k · log n)) time, respectively. Hence, for large enough k, each query requires only O(1) time, and we achieve *perfect parallelism*.

*Optimality.* Observe that when |D| = O(1), i.e. when the domain is small, our algorithm is *optimal*: the preprocessing runs in O(n), which is proportional to the size of the input, and the pair query and single-source query run in times O(1) and O(n/ log n), respectively, each case being proportional to the size of the output. Small domains arise often in practice, e.g. in dead-code elimination or null-pointer analysis.

# **5 Experimental Results**

We report on an experimental evaluation of our techniques and compare their performance to standard alternatives in the literature.

*Benchmarks.* We used 5 classical data-flow analyses in our experiments, including reachability (for dead-code elimination), possibly-uninitialized variables analysis, simple uninitialized variables analysis, liveness analysis of variables, and reaching-definitions analysis. We followed the specifications in [36] for modeling the analyses in IFDS. We used real-world Java programs from the DaCapo benchmark suite [6], obtained their flow graphs using Soot [65] and applied the JTDec tool [19] for computing balanced tree decompositions. Given that some of these benchmarks are prohibitively large, we only considered their main Java packages, i.e. packages containing the starting point of the programs. We experimented with a total of 22 benchmarks, which, together with the 5 analyses above, led to a total of 110 instances. Our instance sizes, i.e. the number of vertices and edges in the exploded supergraph, range from 22 to 190,591. See [17] for details.

*Implementation and comparison.* We implemented both variants of our approach, i.e. sequential and parallel, in C++. We also implemented the parts of the classical IFDS algorithm [50] and its on-demand variant [36] responsible for same-context queries. All of our implementations closely follow the pseudocode of our algorithms and the ones in [50,36], and no additional optimizations are applied. We compared the performance of the following algorithms for randomly generated queries:


For each instance, we randomly generated 10,000 pair queries and 100 single-source queries. In the case of single-source queries, source vertices were chosen uniformly at random. For pair queries, we first chose a source vertex uniformly at random, and then chose a target vertex in the same procedure, again uniformly at random.

*Experimental setting.* The results were obtained on Debian using an Intel Xeon E5-1650 processor (3.2 GHz, 6 cores, 12 threads) with 128GB of RAM. The parallel results used all 12 threads.

*Time limit.* We enforced a preprocessing time limit of 5 minutes per instance. This is in line with the preprocessing times of state-of-the-art tools on benchmarks of this size, e.g. Soot takes 2-3 minutes to generate all flow graphs for each benchmark.

Fig. 8: Preprocessing times of CPP and SEQ/PAR (over all instances). A dot above the 300s line denotes a timeout.

*Results.* We found that, except for the smallest instances, our algorithm consistently outperforms all previous approaches. Our results were as follows:


Note that Figure 9 combines the results of all five mentioned data-flow analyses. However, the observations above hold independently for every single analysis, as well. See [17] for analysis-specific figures.

Fig. 9: Comparison of pair query time (top row) and single source query time (bottom row) of the algorithms. Each dot represents one of the 110 instances. Each row starts with a global picture (left) and zooms into smaller time units (right) to differentiate between the algorithms. The plots above contain results over all five analyses. However, our observations hold independently for every single analysis, as well (See [17]).

# **6 Conclusion**

We developed new techniques for on-demand data-flow analyses in IFDS, by exploiting the treewidth of flow graphs. Our complexity analysis shows that our techniques (i) have better worst-case complexity, (ii) offer certain optimality guarantees, and (iii) are embarrassingly parallelizable. Our experiments demonstrate these improvements in practice: after a lightweight one-time preprocessing, queries are answered as fast as with the heavyweight complete preprocessing, and the parallel speedup is close to its theoretical optimum. The main limitation of our approach is that it only handles same-context queries. Using treewidth to speed up non-same-context queries is a challenging direction for future work.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Concise Read-Only Specifications for Better Synthesis of Programs with Pointers**

Andreea Costea<sup>1</sup>, Amy Zhu<sup>2</sup>⋆, Nadia Polikarpova<sup>3</sup>, and Ilya Sergey<sup>4,1</sup>

> <sup>1</sup> School of Computing, National University of Singapore, Singapore
> <sup>2</sup> University of British Columbia, Vancouver, Canada
> <sup>3</sup> University of California, San Diego, USA
> <sup>4</sup> Yale-NUS College, Singapore

**Abstract.** In program synthesis there is a well-known trade-off between *concise* and *strong* specifications: if a specification is too verbose, it might be harder to write than the program; if it is too weak, the synthesised program might not match the user's intent. In this work we explore the use of annotations for restricting memory access permissions in program synthesis, and show that they can make specifications much stronger while remaining surprisingly concise. Specifically, we enhance Synthetic Separation Logic (SSL), a framework for synthesis of heap-manipulating programs, with the logical mechanism of *read-only borrows*.

We observe that this minimalistic and conservative SSL extension benefits the synthesis in several ways, making it more (a) *expressive* (stronger correctness guarantees are achieved with a modest annotation overhead), (b) *effective* (it produces more concise and easier-to-read programs), (c) *efficient* (faster synthesis), and (d) *robust* (synthesis efficiency is less affected by the choice of the search heuristic). We explain the intuition and provide formal treatment for read-only borrows. We substantiate the claims (a)–(d) by describing our quantitative evaluation of the borrowing-aware synthesis implementation on a series of standard benchmark specifications for various heap-manipulating programs.

# **1 Introduction**

Deductive program synthesis is a prominent approach to the generation of correctby-construction programs from their declarative specifications [14, 23, 29, 33]. With this methodology, one can represent searching for a program satisfying the user-provided constraints as a proof search in a certain logic. Following this idea, it has been recently observed [34] that the synthesis of correct-by-construction *imperative heap-manipulating* programs (in a language similar to C) can be implemented as a proof search in a version of Separation Logic (SL)—a program logic designed for modular verification of programs with pointers [32, 37].

SL-based deductive program synthesis based on *Synthetic* Separation Logic (SSL) [34] requires the programmer to provide a Hoare-style specification for a program of interest. For instance, given the predicate ls(x, S), which denotes a symbolic heap corresponding to a linked list starting at a pointer x, ending with null, and containing elements from the set S, one can specify the behaviour of the procedure for copying a linked list as follows:

$$\{\mathsf{r} \mapsto \mathsf{x} * \mathsf{ls}(\mathsf{x}, S)\}\ \mathsf{listcopy}(\mathsf{r})\ \{\mathsf{r} \mapsto \mathsf{y} * \mathsf{ls}(\mathsf{x}, S) * \mathsf{ls}(\mathsf{y}, S)\} \tag{1}$$

⋆ Work done during an internship at NUS School of Computing in Summer 2019.

The precondition of specification (1), defining the shape of the initial heap, is illustrated by the figure above. It requires the heap to contain a pointer r, which is taken by the procedure as an argument and whose stored value, x, is the head pointer of the list to be copied. The list itself is described by the symbolic heap predicate instance ls(x, S), whose footprint is assumed to be *disjoint* from the entry r -→ x, following the standard semantics of the *separating conjunction* operator (∗) [32]. The postcondition asserts that the final heap, in addition to containing the original list ls(x, S), will contain a new list starting from y whose contents S are the same as of the original list, and also that the pointer r will now point to the head y of the list copy. Our specification is incomplete: it allows, for example, duplicating or rearranging elements. One hopes that such a program is unlikely to be synthesised. In synthesis, it is common to provide incomplete specs: writing complete ones can be as hard as writing the program itself.
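
To make the intended behaviour of spec (1) concrete, here is a hedged Python model of a "natural" list copy over an explicit heap (a dict from addresses to values, with 0 playing the role of null). This is our illustration of what the spec asks for, not the program SuSLik synthesises (Fig. 1), and the two-cell node layout and bump allocator are assumptions of the sketch.

```python
def make_alloc(next_free):
    """Bump allocator over the heap dict; hands out fresh two-cell nodes."""
    state = {'next': next_free}
    def alloc(heap):
        a = state['next']
        state['next'] += 2
        heap[a] = heap[a + 1] = 0
        return a
    return alloc

def listcopy(heap, r, alloc):
    """Copy the list pointed to by *r.  Afterwards *r is the copy's head y,
    and the original list ls(x, S) is left completely untouched."""
    x = heap[r]                 # head of the original list
    prev, y = None, 0
    while x != 0:
        node = alloc(heap)      # fresh node: value cell at node, next at node+1
        heap[node] = heap[x]
        heap[node + 1] = 0
        if prev is None:
            y = node            # first copied node becomes the new head
        else:
            heap[prev + 1] = node
        prev = node
        x = heap[x + 1]
    heap[r] = y                 # r now points to the copy
```

Running it on a two-element list leaves every original cell intact and produces a disjoint copy with the same contents and ordering, which is exactly the behaviour the tail-swapping program of Section 1.1 fails to deliver for the original list's structure.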

#### **1.1 Correct Programs that Do Strange Things**

Provided the definition of the heap predicate ls and the specification (1), the SuSLik tool, an implementation of the SSL-based synthesis [34], will produce the program depicted in Fig. 1. It is easy to check that this program satisfies the ascribed spec (1). Moreover, it correctly duplicates the original list, faithfully preserving its contents and the ordering. However, an astute reader might notice a certain oddity in the way it treats the initial list provided for copying. According to the postcondition of (1), the value of the pointer r stored in a local immutable variable y1 on line 9 is the head of the copy of the original list's tail. Quite unexpectedly, the pointer y1 becomes the tail of the original list on line 11, while the *original list's* tail pointer nxt, once assigned to \*(y + 1) on line 13, becomes the tail of the *copy*!

Fig. 1: Result program for spec (1) and the shape of its final heap.

Indeed, the exercise in tail swapping is totally pointless: not only does it produce less "natural" and readable code, but the resulting program's locality properties are also unsatisfactory; for instance, this program cannot be plugged into a concurrent setting where multiple threads rely on ls(x, S) being unchanged.
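The program of Fig. 1 is not included in this extract; the following C sketch is our own rendering of the tail-swapping behaviour described above (a conventional node struct stands in for SuSLik's two-cell records, so line numbers differ from those cited in the text):

```c
#include <stdlib.h>

typedef struct node { int v; struct node *nxt; } node;

/* Our C rendering of the tail-swapping listcopy (not the literal SuSLik
   output). On return, *r heads a list with the same contents as the input,
   but the two lists have exchanged tails: the input list now owns the
   freshly allocated nodes, and the copy owns the input's original tail. */
void listcopy(node **r) {
    node *x = *r;
    if (x == NULL) return;
    int v = x->v;
    node *nxt = x->nxt;
    *r = nxt;
    listcopy(r);                      /* recursively "copy" the tail      */
    node *y1 = *r;                    /* head of the tail's copy          */
    node *y = malloc(sizeof(node));   /* fresh head for the result        */
    *r = y;
    x->nxt = y1;                      /* copy becomes the ORIGINAL's tail */
    y->v = v;
    y->nxt = nxt;                     /* original tail goes to the COPY   */
}
```

Both lists end up with the same contents, yet the footprint of the original ls(x, S) has been rewired under the hood, which is exactly what spec (1) fails to rule out.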

The issue with the result in Fig. 1 is caused by specification (1) being *too permissive*: it does not prevent the synthesised program from *modifying* the structure of the initial list while creating its copy. Luckily, the SL community has devised a number of SL extensions that allow one to impose such restrictions, like declaring a part of the provided symbolic heap as *read-only* [5,8,9,11,15,20,21], *i.e.*, forbidden from being modified by the specified code.

#### **1.2 Towards Simple Read-Only Specifications for Synthesis**

The main challenge of introducing read-only annotations (commonly also referred to as *permissions*)<sup>5</sup> into Separation Logic lies in establishing the discipline for performing sound accounting in the presence of mixed read-only and mutating heap accesses by different components of a program.

As an example, consider a simple symbolic heap x ↦ᴹ f ∗ r ↦ᴹ h that declares two *mutable* (*i.e.*, allowed to be written to) pointers x and r, which point to unspecified values f and h, correspondingly. With this symbolic heap, is it safe to call the following function, which modifies the contents of r but not of x?

$$\left\{\, \mathtt{x} \overset{\mathrm{RO}}{\mapsto} \mathtt{f} \;\ast\; \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{h} \,\right\}\; \mathsf{readX}(\mathtt{x},\, \mathtt{r})\; \left\{\, \mathtt{x} \overset{\mathrm{RO}}{\mapsto} \mathtt{f} \;\ast\; \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{f} \,\right\}\tag{2}$$

The precondition of readX requires a weaker form of access permission for x (read-only, RO), while the considered heap asserts a stronger *write* permission (M). It should be possible to satisfy readX's requirement by providing the necessary read-only permission for x. To do so, we need to agree on a discipline to "adapt" the caller's *write* permission M to the callee's *read-only* permission RO. While seemingly trivial, if implemented naïvely, accounting of RO permissions in SL might compromise either the soundness or the completeness of the logical reasoning.

A number of proposals for a logically sound interplay between write and read-only access permissions in the presence of function calls have been described in the literature [7–9, 11, 13, 20, 30]. Some of these works manage to maintain the simplicity of having only *mutable*/*read-only* annotations when confined to the sequential setting [9,11,13]. More general (but harder to implement) approaches rely on *fractional permissions* [8,25], an expressive mechanism for permission accounting, with primary applications in concurrent reasoning [7, 28]. We started this project by attempting to adapt some of those logics [9,11,13] as an extension of SSL, in order to reap the benefits of read-only annotations for the synthesis of sequential programs. The main obstacle we encountered involved definitions of inductive heap predicates with *mixed* permissions. For instance, how can one specify a program that modifies the contents of a linked list, but not its structure? Even though it seemed possible to enable this treatment of predicates via permission multiplication [25], developing support for this machinery on top of the existing SuSLik infrastructure was a daunting task. Therefore, we had to look for a technically simpler solution.

<sup>5</sup> We will be using the words "annotation" and "permission" interchangeably.

#### **1.3 Our Contributions**

*Theoretical Contributions.* Our main conceptual innovation is the idea of instrumenting SSL with symbolic *read-only borrows* to enable faster and more predictable program synthesis. Borrows are used to annotate symbolic heaps in specifications, similarly to abstract fractional permissions from the deductive verification tools, such as Chalice and VeriFast [20,21,27]. They enable simple but principled lightweight threading of heap access permissions from the callers to callees and back, while enforcing *read-only* access whenever it is required. For basic intuition on read-only borrows, consider the specification below:

$$\left\{\, \mathtt{x} \overset{a}{\mapsto} \mathtt{f} \;\ast\; \mathtt{y} \overset{b}{\mapsto} \mathtt{g} \;\ast\; \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{h} \,\right\}\; \texttt{void readXY(loc x, loc y, loc r)}\; \left\{\, \mathtt{x} \overset{a}{\mapsto} \mathtt{f} \;\ast\; \mathtt{y} \overset{b}{\mapsto} \mathtt{g} \;\ast\; \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{f} + \mathtt{g} \,\right\}\tag{3}$$

The precondition requires a heap with three pointers, x, y, and r, pointing to unspecified values f, g, and h, correspondingly. Both x and y are going to be treated as read-only, but now, instead of simply annotating them with RO, we add *symbolic borrowing annotations* a and b. The semantics of these borrowing annotations is the same as that of other ghost variables (such as f). In particular, the *callee* must behave correctly for any valuation of a and b, which leaves it no choice but to treat the corresponding heap fragments as read-only (hence preventing the heap fragments from being written). On the other hand, from the perspective of the *caller*, they serve as formal parameters that are substituted with actuals of the caller's choosing: for instance, when invoked with a caller's symbolic heap x ↦ᴹ 1 ∗ y ↦ᶜ 2 ∗ r ↦ᴹ 0 (where c denotes a read-only borrow of the caller), readXY is guaranteed to "restore" the same access permissions in the postcondition, as per the substitution [M/a, c/b]. The example above demonstrates that read-only borrows are straightforward to compose when reasoning about code with function calls. They also make it possible to define *borrow-polymorphic* inductive heap predicates, *e.g.*, enhancing ls from spec (1) so it can be used in specifications with mixed access permissions on their components.<sup>6</sup> Finally, read-only borrows make it almost trivial to adapt the existing SSL-based synthesis to work with read-only access permissions; they reduce the complex permission *accounting* to easy-to-implement permission *substitution*.

*Practical Contributions.* Our first practical contribution is ROBoSuSLik—an enhancement of the SuSLik synthesis tool [34] with support for read-only borrows, which required us to modify less than 100 lines of the original code.

Our second practical contribution is an extensive evaluation of synthesis with read-only permissions, on a standard benchmark suite of specifications for heap-manipulating programs. We compare the behaviour, performance, and outcomes of the synthesis when run with the standard ("all-mutable") specifications and with their analogues instrumented with read-only permissions wherever reasonable. By doing so, we substantiate the following claims regarding the practical impact of using read-only borrows in SSL specifications:

**–** First, we show that synthesis of read-only specifications is more *efficient*: it does *less backtracking* while searching for a program that satisfies the imposed constraints, entailing better performance.

<sup>6</sup> We will present borrow-polymorphic inductive heap predicates in Sec. 2.4.


*Paper Outline.* We start by showcasing the intricacies and the virtues of SSL-based synthesis with read-only specifications in Sec. 2. We provide a formal account of read-only borrows and present the modified SSL rules, along with the soundness argument, in Sec. 3. We report on the implementation and evaluation of the enhanced synthesis in Sec. 4. We conclude with a discussion of the limitations of read-only borrows in Sec. 5 and a comparison with related work in Sec. 6.

# **2 Program Synthesis with Read-Only Borrows**

We introduce the enhancement of SSL with read-only borrows by walking the reader through a series of small but characteristic examples of deductive synthesis with separation logic. We provide the necessary background on SSL in Sec. 2.1; the readers familiar with the logic may want to skip to Sec. 2.2.

#### **2.1 Basics of SSL-based Deductive Program Synthesis**

In a deductive Separation Logic-based synthesis, a client provides a specification of a function of interest as a pair of pre- and post-conditions, such as {P} void foo(loc x, int i) {Q}. The precondition P constrains the symbolic state necessary to run the function safely (*i.e.*, without crashes), while the postcondition Q constrains the resulting state at the end of the function's execution. A function body c satisfying the provided specification is obtained as a result of deriving the SSL statement, representing the synthesis *goal*:

$$\{\mathtt{x}, \mathtt{i}\};\ \{\mathcal{P}\} \rightsquigarrow \{\mathcal{Q}\} \mid \mathtt{c}$$

In the statement above, x and i are *program variables*, and they are explicitly stated in the environment Γ = {x, i}. Variables that appear in {P} and that are not program variables are called (logical) *ghost* variables, while the non-program variables that only appear in {Q} are referred to as (logical) *existential* ones (EV). The meaning of the statement Γ; {P} ❀ {Q} | c is the *validity* of the Hoare-style triple {P} c {Q} for all possible values of the variables from Γ.<sup>7</sup> Both the pre- and the postcondition contain a *spatial* part describing the shape of the symbolic state (spatial formulae are ranged over via P, Q, and R), and a *pure* part (ranged over via φ, ψ, and ξ), which states the relations between variables (both program and logical). A derivation of an SSL statement is conducted by applying logical

<sup>7</sup> We often care only about the *existence* of a program c to be synthesised, not its specific shape. In those cases we will be using a shorter statement: Γ; {P} ❀ {Q}.

rules, which reduce the initial goal to a trivial one, so it can be solved by one of the *terminal* rules, such as, *e.g.*, the rule Emp shown below:

$$\textsc{Emp}\ \dfrac{\vdash \phi \Rightarrow \psi}{\Gamma;\ \{\phi;\ \mathsf{emp}\} \rightsquigarrow \{\psi;\ \mathsf{emp}\} \mid \mathtt{skip}}$$

That is, Emp requires that (i) the symbolic heaps in both the pre- and post-condition are empty and (ii) the pure part φ of the precondition implies the pure part ψ of the postcondition. As a result, Emp "emits" a trivial program skip. Some of the SSL rules are aimed at simplifying the goal, bringing it to a shape that can be solved with Emp. For instance, consider the following rules:

$$\textsc{Frame}\ \dfrac{\mathsf{EV}(\Gamma, \mathcal{P}, \mathcal{Q}) \cap \mathsf{Vars}(\mathsf{R}) = \emptyset \qquad \Gamma;\ \{\phi;\ \mathsf{P}\} \rightsquigarrow \{\psi;\ \mathsf{Q}\} \mid \mathtt{c}}{\Gamma;\ \{\phi;\ \mathsf{P} \ast \mathsf{R}\} \rightsquigarrow \{\psi;\ \mathsf{Q} \ast \mathsf{R}\} \mid \mathtt{c}}
\qquad
\textsc{UnifyHeaps}\ \dfrac{[\sigma]\mathsf{R}' = \mathsf{R} \qquad \Gamma;\ \{\phi;\ \mathsf{P} \ast \mathsf{R}\} \rightsquigarrow [\sigma]\{\psi;\ \mathsf{Q} \ast \mathsf{R}'\} \mid \mathtt{c}}{\Gamma;\ \{\phi;\ \mathsf{P} \ast \mathsf{R}\} \rightsquigarrow \{\psi;\ \mathsf{Q} \ast \mathsf{R}'\} \mid \mathtt{c}}$$

Neither of the rules Frame and UnifyHeaps "adds" to the program c being synthesised. However, Frame reduces the goal by removing a matching part R (*a.k.a. frame*) from both the pre- and the post-condition. UnifyHeaps nondeterministically picks a substitution σ, which replaces existential variables in a sub-heap R of the postcondition to match the corresponding symbolic heap R in the precondition. Both of these rules make choices with regard to what frame R to remove or which substitution σ to adopt—a point that will be of importance for the development described in Sec. 2.2.

Finally, the following (simplified) rule for producing a *write* command is *operational*, as it emits a part of the program to be synthesised, while also modifying the goal accordingly. The resulting program will, thus, consist of the emitted store ∗x = e of an expression e to the pointer variable x. The remainder is synthesised by solving the sub-goal produced by applying the Write rule.

$$\textsc{Write}\ \dfrac{\mathsf{Vars}(\mathsf{e}) \subseteq \Gamma \qquad \mathsf{e} \neq \mathsf{e}' \qquad \Gamma;\ \{\phi;\ \mathsf{x} \mapsto \mathsf{e} \ast \mathsf{P}\} \rightsquigarrow \{\psi;\ \mathsf{x} \mapsto \mathsf{e} \ast \mathsf{Q}\} \mid \mathtt{c}}{\Gamma;\ \{\phi;\ \mathsf{x} \mapsto \mathsf{e}' \ast \mathsf{P}\} \rightsquigarrow \{\psi;\ \mathsf{x} \mapsto \mathsf{e} \ast \mathsf{Q}\} \mid {\ast\mathtt{x} = \mathtt{e};\ \mathtt{c}}}$$

As is common with proof search, should no rule apply to an intermediate goal within one of the derivations, the deductive synthesis back-tracks, possibly discarding a partially synthesised program fragment and trying alternative derivation branches. For instance, firing UnifyHeaps to unify the wrong sub-heaps might lead the search down a path to an unsatisfiable goal, eventually making the synthesis back-track and resulting in a longer search. A misguided application of Write into a certain location can likewise cause the synthesiser to generate a less intuitive program that "makes up" for the earlier spurious writes. This is precisely what we are going to fix by introducing read-only annotations.

#### **2.2 Reducing Non-Determinism with Read-Only Annotations**

Consider the following example, adapted from the original SSL paper [34]. While the example is intentionally artificial, it captures a frequent synthesis scenario: non-determinism during synthesis. This specification allows a certain degree of freedom in how it can be satisfied:

$$\{\, \mathtt{x} \mapsto 239 \;\ast\; \mathtt{y} \mapsto 30 \,\}\; \texttt{void pick(loc x, loc y)}\; \{\, \mathtt{z} \le 100;\ \mathtt{x} \mapsto \mathtt{z} \;\ast\; \mathtt{y} \mapsto \mathtt{z} \,\}\tag{4}$$

It seems logical for the synthesis to start the program derivation by applying the rule UnifyHeaps, thus reducing the initial goal to one of the form

$$\{\mathtt{x}, \mathtt{y}\};\ \{\mathtt{x} \mapsto 239 \ast \mathtt{y} \mapsto 30\} \rightsquigarrow \{239 \le 100;\ \mathtt{x} \mapsto 239 \ast \mathtt{y} \mapsto 239\}$$

This new goal has been obtained by picking one particular substitution σ = [239/z] (out of multiple possible ones), which delivers two identical *heaplets* of the form x ↦ 239 in the pre- and postcondition. It is now time for the Write rule to strike, fixing the discrepancy between the symbolic heaps in the pre- and postcondition by emitting the command ∗y = 239 (at last, some executable code!) and resulting in the following new goal (notice the change of the y-related entry in the precondition):

$$\{\mathtt{x}, \mathtt{y}\};\ \{\mathtt{x} \mapsto 239 \ast \mathtt{y} \mapsto 239\} \rightsquigarrow \{239 \le 100;\ \mathtt{x} \mapsto 239 \ast \mathtt{y} \mapsto 239\}$$

What follows are two applications of the Frame rule to the common symbolic heaps, leading to the goal {x, y}; {emp} ❀ {239 ≤ 100; emp}. At this point, we are clearly in trouble. The pure part of the precondition is simply true, while the postcondition's pure part is 239 ≤ 100, which is unsolvable.

It turns out that our initial pick of the substitution σ = [239/z] was an unfortunate one; we should discard the series of rule applications that followed it, back-track, and adopt a different substitution, *e.g.*, σ′ = [30/z], which will indeed result in solving our initial goal.<sup>8</sup>

Let us now consider the same specification for pick that has been enhanced by explicitly annotating parts of the symbolic heap as mutable and read-only:

$$\left\{\, \mathtt{x} \overset{\mathrm{M}}{\mapsto} 239 \;\ast\; \mathtt{y} \overset{\mathrm{RO}}{\mapsto} 30 \,\right\}\; \texttt{void pick(loc x, loc y)}\; \left\{\, \mathtt{z} \le 100;\ \mathtt{x} \overset{\mathrm{M}}{\mapsto} \mathtt{z} \;\ast\; \mathtt{y} \overset{\mathrm{RO}}{\mapsto} \mathtt{z} \,\right\}\tag{5}$$

In this version of SSL, the effect of rules such as Emp, Frame, and UnifyHeaps remains the same, while operational rules, such as Write, become *annotation-aware*. Specifically, the rule Write is now replaced by the following one:

$$\textsc{WriteRO}\ \dfrac{\mathsf{Vars}(\mathsf{e}) \subseteq \Gamma \qquad \mathsf{e} \neq \mathsf{e}' \qquad \Gamma;\ \{\phi;\ \mathsf{x} \overset{\mathrm{M}}{\mapsto} \mathsf{e} \ast \mathsf{P}\} \rightsquigarrow \{\psi;\ \mathsf{x} \overset{\mathrm{M}}{\mapsto} \mathsf{e} \ast \mathsf{Q}\} \mid \mathtt{c}}{\Gamma;\ \{\phi;\ \mathsf{x} \overset{\mathrm{M}}{\mapsto} \mathsf{e}' \ast \mathsf{P}\} \rightsquigarrow \{\psi;\ \mathsf{x} \overset{\mathrm{M}}{\mapsto} \mathsf{e} \ast \mathsf{Q}\} \mid {\ast\mathtt{x} = \mathtt{e};\ \mathtt{c}}}$$

Notice how in the rule above the heaplets of the form x ↦ᴹ e are now annotated with the access permission M, which explicitly indicates that the code may modify the corresponding heap location.

Continuing with the example specification (5), we can imagine a similar scenario in which the rule UnifyHeaps picks the substitution σ = [239/z]. Should this be the case, the next application of the rule WriteRO will not be possible, due to the *read-only* annotation on the heaplet y ↦ᴿᴼ 239 in the resulting sub-goal:

$$\{\mathtt{x}, \mathtt{y}\};\ \left\{\mathtt{x} \overset{\mathrm{M}}{\mapsto} 239 \;\ast\; \mathtt{y} \overset{\mathrm{RO}}{\mapsto} 30\right\} \rightsquigarrow \left\{\mathtt{z} \le 100;\ \mathtt{x} \overset{\mathrm{M}}{\mapsto} 239 \;\ast\; \mathtt{y} \overset{\mathrm{RO}}{\mapsto} 239\right\}$$

As the RO access permission prevents the synthesised code from modifying the y-heaplets, the synthesis search is forced to back-track, picking an alternative substitution σ′ = [30/z] and converging on the desirable program ∗x = 30.
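In C terms, the outcome of this derivation can be sketched as follows (a hand-written rendering of the synthesised ∗x = 30, not literal SuSLik output):

```c
/* The program the annotated spec (5) steers the search towards: y is
   read-only, so the two cells can only be equalised (with z = 30 ≤ 100)
   by writing to x. */
void pick(int *x, int *y) {
    (void)y;   /* y may be read but never written */
    *x = 30;   /* the write rejected for y is performed on x instead */
}
```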

<sup>8</sup> One might argue that it was possible to detect the unsolvable conjunct 239 ≤ 100 in the postcondition immediately after performing the substitution, thus sparing the need to proceed with this derivation any further. This is, indeed, a possibility, but it is hard to predict which rule-application heuristics will work better in general. We defer the quantitative argument on this matter until Sec. 4.4.

#### **2.3 Composing Read-Only Borrows**

Having synthesised the pick function from specification (5), we would like to use it in future programs. For example, imagine that at some point, while synthesising another program, we see the following as an intermediate goal:

$$\{\mathtt{u}, \mathtt{v}\};\ \left\{\mathtt{u} \overset{\mathrm{M}}{\mapsto} 239 \;\ast\; \mathtt{v} \overset{\mathrm{M}}{\mapsto} 30 \;\ast\; \mathsf{P}\right\} \rightsquigarrow \left\{\mathtt{w} \le 200;\ \mathtt{u} \overset{\mathrm{M}}{\mapsto} \mathtt{w} \;\ast\; \mathtt{v} \overset{\mathrm{M}}{\mapsto} \mathtt{w} \;\ast\; \mathsf{Q}\right\}\tag{6}$$

It is clear that, modulo the names of the variables, we can synthesise a part of the desired program by emitting a call pick(u, v), which we can then reduce to the goal {u, v}; {P} ❀ {w ≤ 200; Q} via an application of Frame.

Why is emitting such a call to pick() safe? Intuitively, this can be done because the precondition of the spec (5) is *weaker* than the one in the goal (6). Indeed, the precondition of the latter provides the full (mutable) access permission on the heap portion v ↦ᴹ 30, while the pre/postcondition of the former requires a weaker form of access, namely read-only: y ↦ᴿᴼ 30. Therefore, our logical foundations should allow temporary "downgrading" of an access permission, *e.g.*, from M to RO, for the sake of synthesising calls. While allowing this is straightforward and can be done similarly to up-casting a type in languages like Java, what turns out to be less trivial is making sure that the caller's initial stronger access permission (M) is *restored* once pick(u, v) returns.

*Non-solutions.* Perhaps the simplest way to allow the call to a function with a weaker (in terms of access permissions) specification would be to (a) downgrade the caller's permissions on the corresponding heap fragments to RO, and (b) recover the permissions as per the callee's specification. Unfortunately, this approach significantly reduces the expressivity of the logic (and, as a consequence, the completeness of the synthesis). For instance, adopting this strategy for using specification (5) in the goal (6) would result in the unsolvable sub-goal {u, v}; {u ↦ᴹ 30 ∗ v ↦ᴿᴼ 30 ∗ P} ❀ {u ↦ᴹ 30 ∗ v ↦ᴹ 30 ∗ Q}. This is due to the fact that the postcondition requires the heaplet v ↦ᴹ 30 to have the write permission M, while the new precondition only provides the RO access.

Another way to cater for a weaker callee's specification would be to "chip out" a RO permission from a caller's M annotation (in the spirit of fractional permissions), offer it to the callee, and then "merge" it back into the caller's full-blown permission upon return. This solution works for simple examples, but not for heap predicates with mixed permissions (discussion in Sec. 6). Yet another approach would be to create a "RO clone" of the caller's M annotation, introducing an axiom of the form x ↦ᴹ t ⊢ x ↦ᴹ t ∗ x ↦ᴿᴼ t. The created component x ↦ᴿᴼ t could be provided to the callee and discarded upon return, since the caller retained the full permission on the original heap. Several works on RO permissions have adopted this approach [9, 11, 13]. While discarding such clones works just fine for sequential program verification, in the case of synthesis guided by pre- and postconditions, *incomplete* postconditions could lead to intractable goals.

*Our solution.* The key to gaining the necessary expressivity *wrt.* passing/returning access permissions, while maintaining a sound yet simple logic, is *treating access permissions as first-class values*. A natural consequence of this treatment is that immutability annotations can be symbolic (*i.e.*, variables of a special sort "permission"), and the semantics of such variables is well understood; we refer to these symbolic annotations as *read-only borrows*. <sup>9</sup> For instance, using borrows, we can represent the specification (5) as an equivalent one:

$$\left\{\, \mathtt{x} \overset{\mathrm{M}}{\mapsto} 239 \;\ast\; \mathtt{y} \overset{a}{\mapsto} 30 \,\right\}\; \texttt{void pick(loc x, loc y)}\; \left\{\, \mathtt{z} \le 100;\ \mathtt{x} \overset{\mathrm{M}}{\mapsto} \mathtt{z} \;\ast\; \mathtt{y} \overset{a}{\mapsto} \mathtt{z} \,\right\}\tag{7}$$

The only substantial difference with spec (5) is that now the pointer y's access permission is given an *explicit name* a. Such named annotations (*a.k.a.* borrows) are treated as RO by the callee, as long as the pure precondition does not constrain them to be mutable. However, giving these permissions names achieves an important goal: performing accurate accounting while composing specifications with different access permissions. Specifically, we can now emit a call to pick(u, v) as specified by (7) from the goal (6), keeping in mind the substitution σ = [u/x, v/y, M/a]. This call now accounts for borrows as well, and makes it straightforward to restore v's original permission M upon returning.

Following the same idea, borrows can be naturally composed through capture-avoiding substitutions. For instance, the same specification (7) of pick could be used to advance the following modified version of the goal (6):

$$\{\mathtt{u}, \mathtt{v}\};\ \left\{\mathtt{u} \overset{\mathrm{M}}{\mapsto} 239 \;\ast\; \mathtt{v} \overset{c}{\mapsto} 30 \;\ast\; \mathsf{P}\right\} \rightsquigarrow \left\{\mathtt{w} \le 210;\ \mathtt{u} \overset{\mathrm{M}}{\mapsto} \mathtt{w} \;\ast\; \mathtt{v} \overset{c}{\mapsto} \mathtt{w} \;\ast\; \mathsf{Q}\right\}$$

by means of taking the substitution σ′ = [u/x, v/y, c/a].

#### **2.4 Borrow-Polymorphic Inductive Predicates**

Separation Logic owes its glory to the extensive use of *inductive heap predicates*: a compact way to capture the shape and the properties of finite heap fragments corresponding to recursive linked data structures. Below we provide one of the most widely used SL predicates, defining the shape of a heap containing a null-terminated singly-linked list with elements from a set S:

$$\begin{array}{rcl}
\mathsf{ls}(\mathtt{x}, S) & \triangleq & \mathtt{x} = 0 \wedge \{S = \emptyset;\ \mathsf{emp}\}\\[2pt]
& \mid & \mathtt{x} \neq 0 \wedge \{S = \{v\} \cup S_1;\ [\mathtt{x}, 2] \ast \mathtt{x} \mapsto v \ast \langle \mathtt{x}, 1\rangle \mapsto \mathit{nxt} \ast \mathsf{ls}(\mathit{nxt}, S_1)\}
\end{array}\tag{8}$$

The predicate contains two clauses describing the corresponding cases of the list's shape depending on the value of the head pointer x. If x is zero, the list's heap representation is empty, and so is the set of elements S. Alternatively, if x is not zero, it stores a record with two items (indicated by the *block assertion* [x, 2]), such that the *payload* pointer x contains the value v (where S = {v} ∪ S₁ for some set S₁), and the pointer corresponding to x + 1 (denoted ⟨x, 1⟩) contains the address of the list's tail, *nxt*.
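The record layout described by the second clause can be mirrored directly in C; in the sketch below (our names, assuming word-sized payloads), `cons` materialises the block assertion [x, 2] with the payload in cell ⟨x, 0⟩ and the tail address in cell ⟨x, 1⟩:

```c
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t cell;

/* Allocate the two-cell record [x, 2]: payload at offset 0, tail at offset 1. */
cell *cons(cell v, cell *nxt) {
    cell *x = malloc(2 * sizeof(cell));
    x[0] = v;            /* x ↦ v        */
    x[1] = (cell)nxt;    /* ⟨x, 1⟩ ↦ nxt */
    return x;
}

/* x = 0 selects the first clause (empty heap, S = ∅); otherwise we recurse
   through ⟨x, 1⟩, just as ls(nxt, S1) prescribes. */
size_t length(const cell *x) {
    return x == NULL ? 0 : 1 + length((const cell *)x[1]);
}
```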

While expressive enough to specify, and enable the synthesis of, various list-traversing and list-generating recursive functions via SSL, the definition (8) does not allow one to restrict the access permissions to different components of the list: all of the involved memory locations can be mutated (which explains the synthesis issue we described in Sec. 1.1). To remedy this weakness of the traditional SL-style predicates, we propose to *parameterise* them with read-only borrows, thus making them aware of different access permissions to their various components. For instance, we propose to redefine the linked list predicate as follows:

<sup>9</sup> In this regard, our symbolic borrows are very similar to abstract fractional permissions in Chalice and VeriFast [21, 27]. We discuss the relation in detail in Sec. 6.

$$\begin{array}{rcl}
\mathsf{ls}(\mathtt{x}, S, a, b, c) & \triangleq & \mathtt{x} = 0 \wedge \{S = \emptyset;\ \mathsf{emp}\}\\[2pt]
& \mid & \mathtt{x} \neq 0 \wedge \{S = \{v\} \cup S_1;\ [\mathtt{x}, 2]^{a} \ast \mathtt{x} \overset{b}{\mapsto} v \ast \langle \mathtt{x}, 1\rangle \overset{c}{\mapsto} \mathit{nxt} \ast \mathsf{ls}(\mathit{nxt}, S_1, a, b, c)\}
\end{array}\tag{9}$$

The new definition (9) is similar to the old one (8), but now, in addition to the standard predicate parameters (*i.e.*, the head pointer x and the set S in this case), it also features three borrow parameters a, b, and c that stand as placeholders for the access permissions to particular components of the list. Specifically, the symbolic borrows b and c control the permissions to manipulate the pointers x and x + 1, correspondingly. The borrow a, modifying a block-type heaplet, determines whether the record starting at x can be deallocated with free(x). All three borrows are passed in the same configuration to the recursive instance of the predicate, thereby imposing the same constraints on the rest of the corresponding list components.

Let us see the borrow-polymorphic inductive predicates in action. Consider the following specification that asks for a function taking a list of arbitrary values and replacing all of them with zeroes:<sup>10</sup>

$$\{\mathsf{ls}(\mathtt{x}, S, d, \mathrm{M}, e)\}\; \texttt{void reset(loc x)}\; \{\mathsf{ls}(\mathtt{x}, O, d, \mathrm{M}, e)\}\tag{10}$$

The spec (10) gives very little freedom to the function that would satisfy it with regard to permissions to manipulate the contents of the heap, constrained by the predicate ls(x, S, d, M, e). As the first and the third borrow parameters are instantiated with read-only borrows (d and e), the desired function is not going to be able to change the structural pointers or deallocate parts of the list. The only allowed manipulation is, thus, changing the values of the payload pointers.

This concise specification is pleasantly strong. To wit, in plain SSL, a similar spec (without read-only annotations) would also admit an implementation that fully deallocates the list or arbitrarily changes its length. In order to avoid these outcomes, one would, therefore, need to provide an alternative definition of the predicate ls, which would incorporate the length property too.
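For intuition, here is a C sketch of an implementation that spec (10) admits (a hypothetical rendering over a conventional node struct, not the synthesised output): with d and e read-only, the payload field is the only writable component.

```c
#include <stddef.h>

typedef struct node { int v; struct node *nxt; } node;

/* Admitted by spec (10): the structural borrows d (the block) and e (the
   tail field) are read-only, so the function can neither relink nor free
   nodes; the payload, annotated with M, is the only thing it may write. */
void reset(node *x) {
    if (x == NULL) return;   /* first clause of ls: empty heap */
    x->v = 0;                /* the sole permitted write       */
    reset(x->nxt);           /* the tail carries the same borrows */
}
```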

Imagine now that one would like to use the implementation of reset satisfying specification (10) to generate a function with the following spec, providing stronger access permissions for the list components:

```
{ls(y, S, M, M, M)} void call_reset(loc y) {ls(y, O, M, M, M)}
```
During the synthesis of call_reset, a call to reset is generated. For this purpose, the access permissions are borrowed and recovered as per spec (10), via the substitution [y/x, M/d, M/e], in the way described in Sec. 2.3.

#### **2.5 Putting It All Together**

We conclude this overview by explaining how synthesis via SSL enhanced with read-only borrows avoids the issue with spurious writes outlined in Sec. 1.1.

To begin, we change the specification to the following one, which makes use of the new list predicate (9) and prevents any modifications in the original list.

$$\left\{\, \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{x} \;\ast\; \mathsf{ls}(\mathtt{x}, S, a, b, c) \,\right\}\; \texttt{void listcopy(loc r)}\; \left\{\, \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{y} \;\ast\; \mathsf{ls}(\mathtt{x}, S, a, b, c) \;\ast\; \mathsf{ls}(\mathtt{y}, S, \mathrm{M}, \mathrm{M}, \mathrm{M}) \,\right\}\tag{11}$$

We should remark that, contrary to the solution sketched at the end of Sec. 1.1, which suggested using the predicate instance of the shape ls(x, S)[RO], our concrete proposal does not allow us to constrain the entire predicate with a single access permission (*e.g.*, RO). Instead, we allow *fine-grained* access control to its particular elementary components by annotating each one with an individual borrow. The specification above allows the greatest flexibility *wrt.* access permissions to the original list by giving them different names (a, b, c).

<sup>10</sup> We use O as a notation for a multi-set with an arbitrary finite number of zeroes.

```
Variable       x, y         Alpha-numeric identifiers
Size, offset   n, ι         Non-negative integers
Expression     e  ::=  0 | true | x | e = e | e ∧ e | ¬e
Command        c  ::=  let x = *(x + ι) | *(x + ι) = e | let x = malloc(n) | free(x)
                     | err | f(eᵢ) | c; c | if (e) {c} else {c}
Fun. dict.     Δ  ::=  ε | Δ, f(xᵢ) { c }
```
Fig. 2: Programming language grammar.

```
Pure term       φ, ψ, χ, α  ::=  0 | true | M | RO | x | φ = φ | φ ∧ φ | ¬φ
Symbolic heap   P, Q, R     ::=  emp | ⟨e, ι⟩ ↦^α e | [e, ι]^α | p(φᵢ) | P ∗ Q
Heap predicate  D           ::=  p(xᵢ) ≜ ⟨eₖ, {χₖ; Rₖ}⟩ₖ
Function spec   F           ::=  f(xᵢ) : {P}{Q}
Assertion       P, Q        ::=  {φ; P}
Environment     Γ           ::=  ε | Γ, x
Context         Σ           ::=  ε | Σ, D | Σ, F
```
Fig. 3: BoSSL assertion syntax.

In the process of synthesising the non-trivial branch of listcopy, the search at some point will come up with the following intermediate goal:

$$\left\{\, S = \{v\} \cup S_1;\ \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{y1} \ast [\mathtt{x}, 2]^{a} \ast \mathtt{x} \overset{b}{\mapsto} v \ast \langle \mathtt{x}, 1\rangle \overset{c}{\mapsto} \mathit{nxt} \ast \mathsf{ls}(\mathtt{y1}, S_1, \mathrm{M}, \mathrm{M}, \mathrm{M}) \ast \ldots \right\} \rightsquigarrow$$
$$\left\{\, \mathtt{r} \overset{\mathrm{M}}{\mapsto} \mathtt{z} \ast [\mathtt{z}, 2]^{\mathrm{M}} \ast \mathtt{z} \overset{\mathrm{M}}{\mapsto} v \ast \langle \mathtt{z}, 1\rangle \overset{\mathrm{M}}{\mapsto} \mathtt{y1} \ast \mathsf{ls}(\mathtt{y1}, S_1, \mathrm{M}, \mathrm{M}, \mathrm{M}) \ast \ldots \right\}$$

Since the logical variable z in the postcondition is an existential one, the greyed part of the symbolic heap can be satisfied by either (a) re-purposing the greyed part of the precondition (which is what the implementation in Sec. 1.1 does), or (b) allocating a corresponding record of two elements (as should be done). With the read-only borrows in place, the unification of the two greyed fragments in the pre- and postcondition via UnifyHeaps fails, because the mutable annotation of z ↦^M v in the post cannot be matched by the read-only borrow x ↦^b v in the precondition. Therefore, not being able to follow the derivation path (a), the synthesiser is forced to explore an alternative one, eventually deriving the version of listcopy without tail-swapping.

# **3 BoSSL: Borrowing Synthetic Separation Logic**

We now give a formal presentation of BoSSL, a version of SSL extended with read-only borrows. Fig. 2 and Fig. 3 present its programming and assertion language, respectively. For simplicity, we formalise a core language without theories (*e.g.*, natural numbers), similar to the one of Smallfoot [6]; the only sorts in the core language are locations, booleans, and permissions (where permissions appear only in specifications), and the pure logic only has equality. In contrast, our implementation supports integers and sets (where the latter also appear only in specifications), with linear arithmetic and standard set operations. We do not formalise sort-checking of formulae; however, for readability, we will use the meta-variable α where the intended sort of the pure logic term is "permission", and Perm for the set of all permissions. The permission to allocate or deallocate a memory block [x, n]^α is controlled by α.

Write

$$\frac{\mathsf{Vars}(e) \subseteq \Gamma \qquad e \neq e' \qquad \Sigma; \Gamma; \{\phi;\, \langle x, \iota\rangle \overset{\mathsf{M}}{\mapsto} e \ast P\} \leadsto \{\psi;\, \langle x, \iota\rangle \overset{\mathsf{M}}{\mapsto} e \ast Q\} \mid c}{\Sigma; \Gamma; \{\phi;\, \langle x, \iota\rangle \overset{\mathsf{M}}{\mapsto} e' \ast P\} \leadsto \{\psi;\, \langle x, \iota\rangle \overset{\mathsf{M}}{\mapsto} e \ast Q\} \mid {\ast}(x + \iota) = e;\ c}$$

Alloc

$$\frac{\begin{array}{c} R = [z, n]^{\alpha} \ast \mathop{\ast}_{0 \leq i < n} \langle z, i\rangle \overset{\alpha_i}{\mapsto} e_i \qquad (\{y\} \cup \{\overline{t_i}\}) \cap \mathsf{Vars}(\Gamma, P, Q) = \emptyset \qquad z \in \mathsf{EV}(\Gamma, P, Q) \\ R' = [y, n]^{\mathsf{M}} \ast \mathop{\ast}_{0 \leq i < n} \langle y, i\rangle \overset{\mathsf{M}}{\mapsto} t_i \qquad \Sigma; \Gamma; \{\phi;\, P \ast R'\} \leadsto \{\psi;\, Q \ast R\} \mid c \end{array}}{\Sigma; \Gamma; \{\phi;\, P\} \leadsto \{\psi;\, Q \ast R\} \mid \textbf{let}\ y = \mathsf{malloc}(n);\ c}$$

Free

$$\frac{R = [x, n]^{\mathsf{M}} \ast \mathop{\ast}_{0 \leq i < n} \langle x, i\rangle \overset{\mathsf{M}}{\mapsto} e_i \qquad \mathsf{Vars}(\{x\} \cup \{\overline{e_i}\}) \subseteq \Gamma \qquad \Sigma; \Gamma; \{\phi;\, P\} \leadsto \{Q\} \mid c}{\Sigma; \Gamma; \{\phi;\, P \ast R\} \leadsto \{Q\} \mid \mathsf{free}(x);\ c}$$

Fig. 4: BoSSL derivation rules.

#### **3.1 BoSSL rules**

New rules of BoSSL are shown in Fig. 4. The figure contains only three rules: this minimal adjustment is possible thanks to our approach to unification and permission accounting from first principles. Writing to a memory location requires its corresponding symbolic heap to be annotated as mutable. Note that for a precondition {a = M; x ↦^a 5}, a normalisation rule like SubstLeft would first transform it into {M = M; x ↦^M 5}, at which point the Write rule can be applied. Note also that Alloc does not require specific permissions on the block in the postcondition; if they turn out to be RO, the resulting goal is unsolvable.
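The mutability check performed by Write can be illustrated with a small sketch (all names here are hypothetical simplifications for illustration, not part of the actual SuSLik implementation):

```python
# Minimal sketch (hypothetical names): the Write rule only fires on a
# points-to heaplet whose permission annotation is the constant M.
from dataclasses import dataclass

@dataclass(frozen=True)
class PointsTo:
    loc: str      # base variable, e.g. "x"
    offset: int   # the offset iota
    perm: str     # "M", "RO", or a borrow name such as "a"
    payload: str  # the stored expression

def write_applicable(heaplet: PointsTo) -> bool:
    # Writing requires the *constant* M; a borrow name like "a" must
    # first be normalised away using a pure fact such as a = M.
    return heaplet.perm == "M"

def normalise(heaplet: PointsTo, pure_eqs: dict) -> PointsTo:
    # SubstLeft-style normalisation: substitute known permission equalities.
    if heaplet.perm in pure_eqs:
        return PointsTo(heaplet.loc, heaplet.offset,
                        pure_eqs[heaplet.perm], heaplet.payload)
    return heaplet

h = PointsTo("x", 0, "a", "5")
assert not write_applicable(h)        # borrow "a" alone: Write blocked
h2 = normalise(h, {"a": "M"})         # the pure part knows a = M
assert write_applicable(h2)           # now Write can fire
```

The sketch mirrors the order of rule applications in the text: normalisation first exposes the constant M, and only then does the mutability guard succeed.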

Unsurprisingly, the rule for accessing a memory cell just for reading purposes requires no adjustments, since any permission allows reading. Moreover, the Call rule for method invocation does not need adjustments either. Below, we describe how borrowing and returning permissions seamlessly operate within a method call:

Call

$$\frac{\begin{array}{c} \mathcal{F} \triangleq f(\overline{x_i}) : \{\phi_f;\, P_f\}\{\psi_f;\, Q_f\} \in \Sigma \qquad R = [\sigma]P_f \qquad \vdash \phi \Rightarrow [\sigma]\phi_f \qquad \overline{e_i} = [\sigma]\overline{x_i} \\ \mathsf{Vars}(\overline{e_i}) \subseteq \Gamma \qquad \phi' \triangleq [\sigma]\psi_f \qquad R' \triangleq [\sigma]Q_f \qquad \Sigma; \Gamma; \{\phi \wedge \phi';\, P \ast R'\} \leadsto \{Q\} \mid c \end{array}}{\Sigma; \Gamma; \{\phi;\, P \ast R\} \leadsto \{Q\} \mid f(\overline{e_i});\ c}$$

The Call rule fires when a sub-heap R in the precondition of the goal can be unified with the precondition P_f of a function f from the context Σ. Some salient points are worth mentioning here: (1) the *annotation borrowing* from R to P_f, for those symbolic sub-heaps in P_f which require read-only permissions, is handled by the unification of P_f with R, namely R = [σ]P_f (*i.e.*, the substitution accounts for borrows: α/a); (2) the *annotation recovery* in the new precondition is implicit via R' ≜ [σ]Q_f, where the substitution σ was computed during the unification, that is, while borrowing; (3) finding a substitution σ for R = [σ]P_f fails if R does not have sufficient accessibility permissions to call f (*i.e.*, substitutions of the form a/M are disallowed, since the domain of σ may only contain existentials). We reiterate that read-only specifications only manipulate symbolic borrows; that is to say, RO constants are not expected in the specification.
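The permission side of this unification can be sketched as follows (a hypothetical simplification; the real unifier also matches locations and payloads):

```python
# Sketch (hypothetical names) of the permission side of Call's unification:
# a substitution may only map *existential* permission variables of the
# callee's precondition, so borrowing (alpha/a) succeeds, while a
# caller-side borrow can never be promoted to M (a/M is rejected).
def unify_perm(callee_perm, caller_perm, callee_existentials, subst):
    if callee_perm in callee_existentials:        # callee uses a borrow name
        prev = subst.get(callee_perm)
        if prev is not None and prev != caller_perm:
            return False                          # conflicting bindings
        subst[callee_perm] = caller_perm          # borrow: alpha / a
        return True
    return callee_perm == caller_perm             # constants must match

subst = {}
# callee precondition heaplet has borrow "a"; caller offers RO: borrowing ok
assert unify_perm("a", "RO", {"a"}, subst) and subst["a"] == "RO"
# callee demands the constant M; caller only has borrow "b": call rejected
assert not unify_perm("M", "b", {"a"}, {})
```

Recovering the annotations after the call then amounts to applying the computed `subst` to the callee's postcondition, as in R' ≜ [σ]Q_f above.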

#### **3.2 Memory Model**

We closely follow the standard SL memory model [32,37] and assume Loc ⊂ Val.

$$\text{(Heap)}\quad h \in \mathsf{Heaps} \;::=\; \mathsf{Loc} \rightharpoonup \mathsf{Val} \qquad\qquad \text{(Stack)}\quad s \in \mathsf{Stacks} \;::=\; \mathsf{Var} \rightharpoonup \mathsf{Val}$$

To enable C-like accounting of dynamically-allocated memory blocks, we assume that the heap h also stores sizes of allocated blocks in dedicated locations. Conceptually, this part of the heap corresponds to the meta-data of the memory allocator. This accounting ensures that only a previously allocated memory block can be disposed (as opposed to any set of allocated locations), enabling the free command to accept a single argument, the address of the block. To model this meta-data, we introduce a function bl: Loc → Loc, where bl(x) denotes the location in the heap where the block meta-data for the address x is stored, if x is the starting address of a block. In an actual language implementation, bl(x) might be, *e.g.*, x − 1 (*i.e.*, the meta-data is stored right before the block).
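The block meta-data accounting can be sketched as follows, assuming bl(x) = x − 1 as in the example above (a toy model for illustration, not the actual semantics):

```python
# Sketch of the block meta-data accounting, assuming bl(x) = x - 1:
# malloc stores the block size just before the block, and free only
# accepts the exact starting address of a previously allocated block.
class Heap:
    def __init__(self):
        self.cells = {}
        self.next = 1  # next unused location (0 reserved)

    def malloc(self, n):
        x = self.next + 1              # leave room for meta-data at x - 1
        self.cells[x - 1] = n          # bl(x) = x - 1 holds the block size
        for i in range(n):
            self.cells[x + i] = 0
        self.next = x + n
        return x

    def free(self, x):
        n = self.cells.pop(x - 1, None)  # fails unless x starts a block
        if n is None:
            raise RuntimeError("free of a non-block address")
        for i in range(n):
            del self.cells[x + i]

h = Heap()
p = h.malloc(2)
h.free(p)                              # ok: p is the start of a block
assert (p - 1) not in h.cells          # meta-data is reclaimed too
```

The model makes concrete why free takes a single argument: the size of the region to dispose is recovered from the meta-data cell, not supplied by the caller.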

Since we have opted for an unsophisticated permission mechanism, where the *heap ownership is not divisible* but some heap locations are restricted to RO, the definition of the satisfaction relation ⊨_I^{Σ,R} for the annotated assertions, in a particular context Σ and given an interpretation I, is parameterised with a fixed set of read-only locations R:


$$\begin{array}{l} \langle h, s\rangle \models_{I}^{\Sigma, R} \{\phi;\, p(\overline{\psi_i})\} \;\text{ iff }\; \llbracket \phi \rrbracket_s = \mathsf{true} \;\text{ and }\; \mathcal{D} \triangleq p(\overline{x_i})\, \overline{\langle e_k, \{\chi_k, R_k\}\rangle} \in \Sigma \;\text{ and} \\ \qquad \langle h, \overline{\llbracket \psi_i \rrbracket_s}\rangle \in I(\mathcal{D}) \;\text{ and }\; \bigvee_k \big(\langle h, s\rangle \models_{I}^{\Sigma, R} [\overline{\psi_i}/\overline{x_i}]\{\phi \wedge e_k \wedge \chi_k;\, R_k\}\big). \end{array}$$

There are two non-standard cases: points-to and block, whose permissions must agree with R. Note that in the definition of satisfaction we only need to consider the case where the permission α is a value (*i.e.*, either RO or M). Although in a specification α can also be a variable, well-formedness guarantees that this variable must be logical, and hence it will be substituted away in the definition of validity. We stress that a reference that has RO permission to a certain symbolic heap still retains the full ownership of that heap, with the restriction that it is not allowed to update or deallocate it. Note that deallocation additionally requires a mutable permission for the enclosing block.
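Under the simplifying assumption that an RO-annotated points-to must denote a location in R, while an M-annotated one must own a location outside R, the non-standard points-to case can be sketched as follows (an assumed reading of the semantics, for illustration only):

```python
# Sketch (assumed semantics): a points-to heaplet with permission RO is
# satisfied only at locations in the fixed read-only set R, while an
# M-annotated heaplet must own a location outside R.
def sat_points_to(h, R, loc, perm, val):
    if h.get(loc) != val:       # the heap must actually store the payload
        return False
    return (loc in R) if perm == "RO" else (loc not in R)

h = {10: 5, 11: 7}
R = {10}
assert sat_points_to(h, R, 10, "RO", 5)
assert not sat_points_to(h, R, 10, "M", 5)   # RO location, M permission
assert sat_points_to(h, R, 11, "M", 7)
```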

#### **3.3 Soundness**

The BoSSL operational semantics is in the spirit of traditional SL [38] and is hence omitted to save space (selected rules are available in the extended version of the paper). The validity definition and the soundness proofs of SSL port to BoSSL without any modifications, since our current definition of satisfaction implies the one defined for SSL:

**Definition 1 (Validity).** *We say that a well-formed Hoare-style specification* Σ; Γ; {P} c {Q} *is* valid *wrt. the function dictionary* Δ *iff whenever* dom(s) = Γ, *for all* σ_gv = [x̄ᵢ ↦ d̄ᵢ]_{xᵢ ∈ GV(Γ, P, Q)} *such that* ⟨h, s⟩ ⊨_I^Σ [σ_gv]P *and* Δ; ⟨h, (c, s)⟩ ⇝* ⟨h′, (skip, s′)⟩, *it is also the case that* ⟨h′, s′⟩ ⊨_I^Σ [σ_ev ∪ σ_gv]Q *for some* σ_ev = [ȳⱼ ↦ d̄ⱼ]_{yⱼ ∈ EV(Γ, P, Q)}.

The following theorem guarantees that, given a program c generated with BoSSL, a heap model, and a set of read-only locations R that satisfy the program's precondition, executing c does not change those read-only locations:

**Theorem 1 (RO Heaps Do Not Change).** *Given a Hoare-style specification* Σ; Γ; {φ; P} c {Q}, *which is valid wrt. the function dictionary* Δ, *and a set of read-only memory locations* R, *if* ⟨h, s⟩ ⊨_I^{Σ,R} {φ; P} *and* Δ; ⟨h, (c, s)⟩ ⇝* ⟨h′, (skip, s′)⟩, *then* R ⊆ dom(h′) *and* ∀l ∈ R, h(l) = h′(l).

Starting from an abstract state where a spatial heap has a read-only permission, under no circumstance can this permission be strengthened to M:

**Corollary 1 (No Permission Strengthening).** *Given a valid Hoare-style specification* Σ; Γ; {φ; P} c {ψ; Q} *and a permission* α*, if* ψ ⇒ (α = M) *then it is also the case that* φ ⇒ (α = M) *.*

As it turns out, permission weakening is possible since, though problematic, postcondition weakening is sound in general. However, even though this affects completeness, it does not affect our termination results. For example, given a synthesised auxiliary function F ≜ f(x, r) : {x ↦^{a1} t ∗ r ↦^M x}{x ↦^{a2} t ∗ r ↦^M t + 1} and a synthesis goal Σ, F; Γ; {x ↦^M 7 ∗ y ↦^M x} ❀ {x ↦^M 7 ∗ y ↦^M z} | c, firing the Call rule for the candidate function f(x, y) would lead to the unsolvable goal Σ, F; Γ; {x ↦^{a2} 7 ∗ y ↦^M 8} ❀ {x ↦^M 7 ∗ y ↦^M z} | f(x, y); c. Frame may never be fired on this new goal, since the permission of reference x in the goal's precondition has been permanently weakened. To eliminate such sources of incompleteness, we require the user-provided predicates and specifications to be well-formed:

**Definition 2 (Well-Formedness of Spatial Predicates).** *We say that a spatial predicate* p(x̄ᵢ) ≜ ⟨ēₖ, {χₖ, Rₖ}⟩_{k ∈ 1..N} *is* well-formed *iff*

(⋃_{k=1}^{N} (Vars(ēₖ) ∪ Vars(χₖ) ∪ Vars(Rₖ)) ∩ Perm) ⊆ ({x̄ᵢ} ∩ Perm).

That is, every accessibility annotation within the predicate's clauses is bound by the predicate's parameters.

**Definition 3 (Well-Formedness of Specifications).** *We say that a Hoare-style specification* Σ; Γ; {P} c {Q} *is* well-formed *iff* EV(Γ, P, Q) ∩ Perm = ∅ *and every predicate instance in* P *and* Q *is an instance of a well-formed predicate.*

That is, postconditions are not allowed to have existential accessibility annotations in order to avoid permanent weakening of accessibility.
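Both well-formedness checks are simple syntactic conditions and can be sketched as follows (hypothetical representation of predicates and goals, for illustration only):

```python
# Sketch of the two well-formedness checks (hypothetical representation):
# every permission variable used inside a predicate clause must be bound
# by the predicate's parameters (Def. 2), and a goal may not existentially
# quantify permission variables (Def. 3).
def predicate_well_formed(params, clause_perm_vars):
    # clause_perm_vars: one set of permission variables per clause.
    return set().union(*clause_perm_vars, set()) <= set(params)

def spec_well_formed(existentials, perm_vars):
    # No existential variable of the goal has sort "permission".
    return not (set(existentials) & set(perm_vars))

# lseg(x, s, a1, a2, a3): both clauses mention only a1..a3 -- well-formed
assert predicate_well_formed({"a1", "a2", "a3"}, [{"a1"}, {"a1", "a2", "a3"}])
# a clause sneaking in a fresh borrow "b" is rejected
assert not predicate_well_formed({"a1"}, [{"a1", "b"}])
# a goal whose postcondition invents an existential permission is rejected
assert not spec_well_formed({"z", "b"}, {"b"})
```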

A callee that requires borrows for a symbolic heap always returns back to the caller its original permission for that respective symbolic heap:

**Corollary 2 (Borrows Always Return).** *A heaplet with permission* α *either (a) retains the same permission* α *after a call to a function that is decorated with well-formed specifications and that requires that heaplet to have read-only permission, or (b) may be deallocated in case* α = M*.*

# **4 Implementation and Evaluation**

We implemented BoSSL in an enhanced version of the SuSLik tool, which we refer to as ROBoSuSLik [12].<sup>11</sup> The changes to the original SuSLik infrastructure affected less than 100 lines of code. The extended synthesis is backwards-compatible with the original benchmarks. To make this possible, we treat the original SSL specifications as annotated/instantiated with M permissions wherever necessary, which is consistent with the treatment of access permissions in BoSSL.

We have conducted an extensive experimental evaluation of ROBoSuSLik, aiming to answer the following research questions:

1. Does borrowing-aware synthesis improve the performance of SSL-based program synthesis?
2. Does it improve the quality, *e.g.*, the conciseness, of the synthesised programs?
3. Do read-only annotations provide stronger correctness guarantees for the synthesised programs?
4. Is borrowing-aware synthesis more robust under perturbations of the proof search strategy?
# **4.1 Experimental Setup**

*Benchmark Suite.* To tackle the above research questions, we have adopted most of the heap-manipulating benchmarks from the SuSLik suite [34, § 6.1] (with some variations) into our sets of experiments. In particular, we looked at the groups of benchmarks that manipulate singly linked list segments, sorted linked list segments, and binary trees. We did not include the benchmarks concerning binary search trees (BSTs), for the reasons outlined in the next paragraph.

<sup>11</sup> The sources are available at https://github.com/TyGuS/robosuslik.

*The Tools.* For a fair comparison that accounts for the latest advancements to SuSLik, we chose to parameterise the synthesis process with a flag that turns the read-only annotations on and off (off means that all annotations are treated as mutable). Results obtained with the flag on are marked RO in the experiments, while those marked Mut ignore the read-only annotations during synthesis. For simplicity, we will refer to the two instances of the tool, namely RO and Mut, as two different tools. Each tool was set to time out after two minutes of attempting to synthesise a program.

*Criteria.* In an attempt to quantify our results, we have looked at the size of the synthesised program (*AST size*), the absolute time needed to synthesise the code given its specification, averaged over several runs (*Time*), the number of times the proof search backtracks due to nondeterminism (*#Backtr*), the total number of rule applications that the synthesis fired during the search (*#Rules*), including those that lead to unsolvable goals, and the strength of the guarantees offered by the specifications (*Stronger Guarantees*).

*Variables.* Some benchmarks have shown improvement over the synthesis process without the read-only annotations. To emphasise the fact that read-only annotations' improvements are not accidental, we have varied the inductive definitions of the corresponding benchmarks to experiment with different properties of the underlying structure: the shape of the structure (in all the definitions), the length of the structure (for those benchmarks tagged with *len*), the values stored within the structure (*val*), a combination of all these properties (*all*), as well as the sortedness property for the "Sorted list" group of benchmarks.

*Experiment Schema.* To measure the performance and the quality of the borrowing-aware synthesis, we ran the benchmarks against the two tools and did a one-to-one comparison of the results. We ran each tool three times for each benchmark and averaged the resulting synthesis times; all the other evaluation criteria remain constant across the three runs. To measure the tools' robustness, we stressed the synthesis algorithm by altering the default proof search strategy. We prepared 42 such perturbations, which we ran against the different program variants enumerated above. Each pair of program variant and proof strategy perturbation was then analysed to measure the number of rules fired by RO and Mut.

*Hardware Setup.* The experiments were conducted on a 64-bit machine running Ubuntu, with an Intel Xeon CPU (6 cores, 2.40GHz) with 32GB RAM.

#### **4.2 Performance and Quality of the Borrowing-Aware Synthesis**

Tab. 1 captures the results of running RO and Mut against the considered benchmarks. It provides empirical proof that the borrowing-aware synthesis improves the performance of the original SSL-based synthesis, answering Research Question 1 positively. RO suffers almost no loss in performance (except for a few cases, such as appending list segments, where there is a negligible increase in time), while the gain is considerable for those synthesis problems with complex pointer manipulation. For example, if we take the number of fired rules as the performance criterion, in the worst case RO behaves the same as Mut, while in the best case it buys us a 32-fold decrease in the number of applied rules. At the same time, synthesising a few small examples is slightly slower with RO, despite the same or smaller number of rule applications. This is due to the increased number of logical variables (because of the added borrows) when discharging obligations via the SMT solver.

Table 1: Benchmarks and comparison between the results for synthesis with read-only annotations (RO) and without them (Mut). For each case study we measure the *AST size* of the synthesised program, the *Time* needed to synthesise the benchmark, the number of times that the synthesiser had to discard a derivation branch (*#Backtr.*), and the total number of fired rules (*#Rules*).

Fig. 5 offers a statistical view of the numbers in the table, where smaller bars mark a better performance. The barplots indicate that as the complexity of the problem increases (approximately from left to right), RO outperforms Mut.

Perhaps the most important take-away from this experiment is that the synthesis with read-only borrows often produces a more concise program (light green cells in the column *AST size* of Tab. 1), while retaining the same or better performance *wrt.* all the evaluated criteria. For instance, RO gets rid of the spurious write from the motivating example introduced in Sec. 1, reducing the AST size from 35 nodes down to 32, while at the same time firing fewer rules. That also secures a positive answer to Research Question 2.

#### **4.3 Stronger Correctness Guarantees**

To answer Research Question 3, we have manually compared the guarantees offered by the specifications annotated with RO permissions against the default ones; the results are summarised in the last column of Tab. 1. For instance, a specification stating that the shape of a linked-list segment is read-only implies that the size of that segment remains constant throughout the program's execution. In other words, the length property need not be captured separately in the segment's definition. If, in addition to the shape, the payload of the segment is also read-only, then the set of values and their ordering are also invariant.

Consider the goal {lseg(x, y, s, a1, a2, a3)} ❀ {lseg(x, y, s, a1, a2, a3)}, where lseg is an inductive definition of a list segment which ends at y and contains the set of values s. The borrowing-aware synthesiser will produce a program which is guaranteed to treat the segment pointed to by x and ending at y as read-only (that is, its shape, length, values and ordering are invariant). At the same time, for a goal {lseg(x, y, s)} ❀ {lseg(x, y, s)}, the only guarantees are that the returned segment still ends at y and contains the values s. Internal modifications of the segment, such as reordering and duplicating list elements, may still occur.

The few entries marked with same are programs whose specifications did not get stronger when instrumented with RO annotations (*e.g.*, delete). These benchmarks require mutation over the entire data structure, hence the read-only annotations do not influence the offered guarantees. Overall, our observations that read-only annotations offer stronger guarantees are in agreement with the works on SL-based program verification [9, 13], but are promoted here to the more challenging problem of program synthesis.

#### **4.4 Robustness under Synthesis Perturbations**

There is no single search heuristic that works equally well for any given specification: for a particular fixed search strategy, a synthesiser can exhibit suboptimal performance on some goals while converging quickly on others. By evaluating robustness *wrt.* the RO and Mut specification methodologies, we hope to show that, across a large variety of "reasonable" search heuristics, read-only annotations deliver better synthesis performance "on average".

For this set of experiments, we have focused on four characteristic programs from our performance benchmarks based on their pointer manipulation complexity: list segment copy (lcopy), insertion into a sorted list segment (insert), copying a tree (tcopy), and a variation of the tree copy that shares the same pointer for the input tree and its returned copy (tcopy-ptr).

*Exploring Different Unification Orders.* Since spatial unification lies at the core of the synthesis process, we implemented 6 different strategies for choosing a unification candidate, based on the following criteria: the size of the heaplet chunk (favouring the smallest heap vs. the largest one as the best unification candidate), the name of the predicate (we considered both an ascending and a descending priority queue), and a customised ranking function which associates a cost to a symbolic heap based on its kind: a block is cheaper to unify than a points-to, which in turn is cheaper than a spatial predicate.
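The customised ranking function can be sketched as follows (the concrete costs are hypothetical; only the relative order block < points-to < predicate is described above):

```python
# Sketch of the customised ranking heuristic (costs are hypothetical):
# blocks are cheapest to unify, then points-to heaplets, then full
# spatial predicates; candidates are tried in ascending cost order,
# breaking ties by the size of the heap chunk.
COST = {"block": 0, "points_to": 1, "predicate": 2}

def rank_candidates(heaplets):
    # heaplets: list of (kind, size) pairs.
    return sorted(heaplets, key=lambda h: (COST[h[0]], h[1]))

cands = [("predicate", 3), ("points_to", 1), ("block", 2)]
assert rank_candidates(cands)[0][0] == "block"
assert rank_candidates(cands)[-1][0] == "predicate"
```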

*Exploring Different Search Strategies.* We next designed 6 strategies for prioritising the rule applications. One of the crux rules in this matter is the Write rule, whose different priority schemes might make all the results seem randomly generated. In the cases where Write leads to unsolvable goals, one might rightfully argue that RO has a clear advantage over Mut (*fail fast*). However, for the cases where mutation leads to a solution faster, Mut might have an advantage over RO (*solve fast*). Because these are just intuitive observations, and for fairness' sake, we experimented with both the case where Write has a high priority and the one where it has a low priority in the queue of rule phases [34, § 5.2]. Since most of the benchmarks involve recursion, we also shuffled the priorities of the Open and Call rules, again choosing between a high and a low priority for these rules to give a fair chance to both tools.

We considered all combinations of the 6 unification permutations and the 6 rule-application permutations (plus the default one) to obtain 42 different proof search perturbations. In the narrative below, S denotes the set of the four synthesis problems, V the set of specification variants per problem, and K the set of the 42 proof search perturbations.


The distributions of the number of rules fired for each tool (RO and Mut) with the 42 perturbations over the 4 synthesis problems with 3 variants of specification each, that is, 1008 different synthesis runs, are summarised using the boxplots in Fig. 6. There is a boxplot corresponding to each pair of tool and synthesis problem. In the ideal case, each boxplot contains 126 data points, corresponding to the unique combinations (v, k) of a specification variation v ∈ V and a tool perturbation k ∈ K. A boxplot is the distribution of such data based on a six-number summary: minimum, first quartile, median, third quartile, maximum, and outliers. For example, the boxplot for tcopy-ptr corresponding to RO, containing 90 data points, reads as follows: "the synthesis processes fired between 64 and 256 rules, with most of the processes firing between 64 and 128 rules; there are three exceptions where the synthesiser fired more than 256 rules". Note that the y-axis represents the binary logarithm of the number of fired rules.

Fig. 6: Boxplots of variations in log2(numbers of applied rules) for synthesis perturbations. Numbers of data points for each example are given in parentheses.

Even though we attempted to synthesise each program 126 times for each tool, some attempts hit the timeout, and therefore their corresponding data points had to be eliminated from the boxplots. It is of note, though, that whenever RO with configuration (v, k) hit the timeout for a synthesis problem s ∈ S, so did Mut; hence both (RO, s, (v, k)) and (Mut, s, (v, k)) are omitted from the boxplots. The inverse did not hold: RO hit the timeout fewer times than Mut, hence RO is measured at a disadvantage (*i.e.*, more data points mean more opportunities to show worse results). Since insert collected the highest number of timeouts, we equalised it by removing non-matched entries across the two tools.

Despite RO's potential measurement disadvantage, the boxplots depict it as a clear winner. Not only does RO fire fewer rules in all the cases but, with the exception of insert, it is also more stable under the proof search perturbations: it varies a few orders of magnitude less than Mut does for the same configurations. Fig. 7 supports this observation by offering a more detailed view of the distributions of the numbers of fired rules per synthesis configuration. Taller bars show that more processes fall in the same range (*wrt.* the number of fired rules). For lcopy, tcopy, and tcopy-ptr, it is clear that Mut has a wider distribution of the number of fired rules, that is, Mut is *more sensitive* to the perturbations than RO. We additionally make some further observations:

Fig. 7: Distributions of log2(number of attempted rule applications).


We believe that the main take-aways from this set of experiments, along with the positive answer to the Research Question 4, are as follows:


# **5 Limitations and Discussion**

*Flexible aliasing.* Separating conjunction asserts that the heap can be split into two disjoint parts; in other words, it carries implicit non-aliasing information. Specifically, x ↦ − ∗ y ↦ − states that x and y are not aliased. Such assertions can be used to specify methods as below:

```
{x ↦ n ∗ y ↦ m ∗ ret ↦ x} sum(x, y, ret) {x ↦ n ∗ y ↦ m ∗ ret ↦ n + m}
```
Occasionally, enforcing x and y to be non-aliased is too restrictive, rejecting safe calls such as sum(p, p, q). Approaches that support immutable annotations permit such calls without compromising safety if both pointers, aliased or not, are annotated as read-only [9,13]. BoSSL does not support such flexible aliasing.

*Precondition strengthening.* Let us assume that srtl(x, n, lo, hi, α1, α2, α3) is an inductive predicate that describes a sorted linked list of size n, with lo and hi being the list's minimum and maximum payload values, respectively. Now, consider the following synthesis goal:

{x, y} ; {y ↦ x ∗ srtl(x, n, lo, hi, M, M, M)} ❀ {y ↦ n ∗ srtl(x, n, lo, hi, M, M, M)}

As stated, the goal clearly requires the program to compute the length n of the list. Imagine that we already have a function that does precisely that, even though it is stated in terms of a list predicate that does not enforce sortedness:

{ret ↦ x ∗ ls(x, n, a1, a2, a3)} length(x, ret) {ret ↦ n ∗ ls(x, n, a1, a2, a3)}

To solve the initial goal, the synthesiser could weaken the given precondition srtl(x, n, lo, hi, M, M, M) to ls(x, n, M, M, M), and then successfully synthesise a call to the length method. Unfortunately, the resulting goal, obtained after having emitted the call to length and applying Frame, is unsolvable:

{x, y} ; {ls(x, n, M, M, M)} ❀ {srtl(x, n, lo, hi, M, M, M)},

since the logic does not allow strengthening an arbitrary linked list to a *sorted* linked list without retaining the prior knowledge. Should we have adopted an alternative approach to read-only annotations [9,13], allowing the caller to retain the full permission of the sorted list, then the postcondition of length would *not* contain the list-related part of the heap and would only quantify over the result pointer {ret ↦ n}, thus leading to the solvable goal below:

{x, y} ; {srtl(x, n, lo, hi, M, M, M)} ❀ {srtl(x, n, lo, hi, M, M, M)}

One straightforward way for BoSSL to cope with this limitation is to simply add a version of length annotated with specifications that cater to srtl.

*Overcoming the limitations.* While the "caller keeps the permission" kind of approach would buy us flexible aliasing and calls with weaker specifications, it would compromise the benefits discussed earlier with respect to the granularity of borrow-polymorphic inductive predicates. One possible solution to gain the best of both worlds would be to design a permission system which allows both borrow-polymorphic inductive predicates as well as read-only modalities to co-exist, where the latter would overwrite the predicate's mixed permissions. In other words, the read-only modality enforces a read-only treatment of the predicate irrespective of its permission arguments, while the permission arguments control the treatment of a mutable predicate. The theoretical implications of such a design choice are left as part of future work.

*Extending read-only specifications to concurrency.* Thus far we have only investigated the synthesis of sequential programs, for which read-only annotations helped to reduce the synthesis cost. Assuming that the synthesiser has the capability to synthesise concurrent programs as well, the borrows annotation mechanism in its current form may not be able to cope with general resource sharing.

This is because a callee which requires read-only permissions to a particular symbolic heap still consumes the entire required symbolic heap from the caller, despite the read-only requirement; hence, there is no room left for sharing. That said, the recently proposed alternative approaches to read-only annotations [9, 13] have no formal support for heap sharing in the presence of concurrency either. To address these challenges, we could adopt a more sophisticated approach based on a fractional permissions mechanism [7,8,20,25,30], but this is left as future work since it is orthogonal to the current scope.

# **6 Related Work**

*Language design.* There is a large body of work on integrating access permissions into practical type systems [5,16,42] (see, *e.g.*, the survey by Clarke *et al.* [10]). One notable such system, which is the closest in its spirit to our proposal, is the borrows type system of the Rust programming language [1] proved safe with RustBelt [22]. Similar to our approach, borrows in Rust are short-lived: in Rust they share the scope with the owner; in our approach they do not escape the scope of a method call. In contrast with our work, Rust's type system carefully manages different references to data by imposing strict sharing constraints, whereas in our approach the treatment of aliasing is taken care of automatically by building on Separation Logic. Moreover, Rust allows read-only borrows to be duplicated, while in the sequential setting of BoSSL this is currently not possible.

Somewhat related to our approach, Naden *et al.* propose a mechanism for borrowing permissions, albeit integrated as a fundamental part of a type system [31]. Their type system comes equipped with *change permissions* which enforce the borrowing requirements and describe the effects of the borrowing upon return. As a result of treating permissions as first-class values, we do not need to explicitly describe the flow of permissions for each borrow since this is controlled by a mix of the substitution and unification principles.

*Program verification with read-only permissions.* Boyland introduced fractional permissions to statically reason about interference in the presence of *shared-memory concurrency* [8]. A permission p denotes full resource ownership (i.e. read-write access) when p = 1, while p ∈ (0, 1) denotes partial ownership (i.e. read-only access). To leverage permissions in practice, a system must support two key operations: permission splitting and permission borrowing. Permission splitting (and merging back) follows the split rule: $x \xrightarrow{\,p\,} a \;=\; x \xrightarrow{\,p_1\,} a \,*\, x \xrightarrow{\,p_2\,} a$, with $p = p_1 + p_2$ and $p, p_1, p_2 \in (0, 1]$. Permission borrowing refers to the safe manipulation of permissions: a callee may remove some permissions from the caller, use them temporarily, and give them back upon return.
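As an illustration, the arithmetic side of the split rule can be modelled with exact rationals. The following is our own minimal sketch of permission accounting, not the implementation of any of the cited tools; the names `split` and `merge` are ours.

```python
from fractions import Fraction

def split(p, p1):
    """Split permission p into p1 and p - p1 (the split rule, left to right).
    Every permission must lie in the half-open interval (0, 1]."""
    p, p1 = Fraction(p), Fraction(p1)
    p2 = p - p1
    for q in (p, p1, p2):
        if not 0 < q <= 1:
            raise ValueError(f"invalid fractional permission: {q}")
    return p1, p2

def merge(p1, p2):
    """Merge two partial permissions back (the split rule, right to left)."""
    p = Fraction(p1) + Fraction(p2)
    if not 0 < p <= 1:
        raise ValueError(f"invalid fractional permission: {p}")
    return p

# Full ownership can be split into two read-only halves and merged back;
# a split that would leave one side with permission 0 is rejected, since
# 0 is not a valid fractional permission.
half, rest = split(1, Fraction(1, 2))
assert merge(half, rest) == 1
```

The rejected zero split is precisely the obstacle that surfaces below when a caller tries to hand a callee full ownership of a list's payload while retaining the shape.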

Though it exists, tool support for fractional permissions is still scarce. Leino and Müller introduced a mechanism for storing fractional permissions in data structures via dedicated access predicates in the Chalice verification tool [27]. To promote generic specifications, Heule *et al.* extended Chalice with instantiable abstract permissions, allowing automatic firing of the split rule and symbolic borrowing [20]. VeriFast [21] is guided by contracts written in Separation Logic and assumes the existence of lemmas to cater for permission splitting. Viper [30] is an intermediate language which supports various permission models, including abstract fractional permissions [4,43]. Similar to Chalice, permissions are attached to memory locations using an accessibility predicate. To reason about them, Viper uses permission-aware assertions and assumptions, which correspond in our approach to the unification and the substitution operations, respectively. Like Viper, we enhance the basic memory constructors, that is, blocks and points-to, to account for permissions; in contrast, the Call rule in our approach is standard, *i.e.*, *not* permission-aware.

These tools, along with others [3, 18], offer strong correctness guarantees in the presence of resource sharing. However, there is a class of problems, namely those involving predicates with mixed permissions, whose guarantees are weakened due to the general fractional permissions model behind these tools. We next exemplify this class of problems in a sequential setting. We start by considering a method which resets the values stored in a linked-list while maintaining its shape (p < 1 below is to enforce the immutable shape):

{p < 1; ls(x, S)[1, p]} void reset(loc x) {ls(x, {0})[1, p]}

Assume a call to this method, namely reset(y), where the caller has full permission over the entire list passed as argument, that is, ls(y, B)[1, 1]. This attempt leads to two issues. The first has to do with splitting the payload's permission (before the call) so that it matches the callee's postcondition. To be able to modify the list's payload, the callee must get the payload's full ownership, hence the caller should retain 0: ls(y, B)[1, 1] = ls(y, B)[0, 1/2] ∗ ls(y, B)[1, 1/2]. But 0 is not a valid fractional permission. The second issue surfaces while attempting to merge the permissions after the call: ls(y, B)[0, 1/2] ∗ ls(y, {0})[1, 1/2] is invalid since the two instances of ls have incompatible arguments (namely B and {0}). To avoid such problems, BoSSL abandons the split rule and instead always manipulates full ownership of resources, hence it does not use fractions. This compromise, along with the support for symbolic borrows, allows ROBoSuSLik to guarantee read-only-ness in a sequential setting while avoiding the aforementioned issues. More investigation is needed to lift this result to concurrency reasoning. Another feature distinguishing the current work from those based on fractional permissions is the support for permissions as parameters of a predicate, which in turn enables the definition of predicates with mixed permissions.

Immutable specifications on top of Separation Logic have also been studied by David and Chin [13]. Unlike our approach which treats borrows as polymorphic variables that rely on the basic concept of substitution, their annotation mechanism comprises only constants and requires a specially tailored entailment on top of enhanced proof rules. Since callers retain the heap ownership upon calling a method with read-only requirements, their machinery supports flexible aliasing and cut-point preservation—features that we could not find a good use for in the context of program synthesis. An attempt to extend David and Chin's work by adding support for predicates with mixed permissions [11] suffers from significant annotation overhead. Specifically, it employs a mix of mutable, immutable, and *absent* permissions, so that each mutable heaplet in the precondition requires a corresponding matching heaplet annotated with absent in the postcondition.

Charguéraud and Pottier [9] extended Separation Logic with RO assertions that can be freely duplicated or discarded. Their approach creates lexically-scoped copies of the RO-permissions before emitting a call, which, in turn, involves discarding the corresponding heap from the postcondition to guarantee a sound RO-modality. Adapting this modality to program synthesis guided by pre- and postconditions would require a completely new system of deductive synthesis, since most of the rules in SSL are not designed to handle the discardable RO-heaps. In contrast, BoSSL supports permission-parametric predicates (*e.g.*, (9)) requiring only minimal adjustments to its host logic, *i.e.*, SSL.

*Program synthesis.* BoSSL continues a long line of work on program synthesis from formal specifications [26, 36, 40, 41, 44] and in particular, *deductive synthesis* [14,23,29,33,34], which can be characterised as search in the space of *proofs* of program correctness (rather than in the space of programs). Most directly BoSSL builds upon our prior work on SSL [34] and enhances its specification language with read-only annotations. In that sense, the present work is also related to various approaches that use *non-functional* specifications as input to synthesis. It is common to use *syntactic* non-functional specifications, such as grammars [2], sketches [36, 40], or restrictions on the number of times a component can be used [19]. More recent work has explored *semantic* non-functional specifications, including type annotations for resource consumption [24] and security/privacy [17,35,39]. This research direction is promising because (a) annotations often enable the programmer to express a strong specification concisely, and (b) checking annotations is often more compositional (*i.e.*, fails faster) than checking functional specifications, which makes synthesis more efficient. In the present work we have demonstrated that both of these benefits of non-functional specifications also hold for the read-only annotations of BoSSL.

# **7 Conclusion**

In this work, we have advanced the state of the art in program synthesis by highlighting the benefits of guiding the synthesis process with information about memory access permissions. We have designed the logic BoSSL and implemented the tool ROBoSuSLik, showing that a minimalistic discipline for read-only permissions already brings significant improvements *wrt.* the performance and robustness of the synthesiser, as well as *wrt.* the quality of its generated programs.

*Acknowledgements.* We thank Alexander J. Summers, Cristina David, Olivier Danvy, and Peter O'Hearn for their comments on the preliminary versions of the paper. We are very grateful to the ESOP 2020 reviewers for their detailed feedback, which helped to conduct a more adequate comparison with related approaches and, thus, better frame the conceptual contributions of this work.

Nadia Polikarpova's research was supported by NSF grant 1911149. Amy Zhu's research internship and stay in Singapore during the Summer 2019 was supported by Ilya Sergey's start-up grant at Yale-NUS College, and made possible thanks to UBC Science Co-op Program.

# **References**



# **Soundness conditions for big-step semantics**

Francesco Dagnino<sup>1</sup> , Viviana Bono<sup>2</sup> , Elena Zucca<sup>1</sup> , and Mariangiola Dezani-Ciancaglini<sup>2</sup>

<sup>1</sup> DIBRIS, University of Genova, Italy

<sup>2</sup> Computer Science Department, University of Torino, Italy

**Abstract.** We propose a general proof technique to show that a predicate is sound, that is, prevents stuck computation, with respect to a big-step semantics. This result may look surprising, since in big-step semantics there is no difference between non-terminating and stuck computations, hence soundness cannot even be expressed. The key idea is to define constructions yielding an extended version of a given arbitrary big-step semantics, where the difference is made explicit. The extended semantics are exploited in the meta-theory, notably they are necessary to show that the proof technique works. However, they remain transparent when using the proof technique, since it consists in checking three conditions on the original rules only, as we illustrate by several examples.

# **1 Introduction**

The semantics of programming languages or software systems specifies, for each program/system configuration, its final result, if any. In the case of non-existence of a final result, there are two possibilities:

- the computation does not terminate (*divergence*);
- the computation gets *stuck*: it cannot proceed, yet no final result has been reached.
There are two main styles to define operationally a semantic relation: the *small-step* style [34,35], on top of a reduction relation representing single computation steps, or directly by a set of rules as in the *big-step* style [28]. Within a small-step semantics it is straightforward to make the distinction between stuck and non-terminating computations, while a typical drawback of the big-step style is that they are not distinguished (no judgement is derived in both cases).

For this reason, even though big-step semantics is generally more abstract, and sometimes more intuitive to design and therefore to debug and extend, in the literature much more effort has been devoted to studying the meta-theory of small-step semantics, providing properties and related proof techniques. Notably, the *soundness* of a type system (typing prevents stuck computation) can be proved by *progress* and *subject reduction* (also called *type preservation*) [40].

Our quest is then to provide a general proof technique to prove the soundness of a predicate with respect to an arbitrary big-step semantics. How can we achieve this result, given that in big-step formulation soundness cannot even be *expressed*, since non-termination is modelled as the absence of a final result exactly like stuck computation? The key idea is the following:

(1) we define constructions yielding extended versions of the given big-step semantics, in which the distinction between stuck computation and non-termination is made explicit;
(2) we identify three sufficient conditions on the original big-step rules which ensure soundness with respect to the extended semantics.
Keypoint (2)'s three sufficient conditions are *local preservation*, ∃*-progress*, and ∀*-progress*. To *prove* that the three conditions actually ensure soundness, setting up the extended semantics from the given one is necessary, since otherwise, as said above, we could not even express the property.

*However, the three conditions deal only with the original rules of the given big-step semantics.* This means that, practically, in order to use the technique there is no need to deal with the extended semantics. This implies, in particular, that our approach does *not* increase the original number of rules. Moreover, the sufficient conditions are checked only on *single rules*, which makes explicit the proof fragments typically needed in a proof of soundness. Even though this is not exploited in this paper, this form of *locality* means *modularity*, in the sense that adding a new rule implies adding the corresponding proof fragment only.

As an important by-product, in order to formally define and prove correct the keypoints (1) and (2), we propose a formalisation of "what is a big-step semantics" which captures its essential features. Moreover, we support our approach by presenting several examples, demonstrating that: on the one hand, their soundness proof can be easily rephrased in terms of our technique, that is, by directly reasoning on big-step rules; on the other hand, our technique is essential when the property to be checked (for instance, the soundness of a type system) is *not preserved* by intermediate computation steps, whereas it holds for the final result. On a side note, our examples concern type systems, but the meta-theory we present in this work holds for any predicate.

We describe now in more detail the constructions of keypoint (1). Starting from an arbitrary big-step judgment *c* ⇒ *r* that evaluates *configurations c* into *results r* , the *first construction* produces an enriched judgement *c* ⇒tr t where t is a *trace*, that is, the (finite or infinite) sequence of all the (sub)configurations encountered during the evaluation. In this way, by interpreting coinductively the rules of the extended semantics, an infinite trace models divergence (whereas no result corresponds to stuck computation). The *second construction* is in a sense dual. It is the *algorithmic* version of the well-known technique presented in Exercise 3.5.16 from the book [33] of adding a special result wrong explicitly modelling stuck computations (whereas no result corresponds to divergence).

By trace semantics and wrong semantics we can express two flavours of soundness, *soundness-may* and *soundness-must*, respectively, and show the correctness of the corresponding proof technique. This achieves our original aim, and it should be noted that *we define soundness with respect to a big-step semantics* *within a big-step formulation*, without resorting to a small-step style (indeed, the two extended semantics are themselves big-step).

Lastly, we consider the issue of justifying on a formal basis that the two constructions are correct with respect to their expected meaning. For instance, for the wrong semantics we would like to be sure that *all* the cases are covered. To this end, we define a *third construction*, dubbed pev for "partial evaluation", which makes explicit the *computations* of a big-step semantics, intended as the sequences of execution steps of the naturally associated evaluation algorithm. Formally, we obtain a reduction relation on approximated proof trees, so termination, non-termination and stuckness can be defined as usual. Then, the correctness of the trace and wrong constructions is proved by showing that they are equivalent to pev for diverging and stuck computations, respectively.

In Sect. 2 we illustrate the meta-theory on a running example. In Sect. 3 we define the trace and wrong constructions. In Sect. 4 we express soundness in the *must* and *may* flavours, introduce the proof technique, and prove its correctness. In Sect. 5 we show in detail how to apply the technique to the running example, and other significant examples. In Sect. 6 we introduce the third construction and state that the three constructions are equivalent. Finally, in Sect. 7 and Sect. 8 we discuss related and further work and summarise our contribution. An extended version including an additional example, proofs omitted for lack of space, and technical details on the pev semantics, can be found at http://arxiv.org/abs/2002.08738.

# **2 A meta-theory for big-step semantics**

We introduce a formalisation of "what is a big-step semantics" that captures its essential features, subsuming a large class of examples (as testified in Sect. 5). This enables a general formal reasoning on an arbitrary big-step semantics.

A *big-step semantics* is a triple *C*, *R*, R where:

- *C* is a set of *configurations*, and *R* ⊆ *C* is a set of *results*;
- R is a set of *rules*, described below.
**–** Each rule in R has the shape

$$\frac{j_1 \;\ldots\; j_n \qquad j_{n+1}}{c \Rightarrow R(j_{n+1})}$$

also written in *inline format*: rule(j1 … jn, jn+1, *c*), with *c* ∈ *C* \ *R*, where j1 … jn are the *dependencies* and jn+1 is the *continuation*. Set *C*(ρ) = *c* and, for i ∈ 1..n+1, *C*(ρ, i) = *C*(ji) and *R*(ρ, i) = *R*(ji).

**–** For each result *r* ∈ *R*, we implicitly assume a single axiom *r* ⇒ *r*. Hence, the only derivable judgment for *r* is *r* ⇒ *r*, which we will call a *trivial* judgment.

We will use the inline format, which is more concise and manageable, for the development of the meta-theory, e.g., in the constructions.

A rule corresponds to the following evaluation process for a non-result configuration: first, the dependencies are evaluated in the given order, then the continuation is evaluated and its result is returned as the result of the entire computation.

$$\begin{array}{rcll} e & ::= & x \mid v \mid e_1\, e_2 \mid \mathsf{succ}\ e \mid e_1 \oplus e_2 & \quad\text{expression}\\ v & ::= & n \mid \lambda x.e & \quad\text{value} \end{array}$$


$$\begin{array}{ll} (\text{app}) & \mathsf{rule}(e_1 \Rightarrow \lambda x.e \;\; e_2 \Rightarrow v_2,\; e[v_2/x] \Rightarrow v,\; e_1\, e_2)\\ (\text{succ}) & \mathsf{rule}(e \Rightarrow n,\; n+1 \Rightarrow n+1,\; \mathsf{succ}\ e)\\ (\text{choice}) & \mathsf{rule}(\epsilon,\; e_i \Rightarrow v,\; e_1 \oplus e_2) \quad i = 1, 2 \end{array}$$

**Fig. 1.** Example of big-step semantics
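The evaluation process described by these rules can be read off as a recursive evaluator. The following Python sketch uses our own tuple encoding of expressions (not part of the paper); it evaluates dependencies left to right and then the continuation, and resolves (choice) with an explicit `pick` parameter instead of non-deterministically.

```python
# Expressions encoded as tuples (our own encoding):
#   ('num', n) | ('lam', x, body) | ('var', x)
#   | ('app', e1, e2) | ('succ', e) | ('choice', e1, e2)

def subst(e, x, v):
    """Substitution e[v/x]; v is closed, so no variable capture can occur."""
    tag = e[0]
    if tag == 'var':
        return v if e[1] == x else e
    if tag == 'num':
        return e
    if tag == 'lam':
        return e if e[1] == x else ('lam', e[1], subst(e[2], x, v))
    return (tag,) + tuple(subst(s, x, v) for s in e[1:])

def ev(e, pick=0):
    """Big-step evaluation of Fig. 1; `pick` chooses the branch of (choice)."""
    tag = e[0]
    if tag in ('num', 'lam'):            # implicit axiom r => r
        return e
    if tag == 'succ':                    # (succ)
        v = ev(e[1], pick)
        if v[0] != 'num':
            raise ValueError("stuck: succ of a non-number")
        return ('num', v[1] + 1)
    if tag == 'app':                     # (app): left-to-right order
        f = ev(e[1], pick)
        if f[0] != 'lam':
            raise ValueError("stuck: applying a non-function")
        v2 = ev(e[2], pick)
        return ev(subst(f[2], f[1], v2), pick)   # continuation e[v2/x] => v
    if tag == 'choice':                  # (choice): evaluate the chosen e_i
        return ev(e[1 + pick], pick)
    raise ValueError("stuck: free variable")

# ev(('succ', ('app', ('lam', 'x', ('succ', ('var', 'x'))), ('num', 1))))
# evaluates to ('num', 3)
```

Stuck configurations (a free variable, succ of a function, application of a number) have no matching rule and raise an exception; the constructions of Sect. 3 make this outcome explicit in the semantics itself.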

Rules as defined above specify an inference system [1,30], whose inductive interpretation is, as usual, the semantic relation. However, they carry slightly more structure with respect to standard inference rules. Notably, premises are a sequence rather than a set, and the last premise plays a special role. Such additional structure does not affect the semantic relation defined by the rules, but allows abstract reasoning about an arbitrary big-step semantics, in particular it is relevant for defining the three constructions. In the following, we will write R ⊢ *c* ⇒ *r* when the judgment *c* ⇒ *r* is derivable in R.

As customary, the (infinite) set of rules R is described by a finite set of meta-rules, each one with a finite number of premises. As a consequence, the number of premises of rules is not only finite but *bounded*. Since we have no notion of meta-rule, we model this feature (relevant in the following) as an explicit assumption:

BP there exists b ∈ N such that, for each ρ ≡ rule(j1 … jn, jn+1, *c*), n < b.

We end this section by illustrating the above definitions and conditions on a simple example: a λ-calculus with natural constants, successor, and non-deterministic choice, shown in Fig. 1. We present this example as an instance of our definition:

	- axiom (val) can be omitted (it is implicitly assumed)
	- in (app) we consider premises as a sequence rather than a set (the third premise is the continuation)
	- in (succ), which has no continuation, we add a dummy continuation
	- on the contrary, in (choice) there is only the continuation (dependencies are the empty sequence, denoted ε in the inline format).

Note that (app) corresponds to the standard left-to-right evaluation order. We could have chosen the right-to-left order instead:

$$(\text{app-r}) \quad \mathsf{rule}(e_2 \Rightarrow v_2 \;\; e_1 \Rightarrow \lambda x.e,\; e[v_2/x] \Rightarrow v,\; e_1\, e_2)$$

or even opt for a non-deterministic approach by taking both rules (app) and

<sup>3</sup> In general, configurations may include additional components, see Sect. 5.2.

(app-r). As said above, these different choices do not affect the semantic relation *c* ⇒ *r* defined by the inference system, which is always the same. However, they will affect the way the extended semantics distinguishing stuck computation and non-termination is constructed. Indeed, if the evaluation of *e*<sup>1</sup> and *e*<sup>2</sup> is stuck and non-terminating, respectively, we should obtain stuck computation with rule (app) and non-termination with rule (app-r).

In summary, to see a typical big-step semantics as an instance of our definition, it is enough to assume an order (or more than one) on premises, make implicit the axiom for results, and add a dummy continuation when needed. In the examples (Sect. 5), we will assume a left-to-right order on premises, and omit dummy continuations to keep a more familiar style. In the technical part (Sect. 3, Sect. 4 and Sect. 6) we will adopt the inline format.

# **3 Extended semantics**

In the following, we assume a big-step semantics *C*, *R*, R and describe two constructions which make the distinction between non-termination and stuck computation explicit. In both cases, the approach is based on well-known ideas; the novel contribution is that, thanks to the meta-theory in Sect. 2, we provide a *general* construction working on an arbitrary big-step semantics.

#### **3.1 Traces**

We denote by $C^\star$, $C^\omega$, and $C^\infty = C^\star \cup C^\omega$, respectively, the sets of finite, infinite, and possibly infinite *traces*, that is, sequences of configurations. We write $t \cdot t'$ for the concatenation of $t \in C^\star$ with $t' \in C^\infty$.

We derive, from the judgement *c* ⇒ *r*, an enriched big-step judgement *c* ⇒tr t with $t \in C^\infty$. Intuitively, t keeps track of all the configurations visited during the evaluation, starting from *c* itself. To define the trace semantics, we construct, starting from R, a new set of rules Rtr, which are of two kinds:

**trace introduction** These rules enrich the standard semantics with finite traces: for each ρ ≡ rule(j1 … jn, jn+1, *c*) in R and finite traces $t_1, \ldots, t_{n+1} \in C^\star$, we add the rule

$$\frac{C(j_1) \Rightarrow_{\mathsf{tr}} t_1 \cdot R(j_1) \quad \dots \quad C(j_{n+1}) \Rightarrow_{\mathsf{tr}} t_{n+1} \cdot R(j_{n+1})}{c \Rightarrow_{\mathsf{tr}} c \cdot t_1 \cdot R(j_1) \cdot \ldots \cdot t_{n+1} \cdot R(j_{n+1})}$$

We denote this rule by trace(ρ, t1, …, tn+1), to highlight the relationship with the original rule ρ. We also add one axiom *r* ⇒tr *r* for each result *r*. Such rules derive judgements $c \Rightarrow_{\mathsf{tr}} t$ with $t \in C^\star$, for convergent computations.

**divergence propagation** These rules propagate divergence, that is, if a (sub)configuration in the premise of a rule diverges, then the subsequent premises are ignored and the configuration in the conclusion diverges as well: for each ρ ≡ rule(j1 … jn, jn+1, *c*) in R, index i ∈ 1..n+1, finite traces $t_1, \ldots, t_{i-1} \in C^\star$, and infinite trace t, we add the rule:

$$\frac{C(j_1) \Rightarrow_{\mathsf{tr}} t_1 \cdot R(j_1) \quad \dots \quad C(j_{i-1}) \Rightarrow_{\mathsf{tr}} t_{i-1} \cdot R(j_{i-1}) \quad C(j_i) \Rightarrow_{\mathsf{tr}} t}{c \Rightarrow_{\mathsf{tr}} c \cdot t_1 \cdot R(j_1) \cdot \ldots \cdot t_{i-1} \cdot R(j_{i-1}) \cdot t}$$

$$(\text{app-trace})\;\; \frac{e_1 \Rightarrow_{\mathsf{tr}} t_1 \cdot \lambda x.e \quad e_2 \Rightarrow_{\mathsf{tr}} t_2 \cdot v_2 \quad e[v_2/x] \Rightarrow_{\mathsf{tr}} t \cdot v}{e_1\, e_2 \Rightarrow_{\mathsf{tr}} e_1\, e_2 \cdot t_1 \cdot \lambda x.e \cdot t_2 \cdot v_2 \cdot t \cdot v}\quad t_1, t_2, t \in C^\star$$

$$(\text{div-app-1})\;\; \frac{e_1 \Rightarrow_{\mathsf{tr}} t}{e_1\, e_2 \Rightarrow_{\mathsf{tr}} e_1\, e_2 \cdot t}\quad t \in C^\omega$$

$$(\text{div-app-2})\;\; \frac{e_1 \Rightarrow_{\mathsf{tr}} t_1 \cdot \lambda x.e \quad e_2 \Rightarrow_{\mathsf{tr}} t}{e_1\, e_2 \Rightarrow_{\mathsf{tr}} e_1\, e_2 \cdot t_1 \cdot \lambda x.e \cdot t}\quad t_1 \in C^\star,\ t \in C^\omega$$

$$(\text{div-app-3})\;\; \frac{e_1 \Rightarrow_{\mathsf{tr}} t_1 \cdot \lambda x.e \quad e_2 \Rightarrow_{\mathsf{tr}} t_2 \cdot v_2 \quad e[v_2/x] \Rightarrow_{\mathsf{tr}} t}{e_1\, e_2 \Rightarrow_{\mathsf{tr}} e_1\, e_2 \cdot t_1 \cdot \lambda x.e \cdot t_2 \cdot v_2 \cdot t}\quad t_1, t_2 \in C^\star,\ t \in C^\omega$$

# **Fig. 2.** Trace semantics for application

We denote this rule by prop(ρ, i, t1, …, t<sub>i−1</sub>, t) to highlight the relationship with the original rule ρ. These rules derive judgements $c \Rightarrow_{\mathsf{tr}} t$ with $t \in C^\omega$, modelling diverging computations.

The inference system Rtr must be interpreted *coinductively*, to properly model diverging computations. Indeed, since there is no axiom introducing an infinite trace, they can be derived only by an infinite proof tree. We write Rtr ⊢ *c* ⇒tr t when the judgment *c* ⇒tr t is derivable in Rtr.

We show in Fig. 2 the rules obtained starting from meta-rule (app) of the example (for other meta-rules the outcome is analogous).

For instance, set Ω = ω ω = (λ*x*.*x x*) (λ*x*.*x x*), and let t<sub>Ω</sub> be the infinite trace Ω · ω · ω · Ω · ω · ω · ⋯; it is easy to see that the judgment Ω ⇒tr t<sub>Ω</sub> can be derived by the following infinite tree:<sup>4</sup>

$$\frac{\omega \Rightarrow_{\mathsf{tr}} \omega \qquad \omega \Rightarrow_{\mathsf{tr}} \omega \qquad \begin{array}{c}\vdots\\ \Omega \Rightarrow_{\mathsf{tr}} t_\Omega\end{array}}{\Omega \Rightarrow_{\mathsf{tr}} \Omega \cdot \omega \cdot \omega \cdot t_\Omega}$$

Here the rule applied is (div-app-3), the first two premises are instances of the implicit axiom for results, and the third premise, since (*x x*)[ω/*x*] = ω ω = Ω, is derived by an unfolding of the same infinite tree.

Note that *only* the judgment Ω ⇒tr t<sub>Ω</sub> can be derived, that is, the trace semantics of Ω is uniquely determined to be t<sub>Ω</sub>, since the infinite proof tree forces the equation t<sub>Ω</sub> = Ω · ω · ω · t<sub>Ω</sub>. This example is a cyclic proof, but there are divergent computations with no circular derivation.

The trace construction is *conservative* with respect to the original semantics, that is, converging computations are not affected.

**Theorem 1.** Rtr ⊢ *c* ⇒tr t · *r for some* $t \in C^\star$ *iff* R ⊢ *c* ⇒ *r.*
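For convergent computations, the trace construction can be mimicked by instrumenting an evaluator to record every visited configuration in evaluation order. This is our own sketch over a fragment of the running example (tuple encoding ours); it can only exhibit the finite traces of Theorem 1, since a diverging computation simply does not return.

```python
# Expressions: ('num', n) | ('lam', x, body) | ('var', x)
#            | ('app', e1, e2) | ('succ', e)   (our own encoding)

def subst(e, x, v):
    """Substitution e[v/x] for closed v."""
    if e[0] == 'var':
        return v if e[1] == x else e
    if e[0] == 'num':
        return e
    if e[0] == 'lam':
        return e if e[1] == x else ('lam', e[1], subst(e[2], x, v))
    return (e[0],) + tuple(subst(s, x, v) for s in e[1:])

def ev_tr(e, trace):
    """Evaluate e, appending every visited configuration to `trace`.
    For a convergent computation the trace ends with the result,
    matching the shape c =>tr t . r of Theorem 1."""
    trace.append(e)                 # the conclusion's trace starts with c
    if e[0] in ('num', 'lam'):      # axiom r =>tr r: the trace is just r
        return e
    if e[0] == 'succ':
        v = ev_tr(e[1], trace)      # dependency: contributes t1 . n
        r = ('num', v[1] + 1)
        trace.append(r)             # trivial continuation n+1 =>tr n+1
        return r
    if e[0] == 'app':
        f = ev_tr(e[1], trace)      # t1 . \x.e
        v2 = ev_tr(e[2], trace)     # t2 . v2
        return ev_tr(subst(f[2], f[1], v2), trace)   # continuation: t . v
    raise ValueError("stuck")

# For succ (succ 0) the recorded trace is
# succ (succ 0) . succ 0 . 0 . 1 . 2, ending with the result 2.
```

Each recursive call contributes exactly the premise's trace t_i · R(j_i), so the flat list built by `ev_tr` coincides with the conclusion trace of the (trace-introduction) rule.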

#### **3.2 Wrong**

A well-known technique [33] (Exercise 3.5.16) to distinguish between stuck and diverging computations, in a sense "dual" to the previous one, is to add a special result wrong, so that *c* ⇒ wrong means that the evaluation of *c* goes stuck.

In this case, defining an "automatic" version of the construction, starting from *C*, *R*, R, is a non-trivial problem. Our solution is based on defining a relation on rules, modelling *equality up to a certain index* i, also used for other aims

<sup>4</sup> To help the reader, we add equivalent expressions with a grey background.

in the following. Consider ρ ≡ rule(j1 … jn, jn+1, *c*), ρ′ ≡ rule(j′1 … j′m, j′m+1, *c*′), and an index i ∈ 1..min(n+1, m+1); then ρ ∼<sub>i</sub> ρ′ if

- *c* = *c*′
- for all k < i, j<sub>k</sub> = j′<sub>k</sub>
- *C*(j<sub>i</sub>) = *C*(j′<sub>i</sub>)

Intuitively, this means that rules ρ and ρ′ model the same computation until the i-th premise. Using this relation, we derive, from the judgment *c* ⇒ *r*, an enriched big-step judgement *c* ⇒ *r*wr where *r*wr ∈ *R* ∪ {wrong}, defined by a set of rules Rwr containing all rules in R and two other kinds of rules:

**wrong introduction** These rules derive wrong whenever the (sub)configuration in a premise of a rule reduces to a result which is not admitted in that rule (or any equivalent one): for each ρ ≡ rule(j1 … jn, jn+1, *c*) in R, index i ∈ 1..n+1, and result *r* ∈ *R*, if for all rules ρ′ such that ρ ∼<sub>i</sub> ρ′ we have *R*(ρ′, i) ≠ *r*, then we add the rule wrong(ρ, i, *r*) as follows:

$$\frac{j_1 \quad \dots \quad j_{i-1} \quad C(j_i) \Rightarrow r}{c \Rightarrow \mathsf{wrong}}$$

We also add an axiom *c* ⇒ wrong for each configuration *c* which is not the conclusion of any rule.

**wrong propagation** These rules propagate wrong analogously to those for divergence propagation: for each ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) in R, and index i ∈ 1..n + 1, we add the rule prop(ρ, i,wrong) as follows:

$$\frac{j\_1 \dots j\_{i-1} \quad C(j\_i) \Rightarrow \text{wrong}}{c \Rightarrow \text{wrong}}$$

We write Rwr ⊢ *c* ⇒ *r*wr when the judgment *c* ⇒ *r*wr is derivable in Rwr.

We show in Fig. 3 the meta-rules for wrong introduction and propagation constructed starting from those for application and successor. For instance, rule (wrong-app) is introduced since in the original semantics there is rule (app) with *e*<sup>1</sup> *e*<sup>2</sup> in the consequence and *e*<sup>1</sup> in the first premise, but there is no equivalent rule (that is, with *e*<sup>1</sup> *e*<sup>2</sup> in the consequence and *e*<sup>1</sup> in the first premise) such that the result in the first premise is n.
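The "no equivalent rule admits this result" check rests on the relation ∼<sub>i</sub>, which is purely syntactic. The following is our own sketch, assuming rules are represented in the inline format as Python tuples `(dependencies, continuation, conclusion_config)` with judgments as `(configuration, result)` pairs; all names are ours.

```python
def premises(rule):
    """All premises j_1 .. j_{n+1} of rule(j_1 .. j_n, j_{n+1}, c), in order."""
    deps, cont, _c = rule
    return list(deps) + [cont]

def config(judgment):
    """C(j): the configuration of a judgment j = (c, r)."""
    return judgment[0]

def equal_up_to(rho, rho2, i):
    """rho ~_i rho2: same conclusion, identical first i-1 premises,
    and the i-th premises share the same configuration (i is 1-based)."""
    js, js2 = premises(rho), premises(rho2)
    if i > min(len(js), len(js2)):
        return False
    return (rho[2] == rho2[2]
            and js[:i - 1] == js2[:i - 1]
            and config(js[i - 1]) == config(js2[i - 1]))
```

For instance, two rules with the same conclusion and first premise but different results in their second premise are ∼<sub>2</sub>-equivalent, exactly the situation the wrong-introduction side condition quantifies over.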

The wrong construction is conservative as well.

**Theorem 2.** Rwr ⊢ *c* ⇒ *r iff* R ⊢ *c* ⇒ *r.*

$$\begin{array}{cc}
(\text{wrong-app})\ \dfrac{e_1 \Rightarrow n}{e_1\ e_2 \Rightarrow \mathtt{wrong}} &
(\text{wrong-succ})\ \dfrac{e \Rightarrow \lambda x.e'}{\mathtt{succ}\ e \Rightarrow \mathtt{wrong}} \\\\
\dfrac{e_1 \Rightarrow \mathtt{wrong}}{e_1\ e_2 \Rightarrow \mathtt{wrong}} &
\dfrac{e_1 \Rightarrow \lambda x.e \quad e_2 \Rightarrow \mathtt{wrong}}{e_1\ e_2 \Rightarrow \mathtt{wrong}} \\\\
\dfrac{e_1 \Rightarrow \lambda x.e \quad e_2 \Rightarrow v_2 \quad e[v_2/x] \Rightarrow \mathtt{wrong}}{e_1\ e_2 \Rightarrow \mathtt{wrong}} &
\dfrac{e \Rightarrow \mathtt{wrong}}{\mathtt{succ}\ e \Rightarrow \mathtt{wrong}}
\end{array}$$

**Fig. 3.** λ-calculus: meta-rules for wrong introduction and propagation
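As an illustration (ours, not from the paper), the wrong construction for the λ-calculus fragment of Sect. 2 can be sketched as an interpreter that returns either a value or the extra result wrong; the tuple encoding of expressions is an assumption of this sketch:

```python
# Hypothetical sketch: big-step evaluation extended with an explicit "wrong"
# result, mirroring the wrong-introduction and wrong-propagation meta-rules.
# Expressions: ("num", n) | ("var", x) | ("lam", x, e) | ("app", e1, e2) | ("succ", e)

WRONG = "wrong"

def subst(e, x, v):
    # Replace the free occurrences of x in e by the (closed) value v.
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    if tag == "app":
        return ("app", subst(e[1], x, v), subst(e[2], x, v))
    if tag == "succ":
        return ("succ", subst(e[1], x, v))

def evaluate(e):
    tag = e[0]
    if tag in ("num", "lam"):            # values: axiom v => v
        return e
    if tag == "var":                     # no rule applies: axiom c => wrong
        return WRONG
    if tag == "succ":
        r = evaluate(e[1])
        if r == WRONG:                   # wrong propagation
            return WRONG
        if r[0] != "num":                # (wrong-succ): result is not a numeral
            return WRONG
        return ("num", r[1] + 1)
    if tag == "app":
        r1 = evaluate(e[1])
        if r1 == WRONG:                  # propagation on the first premise
            return WRONG
        if r1[0] != "lam":               # (wrong-app): result is not a lambda
            return WRONG
        r2 = evaluate(e[2])
        if r2 == WRONG:                  # propagation on the second premise
            return WRONG
        return evaluate(subst(r1[2], r1[1], r2))
```

For instance, `evaluate(("app", ("num", 0), ("num", 0)))` yields `"wrong"` by (wrong-app), while `evaluate(("succ", ("num", 2)))` yields `("num", 3)`. On diverging terms such as Ω the sketch loops, matching the fact that the wrong extension does not model divergence.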

# **4 Expressing and proving soundness**

A predicate (for instance, a typing judgment) is *sound* when, informally, a program satisfying the predicate (e.g., a well-typed program) cannot *go wrong*, following Robin Milner's slogan [31]. In small-step style, as first formulated in [40], this is naturally expressed as follows: well-typed programs never reduce to terms which are neither values nor can be further reduced (called *stuck* terms). The standard technique to ensure soundness is by subject reduction (well-typedness is preserved by reduction) and progress (a well-typed term is not stuck).

We discuss how soundness can be expressed for the two approaches previously presented, and we introduce sufficient conditions. In other words, we provide a proof technique to show the soundness of a predicate with respect to a big-step semantics. As mentioned in the Introduction, the extended semantics is only needed to prove the correctness of the technique, whereas to *apply* the technique for a given big-step semantics it is enough to reason on the original rules.

#### **4.1 Expressing soundness**

In the following, we assume a big-step semantics *C*, *R*, R, and an *indexed predicate on configurations*, that is, a family Π = (Πι)<sup>ι</sup>∈<sup>I</sup>, for *I* a set of *indexes*, with Π<sup>ι</sup> ⊆ *C*. A representative case is that, as in the examples of Sect. 5, the predicate is a typing judgment and the indexes are types; however, the proof technique could be applied to other kinds of predicates. When there is no ambiguity, we also denote by Π the corresponding predicate ⋃<sup>ι</sup>∈<sup>I</sup> Π<sup>ι</sup> on *C* (e.g., to be well-typed with an arbitrary type).

To discuss how to express soundness of Π, first of all note that, in the nondeterministic case (that is, there is possibly more than one computation for a configuration), we can distinguish two flavours of soundness [21]:

**soundness-must** (or simply soundness): no computation can be stuck
**soundness-may**: at least one computation is not stuck

Soundness-must is the standard soundness in small-step semantics, and can be expressed in the wrong extension as follows:

#### **soundness-must (wrong)** If *c* ∈ Π, then *c* ⇒ wrong is *not* derivable in Rwr

Instead, soundness-must *cannot* be expressed in the trace extension. Indeed, stuck computations are not explicitly modelled. Conversely, soundness-may can be expressed in the trace extension as follows:

**soundness-may (traces)** If *c* ∈ Π, then there is t such that Rtr *c* ⇒tr t

whereas it cannot be expressed in the wrong semantics, since diverging computations are not modelled there.

Of course soundness-must and soundness-may coincide in the deterministic case. Finally, note that indexes (e.g., the specific types of configurations) do not play any role in the above statements. However, they are relevant in the notion of *strong soundness*, introduced in [40]. Strong soundness holds if, for configurations satisfying Π<sup>ι</sup> (e.g., having a given type), computation cannot be stuck and, if terminating, produces a result satisfying Π<sup>ι</sup> (e.g., of the same type). Note that soundness alone does not even guarantee that the result, if any, satisfies Π (e.g., is a well-typed result). The three conditions introduced in the following section actually ensure strong soundness.
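The two flavours can be contrasted concretely by enumerating all results of a finitary non-deterministic term; the following sketch (the encoding and names are ours, not from the paper) marks stuck computations with `WRONG`:

```python
# Hypothetical sketch: all big-step results of a finitary term with the
# nondeterministic choice operator, contrasting soundness-must ("no
# computation goes wrong") with soundness-may ("some computation does not").
WRONG = "wrong"

def results(e):
    """All big-step results of e; WRONG marks computations that go wrong."""
    tag = e[0]
    if tag == "num":
        return {e}
    if tag == "stuck":                     # a term no rule applies to
        return {WRONG}
    if tag == "succ":
        return {("num", r[1] + 1) if r != WRONG and r[0] == "num" else WRONG
                for r in results(e[1])}
    if tag == "choice":                    # e1 + e2: both branches are possible
        return results(e[1]) | results(e[2])

def sound_must(e):                         # no computation can be stuck
    return WRONG not in results(e)

def sound_may(e):                          # at least one computation is not stuck
    return any(r != WRONG for r in results(e))

e = ("choice", ("num", 0), ("stuck",))     # one branch goes wrong, one does not
print(sound_must(e), sound_may(e))         # → False True
```

The term `e` above is sound-may but not sound-must, which is exactly the gap between the two statements in the deterministic versus non-deterministic case.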

In Sect. 4.2 we provide sufficient conditions for soundness-must, showing that they actually ensure soundness in the wrong semantics (Theorem 3). Then, in Sect. 4.3, we provide (weaker) sufficient conditions for soundness-may, and show that they actually ensure soundness-may in the trace semantics (Theorem 4).

#### **4.2 Conditions ensuring soundness-must**

The three conditions which ensure the soundness-must property are *local preservation*, ∃*-progress*, and ∀-progress. The names suggest that the former plays the role of the *type preservation (subject reduction)* property, and the latter two of the *progress* property in small-step semantics. However, as we will see, the correspondence is only rough, since the reasoning here is different.

Considering the first condition more closely, we use the name *preservation* rather than type preservation since, as already mentioned, the proof technique can be applied to arbitrary predicates. More importantly, *local* means that the condition is *on single rules* rather than on the semantic relation as a whole, as standard subject reduction is. The same holds for the other two conditions.

**Definition 1 (S1: Local Preservation).** *For each* ρ≡rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*)*, if c*∈Πι*, then there exist* ι1,...,ιn+1 ∈ *I , with* ιn+1=ι*, such that, for all* k ∈ 1..n + 1*:*

*if, for all* h<k*, R*(*j*h) ∈ Π<sup>ι</sup><sup>h</sup> *, then C* (*j*k) ∈ Π<sup>ι</sup><sup>k</sup> *.*

Thinking of the paradigmatic case where the indexes are types: for each rule ρ, if the configuration *c* in the consequence has type ι, we have to find types ι1,...,ιn+1 which can be assigned to (the configurations in) the premises, in particular the same type as *c* for the continuation. More precisely, we first find the type ι1, and then successively find the type ι<sup>k</sup> for (the configuration in) the k-th premise, assuming that the results of all the previous premises have the expected types. Indeed, if all such previous premises are derivable, then the expected type should be preserved by their results; if some premise is not derivable, the considered rule is "useless". For instance, considering (an instantiation of) meta-rule (app) rule(*e*<sup>1</sup> ⇒ λ*x* .*e e*<sup>2</sup> ⇒ *v*2, *e*[*v*2/*x* ] ⇒ *v*, *e*<sup>1</sup> *e*2) in Sect. 2, we prove that *e*[*v*2/*x* ] has the type *T* of *e*<sup>1</sup> *e*<sup>2</sup> under the assumption that λ*x* .*e* has type *T*′ → *T* and *v*<sup>2</sup> has type *T*′ (see the proof example in Sect. 5.1 for more details).

A counter-example to condition **S1** is discussed at the beginning of Sect. 5.3.

The following lemma states that local preservation actually implies *preservation* of the semantic relation as a whole.

**Lemma 1 (Preservation).** *Let* R *and* Π *satisfy condition* **S1***. If* R *c* ⇒ *r and c* ∈ Πι*, then r* ∈ Πι*.*

*Proof.* The proof is by a double induction. We denote by RH and IH the first and the second induction hypothesis, respectively. The first induction is on big-step rules. Axioms have conclusion *r* ⇒ *r*, hence the thesis holds since *r* ∈ Π<sup>ι</sup> by hypothesis. Other rules have shape rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) with *c* ∈ Πι. By **S1**, there are ι1,...,ιn+1 ∈ *I* with ιn+1 = ι; we prove, by complete induction on k ∈ 1..n + 1, that *C* (*j*k) ∈ Πι<sup>k</sup>. For k = 1, *C* (*j*1) ∈ Πι<sup>1</sup> holds directly by **S1**. For k > 1, by IH we know that *C* (*j*h) ∈ Πι<sup>h</sup> for all h<k; then, by RH, we get *R*(*j*h) ∈ Π<sup>ι</sup><sup>h</sup>, hence, by **S1**, *C* (*j*k) ∈ Π<sup>ι</sup><sup>k</sup>, as needed. In particular, we have just proved that *C* (*j*n+1) ∈ Π<sup>ι</sup>n+1 and, since ιn+1 = ι, we get *C* (*j*n+1) ∈ Πι. Then, by RH, we conclude that *r* = *R*(*j*n+1) ∈ Πι, as needed.

The following proposition is a form of local preservation where indexes (e.g., specific types) are not relevant, simpler to use in the proofs of Theorems 3 and 4.

**Proposition 1.** *Let* R *and* Π *satisfy condition* **S1***. For each* rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) *and* k ∈ 1..n + 1*, if c* ∈ Π *and, for all* h<k*,* R *j*h*, then C* (*j*k) ∈ Π*.*

The second condition, named ∃*-progress*, ensures that, for configurations satisfying the predicate Π (e.g., well-typed), we can *start constructing* a proof tree.

**Definition 2 (S2:** ∃**-progress).** *For each* c ∈ Π \ *R*, *there is a rule* ρ *such that C* (ρ) = *c.*

The third condition, named ∀*-progress*, ensures that, for configurations satisfying Π, we can *continue constructing* the proof tree. This condition uses the notion of rules *equivalent up-to an index* introduced at the beginning of Sect. 3.2.

**Definition 3 (S3:** ∀**-progress).** *For each* ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*)*, if c* ∈ Π*, then, for each* k ∈ 1..n + 1*:*

*if, for all* h<k*,* *j*<sup>h</sup> *is derivable in* R*, and C* (*j*k) ⇒ *r is derivable in* R *for some r* ∈ *R, then there is a rule* ρ′ ∼<sup>k</sup> ρ *such that R*(ρ′, k) = *r.*

We have to check, for each rule ρ, the following: if the configuration *c* in the consequence satisfies the predicate (e.g., is well-typed), then, for each k, if the configuration in premise k evaluates to some result *r* (that is, *C* (*j*k) ⇒ *r* is derivable in R), then there is a rule (ρ itself or another rule with the same configuration in the consequence and the same first k − 1 premises) with such judgment as k-th premise. This check can be done under the assumption that all the previous premises are derivable. For instance, consider again (an instantiation of) the meta-rule (app) rule(*e*<sup>1</sup> ⇒ λ*x* .*e e*<sup>2</sup> ⇒ *v*2, *e*[*v*2/*x* ] ⇒ *v*, *e*<sup>1</sup> *e*2). Assuming that *e*<sup>1</sup> evaluates to some *v*1, we have to check that there is a rule with first premise *e*<sup>1</sup> ⇒ *v*1, in practice, that *v*<sup>1</sup> is a λ-abstraction; in general, checking **S3** for a (meta-)rule amounts to showing that (sub)configurations in the premises evaluate to results with the required shape (see also the proof example in Sect. 5.1).

*Soundness-must in* wrong *semantics* Recall that <sup>R</sup>wr is the extension of <sup>R</sup> with wrong (Sect. 3.2). We prove the claim of soundness-must with respect to Rwr.

**Theorem 3.** *Let* R *and* Π *satisfy conditions* **S1***,* **S2** *and* **S3***. If c* ∈ Π*, then c* ⇒ wrong *is not derivable in* Rwr*.*

*Proof.* To prove the statement, we assume Rwr *c* ⇒ wrong and look for a contradiction. The proof is by induction on the derivation of *c* ⇒ wrong.

If the last applied rule is an axiom, then, by construction, there is no rule ρ ∈ R such that *C* (ρ) = *c*, and this violates condition **S2**, since *c* ∈ Π.

If the last applied rule is wrong(ρ, i, *r*), with ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*), then, by hypothesis, for all k<i, *j*<sup>k</sup> is derivable in Rwr, and *C* (*j*i) ⇒ *r* is derivable in Rwr, and these judgments can also be derived in R by conservativity (Theorem 2). Furthermore, by construction of this rule, we know that there is no rule ρ′ ∼<sup>i</sup> ρ such that *R*(ρ′, i) = *r*, and this violates condition **S3**, since *c* ∈ Π.

If the last applied rule is prop(ρ, i,wrong), with ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*), then, by hypothesis, for all k<i, Rwr *j*k, and these judgments can also be derived in R by conservativity. Then, by Prop. 1 (which requires condition **S1**), since *c* ∈ Π, we have *C* (*j*i) ∈ Π, hence we get the thesis by induction hypothesis.

Sect. 5.1 ends with examples not satisfying properties **S2** and **S3**.

#### **4.3 Conditions ensuring soundness-may**

As discussed in Sect. 4.1, in the trace semantics we can only express a weaker form of soundness: at least one computation is not stuck (*soundness-may*). As the reader can expect, to ensure this property weaker sufficient conditions are enough: namely, condition **S1**, and another condition named *progress-may* and defined below.

We say that *c* *does not converge* in R if there is no *r* such that *c* ⇒ *r* is derivable in R.

**Definition 4 (S4: progress-may).** *For each* c ∈ Π \ *R, there is* ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) *such that:*

*if there is a (first)* k ∈ 1..n + 1 *such that* *j*<sup>k</sup> *is not derivable in* R *and, for all* h<k*,* *j*<sup>h</sup> *is derivable in* R*, then C* (*j*k) *does not converge in* R*.*

This condition can be informally understood as follows: we have to show that there is either a finite or an infinite computation for *c*. If we find a rule where all premises are derivable (there is no such k), then there is a finite computation. Otherwise, *c* does not converge; in this case, we should find a rule where the configuration in the first non-derivable premise does not converge either. Indeed, by coinductive reasoning (using Lemma 2 below), we obtain that *c* diverges. The following proposition states that this condition is indeed a weakening of **S2** and **S3**.

**Proposition 2.** *Conditions* **S2** *and* **S3** *imply condition* **S4***.*

*Soundness-may in trace semantics* Recall that Rtr is the extension of R with traces, defined in Sect. 3.1, where judgements have shape *c* ⇒tr t, with t ∈ *C* <sup>∞</sup>.

The following lemma provides a proof principle useful to coinductively show that a property ensures the existence of an infinite trace, in particular to show Theorem 4. It is a slight variation of an analogous principle presented in [8].

**Lemma 2.** *Let* S ⊆ *C be a set. If, for all c* ∈ S*, there are* ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) *and* k ∈ 1..n + 1 *such that*

*1. for all* h<k*,* *j*<sup>h</sup> *is derivable in* R*, and*
*2. C* (*j*k) ∈ S

*then, for all c* ∈ S*, there is* <sup>t</sup> <sup>∈</sup> *<sup>C</sup>* <sup>ω</sup> *such that* <sup>R</sup>tr *c* ⇒tr t*.*

**Theorem 4.** *Let* R *and* Π *satisfy conditions* **S1** *and* **S4***. If c* ∈ Π*, then there is* t *such that* Rtr *c* ⇒tr t*.*

*Proof.* First note that, thanks to Theorem 1, the statement is equivalent to the following:

If *c* ∈ Π and *c* does not converge in R, then there is <sup>t</sup> <sup>∈</sup> *<sup>C</sup>* <sup>ω</sup> such that *c* ⇒tr t is derivable in Rtr. Then, the proof follows from Lemma 2. We define S = {*c* | *c* ∈ Π and *c* does not converge in R}, and show that, for all *c* ∈ S, there are ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) and k ∈ 1..n + 1 such that, for all h<k, *j*<sup>h</sup> is derivable in R, and *C* (*j*k) ∈ S.

Consider *c* ∈ S. By **S4**, there is ρ ≡ rule(*j*<sup>1</sup> ... *j*n, *j*n+1, *c*) as in the condition. By definition of S, *c* does not converge, hence there exists a (first) k ∈ 1..n + 1 such that *j*<sup>k</sup> is not derivable in R, since, otherwise, *c* ⇒ *R*(*j*n+1) would be derivable. Then, since k is the first index with such property, for all h<k, *j*<sup>h</sup> is derivable in R, hence, again by condition **S4**, *C* (*j*k) does not converge. Finally, since *j*<sup>h</sup> is derivable for all h<k, by Prop. 1, we get *C* (*j*k) ∈ Π, hence *C* (*j*k) ∈ S, as needed.

# **5 Examples**

Sect. 5.1 explains in detail how a typical soundness proof can be rephrased in terms of our technique, by reasoning directly on big-step rules. Sect. 5.2 shows a case where this is advantageous, since the property to be checked is *not preserved* by intermediate computation steps, whereas it holds for the final result. Sect. 5.3 considers a more sophisticated type system, with intersection and union types. Finally, Sect. 5.4 shows another example where subject reduction is not preserved, whereas soundness can be proved with our technique. This example is intended as a preliminary step towards a more challenging case.

#### **5.1 Simply-typed** *λ***-calculus with recursive types**

As a first example, we take the λ-calculus with natural constants, successor, and choice used in Sect. 2 (Fig. 1). We consider a standard simply-typed version with recursive types, obtained by interpreting the production in Fig. 4 coinductively. Introducing recursive types makes the calculus non-normalising and makes it possible to write interesting programs such as Ω (see Sect. 3.1).

The typing rules are recalled in Fig. 4. Type environments, written Γ, are finite maps from variables to types, and Γ{*T*/*x*} denotes the map which returns *T* on *x* and coincides with Γ elsewhere. We write *e* : *T* for ∅ *e* : *T*.

Let R<sup>1</sup> be the big-step semantics defined in Fig. 1, and let Π1<sup>T</sup> (*e*) hold if *e* : *T*, for *T* defined in Fig. 4.

$$T ::= \mathtt{Nat} \mid T_1 \to T_2 \qquad \textit{type}$$

$$\begin{array}{c}
(\text{t-var})\ \dfrac{}{\Gamma \vdash x : T}\ \ \Gamma(x) = T \qquad
(\text{t-const})\ \dfrac{}{\Gamma \vdash n : \mathtt{Nat}} \qquad
(\text{t-abs})\ \dfrac{\Gamma\{T'/x\} \vdash e : T}{\Gamma \vdash \lambda x.e : T' \to T} \\\\
(\text{t-app})\ \dfrac{\Gamma \vdash e_1 : T' \to T \quad \Gamma \vdash e_2 : T'}{\Gamma \vdash e_1\ e_2 : T} \qquad
(\text{t-succ})\ \dfrac{\Gamma \vdash e : \mathtt{Nat}}{\Gamma \vdash \mathtt{succ}\ e : \mathtt{Nat}} \qquad
(\text{t-choice})\ \dfrac{\Gamma \vdash e_1 : T \quad \Gamma \vdash e_2 : T}{\Gamma \vdash e_1 \oplus e_2 : T}
\end{array}$$

**Fig. 4.** <sup>λ</sup>-calculus: type system

To prove the three conditions **S1**, **S2** and **S3** of Sect. 4.2, we need lemmas of inversion, substitution and canonical forms, as in the standard technique.
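The type system of Fig. 4 can be transcribed almost literally; the following sketch is ours and, to keep checking syntax-directed, annotates λ-abstractions with their parameter type and covers finite types only (recursive types would need a coinductive representation):

```python
# Hypothetical sketch of the typing rules (t-var) .. (t-choice) of Fig. 4.
# Types: "Nat" | (T1, "->", T2).  Expressions carry a parameter-type
# annotation on lambdas, an assumption of this sketch.
NAT = "Nat"

def type_of(gamma, e):
    """Return the type of e in environment gamma, or None if ill-typed."""
    tag = e[0]
    if tag == "num":                          # (t-const)
        return NAT
    if tag == "var":                          # (t-var)
        return gamma.get(e[1])
    if tag == "lam":                          # (t-abs), annotated parameter
        _, x, t1, body = e
        t2 = type_of({**gamma, x: t1}, body)
        return None if t2 is None else (t1, "->", t2)
    if tag == "app":                          # (t-app)
        t1, t2 = type_of(gamma, e[1]), type_of(gamma, e[2])
        if isinstance(t1, tuple) and t1[0] == t2:
            return t1[2]
        return None
    if tag == "succ":                         # (t-succ)
        return NAT if type_of(gamma, e[1]) == NAT else None
    if tag == "choice":                       # (t-choice): both branches same type
        t1, t2 = type_of(gamma, e[1]), type_of(gamma, e[2])
        return t1 if t1 is not None and t1 == t2 else None
```

For instance, `type_of({}, ("app", ("num", 0), ("num", 0)))` is `None`, matching the fact that 0 0 is ill-typed.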

#### **Lemma 3 (Inversion).**

*1. If* Γ *x* : *T, then* Γ(*x*) = *T.*
*2. If* Γ *n* : *T, then T* = Nat*.*
*3. If* Γ λ*x*.*e* : *T, then T* = *T*<sup>1</sup> → *T*<sup>2</sup> *and* Γ{*T*1/*x*} *e* : *T*2*.*
*4. If* Γ *e*<sup>1</sup> *e*<sup>2</sup> : *T, then* Γ *e*<sup>1</sup> : *T*′ → *T and* Γ *e*<sup>2</sup> : *T*′*.*
*5. If* Γ succ *e* : *T, then T* = Nat *and* Γ *e* : Nat*.*
*6. If* Γ *e*<sup>1</sup> ⊕ *e*<sup>2</sup> : *T, then* Γ *e*<sup>i</sup> : *T with* i ∈ 1, 2*.*

**Lemma 4 (Substitution).** *If* Γ{*T*′/*x*} *e* : *T and* Γ *e*′ : *T*′*, then* Γ *e*[*e*′/*x*] : *T.*

**Lemma 5 (Canonical Forms).**

*1. If* *v* : *T*<sup>1</sup> → *T*<sup>2</sup>*, then v has shape* λ*x*.*e*.
*2. If* *v* : Nat*, then v has shape* n.
**Theorem 5 (Soundness).** *The big-step semantics* R<sup>1</sup> *and the indexed predicate* Π1 *satisfy the conditions* **S1***,* **S2** *and* **S3** *of Sect. 4.2.*

Since the aim of this first example is to illustrate the proof technique, we provide a proof where we explain the reasoning in detail.

*Proof of S1.* We should prove this condition for each (instantiation of meta-)rule. (app): Assume that *e*<sup>1</sup> *e*<sup>2</sup> : *T* holds. We have to find types for the premises, notably *T* for the last one. We proceed as follows:
By Lemma 3 (4), *e*<sup>1</sup> : *T*′ → *T* and *e*<sup>2</sup> : *T*′ hold, hence we find *T*′ → *T* and *T*′ as types for the first two premises. Assuming that λ*x*.*e* has type *T*′ → *T*, by Lemma 3 (3) we get {*T*′/*x*} *e* : *T*; assuming moreover that *v*<sup>2</sup> has type *T*′, by Lemma 4 we get *e*[*v*2/*x*] : *T*, hence we find *T* as type for the continuation, as required.
(succ): This rule has an implicit continuation n + 1 ⇒ n + 1. Assume that succ *<sup>e</sup>* : *<sup>T</sup>* holds. By Lemma <sup>3</sup> (5), *<sup>T</sup>* <sup>=</sup> Nat, and *e* : Nat, hence we find Nat as type for the first premise. Moreover, n +1: Nat holds by rule (t-const). (choice): Assume that *e*<sup>1</sup> ⊕ *e*<sup>2</sup> : *T* holds. By Lemma 3 (6), we have *e*<sup>i</sup> : *T*, with i ∈ 1, 2. Hence we find *T* as type for the premise.

*Proof of S2.* We should prove that, for each non-result configuration (here, expression *e* which is not a value) such that *e* : *T* holds for some *T*, there is a rule with this configuration in the consequence. The expression *e* cannot be a variable, since a variable cannot be typed in the empty environment. Application, successor and choice appear as consequence in the reduction rules.

*Proof of* **S3**. We should prove this condition for each (instantiation of meta-)rule. (app): Assuming *e*<sup>1</sup> *e*<sup>2</sup> : *T*, again by Lemma 3 (4) we get Γ *e*<sup>1</sup> : *T* → *T*.
If *e*<sup>1</sup> ⇒ *v*<sup>1</sup> is derivable, then, by preservation (Lemma 1), *v*<sup>1</sup> : *T*′ → *T* holds, hence, by Lemma 5 (1), *v*<sup>1</sup> has shape λ*x*.*e*, so there is a rule with *e*<sup>1</sup> *e*<sup>2</sup> in the consequence and *e*<sup>1</sup> ⇒ *v*<sup>1</sup> as first premise. The premises *e*<sup>2</sup> ⇒ *v*<sup>2</sup> and *e*[*v*2/*x*] ⇒ *v* are trivially matched, since the meta-variables *v*<sup>2</sup> and *v* can be freely instantiated.
(succ): Assuming succ *<sup>e</sup>* : *<sup>T</sup>*, again by Lemma <sup>3</sup> (5) we get *<sup>e</sup>* : Nat. If *<sup>e</sup>* <sup>⇒</sup> *<sup>v</sup>* is derivable, there should be a rule with succ *<sup>e</sup>* in the consequence and *<sup>e</sup>* <sup>⇒</sup> *<sup>v</sup>* as first premise. Indeed, by preservation (Lemma 1) and Lemma 5 (2), *v* has shape n. For the second premise, if n + 1 ⇒ *v* is derivable, then *v* is necessarily n + 1. (choice): Trivial since the meta-variable *v* can be freely instantiated.

An interesting remark is that, differently from the standard approach, there is *no induction* in the proof: everything is *by cases*. This is a consequence of the fact that, as discussed in Sect. 4.2, the three conditions are *local*, that is, they are conditions on single rules. Induction is "hidden" in the proof that those three conditions are sufficient to ensure soundness.

If we drop rule (succ) in Fig. 1, then condition **S2** fails, since there is no longer a rule for the well-typed non-result configuration succ *n*. If we add the (fool) rule deriving 0 0 : Nat, then condition **S3** fails for rule (app), since 0 <sup>⇒</sup> 0 is derivable, but there is no rule with 0 0 in the conclusion and 0 ⇒ 0 as first premise.

#### **5.2 MiniFJ&***λ*

In this example, the language is a subset of FJ&λ [12], a calculus extending Featherweight Java (FJ) with the λ-abstractions and intersection types introduced in Java 8. To keep the example small, we do not consider intersections and focus on one key typing feature: λ-abstractions can only be typed when occurring in a context requiring a given type (called the *target type*). In a small-step semantics, this poses a problem: reduction can move λ-abstractions into arbitrary contexts, leading to intermediate terms which would be ill-typed. To maintain subject reduction, in [12] λ-abstractions are decorated with their initial target type. In a big-step semantics, there is no need for intermediate terms and annotations.

The syntax is given in the first part of Fig. 5. We assume sets of *variables x* , *class names* C, *interface names* I, J, *field names* f, and *method names* m. Interfaces which have *exactly* one method (dubbed *functional interfaces*) can be used as target types. Expressions are those of FJ, plus λ-abstractions, and types are class and interface names. In λ*xs*.*e* we assume that *xs* is not empty and *e* is not a λ-abstraction. For simplicity, we only consider *upcasts*, which have no runtime effect, but are important to allow the programmer to use λ-abstractions, as exemplified in discussing typing rules.

To be concise, the class table is abstractly modelled as follows:

- fields(C), the sequence *T*<sup>1</sup> f1; ... *T*<sup>n</sup> fn; of the fields of class C with their types
- mbody(C, m), the parameters and body of method m of class C, if any
- mtype(T, m), the parameter and return types of method m of class or interface T, if any
- mtype(I), the type of the unique method of a functional interface I
- a subtyping relation <: on types, induced by the extends clauses.
The big-step semantics is given in the last part of Fig. 5. MiniFJ&λ shows an example of instantiation of the framework where configurations include an auxiliary structure, rather than being just language terms. In this case, the structure is an *environment* e (a finite map from variables to values) modelling the current stack frame. Results are values, which are either *objects*, of shape [*vs*] <sup>C</sup>, or λ-abstractions.

Rules for FJ constructs are straightforward. Note that, since we only consider upcasts, casts have no runtime effect. Indeed, they are guaranteed to succeed on well-typed expressions. Rule (λ-invk) shows that, when the receiver of a method is a λ-abstraction, the method name is not significant at runtime, and the effect is that the body of the function is evaluated as in the usual application.

The type system is given in Fig. 6. Method bodies are expected to be well-typed with respect to method types. Formally, mbody(C, m) and mtype(C, m) are either both defined or both undefined: in the first case mbody(C, m) = *x*<sup>1</sup> ... *<sup>x</sup>*n, *<sup>e</sup>*, mtype(C, <sup>m</sup>) = *<sup>T</sup>*<sup>1</sup> ... *<sup>T</sup>*<sup>n</sup> <sup>→</sup> *<sup>T</sup>*, and *<sup>x</sup>*1:*T*1,..., *<sup>x</sup>*n:*T*n, this:<sup>C</sup> *e* : *T*. Moreover, we assume other standard FJ constraints on the class table, such as no field hiding, no method overloading, and the same parameter and return types in overriding.
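One concrete reading of this abstract class table (the representation and the example entries are ours, not from the paper) is a set of lookup maps, with the agreement between mbody and mtype checked explicitly:

```python
# Hypothetical sketch of the abstract class table of MiniFJ&lambda:
# fields(C) = T1 f1; ... Tn fn;    mtype(C, m) = T1 ... Tn -> T
# mbody(C, m) = x1 ... xn, e       (defined exactly when mtype is)

FIELDS = {"C": [("J", "f")]}
MTYPE = {("C", "n"): (("J",), "C")}
MBODY = {("C", "n"): (("y",), ("new", "C", []))}

def fields(c):
    return FIELDS.get(c, [])

def mtype(c, m):
    return MTYPE.get((c, m))

def mbody(c, m):
    return MBODY.get((c, m))

# Well-formedness: mbody and mtype are both defined or both undefined,
# and parameter arities agree.
assert set(MBODY) == set(MTYPE)
assert all(len(MBODY[k][0]) == len(MTYPE[k][0]) for k in MBODY)
```

The big-step and typing rules only ever consult the table through these three lookup functions, so any representation satisfying the well-formedness checks would do.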

Besides the standard typing features of FJ, the MiniFJ&λ type system ensures the following.

**–** In source code, a λ-abstraction can only be typed with a functional interface as target type, see rule (t-λ) in Fig. 6. For instance, consider the following class table:
```
interface J {}
interface I extends J { A m(A x); }
class C {
  C m(I y) { return new C().n(y); }
  C n(J y) { return new C(); }
}
```

$$(\text{t-conf})\ \dfrac{\vdash v_i : T_i \quad \forall i \in 1..n \qquad x_1{:}T'_1, \ldots, x_n{:}T'_n \vdash e : T}{\vdash \mathsf{e}, e : T}\ \ \begin{array}{l}\mathsf{e} = x_1{\mapsto}v_1, \ldots, x_n{\mapsto}v_n\\ T_i <: T'_i\ \ \forall i \in 1..n\end{array}$$

$$(\text{t-var})\ \dfrac{}{\Gamma \vdash x : T}\ \ \Gamma(x) = T \qquad (\text{t-field-access})\ \dfrac{\Gamma \vdash e : \mathtt{C}}{\Gamma \vdash e.\mathtt{f}_i : T_i}\ \ \begin{array}{l}\mathit{fields}(\mathtt{C}) = T_1\ \mathtt{f}_1; \ldots T_n\ \mathtt{f}_n;\\ i \in 1..n\end{array}$$

$$(\text{t-new})\ \dfrac{\Gamma \vdash e_i : T_i \quad \forall i \in 1..n}{\Gamma \vdash \mathtt{new}\ \mathtt{C}(e_1, \ldots, e_n) : \mathtt{C}}\ \ \mathit{fields}(\mathtt{C}) = T_1\ \mathtt{f}_1; \ldots T_n\ \mathtt{f}_n;$$

$$(\text{t-invk})\ \dfrac{\Gamma \vdash e_i : T_i \quad \forall i \in 0..n}{\Gamma \vdash e_0.\mathtt{m}(e_1, \ldots, e_n) : T}\ \ \begin{array}{l}e_0\ \text{not of shape}\ \lambda xs.e\\ \mathit{mtype}(T_0, \mathtt{m}) = T_1 \ldots T_n \to T\end{array}$$

$$(\text{t-}\lambda)\ \dfrac{x_1{:}T_1, \ldots, x_n{:}T_n \vdash e : T}{\Gamma \vdash \lambda x_1 \ldots x_n.e : \mathtt{I}}\ \ \mathit{mtype}(\mathtt{I}) = T_1 \ldots T_n \to T$$

$$(\text{t-upcast})\ \dfrac{\Gamma \vdash e : T}{\Gamma \vdash (T')\,e : T'}\ \ T <: T' \qquad (\text{t-object})\ \dfrac{\vdash v_i : T'_i \quad \forall i \in 1..n}{\vdash [v_1, \ldots, v_n]^{\mathtt{C}} : \mathtt{C}}\ \ \begin{array}{l}\mathit{fields}(\mathtt{C}) = T_1\ \mathtt{f}_1; \ldots T_n\ \mathtt{f}_n;\\ T'_i <: T_i\ \ \forall i \in 1..n\end{array}$$

$$(\text{t-sub})\ \dfrac{\Gamma \vdash e : T}{\Gamma \vdash e : T'}\ \ \begin{array}{l}e\ \text{not of shape}\ \lambda xs.e'\\ T <: T'\end{array}$$

# **Fig. 6.** MiniFJ&λ: type system

and the main expression new C().n(λ*x* .*x*). Here, the λ-abstraction has target type J, which is *not* a functional interface, hence the expression is ill-typed in Java (the compiler has no functional type against which to typecheck the λ-abstraction). On the other hand, in the body of method m, the parameter y of type I can be passed, as usual, to method n expecting a supertype. For instance, the main expression new C().m(λ*x* .*x*) is well-typed, since the λ-abstraction has target type I, and can be safely passed to method n, since it is not used as a function there. To formalise this behaviour, it is forbidden to apply subsumption to λ-abstractions, see rule (t-sub).

**–** However, λ-abstractions occurring as results rather than in source code (that is, in the environment and as fields of objects) are allowed to have a subtype of the required type, see the explicit side condition in rules (t-conf) and (t-object). For instance, if C is a class with one field J f, the expression new C((I)λx.x) is well-typed, whereas new C(λx.x) is ill-typed, since rule (t-sub) cannot be applied to λ-abstractions. When the expression is evaluated, the result is [λx.x]<sup>C</sup>, which is well-typed.

As mentioned at the beginning, the obvious small-step semantics would produce expressions which cannot be typed. In the above example, we get

new C((I)λx.x) −→ new C(λx.x) −→ [λx.x]<sup>C</sup>

and new C(λx.x) has no type, while new C((I)λx.x) and [λx.x] <sup>C</sup> have type C.

We write Γ *e* :<: *T* as short for Γ *e* : *T*′ and *T*′ <: *T* for some *T*′. In order to state soundness, let R<sup>2</sup> be the big-step semantics defined in Fig. 5, and let <sup>Π</sup>2<sup>T</sup> (e, *<sup>e</sup>*) hold if e, *<sup>e</sup>* :<: *<sup>T</sup>*, and <sup>Π</sup>2<sup>T</sup> (*v*) if *v* :<: *T*, for *T* defined in Fig. 5.

**Theorem 6 (Soundness).** *The big-step semantics* R<sup>2</sup> *and the indexed predicate* Π2 *satisfy the conditions* **S1***,* **S2** *and* **S3** *of Sect. 4.2.*

#### **5.3 Intersection and union types**

We enrich the type system of Fig. 4 by adding intersection and union type constructors and the corresponding typing rules, see Fig. 7. As usual we require an infinite number of arrows in each infinite path for the trees representing types. Intersection types for the λ-calculus have been widely studied [11]. Union types naturally model conditionals [26] and non-deterministic choice [22].

T ::= Nat | T<sup>1</sup> → T<sup>2</sup> | T<sup>1</sup> ∧ T<sup>2</sup> | T<sup>1</sup> ∨ T<sup>2</sup> type

**Fig. 7.** λ-calculus: intersection and union types
The typing rules for the introduction and the elimination of intersection and union are standard, except for the absence of the union elimination rule:

$$(\lor\mathrm{E})\ \dfrac{\Gamma\{T/x\} \vdash e : V \qquad \Gamma\{S/x\} \vdash e : V \qquad \Gamma \vdash e' : T \lor S}{\Gamma \vdash e[e'/x] : V}$$

As a matter of fact, rule (∨E) is unsound for ⊕. For example, let us split the type Nat into Even and Odd and add the expected typings for natural numbers. The prefix addition + has type

$$(\text{Even} \rightarrow \text{Even} \rightarrow \text{Even}) \land (\text{0dd} \rightarrow \text{0dd} \rightarrow \text{Even})$$

and we derive

$$\dfrac{\dfrac{\vdash 1 : \mathtt{Odd}}{\vdash 1 : \mathtt{Even} \lor \mathtt{Odd}}\,(\lor\mathrm{I}) \qquad \dfrac{\vdash 2 : \mathtt{Even}}{\vdash 2 : \mathtt{Even} \lor \mathtt{Odd}}\,(\lor\mathrm{I})}{\vdash 1 \oplus 2 : \mathtt{Even} \lor \mathtt{Odd}}$$

from which, by rule (∨E), we get ⊢ +(1 ⊕ 2)(1 ⊕ 2) : Even.

We cannot assign the type Even to 3, which is a possible result, so strong soundness is lost. In the small-step approach, we cannot assign Even to the intermediate term + 1 2, so subject reduction fails. In the big-step approach, there is no such intermediate term; however, condition **S1** fails for the reduction rule for +. Indeed, considering the following instantiation of the rule:

$$(+)\begin{array}{c} 1 \oplus 2 \Rightarrow 1 \quad 1 \oplus 2 \Rightarrow 2 \quad 3 \Rightarrow 3\\ \hline + (\mathbf{1} \oplus \mathbf{2}) (\mathbf{1} \oplus \mathbf{2}) \Rightarrow 3 \end{array}$$

and the type Even for the consequence, we cannot assign this type to the (configuration in) last premise (continuation).
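This failure can be checked by brute force: enumerating every big-step result of +(1 ⊕ 2)(1 ⊕ 2) (in a tuple encoding of ours, not from the paper) shows that the odd result 3 is reachable, so the type Even derived via (∨E) is unsound:

```python
# Hypothetical sketch: all big-step results of +(1 + 2)(1 + 2); the two
# occurrences of the choice evaluate independently, which is exactly what
# rule (vE) fails to account for.
from itertools import product

def results(e):
    """All big-step results of e."""
    if isinstance(e, int):
        return {e}
    if e[0] == "choice":
        return results(e[1]) | results(e[2])
    if e[0] == "add":
        return {a + b for a, b in product(results(e[1]), results(e[2]))}

e = ("add", ("choice", 1, 2), ("choice", 1, 2))
print(sorted(results(e)))  # → [2, 3, 4]; the odd result 3 refutes Even
```

Had the two choices been forced to take the same branch, as (∨E) implicitly assumes, only the even results 2 and 4 would be possible.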

Intersection types allow us to derive meaningful types also for expressions containing variables applied to themselves; for example, we can derive

$$\vdash \lambda x.x \; x : (T \to S) \land T \to S$$

With union types, all non-deterministic choices between typable expressions can be typed too, since we can derive Γ *e*<sup>1</sup> ⊕ *e*<sup>2</sup> : *T*<sup>1</sup> ∨ *T*<sup>2</sup> from Γ *e*<sup>1</sup> : *T*<sup>1</sup> and Γ *e*<sup>2</sup> : *T*2.

In order to state soundness, let Π3<sup>T</sup> (*e*) be *e* : *T*, for *T* defined in Fig. 7.

**Theorem 7 (Soundness).** *The big-step semantics* R<sup>1</sup> *and the indexed predicate* Π3 *satisfy the conditions* **S1***,* **S2** *and* **S3** *of Sect. 4.2.*

#### **5.4 MiniFJ&O**

A well-known example in which proving soundness with respect to small-step semantics is extremely challenging is the standard type system with intersection and union types [10] w.r.t. the pure λ-calculus with full reduction. Indeed, the standard subject reduction technique fails<sup>5</sup>, since, for instance, we can derive the type (*T* → *T* → V ) ∧ (*S* → *S* → V ) → (U → *T* ∨ *S*) → U → V for both λx.λy.λz.x((λt.t)(y z))((λt.t)(y z)) and λx.λy.λz.x(y z)(y z), but the intermediate expressions λx.λy.λz.x((λt.t)(y z))(y z) and λx.λy.λz.x(y z)((λt.t)(y z)) do not have this type.

As the example shows, the key problem is that rule (∨E) can be applied to an expression in which the same subexpression *e* occurs more than once. In the non-deterministic case, as shown by the example in the previous section, this is unsound, since *e* can reduce to different values. In the deterministic case, instead, this is sound, but cannot be proved by subject reduction. Since with big-step semantics there are no intermediate steps to be typed, our approach seems very promising for investigating an alternative proof of soundness. While we leave this challenging problem to future work, here, as a first step, we describe a (hypothetical) calculus with a much simpler version of the problematic feature.

The calculus is a variant of FJ [27] with intersection and union types. Methods have intersection types with the same return type and different parameter types, modelling a form of *overloading*. Union types enhance the typability of conditionals. The most interesting feature is the possibility of replacing an arbitrary number of parameters with the same expression having a union type. We dub this calculus MiniFJ&O.

Fig. 8 gives the syntax, big-step semantics and typing rules of MiniFJ&O. We omit the standard big-step rule for conditional, and typing rules for boolean

<sup>5</sup> For this reason, in [10] soundness is proved by an ad-hoc technique, that is, by considering parallel reduction and an equivalent type system `a la Gentzen, which enjoys the cut elimination property.

**Fig. 8.** Syntax, big-step semantics, and typing rules of MiniFJ&O.

constants. The subtyping relation <: is the reflexive and transitive closure of the union of the extends relation and the standard rules for union:

$$T_1 <: T_1 \lor T_2 \qquad T_2 <: T_1 \lor T_2$$

On the other hand, *method types* (results of the mtype function) are now *intersection types*, and the subtyping relation on them is the reflexive and transitive closure of the standard rules for intersection:

$$MT\_1 \land MT\_2 < \colon MT\_1 \qquad MT\_1 \land MT\_2 < \colon MT\_2$$
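Such a closure can be computed by a straightforward graph search. The following Python sketch (ours, with hypothetical type names) checks whether t1 <: t2 holds in the reflexive-transitive closure generated by a set of axiom pairs:

```python
def subtype(t1, t2, axioms):
    """Is t1 <: t2 in the reflexive-transitive closure of `axioms`?

    `axioms` is a set of pairs (a, b) meaning a <: b is an axiom
    (extends clauses plus the union/intersection rules instantiated
    at concrete types)."""
    if t1 == t2:
        return True                        # reflexivity
    seen, frontier = set(), {t1}
    while frontier:                        # transitive search from t1
        t = frontier.pop()
        if t == t2:
            return True
        if t in seen:
            continue
        seen.add(t)
        frontier |= {b for a, b in axioms if a == t}
    return False

# extends: C <: Object, D <: Object; union axioms for the type C ∨ D
axioms = {('C', 'Object'), ('D', 'Object'),
          ('C', 'C∨D'), ('D', 'C∨D')}
```

For instance, `subtype('C', 'C∨D', axioms)` holds, while `subtype('Object', 'C', axioms)` does not.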

The functions fields and mbody are defined as for MiniFJ&λ. Instead, mtype(C, m) gives, for each method m in class C, an intersection type. We assume mbody(C, m) and mtype(C, m) to be either both defined or both undefined: in the first case, mbody(C, m) = $x_1 \ldots x_n, e$, mtype(C, m) = $\bigwedge_{1 \le i \le m} (\mathtt{C}^{(i)}_1 \ldots \mathtt{C}^{(i)}_n \to \mathtt{D})$, and $x_1{:}\mathtt{C}^{(i)}_1, \ldots, x_n{:}\mathtt{C}^{(i)}_n, \mathtt{this}{:}\mathtt{C} \vdash e : \mathtt{D}$ for each $i \in 1..m$.

Clearly rule (t-invk) is inspired by rule (∨E), but the restriction to method calls admits a standard inversion lemma. The subtyping in this rule allows choosing the method type that best fits the types of the arguments. Not surprisingly, subject reduction fails for the expected small-step semantics. For example, let class C have a field point which contains cartesian coordinates, and class D a field point which contains polar coordinates. The method eq takes two objects and compares their point fields, returning a boolean value. A type for this method is (C C → Bool) ∧ (D D → Bool), and we can type eq(*e*, *e*), where *e* = if false then new C( ... ) else new D( ... ).

In fact *e* has type C ∨ D. Notice that in a standard small-step semantics

eq(*e*, *e*) −→ eq(new D( ... ), if false then new C( ... ) else new D( ... )), and this last expression cannot be typed.

In order to state soundness, let $R_4$ be the big-step semantics defined in Fig. 8, and let $\Pi^4_T(e)$ hold if $\vdash e : T$, for $T$ defined in Fig. 8.

**Theorem 8 (Soundness).** *The big-step semantics* R<sup>4</sup> *and the indexed predicate* Π4 *satisfy the conditions* **S1***,* **S2** *and* **S3** *of Sect. 4.2.*

# **6 The partial evaluation construction**

In this section, our aim is to provide a *formal* justification that the constructions in Sect. 3 are correct. For instance, for the wrong semantics we would like to be sure that *all* the cases are covered. To this end, we define a *third construction*, dubbed pev for "partial evaluation", which makes explicit the *computations* of a big-step semantics, understood as the sequences of execution steps of the naturally associated evaluation algorithm. Formally, we obtain a reduction relation on approximated proof trees, so that non-termination and stuck computation are distinguished, and both soundness-must and soundness-may can be expressed.

To this end, first of all we introduce a special result ?, so that a judgment *c* ⇒ ? (called *incomplete*, whereas a judgment in R is *complete*) means that the evaluation of *c* is not completed yet. Analogously to the previous constructions, we define an augmented set of rules R? for the judgment extended with ?:

**? introduction rules** These rules derive ? whenever a rule is partially applied: for each rule ρ ≡ rule(j₁ ... jₙ, jₙ₊₁, c) in R, index i ∈ 1..n + 1, and result r ∈ *R*, we define the rule intro?(ρ, i, r) as

$$\frac{j_1 \quad \dots \quad j_{i-1} \quad C(j_i) \Rightarrow r}{c \Rightarrow ?}$$

We also add an axiom *c* ⇒ ? for each configuration *c* ∈ *C*.

**? propagation rules** These rules propagate ? analogously to the rules for divergence and wrong propagation: for each ρ ≡ rule(j₁ ... jₙ, jₙ₊₁, c) in R and index i ∈ 1..n + 1, we add the rule prop(ρ, i, ?) as follows:

$$\frac{j\_1 \quad \dots \quad j\_{i-1} \quad C(j\_i) \Rightarrow ?}{c \Rightarrow ?}$$

Finally, we consider the set T of the (finite) proof trees τ in R?. Each τ can be thought of as a *partial proof* or *partial evaluation* of the root configuration. In particular, we say it is *complete* if it is a proof tree in R (that is, it only contains complete judgments), and *incomplete* otherwise. We define a reduction relation $\xrightarrow{R}$

$$\begin{array}{ll}
r \Rightarrow ? \;\xrightarrow{R}\; r \Rightarrow r & \\[1.5ex]
c \Rightarrow ? \;\xrightarrow{R}\; \dfrac{c' \Rightarrow ?}{c \Rightarrow ?}\;{\scriptstyle\mathsf{prop}(\rho,1,?)} & C(\rho) = c \quad C(\rho, 1) = c' \\[2ex]
\dfrac{\tau_1 \;\dots\; \tau_i}{c \Rightarrow ?}\;{\scriptstyle\mathsf{intro?}(\rho,i,r)} \;\xrightarrow{R}\; \dfrac{\tau_1 \;\dots\; \tau_i}{c \Rightarrow r}\;{\scriptstyle\rho'} & \rho' \sim_i \rho \quad R(\rho', i) = r \quad \#\rho' = i \\[2ex]
\dfrac{\tau_1 \;\dots\; \tau_i}{c \Rightarrow ?}\;{\scriptstyle\mathsf{intro?}(\rho,i,r)} \;\xrightarrow{R}\; \dfrac{\tau_1 \;\dots\; \tau_i \quad c' \Rightarrow ?}{c \Rightarrow ?}\;{\scriptstyle\mathsf{prop}(\rho',i+1,?)} & \rho' \sim_i \rho \quad R(\rho', i) = r \quad C(\rho', i+1) = c' \\[2ex]
\dfrac{\tau_1 \;\dots\; \tau_i}{c \Rightarrow ?}\;{\scriptstyle\mathsf{prop}(\rho,i,?)} \;\xrightarrow{R}\; \dfrac{\tau_1 \;\dots\; \tau_{i-1} \;\tau'_i}{c \Rightarrow ?}\;{\scriptstyle\mathsf{prop}(\rho,i,?)} & \tau_i \xrightarrow{R} \tau'_i \quad R_?(\mathrm{r}(\tau'_i)) = ? \\[2ex]
\dfrac{\tau_1 \;\dots\; \tau_i}{c \Rightarrow ?}\;{\scriptstyle\mathsf{prop}(\rho,i,?)} \;\xrightarrow{R}\; \dfrac{\tau_1 \;\dots\; \tau_{i-1} \;\tau'_i}{c \Rightarrow ?}\;{\scriptstyle\mathsf{intro?}(\rho,i,r)} & \tau_i \xrightarrow{R} \tau'_i \quad R_?(\mathrm{r}(\tau'_i)) = r
\end{array}$$

**Fig. 9.** Reduction relation on T.

on T such that, starting from the initial proof tree *c* ⇒ ?, we derive a sequence where, intuitively, at each step we detail the proof (evaluation). In this way, a sequence ending with a complete tree with root *c* ⇒ *r* models a terminating computation, an infinite sequence (tending to an infinite proof tree) models divergence, and a stuck sequence models a stuck computation.

The one-step reduction relation $\xrightarrow{R}$ on T is inductively defined by the rules in Fig. 9. In this figure, #ρ denotes the number of premises of ρ, and r(τ) the root of τ. We set *R*?(*c* ⇒ u) = u, where u ∈ *R* ∪ {?}. Finally, ∼<sup>i</sup> is the *equivalence up to an index* of rules, introduced at the beginning of Sect. 3.2. As said above, each reduction step makes the proof tree "less incomplete". Notably, reduction rules apply to nodes with consequence *c* ⇒ ?, whereas subtrees with root *c* ⇒ *r* represent terminated evaluation. In detail:


In Fig. 10 we report an example of pev reduction.

We end by stating the three constructions to be equivalent to each other, thus providing a coherency result of the approach. In particular, first we show that pev is conservative with respect to <sup>R</sup>, and this ensures the three constructions are equivalent for finite computations. Then, we prove traces and wrong

Starting from the initial tree (λx.x) n ⇒ ?, the reduction first introduces and completes the premise λx.x ⇒ λx.x, then the premises n ⇒ n for the argument and for the body, and finally completes the root, reaching the complete tree with root (λx.x) n ⇒ n.

**Fig. 10.** The evaluation in pev of (λx.x) n.
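The stepwise refinement of Fig. 10 can be mimicked in code. Below is a deliberately simplified Python sketch (our own: the tree representation, the `UNKNOWN` marker, and the inlined beta step are ours, and the application rule is collapsed to two premises) of pev-style evaluation of (λx.x) n:

```python
UNKNOWN = '?'

def step(tree):
    """Refine the leftmost incomplete judgment of a partial proof tree.

    A tree is {'conf': e, 'res': value-or-UNKNOWN, 'kids': [subtrees]};
    expressions are ('var', x), ('lam', x, body), ('app', e1, e2)."""
    if tree['res'] != UNKNOWN:
        return False                               # already complete
    for kid in tree['kids']:                       # refine incomplete kids first
        if step(kid):
            return True
    e = tree['conf']
    if e[0] == 'lam':                              # a value: complete the axiom
        tree['res'] = e
    elif e[0] == 'app':
        if len(tree['kids']) < 2:                  # introduce the next premise
            tree['kids'].append({'conf': e[1 + len(tree['kids'])],
                                 'res': UNKNOWN, 'kids': []})
        else:                                      # premises done: beta step
            (_, x, body), arg = tree['kids'][0]['res'], tree['kids'][1]['res']
            # inlined substitution, enough for the identity function
            tree['res'] = arg if body == ('var', x) else body
    return True

ident = ('lam', 'x', ('var', 'x'))
n = ('lam', 'n', ('var', 'n'))                     # any value stands in for n
t = {'conf': ('app', ident, n), 'res': UNKNOWN, 'kids': []}
while step(t):                                     # refine until complete
    pass
```

After the loop, the tree is complete with root (λx.x) n ⇒ n, mirroring the final tree of the figure.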

constructions to be equivalent to pev for diverging and stuck computations, respectively, and this ensures they cover all possible cases.

**Theorem 9.**
*1.* $\vdash_{R} c \Rightarrow r$ *iff* $c \Rightarrow {?} \xrightarrow{R}{}^{*} \tau$*, where* $\mathrm{r}(\tau) = c \Rightarrow r$*.*
*2.* $\vdash_{R_{tr}} c \Rightarrow_{tr} t$ *for some* $t \in C^{\omega}$ *iff* $c \Rightarrow {?} \xrightarrow{R}{}^{\omega}$*.*
*3.* $\vdash_{R_{wr}} c \Rightarrow \mathrm{wrong}$ *iff* $c \Rightarrow {?} \xrightarrow{R}{}^{*} \tau$*, where* $\tau$ *is stuck.*

# **7 Related work**

*Modelling divergence* The issue of modelling divergence in big-step semantics dates back to [18], where a stratified approach with a separate coinductive judgment for divergence is proposed; this approach is also investigated in [30].

In [5] the authors model divergence by interpreting standard big-step rules coinductively and considering also non-well-founded values. In [17] a similar technique is exploited, adding a special result modelling divergence. Flag-based big-step semantics [36] captures divergence by interpreting the same semantic rules both inductively and coinductively. In all these approaches, spurious judgments can be derived for diverging computations.

Other proposals [32,3] are inspired by the notion of definitional interpreter [37], where a counter limits the number of steps of a computation. Thus, divergence can be modelled on top of an inductive judgment: a program diverges if the timeout is raised for every value of the counter, hence divergence is not directly modelled in the definition. Instead, [20] provides a way to directly model divergence using definitional interpreters, relying on the coinductive partiality monad [16].

The trace semantics in Sect. 3.1 has been inspired by [29]. Divergence propagation rules are very similar to those used in [8,9] to define a big-step judgment which directly includes divergence as result. However, this direct definition relies on a non-standard notion of inference system, allowing *corules* [7,19], whereas for the trace semantics presented in this work standard coinduction is enough, since all rules are *productive*, that is, they always add an element to the trace.

Differently from all the previously cited papers, which consider specific examples, the work [2] shares with us the aim of providing a *generic construction* to model non-termination, based on an arbitrary big-step semantics. Ager considers a class of big-step semantics identified by a specific shape of rules and defines, in a small-step style, a proof-search algorithm which follows the big-step rules; in this way, converging, diverging, and stuck computations are distinguished. This approach is somehow similar to our pev semantics, even though the transition system we propose is directly defined on proof trees.

There is an extensive body of work on coalgebraic techniques, where the difference between semantics can be simply expressed by a change of functor. In this paper we take a set-theoretic approach, simple and accessible to a large audience. Furthermore, as far as we know [38], coalgebras abstract several kinds of transition systems, thus being more similar to a small-step approach. In our understanding, the coalgebra models a single computation step with possible effects, and from this it is possible to derive a unique morphism into the final coalgebra, modelling the "whole" semantics. Our trace semantics, being big-step, seems to roughly correspond to directly obtaining this whole semantics. In other words, we do not have a coalgebra structure on configurations.

*Proving soundness* As we have discussed, proving (type) soundness with respect to a big-step semantics is also a challenging task, and some approaches have been proposed in the literature. In [24], to show soundness of a big-step semantics, the authors prove a coverage lemma, which ensures that the rules cover all cases, including error situations. In [30] the authors prove a soundness property similar to Theorem 4, but by using a separate judgment to represent divergence, thus avoiding traces. In [5] there is a proof of soundness of a coinductive type system with respect to a coinductive big-step semantics for a Java-like language, defining a relation between derivations in the type system and in the big-step semantics. In [8] there is a proof principle, used to show type soundness with respect to a big-step semantics defined by an inference system with corules [7]. In [4] the proof of type soundness of a calculus formalising path-dependent types relies on a big-step semantics, while in [3] soundness is shown for the polymorphic type system F<: and for the DOT calculus, using definitional interpreters to model the semantics. In both cases they extend the original semantics by adding error and timeout, and adopt inductive proof strategies, as in [39]. A similar approach is followed by [32] to show type soundness of the Core ML language.

Also [6] proposes an inductive proof of type soundness for the big-step semantics of a Java-like language, but relying on a notion of approximation of infinite derivations in the big-step semantics.

Pretty big-step semantics [17] aims at providing an efficient representation of big-step semantics, so that it can be easily extended without duplication of meta-rules. In order to define and prove soundness, the authors propose a generic error rule based on a *progress judgment*, whose definition can easily be derived manually from the set of evaluation rules. This is partly similar to our wrong extension, with two main differences. First, by factorising rules, they introduce intermediate steps as in small-step semantics, hence there are similar problems when intermediate steps are ill-typed (as in Sect. 5.2 and Sect. 5.4). Second, wrong introduction is handled by the progress judgment, that is, at the level of side conditions. Moreover, [13] formalises the pretty-big-step rules to perform generic reasoning on big-step semantics by abstract interpretation. However, the authors state that they interpret rules inductively, hence non-terminating computations are not modelled.

Finally, some (but not all) infinite trees of our trace semantics can be seen as cyclic proof trees, see end of Sect. 3.1. Proof systems supporting cyclic proofs can be found, e.g., in [14,15] for classical first order logic with inductive definitions.

# **8 Conclusion and future work**

The most important contribution is a general approach for reasoning on soundness with respect to a big-step operational semantics. Conditions can be proven by a case analysis on the semantic (meta-)rules avoiding small-step-style intermediate configurations. This can be crucial since there are calculi where the property to be checked is *not preserved* by such intermediate configurations, whereas it holds for the final result, as illustrated in Sect. 5.

In future work, we plan to use the meta-theory in Sect. 2 as a basis to investigate yet other constructions, notably the approach relying on corules [8,9] and the one, adding a counter, based on timeouts [32,3].

We also plan to compare our proof technique for proving soundness with the standard one for small-step semantics: if a predicate satisfies progress and subject reduction with respect to a small-step semantics, does it satisfy our soundness conditions with respect to an equivalent big-step semantics? To formally prove such a statement, the first step will be to express equivalence between small-step and big-step semantics. On the other hand, the converse does not hold, as shown by the examples in Sect. 5.2 and Sect. 5.4.

As for significant applications, we plan to use the approach to prove soundness for the λ-calculus with full reduction and intersection/union types [10]. The interest of this example lies in the failure of subject reduction, as discussed in Sect. 5.4. In another direction, we want to enhance MiniFJ&O with λ-abstractions, allowing intersection and union types everywhere [23]. This will extend the typability of shared expressions. We also plan to apply our approach to the big-step semantics of the statically typed virtual classes calculus developed in [24], covering also the non-terminating computations not considered there.

Concerning the proofs, which are mostly omitted here and can be found in the extended version at http://arxiv.org/abs/2002.08738, we plan to investigate whether we can simplify them by means of enhanced coinductive techniques.

As a proof of concept, we provided a mechanisation<sup>6</sup> in Agda of Lemma 1. The mechanisation of the other proofs is similar. However, as future work, we think it would be more interesting to provide a tool for writing big-step definitions and for checking that the soundness conditions hold.

*Acknowledgments* The authors are grateful to the referees: the paper strongly improved thanks to their useful suggestions and remarks.

<sup>6</sup> Available at https://github.com/fdgn/soundness-big-step-semantics.

# **References**


Related Methods, International Conference, TABLEAUX 2005, volume 3702 of Lecture Notes in Computer Science, pages 78–92. Springer, 2005. doi:10.1007/11554554_8.



# **Liberate Abstract Garbage Collection from the Stack by Decomposing the Heap**

Kimball Germane<sup>1</sup> and Michael D. Adams<sup>2</sup>

<sup>1</sup> Brigham Young University, Provo UT, USA kimball@cs.byu.edu

<sup>2</sup> University of Michigan, Ann Arbor MI, USA adamsmda@umich.edu

**Abstract.** Abstract garbage collection and the use of pushdown systems each enhance the precision of control-flow analysis (CFA). However, their respective needs conflict: abstract garbage collection requires the stack but pushdown systems obscure it. Though several existing techniques address this conflict, none take full advantage of the underlying interplay. In this paper, we dissolve this conflict with a technique which exploits the precision of pushdown systems to decompose the heap across the continuation. This technique liberates abstract garbage collection from the stack, increasing its effectiveness and the compositionality of its host analysis. We generalize our approach to apply compositional treatment to abstract timestamps which induces the context abstraction of m-CFA, an abstraction more precise than k-CFA's for many common programming patterns.

**Keywords:** Control-Flow Analysis · Abstract Garbage Collection · Pushdown Systems

# **1 Introduction**

Among the many enhancements available to improve the precision of control-flow analysis (CFA), abstract garbage collection and pushdown models of control flow stand out as particularly effective ones. But their combination is non-trivial.

Abstract garbage collection (GC) [10] is the result of applying standard GC, which calculates the heap data reachable from a root set derived from a given environment and continuation, to an abstract semantics. Though it operates in the same way as concrete GC, abstract GC has a different effect on the semantics to which it's applied. Concrete GC is semantically irrelevant in that it has no effect on a program's observable behavior.<sup>3</sup> Abstract GC, on the other hand, is semantically relevant in that, by eliminating some merging in the abstract heap, it prevents a utilizing CFA from conflating some distinct heap data. In the setting of a higher-order language, where data can represent control, this superior approximation of data translates to a superior approximation of control as well, manifest in the CFA exploring fewer infeasible execution paths.
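The root-set reachability at the core of (abstract) GC can be sketched as follows; this is our own minimal Python illustration, abstracting each heap value to the set of addresses it mentions (for a closure, the addresses in its environment):

```python
def gc(heap, roots):
    """Return the sub-heap of bindings reachable from `roots`.

    `heap` maps each address to the set of addresses its value mentions."""
    reachable, frontier = set(), set(roots)
    while frontier:
        a = frontier.pop()
        if a in reachable or a not in heap:
            continue
        reachable.add(a)
        frontier |= heap[a]          # follow the addresses the value mentions
    return {a: heap[a] for a in reachable}

heap = {'x': {'y'}, 'y': set(), 'z': {'x'}}
collected = gc(heap, {'x'})
# collected == {'x': {'y'}, 'y': set()}: the binding for 'z' is dropped
```

In the abstract setting, dropping unreachable bindings in this way is what prevents the analysis from merging values at stale addresses.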

Pushdown models of control flow [16, 3] encode the call–return relation of a program's flow of execution as precisely as an unbounded control stack would

<sup>3</sup> It is irrelevant only if space consumption is unobservable, as is typical.

allow. Consequently, and in contrast to the finite-state models which preceded them, pushdown models enable a utilizing CFA—a stack-precise CFA—to avoid relating a given return to any but its originating call. Thus, pushdown models also induce CFAs which explore fewer infeasible execution paths.

Not only do abstract GC and pushdown systems each enhance the control precision of CFA, they also appear to do so in complementary ways. Is it possible for a CFA to use both and gain the benefits of each? This question's answer is not immediate, as these techniques have competing requirements: abstract GC must examine the stack to extract the root set of reachability but the use of pushdown models obscures the control stack to the abstract semantics.

This question has been addressed by two techniques. The first, an introspective technique [4], introduces a primitive operation into the analyzing machine which introspects the stack and delivers the set of frames which may be live; this technique has a variety of alternative formulations, some of which alter its complexity–precision profile [8, 7]. The second technique [1], which modifies the first to work with definitional interpreters, dictates that the analyzer implement a set-passing-style abstract semantics where each passed set contains the heap addresses present in the continuation at that point. Each of these techniques reconciles the competing requirements of abstract GC and pushdown models of control flow and allows the utilizing CFA to enjoy the precision-enhancing benefits of both at once.

However, each of these techniques—hereafter referred to collectively as pushdown GC—yields a setting in which abstract GC and pushdown models of control flow merely coexist. In contrast, this paper prescribes a technique which exploits the pushdown model of control flow to enable a new mode of garbage collection—compositional garbage collection—which does not require the ability to inspect the continuation.

The key observation is that, in a stack-precise CFA, the heap present at the point of a call is in scope at the point of its return. Thus, the analysis can offload some of the contents of the callee's heap to the caller's—in particular, the data irrelevant to the callee's execution. When this offloading is performed, the final heap of the callee (just as it returns) is incomplete with respect to subsequent execution. But, since the caller's heap is in scope at this point, the analysis can reconstitute the subsequent heap by combining the caller's heap with the callee's final heap.

The data relevant to the callee's execution is the data reachable from its local environment and excludes the data reachable from its continuation alone. Offloading heap data, then, consists of GC-ing each callee's heap with respect to its local environment only. When one applies this practice consistently to all calls, one associates with each active call not a heap but a heap fragment, effectively decomposing the heap across the continuation. As we will show, careful separation and combination of these heap fragments can perfectly simulate the presence of the full heap.
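The offload/recombine discipline described above can be sketched as follows; this is a Python illustration under our own simplified heap model (a heap maps addresses to the sets of addresses their values mention), and `offload` and `reconstitute` are hypothetical names, not the paper's:

```python
def reachable(heap, roots):
    """Addresses reachable from `roots` through the heap."""
    seen, frontier = set(), set(roots)
    while frontier:
        a = frontier.pop()
        if a in seen or a not in heap:
            continue
        seen.add(a)
        frontier |= heap[a]
    return seen

def offload(heap, local_roots):
    """Split a heap into the callee's fragment (reachable from the callee's
    local environment) and the caller's remainder."""
    live = reachable(heap, local_roots)
    callee = {a: v for a, v in heap.items() if a in live}
    caller = {a: v for a, v in heap.items() if a not in live}
    return callee, caller

def reconstitute(caller_fragment, callee_final_heap):
    """Recombine the caller's fragment with the callee's final heap."""
    return {**caller_fragment, **callee_final_heap}

heap = {'f': set(), 'arg': set(), 'y': {'f'}}
callee, caller = offload(heap, {'f', 'arg'})
# the callee runs on {'f', 'arg'} only; 'y' waits in the caller's fragment
full = reconstitute(caller, {**callee, 'ret': set()})
```

The split-then-recombine round trip simulates threading the full heap, which is the sense in which the decomposition loses nothing.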

This liberation of GC from the continuation has several consequences for the host CFA.


In sum, relative to pushdown GC, compositional GC offers the host CFA quantitative benefits, being strictly more powerful, as well as qualitative ones.

#### **1.1 Examples**

Let's look at an example where compositional GC makes memoization more effective. Consider the following Scheme program

```
(let* ([id (lambda (x) x)]
       [y (id 42)]
       [z (id y)])
  (+ y z))
```
which calls id twice, each time on 42.

We would hope that a CFA would be able to memoize its analysis of the first call and, upon recognizing that the second call is semantically identical, reuse its results. However, contemporary CFAs will not, because each call is made with a different heap: the second call's heap includes a binding for y that the first's doesn't. Moreover, this distinction persists even with pushdown GC, since y's binding is needed to continue execution after the call. Since CFAs have no means but reachability to determine what is relevant to a given execution point, and since what is relevant constitutes a memoization key, pushdown GC is too weak to identify these two calls.

In contrast, a CFA with compositional GC produces a heap fragment for each call which is closed over only data reachable from the local environment for a call, the procedure and argument values themselves. Accordingly, from its perspective, these two calls are identical and specify a single memoization key.
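This memoization argument can be made concrete with a small sketch (ours; `memo_key` is a hypothetical helper, and for brevity the fragment is cut at the root set rather than computed by full transitive reachability):

```python
def memo_key(op, arg, heap, local_roots):
    """Key a call by its operator, argument, and locally reachable heap.

    With compositional GC, only the heap fragment reachable from the call's
    local environment participates in the key."""
    fragment = frozenset((a, v) for a, v in heap.items() if a in local_roots)
    return (op, arg, fragment)

# first call to id: only id's own binding is locally reachable
k1 = memo_key('id', 42, {'id': 'closure'}, {'id'})
# second call: the threaded heap now also binds y, but y is irrelevant
# to the callee, so it is offloaded and does not enter the key
k2 = memo_key('id', 42, {'id': 'closure', 'y': 42}, {'id'})
# k1 == k2: the analysis can reuse the first call's result
```

With pushdown GC the full heap (including y's binding) would remain in the key, and the two calls would be distinct.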

Now let's look at an example where compositional GC keeps co-live bindings of the same variable distinct. Consider the following Scheme program

```
(letrec ([f (lambda (x)
               (if (prime? x)
                 (let ([y (f (+ x 1))])
                   (+ x y))
                 x))])
  (f 2))
```
which defines and calls a recursive procedure f.

Concrete evaluation of this program calls f first with 2, then with 3, and then with 4, returning 4, then 3 + 4 = 7, and then 2 + 7 = 9. The procedure f is properly recursive (so these calls are nested) and, after f is called with 4 but before it returns, three distinct bindings of x are live. Moreover, since each binding of x is needed until its binding call returns, each is continuation-reachable and therefore not claimed by GC. These facts and limitations translate to the analysis setting: a CFA will discover multiple co-live bindings of x which persist in the face of pushdown GC. Consequently, even with pushdown GC, a CFA will in general join these bindings to some degree, concluding that x can be 2 whenever it can be 3 and can be 3 whenever it can be 4.

In contrast, just before a CFA with compositional GC performs each call to f, it GCs with respect to the operator and argument values, which, in each case, consist of the closure of f (which reaches only itself in the heap) and a number (which doesn't reach anything). Thus, each binding of x is the first in its respective heap fragment and doesn't interfere with the live bindings of x in other heap fragments. Using a numeric abstraction in which arithmetic operations propagate but do not introduce approximation [1], a CFA with compositional GC will produce an exact answer (whereas one with pushdown GC will not).

#### **1.2 Generalizing the Approach**

The conventional treatment of the heap by CFA is to thread it through execution, allowing it to evolve as it goes. In contrast, compositional GC advocates that the CFA treat the heap with the same discipline that it treats the environment: saved at the evaluation of a subexpression and restored when its evaluation completes and its value is delivered. That is, compositional GC is achieved by, in effect, treating the heap compositionally.

What happens if we impose the same compositional discipline on other threaded components, such as the timestamp? In that case, we move from the last-k-call-sites<sup>4</sup> context abstraction of k-CFA [14] to the top-m-stack-frames<sup>5</sup> context abstraction of m-CFA [11]. This appearance of m-CFA's abstraction in a stack-precise CFA is, to our knowledge, the first.

With compositional treatment of both the heap and timestamp, we arrive at a stack-precise CFA which treats each of its components compositionally. This treatment also leads to a CFA closer to being compositional in the sense that the analysis of a compound expression is a function of the analyses of its constituent parts. Accordingly, we refer to such a stack-precise CFA as a compositional control-flow analysis.

The remainder of the paper is as follows. We first introduce the syntax of the language we will use throughout the paper in Section 2. We then discuss the enhancements of perfect stack precision, garbage collection, and their combination in Section 3. We then proceed through a series of semantics which transition

<sup>4</sup> as in, most-recent <sup>k</sup> call sites <sup>5</sup> as in, youngest <sup>m</sup> stack frames

from a threaded heap to a compositional, garbage-collected heap in Section 4. We then abstract the compositional semantics to obtain our CFA in Section 5. We discuss the ramifications of the compositional treatment of each of the heap and abstract time in Section 6. We finally discuss related work in Section 7 and conclusions and future work in Section 8.

**Note** In the remainder of the paper, we use the standard term *store* to refer to the analysis component which models the heap. Thus, we will describe our technique as, e.g., treating stores compositionally.

# **2 A-Normal Form** *λ***-Calculus**

For presentation, we keep the language small: we use a unary λ-calculus in A-normal form [5], the grammar of which is given below.

$$\begin{aligned} Exp &\ni e ::= ce \mid \mathsf{let}\, x = ce \, \mathsf{in}\, e\\ CExp &\ni ce ::= ae \mid (ae\_0 \, ae\_1) \mid \mathsf{set!}\, x \, ae\\ AExp &\ni ae ::= x \mid \lambda x.e\\ Var &\ni x \quad \text{[an infinite set of variables]} \end{aligned}$$

A proper expression e is a call expression ce or a let-expression, which binds a variable to the result of a call expression. (Restricting the bound expression to a call expression prevents let-expressions from nesting there, a hallmark of A-normal form.) A call expression ce is an atomic expression ae, an application, or a set!-expression. An atomic expression ae is a variable reference or a λ abstraction.

Atomic expressions are trivial [13]. We include set!-expressions to produce mutative effects that must be threaded through evaluation. (The approach we present in this paper can also handle more-general forms of mutation, such as boxes.) For our purposes, we consider a set!-expression "serious" [13] since it has an effect on the store.

A program is a closed expression; we assume (without loss of generality) that programs are alphatised—that is, that each bound variable has a distinct name.

Expressions of the form (ae₀ ae₁) for some ae₀ and ae₁ constitute the set App; similarly, expressions of the form λx.e for some x and e constitute the set Lam.
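As a concrete reference point, the grammar above can be encoded directly as algebraic data. The following Python sketch mirrors the syntactic categories; the class names (Ref, Lam, App, SetBang, Let) and the encoding are ours, not the paper's.

```python
# A sketch of the ANF grammar as frozen dataclasses. The injections
# AExp ⊆ CExp ⊆ Exp of the grammar are modelled with typing.Union.
from dataclasses import dataclass
from typing import Union

# AExp ::= x | λx.e
@dataclass(frozen=True)
class Ref:
    var: str                 # a variable reference x

@dataclass(frozen=True)
class Lam:
    param: str               # the bound variable x
    body: "Exp"              # the abstraction body e

AExp = Union[Ref, Lam]

# CExp ::= ae | (ae0 ae1) | set! x ae
@dataclass(frozen=True)
class App:
    op: AExp                 # operator ae0
    arg: AExp                # argument ae1

@dataclass(frozen=True)
class SetBang:
    var: str                 # the mutated variable x
    arg: AExp                # the atomic right-hand side ae

CExp = Union[AExp, App, SetBang]

# Exp ::= ce | let x = ce in e
@dataclass(frozen=True)
class Let:
    var: str
    bound: CExp              # the bound call expression ce (no nested lets)
    body: "Exp"

Exp = Union[CExp, Let]

# The alphatised program: let x = ((λy.y) (λz.z)) in x
prog = Let("x", App(Lam("y", Ref("y")), Lam("z", Ref("z"))), Ref("x"))
```

Because the bound expression of a Let is a CExp, let-expressions cannot nest there, exactly as the grammar's hallmark restriction requires.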

# **3 Background**

In this section, we review abstract garbage collection and the k-CFA context abstraction. We begin by introducing a small-step concrete semantics which defines the ground truth of evaluation.

#### **3.1 Semantic Domains**

First, we introduce some semantic components that we will use heavily throughout the rest of the paper.

$$\begin{aligned} v \in Val &= Lam \times Env & \rho \in Env &= Var \rightharpoonup Time \\ t \in Time &= App^{*} & a \in Address &= Var \times Time \\ \sigma \in Store &= Address \rightharpoonup Val & \kappa \in Cont &::= \mathsf{mt} \mid \mathsf{lt}(x, \rho, e, \kappa) \end{aligned}$$

A value v is a closure, a pair of a λ abstraction and an environment which closes it. An environment ρ is a finite map from each variable x to a time t; a time t is a finite sequence of call sites. Let ρ|<sup>e</sup> denote the restriction of the domain of the environment ρ to the free variables of e. An address a is a pair of a variable and a time, and a store σ is a map from addresses to values. A continuation κ is either the empty continuation mt or the continuation lt(x, ρ, e, κ) of a let binding.

#### **3.2 Concrete Semantics**

We define our concrete semantics as a small-step relation over abstract machine states. The state space of our machine is given formally as follows.

$$\begin{aligned} \varsigma &\in State = Eval + Apply \\ \varsigma_{\mathsf{ev}} &\in Eval = Exp \times Env \times Store \times Cont \times Time \\ \varsigma_{\mathsf{ap}} &\in Apply = Val \times Store \times Cont \times Time \end{aligned}$$

Machine states come in two variants. An Eval machine state represents a point in execution in which an expression will be evaluated; it contains registers for an expression e, its closing environment ρ, the store σ (modelling the heap), the continuation κ (modelling the stack), and the time t. An Apply machine state represents a point in execution at which a value is in hand and must be delivered to the continuation; it contains registers for the value v to deliver, the store σ, the continuation κ, and the time t.

Figure 1 contains the definitions of two relations over machine states, the union of which constitutes the small-step relation. The →ev relation transitions an Eval state to its successor. The Let rule pushes a continuation frame to save the bound variable, environment, and body expression. The resultant Eval state is poised to evaluate the bound expression ce. The Call rule first uses aeval defined

$$\mathsf{aeval}(\sigma,\rho,x) = \sigma(x,\rho(x)) \qquad \qquad \mathsf{aeval}(\sigma,\rho,\lambda x.e) = (\lambda x.e,\rho|_{\lambda x.e})$$

to obtain values for each of the operator and argument. It then increments the time, extends the store and environment with the incremented time, and arranges evaluation of the operator body at the incremented time. The Set! rule remaps a location in the store designated by a given variable (which is resolved in the environment) to a value obtained by aeval. It returns the identity function.

$$\begin{array}{c}
\textsc{Let} \\
\hline
\mathsf{ev}(\mathsf{let}\ x = ce\ \mathsf{in}\ e, \rho, \sigma, \kappa, t) \to_{\mathsf{ev}} \mathsf{ev}(ce, \rho, \sigma, \mathsf{lt}(x, \rho, e, \kappa), t) \\[10pt]
\textsc{Call} \\
(\lambda x.e, \rho') = \mathsf{aeval}(\sigma, \rho, ae_{0}) \qquad v = \mathsf{aeval}(\sigma, \rho, ae_{1}) \qquad t' = (ae_{0}\ ae_{1}) :: t \\
\sigma' = \sigma[(x, t') \mapsto v] \qquad \rho'' = \rho'[x \mapsto t'] \\
\hline
\mathsf{ev}((ae_{0}\ ae_{1}), \rho, \sigma, \kappa, t) \to_{\mathsf{ev}} \mathsf{ev}(e, \rho'', \sigma', \kappa, t') \\[10pt]
\textsc{Set!} \\
v = \mathsf{aeval}(\sigma, \rho, ae) \qquad a = (x, \rho(x)) \qquad \sigma' = \sigma[a \mapsto v] \\
\hline
\mathsf{ev}(\mathsf{set!}\ x\ ae, \rho, \sigma, \kappa, t) \to_{\mathsf{ev}} \mathsf{ap}((\lambda x.x, \bot), \sigma', \kappa, t) \\[10pt]
\textsc{Atomic} \\
v = \mathsf{aeval}(\sigma, \rho, ae) \\
\hline
\mathsf{ev}(ae, \rho, \sigma, \kappa, t) \to_{\mathsf{ev}} \mathsf{ap}(v, \sigma, \kappa, t) \\[10pt]
\textsc{Apply} \\
\rho' = \rho[x \mapsto t] \qquad \sigma' = \sigma[(x, t) \mapsto v] \\
\hline
\mathsf{ap}(v, \sigma, \mathsf{lt}(x, \rho, e, \kappa), t) \to_{\mathsf{ap}} \mathsf{ev}(e, \rho', \sigma', \kappa, t)
\end{array}$$

**Fig. 1.** The small-step concrete semantics

The Atomic rule evaluates an atomic expression. The Apply rule applies a continuation to a value, extending the environment and store and arranging for the evaluation of the let body.

We inject a program pr into the initial evaluation state ev(pr, ⊥, ⊥, mt, ε), which arranges evaluation in the empty environment, empty store, halt continuation, and empty time.
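The machine just described can be sketched as a small step function over tagged tuples. This is an illustrative encoding under our own conventions (tuple tags, dict-based environments and stores, times as tuples of call sites), not the paper's artifact; the restriction of closure environments to free variables is elided for brevity.

```python
# A runnable sketch of the Eval/Apply machine: expressions are tagged tuples
# ("ref", x), ("lam", x, e), ("app", ae0, ae1), ("set", x, ae), ("let", x, ce, e).

def aeval(sigma, rho, ae):
    """Atomic evaluation: variable lookup or closure creation."""
    if ae[0] == "ref":
        x = ae[1]
        return sigma[(x, rho[x])]        # aeval(σ, ρ, x) = σ(x, ρ(x))
    if ae[0] == "lam":
        return ("clo", ae, dict(rho))    # close over the environment
    raise ValueError("not atomic")

def step(state):
    """One transition of the union relation ->ev ∪ ->ap (None at halt)."""
    if state[0] == "ev":
        _, e, rho, sigma, kappa, t = state
        if e[0] == "let":                # Let: push a continuation frame
            _, x, ce, body = e
            return ("ev", ce, rho, sigma, ("lt", x, rho, body, kappa), t)
        if e[0] == "app":                # Call: bind argument at new time t'
            _, ae0, ae1 = e
            _, (_, x, body), rho1 = aeval(sigma, rho, ae0)
            v = aeval(sigma, rho, ae1)
            t1 = (e,) + t                # t' = (ae0 ae1) :: t
            sigma1 = dict(sigma); sigma1[(x, t1)] = v
            rho2 = dict(rho1); rho2[x] = t1
            return ("ev", body, rho2, sigma1, kappa, t1)
        if e[0] == "set":                # Set!: mutate, return the identity
            _, x, ae = e
            v = aeval(sigma, rho, ae)
            sigma1 = dict(sigma); sigma1[(x, rho[x])] = v
            return ("ap", ("clo", ("lam", "x", ("ref", "x")), {}), sigma1, kappa, t)
        return ("ap", aeval(sigma, rho, e), sigma, kappa, t)   # Atomic
    else:                                # Apply: deliver value to the frame
        _, v, sigma, kappa, t = state
        if kappa[0] == "mt":
            return None                  # halt
        _, x, rho, body, kappa1 = kappa
        rho1 = dict(rho); rho1[x] = t
        sigma1 = dict(sigma); sigma1[(x, t)] = v
        return ("ev", body, rho1, sigma1, kappa1, t)

def run(pr):
    state = ("ev", pr, {}, {}, ("mt",), ())   # inject: ev(pr, ⊥, ⊥, mt, ε)
    while True:
        nxt = step(state)
        if nxt is None:
            return state[1]              # the delivered final value
        state = nxt

# Example: let x = ((λy.y) (λz.z)) in x
pr = ("let", "x",
      ("app", ("lam", "y", ("ref", "y")), ("lam", "z", ("ref", "z"))),
      ("ref", "x"))
```

Running `run(pr)` drives the machine from the injected state to a final Apply state whose value is the closure over λz.z.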

**Adding Garbage Collection** At this point, we have a small-step relation defining execution by abstract machine and are perfectly positioned to apply, e.g., the Abstracting Abstract Machines (AAM) [15] recipe to abstract the semantics and thereby obtain a sound, computable CFA. Before doing so, however, we will extend our semantics to garbage-collect the store on each transition. This extension has no semantic effect in the concrete semantics but, as we will discuss, greatly increases the precision of the abstracted (or, simply, abstract) semantics.

We extend the semantics by defining two garbage collection transitions, one which collects an Eval state and one which collects an Apply state. Because our abstract machine explicitly models local environments, heaps (via stores), and stacks (via continuations), we can apply a copying collector to perform garbage collection.

First, we define a family root of metafunctions to extract the reachability root set from values, environments, and continuations.

$$\begin{aligned} \mathsf{root}\_v(\lambda x.e,\rho) &= \mathsf{root}\_\rho(\rho) & \mathsf{root}\_\kappa(\mathsf{mt}) &= \emptyset \\ \mathsf{root}\_\rho(\rho) &= \rho & \mathsf{root}\_\kappa(\mathsf{lt}(x,\rho,e,\kappa)) &= \mathsf{root}\_\rho(\rho|\_e) \cup \mathsf{root}\_\kappa(\kappa) \end{aligned}$$

The root<sup>v</sup> metafunction extracts the root addresses from a closure by using root<sup>ρ</sup> to extract the root addresses from its environment. By the root<sup>ρ</sup> metafunction, the root addresses of an environment are simply the variable–time pairs that define it—that is, the definition of root<sup>ρ</sup> views its argument ρ extensionally as a set of addresses. The root<sup>κ</sup> metafunction extracts the root addresses from a continuation. The empty continuation has no root addresses whereas the root addresses of a non-empty continuation are those of its stored environment (restricted to the free variables of the expression it closes) combined with those of the continuation it extends.

Next, we define a reachability relation →<sup>σ</sup> over addresses, parameterized by a store σ, by

$$a\_0 \to\_{\sigma} a\_1 \Leftrightarrow a\_1 \in \mathsf{root}\_v(\sigma(a\_0))$$

We then define the reachability of a root set with respect to a store

$$\mathcal{R}(\sigma, A) = \{a' : a \in A, a \to\_{\sigma}^\* a'\}$$

where $\to_{\sigma}^{*}$ is the reflexive, transitive closure of $\to_{\sigma}$. From here, we obtain the transitions

$$\begin{array}{c}
\textsc{GC-Eval} \\
A = \mathsf{root}_{\rho}(\rho|_{e}) \cup \mathsf{root}_{\kappa}(\kappa) \qquad \sigma' = \sigma|_{\mathcal{R}(\sigma,A)} \\
\hline
\mathsf{ev}(e,\rho,\sigma,\kappa,t) \to_{\mathsf{GC}} \mathsf{ev}(e,\rho,\sigma',\kappa,t) \\[10pt]
\textsc{GC-Apply} \\
A = \mathsf{root}_{v}(v) \cup \mathsf{root}_{\kappa}(\kappa) \qquad \sigma' = \sigma|_{\mathcal{R}(\sigma,A)} \\
\hline
\mathsf{ap}(v,\sigma,\kappa,t) \to_{\mathsf{GC}} \mathsf{ap}(v,\sigma',\kappa,t)
\end{array}$$

where σ|R(σ,A) is σ restricted to the reachable addresses R(σ, A). We compose this garbage-collecting transition with each of →ev and →ap. Altogether, the garbage-collecting semantics are given by →GC ◦[→ev ∪ →ap].
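The restriction σ|R(σ,A) amounts to a standard worklist reachability computation over the store. The sketch below assumes our own dict/tuple encodings (closures as ("clo", lam, env) triples, environments as dicts from variables to times); it is illustrative, not the paper's artifact.

```python
# Compute σ|_R(σ,A): roots, reachable closure under ->σ, then restriction.

def roots_of_env(rho):
    """root_ρ: an environment, viewed extensionally, is a set of addresses."""
    return {(x, t) for x, t in rho.items()}

def roots_of_value(v):
    """root_v: the roots of a closure are the roots of its environment."""
    _, _lam, rho = v
    return roots_of_env(rho)

def reachable(sigma, roots):
    """R(σ, A): reflexive, transitive closure of a0 ->σ a1."""
    seen, work = set(), list(roots)
    while work:
        a = work.pop()
        if a in seen:
            continue
        seen.add(a)
        if a in sigma:                   # a0 ->σ a1 iff a1 ∈ root_v(σ(a0))
            work.extend(roots_of_value(sigma[a]))
    return seen

def gc(sigma, roots):
    """σ restricted to the reachable addresses R(σ, A)."""
    live = reachable(sigma, roots)
    return {a: v for a, v in sigma.items() if a in live}

# A tiny store: ("x", ()) reaches ("y", ((),)); ("z", ()) is garbage.
sigma = {
    ("x", ()): ("clo", ("lam", "y", ("ref", "y")), {"y": ((),)}),
    ("y", ((),)): ("clo", ("lam", "z", ("ref", "z")), {}),
    ("z", ()): ("clo", ("lam", "w", ("ref", "w")), {}),
}
live_store = gc(sigma, {("x", ())})      # the ("z", ()) entry is reaped
```

The same `gc` serves both GC-Eval and GC-Apply; only the root set A differs between the two rules.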

#### **3.3 Abstracting Abstract Machines with Garbage Collection**

Now that we have a small-step abstract machine semantics with GC, we are ready to apply the AAM recipe to obtain a sound, computable CFA with GC.

We apply the AAM recipe in two steps.

First, we refactor the state space so that all inductively-defined components are redirected through the store. Practically, this refactoring has the effect of allocating continuations in the store. For our semantics, this refactoring yields the state space StateSA defined

$$\begin{aligned} State_{SA} &= Eval_{SA} + Apply_{SA} \\ Eval_{SA} &= Exp \times Env \times Store_{SA} \times ContAddr \times Time \\ Apply_{SA} &= Store_{SA} \times ContAddr \times Val \times Time \end{aligned}$$

in which a continuation address α ∈ ContAddr replaces the continuation drawn from Cont. The space of continuations becomes defined by

$$\kappa_{SA} \in Cont_{SA} ::= \mathsf{mt} \mid \mathsf{lt}(x, \rho, e, \alpha)$$

and of stores by

$$Store_{SA} = (Address + ContAddr) \rightharpoonup (Val + Cont_{SA})$$

Not reflected in this structure is the typical constraint that an address a will only ever locate a value and a continuation address α will only ever locate a continuation.

Second, we finitely partition the unbounded address space of the store and treat the constituent sets as abstract addresses (via some finite representative). Practically, this partitioning is achieved by limiting the time t to at most k call sites, where k becomes a parameter of the CFA (leading to the designation k-CFA). Any addresses which agree on the k-length prefix of their time component are identified, and the finite representative for this set of addresses is simply that prefix. Accordingly, we define an abstract time domain $\widehat{Time} = Time^{\leq k}$ and let it reverberate through the state space definitions, obtaining

$$\begin{aligned} \widehat{State} &= \widehat{Eval} + \widehat{Apply} \\ \widehat{Eval} &= Exp \times \widehat{Env} \times \widehat{Store} \times \widehat{ContAddr} \times \widehat{Time} \\ \widehat{Apply} &= \widehat{Store} \times \widehat{ContAddr} \times \widehat{Val} \times \widehat{Time} \end{aligned}$$

(in which we allow the definition of ContAddr to depend, directly or not, on that of Time).
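The truncation to a k-length prefix, and the address conflation it induces, can be seen in a few lines. A minimal sketch, with hypothetical call-site names standing in for application expressions:

```python
# The ⌊·⌋_k abstraction on times: keep only the k most-recent call sites.

def truncate(t, k):
    """⌊t⌋_k: the (at-most-)k-length prefix of a time."""
    return t[:k]

# Two concrete times that agree on their most-recent call only:
t1 = ("call_a", "call_b")
t2 = ("call_a", "call_c")
k = 1
assert truncate(t1, k) == truncate(t2, k) == ("call_a",)
# Hence the concrete addresses ("x", t1) and ("x", t2) are identified
# under the single abstract address ("x", ("call_a",)).
```

With k = 0, every time truncates to the empty tuple and each variable has exactly one abstract address, which is the 0CFA situation discussed below.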

Finitization of the address space is key to producing a computable CFA. Practically, however, it means that some values previously located by distinct addresses will afterward be located by the same abstract address. When this conflation occurs, the CFA must behave as if either access was intended; this behavior is manifested by non-deterministically choosing the value located by a particular address. Because our language is higher-order, this non-determinism also affects the control flows the CFA considers. This effect is evident in the Call rule defined

$$\begin{array}{c}
\textsc{Call} \\
(\lambda x.e, \hat{\rho}') \in \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, ae_{0}) \qquad \hat{v} = \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, ae_{1}) \qquad \hat{t}' = \lfloor (ae_{0}\ ae_{1}) :: \hat{t} \rfloor_{k} \\
\hat{\sigma}' = \hat{\sigma}[(x, \hat{t}') \mapsto \hat{v}] \qquad \hat{\rho}'' = \hat{\rho}'[x \mapsto \hat{t}'] \\
\hline
\mathsf{ev}((ae_{0}\ ae_{1}), \hat{\rho}, \hat{\sigma}, \hat{\alpha}, \hat{t}) \to_{\mathsf{ev}} \mathsf{ev}(e, \hat{\rho}'', \hat{\sigma}', \hat{\alpha}, \hat{t}')
\end{array}$$

which is structurally identical to that of the concrete semantics except in two respects: the operator's closure is drawn non-deterministically from a set of abstract values (note the ∈ in place of =), and the derived time is truncated to its (at-most-)k-length prefix by ⌊·⌋<sup>k</sup>.


In short, a finite address space introduces a value approximation and, in a higherorder language such as ours, a control approximation as well.

While the strategy to store-allocate continuations facilitates the systematic abstraction process of AAM, it also imposes a similar approximation on the continuation space as it does the value space. In consequence, a CFA obtained by AAM approximates not only the value and control flow of the program, but the return flow as well. Return-flow approximation is manifest as a single abstract call returning to caller contexts that did not make that call.<sup>6</sup>

On the other hand, because the AAM abstraction process preserves the overall structure of the state space—in particular, the explicit models of the local environment, heap, and stack—applying GC to an abstract state is straightforward. In addition, GC in the abstract semantics improves precision and reduces the workload of the analyzer [10].

To see how GC improves precision, consider a 0CFA (that is, [k = 0]CFA) without GC of the Scheme program

```
(let* ([id (lambda (x) x)]
       [y (id 42)]
       [z (id 35)])
  z)
```
at the call (id 42). As the abstract call is made, the abstract value 42 is stored at an address a derived from x. Once the call returns, the abstract value 42 still resides in the heap at a, which is now unreachable. However, as the abstract call (id 35) is made, the address a is derived again (a consequence of the finite address space), and the abstract value 35 is merged with the abstract value 42 which persists at a. Since the value at a is returned and becomes the result of the program, the CFA reports that the program results in either 42 or 35.

Now consider a 0CFA with GC of the same program. Once the call (id 42) returns and a becomes unreachable, its heap entry is reaped by GC. The abstract call (id 35) then allocates the abstract value 35 at a which is, from the allocator's perspective, a fresh heap cell. Consequently, the CFA precisely reports that the program results in 35.

The above example also illustrates how GC reduces the workload of the analyzer. Though we didn't call it out, when using a naive continuation allocator without GC, the abstract call (id 35) not only correctly returns to the continuation binding z but also spuriously returns to the continuation binding y. In this example, this spurious control (return) flow does no more damage to the precision of 0CFA's approximation of the final program result, but does cause it to explore infeasible control flows which damage the precision of the 0CFA's approximation of intermediate values. GC prevents the spurious flows in this example from arising at all; however, in general, it does not prevent all spurious return flows.
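The merging behaviour in this example can be replayed by hand with an abstract store that maps addresses to *sets* of values, where updates join rather than overwrite. The trace below is scripted manually (no analyzer is implemented here); `join` and the address encoding are our own illustrative choices.

```python
# Abstract stores as dicts from address to set-of-abstract-values.

def join(store, addr, val):
    """Abstract store update: join val into the flow set at addr."""
    out = {a: set(vs) for a, vs in store.items()}
    out.setdefault(addr, set()).add(val)
    return out

a = ("x", ())                  # k = 0: every call to id shares the empty time

# Without GC: the binding from (id 42) persists when (id 35) reuses address a.
s = join({}, a, 42)            # abstract call (id 42)
s = join(s, a, 35)             # abstract call (id 35) merges into the same cell
assert s[a] == {42, 35}        # the CFA reports 42-or-35

# With GC: a is unreachable once (id 42) returns, so its entry is reaped
# before (id 35) allocates into what is now a fresh cell.
s = join({}, a, 42)
s = {}                         # abstract GC reaps the dead binding at a
s = join(s, a, 35)
assert s[a] == {35}            # the CFA precisely reports 35
```

The same mechanism is what lets GC suppress the spurious return flow: with the stale binding gone, the conflated address carries only values from feasible paths.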

<sup>6</sup> P4F [6] uses a particular continuation allocator which is able to avoid return-flow approximation. However, the P4F technique applies only when the store is globally widened and, in such a setting, no data ever becomes unreachable, which renders GC completely ineffective.

#### **3.4 Stack-Precise CFA with Garbage Collection**

In contrast to an AAM-derived analysis, a stack-precise CFA does not approximate the return flow of the program. A stack-precise CFA achieves this feat by modelling control flow with a pushdown system, which allows it to precisely match returns with their corresponding calls. However, to do so, it requires full control of the continuation, which we accommodate by factoring the continuation out of the state space, obtaining

$$\begin{aligned} State_{PD} &= Eval_{PD} + Apply_{PD} \\ Eval_{PD} &= Exp \times Env \times Store \times Time \\ Apply_{PD} &= Val \times Store \times Time \end{aligned}$$

before we abstract it to produce a CFA. (Some CFAs factor the store out of machine states to be managed globally, part of widening the store. In a sense, factoring out the continuation is part of widening the continuation.) Without a continuation component, an EvalPD state is an evaluation configuration and an ApplyPD state is an evaluation result. Except for the presence of the time component, StatePD exhibits precisely the configuration and result shapes one finds in many stack-precise CFAs [17, 8, 1, 18].

However, factoring the continuation out and ceding control of it to the analysis presents an obstacle to abstract GC, which needs to extract the root set of reachable addresses from it. Earl et al. [4] developed a technique whereby the analysis could introspect the continuation and extract the root set of reachable addresses from it. Johnson and Van Horn [8] reformulated this incomplete technique for an operational setting and offered a complete—albeit theoretically more-expensive—technique capable of more precision. Johnson et al. [7] unified these techniques within an expanded framework. Darais et al. [1] then showed that the Abstracting Definitional Interpreters approach—currently the state of the art—is compatible with the complete technique by including the set of stack root addresses as a component in the evaluation configuration.

**Context Irrelevance** These techniques indeed reconcile the conflicting needs of GC and stack-precise control, yielding an analysis which enjoys the precision-enhancing benefits of each. However, the addition of garbage collection causes the resultant analysis to violate context irrelevance [8], the property that the evaluation of a configuration is independent of its continuation. In terms of the concrete semantics of Section 3.2, context irrelevance is the property that ev(e, ρ, σ, κ, t) →<sup>+</sup> ap(v, σ′, κ, t′) if and only if ev(e, ρ, σ, κ′, t) →<sup>+</sup> ap(v, σ′, κ′, t′) for any κ and κ′.

The incomplete and complete techniques to achieve stack-precise abstract GC each violate context irrelevance. Under the incomplete technique, abstract GC prevents spurious paths from being explored and changes the store yielded by those that are explored. Thus, the abstract evaluation of a configuration becomes dependent on (the root set of reachable addresses embedded in) its continuation. The complete technique, achieved by introducing the set of root addresses as a component in the evaluation configuration, vacuously restores context irrelevance by distinguishing otherwise-identical configurations based on the continuation. That is, the states ev(e, ρ, σ, κ, t) and ev(e, ρ, σ, κ′, t) with identical configurations but distinct continuations become the continuation-less evaluation configurations ev(e, ρ, σ, A, t) and ev(e, ρ, σ, A′, t) with distinct root address sets A and A′. This address set is a close approximation of the continuation and effectively makes the control context relevant to evaluation.

#### **3.5 The** *k***-CFA Context Abstraction**

In the concrete semantics, the time component t serves two purposes. The first purpose is to provide the allocator with a source of freshness, so that when the allocator must furnish a heap cell for a variable bound previously in execution, it is able to furnish a distinct one. Were freshness the only constraint on t, the Time domain could simply consist of ℕ. In anticipation of its role in the downstream CFA, the time component assumes a second purpose, which is to capture some notion of the context in which execution is occurring. The hope is that the notion of context it captures is semantically meaningful so that, when an unbounded set of times is identified by the process of abstraction, each address, which is qualified by such an abstracted time, locates a semantically-coherent set of values.

To get a better idea of what notion of context our treatment of time captures, let's examine how our concrete semantics treats time, as dictated by k-CFA. Time begins as the empty sequence ε. It is passed unchanged across all Eval transitions save one, as well as across the Apply transition. The exception is the Call transition, which instead passes the (at-most-)k-length prefix of the application prepended to the incoming time. Hence, the k-CFA context abstraction is the k most-recent calls made in execution history.

In Section 6.2, we consider the ramifications of threading the time component through evaluation and compare it to an alternative treatment.

# **4 From Threaded to Compositional Stores**

In this section, we present a series of four semantics that gradually transition from a threaded treatment of stores without GC to a compositional treatment of stores with GC. We define each of these semantics in terms of big-step judgments of (or close to) the form σ, ρ, t ⊢ e ⇓ (v, σ′). This judgment expresses that the evaluation configuration consisting of the expression e under the store σ, environment ρ, and timestamp t evaluates to the evaluation result consisting of the value v and the store σ′. When discussing the evaluation of e, we will refer to σ as the incoming store and σ′ as the resultant store. We will also refer to the time component t as the binding context since, in the big-step semantics, its connection to the history of execution becomes more distant.

Formulating our semantics in big-step style offers two advantages in our setting: First, we can readily express them as big-step definitional interpreters, at which point we can apply systematic abstraction techniques [1, 18] to obtain corresponding CFAs exhibiting perfect stack precision. Second, they emphasize the availability of the configuration store at the delivery point of the evaluation result; this availability is crucial to our ability to shift to a compositional treatment of the store.

#### **4.1 Threaded-Store Semantics**

To orient ourselves to the big-step setting, we present the reference semantics for our language in big-step style in Figure 2. This reference semantics is equivalent to the reference semantics given in small-step style in Section 3.2 except that there is no corresponding Apply rule; its responsibility—to deliver a value to a continuation—is handled implicitly by the big-step formulation. In terms of big-step semantics, this reference semantics is characterized by the threading of the store through each rule; the resultant store of evaluation is the configuration store plus the allocation and mutation incurred during evaluation. Hence, we refer to this semantics as the threaded-store semantics. We use natural numbers as store subscripts in each rule to emphasize the store's monotonic increase.

$$\begin{array}{c}
\textsc{Let} \\
\sigma_{0}, \rho, t \vdash ce \Downarrow (v_{0}, \sigma_{1}) \\
\rho' = \rho[x \mapsto t] \qquad \sigma_{2} = \sigma_{1}[(x, t) \mapsto v_{0}] \qquad \sigma_{2}, \rho', t \vdash e \Downarrow (v, \sigma_{3}) \\
\hline
\sigma_{0}, \rho, t \vdash \mathsf{let}\ x = ce\ \mathsf{in}\ e \Downarrow (v, \sigma_{3}) \\[10pt]
\textsc{Call} \\
((\lambda x.e, \rho_{0}), \sigma_{1}) = \mathsf{aeval}(\sigma_{0}, \rho, ae_{0}) \\
(v_{1}, \sigma_{2}) = \mathsf{aeval}(\sigma_{1}, \rho, ae_{1}) \qquad t' = (ae_{0}\ ae_{1}) :: t \\
\rho_{1} = \rho_{0}[x \mapsto t'] \qquad \sigma_{3} = \sigma_{2}[(x, t') \mapsto v_{1}] \qquad \sigma_{3}, \rho_{1}, t' \vdash e \Downarrow (v, \sigma_{4}) \\
\hline
\sigma_{0}, \rho, t \vdash (ae_{0}\ ae_{1}) \Downarrow (v, \sigma_{4}) \\[10pt]
\textsc{Set!} \\
(v, \sigma_{1}) = \mathsf{aeval}(\sigma_{0}, \rho, ae) \qquad a = (x, \rho(x)) \qquad \sigma_{2} = \sigma_{1}[a \mapsto v] \\
\hline
\sigma_{0}, \rho, t \vdash \mathsf{set!}\ x\ ae \Downarrow ((\lambda x.x, \bot), \sigma_{2}) \\[10pt]
\textsc{Atomic} \\
\hline
\sigma, \rho, t \vdash ae \Downarrow \mathsf{aeval}(\sigma, \rho, ae)
\end{array}$$

**Fig. 2.** The threaded-store reference semantics

A program pr is evaluated in an initial configuration with an empty store ⊥, an empty environment ⊥, and an empty binding context ε. In such a configuration, pr evaluates to a value v if ⊥, ⊥, ε ⊢ pr ⇓ (v, σ).

The Let rule evaluates the bound call expression ce under the incoming environment and store. If evaluation results in a value–store pair, this incoming environment is extended with a binding derived from the bound variable and

incoming binding context.<sup>7</sup> The resultant store is extended with a mapping from that binding to the resultant value. The body expression is evaluated under the extended environment and store, and its result becomes that of the overall expression.

Contrasting the treatment of the environment and the store by the Let rule is instructive. On the one hand, the environment is treated compositionally: the incoming environment of evaluation is restored and extended after evaluation of the bound value. On the other hand, the store is treated non-compositionally: the store resulting from the evaluation of the bound expression is extended after it has accumulated the effects of its evaluation.

By these criteria, we classify the treatment of the binding context as compositional rather than threaded. This compositional treatment departs from the typical practice of CFA and is, to our knowledge, the first such treatment in a stack-precise CFA. In Section 6.2, we examine the ramifications of this treatment.

The Call rule evaluates the atomic expressions ae₀ and ae₁ for the operator and argument, respectively. It then derives a new binding context, extends the environment and store with a binding using that context, and evaluates the operator body under the extended environment, store, and derived binding context. The result of evaluating the body is that of the overall expression.

The Set! rule evaluates the atomic body expression ae and updates the binding of the referenced variable in the store. Its result is the identity function paired with the updated store.

The Atomic rule evaluates an atomic expression ae using the aeval atomic evaluation metafunction. Foreshadowing the succeeding semantics, we define aeval to return a pair of its calculated value and the given store. In this semantics, the store is passed through unmodified; in forthcoming semantics, it will be altered according to the calculated value. Atomic evaluation is unchanged from the small-step semantics:

$$\mathsf{aeval}(\sigma,\rho,x) = (\sigma(x,\rho(x)),\sigma) \qquad \mathsf{aeval}(\sigma,\rho,\lambda x.e) = ((\lambda x.e,\rho|_{\lambda x.e}),\sigma)$$

#### **4.2 Threaded-Store Semantics with Effect Log**

The second semantics enhances the reference semantics with an effect log ξ which explicitly records the allocation and mutation that occurs through evaluation. The effect log is considered part of the evaluation result; accordingly, the effect-log semantics is given in terms of judgments of the form σ, ρ, t ⊢ e ⇓! (v, σ′), ξ. Figure 3 presents the effect-log semantics, identical to the reference semantics except for (1) the addition of the effect log and (2) the use of the metavariable a to denote an address (x, t). (This usage persists in all subsequent semantics as well.)

The effect log is represented by a function from store to store. The definition of each log is given by either a literal identity function, a use of the extendlog

<sup>7</sup> Because the program is alphatised, the binding of a let-bound variable in a particular calling context will not interfere with the binding of any other variable.

$$\begin{array}{c}
\textsc{Let} \\
\sigma_{0}, \rho, t \vdash ce \Downarrow_{!} (v_{0}, \sigma_{1}), \xi_{0} \\
\rho' = \rho[x \mapsto t] \qquad \sigma_{2} = \sigma_{1}[(x, t) \mapsto v_{0}] \qquad \sigma_{2}, \rho', t \vdash e \Downarrow_{!} (v, \sigma_{3}), \xi_{1} \\
\hline
\sigma_{0}, \rho, t \vdash \mathsf{let}\ x = ce\ \mathsf{in}\ e \Downarrow_{!} (v, \sigma_{3}),\ \xi_{1} \circ \mathsf{extend}_{\log}((x, t), v_{0}, \sigma_{1}) \circ \xi_{0} \\[10pt]
\textsc{Call} \\
((\lambda x.e, \rho_{0}), \sigma_{1}) = \mathsf{aeval}(\sigma_{0}, \rho, ae_{0}) \\
(v_{1}, \sigma_{2}) = \mathsf{aeval}(\sigma_{1}, \rho, ae_{1}) \qquad t' = (ae_{0}\ ae_{1}) :: t \\
\rho_{1} = \rho_{0}[x \mapsto t'] \qquad \sigma_{3} = \sigma_{2}[(x, t') \mapsto v_{1}] \qquad \sigma_{3}, \rho_{1}, t' \vdash e \Downarrow_{!} (v, \sigma_{4}), \xi \\
\hline
\sigma_{0}, \rho, t \vdash (ae_{0}\ ae_{1}) \Downarrow_{!} (v, \sigma_{4}),\ \xi \circ \mathsf{extend}_{\log}((x, t'), v_{1}, \sigma_{2}) \\[10pt]
\textsc{Set!} \\
(v, \sigma_{1}) = \mathsf{aeval}(\sigma_{0}, \rho, ae) \qquad a = (x, \rho(x)) \qquad \sigma_{2} = \sigma_{1}[a \mapsto v] \\
\hline
\sigma_{0}, \rho, t \vdash \mathsf{set!}\ x\ ae \Downarrow_{!} ((\lambda x.x, \bot), \sigma_{2}),\ \mathsf{extend}_{\log}(a, v, \sigma_{2}) \\[10pt]
\textsc{Atomic} \\
\hline
\sigma, \rho, t \vdash ae \Downarrow_{!} \mathsf{aeval}(\sigma, \rho, ae),\ \lambda\sigma.\sigma
\end{array}$$

**Fig. 3.** Threaded-store semantics with an effect log

metafunction, or the composition of effect logs. The extendlog metafunction is defined

$$\mathsf{extend}\_{\log}(a, v, \sigma') = \lambda \sigma.\sigma[a \mapsto v] \cup \sigma'$$

where the union of the extended store σ[a → v] and the value-associated store σ′ treats each store extensionally as a set of pairs, but the result is always a function—i.e. any given address is paired with at most one value. The effect log of the Atomic rule is the identity function, reflecting that no allocation or mutation is performed when evaluating an atomic expression. The effect log of the Set! rule is constructed by the metafunction extendlog; the store argument to extendlog is the store after the mutation has occurred. The use of this store is necessary to propagate the mutative effect and ensures that its union with the store on which this log is replayed agrees on all common bindings. The effect log of the Call rule is composed of the effect log of the evaluation of the body and an entry for the allocation of the bound variable. Finally, the effect log of the Let rule is composed of the effect logs of the evaluation of both the body and binding expression, interposed by an entry for the allocation of the bound variable.

In this semantics (and the next), the bindings of σ′ are redundant: once extendlog applies the mutative or allocative binding to its argument σ, the result σ[a → v] already contains all the bindings of σ′. Once we introduce GC to the semantics, however, this will no longer be the case.

The intended role of the effect log is captured by the following lemma, which states that one may obtain the resultant store by applying the resultant log to the initial store of evaluation.

**Lemma 1.** If σ, ρ, t ⊢ e ⇓! (v, σ′), ξ, then σ′ = ξ(σ).

The proof proceeds straightforwardly by induction on the judgment's derivation.
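Lemma 1 can be checked on a tiny hand-threaded trace by encoding effect logs as Python closures. The names `extend_log` and `compose` follow the definitions above, but the dict encoding of stores and the scripted trace are our own (no interpreter is implemented here).

```python
# Effect logs as store-to-store functions: extend_log(a, v, σ') = λσ. σ[a ↦ v] ∪ σ'.

def extend_log(a, v, sigma_after):
    """The log entry for allocating/mutating a ↦ v, carrying the store σ'."""
    def xi(sigma):
        out = dict(sigma)
        out[a] = v
        out.update(sigma_after)   # union; the stores agree on common bindings
        return out
    return xi

identity = lambda sigma: sigma    # the log of an atomic evaluation

def compose(xi1, xi0):
    """ξ1 ∘ ξ0: replay ξ0 first, then ξ1."""
    return lambda sigma: xi1(xi0(sigma))

# Hand-threaded trace: allocate ("x", ε) ↦ 1, then ("y", ε) ↦ 2.
s0 = {}
s1 = {("x", ()): 1}                        # resultant store after step one
s2 = {("x", ()): 1, ("y", ()): 2}          # resultant store after step two
xi = compose(extend_log(("y", ()), 2, s2),
             extend_log(("x", ()), 1, s1))

assert xi(s0) == s2                        # Lemma 1: σ' = ξ(σ)
```

As the surrounding text notes, in the threaded semantics the σ′ carried by each log entry is redundant; it earns its keep only once GC begins removing bindings from the stores being threaded.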

#### **4.3 Compositional-Store Semantics**

The third semantics (seen in Figure 4) shifts the previous semantics from threading the store to treating it compositionally. Under this treatment, evaluation results still consist of a value, store, and effect log, but the store is associated directly to the value—at least conceptually—and not treated as a global effect repository. This alternative role is particularly apparent in the Let rule: the store resulting from evaluation of the bound expression is not extended to be used as the initial store of evaluation of the body. Instead, the effect log resulting from evaluation of the bound expression is applied to the initial store (of the overall let expression). We emphasize this compositional treatment by no longer using numeric subscripts, which suggest "evolution" of the store, and instead using ticks, which suggest distinct (but related) instances.

$$\begin{array}{c}
\textsc{Let} \\
\sigma, \rho, t \vdash ce \Downarrow_{\circ} (v', \sigma_{v'}), \xi' \qquad \sigma' = \xi'(\sigma) \\
(\rho', \sigma'') = \mathsf{extend}(\rho, \sigma', x, t, v', \sigma_{v'}) \qquad \sigma'', \rho', t \vdash e \Downarrow_{\circ} (v, \sigma_{v}), \xi \\
\hline
\sigma, \rho, t \vdash \mathsf{let}\ x = ce\ \mathsf{in}\ e \Downarrow_{\circ} (v, \sigma_{v}),\ \xi \circ \mathsf{extend}_{\log}((x, t), v', \sigma_{v'}) \circ \xi' \\[10pt]
\textsc{Call} \\
((\lambda x.e, \rho_{0}), \sigma_{0}) = \mathsf{aeval}(\sigma, \rho, ae_{0}) \qquad (v_{1}, \sigma_{1}) = \mathsf{aeval}(\sigma, \rho, ae_{1}) \qquad t' = (ae_{0}\ ae_{1}) :: t \\
(\rho', \sigma') = \mathsf{extend}(\rho_{0}, \sigma_{0}, x, t', v_{1}, \sigma_{1}) \qquad \sigma', \rho', t' \vdash e \Downarrow_{\circ} (v, \sigma_{v}), \xi \\
\hline
\sigma, \rho, t \vdash (ae_{0}\ ae_{1}) \Downarrow_{\circ} (v, \sigma_{v}),\ \xi \circ \mathsf{extend}_{\log}((x, t'), v_{1}, \sigma_{1}) \\[10pt]
\textsc{Set!} \\
(v, \sigma_{v}) = \mathsf{aeval}(\sigma, \rho, ae) \qquad a = (x, \rho(x)) \qquad \sigma' = \sigma_{v}[a \mapsto v] \\
\hline
\sigma, \rho, t \vdash \mathsf{set!}\ x\ ae \Downarrow_{\circ} ((\lambda x.x, \bot), \sigma'),\ \mathsf{extend}_{\log}(a, v, \sigma') \\[10pt]
\textsc{Atomic} \\
\hline
\sigma, \rho, t \vdash ae \Downarrow_{\circ} \mathsf{aeval}(\sigma, \rho, ae),\ \lambda\sigma.\sigma
\end{array}$$

**Fig. 4.** The compositional-store semantics

We use the extend metafunction to bind a value v (with an associated store σv) to a variable x in a given binding context t within a given environment ρ and store σ, defined

$$\mathsf{extend}(\rho, \sigma, x, t, v, \sigma\_v) = (\rho[x \mapsto t], \sigma[(x, t) \mapsto v] \cup \sigma\_v)$$

When we extend σ with a mapping for v, we also copy all of the mappings from σv. This copying will yield a well-formed store since σ[(x, t) → v] and σ<sup>v</sup> agree on any common bindings.
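Under the dict-based representation used in our sketches (an assumption of ours, not the paper's code), extend has a direct reading: bind v at address (x, t), point x at t in the environment, and copy the bindings of the value's store σv.

```python
# Sketch of the extend metafunction over dict-based stores:
# extend(rho, sigma, x, t, v, sigma_v) = (rho[x -> t], sigma[(x, t) -> v] U sigma_v)

def extend(rho, sigma, x, t, v, sigma_v):
    rho2 = dict(rho)
    rho2[x] = t                 # environment maps variable to binding context
    sigma2 = dict(sigma)
    sigma2[(x, t)] = v          # sigma[(x, t) -> v]
    sigma2.update(sigma_v)      # copy sigma_v; well-formed since they agree
    return rho2, sigma2         # on any common bindings
```

For example, extending an empty environment with x bound at context "t1" yields `{"x": "t1"}` and a store containing both the new binding and σv's bindings.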

Although the role of the store has changed, the same lemma holds in this semantics as in the previous one. We restate it in terms of this semantics.

**Lemma 2.** If σ, ρ, t ⊢ e ⇓◦ (v, σv), ξ, then ξ(σ) = σv.

Like the previous lemma, its proof can be obtained by induction on the judgment's derivation.

#### **4.4 Compositional-Store Semantics with Garbage Collection**

Our final semantics (seen in Figure 5) continues the compositional treatment of the store but GCs stores to remove irrelevant bindings. Under this compositional treatment, the role of the store is to model the fragment of the heap which is reachable from an associated environment: the store of a configuration closes the associated environment and the store of a result closes the environment of the associated value. Accordingly, the root set of reachability used by GC includes the addresses of the closed environment only and, in particular, does not include addresses from the continuation. We define reachability just as we did for GC in Section 3.2, using the rootv and rootρ metafunctions to extract a root set from a value and environment, respectively.

In this semantics, we use a modified atomic evaluation function aevalgc which garbage-collects the store associated with a value. It is defined

$$\begin{aligned} \mathsf{aeval}\_{gc}(\sigma, \rho, x) &= (v, \mathsf{gc}(v, \sigma)) \text{ where } v = \sigma(x, \rho(x)) \\ \mathsf{aeval}\_{gc}(\sigma, \rho, \lambda x.e) &= (v, \mathsf{gc}(v, \sigma)) \text{ where } v = (\lambda x.e, \rho|\_{\lambda x.e}) \end{aligned}$$

where gc(v, σ) prunes the unreachable bindings from σ with respect to v.

Through frequent use of the restrict metafunction, this semantics ensures that each evaluation is performed under a store which contains no values unreachable from the environment. For a given expression e, closing environment ρ, and closing store σ, the restrict metafunction first determines the restriction ρ|<sup>e</sup> of ρ to the free variables of e and then the bindings of σ reachable from ρ|e; it then garbage-collects the store by pruning the unreachable bindings. Formally, restrict is defined

$$\text{restrict}(e, \rho, \sigma) = (\rho|\_e, \mathbf{gc}(\rho|\_e, \sigma))$$

where gc(ρ, σ) prunes the unreachable bindings from σ with respect to ρ.
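The reachability-based pruning behind gc and restrict can be sketched as follows. This is our own minimal model, assuming addresses are (variable, context) pairs, every stored value is a closure carrying an environment, and the free-variable set `fv_e` is given.

```python
# Sketch of restrict/gc: keep only the store bindings reachable from a
# root environment, following environments embedded in stored closures.

def roots(env):
    """Addresses named by an environment: one (x, t) pair per entry."""
    return {(x, t) for x, t in env.items()}

def gc(env, sigma):
    """Prune sigma to the bindings transitively reachable from env."""
    reachable, frontier = set(), roots(env)
    while frontier:
        addr = frontier.pop()
        if addr in reachable or addr not in sigma:
            continue
        reachable.add(addr)
        _lam, clo_env = sigma[addr]            # a value is a closure (lam, env)
        frontier |= roots(clo_env) - reachable
    return {a: v for a, v in sigma.items() if a in reachable}

def restrict(fv_e, rho, sigma):
    """Restrict rho to e's free variables, then gc sigma w.r.t. it."""
    rho_e = {x: t for x, t in rho.items() if x in fv_e}
    return rho_e, gc(rho_e, sigma)
```

Bindings reachable only from elsewhere (e.g. the continuation) are simply absent from the result, which is the point of the compositional treatment.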

The Let rule proceeds by first obtaining the restriction of the environment and store with respect to the bound expression ce, before evaluating ce under that restriction. The evaluation of ce produces a value v′, an associated store σv′ which closes only that value, and an effect log ξ′. The Let rule then replays the effect log ξ′ on the initial store σ, thereby accumulating any mutation (and any allocation on which it depends) which occurred. After replaying the log, it extends the resultant store σ′ and initial environment ρ with a binding for v′ and copies

$$\begin{array}{c}
\textsc{Let}\\[4pt]
\dfrac{\begin{array}{c}(\rho\_{ce}, \sigma\_{ce}) = \mathsf{restrict}(ce, \rho, \sigma)\\ \sigma\_{ce}, \rho\_{ce}, t \vdash ce \Downarrow\_{gc} (v', \sigma\_{v'}), \xi' \qquad \sigma' = \xi'(\sigma) \qquad (\rho', \sigma'') = \mathsf{extend}(\rho, \sigma', x, t, v', \sigma\_{v'})\\ (\rho\_{e}, \sigma\_{e}) = \mathsf{restrict}(e, \rho', \sigma'') \qquad \sigma\_{e}, \rho\_{e}, t \vdash e \Downarrow\_{gc} (v, \sigma\_{v}), \xi\end{array}}{\sigma, \rho, t \vdash \mathtt{let}\; x = ce\; \mathtt{in}\; e \Downarrow\_{gc} (v, \sigma\_{v}),\; \xi \circ \mathsf{extendlog}((x, t), v', \sigma\_{v'}) \circ \xi'}\\[16pt]
\textsc{Call}\\[4pt]
\dfrac{\begin{array}{c}((\lambda x.e, \rho\_{0}), \sigma\_{0}) = \mathsf{aeval}\_{gc}(\sigma, \rho, ae\_{0}) \qquad (v\_{1}, \sigma\_{1}) = \mathsf{aeval}\_{gc}(\sigma, \rho, ae\_{1})\\ t' = (ae\_{0}\; ae\_{1}) :: t \qquad (\rho', \sigma') = \mathsf{extend}(\rho\_{0}, \sigma\_{0}, x, t', v\_{1}, \sigma\_{1})\\ (\rho\_{e}, \sigma\_{e}) = \mathsf{restrict}(e, \rho', \sigma') \qquad \sigma\_{e}, \rho\_{e}, t' \vdash e \Downarrow\_{gc} (v, \sigma\_{v}), \xi\end{array}}{\sigma, \rho, t \vdash (ae\_{0}\; ae\_{1}) \Downarrow\_{gc} (v, \sigma\_{v}),\; \xi \circ \mathsf{extendlog}((x, t'), v\_{1}, \sigma\_{1})}
\end{array}$$

**Fig. 5.** The compositional-store semantics with garbage collection

the bindings of its associated store σv′. Finally, the extended environment and store are restricted with respect to the body expression e before e's evaluation under them.

The Call rule proceeds by first evaluating the atomic operator and argument expressions. After calculating the new binding context t′, the environment and store of the operator value are extended with the new binding. Before evaluation of the body e commences, the extended environment and store are restricted with respect to it.

The Set! rule atomically evaluates the expression ae producing the assigned value. It returns the identity function which, with an empty environment, is closed by an empty store.

The Atomic rule evaluates an atomic expression with aevalgc.

To connect this semantics to the previous, we show that the addition of GC has no semantic effect by the following lemma.

**Lemma 3.** If σ, ρ, t ⊢ e ⇓◦ (v, σv), ξ and σ′ = gc(ρ|e, σ), then σ′, ρ, t ⊢ e ⇓gc (v, σ′v), ξ where σ′v = gc(v, σv).

In prose, this lemma states that two evaluation configurations, identical except that one's store is the other's with unreachable bindings pruned, will yield the same evaluation result: their evaluation will produce the same value and, modulo unreachable bindings, the same closing store.

# **5 Abstract Compositional-Store Semantics with Garbage Collection**

We now abstract the compositional-store semantics with GC—the final semantics of the preceding section. Abstracting the semantics involves (1) defining a finite counterpart of each component of the evaluation configuration and result and (2) defining a counterpart of each semantic rule in terms of these finite components. With each component of the configuration finite, configurations themselves become finite. Then we show that each abstracted rule simulates its counterpart—that it admits the full range of its counterpart's behavior. Doing this for each rule ensures that the abstract semantics includes every behavior included by the exact semantics. Once that's complete, we can directly implement our big-step semantics in an abstract definitional interpreter [1, 18] to obtain our stack-precise CFA with GC.

We begin by abstracting each configuration component.

$$\begin{aligned}
\hat{v} \in \widehat{Val} &= \mathcal{P}(Lam \times \widehat{Env}) & \hat{\rho} \in \widehat{Env} &= Var \rightharpoonup \widehat{Time} \\
\hat{t} \in \widehat{Time} &= App^{\leq m} & \hat{a} \in \widehat{Address} &= Var \times \widehat{Time} \\
\hat{\sigma} \in \widehat{Store} &= \widehat{Address} \rightarrow \widehat{Val} & \hat{\xi} \in \widehat{Log} &= \widehat{Address} \rightarrow \widehat{Val}
\end{aligned}$$

Like its concrete counterpart, an abstract store σ̂ maps an abstract address to an abstract value. Abstract addresses remain a pair of a variable and binding context, only the context is abstract. An abstract value v̂, however, is a set of abstract closures rather than a single closure. An abstract closure is a λ paired with an abstract environment ρ̂ which itself is a finite map from variables to binding contexts. An abstract timestamp t̂ is a sequence of at most m application sites, where m is a parameter to the analysis.<sup>8</sup> An abstract log ξ̂ is an extensional account of the added and modified store mappings relative to the initial store, and takes the same form as an abstract store itself. We define abstract join, composition, and application operators by

$$
\hat{\sigma}\_0 \sqcup \hat{\sigma}\_1 = \lambda \hat{a}.\, \hat{\sigma}\_0(\hat{a}) \cup \hat{\sigma}\_1(\hat{a}) \qquad\qquad \hat{\xi}\_0 \mathbin{\hat{\circ}} \hat{\xi}\_1 = \hat{\xi}\_0 \sqcup \hat{\xi}\_1 \qquad\qquad \hat{\xi}(\hat{\sigma}) = \hat{\sigma} \sqcup \hat{\xi}
$$
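These three operators collapse to one operation over finite maps, which a short sketch of ours (assuming abstract stores and logs are dicts from abstract addresses to sets) makes concrete: join is pointwise union, composition of logs is join, and applying a log joins it onto the store.

```python
# Sketch of the abstract operators: pointwise union of address-to-set maps.

def join(s0, s1):
    """Pointwise union: (s0 ⊔ s1)(a) = s0(a) ∪ s1(a)."""
    out = {a: set(vs) for a, vs in s0.items()}
    for a, vs in s1.items():
        out.setdefault(a, set()).update(vs)
    return out

compose_hat = join   # xi0 ∘̂ xi1 = xi0 ⊔ xi1
apply_hat = join     # xi(sigma) = sigma ⊔ xi
```

Because join is commutative and idempotent, replay order no longer matters in the abstract, unlike for concrete logs.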

To help show that the abstract semantics simulates the concrete, we make a connection between the state space of the abstract and that of the concrete. We make this connection by means of a polymorphic abstraction function |·|<sup>9</sup>, defined for all domains except stores by

$$|\rho| = \lambda x.\, |\rho(x)| \qquad |t| = \lfloor t \rfloor\_m \qquad |(\lambda x.e, \rho)| = \{(\lambda x.e, |\rho|)\} \qquad |\xi| = |\xi(\bot)|$$

and for stores by

$$|\sigma| = \lambda \hat{a}. \bigcup\_{|a|=\hat{a}} |\sigma(a)|$$

<sup>8</sup> The parameter m is used similarly to the parameter k of k-CFA.

<sup>9</sup> The abstraction function is typically accompanied by a complementary concretization function to complete a Galois connection. For simplicity here, we leave it incomplete.

Abstracting a store groups entries by their abstracted address, collecting their abstracted values into one set. Abstracting an environment ρ abstracts its range. Abstracting a binding context t takes its at-most-m-length prefix. Abstracting a closure produces a singleton of that closure with an abstracted environment. Finally, abstracting a log ξ produces the abstract store that results from applying the log to the empty store ⊥ and then abstracting.
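The store abstraction can be sketched in a few lines (our own illustration; `abs_val`, the value-abstraction argument, is assumed given): concrete addresses (x, t) collapse to (x, ⌊t⌋m), and entries sharing an abstract address are unioned.

```python
# Sketch of |sigma|: group concrete entries by abstracted address.

M = 1  # the analysis parameter m

def abs_time(t):
    """Take the at-most-m-length prefix of a binding context (a tuple)."""
    return tuple(t[:M])

def abs_store(sigma, abs_val):
    out = {}
    for (x, t), v in sigma.items():
        a_hat = (x, abs_time(t))                 # |a| = (x, |t|)
        out.setdefault(a_hat, set()).update(abs_val(v))
    return out
```

With M = 1, two concrete bindings of x made in contexts ("c1", "c2") and ("c1", "c3") land at the same abstract address and their values merge into one set.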

Figure 6 defines the abstract compositional-store semantics with garbage collection. Structurally, nearly every rule is identical to the exact counterpart that it abstracts; most of the work of abstraction is defining the abstract domains and metafunctions and connecting them to those of the exact semantics. The Call rule differs structurally from its exact counterpart in two notable ways: first, because an abstract value is a set of closures, the rule applies for each such closure in the operator set; second, it defines the new binding context t̂′ to be the application site prepended to the previous abstract time t̂, truncated to a length of at most m. The abstract aeval metafunction is defined

$$\begin{aligned} \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, x) &= \left(\hat{v}, \hat{\mathsf{gc}}(\hat{v}, \hat{\sigma})\right) \text{ where } \hat{v} = \hat{\sigma}(\hat{\rho}(x)) \\ \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, \lambda x. e) &= \left(\hat{v}, \hat{\mathsf{gc}}(\hat{v}, \hat{\sigma})\right) \text{ where } \hat{v} = \{(\lambda x. e, \hat{\rho}|\_{\lambda x. e})\} \end{aligned}$$

We omit the straightforward definitions of the abstract variants of gc, restrict, and extend.

$$\begin{array}{c}
\textsc{Let}\\[4pt]
\dfrac{\begin{array}{c}(\hat{\rho}\_{ce}, \hat{\sigma}\_{ce}) = \widehat{\mathsf{restrict}}(ce, \hat{\rho}, \hat{\sigma})\\ \hat{\sigma}\_{ce}, \hat{\rho}\_{ce}, \hat{t} \vdash ce \mathrel{\hat{\Downarrow}} (\hat{v}', \hat{\sigma}\_{v'}), \hat{\xi}' \qquad \hat{\sigma}' = \hat{\xi}'(\hat{\sigma}) \qquad (\hat{\rho}', \hat{\sigma}'') = \widehat{\mathsf{extend}}(\hat{\rho}, \hat{\sigma}', x, \hat{t}, \hat{v}', \hat{\sigma}\_{v'})\\ (\hat{\rho}\_{e}, \hat{\sigma}\_{e}) = \widehat{\mathsf{restrict}}(e, \hat{\rho}', \hat{\sigma}'') \qquad \hat{\sigma}\_{e}, \hat{\rho}\_{e}, \hat{t} \vdash e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}\_{v}), \hat{\xi}\end{array}}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash \mathtt{let}\; x = ce\; \mathtt{in}\; e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}\_{v}),\; \hat{\xi} \mathbin{\hat{\circ}} \widehat{\mathsf{extendlog}}((x, \hat{t}), \hat{v}', \hat{\sigma}\_{v'}) \mathbin{\hat{\circ}} \hat{\xi}'}\\[16pt]
\textsc{Call}\\[4pt]
\dfrac{\begin{array}{c}(\hat{v}\_{0}, \hat{\sigma}\_{0}) = \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, ae\_{0}) \qquad (\lambda x.e, \hat{\rho}\_{0}) \in \hat{v}\_{0} \qquad (\hat{v}\_{1}, \hat{\sigma}\_{1}) = \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, ae\_{1})\\ \hat{t}' = \lfloor (ae\_{0}\; ae\_{1}) :: \hat{t} \rfloor\_{m} \qquad (\hat{\rho}', \hat{\sigma}') = \widehat{\mathsf{extend}}(\hat{\rho}\_{0}, \hat{\sigma}\_{0}, x, \hat{t}', \hat{v}\_{1}, \hat{\sigma}\_{1})\\ (\hat{\rho}\_{e}, \hat{\sigma}\_{e}) = \widehat{\mathsf{restrict}}(e, \hat{\rho}', \hat{\sigma}') \qquad \hat{\sigma}\_{e}, \hat{\rho}\_{e}, \hat{t}' \vdash e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}\_{v}), \hat{\xi}\end{array}}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash (ae\_{0}\; ae\_{1}) \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}\_{v}),\; \hat{\xi} \mathbin{\hat{\circ}} \widehat{\mathsf{extendlog}}((x, \hat{t}'), \hat{v}\_{1}, \hat{\sigma}\_{1})}\\[16pt]
\textsc{Set!}\\[4pt]
\dfrac{(\hat{v}, \hat{\sigma}\_{v}) = \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, ae) \qquad (\\_, \hat{\xi}) = \widehat{\mathsf{extend}}(\bot, \bot, x, \hat{\rho}(x), \hat{v}, \hat{\sigma}\_{v})}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash \mathtt{set!}\; x\; ae \mathrel{\hat{\Downarrow}} (\{(\lambda x.x, \bot)\}, \bot),\; \hat{\xi}}\\[16pt]
\textsc{Atomic}\\[4pt]
\hat{\sigma}, \hat{\rho}, \hat{t} \vdash ae \mathrel{\hat{\Downarrow}} \widehat{\mathsf{aeval}}(\hat{\sigma}, \hat{\rho}, ae),\; \bot
\end{array}$$

**Fig. 6.** The abstract compositional-store semantics with garbage collection

As a final step before we establish the simulation relationship, we define an ordering on stores and on values (the ordering on logs is that on stores, as they take the same form):

$$
\hat{\sigma}\_0 \sqsubseteq \hat{\sigma}\_1 \Leftrightarrow \forall \hat{a} \in \widehat{Address}.\; \hat{\sigma}\_0(\hat{a}) \subseteq \hat{\sigma}\_1(\hat{a}) \qquad\qquad \hat{v}\_0 \sqsubseteq \hat{v}\_1 \Leftrightarrow \hat{v}\_0 \subseteq \hat{v}\_1
$$

We formally connect this abstract semantics with the concrete compositional-store semantics given in Section 4.4 by the following abstraction theorem.

**Theorem 1.** If |σ| ⊑ σ̂ and |ρ| = ρ̂ and |t| = t̂ and σ, ρ, t ⊢ e ⇓gc (v, σv), ξ, then σ̂, ρ̂, t̂ ⊢ e ⇓̂ (v̂, σ̂v), ξ̂ where |v| ⊑ v̂ and |σv| ⊑ σ̂v and |ξ| ⊑ ξ̂.

This theorem states that if the configuration components are related by abstraction, then, for any given derivation in the exact semantics, there is a derivation in the abstract semantics which yields an abstraction of its results. It can be proved by induction on the derivation.

# **6 Discussion**

Now we examine the ramifications of a compositional treatment of analysis components. We do so in turn, first considering the ramifications of treating the store compositionally and then of treating the time compositionally.

#### **6.1 The Effects of Treating the Store Compositionally**

We saw in Section 4.3 that a semantics could treat stores compositionally without employing GC. In this case, the caller's store and callee's final store agreed on common entries and combining them produced the same store as the threaded-store semantics. However, the compositional machinery liberates evaluation from the stack. With evaluation so liberated, GC need not preserve any heap data reachable solely from the stack. This relaxation simplifies garbage collection and makes it more effective, yields evaluation summaries that are general yet precise, and restores context irrelevance to the semantics.


We discuss each of these aspects in more detail.

**Simplified and More-Effective Garbage Collection** Classical abstract GC and its successor, pushdown GC, each preserve heap data reachable from both the local environment and the stack. Once the root set of reachable addresses has been determined from these two components, the transitive closure of reachability follows from it. When GC is performed with respect to only the local environment, both the initial root set and its transitive closure are smaller, and less work is required to calculate them. If the CFA employs incomplete garbage collection [8], the garbage collector is also freed from calculating the root set of stack addresses as a fixed point. A smaller transitive closure of reachable addresses is not only less costly to calculate but also leads to more collected garbage.

**General Yet Precise Summaries** A stack-precise CFA without GC will falsely distinguish abstract evaluations of the same call which are identical modulo GC-able heap data. In such cases, the addition of pushdown GC will allow the CFA to identify them. However, even with pushdown GC, a stack-precise CFA will falsely distinguish abstract evaluations of the same call which are identical modulo continuation-reachable heap data. On the other hand, compositional GC soundly disregards such data and thereby identifies such evaluations.

Compositional GC is able to achieve this feat because it calculates the fragments of the heap reachable from the local environment alone. Since this environment is restricted to the free variables of the expression it closes, the resultant heap fragment includes a tight overapproximation of the actually-relevant heap data. One effect is that evaluation summaries—the association of an evaluation configuration with its results—are general yet precise. They are general since, with a minimum of irrelevant heap data, more contexts are consistent with them. They are precise since, with a minimum of irrelevant heap data, they are less likely to allocate an entry at an existing address. In fact, the precision of compositional GC dominates that of pushdown GC.

**Restored Context Irrelevance** A semantics determines which parts of a given configuration are relevant to its evaluation [8]. When the continuation is irrelevant to evaluation, the semantics exhibits the property of context irrelevance. Context irrelevance is an intuitive property: unless our semantics has control effects or some other explicit dependence, we would be surprised if a configuration's continuation was relevant to its evaluation. Even a concrete semantics with GC exhibits context irrelevance since data reachable from the stack alone will not affect the result of evaluation. In an abstract semantics with GC, however, where new allocations can occur at old addresses, the presence of data reachable from the stack alone can affect evaluation. The set of data preserved by GC, which determines how evaluation is affected, is itself determined by the continuation. Thus, an abstract semantics in which GC is defined with respect to the stack violates context irrelevance.

Put this way, it is clear why compositional GC restores context irrelevance to the semantics: it removes the dependence on the stack from GC itself and allows all data reachable from the stack alone to be collected. This restoration makes evaluation easier to reason about and increases the effectiveness of memoization.

#### **6.2 The Effect of Treating the Time Compositionally**

The k-CFA context abstraction consists of a sequence of k call sites—for each point in execution, the last k call sites encountered. In Section 3.5, we discussed how the last-k-call-sites abstraction arose as a consequence of the semantics threading the abstract time (i.e. the context) through execution.

In contrast, the big-step, concrete semantics of Section 4 and the big-step, abstract semantics of Section 5 didn't thread the abstract time through execution but treated it compositionally, installing a new time at a call but restoring the previous time at the corresponding return. This treatment of time induces a different notion of context than k-CFA; instead of yielding the last-k call sites, it yields the top-m stack frames.

This top-m-stack-frames context abstraction is not novel and originates with m-CFA [11], a family of polynomial-time CFAs. However, to our knowledge, its appearance here is its first in a stack-precise setting: many stack-precise CFAs encode context using other means than a time component (or don't use context in the first place) [16, 3, 1]; still others achieve the last-k-call-sites abstraction, incidentally or intentionally [4, 18].

Using the top-m stack frames to qualify heap allocation has certain advantages to using the last-k call sites; in particular, its power to distinguish bindings is not diluted by static call sequences. To see how k-CFA's and m-CFA's context abstractions compare, let's consider a few examples.

First, consider a [k = 2]CFA of the program

```
(define (f x) x)
(define (g y) (f y))
(g 42)
(g 35)
```

In the analysis of this program, the abstract resource 42 is allocated in the heap twice—first when the call to g is made and second when the call to f is made. At the point of the second allocation, the two most-recently-encountered call sites in evaluation are (f y) and (g 42); hence, these call sites are used to qualify the binding of 42 to x in the heap. The treatment of the abstract resource 35 is similar except its second allocation is qualified by (f y) and (g 35). For this program, [k = 2]CFA is able to keep the two allocations distinct.

Next, consider a [k = 2]CFA of the similar program

```
(define (f x) x)
(define (g y)
  (displayln y)
  (f y))
(g 42)
(g 35)
```
which includes the call (displayln y) in the body of g. As in the previous program, the analysis of this program allocates the abstract resources 42 and 35 twice each. However, in this program, the second of each of their allocations is qualified by (f y) and (displayln y). In fact, every call to f made via g will occur in that same context. In a sense, the static sequence of (displayln y) and (f y) eats up the context budget ensuring that the analysis conflates all bindings made at the call (f y). (Incrementing k would remove the conflation in this example, but it makes the analysis more expensive and such a strategy can always be confounded by a longer "static" trace of calls.)

In contrast, consider an [m = 2]CFA of the same program. Because the context consists of the top two stack frames, the allocation of 42 is qualified by

(f y) and (g 42) and the allocation of 35 is qualified by (f y) and (g 35). Because the second stack frame of each allocation is distinct, [m = 2]CFA is able to keep the bindings distinct in the analysis.
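The distinction between the two abstractions on this example can be replayed in a small sketch of our own: k-CFA truncates the *trace* of encountered call sites, while m-CFA truncates the *stack* of live frames. The call-site strings below stand in for the program's application sites.

```python
# Sketch contrasting the two context abstractions on the displayln example.

def k_cfa_context(trace, k):
    """Last k call sites encountered in the trace."""
    return tuple(trace[-k:])

def m_cfa_context(stack, m):
    """Top m frames of the current stack."""
    return tuple(stack[:m])

# With (displayln y) in g's body, the trace at f's allocation point ends
# with the same two sites for both calls, so [k = 2]CFA conflates them.
trace_42 = ["(g 42)", "(displayln y)", "(f y)"]
trace_35 = ["(g 35)", "(displayln y)", "(f y)"]
assert k_cfa_context(trace_42, 2) == k_cfa_context(trace_35, 2)

# The stacks at the same points still differ in their second frame, so
# [m = 2]CFA keeps the bindings distinct.
stack_42 = ["(f y)", "(g 42)"]
stack_35 = ["(f y)", "(g 35)"]
assert m_cfa_context(stack_42, 2) != m_cfa_context(stack_35, 2)
```

The static call (displayln y) occupies one of k-CFA's two trace slots but never appears on the stack at f's allocation, which is exactly why the stack-based context survives the example.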

The top-m-stack-frames context abstraction is itself susceptible to deep nests of calls which serve only to pass parameters: if the nesting depth exceeds m, then the analysis will conflate the bindings made by the innermost calls. And, as with k-CFA, an increased m can always be confounded by a deeper nesting. In spite of that, the m-CFA context abstraction has been shown to work well relative to k-CFA in practice in a stack-imprecise setting where variables are aggressively re-bound [11]. Future work is needed to verify that its advantages carry over to a stack-precise setting.

# **7 Related Work**

Broadly, this work is an instance of abstract interpretation and, more specifically, of control-flow analysis (CFA) [9, 14]. It inherits from the Abstracting Abstract Machines methodology [15] of systematically deriving CFAs from purely operational specifications. More specifically, this work is an instance of stack-precise CFA which is preceded by many variations [16, 3, 8, 6, 12, 1, 18].

Might and Shivers [10] first introduced GC to CFA. Reconciling GC with stack-precise CFAs has been the focus of significant effort. Earl et al. [4] introduced the first technique to do so, which approximated the set of frames that could be on any possible stack at any given control point. Johnson and Van Horn [8] cast this technique into a more operational framework and considered a more-precise variant in which a control point splits for each possible stack, with its heap being collected with respect to that stack alone. Johnson et al. [7] unified these previous two works in one formal framework. Darais et al. [1] show that the Abstracting Definitional Interpreters approach easily accommodates abstract GC by introducing a machine component which contains the addresses embedded in stack frames; this realization of GC amounts essentially to the fully-precise technique. Our work sidesteps the need for all of this previous effort by decomposing the heap into continuation-independent fragments.

A significant concept in the work of Johnson and Van Horn [8] is context irrelevance, the property that the evaluation of a configuration is independent of its continuation, and they note that the approximate abstract GC technique introduced by Earl et al. [4] violates context irrelevance. Once again, the independence of GC from the stack under our technique sidesteps these issues; evaluation under our technique exhibits context irrelevance effortlessly.

As part of the resolution of an apparent paradox regarding the complexities of object-oriented k-CFA and functional k-CFA, Might et al. [11] develop m-CFA, a stack-imprecise, polynomial-time family of CFA that employs the top-m stack frames as a context abstraction as opposed to the last-k call sites of k-CFA. They show that this abstraction is more resilient against approximation in the face of the aggressive rebinding that m-CFA effects. Our treatment of the abstract time component induces this same top-m-stack-frames context abstraction but in a stack-precise setting, the first such appearance in the literature, to our knowledge.

Although not inspired by it, our work surprisingly shares much of the perspective and approach of the work of Dillig et al. [2] to verify C and C++ programs. In particular, both works employ a compositional approach to analysis by producing evaluation summaries and decompose the heap to support their approach. In addition, both works have some notion of propagation of summary effects: theirs is a summary transfer function; ours is an effect log. In contrast, our work does not produce summaries in a bottom-up fashion and is targeted toward explicitly higher-order languages with effects. Interesting future work could explore whether any precision-enhancing techniques of Dillig et al. [2] could be ported and applied, whether the bottom-up production of summaries is viable, or whether their general approach can be used for verification in our setting.

# **8 Conclusion and Future Work**

In this paper, we showed that treating the heap compositionally in a stack-precise CFA removes its dependence on the stack, at once simplifying GC and increasing its effectiveness. As a result, the analysis produces more compact and precise evaluation summaries that are more amenable to reuse. We also showed that treating the time component compositionally induces the top-m-stack-frames context abstraction of m-CFA. Unlike k-CFA's last-k-call-sites context abstraction, m-CFA's need not devote any precision to static call sequences.

Interestingly, the notion of context shared by k-CFA and m-CFA—calling context, roughly—seems to be at odds with summary reuse. In a stack-precise 1CFA (which exhibits the same context abstraction whether it is [k = 1]CFA or [m = 1]CFA), the syntactic call site of the caller is encoded in the summary of the callee, preventing the summary's reuse at any other call site. If this tension is fundamental, it may pay to look to alternative notions of context—extant and novel.

The complement to abstract GC is abstract counting [10], which keeps track of the number of concrete resources that correspond to an abstract resource and enables certain abstract transitions, such as a strong store update. If abstract counting can be applied to heap fragments such that the overlap among fragments is accounted for correctly, it might be possible to detect opportunities to perform strong updates to heap bindings, which would further increase the precision of our technique.

Finally, Darais et al. [1] consider a particular value abstraction in which primitive operations propagate imprecision but do not introduce it. Their abstraction suggests a generalization in which each "basic block" is analyzed at full precision and imprecision occurs only at the join points of control flow. CFA2's stack environments capture an aspect of this generalization and it appears our technique does as well. However, a focused investigation would reveal whether such a generalization can be more-fully realized.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **SMT-Friendly Formalization of the Solidity Memory Model**

Ákos Hajdu<sup>1</sup> and Dejan Jovanović<sup>2</sup>

<sup>1</sup> Budapest University of Technology and Economics, Budapest, Hungary hajdua@mit.bme.hu <sup>2</sup> SRI International, New York City, USA dejan.jovanovic@sri.com

**Abstract.** Solidity is the dominant programming language for Ethereum smart contracts. This paper presents a high-level formalization of the Solidity language with a focus on the memory model. The presented formalization covers all features of the language related to managing state and memory. In addition, the formalization we provide is effective: all but few features can be encoded in the quantifier-free fragment of standard SMT theories. This enables precise and efficient reasoning about the state of smart contracts written in Solidity. The formalization is implemented in the solc-verify verifier and we provide an extensive set of tests that covers the breadth of the required semantics. We also provide an evaluation on the test set that validates the semantics and shows the novelty of the approach compared to other Solidity-level contract analysis tools.

# **1 Introduction**

Ethereum [32] is a public blockchain platform that provides a novel computing paradigm for developing decentralized applications. Ethereum allows the deployment of arbitrary programs (termed smart contracts [31]) that operate over the blockchain state. The public can interact with the contracts via transactions. It is currently the most popular public blockchain with smart contract functionality. While the nodes participating in the Ethereum network operate a low-level, stack-based virtual machine (EVM) that executes the compiled smart contracts, the contracts themselves are mostly written in a high-level, contract-oriented programming language called Solidity [30].

Even though smart contracts are generally short, they are no less prone to errors than software in general. In the Ethereum context, any flaws in the contract code come with potentially devastating financial consequences (such as the infamous DAO exploit [17]). This has inspired a great interest in applying formal verification techniques to Ethereum smart contracts (see e.g., [4] or [14] for surveys). In order to apply formal verification of any kind, be it static analysis or

<sup>-</sup> The author was also affiliated with SRI International as an intern during this project. Supported by the ÚNKP-19-3 New National Excellence Program of the Ministry for Innovation and Technology.

model checking, the first step is to formalize the semantics of the programming language that the smart contracts are written in. Such a semantics should not remain merely an exercise in formalization, but should preferably support the development of precise and automated verification tools.

Early approaches to verification of Ethereum smart contracts focused mostly on formalizing the low-level virtual machine precisely (see, e.g., [11,19,21,22,2]). However, the unnecessary details of the EVM execution model make it difficult to reason about high-level functional properties of contracts (as they were written by developers) in an effective and automated way. For Solidity-level properties of smart contracts, Solidity-level semantics are preferred. While some aspects of Solidity have been studied and formalized [23,10,15,33], the semantics of the Solidity memory model still lacks a detailed and precise formalization that also enables automation.

The memory model of Solidity has various unusual and non-trivial behaviors, providing a fertile ground for potential bugs. Smart contracts have access to two classes of data storage: a permanent storage that is a part of the global blockchain state, and a transient local memory used when executing transactions. While the local memory uses a standard heap of entities with references, the permanent storage has pure value semantics (although pointers to storage can be declared locally). This memory model that combines both value and reference semantics, with all interactions between the two, poses some interesting challenges but also offers great opportunities for automation. For example, the value semantics of storage ensures non-aliasing of storage data. This can, if supported by an appropriate encoding of the semantics, potentially improve both the precision and effectiveness of reasoning about contract storage.

This paper provides a formalization of the Solidity semantics in terms of a simple SMT-based intermediate language that covers all features related to managing contract storage and memory. A major contribution of our formalization is that all but a few of its elements can be encoded in the quantifier-free fragment of standard SMT theories. Additionally, our formalization captures the value semantics of storage with implicit non-aliasing information of storage entities. This allows precise and effective verification of Solidity smart contracts using modern SMT solvers. The formalization is implemented in the open-source solc-verify tool [20], which is a modular verifier for Solidity based on SMT solvers. We validate the formalization and demonstrate its effectiveness by evaluating it on a comprehensive set of tests that exercise the memory model. We show that our formalization significantly improves precision and soundness compared to existing Solidity-level verifiers, while substantially outperforming low-level EVM-based tools in terms of efficiency.

# **2 Background**

#### **2.1 Ethereum**

Ethereum [32,3] is a generic blockchain-based distributed computing platform. The Ethereum ledger is a storage layer for a database of accounts (identified by addresses) and the data associated with the accounts. Every account has an associated balance in Ether (the native cryptocurrency of Ethereum). In addition, an account can also be associated with the executable bytecode of a contract and the contract state.

Although Ethereum contracts are deployed to the blockchain in the form of the bytecode of the Ethereum Virtual Machine (EVM) [32], they are generally written in a high-level programming language called Solidity [30] and then compiled to EVM bytecode. After deployment, the contract is publicly accessible and its code cannot be modified. An external user, or another contract, can interact with a contract through its API by invoking its public functions. This can be done by issuing a transaction that encodes the function to be called with its arguments, and contains the contract's address as the recipient. The Ethereum network then executes the transaction by running the contract code in the context of the contract instance.

A contract instance has access to two different kinds of memory during its lifetime: contract storage and memory.<sup>3</sup> *Contract storage* is a dedicated data store for a contract to store its persistent state. At the level of the EVM, it is an array of 256-bit storage *slots* stored on the blockchain. Contract data that fits into a slot, or can be sliced into a fixed number of slots, is usually allocated starting from slot 0. More complex data types that do not fit into a fixed number of slots, such as mappings or dynamic arrays, are not supported directly by the EVM. Instead, they are implemented by the Solidity compiler using storage as a hash table where the structured data is distributed in a deterministic, collision-free manner. *Contract memory* is used during the execution of a transaction on the contract, and is deleted after the transaction finishes. This is where function parameters, return values and temporary data can be allocated and stored.
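The deterministic slot-derivation scheme for mappings can be sketched as follows. Note that the real EVM uses keccak-256 over the 32-byte key concatenated with the mapping's declared slot number; this sketch substitutes Python's built-in sha3_256 (an assumption for illustration), so the concrete slot values differ from on-chain slots while the scheme is the same.

```python
import hashlib

def mapping_slot(key: int, base_slot: int) -> int:
    """Return the storage slot of mapping[key] for a mapping declared
    at base_slot.  The real EVM computes keccak256(key . base_slot);
    here sha3_256 stands in, so values are illustrative only."""
    payload = key.to_bytes(32, "big") + base_slot.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha3_256(payload).digest(), "big")

# Deterministic and collision-free in practice: distinct keys are
# scattered to distinct 256-bit slots.
s1 = mapping_slot(0xAAAA, 0)
s2 = mapping_slot(0xBBBB, 0)
```

Because the derivation depends only on the key and the mapping's slot, any party can recompute where an entry lives without the table storing its keys.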

#### **2.2 Solidity**

Solidity [30] is the high-level programming language supporting the development of Ethereum smart contracts. It is a full-fledged object-oriented programming language with many features focusing on enabling rapid development of Ethereum smart contracts. The focus of this paper is the semantics of the Solidity memory model: the Solidity view of contract storage and memory, and the operations that can modify it. Thus, we restrict the presentation to a generous fragment of Solidity that is relevant for discussing and formalizing the memory model. An example contract that illustrates relevant features is shown in Figure 1, and the abstract syntax of the targeted fragment is presented in Figure 2. We omit parts of Solidity that are not relevant to the memory model (e.g., inheritance, loops, blockchain-specific members). We also omit low-level, unsafe features that can break the Solidity memory model abstractions (e.g., assembly and delegatecall).

<sup>3</sup> There is an additional data location named *calldata* that behaves the same as memory, but is used to store parameters of external functions. For simplicity, we omit it in this paper.

```
contract DataStorage {
  struct Record {
    bool set;
    int[] data;
  }
  mapping(address => Record) private records;

  function append(address at, int d) public {
    Record storage r = records[at];
    r.set = true;
    r.data.push(d);
  }

  function isset(Record storage r) internal view returns (bool s) {
    s = r.set;
  }

  function get(address at) public view returns (int[] memory ret) {
    require(isset(records[at]));
    ret = records[at].data;
  }
}
```
Fig. 1: An example contract illustrating commonly used features of the Solidity memory model. The contract keeps an association between addresses and data and allows users to query and append to their data.

*Contracts.* Solidity contracts are similar to classes in object-oriented programming. A contract can define any additional types needed, followed by the declaration of the *state variables* and contract *functions*, including an optional single *constructor* function. The contract's state variables define the only persistent data that the contract instance stores on the blockchain. The constructor function is only used once, when a new contract instance is deployed to the blockchain. Other public contract functions can be invoked arbitrarily by external users through an Ethereum transaction that encodes the function call data and designates the contract instance as the recipient of the transaction.

*Example 1.* The contract DataStorage in Figure 1 defines a struct type Record. Then it defines the contract storage as a single state variable records. Finally, three contract functions are defined: append(), isset(), and get(). Note that a constructor is not defined and, in this case, a default constructor is provided to initialize the contract state to default values.

Solidity supports further concepts from object-oriented programming, such as inheritance, function modifiers, and overloading (also covered by our implementation [20]). However, as these are not relevant for the formalization of the memory model we omit them to simplify our presentation.

*Types.* Solidity is statically typed and provides two classes of types: *value* types and *reference* types. Value types include elementary types such as addresses, integers, and Booleans that are always passed by value. Reference types, on the other hand, are passed by reference and include structs, arrays and mappings.


Fig. 2: Syntax of the targeted Solidity fragment.

A struct consists of a fixed number of members. An array is either fixed-size or dynamically-sized; besides the elements of the base type, it also includes a length field holding the number of elements. A mapping is an associative array mapping keys to values. The important caveat is that the table does not actually store the keys, so it is not possible to check whether a key is defined in the map.

*Example 2.* The contract in Figure 1 uses the following types. The records variable is a mapping from addresses to Record structures which, in turn, consist of a Boolean value and a dynamically-sized integer array. It is a common practice to define a struct with a Boolean member (set) to indicate that a mapping value has been set. This is because Solidity mappings do not store keys: any key can be queried, returning a default value if no value was associated previously.
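The "set flag" idiom relies on mappings returning a default value for unset keys; a minimal Python model of this behavior (class names are ours, not part of the formalization):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    set: bool = False                    # the "was this key assigned?" flag
    data: list = field(default_factory=list)

class Mapping(dict):
    """Solidity-style mapping: reading a missing key yields a default
    value instead of failing, and the keys are not enumerable."""
    def __missing__(self, key):
        self[key] = Record()
        return self[key]

records = Mapping()
records["0xA11CE"].set = True            # what append() does for a key
# Querying a never-assigned key silently returns a default Record,
# which is exactly why the explicit `set` member is needed.
```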

*Data locations for reference types.* Data of reference types resides in a *data location* that is either *storage* or *memory*. Storage is the persistent store used for state variables of the contract. In contrast, memory is used during execution of a transaction to store function parameters, return values and local variables, and it is deleted after the transaction finishes.

Semantics of reference types differ fundamentally depending on the data location that they are stored in. Layout of data in the memory data location resembles the memory model common in Java-like programming languages: there is a heap where reference types are allocated and any entity in the heap can contain values of value types, and *references* to other memory entities. In contrast, the storage data location treats and stores all entities, including those of reference types, as *values* with no references involved. Mixing storage and memory is not possible: the data location of a reference type is propagated to its elements and members. This means that storage entities cannot have references to memory entities, and memory entities cannot have reference types as values. Storage of a contract can be viewed as a single value with no aliasing possible.

Fig. 3: An example illustrating reference types (structs and arrays) and their layout in storage and memory: (a) a contract defining types and state variables; (b) an abstract representation of the contract storage as values; and, (c) a function using the memory data location and a possible layout of the data in memory.

*Example 3.* Consider the contract C defined in Figure 3a. The contract defines two reference struct types S and T, and declares state variables s, t, and sa. These variables are maintained in storage during the contract lifetime and they are represented as values with no references within. A potential value of these variables is shown in Figure 3b. On the other hand, the top of Figure 3c shows a function with three variables in the memory data location, one as the argument to the function, and two defined within the function. Because they are in memory, these variables are references to heap locations. Any data of reference types, stored within the structures and arrays, is also a reference and can be reallocated or assigned to point to an existing heap location. This means that the layout of the data can contain arbitrary graphs with arbitrary aliasing. A potential layout of these variables is shown at the bottom of Figure 3c.
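The two semantics can be contrasted with a small Python analogy (not the formal encoding): plain assignment models memory-style reference semantics, while an explicit deep copy models storage-style value semantics.

```python
import copy

# A "storage" struct is a pure value: here, a dict standing in for
# a struct with an int member and an int-array member.
storage_s = {"x": 1, "ts": [10, 20]}

# Memory-style reference semantics: plain assignment aliases the
# entity, so a write through one name is visible through the other.
mem_alias = storage_s
mem_alias["x"] = 99

# Storage-style value semantics: assignment is a deep copy, so the
# copy and the original cannot alias.
stor_copy = copy.deepcopy(storage_s)
stor_copy["x"] = 7
```

After running this, `storage_s["x"]` is 99 (the alias wrote through), while `stor_copy` holds an independent value.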

*Functions.* Functions are the Solidity equivalent of methods in classes. They receive data as arguments, perform computations, manipulate state variables and interact with other Ethereum accounts. Besides accessing the storage of the contract through its state variables, functions can also define local variables, including function arguments and return values. Variables of value types are stored as values on a stack. Variables of reference types must be explicitly declared with a data location, and are always pointers to an entity in that data location (storage or memory). A pointer to storage is called a *local storage pointer*. As the storage is not memory in the usual sense, but a value instead, one can see storage pointers as encoding a path to one reference type entity in the storage.

*Example 4.* Consider the example in Figure 1. The local variable r in function append() points to the struct at index at of the state variable records (residing in the contract storage). In contrast, the return value ret of function get() is a pointer to an integer array in memory.

*Statements and expressions.* Solidity includes usual programming statements and control structures. To keep the presentation simple, we focus on the statements that are related to the formalization of the memory model: local variable declarations, assignments, array manipulation, and the delete statement.<sup>4</sup> Solidity expressions relevant for the memory model are identifiers, member and array accesses, conditionals and allocation of new arrays and structs in memory.

If a value is not provided, local variable declarations automatically initialize the variable to a default value. For reference types in memory, this allocates new entities on the heap and performs recursive initialization of its members. For reference types in storage, the local storage pointers must always be explicitly initialized to point to a storage member. This ensures that no pointer is ever "null". Value types are initialized to their simple default value (0, false). Behavior of assignment in Solidity is complex (see Section 3.5) and depends on the data location of its arguments (e.g., deep copy or pointer assignment). Dynamically-sized storage arrays can be extended by pushing an element to their end, or can be shrunk by popping. The delete statement assigns the default value (recursively for reference types) to a given entity based on its type.
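Recursive default initialization, which is also what delete assigns, can be sketched as follows (the type descriptors are a hypothetical mini-notation for this sketch, not Solidity syntax):

```python
def default_value(ty):
    """Recursively construct the default value for a type descriptor.
    Descriptors (assumed notation): "int" / "bool",
    ("array", elem_ty), or ("struct", {member_name: ty})."""
    if ty == "int":
        return 0
    if ty == "bool":
        return False
    if ty[0] == "array":
        return []                        # dynamic arrays default to empty
    if ty[0] == "struct":
        return {m: default_value(t) for m, t in ty[1].items()}
    raise ValueError(f"unknown type {ty!r}")

record_ty = ("struct", {"set": "bool", "data": ("array", "int")})
r = {"set": True, "data": [1, 2, 3]}
r = default_value(record_ty)             # models `delete r`
```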

*Example 5.* The assignment r.set = true in the append() function of Figure 1 is a simple value assignment. On the other hand, ret = records[at].data in the get() function allocates a new array on the heap and performs a deep copy of data from storage to memory.

#### **2.3 SMT-Based Programs**

We formalize the semantics of the Solidity fragment by translating it to a simple programming language that uses SMT semantics [9,12] for the types and data. The syntax of this language is shown in Figure 4. The syntax is purposefully

<sup>4</sup> Our implementation [20] supports a majority of statements, excluding low-level operations (such as inline assembly). Loops are also supported and can be specified with loop invariants.


Fig. 4: Syntax of SMT-based programs.

minimal and generic, so that it can be expressed in any modern SMT-based verification tool (e.g., Boogie [5], Why3 [18] or Dafny [26]).<sup>5</sup>

The types of SMT-based programs are the SMT types: simple value types such as Booleans and mathematical integers, and structured types such as arrays [27,16] and inductive datatypes [8]. The expressions of the language are standard SMT expressions such as identifiers, array reads and writes, datatype constructors, member selectors, conditionals, and basic arithmetic [7]. All variables are declared at the beginning of a program. The statements of the language are limited to assignments, the if-then-else statement, and the assumption statement.

SMT-based programs are a good fit for modeling program semantics. For one, they have clear semantics with no ambiguities. Furthermore, any property of the program can be checked with SMT solvers: the program can be translated directly to an SMT formula by a single static assignment (SSA) transformation.
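A minimal sketch of the SSA step for straight-line assignments (our own helper, not the tool's implementation; the string-based substitution is naive and for illustration only):

```python
def to_ssa_equations(stmts):
    """Turn straight-line assignments into SSA equations.

    stmts is a list of (lhs_identifier, rhs_string) pairs.  Each
    assignment introduces a fresh version of its LHS, and current
    versions are substituted into the RHS.  The textual substitution
    is naive (it would also rewrite identifiers that merely contain
    another variable's name) -- good enough for this illustration."""
    version = {}
    equations = []
    for lhs, rhs in stmts:
        for var, n in sorted(version.items(), key=lambda kv: -len(kv[0])):
            rhs = rhs.replace(var, f"{var}_{n}")
        version[lhs] = version.get(lhs, -1) + 1
        equations.append(f"{lhs}_{version[lhs]} = {rhs}")
    return equations

# x := 0; x := x + 1; y := x * 2  becomes three conjoined equations.
eqs = to_ssa_equations([("x", "0"), ("x", "x + 1"), ("y", "x * 2")])
```

The conjunction of the resulting equations is the SMT formula describing all executions of the straight-line program.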

Note that the syntax requires the left-hand side (LHS) of an assignment to be an identifier. However, to simplify our presentation, we will also allow array read, member access, and conditional expressions (and their combinations) as the LHS. Such constructs can be eliminated iteratively until only identifiers appear as the LHS of assignments: an array read a[i] := e becomes the whole-array write a := a[i ← e], a member access rewrites into a datatype constructor update, and a conditional LHS splits the assignment into its two branches.
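As an illustration, the elimination of array reads on the LHS can be sketched over a hypothetical tuple-based AST, rewriting a[i] := e into a := store(a, i, e) until only an identifier remains:

```python
def eliminate_lhs(lhs, rhs):
    """Rewrite an assignment until its LHS is a plain identifier.

    AST convention assumed for this sketch:
      ("id", name)            -- an identifier
      ("sel", arr, idx)       -- the array read arr[idx]
      ("store", arr, idx, v)  -- the SMT array write arr[idx <- v]
    The rule a[i] := e  ~>  a := store(a, i, e) is applied
    iteratively, so nested reads unfold into nested stores."""
    while lhs[0] != "id":
        if lhs[0] == "sel":
            _, arr, idx = lhs
            lhs, rhs = arr, ("store", arr, idx, rhs)
        else:
            raise ValueError(f"unsupported LHS {lhs!r}")
    return lhs, rhs

# a[i][j] := e  ~>  a := store(a, i, store(a[i], j, e))
lhs, rhs = eliminate_lhs(("sel", ("sel", ("id", "a"), "i"), "j"), "e")
```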


<sup>5</sup> Our current implementation is based on Boogie, but we have plans to introduce a generic intermediate representation that could incorporate alternate backends such as Why3 or Dafny.

# **3 Formalization**

In this section we present our formalization of the Solidity semantics through a translation that maps Solidity elements to constructs in the SMT-based language. The formalization is described top-down in separate subsections for types, contracts, state variables, functions, statements, and expressions.

# **3.1 Types**

We use T(.) to denote the function that maps a Solidity type to an SMT type. This function is used in the translation of contract elements and can, as a side effect, introduce datatype definitions and variable declarations. This is denoted with [*decl*] in the result of the function. To simplify the presentation, we assume that such side effects are automatically added to the preamble of the SMT program. Furthermore, we assume that declarations with the same name are only added once. We use type(*expr*) to denote the original (Solidity) type of an expression (to be used later in the formalization). The definition of T(.) is shown in Figure 5.

```
T(bool) ≐ bool
T(address) ≐ T(int) ≐ T(uint) ≐ int
T(mapping(K=>V) storage) ≐ [T(K)]T(V)
T(mapping(K=>V) storptr) ≐ [int]int
T(T[n] storage) ≐ T(T[] storage)
T(T[n] storptr) ≐ T(T[] storptr)
T(T[n] memory) ≐ T(T[] memory)
T(T[] storage) ≐ StorArrT with [StorArrT(arr : [int]T(T), length : int)]
T(T[] storptr) ≐ [int]int
T(T[] memory) ≐ int with [MemArrT(arr : [int]T(T), length : int)]
                         [arrheapT : [int]MemArrT]
T(struct S storage) ≐ StorStructS with [StorStructS(..., mi : T(Si), ...)]
T(struct S storptr) ≐ [int]int
T(struct S memory) ≐ int with [MemStructS(..., mi : T(Si), ...)]
                              [structheapS : [int]MemStructS]
```
Fig. 5: Formalization of Solidity types. Members of struct S are denoted as mi with types Si.

*Value types.* Booleans are mapped to SMT Booleans while other value types are mapped to SMT integers. Addresses are also mapped to SMT integers so that arithmetic comparisons and conversions between integers and addresses are supported. For simplicity, we map all integers (signed or unsigned) to SMT

integers.<sup>6</sup> Solidity also allows function types to store, pass around, and call functions, but this is not yet supported by our encoding.

*Reference types.* The Solidity syntax does not always require the data location for variable and parameter declarations. However, for reference types it is always required (enforced by the compiler), except for state variables that are always implicitly storage. In our formalization, we assume that the data location of reference types is a part of the type. As discussed before, memory entities are always accessed through pointers. However, for storage we distinguish whether it is the storage reference itself (e.g., state variable) or a storage pointer (e.g., local variable, function parameter). We denote the former with storage and the latter with storptr in the type name. Our modeling of reference types relies on the generalized theory of arrays [16] and the theory of inductive data-types [8], both of which are supported by modern SMT solvers (e.g., cvc4 [6] and z3 [28]).

*Mappings and arrays.* For both arrays and mappings, we abstract away the implementation details of Solidity and model them with the SMT theory of arrays and inductive datatypes. We formalize Solidity mappings simply as SMT arrays. Both fixed- and dynamically-sized arrays are translated using the same SMT type and we only treat them differently in the context of statements and expressions. Strings and byte arrays are not discussed here, but we support them as particular instances of the array type. To ensure that array size is properly modeled, we keep track of it in the datatype (*length*) along with the actual elements (*arr*).

For *storage array types* with base type T, we introduce an SMT datatype *StorArr*<sup>T</sup> with a constructor that takes two arguments: an inner SMT array (*arr*) associating integer indexes with the recursively translated base type (T(T)), and an integer *length*. The advantage of this encoding is that the value semantics of storage data is provided by construction: each array element is a separate entity (no aliasing) and assigning storage arrays in SMT makes a deep copy. This encoding also generalizes if the base type is a reference type.

For *memory array types* with base type T, we introduce a separate datatype *MemArr*<sup>T</sup> (side effect). However, memory arrays are stored with pointer values. Therefore the memory array type is mapped to integers, and a heap (*arrheap*<sup>T</sup> ) is introduced to associate integers (pointers) with the actual memory array datatypes. Note that mixing data locations within a reference type is not possible: the element type of the array has the same data location as the array itself. Therefore, it is enough to introduce two datatypes per element type T: one for storage and one for memory. In the former case the element type will have value semantics whereas in the latter case elements will be stored as pointers.
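The pointer-plus-heap encoding can be sketched in Python as follows (a loose analogy of the arrheap map for base type int; the names and the allocator are ours):

```python
from dataclasses import dataclass, field

@dataclass
class MemArrInt:
    """The MemArr datatype for base type int: elements plus length."""
    arr: dict = field(default_factory=dict)   # index -> element
    length: int = 0

arrheap = {}       # arrheap : pointer (int) -> MemArrInt
_next_ptr = 0

def alloc_mem_array() -> int:
    """Allocate a fresh memory array on the heap, return its pointer."""
    global _next_ptr
    ptr = _next_ptr
    _next_ptr += 1
    arrheap[ptr] = MemArrInt()
    return ptr

p = alloc_mem_array()
q = p                               # pointer assignment: p and q alias
arrheap[q].arr[0] = 42
arrheap[q].length = 1
```

Since memory variables hold only the integer pointer, copying a variable copies the pointer, and aliasing arises naturally; the array contents live solely in the heap map.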

*Structs.* For each *storage struct type* S the translation introduces an inductive datatype *StorStruct*S, including a constructor for each struct member with types

<sup>6</sup> Note that this does not capture the precise machine integer semantics, but this is not relevant from the perspective of the memory model. Precise computation can be provided by relying on SMT bitvectors or modular arithmetic (see, e.g., [20]).

mapped recursively. Similarly to arrays, this ensures the value semantics of storage such as non-aliasing and deep copy assignments. For each *memory struct* S we also introduce a datatype *MemStruct*<sup>S</sup> and a constructor for each member.<sup>7</sup> However, the memory struct type itself is mapped to integers (pointer) and a heap (*structheap*S) is introduced to associate the pointers with the actual memory struct datatypes. Note that if a memory struct has members with reference types, they are also pointers, which is ensured recursively by our encoding.

# **3.2 Local Storage Pointers**

An interesting aspect of the storage data location is that, although the stored data has value semantics, it is still possible to define pointers to an entity in storage within a local context, e.g., with function parameters or local variables. These pointers are called *local storage pointers*.

*Example 6.* In the append() function of Figure 1 the variable r is defined to be a convenience pointer into the storage map records[at]. Similarly, the isset() function takes a storage pointer to a Record entity in storage as an argument.

Since our formalization uses SMT datatypes to encode the contract data in storage, it is not possible to encode these pointers directly. A partial solution would be to substitute each occurrence of the local pointer with the expression that is assigned to it when it was defined. However, this approach is too simplistic and has limitations. Local storage pointers can be reassigned, or assigned conditionally, or it might not be known at compile time which definition should be used. Furthermore, local storage pointers can also be passed in as function arguments: they can point to different storage entities for different calls.

We propose an approach to encode local storage pointers while overcoming these limitations. Our encoding relies on the fact that storage data of a contract can be viewed as a finite-depth tree of values. As such, each element of the stored data can be uniquely identified by a finite path leading to it.<sup>8</sup>

*Example 7.* Consider the contract C in Figure 6a. The contract defines structs T and S, and state variables of these types. If we are interested in all storage entities of type T, we can consider the sub-tree of the contract storage tree that has leaves of type T, as depicted in Figure 6b. The root of the tree is the contract itself, with indexed sub-nodes for state variables, in order. For nodes of struct type there are indexed sub-nodes leading to its members, in order. For each node of array type there is a sub-node for the base type. Every pointer to a storage T entity can be identified by a path in this tree: by fixing the index to each state

<sup>7</sup> Mappings in Solidity cannot reside in memory. If a struct defines a mapping member and it is stored in memory, the mapping is simply inaccessible. Such members could be omitted from the constructor.

<sup>8</sup> Solidity does support a limited form of recursive data types. Such types could make the storage a tree of potentially arbitrary depth. We chose not to support such types, as recursion is non-existent in Solidity types used in practice.

Fig. 6: An example of packing and unpacking: (a) contract with struct definitions and state variables; (b) the storage tree of the contract for type T; and (c) the unpacking expression for storage pointers of type T.

variable, member, and array index, as seen in brackets in Figure 6b, such paths can be encoded as an array of integers. For example, the state variable t1 can be represented as [0], the member s1.t as [1, 0], and ss[8].ts[5] as [2, 8, 1, 5].

This idea allows us to encode storage pointer types (pointing to arrays, structs or mappings) simply as SMT arrays ([*int*]*int*). The novelty of our approach is that storage pointers can be encoded and passed around, while maintaining the value semantics of storage data, without the need for quantifiers to describe non-aliasing. To encode storage pointers, we need to address initialization and dereference of storage pointers, while assignment is simply an assignment of array values. When a storage pointer is initialized to a concrete expression, we *pack* the indexed path to the storage entity (that the expression references) into an array value. When a storage pointer is dereferenced (e.g., by indexing into or accessing a member), the array is *unpacked* into a conditional expression that will evaluate to a storage entity by decoding paths in the tree.
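A toy version of pack and unpack for the storage tree of Example 7 illustrates the encoding; as a simplification, the tree is hard-coded here rather than derived from the contract, and storage expressions are plain strings:

```python
# Hard-coded storage tree of Example 7: state variables t1 : T (index 0),
# s1 : S (index 1), ss : S[] (index 2), where struct S has members
# t : T (index 0) and ts : T[] (index 1).

def pack(steps):
    """Encode an access path into a flat integer array.  Each step is
    a (kind, index) pair; only the indexes survive in the encoding."""
    return [i for _, i in steps]

def unpack(ptr):
    """Decode an integer array into a storage expression (a string),
    mirroring the conditional built over the storage tree."""
    if ptr[0] == 0:
        return "t1"
    if ptr[0] == 1:
        return "s1.t" if ptr[1] == 0 else f"s1.ts[{ptr[2]}]"
    if ptr[0] == 2:
        suffix = ".t" if ptr[2] == 0 else f".ts[{ptr[3]}]"
        return f"ss[{ptr[1]}]" + suffix
    raise ValueError(f"no state variable with index {ptr[0]}")

# ss[8].ts[5]: state variable 2, index 8, member 1, index 5.
ptr = pack([("var", 2), ("index", 8), ("member", 1), ("index", 5)])
```

Round-tripping a pointer through unpack recovers exactly the expression it was packed from, which is the property the encoding relies on.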

*Storage tree.* The storage tree for a given type T can be easily obtained by filtering the AST nodes of the contract definition to only include state variable declarations and to, further, only include nodes that lead to a sub-node of type T. We denote the storage tree for type T as tree(T).<sup>9</sup>

*Packing.* Given an expression (such as ss[8].ts[5]), pack(.) uses the storage tree for the type of the expression and encodes it to an array (e.g., [2, 8, 1, 5]) by fitting the expression into the tree. Pseudocode for pack(.) is shown in Figure 7. To start, the expression is decomposed into a list of base sub-expressions. The base expression of an identifier *id* is *id* itself. For an array index e[i] or a member

<sup>9</sup> In our implementation we do not explicitly compute the storage tree but instead traverse directly the AST provided by the Solidity compiler.

```
def packpath(node, subExprs, d, result):
    foreach expr in subExprs do
        if expr = id ∨ expr = e.id then
            find edge node --id(i)--> child;
            result := result[d ← i];
        if expr = e[idx] then
            find edge node --(i)--> child;
            result := result[d ← E(idx)];
        node, d := child, d + 1;
    return result

def pack(expr):
    baseExprs := list of base sub-expressions of expr;
    baseExpr := car(baseExprs);
    if baseExpr is a state variable then
        return packpath(tree(type(expr)), baseExprs, 0, constarr[int]int(0))
    if baseExpr is a storage pointer then
        result := constarr[int]int(0);
        prefix := E(baseExpr);
        foreach path to a leaf in tree(type(baseExpr)) do
            pathResult, pathCond := prefix, true;
            foreach kth edge on the path with label id(i) do
                pathCond := pathCond ∧ prefix[k] = i
            pathResult := packpath(leaf, cdr(baseExprs), len(path), pathResult);
            result := ite(pathCond, pathResult, result);
        return result
```
Fig. 7: Packing of an expression. It returns a symbolic array expression that, when evaluated, identifies the path to the storage entity that the expression references.

access e.mi, it is recursively the base expressions of e. We call the first element of this list (denoted by car) the base expression (the innermost base expression). The base expression is always either a state variable or a storage pointer, and we consider these two cases separately.

If the *base expression is a state variable*, we simply align the expression along the storage tree with the packpath function. The packpath function takes the list of base sub-expressions and the storage tree to use for alignment, and processes the expressions in order. If the current expression is an identifier (state variable or member access), the algorithm finds the outgoing edge annotated with the identifier (from the current node) and writes the index into the result array. If the expression is an index access, the algorithm maps the index expression (symbolically) and writes it into the array. The expression mapping function E(.) is introduced later in Section 3.6.

If the *base expression is a storage pointer*, the process is more general since the "start" of the packing must accommodate any point in storage where the base expression can point to. In this case the algorithm finds all paths to leaves in the

tree of the base pointer, identifies the condition for taking that path, and writes the labels on the path to an array. Then it uses packpath to continue writing the array with the rest of the expression (denoted by cdr), as before. Finally, a conditional expression is constructed with all the conditions and packed arrays. Note that the type of this conditional is still an SMT array of integers, as is the case for a single path.

*Example 8.* For contract in Figure 6a, pack(ss[8].ts[5]) produces [2, 8, 1, 5] by calling packpath on the base sub-expressions [ss, ss[8], ss[8].ts, ss[8].ts[5]]. First, 2 is added as ss is the state variable with index 2. Then, ss[8] is an index access so 8 is mapped to 8 and added to the result. Next, ss[8].ts is a member access with ts having the index 1. Finally, ss[8].ts[5] is an index access so 5 is mapped to 5 and added.

```
def unpack(ptr):
    return unpack(ptr, tree(type(ptr)), empty, 0);
def unpack(ptr, node, expr, d):
    result := empty;
    if node has no outgoing edges then result := expr;
    if node is contract then
        foreach edge node --id(i)--> child do
            result := ite(ptr[d] = i, unpack(ptr, child, id, d + 1), result);
    if node is struct then
        foreach edge node --id(i)--> child do
            result := ite(ptr[d] = i, unpack(ptr, child, expr.id, d + 1), result);
    if node is array/mapping with edge node --(i)--> child then
        result := unpack(ptr, child, expr[ptr[d]], d + 1);
    return result;
```
Fig. 8: Unpacking of a local storage pointer into a conditional expression.

*Unpacking.* The opposite of pack() is unpack(), shown in Figure 8. This function takes a storage pointer (of type [*int*]*int*) and produces a conditional expression that decodes any given path into one of the leaves of the storage tree. The function recursively traverses the tree starting from the contract node and accumulates the expressions leading to the leaves. The function creates conditionals when branching, and when a leaf is reached the accumulated expression is simply returned. For contracts we process edges corresponding to each state variable by setting the subexpression to be the state variable itself. For structs we process edges corresponding to each member by wrapping the subexpression into a member access. For both contracts and structs, the subexpressions are collected into a conditional as separate cases. For arrays and mappings we process the single outgoing edge by wrapping the subexpression into an index access using the current element (at index d) of the pointer.

*Example 9.* For example, the conditional expression corresponding to the tree in Figure 6b can be seen in Figure 6c. Given a pointer *ptr*, if *ptr*[0] = 0 then the conditional evaluates to *t1*. Otherwise, if *ptr*[0] = 1 then *s1* has to be taken, where two leaves are possible: if *ptr*[1] = 0 then the result is *s1.t* otherwise it is *s1.ts*[*ptr*[2]], and so on. If *ptr* is [2, 8, 1, 5] then the conditional evaluates exactly to ss[8].ts[5] from which *ptr* was packed.<sup>10</sup>
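The decoding of Example 9 can be sketched as follows (hypothetical Python, not the paper's encoding: the tree shapes are assumed for illustration, and the sketch follows one concrete path instead of building the symbolic ite-conditional of Figure 8).

```python
# Hypothetical sketch of unpack(): decode a storage-pointer path into a
# textual storage expression by walking a simplified storage tree.
# Assumed node shapes: ('contract', {idx: (name, child)}),
# ('struct', {idx: (name, child)}), ('array', child), and the leaf 'leaf'.
def unpack(ptr, node, expr='', d=0):
    if node == 'leaf':
        return expr                       # leaf reached: return accumulated expr
    kind = node[0]
    if kind == 'contract':
        name, child = node[1][ptr[d]]     # choose the state variable
        return unpack(ptr, child, name, d + 1)
    if kind == 'struct':
        name, child = node[1][ptr[d]]     # wrap into a member access
        return unpack(ptr, child, expr + '.' + name, d + 1)
    if kind == 'array':                   # single outgoing edge: index access
        return unpack(ptr, node[1], expr + '[%d]' % ptr[d], d + 1)

# Tree fragment for state variable ss : S[] whose member ts : T[] has index 1.
tree = ('contract', {2: ('ss',
           ('array', ('struct', {1: ('ts', ('array', 'leaf'))})))})
print(unpack([2, 8, 1, 5], tree))  # decodes back to ss[8].ts[5]
```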

Note that with inheritance and libraries [30] it is possible that a contract defines a type T but has no nodes in its storage tree. The contract can still define functions with storage pointers to T, which can be called by derived contracts that define state variables of type T. In such cases we declare an array of type [*int*] T(T), called the *default context*, and unpack storage pointers to T as if the default context was a state variable. This allows us to reason about abstract contracts and libraries, modeling that their storage pointers can point to arbitrary entities not yet declared.

#### **3.3 Contracts, State Variables, Functions**

The focus of our discussion is the Solidity memory model and, for presentation purposes, we assume a minimalist setting where the important aspects of storage and memory can be presented: we assume a single contract and a single function to translate. Interactions between multiple functions are handled differently depending on the verification approach. For example, in modular verification functions are checked individually against specifications (pre- and post-conditions) and function calls are replaced by their specification [20].

*State variables.* Each state variable s_i of a contract is mapped to a variable declaration s_i : T(type(s_i)) in the SMT program.<sup>11</sup> The data location of state variables is always storage. As discussed previously, reference types are mapped using SMT datatypes and arrays, which ensures non-aliasing by construction. While Solidity optionally allows inline initializer expressions for state variables, without loss of generality we can assume that they are initialized in the constructor using regular assignments.

<sup>10</sup> Note that due to the "else" branches, unpack is a non-injective, surjective function. For example, [a, 8, 1, 5] with any a ≥ 2 would evaluate to the same slot. However, this does not affect our encoding as pointers cannot be compared and pack always returns the same (unique) values.

<sup>11</sup> Generalizing this to multiple contracts can be done directly by using a separate one-dimensional heap for each state variable, indexed by a receiver parameter (*this* : *address*) identifying the current contract instance (see, e.g., [20]).

```
defval(bool) ˙= false
defval(address) ˙= defval(int) ˙= defval(uint) ˙= 0
defval(mapping(K=>V )) ˙= constarr[T (K)]T (V )(defval(V ))
defval(T[] storage) ˙= defval(T[0] storage)
defval(T[] memory) ˙= defval(T[0] memory)
defval(T[n] storage) ˙= StorArrT (constarr[int]T (T )(defval(T)), n)
defval(T[n] memory) ˙= [ref : int] (fresh symbol)
                         {ref := refcnt := refcnt + 1}
                         {arrheapT [ref].length := n}
                         {arrheapT [ref].arr[i] := defval(T)} for 0 ≤ i ≤ n
                         ref
defval(struct S storage) ˙= StorStructS(..., defval(Si),...)
defval(struct S memory) ˙= [ref : int] (fresh symbol)
                              {ref := refcnt := refcnt + 1}
                              {structheapS[ref].mi = defval(Si)} for each mi
                              ref
```
Fig. 9: Formalization of default values. We denote struct S members as m_i with types S_i.

*Function calls.* From the perspective of the memory model, the only important aspects of function calls are how parameters are passed in and how return values are treated. Our formalization is general in that it allows us to treat both as plain assignments (explained later in Section 3.5). For each parameter p_i and return value r_i of a function, we add declarations p_i : T(type(p_i)) and r_i : T(type(r_i)) to the SMT program. Note that reference types appearing as parameters or return values of the function are either memory or storage pointers.

*Memory allocation.* To model the allocation of new memory entities while keeping some non-aliasing information, we introduce an allocation counter variable *refcnt* : *int* in the preamble of the SMT program. This counter is incremented for each allocation of a memory entity and used as the address of the new entity. For each parameter p_i with memory data location we include an assumption *assume*(p_i ≤ *refcnt*): parameters can be arbitrary pointers, but they should not alias with new allocations within the function. Note that if a parameter of memory pointer type is a reference type containing other references, such non-aliasing constraints need to be assumed recursively [25]. This can be done for structs by enumerating members; for dynamic arrays it requires quantification that is nevertheless still decidable (array property fragment [13]).
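The counter discipline above can be sketched in a few lines (hypothetical Python, an illustration of the idea rather than the SMT preamble itself): fresh addresses are produced by incrementing the counter, so no two allocations alias, while a parameter pointer is only constrained to be at most the counter's value at function entry.

```python
# Hypothetical sketch of the allocation counter refcnt: every fresh memory
# entity gets refcnt + 1 as its address, so allocations never alias;
# an incoming memory pointer p is only assumed to satisfy p <= refcnt.
refcnt = 0

def alloc():
    global refcnt
    refcnt += 1          # models refcnt := refcnt + 1; new value is the address
    return refcnt

existing = alloc()       # an entity allocated before the function runs
p = existing             # a memory-pointer parameter: assume(p <= refcnt) holds
a = alloc()              # an allocation inside the function body
assert p <= refcnt and a > p   # p may alias 'existing', but never the fresh 'a'
```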

*Initialization and default values.* If we are translating the constructor function, each state variable s_i is first initialized to its default value with a statement s_i := defval(type(s_i)). For regular functions, we set each return value r_i to its default value with a statement r_i := defval(type(r_i)). We use defval(.), as defined in Figure 9, to denote the function that maps a Solidity type to its default value as an SMT expression. Note that, as a side effect, this function can perform allocations for memory entities, introducing extra declarations and statements, denoted by [*decl*] and {*stmt*}. As expected, the default value is *false* for Booleans and 0 for other primitives that map to integers. For mappings from K to V, the default value is an SMT constant array returning the default value of the value type V for each key k ∈ K (see, e.g., [16]). The default value of storage arrays is the corresponding datatype value constructed with a constant array of the default value for base type T, and a length of n or 0 for fixed- or dynamically-sized arrays, respectively. For storage structs, the default value is the corresponding datatype value constructed with the default values of each member.
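The recursion of defval(.) over the storage side of the type grammar can be sketched as follows (hypothetical Python; the tuple-based type representation and dict-based datatype values are assumptions made for illustration, and the memory cases with heap allocation from Figure 9 are omitted).

```python
# Hypothetical sketch of defval(.) for a simplified storage type grammar.
# Types are tuples: ('bool',), ('int',), ('uint',), ('address',),
# ('array', elem_type, length), ('struct', {member_name: member_type}).
def defval(ty):
    kind = ty[0]
    if kind == 'bool':
        return False                      # false for Booleans
    if kind in ('int', 'uint', 'address'):
        return 0                          # 0 for integer-mapped primitives
    if kind == 'array':
        _, elem, n = ty                   # length n (0 for dynamic arrays)
        # fresh default per element, mirroring the constant-array construction
        return {'length': n, 'arr': [defval(elem) for _ in range(n)]}
    if kind == 'struct':
        return {m: defval(t) for m, t in ty[1].items()}

# A fixed-size storage array of 2 structs, each with an int member x:
print(defval(('array', ('struct', {'x': ('int',)}), 2)))
```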

The default value of uninitialized memory pointers is unusual. Since Solidity does not support "null" pointers, a new entity is automatically allocated in memory and initialized to default values (which might include additional recursive initialization). Note that for fixed-size arrays Solidity enforces that the array size n must be an integer literal or a compile-time constant, so setting each element to its default value is possible without loops or quantifiers. Similarly for structs, each member is recursively initialized, which is again possible by explicitly enumerating each member.

#### **3.4 Statements**

We use S(.) to denote the function that translates Solidity statements to a list of statements in the SMT program. It relies on the type mapping function T(.) (presented previously in Section 3.1) and on the expression mapping function E(.) (to be introduced in Section 3.6). Furthermore, we define a helper function A(., .) dedicated to modeling Solidity assignments (to be discussed in Section 3.5).

The definition of S(.) is shown in Figure 10. As a side effect, extra declarations can be introduced in the preamble of the SMT program (denoted by [*decl*]). The Solidity documentation [30] does not precisely state the order of evaluating subexpressions in statements; it only specifies that subnodes are processed before the parent node. This problem is independent from the discussion of the memory model, so we assume that side effects of subexpressions are added in the same order as implemented in the compiler. Furthermore, if a subexpression is mapped multiple times, we assume that the side effects are only added once. This makes our presentation simpler by introducing fewer temporary variables.

Local variable declarations introduce a variable declaration with the same identifier in the SMT program by mapping the type.<sup>12</sup> If an initialization expression is given, it is mapped using <sup>E</sup>(.) and assigned to the variable. Otherwise, the default value is used as defined by defval(.) in Figure 9. Delete assigns the default value for a type, which is simply mapped to an assignment in our formalization. Solidity supports multiple assignments as one statement with a tuple-like syntax. The documentation [30] does not specify the behavior precisely, but the

<sup>12</sup> Without loss of generality we assume that identifiers in Solidity are unique. The compiler handles scoping and assigns a unique identifier to each declaration.

```
S(T id)                  ˙= [id : T(T)]; A(id, defval(T))
S(T id = expr)           ˙= [id : T(T)]; A(id, E(expr))
S(delete e)              ˙= A(E(e), defval(type(e)))
S(l1,...,ln = r1,...,rn) ˙= [tmp_i : T(type(r_i))]  for 1 ≤ i ≤ n  (fresh symbols)
                            A(tmp_i, E(r_i))        for 1 ≤ i ≤ n
                            A(E(l_i), tmp_i)        for n ≥ i ≥ 1  (reversed)
S(e1.push(e2))           ˙= A(E(e1).arr[E(e1).length], E(e2))
                            E(e1).length := E(e1).length + 1
S(e.pop())               ˙= E(e).length := E(e).length − 1
                            A(E(e).arr[E(e).length], defval(arrtype(E(e))))
```


Fig. 10: Formalization of statements.

```
contract C {
  struct S { int x; }
  S[] a;
  constructor() {
    a.push(S(1));
    S storage s = a[0];
    a.pop();
    assert(s.x == 1); // Ok
    // Following is error
    // assert(a[0].x == 1);
  }
}
```
Fig. 12: Example illustrating a dangling pointer to storage.

Fig. 11: Example illustrating the right-to-left assignment order and the treatment of reference types in storage in tuple assignment.

compiler first evaluates the RHS and LHS tuples (in this order) from left to right and then assignment is performed component-wise from right to left.

*Example 10.* Consider the tuple assignment in function primitiveAssign() in Figure 11. From right to left, s2.x is assigned first with the value of s1.x, which is 1. Afterwards, when s3.x is assigned with s2.x, the already evaluated (old) value 2 is used instead of the new value 1. Finally, s1.x gets the old value of s3.x, i.e., 3. Note, however, that storage expressions on the RHS evaluate to storage pointers. Consider, for example, the function storageAssign() in Figure 11. From right to left, s2 is assigned first, with a pointer to s1, making s2.x become 1. However, as opposed to primitive types, when s3 is assigned next, s2 on the RHS is a storage pointer and thus the new value in the storage of s2 is assigned to s3, making s3.x become 1. Similarly, s1.x also becomes 1 as the new value behind s3 is used.
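The primitive-type case of Example 10 can be replayed with a small sketch (hypothetical Python; the flat string-keyed store is an assumption made for illustration): the RHS is evaluated into temporaries left to right, then assigned component-wise right to left, matching the rule for tuple assignments in Figure 10.

```python
# Hypothetical sketch of tuple assignment: evaluate the RHS into
# temporaries first (left to right), then assign right to left.
def tuple_assign(store, lhs_keys, rhs_keys):
    tmps = [store[r] for r in rhs_keys]             # RHS into temporaries
    for l, t in reversed(list(zip(lhs_keys, tmps))):
        store[l] = t                                # assign right to left
    return store

# primitiveAssign(): (s1.x, s2.x, s3.x) = (s3.x, s1.x, s2.x)
s = tuple_assign({'s1.x': 1, 's2.x': 2, 's3.x': 3},
                 ['s1.x', 's2.x', 's3.x'], ['s3.x', 's1.x', 's2.x'])
print(s)  # the old RHS values are used: s1.x = 3, s2.x = 1, s3.x = 2
```

The storage-pointer case of storageAssign() is not captured here, since there the RHS temporaries hold pointers rather than copied values.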

Array push increases the length and assigns the given expression as the last element. Array pop decreases the length and sets the removed element to its default value. While the removed element can no longer be accessed via indexing into an array (a runtime error occurs), it can still be accessed via local storage pointers (see Figure 12).<sup>13</sup>
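The push/pop rules of Figure 10 can be sketched on the array datatype directly (hypothetical Python; representing the datatype as a dict with a `length` field and an unbounded `arr` map is an assumption made for illustration).

```python
# Hypothetical sketch of the push/pop encoding on the {length, arr} datatype:
# push writes the new element at index length and increments it;
# pop decrements length and resets the removed slot to its default value.
def push(a, v):
    a['arr'][a['length']] = v
    a['length'] += 1

def pop(a, default=0):
    a['length'] -= 1
    a['arr'][a['length']] = default   # removed slot reset to defval

a = {'length': 0, 'arr': {}}          # arr modeled as an unbounded map
push(a, 1)
pop(a)
print(a)  # length is back to 0; slot 0 holds the default value
```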

#### **3.5 Assignments**

Assignments between reference types in Solidity can be either pointer assignments or value assignments, involving deep copying and possibly new allocations in the latter case. We use A(*lhs*, *rhs*) to denote the function that assigns a *rhs* SMT expression to a *lhs* SMT expression based on their original types and data locations. The definition of A(., .) is shown in Figure 13. Value type assignments are simply mapped to an SMT assignment. To make our presentation clearer, we subdivide the other cases into separate functions for array, struct and mapping operands, denoted by AA(., .), AS(., .) and AM(., .), respectively.

*Mappings.* As discussed previously, Solidity prohibits direct assignment of mappings. However, it is possible to declare a storage pointer to a mapping, in which case the RHS expression is packed. It is also possible to assign two storage pointers, which simply assigns pointers. Other cases are a no-op.<sup>14</sup>

*Structs and arrays.* For structs and arrays the semantics of assignment is summarized in Figure 14. However, there are some notable details in various cases that we expand on below.

Assigning anything *to storage* LHS always causes a deep copy. If the RHS is storage, this is simply mapped to a datatype assignment in our encoding (with an additional unpacking if the RHS is a storage pointer).<sup>15</sup> If the RHS is memory, deep copy for structs can be done member-wise by accessing the heap with the RHS pointer and performing the assignment recursively (as members can be reference types themselves). For arrays, we access the datatype corresponding to the array via the heap and do an assignment, which performs a deep copy in SMT. Note, however, that this only works if the base type of the array is a value type. For reference types, memory array elements are pointers and would need to be dereferenced during assignment to storage. As opposed to struct members, the number of array elements is not known at compile time, so loops or quantifiers have to be used (as in traditional software analysis). However, this is a

<sup>13</sup> The current version (0.5.x) of Solidity supports resizing arrays by assigning to the length member. However, this behavior is dangerous and has since been removed in the next version (0.6.0) (see https://solidity.readthedocs.io/en/v0.6.0/060-breaking-changes.html). Therefore, we do not support this in our encoding.

<sup>14</sup> This is a consequence of the fact that keys are not stored in mappings, so the assignment is impossible to perform.

<sup>15</sup> This also causes mappings to be copied, which contradicts the current semantics. However, we chose to keep the deep copy as the assignment of mappings is planned to be disallowed in the future (see https://github.com/ethereum/solidity/issues/7739).

```
A(lhs, rhs) ˙= lhs := rhs       for value type operands
A(lhs, rhs) ˙= AM(lhs, rhs)     for mapping type operands
A(lhs, rhs) ˙= AS(lhs, rhs)     for struct type operands
A(lhs, rhs) ˙= AA(lhs, rhs)     for array type operands

AM(lhs : sp, rhs : s)  ˙= lhs := pack(rhs)
AM(lhs : sp, rhs : sp) ˙= lhs := rhs
AM(lhs, rhs)           ˙= {}    (all other cases)

AS(lhs : s, rhs : s)   ˙= lhs := rhs
AS(lhs : s, rhs : m)   ˙= A(lhs.m_i, structheap_type(rhs)[rhs].m_i)  for each m_i
AS(lhs : s, rhs : sp)  ˙= AS(lhs, unpack(rhs))
AS(lhs : m, rhs : m)   ˙= lhs := rhs
AS(lhs : m, rhs : s)   ˙= lhs := refcnt := refcnt + 1
                          A(structheap_type(lhs)[lhs].m_i, rhs.m_i)  for each m_i
AS(lhs : m, rhs : sp)  ˙= AS(lhs, unpack(rhs))
AS(lhs : sp, rhs : s)  ˙= lhs := pack(rhs)
AS(lhs : sp, rhs : sp) ˙= lhs := rhs

AA(lhs : s, rhs : s)   ˙= lhs := rhs
AA(lhs : s, rhs : m)   ˙= lhs := arrheap_type(rhs)[rhs]
AA(lhs : s, rhs : sp)  ˙= AA(lhs, unpack(rhs))
AA(lhs : m, rhs : m)   ˙= lhs := rhs
AA(lhs : m, rhs : s)   ˙= lhs := refcnt := refcnt + 1
                          arrheap_type(lhs)[lhs] := rhs
AA(lhs : m, rhs : sp)  ˙= AA(lhs, unpack(rhs))
AA(lhs : sp, rhs : s)  ˙= lhs := pack(rhs)
AA(lhs : sp, rhs : sp) ˙= lhs := rhs
```

Fig. 13: Formalization of assignment based on different type categories and data locations for the LHS and RHS. We use s, sp and m after the arguments to denote storage, storage pointer and memory types respectively.

special case, which can be encoded in the decidable array property fragment [13]. Assigning storage (or a storage pointer) *to memory* is also a deep copy, but in the other direction. However, instead of overwriting the existing memory entity, a new one is allocated (recursively for reference-typed elements or members). We model this by incrementing the reference counter, storing it in the LHS, and then accessing the heap for the deep copy using the new pointer.
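The storage-to-memory direction can be sketched in a few lines (hypothetical Python, not solc-verify's code; the dict-based struct heap and the function name are assumptions made for illustration): allocate a fresh reference via the counter, then copy the members into the heap under that reference, so later storage updates cannot leak into the memory copy.

```python
# Hypothetical sketch of assigning a storage struct to a memory LHS:
# lhs := refcnt := refcnt + 1, then member-wise deep copy into the heap.
refcnt = 0
structheap = {}                           # models structheap_S : [int]StructS

def assign_storage_to_memory(storage_struct):
    global refcnt
    refcnt += 1                           # fresh reference for the new entity
    ref = refcnt
    structheap[ref] = dict(storage_struct)  # member-wise deep copy
    return ref

s = {'x': 1}                              # a storage struct value
r = assign_storage_to_memory(s)
s['x'] = 42                               # a later storage update...
assert structheap[r] == {'x': 1}          # ...does not affect the memory copy
```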

#### **3.6 Expressions**

We use E(.) to denote the function that translates a Solidity expression to an SMT expression. As a side effect, declarations and statements might be introduced (denoted by [*decl*] and {*stmt*}, respectively). The definition of E(.) is shown in Figure 15. As discussed in Section 3.4, we assume that side effects from subexpressions are added in the proper order and only once.

Member access is mapped to an SMT member access by mapping the base expression and the member name. There is an extra unpacking step for storage


Fig. 14: Semantics of assignment between array and struct operands based on their data location.

```
E(id) ˙= id
E(expr.id) ˙= E(expr).E(id) if type(expr) = struct S storage
E(expr.id) ˙= unpack(E(expr)).E(id) if type(expr) = struct S storptr
E(expr.id) ˙= structheapS[E(expr)].E(id) if type(expr) = struct S memory
E(expr.id) ˙= E(expr).E(id) if type(expr) = T[] storage
E(expr.id) ˙= unpack(E(expr)).E(id) if type(expr) = T[] storptr
E(expr.id) ˙= arrheapT [E(expr)].E(id) if type(expr) = T[] memory
E(expr[idx]) ˙= E(expr).arr [E(idx)] if type(expr) = T[] storage
E(expr[idx]) ˙= unpack(E(expr)).arr [E(idx)] if type(expr) = T[] storptr
E(expr[idx]) ˙= arrheapT [E(expr)].arr [E(idx)] if type(expr) = T[] memory
E(expr[idx]) ˙= E(expr)[E(idx)] if type(expr) = mapping(K=>V ) storage
E(expr[idx]) ˙= unpack(E(expr))[E(idx)] if type(expr) = mapping(K=>V ) storptr
E(cond ? exprT : exprF ) ˙= [varT : T (type(cond ? exprT : exprF ))] (fresh symbol)
                             [varF : T (type(cond ? exprT : exprF ))] (fresh symbol)
                             {A(varT , E(exprT ))}
                             {A(varF , E(exprF ))}
                             ite(E(cond), varT , varF )
E(new T[](expr)) ˙= [ref : int] (fresh symbol)
                     {ref := refcnt := refcnt + 1}
                     {arrheapT [ref].length := E(expr)}
                     {arrheapT [ref].arr[i] := defval(T)} for 0 ≤ i ≤ E(expr)
                     ref
E(S( ..., expri,... )) ˙= [ref : int] (fresh symbol)
                       {ref := refcnt := refcnt + 1}
                       {structheapS[ref].mi := E(expri)} for each member mi
                       ref
```
Fig. 15: Formalization of expressions. We denote struct S members as m_i with types S_i.

pointers and a heap access for memory. Note that the only valid member for arrays is length. Index access is mapped to an SMT array read by mapping the base expression and the index, and adding an extra member access for arrays to get the inner array *arr* of elements from the datatype. Furthermore, similarly to member accesses, an extra unpacking step is needed for storage pointers and a heap access for memory.

Conditionals in Solidity can be mapped to an SMT conditional in general. However, data locations can be different for the true and false branches, causing possible side effects. Therefore, we first introduce fresh variables for the true and false branch with the common type (of the whole conditional), then make assignments using <sup>A</sup>(., .) and finally use the new variables in the conditional. The documentation [30] does not specify the common type, but the compiler returns memory if any of the branches is memory, and storage pointer otherwise.

Allocating a new array in memory increments the reference counter, sets the length and sets the default values for each element (recursively). Note that in general the length might not be a compile-time constant, in which case setting default values could be encoded with the array property fragment (similarly to deep copy in assignments) [13]. Allocating a new memory struct also increments the reference counter and sets each member by translating the provided arguments.

# **4 Evaluation**

The formalization described in this paper serves as the basis of our Solidity verification tool solc-verify [20].<sup>16</sup> In this section we provide an evaluation of the presented formalization and our implementation by validating it on a set of relevant test cases. For illustrative purposes we also compare our tool with other available Solidity analysis tools.<sup>17</sup>

"Real world" contracts currently deployed on Ethereum (e.g., contract available on Etherscan) have limited value for evaluating memory model semantics. Many such contracts use old compiler versions with constructs that are not supported anymore, and do not use newer features. There are also many toy and trivial contracts that are deployed but not used, and popular contracts (e.g. tokens) are over-represented with many duplicates. Furthermore, the inconsistent usage of assert and require [20] makes evaluation hard. Evaluating the memory semantics requires contracts that exercise diverse features of the memory model. There are larger dApps that do use more complex features (e.g., Augur or ENS), but these contracts also depend on many other features (e.g. inheritance, modifiers, loops) that would skew the results.

Therefore we have manually developed a set of tests that try to capture the interesting behaviors and corner cases of the Solidity memory semantics. The tests are targeted examples that do not use irrelevant features. The set is structured so that every target test behavior is represented with a test case that sets up the state, exercises a specific feature and checks the correctness of the behavior with assertions. This way a test should only pass if the tool provides a correct verification result by modeling the targeted feature precisely.

<sup>16</sup> solc-verify is open source, available at https://github.com/SRI-CSL/solidity. Besides certain low-level constructs (such as inline assembly) solc-verify supports a majority of Solidity features that we omitted from the presentation, including inheritance, function modifiers, for/while loops and if-then-else.

<sup>17</sup> All tests, with a Truffle test harness, a docker container with all the tools, and all individual results are available at https://github.com/dddejan/solidity-semantics-tests.

The correctness of the tests themselves is determined by running them through the EVM and checking that no assertion fails. Test cases are expanded to use all reference types and combinations of reference types. This includes structures, mappings, and dynamic and fixed-size arrays, both single- and multi-dimensional.

The tests are organized into the following classes. Tests in the assignment class check whether the assign statement is properly modeled. This includes assignments in the same data location, but also assignments across data locations that need deep copying, and assignments and re-assignments of memory and storage pointers. The delete class of tests checks whether the delete statement is properly modeled. Tests in the init class check whether variable and data initialization is properly modeled. For variables in storage, we check if they are properly initialized to default values in the contract constructor. Similarly, we check whether memory variables are properly initialized to provided values, or default values when no initializer is provided. The storage class of tests checks whether storage itself is properly modeled for various reference types, including for example non-aliasing. Tests in the storageptr class check whether storage pointers are modeled properly. This includes checking if the model properly treats storage pointers to various reference types, including nested types. In addition, the tests check that the storage pointers can be properly passed to functions and ensure non-aliasing for distinct parts of storage.

For illustrative purposes we include a comparison with the following available Solidity analysis tools: mythril v0.21.17 [29], verisol v0.1.1-alpha [24], and smt-checker v0.5.12 [1]. mythril is a Solidity symbolic execution tool that runs analysis at the level of the EVM bytecode. verisol is similar to solc-verify in that it uses Boogie to model the Solidity contracts, but takes the traditional approach to modeling memory and storage with pointers and quantifiers. smt-checker is an SMT-based analysis module built into the Solidity compiler itself. There are other tools that can be found in the literature, but they are either basic prototypes that cannot handle realistic features we are considering, or are not available for direct comparison.

We ran the experiments on a machine with an Intel Xeon E5-4627 v2 @ 3.30GHz CPU, enforcing a 60s timeout and a memory limit of 64GB. Results are shown in Table 1. As expected, mythril has the most consistent results on our test set. This is because mythril models contract semantics at the EVM level and does not need to model complex Solidity semantics. Nevertheless, the results also indicate that the performance penalty for this precision is significant (8 timeouts). verisol, as the closest to our approach, still does not support many features and produces a significant number of false reports for features that it does support. Many false reports arise because its model of storage is based on pointers and tries to ensure storage consistency with the use of quantifiers. smt-checker does not yet support the majority of the Solidity features that our tests target.

Based on the results, solc-verify performs well on our test set, matching the precision of mythril at very low computational cost. The few false alarms we have are due either to Solidity features that we chose not to implement (e.g., proper treatment of mapping assignments), or to parts of the semantics that we


Table 1: Results of evaluating mythril, verisol, smt-checker, and solc-verify on our test suite.

only implemented partially (such as deep copy of arrays with reference types and recursive initialization of memory objects). There are no technical difficulties in supporting them, and they are planned for the future.

# **5 Related Work**

There is a strong push in the Ethereum community to apply formal methods to smart contract verification. This includes many attempts to formalize the semantics of smart contracts, both at the level of EVM and Solidity.

*EVM-level semantics.* Bhargavan et al. [11] decompile a fragment of EVM to F\*, modeling EVM as a stack-based machine with word and byte arrays for storage and memory. Grishchenko et al. [19] extend this work by providing a small-step semantics for EVM. KEVM [21] provides an executable formal semantics of EVM in the K framework. Hirai [22] formalizes EVM in Lem, a language used by some interactive theorem provers. Amani et al. [2] extend this work by defining a program logic to reason about EVM bytecode.

*Solidity-level semantics.* Jiao et al. [23] formalize the operational semantics of Solidity in the K framework. Their formalization focuses on the details of bit-precise sizes of types, alignment and padding in storage. They encode storage slots, arrays and mappings with the full encoding of hashing. However, the formalization does not describe assignments (e.g., deep copy) apart from simple cases. Furthermore, user-defined structs are also not mentioned. In contrast, our semantics is high-level and abstracts away some details (e.g., hashes, alignments) to enable efficient verification. Additionally, we provide proper modeling of the different cases for assignments between storage and memory. Bartoletti et al. [10] propose TinySol, a minimal core calculus for a subset of Solidity, required to model basic features such as asset transfer and reentrancy. Contract data is modeled as a key-value store, with no differences between storage and memory, or between value and reference types. Crafa et al. [15] introduce Featherweight Solidity, a calculus formalizing core features of the language, with a focus on primitive types. Data locations and reference types are not discussed; only mappings are mentioned briefly. The main focus is on the type system and type checking. They propose an improved type system that can statically detect unsafe casts and callbacks. The closest to our work is that of Zakrzewski [33], a Coq formalization focusing on functions, modifiers, and the memory model. The memory model is treated similarly: storage is a mapping from names to storage objects (values), memory is a mapping from references to memory objects (containing references recursively), and storage pointers define a path in storage. Their formalization is also high-level, without considering alignment, padding or hashing. The formalization is provided as a big-step functional semantics in Coq. While the paper presents some example rules, the formalization does not cover all cases, for example, the details of assignments (e.g., memory to storage), push/pop for arrays, the treatment of memory aliasing, and new expressions. Furthermore, our approach focuses on SMT and modular verification, which enables automated reasoning.

# **6 Conclusion**

We presented a high-level SMT-based formalization of the Solidity memory model semantics. Our formalization covers all aspects of the language related to managing both the persistent contract storage and the transient local memory. The novel encoding of storage pointers as arrays allows us to precisely model non-aliasing and deep-copy assignments between storage entities without the need for quantifiers. The memory model forms the basis of our Solidity-level modular verification tool solc-verify. We developed a suite of test cases exercising all aspects of memory management with different combinations of reference types. Results indicate that our memory model outperforms existing Solidity-level tools in terms of soundness and precision, and is on par with low-level EVM-based implementations, while having a significantly lower computational cost for discharging verification conditions.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Exploring Type-Level Bisimilarity towards More Expressive Multiparty Session Types

Sung-Shik Jongmans<sup>1,2,3</sup> and Nobuko Yoshida<sup>3</sup>

<sup>1</sup> Department of Computer Science, Open University, Heerlen, the Netherlands
<sup>2</sup> CWI, Amsterdam, the Netherlands
<sup>3</sup> Department of Computing, Imperial College London, UK

Abstract. A key open problem with multiparty session types (MPST) concerns their expressiveness: current MPST have inflexible choice, no existential quantification over participants, and limited parallel composition. This precludes many real protocols from being represented as MPST. To overcome these bottlenecks of MPST, we explore a new technique using weak bisimilarity between global types and endpoint types, which guarantees deadlock-freedom and absence of protocol violations. Based on a process algebraic framework, we present well-formedness conditions for global types that guarantee weak bisimilarity between a global type and its endpoint types, and we prove that checking these conditions is decidable. Our main practical result, obtained through benchmarks, is that our well-formedness conditions can be checked orders of magnitude faster than directly checking weak bisimilarity using a state-of-the-art model checker.

# 1 Introduction

Background. To take advantage of modern parallel and distributed computing platforms, message-passing concurrency is becoming increasingly important. Modern programming languages, however, offer insufficiently effective linguistic support to guide programmers towards safe usage of message-passing abstractions (e.g., to prevent deadlocks or protocol violations).

Multiparty session types (MPST) [34] constitute a static, correct-by-construction approach to simplify concurrent programming, by offering a type-based framework to specify message-passing protocols and ensure deadlock-freedom and protocol conformance. The idea is to use behavioural types [1,37] to enforce protocols (i.e., patterns of admissible communications) between roles (e.g., threads, processes, services) to avoid concurrency bugs. The framework is illustrated in Fig. 1: first, a global type $G$ (protocol specification; written by the programmer) is projected onto every role; then, every resulting endpoint type (local type) $L_i$ (role specification) is type-checked against the corresponding process $P_i$ (role implementation). If every process is well-typed against its local type, then their parallel composition is guaranteed to be free of deadlocks and protocol violations relative to the global type. Notably, common concurrency bugs such as sends without receives, receives without sends, and type mismatches (actual type sent vs. expected type received) are ruled out statically. The MPST framework is language-agnostic: in recent years, practical implementations of MPST have been developed for several programming languages, including Erlang, F#, Go, Java, and Scala [18,35,36,45,46,50].

Three open problems. Many practically relevant protocols cannot be specified as global types; this limits MPST's applicability to real-world concurrent programs. Specifically, while the original work [33] has been extended with several advanced features (e.g., time [7,44], security [11,12,13,17], and parametrisation [18,25,47]), core features still have significant restrictions: inflexible choice, no existential quantification over participants, and limited parallel composition.

1. Inflexible choice: In the original work [33], if there is a choice between multiple branches, the sender in the first communication of each branch must be the same, the receiver must be the same, and the message types must be different (i.e., no non-determinism). Moreover, each role not involved in the first communication of each branch must have the same behaviour in each continuation. For instance, the following global type specifies a protocol where Client c repeatedly requests an arithmetic Server s to compute the sum or product of two numbers:

$$\mu X.\left[\left[\mathsf{c}\rightsquigarrow\mathsf{s}:\mathsf{Add}\cdot\mathsf{s}\rightsquigarrow\mathsf{c}:\mathsf{Sum}\cdot X\right]+\left[\mathsf{c}\rightsquigarrow\mathsf{s}:\mathsf{Mul}\cdot\mathsf{s}\rightsquigarrow\mathsf{c}:\mathsf{Prod}\cdot X\right]\right]$$

Here, $\mathsf{c}\rightsquigarrow\mathsf{s}:\mathsf{Add}$ specifies a communication of an Add-message (with two numbers as payload) from the Client to the Server, while · and + specify sequencing and branching, and square brackets indicate operator precedence. This is a "good" global type that satisfies the conditions. In contrast, the following "bad" global type specifies a protocol where Client c repeatedly requests addition and multiplication Servers $\mathsf{s}_1$ and $\mathsf{s}_2$ via Router r (payload types omitted; $r_1\rightsquigarrow r_2\rightsquigarrow r_3:t$ abbreviates $r_1\rightsquigarrow r_2:t\cdot r_2\rightsquigarrow r_3:t$):

$$\mu X.\left[\left[\mathsf{c}\rightsquigarrow\mathsf{r}\rightsquigarrow\mathsf{s}_1:\mathsf{Add}\cdot\mathsf{s}_1\rightsquigarrow\mathsf{c}:\mathsf{Sum}\cdot X\right]+\left[\mathsf{c}\rightsquigarrow\mathsf{r}\rightsquigarrow\mathsf{s}_2:\mathsf{Mul}\cdot\mathsf{s}_2\rightsquigarrow\mathsf{c}:\mathsf{Prod}\cdot X\right]\right]$$

Several improvements to the original work have been proposed: Honda et al. allow each role r not involved in a choice to have different behaviour in different branches [15], so long as r is made aware of which branch is chosen in a timely and unambiguous fashion (e.g., the previous global type is still forbidden), while Lange et al., Castagna et al., and Hu & Yoshida allow choices between different receivers [16,23,36,40]. For instance, the following global type (the Client directly requests the specialised server) is allowed:

$$\mu X.\left[\left[\mathsf{c}\rightsquigarrow\mathsf{s}_1:\mathsf{Add}\cdot\mathsf{s}_1\rightsquigarrow\mathsf{c}:\mathsf{Sum}\cdot X\right]+\left[\mathsf{c}\rightsquigarrow\mathsf{s}_2:\mathsf{Mul}\cdot\mathsf{s}_2\rightsquigarrow\mathsf{c}:\mathsf{Prod}\cdot X\right]\right]$$

But the following global type (two Clients $\mathsf{c}_1$ and $\mathsf{c}_2$ use Server s) is forbidden:

$$\mu X.\left[\begin{array}{l}\left[\mathsf{c}_1\rightsquigarrow\mathsf{s}:\mathsf{Add}\cdot\mathsf{s}\rightsquigarrow\mathsf{c}_1:\mathsf{Sum}\cdot X\right]+\left[\mathsf{c}_1\rightsquigarrow\mathsf{s}:\mathsf{Mul}\cdot\mathsf{s}\rightsquigarrow\mathsf{c}_1:\mathsf{Prod}\cdot X\right]+{}\\\left[\mathsf{c}_2\rightsquigarrow\mathsf{s}:\mathsf{Add}\cdot\mathsf{s}\rightsquigarrow\mathsf{c}_2:\mathsf{Sum}\cdot X\right]+\left[\mathsf{c}_2\rightsquigarrow\mathsf{s}:\mathsf{Mul}\cdot\mathsf{s}\rightsquigarrow\mathsf{c}_2:\mathsf{Prod}\cdot X\right]\end{array}\right]$$

None of the existing works allow the above nondeterministic choices between different senders. We call this the +-problem: how to add a choice constructor, denoted by +, to specify choices between disjoint sender-receiver-label triples?

2. No existential quantification: Related to the +-problem is the ∃-problem: how to add an existential role quantifier, denoted by ∃, to specify the execution of ∃'s body for some role in ∃'s domain? For instance, instead of writing a separate global type for 2 Clients, 3 Clients, etc., existential role quantification allows us to write only one global type for any $n > 1$ Clients:

$$\mu X.\,\exists r\in\{\mathsf{c}_i \mid 1\le i\le n\}.\left[\left[r\rightsquigarrow\mathsf{s}:\mathsf{Add}\cdot\mathsf{s}\rightsquigarrow r:\mathsf{Sum}\cdot X\right]+\left[r\rightsquigarrow\mathsf{s}:\mathsf{Mul}\cdot\mathsf{s}\rightsquigarrow r:\mathsf{Prod}\cdot X\right]\right]$$

The ∃-problem was first formulated by Deniélou & Yoshida [22] as the dual of the ∀-problem (i.e., specify the execution of ∀'s body for each role in ∀'s domain): the ∀-problem was solved in the same paper, but the ∃-problem "raises many semantic issues" [22] and has remained open for almost a decade.
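Operationally, ∃ stands for an n-ary alternative composition of the body under role substitution. The sketch below is our own illustration of this intended expansion (the tuple representation and function names are ours, not the paper's):

```python
# Hypothetical sketch: expanding ∃r∈{r1,...,rn}. M into M[r1/r] + ... + M[rn/r].
# Global types are nested tuples: ("comm", sender, receiver, label),
# ("+", G1, G2), (".", G1, G2), ("var", X), ("rec", X, G).

def substitute(G, r, ri):
    """Replace every occurrence of role r by ri in G."""
    tag = G[0]
    if tag == "comm":
        _, s, d, lab = G
        return ("comm", ri if s == r else s, ri if d == r else d, lab)
    if tag in ("+", "."):
        return (tag, substitute(G[1], r, ri), substitute(G[2], r, ri))
    if tag == "rec":
        return ("rec", G[1], substitute(G[2], r, ri))
    return G  # "var" and other leaves contain no roles

def expand_exists(r, domain, body):
    """∃r∈domain. body  ⇝  body[r1/r] + ... + body[rn/r]."""
    alts = [substitute(body, r, ri) for ri in domain]
    out = alts[0]
    for alt in alts[1:]:
        out = ("+", out, alt)
    return out
```

For two Clients, `expand_exists("r", ["c1", "c2"], body)` yields the binary choice between the c1- and c2-instances of the body, matching the global type above for n = 2.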

3. Limited parallel composition: The third open problem related to choice is the ∥-problem: how to add a constructor, denoted by ∥, that allows infinite branching (i.e., non-finite control) through unbounded parallel interleaving? While extensions of the original work with parallel composition exist (e.g., [16,22,23,43]), none of these works supports unbounded interleaving. For instance, the following global type allows an unbounded number of requests to be served by the Server in parallel (instead of sequentialising them):

$$\mu X.\,\exists r\in\{\mathsf{c}_i \mid 1\le i\le n\}.\left[\left[r\rightsquigarrow\mathsf{s}:\mathsf{Add}\cdot\left[\mathsf{s}\rightsquigarrow r:\mathsf{Sum}\parallel X\right]\right]+\left[r\rightsquigarrow\mathsf{s}:\mathsf{Mul}\cdot\left[\mathsf{s}\rightsquigarrow r:\mathsf{Prod}\parallel X\right]\right]\right]$$

Contributions. We overcome these three bottlenecks of MPST with an approach based on three key novelties: first, we have a new definition of projection that keeps more information in the local types than existing definitions; second, we exploit this extra information to formulate our well-formedness conditions; third, we use an unexplored proof method for MPST, namely to prove the operational equivalence between a global type and its projections modulo weak bisimilarity. This makes the proofs cleaner and ultimately allows for more flexibility (e.g., our approach can be modularly combined with traditional session type checking, but potentially also with other verification methods, such as model checking or conformance testing). To summarise the highlights:


– To our knowledge, we are the first to use (weak) bisimilarity to prove the correctness of a projection operator from global to local types. By doing so, we decouple (a) the act of reasoning about projection and (b) the act of establishing compliance between local types and process implementations; until our work, these two concerns have always been conflated.

Fig. 2: Example executions of the Key-Value Store protocol

– Our main practical results are: (1) representative protocols typable in our approach; and (2) the well-formedness conditions for these protocols can be checked orders of magnitude faster than directly checking weak bisimilarity using mCRL2 [10,20,29], a state-of-the-art model checker.

In Sect. 2, we present an overview of our contribution through a representative example protocol that is not supported by previous work. In Sect. 3, we present the details of our theoretical contribution. In Sect. 4, we present the details of our practical contribution (implementation and evaluation). In Sect. 5, we discuss related work. We conclude and discuss future work in Sect. 6.

Detailed formal definitions and proofs of all lemmas and theorems can be found in our supplement [38].

# 2 Overview of our Approach

Scenario. To highlight our solutions to the +-problem, ∃-problem, and ∥-problem, we consider a Key-Value Store protocol, similar to those used in modern NoSQL databases [21,27]. Specifically, our Key-Value Store protocol is inspired by the transaction mechanism of the popular Redis database [48,49]. This protocol is not supported by any of the existing MPST works.

The Key-Value Store protocol consists of n Clients that require access to the store, represented by role names <sup>c</sup><sup>1</sup>, ..., <sup>c</sup>n, and one Server that provides access to the store, represented by role name s. The store has keys of type Str (strings) and values of type Nat (numbers). Fig. 2 shows valid and invalid example executions of the protocol (n=2) as message sequence charts; it works as follows.

First, a Lock-message is communicated from some Client <sup>c</sup>i (1≤i≤n) to Server <sup>s</sup> (Fig. 2a, arrows 1, 5); this grants <sup>c</sup>i exclusive access to the store. Then, a sequence of messages to write and/or read values is communicated:


The sequence ends with the communication of an Unlock-message from <sup>c</sup>i to <sup>s</sup> (arrow 12). The protocol is then repeated for some Client <sup>c</sup>j (1≤j≤n); possibly, but not necessarily, i=j. In this way, the Server atomically processes accesses to the store between Lock/Unlock-messages.

Global and local types. The corresponding global type and local types, inferred via projection (for some n), are as follows:

$$\begin{aligned}
G = {} & \mu X.\,\exists r\in\{\mathsf{c}_i \mid 1\le i\le n\}.\, r\rightsquigarrow\mathsf{s}:\mathsf{Lock}\cdot{}\\
& \mu Y.\left[\left[\mu Z.\left[\left[r\rightsquigarrow\mathsf{s}:\mathsf{Get}(\mathsf{Str})\cdot\left[\mathsf{s}\rightsquigarrow r:\mathsf{Value}(\mathsf{Str},\mathsf{Nat})\parallel Z\right]\right]+r\rightsquigarrow\mathsf{s}:\mathsf{Barrier}\right]\cdot Y\right]\right.\\
&\qquad\; \left.{}+\left[r\rightsquigarrow\mathsf{s}:\mathsf{Set}(\mathsf{Str},\mathsf{Nat})\cdot Y\right]+\left[r\rightsquigarrow\mathsf{s}:\mathsf{Unlock}\cdot X\right]\right]
\end{aligned}$$

$$\begin{aligned}
L_{\mathsf{c}_i} = {} & \mu X.\,\mathsf{c}_i\mathsf{s}\,!\,\mathsf{Lock}\cdot{}\\
& \mu Y.\left[\left[\mu Z.\left[\left[\mathsf{c}_i\mathsf{s}\,!\,\mathsf{Get}(\mathsf{Str})\cdot\left[\mathsf{s}\mathsf{c}_i\,?\,\mathsf{Value}(\mathsf{Str},\mathsf{Nat})\parallel Z\right]\right]+\mathsf{c}_i\mathsf{s}\,!\,\mathsf{Barrier}\right]\cdot Y\right]\right.\\
&\qquad\; \left.{}+\left[\mathsf{c}_i\mathsf{s}\,!\,\mathsf{Set}(\mathsf{Str},\mathsf{Nat})\cdot Y\right]+\left[\mathsf{c}_i\mathsf{s}\,!\,\mathsf{Unlock}\cdot X\right]\right]
\end{aligned}$$

$$\begin{aligned}
L_{\mathsf{s}} = {} & \mu X.\,\exists r\in\{\mathsf{c}_i \mid 1\le i\le n\}.\, r\mathsf{s}\,?\,\mathsf{Lock}\cdot{}\\
& \mu Y.\left[\left[\mu Z.\left[\left[r\mathsf{s}\,?\,\mathsf{Get}(\mathsf{Str})\cdot\left[\mathsf{s}r\,!\,\mathsf{Value}(\mathsf{Str},\mathsf{Nat})\parallel Z\right]\right]+r\mathsf{s}\,?\,\mathsf{Barrier}\right]\cdot Y\right]\right.\\
&\qquad\; \left.{}+\left[r\mathsf{s}\,?\,\mathsf{Set}(\mathsf{Str},\mathsf{Nat})\cdot Y\right]+\left[r\mathsf{s}\,?\,\mathsf{Unlock}\cdot X\right]\right]
\end{aligned}$$

Global type $r_1\rightsquigarrow r_2:\ell(t)$ specifies the communication of a message labelled $\ell$ with a payload typed $t$ from sender $r_1$ to receiver $r_2$; global type $G_1\cdot G_2$ specifies the sequential composition of global types $G_1$ and $G_2$; global type $G_1+G_2$ specifies the alternative composition (choice) of global types $G_1$ and $G_2$; global type $\exists r\in\{r_1,\ldots,r_n\}.\,G$ specifies the existential role quantification over domain $\{r_1,\ldots,r_n\}$ (i.e., the alternative composition of $G[r_1/r]$ and ... and $G[r_n/r]$, where $G[r_i/r]$ denotes the substitution of $r_i$ for every $r$ in $G$); global type $G_1\parallel G_2$ specifies the interleaving composition of $G_1$ and $G_2$ (free merge [4]); global type $\mu X.\,G$ specifies recursion (i.e., $X$ is bound to $\mu X.\,G$ in $G$).

Local type $r_1r_2\,!\,\ell(t)$ specifies the send of an $\ell(t)$-message through the channel from $r_1$ to $r_2$; dually, local type $r_1r_2\,?\,\ell(t)$ specifies a receive. Because every Client participates in only one branch of the quantification, their local types do not contain ∃ under the recursion. In contrast, because the Server participates in all branches, $L_{\mathsf{s}}$ does contain ∃ under the recursion.

By Thm. 3, $G$ and the parallel composition of $L_{\mathsf{c}_1},\ldots,L_{\mathsf{c}_n},L_{\mathsf{s}}$ are operationally equivalent (weakly bisimilar), which in turn implies deadlock-freedom and absence of protocol violations. Note also that our global type for the Key-Value Store protocol indeed relies on solutions to the +-problem (choice between multiple clients that send a Lock-message), the ∃-problem (existential quantification over clients), and the ∥-problem (unbounded interleaving to support asynchronous responses to a statically unknown number of requests).

# 3 An MPST Theory with +, ∃, and ∥

#### 3.1 Types as Process Algebraic Terms

We define our languages of global and local types as algebras over sets of (global) communications and (local) sends/receives. This subsection presents preliminaries on the generic algebraic framework we use, based on the existing algebras PA [3] and TCP+REC [2]; the next subsection presents our specific instantiations for global and local types.

Let $A$ denote a set of actions, ranged over by $\alpha$, and let $\{X_1, X_2, \ldots, Y, \ldots\}$ denote a set of recursion variables. Then, let $\mathrm{Term}(A)$ denote the set of (algebraic) terms, ranged over by $T$, generated by the following grammar:

$$T ::= \mathbb{1} \mid \alpha \mid T_1+T_2 \mid T_1\cdot T_2 \mid T_1\parallel T_2 \mid X \mid \langle X_k \mid \{X_i\mapsto T_i\}_{i\in I}\rangle \quad (k\in I)$$

Term $\mathbb{1}$ specifies a skip; the grey background indicates it should not be explicitly written by programmers (it is used only implicitly in the operational semantics). Term $\alpha$ specifies an atomic action from $A$. Terms $T_1+T_2$, $T_1\cdot T_2$, and $T_1\parallel T_2$ specify the alternative composition, the sequential composition, and the interleaving composition (free merge [4]; a form of parallel composition without interaction between the operands) of $T_1$ and $T_2$. Terms $X$ and $\langle X_k \mid \{X_i\mapsto T_i\}_{i\in I}\rangle$ specify recursion, where $\{X_i\mapsto T_i\}_{i\in I}$ is a recursive specification that maps recursion variables to terms, $X_k$ is the initial call (for $T_k$), and every $X_j$ that occurs in $T_k$ is a subsequent recursive call (for $T_j$); we write $\mu X.\,T$ instead of $\langle X \mid \{X\mapsto T\}\rangle$.

Let $\mathbb{X}\rightharpoonup\mathrm{Term}(A)$ denote the set of all recursive specifications (i.e., every recursive specification is a partial function), ranged over by $E, F$, and let $\mathsf{sub}(E,T)$ denote the simultaneous substitution of term $E(X)$ for each recursion variable $X$ in $T$. Fig. 3 defines the operational semantics of terms. It consists of two components: relation $\longrightarrow$ defines reduction of terms, while relation $\downarrow$ defines successful termination of terms. In words, term $T_1+T_2$ is reduced by reducing either $T_1$ or $T_2$; term $T_1\cdot T_2$ is reduced by reducing first $T_1$ and then $T_2$; term

$$\frac{}{\alpha\xrightarrow{\;\alpha\;}\mathbb{1}}\qquad
\frac{T_1\xrightarrow{\;\alpha\;}T_1'}{T_1\cdot T_2\xrightarrow{\;\alpha\;}T_1'\cdot T_2}\qquad
\frac{T_1\downarrow\quad T_2\xrightarrow{\;\alpha\;}T_2'}{T_1\cdot T_2\xrightarrow{\;\alpha\;}T_2'}\qquad
\frac{T_1\xrightarrow{\;\alpha\;}T_1'}{T_1+T_2\xrightarrow{\;\alpha\;}T_1'}\qquad
\frac{T_2\xrightarrow{\;\alpha\;}T_2'}{T_1+T_2\xrightarrow{\;\alpha\;}T_2'}$$

$$\frac{T_1\xrightarrow{\;\alpha\;}T_1'}{T_1\parallel T_2\xrightarrow{\;\alpha\;}T_1'\parallel T_2}\qquad
\frac{T_2\xrightarrow{\;\alpha\;}T_2'}{T_1\parallel T_2\xrightarrow{\;\alpha\;}T_1\parallel T_2'}\qquad
\frac{\mathsf{sub}(E, E(X))\xrightarrow{\;\alpha\;}T'}{\langle X\mid E\rangle\xrightarrow{\;\alpha\;}T'}$$

(a) Reduction

$$\frac{}{\mathbb{1}\downarrow}\qquad
\frac{T_1\downarrow\quad T_2\downarrow}{T_1\cdot T_2\downarrow}\qquad
\frac{T_1\downarrow}{T_1+T_2\downarrow}\qquad
\frac{T_2\downarrow}{T_1+T_2\downarrow}\qquad
\frac{T_1\downarrow\quad T_2\downarrow}{T_1\parallel T_2\downarrow}\qquad
\frac{\mathsf{sub}(E, E(X))\downarrow}{\langle X\mid E\rangle\downarrow}$$

(b) Termination

Fig. 3: Operational semantics of terms

$T_1\parallel T_2$ is reduced by reducing $T_1$ and $T_2$ interleaved; and term $\langle X\mid E\rangle$ is reduced by reducing the version of $E(X)$ in which recursion variables have been substituted.

A term is $\mathbb{1}$-free if it has no occurrences of $\mathbb{1}$. A term is closed if it has no occurrences of free recursion variables. A term $T$ is deterministic if (1) for every action $\alpha$, there exists at most one term $T'$ such that $T$ can reduce to $T'$ by performing $\alpha$, and (2) every term to which $T$ can reduce is deterministic as well. Henceforth, we consider only $\mathbb{1}$-free, closed, and deterministic terms.
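The rules of Fig. 3 can be transcribed almost literally. The following is a minimal sketch of our own (not the paper's tool), assuming closed, guarded terms; terms are nested tuples and recursive specifications are dicts:

```python
# Terms as nested tuples: ("1",), ("act", a), ("+", T1, T2), (".", T1, T2),
# ("par", T1, T2), ("var", X), ("rec", X, E) with E a dict of specifications.

def sub(E, T):
    """Substitute <X|E> for each recursion variable X of T (simplified:
    does not descend into nested recursive specifications)."""
    tag = T[0]
    if tag == "var":
        return ("rec", T[1], E)
    if tag in ("+", ".", "par"):
        return (tag, sub(E, T[1]), sub(E, T[2]))
    return T  # skip and actions contain no recursion variables

def terminated(T):
    """The termination predicate ↓ of Fig. 3b."""
    tag = T[0]
    if tag == "1":
        return True
    if tag == "+":
        return terminated(T[1]) or terminated(T[2])
    if tag in (".", "par"):
        return terminated(T[1]) and terminated(T[2])
    if tag == "rec":
        return terminated(sub(T[2], T[2][T[1]]))
    return False  # actions and bare variables do not terminate

def reductions(T):
    """All pairs (alpha, T') with T --alpha--> T' (Fig. 3a)."""
    tag = T[0]
    if tag == "act":
        return [(T[1], ("1",))]
    if tag == "+":
        return reductions(T[1]) + reductions(T[2])
    if tag == ".":
        out = [(a, (".", T1p, T[2])) for a, T1p in reductions(T[1])]
        if terminated(T[1]):
            out += reductions(T[2])
        return out
    if tag == "par":
        return ([(a, ("par", T1p, T[2])) for a, T1p in reductions(T[1])] +
                [(a, ("par", T[1], T2p)) for a, T2p in reductions(T[2])])
    if tag == "rec":
        return reductions(sub(T[2], T[2][T[1]]))
    return []
```

For example, $\mu X.\,(a\cdot X + b)$ can initially perform either $a$ (looping) or $b$ (terminating), and is itself not terminated.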

We note that $(A, +, \cdot, \parallel)$ is the signature of PA [3], while $(\mathbb{1}, A, +, \cdot, \parallel, \langle\cdot\mid\cdot\rangle)$ is a subsignature of TCP+REC [2]. As the operational semantics of terms in $\mathrm{Term}(A)$ coincides with the operational semantics of terms in (the corresponding subalgebra of) TCP+REC, our languages of global and local types inherit TCP+REC's sound and complete axiomatisation, used in our tool (Sect. 4.1).

#### 3.2 Global Types and Local Types

Actions. We instantiate Term(A) to obtain languages of global and local types by defining action sets for (global) communications and for (local) sends/receives.

Let $\mathbb{R} = \{\mathsf{a}, \mathsf{b}, \ldots\}$ denote the set of all role names, ranged over by $r$. Let $\mathbb{L}\mathrm{ab} = \{\mathsf{Lock}, \mathsf{Get}, \ldots\}$ denote the set of all labels, ranged over by $\ell$. Let $\mathbb{T} = \{\mathsf{Nat}, \mathsf{Bool}, \ldots\}$ denote the set of all payload types, ranged over by $t$. Let $\mathbb{U} = \mathbb{L}\mathrm{ab}\times\mathbb{T}$ denote the set of all message types, ranged over by $U$; we write $\ell(t)$ instead of $\langle\ell, t\rangle$. Finally, let $A_{\mathrm{g}}$ and $A_{\mathrm{l}}$ denote the sets of all (global) communications and (local) sends/receives, ranged over by $g$ and $l$, generated by:

$$\begin{aligned}
g &::= r_1\rightsquigarrow r_2:U &&(\text{if: } r_1\neq r_2)\\
l &::= r_1r_2\,!\,U \;\mid\; r_1r_2\,?\,U \;\mid\; \varepsilon^{r}_{r_1r_2} &&(\text{if: } r_1\neq r_2 \text{ and } r_1\neq r\neq r_2)
\end{aligned}$$

Global action $r_1\rightsquigarrow r_2:U$ specifies the communication of a $U$-message from sender $r_1$ to receiver $r_2$; we note that communications are synchronous, as actions in the underlying algebra are indivisible [2,3], but asynchrony can be encoded (Exmp. 1, below). Local action $r_1r_2\,!\,U$ specifies the send of a $U$-message through channel $r_1r_2$ (from $r_1$ to $r_2$). Dually, local action $r_1r_2\,?\,U$ specifies a receive.

$$\mathsf{split}(r,\, r_1\rightsquigarrow r_2:U) = \begin{cases} (\mathbb{1},\; r_1\rightsquigarrow r_2:U) & \text{if: } r\in\{r_1,r_2\}\\ (r_1\rightsquigarrow r_2:U,\; \mathbb{1}) & \text{otherwise} \end{cases}$$

$$\mathsf{split}(r,\, G_1\cdot G_2) = \begin{cases} (G_1',\; G_1''\cdot G_2) & \text{if: } \mathsf{split}(r, G_1)=(G_1', G_1'') \text{ and } G_1''\neq\mathbb{1}\\ (G_1\cdot G_2',\; G_2'') & \text{if: } \mathsf{split}(r, G_1)=(G_1', G_1'') \text{ and } G_1''=\mathbb{1}\text{ and}\\ & \phantom{\text{if: }}\mathsf{split}(r, G_2)=(G_2', G_2'') \text{ and } G_2''\neq\mathbb{1}\\ (G_1\cdot G_2,\; \mathbb{1}) & \text{otherwise} \end{cases}$$

$$\frac{M\rightsquigarrow G \qquad \mathsf{split}(r_2, G)=(G', G'')}{r_1\to r_2:U\cdot M \;\rightsquigarrow\; r_1\rightsquigarrow r_1r_2:U\cdot\left[\left[r_1r_2\rightsquigarrow r_2:U \parallel G'\right]\cdot G''\right]}\;\text{(asynchrony)}$$

$$\frac{M_k\rightsquigarrow G_k \qquad \Sigma\{M_i\}_{i\in I\setminus\{k\}}\rightsquigarrow G \qquad k\in I}{\Sigma\{M_i\}_{i\in I} \;\rightsquigarrow\; G_k+G}\;\text{(n-ary choice)}$$

$$\frac{}{\overline{\mu}(X, \ell_{\mathrm{c}}, \ell_{\mathrm{e}}, \emptyset) \;\rightsquigarrow\; \mathbb{1}}\;\text{(finite recursion: base)}$$

$$\frac{\begin{array}{c}M_i' = \overline{\mu}(X, \ell_{\mathrm{c}}, \ell_{\mathrm{e}}, \{\langle r_{1j}, r_{2j}, M_j\rangle\}_{j\in I\setminus\{i\}})\ \text{ for all } i\in I\\[2pt] \Sigma\{[r_{1i}\rightsquigarrow r_{2i}:\ell_{\mathrm{c}}\cdot M_i\cdot X]+[r_{1i}\rightsquigarrow r_{2i}:\ell_{\mathrm{e}}\cdot M_i']\}_{i\in I}\rightsquigarrow G\end{array}}{\overline{\mu}(X, \ell_{\mathrm{c}}, \ell_{\mathrm{e}}, \{\langle r_{1i}, r_{2i}, M_i\rangle\}_{i\in I}) \;\rightsquigarrow\; \mu X.\, G}\;\text{(finite recursion: step)}$$

$$\frac{\Sigma\{M[r_i/r]\}_{i\in I}\rightsquigarrow G}{\exists r\in\{r_i\}_{i\in I}.\, M \;\rightsquigarrow\; G}\;\text{(existential role quantification)}$$

Fig. 4: Macros

Local action $\varepsilon^{r}_{r_1r_2}$ specifies the idling of role $r$ during a communication between roles $r_1$ and $r_2$. The inclusion of such annotated idling actions in local types is novel; we elaborate on its purpose shortly.

We can now define $\mathbb{G}\mathrm{lob} = \mathrm{Term}(A_{\mathrm{g}})$ and $\mathbb{L}\mathrm{oc} = \mathrm{Term}(A_{\mathrm{l}})$ as the sets of all global and local types, ranged over by $G$ and $L$.

Macros. As a testament to the expressive power of our language of global types, we extend it with a number of macros that can be expanded into "normal" global types in $\mathbb{G}\mathrm{lob}$. A macro $M$ is generated by the following grammar:

$$\begin{aligned} M ::= {} & G\in\mathbb{G}\mathrm{lob} \;\mid\; r_1\to r_2:U\cdot M \;\mid\; \Sigma\{M_i\}_{i\in I} \;\mid\;\\ & \overline{\mu}(X, \ell_{\mathrm{c}}, \ell_{\mathrm{e}}, \{\langle r_{1i}, r_{2i}, M_i\rangle\}_{i\in I}) \;\mid\; \exists r\in\{r_i\}_{i\in I}.\, M\end{aligned}$$

Degenerate "macro" $G$ is a normal global type; it is part of the grammar to nest global types inside macros. Macro $r_1\to r_2:U\cdot M$ specifies an asynchronous communication from sender $r_1$ to receiver $r_2$. Macro $\Sigma\{M_i\}_{i\in I}$ specifies an n-ary choice among $|I|$ alternatives. Macro $\overline{\mu}(X, \ell_{\mathrm{c}}, \ell_{\mathrm{e}}, \{\langle r_{1i}, r_{2i}, M_i\rangle\}_{i\in I})$ specifies finite recursion: at the start of each unfolding of recursion variable $X$, for some $i\in I$, either an $\ell_{\mathrm{c}}$-message is communicated from sender $r_{1i}$ to receiver $r_{2i}$ (in which case they *c*ontinue their participation in the recursion), or an $\ell_{\mathrm{e}}$-message is communicated (in which case they *e*xit). Macro $\exists r\in\{r_i\}_{i\in I}.\,M$ specifies existential role quantification. Macros can be nested. Slightly abusing notation, we allow macros to occur and be expanded freely in "normal" global types.

Fig. 4 defines the macro expansion rules. We note that the left-hand side of $\rightsquigarrow$ is a macro, while the right-hand side is a normal global type. We demonstrated existential role quantification in Sect. 2; below, we give two more examples to illustrate our encoding of asynchronous communication and finite recursion.

Example 1 (Asynchrony). Although communications are synchronous, we can encode asynchrony by representing buffered channels (unordered, as in the asynchronous π-calculus [32]) explicitly as roles that participate in a protocol. To this end, assume for all $r_1, r_2\in\mathbb{R}$, there exists a role $r_1r_2\in\mathbb{R}$ as well (to represent the buffer from $r_1$ to $r_2$); alternatively, $r_1r_2$ could be any fresh name.

The following global types (message types omitted) specify paradigmatic cases for protocols with asynchronous communications:

$$\begin{aligned}
M_1 &= \mathsf{a}\to\mathsf{b}\cdot\mathbb{1} &&\rightsquigarrow& G_1 &= \mathsf{a}\rightsquigarrow\mathsf{ab}\cdot\mathsf{ab}\rightsquigarrow\mathsf{b}\\
M_2 &= \mathsf{a}\to\mathsf{b}\cdot\mathsf{a}\to\mathsf{b}\cdot\mathbb{1} &&\rightsquigarrow& G_2 &= \mathsf{a}\rightsquigarrow\mathsf{ab}\cdot\left[\left[\mathsf{ab}\rightsquigarrow\mathsf{b}\parallel\mathsf{a}\rightsquigarrow\mathsf{ab}\right]\cdot\mathsf{ab}\rightsquigarrow\mathsf{b}\right]\\
M_3 &= \mathsf{a}\to\mathsf{b}\cdot\mathsf{b}\to\mathsf{a}\cdot\mathbb{1} &&\rightsquigarrow& G_3 &= \mathsf{a}\rightsquigarrow\mathsf{ab}\cdot\mathsf{ab}\rightsquigarrow\mathsf{b}\cdot\mathsf{b}\rightsquigarrow\mathsf{ba}\cdot\mathsf{ba}\rightsquigarrow\mathsf{a}\\
M_4 &= \mathsf{a}\to\mathsf{b}\cdot\mathsf{b}\rightsquigarrow\mathsf{a} &&\rightsquigarrow& G_4 &= \mathsf{a}\rightsquigarrow\mathsf{ab}\cdot\mathsf{ab}\rightsquigarrow\mathsf{b}\cdot\mathsf{b}\rightsquigarrow\mathsf{a}
\end{aligned}$$

(For brevity, we omit 1 from the resulting global types; this can be incorporated in the macro expansion rules, at the expense of a more complex formulation.)

Global type $G_1$ specifies an asynchronous communication from Alice to Bob. Global type $G_2$ specifies two asynchronous communications from Alice to Bob; Alice can do the second send already before Bob has done the first receive. Global type $G_3$ specifies an asynchronous communication from Alice to Bob, followed by one from Bob to Alice; in contrast to $G_2$, Bob can send only after he has received (i.e., this encoding of asynchrony preserves causality of messages sent and received by the same role). Global type $G_4$ specifies an asynchronous communication from Alice to Bob, followed by a synchronous communication from Bob to Alice; it highlights that, unlike existing languages of global types, ours supports mixing synchrony and asynchrony in a single global type.
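Under our reading of the split-based asynchrony expansion in Fig. 4, the expansions of Example 1 can be reproduced mechanically. The sketch below is ours (message types omitted, as in the example; the tuple representation and names are assumptions, not the paper's tool):

```python
# Global types as tuples: ("1",), ("comm", sender, receiver),
# (".", G1, G2), ("par", G1, G2).

def seq(G1, G2):
    """Sequential composition that drops skips (1 . G = G . 1 = G)."""
    if G1 == ("1",):
        return G2
    if G2 == ("1",):
        return G1
    return (".", G1, G2)

def split(r, G):
    """Split G into (the part before r's first action, the rest from it on)."""
    if G[0] == "comm":
        return (("1",), G) if r in (G[1], G[2]) else (G, ("1",))
    if G[0] == ".":
        p1, s1 = split(r, G[1])
        if s1 != ("1",):
            return (p1, seq(s1, G[2]))
        p2, s2 = split(r, G[2])
        if s2 != ("1",):
            return (seq(G[1], p2), s2)
    return (G, ("1",))  # r does not occur (simplification: no par/choice cases)

def expand_async(r1, r2, G):
    """Expand the macro r1 -> r2 . M, where G is the expansion of M."""
    buf = r1 + r2  # buffer role r1r2, assumed to exist (Exmp. 1)
    prefix, suffix = split(r2, G)
    recv = ("comm", buf, r2)
    body = ("par", recv, prefix) if prefix != ("1",) else recv
    return seq(("comm", r1, buf), seq(body, suffix))
```

For instance, `expand_async("a", "b", ("1",))` yields $G_1$, and applying it twice yields $G_2$, with the second send of Alice interleaved with the first receive of Bob.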

Example 2 (Finite recursion). The Key-Value Store protocol in Sect. 2 does not terminate: in its global type, the inner recursions (Y and Z) can be exited, but the outer recursion (X) cannot. A version of this protocol that terminates once each of the Clients has indicated it has finished using the store (e.g., by sending an Exit-message) can also be specified.

We illustrate the key idea in a simplified example:

$$\begin{aligned} G\_1 &= \mu X. \big[ [\texttt{a} \rightarrow \texttt{c} \texttt{:Con} \cdot X] + \texttt{a} \rightarrow \texttt{c} \texttt{:Exit} \big] \\ G &= \mu X. \big[ [\texttt{a} \rightarrow \texttt{c} \texttt{:Con} \cdot X] + [\texttt{a} \rightarrow \texttt{c} \texttt{:Exit} \cdot G\_2] + [\texttt{b} \rightarrow \texttt{c} \texttt{:Con} \cdot X] + [\texttt{b} \rightarrow \texttt{c} \texttt{:Exit} \cdot G\_1] \big] \end{aligned}$$

Global type $G\_1$ specifies the communication of either a Con-message (to continue the recursion) or an Exit-message (to break it) from Alice to Carol. Global type $G\_2$ is similar. Global type $G$ specifies the communication of a Con-message from either Alice or Bob to Carol, or an Exit-message. In the latter case, Carol stops communicating with one role, while she proceeds communicating with the other role. Thus, the communications between Alice and Carol, and between Bob and Carol, are decoupled (i.e., decisions to continue or break recursions are made per role). Macro $\mu$ generalizes this pattern to arbitrary recursion bodies.

$$\begin{array}{ccc} \dfrac{\mathcal{L}(r)\downarrow \;\text{ for all } r \in \operatorname{dom}\mathcal{L}}{\mathcal{L}\downarrow} & \dfrac{\mathcal{L}(r\_1) \xrightarrow{r\_1 r\_2 \mathbin{!} U} L'\_{r\_1} \quad \mathcal{L}(r\_2) \xrightarrow{r\_1 r\_2 \mathbin{?} U} L'\_{r\_2}}{\mathcal{L} \xrightarrow{r\_1 r\_2 \colon U} \mathcal{L}[r\_1 \mapsto L'\_{r\_1}][r\_2 \mapsto L'\_{r\_2}]} & \dfrac{\mathcal{L}(r) \xrightarrow{\varepsilon^{r}\_{r\_1 r\_2}} L'\_{r}}{\mathcal{L} \xrightarrow{\varepsilon^{r}\_{r\_1 r\_2}} \mathcal{L}[r \mapsto L'\_{r}]} \\[3mm] \text{(a) Termination} & \text{(b) Reduction} & \end{array}$$

Fig. 5: Operational semantics of groups of local types

$$G \restriction r = G \quad \text{iff } G \in \{\mathbf{1}\} \cup \mathbb{X}$$

$$(G\_1 \ast G\_2) \restriction r = (G\_1 \restriction r) \ast (G\_2 \restriction r) \qquad r\_1 \rightarrow r\_2 \colon U \restriction r = \begin{cases} r\_1 r\_2 \mathbin{!} U & \text{iff } r\_1 = r \neq r\_2 \\ r\_1 r\_2 \mathbin{?} U & \text{iff } r\_1 \neq r = r\_2 \\ \varepsilon^{r}\_{r\_1 r\_2} & \text{iff } r\_1 \neq r \neq r\_2 \end{cases}$$

$$\langle X \mid E \rangle \restriction r = \langle X \mid E \restriction r \rangle \qquad E \restriction r = \{X \mapsto E(X) \restriction r \mid X \in \operatorname{dom} E\}$$

$$G \restriction R = \{r \mapsto G \restriction r \mid r \in R\} \quad \text{iff } \mathtt{r}(G) \subseteq R \neq \emptyset$$

Fig. 6: Projection

Groups. Finally, let $R \rightharpoonup \mathsf{Loc}$ denote the set of all groups of local types (i.e., every group is a partial function from role names to local types), ranged over by $\mathcal{L}$. The idea is that while a global type specifies a protocol among $n$ roles from one global perspective, a group of local types specifies a protocol from the $n$ local perspectives. Fig. 5 defines the operational semantics of groups, built on top of the operational semantics of local types; we use the $f[x \mapsto y]$ notation to update function $f$ with entry $x \mapsto y$. In words, group $\mathcal{L}$ is reduced either by synchronously reducing the local types of a sender $r\_1$ and a receiver $r\_2$ (yielding a communication from $r\_1$ to $r\_2$), or by reducing the local type of an idling role.
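The rules of Fig. 5 can be read off executably. The sketch below assumes a small tuple encoding of local types and actions (our own, not the paper's syntax): a group reduces either by pairing a matching send and receive into a communication, or by letting one role perform an idling action on its own.

```python
# Assumed representation: local types are ("end",), ("act", action),
# ("seq", L1, L2), ("alt", L1, L2), ("par", L1, L2); actions are
# ("send", r1, r2, U), ("recv", r1, r2, U), ("idle", r, r1, r2).
# A group is a dict from role names to local types.

def terminated(l):
    k = l[0]
    if k == "end":
        return True
    if k in ("seq", "par"):
        return terminated(l[1]) and terminated(l[2])
    if k == "alt":
        return terminated(l[1]) or terminated(l[2])
    return False  # a bare action cannot terminate

def steps(l):
    """Immediate reductions (action, successor) of a local type."""
    k = l[0]
    if k == "act":
        yield l[1], ("end",)
    elif k == "seq":
        for a, l1 in steps(l[1]):
            yield a, ("seq", l1, l[2])
        if terminated(l[1]):  # '.' continues only once its left side can end
            yield from steps(l[2])
    elif k == "alt":
        yield from steps(l[1])
        yield from steps(l[2])
    elif k == "par":
        for a, l1 in steps(l[1]):
            yield a, ("par", l1, l[2])
        for a, l2 in steps(l[2]):
            yield a, ("par", l[1], l2)

def group_terminated(grp):
    return all(terminated(l) for l in grp.values())      # Fig. 5a

def group_steps(grp):
    for r, l in grp.items():
        for a, l2 in steps(l):
            if a[0] == "send":                           # Fig. 5b, sync rule
                _, r1, r2, u = a
                for a2, l3 in steps(grp[r2]):
                    if a2 == ("recv", r1, r2, u):
                        yield ("comm", r1, r2, u), {**grp, r1: l2, r2: l3}
            elif a[0] == "idle":                         # Fig. 5b, idle rule
                yield a, {**grp, r: l2}
```

A receive action alone produces no group reduction: it only fires when paired with the matching send, which is exactly the synchronous rule of Fig. 5b.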

#### 3.3 End-Point Projection: from Global Types to Local Types

A key part of MPST (Fig. 1) is a projection operator that consumes a global type $G$ as input and produces a group of local types $\mathcal{L}$ as output; it is correct if, under certain well-formedness conditions, $G$ and $\mathcal{L}$ are operationally equivalent.

Let $\mathtt{r}(G)$ denote the set of all role names that occur in $G$. Fig. 6 defines our projection operator. In words, the projection of a communication $r\_1 \rightarrow r\_2 \colon U$ onto a role $r$ is a send $r\_1 r\_2 \mathbin{!} U$ if the role is the sender in the communication, a receive $r\_1 r\_2 \mathbin{?} U$ if it is the receiver, or an idling action $\varepsilon^{r}\_{r\_1 r\_2}$ if it is not involved; the projections of all other forms of global types onto $r$ are homomorphic; the projection of a global type onto a set of roles $R$ is the corresponding group of projections, where the side condition implies that the group is nonempty and contains a local type for at least every role name that occurs in $G$. Thus, a group of projections of $G$ is a partial function relative to the set of all roles $R$, but it is total relative to the set of roles $\mathtt{r}(G) \subseteq R$ that occur in $G$. (We note that we also continue to assume global types are $\mathbf{1}$-free, closed, and deterministic.)

$$\begin{array}{cc} \dfrac{T \downarrow}{T \Downarrow} \quad \dfrac{T \xrightarrow{\tau} T' \quad T' \Downarrow}{T \Downarrow} & \dfrac{T \xrightarrow{\alpha} T'}{T \xRightarrow{\alpha} T'} \quad \dfrac{T \xrightarrow{\tau} T'' \quad T'' \xRightarrow{\alpha} T'}{T \xRightarrow{\alpha} T'} \quad \dfrac{T \xRightarrow{\alpha} T'' \quad T'' \xrightarrow{\tau} T'}{T \xRightarrow{\alpha} T'} \quad \dfrac{T \xRightarrow{\sigma} T'}{T \xRightarrow{\tau} T'} \\[3mm] \text{(a) Termination} & \text{(b) Reduction} \end{array}$$

Fig. 7: Weak operational semantics; $T, T', T'' \in \mathsf{Glob} \cup \mathsf{Loc} \cup (R \rightharpoonup \mathsf{Loc})$
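Under the same assumed tuple encoding as before (our own, not the paper's syntax), the projection operator of Fig. 6 for the finite, recursion-free fragment is a one-screen function:

```python
# Direct transcription of Fig. 6 for a finite fragment (no recursion):
# global types are ("end",), ("comm", r1, r2, U), and binary
# ("seq"/"alt"/"par", G1, G2); local types and actions as assumed earlier.

def project(g, r):
    """Project global type g onto role r, yielding a local type."""
    k = g[0]
    if k == "comm":
        _, r1, r2, u = g
        if r == r1:
            return ("act", ("send", r1, r2, u))   # r is the sender
        if r == r2:
            return ("act", ("recv", r1, r2, u))   # r is the receiver
        return ("act", ("idle", r, r1, r2))       # r is not involved
    if k in ("seq", "alt", "par"):                # homomorphic cases
        return (k, project(g[1], r), project(g[2], r))
    return g  # ("end",)

def roles(g):
    if g[0] == "comm":
        return {g[1], g[2]}
    if g[0] in ("seq", "alt", "par"):
        return roles(g[1]) | roles(g[2])
    return set()

def project_group(g, rs):
    """Group of projections; side condition r(G) <= R != {} as in Fig. 6."""
    assert rs and roles(g) <= set(rs)
    return {r: project(g, r) for r in rs}

# G2 = a -> b . c -> d from Example 3 below; its projection onto b
# is ab? followed by an idling action for the cd-communication:
g2 = ("seq", ("comm", "a", "b", "U"), ("comm", "c", "d", "U"))
```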

Our projection operator is similar to existing projection operators in the MPST literature [34], but it also differs on a fundamental account: it produces local types with annotated idling actions. These idling actions will be instrumental in the definition of our well-formedness conditions. We note that no idling actions occur in the local types for the Key-Value Store protocol in Sect. 2. This is because after the idling actions have been used to establish well-formedness, they are of no more use and can be eliminated to simplify the local types.

The following lemmas state key properties about termination and reduction behaviour of global types and their projections: Lem. 1 states projection is sound and complete for termination; Lem. 2 states the same for reduction.

**Lemma 1.** $\big[ G \downarrow \text{ implies } (G \restriction r) \downarrow \big]$ and $\big[ (G \restriction r) \downarrow \text{ implies } G \downarrow \big]$

Proof. By induction on G.

$$\begin{aligned} \text{Lemma 2. } &\Big[ G \xrightarrow{g} G' \text{ implies } (G \restriction r) \xrightarrow{g \restriction r} (G' \restriction r) \Big] \\ \text{and } &\Big[ (G \restriction r) \xrightarrow{g \restriction r} L' \text{ implies } \big[ \big[ G \xrightarrow{g} G' \text{ and } L' = G' \restriction r \big] \text{ for some } G' \big] \Big] \end{aligned}$$

Proof. Both conjuncts are proven by induction on the structure of G, also using Lem. 1 (needed because termination plays a role in reduction of ·).

#### 3.4 Weak Bisimilarity of Global Types, Local Types, and Groups

The idling actions introduced in local types by our projection operator are internal, because they never compose into communications that emerge between local types in groups. Therefore, the operational equivalence relation under which we prove the correctness of projection should be insensitive to idling actions.

First, let $\mathbb{A}\_\tau = \{\varepsilon^{r}\_{r\_1 r\_2} \mid r\_1 \neq r\_2 \text{ and } r\_1 \neq r \neq r\_2\}$ denote the set of all internal actions, ranged over by $\tau, \sigma$. Second, Fig. 7 defines an extension of our operational semantics (Fig. 3) with relations that assert weak termination and weak reduction (i.e., versions of termination and reduction that are insensitive to internal actions). Third, Fig. 8 defines weak bisimilarity ($\approx$) in terms of weak similarity ($\preceq$), in terms of weak termination and weak reduction; it coincides with the definition found in the literature (e.g., [2]), with the administrative exception that we need the fourth rule in Fig. 7b to account for the fact that we have multiple different internal actions. We use a double horizontal line in the formulation of rules to indicate they should be applied coinductively.

$$\frac{\Big[ T\_1 \Downarrow \text{ implies } T\_2 \Downarrow \Big] \qquad \Big[ \big[ T\_1' \preceq T\_2' \text{ and } T\_2 \xRightarrow{\alpha} T\_2' \big] \text{ for some } T\_2' \Big] \text{ for all } T\_1 \xRightarrow{\alpha} T\_1'}{T\_1 \preceq T\_2} \qquad \frac{\mathcal{R}, \mathcal{R}^{-1} \subseteq\; \preceq}{\mathcal{R} \;\subseteq\; \approx}$$

Fig. 8: Weak operational equivalence; $T\_1, T\_1', T\_2, T\_2' \in \mathsf{Glob} \cup \mathsf{Loc} \cup (R \rightharpoonup \mathsf{Loc})$
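The weak relations of Fig. 7 amount to closing the ordinary reductions under internal (idling) steps. A sketch over an explicit, finite transition system; the dict-of-edges representation and the function names are assumptions of this sketch, not the paper's formalism:

```python
# lts maps a state to a list of (action, state) pairs; an action is
# internal iff it is an idling action ("idle", ...).

def is_internal(a):
    return a[0] == "idle"

def tau_closure(lts, s):
    """All states reachable from s via internal actions only."""
    seen, stack = {s}, [s]
    while stack:
        for a, t in lts[stack.pop()]:
            if is_internal(a) and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def weak_steps(lts, s):
    """Non-internal weak reductions s ==a==> t (internal*, a, internal*)."""
    for s1 in tau_closure(lts, s):
        for a, s2 in lts[s1]:
            if not is_internal(a):
                for s3 in tau_closure(lts, s2):
                    yield a, s3

def weak_terminates(lts, terminated, s):
    """s weakly terminates iff internal steps alone reach a terminated state."""
    return bool(tau_closure(lts, s) & terminated)

# Example: a local type that idles once and then sends; the send is
# weakly enabled already at the initial state s0.
lts = {"s0": [(("idle", "c", "a", "b"), "s1")],
       "s1": [(("send", "c", "d", "U"), "s2")],
       "s2": []}
```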

The notion of weak reduction allows us to generalize the soundness and completeness of projection from roles (Lem. 2) to groups of roles: Lem. 3 states (1) if $G$ can $g$-reduce to $G'$ and the projection of $G$ is defined, then the group of projections of $G$ can reduce to the group of projections of $G'$, either directly or with a trailing weak $\tau$-reduction; (2) conversely, if the group of projections of $G$ can $g$-reduce to $\mathcal{L}'$, then $G$ can $g$-reduce to $G'$ and either $\mathcal{L}'$ equals the group of projections of $G'$, or it can get there with a weak reduction.

$$\begin{aligned} \text{Lemma 3. } &\Big[ \big[ G \xrightarrow{g} G' \text{ and } G \restriction R \text{ is defined} \big] \text{ implies } \big[ (G \restriction R) \xrightarrow{g} (G' \restriction R) \text{ or } (G \restriction R) \xrightarrow{g} \mathcal{L}' \xLeftarrow{\tau} (G' \restriction R) \big] \Big] \\ \text{and } &\Big[ (G \restriction R) \xrightarrow{g} \mathcal{L}' \text{ implies } \big[ \big[ G \xrightarrow{g} G' \text{ and } \big[ \mathcal{L}' = G' \restriction R \text{ or } \mathcal{L}' \xRightarrow{\tau} (G' \restriction R) \big] \big] \text{ for some } G' \big] \Big] \end{aligned}$$

Proof. Both conjuncts are proven by induction on R, also using Lem. 2.

#### 3.5 Well-formedness of Global Types

In general, projection does not preserve weak operational semantics.

Example 3 (Bad protocols). The following global types (message types omitted) specify "bad" protocols that do not permit "good" concurrent implementations:

$$G\_1 = \mathbf{a} \rightarrow \mathbf{b} + \mathbf{a} \rightarrow \mathbf{c} \qquad \text{and} \qquad G\_2 = \mathbf{a} \rightarrow \mathbf{b} \cdot \mathbf{c} \rightarrow \mathbf{d}$$

$$\underbrace{\mathbf{ab}! + \mathbf{ac}!}\_{G\_1 \restriction \mathbf{a}} \quad \underbrace{\mathbf{ab}? + \varepsilon^{\mathbf{b}}\_{\mathbf{ac}}}\_{G\_1 \restriction \mathbf{b}} \quad \underbrace{\varepsilon^{\mathbf{c}}\_{\mathbf{ab}} + \mathbf{ac}?}\_{G\_1 \restriction \mathbf{c}} \qquad \underbrace{\mathbf{ab}! \cdot \varepsilon^{\mathbf{a}}\_{\mathbf{cd}}}\_{G\_2 \restriction \mathbf{a}} \quad \underbrace{\mathbf{ab}? \cdot \varepsilon^{\mathbf{b}}\_{\mathbf{cd}}}\_{G\_2 \restriction \mathbf{b}} \quad \underbrace{\varepsilon^{\mathbf{c}}\_{\mathbf{ab}} \cdot \mathbf{cd}!}\_{G\_2 \restriction \mathbf{c}} \quad \underbrace{\varepsilon^{\mathbf{d}}\_{\mathbf{ab}} \cdot \mathbf{cd}?}\_{G\_2 \restriction \mathbf{d}}$$

Global type $G\_1$ specifies a communication from Alice to either Bob or Carol, chosen by Alice. This is a bad protocol, because if Alice chooses Bob, there is no way for Carol to know (and vice versa): Carol cannot locally distinguish between whether Alice has not made her choice yet, or whether Alice has chosen Bob. Formally, this is manifested in the fact that Carol's local type can at any time choose to perform idling action $\varepsilon^{\mathbf{c}}\_{\mathbf{ab}}$ (i.e., local type $G\_1 \restriction \mathbf{c}$ has two reductions, neither one of which has priority), thereby assuming that Alice has chosen Bob. However, Bob can symmetrically assume that Alice has chosen Carol. As a result, the group projection can reduce as follows: $G\_1 \restriction \{\mathbf{a}, \mathbf{b}, \mathbf{c}\} \xrightarrow{\varepsilon^{\mathbf{c}}\_{\mathbf{ab}}} \mathcal{L}\_1 \xrightarrow{\varepsilon^{\mathbf{b}}\_{\mathbf{ac}}} \mathcal{L}\_2$. Now, $\mathcal{L}\_2$ cannot reduce further, but Alice has not terminated yet. This sequence of reductions cannot be (weakly) simulated by $G\_1$.

Global type $G\_2$ specifies a communication from Alice to Bob, followed by a communication from Carol to Dave. This is a bad protocol, because there is no way for Carol and Dave to know when the communication from Alice to Bob has occurred. Formally, this is manifested in the fact that Carol's and Dave's local types can at any time choose to perform idling actions, thereby assuming that the communication from Alice to Bob has occurred. As a result, the group projection can reduce as follows: $G\_2 \restriction \{\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}\} \xrightarrow{\varepsilon^{\mathbf{c}}\_{\mathbf{ab}}} \mathcal{L}\_1 \xrightarrow{\varepsilon^{\mathbf{d}}\_{\mathbf{ab}}} \mathcal{L}\_2 \xrightarrow{\mathbf{cd}} \mathcal{L}\_3 \xrightarrow{\mathbf{ab}} \mathcal{L}\_4$. This sequence cannot be (weakly) simulated by $G\_2$.

Next, we define two well-formedness conditions that invalidate the previous examples; in Sect. 3.6, we prove that if these conditions are satisfied by a global type $G$, it is indeed guaranteed that $G$ and $G \restriction R$ are operationally equivalent (i.e., weakly bisimilar). Instead of defining the conditions in terms of global types, we define them in terms of projections (i.e., local types). Informally:

- 1. For every weak reduction $\xRightarrow{l}$ that local type $G \restriction r$ can perform (where $l$ is a send or a receive, but not an idling action), it can perform a reduction $\xrightarrow{l}$. That is, if $G \restriction r$ can perform $l$ in the future after idling actions, it can do $l$ already eagerly in the present.
- 2. Local type $G \restriction r$ is the start of a causal chain: a sequence of $\tau$-reductions, followed by a non-$\tau$-reduction, that are "causally related" to each other. An $\varepsilon^{r}\_{r\_1 r\_2}$-reduction is causally related to an $\varepsilon^{r}\_{r\_3 r\_4}$-reduction iff $\{r\_1, r\_2\} \cap \{r\_3, r\_4\} \neq \emptyset$. Globally speaking, this means communication between $r\_3$ and $r\_4$ must be preceded by communication between $r\_1$ and $r\_2$.

These conditions must hold coinductively for all local types that $G \restriction r$ can reduce to. Essentially, these conditions state that by performing idling actions, a local type can neither decrease its possible behaviour (C), nor increase it (EC-1), unless it is guaranteed the added behaviour cannot be exercised yet, because it is causally related to other communications that need to happen first (EC-2).

Example 4 (Bad protocols, continued). Global type $G\_1$ (Exmp. 3) is ill-formed: its projections onto $\mathbf{b}$ and $\mathbf{c}$ violate condition C. Global type $G\_2$ (Exmp. 3) is also ill-formed: its projections onto $\mathbf{c}$ and $\mathbf{d}$ violate condition EC.
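One aspect of these conditions, that idling must not discard weakly enabled sends and receives, can be checked mechanically. The sketch below is a coarse approximation of condition C over an explicit, finite transition system; it ignores the bisimilarity requirements of the formal definition and is meant only to show the flavour of the check (the representation and all names are our assumptions):

```python
# lts maps a state to a list of (action, successor) pairs; internal
# actions are the idling actions ("idle", ...).

def is_internal(a):
    return a[0] == "idle"

def weakly_enabled(lts, s):
    """Non-internal actions enabled after any number of idling steps."""
    seen, stack, acts = {s}, [s], set()
    while stack:
        for a, t in lts[stack.pop()]:
            if is_internal(a):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
            else:
                acts.add(a)
    return acts

def no_behaviour_lost(lts, s0):
    """Coarse check: an idling step must not discard any weakly
    enabled send/receive, at every reachable state (cf. condition C)."""
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        before = weakly_enabled(lts, s)
        for a, t in lts[s]:
            if is_internal(a) and not before <= weakly_enabled(lts, t):
                return False
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return True

# G1|c = eps^c_ab + ac? from Example 3: idling discards the receive,
# so Carol's local type is (correctly) rejected...
bad = {"s0": [(("idle", "c", "a", "b"), "s1"), (("recv", "a", "c", "U"), "s2")],
       "s1": [], "s2": []}
# ...whereas a type whose idling keeps all behaviour passes:
good = {"t0": [(("idle", "c", "a", "b"), "t1")],
        "t1": [(("recv", "a", "c", "U"), "t2")], "t2": []}
```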

$$\frac{\begin{bmatrix}\begin{bmatrix}\big[\Lambda\_1'' \approx \Lambda\_2'' \text{ and } \Lambda\_1' \xRightarrow{\alpha\_2} \Lambda\_1'' \text{ and } \Lambda\_2' \xRightarrow{\alpha\_1} \Lambda\_2''\big] \text{ or} \\ \big[\Lambda\_1'' \approx \Lambda\_2' \text{ and } \Lambda\_1' \xRightarrow{\alpha\_2} \Lambda\_1'' \text{ and } \alpha\_1 \in \mathbb{A}\_\tau\big] \text{ or} \\ \big[\Lambda\_1' \approx \Lambda\_2'' \text{ and } \Lambda\_2' \xRightarrow{\alpha\_1} \Lambda\_2'' \text{ and } \alpha\_2 \in \mathbb{A}\_\tau\big] \text{ or} \\ \big[\Lambda\_1' \approx \Lambda\_2' \text{ and } \alpha\_1, \alpha\_2 \in \mathbb{A}\_\tau\big]\end{bmatrix} \text{ for some } \Lambda\_1'', \Lambda\_2''\end{bmatrix} \text{ for all } \Lambda \xRightarrow{\alpha\_1} \Lambda\_1' \text{ and } \Lambda \xRightarrow{\alpha\_2} \Lambda\_2'}{\mathbb{C}^{\alpha\_1}\_{\alpha\_2}(\Lambda)} \qquad \frac{\mathbb{C}^{\alpha}\_{\tau}(\Lambda) \text{ for all } \alpha, \tau \qquad \mathbb{C}(\Lambda') \text{ for all } \Lambda \xrightarrow{\alpha} \Lambda'}{\mathbb{C}(\Lambda)}$$

$$\frac{\begin{bmatrix}\begin{bmatrix}\big[\Lambda'' \approx \Lambda^{**} \text{ and } \Lambda \xrightarrow{\alpha\_2} \Lambda^{*} \xRightarrow{\alpha\_1} \Lambda^{**}\big] \text{ or} \\ \big[\Lambda'' \approx \Lambda^{*} \text{ and } \Lambda \xrightarrow{\alpha\_2} \Lambda^{*} \text{ and } \alpha\_1 \in \mathbb{A}\_\tau\big] \text{ or} \\ \mathsf{Chain}\,\Lambda\end{bmatrix} \text{ for some } \Lambda^{*}, \Lambda^{**}\end{bmatrix} \text{ for all } \Lambda \xRightarrow{\alpha\_1} \Lambda' \xrightarrow{\alpha\_2} \Lambda''}{\mathsf{EC}^{\alpha\_1}\_{\alpha\_2}(\Lambda)} \qquad \frac{\mathsf{EC}^{\tau}\_{\alpha}(\Lambda) \text{ for all } \alpha \notin \mathbb{A}\_\tau, \tau \qquad \mathsf{EC}(\Lambda') \text{ for all } \Lambda \xrightarrow{\alpha} \Lambda'}{\mathsf{EC}(\Lambda)}$$

$$\frac{\big[L\_1' = L\_2' \text{ and } l\_1 = l\_2\big] \text{ for all } L \xrightarrow{l\_1} L\_1' \text{ and } L \xrightarrow{l\_2} L\_2' \qquad \big[\big[\mathtt{r}(\tau) \cap \mathtt{r}(l) \neq \emptyset \text{ and } \mathsf{Chain}\,L'\big] \text{ or } l \notin \mathbb{A}\_\tau\big] \text{ for all } L \xrightarrow{\tau} L' \xrightarrow{l} L''}{\mathsf{Chain}\,L}$$

Fig. 9: Well-formedness conditions; $\Lambda, \Lambda', \Lambda'', \Lambda\_1', \Lambda\_1'', \Lambda\_2', \Lambda\_2'' \in \mathsf{Loc} \cup (R \rightharpoonup \mathsf{Loc})$

Fig. 9 defines C and EC formally. We define C not only for local types, but also for groups of local types, as this simplifies some notation later on. We prove key properties of C: Thm. 1 states commutativity of local sends/receives/idling ($l$) in local types gets lifted to commutativity of global communications/idling ($\alpha$) in groups of local types; Lem. 4 states weak bisimilarity preserves commutativity.

$$\begin{aligned} \text{Theorem 1. } &\Big[ \big[ \mathbb{C}^{l}\_{\tau}(\mathcal{L}(r)) \text{ for all } l, \tau \big] \text{ for all } r \in \operatorname{dom} \mathcal{L} \Big] \text{ implies } \big[ \mathbb{C}^{\alpha}\_{\tau}(\mathcal{L}) \text{ for all } \alpha, \tau \big] \\ \text{and } &\Big[ \big[ \mathbb{C}(\mathcal{L}(r)) \text{ for all } r \in \operatorname{dom} \mathcal{L} \big] \text{ implies } \mathbb{C}(\mathcal{L}) \Big] \end{aligned}$$

Proof. The first conjunct is proven by induction on the rules of =⇒. The second is proven by coinduction on the rule of C, also using the first conjunct.

$$\begin{aligned} \text{Lemma 4. } \left[ \left[ \mathbb{C}\_{\alpha\_2}^{\alpha\_1}(\mathcal{L}\_1) \text{ and } \mathcal{L}\_1 \approx \mathcal{L}\_2 \right] \text{ implies } \mathbb{C}\_{\alpha\_2}^{\alpha\_1}(\mathcal{L}\_2) \right] \\ \text{and } \left[ \left[ \mathbb{C}(\mathcal{L}\_1) \text{ and } \mathcal{L}\_1 \approx \mathcal{L}\_2 \right] \text{ implies } \mathbb{C}(\mathcal{L}\_2) \right] \end{aligned}$$

Proof. The first conjunct is proven by applying the definitions of C and ≈; the second is proven by coinduction on the rule of C, also using the first conjunct.

We also prove key properties of Chain and EC, both of which work specifically for groups of projections: Lem. 5 states if the projections onto $r\_1$ and $r\_2$ are both causal chains, they cannot weakly reduce to local types where they can perform reciprocal actions ($r\_1$ the send; $r\_2$ the receive); Thm. 2 states eagerness of local sends/receives (not idling) in projections gets lifted to eagerness of global communications in groups of projections (cf. Thm. 1).

$$\begin{aligned} \text{Lemma 5. } \Big[ &\mathsf{Chain}\,((G \restriction R)(r\_1)) \text{ and } (G \restriction R)(r\_1) \xRightarrow{\tau\_1} \mathcal{L}'(r\_1) \xrightarrow{r\_1 r\_2 \mathbin{!} U} \mathcal{L}''(r\_1) \text{ and} \\ &\mathsf{Chain}\,((G \restriction R)(r\_2)) \text{ and } (G \restriction R)(r\_2) \xRightarrow{\tau\_2} \mathcal{L}'(r\_2) \xrightarrow{r\_1 r\_2 \mathbin{?} U} \mathcal{L}''(r\_2) \Big] \text{ implies false} \end{aligned}$$

Proof. By induction on the rules of =⇒.

$$\begin{aligned} \text{Theorem 2. } &\Big[ \big[ \mathsf{EC}^{\tau}\_{l}((G \restriction R)(r)) \text{ for all } l \notin \mathbb{A}\_\tau, \tau \big] \text{ for all } r \in R \Big] \text{ implies } \big[ \mathsf{EC}^{\tau}\_{\alpha}(G \restriction R) \text{ for all } \alpha, \tau \big] \\ \text{and } &\Big[ \big[ \mathsf{EC}((G \restriction R)(r)) \text{ for all } r \in R \big] \text{ implies } \mathsf{EC}(G \restriction R) \Big] \end{aligned}$$

Proof. The first conjunct is proven by using Lem. 5; the second is proven by coinduction on the rule of EC, also using the first conjunct.

We note that, in contrast to Lem. 4 for C, we do not have a lemma that states weak bisimilarity preserves EC. Such a lemma would have been highly useful in our subsequent proofs, but it is unfortunately false, because weak bisimilarity does not preserve Chain. A simple counterexample, for local types, is this: $L\_1 = r\_1 r\_2 \mathbin{!} U$ and $L\_2 = \varepsilon^{r\_3}\_{r\_4 r\_5} \cdot r\_1 r\_2 \mathbin{!} U$, where $\{r\_1, r\_2\} \cap \{r\_3, r\_4, r\_5\} = \emptyset$. While $L\_1$ and $L\_2$ are weakly bisimilar, $L\_1$ is the start of a unary causal chain, but $L\_2$ is not. The problem here is that Chain depends on the role names associated with idling actions, whereas weak bisimilarity abstracts those role names away.

We call a global type well-formed if each of its projections satisfies C and EC.

#### 3.6 Correctness of Projection under Well-Formedness

We now prove our main result: if a global type is well-formed, it is weakly bisimilar to the group of its projections. We start by defining a relation $\ltimes$ to relate global types with groups of local types (denoted by $\mathcal{R}$ in Fig. 8):

$$\frac{\mathbb{C}(G \restriction R) \qquad \mathsf{EC}(G \restriction R) \qquad (G \restriction R) \xRightarrow{\star} \mathcal{L}' \xLeftarrow{\star} \mathcal{L} \qquad \mathbb{C}(\mathcal{L})}{G \ltimes \mathcal{L}}$$

Here, we write $\mathcal{L}\_1 \xRightarrow{\star} \mathcal{L}\_2$ as an abbreviation for:

$$\left[\mathcal{L}\_1 \approx \mathcal{L}\_1' \stackrel{\tau}{\Rightarrow} \mathcal{L}\_2' \approx \mathcal{L}\_2 \text{ for some } \mathcal{L}\_1', \mathcal{L}\_2'\right] \text{or } \mathcal{L}\_1 \approx \mathcal{L}\_2$$

In words, $\mathcal{L}\_1 \xRightarrow{\star} \mathcal{L}\_2$ means $\mathcal{L}\_1$ has a silent reduction (only $\tau$-s) to a term that is weakly bisimilar to $\mathcal{L}\_2$, or $\mathcal{L}\_1$ is already weakly bisimilar to $\mathcal{L}\_2$ (without any reductions). Essentially, if $\mathbb{C}(G \restriction R)$ and $\mathsf{EC}(G \restriction R)$, then $\ltimes$ relates $G$ to a set of groups $S = \{\mathcal{L} \mid G \ltimes \mathcal{L}\}$ that can roughly be characterised as follows:


The following technical lemma states if a well-formed group of projections $G \restriction R$ can weakly $g$-reduce to some group $\mathcal{L}'$, then the original global type $G$ can $g$-reduce to some $G'$, and $\mathcal{L}'$ and the group of projections of $G'$ either are weakly bisimilar, or they can weakly reduce to a weakly bisimilar group $\mathcal{L}''$.

$$\begin{aligned} \text{Lemma 6. } &\Big[ \mathbb{C}(G \restriction R) \text{ and } \mathsf{EC}(G \restriction R) \text{ and } (G \restriction R) \xRightarrow{g} \mathcal{L}' \Big] \\ &\text{implies } \Big[ \big[ G \xrightarrow{g} G' \text{ and } (G' \restriction R) \xRightarrow{\star} \mathcal{L}'' \xLeftarrow{\star} \mathcal{L}' \big] \text{ for some } G', \mathcal{L}'' \Big] \end{aligned}$$

Proof. By induction on the rules of =⇒, also using Lem. 3.

The following two lemmas state key properties of $\ltimes$: Lem. 7 states $\ltimes$ preserves termination (as weak termination); Lem. 8 states $\ltimes$ coinductively preserves reduction (as weak reduction). Together, these lemmas imply $\ltimes \subseteq \preceq$ and $\ltimes^{-1} \subseteq \preceq$, which in turn imply $\ltimes \subseteq \approx$.

$$\begin{aligned} \text{Lemma 7. } & \left[ \left[ G \ltimes \mathcal{L} \text{ and } G \downarrow \right] \text{ implies } \mathcal{L} \Downarrow \right] \\ & \text{and } \left[ \left[ G \ltimes \mathcal{L} \text{ and } \mathcal{L} \downarrow \right] \text{ implies } G \Downarrow \right] \end{aligned}$$

Proof. The first conjunct is proven by induction on the rules of =⇒, also using Lem. 1; the second is proven by contradiction (assume not G <sup>↓</sup>; derive false; conclude G <sup>↓</sup>; it implies G ⇓).

$$\begin{aligned} \text{Lemma 8. } &\Big[ \big[ G \ltimes \mathcal{L} \text{ and } G \xrightarrow{g} G' \big] \text{ implies } \big[ \big[ G' \ltimes \mathcal{L}' \text{ and } \mathcal{L} \xRightarrow{g} \mathcal{L}' \big] \text{ for some } \mathcal{L}' \big] \Big] \\ \text{and } &\Big[ \big[ G \ltimes \mathcal{L} \text{ and } \mathcal{L} \xrightarrow{g} \mathcal{L}' \big] \text{ implies } \big[ \big[ G' \ltimes \mathcal{L}' \text{ and } G \xrightarrow{g} G' \big] \text{ for some } G' \big] \Big] \\ \text{and } &\Big[ \big[ G \ltimes \mathcal{L} \text{ and } \mathcal{L} \xrightarrow{\tau} \mathcal{L}' \big] \text{ implies } G \ltimes \mathcal{L}' \Big] \end{aligned}$$

Proof. The first and second conjunct are proven by induction on the rules of =⇒, also using Lemmas 3–4; the third is proven by induction on the rules of =⇒.

**Theorem 3.** $\big[ \mathbb{C}(G \restriction R) \text{ and } \mathsf{EC}(G \restriction R) \big]$ implies $G \approx (G \restriction R)$

Proof. By coinduction on the rule of $\preceq$ (Fig. 8), also using Lemmas 7–8.

A group of local types $\mathcal{L}$ enjoys deadlock-freedom if it either has successfully terminated ($\mathcal{L} \downarrow$; Fig. 5a) or can make another reduction. A group of local types $\mathcal{L}$ enjoys absence of protocol violations relative to global type $G$ if, coinductively, every non-$\tau$ reduction of $\mathcal{L}$ can be simulated by $G$ (i.e., every communication in the group is "permitted" by $G$). The following corollary relates the operational equivalence of Thm. 3 to these classical MPST properties:

Corollary 1. If global type G is well-formed, then the group of G's projections enjoys deadlock-freedom and absence of protocol violations relative to G.

The key insight to understand this is that global types are by definition free of deadlocks (they either reduce to $\mathbf{1}$, or they never terminate; Fig. 3), while weak bisimilarity preserves deadlock-freedom of global types in their projections (notably, weak bisimilarity is sensitive to termination, and a group of local types terminates only if all individual local types terminate; Fig. 5a). Weak bisimilarity also directly implies absence of protocol violations.
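For finite-control groups, deadlock-freedom in this sense is a plain reachability check over the reduction graph: every reachable configuration must either have terminated or be able to reduce further. A sketch, assuming an explicit graph representation (ours, not the tool's):

```python
# lts maps a state to its outgoing (label, state) edges; `terminated`
# is the set of states in which the whole group has terminated.

def deadlock_free(lts, terminated, s0):
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        if not lts[s] and s not in terminated:
            return False  # stuck, but not successfully terminated
        for _, t in lts[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return True

# The stuck group from Example 3, after the two bad idling steps:
stuck = {"L0": [("tau", "L1")], "L1": [("tau", "L2")], "L2": []}
# A group that communicates once and then terminates:
ok = {"M0": [(("comm", "a", "b", "U"), "M1")], "M1": []}
```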

#### 3.7 Decidability of Checking Well-Formedness

We note our proof of Thm. 3 is non-constructive, in the sense that $\ltimes$ is infinitely large (i.e., for each group of local types, there exist infinitely many weakly bisimilar groups). The following proposition states this is not a problem in practice.

**Proposition 1.** Checking $\mathbb{C}(\mathcal{L})$ and $\mathsf{EC}(\mathcal{L})$ is decidable.

The rationale behind this proposition is as follows. First, to check $\mathbb{C}(\mathcal{L})$ and $\mathsf{EC}(\mathcal{L})$, by Thm. 1 and Thm. 2, it suffices to check $\mathbb{C}(\mathcal{L}(r))$ and $\mathsf{EC}(\mathcal{L}(r))$ for each $r \in \operatorname{dom} \mathcal{L}$. For each such local type $\mathcal{L}(r)$, there are two possibilities.

If local type $\mathcal{L}(r)$ has finite control, its state space can be exhaustively explored in finite time, so checking $\mathbb{C}(\mathcal{L}(r))$ and $\mathsf{EC}(\mathcal{L}(r))$ is obviously decidable.

In contrast, if $\mathcal{L}(r)$ does not have finite control, we make two observations. The first observation is that the only possible source of infinity is the occurrence of recursion variables under parallel composition. The second observation is that $\mathbb{C}$ and $\mathsf{EC}$ are true for $L\_1 \parallel L\_2$ if they are true for $L\_1$ and $L\_2$ separately; this is because $\mathbb{C}$ and $\mathsf{EC}$ essentially assert a "diamond structure" on the reductions of $L\_1 \parallel L\_2$, which is precisely the operational semantics of $\parallel$ (Fig. 3). Thus, we can check $\mathbb{C}(L\_1 \parallel L\_2)$ and $\mathsf{EC}(L\_1 \parallel L\_2)$ by checking $\mathbb{C}(L\_1)$, $\mathbb{C}(L\_2)$, $\mathsf{EC}(L\_1)$, and $\mathsf{EC}(L\_2)$, thereby "avoiding" the possible source of infinity.

We note that splitting the checks for parallel composition in this way not only ensures decidability; it also avoids exponential state explosion (in the number of nested $\parallel$-operators in a single local type) in local types with finite control.
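The decomposition argument can be captured by a generic helper that recurses into parallel branches instead of exploring the product state space; `check_flat` stands in for whatever exhaustive check one uses on parallel-free local types (the helper and its names are our sketch, not the tool's code):

```python
# Check a predicate compositionally for parallel composition, as the
# section argues is possible for C and EC: recurse into the branches of
# every ("par", L1, L2) node and run check_flat on parallel-free pieces.

def check_par(l, check_flat):
    if l[0] == "par":
        return check_par(l[1], check_flat) and check_par(l[2], check_flat)
    return check_flat(l)

# With n nested "par"s over small branches, this visits n+1 branches
# instead of the exponentially many states of the product.
example = ("par", ("act", ("send", "a", "b", "U")),
                  ("par", ("act", ("send", "a", "c", "U")),
                          ("act", ("send", "a", "d", "U"))))
```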

#### 3.8 Discussion of Challenges

Our use of (weak) bisimilarity, plus the key insight to annotate silent actions with additional information to keep track of choices, made the problem of proving the correctness of projection (Thm. 3) feasible. The major technical challenges to achieve this were defining the right bisimulation relation (Sect. 3.6) and discovering corresponding well-formedness conditions (Sect. 3.5).

A naive weak bisimulation relation, $\mathcal{R}\_{\text{naive}}$, relates every global type only with its group of projections. $\mathcal{R}\_{\text{naive}}$ is sufficient to prove that every reduction of a global type can be weakly simulated with one non-silent reduction of the group (sender and receiver), followed by a number of silent reductions (idling processes). In contrast, $\mathcal{R}\_{\text{naive}}$ is insufficient to prove that every reduction of the group can be simulated by its global type, because of silent actions: if global type $G$ is related to group of projections $\mathcal{L}$ by $\mathcal{R}\_{\text{naive}}$, and a silent action subsequently reduces $\mathcal{L}$ to $\mathcal{L}'$, the simulation fails, as $\mathcal{R}\_{\text{naive}}$ does not relate $G$ to $\mathcal{L}'$.

Fig. 10: Overview of mpstpp

To alleviate this issue, we defined the bisimulation relation in such a way that it relates every global type G to a group of local types that are not necessarily equal to the projections of G, but every local type can be behind the corresponding projection (the local type can reach the projection with silent actions) or ahead (the projection can reach the local type with silent actions).

# 4 Practical Experience with the Theory

#### 4.1 Implementation

Tool. We implemented a tool, mpstpp, based on the core theoretical contributions of this paper. Fig. 10 shows a high-level overview of the tool, including the main components (boxes) and data flows (arrows).

First, mpstpp parses an input .glob-file into a data structure for a global type $G$ (programmer-friendly Scribble-style syntax [35] is also supported as input). Then, it projects $G$ onto all roles that occur in $G$. Then, it checks each of the resulting local types for well-formedness, either sequentially or in parallel, depending on settings: a key advantage of the formulation of our well-formedness conditions is that they can be checked modularly for every role in isolation, enabling us to take advantage of modern multicore hardware. Finally, if the local types are well-formed, idling actions are eliminated and typed communication APIs are generated from the local types to enable MPST-based programming in Java.

Optimisations. Parsing, computing projections, and generating APIs are relatively inexpensive; instead, the run times of our tool are dominated by the checks for well-formedness. We therefore implemented several optimisations to make these checks more efficient. Before we present these optimisations, we first note that the complexity of checking well-formedness of a local type $L$ is polynomial in the number of successors that can be reached from $L$ (Fig. 9).

(1) Our first optimisation targets local types with parallel composition: a local type L1 ‖ L2 is potentially a serious bottleneck, as its number of successors is exponential in the number of nested ‖-operators. Therefore, even with finite state spaces, we check the well-formedness of L1 ‖ L2 by checking the well-formedness of L1 and L2 separately, without explicitly considering the exponentially many successors of L1 ‖ L2, exploiting the same observation as with decidability (Sect. 3.7).
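A minimal illustration of why this matters: the state space of an interleaving composition is the product of the component state spaces, so checking the components separately avoids a multiplicative (and, with nested ‖-operators, exponential) blow-up. The sizes below are arbitrary:

```python
from itertools import product

def interleaving_successors(states1, states2):
    # The states of L1 || L2 are pairs of component states, so their
    # number is the product of the components' state-space sizes.
    return list(product(states1, states2))

# Checking L1 and L2 separately visits |L1| + |L2| states instead of
# |L1| * |L2| -- and the gap grows exponentially as operators nest.
combined = len(interleaving_successors(range(10), range(10)))  # 100
separate = len(range(10)) + len(range(10))                     # 20
```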

(2) Our second optimisation concerns the computation of weak reductions. In particular, to check whether C and EC are true for a local type L, according to their definitions (Fig. 9), we need to iterate over the weak reductions of L. Especially if L has many τ-reductions (Fig. 7), computing the set of weak reductions can be expensive. To avoid this, mpstpp computes sound (but incomplete) approximations of C and EC. We implemented two kinds of approximations: (a) checking versions of C and EC where every occurrence of =⇒ in the definition is replaced with −→, and (b) checking L ≈ L′ for every τ-reduction from L to L′. Approximation (a) is sound for both C and EC (rationale: if individual reductions can commute, sequences of reductions consisting of those individual reductions can commute as well), but approximation (b) is sound only for C (rationale: the auxiliary relation Chain of EC is not preserved by weak bisimilarity). To ensure soundness, mpstpp thus never uses approximation (b) for EC.
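As a toy illustration of approximation (a), assuming an LTS given as a set of (state, action, state) triples, a commutation diamond can be checked over single reductions (−→) only; the actual definitions of C and EC (Fig. 9) are not reproduced in this chunk, so this is only a sketch of the idea:

```python
def steps(lts, s, a):
    # single reductions (-->) from state s with action a
    return {t for (p, lab, t) in lts if p == s and lab == a}

def commute_strong(lts, s, a, b):
    # Approximation (a): test the commutation diamond using single
    # reductions only, instead of weak reductions (==>); if single
    # reductions commute, sequences of them commute as well, so the
    # check is sound but incomplete.
    ab = {u for t in steps(lts, s, a) for u in steps(lts, t, b)}
    ba = {u for t in steps(lts, s, b) for u in steps(lts, t, a)}
    return bool(ab) and ab == ba
```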

(3) Our third optimisation targets the checks for weak bisimilarity that occur in several places in the definitions of C and EC (Fig. 9). Instead of computing the full reduction relations and running an algorithm to decide their weak bisimilarity (which would be computationally costly), we take advantage of the fact that our language of local types is based on existing algebras (Sect. 3.1) that have sound and complete axiomatisations. Specifically, to check whether two local types are weakly bisimilar, mpstpp applies the axioms as rewrite rules and compares the resulting normal forms for structural equality. To keep rewriting fast, we sacrificed completeness (i.e., we use rewriting only to eliminate as many silent actions as possible in a sound way; for instance, our rewrite procedure cannot prove that (L1 · τ) + L2 and L2 + L1 are weakly bisimilar); however, for the many examples we tried (including this paper's), this optimisation is highly effective.
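A minimal sketch of rewriting to normal form, assuming terms are nested tuples and using only the τ-elimination law L · τ ≈ L (the paper's axiomatisation is richer). Note that, as in the text, a rewriter of this kind is too weak to equate (L1 · τ) + L2 and L2 + L1:

```python
def normalize(t):
    # Terms: an action name (str), or ('seq', l, r) / ('alt', l, r).
    if isinstance(t, str):
        return t
    op, l, r = t
    l, r = normalize(l), normalize(r)
    if op == 'seq' and r == 'tau':
        return l  # tau-law of weak bisimilarity:  L . tau  ~  L
    return (op, l, r)

def maybe_weakly_bisimilar(t1, t2):
    # Sound but incomplete: equal normal forms imply weak bisimilarity,
    # while unequal normal forms prove nothing.
    return normalize(t1) == normalize(t2)
```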

Optimisations (2) and (3) are conservative: mpstpp may conclude C or EC is false, even though it is actually true. While this affects completeness, soundness is guaranteed: if mpstpp concludes a local type is well-formed, it really is.

#### 4.2 Evaluation of the Approach

Setup. In the previous section, we formulated our well-formedness conditions and proved them theoretically correct (Thm. 3). In this section, we demonstrate their practical usefulness through an experimental evaluation on benchmarks. Specifically, we show that checking our well-formedness conditions is faster and more scalable than explicitly checking operational equivalence (which currently seems to be the only alternative that attains the same level of expressiveness as our work).

In our benchmarks, we compare three approaches to check operational equivalence between a global type and its group of projected local types:

– mpstpp-seq (baseline): In this approach, the mpstpp tool is used to check our well-formedness conditions (which imply operational equivalence; Thm. 3), without using any form of parallel processing.

– mpstpp-par: In this approach, the mpstpp tool is likewise used to check our well-formedness conditions, but the local types are checked in parallel.

– explicit: In this approach, the global type and its group of projections are translated to mCRL2 specifications, and mCRL2 is used to explicitly check their operational equivalence (weak bisimilarity).


We identified six example protocols (details below) that can naturally be scaled in the number of roles N (e.g., the number of Clients in the Key-Value Store protocol). Using each of the three approaches, for each of the protocols, and for each value of N between the minimal number of roles Nmin (e.g., Nmin = 2 in the Key-Value Store protocol: the Server and one Client) and 16, we checked operational equivalence; varying N in this way yields insights not only into per-case performance but also into scalability. To get statistically reliable results [31], we repeated executions as many times as necessary until the 95% confidence interval was within 5% of our reported means (i.e., there is a 95% probability that the true mean is within 5% of our reported means).
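The stopping rule described above can be sketched as follows, assuming a normal approximation of the mean (z = 1.96 for a 95% interval); `run` is any function returning one measured execution time, and the parameter names are ours, not the paper's:

```python
import statistics

def mean_with_ci(run, min_reps=5, max_reps=10000, z=1.96, rel=0.05):
    # Repeat executions until the 95% confidence interval of the mean
    # is within 5% of the mean itself (or max_reps is reached).
    times = [run() for _ in range(min_reps)]
    while len(times) < max_reps:
        mean = statistics.mean(times)
        half_width = z * statistics.stdev(times) / len(times) ** 0.5
        if half_width <= rel * mean:
            break
        times.append(run())
    return statistics.mean(times)
```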

We ran our benchmarks on a machine with an Intel Xeon 6130 processor (16 cores; no hyper-threading), using Debian 9, Java 13, and mCRL2 201908.0.

Translation to mCRL2. In the explicit approach, we use mCRL2 [10,20,29] to explicitly check if global type G and its group of projections L are operationally equivalent. Our choice of mCRL2 is motivated by the fact that our languages of global and local types are based on the same process algebra as mCRL2's specification language, so their translation to mCRL2 specifications is direct and straightforward. Moreover, mCRL2 is mature (e.g., used in industry [5]), and it uses optimised, state-of-the-art algorithms to check behavioural equivalences (e.g., [28]), so we are comparing our tool with a serious competitor.

First, we translate global type G to an mCRL2 specification. Then, we use the mCRL2 tools mcrl22lps and lps2lts to normalise this specification to a linear process specification (LPS) and generate a corresponding labelled transition system (LTS). Because of the directness of the translation, the transition labels in the resulting LTS are all global communication actions of the form r1→r2:U.

Second, we translate the group of projections L, consisting of roles r1, ..., rn, to an mCRL2 specification. It looks as follows (in formal mCRL2 notation [29]):

$$\begin{split} & \nabla_{\{ r_i \to r_j : U \,\mid\, 1 \le i, j \le n,\ i \ne j,\ U \in \mathbb{U} \}} \\ & \quad \Gamma_{\{ (r_i r_j \,!\, U \;|\; r_i r_j \,?\, U) \to (r_i \to r_j : U) \,\mid\, 1 \le i, j \le n,\ i \ne j,\ U \in \mathbb{U} \}} \big( [\![\mathcal{L}(r_1)]\!] \parallel \dots \parallel [\![\mathcal{L}(r_n)]\!] \big) \end{split}$$

where each ⟦L(ri)⟧ is a direct translation of local type L(ri) to an mCRL2 specification; ‖ is a form of parallel composition that prescribes both interleaving and synchronisation of operand actions; | is synchronous composition of actions; Γ is the communication operator that replaces synchronised local send/receive actions rirj !U | rirj ?U with the global communication action ri→rj :U; and ∇ is the allow operator that allows only global communication actions to be executed (i.e., unsynchronised, individual send/receive actions cannot be executed).
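A toy rendering of the Γ and ∇ operators on explicit transition lists (one-directional for brevity, and not mCRL2's actual semantics): matching send/receive pairs are merged into a single global communication, and unsynchronised sends/receives are not allowed to execute on their own:

```python
def sync_compose(steps1, steps2):
    # steps: lists of (state, action, state') triples; actions are tuples
    # ('snd', r1, r2, U) or ('rcv', r1, r2, U). For brevity, only sends
    # in steps1 synchronise with receives in steps2.
    out = []
    for (p, a, p2) in steps1:
        for (q, b, q2) in steps2:
            if a[0] == 'snd' and b == ('rcv',) + a[1:]:
                # Gamma: the synchronised pair becomes one communication
                out.append(((p, q), ('comm',) + a[1:], (p2, q2)))
    # nabla: only merged communications survive; lone sends/receives
    # are disallowed rather than interleaved.
    return out
```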

When translating a local type L(ri) to an mCRL2 specification ⟦L(ri)⟧, to make mCRL2's subsequent verification easier, we already eliminate as many idling actions ε^r_{r1r2} as possible (modulo branching bisimulation); those that remain are represented as a general τ action, because mCRL2 does not need the additional information provided by ε^r_{r1r2}. Then, we use mcrl22lps and lps2lts to generate an LPS and LTS for L.

Third, we use the mCRL2 tool ltscompare to check if the LTS for G is weakly bisimilar to the LTS for L. We note that normalisation to an LPS using mcrl22lps is a prerequisite for using ltscompare.


The table on the right summarises the features used in each of these protocols.

For each 1 ≤ n ≤ 15, we instantiated the Key-Value Store, Load Balancer, Work Stealing, and Map/Reduce protocols with 1 Server/Master + n Clients/Workers.


For each 2 ≤ n ≤ 16, we instantiated the Peer-to-Peer protocol with n Peers. For each 2 ≤ n ≤ 7, we instantiated the Pub/Sub protocol with 1 Publisher and n Subscribers; we did not instantiate the Pub/Sub protocol with n > 7 Subscribers, as the resulting global types are too large (their size grows exponentially in n).

Fig. 11: Speedups (y-axis; y>1E+0 means faster, y<1E+0 means slower) of explicit relative to mpstpp-seq as the number of roles increases (x-axis)

Benchmark results. Figures 11–12 show the results of our benchmarks. The x-axis indicates the number of roles; the y-axis indicates relative speedups. The baseline is at y = 1E+0 (i.e., y = 1): above it, a competing approach is faster than mpstpp-seq; below it, it is slower. We draw two conclusions.

(1) For each protocol and number of roles, mpstpp-seq outperforms explicit. In the cases of Key-Value Store and Load Balancer, explicit grows towards mpstpp-seq, but the growth levels off as the number of roles increases, and explicit is still about two orders of magnitude slower than mpstpp-seq in the best of circumstances. In the cases of Work Stealing, Peer-to-Peer, and Pub/Sub, the LTSs generated from the translated mCRL2 specifications were too large to be compared (i.e., ltscompare produced an error) beyond 7, 5, and 5 roles, respectively; this was no issue for mpstpp-seq. In the case of Map/Reduce, the LTSs were small enough to compare using mCRL2's ltscompare, but after an initial upwards slope for 2 ≤ N ≤ 7 roles, explicit starts to perform progressively worse.

(2) Especially for larger numbers of roles, parallelisation can yield serious performance improvements. In the cases of Key-Value Store and Load Balancer, mpstpp-par outperforms mpstpp-seq only with 14–16 roles; for smaller numbers of roles, parallel execution is slower. In the worst case (Load Balancer, 2 roles), the slowdown is roughly 10.9 μs / 3.2 μs ≈ 3.4; we hypothesise that because of the low absolute execution times, the cost of spawning and synchronising threads outweighs their benefit. However, the ascending gradient indicates that as the number of roles increases, relatively more of the total work can be parallelised, yielding progressive rewards. In the cases of Work Stealing, Map/Reduce, Peer-to-Peer, and Pub/Sub, similar trends can be observed, except that y = 1 is crossed sooner: the absolute execution times for these protocols and for small numbers of roles are higher than for Key-Value Store and Load Balancer.

Fig. 12: Speedups (y-axis; y>1 means faster, y<1 means slower) of mpstpp-par relative to mpstpp-seq as the number of roles increases (x-axis)

# 5 Related Work

Multiparty compatibility. Closest to this paper is existing literature on multiparty compatibility [6,24,40,42]. The key idea, initially developed by Deniélou and Yoshida for the original MPST [23,24], is to represent (groups of) local types operationally as (systems of) communicating finite state machines (CFSMs) [8]. A CFSM M is a state machine whose transitions are labelled with sends/receives; a system of CFSMs S is a parallel composition in which CFSMs communicate through asynchronous buffers. Multiparty compatibility, then, is a condition on the reachable states and transitions of a system S = (M1, ..., Mn): if it is satisfied by S, the system is guaranteed to be safe (no deadlocks; no unmatched sends/receives) and live (S terminates, assuming at least one Mi can terminate). Multiparty compatibility is a sufficient condition to guarantee safety and liveness, but not a necessary one: there exist safe/live systems that are not multiparty compatible. Therefore, several generalisations have been proposed to cover timed behaviour [6], undirected choice [40], and non-synchronisability [42].

The main similarities between our method in this paper and the multiparty compatibility approach are: (1) we also use an operational interpretation of local types; (2) we guarantee similar liveness/safety properties; and (3) we also neatly factor out the act of checking conformance of processes to local types (resp. CFSMs). In contrast, we support a wider range of behaviours. Moreover, from a practical/computational perspective, multiparty compatibility is a global condition that needs to be checked on the whole state space of a system (i.e., the parallel composition of the CFSMs), which is prone to exponential blow-up; our well-formedness conditions, in contrast, are completely local and require only polynomial time to check. The reason we do not require CFSM-like machinery in this paper is that our operational correspondence (weak bisimilarity) is sensitive to termination: notably, in Fig. 5a, a group of local types terminates iff every individual local type terminates (for multiparty compatibility, proofs are done modulo trace equivalence [24], which cannot distinguish between successful/abnormal termination and is therefore in itself too weak to show deadlock-freedom).

Expressiveness of MPST. In the original MPST theory [33], and many of its descendants (e.g., [14,19,22,24,25,43]), the restrictions on choices are enforced through a combination of syntax and additional well-formedness conditions. Notably, in these works, communications in global types are specified as r1→r2:{ℓi · Gi}i∈I, so syntactically, it is impossible to specify choices among senders or receivers. There are also papers where a seemingly more general binary +-like operator is introduced, particularly those that support choices among receivers [16,23,36,40], but the well-formedness conditions still basically restrict the use of + in these works to r1→r2:{ℓi · Gi}i∈I or r→{ri :ℓi · Gi}i∈I.

This is the first paper in which well-formedness conditions do not force the use of + into one of those two restricted forms. Moreover, our well-formedness conditions are compatible with unbounded interleaving (recursion under parallel), going beyond similar operators in previous work [16,22,23,43]. An alternative approach is to completely omit statically checked well-formedness conditions (and projection), and to verify communication actions against global types only dynamically, through monitoring, as recently proposed [30]. The language of global types in that paper is more expressive than ours, but all verification happens at run-time, whereas we provide correctness guarantees already at compile-time.

Session types and model checking. Recently, there has been growing interest in using model checking to verify properties of (multiparty) session types, similar to our use of mCRL2 as an alternative to checking well-formedness (Sect. 4.2). Lange et al. [39] infer behavioural types from Go programs and use mCRL2 to verify the inferred types, to establish safety properties (combined with another tool, KITTeL [26], to establish liveness). Hu and Yoshida [36] use a custom model checker to verify safety and progress properties of local types (represented as CFSMs) as part of API generation in the Scribble toolchain for MPST [35].

Closest to our use of mCRL2 is the work of Scalas et al. [52,53], where mCRL2 is used to verify properties of local types (e.g., deadlock-freedom), while a form of dependent type-checking is used to verify conformance of processes against those types (i.e., actors in Scala); no global types or projection are used, though (programmers write local types manually). The idea is that properties model-checked on the types carry over to the processes. Similarly, Scalas and Yoshida [51] use mCRL2 to model-check session environments, as a more expressive alternative to the classical consistency condition needed to prove subject reduction. Note that [51, Theorem 5.15] shows that, in the case that a set of processes is typable by a single multiparty session (i.e., a single global type), type-level properties including safety, deadlock-freedom, and liveness guarantee the same properties for multiparty session π-processes. Hence our type-level analysis is directly usable to provide decidable procedures to verify session π-calculi with extended expressiveness [51, Theorem 7.2].

# 6 Conclusion

A key open problem with multiparty session types (MPST) concerns expressiveness: none of the previous languages of global and local types supports arbitrary choice (e.g., choices between different senders), existential quantification over roles, and unbounded interleaving of subprotocols (in the same session). In this paper, we presented the first theory that supports these features. Our main theoretical result is operational equivalence under weak bisimilarity: this guarantees classical MPST properties for groups of local types projected from a global type, namely deadlock-freedom and absence of protocol violations. Our main practical result is that our well-formedness conditions, which guarantee operational equivalence, can be checked orders of magnitude faster than weak bisimilarity can be checked directly, as demonstrated by our benchmark results.

We identify several interesting avenues for future work. First, it is useful to extend our theory with parametrisation along the lines of Castro et al. [18] (which currently works only for restrictive choices); their proof technique for correctness seems to offer substantial synergy with our bisimilarity-based approach in this paper. Second, we aim to investigate extensions of our theory with subtyping (e.g., in terms of weak similarity). Notably, while asynchronous communication can be encoded in our current theory, asynchronous subtyping is known to be undecidable [9,41], so the connection between the two is interesting to explore.

Acknowledgments. Funded by the Netherlands Organisation for Scientific Research (NWO): 016.Veni.192.103. This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. Supported by EPSRC projects EP/K034413/1, EP/K011715/1, EP/L00058X/1, EP/N027833/1, EP/N028201/1, EP/T006544/1.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Verifying Visibility-Based Weak Consistency**

Siddharth Krishna<sup>1</sup>, Michael Emmi<sup>2</sup>, Constantin Enea<sup>3</sup>, and Dejan Jovanović<sup>2</sup>

<sup>1</sup> New York University, New York, NY, USA, siddharth@cs.nyu.edu

<sup>2</sup> SRI International, New York, NY, USA, michael.emmi@gmail.com,

dejan.jovanovic@sri.com

<sup>3</sup> Université de Paris, IRIF, CNRS, F-75013 Paris, France, cenea@irif.fr

**Abstract.** Multithreaded programs generally leverage efficient and thread-safe *concurrent objects* like sets, key-value maps, and queues. While some concurrent-object operations are designed to behave atomically, each witnessing the atomic effects of predecessors in a linearization order, others forego such strong consistency to avoid complex control and synchronization bottlenecks. For example, contains (value) methods of key-value maps may iterate through key-value entries without blocking concurrent updates, to avoid unwanted performance bottlenecks, and consequently overlook the effects of some linearization-order predecessors. While such *weakly-consistent* operations may not be atomic, they still offer guarantees, e.g., only observing values that have been present.

In this work we develop a methodology for proving that concurrent object implementations adhere to weak-consistency specifications. In particular, we consider (forward) simulation-based proofs of implementations against *relaxed-visibility specifications*, which allow designated operations to overlook some of their linearization-order predecessors, i.e., behaving as if they never occurred. Besides annotating implementation code to identify *linearization points*, i.e., points at which operations' logical effects occur, we also annotate code to identify *visible operations*, i.e., operations whose effects are observed; in practice this annotation can be done automatically by tracking the writers to each accessed memory location. We formalize our methodology over a general notion of transition systems, agnostic to any particular programming language or memory model, and demonstrate its application, using automated theorem provers, by verifying models of Java concurrent object implementations.

# **1 Introduction**

Programming efficient multithreaded programs generally involves carefully organizing shared memory accesses to facilitate inter-thread communication while avoiding synchronization bottlenecks. Modern software platforms like Java include reusable abstractions which encapsulate low-level shared memory accesses and synchronization into familiar high-level abstract data types (ADTs). These so-called *concurrent objects* typically include mutual-exclusion primitives like locks, numeric data types like atomic integers, as well as collections like sets, key-value maps, and queues; Java's standard-edition platform contains many implementations of each. Such objects typically provide strong consistency guarantees like *linearizability* [18], ensuring that each operation appears to happen atomically, witnessing the atomic effects of predecessors according to some linearization order among concurrently-executing operations.

While such strong consistency guarantees are ideal for logical reasoning about programs which use concurrent objects, these guarantees are too strong for many operations, since they preclude simple and/or efficient implementation — over half of Java's concurrent collection methods forego atomicity for *weak consistency* [13]. On the one hand, basic operations like the get and put methods of key-value maps typically admit relatively-simple atomic implementations, since their behaviors essentially depend upon individual memory cells, e.g., where the relevant key-value mapping is stored. On the other hand, making aggregate operations like size and contains (value) atomic would impose synchronization bottlenecks, or otherwise-complex control structures, since their atomic behavior depends simultaneously upon the values stored across many memory cells. Interestingly, such implementations are not linearizable even when their underlying memory operations are sequentially consistent, e.g., as is the case with Java 8's concurrent collections, whose memory accesses are data-race free.<sup>4</sup>

For instance, the contains (value) method of Java's concurrent hash map iterates through key-value entries without blocking concurrent updates in order to avoid unreasonable performance bottlenecks. Consequently, in a given execution, a contains-value-v operation o1 will overlook operation o2's concurrent insertion of k1 ↦ v for a key k1 it has already traversed. This oversight makes it possible for o1 to conclude that value v is not present, and can only be explained by o1 being linearized before o2. In the case that operation o3 removes k2 ↦ v concurrently, before o1 reaches key k2 but only after o2 completes, atomicity is violated, since in every possible linearization, either mapping k2 ↦ v or k1 ↦ v is always present. Nevertheless, such weakly-consistent operations still offer guarantees, e.g., that values never present are never observed, and initially-present values not removed are observed.
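The interleaving described above can be replayed deterministically on a toy two-bucket map; the generator stands in for o1's non-blocking traversal, and the variable names are ours:

```python
# Replay of the interleaving: o1 scans bucket-by-bucket; o2 inserts
# k1 -> v after o1 has passed k1; o3 removes k2 -> v before o1 reaches
# k2. Thus o1 reports v absent although v was present throughout.
m = {'k1': None, 'k2': 'v'}

def contains_value_scan(m, v, keys):
    for k in keys:                # o1 traverses entries one at a time,
        yield m.get(k) == v       # without blocking concurrent updates

scan = contains_value_scan(m, 'v', ['k1', 'k2'])
seen_at_k1 = next(scan)           # o1 inspects k1: no 'v' there yet
m['k1'] = 'v'                     # o2: put k1 -> v (o1 already passed k1)
del m['k2']                       # o3: rem k2 (before o1 reaches k2)
seen_at_k2 = next(scan)           # o1 inspects k2: 'v' already removed
present_somewhere = seen_at_k1 or seen_at_k2   # o1's verdict: absent
```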

In this work we develop a methodology for proving that concurrent-object implementations adhere to the guarantees prescribed by their weak-consistency specifications. The key salient aspects of our approach are the lifting of existing sequential ADT specifications via *visibility relaxation* [13], and the harnessing of simple and mechanizable reasoning based on *forward simulation* [25] by relaxed-visibility ADTs. Effectively, our methodology extends the predominant forward-simulation based linearizability-proof methodology to concurrent objects with weakly-consistent operations, and enables automation for proving weak-consistency guarantees.

To enable the harnessing of existing sequential ADT specifications, we adopt the recent methodology of *visibility relaxation* [13]. As in linearizability [18], the return value of each operation is dictated by the atomic effects of its predecessors in some (i.e., existentially quantified) linearization order. To allow consistency weakening, operations are allowed, to a certain extent, to overlook some of their linearization-order predecessors, behaving as if they had not occurred. Intuitively, this (also existentially quantified) *visibility* captures the inability or unwillingness to atomically observe the values stored across many memory cells. To provide guarantees, the extent of

<sup>4</sup> Java 8 implementations guarantee data-race freedom by accessing individual shared-memory cells with atomic operations via volatile variables and compare-and-swap instructions. Starting with Java 9, the implementations of the concurrent collections use the VarHandle mechanism to specify shared variable access modes. Java's official language and API specifications do not clarify whether these relaxations introduce data races.

visibility relaxation is bounded to varying degrees. Notably, the visibility of an *absolute* operation must include all of its linearization-order predecessors, while the visibility of a *monotonic* operation must include all happens-before predecessors, along with all operations visible to them. The majority of Java's concurrent collection methods are absolute or monotonic [13]. For instance, in the contains-value example described above, by considering that operation o2 is not visible to o1, the conclusion that v is not present can be justified by the linearization o2; o3; o1, in which o1 sees o3's removal of k2 ↦ v yet not o2's insertion of k1 ↦ v. Ascribing the monotonic visibility to the contains-value method amounts to a guarantee that initially-present values are observed unless removed (i.e., concurrently).

While relaxed-visibility specifications provide a means of describing the guarantees provided by weakly-consistent concurrent-object operations, systematically establishing implementations' adherence requires a strategy for demonstrating *simulation* [25], i.e., that each step of the implementation is simulated by some step of (an operational representation of) the specification. The crux of our contribution is thus threefold: first, to identify the relevant specification-level actions with which to relate implementation-level transitions; second, to identify implementation-level annotations relating transitions to specification-level actions; and third, to develop strategies for devising such annotations systematically. For instance, the existing methodology based on *linearization points* [18] essentially amounts to annotating implementation-level transitions with the points at which their specification-level action, i.e., their atomic effect, occurs. Relaxed-visibility specifications require not only a witness for the existentially-quantified linearization order, but also an existentially-quantified visibility relation, and thus require a second kind of annotation to resolve operations' visibilities. We propose a notion of *visibility actions* which enable operations to declare their visibility of others, e.g., specifying the writers of memory cells they have read.

The remainder of our approach amounts to devising a systematic means of constructing simulation proofs to enable automated verification. Essentially, we identify a strategy for systematically annotating implementations with visibility actions, given linearization-point annotations and visibility bounds (i.e., absolute or monotonic), and then encode the corresponding simulation check using an off-the-shelf verification tool. For the latter, we leverage civl [16], a language and verifier for Owicki-Gries style modular proofs of concurrent programs with arbitrarily-many threads. In principle, since our approach reduces simulation to safety verification, any safety verifier could be used, though civl facilitates reasoning for multithreaded programs by capturing interference at arbitrary program points. Using civl, we have verified monotonicity of the contains-value and size methods of Java's concurrent hash-map and concurrent linked-queue, respectively — and absolute consistency of add and remove operations. Although our models are written in civl and assume sequentially-consistent memory accesses, they capture the difficult aspects of weak-consistency in Java, including heap-based memory access; furthermore, our models are also sound with respect to Java 8's memory model, since their Java 8 implementations guarantee data-race freedom.

In summary, we present the first methodology for verifying weakly-consistent operations using sequential specifications and forward simulation. Contributions include:


Aside from the outline above, this article summarizes an existing weak-consistency specification methodology via visibility relaxation (§2), summarizes related work (§6), and concludes (§7). Proofs of all theorems and lemmas are listed in Appendix A.

# **2 Weak Consistency**

Our methodology for verifying weakly-consistent concurrent objects relies both on the precise characterization of weak-consistency specifications and on a proof technique for establishing adherence to specifications. In this section we recall and outline a characterization called *visibility relaxation* [13], an extension of sequential abstract data type (ADT) specifications in which the return values of some operations may not reflect the effects of previously-effectuated operations.

Notationally, in the remainder of this article, ε denotes the empty sequence, ∅ denotes the empty set, \_ denotes an unused binding, and ⊤ and ⊥ denote the Boolean values true and false, respectively. We write R(x) to denote the inclusion x ∈ R of a tuple x in the relation R; R[x ↦ y] to denote the extension R ∪ {xy} of R to include xy; R|X to denote the projection R ∩ X* of R to set X; R̄ to denote the complement {x : x ∉ R} of R; R(x) to denote the image {y : xy ∈ R} of R on x; and R⁻¹(y) to denote the pre-image {x : xy ∈ R} of R on y; whether R(x) refers to inclusion or an image will be clear from its context. Finally, we write xi to refer to the ith element of tuple x = x0x1 ....
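For binary relations represented as sets of pairs, these operations can be transcribed directly; a sketch, with names of our own choosing:

```python
def image(R, x):
    # R(x) as an image: {y : xy in R}
    return {y for (a, y) in R if a == x}

def preimage(R, y):
    # R^-1(y): {x : xy in R}
    return {x for (x, b) in R if b == y}

def extend(R, x, y):
    # R[x -> y]: the extension R union {xy}
    return R | {(x, y)}

def restrict(R, X):
    # R|X: the projection of R to tuples over the set X
    return {t for t in R if all(e in X for e in t)}
```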

#### **2.1 Weak-Visibility Specifications**

For a general notion of ADT specifications, we consider fixed sets M and X of method names and argument or return values, respectively. An *operation label* λ = ⟨m, x, y⟩ is a method name m ∈ M along with argument and return values x, y ∈ X. A *read-only predicate* is a unary relation R(λ) on operation labels, an *operation sequence* s = λ₀λ₁… is a sequence of operation labels, and a *sequential specification* S = {s₀, s₁, …} is a set of operation sequences. We say that R is *compatible* with S when S is closed under deletion of read-only operations, i.e., λ₀…λⱼ₋₁λⱼ₊₁…λᵢ ∈ S when λ₀…λᵢ ∈ S and R(λⱼ).

*Example 1.* The *key-value map* ADT sequential specification Sₘ is the prefix-closed set containing all sequences λ₀…λᵢ such that λᵢ is either:


**–** ⟨has, v, b⟩, and b = ⊤ iff no prior ⟨put, kv′, \_⟩ nor ⟨rem, k, \_⟩ follows some prior ⟨put, kv, \_⟩.

The read-only predicate Rₘ holds for the following cases:

Rₘ(⟨put, \_, b⟩) if ¬b &nbsp;&nbsp; Rₘ(⟨rem, \_, b⟩) if ¬b &nbsp;&nbsp; Rₘ(⟨get, \_, \_⟩) &nbsp;&nbsp; Rₘ(⟨has, \_, \_⟩).

This is a simplification of Java's Map ADT, i.e., with fewer methods.<sup>5</sup>
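Since the bulleted case list of Example 1 is abridged above, the following Python sketch only illustrates the shape of such a specification, under assumed conventional map semantics (put returns whether the mapping changed, rem whether the key was present, get the mapped value, has whether the value occurs in the map); it is a membership check for a prefix-closed sequence set, not the paper's exact Sₘ.

```python
# Sketch: replay an operation-label sequence against a dictionary to decide
# membership in a key-value map sequential specification. The per-method
# return-value semantics here are assumptions, not the paper's exact cases.

def in_spec(seq):
    """seq: list of (method, args, ret) operation labels."""
    m = {}
    for method, args, ret in seq:
        if method == "put":
            k, v = args
            ok = (ret == (m.get(k) != v))  # returns whether the map changed
            m[k] = v
        elif method == "rem":
            (k,) = args
            ok = (ret == (k in m))         # returns whether the key existed
            m.pop(k, None)
        elif method == "get":
            (k,) = args
            ok = (ret == m.get(k))         # returns the mapped value, if any
        elif method == "has":
            (v,) = args
            ok = (ret == (v in m.values()))  # contains-value
        else:
            return False
        if not ok:
            return False
    return True

def read_only(label):
    """R_m: put/rem are read-only when they return false; get/has always."""
    method, _, ret = label
    return method in ("get", "has") or ret is False
```

Under these assumed semantics, a put or rem returning ⊥ has no effect on the map, matching the compatibility requirement that read-only operations can be deleted without changing admissibility.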

To derive weak specifications from sequential ones, we consider a set V of exactly two *visibility* labels from prior work [13]: *absolute* and *monotonic*.<sup>6</sup> A *visibility annotation* V : M → V maps each method m ∈ M to a visibility V(m) ∈ V.

Intuitively, absolute visibility requires operations to observe the effects of all of their linearization-order predecessors. The weaker monotonic visibility requires operations to observe the effects of all their happens-before (i.e., program- and synchronization-order) predecessors, along with the effects already observed by those predecessors, so that the sets of visible effects increase monotonically over happens-before chains of operations; conversely, operations may ignore effects which have been ignored by their happens-before predecessors, so long as those effects are not transitively related by program and synchronization order.

**Definition 1.** *A* weak-visibility specification W = ⟨S, R, V⟩ *is a sequential specification* S *with a compatible read-only predicate* R *and a visibility annotation* V*.*

*Example 2.* The *weakly-consistent contains-value map* Wₘ = ⟨Sₘ, Rₘ, Vₘ⟩ annotates the key-value map ADT methods of Sₘ from Example 1 with:

Vₘ(put) = Vₘ(rem) = Vₘ(get) = absolute, &nbsp;&nbsp; Vₘ(has) = monotonic.

Java's concurrent hash map appears to be consistent with this specification [13].

We ascribe semantics to specifications by characterizing the values returned by concurrent method invocations, given constraints on invocation order. In practice, the *happens-before* order among invocations is determined by a *program order*, i.e., among invocations of the same thread, and a *synchronization order*, i.e., among invocations of distinct threads accessing the same atomic objects, e.g., locks. A *history* h = ⟨O, *inv*, *ret*, *hb*⟩ is a set O ⊆ ℕ of numeric operation identifiers, along with an invocation function *inv* : O → M × X mapping operation identifiers to method names and argument values, a partial return function *ret* : O ⇀ X mapping operation identifiers to return values, and a (strict) partial happens-before relation *hb* ⊆ O × O; the *empty history* h∅ has O = *inv* = *ret* = *hb* = ∅. An operation o ∈ O is *complete* when *ret*(o) is defined, and is otherwise *incomplete*; then h is *complete* when each operation is. The *label* of a complete operation o with *inv*(o) = ⟨m, x⟩ and *ret*(o) = y is ⟨m, x, y⟩.

To relate operations' return values in a given history back to sequential specifications, we consider certain sequencings of those operations. A *linearization* of a history h = ⟨O, \_, \_, *hb*⟩ is a total order *lin* over O which includes *hb*, and a *visibility*

<sup>5</sup> For brevity, we abbreviate Java's remove and contains-value methods by rem and has.

<sup>6</sup> Previous work refers to absolute visibility as *complete*, and includes additional visibility labels.

*projection vis* of *lin* maps each operation o ∈ O to a subset *vis*(o) ⊆ *lin*⁻¹(o) of the operations preceding o in *lin*; note that ⟨o₁, o₂⟩ ∈ *vis* means o₁ observes o₂. For a given read-only predicate R, we say o's visibility is *monotonic* when it includes every happens-before predecessor, and every operation visible to a happens-before predecessor, which is not read-only,<sup>7</sup> i.e., *vis*(o) ⊇ (*hb*⁻¹(o) ∪ *vis*(*hb*⁻¹(o))) | R̄. We say o's visibility is *absolute* when *vis*(o) = *lin*⁻¹(o), and *vis* is itself *absolute* when each *vis*(o) is. An *abstract execution* e = ⟨h, *lin*, *vis*⟩ is a history h along with a linearization of h, and a visibility projection *vis* of *lin*. An abstract execution is *sequential* when *hb* is total, *complete* when h is, and *absolute* when *vis* is.
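On finite executions, the two visibility conditions above can be checked directly; the sketch below is one way to phrase them in Python, with *hb* and *vis* given as predecessor-set maps (the helper names are ours).

```python
# Sketch of the absolute and monotonic visibility conditions on a finite
# execution. lin is a list (the total linearization order); hb and vis map
# each operation to the set of its predecessors; readonly is a set of ops.

def absolute_vis(o, lin, vis):
    # vis(o) must be exactly the set of linearization predecessors of o
    return vis[o] == set(lin[:lin.index(o)])

def monotonic_vis(o, hb, vis, readonly):
    # vis(o) must include hb-predecessors and their visibilities,
    # excepting read-only operations
    required = set(hb[o])
    for p in hb[o]:
        required |= vis[p]
    return (required - readonly) <= vis[o]
```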

*Example 3.* An abstract execution can be defined using the linearization<sup>8</sup>

⟨put, 11, ⊤⟩ ⟨get, 1, 1⟩ ⟨put, 01, ⊤⟩ ⟨put, 10, ⊥⟩ ⟨has, 1, ⊥⟩

along with a happens-before order that, compared to the linearization order, leaves ⟨has, 1, ⊥⟩ unordered w.r.t. ⟨put, 01, ⊤⟩ and ⟨put, 10, ⊥⟩, and a visibility projection in which the visibility of every put and get includes all of its linearization predecessors, while the visibility of ⟨has, 1, ⊥⟩ consists of ⟨put, 11, ⊤⟩ and ⟨put, 10, ⊥⟩. Recall that in the argument kv of put operations, the key k precedes the value v.

To determine the consistency of individual histories against weak-visibility specifications, we consider adherence of their corresponding abstract executions. Let h = ⟨O, *inv*, *ret*, *hb*⟩ be a history and e = ⟨h, *lin*, *vis*⟩ a complete abstract execution. Then e is *consistent* with a visibility annotation V and read-only predicate R if for each operation o ∈ dom(*lin*) with *inv*(o) = ⟨m, \_⟩, *vis*(o) is absolute or monotonic, respectively, according to V(m) and R. The *labeling* λ₀λ₁… of a total order o₀ ≺ o₁ ≺ … of complete operations is the sequence of their operation labels, i.e., λᵢ is the label of oᵢ. Then e is *consistent* with a sequential specification S when the labeling<sup>9</sup> of *lin* | (*vis*(o) ∪ {o}) is included in S, for each operation o ∈ dom(*lin*).<sup>10</sup> Finally, we say e is *consistent* with a weak-visibility specification ⟨S, R, V⟩ when it is consistent with S, R, and V.

*Example 4.* The execution in Example 3 is consistent with the weakly-consistent contains-value map Wₘ defined in Example 2.
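The per-operation check in this definition, restricting the linearization to vis(o) ∪ {o}, taking its labeling, and testing membership in S, is mechanical on finite executions. A sketch, parameterized by an arbitrary spec-membership predicate and illustrated on a toy read/write register rather than the map of Example 1:

```python
# Sketch: check an abstract execution against a sequential specification.
# For each operation o, the labeling of the linearization restricted to
# vis(o) ∪ {o} must be admitted by the specification.

def consistent_with_spec(lin, labels, vis, in_spec):
    for o in lin:
        window = set(vis[o]) | {o}
        if not in_spec([labels[p] for p in lin if p in window]):
            return False
    return True

def reg_spec(seq):
    """Toy register spec: each read returns the last written value."""
    cur = None
    for method, arg, ret in seq:
        if method == "write":
            cur = arg
        elif method == "read" and ret != cur:
            return False
    return True
```

With a weakened visibility that omits a later write, a stale read can still pass the check; with absolute visibility the same stale read fails, mirroring the discussion of visibility weakening below.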

*Remark 1.* Consistency models suited for modern software platforms like Java are based on *happens-before* relations which abstract away from *real-time* execution order. Since happens-before, unlike real-time order, is not necessarily an *interval order*, the composition

<sup>7</sup> For convenience we rephrase Emmi and Enea [13]'s notion to ignore read-only predecessors.

<sup>8</sup> For readability, we list linearization sequences with operation labels in place of identifers.

<sup>9</sup> As is standard, adequate labelings of incomplete executions are obtained by completing each linearized yet pending operation with some arbitrarily-chosen return value [18]. It is sufficient that one of these completions be included in the sequential specification.

<sup>10</sup> We consider a simplification from prior work [13]: rather than allowing the observers of a given operation to pretend they see distinct return values, we suppose that all observers agree on return values. While this is more restrictive in principle, it is equivalent for the simple specifications studied in this article.

of linearizations of two distinct objects in the same execution may be cyclic, i.e., not linearizable. Recovering compositionality in this setting is orthogonal to our work of proving consistency against a given model, and is explored elsewhere [11].

The *abstract executions* E(W) of a weak-visibility specification W = ⟨S, R, V⟩ include the complete, sequential, and absolute abstract executions derived from sequences of S, i.e., when s = λ₀…λₙ ∈ S then E(W) includes the execution eₛ, which labels each oᵢ by λᵢ and orders *hb*(oᵢ, oⱼ) iff i < j. In addition, when E(W) includes an abstract execution ⟨h, *lin*, *vis*⟩ with h = ⟨O, *inv*, *ret*, *hb*⟩, then E(W) also includes any:


Note that while *happens-before weakening* hb′ ⊆ hb always yields consistent executions, unguarded *visibility weakening* vis′ ⊆ vis generally breaks consistency with visibility annotations and sequential specifications: visibilities can become non-monotonic, and return values can change when operations observe fewer operations' effects.

**Lemma 1.** *The abstract executions* E(W) *of a specification* W *are consistent with* W*.*

*Example 5.* The abstract executions of Wₘ include the complete, sequential, and absolute abstract execution defined by the following happens-before order

⟨put, 11, ⊤⟩ ⟨get, 1, 1⟩ ⟨put, 01, ⊤⟩ ⟨put, 10, ⊥⟩ ⟨has, 1, ⊤⟩

which implies that it also includes one in which just the happens-before order is modified, so that ⟨has, 1, ⊤⟩ becomes unordered w.r.t. ⟨put, 01, ⊤⟩ and ⟨put, 10, ⊥⟩. Since it includes the latter, it also includes the execution of Example 3, in which the visibility of has is weakened, which also changes its return value from ⊤ to ⊥.

**Definition 2.** *The* histories *of a weak-visibility specification* W *are the projections* H(W) = {h : ⟨h, *\_*, *\_*⟩ ∈ E(W)} *of its abstract executions.*

#### **2.2 Consistency against Weak-Visibility Specifications**

To define the consistency of implementations against specifications, we leverage a general model of computation to capture the behavior of typical concurrent systems, e.g., including multiprocess and multithreaded systems. A *sequence-labeled transition system* ⟨Q, A, q, →⟩ is a set Q of states, along with a set A of actions, an initial state q ∈ Q, and a transition relation → ⊆ Q × A\* × Q. An *execution* is an alternating sequence η = q₀ā₀q₁ā₁…qₙ of states and action sequences starting with q₀ = q, such that qᵢ −āᵢ→ qᵢ₊₁ for each 0 ≤ i < n. The *trace* τ ∈ A\* of the execution η is its projection ā₀ā₁… to the individual actions.

To capture the histories admitted by a given implementation, we consider sequence-labeled transition systems (SLTSs) which expose actions corresponding to method call, return, and happens-before constraints. We refer to the actions call(o, m, x), ret(o, y), and hb(o, o′), for o, o′ ∈ ℕ, m ∈ M, and x, y ∈ X, as *the history actions*, and a *history transition system* is an SLTS whose actions include the history actions. We say that an

action over an operation identifier o is an o*-action*, and assume that executions are *well formed* in the sense that, for a given operation identifier o: at most one call o-action occurs, at most one ret o-action occurs, and no ret nor hb o-actions occur prior to a call o-action. Furthermore, we assume call o-actions are enabled so long as no prior call o-action has occurred. The *history* of a trace τ is defined inductively by fₕ(h∅, τ), where h∅ is the empty history, and,

$$\begin{array}{ll@{\qquad}ll}
f_{\mathrm{h}}(h,\varepsilon) &= h & g_{\mathrm{h}}(h,\mathrm{call}(o,m,x)) &= \langle O \cup \{o\}, \mathit{inv}[o \mapsto \langle m,x\rangle], \mathit{ret}, \mathit{hb}\rangle \\
f_{\mathrm{h}}(h,a\tau) &= f_{\mathrm{h}}(g_{\mathrm{h}}(h,a), \tau) & g_{\mathrm{h}}(h,\mathrm{ret}(o,y)) &= \langle O, \mathit{inv}, \mathit{ret}[o \mapsto y], \mathit{hb}\rangle \\
f_{\mathrm{h}}(h,\tilde{a}\tau) &= f_{\mathrm{h}}(h,\tau) & g_{\mathrm{h}}(h,\mathrm{hb}(o,o')) &= \langle O, \mathit{inv}, \mathit{ret}, \mathit{hb} \cup \{\langle o, o'\rangle\}\rangle
\end{array}$$

where h = ⟨O, *inv*, *ret*, *hb*⟩, a is a call, ret, or hb action, and ã is not. An *implementation* I is a history transition system, and the *histories* H(I) of I are those of its traces. Finally, we define consistency against specifications via history containment.
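Concretely, the fold fₕ above can be transcribed as follows (a sketch; traces are lists of action tuples, and the tuple shapes are ours):

```python
# Sketch of the trace-to-history function f_h: fold the history actions
# of a trace into a history, ignoring all other actions.

def history(trace):
    O, inv, ret, hb = set(), {}, {}, set()
    for act in trace:
        kind, *args = act
        if kind == "call":
            o, m, x = args
            O.add(o)
            inv[o] = (m, x)
        elif kind == "ret":
            o, y = args
            ret[o] = y
        elif kind == "hb":
            o1, o2 = args
            hb.add((o1, o2))
        # non-history actions are skipped, matching f_h(h, ãτ) = f_h(h, τ)
    return O, inv, ret, hb
```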

**Definition 3.** *Implementation* I *is* consistent *with specification* W *iff* H(I) ⊆ H(W)*.*

# **3 Establishing Consistency with Forward Simulation**

To obtain a consistency proof strategy, we relate implementations more closely to specifications via their admitted abstract executions. To capture the abstract executions admitted by a given implementation, we consider SLTSs which expose not only history-related actions, but also actions witnessing linearization and visibility. We refer to the actions lin(o) and vis(o, o′) for o, o′ ∈ ℕ, along with the history actions, as *the abstract-execution actions*, and an *abstract-execution transition system* (AETS) is an SLTS whose actions include the abstract-execution actions. Extending the corresponding notion from history transition systems, we assume that executions are *well formed* in the sense that, for a given operation identifier o: at most one lin o-action occurs, and no lin or vis o-actions occur prior to a call o-action. The *abstract execution* of a trace τ is defined inductively by fₑ(e∅, τ), where e∅ = ⟨h∅, ∅, ∅⟩ is the empty execution, and,

$$\begin{array}{ll@{\qquad}ll}
f_{\mathrm{e}}(e,\varepsilon) &= e & g_{\mathrm{e}}(e,\hat{a}) &= \langle g_{\mathrm{h}}(h,\hat{a}), \mathit{lin}, \mathit{vis}\rangle \\
f_{\mathrm{e}}(e,a\tau) &= f_{\mathrm{e}}(g_{\mathrm{e}}(e,a),\tau) & g_{\mathrm{e}}(e,\mathrm{lin}(o)) &= \langle h, \mathit{lin} \cup \{\langle o',o\rangle : o' \in \mathit{lin}\}, \mathit{vis}\rangle \\
f_{\mathrm{e}}(e,\tilde{a}\tau) &= f_{\mathrm{e}}(e,\tau) & g_{\mathrm{e}}(e,\mathrm{vis}(o,o')) &= \langle h, \mathit{lin}, \mathit{vis} \cup \{\langle o,o'\rangle\}\rangle
\end{array}$$

where e = ⟨h, *lin*, *vis*⟩, a is a call, ret, hb, lin, or vis action, ã is not, and â is a call, ret, or hb action. A *witnessing implementation* I is an abstract-execution transition system, and the *abstract executions* E(I) of I are those of its traces.

We adopt forward simulation [25] for proving consistency against weak-visibility specifications. Formally, a *simulation relation* from one system Σ₁ = ⟨Q₁, A₁, χ₁, →₁⟩ to another Σ₂ = ⟨Q₂, A₂, χ₂, →₂⟩ is a binary relation R ⊆ Q₁ × Q₂ such that the initial states are related, R(χ₁, χ₂), and: for any pair of related states R(q₁, q₂) and source-system transition q₁ −ā₁→₁ q₁′, there exists a target-system transition q₂ −ā₂→₂ q₂′ to related states, i.e., R(q₁′, q₂′), over common actions, i.e., (ā₁ | A₂) = (ā₂ | A₁). We say Σ₂ *simulates* Σ₁, and write Σ₁ ≼ Σ₂, when a simulation relation from Σ₁ to Σ₂ exists.
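For intuition only, and not a substitute for the program-level proofs developed later, a greatest-fixpoint check of this definition for small finite systems with single-action transition labels over a shared alphabet might look like:

```python
# Sketch: brute-force forward-simulation check for small finite transition
# systems. A system is a triple (states, initial, transitions), where
# transitions is a set of (q, a, q') triples with single-action labels.

def simulates(spec, impl):
    """Return True if `spec` simulates `impl` (impl ≼ spec)."""
    Qi, qi0, Ti = impl
    Qs, qs0, Ts = spec
    # start from the full relation and prune pairs that cannot be simulated
    R = {(p, q) for p in Qi for q in Qs}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(R):
            for (p1, a, p2) in Ti:
                if p1 != p:
                    continue
                # every impl move must be matched by a spec move to a
                # related state, over the same action
                if not any(q1 == q and b == a and (p2, q2) in R
                           for (q1, b, q2) in Ts):
                    R.discard((p, q))
                    changed = True
                    break
    return (qi0, qs0) in R
```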

We derive transition systems to model consistency specifications in simulation. The following lemma establishes the soundness and completeness of this substitution, and the subsequent theorem asserts the soundness of the simulation-based proof strategy.

**Definition 4.** *The* transition system ⟦W⟧ₛ *of a weak-visibility specification* W *is the AETS whose actions are the abstract-execution actions, whose states are abstract executions, whose initial state is the empty execution, and whose transitions include* e₁ −a→ e₂ *iff* fₑ(e₁, a) = e₂ *and* e₂ *is consistent with* W*.*

**Lemma 2.** *A weak-visibility specification and its transition system have identical histories.*

**Theorem 1.** *A witnessing implementation* I *is consistent with a weak-visibility specification* W *if the transition system* ⟦W⟧ₛ *of* W *simulates* I*.*

Our notion of simulation is in some sense *complete* when the sequential specification S of a weak-visibility specification W = ⟨S, R, V⟩ is *return-value deterministic*, i.e., for any method m, argument value x, and admitted sequence λ ∈ S, there is at most one return value y such that λ · ⟨m, x, y⟩ ∈ S. In particular, ⟦W⟧ₛ simulates any witnessing implementation I whose abstract executions E(I) are included in E(⟦W⟧ₛ).<sup>11</sup> This completeness, however, extends only to inclusion of abstract executions, and not all the way to consistency, since consistency is defined on histories, and a given operation's return value is not completely determined by the other operation labels and the happens-before relation of a given history: return values generally depend on linearization order and visibility as well. Nevertheless, sequential specifications typically are return-value deterministic, and we have used simulation to prove consistency of Java-inspired weakly-consistent objects.

Establishing simulation for an implementation is also helpful when reasoning about clients of a concurrent object. One can use the specifcation in place of the implementation and encode the client invariants using the abstract execution of the specifcation in order to prove client properties, following Sergey et al.'s approach [35].

#### **3.1 Reducing Consistency to Safety Verification**

Proving simulation between an implementation and its specification can generally be achieved via product construction: complete the transition system of the specification, replacing non-enabled transitions with error-state transitions; then ensure that the synchronized product of the implementation and completed-specification transition systems is *safe*, i.e., that no error state is reachable. Assuming that the individual transition systems are safe, the product system is safe *if* the specification simulates the implementation. This reduction to safety verification is also generally applicable to implementation and specification programs, though we limit our formalization to their underlying transition systems for simplicity. By the upcoming Corollary 1, such reductions enable consistency verification with existing safety-verification tools.

#### **3.2 Verifying Implementations**

While Theorem 1 establishes forward simulation as a strategy for proving the consistency of implementations against weak-visibility specifications, its application to

<sup>11</sup> This is a consequence of a generic result stating that the set of traces of an LTS A<sup>1</sup> is included in the set of traces of an LTS A<sup>2</sup> if A<sup>2</sup> simulates A1, provided that A<sup>2</sup> is deterministic [25].

real-world implementations requires program-level mechanisms to signal the underlying AETS lin and vis actions. To apply forward simulation, we thus develop a notion of programs whose commands include such mechanisms.

This section illustrates a toy programming language with AETS semantics which provides these mechanisms. The key features are the lin and vis program commands, which emit linearization and visibility actions for the currently-executing operation, along with load, store, and cas (compare-and-swap) commands, which record and return the sets of operation identifiers having written to each memory cell. Such augmented memory commands allow a program to obtain handles to the operations whose effects it has observed, in order to signal the corresponding vis actions.

While one can develop similar mechanisms for languages with any underlying memory model, the toy language presented here assumes a sequentially-consistent memory. Note that the assumption of sequentially-consistent memory operations is practically without loss of generality for Java 8's concurrent collections, since they are designed to be data-race free — their anomalies arise not from weak-memory semantics, but from non-atomic operations spanning several memory cells.

For generality, we assume abstract notions of commands and memory, using κ, μ, ℓ, and M, respectively, to denote a *program command*, *memory command*, *local state*, and *global memory*. So that operations can assert their visibilities, we consider memory which stores, and returns upon access, the identifier(s) of operations which previously accessed a given cell. A *program* P = ⟨init, cmd, idle, done⟩ consists of an init(m, x) = ℓ function mapping a method name m and argument values x to a local state ℓ, along with a cmd(ℓ) = κ function mapping a local state ℓ to a program command κ, and idle(ℓ) and done(ℓ) predicates on local states ℓ. Intuitively, identifying local states with threads, the idle predicate indicates whether a thread is outside of atomic sections, and subject to interference from other threads; meanwhile, the done predicate indicates whether a thread has terminated.

The *denotation* of a memory command μ is a function ⟦μ⟧ₘ from a global memory M₁, argument value x, and operation o to a tuple ⟦μ⟧ₘ(M₁, x, o) = ⟨M₂, y⟩ consisting of a global memory M₂, along with a return value y.

*Example 6.* A sequentially-consistent memory system which records the set of operations to access each location can be captured by mapping addresses x to value and operation-set pairs M(x) = ⟨y, O⟩, along with three memory commands:

$$\begin{aligned}
\left[\mathsf{load}\right]_{\mathrm{m}}(M, x, \_) &= \langle M, M(x) \rangle \\
\left[\mathsf{store}\right]_{\mathrm{m}}(M, xy, o) &= \langle M[x \mapsto \langle y,\, M(x)_{1} \cup \{o\} \rangle], \varepsilon \rangle \\
\left[\mathsf{cas}\right]_{\mathrm{m}}(M, xyz, o) &= \begin{cases} \langle M[x \mapsto \langle z,\, M(x)_{1} \cup \{o\} \rangle], \langle \mathit{true}, M(x)_{1} \rangle \rangle & \text{if } M(x)_{0} = y \\ \langle M, \langle \mathit{false}, M(x)_{1} \rangle \rangle & \text{if } M(x)_{0} \neq y \end{cases}
\end{aligned}$$

where the compare-and-swap (cas) command stores value z at address x and returns *true* when y was previously stored, and otherwise returns *false*.
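A direct functional model of this instrumented memory may help make the denotations concrete; the dict-based encoding below is a sketch, not the formalism itself:

```python
# Sketch of the instrumented memory of Example 6: each cell stores a value
# together with the set of operation ids that wrote it; load returns both,
# and cas reports the observed writers whether or not it succeeds.

EMPTY = (None, frozenset())

def load(M, x, o):
    return M, M.get(x, EMPTY)

def store(M, x, y, o):
    _, ops = M.get(x, EMPTY)
    return {**M, x: (y, ops | {o})}, None

def cas(M, x, y, z, o):
    cur, ops = M.get(x, EMPTY)
    if cur == y:
        return {**M, x: (z, ops | {o})}, (True, ops)
    return M, (False, ops)
```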

The *denotation* of a program command κ is a function ⟦κ⟧c from a local state ℓ₁ to a tuple ⟦κ⟧c(ℓ₁) = ⟨μ, x, f⟩ consisting of a memory command μ and argument value x, and an update continuation f mapping the memory command's return value y to a pair f(y) = ⟨ℓ₂, α⟩, where ℓ₂ is an updated local state, and α maps an operation o to an LTS action α(o). We assume the denotation ⟦ret x⟧c(ℓ₁) = ⟨nop, ε, λy.⟨ℓ₂, λo.ret(o, x)⟩⟩ of the ret command yields a local state ℓ₂ with done(ℓ₂) without executing memory commands, and outputs a corresponding LTS ret action.

*Example 7.* A simple goto language over variables a, b, … for the memory system of Example 6 would include the following commands:

$$\begin{aligned}
\left[\mathsf{goto\ a}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{nop}, \varepsilon, \lambda y.\langle \mathit{jump}(\ell, \ell(\mathsf{a})), \lambda o.\varepsilon \rangle \rangle \\
\left[\mathsf{assume\ a}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{nop}, \varepsilon, \lambda y.\langle \mathit{next}(\ell), \lambda o.\varepsilon \rangle \rangle \quad \text{if } \ell(\mathsf{a}) \neq 0 \\
\left[\mathsf{b,c = load(a)}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{load}, \ell(\mathsf{a}), \lambda y_{1} y_{2}.\langle \mathit{next}(\ell[\mathsf{b} \mapsto y_{1}][\mathsf{c} \mapsto y_{2}]), \lambda o.\varepsilon \rangle \rangle \\
\left[\mathsf{store(a,b)}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{store}, \ell(\mathsf{a})\ell(\mathsf{b}), \lambda y.\langle \mathit{next}(\ell), \lambda o.\varepsilon \rangle \rangle \\
\left[\mathsf{d,e = cas(a,b,c)}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{cas}, \ell(\mathsf{a})\ell(\mathsf{b})\ell(\mathsf{c}), \lambda y_{1} y_{2}.\langle \mathit{next}(\ell[\mathsf{d} \mapsto y_{1}][\mathsf{e} \mapsto y_{2}]), \lambda o.\varepsilon \rangle \rangle
\end{aligned}$$

where the *jump* and *next* functions update a program counter, and the load command stores the value and operation identifiers returned from the corresponding memory command. Linearization and visibility actions are captured as program commands as follows:

$$\begin{aligned}
\left[\mathsf{lin}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{nop}, \varepsilon, \lambda y.\langle \mathit{next}(\ell), \lambda o.\mathrm{lin}(o) \rangle \rangle \\
\left[\mathsf{vis(a)}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{nop}, \varepsilon, \lambda y.\langle \mathit{next}(\ell), \lambda o.\mathrm{vis}(o, \ell(\mathsf{a})) \rangle \rangle
\end{aligned}$$

Atomic sections can be captured with a lock variable and a pair of program commands,

$$\begin{aligned}
\left[\mathsf{begin}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{nop}, \varepsilon, \lambda y.\langle \mathit{next}(\ell[\mathsf{lock} \mapsto \mathit{true}]), \lambda o.\varepsilon \rangle \rangle \\
\left[\mathsf{end}\right]_{\mathrm{c}}(\ell) &= \langle \mathsf{nop}, \varepsilon, \lambda y.\langle \mathit{next}(\ell[\mathsf{lock} \mapsto \mathit{false}]), \lambda o.\varepsilon \rangle \rangle
\end{aligned}$$

such that idle states are identified by not holding the lock, i.e., idle(ℓ) = ¬ℓ(lock), as in the initial state, where init(m, x)(lock) = *false*.

Figure 1 lists the semantics ⟦P⟧ₚ of a program P as an abstract-execution transition system. The states ⟨M, L⟩ of ⟦P⟧ₚ include a global memory M, along with a partial function L from operation identifiers o to local states L(o); the initial state is ⟨M∅, ∅⟩, where M∅ is an initial memory state. The transitions for call and hb actions are enabled independently of the implementation state, since they are dictated by the implementation's environment. Although we do not explicitly model client programs and platforms here, in reality, client programs dictate call actions, and platforms, driven by client programs, dictate hb actions; for example, a client which acquires the lock released after operation o₁, before invoking operation o₂, is generally ensured by its platform that o₁ happens before o₂. The transitions for all other actions are dictated by implementation commands. While the ret, lin, and vis commands generate their corresponding LTS actions, all other commands generate ε transitions.

Each atomic −ā→ step of the AETS underlying a given program is built from a sequence of steps for the individual program commands in an atomic section. Individual program commands essentially execute one small step from shared memory and local state ⟨M₁, ℓ₁⟩ to ⟨M₂, ℓ₂⟩, invoking memory command μ with

$$\frac{o \notin \mathrm{dom}(L) \qquad \ell = \mathsf{init}(m, x)}{\langle M, L\rangle \xrightarrow{\mathrm{call}(o,m,x)} \langle M, L[o \mapsto \ell]\rangle}
\qquad
\frac{\mathsf{done}(L(o_1)) \qquad o_2 \notin \mathrm{dom}(L)}{\langle M, L\rangle \xrightarrow{\mathrm{hb}(o_1,o_2)} \langle M, L\rangle}$$

$$\frac{\langle M_1, \ell_1, o, \varepsilon\rangle \leadsto^{*} \langle M_2, \ell_2, o, \overline{a}\rangle \qquad \mathsf{idle}(\ell_2)}{\langle M_1, L[o \mapsto \ell_1]\rangle \xrightarrow{\overline{a}} \langle M_2, L[o \mapsto \ell_2]\rangle}$$

$$\frac{\mathsf{cmd}(\ell_1) = \kappa \qquad \left[\kappa\right]_{\mathrm{c}}(\ell_1) = \langle \mu, x, f\rangle \qquad \left[\mu\right]_{\mathrm{m}}(M_1, x, o) = \langle M_2, y\rangle \qquad f(y) = \langle \ell_2, \alpha\rangle}{\langle M_1, \ell_1, o, \overline{a}\rangle \leadsto \langle M_2, \ell_2, o, \overline{a}\cdot\alpha(o)\rangle}$$

**Fig. 1.** The semantics of program P = ⟨init, cmd, idle, done⟩ as an abstract-execution transition system, where ⟦·⟧c and ⟦·⟧m are the denotations of program and memory commands, respectively.

argument x, and emitting action α(o). Besides its effect on shared memory, each step uses the result ⟨M₂, y⟩ of memory command μ to update the local state and emit an action using the continuation f, i.e., f(y) = ⟨ℓ₂, α⟩. Commands which do not access memory are modeled with a no-op memory command. We define the consistency of programs by reduction to their transition systems.

**Definition 5.** *A program* P *is* consistent *with a specification iff its semantics* ⟦P⟧ₚ *is.*

Thus the consistency of P with W amounts to the inclusion of ⟦P⟧ₚ's histories in W's. The following corollary of Theorem 1 follows directly by Definition 5, and immediately yields a program verification strategy: validate a simulation relation from the states of ⟦P⟧ₚ to the states of ⟦W⟧ₛ such that each command of P is simulated by a step of ⟦W⟧ₛ.

**Corollary 1.** *A program* P *is consistent with specification* W *if* ⟦W⟧ₛ *simulates* ⟦P⟧ₚ*.*

# **4 Proof Methodology**

In this section we develop a systematic means of annotating concurrent objects for relaxed-visibility simulation proofs. Besides leveraging an auxiliary memory system which tags memory accesses with the identifiers of the operations that wrote the read values (see §3.2), annotations signal linearization points with lin commands, and indicate visibility of other operations with vis commands. As in previous works [3, 37, 2, 18], we assume linearization points are given, and focus on visibility-related annotations.

As we focus on data-race-free implementations (e.g., Java 8's concurrent collections) for which sequential consistency is sound, it can be assumed without loss of generality that the happens-before order is exactly the *returns-before* order between operations, which orders two operations o₁ and o₂ if the return action of o₁ occurs in real time before the call action of o₂. This assumption allows us to guarantee that linearizations are consistent with happens-before just by ensuring that the linearization point of each operation occurs between its call and return actions (as in standard linearizability).
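As an illustration, the returns-before order is determined purely by the call and return actions of a trace. The following Python sketch (our own modeling, with hypothetical operation names) computes it:

```python
# A sketch (ours, hypothetical operation names): computing returns-before
# from a trace of call/ret actions. o1 precedes o2 iff o1's return action
# occurs in the trace before o2's call action.

def returns_before(trace):
    """trace: list of (kind, op) pairs, kind in {'call', 'ret'}."""
    call_pos = {op: i for i, (kind, op) in enumerate(trace) if kind == 'call'}
    ret_pos = {op: i for i, (kind, op) in enumerate(trace) if kind == 'ret'}
    return {(o1, o2)
            for o1 in ret_pos for o2 in call_pos
            if o1 != o2 and ret_pos[o1] < call_pos[o2]}

trace = [('call', 'put1'), ('ret', 'put1'), ('call', 'get1'),
         ('call', 'has1'), ('ret', 'get1'), ('ret', 'has1')]
rb = returns_before(trace)   # put1 precedes both get1 and has1
```

Note that get1 and has1 overlap in the trace, so they are unordered by returns-before.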

```
var table: array of T;

procedure absolute put(k: int, v: T) {
  atomic {
    store(table[k], v);
    vis(getLin());
    lin();
  }
}

procedure absolute get(k: int) {
  atomic {
    v, O = load(table[k]);
    vis(getLin());
    lin();
  }
  return v;
}

procedure monotonic has(v: T)
  vis(getModLin());
{
  store(k, 0);
  while (k < table.length) {
    atomic {
      tv, O = load(table[k]);
      vis(O ∩ getModLin());
    }
    if (tv = v) then {
      lin();
      return true;
    }
    inc(k);
  }
  lin();
  return false;
}
```
**Fig. 2.** An implementation Ichm modeling Java's concurrent hash map. The command inc(k) increments counter k, and commands within atomic {...} are collectively atomic.

It is without loss of generality because clients of such implementations can use auxiliary variables to impose synchronization-order constraints between any two operations ordered by returns-before, e.g., writing a variable after each operation returns which is read before each other operation is called (under sequential consistency, every write happens-before every read that observes the written value).

We illustrate our methodology with the key-value map implementation Ichm of Figure 2, which models Java's concurrent hash map. The vis and lin commands are the visibility/linearization annotations added by the instrumentation that will be described below. Key-value pairs are stored in an array table indexed by keys. The implementations of put and get are straightforward, while the implementation of has, which returns true iff the input value is associated with some key, consists of a while loop traversing the array and searching for the input value. To simplify the exposition, the shared-memory reads and writes are already adapted to the memory system described in Section 3.2 (essentially, this consists in adding new variables storing the set of operation identifiers returned by a shared-memory read). While put and get are obviously linearizable, has is weakly consistent, with monotonic visibility. For instance, given the two-thread program {get(1); has(1)} || {put(1, 1); put(0, 1); put(1, 0)}, it is possible that get(1) returns 1 while has(1) returns false. This can happen in an interleaving where has reads table[0] before put(0, 1) writes into it (observing the initial value 0), and reads table[1] after put(1, 0) writes into it (observing value 0 as well). The only abstract execution consistent with the weakly-consistent contains-value map Wm (Example 2) which justifies these return values is given in Example 3. We show that this implementation is consistent with a simplification of the contains-value map Wm, without remove-key operations, and where put operations return no value.
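The problematic interleaving can be replayed concretely. The following Python sketch (ours, not the paper's formal semantics) executes the steps in the order just described on a two-slot table and exhibits the weak behavior:

```python
# A sketch (ours) replaying the interleaving described above on a two-slot
# table under sequential consistency: has(1) misses the value even though
# get(1) observed it.

table = [0, 0]

table[1] = 1              # put(1, 1)
get_result = table[1]     # get(1) returns 1
has_read0 = table[0]      # has(1) reads table[0] == 0, before put(0, 1)
table[0] = 1              # put(0, 1)
table[1] = 0              # put(1, 0)
has_read1 = table[1]      # has(1) reads table[1] == 0, after put(1, 0)

has_result = has_read0 == 1 or has_read1 == 1   # has(1) returns false
```

At every individual moment some slot contains 1, yet the traversal observes none of them.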

Given an implementation I, let L(I) be an instrumentation of I with program commands lin() emitting linearization actions. The execution of lin() in the context of an operation with identifier o emits a linearization action lin(o). We assume that L(I) leads to well-formed executions (e.g., at most one linearization action per operation).

*Example 8.* For the implementation in Figure 2, the linearization commands of put and get are executed atomically with the store to table[k] in put and the load of table[k] in get, respectively. The linearization command of has is executed at any point after observing the input value v or after exiting the loop, but before the return. The two choices correspond to different return values and only one of them will be executed during an invocation.

Given an instrumentation L(I), a visibility annotation V for I's methods, and a read-only predicate R, we define a witnessing implementation V(L(I)) according to a generic heuristic that depends only on V and R. This definition uses a program command getLin() which returns the set of operations in the current linearization sequence.<sup>12</sup> The current linearization sequence is stored in a history variable which is updated with every linearization action by appending the corresponding operation identifier. For readability, we leave this history variable implicit and omit the corresponding updates. As syntactic sugar, we use a command getModLin() which returns the set of *modifiers* (non-read-only operations) in the current linearization sequence. To represent visibility actions, we use program commands vis(A) where A is a set of operation identifiers. The execution of vis(A) in the context of an operation with identifier o emits the set of visibility actions vis(o, o′) for every operation o′ ∈ A.
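The history variable and the getLin()/getModLin()/vis commands can be sketched as follows (our own illustrative Python model, not civl state; the names lin_cmd, vis_actions, and read_only are ours):

```python
# A sketch (our own modeling) of the instrumentation state: the history
# variable lin, the getLin/getModLin queries, and vis emission.
# read_only plays the role of the predicate R.

lin = []                  # linearization sequence: (op_id, method) pairs
vis_actions = []          # emitted visibility actions vis(o, o')
read_only = {'get', 'has'}

def lin_cmd(op_id, method):
    lin.append((op_id, method))          # effect of the lin() command

def get_lin():
    return {o for (o, _m) in lin}        # all linearized operations

def get_mod_lin():
    return {o for (o, m) in lin if m not in read_only}   # modifiers only

def vis(op_id, ops):
    vis_actions.extend((op_id, o2) for o2 in ops)        # the vis(A) command

lin_cmd('p1', 'put')
lin_cmd('g1', 'get')
vis('h1', get_mod_lin())   # a monotonic has observes linearized modifiers
```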

Therefore, V(L(I)) extends the instrumentation L(I) with commands generating visibility actions as follows: for an *absolute* method, a command vis(getLin()) is executed atomically with the linearization command lin(), so that the operation observes all of its linearization-order predecessors; for a *monotonic* method, a command vis(getModLin()) is executed atomically with the method's call action, and every shared-memory read returning a tag set O is atomically followed by a command vis(O ∩ getModLin()).


*Example 9.* The vis commands in Figure 2 demonstrate the visibility commands added by the instrumentation V(·) to the key-value map (in this case, the modifiers are put operations). The first visibility command in has precedes the procedure body to emphasize that it is executed *atomically* with the procedure call. Also, note that the read of the array table is the only shared-memory read in has.

**Theorem 2.** *The abstract executions of the witnessing implementation* <sup>V</sup>(L(I)) *are consistent with* V *and* R*.*

*Proof.* Let ⟨h, *lin*, *vis*⟩ be the abstract execution of a trace τ of V(L(I)), and let o be an invocation in h of a monotonic method (w.r.t. V). By the definition of V, the call action of o is *immediately* followed in τ by a sequence of visibility actions vis(o, o′)

<sup>12</sup> We rely on retrieving the identifiers of currently-linearized operations. More complex proofs may also require inspecting, e.g., operation labels and happens-before relationships.

for every modifier o′ which has already been linearized. Therefore, any operation which has returned before o (i.e., happens-before o) has already been linearized, and will necessarily have a smaller visibility (w.r.t. set inclusion) than o, because the linearization sequence is modified only by appending new operations. The instrumentation of shared-memory reads may add more visibility actions vis(o, \_), but this preserves the monotonicity of o's visibility. The case of absolute methods is obvious.

The consistency of the abstract executions of V(L(I)) with a given sequential specification S, which completes the proof of consistency with a weak-visibility specification W = ⟨S, R, V⟩, can be proved by showing that the transition system ⟦W⟧ₛ of W simulates V(L(I)) (Theorem 1). Defining a simulation relation between the two systems is in part implementation-specific, and in the following we demonstrate it for the key-value map implementation V(L(Ichm)).

We show that ⟦Wm⟧ₛ simulates the implementation Ichm. A state of Ichm in Figure 2 is a valuation of table and of the history variable lin storing the current linearization sequence, together with a valuation of the local variables for each active operation. Let *ops*(q) denote the set of operations which are active in an implementation state q. Also, for a has operation o ∈ *ops*(q), let *index*(o) be the maximal index k of the array table such that o has already read table[k] and table[k] ≠ v. We assume *index*(o) = −1 if o has not read any array cell.

**Definition 6.** *Let* Rchm *be the relation which associates every implementation state* q *with a state of* ⟦Wm⟧ₛ*, i.e., an* ⟨S, R, V⟩*-consistent abstract execution* e = ⟨h, *lin*, *vis*⟩ *with* h = ⟨O, *inv*, *ret*, *hb*⟩*, such that, among other conditions, the visibility of each* has *operation* o ∈ *ops*(q) *consists of:*

- **(5)** *all the* put *operations* o′ *which returned before* o *was invoked,*
- **(6)** *for each* i ≤ *index*(o)*, all the* put(i,\_) *operations from a prefix of* lin *that wrote a value* different *from* v*,*
- **(7)** *all the* put(*index*(o)+1,\_) *operations from a prefix of* lin *that ends with a* put(*index*(o)+1,v) *operation, provided that* tv = v*.*

*Above, for indices* j₁ < j₂*, the linearization prefix associated with* j₁ *must be a prefix of the one associated with* j₂*.*

A large part of this definition is applicable to any implementation, only points (5), (6), and (7) being specific to the implementation we consider. Points (6) and (7) ensure that the return values of operations are consistent with S and mimic the effect of the vis commands from Figure 2.

**Theorem 3.** Rchm *is a simulation relation from* V(L(Ichm)) *to* ⟦Wm⟧ₛ*.*

# **5 Implementation and Evaluation**

In this section we effectuate our methodology by verifying models of two weakly-consistent concurrent objects: Java's ConcurrentHashMap and ConcurrentLinkedQueue.<sup>13</sup> We use an off-the-shelf deductive verification tool called civl [16], though any concurrent program verifier could suffice. We chose civl because comparable verifiers either require a manual encoding of the concurrency reasoning (e.g., Dafny or Viper), which can be error-prone, or require cumbersome reasoning about interleavings of thread-local histories (e.g., VerCors). An additional benefit of civl is that it directly proves simulation, thereby tying the mechanized proofs to our theoretical development. Our proofs assume no bound on the number of threads or the size of the memory.

Our use of civl imposes two restrictions on the implementations we can verify. First, civl uses the Owicki-Gries method [29] to verify concurrent programs. These methods are unsound for weak memory models [22], so civl, and hence our proofs, assume a sequentially consistent memory model. Second, civl's strategy for building the simulation relation requires implementations to have statically known linearization points, because it checks that there exists exactly one atomic section in each code path where the global state is modified, and that this modification is simulated by the specification.

Given these restrictions, we can simplify our proof strategy of forward refinement by factoring the simulations we construct through an atomic version of the specification transition system. This atomic specification is obtained from the specification AETS ⟦W⟧ₛ by restricting the interleavings between its transitions.

**Definition 7.** *The* atomic transition system *of a specification* W *is the AETS* ⟦W⟧ₐ = ⟨Q, A, q, →ₐ⟩*, where* ⟦W⟧ₛ = ⟨Q, A, q, →⟩ *is the AETS of* W*, and* e₁ →ₐ e₂ *on an action sequence* ā *if and only if* e₁ → e₂ *on* ā *and*

$$\overline{a} \in \{\mathsf{call}(o,m,x)\} \cup \{\mathsf{ret}(o,y)\} \cup \{\mathsf{hb}(o,o')\} \cup \{\,\overline{a}_{1}\cdot\mathsf{lin}(o) \;:\; \overline{a}_{1} \in \{\mathsf{vis}(o,\_)\}^{\star}\,\}.$$
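The shape restriction of the atomic transition system can be stated executably. A small Python sketch (ours, not civl code): an atomic step carries either a single call/ret/hb action, or a block of vis actions of one operation followed by that operation's lin action:

```python
# A sketch (ours) of Definition 7's shape restriction: an atomic step is a
# single call/ret/hb action, or vis(o, _)* followed by lin(o) for one o.

def is_atomic_shape(actions):
    """actions: list of tuples, e.g. ('call', o, m, x), ('vis', o, o2), ('lin', o)."""
    if len(actions) == 1 and actions[0][0] in ('call', 'ret', 'hb'):
        return True
    if actions and actions[-1][0] == 'lin':
        o = actions[-1][1]                   # the vis prefix must belong to o
        return all(a[0] == 'vis' and a[1] == o for a in actions[:-1])
    return False
```

For instance, [('vis','o1','o2'), ('lin','o1')] is an allowed atomic step, while a lone vis action is not.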

Note that the language of ⟦W⟧ₐ is included in the language of ⟦W⟧ₛ, so simulation proofs towards ⟦W⟧ₐ apply to ⟦W⟧ₛ as well.

Our civl proofs show that there is a simulation from an implementation to its atomic specification, which is encoded as a program whose state consists of the components of an abstract execution, i.e., O, *inv*, *ret*, *hb*, *lin*, *vis*. These were encoded as maps from operation identifiers to values, sequences of operation identifiers, and maps from operation identifiers to sets of operation identifiers, respectively. Our axiomatizations of sequences and sets were adapted from those used by the Dafny verifier [23]. For each method in M, we defined atomic procedures corresponding to call actions, return actions, and combined visibility and linearization actions, in order to obtain exactly the atomic transitions of ⟦W⟧ₐ.

It is challenging to encode Java implementations faithfully in civl, as the latter's input programming language is a basic imperative language lacking many Java features. Most notable among these is dynamic memory allocation on the heap, used by almost all concurrent data-structure implementations. As civl is a first-order prover, we needed an encoding of the heap that lets us perform reachability reasoning on the

<sup>13</sup> Our verified implementations are open source, and available at: https://github.com/siddharth-krishna/weak-consistency-proofs.

heap. We adapted the first-order theory of reachability and footprint sets from the GRASShopper verifier [30] for dynamically allocated data structures. This fragment is decidable, but relies on local theory extensions [36], which we implemented by using the trigger mechanism of the underlying SMT solver [27, 15] to ensure that quantified axioms are only instantiated for program expressions. For instance, here is the "cycle" axiom, which says that if a node x has a field f[x] that points to itself, then any y that it can reach via that field (encoded using the between predicate Btwn(f, x, y, y)) must be equal to x:

```
axiom (forall f: [Ref]Ref, x: Ref, y:Ref :: {known(x), known(y)}
       f[x] == x && Btwn(f, x, y, y) ==> x == y);
```
We use the trigger known(x), known(y) (known is a dummy function that maps every reference to true) and introduce known(t) terms in our programs for every term t of type Ref (for instance, by adding assert known(t) at the point of the program where t is introduced). This ensures that the cycle axiom is only instantiated for terms that appear in the program, and not for terms generated by instantiations of axioms (like f[x] in the cycle axiom). This process was key to keeping the verification time manageable.
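To see what the cycle axiom asserts on concrete heaps, here is an executable Python reading (our own; Btwn(f, x, y, y) is interpreted as plain reachability of y from x by following the functional field f):

```python
# Executable reading (ours) of the cycle axiom: on any finite functional
# graph f, if f[x] == x then the only node reachable from x is x itself.

def btwn(f, x, y, z):
    """Reachability check used for Btwn(f, x, y, y) (here y == z)."""
    seen, cur = set(), x
    while cur is not None and cur not in seen:
        if cur == y:
            return True
        seen.add(cur)
        cur = f.get(cur)
    return False

f = {'a': 'a', 'b': 'a'}  # node 'a' points to itself; 'b' points to 'a'
# cycle axiom instance: f['a'] == 'a', so every y with Btwn(f,'a',y,y) equals 'a'
cycle_ok = all(y == 'a' for y in f if btwn(f, 'a', y, y))
```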

Since we consider fine-grained concurrent implementations, we also needed to reason about interference by other threads and show thread safety. civl provides Owicki-Gries [29] style thread-modular reasoning, by means of demarcating atomic blocks and providing preconditions for each block that are checked for stability under all possible modifications by other threads. One consequence is that these annotations can only talk about the local state of a thread and the shared global state, but not about other threads. To encode facts such as distinctness of operation identifiers and ownership of unreachable nodes (e.g., newly allocated nodes) in the shared heap, we use civl's linear type system [40].

For instance, the proof of the push method needs to make assertions about the value of the newly allocated node x. These assertions would not be stable under interference from other threads if we did not have a way of specifying that the address of the new node is known only to the push thread. We encode this knowledge by marking the type of the variable x as linear: this tells civl that all values of x across all threads are distinct, which is sufficient for the proof. civl ensures soundness by making sure that linear variables are not duplicated (for instance, they cannot be passed to another method and then used afterwards).

We evaluate our proof methodology by considering models of two of Java's weakly-consistent concurrent objects.

**Concurrent Hash Map** The first is the ConcurrentHashMap implementation of the Map ADT, consisting of absolute put and get methods and a monotonic has method that follows the algorithm given in Figure 2. For simplicity, we assume here that keys are integers and the hash function is the identity, but note that the proof of monotonicity of has is not affected by these assumptions.<sup>14</sup>

<sup>14</sup> Our civl implementation assumes the hash function is injective to avoid reasoning about the dynamic bucket-list needed to resolve hash collisions. While such reasoning is possible within


**Fig. 3.** Case study detail: for each object we show lines of code, lines of proof, total lines, and verification time in seconds. We also list common definitions and axiomatizations separately.

civl can construct a simulation relation equivalent to the one defined in Definition 6 automatically, given an inductive invariant that relates the state of the implementation to the abstract execution. A first attempt at an invariant might be that the value stored at table[k], for every key k, is the same as the value returned when the specification AETS performs a get operation on k. This invariant is sufficient for civl to prove that the return values of the absolute methods (put and get) are consistent with the specification.

However, it is not enough to show that the return value of the monotonic has method is consistent with its visibility. This is because our proof technique constructs a visibility set for has by taking the union of the memory tags (the set of operations that wrote to each memory location) of each table entry it reads, but without additional invariants this visibility set could entail a different return value. We thus strengthen the invariant to say that tableTags[k], the memory tags associated with hash table entry k, is exactly the set of linearized put operations with key k. A consequence of this is that the abstract state encoded by tableTags[k] has the same value for key k as the value stored at table[k]. civl can then prove, given the following loop invariant, that the value returned by has is consistent with its visibility set.

```
(forall i: int :: 0 <= i && i < k ==> Map.ofVis(my_vis, lin)[i] != v)
```
This loop invariant says that, among the entries scanned thus far, the abstract map given by the projection of lin onto the current operation's visibility my\_vis does not include the value v.
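The abstract map Map.ofVis(my_vis, lin) can be modeled by replaying the visible put operations of the linearization in order. The following Python sketch (our own model, with hypothetical operation identifiers) checks the loop invariant on a small history mirroring the has example:

```python
# A sketch (ours) of Map.ofVis: project the linearization onto a visibility
# set, replay the visible puts in order, and check the loop invariant that
# no key below k maps to the searched value v.

def of_vis(my_vis, lin):
    m = {}
    for op_id, method, key, val in lin:
        if op_id in my_vis and method == 'put':
            m[key] = val
    return m

# hypothetical history: the put with key 0 (p2) is not visible to has
lin = [('p1', 'put', 1, 1), ('p2', 'put', 0, 1), ('p3', 'put', 1, 0)]
my_vis = {'p1', 'p3'}
abs_map = of_vis(my_vis, lin)

k, v = 2, 1
loop_invariant = all(abs_map.get(i) != v for i in range(k))
```

Under this visibility the abstract map holds 0 at key 1 and nothing at key 0, so returning false is consistent.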

**Concurrent Linked Queue** Our second case study is the ConcurrentLinkedQueue implementation of the Queue ADT, consisting of absolute push and pop methods and a monotonic size method that traverses the queue from head to tail without any locks and returns the number of nodes it sees (see Figure 4 for the full code). We again model the core algorithm (the Michael-Scott queue [26]) and omit some of Java's optimizations, for instance setting the next field of popped nodes to themselves to speed up garbage collection, or setting the values of nodes to null when popping.

The invariants needed to verify the absolute methods are a straightforward combination of structural invariants (e.g. that the queue is composed of a linked list from the head to null, with the tail being a member of this list) and a relation between the

civl, see our queue case study, this issue is orthogonal to the weak-consistency reasoning that we study here.

**var** head, tail: Ref; **struct** Node { **var** data: K; **var** next: Ref; }

```
procedure absolute push(k: K) {
  x = new Node(k, null);
  while (true) {
    t, _ = load(tail);
    tn, _ = load(tail.next);
    if (tn == null) {
      atomic {
        b, _ = cas(t.next, tn, x);
        if (b) {
          vis(getLin());
          lin();
        }
      }
      if (b) then break;
    } else {
      b, _ = cas(tail, t, tn);
    }
  }
}

procedure absolute pop() {
  while (true) {
    h, _ = load(head);
    t, _ = load(tail);
    hn, _ = load(h.next);
    if (h != t) {
      k, _ = load(hn.data);
      atomic {
        b, _ = cas(head, h, hn);
        if (b) {
          vis(getLin());
          lin();
        }
      }
      if (b) then return k;
    }
  }
}

procedure monotonic size()
  vis(getModLin());
{
  store(s, 0);
  c, _ = load(head);
  atomic {
    cn, O = load(c.next);
    vis(O ∩ getModLin());
  }
  while (cn != null) {
    inc(s);
    c = cn;
    atomic {
      cn, O = load(c.next);
      vis(O ∩ getModLin());
    }
  }
  lin();
  return s;
}
```
**Fig. 4.** The simplified implementation of Java's ConcurrentLinkedQueue that we verify.

abstract and concrete states. Once again, we need to strengthen this invariant in order to verify the monotonic size method, because otherwise we cannot prove that the visibility set we construct (by taking the union of the memory tags of nodes in the list during traversal) justifies the return value.

The key additional invariant is that the memory tags for the next field of each node in the queue (denoted x.nextTags for each node x) contain the operation label of the operation that pushed the next node into the queue (if it exists). Further, the push operations in lin are exactly the operations in the nextTags fields of nodes in the queue, in the order in which those nodes appear in the queue.
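The ordering part of this invariant can be stated executably: walking the list from the head and collecting the push operation tagging each next link must reproduce the push subsequence of lin. A Python sketch (ours, with hypothetical node and operation names):

```python
# A sketch (ours) of the queue's ordering invariant: push operations tagging
# the next links, read off in list order, equal the push subsequence of lin.

def pushes_in_queue_order(next_field, next_invoc, head):
    """Walk the list from head, collecting the push op tagging each next link."""
    ops, cur = [], head
    while next_field.get(cur) is not None:
        ops.append(next_invoc[cur])
        cur = next_field[cur]
    return ops

next_field = {'h': 'n1', 'n1': 'n2', 'n2': None}   # head -> n1 -> n2 -> null
next_invoc = {'h': 'p1', 'n1': 'p2'}               # op tagging each next link
push_ops = {'p1', 'p2'}
lin = ['p1', 's1', 'p2']                           # s1 is a size, not a push

invariant = (pushes_in_queue_order(next_field, next_invoc, 'h')
             == [o for o in lin if o in push_ops])
```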

Figure 5 shows a simplified version of the civl encoding of these invariants. In it, we use the following auxiliary variables in order to avoid quantifier alternation: nextInvoc maps nodes to the operation label (type Invoc in civl) contained in the nextTags field; nextRef maps operations to the nodes whose nextTags field contains them, i.e., it is the inverse of nextInvoc; and absRefs maps each index of the abstract queue (represented as a mathematical sequence) to the corresponding concrete heap node. We omit the triggers and known predicates for readability; the full invariant can be found in the accompanying proof scripts.

Given these invariants, one can show that the return value s computed by size is consistent with the visibility set it constructs by picking up the memory tags from each node that it traverses. The loop invariant is more involved because, due to concurrent updates, size could be traversing nodes that have already been popped from the queue; see our civl proofs for more details.

**Results** Figure 3 provides a summary of our case studies. We separate the table into sections, one for each case study, and a common section at the top that contains the common theories of sets and sequences and our encoding of the heap. In each case-study section, we separate the definitions of the atomic specification of the ADT (which can

```
// nextTags only contains singleton sets of push operations
(forall y: Ref ::
 (Btwn(next, start, y, null) && y != null && next[y] != null
   ==> nextTags[y] == Set(nextInvoc[y])
       && invoc_m(nextInvoc[y]) == Queue.push))
// nextTags of the last node is the empty set
&& nextTags[absRefs[Queue.stateTail(Queue.ofSeq(lin)) - 1]]
  == Set_empty()
// lin is made up of nextInvoc[y] for y in the queue
&& (forall n: Invoc :: invoc_m(n) == Queue.push
    ==> (Seq_elem(n, lin)
         <==> Btwn(next, start, nextRef[n], null)
             && nextRef[n] != null && next[nextRef[n]] != null))
// lin is ordered by order of nodes in queue
&& (forall n1, n2: Invoc ::
   (invoc_m(n1) == Queue.push && invoc_m(n2) == Queue.push
   && Seq_elem(n1, lin) && Seq_elem(n2, lin)
   ==> (Seq_ord(lin, n1, n2)
        <==> Btwn(next, nextRef[n1], nextRef[n1], nextRef[n2])
            && nextRef[n1] != nextRef[n2])))
```
**Fig. 5.** A snippet from the civl invariant for the queue.

be reused for other implementations) from the code and proof of the implementation we consider. For each resulting module, we list the number of lines of code, lines of proof, total lines, and civl's verification time in seconds. Experiments were conducted on an Intel Core i7-4470 3.4 GHz 8-core machine with 16 GB RAM.

Our two case studies are representative of the weakly-consistent behaviors exhibited by all the Java concurrent objects studied in [13], both those using fixed-size arrays and those using dynamic memory. As civl does not directly support dynamic memory and other Java language features, we were forced to make certain simplifications to the algorithms in our verification effort. However, the assumptions we make are orthogonal to the reasoning and proof of weak consistency of the monotonic methods. The underlying algorithm used by, and hence the proof argument for monotonicity of, the hash map's has method is the same as that of the other monotonic hash-map operations such as elements, entrySet, and toString. Similarly, the argument used for the queue's size can be adapted to other monotonic ConcurrentLinkedQueue and LinkedTransferQueue operations like toArray and toString. Thus, our proofs carry over to the full versions of the implementations, as the key invariants linking the memory tags and visibility sets to the specification state are the same.

In addition, civl does not currently have any support for inferring the preconditions of each atomic block, which accounts for most of the lines of proof in our case studies. However, these problems have been studied and solved in other tools [30, 39], and such techniques could in principle be integrated with civl in order to simplify these kinds of proofs.

In conclusion, our case studies show that verifying weakly-consistent operations introduces little overhead compared to the proofs of the core absolute operations. The additional invariants needed to prove monotonicity were natural and easy to construct. We also see that our methodology brings weak-consistency proofs within the scope of what is provable by off-the-shelf automated concurrent program verifiers in reasonable time.

# **6 Related Work**

Though *linearizability* [18] has reigned as the de-facto concurrent-object consistency criterion, several recent works have proposed weaker criteria, including *quantitative relaxation* [17], *quiescent consistency* [10], and *local linearizability* [14]; these works effectively permit externally-visible interference among threads by altering objects' sequential specifications, each in their own way. Motivated by the diversity of these proposals, Sergey et al. [35] proposed the use of Hoare logic for describing a custom consistency specification for each concurrent object. Raad et al. [31] continued in this direction by proposing declarative consistency models for concurrent objects atop weak-memory platforms. One common feature between our paper and this line of work (see also [21, 9]) is encoding and reasoning directly about the concurrent history. The notion of *visibility relaxation* [13] originates from Burckhardt et al.'s axiomatic specifications [7], and leverages traditional sequential specifications by allowing certain operations to behave as if they are unaware of concurrently-executed linearization-order predecessors. The linearization (and visibility) actions of our simulation-proof methodology are unique to visibility-relaxation-based weak consistency, since they refer to a global linearization order linking executions with sequential specifications.

Typical methodologies for proving linearizability are based on reductions to safety verification [8, 5] and forward simulation [3, 37, 2], the latter generally requiring the annotation of per-operation *linearization points*, each typically associated with a single program statement in the given operation, e.g., a shared-memory access. Extensions to this methodology include *cooperation* [38, 12, 41], i.e., allowing operations' linearization points to coincide with other operations' statements, and *prophecy* [33, 24], i.e., allowing operations' linearization points to depend on future events. Such extensions enable linearizability proofs of objects like the Herlihy-Wing Queue (HWQ). While prophecy, or alternatively backward simulation [25], is generally more powerful than forward simulation alone, Bouajjani et al. [6] described a methodology based on forward simulation capable of proving seemingly future-dependent objects like HWQ, by considering fixed linearization points only for value removal, and an additional kind of specification-simulated action, *commit points*, corresponding to operations' final shared-memory accesses. Our consideration of specification-simulated visibility actions follows this line of thinking, enabling forward-simulation-based proofs of weakly-consistent concurrent objects.

# **7 Conclusion and Future Work**

This work develops the first verification methodology for weakly-consistent operations using sequential specifications and forward simulation, thus reusing existing sequential ADT specifications and enabling simple reasoning, i.e., without prophecy [1] or backward simulation [25]. This paper demonstrates the application of our methodology to absolute and monotonic methods on sequentially-consistent memory, as these are the consistency levels demonstrated by the actual Java implementations of which we are aware. Our formalization is general, and also applicable to other visibility relaxations, e.g., the *peer* and *weak* visibilities [13], and weaker memory models, e.g., the Java memory model.

Extrapolating, we speculate that handling other visibilities amounts to adding annotations and auxiliary state which mirror inter-operation communication. For example, while monotonic operations on shared-memory implementations observe mutating linearization-order predecessors – corresponding to a sequence of shared-memory updates – causal operations with message-passing based implementations would observe operations whose messages have (transitively) propagated. The corresponding annotations may require auxiliary state to track message propagation, similar in spirit to the getModLin() auxiliary state that tracks mutating linearization-order predecessors (§4). Since weak memory models essentially alter the mechanics of inter-operation communication, the corresponding visibility annotations and auxiliary state may similarly reflect this communication. Since this communication is partly captured by the denotations of memory commands (§3.2), these denotations would be modified, e.g., to include not one value and tag per memory location, but multiple. While variations are possible depending on the extent to which the proof of a given implementation relies on the details of the memory model, in the worst case the auxiliary state could capture an existing memory model (e.g., operational) semantics exactly.

As with systematic or automated linearizability-proof methodologies, our proof methodology is susceptible to two potential sources of incompleteness. First, as mentioned in Section 3, methodologies like ours based on forward simulation are only complete when specifications are *return-value deterministic*. However, data types are typically designed to be return-value deterministic, and this source of incompleteness does not manifest in practice.

Second, methodologies like ours based on annotating program commands, e.g., with linearization points, are generally incomplete, since the consistency mechanism employed by a given implementation may not admit characterization according to a given static annotation scheme; the Herlihy-Wing Queue, whose linearization points depend on the results of future actions, is a prototypical example [18]. Likewise, our systematic strategy for annotating implementations with *lin* and *vis* commands (§3) can fail to prove consistency of future-dependent operations. However, we have yet to observe any practical occurrence of such exotic objects; our strategy is sufficient for verifying the weakly-consistent algorithms implemented in the Java development kit. As a theoretical curiosity for future work, it would be interesting to investigate complete annotation strategies, e.g., for restricted classes of data types and/or implementations.

Finally, while civl's high degree of automation facilitated rapid prototyping of our simulation proofs, its underlying foundation using Owicki-Gries style proof rules limits the potential for modular reasoning. In particular, while our weak-consistency proofs are thread-modular, our invariants and intermediate assertions necessarily talk about state shared among multiple threads. Since our simulation-based methodology and annotations are completely orthogonal to the underlying program logic, it would be interesting future work to apply our methodology using expressive logics like Rely-Guarantee, e.g. [19, 38], or variations of Concurrent Separation Logic, e.g. [28, 32, 34, 35, 4, 20]. It remains to be seen to what degree increased modularity may sacrifice automation in the application of our weak-consistency proof methodology.

**Acknowledgments** This material is based upon work supported by the National Science Foundation under Grant No. 1816936, and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 678177).

# **A Appendix: Proofs of Theorems and Lemmas**

**Lemma 1.** *The abstract executions* E(W) *of a specification* W *are consistent with* W*.*

*Proof.* Any complete, sequential, and absolute execution is consistent by definition, since the labeling of its linearization is taken from the sequential specification. Then, any happens-before weakening is consistent for exactly the same reason as its source execution, since its linearization and visibility projection are both identical. Finally, any visibility weakening is consistent by the condition of W-consistency in its definition.

**Lemma 2.** *A weak-visibility specification and its transition system have identical histories.*

*Proof.* It follows almost immediately that the abstract executions of ⟦W⟧ are identical to those of W, since ⟦W⟧'s state effectively records the abstract execution of a given AETS execution, and only enables those returns that are consistent with W. Since histories are the projections of abstract executions, the corresponding history sets are also identical.

**Theorem 1.** *A witnessing implementation* I *is consistent with a weak-visibility specification* W *if the transition system* ⟦W⟧ *of* W *simulates* I*.*

*Proof.* This follows from standard arguments, given that the corresponding SLTSs include ε transitions to ensure that every move of one system can be matched by stuttering from the other: since both systems synchronize on the call, ret, hb, lin, and vis actions, the simulation guarantees that every abstract execution, and thus history, of I is matched by one of ⟦W⟧. Then by Lemma 2, the histories of I are included in W.

**Theorem 3.** Rchm *is a simulation relation from* Ichm *to* ⟦W<sup>m</sup>⟧*.*

*Proof Sketch.* We show that every step of the implementation, i.e., an atomic section or a program command, is simulated by ⟦W<sup>m</sup>⟧. Given (q, e) ∈ Rchm, we consider the different implementation steps which are possible in q.

The case of commands corresponding to procedure calls of put and get is trivial. Executing a procedure call in q leads to a new state q′ which differs only by having a new active operation o. We have that e −call(o,\_,\_)→ e′ and (q′, e′) ∈ Rchm, where e′ is obtained from e by adding o with an appropriate value of *inv*(o) and an empty visibility.

The transition corresponding to the atomic section of put is labeled by a sequence of visibility actions (one for each linearized operation) followed by a linearization action. Let σ denote this sequence of actions. This transition leads to a state q′ where the array table may have changed (unless it writes the same value), and the history variable lin is extended with the put operation o executing this step. We define an abstract execution e′ from e by changing *lin* to the new value of lin, and defining an absolute visibility for o. We have that e −σ→ e′ because e′ is consistent with W<sup>m</sup>. Also, (q′, e′) ∈ Rchm because the validity of (3), (4), and (5) follows directly from the definition of e′. The atomic section of get can be handled in a similar way. The simulation of return actions of get operations is a direct consequence of point (6), which ensures consistency with S.

For has, we focus on the atomic sections containing vis commands and the linearization commands (the other internal steps are simulated by steps of ⟦W<sup>m</sup>⟧, and the simulation of the return step follows directly from (7), which justifies the consistency of the return value). The atomic section around the procedure call corresponds to a transition labeled by a sequence σ of visibility actions (one for each linearized modifier) and leads to a state q′ with a new active has operation o (compared to q). We have that e −σ→ e′ because e′ is consistent with W<sup>m</sup>. Indeed, the visibility of o in e′ is not constrained since o has not been linearized, and the W<sup>m</sup>-consistency of e′ follows from the W<sup>m</sup>-consistency of e. Also, (q′, e′) ∈ Rchm because *index*(o) = −1 and (7) is clearly valid. The atomic section around the read of table[k] is simulated by ⟦W<sup>m</sup>⟧ in a similar way, noticing that (7) models precisely the effect of the visibility commands inside this atomic section. For the simulation of the linearization commands, it is important to notice that any active has operation in e′ has a visibility that contains all modifiers which returned before it was called and, as explained above, this visibility is monotonic.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Local Reasoning for Global Graph Properties

Siddharth Krishna<sup>1</sup>, Alexander J. Summers<sup>2</sup>, and Thomas Wies<sup>1</sup>

<sup>1</sup> New York University, New York, NY, USA, {siddharth,wies}@cs.nyu.edu
<sup>2</sup> ETH Zürich, Zurich, Switzerland, alexander.summers@inf.ethz.ch

Abstract. Separation logics are widely used for verifying programs that manipulate complex heap-based data structures. These logics build on so-called *separation algebras*, which allow expressing properties of heap regions such that modifications to a region do not invalidate properties stated about the remainder of the heap. This concept is key to enabling modular reasoning and also extends to concurrency. While heaps are naturally related to mathematical graphs, many ubiquitous graph properties are non-local in character, such as reachability between nodes, path lengths, acyclicity and other structural invariants, as well as data invariants which combine with these notions. Reasoning modularly about such graph properties remains notoriously difficult, since a local modification can have side-effects on a global property that cannot be easily confined to a small region.

In this paper, we address the question: What separation algebra can be used to avoid proof arguments reverting back to tedious global reasoning in such cases? To this end, we consider a general class of global graph properties expressed as fixpoints of algebraic equations over graphs. We present mathematical foundations for reasoning about this class of properties, imposing minimal requirements on the underlying theory that allow us to define a suitable separation algebra. Building on this theory, we develop a general proof technique for modular reasoning about global graph properties expressed over program heaps, in a way which can be directly integrated with existing separation logics. To demonstrate our approach, we present local proofs for two challenging examples: a priority inheritance protocol and the non-blocking concurrent Harris list.

# 1 Introduction

Separation logic (SL) [31,37] provides the basis of many successful verification tools that can verify programs manipulating complex data structures [1, 4, 17, 29]. This success is due to the logic's support for reasoning modularly about modifications to heap-based data. For simple inductive data structures such as lists and trees, much of this reasoning can be automated [2, 11, 20, 33]. However, these techniques often fail when data structures are less regular (e.g. multiple overlaid data structures) or provide multiple traversal patterns (e.g. threaded trees). Such idioms are prevalent in real-world implementations such as the fine-grained concurrent data structures found in operating systems and databases. Solutions to these problems have been proposed [14] but remain difficult to automate. For proofs of general graph algorithms, the situation is even more dire. Despite substantial improvements in the verification methodology for such algorithms [35, 38], significant parts of the proof argument still typically need to be carried out using non-local reasoning [7, 8, 13, 25]. This paper presents a general technique for local reasoning about global graph properties that can be used within off-the-shelf separation logics. We demonstrate our technique on two challenging examples: one for which no fully local proof existed before, and one whose previous proof required a tailor-made logic.

Fig. 1: Pseudocode of the PIP and a state of the protocol data structure. Round nodes represent processes and rectangular nodes resources. Nodes are marked with their default priorities def\_prio as well as the aggregate priority multiset prios. A node's current priority curr\_prio is underlined and marked in bold blue.

As a motivating example, we consider an idealized priority inheritance protocol (PIP), a technique used in process scheduling [39]. The purpose of the protocol is to avoid *priority inversion*, i.e. a situation where a low-priority process causes a high-priority process to be blocked. The protocol maintains a bipartite graph with nodes representing processes and resources. An example graph is shown in Fig. 1. An edge from a process p to a resource r indicates that p is waiting for r to be available, whereas an edge in the other direction means that r is currently held by p. Every node has an associated *default* priority and a *current* priority; both are natural numbers. The current priority is used for scheduling processes. When a process attempts to acquire a resource currently held by another process, the graph is updated to avoid priority inversion. For example, when process p<sup>1</sup> with current priority 3 attempts to acquire the resource r<sup>1</sup> held by process p<sup>2</sup> of priority 1, p<sup>1</sup>'s higher priority is propagated to p<sup>2</sup> and, transitively, to any other process that p<sup>2</sup> is waiting for (p<sup>3</sup> in this case). As a result, all nodes on the created cycle<sup>3</sup> will get current priority 3. The protocol maintains the following *invariant*: the current priority of each node is the maximum of its default priority and the current priorities of all its predecessors. Priority propagation is implemented by the method update shown in Fig. 1. The implementation represents graph edges by next pointers and handles both adding an edge (acquire) and removing one (release; code omitted). To recalculate the current priority of a node (line 12), each node maintains its default priority def\_prio and a multiset prios which contains the priorities of all its immediate predecessors.
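To make the propagation concrete, the following Python sketch models the protocol described above. This is our illustration, not the paper's Fig. 1 pseudocode: the field names (def\_prio, prios, next) follow the paper, but the Node class, the update loop, and the acquire helper are simplifying assumptions; multisets are modeled with collections.Counter.

```python
from collections import Counter

class Node:
    """A PIP graph node (process or resource); illustrative sketch only."""
    def __init__(self, def_prio):
        self.def_prio = def_prio
        self.prios = Counter()  # multiset of predecessors' current priorities
        self.next = None        # outgoing edge: resource waited for / holding process

    def curr_prio(self):
        # PIP invariant: max of default priority and all predecessors' priorities
        return max([self.def_prio] + list(self.prios.elements()))

def update(n, removed, added):
    # Replace one incoming priority at n and propagate the change along next edges.
    while n is not None:
        old = n.curr_prio()
        if removed is not None:
            n.prios[removed] -= 1
            if n.prios[removed] <= 0:
                del n.prios[removed]
        if added is not None:
            n.prios[added] += 1
        new = n.curr_prio()
        if new == old:
            break  # no observable change: propagation stops (this also ends cycles)
        n, removed, added = n.next, old, new

def acquire(p, r):
    # Process p tries to acquire resource r.
    if r.next is None:          # r is free: add edge r -> p
        r.next = p
        update(p, None, r.curr_prio())
    else:                       # r is held: p waits, add edge p -> r
        p.next = r
        update(r, None, p.curr_prio())
```

Note that the loop terminates even on the cycle mentioned in the text, because propagation stops as soon as a node's current priority is unchanged.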

Verifying that the PIP maintains its invariant using established separation logic (SL) techniques is challenging. In general, SL assertions describe resources and express the fact that the program has permission to access and manipulate these resources. In what

<sup>3</sup> The cycle can be used to detect/handle a deadlock; this is not the concern of this data structure.

follows, we stick to the standard model of SL where resources are memory regions represented as partial heaps. We sometimes view partial heaps more abstractly as partial graphs (hereafter, simply graphs). Assertions describing larger regions are built from smaller ones using *separating conjunction*, φ<sup>1</sup> ∗ φ<sup>2</sup>. Semantically, the ∗ operator is tied to a notion of resource composition defined by an underlying *separation algebra* [5, 6]. In the standard model, composition enforces that φ<sup>1</sup> and φ<sup>2</sup> must describe disjoint regions. The logic and algebra are set up so that changes to the region φ<sup>1</sup> do not affect φ<sup>2</sup> (and vice versa). That is, if φ<sup>1</sup> ∗ φ<sup>2</sup> holds before the modification and φ<sup>1</sup> is changed to φ<sup>1</sup>′, then φ<sup>1</sup>′ ∗ φ<sup>2</sup> holds afterwards. This so-called *frame rule* enables modular reasoning about modifications to the heap and extends well to the concurrent setting when threads operate on disjoint portions of memory [3, 9, 10, 36]. However, the mere fact that φ<sup>2</sup> is preserved by modifications to φ<sup>1</sup> does not guarantee that if a global property such as the PIP invariant holds for φ<sup>1</sup> ∗ φ<sup>2</sup>, it also still holds for φ<sup>1</sup>′ ∗ φ<sup>2</sup>.

For example, consider the PIP scenario depicted in Fig. 1. If φ<sup>1</sup> describes the subgraph containing only node p<sup>1</sup>, φ<sup>2</sup> the remainder of the graph, and φ<sup>1</sup>′ the graph obtained from φ<sup>1</sup> by adding the edge from p<sup>1</sup> to r<sup>1</sup>, then the PIP invariant will no longer hold for the new composed graph described by φ<sup>1</sup>′ ∗ φ<sup>2</sup>. On the other hand, if φ<sup>1</sup> captures p<sup>1</sup> and the nodes reachable from r<sup>1</sup> (i.e., the set of nodes modified by update), φ<sup>2</sup> the remainder of the graph, and we reestablish the PIP invariant locally in φ<sup>1</sup> obtaining φ<sup>1</sup>′ (i.e., run update to completion), then φ<sup>1</sup>′ ∗ φ<sup>2</sup> will also globally satisfy the PIP invariant. The separating conjunction ∗ is not sufficient to differentiate these two cases; both describe valid partitions of a possible program heap. As a consequence, prior techniques have to revert back to non-local reasoning to prove that the invariant is maintained.

A first helpful idea towards a solution to this problem is that of *iterated separating conjunction* [30, 44], which describes a graph G consisting of a set of nodes X by a formula Ψ = ∗<sup>x∈X</sup> N(x), where N(x) is some predicate that holds locally for every node x ∈ X. Using such node-local conditions one can naturally express non-inductive properties of graphs (e.g. *"*G *has no outgoing edges"* or *"*G *is bipartite"*). The advantages of this style of specification are two-fold. First, one can arbitrarily decompose and recompose Ψ by splitting X into disjoint subsets. For example, if X is partitioned into X<sup>1</sup> and X<sup>2</sup>, then Ψ is equivalent to ∗<sup>x∈X1</sup> N(x) ∗ ∗<sup>x∈X2</sup> N(x). Moreover, it is very easy to prove that Ψ is preserved under modifications of subgraphs. For instance, if a program modifies the subgraph induced by X<sup>1</sup> such that ∗<sup>x∈X1</sup> N(x) is preserved locally, then the frame rule guarantees that Ψ will be preserved in the new larger graph. Iterated separating conjunction thus yields a simple proof technique for local reasoning about graph properties that can be described in terms of node-local conditions. However, this idea alone does not actually solve our problem because general global graph properties such as *"*G *is a directed acyclic graph"*, *"*G *is an overlay of multiple trees"*, or *"*G *satisfies the PIP invariant"* cannot be directly described via node-local conditions.

*Solution.* The key ingredient of our approach is the concept of a *flow* of a graph: a function fl from the nodes of the graph to *flow values*. For the PIP, the flow maps each node to the multiset of its incoming priorities. In general, a flow is a fixpoint of a set of algebraic equations induced by the graph. These equations are defined over a *flow domain*, which determines how flow values are propagated along the edges of the graph and how they are aggregated at each node. In the PIP example, an edge between nodes (n, n′) propagates the multiset containing max(fl(n), n.def\_prio) from n to n′. The multisets arriving at n′ are aggregated with multiset union to obtain fl(n′). Flows enable capturing global graph properties in terms of node-local conditions. For example, the PIP invariant can be expressed by the following node-local condition: n.curr\_prio = max(fl(n), n.def\_prio). To enable compositional reasoning about such properties we need an appropriate separation algebra allowing us to prove locally that modifications to a subgraph do not affect the flow of the remainder of the graph.

To this end, we make the useful observation that a separation algebra induces a notion of an *interface of a resource*: we say that two resources a and a′ are equivalent if they compose with the same resources. The interface of a resource a could then be defined as a's equivalence class, but more succinct and simpler representations may be possible. In the standard model of SL where resources are graphs and composition is disjoint graph union, the interface of a graph G is the set of all graphs G′ that have the same domain as G; in this model, a graph's domain could be defined to be its interface.

The interfaces of resources described by assertions capture the information that is implicitly communicated when these assertions are conjoined by separating conjunction. As we discussed earlier, in the standard model of SL, this information is too weak to enable local reasoning about global properties of the composed graphs because some additional information about the subgraphs' structure other than which nodes they contain must be communicated. For instance, if the goal is to verify the PIP invariant, the interfaces must capture information about the multisets of priorities propagated between the subgraphs. We define a separation algebra achieving exactly this: the induced *flow interface* of a graph G in this separation algebra captures how values of the flow domain must enter and leave G such that, when composed with a compatible graph G′, the imposed local conditions on the flow of each node are satisfied in the composite graph.

This is the key to enabling SL-style framing for global graph properties. Using iterated separating conjunctions over the new separation algebra, we obtain a compositional proof technique that yields succinct proofs of programs such as the PIP, whose proofs with existing techniques would involve non-trivial global reasoning steps.

*Contributions.* In §2, we present mathematical foundations for flow domains, imposing the minimal requirements on the underlying algebra that allow us to capture a broad range of data structure invariants and graph properties and reason locally about them in a suitable separation algebra. Building on this theory we develop a general proof technique for modular reasoning about global graph properties that can be integrated with existing separation logics (§3). We further identify general mathematical conditions that can be used when desired to guarantee unique flows, and provide local proof arguments to check the preservation of these conditions (§4). We demonstrate the versatility of our approach by presenting local proofs for two challenging examples: the PIP and the concurrent non-blocking list due to Harris [12].

*Flows Redesigned.* Our work is inspired by the recent flow framework explored by some of the authors [22], but was redesigned from the ground up. We revisit the core algebra behind flow reasoning, and derive a different algebraic foundation by analysing the minimal requirements for general local reasoning; we call our newly-designed reasoning framework the *foundational flow framework*. Our new framework makes several significant improvements over [22] and eliminates its most stark limitations. We provide a detailed technical comparison with [22] and discuss other related work in §5.

# 2 The Foundational Flow Framework

In this section, we introduce the foundational flow framework, explaining the motivation for its design with respect to local reasoning principles. We aim for a general technique for modularly proving the preservation of recursively-defined invariants over (partial) graphs, with well-defined decomposition and composition operations.

# 2.1 Preliminaries and Notation

The term (b ? t<sup>1</sup> : t<sup>2</sup>) denotes t<sup>1</sup> if condition b holds and t<sup>2</sup> otherwise. We write f : A → B for a function from A to B, and f : A ⇀ B for a partial function from A to B. For a partial function f, we write f(x) = ⊥ if f is undefined at x. We use lambda notation (λx. E) to denote a function that maps x to the expression E (typically containing x). If f is a function from A to B, we write f[x ↦ y] to denote the function from A ∪ {x} defined by f[x ↦ y](z) := (z = x ? y : f(z)). We use {x<sup>1</sup> ↦ y<sup>1</sup>, ..., x<sup>n</sup> ↦ y<sup>n</sup>} for pairwise different x<sup>i</sup> to denote the function ∅[x<sup>1</sup> ↦ y<sup>1</sup>] ··· [x<sup>n</sup> ↦ y<sup>n</sup>], where ∅ is the function on an empty domain. Given functions f<sup>1</sup> : A<sup>1</sup> → B and f<sup>2</sup> : A<sup>2</sup> → B we write f<sup>1</sup> ⊎ f<sup>2</sup> for the function f : A<sup>1</sup> ⊎ A<sup>2</sup> → B that maps x ∈ A<sup>1</sup> to f<sup>1</sup>(x) and x ∈ A<sup>2</sup> to f<sup>2</sup>(x) (if A<sup>1</sup> and A<sup>2</sup> are not disjoint sets, f<sup>1</sup> ⊎ f<sup>2</sup> is undefined).

We write δ<sup>n=n′</sup> : M → M for the function defined by δ<sup>n=n′</sup>(m) := (n = n′ ? m : 0). We also write λ<sup>0</sup> := (λm. 0) for the identically zero function, λ<sup>id</sup> := (λm. m) for the identity function, and use e ≡ e′ to denote function equality. For e : M → M and m ∈ M we write m ▷ e to denote the function application e(m). We write e ◦ e′ to denote function composition, i.e. (e ◦ e′)(m) = e(e′(m)) for m ∈ M, and use superscript notation e<sup>p</sup> to denote the function composition of e with itself p times.

For multisets S, we use standard set notation when clear from the context. We write S(x) to denote the number of occurrences of x in S. We write {x<sup>1</sup> ↦ i<sup>1</sup>, ..., x<sup>n</sup> ↦ i<sup>n</sup>} for the multiset containing i<sup>1</sup> occurrences of x<sup>1</sup>, i<sup>2</sup> occurrences of x<sup>2</sup>, etc.

A *partial monoid* is a set M, along with a partial binary operation + : M × M ⇀ M, and a special zero element 0 ∈ M, such that (1) + is associative, i.e., (m<sup>1</sup> + m<sup>2</sup>) + m<sup>3</sup> = m<sup>1</sup> + (m<sup>2</sup> + m<sup>3</sup>); and (2) 0 is an identity, i.e., m + 0 = 0 + m = m. Here, = means either both sides are defined and equal, or both are undefined. We identify a partial monoid with its support set M. If + is a total function, then we call M a monoid. Let m<sup>1</sup>, m<sup>2</sup>, m<sup>3</sup> ∈ M be arbitrary elements of the (partial) monoid in the following. We call a (partial) monoid M *commutative* if + is commutative, i.e., m<sup>1</sup> + m<sup>2</sup> = m<sup>2</sup> + m<sup>1</sup>. Similarly, a commutative monoid M is *cancellative* if + is cancellative, i.e., if m<sup>1</sup> + m<sup>2</sup> = m<sup>1</sup> + m<sup>3</sup> is defined, then m<sup>2</sup> = m<sup>3</sup>.

A *separation algebra* [5] is a cancellative, partial, commutative monoid.
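The canonical instance of this definition is the separation algebra of partial heaps mentioned in §1. The following Python sketch (our illustration; heaps modeled as dicts) shows composition as union of disjoint partial maps, with the empty heap as the zero element and overlapping heaps left undefined:

```python
def compose(h1, h2):
    """Partial composition of heaps (dicts): defined only on disjoint domains."""
    if h1.keys() & h2.keys():
        return None  # undefined: overlapping resources do not compose
    return {**h1, **h2}

# Checking the separation-algebra axioms on sample heaps:
h1, h2, h3 = {1: 'a'}, {2: 'b'}, {3: 'c'}
assert compose(h1, h2) == compose(h2, h1)                            # commutative
assert compose(compose(h1, h2), h3) == compose(h1, compose(h2, h3))  # associative
assert compose(h1, {}) == h1                                         # {} is the zero
assert compose(h1, {1: 'x'}) is None                                 # overlap: undefined
# Cancellativity holds as well: if compose(h, h2) == compose(h, h3) is
# defined, the disjointness of domains forces h2 == h3.
```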

# 2.2 Flows

Recursive properties of graphs naturally depend on non-local information; e.g. we cannot express that a graph is acyclic directly as a conjunction of per-node invariants. Our foundational flow framework defines *flow values* at each node that capture non-local graph properties, and enables local specification and reasoning about such properties. Flow values are drawn from a *flow domain*, an algebraic structure which also specifies the operations used to define a flow via recursive computations over the graph. Our entire theory is parametric with the choice of a flow domain, whose components will be explained and motivated in the rest of this section.

Definition 1 (Flow Domain). *A* flow domain (M, +, 0, E) *consists of a commutative cancellative (total) monoid* (M, +, 0) *and a set of* edge functions E ⊆ M → M*.*

*Example 1.* The *path-counting* flow domain is (N, <sup>+</sup>, <sup>0</sup>, {λid, λ0}), consisting of the monoid of natural numbers under addition and the set of edge functions containing only the identity function and the zero function. This can be used to define a flow where the values at each node represent the number of paths to this node from a distinguished node n. Path-counting provides enough information to express locally per node that e.g. (a) all nodes are reachable from n (all path counts are non-zero), or (b) that the graph forms a tree rooted at n (all path counts are exactly 1).
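The path-counting flow can be sketched in a few lines of Python (our illustration, not the paper's artifact): inflow is 1 at the distinguished node n and 0 elsewhere, edge functions are the identity where an edge exists and the zero function otherwise, and the flow is found by fixpoint iteration.

```python
def path_count_flow(nodes, edges, root, max_rounds=100):
    """Compute the path-counting flow of a graph by iterating the flow equation.

    edges: a set of (src, dst) pairs; the edge function is the identity
    where an edge exists and the zero function otherwise.
    """
    inflow = {n: 1 if n == root else 0 for n in nodes}
    fl = dict(inflow)
    for _ in range(max_rounds):
        new = {n: inflow[n] + sum(fl[m] for (m, k) in edges if k == n)
               for n in nodes}
        if new == fl:
            return fl  # fixpoint: fl(n) = number of paths from root to n
        fl = new
    raise ValueError("no fixpoint reached; the graph may have a cycle")

# Diamond graph: two distinct paths from a to d, so the flow at d is 2.
fl = path_count_flow({'a', 'b', 'c', 'd'},
                     {('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')}, 'a')
assert fl == {'a': 1, 'b': 1, 'c': 1, 'd': 2}
# Node-local checks: all counts non-zero means every node is reachable from a;
# all counts exactly 1 would mean the graph is a tree rooted at a.
```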

*Example 2.* We use (N<sup>N</sup>, <sup>∪</sup>, <sup>∅</sup>, {λ0}∪{(λm. {max(<sup>m</sup> ∪ {p})}) <sup>|</sup> <sup>p</sup>∈N}) as flow domain for the PIP example (Figure 1). This consists of the monoid of multisets of natural numbers under multiset union and two kinds of edge functions: λ<sup>0</sup> and functions mapping a multiset m to the singleton multiset containing the maximum value between m and a fixed value p (used to represent a node's default priority). This can define a flow which locally captures the appropriate current node priorities as the graph is modified.

Further definitions in this section assume a fixed flow domain (M, +, 0, E) and a (potentially infinite) set of nodes N. For this section, we abstract heaps using directed partial graphs; integration of our graph reasoning with direct proofs over program heaps is explained in §3.

Definition 2 (Graph). *A* (partial) graph G = (N,e) *consists of a finite set of nodes* <sup>N</sup> <sup>⊆</sup> <sup>N</sup> *and a mapping from pairs of nodes to edge functions* <sup>e</sup>: <sup>N</sup> <sup>×</sup> <sup>N</sup> <sup>→</sup> <sup>E</sup>*.*

*Flow Values and Flows.* Flow values (taken from M; the first element of a flow domain) are used to capture sufficient information to express desired non-local properties of a graph. In Example 1, flow values are non-negative integers; for the PIP (Example 2) we instead use *multisets* of integers, representing relevant *non-local* information: the priorities of nodes currently referencing a given node in the graph. Given such flow values, a node's correct priority can be defined locally per node in the graph. This definition requires only the *maximum* value of these multisets, but as we will see shortly these multisets enable local *recomputation* of a correct priority when the graph is changed.

For a graph G = (N, e) we express properties of G in terms of node-local conditions that may depend on the nodes' *flow*. A flow is a function fl : N → M assigning a flow value to every node; it must be some fixpoint of the following *flow equation*:

$$\forall n \in N. \; \mathit{fl}(n) = \mathit{in}(n) + \sum\_{n' \in N} \mathit{fl}(n') \rhd e(n', n) \tag{\mathsf{FlowEqn}}$$

Intuitively, one can think of the flow as being obtained by a fold computation over the graph:<sup>4</sup> the *inflow* in : N → M defines an initial flow at each node. This initial flow is then updated recursively for each node n: the current flow value at its predecessor nodes n′ is transferred to n via *edge functions* e(n′, n) : M → M. These flow values are aggregated using the *summation operation* + of the flow domain to obtain an updated flow of n; a flow for the graph is some fixpoint satisfying this equation at all nodes.<sup>5</sup>
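To illustrate how FlowEqn instantiates for a concrete flow domain, the following Python sketch (ours; not from the paper) checks the equation for the PIP flow domain of Example 2: flow values are multisets (modeled as Counter), + is multiset union, and edge functions are either λ0 or (λm. {max(m ∪ {d})}) for a default priority d.

```python
from collections import Counter

def edge_fn(d):
    # PIP edge function for a source node with default priority d:
    # maps a multiset m to the singleton multiset {max(m ∪ {d})}.
    return lambda m: Counter({max(list(m.elements()) + [d]): 1})

ZERO_FN = lambda m: Counter()  # λ0: models the absence of an edge

def satisfies_flow_eqn(nodes, e, inflow, fl):
    """Check FlowEqn: fl(n) = in(n) + Σ_{n'} fl(n') ▷ e(n', n)."""
    for n in nodes:
        total = Counter(inflow.get(n, Counter()))
        for m in nodes:
            total = total + e.get((m, n), ZERO_FN)(fl[m])
        if total != fl[n]:
            return False
    return True

# A process p with default priority 2 waiting for a resource r:
# the edge p -> r propagates {max(fl(p) ∪ {2})} = {2} into r's flow.
nodes = {'p', 'r'}
e = {('p', 'r'): edge_fn(2)}
fl = {'p': Counter(), 'r': Counter({2: 1})}
assert satisfies_flow_eqn(nodes, e, {}, fl)
```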

Definition 3 (Flow Graph). *A* flow graph H = (N, e, fl) *is a graph* (N, e) *together with a function* fl : N → M *such that there exists an* inflow in : N → M *satisfying* FlowEqn(in, e, fl)*.*

We let dom(H) = N, and sometimes identify H and dom(H) to ease notational burden. For n ∈ H we write H<sub>n</sub> for the singleton flow subgraph of H induced by n.

*Edge Functions.* In any flow graph, the flow value assigned to a node n by a flow is propagated to its neighbours n′ (and transitively onwards) according to the edge function e(n, n′) labelling the edge (n, n′). The edge function maps the flow value at the *source node* n to the value propagated on *this edge* to the *target node* n′. Note that we require such a labelling for *all* pairs consisting of a source node n inside the graph and a target node n′ ∈ 𝔑 (i.e., possibly outside the graph). The 0 flow value (the third element of our flow domains) is used to represent no flow; the corresponding constant zero *function* λ₀ = (λm. 0) is used as edge function to model the *absence* of an edge in the graph. The set of edge functions E from which this labelling is chosen can, apart from the requirement λ₀ ∈ E, be chosen as desired. As we will see in §4.4, restrictions to particular sets of edge functions E can be exploited to further strengthen our overall technique. Edge functions can depend on the local state of the source node (as in the following example); dependencies on state elsewhere in the graph must be represented via the node's flow.

*Example 3.* Consider the graph in Figure 1 and the flow domain from Example 2. We choose the edge functions to be λ₀ where no edge exists in the PIP structure, and otherwise (λm. {max(m ∪ {d})}), where d is the default priority of the source of the edge. For example, in Figure 1, e(r3, p2) = λ₀ and e(r3, p1) = (λm. {max(m ∪ {0})}). Since the flow value at r3 is {1, 2, 2}, the edge (r3, p1) propagates the value {2} to p1, correctly representing the current priority of r3.
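The edge function of Example 3 can be sketched concretely, modelling multisets as `collections.Counter` (a hypothetical encoding choice; the values mirror the example's node r3 with default priority 0 on the edge to p1):

```python
from collections import Counter

# Sketch of the PIP edge function (λm. {max(m ∪ {d})}) from Example 3,
# where d is the default priority of the edge's source node and multiset
# flow values are modelled as collections.Counter.
def pip_edge(d):
    return lambda m: Counter({max(list(m.elements()) + [d]): 1})

m_r3 = Counter({1: 1, 2: 2})   # flow value {1, 2, 2} at r3
out = pip_edge(0)(m_r3)        # edge (r3, p1), with default priority 0
# out == Counter({2: 1}): the singleton multiset {2} is propagated to p1
```

The propagated value is the source's current priority as a singleton multiset, matching the example's description.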

*Flow Aggregation and Inflows.* The flow value at a node is defined by the values propagated to it from each node in the graph via edge functions, along with an additional *inflow* value explained below. Since multiple non-zero flow values can be propagated to a node, we require an aggregation of these values via a binary + operator on flow values: the second element of our flow domains. The edges from which the aggregated values originate are unordered; thus, we require + to be commutative and associative, making this aggregation order-independent. The 0 flow value must act as a unit for +. For example, in the path-counting flow domain + is addition on natural numbers, while for the multisets employed for the PIP it is multiset union.

<sup>4</sup> We note that flows are not generally defined in this manner as we consider any fixpoint of the flow equation to be a flow. Nonetheless, the analogy helps to build an initial intuition.

<sup>5</sup> We discuss questions regarding the existence and uniqueness of such fixpoints in §4.

Each node in a flow graph has an *inflow*, modelling contributions to its flow value which do *not* come from inside the graph. Inflows play two important roles: first, since our graphs are partial, they model contributions from nodes *outside of the graph*. Second, inflow can be artificially added as a means of specialising the computation of flow values to characterise specific graph properties. For example, in the path-counting domain, we give an inflow of 1 to the node from which we are counting paths, and 0 to all others.

*Example 4.* Let the edges in the graph in Figure 1 be labelled as described in Example 3. If the inflow function in assigns the empty multiset to every node n and we let fl(n) be the multiset labelling every node in the figure, then FlowEqn(in, e, fl) holds.

The flow equation (FlowEqn) defines the flow of a node n to be the aggregation of flow values coming from other nodes n′ inside the graph (as given by the respective edge functions e(n′, n)) as well as the inflow in(n). Preserving solutions to this equation across updates to the graph structure is a fundamental goal of our technique. The following lemma (which relies on the fact that + is required to be cancellative) states that any correct flow values uniquely determine appropriate inflow values:

Lemma 1. *Given a flow graph* (N, e, fl)*, there exists a unique inflow* in *such that* FlowEqn(in, e, fl)*.*

We now turn to how solutions of the flow equation can be preserved or appropriately updated under *changes* to the underlying graph.

*Graph Updates and Cancellativity.* Given a flow graph with known flow and inflow values, suppose we *remove* an edge from n1 to n2 (replacing its edge function with λ₀). For the same inflow, such an update will potentially affect the flow at n2 and at nodes to which n2 (transitively) propagates flow. Starting from the simple case that n2 has no outgoing edges, we need to recompute a suitable flow at n2. Knowing the old flow value (say, m) and the contribution m′ = fl(n1) ▷ e(n1, n2) *previously* provided along the removed edge, we know that the correct new flow value is some m″ such that m″ + m′ = m. This constraint has a unique solution (and thus, we can unambiguously recompute a new flow value) exactly when the aggregation + is *cancellative*; we therefore make cancellativity a *requirement* on the + of any flow domain.

Cancellativity intuitively enforces that the flow domain carries enough information to enable adaptation to local updates (in particular, removal of edges6). Returning to the PIP example, cancellativity requires us to carry multisets as flow values rather than only the maximum priority value: + cannot be the maximum operation, as this would not be cancellative. The resulting multisets (like the prio fields in the actual code) provide the information necessary to recompute corrected priority values locally.

For example, in the PIP graph shown in Figure 1, removing the edge from p6 to r4 would not affect the current priority of r4, whereas if p7 had current priority 1 instead of 2, then the current priority of r4 would have to decrease. In either case, recomputing the flow value for r4 is simply a matter of subtraction (removing {2} from the multiset at r4); cancellativity guarantees that our flow domains will always provide the information

<sup>6</sup> As we will show in §2.3, an analogous problem for composition of flow graphs is also directly solved by this choice to force aggregation to be cancellative.

needed for this recomputation. Without this property, the recomputation of a flow value for the target node n<sup>2</sup> would, in general, entail recomputing the incoming flow values from all remaining edges from scratch. Cancellativity is also crucial for Lemma 1 above, forcing uniqueness of inflows, given known flow values in a flow graph. This allows us to define natural but powerful notions of flow graph decomposition and recomposition.
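The subtraction-based recomputation can be sketched as follows, again modelling multisets as `collections.Counter` (the concrete flow value for r4 is illustrative, not taken from Figure 1):

```python
from collections import Counter

# Sketch: because multiset union (+) is cancellative, removing an edge's
# old contribution from a node's flow value is a purely local subtraction.
def remove_contribution(old_flow, contribution):
    new_flow = old_flow - contribution        # Counter difference
    # Cancellativity: new_flow is the unique m'' with m'' + m' = m,
    # assuming the contribution really was part of the old flow value.
    assert new_flow + contribution == old_flow
    return new_flow

fl_r4 = Counter({1: 1, 2: 2})                 # illustrative flow {1, 2, 2}
fl_after = remove_contribution(fl_r4, Counter({2: 1}))
# fl_after == Counter({1: 1, 2: 1}): the multiset {1, 2}
```

Had we kept only the maximum priority instead of the multiset, the two occurrences of 2 would be indistinguishable and this local subtraction would be impossible, which is exactly why max is ruled out as the aggregation operator.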

### 2.3 Flow Graph Composition and Abstraction

Building towards the core of our reasoning technique, we now turn to the question of decomposition and recomposition of flow graphs. Two flow graphs with disjoint domains always compose to a graph, but this will be a *flow graph* only if their flows are chosen consistently to admit a solution to the resulting flow equation (i.e. the flow graph composition operator defined below is *partial*).

Definition 4 (Flow Graph Algebra). *The* flow graph algebra (FG, ⊙, H∅) *for the flow domain* (M, +, 0, E) *is defined by*

$$\begin{aligned} \mathsf{FG} &:= \left\{ (N, e, \mathit{fl}) \mid (N, e, \mathit{fl}) \text{ is a flow graph} \right\}, \qquad H_{\emptyset} := (\emptyset, e_{\emptyset}, \mathit{fl}_{\emptyset}),\\ (N_1, e_1, \mathit{fl}_1) \odot (N_2, e_2, \mathit{fl}_2) &:= \begin{cases} (N_1 \uplus N_2,\ e_1 \uplus e_2,\ \mathit{fl}_1 \uplus \mathit{fl}_2) & \text{if } (N_1 \uplus N_2,\ e_1 \uplus e_2,\ \mathit{fl}_1 \uplus \mathit{fl}_2) \in \mathsf{FG} \\ \bot & \text{otherwise} \end{cases} \end{aligned}$$

*where* e∅ *and* fl∅ *are the edge functions and flow on the empty set of nodes* N = ∅*.*

Intuitively, two flow graphs compose to a flow graph if their contributions to each other's flow (along edges from one to the other) are reflected in the corresponding inflow of the other graph. For example, consider the subgraph from Figure 1 consisting of the single node p7 (with 0 inflow). This will compose with the remainder of the graph depicted only if this remainder subgraph has an inflow which, at node r4, includes at least the multiset {2}, reflecting the propagated value from p7.

We use this intuition to extract an *abstraction* of flow graphs which we call *flow interfaces*. Given a flow (sub)graph, its flow interface consists of the node-wise inflow and *outflow* (the flow contributions its nodes make to all nodes outside of the graph, defined below). It is thus an abstraction that hides the flow values and edges that are wholly *inside* the flow graph. Flow graphs that have the same flow interface "look the same" to the external graph, as the same values are propagated inwards and outwards.

Definition 5 (Flow Interface). *For a given flow domain* M*, a* flow interface *is a pair* I = (in, out) *where* in : N → M *and* out : 𝔑 \ N → M *for some* N ⊆ 𝔑*.*

We write I.in, I.out for the two components of the interface I = (in, out). We will again sometimes identify I and dom(I.in) to ease notational burden.

Given a flow graph H ∈ FG, we can compute its interface as follows. Recall that Lemma 1 implies that any flow graph has a unique inflow. Thus, we can define an inflow function that maps each flow graph H = (N, e, fl) to the unique inflow inf(H) : H → M such that FlowEqn(inf(H), e, fl). Dually, we define the *outflow* of H as the function outf(H) : 𝔑 \ N → M given by $\mathsf{outf}(H)(n) := \sum_{n' \in N} \mathit{fl}(n') \rhd e(n', n)$. The *flow interface of* H, written int(H), is the pair (inf(H), outf(H)) consisting of its inflow

and its outflow. Returning to the previous example, if H is the singleton subgraph consisting of node p7 from Figure 1 with flow and edges as depicted, then int(H) = (λn. ∅, λn. (n = r4 ? {2} : ∅)).
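The interface computation can be sketched for the path-counting domain, where + is addition on naturals and cancellation is ordinary subtraction (the `interface` helper and the diamond-shaped example graph are hypothetical):

```python
# Sketch: computing int(H) = (inflow, outflow) of a subgraph in the
# path-counting flow domain. By Lemma 1, the inflow is unique, and
# cancellativity lets us recover it by subtracting internal contributions.
def interface(nodes, exterior, edge, fl):
    contrib = lambda n: sum(edge(np, n)(fl[np]) for np in nodes)
    inflow = {n: fl[n] - contrib(n) for n in nodes}    # unique, by Lemma 1
    outflow = {n: contrib(n) for n in exterior}        # flow sent outside
    return inflow, outflow

# Diamond graph r -> a, r -> b, a -> c, b -> c, with flow = path counts from r.
edges = {("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")}
edge = lambda s, t: (lambda m: m) if (s, t) in edges else (lambda m: 0)
fl = {"r": 1, "a": 1, "b": 1, "c": 2}
# Interface of the subgraph {a, c}, whose exterior contains r and b:
inf_ac, outf_ac = interface(["a", "c"], ["r", "b"],
                            edge, {n: fl[n] for n in ("a", "c")})
# inf_ac records the path counts {a, c} receives from outside (1 each, via
# the edges r -> a and b -> c); outf_ac is zero, since {a, c} has no edge out.
```

Note how the interface hides the internal edge a → c entirely: only the inflow into {a, c} and its (here empty) outflow remain visible.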

This abstraction, while simple, turns out to be powerful enough to build a separation algebra over our flow graphs, allowing them to be decomposed, locally modified and recomposed in ways yielding all the local reasoning benefits of separation logics. In particular, for graph operations within a subgraph with a certain interface, we need to prove: (a) that the modified subgraph is still a flow graph (by checking that the flow equation still has a solution locally in the subgraph) and (b) that it satisfies the same interface (in other words, the effect of the modification on the flow is contained within the subgraph); the meta-level results for our technique then justify that we can recompose the modified subgraph with any graph that the original could be composed with.

We define the corresponding *flow interface algebra* as follows:

Definition 6 (Flow Interface Algebra). *For a given flow domain* M*, the* flow interface algebra *over* M *is defined to be* (FI, ⊕, I∅)*, where:*

$$\begin{aligned} \mathsf{FI} &:= \left\{ I \mid I \text{ is a flow interface} \right\}, \qquad I_{\emptyset} := \mathsf{int}(H_{\emptyset}),\\ I_1 \oplus I_2 &:= \begin{cases} I & \text{if } I_1 \cap I_2 = \emptyset \\ & \ \wedge\ \forall i \neq j \in \{1, 2\},\, n \in I_i.\ I_i.\mathsf{in}(n) = I.\mathsf{in}(n) + I_j.\mathsf{out}(n) \\ & \ \wedge\ \forall n \notin I.\ I.\mathsf{out}(n) = I_1.\mathsf{out}(n) + I_2.\mathsf{out}(n) \\ \bot & \text{otherwise} \end{cases} \end{aligned}$$

Flow interface composition is well-defined because of cancellativity of the underlying flow domain (it is also, exactly as flow graph composition, partial). We next show the key result for this abstraction: the ability for two flow graphs to compose depends only on their interfaces; flow interfaces implicitly define a congruence relation on flow graphs.

$$\text{Lemma 2. }\text{int}(H\_1) = I\_1 \land \text{int}(H\_2) = I\_2 \Rightarrow \text{int}(H\_1 \odot H\_2) = I\_1 \oplus I\_2.$$
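Lemma 2 can be illustrated numerically in the path-counting domain. The sketch below (hypothetical helper `oplus`; `None` stands for ⊥) composes the interfaces of the two halves of a diamond-shaped graph r → {a, b} → c, solving the first condition of Definition 6 for the composite inflow by cancellation:

```python
# Sketch of interface composition (Definition 6) for the path-counting
# domain: natural-number flow values, + = addition. Returns None for ⊥.
def oplus(i1, i2):
    in1, out1 = i1
    in2, out2 = i2
    if set(in1) & set(in2):                  # domains must be disjoint
        return None
    comp_in = {}
    for ins, other_out in ((in1, out2), (in2, out1)):
        for n, v in ins.items():
            # Condition: I_i.in(n) = I.in(n) + I_j.out(n);
            # cancellativity lets us solve for I.in(n) by subtraction.
            r = v - other_out.get(n, 0)
            if r < 0:                        # cannot cancel: no composite
                return None
            comp_in[n] = r
    dom = set(comp_in)
    comp_out = {n: out1.get(n, 0) + out2.get(n, 0)
                for n in set(out1) | set(out2) if n not in dom}
    return comp_in, comp_out

# Interfaces of the two halves of a diamond graph r -> {a, b} -> c, with
# path-counting flow from r (illustrative values for this running example):
i_rb = ({"r": 1, "b": 0}, {"a": 1, "c": 1})  # subgraph {r, b}
i_ac = ({"a": 1, "c": 1}, {"r": 0, "b": 0})  # subgraph {a, c}
composite = oplus(i_rb, i_ac)
# composite == ({"r": 1, "b": 0, "a": 0, "c": 0}, {}): a closed graph with
# inflow 1 at r only, matching int of the whole graph as Lemma 2 predicts.
```

Replacing either half with any flow graph exposing the same interface would leave `composite` unchanged, which is the congruence property the lemma expresses.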

Crucially, the following result shows that we can use our flow interfaces as an abstraction directly compatible with existing separation logics.

Theorem 1. *The flow interface algebra* (FI, ⊕, I∅) *is a separation algebra.*

This result forms the core of our reasoning technique; it enables us to make modifications within a chosen subgraph and, by proving preservation of its interface, know that the result composes with any context exactly as the original did. Flow interfaces capture precisely the information relevant about a flow graph, with respect to composition with other flow graphs. In Appendix B of the accompanying technical report (hereafter, TR) [23] we provide additional examples of flow domains that demonstrate the range of data structures and graph properties that can be expressed using flows, including a notion of *universal flow* that in a sense provides a completeness result for the expressivity of the framework. We now turn to constructing proofs atop these new reasoning principles.

# 3 Proof Technique

This section shows how to integrate flow reasoning into a standard separation logic, using the priority inheritance protocol (PIP) algorithm to illustrate our proof techniques.

Since flow graphs and flow interfaces form separation algebras, it is possible in principle to define a separation logic (SL) using these notions as a custom *semantic model* (indeed, this is the proof approach taken in [22]). By contrast, we integrate flow interfaces with a *standard* separation logic without modifying its semantics. This has the important technical advantage that our proof technique can be naturally integrated with existing separation logics and verification tools supporting SL-style reasoning. We consider a standard *sequential* SL in this section, but our technique can also be directly integrated with a concurrent SL such as RGSep (as we show in §4.5) or frameworks such as Iris [18] supporting (ghost) resources ranging over user-defined separation algebras.

#### 3.1 Encoding Flow-based Proofs in SL

Proofs using our flow framework can employ a combination of specifications enforced at the node level and in terms of the flow graphs and interfaces corresponding to larger heap regions such as entire data structures (henceforth, *composite graphs* and *composite interfaces*). At the node level, we write invariants that every node is intended to satisfy, typically relating the node's flow value to its local state (fields). For example, in the PIP, we use node-local invariants to express that a node's current priority is the maximum of the node's default priority and those in its current flow value. We typically express such specifications in terms of *singleton (flow) graphs*, and their *singleton interfaces*.

Specification in terms of *composite* interfaces has several important purposes. One is to define custom inflows: e.g. in the path-counting flow domain, specifying that the inflow of a composite interface is 1 at some designated node r and 0 elsewhere enforces in any underlying flow graph that each node n's flow value will be the number of paths from r to n.<sup>7</sup> Composite interfaces can also be used to express that, in two states of execution, a portion of the heap "looks the same" with respect to composition (it has the same interface, and so can be composed with the same flow graphs), or to capture by *how much* there is an observable difference in inflow or outflow; we employ this idea in the PIP proof below.

We now define an assertion syntax convenient for capturing both node-level and composite-level constraints, defined within an SL-style proof system. We assume an *intuitionistic*, *garbage-collected* SL [6] with standard syntax and semantics;<sup>8</sup> see Appendix A of the TR [23] for more details.

*Node Predicates.* The basic building block of our flow-based specifications is a node predicate N(x, H), representing ownership of the fields of a single node x, as well as

<sup>7</sup> Note that the analogous property cannot be captured at the node level; when considering singleton interfaces per node in a tree rooted at r, *every* singleton interface has an inflow of 1.

<sup>8</sup> As <sup>P</sup> <sup>∗</sup> <sup>φ</sup> <sup>≡</sup> <sup>P</sup> <sup>∧</sup> <sup>φ</sup> for pure formulas <sup>P</sup> in garbage-collected SLs, we use <sup>∗</sup> instead of <sup>∧</sup> throughout this paper.

capturing its corresponding singleton flow graph H:

$$\mathsf{N}(x,H) := \exists \mathit{fs}, \mathit{fl}.\ x \mapsto \mathit{fs} \ast H = (\{x\},\ (\lambda y.\ \mathsf{edge}(x, \mathit{fs}, y)),\ \mathit{fl}) \ast \gamma(x, \mathit{fs}, \mathit{fl}(x))$$

N is implicitly parameterised by fs, edge and γ; these are explained next and are typically fixed across any given flow-based proof. The N predicate expresses that we have a heap cell at location x containing fields fs (a list of field-name/value mappings).9 It also says that <sup>H</sup> is a singleton flow graph with domain {x} with some flow fl, whose edge functions are defined by a user-defined abstraction function edge(x, fs, y); this function allows us to define edges in terms of x's field values. Finally, the node, its fields, and its flow in this flow graph satisfy the custom predicate γ, used to encode node-local properties such as constraints in terms of the flow values of nodes.

*Graph Predicates.* The analogous predicate for composite graphs is Gr. It carries ownership of the nodes making up a potentially unbounded graph, using iterated separating conjunction over a *set* of nodes X as mentioned in §1:

$$\mathsf{Gr}(X, H) := \exists \mathcal{H}.\ \mathop{\ast}_{x \in X} \mathsf{N}(x, \mathcal{H}(x)) \ast H = \bigodot_{x \in X} \mathcal{H}(x)$$

Gr is also implicitly parameterised by fs, edge and γ. The existentially quantified 𝓗 is a logical variable representing a *function* from nodes in X to corresponding singleton flow graphs. Gr(X, H) describes a set of nodes X such that each x ∈ X is an N (in particular, it satisfies γ), whose singleton flow graphs compose back to H. As well as carrying ownership of the underlying heap locations, Gr's definition allows us to connect a node-level view of the region X (each 𝓗(x)) with a composite-level view defined by H, on which we can impose appropriate graph-level properties such as constraints on the region's inflow.

*Lifting to Interfaces.* Flow based proofs can often be expressed more elegantly and abstractly using predicates in terms of node and composite-level interfaces rather than flow graphs. To this end, we overload both our node and graph predicates with analogues whose second parameter is a flow interface, defined as follows:

$$\begin{array}{lcl}\mathsf{N}(x, I) &:=& \exists H.\ \mathsf{N}(x, H) \ast I = \mathsf{int}(H) \\ \mathsf{Gr}(X, I) &:=& \exists H.\ \mathsf{Gr}(X, H) \ast I = \mathsf{int}(H)\end{array}$$

We will use these versions in the PIP proof below; interfaces capture all relevant properties for decomposition and composition of these flow graphs.

*Flow Lemmas.* We first illustrate our N and Gr predicates (which capture SL ownership of heap regions and abstract these with flow interfaces) by identifying a number of lemmas which are generically useful in flow-based proofs. Reasoning at the level of flow interfaces is entirely in the *pure* world (mathematics independent of heap-ownership and

<sup>9</sup> For simplicity, we assume that all fields of a flow graph node are to be handled by our flowbased technique, and that their ownership (via → points-to predicates) is always carried around together; lifting these restrictions would be straightforward.

$$\begin{array}{rcll}
\mathsf{Gr}(X_1 \uplus X_2, H) &\models& \exists H_1, H_2.\ \mathsf{Gr}(X_1, H_1) \ast \mathsf{Gr}(X_2, H_2) \ast H_1 \odot H_2 = H & (\text{DECOMP})\\
\mathsf{Gr}(X_1, H_1) \ast \mathsf{Gr}(X_2, H_2) \ast H_1 \odot H_2 \neq \bot &\models& \mathsf{Gr}(X_1 \uplus X_2, H_1 \odot H_2) & (\text{COMP})\\
\mathsf{N}(x, H) &\equiv& \mathsf{Gr}(\{x\}, H) & (\text{SING})\\
\mathsf{emp} &\models& \mathsf{Gr}(\emptyset, H_\emptyset) & (\text{GREMP})\\
\begin{array}{r}\mathsf{Gr}(X_1, H_1') \ast \mathsf{Gr}(X_2, H_2) \ast H = H_1 \odot H_2\\ \ast\ \mathsf{int}(H_1) = \mathsf{int}(H_1')\end{array} &\models& \begin{array}{l}\mathsf{Gr}(X_1 \uplus X_2, H_1' \odot H_2)\\ \ast\ \mathsf{int}(H) = \mathsf{int}(H_1' \odot H_2)\end{array} & (\text{REPL})
\end{array}$$

Fig. 2: Some useful lemmas for proving entailments between flow-based specifications.

resources) with respect to the underlying SL reasoning; these lemmas are consequences of our predicate definitions and the foundational flow framework definitions themselves.

Examples of these lemmas are shown in Figure 2. (DECOMP) shows that we can always decompose a valid flow graph into subgraphs which are themselves flow graphs. Recomposition (COMP) is possible only if the subgraphs compose. These rules, as well as (SING), and (GREMP) follow directly from the definition of Gr and standard SL properties of iterated separating conjunction. The final rule (REPL) is a direct consequence of rules (COMP), (DECOMP) and the congruence relation on flow graphs induced by their interfaces (cf. Lemma 2). Conceptually, it expresses that after decomposing any flow graph into two parts H<sup>1</sup> and H2, we can *replace* H<sup>1</sup> with a new flow graph H <sup>1</sup> with the same interface; when recomposing, the overall graph will be a flow graph with the same overall interface.

Note the connection between rules (COMP)/(DECOMP) and the algebraic laws of standard inductive predicates such as ls describing a segment of a linked list [2]. For instance, by combining the definition of Gr with these rules and (SING), we can prove the following graph analogue of the rule separating a list into its head node and tail:

$$\mathsf{Gr}(X \uplus \{y\}, H) \equiv \exists H\_y, H'. \mathsf{N}(y, H\_y) \ast \mathsf{Gr}(X, H') \ast H = H\_y \odot H' \quad \text{((UN)FOLD)}$$

However, crucially (and unlike when using general inductive predicates [32]), this rule is symmetrical for any node x in X; it works analogously for any desired order of decomposition of the graph, and for any data structure specified using flows.

When working with our overloaded N and Gr predicates, similar steps to those described by the above lemmas are useful. Given these overloaded predicates, we simply apply the lemmas above to the *existentially quantified* flow-graphs in their definitions and then lift the consequence of the lemma back to the interface level using the congruence between our flow graph and interface composition notions (Lemma 2).

#### 3.2 Proof of the PIP

We now have all the tools necessary to verify the priority inheritance protocol (PIP). Figure 3 gives the full algorithm with flow-based specifications; we also include some intermediate assertions to illustrate the reasoning steps for the acquire method, which

```
 1 // Let δ(m, q1, q2) := m \ (q1 ≥ 0 ? {q1} : ∅) ∪ (q2 ≥ 0 ? {q2} : ∅)
 2
 3 method update(n: Ref, from: Int, to: Int)
 4   requires N(n, In) ∗ Gr(X \ {n}, I′) ∗ I = I′n ⊕ I′ ∗ ϕ(I) ∗ n ∈ X
 5   requires I′n = ({n ↦ δ(In.in(n), from, to)}, In.out) ∗ from ≠ to
 6   ensures Gr(X, I)
 7 {
 8   n.prios := n.prios \ {from}
 9   if (to >= 0) {
10     n.prios := n.prios ∪ {to}
11   }
12   from := n.curr_prio
13   n.curr_prio := max(n.prios ∪ {n.def_prio})
14   to := n.curr_prio
15
16   if (from != to && n.next != null) {
17     update(n.next, from, to)
18   }
19 }
20
21 method acquire(p: Ref, r: Ref)
22   requires Gr(X, I) ∗ ϕ(I) ∗ p ∈ X ∗ r ∈ X ∗ p ≠ r
23   ensures Gr(X, I)
24 {
25   { ∃Ir, Ip, I1. N(r, Ir) ∗ N(p, Ip) ∗ Gr(X \ {r, p}, I1) ∗ I = Ir ⊕ Ip ⊕ I1 ∗ ϕ(I) }
26   if (r.next == null) {
27     r.next := p;
28     // Let qr = r.curr_prio
29     { ∃Ir, I′r, Ip, I1. N(r, I′r) ∗ N(p, Ip) ∗ Gr(X \ {r, p}, I1) ∗ I = Ir ⊕ Ip ⊕ I1
         ∗ I′r = (Ir.in, {p ↦ {qr}}) ∗ Ir.out = λ0 ∗ ··· }
30   |= { ∃Ip, I′p, I2. N(p, Ip) ∗ Gr(X \ {p}, I2) ∗ I = I′p ⊕ I2
         ∗ I′p = ({p ↦ δ(Ip.in(p), −1, qr)}, Ip.out) ∗ ··· }
31     update(p, -1, r.curr_prio)
32     { Gr(X, I) }
33   } else {
34     p.next := r; update(r, -1, p.curr_prio)
35   }
36 }
37
38 method release(p: Ref, r: Ref)
39   requires Gr(X, I) ∗ ϕ(I) ∗ p ∈ X ∗ r ∈ X ∗ p ≠ r
40   ensures Gr(X, I)
41 { r.next := null; update(p, r.curr_prio, -1) }
```
Fig. 3: Full PIP code and specifications, with proof sketch for acquire. The comments and coloured annotations (lines 29 to 32) are used to highlight steps in the proof, and are explained in detail in the text.

we explain in more detail below. <sup>10</sup> We instantiate our framework in order to capture the PIP invariants as follows:

$$\begin{aligned} \mathit{fs} &:= \{\text{next}: y,\ \text{curr\_prio}: q,\ \text{def\_prio}: q^0,\ \text{prios}: Q\} \\ \mathsf{edge}(x, \mathit{fs}, z) &:= \begin{cases} (\lambda m.\ \{\max(m \cup \{q^0\})\}) & \text{if } z = y \neq \mathit{null} \\ \lambda_0 & \text{otherwise} \end{cases} \\ \gamma(x, \mathit{fs}, m) &:= q^0 \ge 0 \ast (\forall q' \in Q.\ q' \ge 0) \ast m = Q \ast q = \max(Q \cup \{q^0\}) \\ \phi(I) &:= I = (\lambda_0, \lambda_0) \end{aligned}$$

Each node has the four fields listed in fs, which also introduces variables such as y to denote field values used in the definitions of edge and γ; these variables are bound to the heap by N. edge abstracts the heap into a flow graph by giving each node an edge to its next successor, labelled by a function that passes on the maximum of the incoming priorities and the node's default priority, whichever is larger. With this definition, one can see that the flow of every node will be the multiset containing exactly the current priorities of its predecessors. The node-local invariant γ says that all priorities are non-negative, that the flow m of each node is stored in the prios field, and that its current priority is the maximum of its default and incoming priorities. Finally, the constraint ϕ on the global interface expresses that the graph is closed: it has no inflow or outflow.
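This instantiation can be animated by a small executable model. The following is a hypothetical Python sketch (not the paper's verified pseudocode): `prios` caches the node's multiset flow value as a `Counter`, `curr_prio` maintains max(prios ∪ {def_prio}), and a priority of −1 encodes "absent", mirroring the δ shorthand of Figure 3.

```python
from collections import Counter

# Hypothetical executable model of the PIP instantiation: each node caches
# its flow value (a multiset of incoming priorities) in prios, and keeps
# curr_prio = max(prios ∪ {def_prio}).
class Node:
    def __init__(self, def_prio):
        self.def_prio = def_prio       # q0 in the instantiation above
        self.prios = Counter()         # Q: the node's flow value
        self.curr_prio = def_prio      # q = max(Q ∪ {q0})
        self.next = None               # y: the next field

def update(n, frm, to):
    """Replace priority frm by to in n.prios (-1 means 'absent') and
    propagate any resulting change of curr_prio along next edges."""
    if frm >= 0:
        n.prios -= Counter({frm: 1})
    if to >= 0:
        n.prios += Counter({to: 1})
    old = n.curr_prio
    n.curr_prio = max(list(n.prios.elements()) + [n.def_prio])
    if old != n.curr_prio and n.next is not None:
        update(n.next, old, n.curr_prio)

def acquire(p, r):
    if r.next is None:                 # r is free: p takes it
        r.next = p
        update(p, -1, r.curr_prio)
    else:                              # r is held: p blocks on r
        p.next = r
        update(r, -1, p.curr_prio)

# Process p (default priority 1) holds resource r; then process q
# (priority 3) blocks on r, so p inherits priority 3 through r.
p, r, q = Node(1), Node(0), Node(3)
acquire(p, r)
acquire(q, r)
# now r.curr_prio == 3 and p.curr_prio == 3: priority inheritance
```

Note how the recursive `update` mirrors Figure 3's update method: removal and insertion into the multiset are local, and propagation stops as soon as a node's current priority is unchanged.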

*Flows Specifications for the PIP.* Our specifications of acquire and release guarantee that, starting from a valid flow graph (closed, according to ϕ), we return a valid flow graph with the same interface (i.e., the graph remains closed). For clarity of exposition, we focus here on how we prove that being a flow graph satisfying the PIP invariant is preserved (as is the composite flow graph's interface). Extending this specification to one which proves, e.g., that acquire adds the expected edge is straightforward (see Appendix C of the TR [23]).<sup>11</sup>

The specification for update is somewhat subtle, and exploits the full flexibility of flow interfaces as a specification medium. The preconditions of update describe an update to the graph which is not yet completed. There are three complementary aspects to this specification. Firstly (as for acquire and release), node-local invariants (γ) hold for all nodes in the graph (enforced via the N and Gr predicates). Secondly, we employ flow interfaces to express a decomposition of the original top-level interface I into compatible (primed) sub-interfaces. The key to understanding this specification is that I′n is in some sense a *fake* interface; it does not abstract the current state of the heap node n. Instead, I′n expresses the way in which the node n's current inflow *hasn't yet* been accounted for in the heap: that *if* n could adjust its inflow according to the propagated priority change *without* changing its outflow, then it would compose back with the rest of the graph, and restore the graph's overall interface. The shorthand δ defines the required change to n's inflow.

In general (except when n's next field is null, or n's flow value is unchanged), it is not even possible for n's fields to be updated to satisfy I′n; by updating n's inflow,

<sup>10</sup> In specifications, we implicitly quantify at the top level over free variables such as I. λ<sup>0</sup> denotes an identically zero function on an unconstrained domain.

<sup>11</sup> We also omit acquire's precondition that p.next == null for brevity.

we will necessarily update its outflow. However, we can then construct a corresponding "fake" interface for the next node in the graph, reflecting the update yet to be accounted for, and establishing the precondition for the recursive call to update.

The third specification aspect is the *connection* between heap-level nodes and interfaces. The N(n, In) predicate connects n with a *different* interface: In is the actual current abstraction of n's state. Conceptually, the key property which is broken at this point is this connection between the interface-level specification and the heap at node n, reflected by the decomposition in the specification between X \ {n} and {n}.

We note that the same specification ideas and proof style can be easily adapted to other data structure implementations with an update-notify style, including well-known designs such as Subject-Observer patterns, or the Composite pattern [27].

*Proof Outline.* To illustrate the application of flow reasoning to our PIP specification ideas more clearly, we examine in detail the first if-branch in the proof of acquire. Our intermediate proof steps are shown as annotations surrounded by braces. The first step, as shown in the first line inside the method body, is to apply ((UN)FOLD) twice (on the flow graphs represented by these predicates) and peel off N predicates for each of r and p. The update to r's next field (line 27) causes the correct singleton interface of r to change to I′r: its outflow (previously none, since the next field was null) now propagates flow to p. We summarise this state in the assertion on line 29 (we omit, e.g., repetition of properties from the function's precondition, focusing on the flow-related steps of the argument). We now rewrite this state: using the definition of interface composition (Definition 6), we deduce that although I′r and Ip do not compose (since the former has outflow that the latter does not account for as inflow), the alternative "fake" interface I′p for p (which artificially accounts for the missing inflow) *would* do so (cf. line 30). Essentially, we show Ir ⊕ Ip = I′r ⊕ I′p, i.e., that the interface of {r, p} would be unchanged if p could somehow have interface I′p. Now, by setting I2 = I′r ⊕ I1 and using algebraic properties of interfaces, we assemble the precondition expected by update. After the call, update's postcondition gives us the desired overall postcondition.

We focused here on the details of acquire's proof, but very similar manipulations are required for reasoning about the recursive call in update's implementation.<sup>12</sup> The main difference there is that if the if-condition wrapping the recursive call is false, then either the last-modified node has no successor (and so there is no outstanding inflow change needed), or we have from = to, which implies that the "fake" interface is actually the same as the currently correct one.

Although the property proved for the PIP example is a rather delicate recursive invariant over the (potentially cyclic) graph, the power of our framework enables extremely succinct specifications for the example, and proofs which require the application of relatively few generic lemmas. The integration with standard separation logic reasoning, and the complementary separation algebras provided by flow interfaces, allow decomposition and recomposition to be simple proof steps. For this proof, we integrated with standard sequential separation logic, but in the next section we will show that compatibility with concurrent SL techniques is similarly straightforward.

<sup>12</sup> We provide further proof outlines in Appendix C of the TR [23].

Fig. 4: A potential state of the Harris list with explicit memory management. fnext pointers are shown with dashed edges, marked nodes are shaded gray, and null pointers are omitted for clarity.

# 4 Advanced Flow Reasoning and the Harris List

This section introduces some advanced foundational flow framework theory and demonstrates its use in the proof of the Harris list. We note that [22] presented a proof of this data structure in the original flow framework. The proof given here shows that the new framework eliminates the need for the customized concurrent separation logic defined in [22]. We start with a recap of Harris' algorithm adapted from [22].

#### 4.1 The Harris List Algorithm

The power of flow-based reasoning is exhibited in the proof of overlaid data structures such as the Harris list, a concurrent non-blocking linked list algorithm [12]. This algorithm implements a set data structure as a sorted list, and uses atomic compare-and-swap (CAS) operations to allow a high degree of parallelism. As with the sequential linked list, Harris' algorithm inserts a new key k into the list by finding nodes k1, k2 such that k1 < k < k2, setting k to point to k2, and using a CAS to change k1 to point to k only if it was still pointing to k2. However, a similar approach fails for the delete operation. If we had consecutive nodes k1, k2, k3 and we wanted to delete k2 from the list (say by setting k1 to point to k3), there is no way to ensure with one CAS that k2 and k3 are also still adjacent (another thread could have inserted/deleted in between them).

Harris' solution is a two-step deletion: first atomically mark k2 as deleted (by setting a mark bit on its successor field) and then later remove it from the list using a single CAS. After a node is marked, no thread can insert or delete to its right; hence, a thread that wants to insert k to the right of k2 would first remove k2 from the list and then insert k as the successor of k1.
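The two-step deletion can be illustrated with a small single-threaded Python model (the names are ours, not from Harris' paper; a real implementation needs a genuinely atomic CAS on a packed pointer-plus-mark word):

```python
class Node:
    def __init__(self, key):
        self.key = key
        # The successor field packs the pointer and the mark bit together,
        # mimicking a single word that one CAS can update atomically.
        self.succ = (None, False)  # (next, marked)

def cas(node, expected, new):
    # Sketch of compare-and-swap; atomic in a real concurrent implementation.
    if node.succ == expected:
        node.succ = new
        return True
    return False

# Build the list k1 -> k2 -> k3.
k1, k2, k3 = Node(1), Node(2), Node(3)
k1.succ = (k2, False)
k2.succ = (k3, False)

# Step 1: logically delete k2 by setting the mark bit on its successor field.
assert cas(k2, (k3, False), (k3, True))
# Step 2: physically unlink k2 with a single CAS on k1, which fails if
# another thread changed k1.succ in the meantime.
assert cas(k1, (k2, False), (k3, False))

# Traversal now skips k2.
keys = []
n = k1
while n is not None:
    keys.append(n.key)
    n = n.succ[0]
assert keys == [1, 3]
```

Packing the mark bit into the same word as the successor pointer is what lets a single CAS check "unmarked" and "still adjacent" at once.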

In a non-garbage-collected environment, unlinked nodes cannot be immediately freed as suspended threads might continue to hold a reference to them. A common solution is to maintain a second "free list" to which marked nodes are added before they are unlinked from the main list (this is the so-called drain technique). These nodes are then labelled with a timestamp, which is used by a maintenance thread to free them when it is safe to do so. This leads to the kind of data structure shown in Figure 4, where each node has two pointer fields: a next field for the main list and an fnext field for the free list (the list from fh to ft via dashed edges). Threads that have been suspended while holding

Fig. 5: Examples of graphs that motivate effective acyclicity. All graphs use the pathcounting flow domain, the flow is displayed inside each node, and the inflow is displayed as curved arrows to the top-left of nodes. (a) shows a graph and inflow that has no solution to (FlowEqn); (b) has many solutions. (c) shows a modification that preserves the interface of the modified nodes, yet goes from a graph that has a unique flow to one that has many solutions to (FlowEqn).

a reference to a node that was added to the free list can simply continue traversing the next pointers to find their way back to the unmarked nodes of the main list.

Even for seemingly simple properties, such as that the Harris list is memory safe and does not leak memory, the proof will rely on the following non-trivial invariants:

(a) every node is on the main list or on the free list (or on both);
(b) all non-null next and fnext edges stay within the data structure;
(c) every node on the free list is marked; and
(d) the free-list tail ft is on the free list.

*Challenges.* To prove that Harris' algorithm maintains the invariants listed above we must tackle a number of challenges. First, we must construct flow domains that allow us to describe overlaid data structures, such as the overlapping main and free lists (§4.2). Second, the flow-based proofs we have seen so far work by showing that the interface of some modified region is unchanged. However, if we consider a program that allocates and inserts a new node into a data structure (like the insert method of Harris), then the interface cannot be the same since the domain has changed (it has increased by the newly allocated node). We must thus have a means to reason about preservation of flows by modifications that allocate new nodes (§4.3). The third issue is that in some flow domains, there exist graphs G and inflows in for which no solutions to the flow equation (FlowEqn) exist. For instance, consider the path-counting flow domain and the graph in Figure 5(a). Since we would need to use the path-counting flow in the proof of the Harris list to encode its structural invariants, this presents a challenge (§4.4).

We will next see how to overcome these three challenges in turn, and then apply those solutions to the proof of the Harris list in §4.5.

### 4.2 Product Flows for Reasoning about Overlays

An important fact about flows is that any flow of a graph over a product of two flow domains is the product of the flows on each flow domain component.

Lemma 3. *Given two flow domains* (M1, +1, 01, E1) *and* (M2, +2, 02, E2)*, the* product *domain* (M1 × M2, +, (01, 02), E) *is a flow domain, where* + *and* E *are the pointwise liftings of* (+1, +2) *and* (E1, E2)*, respectively, to the domain* M1 × M2*.*

This lemma greatly simplifies reasoning about overlaid graph structures; we will use the product of two path-counting flows to describe a structure consisting of two overlaid lists that make up the Harris list.
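As a quick illustration of Lemma 3's pointwise lifting (a sketch with hypothetical helper names, not the paper's formalism):

```python
def product_domain(dom1, dom2):
    """Pointwise lifting of two flow domains, each given as (plus, zero).
    Edge functions would be lifted the same way, acting componentwise."""
    plus1, zero1 = dom1
    plus2, zero2 = dom2
    plus = lambda a, b: (plus1(a[0], b[0]), plus2(a[1], b[1]))
    return plus, (zero1, zero2)

nat = (lambda a, b: a + b, 0)            # the path-counting monoid (N, +, 0)
plus, zero = product_domain(nat, nat)    # product domain used for the Harris list
assert zero == (0, 0)
assert plus((1, 0), (0, 1)) == (1, 1)
```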

#### 4.3 Contextual Extensions and the Replacement Theorem

In general, when modifying a flow graph H to another flow graph H', requiring that H' satisfies *precisely* the same interface int(H) can be too strong a condition, as it does not permit allocating new nodes. Instead, we want to allow int(H') to differ from int(H) in that the new interface may have a larger domain, as long as the edges from the new nodes do not change the outflow of the modified region.

Definition 7. *An interface* I = (in, out) *is* contextually extended *by* I' = (in', out')*, written* I ≼ I'*, if and only if the following conditions all hold:*

*(1)* dom(in) ⊆ dom(in')*,*

*(2)* ∀n ∈ dom(in). in(n) = in'(n)*, and*

*(3)* ∀n' ∉ dom(in'). out(n') = out'(n')*.*
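The conditions of Definition 7 can be transcribed into a small Python check (a sketch; interfaces are modelled as pairs of dictionaries, and absent outflow entries default to 0):

```python
def contextually_extends(I, I2):
    """Check that interface I = (inflow, outflow) is contextually
    extended by I2, per the three conditions of Definition 7."""
    inf, out = I
    inf2, out2 = I2
    return (set(inf) <= set(inf2)                                  # (1)
            and all(inf[n] == inf2[n] for n in inf)                # (2)
            and all(out.get(n, 0) == out2.get(n, 0)                # (3)
                    for n in set(out) | set(out2)
                    if n not in inf2))

# A new node n2 is allocated; the inflow of existing nodes and the
# outflow seen by the context are unchanged, so this is an extension.
I_old = ({'n1': 1}, {'x': 1})
I_new = ({'n1': 1, 'n2': 0}, {'x': 1})
assert contextually_extends(I_old, I_new)
```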

The following theorem states that contextual extension preserves composability and is itself preserved under interface composition.

Theorem 2 (Replacement Theorem). *If* I = I1 ⊕ I2 *and* I1 ≼ I1' *are all valid interfaces such that* I1' ∩ I2 = ∅ *and* ∀n ∈ I1' \ I1. I2.out(n) = 0*, then there exists a valid* I' = I1' ⊕ I2 *such that* I ≼ I'*.*

In terms of our flow predicates, this theorem gives rise to the following adaptation of the (REPL) rule:

$$\begin{aligned} &\quad \mathsf{Gr}(X\_1', H\_1') \ast \mathsf{Gr}(X\_2, H\_2) \ast H = H\_1 \odot H\_2 \ast \mathsf{int}(H\_1) \preccurlyeq \mathsf{int}(H\_1')\\ &\models \; \exists H'. \; \mathsf{Gr}(X\_1' \uplus X\_2, H') \ast H' = H\_1' \odot H\_2 \ast \mathsf{int}(H) \preccurlyeq \mathsf{int}(H') \qquad \qquad (\mathsf{Repl}{\preccurlyeq}) \end{aligned}$$

The rule (Repl≼) is derived from the Replacement Theorem by instantiating it with I = int(H), I1 = int(H1), I2 = int(H2), and I1' = int(H1'). We know I1 ≼ I1'; H = H1 ⊙ H2 tells us (by Lemma 2) that I = I1 ⊕ I2, and Gr(X1', H1') ∗ Gr(X2, H2) gives us I1' ∩ I2 = ∅. The final condition of the Replacement Theorem is to prove that there is no outflow from X2 to any newly allocated node in X1'. While we can use additional ghost state to prove such constraints in our proofs, if we assume that the memory allocator only allocates fresh addresses, and restrict the abstraction function edge to only propagate flow along an edge (n, n') if n has a (non-ghost) field with a reference to n', then this condition is always true. For simplicity, and to keep the focus of this paper on the flow reasoning, we make this assumption in the Harris list proof.

#### 4.4 Existence and Uniqueness of Flows

We typically express global properties of a graph G = (N,e) by fixing a global inflow in : <sup>N</sup> <sup>→</sup> <sup>M</sup> and then constraining the flow of each node in <sup>N</sup> using node-local conditions. However, as we discussed at the beginning of this section, there is no general guarantee that a flow exists or is unique for a given in and G. The remainder of this section presents two complementary conditions under which we can prove that our flow fixpoint equation always has a unique solution. To this end, we say that a flow domain (M, +, 0, E) has *unique flows* if for every graph (N,e) over this flow domain and inflow in : <sup>N</sup> <sup>→</sup> <sup>M</sup>, there exists a unique fl that satisfies the flow equation FlowEqn(in, e, fl). But first, we briefly recall some more monoid theory.

We say M is *positive* if m1 + m2 = 0 implies that m1 = m2 = 0. For a positive monoid M, we can define a partial order ≤ on its elements as m1 ≤ m2 if and only if ∃m3. m1 + m3 = m2. This definition implies that every m ∈ M satisfies 0 ≤ m.

For e, e' : M → M, we write e + e' for the function that maps m ∈ M to e(m) + e'(m). We lift this construction to a set of functions E and write it as Σ_{e∈E} e.

Definition 8. *A function* e : M → M *is called an* endomorphism *on* M *if for every* m1, m2 ∈ M*,* e(m1 + m2) = e(m1) + e(m2)*. We denote the set of all endomorphisms on* M *by* End(M)*.*

Note that for cancellative M, e(0) = 0 for every endomorphism e ∈ End(M). Note further that e + e' ∈ End(M) for any e, e' ∈ End(M). Similarly, for finite sets E ⊆ End(M), Σ_{e∈E} e ∈ End(M). We say that a set of endomorphisms E ⊆ End(M) is *closed* if for every e, e' ∈ E, e ∘ e' ∈ E and e + e' ∈ E.

*Nilpotent Cycles.* Let (M, <sup>+</sup>, <sup>0</sup>, E) be a flow domain where every edge function <sup>e</sup> <sup>∈</sup> <sup>E</sup> is an endomorphism on M. In this case, we can show that the flow of a node n is the sum of the flow as computed along *each path* in the graph that ends at n. Suppose we additionally know that the edge functions are defined such that their composition along any *cycle* in the graph eventually becomes the identically zero function. We then need only consider finitely many paths to compute the flow of a node, which means the flow equation has a unique solution.

Definition 9. *A closed set of endomorphisms* E ⊆ End(M) *is called* nilpotent *if there exists* p > 1 *such that* e^p ≡ 0 *for every* e ∈ E*.*

*Example 5.* The flow domain (ℕ², +, (0, 0), {(λ(x, y). (0, c · x)) | c ∈ ℕ}) contains nilpotent edge functions that shift the first component of the flow to the second (with a scaling factor). This domain can be used to express the property that every node in a graph is reachable from the root via a single edge (by requiring the flow of every node to be (0, 1) under the inflow (λn. (n = r ? (1, 0) : (0, 0)))).
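A minimal Python sketch of Example 5's edge functions (names are ours) shows the nilpotency, with p = 2, and the single-edge reachability reading:

```python
def edge(c):
    # Edge functions of Example 5: shift the first component of the flow
    # to the second, scaled by c. Composing any two such functions is zero.
    return lambda xy: (0, c * xy[0])

e1, e2 = edge(2), edge(3)
# e2 . e1 maps (1, 0) to (0, 2) and then to (0, 3*0) = (0, 0): nilpotent, p = 2.
assert e2(e1((1, 0))) == (0, 0)
# A node pointed to directly by the root r (whose inflow is (1, 0)) gets
# flow (0, 1), witnessing "reachable from the root via a single edge".
assert edge(1)((1, 0)) == (0, 1)
```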

Before we prove that nilpotent endomorphisms lead to unique flows, we present a useful notion when dealing with endomorphic flow domains.

Definition 10. *The* capacity *of a flow graph* G = (N, e) *is* cap(G) : N × N → (M → M)*, defined inductively as* cap(G) := cap^{|G|}(G)*, where* cap^0(G)(n, n') := δ_{n=n'} *and*

$$\mathsf{cap}^{i+1}(G)(n,n') \coloneqq \delta\_{n=n'} + \sum\_{n'' \in G} \mathsf{cap}^i(G)(n,n'') \diamond e(n'',n').$$

For a flow graph H = (N, e, fl), we write cap(H)(n, n') = cap((N, e))(n, n') for the capacity of the underlying graph. Intuitively, cap(G)(n, n') is the function that summarizes how flow is routed from any source node n in G to any other node n', including those outside of G.
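For edge functions that act by multiplication with a constant (as in the path-counting domain), Definition 10 can be made concrete: represent each M → M function by its scalar coefficient, so that δ is coefficient 1 on the diagonal, composition is multiplication, and + is addition. This is a hypothetical encoding, not the paper's:

```python
def capacity(nodes, coeff):
    """cap^{|G|}(G) from Definition 10, for edge functions represented as
    scalars: coeff maps (n'', n') to the constant that e(n'', n')
    multiplies with; missing entries are the zero function."""
    idx = {n: i for i, n in enumerate(nodes)}
    k = len(nodes)
    # cap^0(G)(n, n') = delta_{n = n'}: the identity on the diagonal.
    cap = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    for _ in range(k):  # iterate cap^{i+1} = delta + cap^i composed with e
        cap = [[(1 if i == j else 0) +
                sum(cap[i][m] * coeff.get((nodes[m], nodes[j]), 0)
                    for m in range(k))
                for j in range(k)] for i in range(k)]
    return lambda n, n2: cap[idx[n]][idx[n2]]

# Chain r -> a -> b with identity edges (coefficient 1):
cap = capacity(['r', 'a', 'b'], {('r', 'a'): 1, ('a', 'b'): 1})
assert cap('r', 'b') == 1   # exactly one path routes flow from r to b
assert cap('b', 'r') == 0   # no flow is routed backwards
```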

We can now show that if all edges of a flow graph are labelled with edges from a nilpotent set of endomorphisms, then the flow equation has a unique solution:

Lemma 4. *If* (M, +, 0, E) *is a flow domain such that* M *is a positive monoid and* E *is a nilpotent set of endomorphisms, then this flow domain has unique flows.*

*Effectively Acyclic Flow Graphs.* There are some flow domains that compute flows useful in practice, but which do not guarantee either existence or uniqueness of fixpoints *a priori* for all graphs. For example, the path-counting flow from Example 1 is one where for certain graphs, there exist no solutions to the flow equation (see Figure 5(a)), and for others, there can exist more than one (in Figure 5(b), the nodes marked with x can have any path count, as long as they both have the same value).
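For intuition, when a unique flow does exist it can often be computed by naive fixpoint iteration of (FlowEqn); the following Python sketch (helper names are ours) does this for the path-counting domain:

```python
def solve_flow(nodes, edges, inflow, rounds=100):
    """Naive fixpoint iteration for FlowEqn: fl(n) = in(n) + sum of
    e(n', n)(fl(n')) over predecessors n'. `edges` maps (n', n) to a
    function M -> M; missing edges are the zero function."""
    fl = {n: inflow.get(n, 0) for n in nodes}
    for _ in range(rounds):
        new = {n: inflow.get(n, 0) +
                  sum(e(fl[src]) for (src, dst), e in edges.items() if dst == n)
               for n in nodes}
        if new == fl:
            return fl
        fl = new
    raise ValueError("no fixpoint reached (a flow may not exist)")

# Path-counting flow on the acyclic list r -> a -> b: identity edges.
ident = lambda m: m
fl = solve_flow(['r', 'a', 'b'],
                {('r', 'a'): ident, ('a', 'b'): ident},
                {'r': 1})
assert fl == {'r': 1, 'a': 1, 'b': 1}
```

For a graph and inflow with no solution, such an iteration never stabilizes; with multiple solutions, it does not determine a canonical one.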

In such cases, we explore how to restrict the class of *graphs* we use in our flow-based proofs such that each graph has a unique fixpoint; the difficulty is that this restriction must be preserved under composition of our graphs. Here, we study the class of flow domains (M, +, 0, E) such that M is a positive monoid and E is a set of *reduced* endomorphisms (defined below). In such domains we can decompose the flow computation into the various paths in the graph, and achieve unique fixpoints by restricting the kinds of cycles graphs can have.

Definition 11. *A flow graph* H = (N, e, fl) *is* effectively acyclic (EA) *if for every* 1 ≤ k *and* n1, ..., nk ∈ N*,*

$$fl(n\_1) \rhd e(n\_1, n\_2) \rhd \dotsm \rhd e(n\_{k-1}, n\_k) \rhd e(n\_k, n\_1) = 0.$$

The simplest example of an effectively acyclic graph is one where the edges with non-zero edge functions form an acyclic graph. However, our semantic condition is weaker: for example, when reasoning about two overlaid acyclic lists whose union happens to form a cycle, a product of two path-counting domains will satisfy effective acyclicity because the composition of different types of edges results in the zero function.
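The overlaid-lists observation can be checked concretely: in the product of two path-counting domains, composing a main-list edge function with a free-list edge function yields the zero function (a minimal Python sketch with our own names):

```python
# Edge functions of the product path-counting domain, represented as
# pairs of scalars acting componentwise on a flow value (m1, m2).
lam_10 = (1, 0)   # next edge only:  (m1, m2) -> (m1, 0)
lam_01 = (0, 1)   # fnext edge only: (m1, m2) -> (0, m2)

def compose(e1, e2):
    # Composition of componentwise scalings is componentwise multiplication.
    return (e1[0] * e2[0], e1[1] * e2[1])

# A two-node cycle whose edges come from the two different lists:
# n1 --next--> n2 --fnext--> n1. Its composition is the zero function,
# so the graph is effectively acyclic (Definition 11) even though the
# union of the two lists forms a cycle.
assert compose(lam_10, lam_01) == (0, 0)
```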

Lemma 5. *Let* (M, +, 0, E) *be a flow domain such that* M *is a positive monoid and* E *is a closed set of endomorphisms. Given a graph* (N,e) *over this flow domain and inflow* in : <sup>N</sup> <sup>→</sup> <sup>M</sup>*, if there exists a flow graph* <sup>H</sup> = (N, e, fl) *that is effectively acyclic, then* fl *is unique.*

While the restriction to effectively acyclic flow graphs guarantees us that the flow is the unique fixpoint of the flow equation, it is not easy to show that modifications to the graph preserve EA while reasoning locally. Even modifying a subgraph to another with the same flow interface (which we know guarantees that it will compose with any context) can inadvertently create a cycle in the larger composite graph. For instance, consider Figure 5(c), which shows a modification to nodes {n3, n4} (the boxed blue region). The interface of this region is ({n3 ↦ 1, n4 ↦ 1}, {n5 ↦ 1, n2 ↦ 1}), and so swapping the edges of n3 and n4 preserves this interface. However, the resulting graph, despite composing with the context to form a valid flow graph, is not EA (in this case, it has multiple solutions to the flow equation). This shows that flow interfaces are not powerful enough to preserve effective acyclicity. For a special class of endomorphisms, we show that a local property of the modified subgraph can be checked, which implies that the modified composite graph continues to be EA.

Definition 12. *A closed set of endomorphisms* E ⊆ End(M) *is called* reduced *if* e ∘ e ≡ λ0 *implies* e ≡ λ0 *for every* e ∈ E*.*

Note that if E is reduced, then no e ∈ E can be nilpotent. In that sense, this class of instantiations is complementary to the nilpotent class.

*Example 6.* Examples of flow domains that fall into this class include positive semirings of reduced rings (with the additive monoid of the semiring being the aggregation monoid of the flow domain, and E being any set of functions that multiply their argument with a constant flow value). Note that any direct product of integral rings is a reduced ring. Hence, products of the path-counting flow domain are a special case.

For reduced endomorphisms, it suffices to check that a modification preserves the flow routed between every pair of source and sink nodes in order to ensure that it does not create any new cycles in any composite graph.

Definition 13. *A flow graph* H' *is a* subflow-preserving extension *of* H*, for which we write* H ≼s H'*, if the following conditions all hold:*

*(1)* int(H) ≼ int(H')

*(2)* ∀n ∈ H, n' ∈ H', m. m ≤ inf(H)(n) ⇒ m ⊳ cap(H)(n, n') = m ⊳ cap(H')(n, n')

*(3)* ∀n ∈ H' \ H, n' ∈ H', m. m ≤ inf(H')(n) ⇒ m ⊳ cap(H')(n, n') = 0

This pairwise check, apart from constraining the interface of the modified region via contextual extension, also permits allocating new nodes as long as no flow is routed via the new nodes (condition (3)). We now show that it is sufficient to check that a modification is a subflow-preserving extension in order to guarantee composition back to an effectively acyclic composite graph:

Theorem 3. *Let* (M, +, 0, E) *be a flow domain such that* M *is a positive monoid and* E *is a reduced set of endomorphisms. If* H = H1 ⊙ H2 *and* H1 ≼s H1' *are all effectively acyclic flow graphs such that* H1' ∩ H2 = ∅ *and* ∀n ∈ H1' \ H1. outf(H2)(n) = 0*, then there exists an effectively acyclic flow graph* H' = H1' ⊙ H2 *such that* H ≼s H'*.*

We define effectively acyclic versions of our flow graph predicates, Na(x, H) and Gra(X, H), that additionally constrain H to be effectively acyclic. The above theorem yields the following variant of the (REPL) rule for EA graphs:

$$\begin{aligned} \mathsf{Gr}\_{\mathsf{a}}(X\_1', H\_1') \ast \mathsf{Gr}\_{\mathsf{a}}(X\_2, H\_2) \ast H &= H\_1 \odot H\_2 \ast H\_1 \preccurlyeq\_s H\_1' \\ \models \exists H'. \; \mathsf{Gr}\_{\mathsf{a}}(X\_1' \uplus X\_2, H') \ast H' &= H\_1' \odot H\_2 \ast H \preccurlyeq\_s H' \end{aligned} \tag{\mathsf{Repl}\_{\mathsf{EA}}}$$

#### 4.5 Proof of the Harris List

We use the techniques seen in this section in the proof of the Harris list. As the data structure consists of two potentially overlapping lists, we use Lemma 3 to construct a product flow domain of two path-counting flows: one tracks the path count from the head of the main list, the other from the head of the free list. We also work under the effectively acyclic restriction (i.e., we use the Na and Gra predicates), both to obtain the desired interpretation of the flow and to ensure the existence of flows in this flow domain.

We instantiate the framework using the following definitions of parameters:

fs := {key: k, next: y, fnext: z}

edge(x, fs, v) := (v = null ? λ0 : (v = y ∧ y ≠ z ? λ(1,0) : (v = z ∧ y ≠ z ? λ(0,1) : (v = y ∧ y = z ? λid : λ0))))

γ(x, fs, I) := (I.in(x) ∈ {(1, 0), (0, 1), (1, 1)}) ∗ (I.in(x) ≠ (1, 0) ⇒ M(y)) ∗ (x = ft ⇒ I.in(x) = (\_, 1)) ∗ (¬M(y) ⇒ z = null)

ϕ(I) := I = (λ0[mh ↦ (1, 0)][fh ↦ (0, 1)], λ0)

Here, edge encodes the edge functions needed to compute the product of two path-counting flows: the first component tracks path counts from mh along next edges, and the second tracks path counts from fh along fnext edges.<sup>13</sup> The node-local invariant γ says: the flow is one of {(1, 0), (0, 1), (1, 1)} (meaning that the node is on one of the two lists, invariant (a)); if the flow is not (1, 0) (i.e., the node is not only on the main list, hence on the free list) then the node is marked (indicated by M(y), invariant (c)); and if the node is ft then it must be on the free list (invariant (d)). The constraint on the global interface, ϕ, says that the inflow picks out mh and fh as the roots of the two lists, and that there is no outgoing flow (thus, all non-null edges must stay within the graph, invariant (b)).
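The case analysis in edge can be transcribed as a Python sketch (null is modelled as None; this follows our reading of the definition above and is not the paper's code):

```python
ZERO = (0, 0)

def edge(y, z, v):
    """Edge function from a node x (with next = y, fnext = z) to v, for
    the product of two path-counting flows (sketch)."""
    if v is None:
        return lambda m: ZERO            # no edge to null
    if v == y and y != z:
        return lambda m: (m[0], 0)       # main-list edge only: lambda(1,0)
    if v == z and y != z:
        return lambda m: (0, m[1])       # free-list edge only: lambda(0,1)
    if v == y and y == z:
        return lambda m: m               # on both lists: lambda id
    return lambda m: ZERO                # v is not a successor of x

# A node whose next is n2 (and no fnext) propagates only the
# main-list path count to n2:
assert edge('n2', None, 'n2')((1, 1)) == (1, 0)
```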

Since the Harris list is a concurrent algorithm, we carry out the proof in rely-guarantee separation logic (RGSep) [41]. As in §3, we do not need to modify the semantics of RGSep in any way; our flow-based predicates can be defined, and reasoning using our lemmas can be performed, in the logic out of the box. For space reasons, we defer the full proof to Appendix D of the TR [23].

# 5 Related Work

As mentioned in §1, the most closely related work is the flow framework developed by some of the authors in [22]. Here, we present a simplified and generalized metatheory of flows that makes the approach much more broadly applicable. There were a number of limitations of the prior framework that prevented its application to more general classes of examples.

First, [22] required flow domains to form a semiring; the analogue of our edge functions is restricted to multiplication with a constant, which must come from the same flow

<sup>13</sup> We use the shorthands λ(1,0) := (λ(m1, m2). (m1, 0)) and λ(0,1) := (λ(m1, m2). (0, m2)), and denote an anonymous existentially-quantified variable by \_.

value set. This restriction made it complex to encode many graph properties of interest. For example, one could not easily encode the PIP flow, or a simple flow that counts the number of incoming edges to each node. Our foundational flow framework decouples the algebraic structure defining how flow is *aggregated* from the algebraic structure of the edge functions. In this way, we obtain a more general framework that applies to many more examples, and with simpler flow domains.

Second, in [22], a flow graph did not uniquely determine its inflow (cf. Lemma 1). Correspondingly, [22]'s notion of interface included an *equivalence class* of inflows (all those that induce the same flow values). Since, in [22], the interface also determines which modifications are permitted by the framework, [22] could only handle modifications that preserve the inflow equivalence class. For example, this prevents one from reasoning locally about the removal of a single edge from a graph in certain cases (in particular, like release does in the PIP). Our foundational flow framework solves this problem by requiring that the aggregation operation on flow values is cancellative, guaranteeing unique inflows.

Cancellativity is fundamentally incompatible with [22], which requires the flow domain to form an ω-CPO in order to guarantee the existence of unique flows. For example, consider a graph with two nodes n and n' with identity edges between them and all other edges zero (in [22], edges labelled with 1 and 0). If we have in(n) = 0 and in(n') = m for some non-zero m, a solution to the flow equation must satisfy fl(n) = m + fl(n). [22] forces such solutions to exist, ruling out cancellativity. To solve this problem, we present a new theory which can optionally guarantee unique flows when desired, and show that requiring cancellativity does not limit expressivity.

Next, the proofs of programs shown in [22] depend on a bespoke program logic. This logic requires new reasoning primitives that are not supported by the logics implemented in existing SL-based verification tools. Our general proof technique eliminates the need for a dedicated program logic and can be implemented on top of standard separation logics and existing SL-based tools. Finally, the underlying separation algebra of the original framework makes it hard to use equational reasoning, which is a critical prerequisite for enabling proof automation.

An abundance of SL variants provide complementary mechanisms for modular reasoning about programs (e.g. [18, 36, 38]). Most are parameterized by the underlying separation algebra; our flow-based reasoning technique easily integrates with these existing logics.

The most common approach to reason about irregular graph structures in SL is to use iterated separating conjunction [30, 44] and describe the graph as a set of nodes each of which satisfies some local invariant. This approach has the advantage of being able to naturally describe general graphs. However, it is hard to express non-local properties that involve some form of fixpoint computation over the graph structure. One approach is to abstract the program state as a mathematical graph using iterated separating conjunction and then express non-local invariants in terms of the abstract graph rather than the underlying program state [14, 35, 38]. However, a proof that a modification to the state maintains a global invariant of the abstract graph must then often revert back to non-local and manual reasoning, involving complex inductive arguments about paths, transitive closure, and so on. Our technique also exploits iterated separating conjunction for the underlying heap ownership, with the key benefit that flow interfaces exactly capture the necessary conditions on a modified subgraph in order to compose with *any* context and preserve desired non-local invariants.

In recent work, Wang et al. present a Coq-mechanised proof of graph algorithms in C, based on a substantial library of graph-related lemmas, both for mathematical and heap-based graphs [42]. They prove rich functional properties, integrated with the VST tool. In contrast to our work, a substantial suite of lemmas and background properties is necessary, since these specialise to particular properties such as reachability. We believe that our foundational flow framework could be used to simplify such framing lemmas in a way that remains parametric in the property in question.

Proofs of a number of graph algorithms have been mechanized in various verification tools and proof assistants, including Tarjan's SCC algorithm [8], union-find [7], Kruskal's minimum spanning tree algorithm [13], and network flow algorithms [25]. These proofs generally involve non-local reasoning arguments about mathematical graphs.

An alternative approach to using SL-style reasoning is to commit to global reasoning but remain within decidable logics to enable automation [16, 21, 24, 28, 43]. However, such logics are restricted to certain classes of graphs and certain types of properties. For instance, reasoning about reachability in unbounded graphs with two successors per node is undecidable [15]. Recent work by Ter-Gabrielyan et al. [40] shows how to deal with modular framing of *pairwise reachability* specifications in an imperative setting. Their framing notion has parallels to our notion of interface composition, but allows subgraphs to *change* the paths visible to their context. The work is specific to a reachability relation, and cannot express the rich variety of custom graph properties available in our technique.

Dynamic frames [19] (e.g. implemented in Dafny [26]), can be used to explicitly reason about framing of heap information in a first-order logic. However, by itself, this theory does not enable modular reasoning about global graph properties. We believe that the flow framework could in principle be adapted to the dynamic frames setting.

# 6 Conclusions and Future Work

We have presented the foundational flow framework, which enables local modular reasoning about recursively defined properties over general graphs. The core reasoning technique has been designed to make minimal mathematical requirements, providing great flexibility in terms of potential instantiations and applications. We identified key classes of instantiations for which we can provide existence and uniqueness guarantees for the fixpoint properties our technique addresses, and demonstrated our proof technique on several challenging examples. As future work, we plan to automate flow-based proofs in our new framework using existing tools that support SL-style reasoning, such as Viper [29] and GRASShopper [34].

*Acknowledgments.* This work is funded in parts by the National Science Foundation under grants CCF-1618059 and CCF-1815633.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Aneris: A Mechanised Logic for Modular Reasoning about Distributed Systems

Morten Krogh-Jespersen, Amin Timany⋆, Marit Edna Ohlenbusch, Simon Oddershede Gregersen, and Lars Birkedal

Aarhus University, Aarhus, Denmark

Abstract. Building network-connected programs and distributed systems is a powerful way to provide scalability and availability in a digital, always-connected era. However, with great power comes great complexity. Reasoning about distributed systems is well-known to be difficult. In this paper we present Aneris, a novel framework based on separation logic supporting modular, node-local reasoning about concurrent and distributed systems. The logic is higher-order and concurrent, with higher-order store and network sockets, and is fully mechanized in the Coq proof assistant. We use our framework to verify an implementation of a load balancer that uses multi-threading to distribute load amongst multiple servers and an implementation of the two-phase-commit protocol with a replicated logging service as a client. The two examples certify that Aneris is well-suited for both horizontal and vertical modular reasoning.

Keywords: Distributed systems · Separation logic · Higher-order logic · Concurrency · Formal verification

# 1 Introduction

Reasoning about distributed systems is notoriously difficult due to their sheer complexity. This is largely the reason why previous work has traditionally focused on verification of protocols of core network components. In particular, in the context of model checking, where safety and liveness assertions [29] are considered, tools such as SPIN [9], TLA+ [23], and Mace [17] have been developed. More recently, significant contributions have been made in the field of formal proofs of *implementations* of challenging protocols, such as two-phase-commit, lease-based key-value stores, Paxos, and Raft [7, 25, 30, 35, 40]. All of these developments define domain-specific languages (DSLs) specialized for distributed systems verification. Protocols and modules proven correct can be compiled to an executable, often relying on some trusted code base.

Formal reasoning about distributed systems has often been carried out by giving an abstract model in the form of a *state transition system* or *flow-chart* in the tradition of Floyd [5] and Lamport [21, 22]. A state is normally taken to be a

<sup>⋆</sup> This research was carried out while Amin Timany was at KU Leuven, working as a postdoctoral fellow of the Flemish research fund (FWO).

view of the global state, and events are observable changes to this state. State transition systems are quite versatile and have been used in other verification applications. However, reasoning based on state transition systems often suffers from a lack of modularity due to its very global nature. As a consequence, separate nodes or components cannot be verified in isolation, and the system has to be verified as a whole.

IronFleet [7] is the first system that supports node-local reasoning for verifying the implementation of programs that run on different nodes. In IronFleet, a distributed system is modeled by a transition system. This transition system is shown to be refined by the composition of a number of transition systems, each pertaining to one of the nodes in the system. Each node in the distributed system is shown to be correct and a refinement of its corresponding transition system. Nevertheless, IronFleet does not support compositional reasoning: a correctness proof for a distributed system cannot be used to show the correctness of a larger system.

Higher-order concurrent separation logics (CSLs) [3, 4, 13, 15, 18, 26, 27, 28, 33, 34, 36, 39] simplify reasoning about higher-order imperative concurrent programs by offering facilities for specifying and proving correctness of programs in a modular way. Indeed, their support for modular reasoning (a.k.a. compositional reasoning) is the key reason for their success. Disel [35] is a separation logic that does support compositional reasoning about distributed systems, allowing correctness proofs of distributed systems to be used for verifying larger systems. However, Disel struggles with node-local reasoning in that it cannot hide node-local usage of mutable state. That is, the use of internal state in nodes must be exposed in the high-level protocol of the system, and changes to the internal state are only possible upon sending and receiving messages over the network.

Finally, both Disel and IronFleet restrict nodes to run only sequential programs and no node-level concurrency is supported.

In this paper we present Aneris, a framework for implementing and reasoning about functional correctness of distributed systems. Aneris is based on concurrent separation logic and supports modular reasoning with respect to both nodes (node-local reasoning) and threads within nodes (thread-local reasoning). The Aneris framework consists of a programming language, AnerisLang, for writing realistic, real-world distributed systems and a higher-order concurrent separation logic for reasoning about these systems. AnerisLang is a concurrent ML-like programming language with higher-order functions, local state, threads, and network primitives. The operational semantics of the language, naturally, involves multiple hosts (each with their own heap and multiple threads) running in a network. The Aneris logic is built on top of the Iris framework [13, 15, 18] and supports machine-verified formal proofs in the Coq proof assistant about distributed systems written in AnerisLang.

*Networking.* There are several ways of adding network primitives to a programming language. One approach is *message-passing* using first-class communication channels à la the π-calculus or using an implementation of the actor model as done in high-level languages like Erlang, Elixir, Go, and Scala. However, any

such implementation is an abstraction built on top of network sockets where all data has to be serialized, data packets may be dropped, and packet reception may not follow the transmission order. Network sockets are a quintessential part of building efficient, real-world distributed systems and all major operating systems provide an application programming interface (API) to them. Likewise, AnerisLang provides support for datagram-like sockets by directly exposing a simple API with the core methods necessary for socket-based communication using the User Datagram Protocol (UDP) with duplicate protection. This allows for a wide range of real-world systems and protocols to be implemented (and verified) using the Aneris framework.
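The socket API described here corresponds closely to the datagram sockets exposed by mainstream operating systems. As a loose illustration (a Python sketch of one OS-level UDP round trip over loopback, not AnerisLang code), socket, socketbind, sendto, and receivefrom map onto the following calls:

```python
import socket

# Two UDP (datagram) sockets on the loopback interface: one playing the
# server role, one the client. socket/bind/sendto/recvfrom are the OS-level
# analogues of AnerisLang's socket, socketbind, sendto, receivefrom.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS choose a free port
server.settimeout(5)
server_addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
client.settimeout(5)

# All data on the wire is serialized; here, two integers as a string.
client.sendto(b"37,5", server_addr)
msg, sender = server.recvfrom(1024)
x, y = (int(s) for s in msg.decode().split(","))
server.sendto(str(x + y).encode(), sender)

reply, _ = client.recvfrom(1024)
assert reply.decode() == "42"
server.close()
client.close()
```

Note that no connection is established: each sendto is an independent datagram, which is why, in general, delivery and ordering are not guaranteed at this level.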

*Modular Reasoning in* Aneris*.* In general, there are two different ways to support modular reasoning about distributed systems, corresponding to how components can be composed: vertically and horizontally. Aneris enables both simultaneously:


Node-local variants of the standard rules of CSLs, for example the bind rule and the frame rule (as explained in Sect. 2), enable vertical reasoning. Sect. 6 showcases vertical reasoning in Aneris using a replicated distributed logging service that is implemented and verified using a separate implementation and specification of the two-phase commit protocol.

Horizontal reasoning in Aneris is achieved through the Thread-par-rule and the Node-par-rule (further explained in Sect. 2), which intuitively say that to verify a distributed system, it suffices to verify each thread and each node in isolation. This is analogous to how CSLs allow us to reason about multi-threaded programs by considering individual threads in isolation; in Aneris we extend this methodology to include both threads and nodes. Where most variants of concurrent separation logic use some form of invariant mechanism to reason about shared-memory concurrency, we abstract the communication between nodes over the network through *socket protocols* that restrict what can be sent and received on a socket and allow us to share ownership of logical resources among nodes. Sect. 5 showcases horizontal reasoning in Aneris using an implementation and a correctness proof for a simple addition service that uses a load balancer to distribute the workload among several addition servers. Each node is verified in isolation and composed to form the final distributed system.

*Contributions.* In summary, we make the following contributions:

	- A replicated logging service that is implemented and verified using a separate implementation and specification of the two-phase commit protocol, demonstrating vertical compositional reasoning.
	- A load balancer that distributes work on multiple servers by means of node-local multi-threading. We use this to verify a simple addition service that uses the load balancer to distribute its requests over multiple servers, demonstrating horizontal compositional reasoning.

*Outline.* We start by describing the core concepts of the Aneris framework in Sec. 2. We then describe the AnerisLang programming language (Sec. 3) before presenting the Aneris logic proof rules and stating our adequacy theorem, *i.e.*, soundness of Aneris, in Sec. 4. Subsequently, we use the logic to verify a load balancer (Sec. 5) and a two-phase-commit implementation with a replicated logging client (Sec. 6). We discuss related work in Sec. 7 and conclude in Sec. 8.

# 2 The Core Concepts of Aneris

In this section we present our methodology for modular verification of distributed systems. We begin by recalling the ideas of thread-local reasoning and protocols from concurrent separation logic and explain how we lift those ideas to *node-local* reasoning. Finally, we illustrate the Aneris methodology for specifying, implementing, and verifying distributed systems by developing a simple addition service and a lock server. The distributed systems are composed of individually verified concurrently running nodes communicating asynchronously by exchanging messages that can be reordered or dropped.

# 2.1 Local and Thread-Local Reasoning

The most important feature of (concurrent) separation logic is, arguably, how it enables scalable modular reasoning about pointer-manipulating programs.

Separation logic is a resource logic, in the sense that propositions denote not only facts about the state, but *ownership* of resources. Originally, separation logic [32] was introduced for modular reasoning about the heap—i.e. the notion of resource was fixed to be logical pieces of the heap. The essential idea is that we can give a local specification {P} e {v.Q} to a program e involving only the *footprint* of e. Hence, while verifying e, we need not consider the possibility that another piece of code in the program might interfere with e; the program e can be verified without concern for the environment in which e may occur. Local specifications can then be lifted to more global specifications by framing and binding:

$$\frac{\{P\}\ e\ \{v.\,Q\}}{\{P \ast R\}\ e\ \{v.\,Q \ast R\}} \qquad\qquad \frac{\{P\}\ e\ \{v.\,Q\} \qquad \forall v.\ \{Q\}\ K[v]\ \{w.\,R\}}{\{P\}\ K[e]\ \{w.\,R\}}$$

where K denotes an evaluation context. The symbol ∗ denotes separating conjunction. Intuitively, P ∗ Q holds for a given resource (in this case a heap) if it can be divided into two disjoint resources such that P holds for one and Q holds for the other. Thus, the frame rule essentially says that executing e, for which we know {P} e {v.Q}, cannot possibly affect parts of the heap that are *separate* from its footprint. Another related separation logic connective is −∗, the separating implication. The proposition P −∗ Q describes a resource that, combined with a disjoint resource satisfying P, results in a resource satisfying Q.
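As a concrete instance of the frame rule, a completely local specification of a single assignment lifts to any larger heap; here the unrelated cell y ↦ 3 plays the role of the frame R and is guaranteed to be untouched:

$$\frac{\{x \mapsto 1\}\ x \leftarrow 2\ \{v.\ x \mapsto 2\}}{\{x \mapsto 1 \ast y \mapsto 3\}\ x \leftarrow 2\ \{v.\ x \mapsto 2 \ast y \mapsto 3\}}$$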

Since its introduction, separation logic has been extended to resources beyond heaps and with more sophisticated mechanisms for modular control of interference. Concurrent separation logics (CSLs) [28] allow reasoning about concurrent programs and a preeminent feature of these program logics is again the support for modular reasoning, in this case with respect to concurrency through *thread-local* reasoning. When reasoning about a concurrent program we consider threads one at a time and need not reason about interleavings of threads explicitly. In a way, our frame here includes, in addition to the shared fragments of the heap and other resources, the execution of other threads which can be interleaved throughout the execution of the thread being verified. This can be seen from the following disjoint concurrency rule:

$$\dfrac{\{P_1\}\ \langle n; e_1\rangle\ \{v.\,Q_1\} \qquad \{P_2\}\ \langle n; e_2\rangle\ \{v.\,Q_2\}}{\{P_1 \ast P_2\}\ \langle n; e_1 \parallel e_2\rangle\ \{v.\,\exists v_1, v_2.\ v = (v_1, v_2) \ast Q_1[v_1/v] \ast Q_2[v_2/v]\}}\ \textsc{(Thread-par)}$$

where e1 ∥ e2 denotes parallel composition of expressions e1 and e2, and we use the notation ⟨n; e⟩ to denote an expression e running on a node with identifier n.<sup>1</sup>

Inevitably, at some point threads typically have to communicate with one another through some kind of shared state, an unavoidable form of interference. The original CSL used a simple form of resource invariant in which ownership of a shared resource can be transferred between threads.

<sup>1</sup> In a language with fork-based concurrency, the parallel composition operator is an easily defined construct and the rule is derivable from a more general fork-rule.
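The definability of parallel composition from fork, as noted in the footnote, can be illustrated operationally; a minimal Python sketch (standing in for AnerisLang's fork {e}) runs one thunk in a forked thread, the other in the current thread, and joins to return the pair of results:

```python
import threading

def par(e1, e2):
    """Run thunks e1 and e2 in parallel and return the pair of results,
    mirroring the derived e1 || e2 construct."""
    result = [None]
    def child():
        result[0] = e1()          # the forked thread computes v1
    t = threading.Thread(target=child)
    t.start()                     # analogue of fork {e}
    v2 = e2()                     # the current thread computes v2
    t.join()                      # wait for the child, then pair up
    return (result[0], v2)

print(par(lambda: 1 + 1, lambda: 2 + 2))  # → (2, 4)
```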

A notable program logic in the family of concurrent separation logics is Iris, which is specifically designed for reasoning about programs written in concurrent higher-order imperative programming languages. Iris has already proven versatile for reasoning about a number of sophisticated properties of programming languages [12, 16, 37]. In order to support modular reasoning about concurrent programs, Iris features (1) *impredicative invariants* for expressing protocols on shared state among multiple threads and (2) *higher-order ghost state*, encoded using a form of partial commutative monoids, for reasoning about resources. We will give examples of these features and explain them in more detail as needed.

#### 2.2 Node-Local Reasoning

Programs written in AnerisLang are higher-order imperative concurrent programs that run on multiple nodes in a distributed system. When reasoning about distributed systems in Aneris, alongside heap-local and thread-local reasoning, we also reason *node-locally*. When proving correctness of AnerisLang programs, we reason about each node of the system in isolation, akin to how CSLs let us reason about each thread in isolation.

By virtue of building on Iris, reasoning in Aneris is naturally modular with respect to separation logic frames and with respect to threads. What Aneris adds on top of this is support for *node-local* reasoning about programs. This is expressed by the following rule:

$$\dfrac{\begin{array}{c}\{P_1 \ast \mathsf{IsNode}(n_1) \ast \mathsf{FreePorts}(ip_1, P)\}\ \langle n_1; e_1\rangle\ \{\mathsf{True}\}\\ \{P_2 \ast \mathsf{IsNode}(n_2) \ast \mathsf{FreePorts}(ip_2, P)\}\ \langle n_2; e_2\rangle\ \{\mathsf{True}\}\end{array}}{\{P_1 \ast P_2 \ast \mathsf{FreeIp}(ip_1) \ast \mathsf{FreeIp}(ip_2)\}\ \langle n_1; e_1\rangle \mid\mid\mid \langle n_2; e_2\rangle\ \{\mathsf{True}\}}\ \textsc{(Node-par)}$$

where ||| denotes parallel composition of two nodes with identifiers n1 and n2 running expressions e1 and e2 with IP addresses ip1 and ip2.<sup>2</sup> The set P = {p | 0 ≤ p ≤ 65535} denotes the finite set of ports.

Note that only a distinguished system node S can start new nodes (as elaborated on in Sect. 3). In Aneris, the execution of the distributed system starts with the execution of S as the only node in the system. In order to start a new node associated with IP address ip, one provides the resource FreeIp(ip), which indicates that ip is not used by other nodes. The node can then rely on the fact that when it starts, all ports on ip are available. The resource IsNode(n) indicates that n is a node in the system and keeps track of abstract state related to our modeling of node n's heap and allocated sockets. To facilitate modular reasoning, free ports can be split: if A ∩ B = ∅ then FreePorts(ip, A) ∗ FreePorts(ip, B) ⊣⊢ FreePorts(ip, A ∪ B), where ⊣⊢ denotes

<sup>2</sup> In the same way as the parallel composition rule is derived from a more general fork-based rule, this composition rule is also an instance of a more general rule for spawning nodes shown in Sect. 3.

logical equivalence of Aneris propositions (of type *iProp*). We will use FreePort(a) as shorthand for FreePorts(ip, {p}) where a = (ip, p).
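For instance, two services running on the same IP address can each be verified with ownership of only their own port: since {80} ∩ {8080} = ∅ (concrete port numbers chosen purely for illustration),

$$\mathsf{FreePorts}(ip, \{80\}) \ast \mathsf{FreePorts}(ip, \{8080\}) \dashv\vdash \mathsf{FreePorts}(ip, \{80, 8080\})$$

so the composite resource can be split between the two proofs.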

Finally, observe that the node-local postconditions are simply True, in contrast to the arbitrary thread-local postconditions in the Thread-par-rule that carry over to the main thread. In the concurrent setting, shared memory provides reliable communication and synchronization between the child threads and the main thread; in the rule for parallel composition, the main thread will wait for the two child processes to finish. In the distributed setting, there are no such guarantees and nodes are separate entities that cannot synchronize with the distinguished system node.

*Socket Protocols.* Similar to how classical CSLs introduce the concept of resource invariants for expressing protocols on shared state among multiple threads, we introduce the simple and novel concept of *socket protocols* for expressing protocols among multiple nodes. With each socket address—a pair of an IP address and a port—a protocol is associated, which restricts what can be communicated on that socket.

A socket protocol is a predicate Φ : *Message* → *iProp* on incoming messages received on a particular socket. One can think of this as a form of rely-guarantee reasoning, since the socket protocol will be used to restrict the distributed environment's interference with a node on a particular socket. In Aneris we write a ⤇ Φ to mean that socket address a is governed by the protocol Φ. In particular, if a ⤇ Φ and a ⤇ Ψ then Φ and Ψ are equivalent.<sup>3</sup> Moreover, the proposition is duplicable: a ⤇ Φ ⊣⊢ a ⤇ Φ ∗ a ⤇ Φ.

Conceptually, a socket is an abstract representation of a handle for a local endpoint of some channel. We further restrict channels to use the User Datagram Protocol (UDP), which is *asynchronous*, *connectionless*, and *stateless*. In accordance with UDP, Aneris provides no guarantee of delivery or ordering, although we assume duplicate protection. We assume duplicate protection to simplify our examples, as otherwise the code of all of our examples would have to be adapted to cope with duplication of messages. One can think of sockets in Aneris as open-ended multi-party communication channels without synchronization.
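Operationally, one can think of a socket protocol as a guard on what a node may receive at an address. The Python sketch below is only a loose analogy: it models Φ as a boolean predicate on messages, whereas in Aneris Φ is a separation-logic predicate (of type *Message* → *iProp*) that can also transfer ownership of resources. The Message shape and the name phi_add are ours, purely illustrative:

```python
from typing import Callable, NamedTuple

class Message(NamedTuple):
    sender: str   # socket address of the sender
    body: str     # serialized payload

# In this analogy, a "socket protocol" is just a boolean predicate on
# messages; in the logic it is a resource-aware predicate into iProp.
Protocol = Callable[[Message], bool]

def phi_add(m: Message) -> bool:
    """Analogue of an addition-service protocol: the body must be the
    serialization of two integers, e.g. "37,5"."""
    parts = m.body.split(",")
    return len(parts) == 2 and all(
        p.strip().lstrip("-").isdigit() for p in parts)

assert phi_add(Message("1.2.3.4:80", "37,5"))
assert not phi_add(Message("1.2.3.4:80", "hello"))
```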

It is noteworthy that inter-process communication can happen in two ways. Thread-concurrent programs can communicate both through the shared heap and by sending messages through sockets. For memory-separated programs running on different nodes all communication is by message-passing.

In the logic, we consider both *static* and *dynamic* socket addresses. This distinction is entirely abstract and at the level of the logic. Static addresses come with primordial protocols, agreed upon before starting the distributed system, whereas dynamic addresses do not. Protocols on static addresses are primarily intended for addresses pointing to nodes that offer a service.

To distinguish between static and dynamic addresses, we use a resource Fixed(A) which denotes that the addresses in A are static and should have a fixed

<sup>3</sup> The predicate equivalence is under a later modality in order to avoid self-referential paradoxes. We omit it for the sake of presentation as this is an orthogonal issue.

interpretation. This proposition expresses knowledge without asserting ownership of resources and is duplicable: Fixed(A) ⊣⊢ Fixed(A) ∗ Fixed(A).

Corresponding to the two kinds of addresses, we have two different rules, Socketbind-static and Socketbind-dynamic, for binding an address to a socket, as seen below. Both rules consume an instance of Fixed(A) and FreePort(a) as well as a resource z →n None. The latter keeps track of the address associated with the socket handle z on node n and ensures that the socket is bound only once, as further explained in Sect. 4. Notice that the protocol Φ in Socketbind-dynamic can be freely chosen.

```
Socketbind-static
{Fixed(A) ∗ a ∈ A ∗ FreePort(a) ∗ z →n None}
  ⟨n; socketbind z a⟩
{x. x = 0 ∗ z →n Some a}

Socketbind-dynamic
{Fixed(A) ∗ a ∉ A ∗ FreePort(a) ∗ z →n None}
  ⟨n; socketbind z a⟩
{x. x = 0 ∗ z →n Some a ∗ a ⤇ Φ}
```
In the remainder of the paper we will use the following shorthands in order to simplify the presentation of our specifications.

$$\mathsf{Static}(a, A, \Phi) \triangleq \mathsf{Fixed}(A) \ast a \in A \ast \mathsf{FreePort}(a) \ast a \Mapsto \Phi$$

$$\mathsf{Dynamic}(a, A) \triangleq \mathsf{Fixed}(A) \ast a \notin A \ast \mathsf{FreePort}(a)$$

#### 2.3 Example: An Addition Service

To illustrate node-local reasoning, socket protocols, and the Aneris methodology for specifying, implementing, and verifying distributed systems we develop a simple addition service that offers to add numbers for clients.

Fig. 1 depicts an implementation of a server and a client written in AnerisLang. Notice that the programs look as if they were written in a realistic functional language with sockets, like OCaml. Messages are strings to make programming with sockets easier (similar to send_substring in the Unix module in OCaml).

The server is parameterized over an address on which it will listen for requests. The server allocates a new socket and binds the address to the socket. Then the server starts listening for an incoming message on the socket, calling a handler function on the message, if any. The handler function will deserialize the message, perform the addition, serialize the result, and return it to the sender before recursively listening for new messages.

The client is parameterized over two numbers to compute on, a server address, and a client address. The client allocates a new socket, binds the address to the socket, and serializes the two numbers. In the end, it sends the serialized message

```
rec server a =
  let skt = socket () in
  socketbind skt a;
  listen skt (rec handler msg from =
    let m = deserialize msg in
    let res = serialize (π1 m + π2 m) in
    sendto skt res from;
    listen skt handler)

rec client x y srv a =
  let skt = socket () in
  socketbind skt a;
  let m = serialize (x, y) in
  sendto skt m srv;
  let res = listenwait skt in
  deserialize (π1 res)
```
Fig. 1. An implementation of an addition service and a client written in AnerisLang. listen and listenwait are convenient helper functions to be found in the appendix [20].

to the server address using the socket and waits for a response, projecting out the result of the addition on arrival and deserializing it.
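The serialization and handler logic of Fig. 1 can be mimicked by ordinary pure functions; the following Python sketch (the function names are ours, purely illustrative) shows what the server does with each request, minus the socket round trip:

```python
def serialize(x: int, y: int) -> str:
    # Messages on the wire are strings, as in AnerisLang.
    return f"{x},{y}"

def deserialize_pair(msg: str) -> tuple:
    a, b = msg.split(",")
    return int(a), int(b)

def handle_request(msg: str) -> str:
    """Server-side handler: deserialize, add, serialize the result."""
    x, y = deserialize_pair(msg)
    return str(x + y)

# Client side: serialize the arguments, "send", and read back the reply.
request = serialize(37, 5)
reply = handle_request(request)   # in Fig. 1 this round trip goes over a socket
assert int(reply) == 42
```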

In order to give the server code a specification we will fix a primordial socket protocol that will govern the address given to the server. The protocol will spell out how the server relies on the socket. We will use from(m) and body(m) for projections of the sender and the message body, respectively, from the message <sup>m</sup>. We define <sup>Φ</sup>add as follows:

$$\begin{aligned} \Phi_{add}(m) \triangleq \exists \Psi, x, y.\ &\mathsf{from}(m) \Mapsto \Psi \ast \mathsf{body}(m) = serialize(x, y)\ \ast\\ &(\forall m'.\ \mathsf{body}(m') = serialize(x + y) \mathrel{-\!\!\ast} \Psi(m')) \end{aligned}$$

Intuitively, the protocol demands that the sender of a message m is governed by some protocol Ψ and that the message body body(m) must be the serialization of two numbers x and y. Moreover, the sender's protocol must be satisfied if the serialization of x <sup>+</sup> y is sent as a response.

Using <sup>Φ</sup>add as the socket protocol, we can give server the specification

$$\{\mathsf{Static}(a, A, \Phi_{add}) \ast \mathsf{IsNode}(n)\}\ \langle n; \mathtt{server}\ a\rangle\ \{\mathsf{False}\}$$

The postcondition is allowed to be False as the program does not terminate. The triple guarantees safety, which, among other things, means that *if* the server responds to communication on address a, it does so according to Φadd.

Similarly, using <sup>Φ</sup>add as a primordial protocol for the server address, we can also give client a specification

$$\begin{aligned} &\{srv \Mapsto \Phi_{add} \ast srv \in A \ast \mathsf{Dynamic}(a, A) \ast \mathsf{IsNode}(m)\}\\ &\qquad\langle m; \mathtt{client}\ x\ y\ srv\ a\rangle\\ &\{v.\ v = x + y\} \end{aligned}$$

that showcases how the client is able to conclude that the response from the server is the sum of the numbers it sent to it. In the proof, when binding a to the socket using Socketbind-dynamic, we introduce the proposition a ⤇ Φclient where

$$\Phi_{client}(m) \triangleq \mathsf{body}(m) = serialize(x + y)$$

and use it to instantiate Ψ when satisfying Φadd. Using the two specifications and the Node-par-rule it is straightforward to specify and verify a distributed system composed of, e.g., a server and multiple clients.

#### 2.4 Example: A Lock Server

Mutual exclusion in distributed systems is often a necessity and there are many different approaches for providing it. The simplest solution is a centralized algorithm with a single node acting as the coordinator. We will develop this example to showcase a more interesting protocol that relies on ownership transfer of spatial resources between nodes to ensure correctness.

The code for a centralized lock server implementation is shown in Fig. 2.

```
rec lockserver a =
  let lock = ref NONE in
  let skt = socket () in
  socketbind skt a;
  listen skt (rec handler msg from =
    (if (msg = "LOCK") then
       match !lock with
         NONE => lock ← SOME (); sendto skt "YES" from
       | SOME _ => sendto skt "NO" from
       end
     else
       lock ← NONE; sendto skt "RELEASED" from);
    listen skt handler)
```
Fig. 2. A lock server in AnerisLang.
Fig. 2. A lock server in AnerisLang.

The lock server declares a node-local variable lock to keep track of whether the lock is taken or not. It allocates a socket, binds the input address to the socket and continuously listens for incoming messages. When a "LOCK" message arrives and the lock is available, the lock gets taken and the server responds "YES". If the lock was already taken, the server will respond "NO". Finally, if the message was not "LOCK", the lock is released and the server responds with "RELEASED".
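The handler's branching behaviour amounts to a two-state transition function on the node-local lock reference. A Python sketch (our naming, with a boolean standing in for the NONE/SOME () reference) mirrors one handler step:

```python
def lock_step(state: bool, msg: str) -> tuple:
    """One handler step of the lock server in Fig. 2.
    `state` is True when the lock is taken (SOME ()) and False when it
    is free (NONE); returns the new state and the reply string."""
    if msg == "LOCK":
        if not state:
            return True, "YES"    # free lock: take it and grant
        return True, "NO"         # already taken: deny
    return False, "RELEASED"      # any other message releases the lock

state = False
state, r1 = lock_step(state, "LOCK");    assert r1 == "YES"
state, r2 = lock_step(state, "LOCK");    assert r2 == "NO"
state, r3 = lock_step(state, "RELEASE"); assert r3 == "RELEASED"
state, r4 = lock_step(state, "LOCK");    assert r4 == "YES"
```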

Our specification of the lock server is inspired by how a lock can be specified in concurrent separation logic. Thus we first recall what such a specification usually looks like.

Conceptually, a lock can either be unlocked or locked, as described by a two-state labeled transition system.

In concurrent separation logic, the lock specification does not describe this transition system directly, but instead focuses on the resources needed for the transitions to take place. In the case of the lock, the resource is simply a non-duplicable resource K, which is needed in order to call the lock's release method. Intuitively, this resource corresponds to the key of the lock.

A typical concurrent separation logic specification for a spin lock module looks roughly like the following:

$$\begin{aligned}\exists\, \mathsf{isLock}.\ &\{\mathsf{True}\}\ \mathtt{newlock}\ ()\ \{v.\ \exists K.\ \mathsf{isLock}(v, K)\}\ \ast\\ &\forall v, K.\ \{\mathsf{isLock}(v, K)\}\ \mathtt{acquire}\ v\ \{\_.\ K\}\ \ast\\ &\forall v, K.\ \{\mathsf{isLock}(v, K) \ast K\}\ \mathtt{release}\ v\ \{\_.\ \mathsf{True}\}\end{aligned}$$

The intuitive reading of such a specification is:

- newlock () creates a lock and returns a value v for which isLock(v, K) holds;
- acquire v may be called when isLock(v, K) holds and yields ownership of the key K once the lock has been acquired;
- release v releases the lock and requires giving up ownership of the key K.
Sharing of the lock among several threads is achieved by the isLock predicate being duplicable. Mutual exclusion is ensured by the last bullet point together with the requirement of K being non-duplicable whenever we have isLock(v,K). For a leisurely introduction to such specifications, the reader may consult Birkedal and Bizjak [1].
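As a loose operational analogy to this specification (not the separation-logic spec itself), one can model the key K as a fresh token that acquire hands out and release demands back; the class and method names below are ours:

```python
import threading

class KeyedLock:
    """Toy model of the lock-module specification: acquire yields a unique
    key object (the resource K) and release demands that same key back."""
    def __init__(self):
        self._lock = threading.Lock()
        self._key = None

    def acquire(self):
        self._lock.acquire()          # blocks until the lock is free
        self._key = object()          # fresh, non-duplicable key
        return self._key

    def release(self, key):
        if key is not self._key:      # only the key holder may release
            raise ValueError("release requires the key resource K")
        self._key = None
        self._lock.release()

l = KeyedLock()
k = l.acquire()
l.release(k)       # fine: we hold the key
k2 = l.acquire()   # the lock can be re-acquired afterwards
l.release(k2)
```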

Let us now return to the distributed lock synchronization. To give clients the possibility of interacting with the lock server as they would with such a concurrent lock module, the specification for the lock server looks as follows.

$$\{K \ast \mathsf{Static}(a, A, \Phi_{lock})\}\ \langle n; \mathtt{lockserver}\ a\rangle\ \{\mathsf{False}\}$$

This specification simply states that a lock server should have a primordial protocol <sup>Φ</sup>lock and that it needs the key resource to begin with. To allow for the desired interaction with the server, we define the socket protocol <sup>Φ</sup>lock as follows:

$$\begin{aligned} acq(m, \Psi) &\triangleq (\mathsf{body}(m) = \text{"LOCK"})\ \ast\\ &\quad\ \forall m'.\ ((\mathsf{body}(m') = \text{"NO"}) \lor (\mathsf{body}(m') = \text{"YES"} \ast K)) \mathrel{-\!\!\ast} \Psi(m')\\ rel(m, \Psi) &\triangleq (\mathsf{body}(m) = \text{"RELEASE"}) \ast K\ \ast\\ &\quad\ \forall m'.\ (\mathsf{body}(m') = \text{"RELEASED"}) \mathrel{-\!\!\ast} \Psi(m')\\ \Phi_{lock}(m) &\triangleq \exists \Psi.\ \mathsf{from}(m) \Mapsto \Psi \ast (acq(m, \Psi) \lor rel(m, \Psi)) \end{aligned}$$

The protocol Φlock demands that a client of the lock server be bound to some protocol Ψ and that the server can receive two types of messages, fulfilling either acq(m, Ψ) or rel(m, Ψ). These correspond to the lock module's two methods, acquire and release, respectively. In the case of a "LOCK" message, the server will answer either "NO" or "YES", the latter accompanied by the key resource K. In either case, the answer suffices to fulfill the client protocol Ψ.

Receiving a "RELEASE" request is similar, but the important part is that we require a client to send the key resource K along with the message, which ensures that only the current holder can release the lock.

One difference between the distributed and the concurrent specification is that we allow for the distributed lock to directly deny access. The client can use a simple loop, asking for the lock until it is acquired, if it wishes to wait until the lock can be acquired.
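Such a client-side retry loop is straightforward; in the following hypothetical Python sketch, request stands for a function performing one "LOCK" round trip against the server:

```python
def acquire(request) -> None:
    """Spin until the lock server answers "YES"; each "NO" answer is
    simply retried, matching the loop described in the text."""
    while request("LOCK") != "YES":
        pass  # optionally back off before retrying

# A toy server stub granting the lock on the third attempt.
answers = iter(["NO", "NO", "YES"])
calls = []
def request(msg):
    calls.append(msg)
    return next(answers)

acquire(request)
assert calls == ["LOCK", "LOCK", "LOCK"]
```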

There are several interesting observations one can make about the lock server example: (1) The lock server can allocate, read, and write node-local references but these are hidden in the specification. (2) There are no channel descriptors or assertions on the socket in the code. (3) The lock server provides mutual exclusion by requiring clients to satisfy a sufficient protocol.

# 3 AnerisLang

AnerisLang is an untyped functional language with higher-order functions, fork-based concurrency, higher-order mutable references, and primitives for communicating over network sockets. The syntax is as follows:

$$\begin{aligned} v \in \mathit{Val} ::= {} & () \mid b \mid i \mid s \mid \ell \mid z \mid \mathsf{rec}\ f\ x = e \mid \ldots \\ e \in \mathit{Expr} ::= {} & v \mid x \mid \mathsf{rec}\ f\ x = e \mid e_1\ e_2 \mid \mathsf{ref}\ e \mid {!}\,e \mid e_1 \leftarrow e_2 \mid \mathsf{cas}\ e_1\ e_2\ e_3 \mid {} \\ & \mathsf{find}\ e_1\ e_2\ e_3 \mid \mathsf{substring}\ e_1\ e_2\ e_3 \mid \mathsf{i2s}\ e \mid \mathsf{s2i}\ e \mid \mathsf{fork}\,\{e\} \mid {} \\ & \mathsf{start}\,\{n; ip; e\} \mid \mathsf{makeaddress}\ e_1\ e_2 \mid \mathsf{socket}\ e \mid \mathsf{socketbind}\ e_1\ e_2 \mid {} \\ & \mathsf{sendto}\ e_1\ e_2\ e_3 \mid \mathsf{receivefrom}\ e \mid \ldots \end{aligned}$$

We omit the usual operations on pairs, sums, booleans $b \in \mathbb{B}$, and integers $i \in \mathbb{Z}$, which are all standard. We introduce the following syntactic sugar: lambda abstractions $\lambda x.\, e$ defined as $\mathsf{rec}\ \_\ x = e$, let-bindings $\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2$ defined as $(\lambda x.\, e_2)(e_1)$, and sequencing $e_1; e_2$ defined as $\mathsf{let}\ \_ = e_1\ \mathsf{in}\ e_2$.

We have the usual operations on locations $\ell \in \mathit{Loc}$ in the heap: $\mathsf{ref}\ v$ for allocating a new reference, ${!}\,\ell$ for dereferencing, and $\ell \leftarrow v$ for assignment. $\mathsf{cas}\ \ell\ v_1\ v_2$ is an atomic compare-and-set operation used to achieve synchronization between threads on a specific memory location $\ell$. Operationally, it tests whether $\ell$ has value $v_1$ and, if so, updates the location to $v_2$, returning a boolean indicating whether the swap succeeded.
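For instance, cas suffices to implement a node-local spin lock. The following sketch is our illustration, not code from the paper; it lets threads forked on the same node synchronize on a shared boolean reference:

```
(* A location holding false means the lock is free. *)
let newlock () = ref false

(* Atomically flip false to true; retry while another
   thread holds the lock. *)
rec spin_acquire l =
  if cas l false true then () else spin_acquire l

(* Release by resetting the flag. *)
let release l = l ← false
```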

The operation find returns the index of a particular substring in a string $s \in \mathit{String}$, and substring splits a string at given indices, producing the corresponding substring. i2s and s2i convert between integers and strings. These operations are mainly used for serialization and deserialization.
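For example, the addition service of Sect. 2.3 serializes a pair of integers into a single string. A sketch of such (de)serialization could look as follows; it assumes a string-append operation ^ and a length operation strlen among the elided primitives (both hypothetical here), and reads substring s start len as the substring of s of length len starting at index start:

```
(* "3_4" encodes the pair (3, 4). *)
let serialize v = i2s (π1 v) ^ "_" ^ i2s (π2 v)

let deserialize s =
  let i = find s "_" 0 in
  let v1 = s2i (substring s 0 i) in
  let v2 = s2i (substring s (i + 1) (strlen s - (i + 1))) in
  (v1, v2)
```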

The expression fork {e} forks off a new (node-local) thread, and start {n; ip; e} spawns a new node $n \in \mathit{Node}$ with IP address $ip \in \mathit{Ip}$ running the program e. Note that only a distinguished system node S can spawn nodes, and only during the bootstrapping phase of a distributed system.

We use $z \in \mathit{Handle}$ to range over socket handles created by the socket operation. makeaddress constructs an address from an IP address and a port, and the network primitives socketbind, sendto, and receivefrom correspond to the analogous BSD-socket API functions.
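Putting the primitives together, a minimal echo node that reflects every message back to its sender can be written in the same style as the examples of Sect. 5 (the program is our illustration):

```
(* A received message m is a pair of its body (π1 m)
   and its sender's address (π2 m). *)
rec echo_server ip port =
  let skt = socket () in
  let a = makeaddress ip port in
  socketbind skt a;
  (rec loop () =
    match receivefrom skt with
      SOME m => sendto skt (π1 m) (π2 m); loop ()
    | NONE => loop ()
    end) ()
```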

*Operational Semantics.* We define the operational semantics of AnerisLang in three stages.

We first define a node-local, thread-local head step reduction $(e, h) \rightsquigarrow (e', h')$ for $e, e' \in \mathit{Expr}$ and $h, h' \in \mathit{Loc} \rightharpoonup_{\mathrm{fin}} \mathit{Val}$ that handles all pure and heap-related node-local reductions. All rules of the relation are standard.

Next, the node-local head step reduction induces a network-aware head step reduction $(\langle n; e\rangle, \Sigma) \to (\langle n; e'\rangle, \Sigma')$.

$$\frac{(e,h)\rightsquigarrow(e',h')}{\langle n;e\rangle,(\mathcal{H}[n\mapsto h],\mathcal{S},\mathcal{P},\mathcal{M})\rightarrow\langle n;e'\rangle,(\mathcal{H}[n\mapsto h'],\mathcal{S},\mathcal{P},\mathcal{M})}.$$

Here $n \in \mathit{Node}$ denotes a node identifier and $\Sigma, \Sigma' \in \mathit{NetworkState}$ the global network state. Elements of $\mathit{NetworkState}$ are tuples $(\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M})$ tracking heaps $\mathcal{H} \in \mathit{Node} \rightharpoonup_{\mathrm{fin}} \mathit{Heap}$ and sockets $\mathcal{S} \in \mathit{Node} \rightharpoonup_{\mathrm{fin}} \mathit{Handle} \rightharpoonup_{\mathrm{fin}} \mathsf{Option}\ \mathit{Address}$ for all nodes, ports in use $\mathcal{P} \in \mathit{Ip} \rightharpoonup_{\mathrm{fin}} \wp_{\mathrm{fin}}(\mathit{Port})$, and messages sent $\mathcal{M} \in \mathit{Id} \rightharpoonup_{\mathrm{fin}} \mathit{Message}$. The induced network-aware reduction is furthermore extended with rules for the network primitives, as seen in Fig. 3. The socket operation allocates a new

$$\frac{z \notin \mathrm{dom}(\mathcal{S}(n)) \qquad \mathcal{S}' = \mathcal{S}[n \mapsto \mathcal{S}(n)[z \mapsto \mathsf{None}]]}{\langle n; \mathsf{socket}\ ()\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}) \to \langle n; z\rangle, (\mathcal{H}, \mathcal{S}', \mathcal{P}, \mathcal{M})}$$

$$\frac{\begin{array}{c} \mathcal{S}(n)(z) = \mathsf{None} \qquad p \notin \mathcal{P}(ip) \\ \mathcal{S}' = \mathcal{S}[n \mapsto \mathcal{S}(n)[z \mapsto \mathsf{Some}\,(ip, p)]] \qquad \mathcal{P}' = \mathcal{P}[ip \mapsto \mathcal{P}(ip) \cup \{p\}] \end{array}}{\langle n; \mathsf{socketbind}\ z\ (ip, p)\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}) \to \langle n; 0\rangle, (\mathcal{H}, \mathcal{S}', \mathcal{P}', \mathcal{M})}$$

$$\frac{\mathcal{S}(n)(z) = \mathsf{Some}\ \mathit{from} \qquad i \notin \mathrm{dom}(\mathcal{M}) \qquad \mathcal{M}' = \mathcal{M}[i \mapsto (\mathit{from}, \mathit{to}, \mathit{msg}, \mathsf{Sent})]}{\langle n; \mathsf{sendto}\ z\ \mathit{msg}\ \mathit{to}\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}) \to \langle n; |\mathit{msg}|\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}')}$$

$$\frac{\begin{array}{c} \mathcal{S}(n)(z) = \mathsf{Some}\ \mathit{to} \qquad \mathcal{M}(i) = (\mathit{from}, \mathit{to}, \mathit{msg}, \mathsf{Sent}) \\ \mathcal{M}' = \mathcal{M}[i \mapsto (\mathit{from}, \mathit{to}, \mathit{msg}, \mathsf{Received})] \end{array}}{\langle n; \mathsf{receivefrom}\ z\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}) \to \langle n; \mathsf{Some}\,(\mathit{msg}, \mathit{from})\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}')}$$

$$\frac{\mathcal{S}(n)(z) = \mathsf{Some}\ \mathit{to}}{\langle n; \mathsf{receivefrom}\ z\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M}) \to \langle n; \mathsf{None}\rangle, (\mathcal{H}, \mathcal{S}, \mathcal{P}, \mathcal{M})}$$

Fig. 3. An excerpt of the rules for network-aware head reduction.

unbound socket using a fresh handle z for a node n, and socketbind binds a socket address a to an unbound socket z if the port p of the address is not already in use. Thereafter, the port is no longer available in $\mathcal{P}'(ip)$. For bound sockets, sendto sends a message msg to a destination address to from the sender's address from found in the bound socket. The message is assigned a unique identifier and tagged with a status flag Sent, indicating that the message has been sent and not yet received. The operation returns the number of characters sent.

To model possibly dropped or delayed messages, we introduce two rules for receiving messages with the receivefrom operation, which on a bound socket either returns a previously unreceived message or nothing. If a message is received, its status flag is updated to Received.

Third and finally, using standard *call-by-value right-to-left evaluation contexts* $K \in \mathit{Ectx}$, we lift the network-aware head reduction to a *distributed systems* reduction shown below. We write $\to^{\ast}$ for its reflexive-transitive closure. The distributed systems relation reduces by picking a thread on any node or forking off a new thread on a node.

$$\frac{(\langle n; e\rangle, \Sigma) \to (\langle n; e'\rangle, \Sigma')}{(\mathcal{T}_1 \mathbin{+\!\!+} [\langle n; K[e]\rangle] \mathbin{+\!\!+} \mathcal{T}_2, \Sigma) \to (\mathcal{T}_1 \mathbin{+\!\!+} [\langle n; K[e']\rangle] \mathbin{+\!\!+} \mathcal{T}_2, \Sigma')}$$

$$(\mathcal{T}_1 \mathbin{+\!\!+} [\langle n; K[\mathsf{fork}\,\{e\}]\rangle] \mathbin{+\!\!+} \mathcal{T}_2, \Sigma) \to (\mathcal{T}_1 \mathbin{+\!\!+} [\langle n; K[()]\rangle] \mathbin{+\!\!+} \mathcal{T}_2 \mathbin{+\!\!+} [\langle n; e\rangle], \Sigma)$$

# 4 The Aneris Logic

As a consequence of building on the Iris framework, the Aneris logic features all the usual connectives and rules of higher-order separation logic, some of which are shown in the grammar below.<sup>4</sup> The full expressiveness of the logic can be exploited when giving specifications to programs or stating protocols.

$$\begin{aligned} P, Q \in \mathit{iProp} ::= {} & \mathsf{True} \mid \mathsf{False} \mid P \wedge Q \mid P \vee Q \mid P \Rightarrow Q \mid {} \\ & \forall x. P \mid \exists x. P \mid P \ast Q \mid P \twoheadrightarrow Q \mid t = u \mid {} \\ & \ell \mapsto_n v \mid \boxed{P} \mid a^{\gamma} \mid \{P\}\, \langle n; e\rangle\, \{x. Q\} \mid \ldots \end{aligned}$$

Note that in Aneris the usual points-to connective for the heap, $\ell \mapsto_n v$, is indexed by a node identifier $n \in \mathit{Node}$, asserting ownership of the singleton heap mapping $\ell$ to $v$ on node $n$.

The logic features (impredicative) invariants $\boxed{P}$ and user-definable ghost state via the proposition $a^{\gamma}$, which asserts ownership of a piece of ghost state $a$ at ghost location $\gamma$. The logical support for user-defined invariants and ghost state allows one to relate (ghost and physical) resources to each other; this is vital for our specifications, as will become evident in Sect. 5 and Sect. 6. We refer to Jung et al. [14] for a more thorough treatment of user-defined ghost state.

To reason about AnerisLang programs, the logic features Hoare triples.<sup>5</sup> The intuitive reading of the Hoare triple $\{P\}\ \langle n; e\rangle\ \{x. Q\}$ is that if the program e on

<sup>4</sup> To avoid the issue of reentrancy, invariants are annotated with a namespace and Hoare triples with a mask. We omit both for the sake of presentation as they are orthogonal issues.

<sup>5</sup> In both Iris and Aneris the notion of a Hoare triple is defined in terms of a weakest precondition but this will not be important for the remainder of this paper.

node n is run in a distributed system state s satisfying P, then the computation does not get stuck and, moreover, if it terminates with a value v in a system state s′, then s′ satisfies Q[v/x]. In other words, a Hoare triple implies safety and states that all spatial resources used by e are contained in the precondition P.

In contrast to spatial propositions that express *ownership*, e.g., $\ell \mapsto_n v$, propositions like $\boxed{P}$ and $\{P\}\ \langle n; e\rangle\ \{x. Q\}$ express *knowledge* of properties that, once true, hold true forever. We call this class of propositions *persistent*. Persistent propositions $P$ can be freely duplicated: $P \dashv\vdash P \ast P$.

#### 4.1 The Program Logic

The Aneris proof rules include the usual rules of concurrent separation logic for Hoare triples, allowing formal reasoning about node-local pure computations, manipulations of the heap, and forking of threads. Expressions e are annotated with a node identifier n, but the rules are otherwise standard.

To reason about individual nodes in a distributed system in isolation, Aneris introduces the following rule:

$$\frac{\{P \ast \mathsf{IsNode}(n) \ast \mathsf{FreePorts}(ip, \mathfrak{P})\}\ \langle n; e\rangle\ \{\mathsf{True}\}}{\{P \ast \mathsf{FreeIp}(ip)\}\ \langle \mathcal{S}; \mathsf{start}\ \{n; ip; e\}\rangle\ \{x.\, x = ()\}}$$

where $\mathfrak{P} = \{p \mid 0 \leq p \leq 65535\}$. This is the key rule enabling node-local reasoning: it expresses precisely that to reason about a distributed system, it suffices to reason about each node in isolation.

As described in Sect. 3, only the distinguished system node S can start new nodes; this is also reflected in the Start rule. To start a new node with IP address ip, the resource FreeIp(ip) must be provided, indicating that ip is not used by other nodes. When reasoning about the node n, the proof can then rely on all ports on ip being available. The resource IsNode(n) indicates that n is a valid node in the system and keeps track of abstract state related to the modeling of node n's heap and sockets. IsNode(n) is persistent and hence duplicable.
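To illustrate, a bootstrapping program run on the system node S might spawn a server node and a client node as follows; the node names, IP addresses, and the programs server_prog and client_prog are hypothetical:

```
(* Run on the distinguished system node S during bootstrapping. *)
let system () =
  start { "server"; "1.2.3.4"; server_prog };
  start { "client"; "1.2.3.5"; client_prog }
```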

*Network Communication.* To reason about network communication in a distributed system, the logic includes a series of rules for reasoning about socket manipulation: allocation of sockets, binding of addresses to sockets, sending via sockets, and receiving from sockets.

To allocate a socket it suffices to prove that the node n is valid by providing the IsNode(n) resource. In return, an unbound socket resource $z \hookrightarrow_n \mathsf{None}$ is obtained.

$$\begin{array}{l} \text{Socket} \\ \{\mathsf{IsNode}(n)\} \\ \langle n; \mathsf{socket}\ ()\rangle \\ \{x.\ \exists z.\ x = z \ast z \hookrightarrow_n \mathsf{None}\} \end{array}$$

The socket resource $z \hookrightarrow_n o$ keeps track of the address associated with the socket handle z on node n and takes part in ensuring that the socket is bound only once. It behaves similarly to the points-to connective for the heap, e.g., $z \hookrightarrow_n o \ast z \hookrightarrow_n o' \Rightarrow \mathsf{False}$.

As briefly touched upon in Sect. 2, the logic offers two different rules for binding an address to a socket, depending on whether or not the address has a primordial, agreed-upon protocol (at the level of the logic). To distinguish between such static and dynamic addresses, we use a persistent resource Fixed(A) to keep track of the set of addresses that have a fixed socket protocol.

To reason about a static address binding to a socket z it suffices to show that the address a being bound has a fixed interpretation (by being in the "fixed" set), that the port of the address is free, and that the socket is not bound.

$$\begin{array}{l} \text{SocketBind-Static} \\ \{\mathsf{Fixed}(A) \ast a \in A \ast \mathsf{FreePort}(a) \ast z \hookrightarrow_n \mathsf{None}\} \\ \langle n; \mathsf{socketbind}\ z\ a\rangle \\ \{x.\ x = 0 \ast z \hookrightarrow_n \mathsf{Some}\ a\} \end{array}$$

In accordance with the BSD-socket API, the bind operation returns the integer 0 and the socket resource gets updated, reflecting the fact that the binding took place.

The rule for dynamic address binding is similar but the address a should not have a fixed interpretation. Moreover, the user of the logic is free to pick the socket protocol Φ to govern address a.

$$\begin{array}{l} \text{SocketBind-Dynamic} \\ \{\mathsf{Fixed}(A) \ast a \notin A \ast \mathsf{FreePort}(a) \ast z \hookrightarrow_n \mathsf{None}\} \\ \langle n; \mathsf{socketbind}\ z\ a\rangle \\ \{x.\ x = 0 \ast z \hookrightarrow_n \mathsf{Some}\ a \ast a \Mapsto \Phi\} \end{array}$$

To reason about sending a message on a socket z it suffices to show that z is bound, that the destination of the message is governed by a protocol Φ, and that the message satisfies the protocol.

$$\begin{array}{l} \text{SendTo} \\ \{z \hookrightarrow_n \mathsf{Some}\ \mathit{from} \ast \mathit{to} \Mapsto \Phi \ast \Phi((\mathit{from}, \mathit{to}, \mathit{msg}, \mathsf{Sent}))\} \\ \langle n; \mathsf{sendto}\ z\ \mathit{msg}\ \mathit{to}\rangle \\ \{x.\ x = |\mathit{msg}| \ast z \hookrightarrow_n \mathsf{Some}\ \mathit{from}\} \end{array}$$

Finally, to reason about receiving a message on a socket z the socket must be bound to an address governed by a protocol Φ.

$$\begin{array}{l} \text{ReceiveFrom} \\ \{z \hookrightarrow_n \mathsf{Some}\ \mathit{to} \ast \mathit{to} \Mapsto \Phi\} \\ \langle n; \mathsf{receivefrom}\ z\rangle \\ \left\{ \begin{array}{l} x.\ z \hookrightarrow_n \mathsf{Some}\ \mathit{to} \ast {} \\ (x = \mathsf{None} \vee (\exists m.\ x = \mathsf{Some}\,(\mathsf{body}(m), \mathsf{from}(m)) \ast \Phi(m) \ast \mathsf{R}(m))) \end{array} \right\} \end{array}$$

When trying to receive a message on a socket, either a message will be received or no message is available. This is reflected directly in the logic: if no message was received, no resources are obtained. If a message m is received, the resources prescribed by $\Phi(m)$ are transferred together with an unmodifiable certificate $\mathsf{R}(m)$ accounting logically for the fact that message m was received. This certificate can be used in the logic to talk about messages that have actually been received, in contrast to arbitrary messages. In our specification of the two-phase commit protocol presented in Sect. 6, the notion of a vote denotes not just a message with the right content, but one that has been sent by a participant and received by the coordinator.

#### 4.2 Adequacy for Aneris

We now state a formal adequacy theorem, which expresses that Aneris guarantees both safety and that all protocols are adhered to.

To state our theorem we introduce a notion of *initial state coherence*: a set of addresses $A \subseteq \mathit{Address} = \mathit{Ip} \times \mathit{Port}$ and a map $\mathcal{P} : \mathit{Ip} \rightharpoonup_{\mathrm{fin}} \wp_{\mathrm{fin}}(\mathit{Port})$ are said to satisfy initial state coherence if the following hold: (1) if $(i, p) \in A$ then $i \in \mathrm{dom}(\mathcal{P})$, and (2) if $i \in \mathrm{dom}(\mathcal{P})$ then $\mathcal{P}(i) = \emptyset$.

Theorem 1 (Adequacy). *Let* $\varphi$ *be a first-order predicate over values, i.e., a meta-logic predicate (as opposed to an Iris predicate), let* $\mathcal{P}$ *be a map* $\mathit{Ip} \rightharpoonup_{\mathrm{fin}} \wp_{\mathrm{fin}}(\mathit{Port})$*, and* $A \subseteq \mathit{Address}$ *such that* $A$ *and* $\mathcal{P}$ *satisfy initial state coherence. Given a primordial socket protocol* $\Phi_a$ *for each* $a \in A$*, suppose that the Hoare triple*

$$\{\mathsf{Fixed}(A) \* \bigotimes\_{a \in A} a \mapsto \Phi\_a \* \bigotimes\_{i \in \text{dom}(\mathcal{P})} \mathsf{Free} \mathsf{p}(i)\}\langle n\_1; e \rangle\{v.\varphi(v)\}$$

*is derivable in* Aneris*.*

*If we have*

$$(\langle n_1; e\rangle, (\emptyset, \emptyset, \mathcal{P}, \emptyset)) \to^{\ast} ([\langle n_1; e_1\rangle, \langle n_2; e_2\rangle, \ldots, \langle n_m; e_m\rangle], \Sigma)$$

*then the following properties hold:*


Given predefined socket protocols for all primordial protocols and the necessary free IP addresses, this theorem provides the normal adequacy guarantees of Iris-like logics, namely *safety*, i.e., that nodes and threads on nodes cannot get stuck and that the postcondition holds for the resulting value. Notice, however, that this theorem also implies that all nodes adhere to the agreed-upon protocols; otherwise, a node not adhering to a protocol could cause another node to get stuck, which the adequacy theorem explicitly rules out.

# 5 Case Study 1: A Load Balancer

AnerisLang supports concurrent execution of threads on nodes through the fork {e} primitive. We will illustrate the benefits of node-local concurrency by presenting an example of server-side load balancing.

Fig. 4. The architecture of a distributed system with a load balancer and two servers.

*Implementation.* In the case of server-side load balancing, the work distribution is implemented by a program listening on a socket that clients send their requests to. The program forwards the requests to an available server, waits for the response from the server, and sends the answer back to the client. In order to handle requests from several clients simultaneously, the load balancer can employ concurrency by forking off a new thread for every available server in the system that is capable of handling such requests. Each of these threads will then listen for and forward requests. The architecture of such a system with two servers and n clients is illustrated in Fig. 4.

An implementation of a load balancer is shown in Fig. 5. The load balancer is parameterized over an IP address, a port, and a list of servers. It creates a socket (corresponding to <sup>z</sup><sup>0</sup> in Fig. 4), binds the address, and folds a function over the list of servers. This function forks off a new thread (corresponding to <sup>T</sup><sup>1</sup> and <sup>T</sup><sup>2</sup> in Fig. 4) for each server that runs the serve function with the newly-created socket, the given IP address, a fresh port number, and a server as arguments.

The serve function creates a new socket (corresponding to $z_1$ and $z_2$ in Fig. 4), binds the given address to the socket, and continuously tries to receive a client request on the main socket ($z_0$) given as input. If a request is received, it forwards the request to its server and waits for an answer. The answer is passed on to the client via the main socket. In this way, the entire load-balancing process is transparent to the client, whose view is the same as if it were communicating with a single server handling all requests itself, since the load balancer simply relays requests and responses.

*Specification and Protocols.* To provide a general, reusable specification of the load balancer, we will parameterize its socket protocol by two predicates <sup>P</sup>in and <sup>P</sup>out that are both predicates on a message <sup>m</sup> and a meta-language value

```
rec load_balancer ip port servers =
  let skt = socket () in
  let a = makeaddress ip port in
  socketbind skt a;
  listfold (λ server acc.
    fork { serve skt ip acc server };
    acc + 1) 1100 servers

rec serve main ip port srv =
  let skt = socket () in
  let a = makeaddress ip port in
  socketbind skt a;
  (rec loop () =
    match receivefrom main with
      SOME m =>
        sendto skt (π1 m) srv;
        let res = π1 (listenwait skt) in
        sendto main res (π2 m); loop ()
    | NONE => loop ()
    end) ()
```
Fig. 5. An implementation of a load balancer in AnerisLang. listfold and listenwait are convenient helper functions available in the appendix [20].

v. The two predicates are application-specific and are used to give logical accounts of the client requests and the server responses, respectively. Furthermore, we parameterize the protocol by a predicate $P_{val}$ on a meta-language value that allows us to maintain ghost state between the request and the response, as will become evident in the following.

In our specification, the sockets where the load balancer and the servers receive requests (the blue sockets in Fig. 4) will all be governed by the same socket protocol <sup>Φ</sup>rel such that the load balancer may seamlessly relay requests and responses between the main socket and the servers, without invalidating any socket protocols. We define the generic relay socket protocol <sup>Φ</sup>rel as follows:

$$\begin{aligned} \Phi_{rel}(P_{val}, P_{in}, P_{out})(m) \triangleq {} & \exists \Psi, v.\ \mathsf{from}(m) \Mapsto \Psi \ast P_{in}(m, v) \ast P_{val}(v) \ast {} \\ & (\forall m'.\ P_{val}(v) \ast P_{out}(m', v) \twoheadrightarrow \Psi(m')) \end{aligned}$$

When verifying a request, this protocol demands that the sender (corresponding to the red sockets in Fig. 4) is governed by some protocol Ψ, that the request fulfills the <sup>P</sup>in and <sup>P</sup>val predicates, and that <sup>Ψ</sup> is satisfied given a response that maintains <sup>P</sup>val and satisfies <sup>P</sup>out.

When verifying the load balancer receiving a request m from a client, we obtain the resources $P_{in}(m, v)$ and $P_{val}(v)$ for some $v$ according to $\Phi_{rel}$. This suffices for passing the request along to a server. However, to forward the server's response to the client we must know that the server behaves faithfully and gave us the response to the right request value $v$; $\Phi_{rel}$ does not give us this immediately, as $v$ is existentially quantified. Hence we define a ghost resource $\mathsf{LB}(\pi, s, v)$ that provides fractional ownership for $\pi \in (0, 1]$, which satisfies $\mathsf{LB}(1, s, v) \dashv\vdash \mathsf{LB}(\frac{1}{2}, s, v) \ast \mathsf{LB}(\frac{1}{2}, s, v)$, for which $v$ can only be updated if $\pi = 1$, and in particular $\mathsf{LB}(\pi, s, v) \ast \mathsf{LB}(\pi', s, v') \Rightarrow v = v'$ for any $\pi, \pi'$. Using this resource, the server with address $s$ will have $P_{LB}(s)$ as its instantiation of $P_{val}$, where

$$P\_{LB}(s)(v) \stackrel{\triangle}{=} \mathsf{LB}(\frac{1}{2}, s, v).$$

When verifying the load balancer, we will update this resource to the request value $v$ when receiving a request (as we have the full fraction) and transfer $\mathsf{LB}(\frac{1}{2}, s, v)$ to the server with address $s$ handling the request; according to $\Phi_{rel}$, the server is required to send it back along with the result. Since the server logically only gets half ownership, the value cannot be changed. Together with the fact that $v$ is also an argument to $P_{in}$ and $P_{out}$, this ensures that the server fulfills $P_{out}$ for the same value as it received $P_{in}$ for. The socket protocol for the serve function's socket ($z_1$ and $z_2$ in Fig. 4) that communicates with a server with address $s$ can now be stated as follows.

$$\Phi\_{serve}(s, P\_{out})(m) \triangleq \exists v. \mathsf{LB}(\frac{1}{2}, s, v) \* P\_{out}(m, v)$$

Since all calls to the serve function need access to the main socket in order to receive requests, we will keep the socket resource required in an invariant ILB which is shared among all the threads:

$$I\_{LB}(n, z, a) \stackrel{\Delta}{=} \boxed{z \hookrightarrow\_n \mathbf{Some}\ a}$$

The specification for the serve function becomes:

$$\begin{array}{l} \{I_{LB}(n, \mathit{main}, a_{\mathit{main}}) \ast \mathsf{Dynamic}((ip, p), A) \ast \mathsf{IsNode}(n) \ast \mathsf{LB}(1, s, v) \ast {} \\ \quad a_{\mathit{main}} \Mapsto \Phi_{rel}(\lambda\_.\, \mathsf{True}, P_{in}, P_{out}) \ast s \Mapsto \Phi_{rel}(P_{LB}(s), P_{in}, P_{out})\} \\ \langle n; \mathsf{serve}\ \mathit{main}\ ip\ p\ s\rangle \\ \{\mathsf{False}\} \end{array}$$

The specification requires the address $a_{\mathit{main}}$ of the socket main to be governed by $\Phi_{rel}$ with a trivial instantiation of $P_{val}$, and the address $s$ of the server to be governed by $\Phi_{rel}$ with $P_{val}$ instantiated by $P_{LB}$. The specification moreover expects resources for a dynamic setup, the invariant that owns the resource needed to verify use of the main socket, and a full instance of the $\mathsf{LB}(1, s, v)$ resource for some arbitrary $v$.

With this specification in place the complete specification of our load balancer is immediate (note that it is parameterized by <sup>P</sup>in and <sup>P</sup>out):

$$\begin{array}{l} \{\mathsf{Static}((ip, p), A, \Phi_{rel}(\lambda\_.\, \mathsf{True}, P_{in}, P_{out})) \ast \mathsf{IsNode}(n) \ast {} \\ \quad \big(\mathop{\ast}_{p' \in \mathit{ports}} \mathsf{Dynamic}((ip, p'), A)\big) \ast {} \\ \quad \big(\mathop{\ast}_{s \in \mathit{srvs}} \exists v.\ \mathsf{LB}(1, s, v) \ast s \Mapsto \Phi_{rel}(P_{LB}(s), P_{in}, P_{out})\big)\} \\ \langle n; \mathsf{load\_balancer}\ ip\ p\ \mathit{srvs}\rangle \\ \{\mathsf{True}\} \end{array}$$

where $\mathit{ports} = [1100, \cdots, 1100 + |\mathit{srvs}|]$. In addition to the protocol setup for each server as just described, we need, for each port $p' \in \mathit{ports}$ (which will become the endpoint for a corresponding server), the resources for a dynamic setup, and we need the resource for a static setup on the main input address $(ip, p)$.

In the accompanying Coq development we provide an implementation of the addition service from Sect. 2.3, both in the single server case and in a load balanced case. For this particular proof we let the meta-language value v be a pair of integers corresponding to the expected arguments. In order to instantiate the load balancer specification we choose

$$P_{in}^{add}(m, (v_1, v_2)) \triangleq \mathsf{body}(m) = \mathrm{serialize}(v_1, v_2)$$

$$P_{out}^{add}(m, (v_1, v_2)) \triangleq \mathsf{body}(m) = \mathrm{serialize}(v_1 + v_2)$$

with serialize being the same serialization function from Sect. 2.3. We build and verify two distributed systems: (1) one consisting of two clients and an addition server, and (2) one comprising two clients, a load balancer, and three addition servers. We prove both of these systems safe, and the proofs utilize the specifications we have given for the individual components. Notice that $\Phi_{rel}(\lambda\_.\, \mathsf{True}, P_{in}^{add}, P_{out}^{add})$ and $\Phi_{add}$ from Sect. 2.3 are the same; this is why we can use the same client specification in both system proofs. Hence, we have demonstrated Aneris's ability to support horizontal composition of the same modules in different systems.

While the load balancer demonstrates the use of node-local concurrency, its implementation does not involve shared memory concurrency, i.e., synchronization among the node-local threads. The appendix [20] includes an example of a distributed system, where clients interact with a server that implements a bag. The server uses multiple threads to handle client requests concurrently and the threads use a *shared* bag data structure governed by a lock. This example demonstrates Aneris' ability to support both shared-memory concurrency and distributed networking.

# 6 Case Study 2: Two-Phase Commit

A typical problem in distributed systems is that of consensus and distributed commit: an operation should be performed by all participants in a system or by none at all. The *two-phase commit* protocol (TPC) by Gray [6] is a classic solution to this problem. We study this protocol in Aneris because (1) it is widely used in the real world, (2) it is a complex network protocol and thus serves as a decent benchmark for reasoning in Aneris, and (3) it shows how an implementation can be given a specification that is usable for a client that abstractly relies on some consensus protocol.

The two-phase commit protocol consists of the following two phases, each involving two steps:

1. (a) The coordinator sends a vote request to all participants.

   (b) A participant that receives a vote request replies with a vote for either commit or abort.

2. (a) The coordinator collects all votes and decides: if every participant voted commit, it sends a global commit to all participants; otherwise it sends a global abort.

   (b) All participants that voted for a commit wait for the final verdict from the coordinator. If the participant receives a global commit it locally commits the transaction, otherwise the transaction is locally aborted. All participants must acknowledge.

Our implementation and specification details can be found in the appendix [20] and in the accompanying Coq development, but we will emphasize a few key points.

To provide general, reusable implementations and specifications of the coordinator and participants implementing TPC, we do not fix what requests, votes, and decisions look like. We leave it to a user of the module to provide decidable predicates matching the application-specific needs and to define the logical, local pre- and postconditions, P and Q, of participants for the operation in question.

Our specifications use fractional ghost resources to keep track of coordinator and participant state w.r.t. the coordinator and participant transition systems indicated in the protocol description above. Similar to our previous case study, we exploit partial ownership to limit when transitions can be made. When verifying a participant, we keep track of their state and the coordinator's state and require all participants' view of the coordinator state to be in agreement through an invariant.

In short, our specification of TPC ensures that:

	- the coordinator decides based on all the participant votes,
	- participants act according to the global decision,
	- if the decision was to commit, we obtain the resources described by Q for all participants, and
	- if the decision was to abort, we still have the resources described by P for all participants.

#### 6.1 A Replicated Log

In a distributed replicated logging system, a log is stored on several databases distributed across several nodes where the system ensures consistency among the logs through a consensus protocol. We have verified such a system implemented on top of the TPC coordinator and participant modules to showcase vertical composition of complex protocols in Aneris as illustrated in Fig. 6. The blue parts of the diagram constitute node-local instantiations of the TPC modules invoked by the nodes to handle the consensus process. As noted by Sergey et al. [35], clients of core consensus protocols have not received much focus from other major verification efforts [7, 30, 40].

Our specification of a replicated logging system draws on the generality of the TPC specification. In this case, we use fractional ghost state to keep track of two related pieces of information. The first keeps a logical account of the log *l* already

Fig. 6. The architecture of a replicated logging system implemented using the TPC modules (the blue parts of the diagram) with a coordinator and two databases (S1 and S2), each storing a copy of the log.

stored in the database at a node at address a, LOG(π, a, l). The second keeps track of what the log should be updated to if the pending round of consensus succeeds; this is a pair of the existing log l and the (pending) change s proposed in this round, PEND(π, a, (l, s)). We exploit fractional resource ownership by letting the coordinator, logically, keep half of the pending log resources at all times. Together with suitable local pre- and postconditions for the databases, this prevents the databases from making arbitrary changes to the log. Concretely, we instantiate P and Q of the TPC module as follows:

$$\begin{aligned} P_{rep}(p)(m) &\triangleq \exists l, s.\; (m = \text{"REQUEST\_"} \mathbin{@} s) \ast \mathrm{LOG}(\tfrac{1}{2}, p, l) \ast \mathrm{PEND}(\tfrac{1}{2}, p, (l, s)) \\ Q_{rep}(p)(m) &\triangleq \exists l, s.\; \mathrm{LOG}(\tfrac{1}{2}, p, l \mathbin{@} s) \ast \mathrm{PEND}(\tfrac{1}{2}, p, (l, s)) \end{aligned}$$

where @ denotes string concatenation. Note how the request message specifies the proposed change (the string that we would like to add to the log is appended to the request message) and how we ensure consistency by making sure the two ghost assertions hold for the same log. Even though l and s are existentially quantified, we know the logs cannot be inconsistent, since the coordinator retains partial knowledge of the log. Due to the guarantees given by the TPC specification, this implies that if the global decision was to commit a change, this change will have happened locally on all databases, *cf.* LOG(1/2, p, l@s) in Q_rep, and if the decision was to abort, then the log remains unchanged on all databases, *cf.* LOG(1/2, p, l) in P_rep. We refer to the appendix [20] or the Coq development for further details.
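The commit/abort guarantee can be made concrete with a minimal executable sketch (our own illustration, not part of the Coq development) of one consensus round over replicated logs: a global commit appends the proposed change s to every replica's log l, matching the log described by Q_rep, while an abort leaves every log unchanged, matching P_rep.

```python
# Toy model of one replicated-log consensus round built on TPC.
# Names and representation (dicts of strings) are our own illustration.

def run_round(logs, proposal, votes):
    """logs: dict replica -> log string; proposal: string s to append;
    votes: dict replica -> bool (vote for commit?)."""
    commit = all(votes.values())        # TPC: commit only if all vote commit
    if commit:
        for r in logs:
            logs[r] = logs[r] + proposal   # every replica now stores l @ s
    # on abort, every log stays l unchanged
    return "commit" if commit else "abort"

logs = {"S1": "ab", "S2": "ab"}
print(run_round(logs, "c", {"S1": True, "S2": True}), logs)
```

Every replica holds the same log after the round, regardless of the verdict, which is the consistency property the ghost state enforces logically.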

# 7 Related Work

Verification of distributed systems has received a fair amount of attention. To give a better overview, we have divided related work into four categories.

*Model-Checking of Distributed Protocols.* Previous work on verification of distributed systems has mainly focused on verification of protocols or core network components through model-checking. Frameworks for showing safety and liveness properties, such as SPIN [9] and TLA+ [23], have had great success. A benefit of using model-checking frameworks is that they allow one to state both safety and liveness properties as LTL assertions [29]. Mace [17] provides a suite for building and model-checking distributed systems with asynchronous protocols, including liveness conditions. Chapar [25] allows for model-checking of programs that use causally consistent distributed key-value stores. Neither of these languages provides higher-order functions or thread-based concurrency.

*Session Types for Giving Types to Protocols.* Session types have been studied for a wide range of process calculi, in particular, the typed π-calculus. The idea is to describe two-party communication protocols as a type to ensure communication safety and progress [10]. This has been extended to multi-party asynchronous channels [11], to multi-role types [2], which informally model topics of actor-based message passing, and to dependent session types allowing quantification over messages [38]. Our socket protocol definitions are quite similar to multi-party asynchronous session types, with progress encoded by having suitable ghost assertions and using the magic wand. Actris [8] is a logic for session-type based reasoning about message passing in actor-based languages.

*Hoare Style Reasoning About Distributed Systems.* Disel [35] is a Hoare Type Theory for distributed program verification in Coq that incorporates ideas from separation logic. It provides the novel protocol-tailored rules WithInv and Frame, which allow for modular proofs, under the condition of an inductive invariant, and for composition of distributed systems. In Disel, programs can be extracted into runnable OCaml programs, which is on our agenda for future work.

IronFleet [7] allows for building provably correct distributed systems by combining TLA-style state-machine refinement with Hoare-logic verification in a layered approach, all embedded in Dafny [24]. IronFleet also allows for liveness assertions. For a comparison of Disel and IronFleet to Aneris from a modularity point of view we refer to the Introduction section.

*Other Distributed Verification Efforts.* Verdi [40] is a framework for writing and verifying implementations of distributed algorithms in Coq, providing a novel approach to network semantics and fault models. To achieve compositionality, the authors introduced *verified system transformers*: functions that transform one implementation into another implementation with different assumptions about its environment. This makes vertical composition difficult for clients of proven protocols; in comparison, AnerisLang seems more expressive.

EventML [30, 31] is a functional language in the ML family that can be used for coding distributed protocols using high-level combinators from the Logic of Events, and for verifying them in the Nuprl interactive theorem prover. It is not quite clear how modular reasoning works, since one works within the model; however, the notion of a central main observer is akin to our distinguished system node.

# 8 Conclusion

Distributed systems are ubiquitous and hence it is essential to be able to verify them. In this paper we presented Aneris, a framework for writing and verifying distributed systems in Coq built on top of the Iris framework. From a programming point of view, the important aspect of AnerisLang is that it is feature-rich: it is a concurrent ML-like programming language with network primitives. This allows individual nodes to internally use higher-order heap and concurrency to write efficient programs.

The Aneris logic provides node-local reasoning through socket protocols. That is, we can reason about individual nodes in isolation as we reason about individual threads. We demonstrate the versatility of Aneris by studying interesting distributed systems both implemented and verified within Aneris. The adequacy theorem of Aneris implies that these programs are safe to run.

Table 1. Sizes of implementations, specifications, and proofs in lines of code. When proving adequacy, the system must be closed.


Relating the verification sizes of the modules from Table 1 to other formal verification efforts in Coq indicates that it is easier to specify and verify systems in Aneris. The total work required to prove two-phase commit with replicated logging is 1,272 lines, which is just half of the lines needed to prove the inductive invariant for TPC in other work [35]. However, extensive work has gone into the Iris Proof Mode, so it is hard to conclude whether Aneris genuinely requires less verification effort or simply benefits from richer tactics.

#### Acknowledgments

This work was supported in part by the ModuRes Sapere Aude Advanced Grant from The Danish Council for Independent Research for the Natural Sciences (FNU); a Villum Investigator grant (no. 25804), Center for Basic Research in Program Verification (CPV), from the VILLUM Foundation; and the Flemish research fund (FWO).

# Bibliography


28 - April 4, 1998, Proceedings, Lecture Notes in Computer Science, vol. 1381, pp. 122–138, Springer (1998), https://doi.org/10.1007/BFb0053567


ings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017, pp. 205–217, ACM (2017)


[40] Wilcox, J.R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M.D., Anderson, T.E.: Verdi: a framework for implementing and formally verifying distributed systems. In: Grove, D., Blackburn, S. (eds.) Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pp. 357–368, ACM (2015), https://doi.org/10.1145/2737924.2737958

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Continualization of Probabilistic Programs With Correction**

Jacob Laurel( ) and Sasa Misailovic

University of Illinois Urbana-Champaign, Department of Computer Science Urbana, Illinois 61820, USA {jlaurel2,misailo}@illinois.edu

**Abstract.** Probabilistic Programming offers a concise way to represent stochastic models and perform automated statistical inference. However, many real-world models have discrete or hybrid discrete-continuous distributions, for which existing tools may suffer non-trivial limitations. Inference and parameter estimation can be exceedingly slow for these models because many inference algorithms compute results faster (or exclusively) when the distributions being inferred are continuous. To address this discrepancy, this paper presents Leios. Leios is the first approach for systematically approximating arbitrary probabilistic programs that have discrete, or hybrid discrete-continuous random variables. The approximate programs have all their variables fully continualized. We show that once we have the fully continuous approximate program, we can perform inference and parameter estimation faster by exploiting the existing support that many languages offer for continuous distributions. Furthermore, we show that the estimates obtained when performing inference and parameter estimation on the continuous approximation are still comparably close to both the true parameter values and the estimates obtained when performing inference on the original model.

**Keywords:** Probabilistic Programming · Program Transformation · Continuity · Parameter Synthesis · Program Approximation

# **1 Introduction**

Probabilistic programming languages (PPLs) offer an intuitive way to model uncertainty by representing complex probability models as simple programs [28]. A probabilistic programming system then performs fully automated statistical inference on this program by conditioning on observed data, to obtain a posterior distribution, all while hiding the intricate details of this inference process.

Probabilistic inference is a computationally hard task, even for programs containing only Bernoulli distributions (#P-complete [18]), but prior work has shown that for many inference algorithms, continuous and smooth distributions (such as Gaussians) can be *significantly* easier to handle than the distributions having discrete components or discontinuities in their densities [15, 53, 52, 9, 56].

Fig. 1: Overview of Leios

However, many popular Bayesian models can have distributions that are discrete or hybrid discrete-continuous mixtures (denoted simply as "hybrid"), leading to computationally inefficient inference for much the same reason. Particularly when the observed variable is a discrete-continuous mixture, inference may fail altogether [65]. Likewise, even if the observed variable and likelihood are continuous, the prior or important latent variables may be discrete (e.g., Binomial), leading to an equally difficult discrete inference problem [61, 50].

In fact, a number of popular inference algorithms such as Hamiltonian Monte Carlo [48], NUTS [31, 50], or versions of Variational Inference (VI) [9] only work for restricted classes of programs (e.g., by requiring each latent variable to be continuous) to avoid these problems. Furthermore, we cannot always marginalize away the program's discrete component, since it is often precisely the one we are interested in. Even if the parameter were one that could be safely marginalized out, doing so may require the programmer to use advanced domain knowledge to analytically solve and obtain a new model and rewrite the program completely, which can be well beyond the abilities of the average PPL user.

**Problem statement:** We address the question of how to accurately approximate the semantics of a probabilistic program P whose prior or likelihood is either discrete or hybrid with a new program P^C in which all variables follow continuous distributions, so that we can exploit the aforementioned inference algorithms to improve inference in an easy, off-the-shelf fashion.

While a programmer could manually rewrite the probabilistic program or model and apply approximations in an ad hoc manner, such as simply adding Gaussian noise to each variable, this would be neither sufficient nor wise. For instance, it has been shown that when a model contains Gaussians, *how* they are programmatically written and parametrized can impact the inference time and quality [29, 5]. Also, by not correcting for continuity in the program's branch conditions, one could significantly alter the probability of executing a particular program branch, and hence alter the overall distribution represented by the probabilistic program.

**Leios:** We introduce a fully automated program analysis framework to continualize probabilistic programs for significantly improved inference performance, especially in cases where inference was originally intractable or prohibitively slow.

An input to Leios is a probabilistic program, which consists of (1) a *model* that specifies the prior distributions and how the latent variables are related,

(2) specifications of observable variables, and (3) specifications of data sets. Leios transforms the model, given the set of the observable variables. This model is then substituted back into the original program to produce a fully continuous probabilistic program, leading to greatly improved inference. Furthermore, the approximated program can easily be reused with different, unseen data.

Figure 1 presents the main workflow of Leios:


**Contributions:** This paper makes the following main contributions:


(a)

```
 1 Data := [12, 8, ...];
 2
 3 Model {
 4   prior = Uniform(20, 50);
 5   Recruiters = Poisson(prior);
 6
 7   perfGPA = 4;
 8   regGPA = 4 * Beta(7, 3);
 9   GPA = Mix(perfGPA, .05, regGPA, .95)
10
11   if (GPA == 4) {
12     Interviews = Bin(Recruiters, .9);
13   } else if (GPA > 3.5) {
14     Interviews = Bin(Recruiters, .6);
15   } else {
16     Interviews = Bin(Recruiters, .5);
17   }
18
19   Offers = Bin(Interviews, 0.4);
20 }
21
22 for d in Data {
23   factor(Offers, d);
24 }
25
26 return prior;
```

(b)

```
 1 Model {
 2   prior = Uniform(20, 50);
 3   mu_p = prior;
 4   sigma_p = sqrt(prior);
 5   Recruiters = Gaussian(mu_p, sigma_p);
 6
 7   perfGPA = Gaussian(4, β);
 8   regGPA = 4 * Beta(7, 3);
 9   GPA = Mix(perfGPA, .05, regGPA, .95)
10
11   if (4 - θ1 < GPA < 4 + θ2) {
12     mu = Recruiters * 0.9;
13     sigma = sqrt(Recruiters * 0.9 * 0.1);
14     Interviews = Gaussian(mu, sigma);
15   } else if (GPA > 3.5 + θ3) {
16     mu = Recruiters * 0.6;
17     sigma = sqrt(Recruiters * 0.6 * 0.4);
18     Interviews = Gaussian(mu, sigma);
19   } else {
20     mu = Recruiters * 0.5;
21     sigma = sqrt(Recruiters * 0.5 * 0.5);
22     Interviews = Gaussian(mu, sigma);
23   }
24   mu2 = Interviews * 0.4;
25   sigma2 = sqrt(Interviews * 0.4 * 0.6);
26   Offers = Gaussian(mu2, sigma2);
27 }
```

Fig. 2: (a) Program *P* and (b) the Continualized Model Sketch

# **2 Example**

Figure 2 (a) presents a program that infers the parameters of the distribution modeling the number of recruiters coming to a recruiting fair, given the number of offers multiple students receive (line 1). As the number of recruiters may vary from year to year, we model this count as a Poisson distribution (line 5). However, to accurately quantify how *much* this count varies from year to year, we want to estimate the unknown parameter of this Poisson variable. We thus place a uniform prior over this parameter (line 4).

The example represents the student GPAs in lines 7-9: a GPA is either a perfect 4.0 score or any number between 0 and 4. We model the perfect GPA with a discrete distribution that has all the probability mass at 4.0 (line 7). To model the imperfect GPA, we use a Beta distribution (line 8), scaled by 4 to lie in the range [0.0, 4.0]. Finally, the distribution of the GPAs is a *mixture* of these two components (line 9). Our mixture assumes that 5% of students obtain perfect GPAs.

Because the GPA impacts the number of interviews a student receives, our model incorporates control flow where each branch captures the distribution of interviews received, conditioned on the GPA being in a certain range (lines 11-17). Each student's resume is available to all recruiters and each recruiter can request an interview or not; hence all three of the Interviews distributions follow a Binomial distribution (denoted Bin) with the same n (number of recruiters) but with different probabilities (higher probabilities for higher GPAs). From the factor statement (line 23) we see that the Offers variable governs the distribution of the observed data; hence it is the *observed* variable. Furthermore, given the values of all latent variables, Offers follows a Binomial distribution (line 19); hence the *likelihood function* of this program is discrete.

This program poses several challenges for inference. First, it contains discrete latent variables (such as the Binomials), which are expensive to sample from or rule out certain inference methods [26]. Second, it contains a hybrid discrete-continuous distribution governing the student GPA, and such hybrid distributions are challenging for inference algorithms [65]. Third, the model has complex control flow introduced by the if statements, making the observable data follow a (potentially multimodal) mixture distribution, which is yet another obstacle to efficient inference [43, 17]. Lastly, the discrete distribution of the observed data and likelihood also hinder the inference efficiency [61, 50, 59].

#### **2.1 Continualization**

Our approach starts from the observation that inference with continuous distributions is often more efficient for several inference algorithms [53, 52, 56]. Leios first continualizes discrete and hybrid distributions in the original model. Starting in line 5 in Figure 2 (b), we approximate the Poisson variable with a Gaussian using a classical result [16], hence relaxing the constraint that the number of recruiters be an integer. (For ease of presentation we created new variables mu_p and sigma_p corresponding to the parameters of the approximation; Leios simply inlines these.) We next approximate the discrete component of the GPA hybrid mixture distribution by a Gaussian centered at 4 with a small tunable standard deviation β (line 7). The GPA is now a mixture of two *continuous* distributions. We then transform all of the Binomials into Gaussians (lines 14, 18, 22, and 26) using another classic approximation [23].
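Both classical approximations are instances of moment matching: Poisson(λ) ≈ Gaussian(λ, √λ) and Bin(n, p) ≈ Gaussian(np, √(np(1−p))). A small helper (our own illustration, not the Leios API) makes the parameter computation explicit:

```python
import math

# Moment-matching rules behind the continualization of Fig. 2 (b).
# The function name `continualize` is our own, not part of Leios.

def continualize(dist, *params):
    """Return (mu, sigma) of the Gaussian that moment-matches dist."""
    if dist == "Poisson":
        (lam,) = params
        return lam, math.sqrt(lam)                 # Poisson(λ) ≈ N(λ, √λ)
    if dist == "Binomial":
        n, p = params
        return n * p, math.sqrt(n * p * (1 - p))   # Bin(n,p) ≈ N(np, √(np(1-p)))
    raise ValueError("no continualization rule for " + dist)

print(continualize("Poisson", 36))        # (36, 6.0)
print(continualize("Binomial", 40, 0.9))  # (36.0, ≈1.897)
```

This is exactly the computation inlined as `mu_p`/`sigma_p` and `mu`/`sigma` in the continualized sketch.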

Finally, Leios smooths the observed variables by a Gaussian to ensure the likelihood function is both fully continuous *and* differentiable. In this example we see that the approximation of the Binomial already makes the distribution of Offers (given all latent values) a Gaussian, hence this final step is not needed.

After continualization, the GPA cannot be *exactly* 4.0; thus we need to repair the first conditional branch of the continualized program. In line 11, we replace the exact equality predicate with the interval predicate 4 − θ1 < GPA < 4 + θ2, where each θ is a hole whose value Leios will *synthesize*. Leios finds all such branching predicates by tracking transitive data dependencies of all continualized variables.

#### **2.2 Parameter Synthesis**

Our continuous approximation should be close enough to the original model that, upon performing inference on the approximation, the estimates obtained will also be close to the ground-truth values. Hence Leios needs to ensure that the values synthesized for each θ are such that, for every conditional statement, the probability of executing the true branch in the continualized program roughly matches that in the original (ensuring similar likelihoods). In probability theory, this value has a natural interpretation as a *continuity correction factor* as


Fig. 3: (a) the fully continualized model and (b) Convergence of the Synthesis Step for multiple β.

it "corrects" the probability of a predicate being true after applying continuous approximations. For the (GPA == 4) condition, we might think of using a typical continuity correction factor of 0.5 [23] and transform it to 4 − 0.5 < GPA < 4 + 0.5. However, in that case, the second else if (GPA > 3.5) branch would never execute, significantly changing the program's semantics (and thus the likelihood function). Experimentally, such an error can lead to highly inaccurate inference results.

Hence we must *synthesize* a better continuity correction factor, one that makes the approximated model "closest" to the original program with respect to a well-defined distance metric between probability distributions. In this paper, we use the common Wasserstein distance, which we describe in Section 5. The objective function aims to find the continuity correction factors that minimize the Wasserstein distance between the original and continualized models.

Figure 3 (a) shows the continualized model. Leios calculated that the optimal values for the first branch are θ1 = 0.00001 (hence the lower bound is 3.99999) and θ2 = 0.95208 (hence the upper bound is 4.95208) in line 11, and θ3 = 0.00012 (hence the lower bound is 3.500122) for the branch in line 15. Intuitively, the synthesizer found the upper bound 4.95208 so that any sample larger than 4 (which must have come from the right tail of the continualized perfect GPA) is consumed by the first branch, instead of accidentally being consumed by the second branch.
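The synthesis loop can be approximated in a few lines: sample the branch outcome under the original exact-equality guard and under a candidate corrected guard, then pick the θ that minimizes the empirical Wasserstein distance between the two outcome samples. The toy sketch below is our own drastic simplification (a single lower-bound hole, made-up sample sizes, a crude grid search), not the Leios optimizer:

```python
import random

# Toy synthesis of a continuity correction factor θ by minimizing the
# empirical 1-D Wasserstein distance between branch outcomes.
# All names and constants here are our own illustration.

def wasserstein1(xs, ys):
    # For equal-size samples, W1 is the mean gap between sorted samples.
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def sample_gpa(continuous, beta=0.1):
    if random.random() < 0.05:                      # perfect-GPA component
        return random.gauss(4, beta) if continuous else 4.0
    return 4 * random.betavariate(7, 3)             # regular-GPA component

random.seed(0)
# branch outcome under the original guard GPA == 4
orig = [1.0 if sample_gpa(False) == 4.0 else 0.0 for _ in range(2000)]
cont = [sample_gpa(True) for _ in range(2000)]

# grid search over the hole θ in the corrected guard 4 - θ < GPA
best = min((wasserstein1(orig, [1.0 if g > 4 - t else 0.0 for g in cont]), t)
           for t in (i / 100 for i in range(1, 51)))
print("synthesized θ =", best[1])
```

For 0/1 branch outcomes, minimizing W1 simply matches the branch probabilities, which is the intuition behind the correction factors above.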

Fig. 4: Visual comparison between the model distribution of the original program, naive smoothing, and Leios (both with β = 0.1)

Another part of the synthesis step is to make sure that the approximations do not introduce run-time errors. Since Interviews is now sampled from a Gaussian, there is a small possibility that it could become negative, causing a run-time error (since we later take its square root). By dynamically sampling the continualized model during parameter synthesis, as part of a light-weight *auto-tuning* step, Leios checks whether such an error exists. If it does, Leios can instead use a Gamma approximation (which is always non-negative).

While continualization incurs additional computational cost, this cost is typically amortized: continualization needs to be performed only once, and the continualized model can then be used multiple times for inference on different data sets. Further, we experimentally observed that our synthesis step is fast. In this example, for all the values of β we evaluated, this step required only a few hundred iterations to converge to the optimal continuity correction factors, as shown in Figure 3 (b).

#### **2.3 Improving Inference**

Upon constructing the continuous approximation of the model, we now wish to perform inference by conditioning on the outcomes of 25 sampled students. To make a fair comparison, we compile both the original and continuous versions down to WebPPL [26] and run MCMC inference (with 3500 samples and a burn-in of 700). We also seek to understand how smoothing latent variables improves inference; thus we also compare against a naively continualized version *where only the observed variable* was smoothed, using the same β, number of MCMC samples, and burn-in.

Figure 4 presents the distribution of the Offers variable in the original model, the naively smoothed model, and the Leios-optimized model. The continuous approximation achieved by Leios is smooth and unimodal, unlike the naively smoothed approximation, which is highly multimodal. However, all models have similar means.

Using these three models for inference, Figure 5 (a) presents the posterior distribution of the variable param for each approach. We finally take the *mean* as

Fig. 5: (a) *Posteriors* of each method – the true value is equal to 37. (b) Avg. Accuracy and Inference time; the bars represent accuracy (left Y-axis), the lines represent time (right Y-axis).

the point estimate, τ_est, of the parameter's true value τ. Figure 5 (b) presents the run time and the error ratio, |τ − τ_est|/τ, for each approach (for the given true value of 37). It shows that our continualized version leads to the fastest inference.

# **3 Syntax and Semantics of Programs**

We present the syntax and semantics of the probabilistic programming language on which our analysis is defined.

#### **3.1 Source Language Syntax**


ContDist ∈ {Gaussian, Uniform, etc.}, DiscDist ∈ {Binomial, Bernoulli, etc.} ArithOp ∈ {+, −, ∗, /, ∗∗}, f ∈ {log, abs, sqrt, exp}, RelOp ∈ {<, ≤, ==}

The syntax is similar to that used in [24, 51]. Unlike [51], our syntax does include exact equality predicates, which introduce difficulties during the approximation. To give the developer flexibility in selecting which parts of the program to continualize, we add the CONST annotation, which indicates that a variable's distribution should not be continualized. Unless explicitly noted, we will not use this annotation in the rest of the paper. For simplicity of exposition, we present only a single DataBlock and ObserveBlock, but our approach naturally extends to cases with multiple data and observed variables.

**Measure Theory Preliminaries** Though various semantics have been proposed [44, 36, 7], we adapt the sub-probability measure transformer semantics of Dahlqvist et al. [19]. We will use the terms distribution and measure interchangeably.

**Definition 1.** A program state σ ∈ S is an n-tuple of real numbers, S = ℝⁿ, where the i-th tuple element corresponds to the i-th program variable's value.

**Definition 2.** A Σ-algebra on a set X (denoted Σ_X) is a collection of subsets of X such that (1) X ∈ Σ_X, (2) X_i ∈ Σ_X ⇒ X_i^c ∈ Σ_X (closure under complementation), and (3) X_1, X_2, … ∈ Σ_X ⇒ ⋃_i X_i ∈ Σ_X (closure under countable union). The tuple (X, Σ_X) is called a measurable space. Our semantics is defined on the Borel measurable space (ℝⁿ, B{ℝⁿ}), where B{ℝⁿ} is the standard Borel Σ-algebra over ℝⁿ.

**Definition 3.** A measure μ over ℝⁿ is a mapping from B{ℝⁿ} to [0, +∞) such that μ(∅) = 0 and μ(⋃_{i∈ℕ} X_i) = Σ_{i∈ℕ} μ(X_i) when all X_i are mutually disjoint. A probability measure is a measure satisfying μ(ℝⁿ) = 1, and a sub-probability measure is one satisfying μ(ℝⁿ) ≤ 1. The simplest measure is the Dirac measure, δ_{a_i}(S) = 1 if a_i ∈ S, else 0. We denote the set of all sub-probability measures as M(ℝⁿ).

**Definition 4.** Given measures μ1, μ2 ∈ M(ℝ), the product measure μ1 ⊗ μ2 ∈ M(ℝ²) is defined as (μ1 ⊗ μ2)(B1 × B2) = μ1(B1)μ2(B2) for B1, B2 ∈ B{ℝ}.

**Definition 5.** Given a measure μ ∈ M(ℝⁿ), the marginal measure of a variable x_i is defined as μ_{x_i}(B_i) = μ(ℝ × … × ℝ × B_i × ℝ × … × ℝ) for B_i ∈ B{ℝ}.

**Definition 6.** A kernel is a function κ : S → M(ℝⁿ) mapping states to measures.

**Definition 7.** The Lebesgue measure on ℝ (denoted Leb) is the measure that maps any interval to its length, e.g., Leb([a, b]) = b − a. The Lebesgue measure on ℝⁿ is simply the n-fold product of n copies of the Lebesgue measure on ℝ.

**Definition 8.** A measure μ is absolutely continuous with respect to the Lebesgue measure Leb (denoted μ ≪ Leb, or simply "μ is A.C.") iff for any measurable set S, Leb(S) = 0 ⇒ μ(S) = 0.
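As a worked example of Definition 8 (our own illustration): a Binomial measure is not A.C., since it puts positive mass on Lebesgue-null singletons,

$$\mu_{\mathrm{Bin}(n,p)}(\{k\}) = \binom{n}{k} p^k (1-p)^{n-k} > 0 \quad\text{although}\quad \mathrm{Leb}(\{k\}) = 0,$$

whereas a Gaussian measure is A.C., since it is given by a density with respect to Leb:

$$\mu_{\mathrm{Gauss}}(S) = \int_S f_{\mathrm{Gauss}}\, d\mathrm{Leb} \;\Longrightarrow\; \bigl(\mathrm{Leb}(S) = 0 \Rightarrow \mu_{\mathrm{Gauss}}(S) = 0\bigr).$$

A hybrid mixture such as the GPA variable of Section 2 is neither A.C. nor purely discrete, which is precisely the case Leios continualizes.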

#### **3.2 Semantics**

**Expression Level Semantics** Arithmetic expression semantics are standard: they map states $\sigma \in \mathbb{R}^n$ to values, i.e., $[\![\cdot]\!]\_{Expr} : \mathbb{R}^n \to \mathbb{R}$. Boolean expression semantics, denoted $[\![\cdot]\!]\_{BExpr}$, simply return the set of states $B\_i \in \mathcal{B}(\mathbb{R}^n)$ satisfying the Boolean conditional.

$$[\![c]\!](\sigma) = c \qquad [\![x\_i]\!](\sigma) = \sigma[x\_i] \qquad [\![t\_1 \ op\ t\_2]\!](\sigma) = [\![t\_1]\!](\sigma) \ op\ [\![t\_2]\!](\sigma) \qquad [\![f(t\_1)]\!](\sigma) = f([\![t\_1]\!](\sigma))$$
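The expression semantics above can be made concrete as a small interpreter. The following Python sketch is our own illustration (the AST node shapes and helper names are ours, not the paper's): a state $\sigma$ maps variable names to reals, and `eval_expr` evaluates an expression in a state.

```python
import math

# Our illustrative rendering of the expression semantics. Expression nodes
# are tuples whose first element tags the case; these shapes are hypothetical.

def eval_expr(t, sigma):
    kind = t[0]
    if kind == "const":        # [[c]](sigma) = c
        return t[1]
    if kind == "var":          # [[x_i]](sigma) = sigma[x_i]
        return sigma[t[1]]
    if kind == "op":           # [[t1 op t2]](sigma) = [[t1]](sigma) op [[t2]](sigma)
        ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
               "*": lambda a, b: a * b, "/": lambda a, b: a / b}
        return ops[t[1]](eval_expr(t[2], sigma), eval_expr(t[3], sigma))
    if kind == "fun":          # [[f(t1)]](sigma) = f([[t1]](sigma))
        return t[1](eval_expr(t[2], sigma))
    raise ValueError(kind)

# Boolean semantics: the set of satisfying states [[B]] is then
# {sigma | eval_bool(B, sigma)}.
def eval_bool(b, sigma):
    op, t1, t2 = b
    cmp = {"==": lambda a, c: a == c, "<": lambda a, c: a < c,
           "<=": lambda a, c: a <= c}
    return cmp[op](eval_expr(t1, sigma), eval_expr(t2, sigma))
```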


**Distribution Semantics** The interpretation of a distribution is a kernel $\kappa$ mapping a state to the measure associated with the specific parametrization of the distribution in that state. Since measures are set functions, we represent them as $\lambda$-abstractions. The signature is $[\![\cdot]\!]\_{Dist} : \mathbb{R}^n \to (\mathcal{B}(\mathbb{R}) \to [0, 1])$.

$$\kappa\_{Cont}(\sigma) = [\![ContDist(e\_1, e\_2, \ldots)]\!](\sigma) = \lambda S. \int\_{v \in \mathbb{R}} \mathbf{1}\_S(v) \cdot f\_{Cont}(v; [\![e\_1]\!](\sigma), [\![e\_2]\!](\sigma), \ldots)\, dv$$

$$\kappa\_{Disc}(\sigma) = [\![DiscDist(e\_1, e\_2, \ldots)]\!](\sigma) = \lambda S. \sum\_{v \in Supp \cap S} f\_{Disc}(v; [\![e\_1]\!](\sigma), [\![e\_2]\!](\sigma), \ldots)$$

where $f\_{Cont}$ and $f\_{Disc}$ are the density and mass functions, respectively, of the primitive distribution being sampled from (e.g., $f\_{Gauss}(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \cdot \mathbf{1}\_{\{\sigma > 0\}}$) and $Supp$ is the distribution's support.
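As an illustration of the kernel signature above, the following Python sketch (ours; the helper names are hypothetical) builds kernels for a Gaussian and a Poisson, representing each resulting measure as a function from an interval $(a, b]$ to its probability. Parameter expressions are evaluated in the state, as in the definitions.

```python
import math

# Sketch (our illustration, not the paper's code): a kernel maps a state
# sigma to a measure; here a measure is a function (a, b] -> probability.

def gauss_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def kernel_gauss(mu_expr, sd_expr):
    """[[Gaussian(e1, e2)]]: state -> measure on intervals (continuous case)."""
    def kappa(sigma):
        mu, sd = mu_expr(sigma), sd_expr(sigma)
        return lambda a, b: gauss_cdf(b, mu, sd) - gauss_cdf(a, mu, sd)
    return kappa

def kernel_poisson(rate_expr):
    """[[Poisson(e1)]]: sums the mass function over Supp ∩ (a, b] (discrete case)."""
    def kappa(sigma):
        lam = rate_expr(sigma)
        def measure(a, b):
            ks = range(max(0, math.floor(a) + 1), math.floor(b) + 1)
            return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in ks)
        return measure
    return kappa
```

For example, `kernel_gauss(lambda s: s["m"], lambda s: 1.0)({"m": 0.0})(-1.96, 1.96)` is approximately 0.95.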

**Statement Level Semantics** The statement-level semantics are shown in Figure 6. We interpret each statement as a (sub-)measure transformer, hence the semantic signature is $[\![\cdot]\!]\_{Statement} : \mathcal{M}(\mathbb{R}^n) \to \mathcal{M}(\mathbb{R}^n)$. The skip statement returns the original measure and the abort statement transforms any measure to the **0** sub-measure. The condition statement removes measure from regions not satisfying the Boolean guard B. The factor statement can be seen as a "smoothed" version of condition that uses g, a function of the observed data and its distribution, to re-weight the measure associated with a set by some real value in [0, 1] (as opposed to strictly 0 or 1). Deterministic assignment transforms the measure into one which assigns to any set of states S the same value that the original measure $\mu$ would have assigned to all states that end up in S after executing the assignment statement. Probabilistic assignment updates the measure so that $x\_i$'s marginal is the measure associated with Dist, but with the parameters governed by $\mu$.

An if else statement can be decomposed into the sum of the true branch's measure and the false branch's measure. The while loop semantics are the solution to the standard least fixed point equation [19], but can also be viewed as a mixture distribution where each mixture component corresponds to going through the loop k times. A for loop is just syntactic sugar for a sequencing of a fixed number of statements. We note that the Data block does not affect the measure (it is also syntactic sugar, and could simply be inlined in the Observe block). The program can be thought of as starting in some initial input measure $\mu\_0$ in which each variable is undefined (which could simply mean initialized to some special value, or even just zero); as each variable gets defined, that variable's marginal (and hence the joint measure $\mu$) gets updated.
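One minimal way to make the measure-transformer view concrete is to approximate a sub-probability measure by weighted samples. The sketch below is our illustration (not Leios code) of a few of the Figure 6 transformers in that representation; the loss of total mass under condition mirrors how conditioning yields a sub-measure.

```python
import random

# Our illustrative sketch: a sub-probability measure over states is
# approximated by weighted samples [(state, weight)]. Each statement from
# Fig. 6 then becomes a transformer on this representation.

def skip(mu):
    return mu

def condition(pred, mu):
    # keep only the mass inside [[B]]; total weight may drop below 1
    return [(s, w) for (s, w) in mu if pred(s)]

def assign(var, expr, mu):
    # deterministic assignment: push each state through the update
    return [({**s, var: expr(s)}, w) for (s, w) in mu]

def prob_assign(var, sampler, mu):
    # probabilistic assignment: var's marginal becomes Dist under mu
    return [({**s, var: sampler(s)}, w) for (s, w) in mu]

def mass(mu):
    return sum(w for (_, w) in mu)
```

For example, starting from `mu0 = [({}, 1.0/n)] * n`, sampling `x` from a standard Gaussian and then conditioning on `x > 0` leaves total mass of roughly 0.5.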

# **4 Continualizing Probabilistic Programs**

Our goal is to synthesize a new continuous approximation of the original program P. We formally define this via a transformation operator $\mathcal{T}\_\mathcal{P}^\beta[\bullet] : Program \to Program$. Our approach operates in two main steps:

(1) We first locally approximate the program's prior and latent variables using a series of program transformations to best preserve the local structural properties of the program and then apply smoothing globally to ensure that the likelihood function is both fully continuous and differentiable.

$$\begin{aligned}
[\![\mathtt{skip}]\!](\mu) &= \mu \\
[\![\mathtt{abort}]\!](\mu) &= \lambda S.\, 0 \\
[\![P\_1; P\_2]\!](\mu) &= [\![P\_2]\!]([\![P\_1]\!](\mu)) \\
[\![\mathtt{condition(B)}]\!](\mu) &= \lambda S.\, \mu(S \cap [\![B]\!]) \\
[\![\mathtt{factor}(x\_i, t)]\!](\mu) &= \lambda S. \int\_{\mathbb{R}^n} \mathbf{1}\_S(\sigma) \cdot g(t, \sigma) \cdot \mu(d\sigma) \\
[\![x\_i := e]\!](\mu) &= \lambda S.\, \mu(\{(x\_1, \ldots, x\_n) \in \mathbb{R}^n \mid (x\_1, \ldots, x\_{i-1}, e(x\_1, \ldots, x\_n), x\_{i+1}, \ldots, x\_n) \in S\}) \\
[\![x\_i := Dist(e\_1, \ldots, e\_k)]\!](\mu) &= \lambda S. \int\_{\mathbb{R}^n} \mu(d\sigma) \cdot \big(\delta\_{x\_1} \otimes \ldots \otimes \delta\_{x\_{i-1}} \otimes [\![Dist(e\_1, \ldots, e\_k)]\!](\sigma) \otimes \delta\_{x\_{i+1}} \otimes \ldots\big)(S) \\
[\![\mathtt{if\ (B)\ \{P\_1\}\ else\ \{P\_2\}}]\!](\mu) &= [\![P\_1]\!]([\![\mathtt{condition(B)}]\!](\mu)) + [\![P\_2]\!]([\![\mathtt{condition(not\ B)}]\!](\mu)) \\
[\![\mathtt{while\ (B)\ \{P\_1\}}]\!](\mu) &= \sum\_{k=0}^{\infty} [\![(\mathtt{condition(B)}; P\_1)^k; \mathtt{condition(not\ B)}]\!](\mu)
\end{aligned}$$

Fig. 6: Denotational Semantics of Probabilistic Programs

(2) We next synthesize a set of parameters that (approximately) minimize the distance metric between the distributions of the original and continualized models and we use light-weight auto-tuning to ensure the approximations do not introduce runtime errors.

#### **4.1 Overview of the Algorithm**

Algorithm 1 presents the technique for continualizing programs. It takes as input a program P containing a prior or observed variable that is discrete (or hybrid) and returns $\mathcal{T}\_\mathcal{P}^\beta[P]$, a probabilistic program representing a fully continuous random variable with a differentiable likelihood function. The algorithm uses a tunable hyper-parameter $\beta \in (0, \infty)$ to control the amount of smoothing (as in [14]). A smaller $\beta$ leads to less smoothing and a larger $\beta$ to more; however, the smallest $\beta$ does not always lead to the best inference (nor does the largest), as can be seen in Section 7.

In line 3 of Algorithm 1, Leios constructs a standard control flow graph (CFG) to represent the program, using a method called GetCFG(). This data structure forms the basis of Leios's subsequent analyses. Each CFG node corresponds to a single statement and contains all relevant attributes of that statement. Leios then uses this CFG to build a data dependency graph (line 4), which will be used for checking which variables are tainted by the approximations. In line 5, Leios applies $\mathcal{T}\_\mathcal{P}^\beta[\bullet]$ to obtain a continualized sketch, $P\_C$. Lastly, Leios synthesizes the optimal continuity correction parameters (line 7), and in doing so samples the program to detect whether a runtime error occurred, also returning a Boolean flag success to convey this information. If a runtime error did occur, we find the expression causing it (line 9) and then, in lines 10-12, reapply the safer transformations (e.g., Gamma instead of Gaussian) to all possible dependencies which could have contributed to the runtime error.

#### **4.2 Distribution and Expression Transformations**

To continualize each variable, Leios mutates the individual distributions and expressions assigned to latent variables within the program. We use a transform operator for expressions and distributions, $\mathcal{T}\_\mathcal{E}^\beta[\bullet] : Expr \cup Dist \to Expr \cup Dist$, which we define next. **Algorithm 1:** Procedure for Continualizing a Probabilistic Program

```
1  function Continualize(P, β):
   Input:  A probabilistic program P containing discrete/hybrid observable
           variables and/or priors and a smoothing factor β > 0
   Output: A fully continuous probabilistic program PC
2  Acceptable ← False;
3  CFG ← GetCFG(P);
4  DataDepGraph ← ComputeDataFlow(CFG);
5  PC ← T_P^β[P];              /* apply all continuous transformations */
6  while not Acceptable do
7      PC, success ← Synthesize(PC, P);
8      if not success:
9          D ← getInvalidExpression();
10         Deps ← getDependencies(DataDepGraph, D);
11         forall Expression in Deps do
12             PC ← reapplySafeTransformation(PC, Expression);
13     else:
14         Acceptable ← True;
15 end
16 return PC
```
**Transform Operator For Distributions and Expressions** We now detail the full list of continuous probability distribution transformations that $\mathcal{T}\_\mathcal{E}^\beta[\bullet]$ uses.

$$\mathcal{T}\_\mathcal{E}^\beta[E] = \begin{cases}
Gaussian(\lambda, \sqrt{\lambda}) & E = Poisson(\lambda) \\
Gamma(\lambda, 1) & E = Poisson(\lambda) \text{ and Gaussian fails} \\
Gaussian(np, \sqrt{np(1-p)}) & E = Binomial(n, p) \\
Gamma(np, 1) & E = Binomial(n, p) \text{ and Gaussian fails} \\
Uniform(a, b) & E = DiscUniform(a, b) \\
Exponential(p) & E = Geometric(p) \\
MixOfGauss\_\beta([(1, p), (0, 1-p)]) & E = Bernoulli(p) \\
Beta(\beta, \beta\frac{1-p}{p}) & E = Bernoulli(p) \text{ and support } [0,1] \text{ required} \\
Mixture([(\mathcal{T}\_\mathcal{E}^\beta[D\_1], p\_1), \ldots, (\mathcal{T}\_\mathcal{E}^\beta[D\_k], p\_k)]) & E = Mixture([(D\_1, p\_1), \ldots, (D\_k, p\_k)]) \\
c & E = c \ \text{(constant)} \\
a \cdot x + b & E = a \cdot x + b \ (a \neq 0) \\
GaussianKDE\_\beta(E) & E \in DiscDist \text{ (fixed parameters)} \\
Gaussian(E, \beta) & \text{otherwise}
\end{cases}$$

The rationale for this definition is that these approximations all preserve key structural properties of the distributions' shape (e.g., the number of modes) which have been shown to strongly affect the quality of inference [25, 45, 17]. Second, these continuous approximations all match the first moment of their corresponding discrete distributions, which is another important feature that affects the quality of approximation [53]. We refer the reader to [54] to see that for each distribution on the left, the corresponding continuous distribution on the right has the same mean. These approximations are best when certain limit conditions are satisfied, e.g. λ ≥ 10 for approximating a Poisson distribution with Gaussian, hence the values in the program itself do affect the overall approximation accuracy.

However, if we are not careful, a statement level transformation could introduce runtime errors. For example, a Binomial is always non-negative, but its Gaussian approximation could be negative. This is why <sup>T</sup> <sup>β</sup> <sup>E</sup> [•] has multiple transformations for the same distribution. For example, in addition to using a Gaussian to approximate both a Binomial and a Poisson, we also have a Gamma approximation since a Gamma distribution is always non-negative. Likewise we have a Beta approximation to a Bernoulli if we require that the approximation also have support in the range [0, 1]. Leios uses auto-tuning to safeguard against such errors during the synthesis phase, whereby when sampling the transformed program, if we encounter a run-time error of this nature, we simply go back and try a safer (but possibly slower) alternative (Algorithm 1 line 12). Since there are only finitely many variables and (safer) transformations to apply, this process will eventually terminate. For discrete distributions not supported by the specific approximations, but with fixed parameters, we empirically sample them to get a set of samples and then use a Kernel Density Estimate (KDE) [62] with a Gaussian kernel (the KDE bandwidth is precisely β) as the approximation.
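The KDE fallback described above can be sketched as follows (our rendering; the function names are hypothetical): sample the unsupported discrete distribution with fixed parameters, then smooth the samples with a Gaussian kernel whose bandwidth is exactly the smoothing factor β.

```python
import math
import random

# Our sketch of the KDE fallback: given samples of an unsupported discrete
# distribution, build a Gaussian-kernel KDE with bandwidth beta.

def kde_density(samples, beta):
    """Smoothed density: x -> (1/n) * sum_i N(x; s_i, beta)."""
    n = len(samples)
    c = 1.0 / (beta * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return sum(c * math.exp(-((x - s) ** 2) / (2.0 * beta ** 2))
                   for s in samples) / n
    return pdf

def kde_sampler(samples, beta):
    """Sampling the KDE: pick a stored sample, then add Gaussian(0, beta) noise."""
    return lambda: random.choice(samples) + random.gauss(0.0, beta)
```

A smaller β keeps the KDE closer to the discrete empirical distribution; a larger β smooths it more aggressively.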

Lastly, by default all discrete random variables are approximated with continuous versions; however, we leave the user the option to manually specify CONST in front of a variable if they do not wish for it to be approximated (in which case we no longer make any theoretical guarantees about continuity).

#### **4.3 Influence Analysis and Control-Flow Correction of Predicates**

Simply changing all instances of discrete distributions in the program to continuous ones is not enough to closely approximate the semantics of the original program. We additionally need to ensure that such changes do not introduce control flow errors into the program, in the sense that quantitative properties such as the probability of taking a particular branch need to be reasonably preserved.

**Avoiding Zero Probability Events** A major concern of the approximation is to ensure that no zero-probability events are introduced, such as when we have an exact equality "==" predicate in an if, observe or while statement and the variable being checked was transformed from a discrete to a continuous type. For example, discrete programs commonly have a statement like x := Poisson(1) followed by a conditional such as if (x==4), because the probability that a discrete random variable is exactly equal to a value can be non-zero. However upon applying our distribution transformations and transforming the distribution of x from a discrete Poisson to a continuous Gaussian, the conditional statement "if (x==4)" now corresponds to a **zero probability** (or measure zero) event, as the probability that an absolutely continuous probability measure assigns to the singleton set {4} is by definition zero. Thus, if not corrected for, we could significantly change the probabilities of taking certain branches and hence the overall distribution of the program.
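The following numeric sketch (ours) makes this concrete for the `x := Poisson(1)` example: the discrete guard `x == 4` has positive probability, while under the Gaussian approximation the singleton {4} has measure zero until the predicate is relaxed to an interval.

```python
import math

# Our numeric illustration of the zero-probability-event problem. After
# replacing Poisson(lambda) with Gaussian(lambda, sqrt(lambda)), the event
# {4} has measure zero and must be relaxed to (4 - theta1, 4 + theta2).

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def gauss_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def interval_prob(lo, hi, mu, sd):
    return gauss_cdf(hi, mu, sd) - gauss_cdf(lo, mu, sd)

lam = 1.0
p_discrete = poisson_pmf(4, lam)                 # P(X == 4) > 0
p_point    = interval_prob(4.0, 4.0, lam, 1.0)   # singleton: measure zero
p_relaxed  = interval_prob(3.5, 4.5, lam, 1.0)   # interval with theta = 0.5
```

The relaxed interval restores a non-zero branch probability; choosing the interval widths well is exactly the job of the synthesized θ parameters.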

The converse can also be true: applying approximations can make a zero-probability event in the original program have non-zero probability. For example, in x := DiscUniform(1,5); if (x<3 and x>2) the true branch has probability zero of executing, but this becomes non-zero after the approximations are applied. However, branch paths like these in the original model can be identified by symbolic analysis (e.g., [24]) and removed via dead code elimination during pre-processing.

**Correcting Control Flow Probabilities via Static Analysis** To prevent zero-probability events and ensure that the branch execution probabilities of the continualized program closely match the original's, we use data dependence analysis to track which if, while or condition statements have logical comparisons with variables "tainted" by the approximations. A variable v is "tainted" if it has a transitive data dependence on an approximated variable; we use reaching definitions analysis [35] on the program's CFG to identify such variables.

As shown in Algorithm 1 line 4, we compute the reaching definitions analysis with a method called ComputeDataFlow() as part of a pre-transformation pass: for each program point in the CFG, each variable is marked with all the other variables on which it has a data dependence. These annotations are stored in a data structure called DataDepGraph, which maps nodes (program points) to sets of tuples, where each tuple contains a variable, the other variables it depends on (and where they are assigned), and, lastly, whether it will become tainted. Note that in the algorithm this step is done before the previously discussed expression-level transformations, which is why ComputeDataFlow() marks which variables will become continualized and which ones will not (i.e., if a variable already defines a continuous random variable or was annotated with CONST). Furthermore, though we compute the data dependencies before the approximations, the approximations do not re-order or remove statements, so all data dependencies are the same before and after applying them.
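A minimal sketch (ours; the DataDepGraph encoding is simplified to a plain variable-to-reads map) of the transitive taint computation:

```python
# Our simplified sketch of the taint check over a data-dependence graph:
# deps maps each variable to the set of variables its definition reads, and
# a variable is tainted if it transitively depends on an approximated
# (continualized) variable.

def tainted_vars(deps, approximated):
    """Transitive closure of taint over the dependence edges."""
    tainted = set(approximated)
    changed = True
    while changed:
        changed = False
        for var, reads in deps.items():
            if var not in tainted and reads & tainted:
                tainted.add(var)
                changed = True
    return tainted
```

For example, in x := Poisson(1); y := x + 1; z := 2, approximating x taints y (and anything reading y) but not z.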

**Transform Operator For Boolean Expressions** We take all such control predicates that contain an exact equality "==" comparison with a tainted variable and transform these predicates from exact equality predicates to interval-style predicates. Thus if we originally had a predicate of the form if(x==4), we mutate this into a predicate of the form if(x>4-$\theta\_1$ && x<4+$\theta\_2$), where the $\theta$ are placeholder values that will need to be filled with concrete values during the synthesis phase (Section 5). Hence checking for exact equality gets relaxed to checking for containment within the interval $(4 - \theta\_1, 4 + \theta\_2)$. We also need to correct < and <= predicates if one of the variables was approximated or transitively affected by an approximation.

Hence we also define our transform operator $\mathcal{T}\_\mathcal{B}^\beta[\bullet] : BExpr \to BExpr$ at the level of Boolean expressions:

$$\begin{aligned} \mathcal{T}\_{\mathcal{B}}^{\beta}[(x == y)] &= \begin{cases} (y - \theta\_1 < x) \text{ and } (x < y + \theta\_2) & \text{default} \\ (x == y) & \text{CONST } x \text{ and CONST } y \text{ specified} \end{cases} \\ \mathcal{T}\_{\mathcal{B}}^{\beta}[(x < y)] &= \begin{cases} (x < y + \theta) & \text{if } x \text{ or } y \text{ tainted} \\ (x < y) & \text{otherwise} \end{cases} \\ \mathcal{T}\_{\mathcal{B}}^{\beta}[(x \le y)] &= \begin{cases} (x \le y + \theta) & \text{if } x \text{ or } y \text{ tainted} \\ (x \le y) & \text{otherwise} \end{cases} \end{aligned}$$

Because we have already pre-computed DataDepGraph one can check if a variable in a given statement or expression is tainted (or marked as CONST) in constant time.

This correction has a natural interpretation in classical probability theory. It is well known that to approximate a discrete distribution $X$ with a continuous one $\hat{X}$, we need a continuity correction factor $\theta$ such that $P(X < x) \approx P(\hat{X} < x + \theta)$ (hence why $\mathcal{T}\_\mathcal{B}^\beta[\bullet]$ also corrects < and <= predicates). For simple approximations (e.g., Binomial to Gaussian), the canonical correction factor is known ($\theta = 0.5$) [23]; however, for the general case it is not. Furthermore, it has been shown that in many cases 0.5 is not the best correction factor [3].
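The classical θ = 0.5 correction can be checked numerically. The sketch below (ours) compares the exact Binomial(20, 0.5) CDF with its Gaussian(np, √(np(1−p))) approximation, with and without the correction:

```python
import math

# Our worked check of the classical continuity correction: for
# X ~ Binomial(n, p) and Xhat ~ Gaussian(np, sqrt(np(1-p))),
# P(X <= x) ≈ P(Xhat <= x + 0.5) is much tighter than P(Xhat <= x).

def binom_cdf_le(x, n, p):
    """Exact P(X <= x) = sum_{k <= x} C(n,k) p^k (1-p)^(n-k)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(0, x + 1))

def gauss_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

n, p, x = 20, 0.5, 11
mu, sd = n * p, math.sqrt(n * p * (1 - p))
exact       = binom_cdf_le(x, n, p)        # P(X <= 11)
uncorrected = gauss_cdf(x, mu, sd)         # P(Xhat <= 11)
corrected   = gauss_cdf(x + 0.5, mu, sd)   # P(Xhat <= 11.5), theta = 0.5
```

The corrected approximation is roughly two orders of magnitude closer to the exact probability here, though, as noted above, 0.5 is not optimal in general.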

#### **4.4 Bringing it all together: Full Program Transformations**

Having defined the transformations for distributions and for arithmetic and Boolean expressions, we now define the program transformation operator $\mathcal{T}\_\mathcal{P}^\beta[\bullet] : Program \to Program$ inductively:

$$\begin{aligned} \mathcal{T}\_{\mathcal{P}}^{\beta}[P\_1; P\_2] &= \mathcal{T}\_{\mathcal{P}}^{\beta}[P\_1]; \mathcal{T}\_{\mathcal{P}}^{\beta}[P\_2] \\ \mathcal{T}\_{\mathcal{P}}^{\beta}[\mathtt{if\ \{B\}\ \{P\_1\}\ else\ \{P\_2\}}] &= \mathtt{if}\ \{\mathcal{T}\_{\mathcal{B}}^{\beta}[B]\}\ \mathcal{T}\_{\mathcal{P}}^{\beta}[P\_1]\ \mathtt{else}\ \mathcal{T}\_{\mathcal{P}}^{\beta}[P\_2] \\ \mathcal{T}\_{\mathcal{P}}^{\beta}[\mathtt{while\ \{B\}}\ P\_1] &= \mathtt{while}\ \{\mathcal{T}\_{\mathcal{B}}^{\beta}[B]\}\ \mathcal{T}\_{\mathcal{P}}^{\beta}[P\_1] \\ \mathcal{T}\_{\mathcal{P}}^{\beta}[\mathtt{condition(B)}] &= \mathtt{condition}(\mathcal{T}\_{\mathcal{B}}^{\beta}[B]) \\ \mathcal{T}\_{\mathcal{P}}^{\beta}[\mathtt{x := E}] &= \mathtt{x :=}\ \mathcal{T}\_{\mathcal{E}}^{\beta}[E] \\ \mathcal{T}\_{\mathcal{P}}^{\beta}[\mathtt{CONST\ x := E}] &= \mathtt{x := E} \end{aligned}$$

The abort, factor and skip statements and the DataBlock remain the same after applying the transformation operator <sup>T</sup> <sup>β</sup> <sup>P</sup> [•].

**Ensuring Smoothness** Upon applying the statement-level transformations and performing both the dataflow analysis and the predicate mutations, Leios ensures each latent variable comes from a continuous distribution. However, a continuous distribution may still have jump discontinuities or non-differentiable regions in its density function (such as a uniform distribution), which can make inference difficult [66]. Furthermore, it is known that performing parameter estimation on data distributed according to a discontinuous or non-smooth density function, or on distributions with non-smooth likelihoods, can be just as challenging [50, 1, 59]. Thus, to make the program's likelihood function and the density function of the observed data fully smooth, we need to apply additional Gaussian smoothing.

Since it would be redundant to apply smoothing to a variable already known to come from a smooth distribution (as in the example), we make this simple check first. The following transformation performs the smoothing on the observed variables (which appear in the factor statement).

$$ {}^{G}\mathcal{T}\_{\mathcal{P}}^{\beta}[\mathtt{x\_o := E}] = \begin{cases} \mathtt{x\_o := E} & \text{if } E \text{ already smooth} \\ \mathtt{x\_o := Gaussian(E, \beta)} & \text{otherwise} \end{cases} $$

We could perform additional smoothing for every variable to ensure each has a differentiable density; however, we empirically observed that the added variance accumulated to the point where inference quality deteriorated, hence we only apply the additional smoothing to observed variables.

Having defined the statement-level transformations, we now state a theorem about $\mathcal{T}\_\mathcal{P}^\beta[\bullet]$ preserving continuity. As many applications may invoke inference at any point in the program [46, 60], it is important that absolute continuity of each marginal hold at every point.

**Theorem 1.** In the transformed program $\mathcal{T}\_\mathcal{P}^\beta[P]$, the marginal sub-probability measure of each variable, denoted $\mu\_{x\_i}$, is absolutely continuous with respect to the Lebesgue measure (denoted $\mu\_{x\_i}$ is A.C.) at each program point for which that variable is defined.

Proof. (sketch) To prove the theorem we show that when any variable $x\_i$ is initially defined, it comes from an absolutely continuous distribution, and furthermore that the semantics of each statement in $\mathcal{T}\_\mathcal{P}^\beta[P]$ preserves the absolute continuity of each marginal measure (where $\mu\_{x\_i} \equiv \mu(\mathbb{R} \times \ldots \times B\_i \times \ldots \times \mathbb{R})$); equivalently, for any statement, any (already defined) variable $x\_i$ and any Borel set $B\_i \in \mathcal{B}(\mathbb{R})$:

$$\mu(\mathbb{R} \times \ldots \times B\_i \times \ldots \times \mathbb{R}) \text{ is A.C.} \;\Rightarrow\; [\![\mathtt{statement}]\!](\mu)(\mathbb{R} \times \ldots \times B\_i \times \ldots \times \mathbb{R}) \text{ is A.C.}$$

Case 1. skip and abort: Since skip is the identity measure transformer, each defined marginal measure $\mu\_{x\_i}$ that was A.C. before trivially remains so afterward, since it is unchanged. abort sends each marginal to the **0** sub-measure (which is trivially A.C.).

Case 2. condition and factor: Since factor and condition only lose measure, we have $[\![\mathtt{condition(B)}]\!](\mu)(S) \le \mu(S)$ and $[\![\mathtt{factor}(x\_k, t)]\!](\mu)(S) \le \mu(S)$ for any Borel set $S$. Thus $\mu(S) = 0 \Rightarrow [\![\mathtt{condition(B)}]\!](\mu)(S) = 0$ and $\mu(S) = 0 \Rightarrow [\![\mathtt{factor}(x\_k, t)]\!](\mu)(S) = 0$, since all measures are non-negative. Hence by transitivity, since $\mu(\mathbb{R} \times \ldots \times B\_i \times \ldots \times \mathbb{R})$ is A.C., $[\![\mathtt{factor}(x\_k, t)]\!](\mu)(\mathbb{R} \times \ldots \times B\_i \times \ldots \times \mathbb{R})$ is A.C., and likewise, for similar reasons, $[\![\mathtt{condition(B)}]\!](\mu)(\mathbb{R} \times \ldots \times B\_i \times \ldots \times \mathbb{R})$ is A.C.

Case 3. Assignment: Probabilistic assignment is straightforward. Since the continualized program only samples from absolutely continuous distributions, the marginal of the sampled variable $x\_i$ will be A.C., and all other marginals $\mu\_{x\_j}$ were A.C. by assumption. Deterministic assignment has to be handled carefully. In the continualized program the only deterministic assignments are x\_i := a\*x\_j+b; with $a \neq 0$ (all other assignments are smoothed). The marginal $\mu\_{x\_i}(S)$ is just $\mu\_{x\_j}(aS + b)$, where the set $aS + b \equiv \{s \in \mathbb{R} \mid a \cdot s + b \in S\}$. By the assumed A.C. of $x\_j$, $\mathrm{Leb}(aS + b) = 0 \Rightarrow \mu\_{x\_j}(aS + b) = 0$, and $\mathrm{Leb}(S) = 0 \Leftrightarrow \mathrm{Leb}(aS + b) = 0$ [55], hence: $\mathrm{Leb}(S) = 0 \Rightarrow \mathrm{Leb}(aS + b) = 0 \Rightarrow \mu\_{x\_j}(aS + b) = 0$. Lastly, by the semantic definition of $x\_i$, we have $\mu\_{x\_j}(aS + b) = 0 \Rightarrow \mu\_{x\_i}(S) = 0$, hence $\mathrm{Leb}(S) = 0 \Rightarrow \mu\_{x\_i}(S) = 0$ by transitivity. All other marginals are unchanged, hence the A.C. of each is preserved.

Case 4. Sequencing, if and while: Intuitively, since each of the above statements preserves the A.C. of each marginal, any sequencing of them does too. Since the sum of two measures that are both A.C. in each marginal is also A.C. in each marginal, if statements preserve the A.C. of each marginal. For the same reason, while loops also preserve A.C.

# **5 Synthesis of Continuity Correction Parameters**

We now present our procedure for synthesizing optimal continuity correction parameters which covers lines 6 to 15 in Algorithm 1. This can be thought of as a "training" step which fits the continualized model to the original one. It is important to note that this step is agnostic to the observed data (it only fits to the Model), hence it need only be done once off-line, regardless of how many times we perform inference on new data sets. Furthermore, even if we do not have parameters to synthesize, this step is still useful for catching runtime errors caused by the approximations, so that we can go back and apply safer approximations if necessary.

#### **5.1 Optimization Framework**

Ideally the posteriors of our approximated program $\mathcal{T}\_\mathcal{P}^\beta[P]$ and of the original $P$ should be reasonably close. However, a specific posterior is induced by the corresponding dataset: if our optimization objective tried to minimize the statistical distance from $\mathcal{T}\_\mathcal{P}^\beta[P]$ to $P$, we would simply be over-fitting to the data, and we could not re-use $\mathcal{T}\_\mathcal{P}^\beta[P]$ for new data sets with different true parameters. Instead, our objective is to minimize the distance between the original model $M$, which is simply the fragment of $P$ that does not contain the data or observe block (and hence only defines the prior, likelihood and latent variables), and the corresponding continualized approximation $\mathcal{T}\_\mathcal{P}^\beta[M]$. To do so, we need to choose the best possible continuity correction factors $\theta$ for $\mathcal{T}\_\mathcal{P}^\beta[M]$. Thus we define the "optimal" parameters as those which minimize a distance metric between probability measures, $d : \mathcal{M}(\mathbb{R}^n) \times \mathcal{M}(\mathbb{R}^n) \to [0, \infty)$. We also need to ensure that the metric (a) can compute the distance between discrete and continuous distributions and (b) is such that if models or likelihoods are close with respect to $d$, the posteriors are as well.

**Wasserstein Distance** We choose to use the Wasserstein distance primarily because (1) it can measure the distance between a continuous and discrete distribution (unlike KL-Divergence or Total Variation Distance) and (2) prior work has shown that when performing inference, if using the Wasserstein distance as the chosen metric to approximate a likelihood, the (approximate) posteriors induced are comparable to the true posteriors (obtainable if one used the true likelihood) [49]. Additionally, unlike other metrics, the Wasserstein metric incorporates the underlying difference in geometry of the distributions (which strongly affects inference accuracy [37, 59]).

Let $[\![M]\!](\mu\_0)$ represent the renormalized measure associated with the observed variables of the original model, and let $[\![\mathcal{T}\_\mathcal{P}^\beta[M\_\theta]]\!](\mu\_0)$ represent that of the continualized model, where a given continuity correction factor $\theta$ has been substituted in (both measures start in the initial distribution $\mu\_0$). Furthermore, let $\mathcal{J} \subseteq \mathcal{M}(\mathbb{R}^2)$ represent the set of all joint measures with marginal measures $[\![M]\!](\mu\_0)$ and $[\![\mathcal{T}\_\mathcal{P}^\beta[M\_\theta]]\!](\mu\_0)$. We now define the 1-Wasserstein distance:

$$W([\![M]\!](\mu\_0), [\![\mathcal{T}\_{\mathcal{P}}^{\beta}[M\_{\theta}]]\!](\mu\_0)) = \inf\_{J \in \mathcal{J}} \int \|x - y\|\, dJ(x, y) \tag{1}$$

We also provide further justification for why the Wasserstein distance is a sensible metric to use. It is well known that a mixture of Gaussians can converge in distribution to any continuous random variable; furthermore, existing work has shown that a mixture of Gaussians can approximate any discrete distribution arbitrarily well in the Wasserstein distance [20].

**Objective Function** We now formulate our optimization approach as follows, where $\hat{\theta}$ is the parameter vector minimizing the Wasserstein distance with respect to the original model $M$, and $d$ is the number of parameters to synthesize.

$$\hat{\theta} = \operatorname\*{argmin}\_{\theta \in (0, 1)^d} W([\![M]\!](\mu\_0), [\![\mathcal{T}\_{\mathcal{P}}^{\beta}[M\_{\theta}]]\!](\mu\_0)) \tag{2}$$

To restrict the search space we follow common practice [23, 3] by requiring each $\theta\_i \in (0, 1)$. Such an optimization problem lacks a closed-form solution, and symbolically computing the Wasserstein distance is intractable, hence we numerically approximate it via the empirical Wasserstein distance (EWD) between observed samples of $M$ and $\mathcal{T}\_\mathcal{P}^\beta[M\_\theta]$. Because this step is fully dynamic (we run and sample the model), the samples are conditioned upon successful termination, and hence the model's sub-measure has been implicitly renormalized to a full probability measure, justifying the use of a fully renormalized measure in equations (1) and (2).
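For one-dimensional sample sets of equal size, the EWD reduces to the mean absolute difference between the sorted samples; a minimal sketch (ours):

```python
# Our sketch of the empirical 1-Wasserstein distance used as the synthesis
# objective: for two equal-size 1-D sample sets it is the mean absolute
# difference of the order statistics (sorted samples matched pairwise).

def empirical_wasserstein(xs, ys):
    assert len(xs) == len(ys), "equal sample sizes assumed in this sketch"
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

Note this works equally well when one sample set comes from a discrete model and the other from its continuous approximation, which is precisely why the Wasserstein distance is usable here.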


**Algorithm 2:** Synthesis of Continuity Correction Parameters
```
1  function Synthesize(P, T_P^β[P]):
   Input:  A program P and a continualized sketch T_P^β[P] with d
           parameters to be synthesized
   Output: A fully continuous probabilistic program PC and a binary flag
           denoting the existence of a runtime error
2  if d == 0 then
3      s ← sample(T_P^β[P], n);
4      if s == Error then
5          return T_P^β[P], false
6      end
7  end
8  else
9      M, T_P^β[M] ← getModel(P, T_P^β[P]);
10     for θi ∈ Grid([0, 1]^d) do
11         p, s ← Nelder-Mead(W, θi, M, T_P^β[M], η, ε, n);
12         if s == Error then
13             return T_P^β[P], false
14         end
15         if W(p) < W(θ̂) then
16             θ̂ ← p
17         end
18     end
19 end
20 return substitute(T_P^β[P], θ̂), true
```
Though intuitively we would expect that as we apply less smoothing (i.e., β < 1) the optimal θi should also be smaller (less need for correction) and the continualized program should become closer to the original, a simple negative result illustrates that this is not always the case and that the dependence between the smoothing and the continuity correction must be non-linear.

*Remark 1.* $\hat{\theta}$ **cannot be linearly proportional** to $\beta$.

Proof. Let $X$ be the constant random variable that is 0 with probability 1, and let $\hat{X} \sim Gaussian(0, \beta)$. Furthermore, let $I := (X == 0)$ and $I\_c := (-c\beta \le \hat{X} \le c\beta)$ be two indicator random variables. Intuitively, we want $I\_c$ to have the same probability of being true as $I$ for any $\beta$. However, if $c$ is constant (such as 1), then $Pr(-c\beta \le \hat{X} \le c\beta)$ will **always** be the same regardless of $\beta$ (when $c = 1$, the probability is always 0.68).
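The remark can be checked numerically (our sketch): the probability that Gaussian(0, β) falls in [−cβ, cβ] is erf(c/√2), which depends only on c, not on β.

```python
import math

# Our numeric check of Remark 1: for Xhat ~ Gaussian(0, beta),
# Pr(-c*beta <= Xhat <= c*beta) = erf(c / sqrt(2)) is independent of beta,
# so a correction theta = c*beta cannot track the target probability as
# beta changes.

def prob_within(c, beta):
    # standardized bound: z = (c*beta) / beta = c, so beta cancels
    z = (c * beta) / (beta * math.sqrt(2.0))
    return math.erf(z)
```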

#### **5.2 Optimization Algorithm**

Algorithm 2 presents our approximate synthesis algorithm, which is called as a subroutine in the main algorithm. As seen in line 2, if there are no parameters to be synthesized (d == 0) we still sample the continualized program in hopes of uncovering a possible runtime error (or gaining statistical confidence that one does not occur). We check for such an error in line 4 and if one exists, we return immediately, with the flag variable set to false (line 5).

To evaluate the EWD objective function (when there are parameters to synthesize), Algorithm 2 follows a technique from [14] and uses a Nelder-Mead search (line 11), due to Nelder-Mead's well-known success in solving non-convex program synthesis problems. We first extract the fragments of the programs corresponding to the models, M and T<sup>β</sup><sub>P</sub>[M], respectively, in line 9. In each step of the Nelder-Mead search we take n samples (n ≈ 500) of T<sup>β</sup><sub>P</sub>[M], with a fixed value of θ<sub>i</sub> substituted into T<sup>β</sup><sub>P</sub>[M], to compute the EWD with respect to samples of the original model M (which have been cached to avoid redundant resampling). The Nelder-Mead search steps through the parameter space (with step size η > 0), substituting different values of θ into T<sup>β</sup><sub>P</sub>[M]. This process continues until the search converges to a minimizing parameter p that is within the stopping threshold ε > 0, or encounters a runtime error during sampling (which is checked in line 12). As before, if we encounter such an error we immediately return with the flag set to false (line 13). Following [14], we successively restart the Nelder-Mead search from k evenly spaced grid points in [0, 1]<sup>d</sup> (hence the loop in line 10) to find the globally optimal parameter (making our approach robust to local minima), which we successively update in lines 15-16. If no runtime error was ever encountered, we substitute the parameters with the minimum EWD over all runs, θ̂, into the fully continuous program T<sup>β</sup><sub>P</sub>[P] and return (line 20). Though it can be argued that this sampling is potentially as difficult as the original inference, we reiterate that we need only do it once, offline, hence the cost is easily amortized.
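The restart-and-refine structure can be sketched as follows. We stand in a simple axis-aligned pattern search for the actual Nelder-Mead simplex step, since only the outer structure matters here: restarts from grid points, a step size η, a stopping threshold ε, and keeping the global best over all restarts. All names are our own.

```python
import itertools

def restarted_minimize(objective, d, k=5, eta=0.1, eps=1e-3, max_iter=200):
    """Minimize `objective` over [0,1]^d, restarting a local search from k
    evenly spaced grid points per dimension (a stand-in for the Nelder-Mead
    restarts of Algorithm 2)."""
    grid_1d = [(i + 0.5) / k for i in range(k)]
    best_theta, best_val = None, float("inf")
    for start in itertools.product(grid_1d, repeat=d):
        theta, step = list(start), eta
        val = objective(theta)
        for _ in range(max_iter):
            improved = False
            for j in range(d):               # axis-aligned pattern search
                for delta in (step, -step):
                    cand = theta[:]
                    cand[j] = min(1.0, max(0.0, cand[j] + delta))
                    cv = objective(cand)
                    if cv < val:
                        theta, val, improved = cand, cv, True
            if not improved:
                step /= 2
                if step < eps:               # stopping threshold ε
                    break
        if val < best_val:                   # keep the global best (lines 15-16)
            best_theta, best_val = theta, val
    return best_theta, best_val

theta, w = restarted_minimize(lambda t: (t[0] - 0.37) ** 2, d=1)
print(theta, w)
```

In the real algorithm, `objective` would be the EWD between cached samples of M and fresh samples of T<sup>β</sup><sub>P</sub>[M] at the candidate θ.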

# **6 Methodology**

#### **6.1 Benchmarks**

Table 1 presents the benchmarks. For each benchmark, Columns 2 and 3 present the original prior and likelihood type, respectively. Column 4 indicates whether the continuity correction was applied. Column 5 presents the time to continualize the program, T<sub>Cont</sub>. As can be seen in Columns 4 and 5, the total continualization time T<sub>Cont</sub> depends on whether parameters had to be synthesized. GPAExample had the longest T<sub>Cont</sub> at 3.6s, due to the complexity of its multiple predicates; however, these times are amortized since our synthesis step is performed only once.

As our problem has received little attention, no standard benchmark suites exist. In fact, to make inference tractable for many models, developers would construct continuous approximations by hand, in an ad hoc fashion. However, we wanted a benchmark suite that showcases all 3 inference scenarios that our approach works for: (1) discrete/hybrid prior and discrete/hybrid likelihood, (2) continuous prior but discrete/hybrid likelihood, and (3) discrete/hybrid prior but continuous likelihood. Therefore, we obtained the benchmarks in two ways. First, we looked at variations of the mixed-distribution benchmarks previously published in the machine learning community, e.g., [65, 58], which served as the inspiration for our GPAExample. Second, we took existing benchmarks [27, 30] for which designers modeled certain distributions with continuous approximations, and we retrofitted these models with the corresponding discrete distributions. This step was done for Election, Fairness, SVMfairness, SVE, and TrueSkill. These discretizations were only applied where they made sense, e.g., the Gauss(np, np(1-p)) in the original Election program became discretized as Binomial(n, p). We also took popular Bayesian models from the Cognitive Science literature which use multiple discrete latent variables [39]; these models are BetaBinomial and Exam. Lastly, we took population models from the mathematical biology literature [10, 4] to build benchmarks, since populations are by nature discrete. This was done for Plankton and DiscreteDisease. We present the original programs in the appendix [38].

Table 1: Description of Benchmarks
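The retrofitting above simply inverts the classical moment-matching approximation; a sketch of both directions (helper names are ours, not from any benchmark):

```python
import random

def sample_binomial(n, p):
    # discrete original: sum of n Bernoulli(p) trials
    return sum(random.random() < p for _ in range(n))

def sample_binomial_continualized(n, p):
    # moment-matched Gaussian approximation: Gauss(np, np(1-p));
    # random.gauss takes a standard deviation, hence the square root
    return random.gauss(n * p, (n * p * (1 - p)) ** 0.5)

random.seed(1)
n, p = 10_000, 0.52
print(sample_binomial(n, p), sample_binomial_continualized(n, p))
```

For the Election benchmark the direction is reversed: the published model's Gauss(np, np(1-p)) was replaced by the underlying Binomial(n, p).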

**Implementation** We implemented Leios in Python (∼4.5K LoC). All experiments were run on an Intel Xeon, multi-core desktop running Ubuntu 16.04 with a 3.7 GHz CPU and with 32GB RAM. All results are obtained from single-core executions.

#### **6.2 Experimental Setup**

**Continualized Versions** As there are no other general tools that automatically continualize probabilistic programs in mainstream languages, we compare Leios with:

- the original program P, with no approximations applied, and
- a naively smoothed version P<sub>NS</sub>, in which only the observed variable is smoothed.

We will refer to these as simply "Original" and "Naive", respectively.

**Inference Accuracy Comparison using Ground Truth** Our experimental design compares the respective inference estimates with the ground truth. We set up the experiments as follows: for each original discrete or hybrid program P, we replace the program variable corresponding to the prior distribution with a fixed value τ (the ground truth) to obtain P(τ). We then sample P(τ) to obtain 25 observed data points, which are used to test inference performance on P, P<sub>NS</sub>, and P<sub>Leios</sub>, respectively. To test inference performance we then score P (the original program), P<sub>NS</sub> (the naively smoothed program), and P<sub>Leios</sub> against the observed data points to infer the posterior over the ground-truth parameter τ. Note that the programs only have access to the data samples, not to τ.

For each of the 3 versions P, P<sub>NS</sub>, and P<sub>Leios</sub>, we take the inferred posterior mean as the estimate τ<sub>est</sub> of the value, and then compare it with the ground-truth value τ to measure the error ratio E = |τ − τ<sub>est</sub>| / τ. This entire procedure is repeated for 10 different values of τ to obtain a representative average of inference performance over a wide range of true parameter values.
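For concreteness, the error ratio and its averaging over ground-truth values can be computed as follows (the posterior means below are made-up placeholders, not measured results):

```python
def error_ratio(tau, tau_est):
    """E = |tau - tau_est| / tau: relative error of the posterior-mean estimate."""
    return abs(tau - tau_est) / tau

# averaged over several ground-truth values tau, as in the evaluation protocol
taus      = [0.2, 0.4, 0.6, 0.8]
estimates = [0.21, 0.38, 0.63, 0.79]   # hypothetical inferred posterior means
avg_e = sum(error_ratio(t, e) for t, e in zip(taus, estimates)) / len(taus)
print(avg_e)
```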


Table 2: Inference Times (s) and Error Ratios for each model, β = 0.1

**Analyzed Probabilistic Programming Systems.** We used two languages in our development: WebPPL [26] (with MCMC inference) and Pyro [8] (with Variational Inference). Our implementation automatically generates WebPPL code for all the programs. We used 3500 MCMC samples (with a burn-in of 700 samples) in the simulation. For Pyro, we only wanted to test fully automatic black-box Variational Inference, hence we did not manually marginalize out discrete variables (which is often not even applicable, as the discrete variables are the ones we wish to estimate).

**Inference Time Measurement** We measure the time taken for inference for each version using built-in timers (which exclude file reading and warm-up). A timeout of 10 minutes was used for the inference step. We used this same procedure for both MCMC-based sampling in WebPPL and Variational Inference in Pyro.

# **7 Evaluation**

We study the following three research questions:

**RQ1** Can program continualization make inference faster, while still maintaining a high degree of accuracy, compared to the original program and naive smoothing?

**RQ2** How do performance and accuracy vary for different smoothing factors β?

**RQ3** Can program continualization enable running transformed programs with off-the-shelf inference algorithms that cannot execute the original programs?

# **7.1 RQ1: Benefits of Continualization**

Table 2 presents detailed timing and accuracy errors for a single smoothing factor β on WebPPL programs. Columns 2 and 3 present the time and error (compared to the ground truth) for the original program. Columns 4 and 5 present time/error for the naive smoothing and Columns 6 and 7 present time/error for Leios.

From Table 2 we can see that on average, Leios leads to faster inference than both the Original (no approximations) and Naive (0.584s vs 2.797s and 0.894s, respectively). The Naive version was also faster than the original, giving more evidence that continuous models (even when just the observed variable is continualized) yield faster inference.

Fig. 7: Inference Times and Error ratios for Leios and Naive for different β

For accuracy, inference performed via Leios was on average more accurate than Naive (E = 0.079 vs. 0.098, respectively). Both were slightly less accurate than inference performed on Original (E = 0.043). This is not unreasonable as Original has no approximations applied (which are the main source of inference error). However the Original failed on Election, SVE, and SVMfairness. For Election, a large Binomial latent led to a timeout, and it also slowed the Naive version relative to Leios (3.23s vs 0.61s). The Original failed on SVE since it is a hybrid discrete-continuous model (which can make inference intractable [65, 6]). SVMfairness is a non-linear model where many latent variables have high variances, leading to inference on the Original failing to converge; Leios and Naive had higher error on this benchmark, for much the same reason (though Leios was still significantly better than Naive, E = 0.261 vs 0.454).

Although Leios was faster than Original in all cases, for TrueSkill and SVMfairness, Leios was somewhat slower than Naive. This is likely because the discrete latent variables in these benchmarks had small enough parameters (Binomial with small n) that continualizing them offered little additional speedup. Similarly, for Fairness, Leios was slightly less accurate than Naive because the Gaussian approximation to a Binomial can be less accurate for smaller n.

#### **7.2 RQ2: Impact of Smoothing Factors**

Figure 7 presents the average inference times and ERs for different smoothing factors β. In both cases, X-axes represent smoothing factors. The Y-Axis of the left subfigure presents time, and Y-Axis of the right presents error ratio compared to the ground truth (less is better).

Figure 7 (a) shows that inference on the programs constructed by Leios is non-trivially faster than inference on the naively smoothed versions, regardless of the β used (which has a negligible effect on the inference time for the β values we examined).

Figure 7 (b) presents how accuracy depends directly on β. The Error Ratio for Leios reaches a local minimum at β = 0.1. Because Leios achieves "global" smoothing by approximating each latent variable, a larger value of β is not needed (unlike for Naive). We also noticed that for many benchmarks, smaller β led to better continuity-correction parameters, which in turn led to better inference. Naive's performance suffers for smaller β, which we attribute to small β creating a highly multimodal observed-variable distribution (also presented in Section 2) that hampers inference [37, 59]. Consequently, Naive performs best at β = 0.5; however, this β introduces non-trivially higher variance, which may negatively affect the precision of inference.


Table 3: Variational Inference Times (s) and Error Ratios for selected β

#### **7.3 RQ3: Extending Results to Other Systems**

Table 3 presents the results for running translated programs in Pyro. Columns 2-5 present the inference times and result errors for the original and naively smoothed program. These columns are "-" when Pyro cannot successfully perform inference (i.e. the model contains a discrete variable that is unsupported by the auto guide). Columns 6-11 present Leios' time and error for each model, for three different smoothing parameters.

Fully-automated Variational Inference failed on all but one of the examples for both the Original and Naive versions. This is because in both cases the program still contains latent or observed discrete random variables. For most of the benchmarks (Election, GPA, TrueSkill) the program optimized with Leios had errors comparable to those computed previously with MCMC in WebPPL. For some (BetaBinomial, Fairness) the error was over 0.5 for all β, which is in part a consequence of limitations of automatic VI; hence, for certain models, manual fine-tuning may be unavoidable. These results illustrate that Leios can be used to create an efficient program in situations where the original language does not easily support non-continuous distributions.

# **8 Related Work**

**Probabilistic Program Synthesis** To the best of our knowledge, we are the first to study program transformations that approximate discrete or hybrid discrete-continuous probabilistic programs with fully continuous ones to improve inference. Probabilistic program synthesis tackles the more ambitious task of generating probabilistic programs with certain properties directly from data. For instance, Nori et al. [51] aim to synthesize a probabilistic program given a program sketch and a data set to fit the program to. However, their method merely fits the distribution parameters of the sketch. Furthermore, their language lacks '==' comparisons. Chasins et al. [11] take a similar approach but only apply continuous approximations to already continuous variables.

**Probabilistic Inference with Discrete and Hybrid Distributions** Recent work [65, 66] has explored developing languages and semantics to encode discrete-continuous mixtures; however, these all restrict the types of programs that can be expressed and require specialized inference algorithms. In contrast, Leios can work with a variety of off-the-shelf inference algorithms that operate on arbitrary models and does not need to define its own inference algorithm. In [66] the authors explored a restricted programming language that can statically detect in which parameters the program's density is discontinuous. However, they did not address the question of continuous approximation; rather, their approach was to develop a custom inference scheme and restrict the language so that pathological models cannot be written (they also disallow '==' predicates). In [65], Wu et al. develop a custom inference method for discrete-continuous mixtures, but only for models encodable as a Bayesian network; furthermore, as pointed out by [47], the specialized inference method of Wu et al. is restrictive since it cannot be composed with other program transformations.

Additionally, Machine Learning researchers have developed other continuous relaxation techniques to address the inherent problems of non-differentiable models. One popular method is to reparameterize the gradient estimator during the Variational Inference (VI) computation, commonly called the "reparameterization trick" [42, 61]. However, this approach suffers from the fact that not all distributions support such gradient reparameterizations, and the method is limited to Variational Inference. Conversely, our approach allows one to use any inference scheme. Further, even though these techniques have been applied in the probabilistic programming setting [40], such work still inherits the aforementioned weaknesses.

We also draw upon Kernel Density Estimation (KDE) [62], a common approximation scheme in statistics. KDE fits a Kernel density to each observed data point, hence constructing a smooth approximation. Naive Smoothing is essentially a KDE (with a Gaussian Kernel) of the original while Leios employs additional continualizations. Furthermore, our smoothing factor β is analogous to the bandwidth of a KDE.
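Under this analogy, naively smoothing an observed discrete variable amounts to adding Gaussian noise with standard deviation β to each draw, i.e. a Gaussian-kernel density estimate with bandwidth β. A sketch (our own illustration, not the Leios code):

```python
import random

def naive_smooth(sample_discrete, beta):
    """Naive smoothing as KDE: perturb each discrete draw with Gaussian(0, beta)
    noise, yielding a smooth (mixture-of-Gaussians) observed density."""
    return sample_discrete() + random.gauss(0.0, beta)

random.seed(2)
die = lambda: random.randint(1, 6)       # a discrete observed variable
print([round(naive_smooth(die, 0.1), 3) for _ in range(5)])
```

With small β the resulting density is highly multimodal (six narrow bumps here), which is exactly the failure mode discussed for Naive in Section 7.2.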

**Program Analysis for Probabilistic Programs** Multiple program analysis frameworks and systems have been developed for probabilistic programming [57, 33, 63, 32, 22]. Additionally, these analyses make use of a rich set of semantics [44, 36, 7, 64, 19]. Of particular note is recent work by Lew et al. [41], which provides a type system for reasoning about variational approximations; however, they focus on continuous approximations of already continuous variables.

**Benefits of Continuity in Conventional Programs** The idea of smoothing and working with continuous functions in non-probabilistic programs has found success in a variety of applications [21, 12, 34, 13]. Our work derives inspiration mainly from Smooth interpretation [14], which provides a semantics for smoothing deterministic programs encoding a discontinuous or discrete function.

# **9 Conclusion**

We presented Leios as a method for approximating probabilistic programs with fully continuous versions. Our approach shows that by continualizing probabilistic programs, it is possible to achieve substantial speed-ups in inference performance while still preserving a high degree of accuracy. To this end we combined two key techniques: statement-level program transformations to continualize latent variables, and a novel continuity-correction synthesis procedure to correct branch conditions.

#### **Acknowledgements**

We would like to thank the anonymous reviewers for their constructive feedback. We thank Darko Marinov for his helpful feedback during early stages of the work. We thank Adithya Murali for valuable feedback about the semantics. We thank Zixin Huang and Saikat Dutta for helpful discussions about the evaluation and Vimuth Fernando and Keyur Joshi for helpful proofreads. JL is grateful for support from the Alfred P. Sloan foundation for a Sloan Scholar award used to support much of this work. The research presented in this paper has been supported in part by NSF, Grant no. CCF-1846354.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Semantic Foundations for Deterministic Dataflow and Stream Processing**

Konstantinos Mamouras

Rice University, Houston TX 77005, USA mamouras@rice.edu

**Abstract.** We propose a denotational semantic framework for deterministic dataflow and stream processing that encompasses a variety of existing streaming models. Our proposal is based on the idea that data streams, stream transformations, and stream-processing programs should be classified using types. The type of a data stream is captured formally by a monoid, an algebraic structure with a distinguished binary operation and a unit. The elements of a monoid model the finite fragments of a stream, the binary operation represents the concatenation of stream fragments, and the unit is the empty fragment. Stream transformations are modeled using monotone functions on streams, which we call stream transductions. These functions can be implemented using abstract machines with a potentially infinite state space, which we call stream transducers. This abstract typed framework of stream transductions and transducers can be used to (1) verify the correctness of streaming computations, that is, that an implementation adheres to the desired behavior, (2) prove the soundness of optimizing transformations, e.g. for parallelization and distribution, and (3) inform the design of programming models and query languages for stream processing. In particular, we show that several useful combinators can be supported by the full class of stream transductions and transducers: serial composition, parallel composition, and feedback composition.

**Keywords:** Data streams · Denotational semantics · Type system

# **1 Introduction**

Stream processing is the computational paradigm where the input is not presented in its entirety at the beginning of the computation, but instead it is given in an incremental fashion as a potentially unbounded sequence of elements or data items. This paradigm is appropriate in settings where data is created continually in real-time and has to be processed immediately in order to extract actionable insights and enable timely decision-making. Examples of such datasets are streams of business events in an enterprise setting [26], streams of packets that flow through computer networks [37], time-series data that is captured by sensors in healthcare applications [33], etc.

Due to the great variety of streaming applications, there are various proposals for specialized languages, compilers, and runtime systems that deal with the processing of streaming data. Relational database systems and SQL-based languages have been adapted to the streaming setting [1,2,15,16,18,19,32,37,57,91]. Recently, several systems have been developed for the distributed processing of data streams that are based on the distributed dataflow model of computation [6, 7, 70, 86, 92, 94, 108, 112, 113]. Languages for detecting complex events in distributed systems, which draw on the theory of regular expressions and finite-state automata, have also been proposed [29, 40, 41, 50, 53, 88, 99, 111]. The synchronous dataflow formalisms [20, 24, 28, 51, 73, 107] are based on Kahn's seminal work [59], and they have been used for exposing and exploiting tasklevel and pipeline parallelism within streaming computations in the context of embedded systems. Several formalisms for the runtime verification of reactive systems have been proposed, many of which are based on variants of Temporal Logic and its timed/quantitative extensions [39, 43, 52, 74, 105]. Finally, there is a large collection of languages and systems for reactive programming [34,36,38,46,47,55,68,69,77,89,93,103], which focus on the development of event-driven and interactive applications such as GUIs and web programming.

The aforementioned languages and systems have been successfully used in the application domains for which they were developed. However, each one of them typically introduces a unique variant of the streaming model in terms of: (1) the form of the input and output data, (2) the class of expressible stream-processing computations, and (3) the syntax employed to describe these computations. This has resulted in an enormous proliferation of semantic models for stream processing that are difficult to compare. For this reason, we are interested in identifying a semantic unification of several existing streaming models.

This paper introduces a *typed semantic framework* for reasoning about languages and systems for stream processing. Three key questions are tackled:

1. How do we model *streams* and what is the form of the data that they carry?
2. How do we capture mathematically the notion of a *stream transformation*?
3. What is a general *programming model* for specifying streaming computations?

The first two questions concern the discovery of an appropriate *denotational model* for streaming computation. The third question concerns the design of programming and query languages, where a key requirement is that the behavior of a streaming program/query admits a precise mathematical description. Existing works have addressed these questions in the context of specific classes of applications. Here are examples of various perspectives:

− *Transductions of strings* [8, 100, 104, 110]: A stream is viewed as an unbounded sequence of letters, and a stream transformation is a translation from input sequences to output sequences, which is typically called string/word transduction. These translations are commonly described using finite-state transducers, a class of automata that extend acceptors with output.

− *The streaming dataflow model of Gilles Kahn* [59, 60]: The input and output consist of multiple independent channels that carry unbounded sequences of elements. A transformation is a function from a tuple of input sequences to a tuple of output sequences. Such transformations are specified with dataflow graphs whose nodes describe single-process computations.

− *Relational transformations* [71]: A stream is an unbounded multiset (bag) of tuples, and a stream transformation is a monotone operator (w.r.t. multiset containment) on multisets. This can be generalized to consider more than one input stream. An interesting subclass of these operators can be described syntactically using monotone relational algebra.

− *Processing of time-varying relations* [16, 17]: A stream is a time-varying finite multiset of tuples, i.e. an unbounded sequence of finite multisets of tuples. In this setting, a stream transformation processes the input in a way that preserves the notion of time: after processing t input multisets (i.e., t time units) the output consists of t output multisets. The query language CQL [16] defines a class of such computations that involve relational and windowing operators.

− *Transformations of continuous-time signals* [27]: An input stream is a continuous-time signal, that is, a function from the real numbers R to an n-dimensional space R<sup>n</sup>. A stream transformation is a mapping from input signals to output signals that is *causal*, which means that the value of the output at time t depends on the values of the input signal up to (and including) time t. Systems of differential equations can be used to describe classes of such transformations.

We are interested here in a unifying framework that encompasses all the aforementioned concrete instances of streaming models and enables formal reasoning about the composition of streaming computations from different models. In order to achieve this we take an *abstract algebraic approach* that retains only the essential aspects of stream processing without any unnecessary specialization. The rest of the section outlines our proposal.

At the most fundamental level, stream processing is computation over input that is not given at the beginning in full, but rather is presented incrementally as the computation evolves. Since the input is presented piece by piece, the basic concepts that need to be captured mathematically are: (1) what is a *piece* or *fragment of the input*, and (2) how do we *extend the input*. The most general class of algebraic structures that model these notions is the class of *monoids*, the collection of algebras that have a distinguished binary associative multiplication operation · and an identity element 1 for this operation. A monoid (A, ·, 1) then constitutes a *type of data streams*, where the elements of the monoid are all the possible *finite stream fragments*, the identity 1 ∈ A is the *empty stream fragment*, and the multiplication operation · : A × A → A models the *concatenation* of stream fragments. Using monoids, we can organize several notions of data streams using types that describe the form of the data, as well as any invariants or assumptions about them. Monoids encompass the kinds of data streams that we mentioned earlier and many more: strings of letters, linear sequences of data items, tuples of sequences, multisets (bags) of data items, sets of data items, time-varying relations/multisets, (potentially disordered) timestamped sequences of data items, continuous-time signals, and so on.
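As a concrete illustration (our own encoding, not part of the paper's formal development), several of these stream types can be spot-checked as monoids in a few lines of Python:

```python
from collections import Counter

# Three stream types as monoids (carrier, operation, unit):
#   strings:   (str, +, "")            -- sequences of letters
#   sequences: (list, +, [])           -- linear sequences of data items
#   multisets: (Counter, +, Counter()) -- bags, where arrival order is irrelevant

def check_monoid(op, unit, a, b, c):
    """Spot-check the monoid laws on three sample fragments."""
    assert op(op(a, b), c) == op(a, op(b, c))     # associativity
    assert op(unit, a) == a and op(a, unit) == a  # identity

concat = lambda x, y: x + y
check_monoid(concat, "", "ab", "c", "d")
check_monoid(concat, [], [1], [2, 3], [4])
check_monoid(concat, Counter(), Counter("ab"), Counter("a"), Counter("c"))
print("monoid laws hold on the sampled fragments")
```

Each triple fixes what a stream fragment is, how fragments extend one another, and what the empty fragment is.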

Stream transformations can be classified according to the type of their input and output streams, which we call a *transduction type*. They are modeled using *monotone functions* that map an input stream history (i.e., the fragment of the input stream that has been received from the beginning of the computation until now) to an output stream history (i.e., the fragment of the output stream produced so far). The monotonicity requirement captures the idea that a stream transformation cannot retract the output that has already been emitted. We call such functions *stream transductions*, and we propose them as a denotational semantic model for stream processing. This model encompasses string transductions, non-diverging Kahn-computable [59] functions on streams, monotone relational transformations [71], the CQL-definable [16] transformations on time-varying relations, and transformations of continuous-time signals [27].

We also introduce an abstract model of computation for stream processing. The considered programs or abstract machines are called *stream transducers*, and they are organized using *transducer types* that specify the input and output stream types. A stream transducer processes the input stream in an incremental fashion, by consuming it fragment by fragment. The consumption of an input fragment results in the emission of an output fragment. Our algebraic setting brings in an unavoidable complication compared to the classical theory of word transducers: not all stream transducers describe a stream transduction. This phenomenon has to do with the generalization of the input and output data streams from sequences of atomic data items to elements of arbitrary monoids. A stream transducer has to respect its input/output type, which means that the way in which the input stream is fragmented into pieces and fed to the transducer does not affect the cumulative output. More concisely, this says that the cumulative output is independent of the fragmentation of the input. In order to formalize this notion, we say that a *factorization* of an input history u is a sequence of stream fragments u<sub>1</sub>, u<sub>2</sub>,...,u<sub>n</sub> whose concatenation is equal to the input history, i.e. u<sub>1</sub> · u<sub>2</sub> ··· u<sub>n</sub> = u. Now, the desired restriction can be described as follows: for every input history w and any two factorizations u<sub>1</sub>,...,u<sub>m</sub> and v<sub>1</sub>,...,v<sub>n</sub> of w, the cumulative output that the transducer emits when consuming the fragments u<sub>1</sub>,...,u<sub>m</sub> in sequence is equal to the cumulative output when consuming the fragments v<sub>1</sub>,...,v<sub>n</sub>. Fortunately, this complex property can be distilled into an equivalent property on the structure of the stream transducer that we call the *coherence property*.
Every stream transducer that is coherent has a well-defined semantics or *denotation* in terms of a stream transduction.
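As a toy illustration of a coherent transducer (our own example), consider a prefix-sum transducer over the monoid of lists of numbers: it emits one running total per input *item*, so its cumulative output does not depend on how the input history is factorized into fragments:

```python
class PrefixSums:
    """A stream transducer over lists of numbers: consuming an input fragment
    emits one running total per item in that fragment."""
    def __init__(self):
        self.total = 0
    def consume(self, fragment):
        out = []
        for x in fragment:
            self.total += x
            out.append(self.total)
        return out

def cumulative_output(fragments):
    """Concatenation of all output fragments for a given input factorization."""
    t, out = PrefixSums(), []
    for f in fragments:
        out += t.consume(f)
    return out

# Two factorizations of the same input history [1, 2, 3, 4]:
print(cumulative_output([[1, 2], [3, 4]]))    # fragments of size 2
print(cumulative_output([[1], [2, 3], [4]]))  # a different factorization
```

Both calls print `[1, 3, 6, 10]`. A variant that emitted one total per consumed *fragment* would violate the property: its cumulative output would change with the factorization, so it would denote no stream transduction.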

We have already outlined the basics of our general framework for streaming computation, which includes: (1) a classification of streams using monoids as types, (2) a denotational semantic model that employs monotone functions from input histories to output histories, and (3) a programming model that generalizes transducers to compute meaningfully on elements of arbitrary monoids. This already allows us to address important questions about specific computations:

− Does an implementation (a stream transducer) adhere to a desired behavior (a stream transduction)?
− Can one streaming computation be soundly replaced by another, i.e. do two stream transducers denote the same stream transduction?
The first question is a *correctness* property. The second question is relevant for *semantics-preserving program optimization*. We will turn now to the issue of how to modularly specify complex stream transductions and transducers.

One of the most common ways to conceptually organize complex streaming computations is to view the overall computation as the composition of several processes that run independently and are connected with directed communication channels on which streams of data flow. This way of structuring computations is called the *dataflow programming model*. The simple deterministic parallel model of Karp and Miller [61] is one of the first variants of dataflow, and other notable early works on dataflow models include Dennis's parallel language of actors and links [42] and Kahn's networks [59] of computing stations and communication lines. We investigate three key *dataflow combinators* for composing stream transductions (i.e., semantic-level) and stream transducers (i.e., program-level): *serial* composition, *parallel* composition, and *feedback* composition. Serial composition is useful for describing pipelines of processing stages, where the output of one stage is streamed as input into the next stage. Parallel composition describes the independent and concurrent computation of two or more components. Feedback composition supports computations whose current output depends on previously produced outputs. We show that our framework supports all these combinators, which facilitate the modular description of complex computations and expose pipeline and task-based parallelism.
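Serial composition, for instance, can be sketched directly at the program level (our own toy encoding, with list fragments as the stream monoid):

```python
class Map:
    """A stateless transducer: applies f to each item of a list fragment."""
    def __init__(self, f):
        self.f = f
    def consume(self, fragment):
        return [self.f(x) for x in fragment]

class Serial:
    """Serial (pipeline) composition: output fragments of `first` are
    streamed as input fragments into `second`."""
    def __init__(self, first, second):
        self.first, self.second = first, second
    def consume(self, fragment):
        return self.second.consume(self.first.consume(fragment))

pipeline = Serial(Map(lambda x: x + 1), Map(lambda x: 2 * x))
print(pipeline.consume([1, 2]) + pipeline.consume([3]))  # → [4, 6, 8]
```

Parallel composition would run two such transducers side by side on independent channels, and feedback composition would route part of the output back as input; Sect. 5 treats all three at both the transduction and transducer level.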

*Outline of paper.* In Sect. 2 we introduce the idea that data streams can be classified using monoids as their types, and in Sect. 3 we propose the semantic model of stream transductions. Sect. 4 is devoted to the description of an abstract model of streaming computation, called stream transducer, and the main properties that it satisfies. In Sect. 5 we show that our abstract model is closed under a fundamental set of dataflow combinators: serial, parallel, and feedback composition. In Sect. 6 we prove the soundness of a streaming optimizing transformation using denotational arguments and algebraic rewriting. Sect. 7 contains related work, and Sect. 8 concludes with a brief summary of our proposal.

# **2 Monoids as Types for Streams**

Data streams are typically viewed as unbounded linear sequences of data items, where a data item can be thought of as a small indivisible piece of data. This viewpoint is sufficient for describing many useful semantic and programming models, but it is too concrete and unnecessarily restricts the notion of a data stream. In order to see this, consider a computation where the specific order in which the data items arrive is not relevant. Counting is a trivial example of such a computation, and it can be described operationally as follows: every time a new data item arrives, the counting stream algorithm emits the total number of items that have been seen so far. This can be described mathematically by the function $\beta$, given by $\beta(x_1, x_2, \ldots, x_n) = 1, 2, \ldots, n$, where $x_1, x_2, \ldots, x_n$ is the input and $1, 2, \ldots, n$ is the cumulative output of the computation. For this computation, the input can be meaningfully viewed as a *multiset* (or bag) instead of a sequence, since the ordering of the data items is irrelevant. This means that multisets can also be viewed as data streams, and in some cases this viewpoint is preferable to the traditional one of "streams = sequences".

The example of the previous paragraph raises an obvious question: What class of mathematical objects can meaningfully serve as data streams? Linear sequences and multisets should certainly be included, but it would be desirable to generalize the notion of streams as much as possible. Recent works explore the idea of generalizing streams to encompass a large class of *partial orders* [13, 85], but we will see later that this approach excludes many useful instances. Stream processing is the computational paradigm where the input is not presented in full at the beginning of the computation, but instead it is given in an incremental fashion or *piece by piece*. For this reason, there are just three notions that need to be modeled mathematically: (1) a *fragment* or piece of a data stream, (2) the *extension* of data with an additional fragment of data, and (3) the *empty* data stream, i.e. the data seen at the very beginning of the computation. This leads us to consider a *kind* or *type of a data stream* as an algebraic structure that satisfies the following: (1) its elements model data stream fragments, (2) it has a distinguished associative operation · for the concatenation of stream fragments, and (3) it has a distinguished element 1 that represents the empty fragment so that 1 is a unit for concatenation. The class of monoids is the largest class of algebraic structures that fulfill these requirements.

More formally, a *monoid* is an algebraic structure $(A, \cdot, 1)$, where $\cdot : A \times A \to A$ is a binary operation called *multiplication* and $1 \in A$ is a constant called *unit*, that satisfies the following two axioms: (I) $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for all $x, y, z \in A$, and (II) $1 \cdot x = x \cdot 1 = x$ for all $x \in A$. The first axiom says that $\cdot$ is associative, and the second axiom says that $1$ is a left and right identity for the $\cdot$ operation. For brevity, we will sometimes write $xy$ to denote $x \cdot y$.

Suppose that $A$ is a monoid. We write $A^*$ for the set of all finite sequences of elements of $A$ and $\varepsilon$ for the empty sequence. The *finite multiplication* function $\pi : A^* \to A$ is given by $\pi(\varepsilon) = 1$ and $\pi(\bar{x} \cdot y) = \pi(\bar{x}) \cdot y$ for $\bar{x} \in A^*$ and $y \in A$. For sequences $\bar{x}, \bar{y} \in A^*$, it holds that $\pi(\bar{x} \cdot \bar{y}) = \pi(\bar{x}) \cdot \pi(\bar{y})$. So, $\pi$ generalizes the binary multiplication $\cdot$ to a finite but arbitrary number of arguments.
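
The finite multiplication $\pi$ is simply a left fold of the monoid operation over a sequence. As a minimal illustration (our own, not from the paper; the names `pi`, `mult`, and `unit` are ours), here is a Python sketch where the free monoid of strings plays the role of $A$:

```python
from functools import reduce

def pi(seq, mult, unit):
    """Finite multiplication: fold the monoid operation over a sequence."""
    return reduce(mult, seq, unit)

# The free monoid of strings: multiplication is concatenation, unit is "".
concat = lambda x, y: x + y
assert pi(["ab", "c", "de"], concat, "") == "abcde"
assert pi([], concat, "") == ""                    # pi(empty sequence) = unit
# Homomorphism property: pi(xs + ys) == pi(xs) . pi(ys)
assert pi(["ab"] + ["c", "de"], concat, "") == \
       concat(pi(["ab"], concat, ""), pi(["c", "de"], concat, ""))
```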

Let $(A, \cdot_A, 1_A)$ and $(B, \cdot_B, 1_B)$ be monoids. Their *product* is the monoid $(A \times B, \cdot, 1)$, where the multiplication operation is given by $(x, y) \cdot (x', y') = (x \cdot_A x', y \cdot_B y')$ for $x, x' \in A$ and $y, y' \in B$, and the identity is $1 = (1_A, 1_B)$.

A *monoid homomorphism* from a monoid $(A, \cdot, 1)$ to a monoid $(B, \cdot, 1)$ is a function $h : A \to B$ that commutes with the monoid operations, that is, $h(1) = 1$ and $h(x \cdot y) = h(x) \cdot h(y)$ for all $x, y \in A$.

As we discussed earlier, we can think of a monoid as a *type of data streams*. The elements of the monoid represent *finite stream fragments*. The multiplication operation · models the *concatenation* of stream fragments, and the unit of the monoid is the *empty stream fragment*.

For a monoid $(A, \cdot, 1)$ we define the binary relation $\sqsubseteq$ as follows: for all $x, y \in A$, we put $x \sqsubseteq y$ if and only if $xz = y$ for some $z \in A$. Since the relation $\sqsubseteq$ is reflexive and transitive, we call it the *prefix preorder* for the monoid $A$. The unit $1$ is a minimal element w.r.t. the relation: $1 \cdot x = x$ and hence $1 \sqsubseteq x$ for every $x \in A$. Define the function $\mathsf{prefix} : A \times A \to \mathcal{P}(A)$ as follows: $\mathsf{prefix}(x, y) = \{z \in A \mid xz = y\}$ for all $x, y \in A$. This implies that $x \sqsubseteq y$ iff $\mathsf{prefix}(x, y) \neq \emptyset$. In other words, $\mathsf{prefix}(x, y)$ is the set of all witnesses for $x \sqsubseteq y$. A partial function $\partial : A \times A \rightharpoonup A$ is said to be a *prefix witness function* (or simply a *witness function*) for the monoid $A$ if its domain is equal to $\sqsubseteq$ and it satisfies: $\partial(x, y) \in \mathsf{prefix}(x, y)$ for every $x, y \in A$ with $x \sqsubseteq y$. We can express this equivalently by requiring that the type of the function $\partial$ is $\prod_{(x, y) \in \sqsubseteq} \mathsf{prefix}(x, y)$.
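
As a concrete illustration (our own, not from the paper), in the monoid of strings under concatenation the prefix preorder is the ordinary string-prefix relation, and there is exactly one witness for each related pair:

```python
def is_prefix(x, y):
    """x is below y in the prefix preorder: some z with x + z == y exists."""
    return y.startswith(x)

def witness(x, y):
    """Return the (here unique) z with x + z == y, i.e. the element of prefix(x, y)."""
    assert is_prefix(x, y), "the witness is only defined when x is a prefix of y"
    return y[len(x):]

assert witness("ab", "abcde") == "cde"
assert "ab" + witness("ab", "abcde") == "abcde"   # the defining property
```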

We say that a monoid $A$ satisfies the *left cancellation* property if $xy = xz$ implies $y = z$ for all $x, y, z \in A$. In this case we say that $A$ is *left-cancellative*. If $A$ is left-cancellative, then it has a unique prefix witness function, because $x \sqsubseteq y$ implies that there is a unique $z$ with $xz = y$.

**Example 1 (Finite Sequences).** Consider the algebra $(\mathsf{FSeq}(A), \cdot, \varepsilon)$, where $\mathsf{FSeq}(A)$ is the set $A^*$ of all finite words (strings) over a set $A$, $\cdot$ is word concatenation, and $\varepsilon$ is the empty word. This algebra is a monoid. In fact, it is the *free monoid* with generators $A$. For $u, v \in A^*$, $u \sqsubseteq v$ iff the word $u$ is a prefix of the word $v$. There is a unique prefix witness function, because for every $x, y \in A^*$ with $x \sqsubseteq y$ there is a unique $z \in A^*$ such that $xz = y$.

Let us now consider a variant of Example 1 in order to clear up any misunderstanding regarding the order of concatenation. The set $A^*$, together with the empty sequence $\varepsilon$ and the operation $\circ$ given by $x \circ y = yx$, is a monoid. For the monoid $(A^*, \circ, \varepsilon)$, we have that $x \sqsubseteq y$ iff $x \circ z = zx = y$ for some $z \in A^*$. So, $x \sqsubseteq y$ iff the word $x$ is a *suffix* of the word $y$.

**Example 2 (Finite Multisets).** Consider the algebra $(\mathsf{FBag}(A), \cup, \emptyset)$, where $\mathsf{FBag}(A)$ is the set of all finite multisets (bags) over a set $A$, $\cup$ is multiset union, and $\emptyset$ is the empty multiset. This algebra is a monoid. In fact, it is the *free commutative monoid* with generators $A$. It is also left-cancellative. For $x, y \in \mathsf{FBag}(A)$, $x \sqsubseteq y$ iff $x$ is contained in $y$. So, we also use the notation $\subseteq$ instead of $\sqsubseteq$. There is a unique prefix witness function, because for every $x, y \in \mathsf{FBag}(A)$ with $x \subseteq y$ there is a unique $z \in \mathsf{FBag}(A)$ such that $x \cup z = y$.

**Example 3 (Finite Sets).** Let $A$ be a set. Consider the algebra $(\mathsf{FSet}(A), \cup, \emptyset)$, where $\mathsf{FSet}(A)$ is the set of all finite subsets of $A$, $\cup$ is set union, and $\emptyset$ is the empty set. This algebra is a monoid. In fact, it is the *free commutative idempotent monoid* with generators $A$. For $x, y \in \mathsf{FSet}(A)$, $x \sqsubseteq y$ iff $x$ is contained in $y$. So, we also use the notation $\subseteq$ instead of $\sqsubseteq$.

For $x \subseteq y$, define $\partial(x, y) = y \setminus x$, where $\setminus$ is the set difference operation. Since $x \cup (y \setminus x) = y$ for $x \subseteq y$, $\partial$ is a prefix witness function. We also define $\tau(x, y) = y$ for $x \subseteq y$. Since $x \cup y = y$ for $x \subseteq y$, $\tau$ is a prefix witness function. So, $\mathsf{FSet}(A)$ has several distinct prefix witness functions.

**Example 4 (Finite Maps).** Let $K$ be a set of keys, and $V$ be a set of values. Consider the algebra $(\mathsf{FMap}(K, V), \cdot, \emptyset)$, where $\mathsf{FMap}(K, V)$ is the set of all partial maps $K \rightharpoonup V$ with a finite domain, $\emptyset$ is the partial map with empty domain, and $\cdot$ is defined as follows:

$$(f \cdot g)(k) = \begin{cases} g(k), & \text{if } g(k) \text{ is defined} \\ f(k), & \text{if } g(k) \text{ is undefined and } f(k) \text{ is defined} \\ \text{undefined}, & \text{otherwise} \end{cases}$$

for every $f, g \in \mathsf{FMap}(K, V)$ and $k \in K$. We leave it to the reader to check that $\emptyset \cdot f = f \cdot \emptyset = f$ and $(f \cdot g) \cdot h = f \cdot (g \cdot h)$ for all $f, g, h \in \mathsf{FMap}(K, V)$. So, the algebra $\mathsf{FMap}(K, V)$ is a monoid.
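
Concretely, with Python dictionaries standing in for finite maps, the operation $\cdot$ of this example is a right-biased merge. A small sketch of our own (the name `fmap_mult` is illustrative), checking the unit and associativity laws on sample maps:

```python
def fmap_mult(f, g):
    """The operation of Example 4: g's entries override f's (right bias)."""
    h = dict(f)
    h.update(g)
    return h

f = {"a": 1, "b": 2}
g = {"b": 9, "c": 3}
assert fmap_mult(f, g) == {"a": 1, "b": 9, "c": 3}
assert fmap_mult({}, f) == f and fmap_mult(f, {}) == f                 # unit laws
h = {"c": 7, "d": 4}
assert fmap_mult(fmap_mult(f, g), h) == fmap_mult(f, fmap_mult(g, h))  # associativity
```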

Let $f, g \in \mathsf{FMap}(K, V)$. We write $\mathsf{dom}(f) = \{k \in K \mid f(k) \text{ is defined}\}$ for the domain of $f$. It holds that $\mathsf{dom}(f \cdot g) = \mathsf{dom}(f) \cup \mathsf{dom}(g)$. Using this property, we see that $f \sqsubseteq g$ iff $\mathsf{dom}(f) \subseteq \mathsf{dom}(g)$.

Let $f, g \in \mathsf{FMap}(K, V)$ with $f \sqsubseteq g$. Define $\partial(f, g) = g$. Since $\mathsf{dom}(f) \subseteq \mathsf{dom}(g)$, we have that $f \cdot \partial(f, g) = g$. It follows that $\partial$ is a prefix witness function. Define $g \setminus f \in \mathsf{FMap}(K, V)$ as follows:

$$(g \setminus f)(k) = \begin{cases} g(k), & \text{if } g(k) \text{ is defined and } f(k) \text{ is undefined} \\ g(k), & \text{if } g(k), f(k) \text{ are defined and } g(k) \neq f(k) \\ \text{undefined}, & \text{otherwise} \end{cases}$$

for every $k \in K$. From $f \sqsubseteq g$ we get $f \cdot (g \setminus f) = g$. So, $\setminus$ is a prefix witness function. This means that $\mathsf{FMap}(K, V)$ has several distinct prefix witness functions.
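
Both witness functions of this example are easy to state with dictionaries. The sketch below (our own illustration; `fmap_mult` and `fmap_diff` are our names) checks the defining property $f \cdot (g \setminus f) = g$ as well as the trivial witness $\partial(f, g) = g$:

```python
def fmap_mult(f, g):
    """The monoid operation of Example 4: g's entries override f's."""
    h = dict(f)
    h.update(g)
    return h

def fmap_diff(g, f):
    """g \\ f: entries of g that are absent from f or carry a different value."""
    return {k: v for k, v in g.items() if k not in f or f[k] != v}

f = {"a": 1, "b": 2}
g = {"a": 1, "b": 5, "c": 3}                  # dom(f) is a subset of dom(g)
assert fmap_mult(f, fmap_diff(g, f)) == g     # \ is a prefix witness
assert fmap_mult(f, g) == g                   # the witness g itself also works
```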

**Example 5 (Bounded-Domain Continuous-Time Signals).** Let $A$ be an arbitrary set, and $\mathbb{R}$ be the set of real numbers. A bounded-domain continuous-time signal with values in $A$ is a function $f : [0, u) \to A$, where $u \geq 0$ is a real number and $[u, v) = \{t \in \mathbb{R} \mid u \leq t < v\}$. We define the *concatenation* operation $\cdot$ for such signals as follows:

$$\frac{f: [0, u) \to A \qquad g: [0, v) \to A}{f \cdot g: [0, u + v) \to A} \quad (f \cdot g)(t) = \begin{cases} f(t), & \text{if } t \in [0, u) \\ g(t - u), & \text{if } t \in [u, u + v) \end{cases}$$

We write $\mathsf{BSig}(A)$ for the set of all these bounded-domain continuous-time signals. The *unit* signal is the unique function of type $[0, 0) \to A$, whose domain of definition is empty. Observe that $\mathsf{BSig}(A)$ is a monoid. For signals $f : [0, u) \to A$ and $g : [0, v) \to A$, it holds that $f \sqsubseteq g$ iff $u \leq v$ and $f(t) = g(t)$ for every $t \in [0, u)$. There is a unique prefix witness function, because for every $f, g \in \mathsf{BSig}(A)$ with $f \sqsubseteq g$ there is a unique $h \in \mathsf{BSig}(A)$ such that $f \cdot h = g$.

**Example 6 (Timed Finite Sequences).** We write $\mathbb{N}$ to denote the set of natural numbers (non-negative integers). A *timed sequence* over $A$ is an alternating sequence $s_0 a_1 s_1 a_2 \ldots a_n s_n$, where $s_i \in \mathbb{N}$ and $a_i \in A$ for every $i$. The occurrences $s_0, s_1, \ldots$ are called *time punctuations* and indicate the passage of time. So, the set of all timed sequences over $A$ is equal to $\mathsf{TFSeq}(A) = \mathbb{N} \cdot (A \cdot \mathbb{N})^*$. We define the *fusion product* $\star$ of timed sequences as follows: $s_0 a_1 s_1 \ldots a_m s_m \star t_0 b_1 t_1 \ldots b_n t_n = s_0 a_1 s_1 \ldots a_m (s_m + t_0) b_1 t_1 \ldots b_n t_n$. The *unit* timed sequence is the singleton sequence $0$. The algebra $(\mathsf{TFSeq}(A), \star, 0)$ is easily shown to be a monoid. There is a unique prefix witness function, because for all $x, y \in \mathsf{TFSeq}(A)$ with $x \sqsubseteq y$ there is a unique $z \in \mathsf{TFSeq}(A)$ s.t. $x \star z = y$.

**Example 7 (Finite Time-Varying Multisets).** A *finite time-varying multiset* over $A$ is a partial function $f : \mathbb{N} \rightharpoonup \mathsf{FBag}(A)$ whose domain is equal to $[0..n] = \{0, \ldots, n\}$ for some integer $n \geq 0$. We also use the notation $f : [0..n] \to \mathsf{FBag}(A)$ to convey this information regarding the domain of $f$. We define the *concatenation* operation $\cdot$ for finite time-varying multisets as follows:

$$\frac{f : [0..m] \to \mathsf{FBag}(A) \qquad g : [0..n] \to \mathsf{FBag}(A)}{f \cdot g : [0..m+n] \to \mathsf{FBag}(A)} \quad (f \cdot g)(t) = \begin{cases} f(t), & \text{if } t \in [0..m-1] \\ f(t) \cup g(0), & \text{if } t = m \\ g(t - m), & \text{if } t \in [m+1..m+n] \end{cases}$$

We write $\mathsf{TFBag}(A)$ to denote the set of all finite time-varying multisets over $A$. The *unit* time-varying multiset $\mathsf{Id} : [0..0] \to \mathsf{FBag}(A)$ is given by $\mathsf{Id}(0) = \emptyset$. It is easy to see that $f \cdot \mathsf{Id} = f$ and that $\mathsf{Id} \cdot f = f$ for every $f : [0..n] \to \mathsf{FBag}(A)$. We leave it to the reader to also verify that $(f \cdot g) \cdot h = f \cdot (g \cdot h)$ for finite time-varying multisets $f$, $g$ and $h$. So, the set $\mathsf{TFBag}(A)$ together with $\cdot$ and $\mathsf{Id}$ is a monoid. It is not difficult to show that it is left-cancellative.

Let us now consider the prefix preorder on finite time-varying multisets. For $f : [0..m] \to \mathsf{FBag}(A)$ and $g : [0..n] \to \mathsf{FBag}(A)$, it holds that $f \sqsubseteq g$ iff $m \leq n$, $f(t) = g(t)$ for every $t \in [0..m-1]$, and $f(m) \subseteq g(m)$ (the bag at the last time instant $m$ may still be extended by concatenation, since $(f \cdot h)(m) = f(m) \cup h(0)$).

The examples above highlight the variety of mathematical objects that can be meaningfully viewed as streams. These streams can be organized elegantly using the structure of monoids. The sequences of Example 1, the multisets of Example 2, and the finite time-varying multisets of Example 7 can be described equivalently in terms of the partial orders of [13, 85], which have also been suggested as an approach to unify notions of streams. Using partial orders it is also possible to model the timed finite sequences of Example 6, but only with a non-succinct encoding: every time punctuation $t \in \mathbb{N}$ is encoded as a sequence $1\,1 \cdots 1$ of $t$ unit punctuations, one for each time unit. Partial orders cannot encode the sets of Example 3, the maps of Example 4, or the signals of Example 5. Informally, the reason for this is that partial orders can only encode *commutation equations*, which are insufficient for objects such as sets and maps.

# **3 Stream Transductions**

In this section we will introduce *stream transductions* as semantic denotational models of stream transformations. At any given point in a streaming computation, we have seen an *input history* (the part of the stream from the beginning of the computation until now) and we have produced an *output history* (the cumulative output that has been emitted from the beginning until now). As a first approximation, a streaming computation can be described mathematically by a function $\beta : A \to B$, where $A$ and $B$ are monoids that describe the input and output type respectively, which maps an input history $x \in A$ to an output history $\beta(x) \in B$. The function $\beta$ has to be *monotone* because the output is cumulative, which means that it can only be extended with more output items as the computation proceeds. An equivalent way to understand the monotonicity property is that it captures the idea that any output that has already been emitted cannot be retracted. Since $\beta$ takes an entire input history as its argument, it can describe stateful computations, where the output that is emitted at every step potentially depends on the entire input history.

**Definition 8 (Stream Transduction & Incremental Form).** Let $A$ and $B$ be monoids. A function $\beta : A \to B$ is said to be *monotone* (with respect to the prefix preorder) if $x \sqsubseteq y$ implies $\beta(x) \sqsubseteq \beta(y)$ for all $x, y \in A$. For a monotone $\beta : A \to B$, we say that the partial function $\mu$ is a *monotonicity witness function* if it maps elements $x, y \in A$ and $z \in \mathsf{prefix}(x, y)$ witnessing that $x \sqsubseteq y$ to a witness $\mu(x, y, z) \in \mathsf{prefix}(\beta(x), \beta(y))$ for $\beta(x) \sqsubseteq \beta(y)$. That is, we require that the type of $\mu$ is $\prod_{x, y \in A} \mathsf{prefix}(x, y) \to \mathsf{prefix}(\beta(x), \beta(y))$. So, the defining property of $\mu$ is that for all $x, y, z \in A$ with $xz = y$ it holds that $\beta(x) \cdot \mu(x, y, z) = \beta(y)$. For brevity, we will sometimes write $\mu(x, z)$ to denote $\mu(x, xz, z)$. The defining property of $\mu$ is then written as $\beta(x) \cdot \mu(x, z) = \beta(xz)$ for all $x, z \in A$.

A *stream transduction* from $A$ to $B$ is a function $\beta : A \to B$ that is monotone with respect to the prefix preorder, together with a monotonicity witness function $\mu : \prod_{x, y \in A} \mathsf{prefix}(x, y) \to \mathsf{prefix}(\beta(x), \beta(y))$. We write $\mathsf{STrans}(A, B)$ to denote the set of all stream transductions from $A$ to $B$.

The *incremental form* of a stream transduction $\langle \beta, \mu \rangle \in \mathsf{STrans}(A, B)$ is a function $F(\beta, \mu) : A^* \to B^*$, which is defined inductively by $F(\beta, \mu)(\varepsilon) = \beta(1)$ and $F(\beta, \mu)(x_1, \ldots, x_n, x_{n+1}) = F(\beta, \mu)(x_1, \ldots, x_n) \cdot \mu(x_1 \cdots x_n, x_{n+1})$ for every sequence $x_1, \ldots, x_{n+1} \in A^*$.

Consider the stream transduction $\langle \beta, \mu \rangle \in \mathsf{STrans}(A, B)$ and the input fragments $x, y \in A$. Notice that $\mu(x, y)$ gives the *output increment* that the streaming computation generates when the input history $x$ is extended into $xy$. For an arbitrary output monoid $B$, the output increment $\mu(x, y)$ is generally not uniquely determined by $\beta(x)$ and $\beta(xy)$. This means that the monotonicity witness function $\mu$ generally provides some additional information about the streaming computation that cannot be obtained purely from $\beta$. However, if the output monoid $B$ is left-cancellative, then there is a unique function $\mu$ that witnesses the monotonicity of $\beta$.

Suppose that $\langle \beta, \mu \rangle \in \mathsf{STrans}(A, B)$ is a stream transduction. The incremental form $F(\beta, \mu)$ of the transduction $\langle \beta, \mu \rangle$ describes the stream transformation in explicit input/output increments. For example, $F(\beta, \mu)(x_1) = \beta(1), \mu(1, x_1)$ and $F(\beta, \mu)(x_1, x_2) = \beta(1), \mu(1, x_1), \mu(x_1, x_2)$. The key property of the incremental form is that $\pi(F(\beta, \mu)(\bar{x})) = \beta(\pi(\bar{x}))$ for every $\bar{x} \in A^*$. For example, $\pi(F(\beta, \mu)(x_1, x_2, x_3)) = \beta(1) \cdot \mu(1, x_1) \cdot \mu(x_1, x_2) \cdot \mu(x_1 x_2, x_3) = \beta(x_1) \cdot \mu(x_1, x_2) \cdot \mu(x_1 x_2, x_3) = \beta(x_1 x_2) \cdot \mu(x_1 x_2, x_3) = \beta(x_1 x_2 x_3)$.
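
The incremental form can be computed by threading the input history through $\mu$. As a minimal sketch of our own (the names `incremental_form`, `mult`, and `unit` are illustrative), instantiated with the identity transduction on the string monoid, where $\beta(x) = x$ and $\mu(x, y) = y$:

```python
def incremental_form(beta, mu, mult, unit, fragments):
    """F(beta, mu): the list of output increments for a list of input fragments."""
    outs, hist = [beta(unit)], unit
    for x in fragments:
        outs.append(mu(hist, x))   # output increment for extending hist by x
        hist = mult(hist, x)       # extend the input history
    return outs

# Identity transduction on the string monoid: beta(x) = x, mu(x, y) = y.
concat = lambda x, y: x + y
outs = incremental_form(lambda x: x, lambda x, y: y, concat, "", ["ab", "c"])
assert outs == ["", "ab", "c"]
assert "".join(outs) == "abc"      # key property: pi(F(xs)) == beta(pi(xs))
```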

**Example 9 (Counting).** Let $A$ be an arbitrary set. We will describe a streaming computation whose input type is the monoid $\mathsf{FBag}(A)$ and whose output type is the monoid $\mathsf{FSeq}(\mathbb{N})$. The informal operational description is as follows: there is no initial output, and every time a new data item arrives the computation emits the total number of items seen so far. The formal description is given by the stream transduction $\beta : \mathsf{FBag}(A) \to \mathsf{FSeq}(\mathbb{N})$, defined by $\beta(\emptyset) = \varepsilon$ and $\beta(x) = 1, 2, \ldots, |x|$ for every non-empty $x \in \mathsf{FBag}(A)$, where $|x|$ denotes the size of the multiset $x$. It is easy to see that $\beta$ is monotone. Since $\mathsf{FSeq}(\mathbb{N})$ is left-cancellative, the monotonicity witness function is uniquely determined: $\mu(x, \emptyset) = \varepsilon$ and $\mu(x, y) = |x| + 1, \ldots, |x| + |y|$ when $y \neq \emptyset$.
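
A minimal Python sketch of this transduction (our own illustration), with multisets represented as `collections.Counter` so that `+` plays the role of bag union:

```python
from collections import Counter

def beta_count(x):
    """Counting transduction: bag x maps to the sequence 1, 2, ..., |x|."""
    return list(range(1, sum(x.values()) + 1))

def mu_count(x, y):
    """The unique monotonicity witness (FSeq(N) is left-cancellative)."""
    n = sum(x.values())
    return list(range(n + 1, n + sum(y.values()) + 1))

x, y = Counter("ab"), Counter("bc")
# Defining property: beta(x) . mu(x, y) == beta(x union y)
assert beta_count(x) + mu_count(x, y) == beta_count(x + y)
```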

**Example 10 (Per-Key Aggregation).** Let $K$ be a set of keys, and $V$ be a set of values. The elements of $K \times V$ are typically called key-value pairs. Suppose that $\mathsf{op} : V \times V \to V$ is an associative and commutative operation. So, $\mathsf{op}$ can be generalized to an aggregation operation that takes non-empty finite multisets over $V$ as input. We will describe a streaming computation whose input type is the monoid $\mathsf{FBag}(K \times V)$ and whose output type is the monoid $\mathsf{FMap}(K, V)$. Informally, every time an item $(k, v)$ is processed, the output map is updated so that the $k$-indexed entry contains the aggregate (using $\mathsf{op}$) of all values seen so far for the key $k$. The formal description of this computation is given by the stream transduction $\beta : \mathsf{FBag}(K \times V) \to \mathsf{FMap}(K, V)$, defined by $\beta(x) = \{k \mapsto \mathsf{op}(x|_k) \mid k \text{ appears in } x\}$ for every multiset $x$, where $x|_k$ denotes the multiset that results from $x$ by keeping only the pairs whose key is equal to $k$. That is, the domain of $\beta(x)$ is equal to $\mathsf{dom}(\beta(x)) = \{k \in K \mid k \text{ appears in } x\}$ and $\beta(x)(k) = \mathsf{op}(x|_k)$ for every $k$ that appears in $x$. The monotonicity witness function $\mu$ is defined as follows: $\mu(x, y)$ is equal to the restriction of the map $\beta(x \cup y)$ to the set of all keys that appear in $y$.
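
A sketch of per-key aggregation (our own illustrative encoding): bags of key-value pairs are lists, $\mathsf{op}$ is instantiated to addition, and the witness $\mu$ is computed by restricting $\beta(x \cup y)$ to the keys of $y$ as in the example:

```python
def beta_perkey(pairs, op):
    """Per-key aggregation: fold op over all values seen for each key."""
    out = {}
    for k, v in pairs:
        out[k] = op(out[k], v) if k in out else v
    return out

add = lambda u, v: u + v
x = [("a", 2), ("b", 3), ("a", 5)]
y = [("a", 1), ("c", 4)]
assert beta_perkey(x, add) == {"a": 7, "b": 3}

# Monotonicity witness: restrict beta(x union y) to the keys occurring in y.
mu = {k: v for k, v in beta_perkey(x + y, add).items() if k in dict(y)}
assert mu == {"a": 8, "c": 4}

# beta(x) combined with mu (right-biased map merge) recovers beta(x union y).
merged = dict(beta_perkey(x, add))
merged.update(mu)
assert merged == beta_perkey(x + y, add)
```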

We saw in Sect. 2 that we can form products of monoids: if $A$ and $B$ are monoids, then so is $A \times B$. Intuitively, we can think of $A \times B$ as the data stream type that involves two parallel and independent *channels*: one channel for streams of type $A$ and another channel for streams of type $B$.

**Example 11 (Merging of Multiple Input Channels).** Given a set $A$, we want to describe a transformation with two input channels of type $\mathsf{FBag}(A)$ and one output channel of type $\mathsf{FBag}(A)$. The monotone function $\beta : \mathsf{FBag}(A) \times \mathsf{FBag}(A) \to \mathsf{FBag}(A)$, given by $\beta(x, y) = x \cup y$ for multisets $x$ and $y$, describes the merging of the two input substreams. Operationally, whenever a new data item arrives (regardless of channel) it is propagated to the output channel. Since $\mathsf{FBag}(A)$ is left-cancellative, the monotonicity witness function is uniquely determined: $\mu(\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle) = (x_2 \cup y_2) \setminus (x_1 \cup y_1)$ for all $x_1, y_1, x_2, y_2 \in \mathsf{FBag}(A)$.

**Example 12 (Flatten).** Let $A$ be a monoid. The function $\beta : \mathsf{FSeq}(A) \to A$, given by $\beta(\bar{x}) = \pi(\bar{x})$ for every $\bar{x} \in \mathsf{FSeq}(A)$, describes the *flattening* of a sequence of monoid elements. The function $\beta$ is monotone, and its monotonicity witness function $\mu$ is given by $\mu(\bar{x}, \bar{y}) = \pi(\bar{y})$ for all $\bar{x}$ and $\bar{y}$. The stream transduction *flatten*$(A) = \langle \beta, \mu \rangle$ has type $\mathsf{STrans}(\mathsf{FSeq}(A), A)$.

**Example 13 (Split in Batches).** Let $\Sigma = \{a, b\}$ be an alphabet of symbols. Suppose that we want to describe the decomposition of an element of $\Sigma^*$ into batches of size exactly 3. We describe this using two functions $r_1 : \Sigma^* \to \mathsf{FSeq}(\Sigma^*)$ and $r_2 : \Sigma^* \to \Sigma^*$. Informally, $r_1$ gives the sequence of full batches of size 3, and $r_2$ gives the remaining incomplete batch. For example, $r_1(abbaabba) = abb, aab$ and $r_2(abbaabba) = ba$.

This idea of splitting in batches can be generalized from the monoid $\Sigma^*$ to an arbitrary monoid $A$. We say that a *splitter* for $A$ is a pair $r = (r_1, r_2)$ of functions $r_1 : A \to \mathsf{FSeq}(A)$ and $r_2 : A \to A$ satisfying the following properties: (1) the equality $x = \pi(r_1(x)) \cdot r_2(x)$ says that $r_1$ and $r_2$ decompose $x \in A$, (2) $r_1(1_A) = \varepsilon$ says that the unit cannot be decomposed, and (3) $r_1(x \cdot y) = r_1(x) \cdot r_1(r_2(x) \cdot y)$ and (4) $r_2(x \cdot y) = r_2(r_2(x) \cdot y)$ describe how to decompose the concatenation of two monoid elements. The first two properties imply that $r_2(1_A) = 1_A$. The third property implies that $r_1$ is monotone. Define $\mu(x, y) = r_1(r_2(x) \cdot y)$ for $x, y \in A$ and observe that $r_1(x) \cdot \mu(x, y) = r_1(xy)$. It follows that *split*$(r) = \langle r_1, \mu \rangle$ is a stream transduction of type $\mathsf{STrans}(A, \mathsf{FSeq}(A))$.
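
For $\Sigma^*$ and batch size 3, the splitter and its laws can be checked directly. A Python sketch of our own (the names `r1` and `r2` follow the example; everything else is illustrative):

```python
def r1(x, n=3):
    """The sequence of full batches of size n."""
    cut = len(x) - len(x) % n
    return [x[i:i + n] for i in range(0, cut, n)]

def r2(x, n=3):
    """The remaining incomplete batch."""
    return x[len(x) - len(x) % n:]

assert r1("abbaabba") == ["abb", "aab"] and r2("abbaabba") == "ba"

x, y = "abba", "abba"
assert "".join(r1(x)) + r2(x) == x           # (1) decomposition
assert r1("") == [] and r2("") == ""         # (2) the unit cannot be decomposed
assert r1(x + y) == r1(x) + r1(r2(x) + y)    # (3)
assert r2(x + y) == r2(r2(x) + y)            # (4)
```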

Our denotational model of a stream transformation uses a monotone function whose domain is the monoid of (finite) input histories. We emphasize that such a denotation can also describe the transformation of an *infinite stream*. To illustrate this point in simple terms, consider a monotone function $\beta : A^* \to B^*$, where $A$ (resp., $B$) is the type of input (resp., output) items. This function extends uniquely to the $\omega$-continuous function $\beta^\infty : A^\infty \to B^\infty$, where $A^\infty = A^* \cup A^\omega$ is the set of finite and infinite sequences over $A$, as follows: $\beta^\infty(a_0 a_1 a_2 \ldots)$ is equal to the supremum of the chain $\beta(\varepsilon) \leq \beta(a_0) \leq \beta(a_0 a_1) \leq \cdots$

# **4 Model of Computation**

We will present an abstract model of computation for stream processing, where the input and output data streams are elements of monoids $A$ and $B$ respectively. A streaming algorithm is described by a transducer, a kind of automaton that produces output values. We consider transducers that can have a potentially infinite state space, which we denote by $\mathsf{St}$. The computation starts at a distinguished initial state $\mathsf{init} \in \mathsf{St}$, and the initialization triggers some initial output $o \in B$. The computation then proceeds by consuming the input stream incrementally, i.e. fragment by fragment. One step of the computation from a state $s \in \mathsf{St}$ involves consuming an input fragment $x \in A$, producing an output increment $\mathsf{out}(s, x) \in B$ and transitioning to the next state $\mathsf{next}(s, x) \in \mathsf{St}$.

**Definition 14 (Stream Transducer).** Let $A, B$ be monoids. A *stream transducer* with inputs from $A$ and outputs from $B$ is a tuple $G = (\mathsf{St}, \mathsf{init}, o, \mathsf{next}, \mathsf{out})$, where $\mathsf{St}$ is a nonempty set of *states*, $\mathsf{init} \in \mathsf{St}$ is the *initial state*, $o \in B$ is the *initial output*, $\mathsf{next} : \mathsf{St} \times A \to \mathsf{St}$ is the *transition function*, and $\mathsf{out} : \mathsf{St} \times A \to B$ is the *output function*. We write $\mathcal{G}(A, B)$ to denote the set of all stream transducers with inputs from $A$ and outputs from $B$.

We define the *generalized transition function* $\mathsf{gnext} : \mathsf{St} \times A^* \to \mathsf{St}$ by induction: $\mathsf{gnext}(s, \varepsilon) = s$ and $\mathsf{gnext}(s, x \cdot \bar{y}) = \mathsf{gnext}(\mathsf{next}(s, x), \bar{y})$ for all $s \in \mathsf{St}$, $x \in A$ and $\bar{y} \in A^*$. A state $s \in \mathsf{St}$ is said to be *reachable* in $G$ if there exists a sequence $\bar{x} \in A^*$ such that $\mathsf{gnext}(\mathsf{init}, \bar{x}) = s$.

We define the *generalized output function* $\mathsf{gout} : \mathsf{St} \times A^* \to B$ by induction on the second argument: $\mathsf{gout}(s, \varepsilon) = 1$ and $\mathsf{gout}(s, x \cdot \bar{y}) = \mathsf{out}(s, x) \cdot \mathsf{gout}(\mathsf{next}(s, x), \bar{y})$ for all $s \in \mathsf{St}$, $x \in A$ and $\bar{y} \in A^*$. The *extended output function* $\mathsf{eout} : \mathsf{St} \times A^* \to B^*$ is defined similarly: $\mathsf{eout}(s, \varepsilon) = \varepsilon$ and $\mathsf{eout}(s, x \cdot \bar{y}) = \mathsf{out}(s, x) \cdot \mathsf{eout}(\mathsf{next}(s, x), \bar{y})$ for all $s \in \mathsf{St}$, $x \in A$ and $\bar{y} \in A^*$.

**Example 15 (Transducer for Counting).** Recall the counting streaming computation that was described in Example 9. We will describe a stream transducer that implements the counting computation. The input monoid is $\mathsf{FBag}(A)$ and the output monoid is $\mathsf{FSeq}(\mathbb{N})$. The state space is $\mathsf{St} = \mathbb{N}$, because the transducer has to maintain a counter that remembers the number of data items seen so far. The initial state is $\mathsf{init} = 0$ and the initial output is $o = \varepsilon$. The transition function increments the counter, i.e. $\mathsf{next}(s, x) = s + |x|$ for every $s \in \mathsf{St}$ and $x \in \mathsf{FBag}(A)$. The output function is defined by $\mathsf{out}(s, \emptyset) = \varepsilon$ and $\mathsf{out}(s, x) = s + 1, \ldots, s + |x|$ for a nonempty multiset $x$. The type of this transducer is $\mathcal{G}(\mathsf{FBag}(A), \mathsf{FSeq}(\mathbb{N}))$.
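
A Python sketch of this transducer (our own encoding: input bags are given as lists, output sequences as lists of numbers; the class name is illustrative):

```python
class CountingTransducer:
    """Counting transducer sketch: St = N, init = 0, initial output = empty."""
    def __init__(self):
        self.state = 0
        self.initial_output = []
    def step(self, fragment):
        """Consume a bag of items; emit the next run of counts (out), update state (next)."""
        out = list(range(self.state + 1, self.state + len(fragment) + 1))
        self.state += len(fragment)
        return out

t = CountingTransducer()
assert t.step(["a", "b"]) == [1, 2]
assert t.step(["c"]) == [3]
# A different factorization of the same input yields the same cumulative output.
u = CountingTransducer()
assert u.step(["a"]) + u.step(["b", "c"]) == [1, 2, 3]
```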

**Example 16 (Transducer for Merging).** We will implement the merging computation of Example 11, where there are two input channels of type $\mathsf{FBag}(A)$ and one output channel of type $\mathsf{FBag}(A)$. The transducer does not need memory, so $\mathsf{St} = \mathsf{Unit}$, where $\mathsf{Unit} = \{\star\}$ is a singleton set. The initial state is $\mathsf{init} = \star$ and the initial output is $o = \emptyset$. There is only one possibility for the transition function: $\mathsf{next}(s, \langle x, y \rangle) = \star$. The output function describes the propagation of the input increments of both input channels to the output channel: $\mathsf{out}(s, \langle x, y \rangle) = x \cup y$ for all multisets $x, y$. The type of this transducer is $\mathcal{G}(\mathsf{FBag}(A) \times \mathsf{FBag}(A), \mathsf{FBag}(A))$.

**Example 17 (Flatten).** For a monoid $A$, we define a transducer $\mathsf{Flatten}(A) = (\mathsf{St}, \mathsf{init}, o, \mathsf{next}, \mathsf{out}) : \mathcal{G}(\mathsf{FSeq}(A), A)$ that implements the flattening transduction of Example 12. This computation does not require memory, so we define $\mathsf{St} = \mathsf{Unit}$ and $\mathsf{init} = \star$. The initial output is $o = 1_A$, the transition function is uniquely determined by $\mathsf{next}(s, x) = \star$, and the output function is given by $\mathsf{out}(s, \langle a_1, \ldots, a_n \rangle) = a_1 \cdots a_n$.

**Example 18 (Split in Batches).** For a monoid $A$ and a splitter $r = (r_1, r_2)$ for $A$ (Example 13), we describe a transducer $\mathsf{Split}(r) = (\mathsf{St}, \mathsf{init}, o, \mathsf{next}, \mathsf{out})$ that implements the transduction *split*$(r) : \mathsf{STrans}(A, \mathsf{FSeq}(A))$. We define $\mathsf{St} = A$, because the transducer needs to remember the remainder of the cumulative input that does not yet form a complete batch, and $\mathsf{init} = 1_A$. The initial output $o = \varepsilon$ is the empty sequence. The transition and output functions are defined by $\mathsf{next}(s, x) = r_2(s \cdot x)$ and $\mathsf{out}(s, x) = r_1(s \cdot x)$.
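
Specialized to strings and batch size 3, this transducer can be sketched as follows (our own illustration; the class name is ours):

```python
class SplitTransducer:
    """Splitting transducer sketch for strings: St = A, init = the unit (empty string)."""
    def __init__(self, n=3):
        self.n = n
        self.state = ""              # pending incomplete batch
    def step(self, x):
        s = self.state + x           # the product s . x
        cut = len(s) - len(s) % self.n
        self.state = s[cut:]         # next(s, x) = r2(s . x)
        return [s[i:i + self.n] for i in range(0, cut, self.n)]  # out(s, x) = r1(s . x)

t = SplitTransducer()
assert t.step("abba") == ["abb"]
assert t.step("abba") == ["aab"]
assert t.state == "ba"               # matches r2(abbaabba) from Example 13
```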

Definition 14 does not capture a key requirement for streaming computations over monoids, namely that the cumulative output of a transducer $G$ should be independent of the particular way in which the input history is split into the fragments that are fed to it. More precisely, suppose that $w$ is an input history that can be fragmented (factorized) in two different ways: $w = u_1 \cdot u_2 \cdots u_m$ and $w = v_1 \cdot v_2 \cdots v_n$. Then, the cumulative output of the transducer $G$ when consuming the sequence of fragments (factorization) $u_1, u_2, \ldots, u_m$ should be equal to the cumulative output when consuming $v_1, v_2, \ldots, v_n$. In Definition 20 below, we formulate a set of *coherence conditions* that a transducer must adhere to in order to satisfy this "factorization independence" requirement.

**Definition 19 (Bisimulation & Bisimilarity).** Let G = (St, init, o, next, out) be a transducer with inputs from A and outputs from B. A relation R ⊆ St × St is a *bisimulation* for G if for every s, t ∈ St and x ∈ A we have that (s, t) ∈ R implies out(s, x) = out(t, x) and (next(s, x), next(t, x)) ∈ R. We also write s R t to mean (s, t) ∈ R. We say that the states s, t ∈ St are *bisimilar*, denoted s ∼ t, if there exists a bisimulation R for G such that s R t. The relation ∼ is called the *bisimilarity relation* for G.

It is well known that the bisimilarity relation for G is an equivalence relation (reflexive, symmetric, and transitive), and that for all s, t ∈ St and x ∈ A it satisfies the following *extension property*: s ∼ t implies next(s, x) ∼ next(t, x). It then follows easily that the bisimilarity relation is itself a bisimulation; in fact, it is the largest bisimulation for the transducer G.

**Definition 20 (Coherence).** Suppose G = (St, init, o, next, out) : G(A, B) is a stream transducer. We say that G is *coherent* if it satisfies the following:

(N1) next(init, 1) ∼ init.

(N2) next(init, xy) ∼ next(next(init, x), y) for every x, y ∈ A.

(O1) o · out(init, 1) = o.

(O2) o · out(init, xy) = o · out(init, x) · out(next(init, x), y) for every x, y ∈ A.

The coherence conditions of Definition 20 capture the idea that the transducer behaves in "essentially the same way" regardless of how the input is split into fragments. For example, the condition (N2) says that the two-step transition init →^x s_1 →^y s_2 and the single-step transition init →^{xy} t_1 end up in states (s_2 and t_1, respectively) that have exactly the same behavior in the subsequent computation. In other words, it does not matter whether the input xy was fed to the transducer as a single fragment xy or as a sequence of two fragments x, y.

Let (A, ·, 1) be a monoid. A *factorization* of an element x ∈ A is a sequence x_1, ..., x_n of elements of A such that x = x_1 ⋯ x_n. In particular, the empty sequence ε ∈ A* is a factorization of 1. In other words, x̄ ∈ A* is a factorization of x ∈ A if π(x̄) = x.

**Theorem 21 (Factorization Independence).** Let G = (St, init, o, next, out) be a stream transducer of type G(A, B). If G is coherent, then for every x ∈ A and every factorization x̄ ∈ A* of x we have that o · gout(init, x̄) = o · out(init, x).

*Proof.* For clarity, we write ⟨x_1, x_2, ..., x_n⟩ ∈ A* to denote a finite sequence of elements of A. The following properties hold for all s ∈ St, x̄ ∈ A* and y ∈ A:

$$\mathsf{gnext}(s,\bar{x}\cdot\langle y\rangle) = \mathsf{next}(\mathsf{gnext}(s,\bar{x}),y) \tag{1}$$

$$\mathsf{gout}(s,\bar{x}\cdot\langle y\rangle) = \mathsf{gout}(s,\bar{x}) \cdot \mathsf{out}(\mathsf{gnext}(s,\bar{x}),y) \tag{2}$$

$$\mathsf{eout}(s,\bar{x}\cdot\langle y\rangle) = \mathsf{eout}(s,\bar{x})\cdot\langle \mathsf{out}(\mathsf{gnext}(s,\bar{x}),y)\rangle \tag{3}$$

Each property shown above can be proved by induction on the sequence x̄.

Consider an arbitrary *coherent* stream transducer G = (St, init, o, next, out). We claim that G satisfies the following property:

$$\mathsf{gnext}(\mathsf{init}, \langle x\_1, \ldots, x\_n \rangle) \sim \mathsf{next}(\mathsf{init}, x\_1 \cdots x\_n) \text{ for all } \langle x\_1, \ldots, x\_n \rangle \in A^\*. \quad (\mathsf{N}^\*) $$

The proof is by induction on the length of the sequence. For the base case, we have that gnext(init, ε) = init and next(init, 1) are bisimilar because <sup>G</sup> is coherent (recall Property (N1) of Definition 20). For the induction step we have:

$$\begin{aligned} \mathsf{gnext}(\mathsf{init}, \bar{x} \cdot \langle y \rangle) &= \mathsf{next}(\mathsf{gnext}(\mathsf{init}, \bar{x}), y) & \text{[Equation (1)]}\\ &\sim \mathsf{next}(\mathsf{next}(\mathsf{init}, \pi(\bar{x})), y) & \text{[I.H., extension]}\\ &\sim \mathsf{next}(\mathsf{init}, \pi(\bar{x}) \cdot y), & \text{[Property (N2)]} \end{aligned}$$

which is equal to next(init, π(x̄ · ⟨y⟩)). This concludes the proof of the claim (N\*).

The proof of the theorem proceeds by induction on x̄ ∈ A*. For the base case, observe that o · gout(init, ε) = o · 1 = o is equal to o · out(init, 1) = o, by property (O1) for G. For the induction step, we have:

$$\begin{aligned} \mathsf{o} \cdot \mathsf{gout}(\mathsf{init}, \bar{x} \cdot \langle y \rangle) &= \mathsf{o} \cdot \mathsf{gout}(\mathsf{init}, \bar{x}) \cdot \mathsf{out}(\mathsf{gnext}(\mathsf{init}, \bar{x}), y) & \quad \text{[Eq. (2)]}\\ &= \mathsf{o} \cdot \mathsf{out}(\mathsf{init}, \pi(\bar{x})) \cdot \mathsf{out}(\mathsf{gnext}(\mathsf{init}, \bar{x}), y) & \quad \text{[I.H.]}\\ &= \mathsf{o} \cdot \mathsf{out}(\mathsf{init}, \pi(\bar{x})) \cdot \mathsf{out}(\mathsf{next}(\mathsf{init}, \pi(\bar{x})), y) & \quad \text{[Prop. (N}^*)]\\ &= \mathsf{o} \cdot \mathsf{out}(\mathsf{init}, \pi(\bar{x}) \cdot y) & \quad \text{[Prop. (O2)]} \end{aligned}$$

which is equal to o · out(init, π(x̄ · ⟨y⟩)).

Theorem 21 says that the condition of coherence guarantees a basic correctness property for stream transducers: the output that they produce does not depend on the specific way in which the input was partitioned into fragments.
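Theorem 21's guarantee is easy to spot-check on a concrete coherent transducer. The sketch below (our own illustration; all identifiers are hypothetical) runs the Split transducer of Example 18 over several factorizations of the same string, with a fixed-block splitter standing in for the abstract r, and compares the cumulative outputs.

```python
# Split(r) over strings, where r cuts the stream into blocks of size 3
# (an illustrative instance of a splitter; identifiers are ours).
K = 3
r1 = lambda w: [w[i:i + K] for i in range(0, len(w) - len(w) % K, K)]
r2 = lambda w: w[len(w) - len(w) % K:]

def cumulative_output(fragments):
    """o . gout(init, fragments): run Split over the fragments and
    concatenate (multiply in FSeq) the emitted batch sequences."""
    s, out = "", []          # init = 1_A = "", o = empty sequence
    for x in fragments:
        out += r1(s + x)     # out(s, x) = r1(s . x)
        s = r2(s + x)        # next(s, x) = r2(s . x)
    return out

# Three factorizations of the same element x = "abcdefg" of A.
factorizations = [
    ["abcdefg"],
    ["abc", "defg"],
    list("abcdefg"),
]
results = [cumulative_output(f) for f in factorizations]
assert all(r == results[0] for r in results)  # factorization independence
```

In all three cases the cumulative output is the batch sequence `["abc", "def"]`, with the remainder `"g"` held in the state.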

For a transducer G = (St, init, o, next, out) we define the function ⟦G⟧ : A* → B* as follows: ⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄) for every x̄ ∈ A*. We call ⟦G⟧ the *interpretation* or *denotation* of G. The definition of ⟦G⟧ implies that ⟦G⟧(ε) = ⟨o⟩ and the following holds for every x̄ ∈ A* and y ∈ A:

$$[\mathcal{G}](\bar{x} \cdot \langle y \rangle) = [\mathcal{G}](\bar{x}) \cdot \langle \mathsf{out}(\mathsf{gnext}(\mathsf{init}, \bar{x}), y) \rangle \tag{4}$$

When G is coherent, Theorem 21 says that the denotation gives the same cumulative output for any two factorizations of the input. We say that the transducers G_1 and G_2 are *equivalent* if their denotations are equal, i.e., ⟦G_1⟧ = ⟦G_2⟧.

**Definition 22 (The Implementation Relation).** Let A, B be monoids, G : G(A, B) be a stream transducer, and ⟨β, μ⟩ : STrans(A, B) be a stream transduction. We say that G *implements* ⟨β, μ⟩ if ⟦G⟧(x̄) = F(β, μ)(x̄) for every x̄ ∈ A*.

**Theorem 23 (Implementation & Coherence).** A stream transducer G : G(A, B) is coherent if and only if it implements some stream transduction.

*Proof.* Suppose that G = (St, init, o, next, out) : G(A, B) is a coherent transducer. Define the function β : A → B by β(x) = o · out(init, x) for every x ∈ A, and the function μ : A × A → B by μ(x, y) = out(next(init, x), y) for all x, y ∈ A. For any x, y ∈ A, we have to establish that β(x) · μ(x, y) = β(xy). This follows immediately from Part (O2) of the coherence property for G. So, ⟨β, μ⟩ is a stream transduction. It remains to prove that G implements ⟨β, μ⟩, that is, ⟦G⟧(x̄) = F(β, μ)(x̄) for every x̄ ∈ A*. We proceed by induction on x̄; for the induction step, observe that


$$\begin{aligned} [\mathcal{G}](\bar{x} \cdot \langle y \rangle) &= [\mathcal{G}](\bar{x}) \cdot \langle \text{out}(\text{gnext}(\text{init}, \bar{x}), y) \rangle & \text{[Equation (4)]}\\ \mathsf{F}(\beta, \mu)(\bar{x} \cdot \langle y \rangle) &= \mathsf{F}(\beta, \mu)(\bar{x}) \cdot \langle \mu(\pi(\bar{x}), y) \rangle & \text{[def. of } \mathsf{F}(\beta, \mu)] \end{aligned}$$

By the induction hypothesis, it suffices to show that out(gnext(init, x̄), y) is equal to μ(π(x̄), y) = out(next(init, π(x̄)), y). This follows from the fact that gnext(init, x̄) and next(init, π(x̄)) are bisimilar, by Property (N\*).

For the converse, suppose that G = (St, init, o, next, out) : G(A, B) is a transducer that implements ⟨β, μ⟩ : STrans(A, B). Define the relation R as:

$$R = \{(s, t) \in \mathsf{St} \times \mathsf{St} \mid \text{there are } \bar{x}, \bar{y} \in A^* \text{ with } \pi(\bar{x}) = \pi(\bar{y}) \text{ s.t. } s = \mathsf{gnext}(\mathsf{init}, \bar{x}) \text{ and } t = \mathsf{gnext}(\mathsf{init}, \bar{y})\}.$$

We claim that R is a bisimulation. Consider arbitrary states s, t ∈ St with s R t and z ∈ A. It follows that there are x̄, ȳ ∈ A* with π(x̄) = π(ȳ) such that s = gnext(init, x̄) and t = gnext(init, ȳ). We have to show that out(s, z) = out(t, z) and next(s, z) R next(t, z). First, notice that:

$$\begin{aligned} [\mathcal{G}](\bar{x} \cdot \langle z \rangle) &= [\mathcal{G}](\bar{x}) \cdot \langle \text{out}(s, z) \rangle & & \text{[Equation (4), def. of } s] \\ \mathsf{F}(\beta, \mu)(\bar{x} \cdot \langle z \rangle) &= \mathsf{F}(\beta, \mu)(\bar{x}) \cdot \langle \mu(\pi(\bar{x}), z) \rangle & & \text{[def. of } \mathsf{F}(\beta, \mu)] \end{aligned}$$

Since G implements ⟨β, μ⟩, we have that ⟦G⟧(x̄ · ⟨z⟩) = F(β, μ)(x̄ · ⟨z⟩) and therefore out(s, z) = μ(π(x̄), z). Similarly, we obtain that out(t, z) = μ(π(ȳ), z). From π(x̄) = π(ȳ) we get that μ(π(x̄), z) = μ(π(ȳ), z), and therefore out(s, z) = out(t, z). Now, observe that s′ = next(s, z) = next(gnext(init, x̄), z) = gnext(init, x̄ · ⟨z⟩) using Equation (1). Similarly, we have that t′ = next(t, z) = gnext(init, ȳ · ⟨z⟩). From π(x̄ · ⟨z⟩) = π(x̄) · z = π(ȳ) · z = π(ȳ · ⟨z⟩) we conclude that s′ R t′. We have thus established that R is a bisimulation.

Now we are ready to prove that G is coherent. We present only the cases of Part (N2) and Part (O2), since they are the most interesting ones. Let x, y ∈ A. For Part (N2), we have to show that the states s = next(next(init, x), y) and t = next(init, xy) are bisimilar. Since R (previous paragraph) is a bisimulation, it suffices to show that (s, t) ∈ R. Indeed, this is true because s = gnext(init, ⟨x, y⟩), t = gnext(init, ⟨xy⟩) and π(⟨x, y⟩) = xy = π(⟨xy⟩). For Part (O2), we have that ⟦G⟧(⟨xy⟩) = ⟨o, out(init, xy)⟩ and F(β, μ)(⟨xy⟩) = ⟨β(1), μ(1, xy)⟩, as well as

$$\begin{aligned} [\mathcal{G}](\langle x, y \rangle) &= \langle \mathsf{o}, \mathsf{out}(\mathsf{init}, x), \mathsf{out}(\mathsf{next}(\mathsf{init}, x), y) \rangle \text{ and} \\ \mathsf{F}(\beta, \mu)(\langle x, y \rangle) &= \langle \beta(1), \mu(1, x), \mu(x, y) \rangle, \end{aligned}$$

using the definitions of ⟦G⟧ and F. Since G implements ⟨β, μ⟩, we know that ⟦G⟧(⟨x, y⟩) = F(β, μ)(⟨x, y⟩) and ⟦G⟧(⟨xy⟩) = F(β, μ)(⟨xy⟩). Using all the above, we get that o · out(init, x) · out(next(init, x), y) = β(1) · μ(1, x) · μ(x, y) = β(x) · μ(x, y) = β(xy) and o · out(init, xy) = β(1) · μ(1, xy) = β(xy). So, Part (O2) of the coherence property holds.

Theorem 23 provides justification for our definition of the coherence property for stream transducers (recall Definition 20). It says that the definition is exactly appropriate, because coherence is a necessary and sufficient condition for a stream transducer to have a stream transduction as its denotation. In other words, the coherence property characterizes precisely those transducers that have a well-defined denotational semantics in terms of transductions. It offers this guarantee of correctness without limiting the expressive power of transducers as implementations of transductions.

**Theorem 24 (Expressive Completeness).** Let A and B be monoids, and ⟨β, μ⟩ be a stream transduction in STrans(A, B). There exists a coherent stream transducer that implements ⟨β, μ⟩.

*Proof.* Recall from Definition 8 that the monotonicity witness function μ satisfies the following property: β(x) · μ(x, y) = β(xy) for every x, y ∈ A. Now, we define the transducer G = (St, init, o, next, out) as follows: St = A, init = 1, o = β(1), next(s, x) = s · x and out(s, x) = μ(s, x) for every state s ∈ St and input x ∈ A. The following properties hold for every s ∈ St and ⟨x_1, ..., x_n⟩ ∈ A*:

$$\mathfrak{gnext}(s, \langle x\_1, \dots, x\_n \rangle) = s \cdot x\_1 \cdot \dots \cdot x\_n \quad \text{and} \tag{5}$$

$$\langle \mathsf{o} \rangle \cdot \mathsf{eout}(\mathsf{init}, \langle x\_1, \dots, x\_n \rangle) = \mathsf{F}(\beta, \mu)(\langle x\_1, \dots, x\_n \rangle) \tag{6}$$

Both properties are shown by induction on the sequence ⟨x_1, ..., x_n⟩. It follows that ⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄) = F(β, μ)(x̄) for every x̄ ∈ A*. So, G implements the transduction ⟨β, μ⟩. Finally, G is coherent by Theorem 23.

Theorem 24 assures us that the abstract computational model of coherent stream transducers is expressive enough to implement any stream transduction. For this reason, we will be using stream transducers as the basic programming model for describing streaming computations.
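The canonical construction in the proof of Theorem 24 is directly executable. Below is a minimal sketch (the helper `canonical` is our own naming) instantiated for a simple transduction over strings: β(x) = |x| into the monoid (ℕ, +, 0) with witness μ(x, y) = |y|, which indeed satisfies β(x) + μ(x, y) = β(xy).

```python
# Canonical coherent transducer for a transduction (beta, mu), following
# the proof of Theorem 24: St = A, init = 1, o = beta(1),
# next(s, x) = s . x, out(s, x) = mu(s, x).
def canonical(beta, mu, unit):
    state = {"s": unit}
    outputs = [beta(unit)]                  # initial output o = beta(1)
    def feed(x):
        outputs.append(mu(state["s"], x))   # out(s, x) = mu(s, x)
        state["s"] = state["s"] + x         # next(s, x) = s . x
        return outputs
    return feed

# Example transduction: A = strings under concatenation, B = (N, +, 0),
# beta(x) = len(x), mu(x, y) = len(y).
feed = canonical(len, lambda x, y: len(y), "")
feed("ab")
feed("cde")
outputs = feed("")
# The cumulative output sums to beta("abcde") = 5, for any fragmentation.
assert sum(outputs) == len("abcde")
```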

**Example 25 (Correctness of Flatten).** We will show that the transducer G = Flatten(A) = (Unit, ⋆, 1_A, next, out) implements the transduction ⟨π, μ⟩ = *flatten*(A) for a monoid A (recall Examples 12 and 17). We show by induction that ⟦G⟧(x̄) = F(π, μ)(x̄) for every x̄ ∈ FSeq(A)*. For the base case, we have that ⟦G⟧(ε) = ⟨1_A⟩ and F(π, μ)(ε) = ⟨π(ε)⟩ = ⟨1_A⟩. Now,

$$\begin{aligned} [\mathcal{G}](\bar{x} \cdot \langle y \rangle) &= [\mathcal{G}](\bar{x}) \cdot \langle \mathsf{out}(\mathsf{gnext}(\mathsf{init}, \bar{x}), y) \rangle & & [\text{def. of } [\mathcal{G}]] \\ &= \mathsf{F}(\pi, \mu)(\bar{x}) \cdot \langle \pi(y) \rangle & & [\text{I.H. and def. of } \mathsf{out}] \\ &= \mathsf{F}(\pi, \mu)(\bar{x}) \cdot \langle \mu(\pi(\bar{x}), y) \rangle & & [\text{def. of } \mu] \\ &= \mathsf{F}(\pi, \mu)(\bar{x} \cdot \langle y \rangle) & & [\text{def. of } \mathsf{F}] \end{aligned}$$

for all x̄ ∈ FSeq(A)* and y ∈ FSeq(A). We have thus proved that Flatten(A) is correct: its denotation is equal to the intended semantics.

**Example 26 (Correctness of Split).** We will establish that the transducer for splitting in batches is correct, namely that G = Split(r) = (A, 1_A, ε, next, out) implements ⟨r_1, μ⟩ = *split*(r) for a splitter r = (r_1, r_2) for the monoid A (recall Examples 13 and 18). Using the properties of splitters and an argument by induction, we obtain that gnext(init, x̄) = r_2(π(x̄)) for every x̄ ∈ A*. We show by induction that ⟦G⟧(x̄) = F(r_1, μ)(x̄) for every x̄ ∈ A*. For the base case, we have that ⟦G⟧(ε) = ⟨ε⟩ and F(r_1, μ)(ε) = ⟨r_1(1_A)⟩ = ⟨ε⟩. Now,

$$\begin{aligned} [\mathcal{G}](\bar{x}\cdot \langle y \rangle) &= [\mathcal{G}](\bar{x}) \cdot \langle \mathsf{out}(\mathsf{gnext}(\mathsf{init}, \bar{x}), y) \rangle & & [\text{Equation (4)}] \\ &= \mathsf{F}(r\_1, \mu)(\bar{x}) \cdot \langle \mathsf{out}(r\_2(\pi(\bar{x})), y) \rangle & & [\text{I.H. and previous claim}] \\ &= \mathsf{F}(r\_1, \mu)(\bar{x}) \cdot \langle r\_1(r\_2(\pi(\bar{x})) \cdot y) \rangle & & [\text{def. of } \mathsf{out}] \\ &= \mathsf{F}(r\_1, \mu)(\bar{x}) \cdot \langle \mu(\pi(\bar{x}), y) \rangle & & [\text{def. of } \mu] \\ &= \mathsf{F}(r\_1, \mu)(\bar{x} \cdot \langle y \rangle) & & [\text{def. of } \mathsf{F}] \end{aligned}$$

for all x̄ ∈ A* and y ∈ A. We have thus established that Split(r) is correct: its denotation is equal to the intended semantics.

# **5 Combinators for Deterministic Dataflow**

We consider four dataflow combinators: (1) the *lifting* of pure morphisms to streaming computations, (2) *serial composition* for exposing pipeline parallelism, (3) *parallel composition* for exposing task-based parallelism, and (4) *feedback composition* for describing computations whose current output depends on previously produced output. The combinators are defined both for stream transductions (semantic objects) and for stream transducers (programs). Table 1 shows the definitions. The lifting of pure morphisms is implemented with a stateless transducer (i.e., the state space is a singleton set). Both parallel and serial composition are implemented using a product construction on transducers. In the case of parallel composition, each component computes independently. In the case of serial composition, the output of the first component is passed as input to the second component. In the case of feedback composition, the computation proceeds in well-defined rounds in order to prevent divergence.
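Under a toy representation of transducers, the lifting, parallel, and serial combinators can be sketched as follows. This is our own illustration, not the definitions of Table 1: in particular, the serial initial output is taken to be o_2 · out_2(init_2, o_1), and the output monoids are assumed to support Python's `+` as their product.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class T:
    """A transducer (St, init, o, next, out); field names are ours."""
    init: Any
    o: Any
    next: Callable
    out: Callable

def run(g, fragments):
    """Collect the denotation on a fragment sequence: <o> followed by eout."""
    s, outs = g.init, [g.o]
    for x in fragments:
        outs.append(g.out(s, x))
        s = g.next(s, x)
    return outs

def lift(h, unit_b):
    """Lift a monoid homomorphism h : A -> B to a stateless transducer."""
    return T(init=None, o=unit_b, next=lambda s, x: None, out=lambda s, x: h(x))

def par(g1, g2):
    """Parallel composition: the two components compute independently."""
    return T(init=(g1.init, g2.init), o=(g1.o, g2.o),
             next=lambda s, x: (g1.next(s[0], x[0]), g2.next(s[1], x[1])),
             out=lambda s, x: (g1.out(s[0], x[0]), g2.out(s[1], x[1])))

def serial(g1, g2):
    """Serial composition: g1's outputs (including its initial output o1)
    are fed as input to g2; '+' plays the role of the output product."""
    o = g2.o + g2.out(g2.init, g1.o)
    init = (g1.init, g2.next(g2.init, g1.o))
    return T(init=init, o=o,
             next=lambda s, x: (g1.next(s[0], x), g2.next(s[1], g1.out(s[0], x))),
             out=lambda s, x: g2.out(s[1], g1.out(s[0], x)))

# Two homomorphisms on strings: uppercase, and doubling each character.
upper = lift(str.upper, "")
double = lift(lambda b: "".join(c + c for c in b), "")
pipeline = serial(upper, double)
assert "".join(run(pipeline, ["ab", "c"])) == "AABBCC"  # = double(upper("abc"))
```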

We prove a precise correspondence between the semantics-level and program-level combinators for all cases: lifting (Proposition 27), parallel composition (Proposition 28), serial composition (Proposition 29), and feedback composition (Proposition 30). These are essentially *correctness properties* for the implementations of the combinators Lift, Par, Serial, and Loop. They establish that our typed framework is appropriate for the modular specification of complex streaming computations, as it can support composition constructs that are essential for parallelization and distribution.

**Proposition 27 (Lifting).** Let h : A → B be a monoid homomorphism. Then, Lift(h) is a coherent transducer and it implements the transduction lift(h).

**Proposition 28 (Parallel Composition).** Let A_1, A_2, B_1, B_2 be monoids, ⟨β_1, μ_1⟩ : STrans(A_1, B_1) and ⟨β_2, μ_2⟩ : STrans(A_2, B_2) be transductions, and G_1 : G(A_1, B_1) and G_2 : G(A_2, B_2) be transducers.

(1) Implementation: If G_1 implements ⟨β_1, μ_1⟩ and G_2 implements ⟨β_2, μ_2⟩, then Par(G_1, G_2) implements ⟨β_1, μ_1⟩ ∥ ⟨β_2, μ_2⟩.

(2) Coherence: If G_1 and G_2 are coherent, then so is Par(G_1, G_2).

*Proof.* Notice that Part (2) follows immediately from Part (1) and Theorem 23. Define f = ⟦Par(G_1, G_2)⟧ and ⟨β, μ⟩ = ⟨β_1, μ_1⟩ ∥ ⟨β_2, μ_2⟩. We will show that f(w̄) = F(β, μ)(w̄) for every w̄ ∈ (A_1 × A_2)*. Suppose that fst is the (elementwise) left projection function. We claim that fst(gnext(s, w̄)) = gnext_1(fst(s), fst(w̄)) and fst(eout(s, w̄)) = eout_1(fst(s), fst(w̄)) for all s ∈ St and w̄ ∈ (A_1 × A_2)*. Both claims are shown by induction on the length of w̄, and they give fst(f(w̄)) = ⟦G_1⟧(fst(w̄)). With similar arguments we can obtain that snd(f(w̄)) = ⟦G_2⟧(snd(w̄)) for every w̄ ∈ (A_1 × A_2)*. It can also be shown by induction that fst(F(β, μ)(w̄)) = F(β_1, μ_1)(fst(w̄)) and snd(F(β, μ)(w̄)) = F(β_2, μ_2)(snd(w̄)) for all w̄ ∈ (A_1 × A_2)*. In order to establish that f(w̄) = F(β, μ)(w̄), it suffices to show that fst(f(w̄)) = fst(F(β, μ)(w̄)) and snd(f(w̄)) = snd(F(β, μ)(w̄)). Given the claims shown previously, these equalities are equivalent to ⟦G_1⟧(fst(w̄)) = F(β_1, μ_1)(fst(w̄)) and ⟦G_2⟧(snd(w̄)) = F(β_2, μ_2)(snd(w̄)) respectively. These equalities follow from the assumptions that G_1 implements ⟨β_1, μ_1⟩ and G_2 implements ⟨β_2, μ_2⟩.

**Proposition 29 (Serial Composition).** Let A, B, C be monoids, ⟨β_1, μ_1⟩ : STrans(A, B) and ⟨β_2, μ_2⟩ : STrans(B, C) be transductions, and G_1 : G(A, B) and G_2 : G(B, C) be transducers.

(1) Implementation: If G_1 implements ⟨β_1, μ_1⟩ and G_2 implements ⟨β_2, μ_2⟩, then Serial(G_1, G_2) implements ⟨β_1, μ_1⟩ ≫ ⟨β_2, μ_2⟩.

(2) Coherence: If G_1 and G_2 are coherent, then so is Serial(G_1, G_2).


*Proof.* Part (2) follows easily from Part (1) and Theorem 23. In order to prove Part (1) we first have to establish a number of preliminary facts. We define the function M_2, which merges the first two elements of a sequence, as follows: M_2(ε) = ⟨1⟩, M_2(⟨x⟩) = ⟨x⟩ for x ∈ A, and M_2(⟨x, y⟩ · z̄) = ⟨xy⟩ · z̄ for x, y ∈ A and z̄ ∈ A*. We write G to denote G_1 ≫ G_2.

$$\mathsf{fst}(\mathsf{gnext}(s,\bar{x})) = \mathsf{gnext}\_1(\mathsf{fst}(s),\bar{x}) \text{ for all } s \in \mathsf{St} \text{ and } \bar{x} \in A^\* \tag{7}$$

$$\mathsf{snd}(\mathsf{gnext}(s,\bar{x})) = \mathsf{gnext}\_2(\mathsf{snd}(s), \mathsf{eout}\_1(\mathsf{fst}(s),\bar{x})) \text{ for all } s \in \mathsf{St} \text{ and } \bar{x} \in A^\* \tag{8}$$

$$\left[\mathcal{G}\right](\bar{x}) = \mathsf{M}\_2(\left[\mathcal{G}\_2\right](\left[\mathcal{G}\_1\right](\bar{x}))) \text{ for all } \bar{x} \in A^\* \tag{9}$$

$$\mathsf{F}(\beta,\mu)(\bar{x}) = \mathsf{M}\_2(\mathsf{F}(\beta\_2,\mu\_2)(\mathsf{F}(\beta\_1,\mu\_1)(\bar{x}))) \text{ for all } \bar{x} \in A^\* \tag{10}$$

where ⟨β, μ⟩ = ⟨β_1, μ_1⟩ ≫ ⟨β_2, μ_2⟩. All four claims above are proved by induction on the sequence x̄. Equations (7) and (8) are needed to prove Equation (9). Now, we establish that G implements ⟨β, μ⟩. Indeed, we have that

$$\begin{aligned} [\mathcal{G}](\bar{x}) &= \mathsf{M}\_2([\mathcal{G}\_2]([\mathcal{G}\_1](\bar{x}))) & \text{[Equation (9)]}\\ &= \mathsf{M}\_2([\mathcal{G}\_2](\mathsf{F}(\beta\_1, \mu\_1)(\bar{x}))) & [\mathcal{G}\_1 \text{ implements } \langle \beta\_1, \mu\_1 \rangle] \\ &= \mathsf{M}\_2(\mathsf{F}(\beta\_2, \mu\_2)(\mathsf{F}(\beta\_1, \mu\_1)(\bar{x}))) & [\mathcal{G}\_2 \text{ implements } \langle \beta\_2, \mu\_2 \rangle] \\ &= \mathsf{F}(\beta, \mu)(\bar{x}) & [\text{Equation (10)}] \end{aligned}$$

for every x̄ ∈ A*. So, we conclude that G implements ⟨β, μ⟩.

Let us give an example of how to construct complex computations from simpler ones using the dataflow combinators. Let A, B be sets and op : A → B be a function. We want to describe a streaming computation with two input channels, both of type FBag(A), and one output channel of type FBag(B). The computation transforms both input channels in the same way, namely by applying the function op to each element. This gives two output substreams, both of type FBag(B), that are merged into the output stream. The function op : A → B lifts to a monoid homomorphism op : FBag(A) → FBag(B), given by op(x) = {op(a) | a ∈ x} for every multiset x. The streaming computation described previously can be visualized using the dataflow graph shown below.

Each edge of the graph represents a communication channel along which a stream flows, and it is annotated with the type of the stream. The dataflow graph above represents the transducer G = Serial(Par(Lift(op), Lift(op)), Merge), where Merge : G(FBag(B) × FBag(B), FBag(B)) is the transducer of Example 16. From Propositions 27, 28 and 29 we obtain that G implements the transduction (lift(op) ∥ lift(op)) ≫ merge, where merge is described in Example 11.
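The semantics of this composite can be computed set-theoretically. In the sketch below (our own illustration), multisets are modelled as `collections.Counter`, FBag-union as Counter addition, and the composed transduction applies op elementwise to both channels and merges the results; the cumulative output is then invariant under re-fragmentation of the input history.

```python
from collections import Counter

def op_hat(bag, op):
    """Lift op : A -> B to the homomorphism FBag(A) -> FBag(B)."""
    out = Counter()
    for a, n in bag.items():
        out[op(a)] += n
    return out

def composed(fragments, op):
    """(lift(op) || lift(op)) >> merge, on a history of paired bag fragments."""
    total = Counter()
    for left, right in fragments:
        total += op_hat(left, op) + op_hat(right, op)  # map both, then merge
    return total

op = lambda a: a * a
# Two fragmentations of the same cumulative input ({1,2,3} left, {2} right).
history1 = [(Counter([1, 2]), Counter([2])), (Counter([3]), Counter())]
history2 = [(Counter([1, 2, 3]), Counter([2]))]
assert composed(history1, op) == composed(history2, op) == Counter({4: 2, 1: 1, 9: 1})
```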

We will now consider the feedback combinator, which introduces cycles in the dataflow graph. One consequence of cyclic graphs in the style of Kahn-MacQueen [60] is that divergence can be introduced, that is, a finite amount of input can cause an operator to enter an infinite loop. For example, consider the transducer Merge : G(FBag(A) × FBag(A), FBag(A)) of Example 16. The figure below visualizes the dataflow graph, where the output channel of Merge is connected to one of its input channels, thus forming a feedback loop.

Suppose that the singleton input {a} is fed to the input of the dataflow graph above, which corresponds to the first input channel of Merge. This will cause Merge to emit {a}, which is sent again to the second input channel of Merge. Intuitively, the computation then enters an infinite loop (divergence) of consuming and emitting {a}. This behavior is undesirable in systems that process data streams, because divergence can make the system unresponsive. For this reason, we consider here a form of feedback that eliminates this problem by ensuring that the computation of a feedback loop proceeds in a *sequence of rounds*. This avoids divergence, because the computation always makes progress by moving from one round to the next, as dictated by the input data. We describe this organization in rounds by requiring that the programmer specify a *splitter* (recall Example 18). The splitter decomposes the input stream into *batches*, and one round of computation for the feedback loop corresponds to consuming one batch of data, generating the corresponding output batch, and sending the output batch along the feedback loop to be available for the next round of processing. This form of feedback allows flexibility in specifying what constitutes a single *batch* (and thus a single *round*), and therefore generalizes the feedback combinator of Synchronous Languages such as Lustre [31].

**Proposition 30 (Feedback Composition).** Let A and B be monoids, ⟨β, μ⟩ : STrans(A, B) be a transduction, G : G(A, B) be a transducer, and r = (r_1, r_2) be a splitter for A (see Example 13).

(1) Implementation: If G implements ⟨β, μ⟩, then Loop(G, r) implements *loop*(β, μ, r).

(2) Coherence: If G is coherent, then so is Loop(G, r).

*Proof.* We leave to the reader the proofs that Split (Example 18) implements *split* and that Flatten (Example 17) implements *flatten*. Given Proposition 29, it suffices to show that G′ = LoopB(G) implements ⟨γ, ν⟩ = *loopB*(β, μ). Since G′ is of type G(FSeq(A), FSeq(B)), it suffices to define the transition and output functions on singleton sequences (as done in Table 1), because there is a unique way to extend them so that G′ is coherent. It remains to show that ⟦G′⟧(x̄) = F(γ, ν)(x̄) for every x̄ ∈ FSeq(A)*. The base case is easy, and for the step case it suffices to show that out′(gnext′(init′, x̄), y) = ν(π(x̄), y) for every x̄ ∈ FSeq(A)* and y ∈ FSeq(A). As we discussed before, gnext′ and out′ can be viewed as being defined on elements of A rather than sequences over FSeq(A), so we can equivalently prove that out′(gnext′(init′, ⟨a_1, ..., a_n⟩), a_{n+1}) = ν(⟨a_1, ..., a_n⟩, a_{n+1}) with each a_i an element of A. Given that G implements ⟨β, μ⟩, the key observation to finish the proof is gnext′(init′, ⟨a_1, ..., a_n⟩) = ⟨gnext(init, ⟨(a_1, b_0), ..., (a_n, b_{n-1})⟩), b_n⟩, where γ(⟨a_1, ..., a_n⟩) = ⟨b_0, b_1, ..., b_n⟩.

**Example 31.** For an example of using the feedback combinator, consider the transduction ⟨β, μ⟩ which adds two input streams of numbers pointwise. That is, β : FSeq(N) × FSeq(N) → FSeq(N) is defined by β(x_1 x_2 ... x_m, y_1 y_2 ... y_n) = 0 (x_1 + y_1)(x_2 + y_2) ... (x_k + y_k) where k = min(m, n). Additionally, consider the trivial splitter r = (r_1, r_2) for sequences where each batch is a singleton: r_1(x_1 ... x_n) = ⟨x_1, ..., x_n⟩ and r_2(x_1 ... x_n) = ε. We use this splitter to enforce that each batch is a single element and that each round of the computation involves consuming one element. Finally, the transduction *loop*(β, μ, r) = ⟨γ, ν⟩ describes the *running sum*, that is, γ(x_1 ... x_n) = 0 x_1 (x_1 + x_2) ... (x_1 + ⋯ + x_n).
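The round-based feedback behaviour of this example can be simulated directly: with singleton batches, each round consumes one input element and adds to it the output of the previous round, which arrives along the feedback loop. A minimal sketch (the function name is ours):

```python
def running_sum(xs):
    """gamma(x1 ... xn) = 0, x1, x1+x2, ..., x1+...+xn: each round adds the
    current input element to the output of the previous round (the feedback)."""
    outputs = [0]                        # round 0: the initial output primes the loop
    for x in xs:
        outputs.append(outputs[-1] + x)  # this round's output, fed back next round
    return outputs

assert running_sum([1, 2, 3]) == [0, 1, 3, 6]
```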

The dataflow combinators of this section could form the basis of query language design. The StreamQRE language [10,84] and related formalisms [9,11,12, 14] are based on a set of combinators for efficiently processing linearly-ordered streams (e.g., time series [3, 4]). Extending a language like StreamQRE to the typed setting of stream transductions is an interesting research direction.

# **6 Algebraic Reasoning for Optimizing Transformations**

Our typed denotational framework can be used to validate optimizing transformations using algebraic reasoning. This amounts to establishing that the original transducer is equivalent to the optimized one. A fundamental approach for showing equivalence of composite transducers is to establish algebraic laws between basic building blocks, and then use algebraic rewriting.

As a concrete example, consider the per-key streaming aggregation of Example 10, which is described by the transduction reduce(K, op) : STrans(FBag(K × V), FMap(K, V)), where K is the set of keys, V is the set of values, and op : V × V → V is an associative and commutative aggregation operation. Let h : K → {1, ..., n} be a hash function for the keys, and define K_i^h = h^{-1}(i) = {k ∈ K | h(k) = i} for every i. Consider two variants of the merging operation of Example 11: (1) kmerge(h) merges n input streams of types FBag(K_1^h × V), ..., FBag(K_n^h × V) respectively into an output stream of type FBag(K × V), and (2) mmerge(h) merges n input streams of types FMap(K_1^h, V), ..., FMap(K_n^h, V) respectively into an output stream of type FMap(K, V). We also consider the transduction ksplit(h), which partitions an input stream of type FBag(K × V) into n output substreams of types FBag(K_1^h × V), ..., FBag(K_n^h × V) respectively. Using elementary set-theoretic arguments, the following equalities can be established: ksplit(h) ≫ kmerge(h) = *id* and

$$\mathsf{kmerge}(h) \gg \mathsf{rd}(K, \mathsf{op}) = (\mathsf{rd}(K\_1^h, \mathsf{op}) \parallel \dots \parallel \mathsf{rd}(K\_n^h, \mathsf{op})) \gg \mathsf{mmerge}(h),$$

where rd abbreviates reduce. Next, we consider the corresponding transducers KSplit(h), KMerge(h), Id, Reduce(K, op) (abbreviated Rd), and MMerge(h), and establish that they implement the respective transductions. This can be shown by induction, as in Example 25 and Example 26. Using these facts and the propositions of Sect. 5, the equalities between transductions shown earlier give the following equations (equivalences) between transducers: $\mathsf{KSplit}(h) \gg \mathsf{KMerge}(h) \equiv \mathsf{Id}$ and

$$\mathsf{KMerge}(h) \gg \mathsf{Rd}(K, \mathsf{op}) \equiv (\mathsf{Rd}(K_1^h, \mathsf{op}) \parallel \dots \parallel \mathsf{Rd}(K_n^h, \mathsf{op})) \gg \mathsf{MMerge}(h).$$

Using these equations, we can establish the following optimizing transformation for *data parallelization*, which is useful when processing high-rate data streams.

$$\begin{split} \mathsf{Reduce}(K,\mathsf{op}) &\equiv \mathsf{Id} \gg \mathsf{Reduce}(K,\mathsf{op}) \\ &\equiv \mathsf{KSplit}(h) \gg \mathsf{KMerge}(h) \gg \mathsf{Reduce}(K,\mathsf{op}) \\ &\equiv \mathsf{KSplit}(h) \gg (\mathsf{Rd}(K_1^h,\mathsf{op}) \parallel \dots \parallel \mathsf{Rd}(K_n^h,\mathsf{op})) \gg \mathsf{MMerge}(h). \end{split}$$
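The transduction-level equality behind this transformation can be sanity-checked on concrete data. The following Python sketch is illustrative only: the helper names `reduce_by_key`, `ksplit`, and `mmerge` are hypothetical stand-ins for the paper's reduce, ksplit(h), and mmerge(h), modeling bags as lists of pairs and maps as dictionaries.

```python
import operator

def reduce_by_key(pairs, op):
    """Per-key aggregation: a bag FBag(K x V) to a map FMap(K, V)."""
    out = {}
    for k, v in pairs:
        out[k] = op(out[k], v) if k in out else v
    return out

def ksplit(pairs, h, n):
    """Partition a bag of (key, value) pairs into n sub-bags by key hash."""
    parts = [[] for _ in range(n)]
    for k, v in pairs:
        parts[h(k)].append((k, v))
    return parts

def mmerge(maps):
    """Merge maps over disjoint key sets into a single map."""
    out = {}
    for m in maps:
        out.update(m)
    return out

n = 3
h = lambda k: (ord(k) - ord("a")) % n  # any function K -> {0, ..., n-1} works
data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Reduce = KSplit >> (Rd || ... || Rd) >> MMerge, checked pointwise:
direct = reduce_by_key(data, operator.add)
parallel = mmerge([reduce_by_key(part, operator.add)
                   for part in ksplit(data, h, n)])
assert direct == parallel == {"a": 4, "b": 7, "c": 4}
```

The equality holds because op (here, addition) is associative and commutative, which is exactly the invariant the multiset stream type records.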

The above equation illustrates our proposed style of reasoning for establishing the soundness of optimizing streaming transformations: (1) prove equalities between transductions using elementary set-theoretic arguments, (2) prove that the transducers (programs) implement the transductions (denotations) using induction, (3) translate the equalities between transductions into equivalences between transducers using the results of Sect. 5, and finally (4) use algebraic reasoning to establish more complex equivalences.

The example of this section is simple but illustrates two key points: (1) our data types for streams (monoids) capture important invariants about the streams that enable transformations, and (2) useful program transformations can be established with denotational arguments that require an appropriate notion of transduction. This approach opens up the possibility of formally verifying the wealth of optimizing transformations that are used in stream processing systems. The papers [54, 101] describe several of them, but use informal arguments that rely on the operational intuition about streaming computations. Our approach here, on the other hand, relies on rigorous denotational arguments.

The equational axiomatizations of arrows [56] and traced monoidal categories [58] are relevant to our setting, but would require adaptation. An interesting question is whether a *complete axiomatization* can be provided for the basic dataflow combinators of Sect. 5, similarly to how Kleene Algebra (KA) [62, 63] and its extensions [49,64,79,83] (as well as other program logics [65,66,78,80–82]) capture properties of imperative programs at the propositional level. We also leave for future work the development of the coalgebraic approach [96–98] for reasoning about the equivalence of stream transducers. We have already defined a notion of bisimulation in Sect. 4, which could give an alternative approach for proving equivalence using coinduction on the transducers.

# **7 Related Work**

Sect. 1 contains several pointers to related literature for stream processing. In this section, we will focus on prior work that specifically addresses aspects of formal semantics for streaming computation.

The seminal work of Gilles Kahn [59] is exemplary in its rigorous treatment of denotational semantics for a language of *deterministic dataflow* graphs of independent processes, which access their input channels using blocking read statements and the output channels using nonblocking write statements. The language Lustre [31] is a synchronous restriction of Kahn's model, which introduces the semantic idea of a clock for specifying the rate of a stream. Other notable synchronous formalisms are the languages Signal [21, 72] and Esterel [22, 28], and the synchronous dataflow graphs of [73] and [24]. These formalisms are all deterministic, in the sense that the output is determined purely by the input data. Nondeterminism creates unavoidable semantic complications [30].

The CQL language [16] is a streaming extension of a *relational* database language with additional constructs for time-based windowing. The denotational semantics of CQL [17] can be reconstructed and greatly simplified within our framework using the notion of stream described in Example 7 (finite time-varying multisets). There are several works that deal with the semantics of specific language constructs (e.g., windows), notions of time, punctuations and disordered streams, but do not give a mathematical description of the overall streaming computation [5, 7, 25, 44, 67, 75, 76, 109].

The literature on Functional Reactive Programming (*FRP*) [34, 46, 47, 55, 68, 69, 93, 103] is closely related to the deterministic dataflow formalisms mentioned earlier. The main abstractions in FRP are signals and event sequences, which are linearly ordered data. Processing unordered data (e.g., multisets and maps) and extracting data parallelism (e.g., the per-key aggregation of Sect. 6) require a data model that goes beyond linear orders. In particular, the axioms of *arrows* [56] (often used in FRP) cannot prove the soundness of the optimizing transformation of Sect. 6, which requires reasoning about multisets.

The idea of using *types* to classify streams has been recently explored in [85] (see also [13]), but only for a restricted class of types that correspond to partial orders. No general abstract model of computation is presented in [85], and many of the examples in this paper cannot be adequately accommodated.

The mathematical framework of *coalgebras* [97] has been used to describe streams [98]. One advantage of this approach is that proofs of equivalence can be given using the proof principle of coinduction [96], which in many cases offers a useful alternative to proofs by induction. This line of work mostly focuses on infinite sequences of elements, whereas here we focus on the transformation of streams of data that can be of various different forms (not just sequences).

The idea to model the input/output of automata using monoids has appeared in the *algebraic theory* of automata and transducers. Monoids (non-free, e.g., $A^* \times B^*$) have been used to generalize automata from recognizers of languages to recognizers of relations [45], which are sometimes called *rational transducers* [100]. Our focus here is on (deterministic) functions, as models that recognize relations can give rise to the Brock-Ackerman anomaly [30]. The automata models (with inputs from a free monoid $A^*$) most closely related to our stream transducers are deterministic: Mealy machines [87], Moore machines [90], sequential transducers [48, 95], and sub-sequential transducers [102]. The concept of coherence that we introduce here (Definition 20) does not arise in these models, because they do not operate on input batches. An algebraic generalization of a deterministic acceptor is provided by a *right monoid action* $\delta : \mathrm{St} \times A \to \mathrm{St}$ (see page 231 of [100]), which satisfies the following properties for all $s \in \mathrm{St}$ and $x, y \in A$: (1) $\delta(s, 1) = s$, and (2) $\delta(\delta(s, x), y) = \delta(s, xy)$. These properties look similar to (N1) and (N2) of Definition 20. They are, however, too restrictive for our stream transducers, as they would falsify Theorem 23.
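The two action laws can be checked concretely for an action extended letter-by-letter from a transition table. This Python sketch is illustrative (the transition table and state names are invented); it models the free monoid $A^*$ as strings, with the empty string as the unit:

```python
# Right monoid action delta : St x A* -> St, extended from a one-letter
# transition table. Checks (1) delta(s, 1) = s (unit law) and
# (2) delta(delta(s, x), y) = delta(s, xy) (compatibility with concatenation).

step = {("even", "a"): "odd", ("odd", "a"): "even",
        ("even", "b"): "even", ("odd", "b"): "odd"}  # tracks parity of 'a's

def delta(s, w):
    for ch in w:
        s = step[(s, ch)]
    return s

states = ["even", "odd"]
words = ["", "a", "b", "ab", "ba", "aab"]
assert all(delta(s, "") == s for s in states)
assert all(delta(delta(s, x), y) == delta(s, x + y)
           for s in states for x in words for y in words)
```

Any action built by folding a one-letter table satisfies both laws by construction; the point in the text is that coherent transducers over non-free monoids need not.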

# **8 Conclusion**

We have presented a typed semantic framework for stream processing, based on the idea of abstracting data streams as elements of algebraic structures called *monoids*. Data streams are thus classified using monoids as *types*. Stream transformations are modeled as monotone functions, which are organized by input/output type. We have adapted the classical model of string transducers to our setting, and we have developed a general theory of streaming computation with a formal denotational semantics. The entire technical development in this paper is constructive, and therefore lends itself well to formalization in a proof assistant such as Coq [23,35,106]. Our framework can be used for the formalization of streaming models, and the validation of subtle optimizations of streaming programs (e.g., Sect. 6), such as the ones described in [54, 101]. We have restricted our attention in this paper to *deterministic* streaming computation, in the sense that the behaviors that we model have predictable and reproducible results. Nondeterminism causes fundamental semantic difficulties [30], and it is undesirable in applications where repeatability is important.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Connecting Higher-Order Separation Logic to a First-Order Outside World**

William Mansky<sup>1</sup>, Wolf Honoré<sup>2</sup>, and Andrew W. Appel<sup>3</sup>

<sup>1</sup> University of Illinois at Chicago, Chicago, IL, USA <sup>2</sup> Yale University, New Haven, CT, USA <sup>3</sup> Princeton University, Princeton, NJ, USA

**Abstract.** Separation logic is a useful tool for proving the correctness of programs that manipulate memory, especially when the model of memory includes higher-order state: Step-indexing, predicates in the heap, and higher-order ghost state have been used to reason about function pointers, data structure invariants, and complex concurrency patterns. On the other hand, the behavior of system features (e.g., operating systems) and the external world (e.g., communication between components) is usually specified using first-order formalisms. In principle, the soundness theorem of a separation logic is its interface with first-order theorems, but the soundness theorem may implicitly make assumptions about how other components are specified, limiting its use. In this paper, we show how to extend the higher-order separation logic of the Verified Software Toolchain to interface with a first-order verified operating system, in this case CertiKOS, that mediates its interaction with the outside world. The resulting system allows us to prove the correctness of C programs in separation logic based on the semantics of system calls implemented in CertiKOS. It also demonstrates that the combination of interaction trees + CompCert memories serves well as a *lingua franca* to interface and compose two quite different styles of program verification.

**Keywords:** formal verification · verifying communication · modular verification · interaction trees · VST · CertiKOS

# **1 Introduction**

Separation logic allows us to verify programs by stating pre- and postconditions that describe the memory usage of a program. Modern variants include reasoning principles for shared-memory concurrency, invariants of locks and shared data structures, function pointers, rely-guarantee-style reasoning, and various other interesting features of programming languages. To support these features, the "memory" that is the subject of their assertions is not just a map from addresses to values, but something more complex: it may contain "predicates in the heap" to allow reasoning about invariants attached to dynamically allocated objects such as semaphores, it may be step-indexed to allow higher-order assertions, and it may contain various forms of ghost state describing resources that exist only for the purposes of verification. The soundness proof of the logic then relates these decorated heaps to the simple address-map view of memory used in the semantics of the target language.

This works well as long as every piece of the system is verified with respect to decorated heaps, but what if we have multiple verification tools, some of which provide correctness results in terms of undecorated memory (or, still worse, memory with a different set of decorations)? To take advantage of the correctness theorem of a function verified with one of these tools, we will need to translate our decorated memory into an undecorated one, demonstrate that it meets the function's undecorated precondition, and then take the memory output by the function and use it to reconstruct a decorated memory. In this paper, we demonstrate a technique to do exactly that, allowing higher-order separation logics (in this instance, the Verified Software Toolchain) to take advantage of correctness proofs generated by other tools (in this case, the CertiKOS verified operating system). This allows us to remove the separation-logic-level specifications of system calls from our trusted computing base, instead relying on the operating system's proofs of its own calls. In particular, we are interested in functions that do more than just manipulate memory (which is separation logic's specialty)—they communicate with the outside world, which may not know anything about program memory or higher-order state.

```
int main(void) {
  unsigned int n, d; char c;
  n=0;
  c=getchar();
  while (n<1000) {
    d = ((unsigned)c)-(unsigned)'0';
    if (d>=10) break;
    n+=d;
    print_int(n);
    putchar('\n');
    c=getchar();
  }
  return 0;
}
```
Fig. 1: A simple communicating program

Consider the program in Figure 1. It repeatedly reads a digit from the console, adds it to the sum of the digits seen so far, and prints the current sum to the console. Although this is a very simple program, it is not a natural fit for separation-logic-based verification tools, which model the behavior of C programs in terms of computation and memory rather than I/O. Several approaches have been suggested for reasoning about I/O in separation logic, for instance by Penninckx et al. [18] and Koh et al. [13]. Using the latter approach, we might specify the behavior of getchar with the Hoare triple {ITree(r ← read;; k r)} x = getchar() {ITree(k x)}, relating the function call to an external read event: the program before the call to getchar must have permission to perform a sequence of operations beginning with a read, and after the call it has permission to perform the remaining operations (with values that may depend upon the received value). By adding these specifications as axioms to VST's separation logic, we can use standard separation logic techniques to prove the correctness of programs such as the one above. But when we compile and run this C program, putchar and getchar are not axiomatized functions; they are system calls provided by the operating system, which may have an effect on kernel memory, user memory, and of course the console itself. If we prove a specification of this C program using the separation logic rules for putchar and getchar, what does that tell us about the behavior of the program when it runs? For programs without external calls, we can answer this question with the soundness proof of the logic. To extend this soundness proof to programs with external calls, we must relate the pre- and postconditions of the external calls to both the semantics of C and their implementations in the operating system.

In this paper, we describe a modular approach to proving soundness of a verification system for communicating programs, including the following elements:


The result is the first soundness proof of a separation logic that can be extended with first-order specifications of system calls. All proofs are formalized in the Coq proof assistant.

To understand the scope of our results, it is important to clarify exactly how much of CertiKOS we have brought into our proofs of correctness for C programs, and how much of a gap remains. The semantics on which we prove the soundness of our separation logic is the standard CompCert semantics of C, extended with the specifications of system calls provided by CertiKOS. Our model does not include the process by which CertiKOS switches from user mode to kernel mode when executing a system call, but rather assumes that CertiKOS implements this process so that the user cannot distinguish it from a normal function call. To prove this assertion rather than assuming it, we would need to transfer our soundness proof to the whole-system assembly-language semantics used by CertiKOS, and interface with not just CertiKOS's system call specifications but also its top-level correctness theorem. We discuss this last gap further in Section 7, but in summary, we prove that our client-side programs and OS-side system calls are correct, while assuming that CertiKOS correctly implements its transition between user mode and kernel mode.

The rest of the paper proceeds as follows. In Section 2, we describe generic ghost state in separation logic. In Section 3, we show how to encode the state of the outside world as ghost state that can only be changed through calls to external functions, allowing us to describe external communication in separation logic specifications. In Section 4, we use this approach to specify console I/O operations, and demonstrate the verification of a simple communicating program. In Sections 5 and 6, we describe the process of verifying the implementation of an external call, by first connecting its VST specification to a first-order specification on memory and then relating that "dry" specification to the functional specification of the same call in CertiKOS. This allows us to state our central theorem, which guarantees that programs verified in VST run correctly given the CertiKOS system call specifications. In Section 7, we address the relationship between user-level events and the actual communication performed by the OS. In Sections 8 and 9, we review related work and summarize our results.

# **2 Background: Ghost State in Separation Logic**

#### **2.1 Ghost Algebras**

The fundamental insight behind ghost state is that if a mathematical object has the same basic properties as a separation logic heap, it can be injected into separation logic as a resource, even if it is not actually present in program memory. This insight was discovered independently by many people [4,3,19], and the "basic properties" required have been characterized in many ways: partial commutative monoids (PCMs), resource algebras, separation algebras, etc. They all include the idea that the ghost state must support an operator, often written as ·, for combining it in the same way heaps are combined by disjoint union, and they require that operator to have some of the properties of heap union (associativity, commutativity) but not all (for instance, it may be possible to combine two identical pieces of ghost state). Crucially, the operator · may be partial, so that the very existence of one piece of state means that another piece cannot possibly exist in the same program (just as ownership of one piece of the heap means that no other thread can hold the same piece). We follow Iris [11] in also including a validity predicate valid that marks out the elements of an algebra that represent well-formed ghost state.

Ghost state appears in the logic in a new kind of assertion, which we write as own, asserting that the current thread owns a certain ghost resource. In the assertion own g a pp, g is an identifier (analogous to a location in the heap), a is an element of the underlying algebra, and pp is a predicate, allowing for a limited form of higher-order ghost state—for instance, we can store separation logic assertions in ghost state to implement global invariants. The key property of the own assertion is that separating conjunction on it corresponds to the · operator of the underlying algebra (see rule own op in Figure 2). By defining different algebras with different operators, we can define different sharing protocols for the ghost state. For instance, if we only want to count the number of times some shared resource is used, the state may be a number and the operator

$$\mathsf{own\_op}\ \frac{a_1 \cdot a_2 = a_3}{\mathsf{own}\ g\ a_3\ pp \Leftrightarrow \mathsf{own}\ g\ a_1\ pp * \mathsf{own}\ g\ a_2\ pp} \qquad \mathsf{own\_update}\ \frac{\mathsf{fp\_update}\ a\ b}{\mathsf{own}\ g\ a\ pp \Rrightarrow \mathsf{own}\ g\ b\ pp}$$

$$\frac{P \Rightarrow P' \qquad \{P'\}\ C\ \{Q'\} \qquad Q' \Rightarrow Q}{\{P\}\ C\ \{Q\}}$$

Fig. 2: Key separation logic rules for ghost state

may be addition; if we want to describe the pattern of sharing more precisely, as with ghost variables, the state may be a pair of the variable's value and a fraction of ownership, with a guarantee that two fractions are only compatible if they agree on the value. More complex sharing patterns correspond to more complicated join operations; for instance, Jung et al. [11] showed that any acyclic state machine can be encoded as ghost state, with the join operation computing the closest common successor of two states. The ghost state is not explicitly referenced by program instructions, but it can be modified at any time via a frame-preserving update: ghost state a can be replaced with b as long as any third party's ghost state c that is consistent with a is also consistent with b. Formally, fp update a b ≜ ∀c. (a compatible with c) ⇒ (b compatible with c), where a is compatible with b iff ∃d. a · b = d, i.e., a and b can be joined. This frame-preserving update is embedded into the logic using a view-shift operator, as shown in rule own update of Figure 2.
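The quantification over frames in fp update can be made concrete with a small model. This Python sketch is illustrative only (it is not VST's ghost-state machinery): elements are (fraction, value) pairs whose join is partial, and the frame-preserving-update check is run over a finite set of candidate frames.

```python
# Fractional-ownership ghost algebra: join adds fractions (capped at a full
# share of 1) and requires the values to agree. None models "join undefined".

def join(a, b):
    (p1, v1), (p2, v2) = a, b
    if v1 != v2 or p1 + p2 > 1:
        return None  # incompatible: the operator is partial
    return (p1 + p2, v1)

def compatible(a, b):
    return join(a, b) is not None

def fp_update(a, b, frames):
    """a can be replaced by b iff every frame compatible with a
    is also compatible with b (checked over a finite frame set)."""
    return all(compatible(b, c) for c in frames if compatible(a, c))

frames = [(f, v) for f in (0.25, 0.5, 0.75) for v in (0, 1, 2)]
# With the full share, no frame is compatible, so any update is allowed:
assert fp_update((1, 0), (1, 7), frames)
# With only half, a frame (0.5, 0) may exist, so changing the value is not:
assert not fp_update((0.5, 0), (0.5, 7), frames)
```

The asymmetry in the two assertions is the essence of ghost-variable reasoning: full ownership licenses arbitrary updates, partial ownership licenses none that change the value.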

```
        x = 0;
acquire(l);   ||   acquire(l);
x++;          ||   x++;
release(l);   ||   release(l);
```

Fig. 3: The increment example

Figure 3 shows the canonical example of a program where ghost state increases the verification power of separation logic. Using concurrent separation logic as originally presented by O'Hearn [17], we can prove that the value of x at the end of the program is at least 0, but we cannot prove that it is exactly 2. This limitation comes from the fact that we can associate an invariant with the lock l, but that invariant cannot express progress properties such as a change in the value of x. We can get around this limitation by adding ghost state that captures the contribution of each thread to x, and then use the invariant to ensure that the value of x is the sum of all contributions. (This approach is due to Ley-Wild and Nanevski [16].) We begin with ghost state that models the central operation of the program:

**Definition 1.** The sum ghost algebra is the algebra (N, +, λn.True) of natural numbers with addition, in which every number is a valid element.

Intuitively, the lock invariant should remember every addition to x, while each individual thread only knows its own contribution. This is actually an instance of a very general pattern: the reference pattern, in which one party holds a complete and correct "reference" copy of some ghost state, and one or more other parties hold possibly incomplete "partial" copies. Because the reference copy must always be completely up to date, the partial copies cannot be modified without access to the reference copy. When all the partial copies are gathered together, they are guaranteed to accurately represent the state of the data structure. The reference ghost algebra is built as follows:

**Definition 2.** Given a ghost algebra G, we define the positive ghost algebra on G, written pos(G), as an algebra whose carrier set is (Π × G) ∪ {⊥}, where Π is a set of shares.<sup>4</sup> An element of pos(G) is valid if it has a nonempty share, and the operator · is defined such that $(\pi_1, a_1) \cdot (\pi_2, a_2) = (\pi_1 + \pi_2, a_1 \cdot a_2)$ and $x \cdot \bot = x$ for all $x$.

The positive ghost algebra contains pairs of a nonempty share and an element of G, with join defined pointwise, representing partial ownership of an element of G. Total ownership of the element can be recovered by combining all of the pieces, obtaining a full share, and combining all of the G elements accordingly.

**Definition 3.** Given a ghost algebra G, let the reference ghost algebra on G, written ref(G), be the algebra $(\mathrm{pos}(G) \times (G \cup \{\bot\}),\ \cdot,\ \{(p, r) \mid r = \bot \lor p \preccurlyeq r\})$, where $(p_1, r) \cdot (p_2, \bot) = (p_1 \cdot p_2, r)$, and $p \preccurlyeq r \triangleq \exists q.\ p \cdot q = (1, r)$.

An element of the reference ghost algebra is a pair of a positive share of G (partial element) and an optional reference element of G, where the reference element is unique and indivisible, and the partial element must be completable to the reference element if one exists. This ensures that when all the shares are gathered, i.e., when the partial element is (1, a), then it exactly matches the reference element, but no changes can be made to the partial element without the reference element present. To more clearly relate elements of this algebra to their intended meanings, we write ref r for the reference element (⊥, r) and part s v for the partial element ((s, v), ⊥).

Now we can formalize our intuition about what each party knows about the sum. We let the lock invariant for l be ∃v. x ↦ v ∗ own g (ref v), and start each thread with a partial element part ½ 0. When each thread acquires its lock and increments x, it also uses the own update rule to increment its partial ghost state. At the end of the program, we can combine the two partial elements to obtain part 1 2, which in combination with the lock invariant is sufficient to guarantee that the value of x is 2. This pattern can be used for a wide range of applications

<sup>4</sup> We use tree shares [1, Chapter 41] in the Coq proofs, but for simplicity of presentation in this paper we will use fractional shares: ⊥ is the empty share, ½ is a half share, and 1 is the full share.

by replacing the sum algebra with one appropriate to the application or data structure in question. We will also make use of it later to model the state of the external world as a separation logic resource.
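The reference pattern over the sum algebra can be replayed concretely. The following Python sketch is a simplified illustration (fractional shares, sequentialized threads; none of these names come from the Coq development): it checks that the reference copy tracks x and that gathering the partial copies yields the full share with the expected sum.

```python
from fractions import Fraction

HALF, FULL = Fraction(1, 2), Fraction(1, 1)

def join_part(a, b):
    """Join two partial elements of the reference pattern over the sum
    algebra: shares add (never exceeding a full share) and sums add."""
    (s1, n1), (s2, n2) = a, b
    assert s1 + s2 <= FULL, "shares must not exceed a full share"
    return (s1 + s2, n1 + n2)

# Simulate the two threads of Figure 3: x starts at 0, and each thread,
# under the lock, increments x, the reference copy, and its partial copy.
x, ref = 0, 0
thread1, thread2 = (HALF, 0), (HALF, 0)

x, ref, thread1 = x + 1, ref + 1, (HALF, thread1[1] + 1)
x, ref, thread2 = x + 1, ref + 1, (HALF, thread2[1] + 1)

# Lock invariant: x always equals the reference copy.
assert x == ref == 2
# Gathering both partial copies yields the full share, matching ref:
assert join_part(thread1, thread2) == (FULL, 2)
```

The final assertion is the sequential shadow of the separation-logic argument: only when both `part ½` pieces are combined into a full share is the total contribution known to be 2.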

#### **2.2 Semantics of Ghost State**

To support the use of ghost state in a separation logic, we need to make two main changes in the construction of the logic. First, we need to extend the underlying model of the logic with ghost state: rather than being predicates on the heap, our assertions are now predicates on the combination of heap and ghost state. Once ghost state exists in the model, we can give semantics to the own assertion.

Second, we need to change our definition of Hoare triples to allow for the possibility of frame-preserving updates to ghost state at any point in a program's execution. In a ghost-free separation logic, we might define Hoare triples with respect to an operational semantics for the language as follows:

$$\{P\}\ c\ \{Q\} \triangleq \forall h, h'.\ P(h) \Rightarrow (c, h) \rightarrow^* (\mathsf{done}, h') \Rightarrow Q(h')$$

where (c, h) → (c′, h′) means that the program c executed with starting heap h may take a step to a new program c′ with heap h′. For a step-indexed logic, it is more convenient to write this definition inductively:

**Definition 4 (Safety).** A configuration (c, h) is safe for n steps with postcondition Q if:

- n = 0; or
- c has terminated and Q(h) holds; or
- there is a step (c, h) → (c′, h′), and every configuration (c′, h′) reachable in one step is safe for n − 1 steps with Q.


We can then define {P} c {Q} (at step-index n) to mean that ∀h. P(h) ⇒ (c, h) is safe for n steps with Q.

Once we have added ghost state, our heap h is now a pair (h, g) of physical and ghost state, and between any two steps the ghost state may change. This leads us to a ghost-augmented version of safety.

**Definition 5 (Safety with Ghost State).** A configuration (c, h, g) is safe for n steps with postcondition Q if:

- n = 0; or
- c has terminated and Q(h, g) holds; or
- there is a step (c, h) → (c′, h′), and for every gframe compatible with g there exists g′ compatible with gframe such that (c′, h′, g′) is safe for n − 1 steps with Q.


The program must be able to continue executing under any gframe consistent with its current ghost state, but its choice of new ghost state g′ may depend on the frame. This quantifier alternation captures the essence of ghost state: the ghost state held by the program constrains any other ghost state held by the notional "rest of the system", and may be changed arbitrarily in any way that does not invalidate that other ghost state.

# **3 External State as Ghost State**

An I/O-performing program modifies the state of the outside world. We would like to treat this external state as a kind of ghost state, since it is not in the program's memory and yet can be described by separation logic assertions. At the same time, we would emphatically not like to allow users to make arbitrary frame-preserving updates to external state: the external environment should have complete control of the external state, and the program should never be able to change it except by calling external functions. Furthermore, VST's semantic model (used to prove soundness) already includes an external state element<sup>5</sup>, a black box of arbitrary type that is carried around by the program and passed to the environment at each external call, allowing the effects of external calls to be stateful without explicitly representing their state in program memory. While this external state is present in the operational semantics of VST, prior to the changes we describe it could not be referred to by separation logic assertions and was never instantiated with anything other than the singleton type unit. In this section, we describe how we combine ghost state with the built-in external state to make the external state visible in the separation logic.

Intuitively, external state is just another kind of shared resource, and we should be able to model it with a form of ghost state. However, one of the key features of ghost state is that programs can make arbitrary frame-preserving updates to it, while programs should never be able to modify external state. We can accomplish this using the reference ghost algebra of Section 2: the reference element ref a will be held by the external environment, while the program holds a partial element part a. This ensures that the program cannot make any frame-preserving updates without the reference element, which is only available when the program passes control to the external environment via an external call. It then remains to choose the underlying algebra G of the external state. Different applications may call for external state with different carrier sets and operations, but in the simplest case, the VST user will not want to split or combine the local copy of the external state<sup>6</sup>. In this case, they can pick a type Z and make G the exclusive ghost algebra for Z, which holds only an empty unit element and an indivisible ownership element, preventing the local copy from being divided. Then the user program holds an element part a that cannot be divided or modified, but only passed to the external environment, where a : Z is the current value of the external state. We encapsulate the ghost state construction in an assertion has ext a ≜ own 0 (part a), where 0 is the identifier reserved for the external ghost state. Now, when verifying a program with external state, the user simply provides the starting state a, and receives in the precondition of the main function the assertion has ext a, with no need to use or understand the ghost state mechanism.

<sup>5</sup> Appel et al. [1] call this the *external oracle*, but we refer to it as simply "external state" to avoid confusion with the environment oracles of CertiKOS.

<sup>6</sup> One example of a use case that benefits from nontrivial external state structure is a multithreaded web server in which different threads serve different clients simultaneously; in this case, each thread might have its own piece of the external state.

On the back end, we must still modify VST's semantics to connect the ghost state a to the actual external state, and to prevent the "ghost steps" of the semantics from changing the external state. Recall from Section 2 that in order for a non-terminated configuration (c, h, g) to be safe for a nonzero number of steps, it must be the case that (c, h) → (c′, h′) and ∀gframe. g · gframe is defined ⇒ ∃g′. g′ · gframe is defined ∧ (c′, h′, g′) is safe. To connect the external ghost state to a real external state z, we simply extend this definition to require that gframe include an element (⊥, z) at identifier 0. This enforces the requirement that the value of the external ghost state always be the same as the value of the external state, and ensures that frame-preserving updates cannot change the value of the external state. Re-proving the separation logic rules of Verifiable C with this new definition of Hoare triple required only minor changes, since internal program steps never change the external ghost state.

When the semantics reaches an external call, the call is allowed to make arbitrary changes to the state consistent with its pre- and postcondition, including changing the value of the external ghost state (as well as the actual external state). We can use has_ext assertions in the pre- and postcondition of an external function to describe how that function affects the external state. For instance, we might give a console write function the "consuming-style" specification {has_ext(write(v) ;; k)} write(v) {has_ext(k)}, stating that if before calling write(v) the program has permission to write the value v and then do the operations in k, then after the call it is left with permission to do k. (We could reverse the pre- and postcondition for a "trace-style" specification, in which the external state records the history of operations performed by the program instead of the future operations allowed.) In this paper, we use interaction trees [13] as a means of describing a collection of allowed traces of external events. Interaction trees can be thought of as "abstract traces with binding"; for instance, we can write x ← read ;; write(x + 1) ;; k x to mean "read a value, call it x, write the value x + 1, and then continue to do the actions in k using the same value of x."
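
The "abstract traces with binding" reading can be illustrated by a tiny interpreter. This Python sketch (our own model with read and write events, not the Coq itree library; all names are invented) represents an interaction tree as nested objects and runs it against a list of input values:

```python
# Interaction trees with read/write events, as nested Python objects.
class Done:
    pass

class Read:
    def __init__(self, k):
        self.k = k             # continuation: input value -> tree

class Write:
    def __init__(self, v, k):
        self.v, self.k = v, k  # value to write, rest of the tree

def run(tree, inputs):
    """Interpret a tree against a list of input values; return its trace."""
    trace = []
    while not isinstance(tree, Done):
        if isinstance(tree, Read):
            x, *inputs = inputs
            trace.append(('read', x))
            tree = tree.k(x)   # binding: the rest of the tree sees x
        else:
            trace.append(('write', tree.v))
            tree = tree.k
    return trace

# x <- read ;; write (x + 1) ;; done
t = Read(lambda x: Write(x + 1, Done()))
assert run(t, [5]) == [('read', 5), ('write', 6)]
```

The continuation under Read is an ordinary function, which is exactly what lets later events mention the value read earlier.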

In the end, we have a new assertion has_ext on external state that works in exactly the way we expect: it can hold external state of any type, it cannot be modified by user code, it can be freely modified by external calls, it always has exactly the same value as the external state already present in VST's semantics, and it exposes no ghost-state functionality to the user. If the user wants more fine-grained control over external state (for instance, to split it into pieces so multiple threads can make concurrent calls to external functions), they can define their own ghost algebra for the state and pass around part elements explicitly, but for the common case, has_ext provides seamless separation-logic reasoning about C programs that interact with an external environment.

# **4 Verifying C Programs with I/O in VST**

Once we have separation logic specifications for external function calls, verifying a communicating program is no different from verifying any other program. We demonstrate this with the example program excerpted in Figure 1, shown in

```
{ITree(write_list(decimal_rep(i)) ;; k)}
void print_intr(unsigned int i) {
  unsigned int q,r;
  if (i!=0) {
    q=i/10u;
    r=i%10u;
    print_intr(q);
    putchar(r+'0');
  }
}
{ITree(k)}

{ITree(write_list(decimal_rep(i)) ;; k)}
void print_int(unsigned int i) {
  if (i==0)
    putchar('0');
  else print_intr(i);
}
{ITree(k)}

{ITree(c ← read ;; main_loop(0, c))}
int main(void) {
  unsigned int n, d; char c;
  n=0;
  c=getchar();
  while (n<1000) {
    d = ((unsigned)c)-(unsigned)'0';
    if (d>=10) break;
    n+=d;
    print_int(n);
    putchar('\n');
    c=getchar();
  }
  return 0;
}
{ITree(done)}
```
Fig. 4: A simple communicating program, with specifications for each function

full in Figure 4. The print_intr function uses external calls to putchar to print the decimal representation of its argument, as long as that argument is nonzero; print_int handles the zero case as well. The main function repeatedly reads in digits using getchar and then prints the running total of the digits read so far. The ITree predicate is simply a wrapper around the has_ext predicate of the previous section (i.e., an assertion on the external ghost state), specialized to interaction trees on I/O operations. We can then write simple specifications for getchar and putchar, using interaction trees to represent external state:

```
{ITree(r ← read ;; k r)} x = getchar() {ITree(k x)}
{ITree(write(x) ;; k)} putchar(x) {ITree(k)}
```
Next, we annotate each function with separation logic pre- and postconditions; the program does not manipulate memory, so the specifications only describe the I/O behavior of each function. The effect of print_intr is to make a series of calls to putchar, printing the digits of the argument i as computed by the meta-level function decimal_rep (where write_list([i0; i1; ...; in]) is an abbreviation for the series of outputs write(i0) ;; write(i1) ;; ... ;; write(in)). When the value of i is 0, print_intr assumes that the number has been completely printed, so print_int adds a special case for 0 as the initial input. The specification for the main loop is a recursive sequence of read and write operations, taking the running total (which starts at 0) and the most recent input as arguments:

main_loop(n, d) ≜ if n < 1000 then (write_list(decimal_rep(n + d)) ;; c ← read ;; main_loop(n + d, c)) else done

Using the specifications for putchar and getchar as axioms, we can easily prove the specifications of print_intr, print_int, and main. (The following sections show how we substantiate these axioms.)
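
As a quick sanity check outside the proof, the putchar trace of these functions can be simulated in Python (our own test harness, not part of the verification; here decimal_rep is assumed to return the digit character codes of its argument, and putchar is modeled by appending to a list):

```python
# Simulation of print_intr/print_int from Figure 4.
def decimal_rep(i):
    return [ord(c) for c in str(i)]   # digit character codes of i

def print_intr(i, out):
    if i != 0:
        q, r = divmod(i, 10)
        print_intr(q, out)            # print the leading digits first
        out.append(r + ord('0'))      # then the last digit

def print_int(i, out):
    if i == 0:
        out.append(ord('0'))          # special case: 0 prints as "0"
    else:
        print_intr(i, out)

for n in (0, 7, 105, 999):
    out = []
    print_int(n, out)
    assert out == decimal_rep(n)      # matches write_list(decimal_rep(n))
```

The observed trace of writes is exactly write_list(decimal_rep(n)), as the specification demands.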

```
{ITree(vs ← read_list(n) ;; k vs) ∗ buf → −}
x = getchars(buf, n)
{∃vs. length(vs) = n ∧ x = n ∧ ITree(k vs) ∗ buf → vs}

{length(vs) = n ∧ ITree(write_list(vs) ;; k) ∗ buf → vs}
putchars(buf, n)
{ITree(k) ∗ buf → vs}
```
Fig. 5: Separation logic specifications for I/O calls with memory

More complicated programs may manipulate memory as well as communicating, and we can easily combine the two. For instance, if we want to read or write several characters in a single call, the standard C idiom is to pass a buffer in memory as an argument. Figure 5 shows the specifications for functions putchars and getchars in this style, where each function takes as arguments a buffer to hold the input/output and a number indicating the size of the buffer<sup>7</sup>. The pre- and postconditions of these functions now involve both the external state and a standard points-to assertion for the buffer. (Note that vs ← read_list(n) is an abbreviation for the series of inputs v0 ← read ;; v1 ← read ;; ... ;; vn−1 ← read.)

Figures 6 and 7 show a variant of the previous program that uses these external functions with memory. The print_intr function now populates a buffer with the characters to be written and returns the length of the decimal representation of its argument (retval in the postcondition refers to the return value of the function), while print_int makes a single call to putchars with the populated buffer. The main function now reads four characters at a time and then processes them one by one, ultimately producing the same output as the previous program. The specifications for putchars and getchars describe changes to both external state and memory, as shown in Figure 5. Proving the specifications for the functions in this program is not any more difficult than in the memoryless case: we define an interaction tree main_loop capturing the slightly different pattern of interaction in this program, and then apply the appropriate separation logic rule to each command. The external calls affect both memory and the ITree predicate, while all other commands affect only memory and local variables, as usual.

<sup>7</sup> While these are not standard POSIX I/O functions, they are close to the behavior of POSIX read/write, socket operations, and other common forms of I/O.

```
{length(decimal_rep(i)) ≤ length(contents) ∧ buf → contents}
int print_intr(unsigned int i,
      unsigned char *buf) {
  unsigned int q;
  unsigned char r;
  int k = 0;
  if (i!=0) {
    q=i/10u;
    r=i%10u;
    k = print_intr(q, buf);
    buf[k] = r+'0';
    k++;
  }
  return k;
}
{buf → contents[0...(retval − 1) := decimal_rep(i)]}

{ITree(write_list(decimal_rep(i)) ;; k)}
void print_int(unsigned int i) {
  unsigned char *buf = malloc(5);
  if (!buf) exit(1);
  int k;
  if (i==0){
    buf[0] = '0';
    buf[1] = '\n';
    k = 2;
  }
  else{
    k = print_intr(i, buf);
    buf[k] = '\n';
    k++;
  }
  putchars(buf, k);
  free(buf);
}
{ITree(k)}
```
Fig. 6: A communicating program with memory (part 1)

# **5 Soundness of External-State Reasoning**

The soundness proof of VST [1] describes the guarantees that the Hoare-logic proof of correctness for a C program provides about the actual execution of that program. A C program P is represented as a list P1, ..., Pn of function definitions in CompCert Clight, a Coq representation of the abstract syntax of C. The program is annotated with a collection of function specifications (i.e., separation logic pre- and postconditions) Γ = Γ1, ..., Γn, one for each function. We then prove that each Pi satisfies its specification Γi, which we write as Γ ⊢ Pi : Γi (note that each function may call on the specification of any function, including itself). The soundness theorem of VST without external function calls is then:

**Theorem 1 (VST Soundness).** Let P be a program with specification Γ. Suppose for every function Pi there is a proof Γ ⊢ Pi : Γi that Pi satisfies its specification. Then the main function of P can run according to the CompCert Clight semantics for any number of steps without getting stuck, and if it terminates then it does so in a state that satisfies its postcondition.

Proof. First, make a nonstandard, ownership-annotated, resource-annotated, step-indexed small-step semantics for Clight. Define Verifiable C's Hoare triple as a shallowly embedded statement about safe executions in this "juicy" semantics. Then show that executions in the juicy semantics erase to corresponding safe executions in Clight's standard "dry" small-step semantics.

```
{ITree(cs ← read_list(4) ;; main_loop(0, cs))}
int main(void) {
  unsigned int n, d; unsigned char c;
  unsigned char *buf;
  int i, j;
  n=0;
  buf = malloc(4);
  if (!buf) exit(1);
  i = getchars(buf, 4);
  while (n<1000) {
    for(j = 0; j < i; j++){
      c = buf[j];
      d = ((unsigned)c)-(unsigned)'0';
      if (d>=10) { free(buf); return 0; }
      n+=d;
      print_int(n);
    }
    i = getchars(buf, 4);
  }
  free(buf);
  return 0;
}
{ITree(done)}
```
**Corollary 1.** Since null pointer dereferences, integer overflows, etc. are all stuck in CompCert's small-step semantics, this means that a verified program will be free of all of these kinds of errors.

This soundness theorem expresses the relationship between the juicy semantics described by VST's separation logic and the dry semantics under which C programs actually execute<sup>8</sup>. The proof of correctness of a program gives us enough information to construct a corresponding dry execution for each juicy execution<sup>9</sup>. However, we may not have access to the code of external functions, and in some cases (e.g., system calls) they may not even be implemented in C. In this section, we generalize the soundness theorem to include external functions.

Fig. 7: A communicating program with memory (part 2)

<sup>8</sup> Of course, a C program *actually* executes by running machine code, but the relationship between the dry C semantics and the semantics of assembly language is already proved in CompCert, as is assembly-to-machine language [20].

<sup>9</sup> Theorem 1 blurs the line between juicy and dry by saying that a dry execution "terminates in a state that satisfies its postcondition", where the postcondition is stated in separation logic. In the original proof of soundness [1], this is resolved by assuming that the postcondition of main is always true. The techniques we use in this section can also be applied to more refined specifications of main.

In order to prove correctness of a C program with external calls in our separation logic, we must have a pre- and postcondition Γi for each external function. At this level these specifications are taken as axioms, since we do not have access to the code of the external functions. To be able to describe the dry executions of programs that call these functions, we also need simpler specifications on dry states. Each dry external specification contains a pre- and postcondition for the function, which may refer to the memory state, arguments/return values, the external state, and a witness used to provide logical parameters to the pre- and postcondition. The core of our approach is to prove the correspondence between the juicy specification and the dry specification of each external function.

If we can relate every juicy specification to a dry specification, then why bother with the juicy specifications at all? The answer is, not every function can be specified "dry." Higher-order functions in object-oriented patterns, dynamically created locks with self-referential resource invariants, and many other C programming patterns cannot be given simple first-order specifications. But the external functions that correspond to ordinary input/output can be given first-order specifications. Therefore, users can write higher-order object-oriented programs, in which the internal functions have (only) juicy specifications, so long as the external functions have (also) dry specifications. For instance, consider the specification of the putchars function from the previous section:

{length(vs) = n ∧ ITree(write_list(vs) ;; k) ∗ buf → vs} putchars(buf, n) {ITree(k) ∗ buf → vs}

The pre- and postcondition each make one assertion about memory (that the buffer buf points to the string of bytes vs) and one assertion about the external state<sup>10</sup> (that the interaction tree allows write_list(vs) followed by k before the call, and k afterward). The corresponding first-order specification on dry memory and external state is:

$$\begin{array}{l} \text{Pre}((vs,k),(buf,n),m,z) \triangleq \mathsf{length}(vs) = n \land z = (\mathsf{write\_list}(vs)\,;;\,k) \land {} \\ \qquad \forall i < n.\ m(buf+i) = vs[i] \\ \text{Post}((vs,k),(buf,n),m_0,m,z) \triangleq m_0 = m \land z = k \end{array}$$

where (vs, k) is the witness (i.e., the parameters to the specification), buf and n are the arguments passed to the function, m is the current memory, z is the external state, and m0 in the postcondition is the memory before the call (allowing us to state that memory is unchanged). Of the roughly 210 Linux system calls that are not Linux- or platform-specific, about 140 fall into this pattern: socket, console, and file I/O, memory allocation, and simpler informational calls like gethostname that do not involve memory.
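
For illustration, the dry specification above can be transcribed directly as executable predicates. In this Python sketch (our own rendering, not VST's definitions), memory is a map from addresses to bytes, and the interaction tree write_list(vs) ;; k is represented as a tagged tuple:

```python
# Dry pre/postcondition for putchars as executable predicates.
# witness = (vs, k); args = (buf, n); m, m0 : dict address -> byte;
# z is the external state (here, the itree as a tagged tuple).
def pre(witness, args, m, z):
    vs, k = witness
    buf, n = args
    return (len(vs) == n
            and z == ('write_list', tuple(vs), k)
            and all(m.get(buf + i) == vs[i] for i in range(n)))

def post(witness, args, m0, m, z):
    vs, k = witness
    buf, n = args
    return m0 == m and z == k   # memory unchanged, itree advanced to k
```

For example, with vs = (104, 105) stored at addresses 100 and 101, pre holds before the call, and post holds afterwards exactly when the memory is untouched and z has advanced to k.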

Once we have a juicy and a dry specification for a given external function, what is the relationship between them? Intuitively, if the juicy specification for a function f is {Pj} f(args) {Qj}, the Hoare logic proof for a program that calls

<sup>10</sup> ITree is actually an assertion on the *external ghost state*, which is connected to the true external state as described in Section 3, and is erased at the dry level.

f guarantees that Pj is satisfied before every call to f, and relies on Qj holding after each such call returns. To know that the program will run without getting stuck, on the other hand, we must know that the dry precondition Pd is satisfied before each call, and we can assume that the dry postcondition Qd is satisfied after each return. So informally, we need to know that Pj implies Pd and that Qd implies Qj. This cannot be a simple logical implication, however, because Pj and Qj are predicates on juicy memories, while Pd and Qd are predicates on dry memories. A juicy memory jm is a dependent triple (m, φ, pf), where m is a dry memory, φ is a higher-order, step-indexed memory with ghost state, and pf is a proof of the relationship between m and φ. We can easily extract the dry memory m from a juicy memory (we write this as dry(jm)), but there are many possible φ's that may correspond to a single m: we need to make decisions about ownership information and ghost state that are not present at the CompCert level.

In order to relate the juicy and dry specifications, we must erase the juice from the precondition, Pj ⇒ Pd, and then reconstruct the juice in the postcondition, Qd ⇒ Qj. The key to this erasure is that, as explained above, the Pj and Qj for external functions generally make only first-order assertions on memory (memory buffers passed to system calls don't contain higher-order objects such as function pointers and locks). The rest of the memory is implicitly the frame, and will not be changed by the external call. For first-order predicates, erasure is injective, and the associated juicy memory can be uniquely reconstructed once the buffer has been modified. The frame can contain noninjective juice, but in going from Qd ⇒ Qj we can reuse the same juice that we erased in going from Pj ⇒ Pd, since the external function does not modify the frame. In practice, the story is not quite so simple: the external function might allocate or free memory, the dry witness (used in Pd and Qd) must be derived from the juicy witness (used in Pj and Qj), and so on. We now formalize the details, culminating in Definition 6, the formal correspondence between juicy and dry specifications.

First, we address the problem of reconstructing a juicy memory from a dry memory. While there are many juicy memories that correspond to a given Comp-Cert memory, it is easy to start with a (precondition) juicy memory and change it to reflect (postcondition) modifications to the associated dry memory, as long as those changes fall within certain limits. In particular, a memory location may be newly allocated or deallocated, or its value may be changed while staying at the same permission level, but its permissions should not otherwise be changed<sup>11</sup>. If a dry specification ensures that memory is changed in only (at most) these ways, we say that it safely evolves memory. When a user adds a new set of external functions to VST, this safe evolution property will be one of their proof obligations. As long as an external function satisfies a specification that safely evolves memory, we can always reconstruct the juicy memory after the call by modifying the original juicy memory to reflect the changes to the dry memory. This

<sup>11</sup> Any function that interacts with memory through the standard interface of load, store, alloc, and free will fall within these limits; concurrency operations, such as acquiring or releasing a lock, may not, and proving that lock operations are correctly implemented is outside the scope of this work.

reconstruction captures the effects of the external call on the program's memory; to reflect the changes to the external state, we must also set the external ghost state of the reconstructed juicy memory to match the external state returned by the call. We define a reconstruct operation such that reconstruct(jm, m, z) is a version of the juicy memory jm that has been modified to take into account the changes in the dry memory m and the external state z.
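
Schematically, reconstruct can be pictured as follows (a Python sketch of our own; VST's actual definition operates on step-indexed juicy memories, not triples of dictionaries). The old juice is kept, the dry contents are replaced, and ghost identifier 0 is overwritten with the new external state:

```python
# A juicy memory is modeled as (dry memory, juice, ghost map).
def reconstruct(jm, m_new, z):
    """Rebuild the post-call juicy memory: keep the old juice, take the
    new dry contents, and set ghost identifier 0 to the external state z."""
    m_old, juice, ghost = jm
    new_ghost = dict(ghost)
    new_ghost[0] = ('ext', z)   # external ghost state lives at id 0
    return (m_new, juice, new_ghost)

jm = ({'a': 1}, 'juice', {0: ('ext', 'z0'), 1: 'g1'})
assert reconstruct(jm, {'a': 2}, 'z1') == \
    ({'a': 2}, 'juice', {0: ('ext', 'z1'), 1: 'g1'})
```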

Second, we need a way to transform a juicy witness into the corresponding dry witness. When a user adds a new external call to VST, they must provide a dessicate function that performs this transformation. Fortunately, the dessicate operation usually follows a simple pattern. Components of the witness that are not memory objects are generally identical in their juicy and dry versions. The frame is usually the only memory object in the juicy witness; while it is possible in VST to write a Hoare triple that quantifies over other memory objects explicitly, it is very unusual and runs counter to the spirit of separation logic. Similarly, the postcondition of the dry specification may refer to the memory state before the call (to express properties such as "this call stored value v at location ℓ"), but there is rarely a reason to refer to any other memory object. Thus, the dessicate operation for each function can simply discard the frame (juicy) memory and replace it with the dry memory from before the call. This standard dessicate operation works for all external functions shown in this paper.

This leads to the following definition and theorem:

**Definition 6 (Juicy-Dry Correspondence).** A juicy specification (Pj, Qj) and a dry specification (Pd, Qd) for an external function correspond if, for a suitable dessicate operation:


**Theorem 2 (VST Soundness with External Functions).** Let P be a program with n functions, calling also upon m external functions. The internal functions have (juicy) specifications Γ1, ..., Γn and the external functions have (juicy) specifications Γn+1, ..., Γn+m. Suppose P is proved correct in Verifiable C—there is a derivation Γ ⊢ P1 : Γ1, ..., Pn : Γn. Let Dn+1, ..., Dn+m be dry specifications that safely evolve memory, and that correspond to Γn+1, ..., Γn+m. Then the main function of P can run according to the CompCert C semantics, using D as the semantics of external function calls, for any number of steps without getting stuck, and if it terminates then it satisfies its postcondition.

Proof. We extend the juicy semantics of Theorem 1 with a rule for external calls that uses their juicy pre- and postconditions, and then prove that executions in this semantics erase to safe executions in the dry semantics, using the correspondence to relate juicy and dry behaviors of external calls.

Although this theorem does not explicitly mention external communication, it implies that any I/O operations performed by P conform to the description of allowed communication in the specification of main. This follows from the fact that only external calls can change the external state, and only external calls can communicate with the outside world. Thus, if P performs a sequence of external function calls f1, ..., fn, the external communication performed by P must be consistent with the specifications Df1, ..., Dfn. In the case of the examples above, this means that at any point in a program's execution, its communication so far will be a prefix of the operations allowed by the initial ITree predicate, as desired.
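
The prefix property can be illustrated with a small executable model (our own; trees here are finite tagged tuples, not the coinductive itrees of the development): enumerate the traces allowed by an initial tree, then check that an observed communication is a prefix of one of them:

```python
# Trees: ('done',), ('write', v, rest), or ('read', k) with k a function.
def traces(tree, fuel=4, alphabet=(0, 1)):
    """All traces of a finite tree, with reads drawn from a small alphabet."""
    if tree[0] == 'done' or fuel == 0:
        return [[]]
    if tree[0] == 'write':
        return [[('write', tree[1])] + t
                for t in traces(tree[2], fuel - 1, alphabet)]
    return [[('read', x)] + t
            for x in alphabet
            for t in traces(tree[1](x), fuel - 1, alphabet)]

def is_prefix(p, t):
    return t[:len(p)] == p

# c <- read ;; write (c + 1) ;; done
tree = ('read', lambda x: ('write', x + 1, ('done',)))
ts = traces(tree)
assert [('read', 0), ('write', 1)] in ts
# A partial execution is a prefix of some allowed trace.
assert any(is_prefix([('read', 0)], t) for t in ts)
```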

Proving the correspondence between the juicy and dry specifications is the primary proof burden for a VST user who wants to use a new external function in their program. Fortunately, this proof only needs to be done once per external function rather than once per program (as long as the original specification is general enough to be usable in many different programs), and soundness (Theorem 2) has been proved once and for all. As a result, a VST user can prove that their program with external calls runs correctly as follows:

1. prove the program correct in Verifiable C, using the juicy specifications of the external functions as axioms;
2. prove that each juicy specification corresponds (Definition 6) to a dry specification that safely evolves memory; and
3. apply Theorem 2.

For instance, we have already seen the VST-level specifications for putchars and getchars, and used them to prove correctness of a simple program; we can complete the process with the following lemma.

**Lemma 1.** The juicy specifications of putchars and getchars correspond to their dry specifications.

As a result, we now know that the sample program in Figure 7 runs correctly for any implementations of putchars and getchars that satisfy their dry specifications.

# **6 Connecting VST to CertiKOS**

In the previous section, we showed how to connect a step-indexed separation logic specification of an external function to a "dry" specification on non-step-indexed CompCert memories and external state. This gives us a correctness property for C programs with external functions, but it still treats the dry specifications of the external functions as axioms. In this section, we show how to discharge these axioms by connecting dry specifications to implementations of the corresponding functions in the verified operating system CertiKOS [7].

```
Definition serial_in (port : Z) (st : OSState) : OSState * Z :=
 ... (* read buffers, compare bits, etc *)
 let new := st.(serial_oracle) st.(serial_trace) in
 match new with
 | SerialRecv data ⇒
   let (st', byte) := ... in (* manipulate data *)
   (st'/[serial_trace := st.(serial_trace) ++ [new]], byte)
 | ... (* handle other events *) end.
```
Fig. 8: A specification of a serial driver

#### **6.1 CertiKOS Specifications**

In order to explain how to connect VST and CertiKOS specifications, we first summarize how their specification styles differ. In VST, a specification is a pre- and postcondition on the (step-indexed, ghost-state-augmented) memory state of a program. In CertiKOS, a specification is a function representing a state transition from the current OS state to a new one with an (optional) return value. The OS state is a record with fields for each piece of concrete or logical state that CertiKOS maintains, such as page table maps and console buffers. Specifications are organized into "Certified Abstraction Layers" [6], which can be independently proven to refine higher-level abstractions, and then composed with other layers to build more complex systems. The concrete CertiKOS kernel implementation, in C and assembly, is verified with respect to high-level specifications using this layer framework and the CompCert compiler.

Because the specifications are pure, deterministic functions, something more is needed to model functions with externally visible effects such as I/O. To handle such functions, CertiKOS parameterizes specifications by "environment contexts" [8], which act as oracles that take a log of the events up to that point and return the next steps taken by the environment. Each oracle has a fixed set of events it can produce, along with a trace well-formedness invariant that it must preserve. For example, the oracle for modeling the behavior of the serial device can return events indicating the successful completion of a send or the arrival of some data, and it is assumed to only receive values that fit in a byte ([0, 255]). Although any particular choice of oracle is a deterministic function, its implementation is completely opaque to the specification, so that proofs about the specification's behavior hold given any oracle and environment state.
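
A toy model of such an environment context (our own Python sketch; the event names are invented) shows the two key properties: the next event is a deterministic function of the log so far, and it respects the well-formedness invariant that received values fit in a byte:

```python
# An environment context as a deterministic but opaque next-event oracle.
def make_serial_oracle(script):
    def oracle(trace):
        ev = script[len(trace)]   # next event is a function of the log
        kind, data = ev
        # well-formedness invariant: received values fit in a byte
        assert kind != 'recv' or 0 <= data <= 255
        return ev
    return oracle

oracle = make_serial_oracle([('recv', 65), ('sent', 0)])
assert oracle([]) == ('recv', 65)
assert oracle([('recv', 65)]) == ('sent', 0)
```

Proofs about a specification quantify over the oracle, so they only rely on the invariant, never on any particular script.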

As a concrete example, consider the abridged specification of part of the serial driver in CertiKOS (Figure 8). After some initial work, the specification needs to know what bits came in from the physical device, so it consults the oracle and branches based on the next serial event. If the next event is a receive, it manipulates the received data to extract a byte and returns it along with a new state in which the trace is updated to include the processed event.

#### **6.2 Relating OS and User State**

```
Definition serial_putc (c : Z) (st : OSState) : option (OSState * Z) :=
 let c' := c mod 256 in
 if st.(ikern) && st.(init) && st.(ihost) then
   if st.(drv_serial).(serial_exists) then
     match st.(com1) with
     | mkDevData (mkSerialState _ true _ _ txbuf nil false) _ ltx _ ⇒
       let cs := if c' =? CHAR_LF then [CHAR_LF;CHAR_CR] else [c'] in
       Some (st/[com1/s/TxBuf := cs,
                 serial_log := st.(serial_log) ++ [IOEvPutc c]], c')
     | _ ⇒ None end
   else Some (st, -1)
 else None.
```
Pre(k, c, m, z) ≜ z = (write(c) ;; k)    Post(k, c, m0, m, z) ≜ m0 = m ∧ z = k

Fig. 9: The core of the putchar system call vs. its dry specification

User-level programs cannot directly interact with the outside environment, and must instead communicate through the OS using the system call interface it provides. System calls in CertiKOS are specified just like any other operation, i.e., as a state transition function. For each system call, we would like to relate its dry pre- and postcondition (as described in Section 5) to its functional specification in CertiKOS. The property we would like to prove is something like: for any initial state s, if the dry precondition holds for s, then the value v and state s′ returned by the functional specification satisfy the dry postcondition. Combined with the correspondence between juicy and dry specifications, this implies that the system call specification correctly implements the behavior expected by the user program (as expressed by its separation logic specification in VST). However, this property cannot be proven in its current form, because the dry pre- and postconditions are predicates on CompCert memories and external state, which differ from CertiKOS's state, much of which is invisible and irrelevant to the user program, as can be seen in Figure 9. Instead, we must restate the correctness property in terms of relations between the common elements of the two state representations. The key components to relate are the return value of the system call, the representation of the user program's memory, and the model of external behaviors. The return value is a CompCert value in both systems, but the other two require additional work to translate between them.

Although, like VST, the CertiKOS kernel uses the CompCert C semantics and memory model, user-process memory is represented as a flat physical address space rather than a set of disjoint blocks. The OS state also includes page tables to map virtual to physical addresses and a record of which addresses are allocated. Fortunately, aside from these differences, the flat memory model is quite similar to CompCert's (see Figure 10). We assume the existence of a relation Rmem that maps blocks to virtual addresses. Other than the restriction

```
(* CertiKOS: map from address to value *)
Inductive flatmem_val :=

Definition flatmem :=
  ZMap.t flatmem_val.

(* CompCert: map from block and offset to value *)
Inductive memval :=
| Undef: memval
| Byte: byte → memval
| Pointer: block → int → nat → memval.

Record mem := mkmem {
  mem_contents: PMap.t (ZMap.t memval);
  ... }.
```
Fig. 10: A comparison of CertiKOS flat memory and CompCert memory

that blocks fit in the virtual address space and map to nonoverlapping regions, the exact mapping has no effect on the system call correctness, so it can be completely arbitrary. To relate a CompCert memory to a CertiKOS one, we define a relation inj(m, flat(s), ptbl(s)), which states that if a block and offset in the CompCert memory m is valid, then it contains the same data as the corresponding location (according to Rmem and the page table) in the flat memory of the OS state s. Note that inj is parameterized by the page table to allow a system call to alter the address mapping, for example by allocating new memory.
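As a concrete illustration, the agreement required by inj can be sketched in a few lines of Python. This is an assumed, simplified encoding (dictionaries for memories, a direct block-to-address map standing in for Rmem and the page table), not the Coq definitions used in the proof:

```python
# Illustrative sketch of inj(m, flat, Rmem): if a (block, offset) location is
# valid in the CompCert-style memory m, its contents must equal the flat
# memory at the virtual address that Rmem assigns to the block. The page
# table is elided: Rmem maps blocks straight to flat base addresses here.

def inj(m, flat, Rmem):
    """m: {block: {offset: value}}; flat: {addr: value}; Rmem: {block: base}."""
    return all(
        flat.get(Rmem[b] + off) == val
        for b, cells in m.items()
        for off, val in cells.items()
    )

# Two disjoint blocks mapped to non-overlapping flat regions.
m = {1: {0: 'x', 1: 'y'}, 2: {0: 'z'}}
Rmem = {1: 0x1000, 2: 0x2000}
flat = {0x1000: 'x', 0x1001: 'y', 0x2000: 'z', 0x2001: 'junk'}
assert inj(m, flat, Rmem)        # every valid location agrees
flat[0x1001] = 'w'
assert not inj(m, flat, Rmem)    # disagreement at block 1, offset 1
```

Note that the extra flat cell at 0x2001 has no counterpart in the user memory, mirroring how most of the OS state is invisible and irrelevant to the user program.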

At the user level, the precondition contains an interaction tree (or similar external specification) that specifies the allowed external behaviors, and the postcondition contains a smaller tree that continues using the return value of the "consumed" actions. On the other hand, in CertiKOS, specifications begin with a trace of the events that have already happened and extend it with new events by querying the external environment. To reconcile these two views, we can first relate an interaction tree to a (possibly infinite) set of (possibly infinitely long) traces, each of which intuitively is the result of following one path in the tree. Then any trace allowed by the output interaction tree should be a suffix of a trace allowed by the input tree, and the difference between the two should be exactly the trace of events generated during the system call:

**Definition 7.** We write consume(T, T′, tr) to mean that, if tr′ is a trace of T′, then tr ++ tr′ (the concatenation of tr and tr′) is a trace of T.
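To make the relation concrete, here is a small Python model of consume. The tuple encoding of interaction trees ('ret' / 'vis' nodes with a dictionary of continuations) and the bounded-depth trace enumeration are assumptions of this sketch, not part of the paper's Coq development:

```python
# Illustrative model: relate an interaction tree to its set of (bounded)
# traces, then check consume(T, T', tr): every trace of T', prefixed by tr,
# must be a trace of T.

def traces(tree, depth=3):
    """Enumerate the bounded-depth traces of an interaction tree.
    A tree is either ('ret',) or ('vis', event, {answer: subtree})."""
    if tree[0] == 'ret' or depth == 0:
        return [[]]
    _, event, conts = tree
    out = [[]]  # a trace may stop at any point
    for answer, sub in conts.items():
        for t in traces(sub, depth - 1):
            out.append([(event, answer)] + t)
    return out

def consume(T, T2, tr, depth=3):
    """consume(T, T', tr): every trace tr' of T' extends tr to a trace of T."""
    ts = traces(T, depth)
    return all(tr + t2 in ts for t2 in traces(T2, depth))

# Example: T emits 'putc' and then behaves like T2, so the 'putc' event is
# exactly the difference consumed by the call.
T2 = ('vis', 'getc', {'a': ('ret',), 'b': ('ret',)})
T = ('vis', 'putc', {'ok': T2})
assert consume(T, T2, [('putc', 'ok')])
```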

Equipped with the relations defined above, we can define more precisely what it means for a system call to satisfy its dry specification.

**Definition 8 (Dry-Syscall Correspondence).** A system call f with functional specification O<sup>f</sup> correctly implements a dry specification (Pd, Qd) if for any arguments *v*, CompCert memory m, interaction tree T, and OS state s, if Pd(*v*, m, T), inj(m, flat(s), ptbl(s)), and O<sup>f</sup>(*v*, s) = (s′, v′, tnew), then for all m′ such that inj(m′, flat(s′), ptbl(s′)), there exists T′ such that consume(T, T′, tnew) and Qd(*v*, v′, m′, T′).

That is, if f correctly implements a dry specification, then for any state that satisfies the dry precondition Pd, we can inject the relevant piece of memory into an OS state s, apply the functional specification O<sup>f</sup>, and then extract a resulting state that satisfies the dry postcondition Qd. The inj relation may relate multiple CompCert memories to a given OS state (hence the universal quantification over the resulting memory m′), but all such memories must agree on the contents of all valid addresses, so the postcondition will usually hold for all m′ if it holds for any one of them.

**Theorem 3.** Putchar and getchar in CertiKOS correctly implement their dry specifications.

While this correspondence is specific to CertiKOS, we can adapt it to other verified operating systems by replacing the CertiKOS system call specification, user memory model, and external event representation with those of the other OS. For example, in the case of the seL4 microkernel [12], inj could be redefined to relate a CompCert memory to certain capability slots that represent the virtual memory, and the system call might send a message to a device driver running in another process. Despite these changes, most of the theorems in this paper aside from Theorem 3 would continue to hold with minor or no alterations.

# **6.3 Soundness of VST + CertiKOS**

In Section 5, we described a correspondence between "juicy" separation logic specifications for external functions and "dry" CompCert-level specifications that is sufficient to guarantee that verified C programs behave correctly when run, as long as the external functions actually satisfy their dry specifications. Now we have seen how to prove that an external function satisfies its dry specification, by relating it to its CertiKOS specification. We combine these two proofs to get a stronger correctness property for programs that use CertiKOS system calls. This will also allow us to formalize the idea that at each point in a program's execution, it has performed some prefix of the communication operations specified in its precondition.

First, we define the semantics of programs with respect to the implementation of external functions:

**Definition 9 (OS Safety).** Suppose that we have a set of external calls F such that each f ∈ F has a functional specification O<sup>f</sup>. Then a configuration (c, m, t, T), where c is a C program state, m is a memory, t is a trace of events performed so far, and T is an interaction tree specifying the allowed future events, is safe for n steps with respect to a set of traces 𝒯 if:


The C program has states (c, m), where c holds the values of local variables and the control stack, and m is the memory. Our small-step relation (c, m) → (c′, m′) characterizes internal C execution, and therefore if c is at a call to an external function then there is no (c′, m′) such that (c, m) → (c′, m′). The operating system has states s that contain the physical memory flat(s) and many other components used internally by the OS (and its proof of correctness), including a trace of past events; we say that s is consistent with t when the trace in s is exactly t.

Definition 9 has several important differences from our original definition of safety in Section 2. First, configurations include the trace t of events performed so far, as well as T, the high-level specification of the allowed communication events (here it is taken to be an interaction tree, but it could easily be defined in another formalism just by changing the definition of consume). Second, our external functions are not simply axiomatized with pre- and postconditions, but implemented by the executable specifications O<sup>f</sup> provided by the operating system. We use the ideas of the previous section to relate the execution of C programs to the behavior of system calls: we inject the user memory into the OS state, extract the resulting memory from the resulting state, and require that the new interaction tree T′ reflect the communication events tnew performed by the call. Note the quantification over the current OS state s: the details of the OS state, such as the buffer of values received, are unknown to the C program (and may change arbitrarily between steps, for instance, if an interrupt occurs), and so it must be safe under all possible OS states consistent with the events t. The set 𝒯 contains all possible communication traces from the program's execution, so by proving that every trace in 𝒯 is allowed by the initial interaction tree T, we show that the program's communication is always constrained by T.

**Lemma 2 (Trace Correctness).** If (c, m, T) is safe for n steps with respect to 𝒯, then for all traces t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t).

Proof. By induction on n. Since the consume relation holds for the trace segment produced by each external call, it suffices to show that it is transitive, i.e., that consume(a, b, t1) and consume(b, c, t2) imply consume(a, c, t1 ++ t2).

**Theorem 4 (Soundness of VST + CertiKOS).** Let P be a program with n functions, calling also upon m external functions. The internal functions have (juicy) specifications Γ<sub>1</sub> ... Γ<sub>n</sub> and the external functions have (juicy) specifications Γ<sub>n+1</sub> ... Γ<sub>n+m</sub>. Suppose P is proved correct in Verifiable C with initial interaction tree T. Let D<sub>n+1</sub>, ..., D<sub>n+m</sub> be dry specifications that safely evolve memory, and that correspond to Γ<sub>n+1</sub> ... Γ<sub>n+m</sub>. Further, let each D<sub>i</sub> be correctly implemented by an OS function f<sub>i</sub> with executable specification O<sup>f<sub>i</sub></sup>. Then for all n, the main function of P is safe for n steps with respect to some set of traces 𝒯, and for every trace t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t).

Proof. By the combination of the soundness of VST with external functions (Theorem 2), Lemma 2, and a proof relating our previous definition of safety to the new definition.

This is our main result: by combining the results of the previous sections, we obtain a soundness theorem down to the operating system's implementation of system calls, one that guarantees that the actual communication operations performed by the program are always a prefix of the initial specification of allowed operations. By instantiating the theorem with a set of verified system calls, we obtain a strong correctness result for our VST-verified programs, such as:

**Theorem 5.** Let P be a program that uses the putchar and getchar system calls provided by CertiKOS, such as the one in Figure 4. Suppose P is proved correct with initial interaction tree T. Then for all n, the main function of P is safe for n steps with respect to some set of traces 𝒯, and for every trace t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t).

# **7 From syscall-level to hardware-level interactions**

Thus far, we have assumed that the events in a program's trace are exactly the events described in the user-level interaction tree T. In practice, however, the communication performed by the OS may differ from that observed by the user. For example, like all operating systems, CertiKOS uses a kernel buffer of finite size to store characters received from the serial device; if the buffer is full, incoming characters are discarded without being read. To capture this, we distinguish between the user-visible events produced by system calls, and external events, which are generated by the environment oracle and recorded in the trace at the time that they occur. For the system call events to be meaningful, they must correspond in some way to the external events, but this correspondence may not be one-to-one. In the case of console I/O, each character received by the serial device should be returned by getchar at most once, and in the order it arrived, but characters may be dropped. This leads us to the condition that the user events should be a subsequence of the environment events, which is proved in CertiKOS.
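The subsequence condition is easy to state operationally; the following Python sketch (with an assumed list-of-characters encoding of events) checks it by greedy in-order matching:

```python
# Illustrative check that the user-visible getchar events form a subsequence
# of the external serial events: each received character is returned at most
# once and in arrival order, but characters may be dropped (e.g., on buffer
# overflow).

def is_subsequence(user, external):
    it = iter(external)
    return all(ev in it for ev in user)   # `in` consumes the iterator in order

external = ['a', 'b', 'c', 'd']    # bytes the serial device delivered
user     = ['a', 'c']              # 'b' and 'd' were dropped or never read
assert is_subsequence(user, external)
assert not is_subsequence(['c', 'a'], external)   # reordering is not allowed
```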

**Lemma 3.** The getchar system call maintains the invariant that there is an injective map taking each system call event with value v in the OS trace to an external event with value v earlier in the trace.

**Corollary 2.** Let P be a verified program as described in Theorem 4, in which getchar is the only system call performed. Then for all n, the main function of P is safe for n steps with respect to some set of traces 𝒯, and for every trace t ∈ 𝒯, there exists some interaction tree T′ such that consume(T, T′, t), and the events in t correspond to external events performed as described in Lemma 3.

Unlike Theorem 4, this corollary is specific to a particular system call, but it gives a stronger correctness property: the events in the user-level interaction tree are now interpreted in terms of actual bytes received by the OS, in the form of external events. Note that Lemma 3 does not require that every external event has a corresponding system call event; if the buffer fills up and characters are dropped before a getchar call, then there will be external events that do not correspond to anything in the interaction tree, and this is the intended semantics of buffered communication without flow control. A similar corollary can be proved for any set of system calls, but the precise correspondence between user events and external events will depend on the particular system calls involved.

There is one more soundness theorem we might want to prove, asserting that the combined system of program and operating system executes correctly according to the assembly-level semantics of the OS. We should be able to obtain this theorem by connecting Theorem 4 with the soundness theorem of CertiKOS, which guarantees that the behavior of the operating system running a program P refines the behavior of a system K P consisting of the program along with an abstract model of the operating system. However, this connection is far from trivial: it involves lowering our soundness result from C to assembly (using the correctness theorem of CompCert), modeling the switch from user to kernel mode (including the semantics of the trap instruction), and considering the effects of other OS features on program behavior (e.g., context switching). We estimate that we have covered more than half of the distance between VST and CertiKOS with our current result, but there is still work to be done to complete the connection. We can now remove the OS's implementation of each system call from the trusted computing base; it remains to remove the OS entirely.

# **8 Related Work**

The most comprehensive prior work connecting verified programs to the implementation of I/O operations is that of Férée et al. [5] in CakeML, a functional language with I/O connected to a verified compiler and verified hardware. As in our approach, the language is parameterized by functional specifications for external functions, backed by proofs at a lower level. However, while CakeML does support a separation logic [9], it is not higher-order, so all of the components are specified in the same basic style. Our approach could enable higher-order separation logic reasoning about CakeML programs. Ironclad Apps [10] also includes verified communicating code, for user-level networking applications running on the Verve operating system [21]. However, their network stack is implemented outside of the operating system, so proofs about I/O operations are carried out within the same framework as the programs that use the operations.

One major category of system calls is file I/O operations. The FSCQ file system [2] is verified using Crash Hoare Logic, a separation logic which accounts for possible crashes at any point in a program. File system assertions are similar to the ordinary points-to assertions of separation logic, but may persist through crashes while memory is reset. In Crash Hoare Logic, the implementation-level model of the file state is the same as the user's model, and the approach does not obviously generalize to other forms of external communication.

Another related area is the extension of separation logic to distributed systems, which necessarily involves reasoning about communication with external entities. The most closely related such logic is Aneris [14], which is built on Iris, the inspiration for VST's approach to ghost state. The adequacy theorem of Aneris proves the connection between higher-order separation logic specifications of socket operations and a language that includes first-order operational semantics for those functions. In our approach, this would correspond to directly adding the "dry" specifications for each operation to the language semantics, and building the correspondence proof for those particular operations into the soundness theorem of the logic; our more generic style of soundness theorem would make it easier to plug in new external calls. The bottom half of our approach (showing that the language-level semantics of the operations are implemented by an OS such as CertiKOS) could be applied to Aneris more or less as is. Another interesting feature of Aneris is that the communication allowed on each socket is specified by a user-provided protocol, an arbitrary separation logic predicate on messages and resources. In our examples thus far, we have assumed that the external world does not share any notion of resource with the program, and so our external state only mentions the messages to be sent and received; however, the construction of Section 3 does allow the external state to have arbitrary ghost-state structure, which we could use to define similarly expressive protocols.

# **9 Conclusion and Future Work**

We have now seen how to connect programs verified using higher-order separation logic to external functions provided by a first-order verified system, effectively importing the results of outside verification (e.g. OS verification) into our separation logic. The approach consists of two halves: we first relate separation logic specifications for the external functions to "dry" first-order specifications on CompCert memories [15] and interaction trees [13], and then relate these dry specifications to the system that implements the functions (CertiKOS in our example). In the process, we interpret the C-level communication constraints in terms of OS-level events that more accurately represent the communication that occurs in the real world. Our approach works for any type of external communication, and allows users to extend the system with new external functions as needed. Each new correspondence proof for an external function modularly extends the soundness theorem of VST, removing the separation-logic specification of the function from the trusted computing base.

The combination of CompCert memories with interaction trees has served as a robust specification interface between two quite different approaches to verification: VST's higher-order impredicative concurrent separation logic, and CertiKOS's certified concurrent abstraction layers. This strongly suggests that the combination of CompCert memories and interaction trees can serve as a lingua franca to interface with other verification systems for client programs and for operating systems.

# **References**

1. Appel, A.W., Dockins, R., Hobor, A., Beringer, L., Dodds, J., Stewart, G., Blazy, S., Leroy, X.: Program Logics for Certified Compilers. Cambridge University Press (2014), http://www.cambridge.org/de/academic/subjects/computer-science/programming-languages-and-applied-logic/program-logics-certified-compilers?format=HB



# Modular Inference of Linear Types for Multiplicity-Annotated Arrows

Kazutaka Matsuda<sup>1</sup>

Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan kztk@ecei.tohoku.ac.jp

Abstract. Bernardy et al. [2018] proposed a linear type system λ<sup>q</sup> <sup>→</sup> as a core type system of Linear Haskell. In the system, linearity is represented by annotated arrow types A →<sup>m</sup> B, where m denotes the multiplicity of the argument. Thanks to this representation, existing non-linear code typechecks as it is, and newly written linear code can be used with existing non-linear code in many cases. However, little is known about the type inference of λ<sup>q</sup> <sup>→</sup>. Although the Linear Haskell implementation is equipped with type inference, its algorithm has not been formalized, and the implementation often fails to infer principal types, especially for higher-order functions. In this paper, based on OutsideIn(X) [Vytiniotis et al., 2011], we propose an inference system for a rank 1 qualified-typed variant of λ<sup>q</sup> <sup>→</sup>, which infers principal types. A technical challenge in this new setting is to deal with ambiguous types inferred by naive qualified typing. We address this ambiguity issue through quantifier elimination and demonstrate the effectiveness of the approach with examples.

Keywords: Linear Types · Type Inference · Qualified Typing.

# 1 Introduction

Linearity is a fundamental concept in computation and has many applications. For example, if a variable is known to be used only once, it can be freely inlined without any performance regression [29]. In a similar manner, destructive updates are safe for such values without the risk of breaking referential transparency [32]. Moreover, linearity is useful for writing transformations on data that cannot be copied or discarded for various reasons, including reversible computation [19, 35] and quantum computation [2, 25]. Another interesting application of linearity is that it helps to bound the complexity of programs [1, 5, 13].

Linear type systems use types to enforce linearity. One way to design a linear type system is based on the Curry-Howard correspondence with linear logic. For example, in Wadler [33]'s type system, functions are linear in the sense that their arguments are used exactly once, and any exception to this must be marked by the type operator (!). Such an approach is theoretically elegant but cumbersome in programming; a program usually contains both linear and unrestricted code, and many manipulations concerning (!) are required in the latter and around the interface between the two. Thus, there have been several proposed approaches for more practical linear type systems [7, 21, 24, 28].

Among these approaches, a system called λ<sup>q</sup> <sup>→</sup>, the core type system of Linear Haskell, stands out for its ability to have linear code in large unrestricted code bases [7]. With it, existing unrestricted code in Haskell typechecks in Linear Haskell without modification, and if one desires, some of the unrestricted code can be replaced with linear code, again without any special programming effort. For example, one can use the function *append* in an unrestricted context as λx.*tail* (*append* x x), regardless of whether *append* is a linear or unrestricted function. This is made possible by their representation of linearity. Specifically, they annotate function type with its argument's multiplicity ("linearity via arrows" [7]) as A →<sup>m</sup> B, where m = 1 means that the function of the type uses its argument linearly, and m = ω means that there is no restriction in the use of the argument, which includes all non-linear standard Haskell code. In this system, linear functions can be used in an unrestricted context if their arguments are unrestricted. Thus, there is no problem in using *append* : List A →<sup>1</sup> List A →<sup>1</sup> List A as above, provided that x is unrestricted. This promotion of linear expressions to unrestricted ones is difficult in other approaches [21, 24, 28] (at least in the absence of bounded kind-polymorphism), where linearity is a property of a type (called "linearity via kinds" in [7]).

However, as far as we are aware, little is known about type inference for λq <sup>→</sup>. It is true that Linear Haskell is implemented as a fork<sup>1</sup> of the Glasgow Haskell Compiler (GHC), which of course comes with type inference. However, the algorithm has not been formalized and has limitations due to a lack of proper handling of multiplicity constraints. Indeed, Linear Haskell gives up handling complex constraints on multiplicities such as those with multiplications p · q; as a result, Linear Haskell sometimes fails to infer principal types, especially for higher-order functions.<sup>2</sup> This limits the reusability of code. For example, Linear Haskell cannot infer an appropriate type for function composition to allow it to compose both linear and unrestricted functions.

A classical approach to have both separated constraint solving that works well with the usual unification-based typing and principal typing (for a rank 1 fragment) is qualified typing [15]. In qualified typing, constraints on multiplicities are collected, and then a type is qualified with it to obtain a principal type. Complex multiplicities are not a problem in unification as they are handled by a constraint solver. For example, consider *app* = λf.λx.f x. Suppose that f has type a →<sup>p</sup> b, and x has type a (here we focus only on multiplicities). Let us write the multiplicities of f and x as p<sup>f</sup> and px, respectively. Since x is passed to f, there is a constraint that the multiplicity p<sup>x</sup> of x must be ω if the multiplicity p of the f's argument also is. In other words, p<sup>x</sup> must be no less than p, which is represented by inequality p ≤ p<sup>x</sup> under the ordering 1 ≤ ω. (We could represent the constraint as an equality p<sup>x</sup> = p · px, but using inequality is simpler here.)

<sup>1</sup> https://github.com/tweag/ghc/tree/linear-types

<sup>2</sup> Confirmed for commit 1c80dcb424e1401f32bf7436290dd698c739d906, as of May 14, 2019.

For the multiplicity p<sup>f</sup> of f, there is no restriction because f is used exactly once; linear use is always legitimate even when p<sup>f</sup> = ω. As a result, we obtain the inferred type ∀p p<sup>f</sup> p<sup>x</sup> a b. p ≤ p<sup>x</sup> ⇒ (a →<sup>p</sup> b) →<sup>p<sup>f</sup></sup> a →<sup>p<sup>x</sup></sup> b for *app*. This type is principal, intuitively because only the constraints that are needed for typing λf.λx.f x are gathered. Having separate constraint solving phases is itself rather common in the context of linear typing [3, 4, 11, 12, 14, 23, 24, 29, 34]. Qualified typing makes the constraint solving phase local and gives the principal typing property that makes typing modular. In particular, in the context of linearity via kinds, qualified typing has been proven effective [11, 24].
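To see what the collected constraint rules in and out, the following hypothetical Python sketch enumerates the instantiations of p and p<sub>x</sub> over {1, ω} admitted by p ≤ p<sub>x</sub> (the names and encoding are ours, for illustration only):

```python
# Passing x (multiplicity p_x) to a function whose argument has multiplicity
# p generates the predicate p <= p_x under the ordering 1 <= omega. We
# enumerate all instantiations to see which uses of app are admitted.

ONE, OMEGA = '1', 'w'

def leq(m1, m2):
    # Reflexive closure of 1 <= omega.
    return m1 == m2 or (m1, m2) == (ONE, OMEGA)

admitted = [(p, px) for p in (ONE, OMEGA) for px in (ONE, OMEGA) if leq(p, px)]

# A linear function (p = 1) accepts either kind of x; an unrestricted
# function (p = omega) forces x itself to be unrestricted (p_x = omega).
assert (ONE, ONE) in admitted and (ONE, OMEGA) in admitted
assert (OMEGA, OMEGA) in admitted and (OMEGA, ONE) not in admitted
```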

As qualified typing is useful in the context of linearity via kinds, one may expect that it also works well for linearity via arrows such as λ<sup>q</sup> <sup>→</sup>. However, naive qualified typing turns out to be impractical for λ<sup>q</sup> <sup>→</sup> because it tends to infer ambiguous types [15, 27]. As a demonstration, consider a slightly different version of *app* defined as *app*' = λf.λx.*app* f x. Standard qualified typing [15, 31] infers the type

$$\forall q \; q\_f \; q\_x \; p\_f \; p\_x \; a \; b. \; (q \le q\_x \land q\_f \le p\_f \land q\_x \le p\_x) \Rightarrow (a \to\_q b) \to\_{p\_f} a \to\_{p\_x} b$$

by the following steps:


This inference is unsatisfactory, as the inferred type leaks internal details and is ambiguous [15, 27] in the sense that one cannot determine q<sup>f</sup> and q<sup>x</sup> from an instantiation of (a →<sup>q</sup> b) →<sup>p</sup><sup>f</sup> a →<sup>p</sup><sup>x</sup> b. Due to this ambiguity, the types of *app* and *app*' are not judged as equivalent; in fact, the standard qualified typing algorithms [15, 31] reject *app* : ∀p p<sup>f</sup> p<sup>x</sup> a b. p ≤ p<sup>x</sup> ⇒ (a →<sup>p</sup> b) →<sup>p</sup><sup>f</sup> a →<sup>p</sup><sup>x</sup> b. We conjecture that the issue of inferring ambiguous types is intrinsic to linearity via arrows because of the separation of multiplicities and types, unlike the case of linearity via kinds, where multiplicities are always associated with types. Simple solutions such as rejecting ambiguous types are not desirable as this case appears very often. Defaulting ambiguous variables (such as q<sup>f</sup> and qx) to 1 or ω is not a solution either because it loses principality in general.

In this paper, we propose a type inference method for a rank 1 qualified-typed variant of λ<sup>q</sup> <sup>→</sup>, in which the ambiguity issue is addressed without compromising principality. Our type inference system is built on top of OutsideIn(X) [31], an inference system for qualified types used in GHC, which can handle local assumptions to support **let**, existential types, and GADTs. An advantage of using OutsideIn(X) is that it is parameterized over theory X of constraints. Thus, applying it to linear typing boils down to choosing an appropriate X. We choose X carefully so that the representation of constraints is closed under quantifier

elimination, which is the key to addressing the ambiguity issue. Specifically, in this paper:


Finally, we discuss related work (Sect. 7) and then conclude the paper (Sect. 8). The prototype implementation is available as a part of a reversible programming system Sparcl, available from https://bitbucket.org/kztk/partially-reversible-lang-impl/. Due to space limitation, we omit some proofs from this paper, which can be found in the full version [20].

#### 2 Qualified-Typed Variant of *λ<sup>q</sup> →*

In this section, we introduce a qualified-typed [15] variant of λ<sup>q</sup> <sup>→</sup> [7] for its rank 1 fragment, on which we base our type inference. Notable differences to the original λ<sup>q</sup> <sup>→</sup> include: (1) multiplicity abstractions and multiplicity applications are implicit (as type abstractions and type applications), (2) this variant uses qualified typing [15], (3) conditions on multiplicities are inequality based [6], which gives better handling of multiplicity variables, and (4) local definitions are excluded as we postpone the discussions to Sect. 5 due to their issues in the handling of local assumptions in qualified typing [31].

#### 2.1 Syntax of Programs

Programs and expressions, which will be typechecked, are given below.

*prog* ::= *bind*<sub>1</sub>; ... ; *bind*<sub>n</sub>
*bind* ::= f = e | f : A = e
e ::= x | λx.e | e<sub>1</sub> e<sub>2</sub> | C e | **case** e<sub>0</sub> **of** {C<sub>i</sub> x<sub>i</sub> → e<sub>i</sub>}<sub>i</sub>

A program is a sequence of bindings with or without type annotations, where bound variables can appear in following bindings. As mentioned at the beginning


Fig. 1. Types and related notions: a and p are type and multiplicity variables, respectively, and D represents a type constructor.

of this section, we shall postpone the discussion of local bindings (i.e., **let**) to Sect. 5. Expressions consist of variables x, applications e<sub>1</sub> e<sub>2</sub>, λ-abstractions λx.e, constructor applications C e, and (shallow) pattern matching **case** e<sub>0</sub> **of** {C<sub>i</sub> x<sub>i</sub> → e<sub>i</sub>}<sub>i</sub>. For simplicity, we assume that constructors are fully applied and patterns are shallow. As usual, patterns C<sub>i</sub> x<sub>i</sub> must be linear in the sense that the variables in x<sub>i</sub> are pairwise distinct. Programs are assumed to be appropriately α-renamed so that variables newly introduced by λ and patterns are always fresh. We do not require the patterns of a **case** expression to be exhaustive or non-overlapping, following the original λ<sup>q</sup> <sup>→</sup> [7]; linearity in λ<sup>q</sup> <sup>→</sup> concerns only successful computations. Unlike the original λ<sup>q</sup> <sup>→</sup>, we do not annotate λ and **case** with the multiplicity of the argument and the scrutinee, respectively.
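For readers who prefer code, the expression syntax can be transcribed as plain data definitions. The following Python dataclasses (names are ours, chosen for this sketch) mirror the grammar of Sect. 2.1:

```python
# A minimal rendering of e ::= x | λx.e | e1 e2 | C e | case e0 of {Ci xi → ei}i.
from dataclasses import dataclass

@dataclass
class Var:  name: str
@dataclass
class Lam:  param: str; body: object
@dataclass
class App:  fun: object; arg: object
@dataclass
class Con:  name: str; args: list              # fully-applied constructor
@dataclass
class Case: scrutinee: object; branches: list  # [(con_name, [vars], body)]

# app = λf. λx. f x
app = Lam('f', Lam('x', App(Var('f'), Var('x'))))
assert app.body.body.fun == Var('f')
```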

Constructors play an important role in λ<sup>q</sup><sub>→</sub>. As we will see later, they can be used to witness unrestrictedness, similarly to ! of !e in a linear type system [33].

#### 2.2 Types

Types and related notations are defined in Fig. 1. Types are separated into monotypes and polytypes (or, type schemes). Monotypes consist of (rigid) type variables a, datatypes D μ̅ τ̅, and multiplicity-annotated function types τ<sub>1</sub> →<sub>μ</sub> τ<sub>2</sub>. Here, a multiplicity μ is either 1 (linear), ω (unrestricted), or a (rigid) multiplicity variable p. Polytypes have the form ∀p̅ a̅. Q ⇒ τ, where Q is a constraint, i.e., a conjunction of predicates. A predicate φ has the form M ≤ M′, where M and M′ are multiplications of multiplicities. We shall sometimes treat Q as a set of predicates, meaning that we rewrite Q according to context by the idempotent commutative monoid laws of ∧. We call both multiplicity (p) and type (a) variables type-level variables, and write ftv(t) for the set of free type-level variables in a syntactic object t (such as a type or a constraint).

The relation (≤) and operator (·) in predicates denote the corresponding relation and operator on {1, ω}, respectively. On {1, ω}, (≤) is defined as the reflexive closure of 1 ≤ ω; note that ({1, ω} , ≤) forms a total order. Multiplication (·) on {1, ω} is defined by

$$1 \cdot m = m \cdot 1 = m \qquad \omega \cdot m = m \cdot \omega = \omega.$$

For simplicity, we shall sometimes omit (·) and write m<sub>1</sub>m<sub>2</sub> for m<sub>1</sub> · m<sub>2</sub>. Note that, for m<sub>1</sub>, m<sub>2</sub> ∈ {1, ω}, m<sub>1</sub> · m<sub>2</sub> is the least upper bound of m<sub>1</sub> and m<sub>2</sub> with respect to ≤. As a result, m<sub>1</sub> · m<sub>2</sub> ≤ m holds if and only if (m<sub>1</sub> ≤ m) ∧ (m<sub>2</sub> ≤ m) holds; we will use this property for efficient handling of constraints (Sect. 3.2).
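This ordered-semiring structure on {1, ω} is small enough to check exhaustively; the following Python sketch (ours, not part of the paper's formalization) encodes (≤) and (·) and verifies the least-upper-bound property:

```python
# A sketch of the multiplicity domain {1, w}; "w" stands for omega.
ONE, OMEGA = 1, "w"

def leq(m1, m2):
    """(<=) is the reflexive closure of 1 <= w."""
    return m1 == m2 or (m1 == ONE and m2 == OMEGA)

def mult(m1, m2):
    """Multiplication: 1 is the unit, w is absorbing."""
    return OMEGA if OMEGA in (m1, m2) else ONE

# m1 * m2 is the least upper bound of m1 and m2, so
# m1 * m2 <= m  iff  m1 <= m and m2 <= m:
ms = [ONE, OMEGA]
assert all(
    leq(mult(m1, m2), m) == (leq(m1, m) and leq(m2, m))
    for m1 in ms for m2 in ms for m in ms
)
```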

We assume a fixed set of constructors given beforehand. Each constructor is assigned a type of the form ∀p̅ a̅. τ<sub>1</sub> →<sub>μ<sub>1</sub></sub> ⋯ →<sub>μ<sub>n−1</sub></sub> τ<sub>n</sub> →<sub>μ<sub>n</sub></sub> D p̅ a̅, where each τ<sub>i</sub> and μ<sub>i</sub> contain no free type-level variables other than {p̅ a̅}, i.e., ⋃<sub>i</sub> ftv(τ<sub>i</sub>, μ<sub>i</sub>) ⊆ {p̅ a̅}. For simplicity, we write this type as ∀p̅ a̅. τ̅ →<sub>μ̅</sub> D p̅ a̅. We assume that types are well-kinded, which effectively means that, across the constructor types, D is applied to the same number of multiplicity arguments and type arguments. Usually, it suffices to use constructors with linear function types, as below, because they can be used in both linear and unrestricted code.

$$\begin{aligned} (-,-) &: \forall a \, b . \, a \to\_1 \, b \to\_1 \, a \otimes b \\ \mathsf{Nil} &: \forall a . \, \mathsf{List} \, a \qquad \mathsf{Cons} : \forall a . \, a \to\_1 \, \mathsf{List} \, a \to\_1 \, \mathsf{List} \, a \end{aligned}$$

In general, constructors can encapsulate arguments' multiplicities as below, which is useful when a function returns both linear and unrestricted results.

$$\mathsf{MkUn} : \forall a. \, a \to\_{\omega} \mathsf{Un} \, a \qquad \mathsf{MkMany} : \forall p \, a. \, a \to\_{p} \mathsf{Many} \, p \, a.$$

For example, a function that reads a value from a mutable array at a given index can be given as a primitive of type *readMArray* : ∀a. MArray a →<sub>1</sub> Int →<sub>ω</sub> (MArray a ⊗ Un a) [7]. Multiplicity-parameterized constructors become useful when the multiplicity of contents can vary. For example, the type IO<sup>L</sup> p a with the constructor MkIO<sup>L</sup> : (World →<sub>1</sub> (World ⊗ Many p a)) →<sub>1</sub> IO<sup>L</sup> p a can represent the IO monad [7] with methods *return* : ∀p a. a →<sub>p</sub> IO<sup>L</sup> p a and (>>=) : ∀p q a b. IO<sup>L</sup> p a →<sub>1</sub> (a →<sub>p</sub> IO<sup>L</sup> q b) →<sub>1</sub> IO<sup>L</sup> q b.

#### 2.3 Typing Rules

Our type system uses two sorts of environments. A *typing environment* Γ maps variables to polytypes (as usual in non-linear calculi), and a *multiplicity environment* Δ maps variables to multiplications of multiplicities. This separation of the two will be convenient when we discuss type inference. As usual, we write x<sub>1</sub> : A<sub>1</sub>, ..., x<sub>n</sub> : A<sub>n</sub> instead of {x<sub>1</sub> ↦ A<sub>1</sub>, ..., x<sub>n</sub> ↦ A<sub>n</sub>} for typing environments. For multiplicity environments, we use the multiset-like notation x<sub>1</sub><sup>M<sub>1</sub></sup>, ..., x<sub>n</sub><sup>M<sub>n</sub></sup>.

We use the following operations on multiplicity environments:<sup>3</sup>

$$\begin{aligned} (\Delta\_1 + \Delta\_2)(x) &= \begin{cases} \omega & \text{if } x \in \mathsf{dom}(\Delta\_1) \cap \mathsf{dom}(\Delta\_2) \\ \Delta\_i(x) & \text{if } x \in \mathsf{dom}(\Delta\_i) \setminus \mathsf{dom}(\Delta\_j) \ (i \neq j \in \{1, 2\}) \end{cases} \\ (\mu \Delta)(x) &= \mu \cdot \Delta(x) \\ (\Delta\_1 \sqcup \Delta\_2)(x) &= \begin{cases} \Delta\_1(x) \cdot \Delta\_2(x) & \text{if } x \in \mathsf{dom}(\Delta\_1) \cap \mathsf{dom}(\Delta\_2) \\ \omega & \text{if } x \in \mathsf{dom}(\Delta\_i) \setminus \mathsf{dom}(\Delta\_j) \ (i \neq j \in \{1, 2\}) \end{cases} \end{aligned}$$

<sup>3</sup> In these definitions, we implicitly consider multiplicity 0 and regard Δ(x) = 0 if x ∉ dom(Δ). It is natural that 0 + m = m + 0 = m. With 0, multiplication (·), which is extended as 0 · m = m · 0 = 0, no longer computes the least upper bound. Therefore, we use ⊔ for the last definition; in fact, the definition corresponds to the pointwise computation of Δ<sub>1</sub>(x) ⊔ Δ<sub>2</sub>(x), where ≤ is extended as 0 ≤ ω but not 0 ≤ 1. This treatment of 0 coincides with that in the Linear Haskell proposal [26].

$$\frac{Q; \Gamma; \Delta \vdash e : \tau \quad Q \models \Delta = \Delta' \quad Q \models \tau \sim \tau'}{Q; \Gamma; \Delta' \vdash e : \tau'}\ \text{Eq} \qquad \frac{Q; \Gamma, x : \sigma; \Delta, x^{\mu} \vdash e : \tau}{Q; \Gamma; \Delta \vdash \lambda x.e : \sigma \to\_{\mu} \tau}\ \text{Abs}$$

$$\frac{\Gamma(x) = \forall \overline{p}\,\overline{a}.\,Q' \Rightarrow \tau \quad Q \models Q'[\overline{p} \mapsto \overline{\mu}] \quad Q \models x^{1} \le \Delta}{Q; \Gamma; \Delta \vdash x : \tau[\overline{p} \mapsto \overline{\mu}, \overline{a} \mapsto \overline{\tau}]}\ \text{Var}$$

$$\frac{Q; \Gamma; \Delta\_1 \vdash e\_1 : \sigma \to\_{\mu} \tau \quad Q; \Gamma; \Delta\_2 \vdash e\_2 : \sigma}{Q; \Gamma; \Delta\_1 + \mu\Delta\_2 \vdash e\_1\, e\_2 : \tau}\ \text{App}$$

$$\frac{\mathsf{C} : \forall \overline{p}\,\overline{a}.\ \overline{\tau} \to\_{\overline{\nu}} D\ \overline{p}\ \overline{a} \quad \{Q; \Gamma; \Delta\_i \vdash e\_i : \tau\_i[\overline{p} \mapsto \overline{\mu}, \overline{a} \mapsto \overline{\sigma}]\}\_i}{Q; \Gamma; \omega\Delta\_0 + \sum\_i \nu\_i[\overline{p} \mapsto \overline{\mu}]\Delta\_i \vdash \mathsf{C}\ \overline{e} : D\ \overline{\mu}\ \overline{\sigma}}\ \text{Con}$$

$$\frac{\begin{array}{c}Q; \Gamma; \Delta\_0 \vdash e\_0 : D\ \overline{\mu}\ \overline{\sigma} \qquad \mathsf{C}\_i : \forall \overline{p}\,\overline{a}.\ \overline{\tau\_i} \to\_{\overline{\nu\_i}} D\ \overline{p}\ \overline{a} \\ \{Q; \Gamma, \overline{x\_i : \tau\_i[\overline{p} \mapsto \overline{\mu}, \overline{a} \mapsto \overline{\sigma}]}; \Delta\_i, \overline{x\_i^{\mu\_0 \nu\_i[\overline{p} \mapsto \overline{\mu}]}} \vdash e\_i : \tau\}\_i\end{array}}{Q; \Gamma; \mu\_0\Delta\_0 + \bigsqcup\_i \Delta\_i \vdash \mathbf{case}\ e\_0\ \mathbf{of}\ \{\mathsf{C}\_i\ \overline{x\_i} \to e\_i\}\_i : \tau}\ \text{Case}$$

Fig. 2. Typing relation for expressions

Intuitively, Δ(x) represents the number of uses of x. So, in the definition of Δ<sub>1</sub> + Δ<sub>2</sub>, we have (Δ<sub>1</sub> + Δ<sub>2</sub>)(x) = ω if x ∈ dom(Δ<sub>1</sub>) ∩ dom(Δ<sub>2</sub>) because this condition means that x is used in two places. The operation Δ<sub>1</sub> ⊔ Δ<sub>2</sub> is used for **case** branches. Suppose that one branch e<sub>1</sub> uses variables as Δ<sub>1</sub> and another branch e<sub>2</sub> uses variables as Δ<sub>2</sub>. Then, putting the branches together, variables are used as Δ<sub>1</sub> ⊔ Δ<sub>2</sub>. The definition says that x is considered to be used linearly in the two branches put together if and only if both branches use x linearly, where non-linear use includes unrestricted use (Δ<sub>i</sub>(x) = ω) and non-use (x ∉ dom(Δ<sub>i</sub>)).
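The three operations can be sketched as follows; this is a Python illustration of ours, under the simplifying assumption that environments map variable names to the concrete multiplicities 1 and ω rather than to symbolic multiplications:

```python
# Multiplicity environments as dicts; "w" stands for omega.
OMEGA = "w"

def add(d1, d2):
    """Delta1 + Delta2: a variable occurring in both environments is used twice,
    hence omega times."""
    out = {}
    for x in set(d1) | set(d2):
        out[x] = OMEGA if x in d1 and x in d2 else (d1.get(x) or d2[x])
    return out

def scale(mu, d):
    """mu * Delta: pointwise multiplication; omega is absorbing, 1 is the unit."""
    return {x: OMEGA if OMEGA in (mu, m) else 1 for x, m in d.items()}

def join(d1, d2):
    """Delta1 |_| Delta2 (for case branches): x is linear only if it is linear
    in both branches; non-use in one branch counts as non-linear."""
    out = {}
    for x in set(d1) | set(d2):
        if x in d1 and x in d2:
            out[x] = OMEGA if OMEGA in (d1[x], d2[x]) else 1
        else:
            out[x] = OMEGA
    return out

assert add({"x": 1}, {"x": 1}) == {"x": OMEGA}   # used in two places
assert join({"x": 1}, {}) == {"x": OMEGA}        # unused in one branch
```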

We write Q |= Q′ if Q logically entails Q′; that is, for any valuation θ of multiplicity variables with θ(p) ∈ {1, ω}, Q′θ holds whenever Qθ does. For example, we have p ≤ r ∧ r ≤ q |= p ≤ q. We extend the notation to multiplicity environments and write Q |= Δ<sub>1</sub> ≤ Δ<sub>2</sub> if dom(Δ<sub>1</sub>) ⊆ dom(Δ<sub>2</sub>) and Q |= ⋀<sub>x ∈ dom(Δ<sub>1</sub>)</sub> Δ<sub>1</sub>(x) ≤ Δ<sub>2</sub>(x) ∧ ⋀<sub>x ∈ dom(Δ<sub>2</sub>) \ dom(Δ<sub>1</sub>)</sub> ω ≤ Δ<sub>2</sub>(x) holds. We also write Q |= Δ<sub>1</sub> = Δ<sub>2</sub> if both Q |= Δ<sub>1</sub> ≤ Δ<sub>2</sub> and Q |= Δ<sub>2</sub> ≤ Δ<sub>1</sub> hold. We then have the following properties.

Lemma 1. Suppose Q |= Δ ≤ Δ′ and Q |= Δ = Δ<sub>1</sub> + Δ<sub>2</sub>. Then, there are some Δ′<sub>1</sub> and Δ′<sub>2</sub> such that Q |= Δ′ = Δ′<sub>1</sub> + Δ′<sub>2</sub>, Q |= Δ<sub>1</sub> ≤ Δ′<sub>1</sub>, and Q |= Δ<sub>2</sub> ≤ Δ′<sub>2</sub>.

Lemma 2. Q |= μΔ ≤ Δ′ implies Q |= Δ ≤ Δ′.

Lemma 3. Q |= Δ<sub>1</sub> ⊔ Δ<sub>2</sub> ≤ Δ′ implies Q |= Δ<sub>1</sub> ≤ Δ′ and Q |= Δ<sub>2</sub> ≤ Δ′.

Constraints Q affect type equality; for example, under Q = p ≤ q ∧ q ≤ p, σ →<sub>p</sub> τ and σ →<sub>q</sub> τ become equivalent. Formally, we write Q |= τ ∼ τ′ if τθ = τ′θ for any valuation θ of multiplicity variables that makes Qθ true.

Now, we are ready to define the typing judgment for expressions, Q; Γ; Δ ⊢ e : τ, which reads that, under assumption Q, typing environment Γ, and multiplicity environment Δ, expression e has monotype τ; it is defined by the typing rules in Fig. 2. Here, we assume dom(Δ) ⊆ dom(Γ). Having x ∈ dom(Γ) \ dom(Δ) means that the multiplicity of x is essentially 0 in e.

Rule Eq says that we can replace τ and Δ with equivalent ones in typing.

$$\frac{}{\Gamma \vdash \varepsilon}\ \text{Empty} \qquad \frac{Q; \Gamma; \Delta \vdash e : \tau \quad \overline{p}\,\overline{a} = \mathrm{ftv}(Q, \tau) \quad \Gamma, f : \forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau \vdash prog}{\Gamma \vdash f = e;\ prog}\ \text{Bind}$$

$$\frac{Q; \Gamma; \Delta \vdash e : \tau \quad A = \forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau \quad \Gamma, f : A \vdash prog}{\Gamma \vdash f : A = e;\ prog}\ \text{BindA}$$

Fig. 3. Typing rules for programs

Rule Var says that x is used once in the variable expression x, but it is safe to regard the expression as using x more than once and using other variables ω times. At the same time, the type ∀p̅ a̅. Q′ ⇒ τ of x is instantiated to τ[p̅ ↦ μ̅, a̅ ↦ τ̅], yielding the constraint Q′[p̅ ↦ μ̅], which must be entailed by Q.

Rule Abs says that λx.e has type σ →<sub>μ</sub> τ if e has type τ, assuming that the use of x in e is μ. Unlike the original λ<sup>q</sup><sub>→</sub> [7], in our system the multiplicity annotation on an arrow must be a μ, i.e., 1, ω, or a multiplicity variable, rather than a general multiplication M. This does not limit expressiveness, because such general arrow types can be represented by the type σ →<sub>p</sub> τ with the constraint p ≤ M ∧ M ≤ p.

Rule App embodies an important principle in λ<sup>q</sup><sub>→</sub>: when an expression with variable use Δ is used μ-many times, the variable use of the expression becomes μΔ. Since e<sub>1</sub> uses its argument μ-many times, as described by its type σ →<sub>μ</sub> τ, and we pass e<sub>2</sub> (with variable use Δ<sub>2</sub>) to e<sub>1</sub>, the use of variables in the e<sub>2</sub> part of e<sub>1</sub> e<sub>2</sub> becomes μΔ<sub>2</sub>. For example, in (λy.42) x, x is considered to be used ω times because (λy.42) has type σ →<sub>ω</sub> Int for any σ.

Rule Con is nothing but a combination of Var and App. The ωΔ<sub>0</sub> part is only useful when C is nullary; otherwise, we can weaken Δ at the leaves.

Rule Case is the most complicated rule in this type system. In this rule, μ<sub>0</sub> represents how many times the scrutinee e<sub>0</sub> is used in the **case**. If μ<sub>0</sub> = ω, the pattern-bound variables can be used unrestrictedly, and if μ<sub>0</sub> = 1, the pattern-bound variables can be used according to the multiplicities of the arguments of the constructor.<sup>4</sup> Thus, in the i-th branch, the variables in x̅<sub>i</sub> can be used as μ<sub>0</sub>ν̅<sub>i</sub>[p̅ ↦ μ̅], where ν̅<sub>i</sub>[p̅ ↦ μ̅] represents the multiplicities of the arguments of the constructor C<sub>i</sub>. Besides x̅<sub>i</sub>, each branch body e<sub>i</sub> can contain free variables used as Δ<sub>i</sub>. Thus, the uses of free variables across the branch bodies are summarized as ⨆<sub>i</sub> Δ<sub>i</sub>. Recall that the **case** uses the scrutinee μ<sub>0</sub> times; thus, the overall use of variables is estimated as μ<sub>0</sub>Δ<sub>0</sub> + ⨆<sub>i</sub> Δ<sub>i</sub>.

Then, we define the typing judgment for programs, Γ ⊢ *prog*, which reads that program *prog* is well-typed under Γ, by the typing rules in Fig. 3. At this point, the rules Bind and BindA have no significant difference; the difference will become clear when we discuss type inference. In the rules Bind and BindA, we assume that Γ contains no free type-level variables; therefore, we can safely generalize all free type-level variables in Q and τ. We do not check the use Δ in either rule

<sup>4</sup> This behavior, inherited from λ<sup>q</sup><sub>→</sub> [7], implies the isomorphism !(A ⊗ B) ≡ !A ⊗ !B, which is not a theorem in standard linear logic. The isomorphism intuitively means that unrestricted products can (only) be constructed from unrestricted components, as commonly adopted in linearity-via-kind approaches [11, 21, 24, 28, 29].

as bound variables are assumed to be used arbitrarily many times in the rest of the program; that is, the multiplicity of a bound variable is ω, and its body's variable use is taken as ωΔ, which maps each x ∈ dom(Δ) to ω and has no free type-level variables.

#### 2.4 Metatheories

Lemma 4 is the standard weakening property. Lemma 5 says that we can replace Q with a stronger constraint, Lemma 6 says that we can replace Δ with a greater one, and Lemma 7 says that we can substitute type-level variables in a term-in-context without violating typeability. These lemmas state forms of weakening, and the last three clarify the goal of our inference system discussed in Sect. 3.

Lemma 4. Q; Γ; Δ ⊢ e : τ implies Q; Γ, x : A; Δ ⊢ e : τ.

Lemma 5. Q; Γ; Δ ⊢ e : τ and Q′ |= Q implies Q′; Γ; Δ ⊢ e : τ.

Lemma 6. Q; Γ; Δ ⊢ e : τ and Q |= Δ ≤ Δ′ implies Q; Γ; Δ′ ⊢ e : τ.

Lemma 7. Q; Γ; Δ ⊢ e : τ implies Qθ; Γθ; Δθ ⊢ e : τθ.

We have the following form of the substitution lemma:

Lemma 8 (Substitution). Suppose Q<sub>0</sub>; Γ, x̅ : σ̅; Δ<sub>0</sub>, x̅<sup>μ̅</sup> ⊢ e : τ, and Q<sub>i</sub>; Γ; Δ<sub>i</sub> ⊢ e′<sub>i</sub> : σ<sub>i</sub> for each i. Then, Q<sub>0</sub> ∧ ⋀<sub>i</sub> Q<sub>i</sub>; Γ; Δ<sub>0</sub> + Σ<sub>i</sub> μ<sub>i</sub>Δ<sub>i</sub> ⊢ e[x̅ ↦ e̅′] : τ.

*Subject Reduction* We show the subject reduction property for a simple call-by-name semantics. Consider the standard small-step call-by-name relation e ⟶ e′ with the following β-reduction rules (we omit the congruence rules):

$$(\lambda x.e\_1) \: e\_2 \longrightarrow e\_1[x \mapsto e\_2] \qquad \mathbf{case}\ \mathsf{C}\_j \ \overline{e} \ \mathbf{of}\ \{\mathsf{C}\_i \ \overline{x\_i} \to e'\_i\}\_i \longrightarrow e'\_j[\overline{x\_j \mapsto e}]$$

Then, by Lemma 8, we have the following subject reduction property:

Lemma 9 (Subject Reduction). Q; Γ; Δ ⊢ e : τ and e ⟶ e′ implies Q; Γ; Δ ⊢ e′ : τ.

Lemma 9 holds even for call-by-value reduction, though with a caveat. For a program f<sub>1</sub> = e<sub>1</sub>; ... ; f<sub>n</sub> = e<sub>n</sub>, it can happen that some e<sub>i</sub> is typed only under an unsatisfiable (i.e., conflicting) Q<sub>i</sub>. As a conflicting Q<sub>i</sub> means that e<sub>i</sub> is essentially ill-typed, evaluating e<sub>i</sub> may not be safe. However, the standard call-by-value strategy evaluates e<sub>i</sub> even when f<sub>i</sub> is not used at all, and the type system does not reject this unsatisfiability. This issue can be addressed by the standard witness-passing transformation [15], which converts programs so that Q ⇒ τ becomes W<sub>Q</sub> → τ, where W<sub>Q</sub> represents a set of witnesses of Q. Nevertheless, it would also be reasonable to reject conflicting constraints locally.

We then state the correspondence with the original system [7] (assuming the modification [6] for the variable case<sup>5</sup>) to show that the qualified-typed version

<sup>5</sup> In the premise of Var, the original [7] uses ∃Δ′. Δ = x<sup>1</sup> + ωΔ′, which is modified to x<sup>1</sup> ≤ Δ in [6]. The difference between the two becomes clear when Δ(x) = p, for which the former does not hold, as we cannot choose Δ′ depending on p.

captures linearity as the original system does. While the original system assumes call-by-need evaluation, Lemma 9 could be lifted to that case.

Theorem 1. If ⊤; Γ; Δ ⊢ e : τ, where Γ contains only monotypes, then e is also well-typed in the original λ<sup>q</sup><sub>→</sub> under some environment.

The main reason for the monotype restriction is that our polytypes are strictly more expressive than their (rank-1) polytypes. This extra expressiveness comes from predicates of the form ⋯ ≤ M · M′. Indeed, f = λx.**case** x **of** {MkMany y → (y, y)} has type ∀p q a. ω ≤ p · q ⇒ Many p a →<sub>q</sub> a ⊗ a in our system, while it has three incomparable types in the original λ<sup>q</sup><sub>→</sub>.

# 3 Type Inference

In this section, we give a type inference method for the type system in the previous section. Following [31, Section 3], we adopt the standard two-phase approach; we first gather constraints on types and then solve them. As mentioned in Sect. 1, the inference system described here has the issue of ambiguity, which will be addressed in Sect. 4.

#### 3.1 Inference Algorithm

We first extend types τ and multiplicities μ to include unification variables.

$$
\tau ::= \dots \mid \alpha \qquad \mu ::= \dots \mid \pi
$$

We call α/π a unification type/multiplicity variable; these will be substituted with concrete types/multiplicities (including rigid variables) during inference. Similarly to ftv(t), we write fuv(t) for the set of unification variables (of both sorts) in t, where t ranges over any syntactic element (such as τ, Q, Γ, and Δ).

Besides Q, the algorithm will generate equality constraints τ ∼ τ′. Formally, the sets of generated constraints C and generated predicates ψ are given by

$$C ::= \bigwedge\_i \psi\_i \qquad \psi ::= \phi \mid \tau \sim \tau'$$

Then, we define the type inference judgment for expressions, Γ ⊢ e : τ ❀ Δ; C, which reads that, given Γ and e, type τ is inferred together with variable use Δ and constraints C, by the rules in Fig. 4. Note that Δ, as well as τ and C, is synthesized in this step. This difference in the treatment of Γ and Δ is why we separate multiplicity environments Δ from typing environments Γ.

Gathered constraints are solved when we process top-level bindings. Figure 5 defines the type inference judgment for programs, Γ ⊢ *prog*, which reads that the inference finds *prog* well-typed under Γ. In the rules, manipulation of constraints is done by the simplification judgment Q ⊢<sub>simp</sub> C ❀ Q′; θ, which simplifies C under the assumption Q into the pair (Q′, θ) of residual constraints Q′ and a substitution θ for unification variables, where (Q′, θ) is expected to be equivalent

$$\frac{\Gamma(x) = \forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau \quad \overline{\alpha}, \overline{\pi}\ \text{fresh}}{\Gamma \vdash x : \tau[\overline{p} \mapsto \overline{\pi}, \overline{a} \mapsto \overline{\alpha}] \rightsquigarrow x^{1};\ Q[\overline{p} \mapsto \overline{\pi}]} \qquad \frac{\Gamma, x : \alpha \vdash e : \tau \rightsquigarrow \Delta, x^{M}; C \quad \alpha, \pi\ \text{fresh}}{\Gamma \vdash \lambda x.e : \alpha \to\_{\pi} \tau \rightsquigarrow \Delta;\ C \wedge M \le \pi}$$

$$\frac{\Gamma \vdash e\_1 : \tau\_1 \rightsquigarrow \Delta\_1; C\_1 \quad \Gamma \vdash e\_2 : \tau\_2 \rightsquigarrow \Delta\_2; C\_2 \quad \beta, \pi\ \text{fresh}}{\Gamma \vdash e\_1\, e\_2 : \beta \rightsquigarrow \Delta\_1 + \pi\Delta\_2;\ C\_1 \wedge C\_2 \wedge \tau\_1 \sim (\tau\_2 \to\_{\pi} \beta)}$$

$$\frac{\mathsf{C} : \forall \overline{p}\,\overline{a}.\ \overline{\sigma} \to\_{\overline{\nu}} D\ \overline{p}\ \overline{a} \quad \{\Gamma \vdash e\_i : \tau\_i \rightsquigarrow \Delta\_i; C\_i\}\_i \quad \overline{\alpha}, \overline{\pi}\ \text{fresh}}{\Gamma \vdash \mathsf{C}\ \overline{e} : D\ \overline{\pi}\ \overline{\alpha} \rightsquigarrow \sum\_i \nu\_i[\overline{p} \mapsto \overline{\pi}]\Delta\_i;\ \bigwedge\_i \left(C\_i \wedge \tau\_i \sim \sigma\_i[\overline{p} \mapsto \overline{\pi}, \overline{a} \mapsto \overline{\alpha}]\right)}$$

$$\frac{\begin{array}{c}\Gamma \vdash e\_0 : \tau\_0 \rightsquigarrow \Delta\_0; C\_0 \qquad \pi\_0, \overline{\pi\_i}, \overline{\alpha\_i}, \beta\ \text{fresh} \qquad \mathsf{C}\_i : \forall \overline{p}\,\overline{a}.\ \overline{\tau\_i} \to\_{\overline{\nu\_i}} D\ \overline{p}\ \overline{a} \\ \{\Gamma, \overline{x\_i : \tau\_i[\overline{p} \mapsto \overline{\pi\_i}, \overline{a} \mapsto \overline{\alpha\_i}]} \vdash e\_i : \tau'\_i \rightsquigarrow \Delta\_i, \overline{x\_i^{M\_i}}; C'\_i\}\_i \\ C = C\_0 \wedge \bigwedge\_i \left(C'\_i \wedge \beta \sim \tau'\_i \wedge (\tau\_0 \sim D\ \overline{\pi\_i}\ \overline{\alpha\_i}) \wedge \bigwedge\_j M\_{ij} \le \pi\_0\nu\_{ij}[\overline{p} \mapsto \overline{\pi\_i}]\right)\end{array}}{\Gamma \vdash \mathbf{case}\ e\_0\ \mathbf{of}\ \{\mathsf{C}\_i\ \overline{x\_i} \to e\_i\}\_i : \beta \rightsquigarrow \pi\_0\Delta\_0 + \bigsqcup\_i \Delta\_i;\ C}$$

Fig. 4. Type inference rules for expressions

$$\frac{}{\Gamma \vdash \varepsilon} \qquad \frac{\begin{array}{c}\Gamma \vdash e : \tau \rightsquigarrow \Delta; C \quad \top \vdash\_{\mathrm{simp}} C \rightsquigarrow Q; \theta \quad \{\overline{\pi}\,\overline{\alpha}\} = \mathrm{fuv}(Q, \tau\theta) \\ \overline{p}, \overline{a}\ \text{fresh} \quad \Gamma, f : (\forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau\theta)[\overline{\alpha \mapsto a}, \overline{\pi \mapsto p}] \vdash prog\end{array}}{\Gamma \vdash f = e;\ prog}$$

$$\frac{\Gamma \vdash e : \sigma \rightsquigarrow \Delta; C \quad A = \forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau \quad Q \vdash\_{\mathrm{simp}} C \wedge \tau \sim \sigma \rightsquigarrow \top; \theta \quad \Gamma, f : A \vdash prog}{\Gamma \vdash f : A = e;\ prog}$$

Fig. 5. Type inference rules for programs

in some sense to C under the assumption Q. The idea underlying our simplification is to solve the type equality constraints in C as much as possible and then remove predicates that are implied by Q. Rules S-Fun, S-Data, S-Uni, and S-Triv are responsible for the former; they decompose type equality constraints and yield substitutions once either side becomes a unification variable. Rules S-Entail and S-Rem are responsible for the latter; they remove predicates implied by Q and then return the residual constraints. Rule S-Entail checks Q |= φ; a concrete method for this check will be discussed in Sect. 3.2.

Example 1 (*app*). Let us illustrate how the system infers a type for *app* = λf.λx.f x. We have the following derivation for its body λf.λx.f x:

$$\dfrac{\dfrac{\dfrac{f : \alpha\_f, x : \alpha\_x \vdash f : \alpha\_f \rightsquigarrow f^{1}; \top \qquad f : \alpha\_f, x : \alpha\_x \vdash x : \alpha\_x \rightsquigarrow x^{1}; \top}{f : \alpha\_f, x : \alpha\_x \vdash f\,x : \beta \rightsquigarrow f^{1}, x^{\pi};\ \alpha\_f \sim (\alpha\_x \to\_{\pi} \beta)}}{f : \alpha\_f \vdash \lambda x.f\,x : \alpha\_x \to\_{\pi\_x} \beta \rightsquigarrow f^{1};\ \alpha\_f \sim (\alpha\_x \to\_{\pi} \beta) \wedge \pi \le \pi\_x}}{\vdash \lambda f.\lambda x.f\,x : \alpha\_f \to\_{\pi\_f} \alpha\_x \to\_{\pi\_x} \beta \rightsquigarrow \emptyset;\ \alpha\_f \sim (\alpha\_x \to\_{\pi} \beta) \wedge \pi \le \pi\_x \wedge 1 \le \pi\_f}$$

The highlights in the above derivation are:

– In the last two steps, f is assigned type α<sub>f</sub> and multiplicity π<sub>f</sub>, and x is assigned type α<sub>x</sub> and multiplicity π<sub>x</sub>.

$$\frac{Q \vdash\_{\mathrm{simp}} \sigma \sim \sigma' \wedge \mu \le \mu' \wedge \mu' \le \mu \wedge \tau \sim \tau' \wedge C \rightsquigarrow Q'; \theta}{Q \vdash\_{\mathrm{simp}} (\sigma \to\_{\mu} \tau) \sim (\sigma' \to\_{\mu'} \tau') \wedge C \rightsquigarrow Q'; \theta}\ \text{S-Fun}$$

$$\frac{Q \vdash\_{\mathrm{simp}} \overline{\mu \le \mu' \wedge \mu' \le \mu} \wedge \overline{\sigma \sim \sigma'} \wedge C \rightsquigarrow Q'; \theta}{Q \vdash\_{\mathrm{simp}} (D\ \overline{\mu}\ \overline{\sigma}) \sim (D\ \overline{\mu'}\ \overline{\sigma'}) \wedge C \rightsquigarrow Q'; \theta}\ \text{S-Data}$$

$$\frac{\alpha \notin \mathrm{fuv}(\tau) \quad Q \vdash\_{\mathrm{simp}} C[\alpha \mapsto \tau] \rightsquigarrow Q'; \theta}{Q \vdash\_{\mathrm{simp}} \alpha \sim \tau \wedge C \rightsquigarrow Q'; \theta \circ [\alpha \mapsto \tau]}\ \text{S-Uni} \qquad \frac{Q \vdash\_{\mathrm{simp}} C \rightsquigarrow Q'; \theta}{Q \vdash\_{\mathrm{simp}} \tau \sim \tau \wedge C \rightsquigarrow Q'; \theta}\ \text{S-Triv}$$

$$\frac{Q \wedge Q\_{\mathrm{w}} \models \phi \quad Q \vdash\_{\mathrm{simp}} Q\_{\mathrm{w}} \wedge C \rightsquigarrow Q'; \theta}{Q \vdash\_{\mathrm{simp}} \phi \wedge Q\_{\mathrm{w}} \wedge C \rightsquigarrow Q'; \theta}\ \text{S-Entail} \qquad \frac{\text{no other rule applies}}{Q \vdash\_{\mathrm{simp}} Q\_{\mathrm{w}} \rightsquigarrow Q\_{\mathrm{w}}; \emptyset}\ \text{S-Rem}$$

Fig. 6. Simplification rules (modulo commutativity and associativity of ∧ and commutativity of ∼)


As a result, the type τ = α<sub>f</sub> →<sub>π<sub>f</sub></sub> α<sub>x</sub> →<sub>π<sub>x</sub></sub> β is inferred together with the constraint C = α<sub>f</sub> ∼ (α<sub>x</sub> →<sub>π</sub> β) ∧ π ≤ π<sub>x</sub> ∧ 1 ≤ π<sub>f</sub>.

Then, we try to assign a polytype to *app* by the rules in Fig. 5. By simplification, we have ⊤ ⊢<sub>simp</sub> C ❀ π ≤ π<sub>x</sub>; [α<sub>f</sub> ↦ (α<sub>x</sub> →<sub>π</sub> β)]. Thus, by generalizing τ[α<sub>f</sub> ↦ (α<sub>x</sub> →<sub>π</sub> β)] = (α<sub>x</sub> →<sub>π</sub> β) →<sub>π<sub>f</sub></sub> α<sub>x</sub> →<sub>π<sub>x</sub></sub> β with π ≤ π<sub>x</sub>, we obtain the following type for *app*:

$$\mathit{app} : \forall p\ p\_f\ p\_x\ a\ b.\ p \le p\_x \Rightarrow (a \to\_p b) \to\_{p\_f} a \to\_{p\_x} b$$

*Correctness* We first prepare some definitions for the correctness discussion. First, we allow substitutions θ to replace unification multiplicity variables as well as unification type variables. Then, we extend the notion of |= and write C |= C′ if C′θ holds whenever Cθ holds. From now on, we require substitutions to be idempotent, i.e., τθθ = τθ for any τ, which excludes, for example, the substitutions [α ↦ List α] and [α ↦ β, β ↦ Int]. We write Q |= θ = θ′ if Q |= τθ ∼ τθ′ for any τ. The restriction of a substitution θ to a domain X is written θ|<sub>X</sub>.

Consider a pair (Q<sub>g</sub>, C<sub>w</sub>), where we call Q<sub>g</sub> and C<sub>w</sub> the given and wanted constraints, respectively. A pair (Q, θ) is called a (sound) *solution* [31] for (Q<sub>g</sub>, C<sub>w</sub>) if Q<sub>g</sub> ∧ Q |= C<sub>w</sub>θ, dom(θ) ∩ fuv(Q<sub>g</sub>) = ∅, and dom(θ) ∩ fuv(Q) = ∅. A solution is called *guess-free* [31] if, in addition, it satisfies Q<sub>g</sub> ∧ C<sub>w</sub> |= Q ∧ ⋀<sub>π ∈ dom(θ)</sub>(π = θ(π)) ∧ ⋀<sub>α ∈ dom(θ)</sub>(α ∼ θ(α)). Intuitively, a guess-free solution consists of exactly the conditions required for the wanted constraint C<sub>w</sub> to hold, assuming the given constraint Q<sub>g</sub>. For example, for (⊤, α ∼ (β →<sub>1</sub> β)), the pair (⊤, [α ↦ (Int →<sub>1</sub> Int), β ↦ Int]) is a solution but not guess-free. Roughly speaking, (Q, θ) being a guess-free solution of (Q<sub>g</sub>, C<sub>w</sub>) means that (Q, θ) is equivalent to C<sub>w</sub> under the assumption Q<sub>g</sub>. There can be multiple guess-free solutions; for example, for (⊤, π ≤ 1), both (π ≤ 1, ∅) and (⊤, [π ↦ 1]) are guess-free solutions.

Lemma 10 (Soundness and Principality of Simplification). If Q ⊢<sub>simp</sub> C ❀ Q′; θ, then (Q′, θ) is a guess-free solution for (Q, C).

Lemma 11 (Completeness of Simplification). If (Q′, θ′) is a solution for (Q, C), where Q′ is satisfiable, then Q ⊢<sub>simp</sub> C ❀ Q′′; θ′′ for some Q′′ and θ′′.

Theorem 2 (Soundness of Inference). Suppose Γ ⊢ e : τ ❀ Δ; C, and there is a solution (Q, θ) for (⊤, C). Then, we have Q; Γθ; Δθ ⊢ e : τθ.

Theorem 3 (Completeness and Principality of Inference). Suppose Γ ⊢ e : τ ❀ Δ; C. Suppose also that Q′; Γθ′; Δ′ ⊢ e : τ′ for some substitution θ′ on unification variables such that dom(θ′) ⊆ fuv(Γ) and dom(θ′) ∩ fuv(Q′) = ∅. Then, there exists θ such that dom(θ) \ dom(θ′) ⊆ X, (Q′, θ) is a solution for (⊤, C), Q′ |= θ|<sub>dom(θ′)</sub> = θ′, Q′ |= τθ ∼ τ′, and Q′ |= Δθ ≤ Δ′, where X is the set of unification variables introduced in the derivation.

Note that the constraint generation Γ ⊢ e : τ ❀ Δ; C always succeeds, whereas the generated constraints may be conflicting. Theorem 3 states that this cannot happen when e is well-typed under the rules in Fig. 2.

*Incompleteness in Typing Programs.* It may sound contradictory to Theorem 3, but type inference is indeed incomplete for checking type-annotated bindings. Recall that the typing rule for type-annotated bindings requires the residual constraint after simplification to be ⊤. However, even when there exists a solution of the form (⊤, θ) for (Q, C), there may be no guess-free solution of this form. For example, (⊤, π ≤ π′) has a solution (⊤, [π ↦ π′]), but there is no guess-free solution of the required form. Moreover, even when a guess-free solution of the form (⊤, θ) exists, the simplification may not return it, as guess-free solutions are not always unique. For example, for (⊤, π ≤ π′ ∧ π′ ≤ π), (⊤, [π ↦ π′]) is a guess-free solution, whereas we have ⊤ ⊢<sub>simp</sub> π ≤ π′ ∧ π′ ≤ π ❀ π ≤ π′ ∧ π′ ≤ π; ∅. The source of the issue is that constraints on multiplicities can (also) be solved by substitutions.

Fortunately, this issue disappears when we consider disambiguation in Sect. 4. By disambiguation, we can eliminate constraints for internally-introduced multiplicity unification variables that are invisible from the outside. As a result, after processing equality constraints, we essentially need only consider rigid multiplicity variables when checking entailment for annotated top-level bindings.

*Promoting Equalities to Substitutions.* The inference can infer the polytypes ∀p. p ≤ 1 ⇒ Int →<sub>p</sub> Int and ∀p<sub>1</sub> p<sub>2</sub>. (p<sub>1</sub> ≤ p<sub>2</sub> ∧ p<sub>2</sub> ≤ p<sub>1</sub>) ⇒ Int →<sub>p<sub>1</sub></sub> Int →<sub>p<sub>2</sub></sub> Int, while programmers would prefer the simpler types Int →<sub>1</sub> Int and ∀p. Int →<sub>p</sub> Int →<sub>p</sub> Int; the simplification so far does not yield substitutions on multiplicity unification variables. Adding the following rule remedies the situation:

$$\frac{\pi \notin \mathrm{fuv}(Q) \quad \pi \neq \mu \quad Q \wedge Q\_{\mathrm{w}} \models \pi \le \mu \wedge \mu \le \pi \quad Q \vdash\_{\mathrm{simp}} (Q\_{\mathrm{w}} \wedge C)[\pi \mapsto \mu] \rightsquigarrow Q'; \theta}{Q \vdash\_{\mathrm{simp}} Q\_{\mathrm{w}} \wedge C \rightsquigarrow Q'; \theta \circ [\pi \mapsto \mu]}\ \text{S-Eq}$$

This rule says that if π = μ must hold for Q<sub>w</sub> ∧ C to hold, the simplification yields the substitution [π ↦ μ]. The condition π ∉ fuv(Q) is required for Lemma 10; a solution cannot substitute variables in Q. Note that this rule essentially finds an improving substitution [16].

Using the rule is optional. Our prototype implementation actually uses S-Eq only for Q<sub>w</sub> for which we can find μ easily: M ≤ 1, ω ≤ μ, and looping chains μ<sub>1</sub> ≤ μ<sub>2</sub> ∧ ⋯ ∧ μ<sub>n−1</sub> ≤ μ<sub>n</sub> ∧ μ<sub>n</sub> ≤ μ<sub>1</sub>.
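The looping-chain case can be sketched as follows (our illustration, not the paper's code, assuming inequalities between multiplicity variables represented as name pairs): every variable on a ≤-cycle must equal every other, so all of them can be substituted by one representative, as S-Eq would.

```python
# A sketch of finding improving substitutions from looping chains
# mu_1 <= mu_2 /\ ... /\ mu_n <= mu_1: all variables on a cycle are equal.

def improving_subst(ineqs):
    """ineqs: list of (a, b) meaning a <= b over variable names. Returns a
    substitution mapping every variable on a <=-cycle to one representative
    (here simply the lexicographically least one)."""
    succs = {v: {v} for ab in ineqs for v in ab}  # reachability sets
    changed = True
    while changed:                                # naive transitive closure
        changed = False
        for a, b in ineqs:
            if not succs[b] <= succs[a]:
                succs[a] |= succs[b]
                changed = True
    subst = {}
    for v in succs:
        scc = {u for u in succs[v] if v in succs[u]}  # mutually reachable
        rep = min(scc)
        if rep != v:
            subst[v] = rep
    return subst

# p1 <= p2 /\ p2 <= p3 /\ p3 <= p1 forces p1 = p2 = p3:
assert improving_subst([("p1", "p2"), ("p2", "p3"), ("p3", "p1")]) == {
    "p2": "p1", "p3": "p1"
}
```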

#### 3.2 Entailment Checking by Horn SAT Solving

The simplification rules rely on the check of entailment Q |= φ. For the constraints in this system, we can perform this check in quadratic time at worst but in linear time for most cases. Specifically, we reduce the checking Q |= φ to satisfiability of propositional Horn formulas (Horn SAT), which is known to be solved in linear time in the number of occurrences of literals [10], where the reduction (precisely, the preprocessing of the reduction) may increase the problem size quadratically. The idea of using Horn SAT for constraint solving in linear typing can be found in Mogensen [23].

First, as a preprocessing step, we normalize both given and wanted constraints by the following rules:


After this, each predicate φ has the form μ ≤ ∏i νi.
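As an illustration of the target form, the sketch below flattens predicates over the {1, ω} domain, where (·) acts as a join, so μ1 · μ2 ≤ M holds iff both μ1 ≤ M and μ2 ≤ M, and μ ≤ ω holds trivially. The list-of-atoms encoding and the exact set of simplifications are our assumptions rather than the paper's rules:

```python
# A predicate is (lhs, rhs): two products of atoms, each a list over
# "1", "w" (omega), and variable names. normalize() rewrites predicates
# into the target form mu <= nu1 * ... * nuk with an atomic left-hand
# side: a product on the left splits into one predicate per factor,
# trivially true predicates (1 <= M, or M containing the factor w) are
# dropped, and unit factors are removed from the right-hand side.

def normalize(constraint):
    out = []
    for lhs, rhs in constraint:
        if "w" in rhs:
            continue                 # mu <= ...*w*... holds for any mu
        rhs = [a for a in rhs if a != "1"] or ["1"]  # drop unit factors
        for mu in lhs:
            if mu == "1":
                continue             # 1 <= M holds trivially
            out.append((mu, rhs))
    return out
```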

After the normalization above, we can reduce the entailment checking to satisfiability. Specifically, we use the following property:

$$Q \models \mu \le \prod_i \nu_i \quad \text{iff} \quad Q \land \bigwedge_i (\nu_i \le 1) \land (\omega \le \mu)\ \text{is unsatisfiable}$$

Here, the constraint Q ∧ ⋀i (νi ≤ 1) ∧ (ω ≤ μ) intuitively asserts that there exists a counterexample to Q |= μ ≤ ∏i νi.

Then, it is straightforward to reduce the satisfiability of Q to Horn SAT: we simply map 1 to true and ω to false, and accordingly map ≤ and (·) to ⇐ and ∧, respectively. Since Horn SAT can be solved in time linear in the number of occurrences of literals [10], the reduction also shows that the satisfiability of a normalized Q can be checked in time linear in the size of Q.
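To make the reduction concrete, the following Python sketch encodes a normalized predicate μ ≤ ν1 ··· νk as a pair `(mu, [nu1, ..., nuk])` and decides entailment via the unsatisfiability check above. The encoding is our assumption, and the solver is plain forward chaining rather than the linear-time algorithm of [10]:

```python
# Under the mapping 1 |-> true, w |-> false (w stands for omega),
# <= |-> reverse implication, and (*) |-> conjunction, the predicate
# mu <= nu1*...*nuk becomes the Horn clause nu1 /\ ... /\ nuk -> mu.

def horn_sat(clauses):
    """Forward chaining: True iff the Horn clauses are satisfiable.
    A clause is (body, head); atoms are "1", "w", or variable names."""
    true_atoms = {"1"}
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if "w" in body or head in true_atoms:
                continue             # vacuous clause, or already derived
            if all(b in true_atoms for b in body):
                if head == "w":      # derived false: unsatisfiable
                    return False
                true_atoms.add(head)
                changed = True
    return True

def entails(q, goal):
    """Check Q |= mu <= nu1*...*nuk: Q together with each nu_i <= 1
    and w <= mu must be unsatisfiable."""
    mu, nus = goal
    clauses = [(frozenset(body), head) for head, body in q]
    clauses += [(frozenset({"1"}), nu) for nu in nus]  # each nu_i <= 1
    clauses.append((frozenset({mu}), "w"))             # w <= mu
    return not horn_sat(clauses)
```

For example, with Q = (p ≤ q ∧ q ≤ r), `entails` confirms Q |= p ≤ r but refutes Q |= r ≤ p.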

**Corollary 1.** Checking Q |= φ can be done in linear time if Q and φ are normalized.

The normalization of constraints can duplicate the M of ··· ≤ M, and can thus increase the size quadratically in the worst case. Fortunately, this quadratic increase is uncommon, because the size of M is bounded in practice, in many cases by one. Among the rules in Fig. 2, the only rule that introduces a non-singleton M on the right-hand side of ≤ is Case, for a constructor whose arguments' multiplicities are non-constant, such as MkMany : ∀p a. a →p Many p a. However, it often suffices to use non-multiplicity-parameterized constructors, such as Cons : ∀a. a →1 List a →1 List a, because such constructors can be used to construct and deconstruct both linear and unrestricted data.

#### 3.3 Issue: Inference of Ambiguous Types

The inference system so far looks nice; the system is sound and complete, and infers principal types. However, there still exists an issue to overcome for the system to be useful: it often infers ambiguous types [15, 27] in which internal multiplicity variables leak out to reveal internal implementation details.

Consider *app′* = λf.λx.*app* f x for *app* = λf.λx.f x from Example 1. We would expect equivalent types to be inferred for *app* and *app′*. However, this is not the case in the inference system. In fact, the system infers the following type for *app′* (here we reproduce the inferred type of *app* for comparison):

$$\begin{array}{ll}
\mathit{app} & : \forall p\, p_f\, p_x\, a\, b.\ (p \le p_x) \Rightarrow (a \to_p b) \to_{p_f} a \to_{p_x} b \\
\mathit{app}' & : \forall q\, q_f\, q_x\, p_f\, p_x\, a\, b.\ (q \le q_x \land q_f \le p_f \land q_x \le p_x) \Rightarrow (a \to_q b) \to_{p_f} a \to_{p_x} b
\end{array}$$

We highlight why this type is inferred as follows.


Then, for the gathered constraints, by simplification (including S-Eq), we obtain a (guess-free) solution (Q, θ) such that Q = (π′f ≤ πf ∧ π ≤ π′x ∧ π′x ≤ πx) and θ = [αf ↦ (α →π β), π1 ↦ π′f, β′ ↦ (α →π′x β), π2 ↦ π′x, γ ↦ β]. Then, after generalizing (αf →πf αx →πx γ)θ = (α →π β) →πf α →πx β, we obtain the inferred type above.

There are two problems with this inference result:


Inference of ambiguous types is common in the system; it is easily caused by using defined variables. Rejecting ambiguous types is not a solution in our case, because that would reject many programs. Defaulting such ambiguous type-level variables to 1 or ω is not a solution either, because it loses principality in general. However, we would have no choice other than to reject ambiguous types as long as multiplicities are relevant to runtime behavior.

In the next section, we show how we address the ambiguity issue under the assumption that multiplicities are irrelevant at runtime. Under this assumption, there is no problem in having multiplicity-monomorphic primitives, such as array-processing primitives (e.g., *readMArray* : ∀a. MArray a →1 Int →ω (MArray a ⊗ Un a)) [31]. Note that this assumption does not rule out all multiplicity-polymorphic primitives; it just prohibits the primitives from inspecting multiplicities at runtime.

# 4 Disambiguation by Quantifier Elimination

In this section, we address the issue of ambiguous and leaky types by using quantifier elimination. The basic idea is simple: we just view the type of *app′* as

$$\mathit{app}' : \forall q\, p_f\, p_x\, a\, b.\ (\exists q_x\, q_f.\ q \le q_x \land q_f \le p_f \land q_x \le p_x) \Rightarrow (a \to_q b) \to_{p_f} a \to_{p_x} b$$

In this case, the constraint (∃qx qf. q ≤ qx ∧ qf ≤ pf ∧ qx ≤ px) is logically equivalent to q ≤ px, and thus we can infer equivalent types for both *app* and *app′*. Fortunately, such quantifier elimination is always possible for our representation of constraints; that is, for any ∃p.Q, there always exists a Q′ that is logically equivalent to ∃p.Q. A technical subtlety is that, although we performed quantifier elimination after generalization in the above explanation, we actually perform it just before generalization, or more precisely, as the final step of simplification, for compatibility with the simplification in OutsideIn(X) [31], especially in the treatment of local assumptions.

#### 4.1 Elimination of Existential Quantifiers

The elimination of existential quantifiers is rather easy; we simply use the well-known fact that a disjunction of a Horn clause and a definite clause can also be represented as a Horn clause. In terms of our encoding of normalized predicates (Sect. 3.2), which maps μ ≤ M to a Horn clause, the fact can be rephrased as:

$$\text{Lemma 12. } (\mu \le M \lor \omega \le M') \equiv \mu \le M \cdot M'. \tag{7}$$

Here, we extend constraints to include ∨ and write ≡ for logical equivalence; that is, Q ≡ Q′ if and only if Q |= Q′ and Q′ |= Q.

As a corollary, we obtain the following result:

Corollary 2. There effectively exists a quantifier-free constraint Q′, denoted by elim(∃π.Q), such that Q′ is logically equivalent to ∃π.Q.

Proof. Note that ∃π.Q means Q[π ↦ 1] ∨ Q[π ↦ ω], because π ranges over {1, ω}. We safely assume that Q is normalized (Sect. 3.2) and that Q does not contain a predicate π ≤ M where π also appears in M, because such a predicate trivially holds.

We define Φ1, Φω, and Qrest as Φ1 = {μ ≤ M | (μ ≤ π · M) ∈ Q, μ ≠ π}, Φω = {ω ≤ M | (π ≤ M) ∈ Q, π ∉ fuv(M)}, and Qrest = {φ | φ ∈ Q, π ∉ fuv(φ)}. Here, we abuse notation and write φ ∈ Q to mean that Q = ⋀i φi and φ = φi for some i. In the construction of Φ1, we assume the monoid laws of (·); the definition says that we remove π from the right-hand sides, where M becomes 1 if the right-hand side is just π. By construction, Q[π ↦ 1] and Q[π ↦ ω] are equivalent to (⋀Φ1) ∧ Qrest and (⋀Φω) ∧ Qrest, respectively. Thus, by Lemma 12 and the distributivity of ∨ over ∧, it suffices to define Q′ as Q′ = (⋀{μ ≤ M · M′ | (μ ≤ M) ∈ Φ1, (ω ≤ M′) ∈ Φω}) ∧ Qrest.

Example 2. Consider Q = (π′f ≤ πf ∧ π ≤ π′x ∧ π′x ≤ πx); this is the constraint obtained from λf.λx.*app* f x (Sect. 3.3). Since π′f and π′x do not appear in the inferred type (α →π β) →πf α →πx β, we want to eliminate them by the above step. There is freedom in choosing which variable to eliminate first; here, we choose π′f first.

First, we have elim(∃π′f. Q) = (π ≤ π′x ∧ π′x ≤ πx), because for this case we have Φ1 = ∅, Φω = {ω ≤ πf}, and Qrest = (π ≤ π′x ∧ π′x ≤ πx). We then have elim(∃π′x. π ≤ π′x ∧ π′x ≤ πx) = (π ≤ πx), because for this case we have Φ1 = {π ≤ 1}, Φω = {ω ≤ πx}, and Qrest = ∅.
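The construction from the proof of Corollary 2 can be sketched directly in code. In this Python illustration, the `(mu, [nu, ...])` encoding of normalized predicates and the variable names `pf1`/`px1` standing in for π′f/π′x are our assumptions:

```python
# elim(pi, q) computes elim(∃pi. Q) following the proof of Corollary 2:
# phi1 collects mu <= M with pi removed from the right-hand side, phiw
# collects the predicates pi <= M as w <= M, rest keeps predicates not
# mentioning pi, and Lemma 12 combines each pair from phi1 and phiw into
# mu <= M * M'. Predicates pi <= pi * M are trivially true and dropped.

def elim(pi, q):
    phi1, phiw, rest = [], [], []
    for mu, nus in q:
        if mu != pi and pi in nus:
            phi1.append((mu, [n for n in nus if n != pi]))  # empty = 1
        elif mu == pi and pi not in nus:
            phiw.append(("w", nus))
        elif mu == pi:
            pass            # pi <= pi * M holds trivially
        else:
            rest.append((mu, nus))
    combined = [(mu, m + mw) for mu, m in phi1 for _, mw in phiw]
    return combined + rest
```

Eliminating `pf1` from the constraint of Sect. 3.3 leaves π ≤ π′x ∧ π′x ≤ πx, and then eliminating `px1` yields π ≤ πx, reproducing the two steps of this example.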

In the worst case, the size of elim(∃π.Q) can be quadratic in that of Q. Thus, repeated elimination can make the constraints exponentially bigger. We believe that such blow-ups rarely happen, because usually π occurs only in a few predicates in Q. Also, recall that non-singleton right-hand sides are caused only by multiplicity-parameterized constructors. When each right-hand side of ≤ in Q is a singleton, the same holds in elim(∃π.Q); for such a case, exponential blow-up cannot happen, because the size of constraints in this form is at most quadratic in the number of multiplicity variables.

#### 4.2 Modified Typing Rules

As mentioned at the beginning of this section, we perform quantifier elimination as the last step of simplification. To do so, we define Q ⊢τsimp C ❀ Q′; θ as follows:

$$\frac{Q \vdash_{\mathrm{simp}} C \rightsquigarrow Q'; \theta \qquad \{\overline{\pi}\} = \mathsf{fuv}(Q') \setminus \mathsf{fuv}(\tau\theta) \qquad Q'' = \mathrm{elim}(\exists \overline{\pi}.\, Q')}{Q \vdash_{\mathrm{simp}}^{\tau} C \rightsquigarrow Q''; \theta}$$

Here, τ is used to determine which unification variables will be ambiguous after generalization. For simplicity, we identify the variables (π̄ above) that are not in τ as ambiguous [15]. This check is conservative with respect to a more general definition of ambiguity [27], in which, for example, ∀p r a. (p ≤ r ∧ r ≤ p) ⇒ a →p a is not judged ambiguous because r is determined by p.

Then, we replace the original simplification with the above-defined version.

$$\frac{\Gamma \vdash e : \tau \rightsquigarrow \Delta; C \qquad \top \vdash_{\mathrm{simp}}^{\tau} C \rightsquigarrow Q; \theta \qquad \{\overline{\pi}\,\overline{\alpha}\} = \mathsf{fuv}(Q, \tau\theta) \qquad \overline{p}, \overline{a}\ \text{fresh} \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\,(Q \Rightarrow \tau\theta)[\overline{\alpha \mapsto a}, \overline{\pi \mapsto p}] \vdash \mathsf{prog}}{\Gamma \vdash f = e; \mathsf{prog}}$$

$$\frac{\Gamma \vdash e : \sigma \rightsquigarrow \Delta; C \qquad Q \vdash_{\mathrm{simp}}^{\sigma} C \land \tau \sim \sigma \rightsquigarrow \top; \theta \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau \vdash \mathsf{prog}}{\Gamma \vdash f : (\forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau) = e; \mathsf{prog}}$$

Here, the only change from the original rules is the use of the τ-indexed simplification judgment.

Example 3. Consider (Q, θ) in Sect. 3.3 such that Q = (π′f ≤ πf ∧ π ≤ π′x ∧ π′x ≤ πx) and θ = [αf ↦ (α →π β), π1 ↦ π′f, β′ ↦ (α →π′x β), π2 ↦ π′x, γ ↦ β], which is obtained after simplification of the gathered constraint. Following Example 2, eliminating the variables that do not appear in τθ = (α →π β) →πf α →πx β yields the constraint π ≤ πx. As a result, by generalization, we obtain the polytype

$$\forall q \, p\_f \, p\_x \, a \, b. \ (q \le p\_x) \Rightarrow (a \to\_q b) \to\_{p\_f} a \to\_{p\_x} b$$

for *app′*, which is equivalent to the inferred type of *app*.

Note that the pair (Q′, θ) in Q ⊢τsimp C ❀ Q′; θ is no longer a solution of (Q, C), because C can mention eliminated variables. However, it is safe to use this version when generalization takes place, because, for variables q that do not occur in τ, ∀p q a. Q ⇒ τ and ∀p a. Q′ ⇒ τ have the same set of monomorphic instances if ∃q.Q is logically equivalent to Q′. Note that in this type system simplification happens only right before (implicit) generalization.

# 5 Extension to Local Assumptions

In this section, following OutsideIn(X) [31], we extend our system with local assumptions, which enable us to have **let**s and GADTs. We focus on the treatment of **let**s in this section because type inference for **let**s involves a linearity-specific concern: the multiplicity of a **let**-bound variable.

#### 5.1 "Let Should Not Be Generalized" for Our Case

We first argue that "**let** should not be generalized" [31] even in our case. That is, generalization of **let** sometimes results in counter-intuitive typing and conflicts with the discussions so far.

Consider the following program:

$$h = \lambda f.\lambda k.\,\mathbf{let}\ y = f\,(\lambda x.\,k\ x)\ \mathbf{in}\ 0$$

Suppose for simplicity that f and k have types (a →π1 b) →π2 c and a →π3 b, respectively (here we focus only on the treatment of multiplicity). Then, f (λx.k x) has type c under the constraint π3 ≤ π1. Thus, after generalization, y has type π3 ≤ π1 ⇒ c, where π3 and π1 are neither generalized nor eliminated because they escape from the definition of y. As a result, h has type ∀p1 p2 p3 a b c. ((a →p1 b) →p2 c) →ω (a →p3 b) →ω Int; there is no constraint p3 ≤ p1 because the definition of y yields no constraint. This nonexistence of the constraint is counter-intuitive, because users wrote f (λx.k x) while the constraint for that expression is not imposed. In particular, no error occurs even when f : (a →1 b) →1 c and k : a →ω b, although f (λx.k x) is illegal for this case. Also, if we change 0 to y, the error happens at the use site instead of the definition site. Moreover, the type is fragile, as it depends on whether y occurs or not; for example, if we change 0 to *const* 0 y, where *const* = λa.λb.a, the type of h changes to ∀p1 p2 p3 a b c. (p3 ≤ p1) ⇒ ((a →p1 b) →p2 c) →ω (a →p3 b) →ω Int. In this discussion, we did not consider type-equality constraints, but there is no legitimate reason why type-equality constraints should be solved on the fly when typing y.

As demonstrated in the above example, "**let** should not be generalized" [30, 31] in our case. Thus, we adopt the same principle as OutsideIn(X): **let** is generalized only if users write a type annotation for it [31]. This principle is also adopted in GHC (as of 6.12.1, when the language option MonoLocalBinds is turned on), with a slight relaxation to generalize closed bindings.

#### 5.2 Multiplicity of Let-Bound Variables

Another issue with **let**-generalization, which is specific to linear typing, is that a generalization result depends on the multiplicity of the **let**-bound variable. Let us consider the following program, where we want to generalize the type of y (even without a type annotation):

$$g = \lambda x.\,\mathbf{let}\ y = \lambda f.\,f\ x\ \mathbf{in}\ y\ \mathit{not}$$

Suppose for simplicity that *not* has type Bool →1 Bool and that x already has type Bool when typing the **let**. Then, y's body λf.f x has the monotype (Bool →π r) →π r with no constraints (on multiplicity). There are two possible generalization results, depending on the multiplicity πy of y, because the use of x also escapes in the type system.


A difficulty here is that πy needs to be determined at the definition of y, while constraints on πy are obtained only from the uses of y.

Our design choice is the latter; the multiplicity of a generalizable **let**-bound variable is ω in the system. One justification for this choice is that a motivation of polymorphic typing is to enhance reusability, while reuse is not possible for variables with multiplicity 1. Another justification is compatibility with recursive definitions, where recursively-defined variables must have multiplicity ω; it might be confusing, for example, if the multiplicity of a list-manipulation function changes after we change its definition from an explicit recursion to *foldr* .

#### 5.3 Inference Rule for Lets

In summary, the following are our criteria for **let** generalization:


This idea can be represented by the following typing rule:

$$\frac{\begin{array}{c}
\Gamma \vdash e_1 : \tau_1 \rightsquigarrow \Delta_1; C_1 \qquad \{\overline{\pi}\,\overline{\alpha}\} = \mathsf{fuv}(\tau_1, C_1) \setminus \mathsf{fuv}(\Gamma) \\
C_1' = \exists \overline{\pi}\,\overline{\alpha}.\,(Q \Vdash^{\tau_1} C_1 \land \tau \sim \tau_1) \\
\Gamma, x : (\forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau) \vdash e_2 : \tau_2 \rightsquigarrow \Delta_2, x^M; C_2
\end{array}}{\Gamma \vdash \mathbf{let}\ x : (\forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau) = e_1\ \mathbf{in}\ e_2 : \tau_2 \rightsquigarrow \omega\Delta_1 + \Delta_2; C_1' \land C_2}\ \text{LetA}$$

(We do not discuss non-generalizable **let**s because they are typed as (λx.e2) e1.) Constraints of the form ∃πα.(Q ⊩τ1 C1 ∧ τ ∼ τ1) above are called implication constraints [31]; they state that the entailment must hold by instantiating only the unification variables in πα. Implication constraints play two roles. One is to delay checking, because τ1 and C1 contain unification variables that will be made concrete only after this point, by solving C2. The other is to guard constraints: since the constraints C1 ∧ τ ∼ τ1 hold under the assumption Q, it is not safe to substitute variables outside πα when solving them, because the equivalence might be a consequence of Q; recall that Q affects type equality. We note a slight deviation from the original approach [31]: an implication constraint in our system is annotated with τ1 to identify the subset of {πα} for which a unique solution is not required and thus quantifier elimination is possible, similarly to Sect. 4.

#### 5.4 Solving Constraints

Now, the set of constraints is extended to include implication constraints.

$$C ::= \bigwedge_i \psi_i \qquad \psi ::= \cdots \mid \exists \overline{\pi}\,\overline{\alpha}.\,(Q \Vdash^{\tau} C)$$

As mentioned above, an implication constraint ∃πα.(Q ⊩τ C) means that Q |= C must hold after substituting appropriate values for π and α, where we do not require uniqueness of solutions for the unification variables that do not appear in τ. That is, Q ⊢τsimp C ❀ ⊤; θ must hold for some θ with dom(θ) ⊆ {πα}.

Then, following OutsideIn(X) [31], we define the solving judgment πα. Q ⊢τsolv C ❀ Q′; θ, which states that we solve (Q, C) as (Q′, θ), where θ touches only variables in πα and τ is used for disambiguation (Sect. 4). Let us write impl(C) for all the implication constraints in C, and simpl(C) for the rest. Then, we can define the inference rule for the judgment simply by recursive simplification, similarly to the original [31].

$$\frac{\overline{\pi}\,\overline{\alpha}.\;Q \vdash_{\mathrm{simpl}}^{\tau} \mathrm{simpl}(C) \rightsquigarrow Q_r; \theta \qquad \{\overline{\pi_i}\,\overline{\alpha_i}.\;Q \land Q_i \land Q_r \vdash_{\mathrm{solv}}^{\tau_i} C_i \rightsquigarrow \top; \theta_i\}_{(\exists \overline{\pi_i}\,\overline{\alpha_i}.\,(Q_i \Vdash^{\tau_i} C_i)) \in \mathrm{impl}(C\theta)}}{\overline{\pi}\,\overline{\alpha}.\;Q \vdash_{\mathrm{solv}}^{\tau} C \rightsquigarrow Q_r; \theta}$$

Here, πα. Q ⊢τsimpl C ❀ Qr; θ is a simplification relation defined similarly to Q ⊢τsimp C ❀ Qr; θ, except that we are allowed to touch only variables in πα. We omit the concrete rules for this version of the simplification relation because they are straightforward, except that unification caused by S-Uni and S-Eq and quantifier elimination (Sect. 4) are allowed only for variables in {πα}.

Accordingly, we also change the typing rules for bindings to use the solving relation instead of the simplification relation.

$$\frac{\Gamma \vdash e : \tau \rightsquigarrow \Delta; C \qquad \mathsf{fuv}(C, \tau).\;\top \vdash_{\mathrm{solv}}^{\tau} C \rightsquigarrow Q; \theta \qquad \{\overline{\pi}\,\overline{\alpha}\} = \mathsf{fuv}(Q, \tau\theta) \qquad \overline{p}, \overline{a}\ \text{fresh} \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\,(Q \Rightarrow \tau\theta)[\overline{\alpha \mapsto a}, \overline{\pi \mapsto p}] \vdash \mathsf{prog}}{\Gamma \vdash f = e; \mathsf{prog}}$$

$$\frac{\Gamma \vdash e : \sigma \rightsquigarrow \Delta; C \qquad \mathsf{fuv}(C, \sigma).\;Q \vdash_{\mathrm{solv}}^{\sigma} C \land \tau \sim \sigma \rightsquigarrow \top; \theta \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau \vdash \mathsf{prog}}{\Gamma \vdash f : (\forall \overline{p}\,\overline{a}.\,Q \Rightarrow \tau) = e; \mathsf{prog}}$$

Above, there are no unification variables other than fuv(C, τ ) or fuv(C, σ).

The definition of the solving judgment and the updated inference rules for programs are the same as those in the original OutsideIn(X) [31], except for the τ used for disambiguation. This is one of the advantages of being based on OutsideIn(X).

# 6 Implementation and Evaluation

In this section, we evaluate the proposed inference method using our prototype implementation. We first report the types inferred for functions from Prelude, to see whether the inferred types are reasonably simple. We then report a performance evaluation that measures the efficiency of type inference and the overhead due to entailment checking and quantifier elimination.

#### 6.1 Implementation

The implementation follows the present paper except for a few points. Following the implementation of OutsideIn(X) in GHC, our type checker keeps a natural number, which we call an implication level, corresponding to the depth of implication constraints, and each unification variable likewise keeps the implication level at which it was introduced. As usual, we represent unification variables by mutable references. We perform unification on the fly by destructive assignment, while unification of variables that have smaller implication levels than the current level is recorded for later checking of implication constraints; such a variable cannot be in πα of ∃πα.(Q ⊩τ C). The implementation supports GADTs, because they can be implemented rather easily by extending the constraints Q to include type equalities, but does not support type classes, because handling them requires another X for OutsideIn(X).

Although we could use a linear-time Horn SAT algorithm [10] for checking Q |= φ, the implementation uses a general SAT solver based on DPLL [8, 9], because the unit propagation in DPLL works efficiently for Horn formulas. We do not use external solvers, such as Z3, as we conjecture that the formulas are usually small, so the overhead of invoking an external solver would be high.

```
(◦)       : (q ≤ s ∧ q ≤ t ∧ p ≤ t) ⇒ (b →q c) →r (a →p b) →s a →t c
curry     : (p ≤ r ∧ p ≤ s) ⇒ ((a ⊗ b) →p c) →q a →r b →s c
uncurry   : (p ≤ s ∧ q ≤ s) ⇒ (a →p b →q c) →r (a ⊗ b) →s c
either    : (p ≤ r ∧ q ≤ r) ⇒ (a →p c) →ω (b →q c) →ω Either a b →r c
foldr     : (q ≤ r ∧ p ≤ s ∧ q ≤ s) ⇒ (a →p b →q b) →ω b →r List a →s b
foldl     : (p ≤ r ∧ r ≤ s ∧ q ≤ s) ⇒ (b →p a →q b) →ω b →r List a →s b
map       : (p ≤ q) ⇒ (a →p b) →ω List a →q List b
filter    : (a →p Bool) →ω List a →ω List a
append    : List a →p List a →q List a
reverse   : List a →p List a
concat    : List (List a) →p List a
concatMap : (p ≤ q) ⇒ (a →p List b) →ω List a →q List b
```
Fig. 7. Inferred types for selected functions from Prelude (quantifications are omitted)

#### 6.2 Functions from Prelude

We show how our type inference system works for some polymorphic functions from Haskell's Prelude. Since we have not implemented type classes and I/O in our prototype implementation, and since we can define copying or discarding functions for concrete first-order datatypes, we focus on the unqualified polymorphic functions. Also, we do not consider functions that are obviously unrestricted, such as *head* and *scanl*, in this examination. In the implementations of the examined functions, we use definitions that are as natural as possible. For example, a linear-time accumulative definition is used for *reverse*. Some functions can be defined both by explicit recursion and via *foldr*/*foldl*; among the examined functions, *map*, *filter*, *concat*, and *concatMap* can be defined by *foldr*, and *reverse* can be defined by *foldl*. For such cases, both versions are tested.

Fig. 7 shows the inferred types for the examined functions. Since the inferred types coincide for the two variations (by explicit recursions or by folds) of *map*, *filter* , *append*, *reverse*, *concat*, and *concatMap*, the results do not refer to these variations. Most of the inferred types look unsurprising, considering the fact that the constraint p ≤ q is yielded usually when an input that corresponds to q is used in an argument that corresponds to p. For example, consider *foldr* f e *xs*. The constraint q ≤ r comes from the fact that e (corresponding to r) is passed as the second argument of f (corresponding to q) via a recursive call. The constraint p ≤ s comes from the fact that the head of *xs* (corresponding to s) is used as the first argument of f (corresponding to p). The constraint q ≤ s comes from the fact that the tail of *xs* is used in the second argument of *f* . A little explanation is needed for the constraint r ≤ s in the type of *foldl*, where both r and s are associated with types with the same polarity. Such constraints usually come from recursive definitions. Consider the definition of *foldl*:

$$\mathit{foldl} = \lambda f.\lambda e.\lambda x.\,\mathbf{case}\ x\ \mathbf{of}\ \{\mathsf{Nil} \to e;\ \mathsf{Cons}\ a\ y \to \mathit{foldl}\ f\ (f\ e\ a)\ y\}$$

Here, we find that a, a component of x (corresponding to s), appears in the second argument of *foldl* (corresponding to r), which yields the constraint r ≤ s. Note that the inference results do not contain →1; recall that there is no problem in using unrestricted inputs linearly, and thus the multiplicity of a linear input can be arbitrary. The results also show that the inference algorithm successfully detected that *append*, *reverse*, and *concat* are linear functions.

It is true that these inferred types indeed leak some internal details into their constraints, but those constraints can be understood only from their extensional behaviors, at least for the examined functions. Thus, we believe that the inferred types are reasonably simple.

#### 6.3 Performance Evaluation

We measured the elapsed time for type checking and the overhead of implication checking and quantifier elimination. The following programs were examined in the experiments: funcs, the functions in Fig. 7; gv, an implementation of simple communication in the session-type system GV [17], taken from [18, Section 4] with some modifications;<sup>6</sup> app1, a pair of the definitions of *app* and *app′*; and app10, a pair of the definitions of *app′* and *app10′* = λf.λx. *app′* (··· (*app′* f) ···) x, in which *app′* is applied 10 times. The former two programs are intended to be miniatures of typical programs; the latter two are intended to measure the overhead of quantifier elimination. Although the examined programs are very small, they all involve the ambiguity issue. For example, consider the following fragment of the program gv:

```
answer : Int = fork prf calculator $ \c -> left c & \c ->
               send (MkUn 3) c & \c -> send (MkUn 4) c & \c ->
               recv c & \(MkUn z, c) -> wait c & \() -> MkUn z
```
(Here, we use our paper's syntax instead of that of the actual examined code.) Both \$ and & are operator versions of *app*, where & takes its arguments in flipped order. Besides the treatment of multiplicities, disambiguation is crucial for this expression to have type Int.

The experiments were conducted on a MacBook Pro (13-inch, 2017) with Mac OS 10.14.6, 3.5 GHz Intel Core i7 CPU, and 16 GB memory. GHC 8.6.5 with -O2 was used for compiling our prototype system.

Table 1 lists the experimental results. Each elapsed time is the average of 1,000 executions for the first two programs and of 10,000 executions for the last two. All columns are self-explanatory except for the # column, which counts the number of executions of the corresponding procedures. We note that the current implementation restricts Qw in S-Entail to be ⊤ and removes redundant constraints afterward; this is why the number of SAT-solving runs in app1 is four instead of two. For the artificial programs (app1 and app10), the overhead is not significant; the typing cost grows faster than the SAT/QE costs. In contrast, the results for the other two programs show that SAT solving becomes heavy for higher-order programs (funcs) and quantifier elimination becomes heavy for combinator-heavy programs (gv), although we believe that the overhead would still be acceptable. Since we currently use naive algorithms for both procedures, there is much room to reduce the overhead. For example, if users annotate most general types, the simplification often invokes trivial checks ⋀i φi |= φi; special treatment of such cases would reduce the overhead.

<sup>6</sup> We changed the type of fork to Dual s s′ →ω (Ch s →1 Ch End) →1 (Ch s′ →1 Un r) →1 r, as their type Dual s s′ ⇒ (Ch s →1 Ch End) →1 Ch s′ is incorrect for the multiplicity-erasing semantics. A minor difference is that we used a GADT to witness duality, because our prototype implementation does not support type classes.

# 7 Related Work

Borrowing the terminology from Bernardy et al. [7], there are two approaches to linear typing: linearity via arrows and linearity via kinds. The former approaches manage how many times an assumption (i.e., a variable) can be used; for example, in the linear λ-calculus of Wadler [33], there are two sorts of variables, linear and unrestricted, where the latter can only be obtained by decomposing **let** !x = e1 **in** e2. Since the primitive sources of assumptions are arrow types, it is natural to annotate them with their arguments' multiplicities [7, 12, 22]. For multiplicities, we focused on 1 and ω, following Linear Haskell [6, 7, 26]. Although {1, ω} is already useful for some domains, including reversible computation [19, 35] and quantum computation [2, 25], handling more general multiplicities, such as {0, 1, ω} and arbitrary semirings [12], is an interesting future direction. Our discussions in Sects. 2 and 3, similarly to Linear Haskell [7], could be extended to more general domains with small modifications. In contrast, we rely on the particular domain {1, ω} of multiplicities for the crucial points of our inference, i.e., entailment checking and quantifier elimination. The linearity analysis of Igarashi and Kobayashi [14] for the π-calculus, which assigns input/output usage (multiplicities) to channels, is similar to linearity via arrows. Multiplicity 0 is important in their analysis to identify input-only and output-only channels. They solve constraints on multiplicities separately in polynomial time, leveraging the monotonicity of the multiplicity operators with respect to the ordering 0 ≤ 1 ≤ ω. Here, 0 ≤ 1 comes from the fact that 1 in their system means "at most once" instead of "exactly once".

The "linearity via kinds" approaches distinguish types whose values are treated linearly from types whose values are not [21, 24, 28], where the distinction is usually represented by kinds [21, 28]. Interestingly, they also have two function types—function types that belong to the linear kind and those that belong to the unrestricted kind—because the kind of a function type cannot be determined solely by its argument and return types. Mazurak et al. [21] use subkinding to avoid explicit conversions from unrestricted values to linear ones. However, due to the variations of the function types, a function can have multiple incompatible types; e.g., the function *const* can have four incompatible types in their system [24]. Universal types accompanied by kind abstraction [28] address the issue to some extent; this works well for *const*, but still gives two incomparable types to the function composition (◦) [24]. Morris [24] addresses this issue of principality with qualified typing [15]. Two forms of predicates are considered in that system: Un τ states that τ belongs to the unrestricted kind, and σ ≤ τ states that Un σ implies Un τ. The system is considerably simpler than the previous ones. Turner et al.'s type-based usage analysis [29] has a similarity to linearity via kinds; in their system, each type is annotated by a usage (a multiplicity), as in (List Intω)ω. Wansbrough and Peyton Jones [34] extended the system with polymorphic types and subtyping with respect to multiplicities, and discussed multiplicity polymorphism. Mogensen [23] is a similar line of work, which reduces constraint solving on multiplicities to Horn SAT. His system concerns multiplicities {0, 1, ω} with ordering 0 ≤ 1 ≤ ω, and his constraints can involve more operations, including additions and multiplications, but only on the left-hand side of ≤.

Morris [24] uses improving substitutions [16] in generalization, which are sometimes effective for removing ambiguity, though no concrete algorithm for finding them is given. In our system, as well as S-Eq, elim(∃π.Q) can be viewed as a systematic way to find improving substitutions. That is, elim(∃π.Q) improves Q by substituting π with min{M<sub>i</sub> | ω ≤ M<sub>i</sub> ∈ Φ<sub>ω</sub>}, i.e., the largest possible candidate for π. Though the largest solution is usually undesirable, especially when the right-hand sides of ≤ are all singletons, we can also view elim(∃π.Q) as substituting π by the product of {μ<sub>i</sub> | μ<sub>i</sub> ≤ 1 ∈ Φ<sub>1</sub>}, i.e., the smallest possible candidate.

# 8 Conclusion

We designed a type inference system for a rank-1 fragment of λ<sup>q</sup><sub>→</sub> [7] that can infer principal types, based on the qualified typing system OutsideIn(X) [31]. We observed that naive qualified typing often infers ambiguous types, and addressed this issue based on quantifier elimination. The experiments suggested that the proposed inference system infers principal types effectively, and that the overhead compared with unrestricted typing is acceptable, though not negligible.

Since we based our work on the inference algorithm used in GHC, a natural next step is to implement the system in GHC. A technical challenge in achieving this is to combine the disambiguation techniques with other sorts of constraints, especially type classes, and with arbitrary-rank polymorphism.

#### Acknowledgments

We thank Meng Wang, Atsushi Igarashi, and the anonymous reviewers of ESOP 2020 for their helpful comments on the preliminary versions of this paper. This work was partially supported by JSPS KAKENHI Grant Numbers 15H02681 and 19K11892, JSPS Bilateral Program, Grant Number JPJSBP120199913, the Kayamori Foundation of Informational Science Advancement, and EPSRC Grant EXHIBIT: Expressive High-Level Languages for Bidirectional Transformations (EP/T008911/1).

# References



#### **RustHorn: CHC-based Verification for Rust Programs**<sup>*</sup>

Yusuke Matsushita<sup>1</sup>, Takeshi Tsukada<sup>1</sup>, and Naoki Kobayashi<sup>1</sup>

The University of Tokyo, Tokyo, Japan {yskm24t,tsukada,koba}@is.s.u-tokyo.ac.jp

**Abstract.** Reduction to the satisfiability problem for constrained Horn clauses (CHCs) is a widely studied approach to automated program verification. The current CHC-based methods for pointer-manipulating programs, however, are not very scalable. This paper proposes a novel translation of pointer-manipulating Rust programs into CHCs, which clears away pointers and heaps by leveraging ownership. We formalize the translation for a simplified core of Rust and prove its correctness. We have implemented a prototype verifier for a subset of Rust and confirmed the effectiveness of our method.

# **1 Introduction**

Reduction to *constrained Horn clauses (CHCs)* is a widely studied approach to automated program verification [22,6]. A CHC is a Horn clause [30] equipped with constraints, namely a formula of the form ϕ ⇐= ψ0 ∧ ··· ∧ ψk−1, where ϕ and ψ0, ..., ψk−1 are either an atomic formula of the form f(t0, ..., tn−1) (f is a *predicate variable* and t0, ..., tn−1 are terms), or a constraint (e.g. a < b + 1).<sup>1</sup> We call a finite set of CHCs a *CHC system* or sometimes just CHC. *CHC solving* is the act of deciding whether a given CHC system S has a *model*, i.e. a valuation for predicate variables that makes all the CHCs in S valid. A variety of program verification problems can be naturally reduced to CHC solving.

For example, let us consider the following C code that defines McCarthy's 91 function.

```
int mc91(int n) {
  if (n > 100) return n - 10; else return mc91(mc91(n + 11));
}
```
Suppose that we wish to prove that mc91(n) returns 91 whenever n ≤ 101 (if it terminates). The desired property is equivalent to the satisfiability of the following CHCs, where *Mc91*(n, r) means that mc91(n) returns r if it terminates.

*Mc91* (n, r) ⇐= n > 100 ∧ r = n − 10

<sup>*</sup> The full version of this paper is available as [47].

<sup>1</sup> Free variables are universally quantified. Terms and variables are governed under sorts (e.g. int, bool), which are made explicit in the formalization of § 3.

$$\begin{array}{l} Mc91(n,r) \Longleftarrow n \le 100 \land Mc91(n+11,res') \land Mc91(res',r)\\ r = 91 \Longleftarrow n \le 101 \land Mc91(n,r) \end{array}$$

The property can be verified because this CHC system has a model:

*Mc91* (n, r) :⇐⇒ r = 91 ∨ (n > 100 ∧ r = n − 10).
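As a quick sanity check (our illustration, not part of the paper's toolchain; the function names are ours), one can brute-force over sample integer values to confirm that this valuation validates all three CHCs:

```rust
// Brute-force check (illustrative) that the valuation
//   Mc91(n, r) :<=> r = 91 \/ (n > 100 /\ r = n - 10)
// makes each of the three CHCs valid on a sampled range of integers.
fn mc91_model(n: i64, r: i64) -> bool {
    r == 91 || (n > 100 && r == n - 10)
}

fn main() {
    for n in -20..160 {
        for r in -20..160 {
            // CHC 1: Mc91(n, r) <= n > 100 /\ r = n - 10
            if n > 100 && r == n - 10 {
                assert!(mc91_model(n, r));
            }
            // CHC 3: r = 91 <= n <= 101 /\ Mc91(n, r)
            if n <= 101 && mc91_model(n, r) {
                assert_eq!(r, 91);
            }
            // CHC 2: Mc91(n, r) <= n <= 100 /\ Mc91(n+11, s) /\ Mc91(s, r)
            for s in -20..160 {
                if n <= 100 && mc91_model(n + 11, s) && mc91_model(s, r) {
                    assert!(mc91_model(n, r));
                }
            }
        }
    }
    println!("valuation validates the CHCs on all sampled values");
}
```

This is of course only a finite spot check, not a proof; a CHC solver establishes validity for all integers.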

A CHC solver provides a common infrastructure for a variety of programming languages and properties to be verified. There have been effective CHC solvers [40,18,29,12] that can solve instances obtained from actual programs<sup>2</sup> and many program verification tools [23,37,25,28,38,60] use a CHC solver as a backend.

However, the current CHC-based methods do not scale very well for programs using *pointers*, as we see in § 1.1. We propose a novel method to tackle this problem for pointer-manipulating programs under *Rust-style ownership*, as we explain in § 1.2.

#### **1.1 Challenges in Verifying Pointer-Manipulating Programs**

The standard CHC-based approach [23] for pointer-manipulating programs represents the memory state as an *array*, which is passed around as an argument of each predicate (cf. the *store-passing style*), and a pointer as an index.

For example, a pointer-manipulating variation of the previous program

```
void mc91p(int n, int* r) {
  if (n > 100) *r = n - 10;
  else { int s; mc91p(n + 11, &s); mc91p(s, r); }
}
```
is translated into the following CHCs by the array-based approach:<sup>3</sup>

$$\begin{array}{l} Mc91p(n,r,h,h') \Longleftarrow n > 100 \land h' = h\{r \leftarrow n-10\} \\ Mc91p(n,r,h,h') \Longleftarrow n \le 100 \land Mc91p(n+11,s,h,h'') \\ \qquad\qquad \land\ Mc91p(h''[s],r,h'',h') \\ h'[r] = 91 \Longleftarrow n \le 101 \land Mc91p(n,r,h,h'). \end{array}$$

*Mc91p* additionally takes two arrays *h*, *h′* representing the (heap) memory states before/after the call of mc91p. The second argument r of *Mc91p*, which corresponds to the pointer argument r in the original program, is an index for the arrays. Hence, the assignment \*r = n - 10 is modeled in the first CHC as an update of the r-th element of the array. This CHC system has a model

*Mc91p*(n, r, h, h′) :⇐⇒ h′[r] = 91 ∨ (n > 100 ∧ h′[r] = n − 10), which can be found by some array-supporting CHC solvers including Spacer [40], thanks to evolving SMT-solving techniques for arrays [62,10].

However, the array-based approach has some shortcomings. Let us consider, for example, the following innocent-looking code.<sup>4</sup>

<sup>2</sup> For example, the above CHC system on Mc91 can be solved instantly by many CHC solvers including Spacer [40] and HoIce [12].

<sup>3</sup> h{r ← v} is the array made from h by replacing the value at index r with v. h[r] is the value of array h at index r.

<sup>4</sup> rand() is a non-deterministic function that can return any integer value.


```
bool just_rec(int* ma) {
  if (rand() >= 0) return true;
  int old_a = *ma; int b = rand(); just_rec(&b);
  return (old_a == *ma);
}
```
It can immediately return true; or it recursively calls itself and checks if the target of ma remains unchanged through the recursive call. In effect this function *does nothing* on the allocated memory blocks, although it can possibly modify some of the unused parts of the memory.
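For reference, a Rust rendering of just_rec might look as follows (our sketch, not from the paper; the non-deterministic rand() guard is replaced by an explicit `fuel` parameter — our addition — so that a test run terminates):

```rust
// Sketch (ours) of just_rec in Rust. The `rand() >= 0` early return is
// replaced by a deterministic `fuel` countdown so the recursion terminates.
fn just_rec(ma: &mut i32, fuel: u32) -> bool {
    if fuel == 0 {
        return true; // corresponds to the non-deterministic early return
    }
    let old_a = *ma;
    let mut b = 0; // a fresh local, as `int b = rand();` in the C version
    just_rec(&mut b, fuel - 1);
    old_a == *ma // *ma is untouched: the recursive call only borrowed `b`
}

fn main() {
    let mut a = 42;
    assert!(just_rec(&mut a, 10)); // the function always returns true
    println!("just_rec returned true");
}
```

In Rust, the borrow checker makes the "does nothing to *ma" property manifest: the recursive call borrows only the fresh local `b`, so it cannot alias `ma`.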

Suppose we wish to verify that just\_rec never returns false. The standard CHC-based verifier for C, SeaHorn [23], generates a CHC system like below:<sup>56</sup>

$$\begin{array}{l} JustRec(ma,h,h',r) \Longleftarrow h'=h \land r=\text{true} \\ JustRec(ma,h,h',r) \Longleftarrow mb \neq ma \land h''=h\{mb \leftarrow b\} \\ \qquad\qquad \land\ JustRec(mb,h'',h',r') \land r=(h[ma] == h'[ma]) \\ r=\text{true} \Longleftarrow JustRec(ma,h,h',r) \end{array}$$

Unfortunately the CHC system above is *not* satisfiable and thus SeaHorn issues a false alarm. This is because, in this formulation, *mb* may not necessarily be completely fresh; it is assumed to be different from the argument *ma* of the current call, but may coincide with *ma* of some deep ancestor calls.<sup>7</sup>

The simplest remedy would be to explicitly specify the way of memory allocation. For example, one can represent the memory state as a pair of an array h and an index *sp* indicating the maximum index that has been allocated so far.

$$\begin{array}{l} JustRec_{+}(ma,h,sp,h',sp',r) \Longleftarrow h'=h \land sp'=sp \land r=\text{true} \\ JustRec_{+}(ma,h,sp,h',sp',r) \Longleftarrow mb=sp''=sp+1 \land h''=h\{mb \leftarrow b\} \\ \qquad\qquad \land\ JustRec_{+}(mb,h'',sp'',h',sp',r') \land r=(h[ma]==h'[ma]) \\ r=\text{true} \Longleftarrow JustRec_{+}(ma,h,sp,h',sp',r) \land ma \le sp \end{array}$$

The resulting CHC system now has a model, but it involves quantifiers:

*JustRec*<sub>+</sub>(*ma*, h, *sp*, h′, *sp*′, r) :⇐⇒ r = true ∧ ∀i ≤ *sp*. h[i] = h′[i]

Finding quantified invariants is known to be difficult in general despite active studies [41,2,36,26,19], and most current array-supporting CHC solvers give up on finding them. In general, much more complex operations on pointers can naturally take place, which makes the universally quantified invariants highly involved and hard to find automatically. To avoid such complex models, CHC-based verification tools [23,24,37] tackle pointers by pointer analysis [61,43]. Although this has some effect, the current applicable scope of pointer analysis is quite limited.

<sup>5</sup> ==, !=, >=, && denote binary operations that return boolean values.

<sup>6</sup> We omitted the allocation for old\_a for simplicity.

<sup>7</sup> Precisely speaking, SeaHorn tends to even omit shallow address-freshness checks like mb = ma.

#### **1.2 Our Approach: Leverage Rust's Ownership System**

This paper proposes a novel approach to CHC-based verification of pointer-manipulating programs, which makes use of *ownership* information to avoid an explicit representation of the memory.

*Rust-style Ownership.* Various styles of *ownership/permission/capability* have been introduced to control and reason about the usage of pointers in programming language design, program analysis, and verification [13,31,8,9,7,64,63]. In what follows, we focus on ownership in the style of the Rust programming language [46,55].

Roughly speaking, the ownership system guarantees that, for each memory cell and at each point of program execution, either (i) only one alias has the *update* (write & read) permission to the cell, with any other alias having *no* permission to it, or (ii) some (or no) aliases have the *read* permission to the cell, with no alias having the update permission to it. In summary, *when an alias can read some data* (with an update/read permission), *no other alias can modify the data*.

As a running example, let us consider the program below, which follows Rust's ownership discipline (it is written in the C style; the Rust version is presented in Example 1):

```
int* take_max(int* ma, int* mb) {
 if (*ma >= *mb) return ma; else return mb;
}
bool inc_max(int a, int b) {
 {
   int* mc = take_max(&a, &b); // borrow a and b
   *mc += 1;
 } // end of borrow
 return (a != b);
}
```
Figure 1 illustrates which alias has the update permission to the contents of a and b during the execution of inc\_max(5,3).

A notable feature is *borrow*. In the running example, when the pointers &a and &b are taken for take\_max, the *update permissions* of a and b are *temporarily transferred* to the pointers. The original variables, a and b, *lose the ability to access their contents* until the end of the borrow. The function take\_max returns a pointer having the update permission until the end of the borrow, which justifies the *update operation* \*mc += 1. In this example, the end of the borrow is at the end of the inner block of inc\_max. At this point, *the permissions are given back* to the original variables a and b, allowing a != b to be computed. Note that mc can point to a and also to b, and that this choice is determined *dynamically*. The values of a and b after the borrow *depend on the behavior of the pointer* mc.

The end of each borrow is statically managed by a *lifetime*. See §2 for a more precise explanation of ownership, borrow and lifetimes.
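The permission transfer described above can be observed directly in Rust (a minimal sketch of ours, independent of the running example):

```rust
// Minimal illustration (ours) of permission transfer and return at the
// end of a borrow.
fn main() {
    let mut a = 5;
    {
        let mc = &mut a; // the update permission of `a` moves to `mc`
        *mc += 1;        // `mc` may update the contents
        // reading `a` here would be rejected by the borrow checker
    }                    // end of borrow: the permission is given back
    assert_eq!(a, 6);    // `a` can be read (and updated) again
    println!("a = {}", a);
}
```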

**Fig. 1.** Values and aliases of a and b in evaluating inc\_max(5,3). Each line shows each variable's permission timeline: a solid line expresses the update permission and a bullet shows a point when the borrowed permission is given back. For example, b has the update permission to its content during (i) and (iv), but not during (ii) and (iii) because the pointer mb, created at the call of take\_max, borrows b until the end of (iii).

*Key Idea.* The key idea of our method is to *represent a pointer* ma *as a pair* ⟨a, a◦⟩ *of the current target value* a *and the target value* a◦ *at the end of borrow*.<sup>8,9</sup> This representation employs *access to the future information* (it is related to *prophecy variables*; see § 5). This simple idea turns out to be very powerful.

In our approach, the verification problem "Does inc\_max always return true?" is reduced to the satisfiability of the following CHCs:

$$\begin{array}{l} TakeMax(\langle a, a_{\diamond} \rangle, \langle b, b_{\diamond} \rangle, r) \Longleftarrow a \ge b \land b_{\diamond} = b \land r = \langle a, a_{\diamond} \rangle \\ TakeMax(\langle a, a_{\diamond} \rangle, \langle b, b_{\diamond} \rangle, r) \Longleftarrow a < b \land a_{\diamond} = a \land r = \langle b, b_{\diamond} \rangle \\ IncMax(a, b, r) \Longleftarrow TakeMax(\langle a, a_{\diamond} \rangle, \langle b, b_{\diamond} \rangle, \langle c, c_{\diamond} \rangle) \land c' = c + 1 \\ \qquad\qquad \land\ c_{\diamond} = c' \land r = (a_{\diamond} \mathrel{!=} b_{\diamond}) \\ r = \text{true} \Longleftarrow IncMax(a, b, r). \end{array}$$

The mutable reference ma is now represented as ⟨a, a◦⟩, and similarly for mb and mc. The first CHC models the then-clause of take\_max: the return value is ma, which is expressed as r = ⟨a, a◦⟩; in contrast, mb is released, which *constrains* b◦, the value of b at the end of borrow, to the current value b. In the clause on *IncMax*, *mc* is represented as a pair ⟨c, c◦⟩. The constraint c′ = c + 1 ∧ c◦ = c′ models the increment of mc (in the phase (iii) in Fig. 1). Importantly, the final check a != b is simply expressed as a◦ != b◦; the updated values of a/b are available as a◦/b◦. Clearly, the CHC system above has a simple model.
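To make the encoding concrete, the clauses above can be executed as ordinary code (an illustration of ours, not the paper's translation; `inc_max_enc` and the variable names are ours). A released pointer resolves its prophecy to its current value; the returned pointer carries its prophecy onward:

```rust
// Illustration (ours): running the <current, final> encoding of inc_max.
// Each mutable reference is a pair (cur, fin); releasing a reference
// resolves its prophecy `fin` to its current value.
fn inc_max_enc(a: i64, b: i64) -> bool {
    let c: i64;
    let a_fin: i64;
    let b_fin: i64;
    if a >= b {
        // take_max returns <a, a_fin>; mb is released, so b_fin = b.
        c = a;
        a_fin = c + 1; // *mc += 1 resolves the prophecy c_fin = c + 1 = a_fin
        b_fin = b;
    } else {
        // symmetric case: ma is released, so a_fin = a.
        c = b;
        b_fin = c + 1;
        a_fin = a;
    }
    a_fin != b_fin // the final check a != b uses only the prophecies
}

fn main() {
    assert!(inc_max_enc(5, 3));
    for a in -10..10 {
        for b in -10..10 {
            assert!(inc_max_enc(a, b)); // inc_max always returns true
        }
    }
    println!("encoding yields true for all sampled inputs");
}
```

The case split mirrors the two TakeMax clauses, and no heap or array appears anywhere: the prophecy components alone carry the post-borrow values.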

Also, the just\_rec example in § 1.1 can be encoded as a CHC system

$$\begin{array}{l} JustRec(\langle a, a_{\diamond} \rangle, r) \Longleftarrow a_{\diamond} = a \land r = \text{true} \\ JustRec(\langle a, a_{\diamond} \rangle, r) \Longleftarrow mb = \langle b, b_{\diamond} \rangle \land JustRec(mb, r') \\ \qquad\qquad \land\ a_{\diamond} = a \land r = (a == a_{\diamond}) \end{array}$$

<sup>8</sup> Precisely, this is the representation of a pointer with a borrowed update permission (i.e. mutable reference). Other cases are discussed in § 3.

<sup>9</sup> For example, in the case of Fig. 1, when take\_max is called, the pointer ma is ⟨5, 6⟩ and mb is ⟨3, 3⟩.

$$r = \text{true} \Longleftarrow JustRec(\langle a, a_{\diamond} \rangle, r).$$

Now it has a simple model: *JustRec*(⟨a, a◦⟩, r) :⇐⇒ r = true ∧ a◦ = a. Remarkably, arrays and quantified formulas are not required to express the model, which allows the CHC system to be easily solved by many CHC solvers. More advanced examples are presented in § 3.4, including one with destructive update on a singly linked list.
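This model, too, can be spot-checked by brute force (our illustration; the function name is ours), including the recursive premise of the second clause:

```rust
// Brute-force check (ours) that JustRec(<a, a_fin>, r) :<=> r /\ a_fin = a
// validates the three CHCs for just_rec over sampled values.
fn just_rec_model(a: i64, a_fin: i64, r: bool) -> bool {
    r && a_fin == a
}

fn main() {
    let vals: Vec<i64> = (-5..5).collect();
    for &a in &vals {
        for &a_fin in &vals {
            for r in [false, true] {
                // CHC 1: JustRec(<a, a_fin>, r) <= a_fin = a /\ r = true
                if a_fin == a && r {
                    assert!(just_rec_model(a, a_fin, r));
                }
                // CHC 3: r = true <= JustRec(<a, a_fin>, r)
                if just_rec_model(a, a_fin, r) {
                    assert!(r);
                }
                // CHC 2: premises include the recursive call JustRec(<b, b_fin>, r')
                for &b in &vals {
                    for &b_fin in &vals {
                        for rp in [false, true] {
                            if just_rec_model(b, b_fin, rp)
                                && a_fin == a
                                && r == (a == a_fin)
                            {
                                assert!(just_rec_model(a, a_fin, r));
                            }
                        }
                    }
                }
            }
        }
    }
    println!("model validated on samples");
}
```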

*Contributions.* Based on the above idea, we formalize the translation from programs to CHC systems for a core language of Rust, prove correctness (both soundness and completeness) of the translation, and confirm the effectiveness of our approach through preliminary experiments. The core language supports, among others, recursive types. Remarkably, our approach enables us to automatically verify some properties of a program with destructive updates on recursive data types such as lists and trees.

The rest of the paper is structured as follows. In § 2, we provide a formalized core language of Rust supporting recursions, lifetime-based ownership and recursive types. In §3, we formalize our translation from programs to CHCs and prove its correctness. In § 4, we report on the implementation and the experimental results. In § 5 we discuss related work and in § 6 we conclude the paper.

# **2 Core Language: Calculus of Ownership and Reference**

We formalize a core of Rust as the *Calculus of Ownership and Reference (COR)*, whose design is influenced by the safe layer of λRust in the RustBelt paper [32]. It is a typed procedural language with a Rust-like ownership system.

#### **2.1 Syntax**

The following is the syntax of COR.

```
(program)             Π ::= F0 ··· Fn−1
(function definition) F ::= fn f Σ {L0: S0 ··· Ln−1: Sn−1}
(function signature)  Σ ::= ⟨α0, ..., αm−1 | αa0 ≤ αb0, ..., αa(l−1) ≤ αb(l−1)⟩
                            (x0: T0, ..., xn−1: Tn−1) → U
(statement)           S ::= I; goto L | return x
                          | match ∗x {inj0 ∗y0 → goto L0, inj1 ∗y1 → goto L1}
(instruction)         I ::= let y = mutbor_α x | drop x | immut x | swap(∗x, ∗y)
                          | let ∗y = x | let y = ∗x | let ∗y = copy ∗x | x as T
                          | let y = f⟨α0, ..., αm−1⟩(x0, ..., xn−1)
                          | intro α | now α | α ≤ β
                          | let ∗y = const | let ∗y = ∗x op ∗x′ | let ∗y = rand()
                          | let ∗y = inj_i^(T0+T1) ∗x
                          | let ∗y = (∗x0, ∗x1) | let (∗y0, ∗y1) = ∗x
(type)             T, U ::= X | μX.T | P T | T0+T1 | T0×T1 | int | unit
(pointer kind)        P ::= own | R_α
(reference kind)      R ::= mut | immut
```

$$\begin{array}{lcl} \alpha, \beta, \gamma ::= \text{(lifetime variable)} & X, Y ::= \text{(type variable)}\\ x, y ::= \text{(variable)} & f, g ::= \text{(function name)} & L ::= \text{(label)}\\ \mathit{const} ::= n \mid () \quad \mathbf{bool} ::= \mathbf{unit} + \mathbf{unit} \quad \mathit{op} ::= \mathit{op}_{\mathrm{int}} \mid \mathit{op}_{\mathrm{bool}}\\ \mathit{op}_{\mathrm{int}} ::= + \mid - \mid \cdots & \mathit{op}_{\mathrm{bool}} ::= {>} \mid {==} \mid {!=} \mid \cdots \end{array}$$

*Program, Function and Label.* A program (denoted by Π) is a set of function definitions. A function definition (F) consists of a function name, a function signature and a set of labeled statements (L: S). In COR, for simplicity, the input/output types of a function are restricted to *pointer types*. A function is parametrized over lifetime parameters under constraints; polymorphism on types is not supported for simplicity, just as in λRust. For the lifetime parameter receiver, ⟨α0, ... | ⟩ is often abbreviated to ⟨α0, ...⟩, i.e. '|' is omitted when there are no lifetime constraints.

A label (L) is an abstract program point to be jumped to by goto. <sup>10</sup> Each label is assigned a *whole context* by the type system, as we see later. This style, with unstructured control flows, helps the formal description of CHCs in §3.2. A function should have the label entry (entry point), and every label in a function should be syntactically reachable from entry by goto jumps.<sup>11</sup>

*Statement and Instruction.* A statement (S) performs an instruction with a jump (I; gotoL), returns from a function (return x), or branches (match ∗x {· · ·}).

An instruction (I) performs an elementary operation: mutable (re)borrow (let y = mutbor<sub>α</sub> x), releasing a variable (drop x), weakening ownership (immut x),<sup>12</sup> swap (swap(∗x, ∗y)), creating/dereferencing a pointer (let ∗y = x, let y = ∗x), copy (let ∗y = copy ∗x),<sup>13</sup> type weakening (x as T), function call (let y = f⟨···⟩(···)), lifetime-related ghost operations (intro α, now α, α ≤ β; explained later), getting a constant / an operation result / a random integer (let ∗y = *const* / ∗x *op* ∗x′ / rand()), creating a variant (let ∗y = inj<sub>i</sub><sup>T0+T1</sup> ∗x), and creating/destructing a pair (let ∗y = (∗x0, ∗x1), let (∗y0, ∗y1) = ∗x). An instruction of form let ∗y = ··· implicitly allocates new memory cells as y; also, some instructions deallocate memory cells implicitly. For simplicity, every variable is designed to be a *pointer* and every *release of a variable* should be explicitly annotated by 'drop x'. In addition, we provide swap instead of assignment; the usual assignment (of copyable data from ∗x to ∗y) can be expressed by let ∗x′ = copy ∗x; swap(∗y, ∗x′); drop x′.
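The copy-then-swap idiom for assignment has a rough counterpart in Rust's std::mem::swap (an analogy of ours, not COR itself):

```rust
// Rough Rust analogue (ours) of expressing assignment by copy + swap + drop,
// mirroring COR's: let *x' = copy *x; swap(*y, *x'); drop x'.
fn main() {
    let x: i32 = 10;
    let mut y: i32 = 0;

    let mut x2 = x;                  // let *x' = copy *x
    std::mem::swap(&mut y, &mut x2); // swap(*y, *x')
    drop(x2);                        // drop x'

    assert_eq!(y, 10); // *y now holds the copied value
    assert_eq!(x, 10); // *x is untouched
    println!("y = {}", y);
}
```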

*Type.* As a type (T), we support recursive types (μX.T), pointer types (P T), variant types (T<sup>0</sup> + T1), pair types (T<sup>0</sup> × T1) and basic types (int, unit).

A pointer type P T can be an *owning pointer* own T (Box<T> in Rust), *mutable reference* mut<sup>α</sup> T (&'a mut T) or *immutable reference* immut<sup>α</sup> T (&'a T). An

<sup>10</sup> It is related to a continuation introduced by letcont in λRust.

<sup>11</sup> Here 'syntactically' means that detailed information such as a branch condition on match or non-termination is ignored.

<sup>12</sup> This instruction turns a mutable reference into an immutable reference. Using this, an immutable borrow from x to y can be expressed by let y = mutbor<sub>α</sub> x; immut y.

<sup>13</sup> Copying a pointer (an immutable reference) x to y can be expressed by let ∗ox = x; let ∗oy = copy ∗ox; let y = ∗oy.

*owning pointer* has data in the heap memory, can freely update the data (unless it is borrowed), and has the obligation to clean up the data from the heap memory. In contrast, a *mutable/immutable reference* (or *unique/shared reference*) borrows an update/read permission from an owning pointer or another reference with the deadline of a *lifetime* α (introduced later). A mutable reference cannot be copied, while an immutable reference can be freely copied. A reference loses the permission at the time when it is released.<sup>14</sup>

A type T that appears in a program (not just as a substructure of some type) should satisfy the following condition (if it holds we say the type is *complete*): every type variable X in T is bound by some μ and guarded by a pointer constructor (i.e. given a binding of form μX.U, every occurrence of X in U is a part of a pointer type, of form P U′).

*Lifetime.* A *lifetime* is an *abstract time point in the process of computation*,<sup>15</sup> which is statically managed by *lifetime variables* α. A lifetime variable can be a *lifetime parameter* that a function takes or a *local lifetime variable* introduced within a function. We have three lifetime-related ghost instructions: intro α introduces a new local lifetime variable, now α sets a local lifetime variable to the current moment and eliminates it, and α ≤ β asserts the ordering on local lifetime variables.

**Expressivity and Limitations.** COR can express most borrow patterns in the core of Rust. The set of moments when a borrow is active forms a continuous time range, even under *non-lexical lifetimes* [54].<sup>16</sup>

A major limitation of COR is that it does not support *unsafe code blocks* and also lacks *type traits and closures*. Still, our idea can be combined with unsafe code and closures, as discussed in §3.5. Another limitation of COR is that, unlike Rust and λRust, we *cannot directly modify/borrow a fragment of a variable* (e.g. an element of a pair). Still, we can eventually modify/borrow a fragment by borrowing the whole variable and *splitting pointers* (e.g. 'let(∗y0, ∗y1) = ∗x'). This borrow-and-split strategy, nevertheless, yields a subtle obstacle when we extend the calculus for advanced data types (e.g. get\_default in 'Problem Case #3' from [54]). For future work, we pursue a more expressive calculus modeling Rust and extend our verification method to it.

*Example 1 (COR Program).* The following program expresses the functions take\_max and inc\_max presented in § 1.2. We shorthand sequential executions

<sup>14</sup> In Rust, even after a reference loses the permission and the lifetime ends, its address data can linger in the memory, although dereferencing on the reference is no longer allowed. We simplify the behavior of lifetimes in COR.

<sup>15</sup> In the terminology of Rust, a lifetime often means a time range where a borrow is active. To simplify the discussions, however, we in this paper use the term lifetime to refer to a time point when a borrow ends.

<sup>16</sup> Strictly speaking, this property is broken by recently adopted implicit two-phase borrows [59,53]. However, by shallow syntactical reordering, a program with implicit two-phase borrows can be fit into usual borrow patterns.

by '; L' (e.g. L0: I0; L1 I1; goto L2 stands for L0: I0; goto L1  L1: I1; goto L2).<sup>17</sup>

```
fn take-max⟨α⟩(ma: mut_α int, mb: mut_α int) → mut_α int {
  entry: let ∗ord = ∗ma >= ∗mb;
  L1 match ∗ord {inj1 ∗ou → goto L2, inj0 ∗ou → goto L5}
  L2: drop ou; L3 drop mb; L4 return ma
  L5: drop ou; L6 drop ma; L7 return mb
}
fn inc-max(oa: own int, ob: own int) → own bool {
  entry: intro α; L1 let ma = mutbor_α oa; L2 let mb = mutbor_α ob;
  L3 let mc = take-max⟨α⟩(ma, mb); L4 let ∗o1 = 1; L5 let ∗oc′ = ∗mc + ∗o1;
  L6 drop o1; L7 swap(mc, oc′); L8 drop oc′; L9 drop mc; L10 now α;
  L11 let ∗or = ∗oa != ∗ob; L12 drop oa; L13 drop ob; L14 return or
}
```
In take-max, conditional branching is performed by match and its goto directions (at L1). In inc-max, increment on the mutable reference *mc* is performed by calculating the new value (at L4, L5) and updating the data by swap (at L7).

The following is the corresponding Rust program, with ghost annotations (marked italic and dark green, e.g. drop ma ) on lifetimes and releases of mutable references.

```
fn take_max<'a>(ma: &'a mut i32, mb: &'a mut i32) -> &'a mut i32 {
  if *ma >= *mb { drop mb; ma } else { drop ma; mb }
}
fn inc_max(mut a: i32, mut b: i32) -> bool {
  { intro 'a;
    let mc = take_max<'a> (&'a mut a, &'a mut b); *mc += 1;
  drop mc; now 'a; }
  a != b
}
```
# **2.2 Type System**

The type system of COR assigns to each label a *whole context* (**Γ**, **A**). We define below the whole context and the typing judgments.

*Context.* A *variable context* **Γ** is a finite set of items of form x:<sup>**a**</sup> T, where T should be a complete *pointer* type and **a** (which we call *activeness*) is of form 'active' or '†α' (*frozen* until lifetime α). We abbreviate x:<sup>active</sup> T as x: T. A variable context should not contain two items on the same variable. A *lifetime context* **A** = (A, R) is a finite preordered set of lifetime variables, where A is the underlying set and R is the preorder. We write |**A**| and ≤<sub>**A**</sub> to refer to A and R. Finally, a *whole context* (**Γ**, **A**) is a pair of a variable context **Γ** and a lifetime context **A** such that every lifetime variable in **Γ** is contained in **A**.

<sup>17</sup> The first character of each variable indicates the pointer kind (o/m corresponds to own/mutα). We swap the branches of the match statement in take-max, to fit the order to C/Rust's if.

*Notations.* The set operation A + B (or more generally ∑<sub>λ</sub> A<sub>λ</sub>) denotes the disjoint union, i.e. the union defined only if the arguments are disjoint. The set operation A − B denotes the set difference defined only if A ⊇ B. For a natural number n, [n] denotes the set {0, ..., n−1}.

Generally, an auxiliary definition for a rule can be presented just below, possibly in a dotted box.

*Program and Function.* The rules for typing programs and functions are presented below. They assign to each label a whole context (**Γ**, **A**). 'S:Π,f (**Γ**, **A**) | (**Γ**L, **A**L)<sup>L</sup> | U' is explained later.

> for any <sup>F</sup> in Π, F:<sup>Π</sup> (**Γ**name(<sup>F</sup> ),L, **<sup>A</sup>**name(<sup>F</sup> ),L)L∈Label*<sup>F</sup>* <sup>Π</sup>: (**Γ**f,L, **<sup>A</sup>**f,L)(f,L) <sup>∈</sup> FnLabel*<sup>Π</sup>*

name(F): the function name of <sup>F</sup> LabelF : the set of labels in <sup>F</sup> FnLabelΠ: the set of pairs (f,L) such that a function <sup>f</sup> in <sup>Π</sup> has a label <sup>L</sup>

<sup>F</sup> <sup>=</sup> fn <sup>f</sup>α<sup>0</sup>,...,αm−<sup>1</sup> <sup>|</sup>αa<sup>0</sup> <sup>≤</sup>αb<sup>0</sup> ,...,αa*l*−<sup>1</sup> <sup>≤</sup>αb*l*−<sup>1</sup> (x<sup>0</sup>: <sup>T</sup><sup>0</sup>,...,xn−<sup>1</sup>: <sup>T</sup>n−<sup>1</sup>)→<sup>U</sup> {· · ·} **<sup>Γ</sup>**entry <sup>=</sup> {xi: <sup>T</sup>i <sup>|</sup>i∈[n]} <sup>A</sup> <sup>=</sup> {αj <sup>|</sup> <sup>j</sup> <sup>∈</sup>[m]} **<sup>A</sup>**entry <sup>=</sup> - A, - Id<sup>A</sup> ∪{(αa*<sup>k</sup>* , αb*<sup>k</sup>* )|k∈[l]} <sup>+</sup> for any L- : <sup>S</sup> <sup>∈</sup> LabelStmt<sup>F</sup> , S:Π,f (**Γ**L- , **A**L- ) <sup>|</sup> (**Γ**L, **<sup>A</sup>**L)L∈Label*<sup>F</sup>* <sup>|</sup> <sup>U</sup> <sup>F</sup>:<sup>Π</sup> (**Γ**L, **<sup>A</sup>**L)L∈Label*<sup>F</sup>*

$\mathrm{LabelStmt}_F$: the set of labeled statements in $F$ $\qquad$ $\mathrm{Id}_A$: the identity relation on $A$ $\qquad$ $R^+$: the transitive closure of $R$

In the rule for functions, the initial whole context at entry is specified (the second and third preconditions) and the contexts for the other labels are checked (the fourth precondition). The context for each label (in each function) can actually be determined in order of the distance from entry, measured in the number of goto jumps, but that order is not very obvious because of *unstructured control flows*.

*Statement.* '$S :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \mid (\mathbf{\Gamma}_L, \mathbf{A}_L)_L \mid U$' means that running the statement $S$ (under $\Pi$, $f$) with the whole context $(\mathbf{\Gamma}, \mathbf{A})$ results in a jump to a label with the whole contexts specified by $(\mathbf{\Gamma}_L, \mathbf{A}_L)_L$ or a return of data of type $U$. Its rules are presented below. '$I :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \to (\mathbf{\Gamma}', \mathbf{A}')$' is explained later.

$$\frac{I :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \to (\mathbf{\Gamma}_{L_0}, \mathbf{A}_{L_0})}{I;\ \texttt{goto}\,L_0 :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \mid (\mathbf{\Gamma}_L, \mathbf{A}_L)_L \mid U} \qquad \frac{\mathbf{\Gamma} = \{x{:}\,U\} \quad |\mathbf{A}| = A_{\mathrm{ex}\,\Pi,f}}{\texttt{return}\,x :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \mid (\mathbf{\Gamma}_L, \mathbf{A}_L)_L \mid U}$$

$A_{\mathrm{ex}\,\Pi,f}$: the set of lifetime parameters of $f$ in $\Pi$

$$\frac{x{:}\,P\,(T_0 + T_1) \in \mathbf{\Gamma} \quad \text{for } i = 0, 1,\ \ (\mathbf{\Gamma}_{L_i}, \mathbf{A}_{L_i}) = (\mathbf{\Gamma} - \{x{:}\,P\,(T_0{+}T_1)\} + \{y_i{:}\,P\,T_i\},\ \mathbf{A})}{\texttt{match}\ {*}x\ \{\texttt{inj}_0\,{*}y_0 \to \texttt{goto}\,L_0,\ \texttt{inj}_1\,{*}y_1 \to \texttt{goto}\,L_1\} :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \mid (\mathbf{\Gamma}_L, \mathbf{A}_L)_L \mid U}$$

The rule for the return statement ensures that no extra variables or local lifetime variables remain.

*Instruction.* '$I :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \to (\mathbf{\Gamma}', \mathbf{A}')$' means that running the instruction $I$ (under $\Pi$, $f$) updates the whole context $(\mathbf{\Gamma}, \mathbf{A})$ to $(\mathbf{\Gamma}', \mathbf{A}')$. The rules are designed so that, for any $I$, $\Pi$, $f$, $(\mathbf{\Gamma}, \mathbf{A})$, there exists at most one $(\mathbf{\Gamma}', \mathbf{A}')$ such that $I :_{\Pi,f} (\mathbf{\Gamma}, \mathbf{A}) \to (\mathbf{\Gamma}', \mathbf{A}')$ holds. Below we present some of the rules; the complete rules are presented in the full paper. The following is the typing rule for mutable (re)borrow.

$$\frac{\alpha \notin A_{\mathrm{ex}\,\Pi,f} \quad P = \texttt{own}, \texttt{mut}_\beta \quad \text{for any } \gamma \in \mathrm{Lifetime}_{P\,T},\ \alpha \leq_{\mathbf{A}} \gamma}{\texttt{let}\,y = \texttt{mutbor}_\alpha\,x :_{\Pi,f} (\mathbf{\Gamma} + \{x{:}\,P\,T\},\ \mathbf{A}) \to (\mathbf{\Gamma} + \{y{:}\,\texttt{mut}_\alpha T,\ x{:}\,{\dagger^{\alpha}} P\,T\},\ \mathbf{A})}$$
 
$$\text{Lifetime}\_{T} \colon \text{the set of lifetime variables occurring in } T$$

After you mutably (re)borrow an owning pointer / mutable reference x until α, x is *frozen* until α. Here, α should be a local lifetime variable<sup>18</sup> (the first precondition) that does not live longer than the data of x (the third precondition). Below are the typing rules for local lifetime variable introduction and elimination.

$$\texttt{intro}\,\alpha :_{\Pi,f} (\mathbf{\Gamma}, (A, R)) \to (\mathbf{\Gamma}, (\{\alpha\} + A,\ \{\alpha\} \times (\{\alpha\} + A_{\mathrm{ex}\,\Pi,f}) + R))$$

$$\frac{\alpha \notin A_{\mathrm{ex}\,\Pi,f}}{\texttt{now}\,\alpha :_{\Pi,f} (\mathbf{\Gamma}, (\{\alpha\} + A, R)) \to (\{\mathrm{thaw}_\alpha(x{:}\,{}^{\mathfrak{a}}T) \mid x{:}\,{}^{\mathfrak{a}}T \in \mathbf{\Gamma}\},\ (A, \{(\beta, \gamma) \in R \mid \beta \neq \alpha\}))}$$

$$\mathrm{thaw}_\alpha(x{:}\,{}^{\mathfrak{a}}T) := \begin{cases} x{:}\,T & (\mathfrak{a} = {\dagger\alpha}) \\ x{:}\,{}^{\mathfrak{a}}T & (\text{otherwise}) \end{cases}$$

The rule for intro α simply ensures that the new local lifetime variable is earlier than any lifetime parameter (which are given by exterior functions). On now α, the variables frozen with α become active again. Below is the typing rule for dereference of a pointer to a pointer, which may be a bit interesting.

$$\texttt{let}\,y = {*}x :_{\Pi,f} (\mathbf{\Gamma} + \{x{:}\,P\,P'\,T\},\ \mathbf{A}) \to (\mathbf{\Gamma} + \{y{:}\,(P \circ P')\,T\},\ \mathbf{A})$$

$$P \circ \texttt{own} = \texttt{own} \circ P := P \qquad R_\alpha \circ R'_\beta := R''_\alpha \ \text{where}\ R'' = \begin{cases} \texttt{mut} & (R = R' = \texttt{mut}) \\ \texttt{immut} & (\text{otherwise}) \end{cases}$$

The third precondition of the typing rule for mutbor justifies taking just $\alpha$ in the rule '$R_\alpha \circ R'_\beta := R''_\alpha$'.
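As a surface-Rust illustration of the freezing and thawing that these rules formalize (our own example, not from the paper): while a mutable (re)borrow is live, the borrowed variable is frozen; once the borrow ends (cf. now α), the variable is thawed and usable again.

```rust
// Freezing/thawing in plain Rust (our own sketch). While the mutable
// borrow `y` is live, `x` is frozen (x: †α own int in COR terms) and any
// use of `x` is rejected; after the borrow ends, `x` is usable again.
fn frozen_then_thawed() -> i32 {
    let mut x = 1;
    let y = &mut x; // x is frozen here
    *y += 1;
    // println!("{}", x); // rejected while frozen (error E0502) if uncommented
    x // the borrow has ended, so x is thawed and readable again
}

fn main() {
    println!("{}", frozen_then_thawed()); // prints 2
}
```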

Let us interpret $\Pi : (\mathbf{\Gamma}_{f,L}, \mathbf{A}_{f,L})_{(f,L) \in \mathrm{FnLabel}_\Pi}$ as "the program $\Pi$ has the type $(\mathbf{\Gamma}_{f,L}, \mathbf{A}_{f,L})_{(f,L) \in \mathrm{FnLabel}_\Pi}$". The type system ensures that any program has at most one type (which may be a bit unclear because of unstructured control flows). Hereinafter, we implicitly assume that a program has a type.

#### **2.3 Concrete Operational Semantics**

We introduce for COR *concrete operational semantics*, which handles a concrete model of the heap memory.

The basic item, *concrete configuration* **C**, is defined as follows.

$$\text{(concrete configuration)} \quad \mathbf{C} ::= [f, L]\,\mathbf{F}; \mathbf{S} \mid \mathbf{H} \qquad \mathbf{S} ::= \texttt{end} \ \mid\ [f, L]\,x, \mathbf{F}; \mathbf{S}$$

Here, **H** is a *heap*, which maps addresses (represented by integers) to integers (data). **F** is a *concrete stack frame*, which maps variables to addresses. The stack

<sup>18</sup> In COR, a reference that lives after the return from the function should be created by splitting a reference (e.g. 'let(∗y<sup>0</sup>, <sup>∗</sup>y<sup>1</sup>) = <sup>∗</sup>x') given in the inputs; see also Expressivity and Limitations.

part of **C** is of the form '$[f, L]\,\mathbf{F}; [f', L']\,x, \mathbf{F}'; \cdots; \texttt{end}$' (we may omit the terminator '; end'). $[f, L]$ on each stack frame indicates the program point. '$x,$' on each non-top stack frame is the receiver of the value returned by the function call.

Concrete operational semantics is characterized by the one-step transition relation $\mathbf{C} \to_\Pi \mathbf{C}'$ and the termination relation $\mathrm{final}_\Pi(\mathbf{C})$, which can be defined straightforwardly. Below we show the rules for mutable (re)borrow, swap, function call, and return from a function; the complete rules and an example execution are presented in the full paper. $S_{\Pi,f,L}$ is the statement for the label $L$ of the function $f$ in $\Pi$. $\mathrm{Ty}_{\Pi,f,L}(x)$ is the type of the variable $x$ at the label.

$$\frac{S_{\Pi,f,L} = \texttt{let}\,y = \texttt{mutbor}_\alpha\,x;\ \texttt{goto}\,L' \quad \mathbf{F}(x) = a}{[f, L]\,\mathbf{F}; \mathbf{S} \mid \mathbf{H} \ \to_\Pi\ [f, L']\,\mathbf{F} + \{(y, a)\}; \mathbf{S} \mid \mathbf{H}}$$

$$\frac{S_{\Pi,f,L} = \texttt{swap}({*}x, {*}y);\ \texttt{goto}\,L' \quad \mathrm{Ty}_{\Pi,f,L}(x) = P\,T \quad \mathbf{F}(x) = a \quad \mathbf{F}(y) = b}{\begin{array}{c}[f, L]\,\mathbf{F}; \mathbf{S} \mid \mathbf{H} + \{(a{+}k, m_k) \mid k \in [\#T]\} + \{(b{+}k, n_k) \mid k \in [\#T]\} \\ \to_\Pi\ [f, L']\,\mathbf{F}; \mathbf{S} \mid \mathbf{H} + \{(a{+}k, n_k) \mid k \in [\#T]\} + \{(b{+}k, m_k) \mid k \in [\#T]\}\end{array}}$$

$$\frac{S_{\Pi,f,L} = \texttt{let}\,y = g\langle\cdots\rangle(x_0, \dots, x_{n-1});\ \texttt{goto}\,L'}{[f, L]\,\mathbf{F} + \{(x_i, a_i) \mid i \in [n]\}; \mathbf{S} \mid \mathbf{H} \ \to_\Pi\ [g, \texttt{entry}]\,\{(x'_i, a_i) \mid i \in [n]\}; [f, L']\,y, \mathbf{F}; \mathbf{S} \mid \mathbf{H}}$$

$x'_0, \dots, x'_{n-1}$: the parameters of $g$

$$\frac{S_{\Pi,f,L} = \texttt{return}\,x}{[f, L]\,\{(x, a)\}; [g, L']\,y, \mathbf{F}; \mathbf{S} \mid \mathbf{H} \ \to_\Pi\ [g, L']\,\mathbf{F} + \{(y, a)\}; \mathbf{S} \mid \mathbf{H}}$$

$$\frac{S_{\Pi,f,L} = \texttt{return}\,x}{\mathrm{final}_\Pi([f, L]\,\{(x, a)\} \mid \mathbf{H})}$$

Here we introduce '#T', which represents how many memory cells the type T takes (at the outermost level). #T is defined for every *complete* type T, because every occurrence of type variables in a complete type is guarded by a pointer constructor.

$$\begin{aligned} \#(T_0 + T_1) &:= 1 + \max\{\#T_0, \#T_1\} & \#(T_0 \times T_1) &:= \#T_0 + \#T_1 \\ \#\mu X.T &:= \#T[\mu X.T/X] & \#\texttt{int} = \#P\,T &:= 1 \qquad \#\texttt{unit} := 0 \end{aligned}$$
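As a sanity check, #T can be computed by a straightforward recursion. Below is a minimal Rust sketch (our own, using a single-variable μ binder for simplicity); the key point is that since every type variable is guarded by a pointer constructor, the μ case never actually needs to perform the substitution.

```rust
// Sketch of #T for COR types (assumption: types are complete, i.e. every
// type variable occurs under a pointer constructor, so Var is never visited).
enum Ty {
    Int,
    Unit,
    Ptr(Box<Ty>),           // own/mut/immut: one cell at the outermost level
    Sum(Box<Ty>, Box<Ty>),  // T0 + T1: one tag cell plus the larger payload
    Prod(Box<Ty>, Box<Ty>), // T0 x T1: concatenation of the two layouts
    Rec(Box<Ty>),           // mu X. T (single bound variable)
    Var,                    // X
}

fn size(t: &Ty) -> usize {
    match t {
        Ty::Int => 1,
        Ty::Unit => 0,
        Ty::Ptr(_) => 1, // #(P T) := 1: the body lives behind the pointer
        Ty::Sum(a, b) => 1 + size(a).max(size(b)),
        Ty::Prod(a, b) => size(a) + size(b),
        // #(mu X.T) := #(T[mu X.T/X]); the substituted occurrences all sit
        // under pointers and are never visited, so one unfolding is the
        // same as computing #T directly.
        Ty::Rec(body) => size(body),
        Ty::Var => unreachable!("complete types guard variables with pointers"),
    }
}

fn main() {
    // List = mu X. (int x own X) + unit: one tag cell + max(1 + 1, 0) = 3 cells
    let list = Ty::Rec(Box::new(Ty::Sum(
        Box::new(Ty::Prod(Box::new(Ty::Int), Box::new(Ty::Ptr(Box::new(Ty::Var))))),
        Box::new(Ty::Unit),
    )));
    println!("{}", size(&list)); // prints 3
}
```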

# **3 CHC Representation of COR Programs**

To formalize the idea discussed in § 1, we give a translation from COR programs to CHC systems, which precisely characterize the input-output relations of the COR programs. We first define the logic for CHCs (§ 3.1). We then formally describe our translation (§ 3.2) and prove its correctness (§ 3.3). Also, we examine the effectiveness of our approach with advanced examples (§ 3.4) and discuss how our idea can be extended and enhanced (§ 3.5).

#### **3.1 Multi-sorted Logic for Describing CHCs**

To begin with, we introduce a first-order multi-sorted logic for describing the CHC representation of COR programs.

**Syntax.** The syntax is defined as follows.

$$\begin{aligned}
\text{(CHC)} \quad \Phi &::= \forall x_0{:}\,\sigma_0, \dots, x_{m-1}{:}\,\sigma_{m-1}.\ \check\varphi \Longleftarrow \psi_0 \wedge \cdots \wedge \psi_{n-1} \\
\top &:= \text{the nullary conjunction of formulas} \\
\text{(formula)} \quad \varphi, \psi &::= f(t_0, \dots, t_{n-1}) \qquad \text{(elementary formula)} \quad \check\varphi ::= f(p_0, \dots, p_{n-1}) \\
\text{(term)} \quad t &::= x \mid \langle t \rangle \mid \langle t_*, t_\circ \rangle \mid \texttt{inj}_i\,t \mid (t_0, t_1) \mid {*}t \mid {\circ}t \mid t.i \mid \mathit{const} \mid t \mathbin{\mathit{op}} t' \\
\text{(value)} \quad v, w &::= \langle v \rangle \mid \langle v_*, v_\circ \rangle \mid \texttt{inj}_i\,v \mid (v_0, v_1) \mid \mathit{const} \\
\text{(pattern)} \quad p, q &::= x \mid \langle p \rangle \mid \langle p_*, p_\circ \rangle \mid \texttt{inj}_i\,p \mid (p_0, p_1) \mid \mathit{const} \\
\text{(sort)} \quad \sigma, \tau &::= X \mid \mu X.\sigma \mid C\,\sigma \mid \sigma_0 + \sigma_1 \mid \sigma_0 \times \sigma_1 \mid \texttt{int} \mid \texttt{unit} \\
\text{(container kind)} \quad C &::= \texttt{box} \mid \texttt{mut} \\
\mathit{const} &::= \text{same as COR} \qquad \mathit{op} ::= \text{same as COR} \\
\texttt{bool} &:= \texttt{unit} + \texttt{unit} \qquad \texttt{true} := \texttt{inj}_1\,() \qquad \texttt{false} := \texttt{inj}_0\,() \\
X &::= \text{(sort variable)} \qquad x, y ::= \text{(variable)} \qquad f ::= \text{(predicate variable)}
\end{aligned}$$

We introduce $\texttt{box}\,\sigma$ and $\texttt{mut}\,\sigma$, which correspond to $\texttt{own}\,T$/$\texttt{immut}_\alpha T$ and $\texttt{mut}_\alpha T$ respectively. $\langle t \rangle$/$\langle t_*, t_\circ \rangle$ is the constructor for $\texttt{box}\,\sigma$/$\texttt{mut}\,\sigma$. ${*}t$ takes the body/first value of $\langle - \rangle$/$\langle -, - \rangle$ and ${\circ}t$ takes the second value of $\langle -, - \rangle$. We restrict the form of CHCs here to simplify the proofs later. Although the logic does not have a primitive for equality, we can define equality in a CHC system (e.g. by adding $\forall x{:}\,\sigma.\ \mathit{Eq}(x, x) \Longleftarrow \top$).

A *CHC system* (**Φ**, **Ξ**) is a pair of a finite set of CHCs **Φ** = {Φ0,...,Φ<sup>n</sup>−<sup>1</sup>} and **Ξ**, where **Ξ** is a finite map from predicate variables to tuples of sorts (denoted by Ξ), specifying the sorts of the input values. Unlike the informal description in § 1, we add **Ξ** to a CHC system.

**Sort System.** 't:**<sup>Δ</sup>** σ' (the term t has the sort σ under **Δ**) is defined as follows. Here, **Δ** is a finite map from variables to sorts. σ ∼ τ is the congruence on sorts induced by μX.σ ∼ σ[μX.σ/X].


'wellSorted**Δ**,**Ξ**(ϕ)' and 'wellSorted**Ξ**(**Φ**)', the judgments on well-sortedness of formulas and CHCs, are defined as follows.

$$\frac{\mathbf{\Xi}(f) = (\sigma_0, \dots, \sigma_{n-1}) \quad \text{for any } i \in [n],\ t_i :_{\mathbf{\Delta}} \sigma_i}{\mathrm{wellSorted}_{\mathbf{\Delta},\mathbf{\Xi}}(f(t_0, \dots, t_{n-1}))}$$

$$\frac{\mathbf{\Delta} = \{(x_i, \sigma_i) \mid i \in [m]\} \quad \mathrm{wellSorted}_{\mathbf{\Delta},\mathbf{\Xi}}(\check\varphi) \quad \text{for any } j \in [n],\ \mathrm{wellSorted}_{\mathbf{\Delta},\mathbf{\Xi}}(\psi_j)}{\mathrm{wellSorted}_{\mathbf{\Xi}}(\forall x_0{:}\,\sigma_0, \dots, x_{m-1}{:}\,\sigma_{m-1}.\ \check\varphi \Longleftarrow \psi_0 \wedge \cdots \wedge \psi_{n-1})}$$

The CHC system (**Φ**, **Ξ**) is said to be well-sorted if wellSorted**Ξ**(Φ) holds for any Φ ∈ **Φ**.

**Semantics.** '[[t]]**I**', the interpretation of the term t as a value under **I**, is defined as follows. Here, **I** is a finite map from variables to values. Although the definition is partial, the interpretation is defined for all well-sorted terms.

$$\begin{aligned}
{}[\![x]\!]_{\mathbf{I}} &:= \mathbf{I}(x) \qquad [\![\langle t \rangle]\!]_{\mathbf{I}} := \langle [\![t]\!]_{\mathbf{I}} \rangle \qquad [\![\langle t_*, t_\circ \rangle]\!]_{\mathbf{I}} := \langle [\![t_*]\!]_{\mathbf{I}}, [\![t_\circ]\!]_{\mathbf{I}} \rangle \\
[\![\texttt{inj}_i\,t]\!]_{\mathbf{I}} &:= \texttt{inj}_i\,[\![t]\!]_{\mathbf{I}} \qquad [\![(t_0, t_1)]\!]_{\mathbf{I}} := ([\![t_0]\!]_{\mathbf{I}}, [\![t_1]\!]_{\mathbf{I}}) \qquad [\![\mathit{const}]\!]_{\mathbf{I}} := \mathit{const} \\
[\![{*}t]\!]_{\mathbf{I}} &:= \begin{cases} v & ([\![t]\!]_{\mathbf{I}} = \langle v \rangle) \\ v_* & ([\![t]\!]_{\mathbf{I}} = \langle v_*, v_\circ \rangle) \end{cases} \qquad [\![{\circ}t]\!]_{\mathbf{I}} := v_\circ \ \text{if}\ [\![t]\!]_{\mathbf{I}} = \langle v_*, v_\circ \rangle \\
[\![t.i]\!]_{\mathbf{I}} &:= v_i \ \text{if}\ [\![t]\!]_{\mathbf{I}} = (v_0, v_1) \qquad [\![t \mathbin{\mathit{op}} t']\!]_{\mathbf{I}} := [\![t]\!]_{\mathbf{I}} \ [\![\mathit{op}]\!]\ [\![t']\!]_{\mathbf{I}}
\end{aligned}$$

$[\![\mathit{op}]\!]$: the binary operation on values corresponding to $\mathit{op}$

A *predicate structure* **M** is a finite map from predicate variables to (concrete) predicates on values. $\mathbf{M}, \mathbf{I} \models f(t_0, \dots, t_{n-1})$ means that $\mathbf{M}(f)([\![t_0]\!]_{\mathbf{I}}, \dots, [\![t_{n-1}]\!]_{\mathbf{I}})$ holds. $\mathbf{M} \models \Phi$ is defined as follows.

$$\frac{\text{for any } \mathbf{I} \text{ s.t. } \forall i \in [m].\ \mathbf{I}(x_i) :_{\emptyset} \sigma_i, \ \ \mathbf{M}, \mathbf{I} \models \psi_0, \dots, \psi_{n-1} \text{ implies } \mathbf{M}, \mathbf{I} \models \check\varphi}{\mathbf{M} \models \forall x_0{:}\,\sigma_0, \dots, x_{m-1}{:}\,\sigma_{m-1}.\ \check\varphi \Longleftarrow \psi_0 \wedge \cdots \wedge \psi_{n-1}}$$

Finally, **M** |= (**Φ**, **Ξ**) is defined as follows.

$$\frac{\begin{array}{c}\mathrm{dom}\,\mathbf{M} = \mathrm{dom}\,\mathbf{\Xi} \quad \text{for any } (f, (\sigma_0, \dots, \sigma_{n-1})) \in \mathbf{\Xi},\ \mathbf{M}(f) \text{ is a predicate on values of sorts } \sigma_0, \dots, \sigma_{n-1} \\ \text{for any } \Phi \in \mathbf{\Phi},\ \mathbf{M} \models \Phi\end{array}}{\mathbf{M} \models (\mathbf{\Phi}, \mathbf{\Xi})}$$

When $\mathbf{M} \models (\mathbf{\Phi}, \mathbf{\Xi})$ holds, we say that $\mathbf{M}$ is a *model* of $(\mathbf{\Phi}, \mathbf{\Xi})$. Every well-sorted CHC system $(\mathbf{\Phi}, \mathbf{\Xi})$ has the *least model* with respect to the point-wise ordering (which can be proved based on the discussions in [16]), which we write as $\mathbf{M}^{\mathrm{least}}_{(\mathbf{\Phi}, \mathbf{\Xi})}$.
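To make the least-model construction concrete, here is a toy sketch (our own, with hypothetical clauses over a finite domain) that computes the least model by Kleene iteration: start from empty predicates and apply every clause until nothing changes. Real CHC solvers of course work symbolically rather than by enumeration.

```rust
use std::collections::HashSet;

// Least model of a toy CHC system by Kleene iteration (assumption: the
// domain {0, 1, 2} is finite, so iteration reaches the fixed point).
// Clauses:  Eq(x, x) <==          Le(x, x+1) <==
//           Le(x, y) <== Eq(x, y)  Le(x, z) <== Le(x, y) /\ Le(y, z)
fn least_model_le() -> HashSet<(i64, i64)> {
    let dom = [0i64, 1, 2];
    let mut eq: HashSet<(i64, i64)> = HashSet::new();
    let mut le: HashSet<(i64, i64)> = HashSet::new();
    loop {
        let mut changed = false;
        for &x in &dom {
            changed |= eq.insert((x, x)); // Eq(x, x) <==
            if x < 2 {
                changed |= le.insert((x, x + 1)); // Le(x, x+1) <==
            }
        }
        for &t in &eq.clone() {
            changed |= le.insert(t); // Le(x, y) <== Eq(x, y)
        }
        for &(x, y) in &le.clone() {
            for &(y2, z) in &le.clone() {
                if y == y2 {
                    changed |= le.insert((x, z)); // transitivity
                }
            }
        }
        if !changed {
            return le; // no clause can add anything: this is the least model
        }
    }
}

fn main() {
    // The least model of Le is exactly { (x, y) | x <= y } on {0, 1, 2}.
    println!("{}", least_model_le().len()); // prints 6
}
```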

#### **3.2 Translation from COR Programs to CHCs**

Now we formalize our translation of Rust programs into CHCs. We define (|Π|), which is a CHC system that represents the input-output relations of the functions in the COR program Π.

Roughly speaking, the least model $\mathbf{M}^{\mathrm{least}}_{(\!|\Pi|\!)}$ for this CHC system should satisfy: for any values $v_0, \dots, v_{n-1}, w$, $\mathbf{M}^{\mathrm{least}}_{(\!|\Pi|\!)} \models f_{\texttt{entry}}(v_0, \dots, v_{n-1}, w)$ holds exactly if, in COR, a function call $f(v_0, \dots, v_{n-1})$ can return $w$. Actually, in concrete operational semantics, such values should be read out from the heap memory. The formal description and proof of this expected property are presented in § 3.3.

**Auxiliary Definitions.** The sort corresponding to the type $T$, $(\!|T|\!)$, is defined as follows. $\check P$ is a meta-variable for a non-mutable-reference pointer kind, i.e. $\texttt{own}$ or $\texttt{immut}_\alpha$. Note that the information on lifetimes is all stripped off.

$$\begin{aligned} (\!|X|\!) &:= X & (\!|\mu X.T|\!) &:= \mu X.(\!|T|\!) & (\!|\check P\,T|\!) &:= \texttt{box}\,(\!|T|\!) & (\!|\texttt{mut}_\alpha T|\!) &:= \texttt{mut}\,(\!|T|\!) \\ (\!|\texttt{int}|\!) &:= \texttt{int} & (\!|\texttt{unit}|\!) &:= \texttt{unit} & (\!|T_0 + T_1|\!) &:= (\!|T_0|\!) + (\!|T_1|\!) & (\!|T_0 \times T_1|\!) &:= (\!|T_0|\!) \times (\!|T_1|\!) \end{aligned}$$

We introduce a special variable res to represent the result of a function.<sup>19</sup> For a label $L$ in a function $f$ in a program $\Pi$, we define $\check\varphi_{\Pi,f,L}$, $\Xi_{\Pi,f,L}$ and $\mathbf{\Delta}_{\Pi,f,L}$

<sup>19</sup> For simplicity, we assume that the parameters of each function are sorted respecting some fixed order on variables (with res coming last), and we enumerate various items in this fixed order.

as follows, if the items in the variable context for the label are enumerated as $x_0{:}\,{}^{\mathfrak{a}_0} T_0, \dots, x_{n-1}{:}\,{}^{\mathfrak{a}_{n-1}} T_{n-1}$ and the return type of the function is $U$.

$$\begin{aligned} \check\varphi_{\Pi,f,L} &:= f_L(x_0, \dots, x_{n-1}, \mathit{res}) & \Xi_{\Pi,f,L} &:= ((\!|T_0|\!), \dots, (\!|T_{n-1}|\!), (\!|U|\!)) \\ \mathbf{\Delta}_{\Pi,f,L} &:= \{(x_i, (\!|T_i|\!)) \mid i \in [n]\} + \{(\mathit{res}, (\!|U|\!))\} \end{aligned}$$

$\forall(\mathbf{\Delta})$ stands for $\forall x_0{:}\,\sigma_0, \dots, x_{n-1}{:}\,\sigma_{n-1}$, where the items in $\mathbf{\Delta}$ are enumerated as $(x_0, \sigma_0), \dots, (x_{n-1}, \sigma_{n-1})$.

**CHC Representation.** Now we introduce '$(\!|L{:}\,S|\!)_{\Pi,f}$', the set (in most cases, a singleton) of CHCs modeling the computation performed by the labeled statement $L{:}\,S$ in $f$ from $\Pi$. Unlike the informal descriptions in § 1, we use *pattern matching* instead of equations, to simplify the proofs. Below we show some of the rules; the complete rules are presented in the full paper. The variables marked green (e.g. $x_\circ$) should be fresh. The following is the rule for mutable (re)borrow.

$$\begin{aligned}
&(\!|L{:}\ \texttt{let}\,y = \texttt{mutbor}_\alpha\,x;\ \texttt{goto}\,L'|\!)_{\Pi,f} \\
&:= \begin{cases}
\bigl\{ \forall(\mathbf{\Delta}_{\Pi,f,L} + \{(x_\circ, (\!|T|\!))\}).\ \check\varphi_{\Pi,f,L} \Longleftarrow \varphi_{\Pi,f,L'}[\langle {*}x, x_\circ \rangle / y,\ \langle x_\circ \rangle / x] \bigr\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \texttt{own}\,T) \\
\bigl\{ \forall(\mathbf{\Delta}_{\Pi,f,L} + \{(x_\circ, (\!|T|\!))\}).\ \check\varphi_{\Pi,f,L} \Longleftarrow \varphi_{\Pi,f,L'}[\langle {*}x, x_\circ \rangle / y,\ \langle x_\circ, {\circ}x \rangle / x] \bigr\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \texttt{mut}_\alpha T)
\end{cases}
\end{aligned}$$

The value at the end of borrow is represented as a newly introduced variable x◦. Below is the rule for release of a variable.

$$\begin{aligned}
&(\!|L{:}\ \texttt{drop}\,x;\ \texttt{goto}\,L'|\!)_{\Pi,f} \\
&:= \begin{cases}
\{\forall(\mathbf{\Delta}_{\Pi,f,L}).\ \check\varphi_{\Pi,f,L} \Longleftarrow \varphi_{\Pi,f,L'}\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \check P\,T) \\
\{\forall(\mathbf{\Delta}_{\Pi,f,L} - \{(x, \texttt{mut}\,(\!|T|\!))\} + \{(x_*, (\!|T|\!))\}).\ \check\varphi_{\Pi,f,L}[\langle x_*, x_* \rangle / x] \Longleftarrow \varphi_{\Pi,f,L'}\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \texttt{mut}_\alpha T)
\end{cases}
\end{aligned}$$

When a variable x of type mut<sup>α</sup> T is dropped/released, we check the prophesied value at the end of borrow. Below is the rule for a function call.

$$\begin{aligned}
&(\!|L{:}\ \texttt{let}\,y = g\langle\cdots\rangle(x_0, \dots, x_{n-1});\ \texttt{goto}\,L'|\!)_{\Pi,f} \\
&:= \{\forall(\mathbf{\Delta}_{\Pi,f,L} + \{(y, (\!|\mathrm{Ty}_{\Pi,f,L'}(y)|\!))\}).\ \check\varphi_{\Pi,f,L} \Longleftarrow g_{\texttt{entry}}(x_0, \dots, x_{n-1}, y) \wedge \varphi_{\Pi,f,L'}\}
\end{aligned}$$

The body (the right-hand side of ⇐= ) of the CHC contains two formulas, which yields a kind of call stack at the level of CHCs. Below is the rule for a return from a function.

$$(\!|L{:}\ \texttt{return}\,x|\!)_{\Pi,f} := \{\forall(\mathbf{\Delta}_{\Pi,f,L}).\ \check\varphi_{\Pi,f,L}[x/\mathit{res}] \Longleftarrow \top\}$$

The variable res is forced to be equal to the returned variable x.

Finally, (|Π|), the CHC system that represents the COR program Π (or the *CHC representation* of Π), is defined as follows.

$$(\!|\Pi|\!) := \Bigl(\textstyle\sum_{F\ \text{in}\ \Pi,\ L:S\ \in\ \mathrm{LabelStmt}_F} (\!|L{:}\,S|\!)_{\Pi,\mathrm{name}(F)},\ \ \{(f_L, \Xi_{\Pi,f,L}) \mid (f, L) \in \mathrm{FnLabel}_\Pi\}\Bigr)$$

*Example 2 (CHC Representation).* We present below the CHC representation of take-max described in § 2.1. We omit CHCs on inc-max here. We have also excluded the variable binders '∀ ···'.<sup>20</sup>

$$\text{take-max}_{\texttt{entry}}(ma, mb, \mathit{res}) \Longleftarrow \text{take-max}_{L1}(ma, mb, {*}ma \mathbin{\texttt{>=}} {*}mb, \mathit{res})$$

<sup>20</sup> The sorts of the variables are as follows: ma, mb,res: mut int; ma<sup>∗</sup>, mb∗: int; ou: box unit.

$$\begin{aligned}
\text{take-max}_{L1}(ma, mb, \texttt{inj}_1\,{*}ou, \mathit{res}) &\Longleftarrow \text{take-max}_{L2}(ma, mb, ou, \mathit{res}) \\
\text{take-max}_{L1}(ma, mb, \texttt{inj}_0\,{*}ou, \mathit{res}) &\Longleftarrow \text{take-max}_{L5}(ma, mb, ou, \mathit{res}) \\
\text{take-max}_{L2}(ma, mb, ou, \mathit{res}) &\Longleftarrow \text{take-max}_{L3}(ma, mb, \mathit{res}) \\
\text{take-max}_{L3}(ma, \langle mb_*, mb_* \rangle, \mathit{res}) &\Longleftarrow \text{take-max}_{L4}(ma, \mathit{res}) \\
\text{take-max}_{L4}(ma, ma) &\Longleftarrow \\
\text{take-max}_{L5}(ma, mb, ou, \mathit{res}) &\Longleftarrow \text{take-max}_{L6}(ma, mb, \mathit{res}) \\
\text{take-max}_{L6}(\langle ma_*, ma_* \rangle, mb, \mathit{res}) &\Longleftarrow \text{take-max}_{L7}(mb, \mathit{res}) \\
\text{take-max}_{L7}(mb, mb) &\Longleftarrow
\end{aligned}$$

The fifth and eighth CHCs represent the release of *mb*/*ma*. The sixth and ninth CHCs represent the determination of the return value res.
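These clauses can be sanity-checked in plain Rust. The sketch below is our own illustration (not the paper's formal translation): a `mut int` is modeled as a pair of its current value and its prophesied value at the end of the borrow, take-max's two branches are mirrored directly, and the inc-max property from § 1 is confirmed on samples.

```rust
// Pair ("prophecy") encoding of mutable references, as plain Rust (ours).
// A value of sort `mut int` is (current, final): the target's value now
// and the value it will have when the borrow ends.
type MutInt = (i64, i64);

// The relation take-max_entry(ma, mb, res), computed as a function: given
// the current values a, b and the value `written` that the caller stores
// through the returned reference by the end of the borrow, produce the
// full pairs for ma, mb and res.
fn take_max_model(a: i64, b: i64, written: i64) -> (MutInt, MutInt, MutInt) {
    if a >= b {
        // mb is released, which fixes its prophecy: b◦ = b.
        // res is aliased to ma, so ma's final value is `written`.
        ((a, written), (b, b), (a, written))
    } else {
        ((a, a), (b, written), (b, written))
    }
}

// inc-max: write max+1 through the returned reference, then check that the
// two cells ended up with distinct final values (a◦ != b◦).
fn inc_max(a: i64, b: i64) -> bool {
    let max = if a >= b { a } else { b };
    let (ma, mb, _res) = take_max_model(a, b, max + 1);
    ma.1 != mb.1
}

fn main() {
    assert!(inc_max(3, 3)); // even on a tie: (3,4) vs (3,3)
    assert!(inc_max(5, 2));
    println!("ok");
}
```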

#### **3.3 Correctness of the CHC Representation**

Now we formally state and prove the correctness of the CHC representation.

*Notations.* We use $\{\!|\cdots|\!\}$ (instead of $\{\cdots\}$) for the intensional description of a multiset. $A \oplus B$ (or more generally $\bigoplus_\lambda A_\lambda$) denotes the multiset sum (e.g. $\{\!|0, 1|\!\} \oplus \{\!|1|\!\} = \{\!|0, 1, 1|\!\} \neq \{\!|0, 1|\!\}$).

**Readout and Safe Readout.** We introduce a few judgments to formally describe how data is read out from the heap.

First, the judgment 'readout**H**(∗a :: T | v; M)' (the data at the address a of type T can be read out from the heap **H** as the value v, yielding the memory footprint <sup>M</sup>) is defined as follows.<sup>21</sup> Here, a *memory footprint* <sup>M</sup> is a finite multiset of addresses, which is employed for monitoring the memory usage.

$$\frac{\mathbf{H}(a) = a' \quad \mathrm{readout}_{\mathbf{H}}({*}a' :: T \mid v; \mathcal{M})}{\mathrm{readout}_{\mathbf{H}}({*}a :: \texttt{own}\,T \mid \langle v \rangle; \mathcal{M} \oplus \{\!|a|\!\})} \qquad \frac{\mathrm{readout}_{\mathbf{H}}({*}a :: T[\mu X.T/X] \mid v; \mathcal{M})}{\mathrm{readout}_{\mathbf{H}}({*}a :: \mu X.T \mid v; \mathcal{M})}$$

$$\frac{\mathbf{H}(a) = n}{\mathrm{readout}_{\mathbf{H}}({*}a :: \texttt{int} \mid n; \{\!|a|\!\})} \qquad \mathrm{readout}_{\mathbf{H}}({*}a :: \texttt{unit} \mid (); \emptyset)$$

$$\frac{\mathbf{H}(a) = i \in [2] \quad \mathrm{readout}_{\mathbf{H}}({*}(a{+}1) :: T_i \mid v; \mathcal{M})}{\mathrm{readout}_{\mathbf{H}}({*}a :: T_0 + T_1 \mid \texttt{inj}_i\,v;\ \mathcal{M} \oplus \{\!|a|\!\} \oplus \{\!|a{+}1{+}\#T_i, \dots, a{+}\#(T_0{+}T_1){-}1|\!\})}$$

$$\frac{\mathrm{readout}_{\mathbf{H}}({*}a :: T_0 \mid v_0; \mathcal{M}_0) \quad \mathrm{readout}_{\mathbf{H}}({*}(a{+}\#T_0) :: T_1 \mid v_1; \mathcal{M}_1)}{\mathrm{readout}_{\mathbf{H}}({*}a :: T_0 \times T_1 \mid (v_0, v_1); \mathcal{M}_0 \oplus \mathcal{M}_1)}$$

For example, '$\mathrm{readout}_{\{(100,7),(101,5)\}}({*}100 :: \texttt{int} \times \texttt{int} \mid (7, 5); \{\!|100, 101|\!\})$' holds.

Next, '$\mathrm{readout}_{\mathbf{H}}(\mathbf{F} :: \mathbf{\Gamma} \mid \mathcal{F}; \mathcal{M})$' (the data of the stack frame **F** respecting the variable context **Γ** can be read out from **H** as $\mathcal{F}$, yielding $\mathcal{M}$) is defined as follows. $\mathrm{dom}\,\mathbf{\Gamma}$ stands for $\{x \mid x{:}\,{}^{\mathfrak{a}} T \in \mathbf{\Gamma}\}$.

$$\frac{\mathrm{dom}\,\mathbf{F} = \mathrm{dom}\,\mathbf{\Gamma} \quad \text{for any } x{:}\ \texttt{own}\,T \in \mathbf{\Gamma},\ \mathrm{readout}_{\mathbf{H}}({*}\mathbf{F}(x) :: T \mid v_x; \mathcal{M}_x)}{\mathrm{readout}_{\mathbf{H}}\bigl(\mathbf{F} :: \mathbf{\Gamma} \mid \{(x, \langle v_x \rangle) \mid x \in \mathrm{dom}\,\mathbf{F}\};\ \bigoplus_{x \in \mathrm{dom}\,\mathbf{F}} \mathcal{M}_x\bigr)}$$

<sup>21</sup> Here we can ignore mutable/immutable references, because we focus on what we call simple functions, as explained later.

Finally, 'safe**H**(**F** :: **Γ** | F)' (the data of **F** respecting **Γ** can be *safely* read out from **H** as F) is defined as follows.

$$\frac{\mathrm{readout}_{\mathbf{H}}(\mathbf{F} :: \mathbf{\Gamma} \mid \mathcal{F}; \mathcal{M}) \quad \mathcal{M} \text{ has no duplicate items}}{\mathrm{safe}_{\mathbf{H}}(\mathbf{F} :: \mathbf{\Gamma} \mid \mathcal{F})}$$

Here, the 'no duplicate items' precondition checks safety with respect to ownership.
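The int × int example above can be sketched directly in Rust (our own illustration, covering only the int and product fragment): readout collects the visited addresses as a footprint, and safety is the absence of duplicates in it.

```rust
use std::collections::HashMap;

// Minimal readout sketch (ours) for the int/product fragment, following
// #int = 1 and #(T0 x T1) = #T0 + #T1. The footprint is the multiset of
// visited addresses; `safe` additionally demands it has no duplicates.
enum Ty {
    Int,
    Prod(Box<Ty>, Box<Ty>),
}

#[derive(Debug, PartialEq)]
enum Val {
    Int(i64),
    Pair(Box<Val>, Box<Val>),
}

fn size(t: &Ty) -> i64 {
    match t {
        Ty::Int => 1,
        Ty::Prod(a, b) => size(a) + size(b),
    }
}

// readout_H(*a :: T | v; M): read the value of type T at address a,
// pushing every visited address onto the footprint `fp`.
fn readout(h: &HashMap<i64, i64>, a: i64, t: &Ty, fp: &mut Vec<i64>) -> Option<Val> {
    match t {
        Ty::Int => {
            fp.push(a);
            h.get(&a).map(|&n| Val::Int(n))
        }
        Ty::Prod(t0, t1) => {
            let v0 = readout(h, a, t0, fp)?;
            let v1 = readout(h, a + size(t0), t1, fp)?;
            Some(Val::Pair(Box::new(v0), Box::new(v1)))
        }
    }
}

fn safe(fp: &mut Vec<i64>) -> bool {
    fp.sort();
    fp.windows(2).all(|w| w[0] != w[1]) // no duplicate addresses: ownership is respected
}

fn main() {
    // The example from the text: readout over H = {(100, 7), (101, 5)}.
    let h: HashMap<i64, i64> = [(100, 7), (101, 5)].into_iter().collect();
    let mut fp = Vec::new();
    let v = readout(&h, 100, &Ty::Prod(Box::new(Ty::Int), Box::new(Ty::Int)), &mut fp);
    assert_eq!(v, Some(Val::Pair(Box::new(Val::Int(7)), Box::new(Val::Int(5)))));
    assert!(safe(&mut fp));
    println!("{:?}", fp); // prints [100, 101]
}
```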

**COS-based Model.** Now we introduce the *COS-based model* (COS stands for concrete operational semantics) $f^{\mathrm{COS}}_\Pi$ to formally describe the expected input-output relation. Here, for simplicity, $f$ is restricted to one that does not take lifetime parameters (we call such a function *simple*; the input/output types of a simple function cannot contain references). We define $f^{\mathrm{COS}}_\Pi$ as the predicate (on values of sorts $(\!|T_0|\!), \dots, (\!|T_{n-1}|\!), (\!|U|\!)$ if $f$'s input/output types are $T_0, \dots, T_{n-1}, U$) given by the following rule.

$$\frac{\begin{array}{c}\mathbf{C}_0 \to_\Pi \cdots \to_\Pi \mathbf{C}_N \quad \mathrm{final}_\Pi(\mathbf{C}_N) \quad \mathbf{C}_0 = [f, \texttt{entry}]\,\mathbf{F} \mid \mathbf{H} \quad \mathbf{C}_N = [f, L]\,\mathbf{F}' \mid \mathbf{H}' \\ \mathrm{safe}_{\mathbf{H}}\bigl(\mathbf{F} :: \mathbf{\Gamma}_{\Pi,f,\texttt{entry}} \mid \{(x_i, v_i) \mid i \in [n]\}\bigr) \quad \mathrm{safe}_{\mathbf{H}'}\bigl(\mathbf{F}' :: \mathbf{\Gamma}_{\Pi,f,L} \mid \{(y, w)\}\bigr)\end{array}}{f^{\mathrm{COS}}_\Pi(v_0, \dots, v_{n-1}, w)}$$

$\mathbf{\Gamma}_{\Pi,f,L}$: the variable context for the label $L$ of $f$ in the program $\Pi$

**Correctness Theorem.** Finally, the correctness (both soundness and completeness) of the CHC representation is simply stated as follows.

**Theorem 1 (Correctness of the CHC Representation).** *For any program* $\Pi$ *and simple function* $f$ *in* $\Pi$*,* $f^{\mathrm{COS}}_\Pi$ *is equivalent to* $\mathbf{M}^{\mathrm{least}}_{(\!|\Pi|\!)}(f_{\texttt{entry}})$*.*

*Proof.* The details are presented in the full paper. We outline the proof below.

First, we introduce *abstract operational semantics*, where we get rid of heaps and directly represent each variable in the program as a value, using *abstract variables*, which are strongly related to *prophecy variables* (see § 5). An abstract variable represents the undetermined value of a mutable reference at the end of the borrow.

Next, we introduce *SLDC resolution* for CHC systems and find a *bisimulation* between abstract operational semantics and SLDC resolution, whereby we show that the *AOS-based model*, defined analogously to the COS-based model, is *equivalent* to the least model of the CHC representation. Moreover, we find a *bisimulation* between concrete and abstract operational semantics and prove that the COS-based model is *equivalent* to the AOS-based model.

Finally, combining the equivalences, we achieve the proof for the correctness of the CHC representation.

Interestingly, as by-products of the proof, we have also shown the *soundness of the type system* in terms of preservation and progression, in both concrete and abstract operational semantics. Simplification and generalization of the proofs is left for future work.

#### **3.4 Advanced Examples**

We give advanced examples of pointer-manipulating Rust programs and their CHC representations. For readability, we write programs in Rust (with ghost annotations) instead of COR. In addition, CHCs are written in an informal style like § 1, preferring equalities to pattern matching.

*Example 3.* Consider the following program, a variant of just\_rec in § 1.1.

```
fn choose<'a>(ma: &'a mut i32, mb: &'a mut i32) -> &'a mut i32 {
  if rand() { drop ma; mb } else { drop mb; ma }
}
fn linger_dec<'a>(ma: &'a mut i32) -> bool {
  *ma -= 1; if rand() >= 0 { drop ma; return true; }
  let mut b = rand(); let old_b = b; intro 'b; let mb = &'b mut b;
  let r2 = linger_dec<'b>(choose<'b>(ma, mb)); now 'b;
  r2 && old_b >= b
}
```
Unlike just\_rec, the function linger\_dec can modify the local variable of an arbitrarily deep ancestor. Interestingly, each recursive call to linger\_dec can introduce a new lifetime 'b , which yields arbitrarily many layers of lifetimes.

Suppose we wish to verify that linger\_dec never returns false. If we use, like *JustRec*<sup>+</sup> in § 1.1, a predicate taking the memory states $h, h'$ and the stack pointer *sp*, we have to discover the quantified invariant: $\forall i \leq sp.\ h[i] \geq h'[i]$. In contrast, our approach reduces this verification problem to the following CHCs:

```
Choose(⟨a, a◦⟩, ⟨b, b◦⟩, r) ⇐= b◦ = b ∧ r = ⟨a, a◦⟩
Choose(⟨a, a◦⟩, ⟨b, b◦⟩, r) ⇐= a◦ = a ∧ r = ⟨b, b◦⟩
LingerDec(⟨a, a◦⟩, r) ⇐= a′ = a − 1 ∧ a◦ = a′ ∧ r = true
LingerDec(⟨a, a◦⟩, r) ⇐= a′ = a − 1 ∧ oldb = b ∧ Choose(⟨a′, a◦⟩, ⟨b, b◦⟩, mc)
                        ∧ LingerDec(mc, r′) ∧ r = (r′ && oldb >= b◦)
r = true ⇐= LingerDec(⟨a, a◦⟩, r).
```
This can be solved by many solvers since it has a very simple model:

```
Choose(⟨a, a◦⟩, ⟨b, b◦⟩, r) :⇐⇒ (b◦ = b ∧ r = ⟨a, a◦⟩) ∨ (a◦ = a ∧ r = ⟨b, b◦⟩)
LingerDec(⟨a, a◦⟩, r) :⇐⇒ r = true ∧ a ≥ a◦.
```
*Example 4.* Combined with *recursive data structures*, our method turns out to be more interesting. Let us consider the following Rust code:<sup>22</sup>

```
enum List { Cons(i32, Box<List>), Nil } use List::*;
fn take_some<'a>(mxs: &'a mut List) -> &'a mut i32 {
  match mxs {
    Cons(mx, mxs2) => if rand() { drop mxs2; mx }
                           else { drop mx; take_some<'a> (mxs2) }
    Nil => { take_some(mxs) }
```
<sup>22</sup> In COR, List can be expressed as μX.int <sup>×</sup> own X <sup>+</sup> unit.

```
}
 }
 fn sum(xs: &List) -> i32 {
   match xs { Cons(x, xs2) => x + sum(xs2), Nil => 0 }
 }
 fn inc_some(mut xs: List) -> bool {
   let n = sum(&xs); intro 'a; let my = take_some<'a> (&'a mut xs);
   *my += 1; drop my; now 'a; let m = sum(&xs); m == n + 1
 }
```
This is a program that manipulates singly linked integer lists, defined as a recursive data type. take\_some takes a mutable reference to a list and returns a mutable reference to some element of the list. sum calculates the sum of the elements of a list. inc\_some increments some element of a list via a mutable reference and checks that the sum of the elements of the list has increased by 1.

Suppose we wish to verify that inc\_some never returns false. Our method translates this verification problem into the following CHCs.<sup>23</sup>

$$\begin{aligned}
\mathit{TakeSome}(\langle [x|xs'], xs_{\circ} \rangle, r) &\Longleftarrow xs_{\circ} = [x_{\circ}|xs'_{\circ}] \land xs'_{\circ} = xs' \land r = \langle x, x_{\circ} \rangle \\
\mathit{TakeSome}(\langle [x|xs'], xs_{\circ} \rangle, r) &\Longleftarrow xs_{\circ} = [x_{\circ}|xs'_{\circ}] \land x_{\circ} = x \land \mathit{TakeSome}(\langle xs', xs'_{\circ} \rangle, r) \\
\mathit{TakeSome}(\langle [\,], xs_{\circ} \rangle, r) &\Longleftarrow \mathit{TakeSome}(\langle [\,], xs_{\circ} \rangle, r) \\
\mathit{Sum}(\langle [x|xs'] \rangle, r) &\Longleftarrow \mathit{Sum}(\langle xs' \rangle, r') \land r = x + r' \\
\mathit{Sum}(\langle [\,] \rangle, r) &\Longleftarrow r = 0 \\
\mathit{IncSome}(xs, r) &\Longleftarrow \mathit{Sum}(\langle xs \rangle, n) \land \mathit{TakeSome}(\langle xs, xs_{\circ} \rangle, \langle y, y_{\circ} \rangle) \land y_{\circ} = y + 1 \\
&\qquad \land \mathit{Sum}(\langle xs_{\circ} \rangle, m) \land r = (m == n + 1).
\end{aligned}$$

A crucial technique used here is *subdivision of a mutable reference*, which is achieved with the constraint *xs*◦ = [*x*◦|*xs*′◦].

We can give this CHC system a very simple model, using an auxiliary function sum (satisfying sum([x|*xs*′]) := x + sum(*xs*′), sum([]) := 0):

$$\begin{aligned}
\mathit{TakeSome}(\langle xs, xs_{\circ} \rangle, \langle y, y_{\circ} \rangle) &:\Longleftrightarrow\ y_{\circ} - y = \mathsf{sum}(xs_{\circ}) - \mathsf{sum}(xs) \\
\mathit{Sum}(\langle xs \rangle, r) &:\Longleftrightarrow\ r = \mathsf{sum}(xs) \\
\mathit{IncSome}(xs, r) &:\Longleftrightarrow\ r = \mathsf{true}.
\end{aligned}$$

Although the model relies on the function sum, the validity of the model can be checked without induction on sum (i.e. we can check the validity of each CHC just by properly unfolding the definition of sum a few times).
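For instance, checking the first Sum clause under this model takes a single unfolding (our worked instance of the remark above): assuming the body, i.e. $r' = \mathsf{sum}(xs')$ and $r = x + r'$, we get

$$r \;=\; x + \mathsf{sum}(xs') \;=\; \mathsf{sum}([x|xs']),$$

which is exactly what the head $\mathit{Sum}(\langle [x|xs'] \rangle, r)$ requires under the model.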

The example can be *fully automatically and promptly* verified by our approach using HoIce [12,11] as the back-end CHC solver; see § 4.

#### **3.5 Discussions**

We discuss here how our idea can be extended and enhanced.

<sup>23</sup> [x|xs] is the cons made of the head x and the tail xs. [] is the nil. In our formal logic, they are expressed as inj<sub>0</sub>(x, xs) and inj<sub>1</sub>().

*Applying Various Verification Techniques.* Our idea can also be expressed as a translation of a pointer-manipulating Rust program into a program in a *stateless functional programming language*, which allows us to use *various verification techniques*, not limited to CHCs. Access to future information can be modeled using *non-determinism*. To express the value a◦ at the end of a mutable borrow in CHCs, we simply *guess* the value non-deterministically. At the time we actually release a mutable reference, we *check* a' = a and cut off the execution branches that fail the check.

For example, take\_max/inc\_max in § 1.2/Example 1 can be translated into the following OCaml program.

```
let rec assume b = if b then () else assume b
let take_max (a, a') (b, b') =
  if a >= b then (assume (b' = b); (a, a'))
            else (assume (a' = a); (b, b'))
let inc_max a b =
  let a' = Random.int(0) in let b' = Random.int(0) in
  let (c, c') = take_max (a, a') (b, b') in
  assume (c' = c + 1); not (a' = b')
let main a b = assert (inc_max a b)
```
'let a' = Random.int(0)' expresses a *random guess* and 'assume (a' = a)' expresses a *check*. The original problem "Does inc\_max never return false?" is reduced to the problem "Does main never fail its assertion?" for the OCaml program.<sup>24</sup>
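This guess-and-check reading can even be made executable without nondeterminism by bounded exhaustive search over the guessed final values (an illustrative sketch of ours, rendered in Rust, not part of the paper's toolchain; all names are ours). A guess that fails an `assume` corresponds to a cut-off branch:

```rust
// A mutable reference is modeled as a pair (current, final) of values.
// The `assume` checks become filters: a branch whose check fails is
// discarded by returning None.
fn take_max(a: (i32, i32), b: (i32, i32)) -> Option<(i32, i32)> {
    if a.0 >= b.0 {
        if b.1 == b.0 { Some(a) } else { None } // assume (b' = b)
    } else {
        if a.1 == a.0 { Some(b) } else { None } // assume (a' = a)
    }
}

// inc_max holds on (a, b) iff the assertion a' != b' is true on every
// surviving guess (a_fin, b_fin) within the search bound.
fn inc_max_holds(a: i32, b: i32, bound: i32) -> bool {
    for a_fin in -bound..=bound {
        for b_fin in -bound..=bound {
            if let Some((c, c_fin)) = take_max((a, a_fin), (b, b_fin)) {
                if c_fin == c + 1 {                      // assume (c' = c + 1)
                    if a_fin == b_fin { return false; }  // assertion failed
                }
            }
        }
    }
    true
}

fn main() {
    // Exhaustively confirm the property on a small input range.
    for a in -3..=3 {
        for b in -3..=3 {
            assert!(inc_max_holds(a, b, 8));
        }
    }
    println!("ok");
}
```

The search never finds a surviving guess that violates the assertion, mirroring the fact that the CHC system above is satisfiable.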

This representation allows us to use various verification techniques, including model checking (higher-order, temporal, bounded, etc.), semi-automated verification (e.g. on Boogie [48]) and verification on proof assistants (e.g. Coq [15]). The property to be verified can be not only partial correctness, but also total correctness and liveness. Further investigation is left for future work.

*Verifying Higher-order Programs.* We need to take care of the following points when modeling closures: **(i)** a closure that encloses mutable references can be encoded as a pair of the main function and the 'drop function' called when the closure is released; **(ii)** a closure that updates enclosed data can be encoded as a function that returns, along with the main return value, the updated version of the closure; **(iii)** a closure that updates external data through enclosed mutable references can be modeled by a combination of (i) and (ii). Further investigation on the verification of higher-order Rust programs is left for future work.
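Point (ii) can be sketched as follows (our own illustrative encoding, not the paper's formal construction): a stateful closure becomes a pure function that threads its own state explicitly.

```rust
// A closure `let mut c = |x| { count += 1; x + count };` that mutates its
// captured counter is encoded as a pure function: it takes the closure's
// state explicitly and returns the updated state alongside the result.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CounterClosure { count: i32 }

fn call(cl: CounterClosure, x: i32) -> (CounterClosure, i32) {
    let count = cl.count + 1; // the enclosed update, made explicit
    (CounterClosure { count }, x + count)
}

fn main() {
    let c0 = CounterClosure { count: 0 };
    let (c1, r1) = call(c0, 10);
    let (_c2, r2) = call(c1, 10);
    // Same argument, different results: the state change is fully visible.
    assert_eq!((r1, r2), (11, 12));
    println!("ok");
}
```

Because the state is passed and returned explicitly, the encoded program is stateless and amenable to the CHC translation described above.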

*Libraries with Unsafe Code.* Our translation does not use lifetime information; the correctness of our method is guaranteed by the nature of borrows. Whereas

<sup>24</sup> MoCHi [39], a higher-order model checker for OCaml, successfully verified the safety property for the OCaml representation above. It also successfully and instantly verified a similar representation of choose/linger\_dec at Example 3.

lifetimes are used for *static checking* of the borrow discipline, many libraries in Rust (e.g. RefCell) provide mechanisms for *dynamic ownership checking*.

We believe that such libraries with *unsafe code* can be verified for use with our method by means of a separation logic such as Iris [35,33], as RustBelt [32] does. Promisingly, Iris has recently incorporated *prophecy variables* [34], which seem to fit well with our approach. This is an interesting topic for future work.

After the libraries are verified, we can turn to our method. As an easy example, Vec [58] can be represented simply as a functional array; a mutable/immutable slice &mut[T]/&[T] can be represented as an array of mutable/immutable references. As another example, to deal with RefCell [56], we pass around an *array* that maps a RefCell<T> address to data of type T equipped with an ownership counter; RefCell itself is modeled simply as an address.<sup>25,26</sup> Importantly, *at the very time we take a mutable reference* a, a◦ *from a ref-cell, the data at that address in the array should be updated to* a◦. Using methods such as pointer analysis [61], we can possibly shrink the array.
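The counter part of this encoding can be pictured as follows (our simplified sketch: a single map from addresses to a value plus a borrow counter, modeling only the dynamic ownership check, not the prophecy update; all names are ours):

```rust
use std::collections::HashMap;

// A RefCell<i32> is modeled as a mere address; a global "array" (here a map)
// carries, per address, the data together with an ownership counter:
// 0 = free, u32::MAX = mutably borrowed (shared borrows would count 1, 2, ...).
struct Store { cells: HashMap<usize, (i32, u32)> }

impl Store {
    // The dynamic ownership check of a mutable borrow: succeed only if free.
    fn try_borrow_mut(&mut self, addr: usize) -> Option<&mut i32> {
        let (data, count) = self.cells.get_mut(&addr)?;
        if *count == 0 { *count = u32::MAX; Some(data) } else { None }
    }
    fn release_mut(&mut self, addr: usize) {
        if let Some((_, count)) = self.cells.get_mut(&addr) { *count = 0; }
    }
}

fn main() {
    let mut store = Store { cells: HashMap::from([(0usize, (41, 0u32))]) };
    *store.try_borrow_mut(0).unwrap() += 1;     // first borrow succeeds
    assert!(store.try_borrow_mut(0).is_none()); // counter still marks it borrowed
    store.release_mut(0);
    assert_eq!(store.cells[&0].0, 42);
    println!("ok");
}
```

A second borrow attempt fails until the first is released, which is exactly the dynamic check that RefCell performs at run time.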

Still, our method does not handle well the *memory leaks* [52] caused, for example, by a combination of RefCell and Rc [57], because such leaks obfuscate the ownership release of mutable references. We think that the use of Rc etc. should rather be restricted for smooth verification. Further investigation is needed.

# **4 Implementation and Evaluation**

We report on the implementation of our verification tool and the preliminary experiments conducted with small benchmarks to confirm the effectiveness of our approach.

#### **4.1 Implementation of RustHorn**

We implemented a prototype verification tool *RustHorn* (available at https://github.com/hopv/rust-horn) based on the ideas described above. The tool supports the basic features of Rust that COR covers, notably including recursion and recursive types.

The implementation translates the MIR (Mid-level Intermediate Representation) [45,51] of a Rust program into CHCs quite straightforwardly.<sup>27</sup> Thanks to the nature of the translation, RustHorn can rely solely on Rust's borrow check and ignore lifetimes. For efficiency, the predicate variables are constructed at the granularity of the vertices of the control-flow graph in MIR, rather than the per-label construction of § 3.2. Also, unlike the formalization in § 3.2, assertions inside functions are taken into account.

<sup>25</sup> To borrow a mutable/immutable reference from RefCell, we check and update the counter and take out the data from the array.

<sup>26</sup> In Rust, we can use RefCell to naturally encode data types with circular references (e.g. doubly-linked lists).

<sup>27</sup> In order to use the MIR, RustHorn's implementation depends on the unstable nightly version of the Rust compiler, which causes a slight portability issue.

#### **4.2 Benchmarks and Experiments**

To measure the performance of RustHorn and the existing CHC-based verifier SeaHorn [23], we conducted preliminary experiments with benchmarks listed in Table 1. Each benchmark program is designed so that the Rust and C versions match. Each benchmark instance consists of either one program or a pair of safe and unsafe programs that are very similar to each other. The benchmarks and experimental results are accessible at https://github.com/hopv/rust-horn.

The benchmarks in the groups simple and bmc were taken from SeaHorn (https://github.com/seahorn/seahorn/tree/master/test), with the Rust versions written by us. They were chosen based on the following criteria: they (i) consist only of features supported by core Rust, (ii) follow Rust's ownership discipline, and (iii) are small enough to be amenable to manual translation from C to Rust.

The remaining six benchmark groups were built by us and consist of programs featuring mutable references. The groups inc-max, just-rec and linger-dec are based on the examples that appeared in § 1 and § 3.4. The group swap-dec consists of programs that perform repeated, involved updates via mutable references to mutable references. The groups lists and trees feature destructive updates on recursive data structures (lists and trees) via mutable references; one interesting program among them was explained in § 3.4.

We conducted experiments on a commodity laptop (2.6 GHz Intel Core i7 MacBook Pro with 16 GB RAM). First, we used RustHorn and SeaHorn (version 0.1.0-rc3) [23] to translate each benchmark program into CHCs in the SMT-LIB 2 format. Both RustHorn and SeaHorn generated CHCs sufficiently fast (about 0.1 second per program). After that, we measured the time of CHC solving by Spacer [40] in Z3 (version 4.8.7) [69] and HoIce (version 1.8.1) [12,11] on the generated CHCs. SeaHorn's outputs were not accepted by HoIce, mainly because SeaHorn generates CHCs with arrays. We also made modified versions of some of SeaHorn's CHC outputs, adding constraints on address freshness, to improve the accuracy of the representations and reduce false alarms.<sup>28</sup>

#### **4.3 Experimental Results**

Table 1 shows the results of the experiments.

Interestingly, the combination of RustHorn and HoIce succeeded in verifying many programs with recursive data types (lists and trees), although it failed at difficult programs.<sup>29</sup> HoIce, unlike Spacer, can find models defined with primitive recursive functions for recursive data types.<sup>30</sup>

<sup>28</sup> For base/3 and repeat/3 of inc-max, the address-taking parts were already removed, probably by inaccurate pointer analysis.

<sup>29</sup> For example, inc-some/2 takes two mutable references in a list and increments on them; inc-all-t destructively increments all elements in a tree.

<sup>30</sup> We used the latest version of HoIce, whose algorithm for recursive types is presented in the full paper of [11].


**Table 1.** Benchmarks and experimental results on RustHorn and SeaHorn, with Spacer/Z3 and HoIce. "timeout" denotes timeout of 180 seconds; "false alarm" means reporting 'unsafe' for a safe program; "tool error" is a tool error of Spacer, which currently does not deal with recursive types well.

False alarms of SeaHorn for the last six groups are mainly due to SeaHorn's problematic approximation of pointers and heap memories, as discussed in § 1.1. On the modified CHC outputs of SeaHorn, five false alarms were eliminated and four of those instances were verified successfully. For the last four groups, unboundedly many memory cells can be allocated, which poses a fundamental challenge for SeaHorn's array-based approach, as discussed in § 1.1.<sup>31</sup> The combination of RustHorn and HoIce took a relatively long time or timed out on some programs, including unsafe ones, because HoIce is still less mature than Spacer; in general, automated CHC solving can be rather unstable.

# **5 Related Work**

*CHC-based Verification of Pointer-Manipulating Programs.* SeaHorn [23] is a representative existing tool for CHC-based verification of pointer-manipulating programs. It basically represents the heap memory as an array. Although some pointer analyses [24] are used to optimize the array representation of the heap, their approach suffers from the scalability problem discussed in § 1.1, as confirmed by the experiments in § 4. Still, their approach is quite effective for automated verification, given that many real-world pointer-manipulating programs do not follow Rust-style ownership.

Another approach is taken by JayHorn [37,36], which translates Java programs (possibly using object pointers) to CHCs. They represent store invariants using the special predicates *pull* and *push*. Although this allows faster reasoning about the heap than the array-based approach, it can suffer from more false alarms. We conducted a small experiment with JayHorn (0.6-alpha) on some of the benchmarks of § 4.2; unexpectedly, JayHorn reported 'UNKNOWN' (instead of 'SAFE' or 'UNSAFE') even for simple programs such as the instances unique-scalar in the group simple and basic in inc-max.

*Verification for Rust.* Whereas we have presented the first CHC-based (fully automated) verification method specially designed for Rust-style ownership, there have been a number of studies on other types of verification for Rust.

RustBelt [32] aims to formally prove high-level safety properties of Rust libraries with unsafe internal implementations, using manual reasoning in the higher-order concurrent separation logic Iris [35,33] on the Coq Proof Assistant [15]. Although their framework is flexible, automation of reasoning within the framework is little discussed. The language design of our COR is influenced by their formal calculus λRust.

Electrolysis [67] translates some subset of Rust into a purely functional programming language to manually verify functional correctness on Lean Theorem Prover [49]. Although it clears out pointers to get simple models like our approach, Electrolysis' applicable scope is quite limited, because it deals with mutable references by *simple static tracking of addresses based on lenses* [20], not

<sup>31</sup> We also tried on Spacer JustRec+, the stack-pointer-based accurate representation of just\_rec presented in § 1.1, but we got timeout of 180 seconds.

supporting even basic use cases such as dynamic selection of mutable references (e.g. take\_max in § 1.2) [66], which our method can easily handle. Our approach covers *all* usages of pointers of the safe core of Rust as discussed in § 3.

A series of studies [27,3,17] conducts (semi-)automated verification of Rust programs using Viper [50], a verification platform based on separation logic with fractional ownership. This approach can to some extent deal with unsafe code [27] and type traits [17]. Astrauskas et al. [3] conduct semi-automated verification (manually providing pre/post-conditions and loop invariants) on many realistic examples. Because Viper is based on *fractional ownership*, however, their platforms have to use *concrete indexing on the memory* for programs like take\_max/inc\_max. In contrast, our idea leverages *borrow-based ownership* and can also be applied to semi-automated verification, as suggested in § 3.5.

Some studies [65,4,44] apply bounded model checking to Rust programs, especially those with unsafe code. Our method can be applied to bounded model checking as discussed in § 3.5.

*Verification using Ownership.* Ownership has been applied to a wide range of verification tasks. It has been used for detecting race conditions in concurrent programs [8,64] and analyzing the safety of memory allocation [63]. Separation logic based on ownership has also been studied extensively [7,50,35]. Some verification platforms [14,5,21] support simple forms of ownership. However, most prior studies on ownership-based verification rely on fractional or counting ownership. Verification under *borrow-based ownership*, as in Rust, was little studied before our work.

*Prophecy Variables.* Our idea of taking a future value to represent a mutable reference is linked to the notion of *prophecy variables* [1,68,34]. Jung et al. [34] propose a new Hoare-style logic with prophecy variables. In their logic, prophecy variables are not copyable, which is analogous to the uncopyability of mutable references in Rust. This logic can probably be used to generalize our idea, as suggested in § 3.5.

# **6 Conclusion**

We have proposed a novel method for CHC-based program verification, which represents a mutable reference as a pair of values, the current value and the future value at the time of release. We have formalized the method for a core language of Rust and proved its correctness. We have implemented a prototype verification tool for a subset of Rust and confirmed the effectiveness of our approach. We believe that this study establishes the foundation of verification leveraging borrow-based ownership.

**Acknowledgments.** This work was supported by JSPS KAKENHI Grant Number JP15H05706 and JP16K16004. We are grateful to the anonymous reviewers for insightful comments.

# **References**


on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14- 20, 2018, Proceedings, Part I. Lecture Notes in Computer Science, vol. 10805, pp. 365–384. Springer (2018). https://doi.org/10.1007/978-3-319-89960-2 20




Maun, Botswana, May 7-12, 2017. EPiC Series in Computing, vol. 46, pp. 368–384. EasyChair (2017)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A First-Order Logic with Frames**

Adithya Murali⋆†, Lucas Peña⋆†, Christof Löding‡, and P. Madhusudan†

† University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, IL, USA {adithya5, lpena7, madhu}@illinois.edu

‡ RWTH Aachen University, Department of Computer Science, Aachen, Germany loeding@automata.rwth-aachen.de

**Abstract.** We propose a novel logic, called *Frame Logic* (FL), that extends first-order logic (with recursive definitions) using a construct *Sp*(·) that captures the *implicit supports* of formulas— the precise subset of the universe upon which their meaning depends. Using such supports, we formulate proof rules that facilitate frame reasoning elegantly when the underlying model undergoes change. We show that the logic is expressive by capturing several data-structures and also exhibit a translation from a *precise* fragment of separation logic to frame logic. Finally, we design a program logic based on frame logic for reasoning with programs that dynamically update heaps that facilitates local specifications and frame reasoning. This program logic consists of both localized proof rules as well as rules that derive the weakest tightest preconditions in FL.

**Keywords:** Program Verification, Program Logics, Heap Verification, First-Order Logic, First-Order Logic with Recursive Definitions

# **1 Introduction**

Program logics for expressing and reasoning with programs that dynamically manipulate heaps is an active area of research. The research on separation logic has argued convincingly that it is highly desirable to have *localized logics* that talk about small states (heaplets rather than the global heap), and the ability to do *frame reasoning*. Separation logic achieves this objective by having a tight heaplet semantics and using special operators, primarily a separating conjunction operator ∗ and a separating implication operator (the magic wand −∗).

In this paper, we ask a fundamental question: can classical logics (such as FOL and FOL with recursive definitions) be extended to support localized specifications and frame reasoning? Can we utilize classical logics for reasoning effectively with programs that dynamically manipulate heaps, with the aid of local specifications and frame reasoning?

The primary contribution of this paper is to endow a classical logic, namely first-order logic with recursive definitions (with least fixpoint semantics) with frames and frame reasoning.

⋆ Equal contribution. Corresponding Author.

A formula in first-order logic with recursive definitions (FO-RD) can be naturally associated with a *support*— the subset of the universe that determines its truth. By using a more careful syntax, such as guarded quantification (which continues to have a classical interpretation), we can in fact write specifications in FO-RD that have very precise supports. For example, we can write the property that x points to a linked list using a formula list(x), written purely in FO-RD, so that its support is precisely the set of locations constituting the linked list.
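As a sketch of how such a definition might look (our illustration; the paper's own definitions appear in Section 3), a list predicate can be written with an if-then-else construct so that its truth, and hence its support, unfolds exactly along the list:

$$\mathit{list}(x) \;:=\; \mathit{ite}\big(x = \mathit{nil} \,:\, \mathit{true},\; \mathit{list}(\mathit{next}(x))\big)$$

Evaluating $\mathit{list}(x)$ touches only $x$, $\mathit{next}(x)$, and so on until $\mathit{nil}$, so the locations of the list itself determine the formula's truth.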

In this paper, we define an extension of FO-RD, called Frame Logic (FL), where we allow a new operator *Sp*(α) which, for an FO-RD formula α, evaluates to the support of α. Logical formulas thus have access to supports and can use them to *separate* supports and do frame reasoning. For instance, the logic can now express that two lists are disjoint by asserting that *Sp*(list(x)) ∩ *Sp*(list(y)) = ∅. It can then reason that, in such a program heap configuration, if the program manipulates only the locations in *Sp*(list(y)), then list(x) continues to be true, by simple frame reasoning.

The addition of the support operator to FO-RD yields a very natural logic for expressing specifications. First, formulas in FO-RD have the same meaning when viewed as FL formulae. For example, f(x) = y (written in FO-RD as well as in FL) is true in any model that has x mapped by f to y, instead of a specialized "tight heaplet semantics" that demands that f be a partial function with the domain only consisting of the location x. The fact that the support of this formula contains only the location x is important, of course, but is made accessible using the support operator, i.e., *Sp*(f(x) = y) gives the set containing the sole element interpreted for x. Second, properties of supports can be naturally expressed using set operations. To state that the lists pointed to by x and y are disjoint, we don't need special operators (such as the ∗ operator in separation logic) but can express this as *Sp*(list(x)) ∩ *Sp*(list(y)) = ∅. Third, when used to annotate programs, pre/post specifications for programs written in FL can be made *implicitly* local by interpreting their supports to be the localized heaplets accessed and modified by programs, yielding frame reasoning akin to program logics that use separation logic. Finally, as we show in this paper, the weakest precondition of specifications across basic loop-free paths can be expressed in FL, making it an expressive logic for reasoning with programs. Separation logic, on the other hand, introduces the magic wand operator −∗ (which is inherently higher-order) in order to add enough expressiveness to be closed under weakest preconditions [38].

We define frame logic (FL) as an extension of FO with recursive definitions (FO-RD) that operates over a multi-sorted universe, with a particular foreground sort (used to model locations on the heap on which pointers can mutate) and several background sorts that are defined using separate theories. Supports for formulas are defined with respect to the foreground sort only. A special background sort of *sets* of elements of the foreground sort is assumed and is used to model the supports for formulas. For any formula ϕ in the logic, we have a special construct *Sp*(ϕ) that captures its support, a set of locations in the foreground sort, that intuitively corresponds to the precise subdomain of functions

the value of ϕ depends on. We then prove a *frame theorem* (Theorem 1) that says that changing a model M by changing the interpretation of functions that are not in the support of ϕ will not affect the truth of the formula ϕ. This theorem then directly supports frame reasoning; if a model satisfies ϕ and the model is changed so that the changes made are disjoint from the support of ϕ, then ϕ will continue to hold. We also show that FL formulae can be translated to vanilla FO-RD logic (without support operators); in other words, the semantics for the support of a formula can be captured in FO-RD itself. Consequently, we can use any FO-RD reasoning mechanism (proof systems [19, 20] or heuristic algorithms such as the natural proof techniques [24, 32, 37, 41]) to reason with FL formulas.

We illustrate our logic using several examples drawn from program verification; we show how to express various data-structure definitions and the elements they contain and various measures for them using FL formulas (e.g., linked lists, sorted lists, list segments, binary search trees, AVL trees, lengths of lists, heights of trees, set of keys stored in the data-structure, etc.)

While the sensibilities of our logic are definitely inspired by separation logic, there are some fundamental differences beyond the fact that our logic extends the syntax and semantics of classical logics with a special support operator and avoids operators such as ∗ and −∗. In separation logic, there can be many supports of a formula (also called heaplets)— a heaplet for a formula is one that *supports its truth*. For example, a formula of the form α ∨ β can have a heaplet that supports the truth of α or one that supports the truth of β. However, the philosophy that we follow in our design is to have a *single* support that supports the truth value of a formula, whether it be *true or false*. Consequently, the support of the formula α ∨ β is the *union* of the supports of the formulas α and β.
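In equational form, the determined-support design sketched above amounts to clauses like the following (our paraphrase of the discussion; the formal semantics appears in Section 3):

$$\begin{aligned}
\mathit{Sp}(f(x) = y) &= \{x\} \\
\mathit{Sp}(\lnot\varphi) &= \mathit{Sp}(\varphi) \\
\mathit{Sp}(\alpha \lor \beta) &= \mathit{Sp}(\alpha) \cup \mathit{Sp}(\beta)
\end{aligned}$$

Each formula thus has exactly one support, computed compositionally, rather than a set of heaplets that happen to support its truth.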

The above design choice of the support being *determined* by the formula has several consequences that lead to a deviation from separation logic. For instance, the support of the negation of a formula ϕ is the same as the support of ϕ. And the support of the formula f(x) = y and that of its negation are the same, namely the singleton location interpreted for x. In separation logic, the corresponding formula will have the same heaplet, but its negation will include *all* other heaplets. The choice of having determined supports or heaplets is not new, and several variants and sublogics of separation logic have been explored. For example, the logic Dryad [32, 37] is a separation logic that insists on determined heaplets to support automated reasoning, and the *precise* fragment of separation logic studied in the literature [29] defines a sublogic that has (essentially) determined heaplets. The second main contribution of this paper is to show that this fragment of separation logic (with slight changes for technical reasons) can be translated to frame logic, such that the unique heaplet satisfying a precise separation logic formula is the support of the corresponding formula in frame logic.

The third main contribution of this paper is a program logic based on frame logic for a simple while-programming language destructively updating heaps. We

present two kinds of proof rules for reasoning with such programs annotated with pre- and post-conditions written in frame logic. The first set of rules are local rules that axiomatically define the semantics of the program, using the smallest supports for each command. We also give a frame rule that allows arguing preservation of properties whose supports are disjoint from the heaplet modified by a program. These rules are similar to analogous rules in separation logic. The second class of rules work to give a *weakest tightest precondition* for any postcondition with respect to non-recursive programs. In separation logic, the corresponding rules for weakest preconditions are often expressed using separating implication (the magic-wand operator). Given a small change made to the heap and a postcondition β, the formula α −∗ β captures all heaplets H where if a heaplet that satisfies α is joined with H, then β holds. When α describes the change effected by the program, α −∗ β captures, essentially, the weakest precondition. However, the magic wand is a very powerful operator that calls for quantifications over heaplets and submodels, and hence involves second order quantification. In our logic, we show that we can capture the weakest precondition with only first-order quantification, and hence first-order frame logic is closed under weakest preconditions across non-recursive programs blocks. This means that when inductive loop invariants are given also in FL, reasoning with programs reduces to reasoning with FL. By translating FL to pure FO-RD formulas, we can use FO-RD reasoning techniques to reason with FL, and hence programs.

In summary, the contributions of this paper are:

- the logic FL: an extension of FO-RD with the support operator *Sp*(·), a frame theorem enabling frame reasoning, and a translation of FL into pure FO-RD;
- a translation of a precise fragment of separation logic into FL; and
- a program logic based on FL, with local proof rules, a frame rule, and first-order weakest tightest preconditions for loop-free program blocks.

The paper is organized as follows. Section 2 sets up first-order logics with recursive definitions (FO-RD), with a special uninterpreted foreground sort of locations and several background sorts/theories. Section 3 introduces Frame Logic (FL), its syntax, its semantics which includes a discussion of design choices for supports, proves the frame theorem for FL, shows a reduction of FL to FO-RD, and illustrates the logic by defining several data-structures and their properties using FL. Section 4 develops a program logic based on FL, illustrating them with proofs of verification of programs. Section 5 introduces a precise fragment of separation logic and shows its translation to FL. Section 6 discusses comparisons of FL to separation logic, and some existing first-order techniques that can be used to reason with FL. Section 7 compares our work with the research literature and Section 8 has concluding remarks.

# **2 Background: First-Order Logic with Recursive Definitions and Uninterpreted Combinations of Theories**

The base logic upon which we build frame logic is first-order logic with recursive definitions (FO-RD), where we allow a foreground sort and several background sorts, each with its individual theory (like arithmetic, sets, arrays, etc.). The foreground sort and functions involving the foreground sort are *uninterpreted* (not constrained by theories). Hence this can be seen as an uninterpreted combination of theories over disjoint domains. This logic has been defined and used to model heap verification before [23].

We will build frame logic over such a framework where supports are modeled as subsets of elements of the foreground sort. When modeling heaps in program verification using logic, the foreground sort will be used to model *locations of the heap*, uninterpreted functions from the foreground sort to foreground sort will be used to model *pointers*, and uninterpreted functions from the foreground sort to the background sort will model *data fields*. Consequently, supports will be subsets of locations of the heap, which is appropriate as these are the domains of pointers that change when a program updates a heap.

We define a signature as Σ = (S; C; F; R; I), where S is a finite non-empty set of sorts. C is a set of constant symbols, where each c ∈ C has some sort τ ∈ S. F is a set of function symbols, where each function f ∈ F has a type of the form τ<sub>1</sub> × ... × τ<sub>m</sub> → τ for some m, with τ<sub>i</sub>, τ ∈ S. The sets R and I are (disjoint) sets of relation symbols, where each relation R ∈ R ∪ I has a type of the form τ<sub>1</sub> × ... × τ<sub>m</sub>. The set I contains those relation symbols for which the corresponding relations are inductively defined using formulas (details are given below), while those in R are given by the model.

We assume that the set of sorts contains a designated "foreground sort" denoted by σf. All the other sorts in S are called background sorts, and for each such background sort σ we allow the constant symbols of sort σ, the function symbols of type σn → σ for some n, and the relation symbols of type σm for some m, to be constrained using an arbitrary theory Tσ.

A formula in first-order logic with recursive definitions (FO-RD) over such a signature is of the form (D, α), where D is a set of recursive definitions of the form R(x) := ρR(x), with R ∈ I and ρR(x) a first-order logic formula in which the relation symbols from I occur only positively, and α is a first-order logic formula over the signature. We assume that D has at most one definition for any inductively defined relation, and that the formulas ρR and α use only inductive relations defined in D.

The semantics of a formula is standard: the inductively defined relations are interpreted as the least fixpoint satisfying the relational equations, and α is evaluated in the standard way using these relation interpretations. We do not formally define this semantics here, but we will formally define the semantics of frame logic, an extension of FO-RD, which is discussed in the next section (its semantics is defined in the Technical Report [25]).
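To make the least-fixpoint reading of recursive definitions concrete, here is a small executable sketch (our own example, not from the paper): a reachability relation reach(x, y) := x = y ∨ ∃z. next(x) = z ∧ reach(z, y), evaluated over a finite universe by iterating the defining equation from the empty relation. Since the definition is positive in reach, the iteration is monotone and reaches the least fixpoint.

```python
# A minimal sketch of least-fixpoint semantics for a recursive definition
# R(x) := rho_R(x) over a finite model, using reachability as the example.

def lfp_reach(universe, next_fn):
    """Iterate the defining equation upward from the empty relation."""
    reach = set()                      # start from the least element
    while True:
        # rho_R evaluated under the current approximation of reach:
        # reach(x, y) holds if x = y, or next(x) = z and reach(z, y)
        step = {(x, y) for x in universe for y in universe
                if x == y or (next_fn.get(x), y) in reach}
        if step == reach:              # fixed point reached
            return reach
        reach = step

universe = {1, 2, 3, 4}
next_fn = {1: 2, 2: 3, 3: 3}           # 4 has no successor
r = lfp_reach(universe, next_fn)
assert (1, 3) in r and (3, 1) not in r
```

Positivity is what makes each iteration step monotone; on a finite universe the chain of approximations stabilizes after finitely many rounds.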

# **3 Frame Logic**

We now define Frame Logic (FL), the central contribution of this paper.

FL formulas:
ϕ ::= tτ = tτ | R(tτ1, ..., tτm) | ϕ ∧ ϕ | ¬ϕ | *ite*(γ : ϕ, ϕ) | ∃y : γ. ϕ,
where τ ∈ S and R ∈ R ∪ I is of type τ1 × ··· × τm

Guards:
γ ::= tτ = tτ | R(tτ1, ..., tτm) | γ ∧ γ | ¬γ | *ite*(γ : γ, γ) | ∃y : γ. γ,
where τ ∈ S \ {σS(f)} and R ∈ R is of type τ1 × ··· × τm

Terms:
tτ ::= c | x | f(tτ1, ..., tτm) | *ite*(γ : tτ, tτ) | *Sp*(ϕ) (if τ = σS(f)) | *Sp*(tτ′) (if τ = σS(f)),
where τ, τ′ ∈ S, with constants c and variables x of type τ, and functions f of type τ1 × ··· × τm → τ

Recursive definitions:
R(x) := ρR(x), where R ∈ I is of type τ1 × ··· × τm with τi ∈ S \ {σS(f)}, and ρR(x) is an FL formula in which all relation symbols R′ ∈ I occur only positively or inside a support expression.

**Fig. 1.** Syntax of frame logic: γ for guards, tτ for terms of sort τ, and general formulas ϕ. Guards cannot use inductively defined relations or support expressions.

We consider a universe with a foreground sort and several background sorts, each restricted by individual theories, as described in Section 2. We consider the elements of the foreground sort to be *locations* and consider supports as *sets of locations*, i.e., sets of elements of the foreground sort. We hence introduce a background sort σS(f); the elements of sort σS(f) model sets of elements of sort σf. Among the relation symbols in R there is the relation ∈ of type σf × σS(f) that is interpreted as the usual element relation. The signature includes the standard operations ∪ and ∩ on sets with the usual meaning, a unary function interpreted as the complement on sets (with respect to the set of foreground elements), and the constant ∅. For these functions and relations we assume a background theory BσS(f) that is an axiomatization of the theory of sets. We further assume that the signature does not contain any other function or relation symbols involving the sort σS(f).

For reasoning about changes of the structure over the locations, we assume that there is a subset Fm ⊆ F of function symbols that are declared mutable. These functions can be used to model mutable pointer fields in the heap that can be manipulated by a program and thus change. Formally, we require that each f ∈ Fm has at least one argument of sort σf.

For variables, let *Var*τ denote the set of variables of sort τ, where τ ∈ S. We let x abbreviate tuples x1, ..., xn of variables.

Our frame logic over uninterpreted combinations of theories is a variant of first-order logic with recursive definitions that has an additional operator *Sp*(ϕ), which assigns to each formula ϕ a set of elements (its support, or "heaplet" in the context of heaps) in the foreground universe. Hence *Sp*(ϕ) is a term of sort σS(f).

The intended semantics of *Sp*(ϕ) (and of the inductive relations) is defined formally as a least fixpoint of a set of equations. This semantics is presented in Section 3.3. In the following, we first define the syntax of the logic, then informally discuss the various design decisions for the semantics of supports, before proceeding to a formal definition of the semantics.

#### **3.1 Syntax of Frame Logic (FL)**

The syntax of our logic is given by the grammar in Figure 1. It extends FO-RD with the rule for building *support expressions*, which are terms of sort σS(f) of the form *Sp*(α) for a formula α, or *Sp*(t) for a term t.

The formulas defined by γ are used as *guards* in existential quantification and in the if-then-else operator, denoted *ite*. The restriction compared to general formulas is that guards can use neither inductively defined relations (R ranges only over R in the rule for γ, but over R ∪ I in the rule for ϕ) nor terms of sort σS(f), and thus no support expressions (τ ranges over S \ {σS(f)} in the rules for γ but over S in the rule for ϕ). The requirement that guards do not use inductive relations and support expressions is used later to ensure the existence of least fixpoints when defining the semantics of inductive definitions. The semantics of an *ite*-formula *ite*(γ : α, β) is the same as that of (γ ∧ α) ∨ (¬γ ∧ β); however, the *supports* of the two formulas will turn out to be different (i.e., *Sp*(*ite*(γ : α, β)) and *Sp*((γ ∧ α) ∨ (¬γ ∧ β)) differ), as explained in Section 3.2. The same is true for existential formulas: ∃y : γ. ϕ has the same semantics as ∃y. γ ∧ ϕ but, in general, a different support.

For recursive definitions (throughout the paper, we use the terms recursive definition and inductive definition with the same meaning), we require that the defined relation R does not have arguments of sort σS(f). This is another restriction to ensure the existence of a least fixpoint model in the definition of the semantics.<sup>1</sup>

#### **3.2 Semantics of Support Expressions: Design Decisions**

We discuss the design decisions behind the semantics of the support operator *Sp* in our logic, and then give an example of the support of an inductive definition. The formal conditions that supports must satisfy are stated in the equations in Figure 2 and explained in Section 3.3. Here, we start with an informal discussion.

The first decision is to have every formula uniquely define a support, which roughly captures the subdomain of the mutable functions on which the formula ϕ's truthhood depends, and to have *Sp*(ϕ) evaluate to it.

The choice of supports for atomic formulas is relatively clear. An atomic formula of the kind f(x) = y, where x is of the foreground sort and f is a mutable function, has as its support the singleton set containing the location interpreted

<sup>1</sup> It would be sufficient to restrict formulas of the form R(t1,...,tn) for inductive relations R to not contain support expressions as subterms.

for x. And atomic formulas that do not involve mutable functions over the foreground have an empty support. Supports for terms can be defined similarly. The support of a conjunction α ∧ β should clearly be the union of the supports of the two formulas.

*Remark 1.* In traditional separation logic, each pointer field is stored in a separate location, using integer offsets. However, in our work, we view pointers as references and disallow pointer arithmetic. A more accurate heaplet for such references can be obtained by taking the heaplet to be the pair (x, f) (see [30]), capturing the fact that the formula depends only on the field f of x. Such accurate heaplets can be captured in FL as well: we can introduce a *non-mutable field lookup pointer* Lf and use x.Lf.f in programs instead of x.f.

What should the support of a formula α ∨ β be? The choice we make here is that its support is the *union* of the supports of α and β. Note that in a model where α is true and β is false, we still include the heaplet of β in *Sp*(α ∨ β). In a sense, this overapproximates the support as far as frame reasoning goes, since preserving the model's definitions on the support of α alone would already preserve the truth of α, and hence of α ∨ β.

However, we prefer the support to be the union of the supports of α and β. We think of the support as the subdomain of the universe that determines the meaning of the formula, whether it be *true* or *false*. Consequently, we would like the support of a formula and its negation to be the same. Since the negation of a disjunction is a conjunction, whose support is the union of the supports of the negated disjuncts, and hence of α and β, we would like this union to be the support of the disjunction as well.

Separation logic makes a different design decision. Logical formulas are not associated with tight supports; rather, the semantics of a formula is defined for models with given supports/heaplets, where the idea of a heaplet is whether it supports the *truthhood* of a formula (and not its falsehood). For example, for a model, the various heaplets that satisfy ¬(f(x) = y) in separation logic include all heaplets where the location of x is not present, which does not coincide with the notion we have chosen for supports. However, for positive formulas, separation logic handles supports more accurately, as it can associate several supports with a formula, yielding two heaplets for formulas of the form α ∨ β when both disjuncts are true in a model. The decision to have a single support for a formula compels us to take the union of the supports to be the support of a disjunction.

There are situations, however, with disjunctions α ∨ β where only *one* of the disjuncts can possibly be true, and we would then like the support of the formula to be the support of the disjunct that happens to be true. We therefore introduce a new syntactic form *ite*(γ : α, β) in frame logic, whose heaplet is the union of the supports of γ and α if γ is true, and of the supports of γ and β if γ is false. While the truthhood of *ite*(γ : α, β) is the same as that of (γ ∧ α) ∨ (¬γ ∧ β), its support is potentially smaller, allowing us to write formulas with tighter supports to support better frame reasoning. Note that the supports of *ite*(γ : α, β) and its negation *ite*(γ : ¬α, ¬β) are the same, as we desired.
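The design decisions so far (atomic, conjunction/disjunction, negation, and *ite* supports) can be sketched executably. The following toy formula AST and model encoding are our own devising, with unary mutable functions over locations; it shows why *Sp*(*ite*(γ : α, β)) can be tighter than the support of the equivalent disjunction.

```python
# Illustrative sketch (encoding ours, not the paper's implementation):
# supports computed structurally over a tiny formula AST.

mutable = {"f": {1: 2, 2: 3}, "g": {1: 5}}   # unary mutable functions
env = {"x": 1, "y": 2, "z": 5}               # variable assignment

def ev(phi):
    """Truth value of a formula under env/mutable."""
    op = phi[0]
    if op == "eq_fn":                        # ("eq_fn", f, x, y): f(x) = y
        _, f, x, y = phi
        return mutable[f].get(env[x]) == env[y]
    if op == "and": return ev(phi[1]) and ev(phi[2])
    if op == "or":  return ev(phi[1]) or ev(phi[2])
    if op == "not": return not ev(phi[1])
    if op == "ite": return ev(phi[2]) if ev(phi[1]) else ev(phi[3])
    raise ValueError(op)

def sp(phi):
    """Support: the locations whose mutable-function values matter."""
    op = phi[0]
    if op == "eq_fn":                        # f(x) = y depends on f at x only
        return {env[phi[2]]}
    if op in ("and", "or"):                  # union for conjunction AND
        return sp(phi[1]) | sp(phi[2])       # disjunction (negation-closed)
    if op == "not":                          # a formula and its negation
        return sp(phi[1])                    # share the same support
    if op == "ite":                          # guard support plus the branch
        g, a, b = phi[1], phi[2], phi[3]     # selected by the guard's value
        return sp(g) | (sp(a) if ev(g) else sp(b))
    raise ValueError(op)

disj = ("or", ("eq_fn", "f", "x", "y"), ("eq_fn", "g", "x", "z"))
ite  = ("ite", ("eq_fn", "f", "x", "y"),
        ("eq_fn", "f", "y", "z"), ("eq_fn", "g", "x", "z"))
assert sp(disj) == {1}            # both disjuncts read location 1
assert sp(ite) == {1, 2}          # guard is true: Sp(guard) ∪ Sp(then)
```

The `ite` case is the only one that inspects a truth value; everything else is purely structural, which foreshadows why guards must stay free of inductive relations and support expressions.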

Turning to quantification, the support of a formula of the form ∃x. α is hard to define, as its truthhood could depend on the entire universe. We hence provide a mechanism for *guarded* quantification, of the form ∃x : γ. α. The semantics of this formula is that there exists some location that satisfies the guard γ for which α holds. The support of such a formula includes the support of the guard, and the supports of α when x is interpreted as a location that satisfies γ. For example, ∃x : (x = f(y)). g(x) = z has as its support the locations interpreted for y and f(y) only.
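The guarded-quantifier example above can be computed concretely. This small sketch (encoding ours) evaluates the support of ∃x : (x = f(y)). g(x) = z over a finite universe: the support of the guard plus the support of the body at every witness of the guard.

```python
# A minimal sketch (ours) of the support of the guarded formula
# ∃x : (x = f(y)). g(x) = z, with unary mutable functions f and g.

f = {1: 4, 2: 3}
g = {4: 9, 3: 9}
env = {"y": 1, "z": 9}
universe = {1, 2, 3, 4}

# Sp of the guard x = f(y) is {y}: the mutable f is applied to y
guard_sp = {env["y"]}

# witnesses of the guard: interpretations of x with x = f(y)
witnesses = [u for u in universe if u == f[env["y"]]]

# Sp of the body g(x) = z under x -> u is {u}: g is applied to x
body_sp = set()
for u in witnesses:
    body_sp |= {u}

support = guard_sp | body_sp
assert support == {1, 4}          # the locations of y and f(y) only
```

Note how the guard confines the body's contribution to finitely many witnesses, keeping the support small even though the quantifier ranges over the whole universe.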

For a formula R(t) with an inductive relation R defined by R(x) := ρR(x), the support descends into the definition, changing the variable assignment of the variables in x from the inductive definition to the terms in t. Furthermore, it contains the elements to which mutable functions are applied in the terms in t.

Recursive definitions are designed such that the evaluation of the equations for the support expressions is independent of the interpretation of the inductive relations. The equations mainly depend on the syntactic structure of formulas and terms; only the semantics of guards and the semantics of subterms under a mutable function symbol play a role. For this reason, we disallow guards containing recursively defined relations or support expressions. We also require that the only functions involving the sort σS(f) are the standard functions on sets. Thus, subterms of mutable functions cannot contain support expressions (which are of sort σS(f)) as subterms.

These restrictions ensure that there indeed exists a unique simultaneous least solution of the equations for the inductive relations and the support expressions.

We end this section with an example.

*Example 1.* Consider the definition of a predicate *tree*(x) w.r.t. two unary mutable functions *left* and *right*:

$$\begin{aligned} tree(x) :={} & ite(x = nil : true,\ \alpha) \text{ where } \\ \alpha ={} & \exists \ell, r : (\ell = left(x) \land r = right(x)).\ tree(\ell) \land tree(r)\ \land \\ & Sp(tree(\ell)) \cap Sp(tree(r)) = \emptyset \land \neg(x \in Sp(tree(\ell)) \cup Sp(tree(r))) \end{aligned}$$

This inductive definition defines binary trees with pointer fields *left* and *right* for left- and right-pointers, by stating that x points to a tree if either x is equal to nil (in this case its support is empty), or *left*(x) and *right*(x) are trees with disjoint supports. The last conjunct says that x does not belong to the support of the left and right subtrees; this condition is, strictly speaking, not required to define trees (under least fixpoint semantics). Note that the access to the support of formulas eases defining disjointness of heaplets, like in separation logic. The support of tree(x) turns out to be precisely the nodes that are reachable from x using *left* and *right* pointers, as one would desire. Consequently, if a pointer outside this support changes, we would be able to conclude using frame reasoning that the truth value of tree(x) does not change.
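Example 1 can be exercised on a concrete finite heap. The following hedged sketch (encoding ours) checks tree(x) and computes Sp(tree(x)) by structural descent, confirming that the support comes out as exactly the nodes reachable via *left* and *right*.

```python
# A sketch of Example 1 on a concrete finite heap: tree(x) holds when x
# is nil, or when the left and right subtrees are trees with disjoint
# supports not containing x; Sp(tree(x)) is the reachable node set.

NIL = None
left  = {1: 2, 2: NIL, 3: NIL}
right = {1: 3, 2: NIL, 3: NIL}

def tree_and_sp(x, seen=()):
    """Return (truth of tree(x), Sp(tree(x))) by structural descent."""
    if x is NIL:
        return True, set()
    if x in seen:                        # cycle: the descent would not end
        return False, {x}
    tl, sl = tree_and_sp(left[x],  seen + (x,))
    tr, sr = tree_and_sp(right[x], seen + (x,))
    ok = tl and tr and not (sl & sr) and x not in (sl | sr)
    return ok, {x} | sl | sr             # x enters via the guard's support

ok, support = tree_and_sp(1)
assert ok and support == {1, 2, 3}
```

The `x not in (sl | sr)` conjunct mirrors the last conjunct of α; as the text notes, under least-fixpoint semantics it is not strictly required, but it makes the disjointness reasoning explicit.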

#### **3.3 Formal Semantics of Frame Logic**

Before we explain the semantics of the support expressions and inductive definitions, we introduce a semantics that treats support expressions and the symbols


**Fig. 2.** Equations for support expressions

from I as uninterpreted symbols. We refer to this semantics as *uninterpreted semantics*. For the formal definition we need to introduce some terminology first.

An occurrence of a variable x in a formula is free if it does not occur under the scope of a quantifier for x. By renaming variables we can assume that each variable either occurs only freely in a formula or is quantified by exactly one quantifier in the formula. We write ϕ(x1, ..., xk) to indicate that the free variables of ϕ are among x1, ..., xk. Substitution of a term t for all free occurrences of a variable x in a formula ϕ is denoted ϕ[t/x]. Multiple variables are substituted simultaneously as ϕ[t1/x1, ..., tn/xn], which we abbreviate by ϕ[t/x].

A model is of the form M = (U, ⟦·⟧M), where U = (Uσ)σ∈S contains a universe for each sort, and ⟦·⟧M is an interpretation function. The universe for the sort σS(f) is the powerset of the universe for σf.

A variable assignment is a function ν that assigns to each variable a concrete element from the universe for the sort of the variable. For a variable x, we write Dx for the universe of the sort of x (the domain of x). For a variable x and an element u ∈ Dx we write ν[x ← u] for the variable assignment obtained from ν by changing the value assigned to x to u.

The interpretation function ⟦·⟧M maps each constant c of sort σ to an element cM ∈ Uσ, each function symbol f : τ1 × ··· × τm → τ to a concrete function fM : Uτ1 × ··· × Uτm → Uτ, and each relation symbol R ∈ R ∪ I of type τ1 × ··· × τm to a concrete relation RM ⊆ Uτ1 × ··· × Uτm. These interpretations are assumed to satisfy the background theories (see Section 2). Furthermore, the interpretation function maps each expression of the form *Sp*(ϕ) to a function ⟦*Sp*(ϕ)⟧M that assigns to each variable assignment ν a set ⟦*Sp*(ϕ)⟧M(ν) of foreground elements. The set ⟦*Sp*(ϕ)⟧M(ν) corresponds to the support of the formula when the free variables are interpreted by ν. Similarly, ⟦*Sp*(t)⟧M is a function from variable assignments to sets of foreground elements.

Based on such models, we can define the semantics of terms and formulas in the standard way. The only non-standard construct in our logic are terms of the form *Sp*(ϕ), for which the semantics is directly given by the interpretation function. We write ⟦t⟧M,ν for the interpretation of a term t in M with variable assignment ν. With this convention, ⟦*Sp*(ϕ)⟧M(ν) denotes the same thing as ⟦*Sp*(ϕ)⟧M,ν. As usual, we write M, ν ⊨ ϕ to indicate that the formula ϕ is true in M with the free variables interpreted by ν, and ⟦ϕ⟧M denotes the relation defined by the formula ϕ with free variables x.

We refer to the above semantics as the *uninterpreted semantics* of ϕ because we do not give a specific meaning to inductive definitions and support expressions.

Now let us define the true semantics for FL. The relation symbols R ∈ I represent inductively defined relations, which are defined by equations of the form R(x) := ρR(x) (see Figure 1). In the intended meaning, R is interpreted as the least relation that satisfies the equation

$$⟦R(\overline{x})⟧\_M = ⟦\rho\_R(\overline{x})⟧\_M.$$

The usual requirement for the existence of a unique least fixpoint of the equation is that the definition of R does not depend negatively on R. For this reason, we require that in ρR(x) each occurrence of an inductive predicate R′ ∈ I is either inside a support expression, or occurs under an even number of negations.<sup>2</sup>

Every support expression is evaluated on a model to a set of foreground elements (under a given variable assignment ν). Formally, we are interested in models in which the support expressions are interpreted as the sets that correspond to the *smallest solution of the equations given in Figure 2*. The intuition behind these definitions was explained in Section 3.2.

*Example 2.* Consider the inductive definition tree(x) defined in Example 1. To check that the equations from Figure 2 indeed yield the desired support, note that *Sp*(x = nil) = *Sp*(x) = *Sp*(*true*) = ∅. Below, we write [u] for a variable assignment that assigns u to the free variable of the formula that we are considering. We then obtain *Sp*(*tree*(x))[u] = ∅ if u = *nil*, and *Sp*(*tree*(x))[u] = *Sp*(α)[u] if u ≠ *nil*. The formula α is existentially quantified with guard ℓ = *left*(x) ∧ r = *right*(x). The support of this guard is {u} because mutable functions are applied to x. The support of the remaining part of α is the union of the supports of *tree*(ℓ)[*left*(u)] and *tree*(r)[*right*(u)] (the assignments for ℓ and r that make the guard true). So we obtain, for the case u ≠ *nil*, that the element u enters the support, and the recursion further descends into the subtrees of u, as desired.

<sup>2</sup> As usual, it would be sufficient to forbid negative occurrences of inductive predicates in mutual recursion.

A *frame model* is a model in which the interpretation of the inductive relations and of the support expressions corresponds to the least solution of the respective equations (see the Technical Report [25] for a rigorous formalisation).

**Proposition 1.** *For each model* M*, there is a unique frame model over the same universe and the same interpretation of the constants, functions, and noninductive relations.*

#### **3.4 A Frame Theorem**

The support of a formula can be used for frame reasoning in the following sense: if we modify a model M by changing the interpretation of the mutable functions (e.g., a program modifying pointers), then truth values of formulas do not change if the change happens outside the support of the formula. This is formalized below and proven in the Technical Report [25].

Given two models M, M′ over the same universe, we say that M′ is a *mutation of* M if RM = RM′, cM = cM′, and fM = fM′ for all constants c, relations R ∈ R, and functions f ∈ F \ Fm. In other words, M′ can differ from M only on the interpretations of the mutable functions, the inductive relations, and the support expressions.

Given a subset X ⊆ Uσf of elements of the foreground universe, we say that the *mutation is stable on* X if the values of the mutable functions do not change on arguments from X, that is, fM(u1, ..., un) = fM′(u1, ..., un) for all mutable functions f ∈ Fm and all appropriate tuples u1, ..., un of arguments with {u1, ..., un} ∩ X ≠ ∅.

**Theorem 1 (Frame Theorem).** *Let M, M′ be frame models such that M′ is a mutation of M that is stable on X ⊆ Uσf, and let ν be a variable assignment. Then M, ν ⊨ α iff M′, ν ⊨ α for all formulas α with ⟦Sp(α)⟧M(ν) ⊆ X, and ⟦t⟧M,ν = ⟦t⟧M′,ν for all terms t with ⟦Sp(t)⟧M(ν) ⊆ X.*
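The frame theorem can be observed on the tree predicate of Example 1 over a small concrete heap. In this hedged sketch (harness ours), tree(1) has support {1, 2, 3}, so a mutation stable on that set cannot change its truth value, while a mutation inside the support may.

```python
# A sketch of the frame theorem: mutations outside Sp(tree(1)) = {1,2,3}
# preserve the truth of tree(1); mutations inside it need not.

def is_tree(x, left, right):
    """Simple acyclicity/disjointness check approximating tree(x)."""
    seen = set()
    def go(x):
        if x is None:
            return True
        if x in seen:          # revisiting a node: not a tree
            return False
        seen.add(x)
        return go(left[x]) and go(right[x])
    return go(x)

left  = {1: 2, 2: None, 3: None, 4: None}
right = {1: 3, 2: None, 3: None, 4: None}
assert is_tree(1, left, right)

# mutation stable on the support: only location 4 (outside it) changes
left_stable = dict(left); left_stable[4] = 1
assert is_tree(1, left_stable, right)       # truth value preserved

# mutation at location 2, inside the support, creates a cycle through 1
left_bad = dict(left); left_bad[2] = 1
assert not is_tree(1, left_bad, right)
```

This is exactly the reasoning the frame rule of Section 4.3 exploits: a program whose footprint is disjoint from Sp(α) cannot invalidate α.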

#### **3.5 Reduction from Frame Logic to FO-RD**

The only extension of frame logic compared to FO-RD is the operator *Sp*, which defines a function from interpretations of free variables to sets of foreground elements. The semantics of this operator can be captured within FO-RD itself, so reasoning within frame logic can be reduced to reasoning within FO-RD.

A formula α(y) with y = y1, ..., ym has one support for each interpretation of the free variables. We capture these supports by an inductively defined relation *Sp*α(y, z) of arity m + 1 such that for each frame model M, we have (u1, ..., um, u) ∈ ⟦*Sp*α⟧M iff u ∈ ⟦*Sp*(α)⟧M(ν) for the interpretation ν that interprets yi as ui.

Since the semantics of *Sp*(α) is defined over the structure of α, we introduce corresponding inductively defined relations *Sp*β and *Sp*t for all subformulas β and subterms t of either α or of a formula ρR for R ∈ I.

*list*(x) := *ite*(x = nil : *true*, ∃z : z = *next*(x). *list*(z) ∧ x ∉ *Sp*(*list*(z)))   (linked list)

*dll*(x) := *ite*(x = nil : *true*, *ite*(*next*(x) = nil : *true*, ∃z : z = *next*(x). *prev*(z) = x ∧ *dll*(z) ∧ x ∉ *Sp*(*dll*(z))))   (doubly linked list)

*lseg*(x, y) := *ite*(x = y : *true*, ∃z : z = *next*(x). *lseg*(z, y) ∧ x ∉ *Sp*(*lseg*(z, y)))   (linked list segment)

*length*(x, n) := *ite*(x = nil : n = 0, ∃z : z = *next*(x). *length*(z, n − 1))   (length of list)

*slist*(x) := *ite*(x = nil : *true*, *ite*(*next*(x) = nil : *true*, ∃z : z = *next*(x). *key*(x) ≤ *key*(z) ∧ *slist*(z) ∧ x ∉ *Sp*(*slist*(z))))   (sorted list)

*mkeys*(x, M) := *ite*(x = nil : M = ∅, ∃z, M1 : z = *next*(x). M = M1 ∪m {*key*(x)} ∧ *mkeys*(z, M1) ∧ x ∉ *Sp*(*mkeys*(z, M1)))   (multiset of keys in linked list)

*btree*(x) := *ite*(x = nil : *true*, ∃ℓ, r : ℓ = *left*(x) ∧ r = *right*(x). *btree*(ℓ) ∧ *btree*(r) ∧ x ∉ *Sp*(*btree*(ℓ)) ∧ x ∉ *Sp*(*btree*(r)) ∧ *Sp*(*btree*(ℓ)) ∩ *Sp*(*btree*(r)) = ∅)   (binary tree)

*bst*(x) := *ite*(x = nil : *true*, *ite*(*left*(x) = nil ∧ *right*(x) = nil : *true*, *ite*(*left*(x) = nil : ∃r : r = *right*(x). *key*(x) ≤ *key*(r) ∧ *bst*(r) ∧ x ∉ *Sp*(*bst*(r)), *ite*(*right*(x) = nil : ∃ℓ : ℓ = *left*(x). *key*(ℓ) ≤ *key*(x) ∧ *bst*(ℓ) ∧ x ∉ *Sp*(*bst*(ℓ)), ∃ℓ, r : ℓ = *left*(x) ∧ r = *right*(x). *key*(x) ≤ *key*(r) ∧ *key*(ℓ) ≤ *key*(x) ∧ *bst*(ℓ) ∧ *bst*(r) ∧ x ∉ *Sp*(*bst*(ℓ)) ∧ x ∉ *Sp*(*bst*(r)) ∧ *Sp*(*bst*(ℓ)) ∩ *Sp*(*bst*(r)) = ∅))))   (binary search tree)

*height*(x, n) := *ite*(x = nil : n = 0, ∃ℓ, r, n1, n2 : ℓ = *left*(x) ∧ r = *right*(x). *height*(ℓ, n1) ∧ *height*(r, n2) ∧ *ite*(n1 > n2 : n = n1 + 1, n = n2 + 1))   (height of binary tree)

*bfac*(x, b) := *ite*(x = nil : b = 0, ∃ℓ, r, n1, n2 : ℓ = *left*(x) ∧ r = *right*(x). *height*(ℓ, n1) ∧ *height*(r, n2) ∧ b = n2 − n1)   (balance factor (for AVL tree))

*avl*(x) := *ite*(x = nil : *true*, ∃ℓ, r : ℓ = *left*(x) ∧ r = *right*(x). *avl*(ℓ) ∧ *avl*(r) ∧ *bfac*(x) ∈ {−1, 0, 1} ∧ x ∉ *Sp*(*avl*(ℓ)) ∪ *Sp*(*avl*(r)) ∧ *Sp*(*avl*(ℓ)) ∩ *Sp*(*avl*(r)) = ∅)   (AVL tree)

*ttree*(x) := *pttree*(x, nil)   (threaded tree)

*pttree*(x, p) := *ite*(x = nil : *true*, ∃ℓ, r : ℓ = *left*(x) ∧ r = *right*(x). ((r = nil ∧ *tnext*(x) = p) ∨ (r ≠ nil ∧ *tnext*(x) = r)) ∧ *pttree*(ℓ, x) ∧ *pttree*(r, p) ∧ x ∉ *Sp*(*pttree*(ℓ, x)) ∪ *Sp*(*pttree*(r, p)) ∧ *Sp*(*pttree*(ℓ, x)) ∩ *Sp*(*pttree*(r, p)) = ∅)   (threaded tree auxiliary definition)

**Fig. 3.** Example definitions of data-structures and other predicates in Frame Logic

The equations for supports from Figure 2 can be expressed by inductive definitions for the relations *Sp*β. The translations are shown in the Technical Report [25]. It is not hard to see that general frame logic formulas can be translated to FO-RD formulas that make use of these new inductively defined relations.
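As an illustrative sketch of this reduction (encoding ours, not the paper's translation), the support expression *Sp*(*list*(x)) from Figure 3 can be captured as an inductively defined relation Sp_list(u, v), read "v ∈ Sp(list(u))", and computed as a least fixpoint over a finite universe:

```python
# Sketch: the support of list(x) as an inductively defined relation
# Sp_list(u, v), computed by fixpoint iteration on a finite model.

universe = {1, 2, 3, 4}
next_fn = {1: 2, 2: 3, 3: None, 4: None}   # None plays the role of nil

def lfp_sp_list():
    rel = set()
    while True:
        step = set()
        for u in universe:
            step.add((u, u))               # next is applied to u (guard)
            z = next_fn[u]
            if z is not None:              # plus Sp(list(next(u)))
                step |= {(u, v) for (w, v) in rel if w == z}
        if step == rel:                    # least fixpoint reached
            return rel
        rel = step

rel = lfp_sp_list()
assert {v for (u, v) in rel if u == 1} == {1, 2, 3}
```

Note that, as promised in Section 3.2, the iteration never consults the truth of *list* itself; the support equations depend only on the guard and the mutable function *next*.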

**Proposition 2.** *For every frame logic formula there is an equisatisfiable FO-RD formula with the signature extended by auxiliary predicates for recursive definitions of supports.*

# **3.6 Expressing Data-Structures Properties in FL**

We now present formulations of several data structures and properties about them in FL. Figure 3 depicts formulations of singly and doubly linked lists, list segments, lengths of lists, sorted lists, the multiset of keys stored in a list (assuming a background sort of multisets), binary trees, their heights, binary search trees, and AVL trees. In all these definitions, the support operator plays a crucial role. We also present a formulation of *single threaded binary trees* (adapted from [7]), which are binary trees where, apart from the tree edges, a pointer tnext connects every tree node to its inorder successor in the tree; these pointers can go from leaves to ancestors arbitrarily far away in the tree, making this a nontrivial definition.

We believe that FL formulas naturally and succinctly express these datastructures and their properties, making it an attractive logic for annotating programs.

# **4 Programs and Proofs**

In this section, we develop a program logic for a while-programming language that can destructively update heaps. We assume that location variables are denoted by variables of the form x and y, whereas variables that denote other data (which correspond to the *background* sorts in our logic) are denoted by v. We omit the grammar for constructing background terms and formulas, and simply denote such 'background expressions' by be, clarifying their sort where needed. Finally, we assume that our programs are written in static single assignment (SSA) form, which means that every variable is assigned to at most once in the program text. The grammar of our programming language is given in Figure 4.

$$\begin{array}{l} S ::= x := c \mid x := y \mid x := y.f \mid v := be \mid x.f := y \\ \quad\ \mid\ \mathsf{alloc}(x) \mid \mathsf{free}(x) \mid \text{if } be \text{ then } S \text{ else } S \mid \text{while } be \text{ do } S \mid S \,;\, S \end{array}$$

**Fig. 4.** Grammar of while programs. c is a constant location, f is a pointer field, and be is a background expression. In our logic, we model every field f as a function f(·) from locations to the appropriate sort.

#### **4.1 Operational Semantics**

A configuration C is of the form (M, H, U), where M contains interpretations for the store and the heap. The store is a partial map that interprets variables, constants, and non-mutable functions (a function from location variables to locations), and the heap is a total map on the domain of locations that interprets the mutable functions (a function from pointer fields and locations to locations). H is a subset of locations denoting the set of allocated locations, and U is a subset of locations denoting *a subset* of the unallocated locations that can be allocated in the future. We introduce a special configuration ⊥ to which the program transitions when it dereferences a location not in H.

A configuration (M, H, U) is *valid* if all variables of the location sort map only to locations not in U, locations in H do not point to any location in U, and U is a subset of the complement of H that does not contain nil or the locations mapped to by the variables. We denote this by *valid*(M, H, U). Initial configurations and reachable configurations of any program will be valid.

The transition of configurations on various commands that manipulate the store and heap are defined in the natural way. Allocation adds a new location from U into H with pointer-fields defaulting to *nil* and default data fields. See the Technical Report [25] for more details.
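The transition function can be sketched executably. The following is a hedged sketch (representation and names ours, simplified to unary pointer fields) of configurations (M, H, U) with a store, a heap of mutable fields, the allocated set H, the available set U, and the error configuration ⊥ for dereferences outside H.

```python
# Sketch of the operational semantics on configurations
# (store, heap, H, U), with BOT standing in for the abort state ⊥.

BOT = "bottom"
NIL = 0

def step(cmd, conf):
    if conf == BOT:
        return BOT
    store, heap, H, U = conf
    op = cmd[0]
    if op == "load":                       # x := y.f
        _, x, y, f = cmd
        loc = store[y]
        if loc not in H:                   # dereference outside H: abort
            return BOT
        store2 = dict(store); store2[x] = heap[f][loc]
        return (store2, heap, H, U)
    if op == "store":                      # x.f := y
        _, x, f, y = cmd
        loc = store[x]
        if loc not in H:
            return BOT
        field = dict(heap[f]); field[loc] = store[y]
        heap2 = dict(heap); heap2[f] = field
        return (store, heap2, H, U)
    if op == "alloc":                      # alloc(x): take a fresh loc
        _, x = cmd                         # from U, fields default to nil
        loc = min(U)
        store2 = dict(store); store2[x] = loc
        heap2 = {f: {**m, loc: NIL} for f, m in heap.items()}
        return (store2, heap2, H | {loc}, U - {loc})
    raise ValueError(op)

store = {"x": 1, "y": 3}
heap = {"next": {1: 2, 2: NIL}}
conf = (store, heap, {1, 2}, {5, 6})
conf = step(("alloc", "z"), conf)
assert conf[0]["z"] == 5 and 5 in conf[2]  # z fresh and now allocated
conf = step(("store", "z", "next", "x"), conf)
assert conf[1]["next"][5] == 1
assert step(("load", "w", "y", "next"), conf) == BOT  # 3 not allocated
```

The choice of `min(U)` as the allocator is ours for determinism; the semantics only requires some location drawn from U.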

#### **4.2 Triples and Validity**

We express specifications of programs using triples of the form {α}S{β}, where α and β are FL formulas and S is a program. The formulas are, however, restricted: for simplicity, we disallow atomic relations on locations and functions of arity greater than one, and we disallow functions from a background sort to the foreground sort (see Section 3). Lastly, quantified formulas can in general have supports as large as the entire heap; to obtain a more practical fragment without compromising expressivity, we require guards in quantifications to be of the form f(z′) = z or z ∈ U, where z is the quantified variable.

We define a triple to be *valid* if every valid configuration whose heaplet is precisely the support of α, when acted on by the program, yields a configuration whose heaplet is the support of β. More formally, a triple is valid if for every valid configuration (M, H, U) such that M |= α and H = *Sp*(α)<sup>M</sup>:

**–** the abort state ⊥ is never encountered in the execution of S;

**–** if (M, H, U) transitions to (M′, H′, U′) on S, then M′ |= β and H′ = *Sp*(β)<sup>M′</sup>.

#### **4.3 Program Logic**

First, we define a set of *local rules*, along with rules for conditionals, while loops, sequencing, consequence, and framing:


The above rules are intuitively clear and are similar to the local rules in separation logic [38]. The rules for statements capture their semantics using minimal/tight heaplets, and the frame rule allows proving triples with larger heaplets. In the rule for alloc, the postcondition says that the newly allocated location has default values for all pointer fields and data fields (denoted *def*<sup>f</sup>). The soundness of the frame rule relies crucially on the frame theorem for FL (Theorem 1). The full soundness proof can be found in the Technical Report [25].

**Theorem 2.** *The above rules are sound with respect to the operational semantics.*

#### **4.4 Weakest-Precondition Proof Rules**

We now turn to the much more complex problem of designing rules that give weakest preconditions for arbitrary postconditions, for loop-free programs. In separation logic, such rules resort to the magic-wand operator −∗ [12, 27, 28, 38], a complex operator whose semantics calls for *second-order quantification* over arbitrarily large submodels. In our setting, our main goal is to show that FL is itself capable of expressing weakest preconditions of postconditions written in FL.

First, we define a notion of *Weakest Tightest Precondition* (WTP) of a formula β with respect to each command in our operational semantics. To define this notion, we first define a preconfiguration, and use that definition to define weakest tightest preconditions:

**Definition 1.** *The preconfigurations corresponding to a valid configuration* (M, H, U) *with respect to a program* S *are the set of valid configurations of the form* (M<sub>p</sub>, H<sub>p</sub>, U<sub>p</sub>) *(with* M<sub>p</sub> *being a model,* H<sub>p</sub> *and* U<sub>p</sub> *subuniverses of the locations in* M<sub>p</sub>*, and* U<sub>p</sub> *being unallocated locations) such that when* S *is executed on* M<sub>p</sub> *with unallocated set* U<sub>p</sub>*, it dereferences only locations in* H<sub>p</sub> *and results (using the operational semantics rules) in* (M, H, U) *or gets stuck (no transition is available). That is:*

$$\mathit{preconfigurations}((M, H, U), S) = \{(M_p, H_p, U_p) \mid valid(M_p, H_p, U_p) \text{ and } ((M_p, H_p, U_p) \stackrel{S}{\Rightarrow} (M, H, U) \text{ or } (M_p, H_p, U_p) \text{ gets stuck on } S)\}$$

**Definition 2.** α *is a WTP of a formula* β *with respect to a program* S *if*

$$\{(M_p, H_p, U_p) \mid M_p \models \alpha,\ H_p = Sp(\alpha)^{M_p},\ valid(M_p, H_p, U_p)\} = \bigcup \{\mathit{preconfigurations}((M, H, U), S) \mid M \models \beta,\ H = Sp(\beta)^{M},\ valid(M, H, U)\}$$

With the notion of weakest tightest preconditions, we define global program logic rules for each command of our language. In contrast to local rules, global specifications contain heaplets that may be larger than the smallest heap on which one can execute the command.

Intuitively, a WTP of β for lookup states that β must hold in the precondition when x is interpreted as x′, where x′ = f(y), and further that the location y must belong to the support of β. The rules for mutation and allocation are more complex. For mutation, we define a transformation *MW*<sup>x.f:=y</sup>(β) that evaluates a formula β in the pre-state as though it were evaluated in the post-state. We similarly define a transformation *MW*<sup>alloc(x)</sup><sub>v</sub> for allocation. We define these in detail later. Finally, the deallocation rule ensures x is not in the support of the postcondition. The conjunct f(x) = f(x) is provided to satisfy the tightness condition, ensuring the support of the precondition is the support of the postcondition with x added. The rules can be seen below, and the proof of soundness for these global rules can be found in the Technical Report [25].

**Assignment-G:** {β[y/x]} x := y {β}  and  {β[c/x]} x := c {β}

**Lookup-G:** {∃x′ : x′ = f(y). (β ∧ y ∈ *Sp*(β))[x′/x]} x := y.f {β} (where x′ does not occur in β)

**Mutation-G:** {*MW*<sup>x.f:=y</sup>(β ∧ x ∈ *Sp*(β))} x.f := y {β}

**Allocation-G:** {∀v : (v ∈ U). (v ≠ *nil* ⇒ *MW*<sup>alloc(x)</sup><sub>v</sub>(β))} alloc(x) {β} (for some fresh variable v)

**Deallocation-G:** {β ∧ x ∉ *Sp*(β) ∧ f(x) = f(x)} free(x) {β} (where f ∈ F<sub>m</sub> is an arbitrary unary mutable function)

#### **4.5 Definitions of *MW* Primitives**

Recall that the *MW* primitives<sup>3</sup> *MW*<sup>x.f:=y</sup> and *MW*<sup>alloc(x)</sup><sub>v</sub> need to evaluate a formula β in the pre-state as it would evaluate in the post-state after mutation and allocation statements, respectively. The definition of *MW*<sup>x.f:=y</sup> is as follows:

$$MW^{x.f := y}(\beta) = \beta[\lambda z.\ ite(z = x,\ ite(f(x) = f(x),\ y,\ y),\ f(z))/f]$$

The β[λz. ρ(z)/f] notation is shorthand for substituting each occurrence of a term of the form f(t), where t is a term, (recursively, from the inside out) by the term ρ(t). The precondition essentially evaluates β taking f's transformation into account, but we use the ite expression with the tautological guard f(x) = f(x) (whose support contains the singleton x) in order to preserve the support. The definition of *MW*<sup>alloc(x)</sup><sub>v</sub> is similar. Refer to the Technical Report [25] for details.
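To illustrate the idea behind this substitution, the following Python sketch (a hypothetical encoding with a finite heap for a single mutable function f, not the paper's formalism) checks that reading the substituted function in the pre-state agrees with reading f in the post-state of x.f := y. Since the inner ite with the tautological guard is semantically just y, the sketch elides it:

```python
# Sketch: the MW-style substituted function f'(z) = ite(z = x, y, f(z)),
# read in the PRE-state, coincides with f read in the POST-state of x.f := y.

pre_heap = {1: 2, 2: 3, 3: 0}   # f as a finite map: location -> location
x, y = 2, 0                      # the mutation x.f := y

def f_pre(z):
    return pre_heap[z]

def f_mw(z):
    # beta[lambda z. ite(z = x, y, f(z)) / f]: redirect lookups at x to y
    return y if z == x else f_pre(z)

post_heap = dict(pre_heap)
post_heap[x] = y                 # execute the mutation

def f_post(z):
    return post_heap[z]

print(all(f_mw(z) == f_post(z) for z in pre_heap))  # prints True
```

This is why evaluating β under the substitution in the pre-state yields exactly the truth value β would have in the post-state.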

**Theorem 3.** *The rules above suffixed with -G are sound w.r.t. the operational semantics, and each precondition is the weakest tightest precondition of* β*.*

#### **4.6 Example**

In this section, we see an example of using the program logic rules described earlier. This demonstrates the utility of Frame Logic as a logic for annotating and reasoning with heap-manipulating programs, and offers some intuition about how our program logic can be deployed in a practical setting. The following program performs in-place list reversal:

j := nil ;
while (i != nil) do
  k := i.next ;
  i.next := j ;
  j := i ;
  i := k

For the sake of simplicity, instead of proving that this program reverses a list, we prove the simpler claim that after executing this program, j is a *list*. The recursive definition of *list* we use for this proof is the one from Figure 3:

$$list(x) := ite(x = nil,\ true,\ \exists z : z = next(x).\ list(z) \land x \notin Sp(list(z)))$$
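For concreteness, the reversal program above can be transcribed directly into runnable code; the `Node` encoding below is our own, not the paper's:

```python
# A runnable transcription of the in-place list reversal program from the text.

class Node:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt

def reverse(i):
    # j := nil ; while (i != nil) do k := i.next ; i.next := j ; j := i ; i := k
    j = None
    while i is not None:
        k = i.next
        i.next = j
        j = i
        i = k
    return j  # j now points to the reversed list

def to_list(n):
    out = []
    while n is not None:
        out.append(n.val)
        n = n.next
    return out

print(to_list(reverse(Node(1, Node(2, Node(3))))))  # prints [3, 2, 1]
```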

We also need to give an invariant for the while loop, stating simply that i and j point to disjoint lists: *list*(i) ∧ *list*(j) ∧ *Sp*(*list*(i)) ∩ *Sp*(*list*(j)) = ∅.

We prove below that this is indeed an invariant of the while loop. Our proof uses a mix of both local and global rules from Sections 4.3 and 4.4 above to demonstrate how either type of rule can be used. In several places we apply the consequence rule together with a program rule in a single step in order to simplify the presentation. As a result, some detailed analysis is omitted, such as proving that supports are disjoint in order to use the frame rule.

{*list*(i) ∧ *list*(j) ∧ *Sp*(*list*(i)) ∩ *Sp*(*list*(j)) = ∅ ∧ i ≠ *nil*}
(consequence rule)

<sup>3</sup> The acronym MW is a shout-out to the Magic-Wand operator, as these serve a similar function, except that they are definable in FL itself.

{*list*(i) ∧ *list*(j) ∧ *Sp*(*list*(i)) ∩ *Sp*(*list*(j)) = ∅ ∧ i ≠ *nil* ∧ i ∉ *Sp*(*list*(j))}
(consequence rule: unfolding list definition)
{∃k′ : k′ = *next*(i). *list*(k′) ∧ i ∉ *Sp*(*list*(k′)) ∧ *list*(j) ∧ i ∉ *Sp*(*list*(j)) ∧ *Sp*(*list*(k′)) ∩ *Sp*(*list*(j)) = ∅}
(consequence rule)
{∃k′ : k′ = *next*(i). *next*(i) = *next*(i) ∧ *list*(k′) ∧ i ∉ *Sp*(*list*(k′)) ∧ *list*(j) ∧ i ∉ *Sp*(*list*(j)) ∧ *Sp*(*list*(k′)) ∩ *Sp*(*list*(j)) = ∅}
k := i.next ;
(consequence rule, lookup-G rule)
{*next*(i) = *next*(i) ∧ *list*(k) ∧ i ∉ *Sp*(*list*(k)) ∧ *list*(j) ∧ i ∉ *Sp*(*list*(j)) ∧ *Sp*(*list*(k)) ∩ *Sp*(*list*(j)) = ∅}
i.next := j ;
(mutation rule, frame rule)
{*next*(i) = j ∧ *list*(k) ∧ i ∉ *Sp*(*list*(k)) ∧ *list*(j) ∧ i ∉ *Sp*(*list*(j)) ∧ *Sp*(*list*(k)) ∩ *Sp*(*list*(j)) = ∅}
(consequence rule)
{*list*(k) ∧ *next*(i) = j ∧ i ∉ *Sp*(*list*(j)) ∧ *list*(j) ∧ *Sp*(*list*(k)) ∩ *Sp*(*list*(j)) = ∅}
(consequence rule: folding list definition)
{*list*(k) ∧ *list*(i) ∧ *Sp*(*list*(k)) ∩ *Sp*(*list*(i)) = ∅}
j := i ; i := k
(assignment-G rule)
{*list*(i) ∧ *list*(j) ∧ *Sp*(*list*(i)) ∩ *Sp*(*list*(j)) = ∅}

Armed with this, proving that j is a list after executing the full program above is a trivial application of the assignment, while, and consequence rules, which we omit for brevity.

Observe that in the above proof we were able to apply the frame rule because i belongs neither to *Sp*(*list*(k)) nor to *Sp*(*list*(j)). Discharging such side conditions is easy using reasoning about first-order formulae with least-fixpoint definitions, techniques for which are discussed in Section 6.

Also note that the invariant of the loop is precisely the intended meaning of *list*(i) ∗ *list*(j) in separation logic. In fact, as we will see in Section 6, we can define a *first-order* macro *Star* as *Star*(ϕ, ψ) = ϕ ∧ ψ ∧ *Sp*(ϕ) ∩ *Sp*(ψ) = ∅. We can use this macro to represent disjoint supports in similar proofs.

These proofs demonstrate what proofs of actual programs look like in our program logic. They also show that frame logic and our program logic can prove many results similarly to traditional separation logic. Moreover, by using the derived operator *Star*, very little even in terms of verbosity is sacrificed in gaining the flexibility of Frame Logic (see Section 6 for a broader discussion of the ways in which Frame Logic differs from Separation Logic and, in certain situations, offers advantages in stating and reasoning with specifications and invariants).

# **5 Expressing a Precise Separation Logic**

In this section, we show that FL is expressive by capturing a fragment of separation logic in frame logic; this is a syntactic fragment of separation logic that defines only *precise formulas*: formulas that can be satisfied by at most one heaplet for any store. The translation also shows that frame logic can naturally and compactly capture such separation logic formulas.

#### **5.1 A Precise Separation Logic**

As discussed in Section 1, a crucial difference between separation logic and frame logic is that formulas in frame logic have uniquely determined supports/heaplets, while this is not true in separation logic. However, it is well known that in verification, determined heaplets are very natural (most uses of separation logic are in fact precise) and often desirable. For instance, see [8], where precision is used crucially to give sound semantics to concurrent separation logic, and [29], where precise formulas are proposed for verifying modular programs, as imprecision causes ambiguity in function contracts.

We define a fragment of separation logic whose formulas are precise (more accurately, we handle a slightly larger class inductively: formulas that, when satisfiable, have unique minimal heaplets for any given store). The fragment we capture is similar to the notion of precise predicates in [29]:

#### **Definition 3.** *PSL Fragment:*


Note that in the fragment, negation and disjunction are disallowed, but mutually exclusive disjunction using *ite* is allowed. Existential quantification is present only when the topmost operator is a ∗ and one of the conjoined formulas guards the quantified variable uniquely.

The semantics of this fragment follows the standard semantics of separation logic [12, 27, 28, 38], with the heaplet of $x \xrightarrow{f} y$ taken to be {x}. See Remark 1 in Section 3.2 for a discussion of a more accurate heaplet for $x \xrightarrow{f} y$ being the set containing the pair (x, f), and how this can be modeled in the above semantics using field-lookups via non-mutable pointers.

**Theorem 4 (Minimum Heap).** *For any formula* ϕ *in the PSL fragment, if there are* s *and* h *such that* s, h |= ϕ*, then there is an* h<sub>ϕ</sub> *such that* s, h<sub>ϕ</sub> |= ϕ *and, for all* h′ *such that* s, h′ |= ϕ*,* h<sub>ϕ</sub> ⊆ h′*.*

<sup>4</sup> While we only assume unary inductive definitions here, we can easily generalize this to inductive definitions with multiple parameters.

#### **5.2 Translation to Frame Logic**

For a separation logic store and heap s, h (respectively), we define the corresponding interpretation M<sub>s,h</sub> such that variables are interpreted according to s and the values of pointer functions on *dom*(h) are interpreted according to h. For ϕ in the PSL fragment, we first define a formula P(ϕ), inductively, that captures whether ϕ is precise. ϕ is a precise formula iff, whenever it is satisfiable with a store s, there is exactly one h such that s, h |= ϕ. The formula P(ϕ) is in separation logic and will be used in the translation. To see why this formula is needed, consider the formula ϕ<sub>1</sub> ∧ *ite*(*sf*, ϕ<sub>2</sub>, ϕ<sub>3</sub>). Assume that ϕ<sub>1</sub> is imprecise, ϕ<sub>2</sub> is precise, and ϕ<sub>3</sub> is imprecise. Under conditions where *sf* is true, the heaplets for ϕ<sub>1</sub> and ϕ<sub>2</sub> must align; however, when *sf* is false, the heaplets for ϕ<sub>1</sub> and ϕ<sub>3</sub> can be anything. Because we cannot know in advance whether *sf* will be true or false, we need this separation logic formula P(ϕ), which is true exactly when ϕ is precise.

**Definition 4.** *Precision predicate* P*:*

**–** P(*sf*) = ⊥ *and* P(x $\xrightarrow{f}$ y) = ⊤
**–** P(*ite*(*sf*, ϕ<sub>1</sub>, ϕ<sub>2</sub>)) = (*sf* ∧ P(ϕ<sub>1</sub>)) ∨ (¬*sf* ∧ P(ϕ<sub>2</sub>))
**–** P(ϕ<sub>1</sub> ∧ ϕ<sub>2</sub>) = P(ϕ<sub>1</sub>) ∨ P(ϕ<sub>2</sub>)
**–** P(ϕ<sub>1</sub> ∗ ϕ<sub>2</sub>) = P(ϕ<sub>1</sub>) ∧ P(ϕ<sub>2</sub>)
**–** P(I) = ⊤ *where* I ∈ I *is an inductive predicate*
**–** P(∃y. (x $\xrightarrow{f}$ y) ∗ ϕ<sub>1</sub>) = P(ϕ<sub>1</sub>)

Note that this definition captures precision within our fragment, since stack formulae are imprecise and pointer formulae are precise. The argument for the rest of the cases follows by simple structural induction.
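The clauses of Definition 4 can be transcribed as a straightforward recursion over formula syntax trees. The following Python sketch uses our own tuple-based encoding of PSL formulas ("sf" marks a stack formula, "pto" a points-to, "ind" an inductive predicate, "exists_pto_star" the guarded existential) and returns P(ϕ) as an unsimplified formula tree:

```python
# A sketch of the precision predicate P over PSL syntax trees,
# encoded as nested tuples (our own encoding, not the paper's).

def P(phi):
    op = phi[0]
    if op == "sf":                       # stack formulas are imprecise
        return ("false",)
    if op == "pto":                      # x -f-> y is precise
        return ("true",)
    if op == "ite":                      # ite(sf, phi1, phi2)
        sf, p1, p2 = phi[1], phi[2], phi[3]
        return ("or", ("and", sf, P(p1)), ("and", ("not", sf), P(p2)))
    if op == "and":                      # conjunction: either side suffices
        return ("or", P(phi[1]), P(phi[2]))
    if op == "star":                     # separating conjunction: both sides
        return ("and", P(phi[1]), P(phi[2]))
    if op == "ind":                      # inductive predicates are precise
        return ("true",)
    if op == "exists_pto_star":          # ∃y. (x -f-> y) * phi1
        return P(phi[1])
    raise ValueError(op)

print(P(("star", ("pto", "x", "y"), ("ind", "list", "z"))))
# prints ('and', ('true',), ('true',))
```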

Now we define the translation T inductively:

**Definition 5.** *Translation from PSL to Frame Logic:*


$$- \ T(\exists y.\ (x \xrightarrow{f} y) * \varphi_1) = \exists y : f(x) = y.\ (T(\varphi_1) \land x \notin Sp(T(\varphi_1)))$$

Finally, recall that any formula ϕ in the PSL fragment has a unique minimal heap (Theorem 4). With this (and a few auxiliary lemmas that can be found in the Technical Report [25]), we have the following theorem, which captures the correctness of the translation:

**Theorem 5.** *For any formula* ϕ *in the PSL fragment, we have the following implications:*

$$s, h \models \varphi \implies M_{s,h} \models T(\varphi)$$

$$M_{s,h} \models T(\varphi) \implies s, h' \models \varphi \ \text{ where } h' \equiv M_{s,h}(Sp(T(\varphi)))$$

*Here,* M<sub>s,h</sub>(*Sp*(T(ϕ))) *is the interpretation of Sp*(T(ϕ)) *in the model* M<sub>s,h</sub>*. Note that* h′ *is minimal and is equal to* h<sub>ϕ</sub> *as in Theorem 4.*

# **6 Discussion**

*Comparison with Separation Logic.* The design of frame logic is, in many ways, inspired by the design choices of separation logic. Separation logic formulas implicitly hold on *tight* heaplets: models are defined as pairs (s, h), where s is a store (an interpretation of variables) and h is a heaplet that defines a subset of the heap as the domain for functions/pointers. In Frame Logic, we choose not to define satisfiability with respect to heaplets but with respect to the entire heap. However, we give access to the implicitly defined heaplet using the operator *Sp*, and provide a logic over *sets* to talk about supports. The separating conjunction operation ∗ can then be expressed using ordinary conjunction and a constraint saying that the supports of the formulae are disjoint.

We do not allow formulas to have *multiple* supports, which is crucial as *Sp* is a function, and this roughly corresponds to *precise* fragments of separation logic. Precise fragments of separation logic have already been proposed and accepted in the separation logic literature for giving robust handling of modular functions, concurrency, etc. [8, 29]. Section 5 details a translation of a precise fragment of separation logic (with ∗ but not magic wand) to frame logic that shows the natural connection between precise formulas in separation logic and frame logic.

Frame logic, through the support operator, facilitates local reasoning much in the same way as separation logic does, and the frame rule in frame logic supports frame reasoning in a similar way as separation logic. The key difference between frame logic and separation logic is the adherence to a first-order logic (with recursive definitions), both in terms of syntax and expressiveness.

First and foremost, in separation logic, the magic wand is needed to express weakest preconditions [38]. Consider, for example, computing the weakest precondition of the formula list(x) with respect to the code y.n := z. The weakest precondition should essentially describe the (tight) heaplets such that changing the n-pointer of y to z results in x pointing to a list. In separation logic, this is typically expressed (see [38]) using the magic wand as $(y \xrightarrow{n} z) \mathrel{-\!*} list(x)$. However, the magic wand operator is inherently a *second-order* construct. The formula α −∗ β holds on a heaplet h if for any *disjoint* heaplet that satisfies α, β holds on the conjoined heaplet. Expressing this property (for arbitrary α, whose heaplet can be *unbounded*) requires quantifying over unbounded heaplets satisfying α, which is not first-order expressible.

In frame logic, we instead rewrite the recursive definition list(·) to a new one list′(·) that captures whether x points to a list assuming that n(y) = z (see Section 4.4). This property continues to be expressible in frame logic and can be converted to first-order logic with recursive definitions (see Section 3.5). Note that we are exploiting the fact that straight-line programs make only a bounded amount of change to the heap in order to express this in FL.

Let us turn to expressiveness and compactness. In separation logic, separation of structures is expressed using ∗; in frame logic, such separation is expressed using conjunction and an additional constraint that says that the supports of the two formulas are disjoint. A precise separation logic formula of the form α<sub>1</sub> ∗ α<sub>2</sub> ∗ ... ∗ α<sub>n</sub> is compact and would be translated to a much larger formula in frame logic, as the translation must state that the supports of each pair of formulas are disjoint. We believe this can be tamed using macros (*Star*(α, β) = α ∧ β ∧ *Sp*(α) ∩ *Sp*(β) = ∅).

There are, however, several situations where frame logic leads to more compact and natural formulations. For instance, consider expressing the property that x and y point to lists, which may or may not overlap. In Frame Logic, we simply write list(x) ∧ list(y). The support of this formula is the union of the supports of the two lists. In separation logic, we cannot use ∗ to write this compactly (while capturing the tightest heaplet). Note that the formula (list(x) ∗ true) ∧ (list(y) ∗ true) is *not* equivalent, as it is true in heaplets that are larger than the set of locations of the two lists. The simplest formulation we know is to write a recursive definition *lseg*(u, v) for list segments from u to v and use quantification: (∃z. *lseg*(x, z) ∗ *lseg*(y, z) ∗ list(z)) ∨ (list(x) ∗ list(y)), where *lseg* is defined as: *lseg*(u, v) ≡ (u = v ∧ *emp*) ∨ (∃w. u → w ∗ *lseg*(w, v)).
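The contrast is easy to see operationally. In the following Python sketch (our own finite-heap encoding, not the paper's semantics), *Sp*(list(x)) is computed as the set of locations reached before nil; two lists sharing a tail satisfy list(x) ∧ list(y) even though their supports overlap, which list(x) ∗ list(y) would forbid:

```python
# Sketch: list(x) ∧ list(y) permits overlapping supports, unlike list(x) * list(y).

NIL = 0
heap = {1: 2, 2: 3, 3: 0, 4: 3}  # next-pointers; lists from 1 and 4 share node 3

def is_list(x, seen=None):
    # list(x) holds iff following next from x reaches nil without a cycle
    seen = set() if seen is None else seen
    if x == NIL:
        return True
    if x in seen:                # cycle: not a list
        return False
    return is_list(heap[x], seen | {x})

def support(x):
    # Sp(list(x)): the locations traversed before reaching nil
    sp = set()
    while x != NIL:
        sp.add(x)
        x = heap[x]
    return sp

print(is_list(1) and is_list(4))   # prints True: list(x) ∧ list(y) holds
print(support(1) & support(4))     # prints {3}: the supports overlap
```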

If we want to say that x<sub>1</sub>, ..., x<sub>n</sub> all point to lists that may or may not overlap, then in FL we can say list(x<sub>1</sub>) ∧ list(x<sub>2</sub>) ∧ ... ∧ list(x<sub>n</sub>). In separation logic, however, the simplest way seems to be to use *lseg* with a linear number of quantified variables and an exponentially-sized formula. Now consider the property that x<sub>1</sub>, ..., x<sub>n</sub> all point to binary trees, with pointers *left* and *right*, which can overlap arbitrarily. We can write it in FL as tree(x<sub>1</sub>) ∧ ... ∧ tree(x<sub>n</sub>), while a formula in (first-order) separation logic expressing this property seems very complex.

In summary, we believe that frame logic is a logic that supports frame reasoning built on the same principles as separation logic, but is still translatable to first-order logic (avoiding the magic wand), and makes different choices for syntax/semantics that lead to expressing certain properties more naturally and compactly, and others more verbosely.

*Reasoning with Frame Logic using First-Order Reasoning Mechanisms.* An advantage of the adherence of frame logic to being translatable to a first-order logic with recursive definitions is the power to reason with it using first-order theorem proving techniques. While we do not present tools for reasoning in this paper, we note that there are several reasoning schemes that can readily handle first-order logic with recursive definitions.

The theory of dynamic frames [18] has been proposed for frame reasoning for heap-manipulating programs and has been adopted in verification engines like Dafny [21] that provide automated reasoning. A key aspect of dynamic frames is the notion of regions, which are subsets of locations that can be used to define the subsets of the heap that change or do not change when a piece of code is executed. Program logics such as region logic have been proposed for object-oriented programs using such regions [1–3]. The supports of formulas in frame logic are also used to express such regions, but the key difference is that the definition of regions is given *implicitly* using supports of formulas, as opposed to explicitly defining them. Separation logic also defines regions implicitly, and in fact, the work on implicit dynamic frames [31, 39] provides translations from separation logic to regions for reasoning using dynamic frames.

Reasoning with regions using set theory in a first-order logic with recursive definitions has been explored by many works to support automated reasoning. Tools like Vampire [20] for first-order logic have been extended in recent work to handle algebraic datatypes [19]; many data-structures in practice can be modeled as algebraic datatypes and the schemes proposed in [19] are powerful tools to reason with them using first-order theorem provers.

A second class of tools are those proposed in the work on natural proofs [23, 32, 37]. Natural proofs work explicitly with first-order logic with recursive definitions (FO-RD), implementing validity checking through a process of unfolding recursive definitions, uninterpreted abstractions, and proving inductive lemmas using induction schemes. Natural proofs are currently used primarily to reason with separation logic by first translating verification conditions arising from Hoare triples with separation logic specifications (without magic wand) to first-order logic with recursive definitions. Frame logic reasoning can be done in a very similar way by translating it first to FO-RD.

The work in [23] considers natural proofs and quantifier instantiation heuristics for FO-RD (using a similar setup of a foreground sort for locations and background sorts), and identifies a fragment of FO-RD (called the safe fragment) for which this reasoning is *complete* (in the sense that a formula is detected as unsatisfiable by quantifier instantiation iff it is unsatisfiable with the inductive definitions interpreted as fixpoints rather than least fixpoints). Since FL can be translated to FO-RD, it is possible to handle FL using the techniques of [23]. The conditions for the safe fragment of FO-RD are that the quantifiers over foreground elements are the outermost ones, and that terms of foreground type do not contain variables of any background type. As argued in [23], these restrictions are typically satisfied in heap logic reasoning applications.

# **7 Related Work**

The frame problem [13] is an important problem in many different domains of research. In the broadest form, it concerns representing and reasoning about the effects of a local action without requiring explicit reasoning regarding static changes to the global scope. For example, in artificial intelligence one wants a logic that can seamlessly state that if a door is opened in a lit room, the lights continue to stay switched on. This issue is present in the domain of verification as well, specifically with heap-manipulating programs.

There are many solutions that have been proposed to this problem. The most prominent proposal in the verification context is separation logic [12, 27, 28, 38], which we discussed in detail in the previous section.

In contrast to separation logic, the work on Dynamic Frames [17, 18] and similarly inspired approaches such as Region Logic [1–3] allow methods to explicitly specify the portion of the support that may be modified. This allows fine-grained control over the modifiable section and avoids special symbols like ∗ and −∗. However, explicitly writing out frame annotations can become verbose and tedious.

The work on Implicit Dynamic Frames [22, 39, 40] bridges the worlds of separation logic (without magic wand) and dynamic frames— it uses separation logic and fractional permissions to implicitly define frames (reducing annotation burden), allows annotations to access these frames, and translates them into set regions for first-order reasoning. Our work is similar in that frame logic also implicitly defines regions and gives annotations access to these regions, and can be easily translated to pure FO-RD for first-order reasoning.

One distinction between frame logic and separation logic involves the non-unique heaplets of separation logic versus the unique heaplets of frame logic. Determined heaplets have been used [29, 32, 37] as they are more amenable to automated reasoning. In particular, a separation logic fragment with determined heaplets, known as precise predicates, is defined in [29]; we capture this fragment in frame logic in Section 5.

There is also a rich literature on reasoning with these heap logics for program verification. Decidability is an important dimension and there is a lot of work on decidable logics for heaps with separation logic specifications [4–6, 11, 26, 33]. The work based on EPR (Effectively Propositional Reasoning) for specifying heap properties [14–16] provides decidability, as does some of the work that translates separation logic specifications into classical logic [34].

Finally, translating separation logic into classical logics and reasoning with them is another solution pursued in many recent efforts [10, 23, 24, 32, 34–37, 41]. Other techniques, including recent work on cyclic proofs [9, 42], use heuristics for reasoning about recursive definitions.

# **8 Conclusions**

Our main contribution is to propose *Frame Logic*, a classical first-order logic endowed with an explicit operator that recovers the implicit supports of formulas and supports frame reasoning. We have argued its expressiveness by capturing several properties of data structures naturally and succinctly, and by showing that it can express a precise fragment of separation logic. The program logic built using frame logic supports local heap reasoning, frame reasoning, and weakest tightest preconditions across loop-free programs.

We believe that frame logic is an attractive alternative to separation logic, built using similar principles while staying within the first-order logic world. The first-order nature of the logic makes it potentially amenable to easier automated reasoning.

A practical realization of a tool for verifying programs in a standard programming language with frame logic annotations by marrying it with existing automated techniques and tools for first-order logic (in particular [19, 24, 32, 37, 41]), is the most compelling future work.

**Acknowledgements:** We thank ESOP'20 reviewers for their comments that helped improve this paper. This work is based upon research supported by the National Science Foundation under Grant NSF CCF 1527395.

# **Bibliography**




**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Proving the safety of highly-available distributed objects**

Sreeja S Nair<sup>1</sup>, Gustavo Petri<sup>2</sup>, and Marc Shapiro<sup>1</sup>

<sup>1</sup> Sorbonne Université—LIP6 & Inria, Paris, France <sup>2</sup> ARM Research, Cambridge, UK

**Abstract.** To provide high availability in distributed systems, object replicas allow concurrent updates. Although replicas eventually converge, they may diverge temporarily, for instance when the network fails. This makes it difficult for the developer to reason about the object's properties, and in particular, to prove invariants over its state. For the subclass of state-based distributed systems, we propose a proof methodology for establishing that a given object maintains a given invariant, taking into account any concurrency control. Our approach allows reasoning about individual operations separately. We demonstrate that our rules are sound, and we illustrate their use with some representative examples. We automate the rule using Boogie, an SMT-based tool.

**Keywords:** Replicated objects · Consistency · Automatic verification · Distributed application design · Tool support

# **1 Introduction**

Many modern applications serve users accessing shared data in different geographical regions. Examples include social networks, multi-user games, cooperative engineering, collaborative editors, source-control repositories, and distributed file systems. One approach would be to store the application's data (which we call the *object*) in a single central location, accessed remotely. However, users far from the central location would suffer long delays and outages.

Instead, the object is *replicated* to several locations. A user accesses the closest available replica. To ensure *availability*, an update must not synchronise across replicas; otherwise, when a network partition occurs, the system would block. Thus, a replica executes both queries and updates locally, and propagates its updates to other replicas asynchronously.

Updates at different locations are concurrent; this may cause replicas to diverge, at least temporarily. However, if the system ensures Strong Eventual Consistency (SEC), then replicas that have received the same set of updates have the same state [25], which simplifies the reasoning.

The replicated object may also be required to maintain some (application-specific) *invariant*, an assertion about the object. We say a state is *safe* if the invariant is true in that state; the system is safe if every reachable state is safe. In a sequential system, this is straightforward (in principle): if the initial state is safe,

and the final state of every update individually is safe, then the system is safe. However, these conditions are not sufficient in the replicated case, because concurrent updates at different replicas may interfere with one another. This can be fixed by synchronising between some or all types of updates; to maximise availability and minimise latency, such synchronisation should itself be minimised.

In this paper, we propose a proof methodology to ensure that a given object is system-safe, for a given invariant and a given amount of concurrency control. In contrast to previous works, we consider state-based objects.<sup>1</sup> Indeed, the specific properties of state-based propagation enable simple modular reasoning despite concurrency, thanks to the concept of *concurrency invariant*. Our proof methodology derives the concurrency invariant automatically from the sequential specification. Now, if the initial state is safe, and every update maintains both the application invariant and the concurrency invariant, then every reachable state is safe, even in concurrent executions, regardless of network partitions. We have developed a tool named Soteria to automate our proof methodology. Soteria analyses the specification to detect concurrency bugs and provides counterexamples.

The contributions of this paper are as follows:


# **2 Background**

As a running example, consider a simple auction system (for simplicity, we consider a single auction). An auction object is composed of an auction status, a winner (initially undefined, ⊥), and a set of bids. Each bid in turn consists of the following parts:

	- BidId: A unique identifier.
	- Placed: A boolean flag indicating whether the bid has been placed. Initially, it is FALSE. Once placed, a bid cannot be withdrawn.
	- Amount: The monetary amount of the bid; this cannot be modified once the bid is created.

Fig. 1: Evolution of state of an auction object

Figure 1 illustrates how the auction state evolves over time. The state of the object is geo-replicated at data centers in Adelaide, Brussels, and Calgary. Users at different locations can start an auction, place bids, close the auction, declare a winner, inspect the local replica, and observe if a winner is declared and who it is. The updates are propagated asynchronously to other replicas. All replicas will eventually agree on the same auction status, the same set of bids and the same winner.

There are two basic approaches to propagating updates. The operation-based approach applies an update to some origin replica, then transmits the operation itself to be replayed at other replicas. If messages are delivered in causal order, exactly once, and concurrent operations are commutative, then two replicas that received the same updates reach the same state (this is the Strong Eventual Consistency guarantee, or SEC) [25].
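To make the commutativity requirement concrete, here is a minimal Python sketch (ours, not the paper's formalism): two replicas replay the same operations in different orders and reach the same state because the operations commute.

```python
# Sketch of operation-based propagation: each replica applies an operation
# locally, then replays the *operation itself* at every other replica.
# If concurrent operations commute, replicas that received the same
# operations converge (the SEC guarantee discussed in the text).
from functools import reduce

def apply_ops(initial, ops):
    """Replay a sequence of operations over an initial state."""
    return reduce(lambda state, op: op(state), ops, initial)

# Two commutative operations on a simple counter state.
inc5 = lambda s: s + 5
inc3 = lambda s: s + 3

# Replicas A and B receive the same operations in different orders.
state_a = apply_ops(0, [inc5, inc3])
state_b = apply_ops(0, [inc3, inc5])
assert state_a == state_b == 8
```

Non-commutative operations (e.g., blind assignment) would break this property, which is why the operation-based approach also relies on causal, exactly-once delivery.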

The state-based approach applies an update to some origin replica. Occasionally, a replica sends its full state to some other replica, which *merge*s the received state into its own. If the state space forms a monotonic semi-lattice, an update is an inflation (its output state is not lesser than the input state), and *merge* computes the least-upper-bound of the local and received states, then SEC is guaranteed [25]. As long as every update eventually reaches every replica, messages may be dropped, re-ordered or duplicated, and the set of replicas may be unknown. Due to these relaxed requirements, state-based propagation is widely used in industry. Figure 1 shows the state-based approach with local operations and merges. Alternatives exist where only a delta of the state —that is, the portion of the state not known to be part of the other replicas— is sent as a message [1]; since this is an optimisation, it is of no consequence to the results of this paper.
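The semi-lattice requirements can be illustrated with a small Python sketch (our own toy encoding): states are sets ordered by inclusion, updates are inflations, and merge is the least upper bound (set union), so duplicated or re-ordered deliveries are harmless.

```python
# State-based sketch: the state space (sets under inclusion) forms a join
# semi-lattice; update is an inflation (its output contains its input) and
# merge computes the least upper bound (union).

def update(state, element):
    return state | {element}      # inflation: output >= input

def merge(local, received):
    return local | received       # least upper bound in the lattice

r1, r2 = set(), set()
r1 = update(r1, "bid:100")
r2 = update(r2, "bid:105")

# merge is idempotent, commutative and associative, so message duplication
# and re-ordering do not affect the outcome.
r1 = merge(merge(r1, r2), r2)     # duplicate delivery of r2's state
r2 = merge(r2, r1)
assert r1 == r2 == {"bid:100", "bid:105"}
```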

<sup>1</sup> As opposed to operation-based. These terms are defined in Section 2.

Looking back to Figure 1, we can see that replicas diverge temporarily. This temporary divergence can lead to an unsafe state, in this case declaring a wrong winner. This correctness problem has been addressed before; however, previous works mostly consider the operation-based propagation approach [11, 13, 19, 24].

# **3 System Model**

In this section, we first introduce the object components, explain the underlying system model informally, and then formalise the operational semantics.

# **3.1 General Principles**

An object consists of a state, a set of operations, a merge function and an invariant. Figure 1 illustrates three replicas of an auction object, at three different locations, represented by the horizontal lines. The object evolves through a set of states. Each line depicts the evolution of the state of the corresponding replica; time flows from left to right.

*State.* A distributed system consists of a number of servers, with disjoint memory and processing capabilities. The servers might be distributed over geographical regions. A set of servers at a single location stores the state of the object. This is called a single *replica*. The object is replicated at different geographical locations, each location having a full copy of the state. In the simplest case (for instance at initialisation) the state at all replicas will be identical. The state of each replica is called a *local state*. The global view, comprising all local states is called the *global state*.

*Operations.* Each replica may perform the operations defined for the object. To support availability, an operation modifies the local state at some arbitrary replica, the *origin replica* for that operation, without synchronising with other replicas (the cost of synchronisation being significant at scale). An operation might consist of several changes; these are applied to the replica as a single atomic unit.

Executing an operation on its origin replica has an immediate effect. However, the state of the other replicas, called *remote replicas*, remains unaltered at this point. The remote replicas get updated when the state is eventually propagated. An immediate consequence of this execution model is that in the presence of concurrent operations, replicas can reach different states, i.e. they diverge.

Let us illustrate this with our example in Figure 1. Initially, the auction is yet to start, the winner is not declared and no bids are placed. By default, a replica can execute any operation - start auction, place bid, and close auction - locally without synchronising with other replicas. We see that the local states of replicas occasionally diverge. For example at the point where operation close auction completes at the Adelaide replica, the Adelaide replica is aware of only a \$100 bid, the Brussels replica has two bids, and the Calgary replica observes only one bid for \$105.

*State Propagation.* A replica occasionally propagates its state to other replicas in the system and a replica receiving a remote state *merges* it into its own.

In Figure 1, the arrows crossing between replicas represent the delivery of a message containing the state of the source replica, to be merged into the target replica. A message is labelled with the state propagated. For instance, the first message delivery at the Brussels replica represents the result of updating the local state (setting auction status to ACTIVE), with the state originating in the replica at Adelaide (auction started).

Similar to the operations, a merge is atomic. In Figure 1, Alice closes the auction at the Adelaide replica. This atomically sets the status of the auction to CLOSED and declares a winner from the set of bids it is aware of. The updated auction state and winner are transmitted together. Merging is performed atomically by the Brussels replica.<sup>2</sup>

We now specify the merge operation for an auction. The receiving replica's local state is denoted σ = (status, winner, Bids), the received state is denoted σ′ = (status′, winner′, Bids′), and the result of the merge is denoted σnew = (statusnew, winnernew, Bidsnew).

```
merge((status, winner, Bids), (status′, winner′, Bids′)):
  statusnew := max(status, status′)
  winnernew := winner = ⊥ ? winner′ : winner
  for (b in Bids ∪ Bids′)
    Bidsnew.b.placed := Bids.b.placed ∨ Bids′.b.placed
    Bidsnew.b.amount := max(Bids.b.amount, Bids′.b.amount)
```
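The pseudocode above can be transcribed into executable Python; the concrete encoding (tuples for states, a numeric status order, `None` for ⊥) is our own choice for illustration.

```python
# Executable transcription of the auction merge (a sketch). Statuses are
# numbered so that max() follows the auction's progression.
INITIAL, ACTIVE, CLOSED = 0, 1, 2

def merge(local, received):
    status, winner, bids = local
    status2, winner2, bids2 = received
    new_status = max(status, status2)
    new_winner = winner2 if winner is None else winner  # None encodes ⊥
    new_bids = {}
    for b in set(bids) | set(bids2):
        placed1, amount1 = bids.get(b, (False, 0))
        placed2, amount2 = bids2.get(b, (False, 0))
        new_bids[b] = (placed1 or placed2, max(amount1, amount2))
    return (new_status, new_winner, new_bids)

local = (CLOSED, "b1", {"b1": (True, 100)})       # closed with winner b1
received = (ACTIVE, None, {"b2": (True, 105)})    # carries a higher bid
merged = merge(local, received)
# merged is CLOSED with winner "b1" although "b2" bids higher: the unsafe
# outcome discussed in the text (footnote 2).
```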
Furthermore, we require the operations and merge to be defined in a way that ensures convergence. We discuss the relevant properties later in Section 6.1.

*Invariants.* An invariant is an assertion that must evaluate to true in every local state of every replica. Although evaluated locally at each replica, the invariant is in effect global, since it must be true at all replicas, and replicas eventually converge. For our running example, the invariant can be stated as follows:

*If the auction is CLOSED, then a winner is declared, and the winner is the placed bid with the highest amount.*

This condition must hold true in all possible executions of the object.

#### **3.2 Notations and Assumptions**

First, we introduce some notations and assumptions:


<sup>2</sup> We see that this leads to an unsafe state; we discuss this in detail in Section 4.2.


#### **3.3 Operational Semantics**

In this and the following subsections we will present two semantics for systems propagating states. Importantly, while the first semantics takes into account the effects of the network on the propagation of the states, and is hence an accurate representation of the execution of systems with state propagation, we will show in the next subsection that reasoning about the network is unnecessary in this kind of system. We will demonstrate this claim by presenting a much simpler semantics in which the network is abstracted away. The importance of this reduction is that the number of events to be considered, both when conducting proofs and when reasoning about applications, is greatly reduced. As informal evidence of this claim, we point at the difference in complexity between the semantic rules presented in Figure 2 and Figure 3. We postpone the equivalence argument to Theorem 1.

Figure 2 presents the semantic rules describing what we shall call the *precise semantics* (we will later present a more abstract version) defining the transition relations describing how the state of the object evolves.

The figure defines a semantic judgement of the form (Ω, M) −→ (Ωnew, Mnew), where Ω gives the replica states as described above, and M is a set of messages that have been transmitted by replicas and are pending delivery at their target replicas.

Rule Operation presents the state transition resulting from a replica r executing an operation op. The operation queries the state of replica r, evaluates the semantic function for operation op and updates its state with the result. The

<sup>3</sup> This notation of a global state is used only to explain and prove our proof rule. In fact, the rule is based only on the local state of each replica.

Operation
$$\frac{\Omega(\mathtt{r}) = \sigma \qquad [\mathsf{op}](\sigma) = \sigma_{new} \qquad \Omega_{new} = \Omega[\mathtt{r} \leftarrow \sigma_{new}]}{(\Omega, \mathtt{M}) \longrightarrow (\Omega_{new}, \mathtt{M})}$$

Send
$$\frac{\Omega(\mathtt{r}) = \sigma \qquad \mathtt{r}' \in \mathrm{dom}(\Omega) \setminus \{\mathtt{r}\} \qquad \mathtt{M}_{new} = \mathtt{M} \cup \{\langle \mathtt{r} \xrightarrow{\sigma} \mathtt{r}' \rangle\}}{(\Omega, \mathtt{M}) \longrightarrow (\Omega, \mathtt{M}_{new})}$$

Merge
$$\frac{\Omega(\mathtt{r}) = \sigma \qquad \mathtt{M}_{new} = \mathtt{M} \setminus \{\langle \mathtt{r}' \xrightarrow{\sigma'} \mathtt{r} \rangle\} \qquad [\mathsf{merge}](\sigma, \sigma') = \sigma_{new} \qquad \Omega_{new} = \Omega[\mathtt{r} \leftarrow \sigma_{new}]}{(\Omega, \mathtt{M}) \longrightarrow (\Omega_{new}, \mathtt{M}_{new})}$$

Op & Broadcast
$$\frac{\Omega(\mathtt{r}) = \sigma \qquad [\mathsf{op}](\sigma) = \sigma_{new} \qquad \Omega_{new} = \Omega[\mathtt{r} \leftarrow \sigma_{new}] \qquad \mathtt{M}_{new} = \mathtt{M} \cup \{\langle \mathtt{r} \xrightarrow{\sigma_{new}} \mathtt{r}' \rangle \mid \mathtt{r}' \in \mathrm{dom}(\Omega) \setminus \{\mathtt{r}\}\}}{(\Omega, \mathtt{M}) \longrightarrow (\Omega_{new}, \mathtt{M}_{new})}$$

Merge & Broadcast
$$\frac{\begin{array}{c} \Omega(\mathtt{r}) = \sigma \qquad \mathtt{M}_{new} = \mathtt{M} \setminus \{\langle \mathtt{r}' \xrightarrow{\sigma'} \mathtt{r} \rangle\} \qquad [\mathsf{merge}](\sigma, \sigma') = \sigma_{new} \qquad \Omega_{new} = \Omega[\mathtt{r} \leftarrow \sigma_{new}] \\ \mathtt{M}_{new'} = \mathtt{M}_{new} \cup \{\langle \mathtt{r} \xrightarrow{\sigma_{new}} \mathtt{r}' \rangle \mid \mathtt{r}' \in \mathrm{dom}(\Omega) \setminus \{\mathtt{r}\}\} \end{array}}{(\Omega, \mathtt{M}) \longrightarrow (\Omega_{new}, \mathtt{M}_{new'})}$$

Fig. 2: Semantic rules of the precise semantics

set of messages M does not change. The second rule, Send, represents the non-deterministic sending of the state of replica r to another replica r′. The rule has no other effect than to add a message to the set of pending messages M. The Merge rule picks any pending message from some replica r′ to r, carrying a state σ′, applies the merge function at the destination replica r with the state σ′ in the payload, and removes the message from M.

The final two rules, Op & Broadcast and Merge & Broadcast represent the specific case when the states are immediately sent to all replicas. These rules are not strictly necessary since they are subsumed by the application of either Operation or Merge followed by one Send per replica. We will, however, use them to simplify a simulation argument in what follows.

We remark at this point that no assumptions are made about the duplication of messages or the order in which messages are delivered. This is in contrast to other works on the verification of properties of replicated objects [11, 13]. The reason these relaxed assumptions are not a problem in our case is that the least-upper-bound property of the merge function, as well as the inflation assumptions on the states considered in Item 2 (Section 6.1), mean that delayed messages have no effect when they are merged.
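A toy interpreter for the Operation, Send and Merge rules (our sketch, using grow-only sets as states and union as merge) makes the configurations (Ω, M) concrete:

```python
# A configuration is (omega, msgs): omega maps replica ids to states, and
# msgs is the set of in-flight messages (source, state payload, destination).

def do_operation(omega, msgs, r, element):            # rule Operation
    return {**omega, r: omega[r] | {element}}, msgs   # msgs unchanged

def do_send(omega, msgs, r, r2):                      # rule Send
    return omega, msgs | {(r, frozenset(omega[r]), r2)}

def do_merge(omega, msgs, msg):                       # rule Merge
    src, payload, dst = msg
    omega = {**omega, dst: omega[dst] | payload}      # merge = set union
    return omega, msgs - {msg}                        # message is consumed

omega = {"A": frozenset(), "B": frozenset()}
msgs = frozenset()
omega, msgs = do_operation(omega, msgs, "A", "x")
omega, msgs = do_send(omega, msgs, "A", "B")
omega, msgs = do_merge(omega, msgs, next(iter(msgs)))
assert omega["B"] == {"x"} and msgs == frozenset()
```

Duplicating the Send step before merging would leave a stale message in `msgs`, but merging it later is harmless because the merge is idempotent.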

Operation
$$\frac{\Omega(\mathtt{r}) = \sigma \qquad [\mathsf{op}](\sigma) = \sigma_{new} \qquad \Omega_{new} = \Omega[\mathtt{r} \leftarrow \sigma_{new}]}{(\Omega, \mathtt{S}) \longrightarrow (\Omega_{new}, \mathtt{S} \cup \{\sigma_{new}\})}$$

Merge
$$\frac{\Omega(\mathtt{r}) = \sigma \qquad \sigma' \in \mathtt{S} \qquad [\mathsf{merge}](\sigma, \sigma') = \sigma_{new} \qquad \Omega_{new} = \Omega[\mathtt{r} \leftarrow \sigma_{new}]}{(\Omega, \mathtt{S}) \longrightarrow (\Omega_{new}, \mathtt{S} \cup \{\sigma_{new}\})}$$

Fig. 3: Semantic Rules with a History of States

As customary, we will denote with (Ω, M) −→<sup>∗</sup> (Ωnew, Mnew) the repeated application of the semantic rules zero or more times, from the state (Ω, M), resulting in the state (Ωnew, Mnew).

It is easy to see how the example in Figure 1 proceeds according to these rules for the auction.

The following lemma,<sup>4</sup> to be used later, establishes that whenever we use only the broadcast rules, for any intermediate state in the execution, and for any replica, when considering the final state of the trace, either the replica has already observed a fresher version of the state in the execution, or there is a message pending for it with that state. This is an obvious consequence of broadcasting.

**Lemma 1.** *If we restrict the semantics of Figure 2 by always applying the* Op & Broadcast *rule instead of the* Operation *rule, and* Merge & Broadcast *instead of* Merge*, then for any execution starting from an initial global state* Ω<sup>i</sup> *with*

$$(\Omega\_i, \emptyset) \xrightarrow{\ast} (\Omega, \mathbb{M}) \xrightarrow{\ast} (\Omega\_{new}, \mathbb{M}\_{new})$$

*for any two replicas* r *and* r′ *and a state* σ *such that* Ω(r) = σ*, either:*

**–** Ωnew(r′) ≥ σ*, or*
**–** *a message from* r *to* r′ *carrying* σ *is pending:* ⟨r −σ→ r′⟩ ∈ Mnew*.*

#### **3.4 Operational Semantics with State History**

We now turn our attention to a simpler semantics where we omit messages from configurations, but instead, we record in a separate set all the states occurring in any replica throughout the execution.

The semantics in Figure 3 presents a judgement of the form (Ω, S) −→ (Ωnew, Snew) between configurations of the form (Ω, S) as before, but where the set of messages is replaced by a set of states denoted with the meta-variable <sup>S</sup> <sup>∈</sup> <sup>P</sup>(Σ).

<sup>4</sup> The proofs for the lemmas are included in the extended version[23].

The rules are simple. Operation executes an operation as before, and it adds the resulting new state to the set of observed states. The rule Merge non-deterministically selects a state from the set of observed states and merges it into a non-deterministically chosen replica. The resulting state is also added to the set of observed states.
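The same toy encoding can be adapted to the history-preserving semantics (again our sketch): the message set is replaced by the set S of every state ever observed, from which Merge draws non-deterministically.

```python
# A configuration is (omega, S): omega maps replica ids to states, and S
# records every state that has occurred at any replica.

def do_operation(omega, S, r, element):     # rule Operation
    new = omega[r] | {element}
    return {**omega, r: new}, S | {new}     # record the new state in S

def do_merge(omega, S, r, sigma):           # rule Merge: sigma drawn from S
    assert sigma in S
    new = omega[r] | sigma
    return {**omega, r: new}, S | {new}

init = frozenset()
omega, S = {"A": init, "B": init}, {init}
omega, S = do_operation(omega, S, "A", "x")
omega, S = do_merge(omega, S, "B", frozenset({"x"}))
# Every replica's current state is recorded in S (cf. Lemma 2).
assert all(s in S for s in omega.values())
```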

**Lemma 2.** *Consider a state* (Ω, S) *reachable from an initial global state* Ω<sup>i</sup> *with the semantics of Figure 3; formally,* (Ωi, {σi}) −→<sup>∗</sup> (Ω, S)*. We can conclude that the set of recorded states* S *in the final configuration includes all of the states present in any of the replicas:*

$$\left(\bigcup_{\mathtt{r}\in\mathrm{dom}(\Omega)}\{\Omega(\mathtt{r})\}\right)\subseteq \mathtt{S}$$

### **3.5 Correspondence between the semantics**

In this section, we show that removing the messages from the semantics, and recording states instead, yields the same executions. To that end, we define the following relation between configurations of the two semantics, which will later be shown to be a bisimulation.

**Definition 1 (Bisimulation Relation).** *We define the relation* R<sup>Ω</sup><sup>i</sup> *between a configuration* (Ω, M) *of the semantics of Figure 2 and a configuration* (Ω, S) *of the semantics of Figure 3 parameterized by an initial global state* Ω<sup>i</sup> *and denoted by*

$$(\Omega, \mathbb{M})\,\,\mathcal{R}\_{\Omega\_i}\,\,(\Omega, \mathbb{S})\,.$$

*when the following conditions are met:*

*1.* (Ωi, ∅) −→<sup>∗</sup> (Ω, M)*, and*
*2.* (Ωi, {σi}) −→<sup>∗</sup> (Ω, S)*, and*
*3.* { σ | ⟨r −σ→ r′⟩ ∈ M } ⊆ S

In other words, two configurations of the two semantics are related if both are reachable from the initial global state, and all the states transmitted by the messages in M are present in the history S.

We can now show that this relation is indeed a bisimulation. We first show that the semantics of Figure 3 simulates that of Figure 2. That is, all behaviours produced by the precise semantics with messages can also be produced by the semantics with history states. This is illustrated in the commutative diagram of Figure 4a and Figure 4b, where the dashed arrows represent existentially quantified components that are proven to exist in the theorem.

**Lemma 3 (State-semantics simulates Messages-semantics).** *Consider a reachable state* (Ω, M) *from the initial state* Ω<sup>i</sup> *in the semantics of Figure 2. Consider moreover that according to that semantics there exists a transition of the form*

$$(\Omega, \mathbb{M}) \to (\Omega\_{new}, \mathbb{M}\_{new})$$

$$\begin{array}{ccccc}
(\Omega_i, \emptyset) & \xrightarrow{\ast} & (\Omega, \mathtt{M}) & \longrightarrow & (\Omega_{new}, \mathtt{M}_{new})\\
 & & \mathcal{R}_{\Omega_i} & & \mathcal{R}_{\Omega_i}\\
(\Omega_i, \{\sigma_i\}) & \xrightarrow{\ast} & (\Omega, \mathtt{S}) & \dashrightarrow & (\Omega_{new}, \mathtt{S}_{new})
\end{array}$$

(a) Precise to History-preserving Simulation

$$\begin{array}{ccccc}
(\Omega_i, \{\sigma_i\}) & \xrightarrow{\ast} & (\Omega, \mathtt{S}) & \longrightarrow & (\Omega_{new}, \mathtt{S}_{new})\\
 & & \mathcal{R}_{\Omega_i} & & \mathcal{R}_{\Omega_i}\\
(\Omega_i, \emptyset) & \xrightarrow{\ast} & (\Omega, \mathtt{M}) & \dashrightarrow & (\Omega_{new}, \mathtt{M}_{new})
\end{array}$$

(b) History-preserving to Precise Simulation

#### Fig. 4: Simulation Schema

*and consider that there exists a state* (Ω, S) *of the history preserving semantics of Figure 3 such that they are related by the simulation relation*

$$(\Omega, \mathtt{M})\ \mathcal{R}_{\Omega_i}\ (\Omega, \mathtt{S}).$$

*We can conclude that, as illustrated in Figure 4a, there exists a state* (Ωnew, Snew) *such that*

$$(\Omega, \mathtt{S}) \longrightarrow (\Omega_{new}, \mathtt{S}_{new}) \quad\text{and}\quad (\Omega_{new}, \mathtt{M}_{new})\ \mathcal{R}_{\Omega_i}\ (\Omega_{new}, \mathtt{S}_{new})$$

We will now consider the lemma showing the inverse relation. To that end we will consider a special case of the semantics of Figure 2 where instead of applying the Operation rule, we will always apply the Op & Broadcast rule, and instead of the Merge rule, we will apply Merge & Broadcast. As we mentioned before, this is equivalent to the application of the Operation/Merge rule, followed by a sequence of applications of Send. The reason we will do this is that we are interested in showing that for any execution of the semantics in Figure 3 there is an equivalent (simulated) execution of the semantics of Figure 2. Since all states can be merged in the semantics of Figure 3 we have to assume that in the semantics of Figure 2 the states have been sent with messages. Fortunately, we can choose how to instantiate the existential send messages to apply the rules as necessary, and that justifies this choice.

**Lemma 4 (Messages-semantics simulates State-semantics).** *Consider a reachable state* (Ω, S) *from the initial state* Ω<sup>i</sup> *in the semantics of Figure 3. Consider moreover that according to that semantics there exists a transition of the form*

$$(\Omega, \mathbb{S}) \to (\Omega\_{new}, \mathbb{S}\_{new})$$

*and consider that there exists a state* (Ω, M) *of the precise semantics of Figure 2 such that they are related by the simulation relation*

$$(\Omega, \mathtt{M})\ \mathcal{R}_{\Omega_i}\ (\Omega, \mathtt{S}).$$

*We can conclude that there exists a state* (Ωnew, Mnew) *such that*

$$(\Omega, \mathtt{M}) \longrightarrow (\Omega_{new}, \mathtt{M}_{new}) \quad\text{and}\quad (\Omega_{new}, \mathtt{M}_{new})\ \mathcal{R}_{\Omega_i}\ (\Omega_{new}, \mathtt{S}_{new})$$

As before, an illustration of this lemma is presented in Figure 4b.

We can now conclude that the two semantics are bisimilar:

**Theorem 1 (Bisimulation).** *The semantics of Figure 2 and Figure 3 are bisimilar as established by the relation defined in Definition 1.*

The theorem above justifies carrying out our proofs with respect to the semantics of Figure 3, which has fewer rules and better aligns with our proof methodology. It also justifies that, when reasoning semantically about state-propagating object systems, we can generally ignore the effects of network delays and messages.

From the standpoint of concurrency, the system model allows the execution of asynchronous concurrent operations, where each operation is executed atomically in each replica, and the aggregation of results of different operations is performed lazily as replicas exchange their state. At this point, we assume the set of states, along with the operations and merge, forms a monotonic semi-lattice. This is a sufficient condition for Strong Eventual Consistency [3, 4, 25].

We have seen that even though replicas eventually converge, there can be instances, or even long periods of time, during which replicas diverge. We need to ensure that concurrent executions are still safe. In the next section, we discuss how to ensure the safety of distributed objects built on top of the system model we described.

# **4 Proving Invariants**

In this section, we report our invariant verification strategy. Specifically, we consider the problem of verifying *invariants* of highly-available distributed objects.

To support the verification of invariants, we consider a syntax-driven approach based on program logic. Bailis et al. [2] identify necessary and sufficient run-time conditions for establishing the safety of application invariants in highly-available distributed databases, a criterion dubbed I-confluence. Moreover, they consider the validity of a number of typical invariants and applications. Our work improves on the I-confluence criterion of [2] by providing a static, syntax-driven, and mostly-automatic mechanism to verify the correctness of an invariant for an application. We address the specific differences in Section 7 on related work.

An important consequence of our verification strategy is that, while we are proving invariants about a concurrent, highly-distributed system, our verification conditions are modular (in the number of API operations) and can be discharged using standard sequential Hoare-style reasoning. These verification conditions in turn entail stability of the assertions, as one would have in a logic like Rely/Guarantee.

Let us start by assuming that a given initial state for the object is denoted σi. Initially, all replicas have σ<sup>i</sup> as their local state. As explained earlier, each replica executes a sequence of state transitions, due either to a local update or to a merge incorporating remote updates.

Let us call *safe state* a replica state that satisfies the invariant. Assuming the current state is safe, any update (local or merge) must result in a safe state. To ensure this, every update is equipped with a precondition that disallows any unsafe execution.<sup>5</sup> Thus, a local update executes only when, at the origin replica, the current state is safe and its precondition currently holds.

Formally, an update u (an operation or a merge) mutates the local state σ to a new state σnew = u(σ). To preserve the invariant Inv, we require that whenever the local state satisfies the precondition of the update, Preu, the resulting state satisfies the invariant:

$$\sigma \in \mathsf{Pre}_{u} \implies u(\sigma) \in \mathsf{Inv}$$
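This guarded-update discipline can be sketched in Python (a toy instance of ours: a non-negative counter rather than the auction; a disabled update is modelled as a no-op):

```python
# guarded(update, pre, inv) wraps an update so it runs only when its
# precondition holds; under that guard, a safe state maps to a safe state.
def guarded(update, pre, inv):
    def run(state):
        assert inv(state)          # the current state is safe
        if not pre(state):
            return state           # update disabled: state unchanged, safe
        new = update(state)
        assert inv(new)            # Pre holds => result satisfies Inv
        return new
    return run

inv = lambda s: s >= 0                                        # invariant
dec = guarded(lambda s: s - 1, pre=lambda s: s >= 1, inv=inv) # decrement
assert dec(0) == 0    # precondition blocks the unsafe decrement
assert dec(3) == 2
```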

To illustrate local preconditions, consider an operation close auction(w: BidId), which sets the auction status to CLOSED and the winner to w (of type BidId). The developer may have written a precondition such as status = ACTIVE, because closing an auction doesn't make sense otherwise. In order to ensure the invariant that the winner has the highest amount, one needs to strengthen it with the clause is\_highest(Bids, w), defined as

∀ b ∈ Bids, b.placed =⇒ b.Amount ≤ w.Amount

Similarly, merge also needs to be safe. To illustrate merge precondition, let us use our running example. We wish to maintain the invariant that the highest bid is the winner. Assume a scenario where the local replica declared a winner and closed the auction. An incoming state from a remote replica contains a bid with a higher amount. When the two states are merged, we see that the resulting state is unsafe. So we must strengthen the merge operation with a precondition. The strengthened precondition looks like this:

$$\begin{array}{l}
(\mathit{status} = \mathtt{CLOSED} \implies \forall\, B \in \mathcal{P}(\mathit{Bids}),\ \mathsf{is\_highest}(B, w))\ \wedge\\
(\mathit{status}' = \mathtt{CLOSED} \implies \forall\, B \in \mathcal{P}(\mathit{Bids}),\ \mathsf{is\_highest}(B, w'))
\end{array}$$

This means that if the status is CLOSED in either of the two states, the winner should be the highest bid in any state. This condition ensures that when a winner is declared, it is the highest bid among the set of bids in any state at any replica.
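The scenario can be replayed in Python (our encoding: bids as a dictionary from id to (placed, amount); for illustration we check the union of the two bid sets rather than every subset):

```python
# is_highest mirrors the clause in the text: every placed bid has an amount
# no greater than the winner's.
def is_highest(bids, w):
    if w not in bids:
        return False
    return all(amount <= bids[w][1]
               for placed, amount in bids.values() if placed)

def pre_merge(local, remote):
    (status, w, bids), (status2, w2, bids2) = local, remote
    all_bids = {**bids, **bids2}
    ok_local = status != "CLOSED" or is_highest(all_bids, w)
    ok_remote = status2 != "CLOSED" or is_highest(all_bids, w2)
    return ok_local and ok_remote

local = ("CLOSED", "b1", {"b1": (True, 100)})
remote = ("ACTIVE", None, {"b2": (True, 105)})
assert not pre_merge(local, remote)   # the unsafe merge is ruled out
```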

Since merge can happen at any time, its precondition must always hold, i.e., it constitutes an additional invariant. We call this the *concurrency invariant*. Our global invariant now consists of two parts: the invariant (Inv) and the concurrency invariant (Invconc).

#### **4.1 Invariance Conditions**

The verification conditions in Figure 5 ensure that for any reachable local state of a replica, the global invariant Inv ∧ Invconc is a valid assertion. We assume the invariant to be a Hoare-logic-style assertion over the state of the object. In a nutshell, these conditions check that (i) the precondition of each operation, and that of the merge operation, upholds the global invariant, and (ii) the global invariant of the object is the conjunction of the invariant and the concurrency invariant (the precondition of merge).

We will develop this intuition in what follows. Let us now consider each of the rules:

<sup>5</sup> Technically, this is at least the weakest-precondition of the update for safety. It strengthens any a priori precondition that the developer may have set.

$$\sigma_i \models \mathsf{Inv} \tag{1}$$

$$\forall \text{ op}, \sigma, \sigma\_{new}, \left( \begin{array}{c} \sigma \models \mathsf{Pre}\_{\mathsf{op}} \wedge \\ \sigma \models \mathsf{Inv} \wedge \\ \mathsf{[op]}(\sigma) = \sigma\_{new} \end{array} \right) \Rightarrow \qquad \qquad \sigma\_{new} \models \mathsf{Inv} \tag{2}$$

$$\forall\ \sigma, \sigma', \sigma_{new}, \left( \begin{array}{c} (\sigma, \sigma') \models \mathsf{Pre}_{\mathsf{merge}} \land \\ \sigma \models \mathsf{Inv} \land \\ \sigma' \models \mathsf{Inv} \land \\ \mathsf{[merge]}(\sigma, \sigma') = \sigma_{new} \end{array} \right) \Rightarrow \qquad \sigma_{new} \models \mathsf{Inv} \tag{3}$$

$$(\sigma_i, \sigma_i) \models \mathsf{Inv}_{conc} \tag{4}$$

$$\forall\ \mathsf{op}, \sigma, \sigma', \sigma_{new}, \left( \begin{array}{c} \sigma \models \mathsf{Pre}_{\mathsf{op}} \land\\ (\sigma, \sigma') \models \mathsf{Inv}_{conc} \land\\ \mathsf{[op]}(\sigma) = \sigma_{new} \end{array} \right) \Rightarrow \qquad (\sigma_{new}, \sigma') \models \mathsf{Inv}_{conc} \tag{5}$$

$$\forall \ \sigma, \sigma', \sigma\_{new}, \left( \begin{array}{c} (\sigma, \sigma') \models \mathsf{Pre}\_{\mathsf{merge}} \land \\ (\sigma, \sigma') \models \mathsf{Inv}\_{conc} \land \\ \mathsf{[merge]}(\sigma, \sigma') = \sigma\_{new} \end{array} \right) \Rightarrow \qquad (\sigma\_{new}, \sigma') \models \mathsf{Inv}\_{conc} \tag{6}$$

**–** Clearly, the initial state of the object must satisfy the global invariant; this is checked by conditions (1) and (4).

The rest of the rules perform a kind of inductive reasoning. Assuming that we start in a state that satisfies the global invariant, we need to check that any state update preserves the validity of said invariant. Importantly, this reasoning is not circular, since the initial state is known by the rule above to be safe.<sup>6</sup>

**–** Condition (2) checks that each of the operations, when executed starting in a state satisfying its precondition and the invariant, is safe. Notice that we require the precondition of the operation to be satisfied in the starting state. This is the core of the inductive argument alluded to above: all operations – which, as we mentioned in Section 3, execute atomically w.r.t. concurrency – preserve the invariant Inv.

Other than the execution of operations, the other source of local state changes is the execution of the merge function in a replica. It is not true in general that for any two given states of an object, the merge should compute a safe state. In particular, it could be the case that the merge function needs a precondition that is stronger than the conjunction of the invariants in the two states to be merged. The following rules deal with these cases.

**–** We require the merge function to be annotated with a precondition strong enough to guarantee that merge will result in a safe state. Generally, this

<sup>6</sup> Indeed, the soundness proofs of program logics such as Rely/Guarantee are typically inductive arguments of this nature.

precondition can be obtained by calculating the weakest precondition [9] of merge w.r.t. the desired invariant. Since merge is the only operation that takes two states as input, its precondition relates two states. We can then verify that merging two states is safe. This is the purpose of rule (3).

As per the program model of Section 3, any two replicas can exchange their states at any point in time and trigger the execution of a merge operation. Thus, the precondition of the merge function must be enabled at all times between the local states of any two replicas. Since merge is the only point where a local replica can observe the result of concurrent operations in other replicas, we call this a *concurrency invariant* (Invconc). In other words: the *concurrency invariant is part of the global invariant* of the object. This is the main insight that allows us to reduce the proof of the distributed object to checking that both the invariant Inv and the concurrency invariant Invconc are global invariants. In particular, the latter implies the former, but for exposition purposes we shall preserve the invariant Inv in the rules.


As anticipated at the beginning of this section, the reasoning about concurrency is performed in a completely local manner, by carefully choosing the verification conditions, and it avoids the stability blow-up commonly found in concurrent program logics. The program model and the verification conditions allow us to effectively reduce the problem of verifying safety of an asynchronous concurrent distributed system to the modular verification of the global invariant (Inv ∧ Invconc) as pre- and postconditions of all operations and merge.

**Proposition 1 (Soundness).** *The proof rules in equations (1)-(6) guarantee that the implementation is safe.*

To conduct an inductive proof of this proposition we need to strengthen the statement to include the set of observed states, as given by the semantics of Figure 3.

**Lemma 5 (Strengthening of Soundness).** *Assume that equations (1)–(6) hold for an implementation of a replicated object with initial state* Ωi*. For any state* (Ω, S) *reachable from* (Ωi, {σi})*, that is,* (Ωi, {σi}) −→<sup>∗</sup> (Ω, S)*, we have that:*


#### **Corollary 1.** *The soundness proposition (1) is a direct consequence of Lemma 5.*

We remark at this point that there are numerous program-logic approaches to proving invariants of shared-memory concurrent programs, with Rely/Guarantee [15] and concurrent separation logic [6] underlying many of them. While these approaches could be adapted to our use case (state-propagating distributed systems), the adaptation is not straightforward. As an indication of this complexity: one would have to predicate over the different states of the different replicas, restate the invariant to talk about these different versions of the state, encode the non-deterministic behaviour of merge, etc. Instead, we argue that our specialised rules are much simpler, allowing for a purely sequential and modular verification that we can mechanise and automate. This reduction in complexity is the main theoretical contribution of this paper.

#### **4.2 Applying the proof rule**

Let us apply the proof methodology to the auction object. Its invariant is the following conjunction:


Computing the weakest precondition of each update operation for this invariant is straightforward. For instance, as discussed earlier, close auction(w) gets the precondition highest(Bids, w), because of Invariant Item 2 above.

Although local updates at each replica respect the invariant Inv, Figure 1 showed that it is susceptible to violation by merging. This is the case if Bob's \$100 bid in Brussels wins even though Charles concurrently placed a \$105 bid in Calgary; this occurred because status became CLOSED in Brussels while still ACTIVE in Calgary. The weakest precondition of merge for safety expresses that, if status in either state is CLOSED, the winner should be the bid with the highest amount in both states. This merge precondition, now called the concurrency invariant, strengthens the global invariant to be safe in concurrent executions.
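The anomaly can be replayed concretely. The following Python sketch (our own encoding; the field names and the joins used by merge are illustrative, not the paper's exact specification) shows two locally safe replica states whose merge violates the invariant:

```python
# Replaying the Figure 1 anomaly. The state is (status, bids, winner);
# merge takes a componentwise join. This encoding is our illustration.
ACTIVE, CLOSED = 0, 1  # status forms a lattice with ACTIVE < CLOSED

def merge(a, b):
    return {
        "status": max(a["status"], b["status"]),  # join on the status lattice
        "bids": a["bids"] | b["bids"],            # union of observed bids
        "winner": a["winner"] or b["winner"],     # keep any declared winner
    }

def invariant(s):
    # Invariant: a closed auction's winner holds the highest bid seen.
    return s["status"] != CLOSED or s["winner"] == max(s["bids"])

brussels = {"status": CLOSED, "bids": {100}, "winner": 100}       # Bob wins
calgary = {"status": ACTIVE, "bids": {100, 105}, "winner": None}  # Charles's $105 bid

assert invariant(brussels) and invariant(calgary)  # each replica is locally safe
assert not invariant(merge(brussels, calgary))     # but the merged state is not
```

The merged state is CLOSED with winner 100 while the highest observed bid is 105, which is exactly the violation the concurrency invariant rules out.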

Let us now consider how this strengthening impacts the local update operations. Since starting the auction does not modify any bids, that operation trivially preserves Invconc. Placing a bid might violate Invconc if the auction is concurrently closed in some other replica; conversely, closing the auction could violate Invconc if a higher bid is concurrently placed in a remote replica. Thus, the auction object is safe when executed sequentially, but unsafe when updates are concurrent. This indicates that the specification has a bug, which we now proceed to fix.

#### **4.3 Concurrency Control for Invariant Preservation**

As we discussed earlier, the preconditions of operations and merge are strengthened so that the object is sequentially safe. An object must also preserve the concurrency invariant in order to ensure concurrent safety. A violation indicates the presence of a concurrency bug in the specification. In that case, the operations that fail to preserve the concurrency invariant might need to synchronise. In our model, the developer adds the required concurrency control mechanisms as part of the state: the modified state is composed of the original state and the concurrency control mechanism.

Recall that in the auction example, placing bids and closing the auction did not preserve the precondition of merge. This requires strengthening the specification by adding a concurrency control mechanism that restricts these operations. We could force them to be strictly sequential, thereby avoiding any concurrency at all, but this would affect the availability of the object.

A concurrency control mechanism is better designed with the workload characteristics in mind. For this particular use case, we know that placing bids is a much more frequent operation than closing an auction. Hence we formulate a concurrency control scheme akin to a readers-writer lock. To realise it, we distribute tokens to the replicas. As long as a replica holds its token, it can allow placing bids. Closing the auction requires recalling the tokens from all replicas. This ensures that no concurrent bids are placed, and thus a winner can be declared, respecting the invariant. The addition of this concurrency control also updates Invconc. Clearly, all operations must respect this modification for the specification to be considered safe.
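A minimal executable sketch of this token scheme, under our own naming (the `Replica` class and the treatment of bids as one shared set are simplifications for illustration):

```python
# Illustrative token scheme: bidding needs the local token; closing the
# auction first recalls every token, so no concurrent bid can slip in.
class Replica:
    def __init__(self, name):
        self.name = name
        self.has_token = True  # every replica starts with a bidding token

def place_bid(replica, bids, amount):
    # Precondition: the replica still holds its token.
    if not replica.has_token:
        raise RuntimeError("token recalled: bidding disabled")
    bids.add(amount)

def close_auction(replicas, bids):
    # Recall all tokens: afterwards no replica can place a concurrent bid,
    # so declaring the winner is safe.
    for r in replicas:
        r.has_token = False
    return max(bids)

replicas = [Replica("brussels"), Replica("calgary")]
bids = set()
place_bid(replicas[0], bids, 100)
place_bid(replicas[1], bids, 105)
assert close_auction(replicas, bids) == 105
assert not any(r.has_token for r in replicas)  # no further bids possible
```

The token recall plays the role of the writer side of a readers-writer lock: frequent bids stay coordination-free, while the rare close operation pays the synchronisation cost.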

Note that the token model described here restricts availability in order to ensure safety. Adding efficient synchronization is not a problem that can be solved with the application specification alone; it requires knowledge of the application dynamics, such as the workload characteristics, and is part of our future work.

Figure 6 shows the evolution of the modified auction object with concurrency control. The keys shown are the tokens distributed to each replica. When a replica wants to close the auction, it requests the tokens from the other replicas. A replica that has released its token is indicated by a cross mark on the key. This concurrency control mechanism ensures that the object is safe during concurrent executions as well. The specification including the concurrency control is given in the extended version [23].

To summarize, all updates (operations and merge) have to respect the global invariant (Inv∧Invconc). If an update violates Inv, the developer must strengthen its precondition. If an update violates Invconc, the developer must add concurrency control mechanisms.

# **5 Case Studies**

This section presents three representative examples of the different consistency requirements of distributed applications. The consensus object is an example of a coordination-free design, illustrating an object that is safe with just eventual consistency. The next example, a distributed lock, shows a design that maintains a total order, illustrating strong consistency. The final example, a courseware application, shows a mix of concurrent operations and operations with restrained concurrency. This example, similar to our auction example, illustrates applications that might require coordination for some operations to ensure safety.

Fig. 6: Evolution of state in an auction object with concurrency control

For each case study, we give an overview of the operational semantics informally. We then discuss how the design preserves the safety conditions discussed in Section 4. We also provide pseudocode for better comprehension.

#### **5.1 Consensus application**

Consensus is required in distributed systems when all replicas have to agree upon a single value. We consider the specification of a consensus object with a fixed number of replicas. We assume that replica failures are handled locally, by redundancy or other means, and that all replicas participate.

The state consists of a boolean flag indicating the result of consensus, and a boolean array indicating the votes from replicas. Each replica agrees on a proposal by setting its dedicated entry in the boolean array. A replica cannot withdraw its agreement. A replica sets the consensus flag when it sees all entries of the boolean array set.

The invariant ensures consistency between the value of the agree flag and the boolean array. The merge function is the disjunction of the individual components. In this case study, we can see that merge ensures safety without any additional precondition. This means that the object is trivially safe under concurrent executions.

```
Initial state:
  ¬B ∧ ¬flag

Invariant:
  flag =⇒ B

Comparison function:
  flag ∨ (¬flag′ ∧ (B ∨ ¬B′))

{Premerge : True}   # no precondition
merge(B, flag, B′, flag′):
  B := B ∨ B′
  flag := flag ∨ flag′

{Premark : True}   # no precondition
mark():
  B.me := true

{Preagree : B}
agree():
  flag := true
```
Fig. 7: Pseudocode for consensus

```
Initial state:
  ∃r, V.r ∧ t = 0

Invariant:
  ∃r, V.r ∧ ∀r, r′, (V.r ∧ V.r′) =⇒ r = r′

Comparison function:
  t > t′ ∨ (t = t′ ∧ V = V′)

{Pretransfer : V.me}
transfer(r′):
  t := t + 1
  V.me := false
  V.r′ := true

{Premerge : (t = t′ =⇒ V = V′) ∧ (V.me =⇒ t ≥ t′)}
merge((t, V), (t′, V′)):
  t := max(t, t′)
  V := (t′ < t) ? V : V′
```
Fig. 8: Specification of a distributed lock

The pseudocode of the consensus example is shown in Figure 7. The design for consensus can be relaxed to require only a majority of replicas to mark their boxes; the extension is trivial.
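As a sanity check, the consensus object of Figure 7 can be transcribed into executable form. The following Python sketch is our transcription (a fixed replica count, votes as a boolean list); it shows that merges applied in different orders converge to the same state:

```python
# Executable transcription of Figure 7's consensus object (our encoding).
N = 3  # fixed number of replicas

def initial():
    return {"B": [False] * N, "flag": False}

def mark(state, me):
    state["B"][me] = True  # votes are never withdrawn: updates only inflate

def agree(state):
    assert all(state["B"])  # Pre_agree: every entry of B is set
    state["flag"] = True

def merge(s, s2):
    # Componentwise disjunction: the least upper bound of the two states.
    return {"B": [a or b for a, b in zip(s["B"], s2["B"])],
            "flag": s["flag"] or s2["flag"]}

replicas = [initial() for _ in range(N)]
for i in range(N):
    mark(replicas[i], i)  # each replica marks only its own entry

# Merging in two different orders yields the same state: convergence.
left = merge(merge(replicas[0], replicas[1]), replicas[2])
right = merge(replicas[2], merge(replicas[1], replicas[0]))
assert left == right == {"B": [True, True, True], "flag": False}
agree(left)  # Pre_agree now holds, so the flag can be set safely
```

Because merge is a componentwise disjunction, it is idempotent, commutative, and associative, which is exactly why no merge precondition is needed here.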

#### **5.2 A replicated concurrency control**

We now discuss an object, a distributed lock, that ensures mutual exclusion. We use an array of boolean values, one entry per replica, to model a lock. If a replica owns the lock, the corresponding array entry is set to true. The lock is transferred to any other replica by using the transfer function. The full specification is shown in Figure 8.

We need to ensure that the lock is owned by exactly one replica at any given point in time; this is the invariant here. For simplicity, we do not consider failures. In order to preserve safety, we enforce a precondition on the transfer operation such that a replica can only transfer the ownership it holds. To make the state inflationary, a timestamp associated with the lock is incremented during each transfer.

A merge of two states of this distributed lock preserves the state with the higher timestamp. In order for the merge function to be the least upper bound, we must require that if the timestamps of the two states are equal, their corresponding boolean arrays are also equal, and that if the origin replica owns the lock, it has the highest timestamp. The conjunction of these two restrictions, which forms the precondition of merge, Premerge, is the concurrency invariant Invconc.

Consider the case of three replicas r1, r2 and r3 sharing a distributed lock. Assume that initially replica r1 owns the lock. Replicas r2 and r3 concurrently place a request for the lock. The current owner r1 has to decide on the priority of the requests based on the business logic. r1 calculates a higher priority for r3 and transfers the lock to r3. Since r1 no longer has the lock, it cannot issue any further transfer operations. We see clearly that the transfer operation is safe. In the new state, r3 is the only replica that can perform a transfer operation. We can also note that this prevents any concurrent transfer operations. This guarantees mutual exclusion and hence ensures safety in a concurrent execution environment.

An interesting property we can observe from this example is total order. Due to the preconditions imposed in order to be safe, the states progress through a total order, ordered by the timestamp: the transfer function increases the timestamp, and the merge function preserves the highest timestamp.
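This behaviour can be observed concretely. A Python sketch of the lock of Figure 8 (our encoding; the first conjunct of Premerge appears as an assertion) shows a stale replica catching up to the newer state through merge:

```python
# Sketch of the Figure 8 lock: state is (t, V), with V holding one
# ownership bit per replica. Our encoding, for illustration.
def transfer(state, me, dest):
    t, V = state
    assert V[me]  # Pre_transfer: only the current owner may transfer
    V = V.copy()
    V[me], V[dest] = False, True
    return (t + 1, V)  # the timestamp makes every transfer an inflation

def merge(s, s2):
    (t, V), (t2, V2) = s, s2
    assert t != t2 or V == V2  # first conjunct of Pre_merge
    return s if t2 < t else s2  # keep the state with the higher timestamp

s0 = (0, [True, False, False])  # r1 owns the lock initially
s1 = transfer(s0, 0, 2)         # r1 transfers the lock to r3
assert s1 == (1, [False, False, True])

# A replica still holding the stale state s0 merges in the newer state:
assert merge(s0, s1) == s1
assert sum(merge(s0, s1)[1]) == 1  # the invariant: exactly one owner
```

Every reachable state is comparable with every other one (by timestamp), which is the total order observed above.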

#### **5.3 Courseware**

We now look at an application that allows students to register and enroll in courses. For space reasons, we elide the pseudocode, which can be found in the extended version [23]. The state consists of a set of students, a set of courses, and the enrollments of students in courses. Students can register and deregister, courses can be created and deleted, and a student can enroll in a course. The invariant requires enrolled students and courses to be registered and created, respectively.

The sets of students and courses each consist of two sets: one to track registrations (resp. creations), and another to track deregistrations (resp. deletions). Registration or creation monotonically adds the student or course to the registered set, and deregistration or deletion monotonically adds them to the unregistered set. The semantics currently does not support re-registration, but that can be fixed by using a slightly modified data structure that counts the number of times the student has been registered/unregistered and decides the registration status accordingly. Enrollment adds the student-course pair to the enrollment set. Currently, we do not consider cancelling an enrollment, but it is a trivial extension. Merging two states takes the union of the sets.
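A sketch of this state in Python (our encoding; field names are ours) makes the two-set construction and the union-based merge concrete:

```python
# Students and courses are pairs of grow-only sets: membership means
# "added and not removed". Enrollments is a grow-only set of pairs.
def initial():
    return {"s_add": set(), "s_rem": set(),
            "c_add": set(), "c_rem": set(), "enroll": set()}

def registered(st, s):
    return s in st["s_add"] and s not in st["s_rem"]

def created(st, c):
    return c in st["c_add"] and c not in st["c_rem"]

def enroll(st, s, c):
    # Strengthened precondition: both parties exist and were not removed.
    assert registered(st, s) and created(st, c)
    st["enroll"].add((s, c))

def deregister(st, s):
    # Strengthened precondition: the student has no enrollments.
    assert all(stu != s for stu, _ in st["enroll"])
    st["s_rem"].add(s)

def merge(a, b):
    return {k: a[k] | b[k] for k in a}  # componentwise set union

st = initial()
st["s_add"].add("alice")
st["c_add"].add("esop")
enroll(st, "alice", "esop")
other = initial()
other["s_add"].update({"alice", "bob"})
m = merge(st, other)
assert ("alice", "esop") in m["enroll"] and registered(m, "alice")
deregister(m, "bob")  # bob has no enrollments, so this is allowed
```

Since every component only grows, every local update is an inflation and merge is the least upper bound, as required by the lattice conditions of Section 3.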

Let us consider the safety of each operation. The operations to register a student and create a course are safe without any restrictions. Therefore they do not need any precondition. The remaining three operations might violate the

invariant in some cases, which leads to strengthening their preconditions. The preconditions of the operations for deregistering a student and deleting a course require that there be no existing enrollments for them. For enrollment, both the student and the course must be registered/created and not unregistered/deleted.

Merge also requires strengthening of its precondition: it requires the enrolled students and courses to be registered and not unregistered in all the remote states as well. This is the concurrency invariant (Invconc) for this object.

Running this specification through our tool, which we describe in Section 6, reveals concurrency issues for deregistering a student, deleting a course, and enrolling. This means that we need to add concurrency control to the state.

For this use case, we know that enrolling will be more frequent than deregistering a student or deleting a course. So we model a concurrency control mechanism as in the case of the auction object discussed earlier. We assign to each replica a token per student and per course, called a student token and a course token, respectively. A replica holds a set of student tokens indicating the registered students, and course tokens indicating the created courses. In order to deregister a student or delete a course, all replicas must have released their tokens for that particular student/course. Enroll operations can proceed as long as the student token and the course token for that particular enrollment are available at the local replica.

This concurrency control mechanism now forms part of the state. The preconditions of operations and merge are recomputed and the concurrency invariant is updated. The edited specification passes all checks and is deemed safe.

# **6 Automation**

In this section, we present a tool to automate the verification of invariants as discussed in the previous sections. Our tool, called *Soteria* is based on the Boogie [5] verification framework. The input to Soteria is a specification of the object written as Boogie procedures, augmented with a number of domain-specific annotations needed to check the properties described in Section 4.

Let us now consider how a distributed object is specified in Soteria:


**– Operations:** We require the programmer to provide the precondition Preop. In general, operations are encoded as Boogie procedures. Alternatively, we could require only a postcondition describing how the state transitions from the precondition to the postcondition. Notice that since in our program model operations are atomic, this is an unambiguous encoding of the operations.

A few things are important in this code. The specification declares operations that can modify the contents of the global variables, as declared in the modifies clause. Preconditions are annotated with requires clauses, and the postcondition is specified by the ensures clauses. The semantics of multiple requires and ensures clauses is conjunction.

**– Merge function:** We require the special merge operation to be distinguished from other operations. To that end, we use the annotation @merge. As mentioned before, the precondition of merge can be obtained by calculating the weakest precondition ensuring safety; the current version of Soteria does not perform this step automatically, but relies on the developer to provide the precondition. Notice that, as we argued in Section 4.1, Soteria will consider this the concurrency invariant (Invconc).

While in Section 3 we mentioned that the merge procedure takes two states as arguments, in the specification input to Soteria, the procedure merge takes only one state as the argument. This is because this procedure assumes that the merge is being applied in a replica, and therefore, the local state of the replica is captured by the global variables.

**– Invariant:** Clearly, we require the programmer to provide the invariant to be verified by the tool. This invariant is simply provided as a Boogie assertion over the state of the object. Once more, we require the invariant to be annotated with the special keyword @invariant.

While these are the components required by Soteria to check the safety, often Boogie requires additional information to verify the procedures. Some of these components are:


#### **6.1 Verification passes**

The verification of a specification is performed in multiple stages. Let us consider these in order:

# 1. **Syntax checks**

The first, simple checks validate that the specification respects Boogie syntax when Soteria annotations are ignored. The tool also calls Boogie to validate that the types are correct and that the provided pre/postconditions are sound.

Then it checks that the specification provides all the elements necessary for a complete specification. Specifically, it checks the function signatures marked by @gteq and @invariant and the procedure marked by @merge.

# 2. **Convergence check**

This stage checks the convergence of the specification. Specifically, it checks whether the specification respects Strong Eventual Consistency. The *Strong Eventual Consistency* (SEC) property states that any two replicas that received the same set of updates are in the same state. To guarantee this, objects are designed to have certain sufficient properties in the encoding of the state [3, 4, 25], which can be summarised as follows:


We present the conditions formally in the extended version [23].

An alternative is to make use of the CALM theorem [12]. This allows non-monotonic operations, but requires them to coordinate. However, our aim is to provide the maximum possible availability with SEC.<sup>7</sup>

To ensure these conditions of Strong Eventual Consistency, the tool performs the following checks:

**–** That each operation is an inflation. In a nutshell, we prove using Boogie the following Hoare-logic triple:

```
assume σ ∈ Preop
call σnew := op(σ)
assert σnew ≥ σ
```

**–** Merge computes the least upper bound. The verification condition discharged is shown below:

```
assume (σ, σ′) ∈ Premerge
call σnew := merge(σ, σ′)
assert σnew ≥ σ ∧ σnew ≥ σ′
assert ∀σ∗, σ∗ ≥ σ ∧ σ∗ ≥ σ′ =⇒ σ∗ ≥ σnew
```

**–** *Sequential safety:* Soteria checks whether each individual operation is safe. This corresponds to conditions (2) and (3) in Figure 5. The verification condition discharged by the tool to ensure sequential safety of operations is:

<sup>7</sup> Convergence of our running example is discussed in the extended version [23].

```
assume σ ∈ Preop ∧ Inv
call σnew := op(σ)
assert σnew ∈ Inv
```
The special case of the merge function is verified with the following verification condition:

```
assume (σ, σ′) ∈ Premerge ∧ σ ∈ Inv ∧ σ′ ∈ Inv
call σnew := merge(σ, σ′)
assert σnew ∈ Inv
```
Notice that in this condition we assume that there are two copies of the state: the state of the replica applying the merge, and the primed state representing a state arriving from another replica. If the sequential safety check fails, the designer needs to strengthen the precondition of the operation (or merge) that was unsafe.

**–** *Concurrent safety:* Here we check whether each operation upholds the precondition of merge. This corresponds to the conditions (5) and (6) in Figure 5. Notice that while this check relates to the concurrent behaviour of the distributed object, the check itself is completely sequential; it does not require reasoning about operations performed by other processes. As shown in Section 4, this ensures safety during concurrent operation.

The verification conditions are:

```
assume σ ∈ Preop ∧ Inv ∧ (σ, σ′) ∈ Invconc
call σnew := op(σ)
assert (σnew, σ′) ∈ Invconc
```
to validate each operation op, and

```
assume (σ, σ′) ∈ Invconc ∧ σ ∈ Inv ∧ σ′ ∈ Inv
call σnew := merge(σ, σ′)
assert (σnew, σ′) ∈ Invconc
```

to validate a call to merge. If the concurrent safety check fails, the design of the distributed object needs a replicated concurrency control mechanism embedded as part of the state.
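The least-upper-bound condition from the convergence check above can also be exercised by brute force on a small finite lattice. A Python sketch (our encoding) enumerates the consensus states of Figure 7 for two replicas, ordered pointwise, and verifies that merge is the least upper bound of every pair:

```python
from itertools import product

# Brute-force version of the least-upper-bound check, on the (finite)
# consensus lattice of Figure 7 restricted to two replicas.
# States are (B, flag); the order is pointwise boolean implication.
def leq(s, s2):
    (B, f), (B2, f2) = s, s2
    return all((not a) or b for a, b in zip(B, B2)) and ((not f) or f2)

def merge(s, s2):
    (B, f), (B2, f2) = s, s2
    return (tuple(a or b for a, b in zip(B, B2)), f or f2)

states = [(B, f) for B in product([False, True], repeat=2)
          for f in (False, True)]

for s, s2 in product(states, repeat=2):
    m = merge(s, s2)
    assert leq(s, m) and leq(s2, m)  # merge is an upper bound
    for u in states:                 # ... and the least such upper bound
        if leq(s, u) and leq(s2, u):
            assert leq(m, u)
```

Soteria discharges the same obligation symbolically via Boogie rather than by enumeration, which is what makes the check work for unbounded state spaces.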

When all checks are validated, the tool reports that the specification is safe. Whenever a check fails, Soteria provides a counterexample<sup>8</sup> along with a failure message tailored to the type of check. This helps the developer identify issues with the specification and fix them.

Once the invariants and the specification of an application are given, Soteria is fully automatic, thanks to Z3, a fully automated SMT solver. The specification of the application includes the state and all the operations, including merge, with their pre- and postconditions. In case the invariant cannot be proven, Soteria provides counterexamples. The programmer can leverage these to update the specification with appropriate concurrency control, rerun Soteria, and so on until the application is correct. As far as the proof system is concerned, no programmer involvement is required. Currently, the effort of adding the required synchronization conditions is manual, but as the next step, we are working on

<sup>8</sup> Soteria uses the counter model provided by Boogie.

automating the efficient generation of synchronization control, taking the workload characteristics into account. The tool and the full specifications, in the form of the tool input, are available at [22].<sup>9</sup>

# **7 Related Work**

Several works have concentrated on the formalisation and specification of eventually consistent systems [7, 8, 27] to mention but a few.

A number of works concentrate on the specification and correct implementation of replicated data types [10, 14]. Unlike these works, we are not concerned with the correctness of the data-type implementation with respect to a specification, but rather with proving properties that hold of a distributed object.

Gotsman et al. [11] present a proof methodology for proving invariants of distributed objects. In fact, that work has been extended with a tool called CISE [24] which, similar to Soteria, performs the check using an SMT solver as a backend. Another, more user-friendly tool, named the Correct Eventual Consistency (CEC) tool, was developed by Marcelino et al. [19] based on the principles of CISE. It builds on the Boogie verification framework and also proposes sets of tokens that the developer might use. An improved token generation using the counterexamples generated by Boogie is discussed by Nair and Shapiro [20].

Unlike our work, CISE and CEC (and more generally the work of Gotsman et al. [11]) consider the implementation of operation-based objects. As a consequence, they assume that the underlying network model ensures causal consistency, and the proof methodology presented therein requires reasoning about concurrent behaviours (reflected as stability verification conditions on assertions). We position Soteria as a *complementary* tool to CISE, since CISE is not well adapted to reasoning about systems that propagate state, and Soteria is not well adapted to reasoning about objects that propagate operations. We consider, as part of our future work, using CISE and Soteria in tandem to prove properties depending on the implementation of the objects at hand.

Houshmand et al. [13] extend CISE by lowering the causal consistency requirements and generating concurrency control protocols. Their approach still requires reasoning about concurrent behaviours.

As anticipated in Section 4, Bailis et al. [2] introduced the concept of I-confluence based on a similar system model. I-confluence states that for an invariant to hold in a lattice-based state-propagating distributed application, the set of *reachable* valid (i.e., invariant-preserving) states must be closed under operations and merge. This condition is similar to the ones presented in Figure 5. However, there is a fundamental difference: while Bailis et al. [2] recognise that one needs to consider only *reachable* states when checking that the merge operation satisfies the invariant, they do not provide means to identify these reachable states. This is indeed a hard problem. In Soteria, we instead *over-approximate the set of reachable states* by ignoring whether the states are indeed reachable,

<sup>9</sup> Experimental results with verification times are provided in the extended version [23].

but requiring that their merge satisfies the invariant. This is captured in the concurrency invariant, Invconc, which is synthesised from the user-provided invariant. How to obtain this invariant is understandably not addressed by Bailis et al. [2], since no proof technique is provided there. Notice that this is a sound approximation, since it guarantees that the invariant is satisfied, and we also verify that every operation preserves this condition, as shown in Corollary 1. In this sense we say that the precondition of merge for a given invariant I is also an invariant of the system. It is this abstraction step that makes the analysis performed by Soteria syntax-driven, automated, and machine-checked. The fact that Soteria analyses a program is in contrast with I-confluence [2], where no means are provided to link a given program text to the semantic model, let alone rules to show that the syntax implies invariant preservation. In other words, I-confluence [2] does not provide a program logic, but rather a meta-theoretical proof about lattice-based state-propagating systems.

Our previous work [21] provides an informal proof methodology for ensuring the safety of Convergent Replicated Data Types (CvRDTs), a family of specialised data structures used to ensure convergence in distributed programming. This work builds upon it, formalising the proof rules and proving them sound. We relax the requirement of CvRDTs by allowing the use of any data types that together respect the lattice conditions mentioned in Section 3. We also show several case studies which demonstrate the use of the rule.

A final interesting remark is that our methodology can aid in the verification of distributed objects mediated by concurrency control. Several works [16, 17, 26, 27] have considered this problem from the standpoint of synthesis, or from the point of view of which mechanisms can be used to enforce a certain property of the system.

# **8 Conclusion**

We have presented a sound proof rule to verify invariants of state-based distributed objects, i.e., objects that propagate state. We presented the proof obligations guaranteeing that an implementation is safe under concurrent execution, by reducing the problem to checking that each operation of the object satisfies a precondition of the merge function of the state.

We presented Soteria, a tool built on top of the Boogie verification framework. Soteria can be used to identify concurrency bugs in the design of a distributed object. It also checks convergence by checking the lattice conditions on the state described in [3]. We have shown multiple compelling case studies of how Soteria can be leveraged to ensure the correctness of distributed objects that propagate state. An interesting next step would be automatic concurrency-control synthesis: the synthesised concurrency control could then be analysed and adapted dynamically to minimise the cost of synchronisation.

**Acknowledgements.** This research is supported in part by the RainbowFS project (*Agence Nationale de la Recherche*, France, number ANR-16-CE25-0013-01) and by European H2020 project 732 505 LightKone (2017–2020).

# **Bibliography**


workshops, Assoc. for Computing Machinery Special Interest Group on Op. Sys. (SIGOPS), Assoc. for Computing Machinery, London, UK (Apr 2016), http://dx.doi.org/10.1145/2911151.2911160


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Solving Program Sketches with Large Integer Values

Rong Pan1, Qinheping Hu2, Rishabh Singh3, and Loris D'Antoni<sup>2</sup>

<sup>1</sup> The University of Texas at Austin, Austin, USA <sup>2</sup> University of Wisconsin-Madison, Madison, USA

<sup>3</sup>Google, Mountain View, USA

Abstract. Program sketching is a program synthesis paradigm in which the programmer provides a partial program with holes and assertions. The goal of the synthesizer is to automatically find integer values for the holes so that the resulting program satisfies the assertions. The most popular sketching tool, Sketch, can efficiently solve complex program sketches, but uses an integer encoding that often performs poorly if the sketched program manipulates large integer values. In this paper, we propose a new solving technique that allows Sketch to handle large integer values while retaining its integer encoding. Our technique uses a result from number theory, the Chinese Remainder Theorem, to rewrite program sketches to only track the remainders of certain variable values with respect to several prime numbers. We prove that our transformation is sound and that the encoding of the resulting programs is exponentially more succinct than existing Sketch encodings. We evaluate our technique on a variety of benchmarks manipulating large integer values. Our technique provides speedups over existing Sketch solvers and can solve benchmarks that existing Sketch solvers cannot handle.

# 1 Introduction

Program synthesis, the art of automatically generating programs that meet a user's intent, promises to increase the productivity of programmers by automating tedious, error-prone, and time-consuming tasks. Syntax-guided Synthesis (SyGuS) [2], where the search space of possible programs is defined using a grammar or a domain-specific language, has emerged as a common program synthesis paradigm for many synthesis domains. One of the earliest and most successful syntax-guided program synthesis frameworks is program sketching [19], where (i) the search space of the synthesis problem is described using a partial program in which certain integer constants are left unspecified (represented as holes), and (ii) the specification is provided as a set of assertions describing the intended behavior of the program. The goal of the synthesizer is to automatically replace the holes in the program with integer values so that the resulting complete program satisfies all the assertions. Thanks to its simplicity, program sketching has found wide adoption in applications such as data-structure design [20], personalized education [18], program repair [7], and many others.

The most popular sketching tool, Sketch [21], can efficiently solve complex program sketches with hundreds of lines of code. However, Sketch often performs poorly if the sketched program manipulates large integer values. Sketch's synthesis is based on an algorithm called counterexample-guided inductive synthesis (Cegis) [21]. The Cegis algorithm iteratively considers a finite set I of inputs for the program and performs SAT queries to identify values for the holes so that the resulting program satisfies all the assertions for the inputs in I. Further SAT queries are then used to verify whether the generated solution is correct on all the possible inputs of the program. Sketch represents integers using a unary encoding (a variable for each integer value) so that arithmetic computations such as addition, multiplication, etc. can be represented efficiently in the SAT formulas as lookup operations. This unary encoding, however, results in huge formulas when solving sketches with larger integer values, as we also observe in our evaluation. Recently, an SMT-like technique that extends the SAT solver with native integer variables and integer constraints was proposed to alleviate this issue in Sketch. It guesses values for the integer variables, propagates them through the integer constraints, and learns from conflict clauses. However, this technique does not scale well when the sketches contain complex arithmetic operations, e.g., non-linear integer arithmetic.

In this paper, we propose a program transformation technique that allows Sketch to solve program sketches involving large integer values while retaining the unary encoding used by the traditional Sketch solver. Our technique rewrites a Sketch program into an equivalent one that performs computations over smaller values. The technique is based on the well-known Chinese Remainder Theorem, which states that, given distinct prime numbers p1, ..., pn such that N = p1 · ... · pn, for every two distinct numbers 0 ≤ k1, k2 < N, there exists a pi such that k1 mod pi ≠ k2 mod pi. Intuitively, this theorem states that tracking the modular values of a number smaller than N for each pi is enough to uniquely recover the actual value of the number itself. We use this idea to replace a variable x in the program with n variables xp1, ..., xpn, so that for every i, xpi = x mod pi. Using closure properties of modular arithmetic, we show that, as long as the program uses the operators +, −, ∗, ==, tracking the modular values of variables and performing the corresponding operations on such values is enough to ensure correctness. For example, to reflect the variable assignment x = y+z, we perform the assignment xpi = (ypi + zpi) mod pi for every pi. Similarly, the Boolean operation x == y holds only if xpi = ypi for every pi. To identify which variables and values in the program can be rewritten, we develop a data-flow analysis that computes which variables may flow into operations that are not sound in modular arithmetic (e.g., <, >, ≤, and /).
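To make this concrete, here is a small Python sketch (our own illustration, with hypothetical helper names, not the actual Sketch transformation) of tracking values only through their remainders:

```python
# Our own illustration of the rewriting idea: represent each value only by its
# remainders modulo a few primes, and mirror +, -, * componentwise.
PRIMES = [3, 5, 7]  # N = 105: values in any window of size 105 stay distinguishable

def to_residues(x):
    # x is replaced by [x mod p1, ..., x mod pn]
    return [x % p for p in PRIMES]

def add(xr, yr):
    # x = y + z becomes x_pi = (y_pi + z_pi) mod pi for every pi
    return [(a + b) % p for a, b, p in zip(xr, yr, PRIMES)]

def mul(xr, yr):
    return [(a * b) % p for a, b, p in zip(xr, yr, PRIMES)]

def eq(xr, yr):
    # x == y holds only if x_pi == y_pi for every pi
    return all(a == b for a, b in zip(xr, yr))

# (4 + 9) * 6 == 78, computed entirely on residues:
assert eq(mul(add(to_residues(4), to_residues(9)), to_residues(6)), to_residues(78))
```

No intermediate computation ever exceeds the largest prime squared, which is the whole point of the rewriting.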

We provide a comprehensive theoretical analysis of the complexity of the proposed transformation. First, we derive how many prime numbers are needed to track values in a certain integer range. Second, we analyze the number of bits required to encode values in the original and rewritten program and show that, for the unary encoding used by Sketch, our technique offers an exponential saving in the number of required bits.
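To give a feel for the first point, the following Python snippet (our own back-of-the-envelope check, not the paper's derivation) counts how many of the smallest primes are needed before their product covers a given range of values:

```python
# Our own back-of-the-envelope check: how many of the smallest primes are
# needed before their product covers a given range of integer values?
def primes_needed(range_size):
    primes, product, candidate = [], 1, 2
    while product < range_size:
        # Trial division by the primes found so far suffices, since they
        # include every prime below the candidate.
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes

# Seven primes already cover any window of half a million values:
assert primes_needed(500_000) == [2, 3, 5, 7, 11, 13, 17]
```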

We evaluate our technique on 181 benchmarks from various applications of program sketching. Our results show that our technique results in significant speedups over existing Sketch solvers and is able to solve 48 benchmarks on which Sketch times out.

Contributions. In summary, our contributions are:


An extended version containing all proofs and further details has been uploaded to arXiv as supplementary material.

# 2 Motivating Example

In this section, we use a simple example to illustrate our technique and its effectiveness. Consider the Sketch program polyArray presented in Figure 1a. The goal of this synthesis problem is to synthesize a two-variable quadratic polynomial (lines 7–8) whose evaluation p on given inputs x and y is equal to a given expected-output array z (line 9). Solving the problem amounts to finding non-negative integer values for the holes (??) and sign values, i.e., -1 or 1, for the sign holes (??s), such that the assertion becomes true.<sup>1</sup> In this case, a possible solution is the polynomial:

p[i] = -17*y[i]^2 - 8*x[i]*y[i] - 17*x[i]^2 - 3*x[i];

When attempting to solve this problem, the Sketch synthesizer times out at 300 seconds. To solve this problem, Sketch creates SAT queries in which the variables are the holes. Due to the large numbers involved in the computation of this program, the unary encoding of Sketch ends up with SAT formulas of approximately 45 million clauses.

<sup>1</sup> In Sketch, holes can only assume non-negative values. This is why we need the sign holes, which are implemented using regular holes as follows: if (??) then 1 else -1.
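As a sanity check (our own, in plain Python), the reported solution can be verified directly against the example arrays from Figure 1:

```python
# Checking the reported solution against the example input/output arrays.
x = [24, -1, 0, -19]
y = [-7, 11, -3, 13]
z = [-9353, -1983, -153, -6977]

# Hole values found by the synthesizer, plugged into the sketch:
p = [-17*y[i]**2 - 8*x[i]*y[i] - 17*x[i]**2 - 3*x[i] for i in range(4)]
assert p == z  # every assertion p[i] == z[i] in polyArray holds
```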

```
 1 // n=4, x=[24,-1,0,-19], y=[-7,11,-3,13]
 2 // z=[-9353,-1983,-153,-6977]
 3 polyArray(int n, int[n] x, int[n] y, int[n] z){
 4   int[n] p;
 5   int i=0;
 6   while (i<n){
 7     p[i] = ??s1*??1*y[i]^2 + ??s2*??2*x[i]^2 + ??s3*??3*x[i]*y[i]
 8          + ??s4*??4*y[i] + ??s5*??5*x[i] + ??s6*??6;
 9     assert p[i] == z[i];
10     i++; }
11 }
```

(a) Original sketch program.

```
 1 // n=4, x=[24,-1,0,-19], y=[-7,11,-3,13]
 2 // z=[-9353,-1983,-153,-6977]
 3 pAPrime(int n, int[n] x, int[n] y, int[n] z){
 4   int[n] x2,x3,x5,x7,x11,x13,x17;
 5   while (i<n){ // Initialize modular variables
 6     x2[i]=x[i]%2;
 7     x3[i]=x[i]%3;
 8     ... i++; }
 9   int i=0;
10   int[n] p2,p3,p5,p7,p11,p13,p17;
11   while (i<n){
12     p2[i]=(??s1*(??1%2)*(y2[i]^2%2)%2
13           +??s2*(??2%2)*(x2[i]^2%2)%2
14           +??s3*(??3%2)*(x2[i]%2)*(y2[i]%2)%2
15           +??s4*(??4%2)*(y2[i]%2)%2
16           +??s5*(??5%2)*(x2[i]%2)%2
17           +??s6*(??6%2)%2)%2;
18     ...
19     assert p2[i] == z2[i];
20     assert p3[i] == z3[i];
21     ...
22     i++; }
23 }
```

(b) Rewritten sketch program.

Fig. 1: Sketch program (a) and rewritten version with values tracked for different moduli (b).

Sketch Program with Modular Arithmetic The technique we propose in this paper reduces the complexity of the synthesis problem by transforming the program into an equivalent one that manipulates smaller integer values and yields easier SAT queries. Given the Sketch program in Figure 1a, our technique produces the modified Sketch program pAPrime in Figure 1b. The new Sketch program has the same control-flow graph as the original one, but instead of computing the actual values of the expressions x[·] and y[·], it tracks their remainders for the set of prime numbers {2, 3, 5, 7, 11, 13, 17} using new variables; e.g., x2[i] tracks the remainder of x[i] modulo 2.

The program pAPrime initializes the modular variables with the corresponding modular values (lines 5–8). When rewriting a computation over modular variables, the same computation is performed modularly (lines 12–17). For example, the term ??s1*??1*y[i]^2, when tracked modulo 2, is rewritten as

```
??s1*(??1%2)*(y2[i]^2%2)%2
```

In the rewritten program, the variables i and n are not tracked modularly, since such a transformation would make the array index accesses incorrect. Finally, the assertions for different moduli share the same holes, as the solution to the sketch has to be correct for all modular values. In the rest of the paper, we develop a data-flow analysis that detects when variables can be tracked modularly.

Sketch can solve the rewritten program in less than 2 seconds and produce hole values that are correct solutions for the original program. This speedup is due to the small integer values manipulated by the modular computations. In fact, the intermediate SAT formulas generated by Sketch for the program pAPrime have approximately 120 thousand clauses instead of the 45 million clauses for polyArray. Due to the complex arithmetic in the formulas, even when Sketch uses the SMT-like native integer encoding, it still requires more than 300 seconds to solve this problem.

While this technique is quite powerful, it does have some limitations. In particular, the solution to the rewritten sketch is guaranteed to be a correct solution only for inputs that cause the intermediate values of the program to be in a range [d1, d2] such that d2 − d1 ≤ 2 × 3 × 5 × 7 × 11 × 13 × 17 = 510,510. We will prove this result in Section 4.
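A quick check of this bound in Python (our own illustration): values that differ by exactly the product of the primes share all seven remainders, so a solution validated only modularly cannot tell them apart.

```python
from math import prod

PRIMES = [2, 3, 5, 7, 11, 13, 17]
N = prod(PRIMES)
assert N == 510_510

def residues(v):
    return [v % p for p in PRIMES]

# Values within a window of size N have distinct remainder tuples...
assert residues(101) != residues(102)
# ...but values exactly N apart are indistinguishable:
assert residues(101) == residues(101 + N)
```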

# 3 Preliminaries

In this section, we describe the IMP language that we will consider throughout the paper and briefly recall the counter-example guided inductive synthesis algorithm employed by the Sketch solver.

For simplicity, we consider a simple imperative language IMP with integer holes for defining the hypothesis space of programs. The syntax and semantics of IMP are shown in Appendix ??. Without loss of generality, we assume the program consists of a single function f(v1, ··· , vn, ??1, ..., ??m) with n integer variables and m integer holes. The body of the program f consists of a sequence of statements, where a statement s can be a variable assignment, a while loop, an if conditional, or an assert statement. The holes ?? denote unknown integer constant values, and the goal of the synthesis process is to compute these values such that a set of desired program assertions is satisfied for every possible input value to f.<sup>2</sup>

<sup>2</sup> Our implementation also supports for-loops, recursion, arrays, and complex types.

Example 1. An example IMP sketch denoting a partial program is shown below.

```
triple(n, h, ??){ h = ??; assert h*n == n+n+n; }
```

The goal of the synthesizer is to compute the value of the hole ?? such that the assertion is true for all possible input values of n and h. For this example, ?? = 3 is a valid solution.

The Sketch solver uses the counter-example guided inductive synthesis algorithm (Cegis) to find hole values such that the desired assertions hold for all input values. Formally, the Sketch synthesizer solves the following constraint:

$$\exists\, ?? = (??\_1, \dots, ??\_m) \in \mathbb{Z}^m.\ \forall in \in \mathcal{I}.\ [\![f(in, ??)]\!]^{\mathsf{IMP}} \neq \bot$$

where Z denotes the domain of all integer values, ?? denotes the list of unknown hole values (??1, ··· , ??m) ∈ Z^m, I denotes the domain of all input argument values to the function f, and [[f(in, ??)]]^IMP ≠ ⊥ denotes that the program satisfies all assertions. The synthesis problem is in general undecidable for a language with complex operations such as IMP, because of the infinitely many possible hole and input values. To make the synthesis process more tractable, Sketch imposes a bound on the sizes of both the input domain (I_b) and the domain of hole values (Z_b) to obtain the following constraint:

$$\exists\, ?? = (??\_1, \dots, ??\_m) \in \mathbb{Z}\_b^m.\ \forall in \in \mathcal{I}\_b.\ [\![f(in, ??)]\!]^{\mathsf{IMP}} \neq \bot$$

The bounded domains make the synthesis problem decidable, but the second-order quantified formula results in a search space of hole values that is still huge for any reasonable bounds. To solve such bounded constraints efficiently, Sketch uses the Cegis algorithm, which incrementally adds inputs from the domain until it obtains hole values ?? that satisfy the assertion predicates for all input values in the bounded domain. The algorithm solves the second-order formula by iteratively solving a series of first-order queries. It first encodes an existential query (the synthesis query) over a randomly selected input value in0 to find hole values H that satisfy the predicate for in0, using a SAT solver in the backend.

$$\exists\, ?? = (??\_1, \dots, ??\_m) \in \mathbb{Z}\_b^m.\ [\![f(in\_0, ??)]\!]^{\mathsf{IMP}} \neq \bot$$

It then encodes another existential query (the verification query) to find a counterexample in1 for which the predicate is not satisfied for the previously found hole values.

$$\exists in \in \mathcal{I}\_b.\ [\![f(in, H)]\!]^{\mathsf{IMP}} = \bot$$

If no counter-example input can be found, the hole values are returned as the desired solution. Otherwise, the algorithm computes new hole values that satisfy the assertions for all the counter-example inputs found so far. This process continues iteratively until either a desired hole value is found (i.e., no counter-example input exists), no satisfying hole value is found (i.e., the synthesis problem is infeasible), or the SAT solver times out.
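The loop just described can be sketched in a few lines of Python, with brute-force enumeration standing in for the SAT queries (the names and domains below are our own illustrative choices, applied to the sketch from Example 1):

```python
# Our own toy rendering of the Cegis loop, on the sketch from Example 1
# (assert h*n == n+n+n), with brute-force search in place of SAT queries.
HOLE_DOMAIN = range(0, 8)    # bounded hole domain, standing in for Z_b
INPUT_DOMAIN = range(-4, 5)  # bounded input domain, standing in for I_b

def satisfies(hole, n):
    return hole * n == n + n + n

def cegis():
    examples = [0]  # start from an arbitrary input
    while True:
        # Synthesis query: a hole value consistent with all examples so far.
        hole = next((h for h in HOLE_DOMAIN
                     if all(satisfies(h, n) for n in examples)), None)
        if hole is None:
            return None  # no satisfying hole value: the problem is infeasible
        # Verification query: search for a counter-example input.
        cex = next((n for n in INPUT_DOMAIN if not satisfies(hole, n)), None)
        if cex is None:
            return hole  # verified on the whole bounded input domain
        examples.append(cex)

assert cegis() == 3
```

Starting from the input 0, the degenerate candidate hole 0 is first proposed and then refuted by a counter-example, after which the correct value 3 survives verification.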

Integer Encoding The Sketch solver can efficiently solve the synthesis constraint in many domains, but it does not scale well for sketches manipulating large numbers. Sketch uses a unary encoding to represent integers, where the encoded formula contains a variable for each integer value. The unary encoding simplifies the representation of complex non-linear arithmetic operations. For example, a multiplication operation can be represented simply as a lookup table. In practice, the unary encoding results in orders-of-magnitude faster solving times compared to the logarithmic encoding for many synthesis problems. However, it also results in huge SAT formulas in the presence of large integers. Recently, a new SMT-like technique based on extending the SAT solver with native integer variables and constraints was proposed to alleviate this issue in Sketch. Similarly to the Boolean variables, this extended solver guesses integer values and propagates them through the constraints, while also learning from conflict clauses. Note that Sketch uses these SAT extensions and encodings instead of an SMT solver because SMT solvers do not scale well for the non-linear constraints typically found in synthesis problems. Our new technique for handling computations over large numbers retains the efficient unary encoding of integers and the computations over them.

# 4 Modular Arithmetic Semantics

In this section, we present the language IMP-MOD in which variables can be tracked using modular arithmetic. We start by recalling the Chinese Remainder Theorem, then define both a modular and integer semantics for the IMP-MOD language, and show that the two semantics are equivalent.

# 4.1 The Chinese Remainder Theorem

The Chinese Remainder Theorem is a powerful number-theoretic result that shows the following: given a set of distinct primes P = {p1, ..., pk}, any number n in an interval of size p1 · ... · pk can be uniquely identified from the remainders [n mod p1, ··· , n mod pk]. In Section 4.2, we will use this idea to define the semantics of the IMP-MOD language. The main benefit of this idea is that the remainders can be much smaller than the actual program values.

Example 2. For P = [3, 5, 7] and the integer 101, its remainders [2, 1, 3] are much smaller than 101. However, any number of the form 101 + 105 × n also has remainders [2, 1, 3] with respect to the same prime set.

In general, one cannot uniquely determine an arbitrary integer value from its remainders for some set P; that is, the mapping from a number to its remainders is an abstraction in the sense of abstract interpretation [6]. However, if we are interested in a limited range of integer values [L, U), one can choose a set of primes P = {p1, ..., pk} such that, for values L ≤ x < U, the map [r1, ··· , rk] → x, where x ≡ ri mod pi, is an injection.

```
Modular Expr  aP   ::= cP | vP | aP1 opPa aP2 | toPrime(a)
Modular Op    opPa ::= + | − | ∗
Arith Expr    a    ::= ?? | c | v | a1 opa a2
Arith Op      opa  ::= + | − | ∗ | /
Bool Expr     b    ::= not b | a1 opc a2 | b1 and b2 | b1 or b2 | aP1 == aP2
Comp Op       opc  ::= < | > | ≤ | ≥
Stmt          s    ::= v = a | vP = aP | s1; s2 | while(b) {s}
                     | if(b) s1 else s2 | assert b
Program       P    ::= f(v1, ··· , vn, vP1, ··· , vPm, ??1, ..., ??l) {s}
```

Fig. 2: Syntax of the IMP-MOD language.

Theorem 1 (Chinese Remainder Theorem [4]). Let p1, ..., pk be positive integers that are pairwise co-prime, i.e., no two numbers share a divisor larger than 1. Denote N = p1 · ... · pk, and let d, r1, r2, ..., rk be any integers. Then there is one and only one integer x with d ≤ x < d + N such that x ≡ ri mod pi for every 1 ≤ i ≤ k.

We define the translation function mP(x) := [x mod p1, ··· , x mod pk] that maps an integer to its tuple of remainders with respect to P. When mP(x) is bijective on some set R, we denote by mP^{-1,R} : [0, p1) × ··· × [0, pk) → R its inverse function.

Example 3. Let x be an integer in the range [0, 105) (note that 105 = 3 × 5 × 7). If we know that the value of x is congruent to [2, 1, 3] modulo {3, 5, 7}, we can uniquely identify the value of x to be 101 by observing that 101 ≡ 2 mod 3, 101 ≡ 1 mod 5, and 101 ≡ 3 mod 7.
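The recovery step of Example 3 can be reproduced by a naive search (our own illustration; real implementations use the constructive CRT algorithm instead):

```python
from math import prod

def crt_inverse(remainders, primes, lo=0):
    # Recover the unique x in [lo, lo + p1*...*pk) with x ≡ r_i (mod p_i),
    # by naive search (fine for small prime products).
    N = prod(primes)
    for x in range(lo, lo + N):
        if all(x % p == r for r, p in zip(remainders, primes)):
            return x

assert crt_inverse([2, 1, 3], [3, 5, 7]) == 101
```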

The following lemma shows that the function mP commutes with addition, subtraction, and multiplication of integers.

Lemma 1. For every set of primes P, integers x and y, and op ∈ {+, −, ∗}, the following holds: mP(x op y) = mP(x) op mP(y), where on the right-hand side op is applied componentwise modulo the corresponding pi.
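The lemma is easy to spot-check exhaustively over a small range (a Python check of our own; the operation on the right-hand side is applied componentwise modulo each prime):

```python
# Exhaustive spot-check of Lemma 1 over a small range of integers.
PRIMES = [3, 5, 7]

def m(x):
    return [x % p for p in PRIMES]

for x in range(-20, 21):
    for y in range(-20, 21):
        assert m(x + y) == [(a + b) % p for a, b, p in zip(m(x), m(y), PRIMES)]
        assert m(x - y) == [(a - b) % p for a, b, p in zip(m(x), m(y), PRIMES)]
        assert m(x * y) == [(a * b) % p for a, b, p in zip(m(x), m(y), PRIMES)]
```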

# 4.2 The IMP-MOD Language

In this section, we define the IMP-MOD language (syntax in Figure 2), a variant of the IMP language whose semantics can be defined using modular arithmetic.<sup>3</sup> An IMP-MOD program is parametric on a set P = {p1, ..., pk} of distinct

<sup>3</sup> We consider the simple subset for a clear presentation of the semantics, but our framework works for the full IMP language (and for more complex language constructs) as we will see in the later sections.

```
[[toPrime(a)]]P_{σ,σP}   := [ [[a]]P_{σ,σP} mod p1, ··· , [[a]]P_{σ,σP} mod pk ]
[[vP]]P_{σ,σP}           := σP(v)
[[cP]]P_{σ,σP}           := [ c mod p1, ··· , c mod pk ]
[[aP1 opPa aP2]]P_{σ,σP} := [ (x1_1 opa x2_1) mod p1, ··· , (x1_k opa x2_k) mod pk ]
                            where [[aPi]]P = [ xi_1, ··· , xi_k ]
[[aP1 == aP2]]P_{σ,σP}   := x1_1 == x2_1 ∧ ··· ∧ x1_k == x2_k
                            where [[aPi]]P = [ xi_1, ··· , xi_k ]
[[c]]P_{σ,σP}            := c
[[v]]P_{σ,σP}            := σ(v)
[[a1 opa a2]]P_{σ,σP}    := [[a1]]P_{σ,σP} opa [[a2]]P_{σ,σP}
[[v = a]]P_{σ,σP}        := (σ[v ← [[a]]P_{σ,σP}], σP)
[[vP = aP]]P_{σ,σP}      := (σ, σP[vP ← [[aP]]P_{σ,σP}])
```

Fig. 3: Modular semantics.

prime numbers. The structure of an IMP-MOD program is similar to an IMP program, but IMP-MOD supports two types of variables and arithmetic expressions: the regular IMP ones (i.e., v, a, and b), which operate over an integer semantics, and the modular ones (i.e., v<sup>P</sup>, a<sup>P</sup>, and b<sup>P</sup>), which take as an additional parameter the set of primes P and operate over a modular semantics. The semantics of some of the key constructs of IMP-MOD is shown in Figure 3.

The key idea of the modular semantics is that the value of each program variable in vP and arithmetic expression in aP is denoted by a tuple of values, one for each prime number pi ∈ P. For example, the value of the constant cP is represented by the tuple [c mod p1, ··· , c mod pk], where each individual value denotes the remainder of c when divided by the prime number pi ∈ P. Formally, the program f has two sets of variables V^Z = {v1, ··· , vn} and V^P = {vP1, ··· , vPm}, which contain the integer and prime variables respectively, and a set of holes H = {??1, ..., ??l}. The denotation function uses two valuation functions: (i) σ : V^Z ∪ H → Z, which maps variables and holes to integer values, and (ii) σP : V^P → [0, p1) × ··· × [0, pk), which maps primed variables to modular values. The expression toPrime(a) converts the integer value of an integer expression a to a modular tuple. Arithmetic expressions in aP are computed using modular values, with the result obtained using modular arithmetic with respect to the corresponding primes in P. Note that the only comparison operator allowed over modular expressions is == and that the division operator cannot be applied to modular expressions. While the syntax does not directly allow holes to be represented modularly (we do not have expressions of the form ??P), an expression of the form toPrime(??) effectively achieves the objective of representing a hole ?? modularly.
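A toy Python rendering of these rules (helper names are our own; the rules above are the formal reference) makes the tuple representation concrete:

```python
import operator

# Toy rendering of the modular semantics: a modular value is a tuple with
# one component per prime (hypothetical helper names, our own illustration).
PRIMES = (2, 3, 5)

def const(c):
    # [[cP]] := [c mod p1, ..., c mod pk]
    return tuple(c % p for p in PRIMES)

def to_prime(n):
    # toPrime(a): convert an integer value into its modular tuple
    return const(n)

def binop(op, t1, t2):
    # [[aP1 opPa aP2]] := componentwise op, each taken mod p_i
    return tuple(op(a, b) % p for a, b, p in zip(t1, t2, PRIMES))

def eq(t1, t2):
    # == holds iff all components agree
    return t1 == t2

# Evaluate 3*4 + 2 modularly and compare against toPrime(14):
v = binop(operator.add, binop(operator.mul, const(3), const(4)), const(2))
assert eq(v, to_prime(14))
```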

#### 4.3 Equivalence between the two Semantics

Next, we provide an alternative integer semantics, which applies the IMP integer semantics to modular expressions and show that, under some assumptions on the values manipulated by the program, the modular and integer semantics are equivalent. We will use this result to build our modified synthesis algorithm.

```
[[toPrime(a)]]_{σ1,σ2}   := [[a]]_{σ1,σ2}
[[vP]]_{σ1,σ2}           := σ2(vP)
[[cP]]_{σ1,σ2}           := c
[[aP1 opPa aP2]]_{σ1,σ2} := [[aP1]]_{σ1,σ2} opa [[aP2]]_{σ1,σ2}
[[aP1 == aP2]]_{σ1,σ2}   := [[aP1]]_{σ1,σ2} == [[aP2]]_{σ1,σ2}
[[c]]_{σ1,σ2}            := c
[[v]]_{σ1,σ2}            := σ1(v)
[[a1 opa a2]]_{σ1,σ2}    := [[a1]]_{σ1,σ2} opa [[a2]]_{σ1,σ2}
[[v = a]]_{σ1,σ2}        := (σ1[v ← [[a]]_{σ1,σ2}], σ2)
[[vP = aP]]_{σ1,σ2}      := (σ1, σ2[vP ← [[aP]]_{σ1,σ2}])
```

Fig. 4: Integer semantics.

Integer Semantics The integer semantics of IMP-MOD is shown in Figure 4 (denoted [[·]]_{σ1,σ2}). In this semantics, modular expressions are evaluated as integer expressions, using the same semantics as for IMP; i.e., the values of modular variables and modular arithmetic expressions are denoted by integer values. Therefore, in the integer semantics, we use two valuation functions: σ1 : V^Z ∪ H → Z, mapping variables and holes to integers, and σ2 : V^P → Z, mapping modular variables to integers.

Relation between the Two Semantics We now show that the modular semantics is, in some sense, equivalent to the integer semantics. For the rest of this section, we fix a set of distinct primes P = {p1, ··· , pk}.

To prove the equivalence of the two program semantics, we will require the values of modular expressions to lie in some range that is covered by the prime numbers in P. The following definition captures this restriction.

Definition 1. Given a modular arithmetic expression $a^{\mathbb{P}}$ (resp. Boolean expression $b$) and two integers $L < U$, we say $a^{\mathbb{P}}$ with context $(\sigma_1, \sigma_2)$ is uniformly in the range $R := [L, U)$—$a^{\mathbb{P}} \in_{\sigma_1,\sigma_2} R$ for short—if, under the integer semantics, all evaluations of modular subexpressions of $a^{\mathbb{P}}$ (resp. $b$) are in the range $R$.


Given a valuation function $\sigma : V^{\mathbb{P}} \to \mathbb{Z}$, we write $m_{\mathbb{P}} \circ \sigma$ to denote the modular valuation obtained by applying the $m_{\mathbb{P}}$ function to $\sigma$—i.e., for every $v^{\mathbb{P}} \in V^{\mathbb{P}}$, $(m_{\mathbb{P}} \circ \sigma)(v^{\mathbb{P}}) = m_{\mathbb{P}}(\sigma(v^{\mathbb{P}}))$. Similarly, for a modular valuation function $\sigma^{\mathbb{P}} : V^{\mathbb{P}} \to [0, p_1) \times \cdots \times [0, p_k)$, we write $m^{-1,R}_{\mathbb{P}} \circ \sigma^{\mathbb{P}}$ for the integer valuation from $V^{\mathbb{P}}$ to $R$ such that, for every $v^{\mathbb{P}} \in V^{\mathbb{P}}$, $(m^{-1,R}_{\mathbb{P}} \circ \sigma^{\mathbb{P}})(v^{\mathbb{P}}) = m^{-1,R}_{\mathbb{P}}(\sigma^{\mathbb{P}}(v^{\mathbb{P}}))$. The following lemma shows that, when the values of modular arithmetic expressions lie in an interval of size $N = p_1 \cdot \ldots \cdot p_k$, the modular and integer semantics of modular arithmetic expressions are equivalent.
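To make the two maps concrete, the following small Python sketch (ours, not the paper's implementation) computes the residue tuple $m_{\mathbb{P}}(x)$ and reconstructs, via the Chinese Remainder Theorem, the unique representative of a residue tuple inside a range of size $N$:

```python
from math import prod

def m_p(primes, x):
    """m_P: map an integer to its tuple of residues modulo each prime in P."""
    return tuple(x % p for p in primes)

def m_p_inv(primes, residues, lo):
    """m_P^{-1,R}: recover the unique integer in R = [lo, lo + N) with the
    given residues, where N is the product of the distinct primes."""
    n = prod(primes)
    # Chinese Remainder Theorem: q_i * q_i^{-1} ≡ 1 (mod p_i) with q_i = N / p_i
    x = sum(r * (n // p) * pow(n // p, -1, p)
            for p, r in zip(primes, residues)) % n
    return lo + (x - lo) % n  # shift the canonical representative into [lo, lo + N)
```

For example, with primes [2, 3, 5] (so N = 30) and the range [-15, 15), the value -8 is encoded as the residues (0, 1, 2) and decoded back to -8.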

Lemma 2. Given a set of primes $\mathbb{P} = \{p_1, \cdots, p_k\}$, an arithmetic expression $a^{\mathbb{P}}$, and two valuation functions $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$ and $\sigma_2 : V^{\mathbb{P}} \to \mathbb{Z}$, we have

$$m_{\mathbb{P}}(\lbrack a^{\mathbb{P}} \rbrack_{\sigma_1,\sigma_2}) = \lbrack a^{\mathbb{P}} \rbrack^{\mathbb{P}}_{\sigma_1,\, m_{\mathbb{P}} \circ \sigma_2}$$

Moreover, if there exists an interval $R$ of size $N = p_1 \cdot \ldots \cdot p_k$ such that $a^{\mathbb{P}} \in_{\sigma_1,\sigma_2} R$, then

$$m^{-1,R}_{\mathbb{P}}(\lbrack a^{\mathbb{P}} \rbrack^{\mathbb{P}}_{\sigma_1,\, m_{\mathbb{P}} \circ \sigma_2}) = \lbrack a^{\mathbb{P}} \rbrack_{\sigma_1,\sigma_2}.$$

Similarly, we show that the two semantics are also equivalent for Boolean expressions.

Lemma 3. Given a set of primes $\mathbb{P} = \{p_1, \cdots, p_k\}$, an interval $R$ of size $N = p_1 \cdot \ldots \cdot p_k$, a Boolean expression $b$, and two valuation functions $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$ and $\sigma_2 : V^{\mathbb{P}} \to \mathbb{Z}$, if $b \in_{\sigma_1,\sigma_2} R$, then $\lbrack b \rbrack_{\sigma_1,\sigma_2} = \lbrack b \rbrack^{\mathbb{P}}_{\sigma_1,\, m_{\mathbb{P}} \circ \sigma_2}$.

We are now ready to show the equivalence between the modular semantics and the integer semantics for programs $P \in$ IMP-MOD. The semantics of a program $P = f(V^{\mathbb{Z}}, V^{\mathbb{P}}, H)\,\{s\}$ is a map from valuations to valuations, i.e., given a valuation $\sigma_1 : V^{\mathbb{Z}} \to \mathbb{Z}$ for integer variables, a valuation $\sigma_2 : V^{\mathbb{P}} \to \mathbb{Z}$ for modular variables, and a valuation $\sigma^H : H \to \mathbb{Z}$ for holes, we have $\lbrack P \rbrack(\sigma_1, \sigma_2, \sigma^H) = \lbrack s \rbrack_{\sigma_1 \cup \sigma^H, \sigma_2}$ and $\lbrack P \rbrack^{\mathbb{P}}(\sigma_1, \sigma_2, \sigma^H) = \lbrack s \rbrack^{\mathbb{P}}_{\sigma_1 \cup \sigma^H,\, m_{\mathbb{P}} \circ \sigma_2}$. Therefore, it is sufficient to show that the two semantics are equivalent for any statement $s$.

The two semantics are equivalent for a statement $s$ if, under the same input valuations, the resulting valuations can be translated into each other. Formally, given valuations $\sigma_1, \sigma_2$ and an interval $R$ of size $N$, we say $\lbrack s \rbrack_{\sigma_1,\sigma_2} \equiv^{\mathbb{P}} \lbrack s \rbrack^{\mathbb{P}}_{\sigma_1,\, m_{\mathbb{P}} \circ \sigma_2}$ iff $\sigma'_1 = \sigma''_1$, $m_{\mathbb{P}} \circ \sigma'_2 = \sigma^{\mathbb{P}}_2$, and $\sigma'_2 = m^{-1,R}_{\mathbb{P}} \circ \sigma^{\mathbb{P}}_2$, where $\lbrack s \rbrack_{\sigma_1,\sigma_2} = (\sigma'_1, \sigma'_2)$ and $\lbrack s \rbrack^{\mathbb{P}}_{\sigma_1,\, m_{\mathbb{P}} \circ \sigma_2} = (\sigma''_1, \sigma^{\mathbb{P}}_2)$.

We define uniform inclusion for statements.

Definition 2. Given a set of primes $\mathbb{P}$, two integers $L < U$, and a statement $s$, we say $s$ with context $(\sigma_1, \sigma_2)$ is uniformly in the range $R := [L, U)$—$s \in_{\sigma_1,\sigma_2} R$ for short—if, under the integer semantics, all evaluations of modular subexpressions of $s$ are in the range $R$:

- $(v^{\mathbb{P}} = a^{\mathbb{P}}) \in_{\sigma_1,\sigma_2} R$ iff $a^{\mathbb{P}} \in_{\sigma_1,\sigma_2} R$.
- $\mathtt{while}(b)\,\{s\} \in_{\sigma_1,\sigma_2} R$ iff $s \in_{\sigma_1,\sigma_2} R$ and $b \in_{\sigma_1,\sigma_2} R$.
- $s_1; s_2 \in_{\sigma_1,\sigma_2} R$ iff $s_1 \in_{\sigma_1,\sigma_2} R$ and $s_2 \in_{\sigma_1,\sigma_2} R$.
- $\mathtt{if}(b)\ s_1\ \mathtt{else}\ s_2 \in_{\sigma_1,\sigma_2} R$ iff $s_1 \in_{\sigma_1,\sigma_2} R$, $s_2 \in_{\sigma_1,\sigma_2} R$, and $b \in_{\sigma_1,\sigma_2} R$.
- $\mathtt{assert}\ b \in_{\sigma_1,\sigma_2} R$ iff $b \in_{\sigma_1,\sigma_2} R$.

Finally, the two semantics are equivalent for statements.

Theorem 2. Given a set of primes $\mathbb{P} = \{p_1, \cdots, p_k\}$, a statement $s$, and two valuation functions $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$ and $\sigma_2 : V^{\mathbb{P}} \to \mathbb{Z}$, if there exists an interval $R$ of size $N$ such that $s \in_{\sigma_1,\sigma_2} R$, then $\lbrack s \rbrack_{\sigma_1,\sigma_2} \equiv^{\mathbb{P}} \lbrack s \rbrack^{\mathbb{P}}_{\sigma_1,\, m_{\mathbb{P}} \circ \sigma_2}$.

Algorithm 1: returns variables that should be tracked using modular/integer semantics.

```
/* f: sketched function; V^P: variables to be tracked modularly;
   V^Z: variables to be tracked with integer values */
function DataFlowAnalysis(f)
    S ← {/, <, >, ≤, ≥}; V^Z ← ∅
    for op ∈ S do
        /* compute all variables v that may flow into op */
        V^Z ← V^Z ∪ Dataflow(op, f)
    V^P ← V \ V^Z
    return (V^Z, V^P)
```
# 5 From IMP to IMP-MOD Programs

In this section, we develop a data flow analysis for detecting variables in IMP programs for which it is sound to track values modularly. We then use this data flow analysis to rewrite an IMP program to an equivalent IMP-MOD program.

#### 5.1 Data Flow Analysis

The formalization of IMP-MOD in Section 4.2 made it clear that the modular semantics is only appropriate when integer values are manipulated using addition, multiplication, subtraction, and equality. Other operations like division and less-than comparison cannot be computed soundly in modular arithmetic.

Example 4. Consider an integer variable x with modular value $x_2$ under modulus 2 and $x_3$ under modulus 3, and an integer variable y with modular values $y_2, y_3$ under the corresponding moduli. The assignment x = y + y; implies $x_2 = (y_2 + y_2) \bmod 2$ and $x_3 = (y_3 + y_3) \bmod 3$. However, x = x/y; does not imply $x_2 = (x_2/y_2) \bmod 2$ and $x_3 = (x_3/y_3) \bmod 3$.
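The failure of division (and, analogously, of comparisons) under modular reduction can be checked directly in plain Python (our snippet, not the paper's tooling):

```python
p = 3
a, b = 10, 5
assert (a + b) % p == ((a % p) + (b % p)) % p    # addition commutes with mod
assert (a * b) % p == ((a % p) * (b % p)) % p    # so does multiplication
assert (a // b) % p != ((a % p) // (b % p)) % p  # division does not:
# 10 // 5 = 2, but (10 % 3) // (5 % 3) = 1 // 2 = 0
assert (a < b) != ((a % p) < (b % p))            # comparisons are lost too
```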

We now define a data flow analysis (shown in Algorithm 1) for computing which variables in a program must be tracked with the integer semantics (i.e., the set $V^{\mathbb{Z}}$) and which variables can be soundly tracked using the modular semantics (i.e., the set $V^{\mathbb{P}}$). For each operator op in $\{/, <, >, \leq, \geq\}$, the analysis computes the set of variables that may flow into the operands of an expression of the form $e_1 \ op \ e_2$. In practice, this is done via a backward may analysis, the Dataflow procedure in Algorithm 1. The resulting set of variables must be tracked using the integer semantics. The remaining variables never flow into a problematic operator and can therefore be tracked using the modular semantics.
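A deliberately simplified, self-contained Python rendering of this idea (the program representation and names are our own, not the tool's: assignments are (target, operands) pairs and direct operator uses are (op, operands) pairs) computes the same partition by taking a backward closure over assignments:

```python
# Operators that cannot be computed soundly in modular arithmetic.
PROBLEMATIC = {"/", "<", ">", "<=", ">="}

def partition_variables(all_vars, assignments, uses):
    """Return (V_Z, V_P): variables that must keep integer values vs.
    variables that may be tracked modularly."""
    # Variables appearing directly under a problematic operator.
    v_int = {v for op, operands in uses if op in PROBLEMATIC for v in operands}
    changed = True
    while changed:  # backward closure: sources of a tainted target are tainted
        changed = False
        for target, operands in assignments:
            if target in v_int and not set(operands) <= v_int:
                v_int |= set(operands)
                changed = True
    return v_int, set(all_vars) - v_int
```

On a polyArray-style program where i and n meet a `<` while x and y only meet +, *, and ==, the partition is V^Z = {i, n} and V^P = {x, y}, matching Example 5.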

Implementation Remark Since our implementation also supports arrays and recursion, the data flow analysis in Algorithm 1 is inter-procedural, and the set S also contains the array indexing operator [ ]—i.e., given an expression arr[a], if a variable v may flow into a, then v must be tracked using the integer semantics.

$$R_a(a) = \begin{cases}
v^{\mathbb{P}} & \text{if } a \equiv v \text{ and } v \in V^{\mathbb{P}} \\
c^{\mathbb{P}} & \text{if } a \equiv c \\
R_a(a_1) \;op^{\mathbb{P}}_a\; R_a(a_2) & \text{if } a \equiv a_1 \;op_a\; a_2 \\
\mathtt{toPrime}(a) & \text{otherwise}
\end{cases}$$

$$R_b(b) = \begin{cases}
R_a(a_1) \mathtt{==} R_a(a_2) & \text{if } b \equiv a_1 \mathtt{==} a_2 \\
R_b(b_1)\ \mathtt{and}\ R_b(b_2) & \text{if } b \equiv b_1\ \mathtt{and}\ b_2 \\
\mathtt{not}\ R_b(b_1) & \text{if } b \equiv \mathtt{not}\ b_1 \\
b & \text{otherwise}
\end{cases}$$

$$R_s(s) = \begin{cases}
R_s(s_1); R_s(s_2) & \text{if } s \equiv s_1; s_2 \\
v = a & \text{if } s \equiv v = a \text{ and } v \in V^{\mathbb{Z}} \\
v^{\mathbb{P}} = R_a(a) & \text{if } s \equiv v = a \text{ and } v \in V^{\mathbb{P}} \\
\mathtt{if}(R_b(b))\ R_s(s_0)\ \mathtt{else}\ R_s(s_1) & \text{if } s \equiv \mathtt{if}(b)\ s_0\ \mathtt{else}\ s_1 \\
\mathtt{while}(R_b(b))\ \{R_s(s_0)\} & \text{if } s \equiv \mathtt{while}(b)\ \{s_0\} \\
\mathtt{assert}\ R_b(b) & \text{if } s \equiv \mathtt{assert}\ b
\end{cases}$$

Fig. 5: Subset of rules for the translation from IMP to IMP-MOD programs. Rules are parametric in $V^{\mathbb{Z}}$, $V^{\mathbb{P}}$, and $\mathbb{P}$: $R_f(f(V, ??)\{s\}) = f(V^{\mathbb{Z}}, V^{\mathbb{P}}, ??)\{R_s(s)\}$.

Furthermore, while our formalization tracks each variable using only one of the two semantics, our implementation allows variables to be tracked differently (using actual values or modular values) at different program points: the same data flow analysis computes, for each variable v, the program points at which the actual value of v is needed. In this case, a variable might initially need to be tracked using actual values but can later be tracked using modular values.

Example 5. Consider the sketch program polyArray in Figure 1b. For this program, Algorithm 1 will return that the variables x and y can be tracked modularly. However, the variables i and n must be tracked using the integer semantics since they are used in a < operation and as array indices.

# 5.2 From IMP to IMP-MOD

Now that we have computed what sets of variables can be tracked modularly, we can transform the IMP program into an IMP-MOD program. The transformation R<sup>f</sup> that rewrites f into an IMP-MOD program is shown in Figure 5. The key idea of the program transformation is to use the sets V <sup>Z</sup> and V <sup>P</sup> to only rewrite variables and sub-expressions of f for which the modular arithmetic can be performed soundly.

Once we obtain a solution for the IMP-MOD program in the form of hole values, we obtain a solution for the IMP program by mapping the holes to the integer values given by the integer semantics.

Example 6. Consider a program where the data flow analysis computes $V^{\mathbb{Z}} = \{i, n\}$ and $V^{\mathbb{P}} = \{x\}$. The statement $x = x + i + 1$ is rewritten to $x^{\mathbb{P}} = x^{\mathbb{P}} +^{\mathbb{P}} \mathtt{toPrime}(i) +^{\mathbb{P}} 1^{\mathbb{P}}$.
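A toy Python version of the $R_a$ rule reproduces this rewriting (the tuple-based AST encoding here is an assumption of ours, not the paper's implementation):

```python
MOD_OPS = {"+", "-", "*"}  # operators that are sound under modular tracking

def rewrite_arith(a, v_p):
    """R_a: lift an arithmetic expression into its IMP-MOD counterpart."""
    kind = a[0]
    if kind == "var":
        # modularly tracked variables become v^P; others are wrapped
        return ("varP", a[1]) if a[1] in v_p else ("toPrime", a)
    if kind == "const":
        return ("constP", a[1])
    if kind in MOD_OPS:
        return (kind + "P", rewrite_arith(a[1], v_p), rewrite_arith(a[2], v_p))
    return ("toPrime", a)  # anything else must be evaluated as an integer

# x + i + 1 with V^P = {x}: x is lifted, i is wrapped in toPrime
expr = ("+", ("+", ("var", "x"), ("var", "i")), ("const", 1))
rewritten = rewrite_arith(expr, {"x"})
```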

The transformation R<sup>f</sup> is sound.

Theorem 3. Given an IMP program $f$ and sets $V^{\mathbb{Z}}$ and $V^{\mathbb{P}}$ resulting from the data flow analysis on $f$, the program $R_f(f)$ is in the IMP-MOD language. Moreover, $\lbrack f \rbrack_{\mathrm{IMP}} = \lbrack R_f(f) \rbrack$.

# 6 Solving IMP-MOD Sketches

In this section, we discuss how synthesis in the modular semantics relates to synthesis in the integer semantics and provide an incremental algorithm for solving IMP-MOD sketches.

# 6.1 Synthesis in IMP-MOD

Given a set of integers $R$, we say that a variable valuation $\sigma$ is in $R$ (denoted $\sigma \in R$) if for every $v$ we have $\sigma(v) \in R$. As in Section 3, we assume that the sketch has to be solved for finite ranges of possible hole values ($R_H$) and input values ($R_{in}$). Solving an IMP-MOD problem $P = f(V^{\mathbb{Z}}, V^{\mathbb{P}}, H)\{s\}$ for the integer semantics amounts to solving the following constraint:

$$\exists \sigma^H \in R_H.\ \forall \sigma_1, \sigma_2 \in R_{in}.\ \lbrack s \rbrack_{\sigma_1 \cup \sigma^H, \sigma_2} \neq \bot.$$

According to Theorem 2, given a set of distinct primes $\mathbb{P} = \{p_1, \cdots, p_k\}$ and variable valuations $\sigma^H$, $\sigma_1$, and $\sigma_2$, if there exists a range $R$ of size $N = p_1 \cdot \ldots \cdot p_k$ such that $s \in_{\sigma_1 \cup \sigma^H, \sigma_2} R$, the modular semantics and the integer semantics are equivalent. Using this observation, we can define the set of variable valuations for which the two semantics are guaranteed to be equivalent:

$$\mathcal{I}_R^{\mathbb{P}} := \{ (\sigma_1, \sigma_2) \mid \forall \sigma^H \in R_H.\ \exists R.\ |R| = N \land s \in_{\sigma_1 \cup \sigma^H, \sigma_2} R \}.$$

Since for every $\sigma^H \in R_H$ and $(\sigma_1, \sigma_2) \in \mathcal{I}_R^{\mathbb{P}}$ we have that $\lbrack s \rbrack^{\mathbb{P}}_{\sigma_1 \cup \sigma^H,\, m_{\mathbb{P}} \circ \sigma_2} = \lbrack s \rbrack_{\sigma_1 \cup \sigma^H, \sigma_2}$, any solution to an IMP-MOD program in the modular semantics is also a solution to the following formula in the integer semantics:

$$\exists \sigma^H \in R_H.\ \forall (\sigma_1, \sigma_2) \in \mathcal{I}_R^{\mathbb{P}}.\ \lbrack s \rbrack_{\sigma_1 \cup \sigma^H, \sigma_2} \neq \bot.$$

When all valuations $\sigma_1, \sigma_2 \in R_{in}$ are also elements of $\mathcal{I}_R^{\mathbb{P}}$, any solution to an IMP-MOD program in the modular semantics is guaranteed to be a correct solution under the integer semantics.

To summarize, if the synthesizer returns UNSAT for the IMP-MOD program, the problem is unrealizable and does not admit a solution. When it returns a solution, the solution is correct if it only produces valuations in the range allowed by the choice of prime numbers. In practice, one can use a verifier to check the correctness of the synthesized solution and add more prime numbers to the modular synthesizer if needed. In fact, this is the main idea behind the counterexample-guided inductive synthesis algorithm used by Sketch (Section 3).

Algorithm 2: Incremental synthesis for IMP-MOD.

```
/* f: function; P: set of primes */
function IncrementalSynthesis(f, P)
    P′ ← [p1]
    fsyn ← Synthesis(f, P′)
    while ∃pcex ∈ P : ¬Verify(fsyn, pcex) do
        P′ ← P′ ∪ {pcex}
        fsyn ← Synthesis(f, P′)
        if fsyn == UNSAT then return ∅
    return fsyn
```

#### 6.2 Incremental Synthesis Algorithm

In this section, we propose an incremental synthesis algorithm that builds on the following observation. The set of variable valuations for which modular and integer semantics are equivalent increases monotonically in the size of P:

$$\mathbb{P}_1 \subseteq \mathbb{P}_2 \implies \mathcal{I}_R^{\mathbb{P}_1} \subseteq \mathcal{I}_R^{\mathbb{P}_2}.\tag{1}$$

Algorithm 2 uses Equation 1 to add prime numbers lazily during the synthesis process. The algorithm first constructs a set $\mathbb{P}' = \{p_1\}$ containing the first prime number $p_1 \in \mathbb{P}$ and synthesizes a solution that is correct for computations modulo the primes in $\mathbb{P}'$. It then checks whether the synthesized solution fsyn satisfies the assertions with respect to all prime numbers in $\mathbb{P}$. If yes, fsyn is returned as the solution. Otherwise, the algorithm finds a prime $p_{cex} \in \mathbb{P}$ for which Verify(fsyn, pcex) does not hold, adds it to the set $\mathbb{P}'$, and continues iterating. Due to Equation 1, Algorithm 2 is sound and complete with respect to the synthesis algorithm that considers the full prime set $\mathbb{P}$ all at once.
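As an illustration of the control flow only, here is a Python sketch of Algorithm 2 with Synthesis and Verify passed in as parameters (the real calls go to Sketch; the toy instantiation below, which just recovers a hidden constant by its residues, is our own assumption for demonstration):

```python
from math import prod

def incremental_synthesis(f, primes, synthesis, verify):
    """Algorithm 2: grow the active prime set lazily, one counterexample
    prime at a time, until the candidate verifies against every prime."""
    active = [primes[0]]
    f_syn = synthesis(f, active)
    while True:
        failing = [p for p in primes if not verify(f_syn, p)]
        if not failing:
            return f_syn              # correct for all primes in P
        active.append(failing[0])     # add the counterexample prime
        f_syn = synthesis(f, active)
        if f_syn == "UNSAT":
            return None               # unrealizable for the full prime set

# Toy instantiation: "synthesis" finds the least value agreeing with a
# hidden target modulo the active primes; "verify" checks a single prime.
TARGET = 7
def toy_synthesis(f, active):
    n = prod(active)
    return next((c for c in range(n)
                 if all(c % p == TARGET % p for p in active)), "UNSAT")
def toy_verify(candidate, p):
    return candidate % p == TARGET % p
```

With primes [2, 3, 5], the first candidate (correct only modulo 2) fails verification modulo 5, and one refinement round suffices to recover the target.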

In practice, the user can use domain knowledge to estimate a suitable set of primes or, alternatively, use our incremental algorithm to discover appropriate prime sets. The set of prime numbers $\{2, 3, 5, 7, 11, 13, 17\}$ usually instantiates a range $R$ that is large enough for most synthesis tasks based on Sketch.

# 7 Complexity of Rewritten Programs

In this section, we analyze how many bits are necessary to encode numbers for both semantics using unary and binary bit-vector encodings of integers (Sec. 7.1 and 7.2), and show how many prime numbers are necessary in the modular semantics to cover values up to a certain bound (Sec. 7.3). The following results build upon several number theory results that the reader can consult at [9, 15].

#### 7.1 Bit-complexity of Binary Encoding

In this section, we analyze how many bits are necessary to represent an interval of size $N$ in binary in our modular semantics. In the rest of the section, we consider the set of primes $\mathbb{P}_n = \{p \mid p < n\} = \{p_1, \ldots, p_k\}$ containing the prime numbers smaller than $n$. We will show in Section 8 that this choice of prime numbers also yields good performance in practice. Concretely, we are interested in the magnitude of the number $N = p_1 \cdot \ldots \cdot p_k$ and in how many bits are used to represent the numbers in $\mathbb{P}_n$.

We start by introducing the notion of primorial.

Definition 3 (Primorial). Given a number $n$, the primorial $n\#$ is defined as the product of all primes smaller than $n$—i.e., $n\# = \prod_{p \in \mathbb{P}_n} p$.

The primorial captures the size $N$ of the interval covered by the Chinese Remainder Theorem when using prime numbers up to $n$. The following number theory result gives us a closed form for the primorial and shows that the number $N$ has approximately $n$ bits.

$$n\# = e^{(1+o(1))n} = 2^{(1+o(1))n} \tag{2}$$

We use another number theory notion to quantify the number of bits in Pn.

Definition 4 (Chebyshev function). Given a number $n$, the Chebyshev function $\vartheta(n)$ is the sum of the logarithms of all the prime numbers smaller than $n$—i.e., $\vartheta(n) = \sum_{p \in \mathbb{P}_n} \log p$.

The following number theory result relates the primorial to the Chebyshev function.

$$\vartheta(n) = \log(n\#) = \log 2^{(1+o(1))n} = (1+o(1))n \tag{3}$$

Aside from rounding errors, the Chebyshev function captures the number of bits required to represent the numbers in $\mathbb{P}_n$. To obtain a more precise bound on this number, we need a bound for the sum $\sum_{p \in \mathbb{P}_n} \lceil \log p \rceil$. We start by recalling the following fundamental number theory result.

Theorem 4 (Prime number theorem). The set P<sup>n</sup> has size approximately n/ log n.

Using Theorem 4, we get the following result.

$$\sum\_{p \in \mathbb{P}\_n} \lceil \log p \rceil \le n/\log n + \sum\_{p \in \mathbb{P}\_n} \log p \approx (1 + o(1))n \tag{4}$$

Representing a number e<sup>n</sup> in a classic binary encoding requires log2(e<sup>n</sup>) = (1 + o(1))n bits and, combining Equations 2 and 4, we get the following result.

Theorem 5. Representing a number 2<sup>n</sup> in binary requires (1+o(1))n bits under both modular and integer semantics.

Hence, representing a number in binary requires the same number of bits in both semantics.

Example 7. Consider the set $\mathbb{P}_{18} = \{2, 3, 5, 7, 11, 13, 17\}$, which can model an interval of $N = 510{,}510$ integers (i.e., $n = 18$ in Theorem 5). Representing $N$ in binary requires 19 bits, while the binary representations of all the primes in $\mathbb{P}_{18}$ use 22 bits in total. Both numbers are close to 18, as predicted by the theorem.
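The arithmetic behind Example 7 is easy to replay in plain Python (our snippet, using exact integer arithmetic):

```python
from math import ceil, log2, prod

primes = [2, 3, 5, 7, 11, 13, 17]          # P_18: all primes below 18
N = prod(primes)
print(N)                                    # 510510, the covered interval size
print(N.bit_length())                       # 19 bits to write N in binary
print(sum(ceil(log2(p)) for p in primes))   # 22 bits across all the moduli
```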

# 7.2 Bit-complexity of Unary Encoding

As discussed in Sec. 3, the default Sketch solver encodes numbers using a unary encoding—i.e., Sketch requires $2^n$ bits to encode the number $2^n$. Representing the same number in unary under the modular semantics requires only prime numbers smaller than $n$ and therefore $\sum_{p \in \mathbb{P}_n} p$ bits. We can then use the following closed form to approximate this quantity.

$$\sum\_{p \in \mathbb{P}\_n} p \sim \frac{n^2}{2\log n} \tag{5}$$

Equation 5 yields the following theorem.

Theorem 6. Representing a number $2^n$ in unary requires $2^n$ bits in the integer semantics and approximately $\frac{n^2}{2 \log n}$ bits in the modular semantics.

These results show that, under a unary encoding, the modular semantics is exponentially more succinct than the integer semantics.

Example 8. Consider again the prime set $\mathbb{P}_{18} = \{2, 3, 5, 7, 11, 13, 17\}$, which can model an interval of $N = 510{,}510$ integers. Representing $N$ in unary requires 510,510 bits. On the other hand, the sum of the bits in the unary encodings of the primes in $\mathbb{P}_{18}$ is 58.
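Example 8's counts can likewise be checked directly (our snippet):

```python
primes = [2, 3, 5, 7, 11, 13, 17]
N = 1
for p in primes:
    N *= p
print(N)            # 510510 unary bits under the integer semantics
print(sum(primes))  # 58 unary bits under the modular semantics
```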

#### 7.3 Number of Required Primes

We analyze how many primes are needed to represent a certain number in the modular semantics. We start by introducing the following alternative version of the primorial.

Definition 5 (Prime Primorial). For the $n$-th prime number $p_n$, the prime primorial $p_n\#$ is defined as the product of the first $n$ primes—i.e., $p_n\# = \prod_{i=1}^{n} p_i$.

The following known number theory result gives us an approximation for the prime primorial.

$$p\_n \# = e^{(1+o(1))n\log n} \tag{6}$$

Notice how the approximation of the primorial differs from that of the prime primorial. This is due to the fact that prime numbers are sparse—i.e., the n-th prime number is approximately n log n.

Using Equation 6 we obtain the following result.

Theorem 7. Representing numbers in an interval of size $N = e^{n \log n}$ in the modular semantics requires the first $n$ prime numbers.

Since the relation $k = n \log n$ does not admit a closed form for $n$, we cannot derive exactly how many primes are needed to represent a number $2^k$ with $k$ bits. It is, however, clear from the theorem that the number of required primes grows more slowly than $k$.
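In the absence of a closed form, a direct search (plain Python, a helper of our own) makes the slow growth concrete:

```python
def primes_for_bits(k):
    """Smallest number of primes whose product covers an interval of size 2^k."""
    count, product, candidate = 0, 1, 2
    while product < 2 ** k:
        if all(candidate % d for d in range(2, candidate)):  # naive primality
            product *= candidate
            count += 1
        candidate += 1
    return count

print(primes_for_bits(19))  # 8 primes already cover a 19-bit range
print(primes_for_bits(32))  # the first 10 primes reach past 2^32
```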

# 8 Evaluation

We implemented a prototype of our technique as a simple compiler in Java. Our implementation provides a simplified Sketch frontend, which only allows the limited syntax we support. Given a Sketch file, our tool rewrites it into a different Sketch file that operates according to the modular semantics. We use Unary to denote the result obtained by running the default version of Sketch with unary integer encoding on the original Sketch file, Binary to denote the result obtained by running the version of Sketch using an SMT-like native integer solver based on binary integer encoding, Unary-p to denote the result of running the default Sketch version on our modified Sketch file, and Unary-p-inc to denote the result of running the default version of Sketch on the file generated by the incremental version of our algorithm described in Section 6. As expected from our theory, the prime technique is not beneficial for the SMT-like native integer solver and always results in worse runtime; we therefore do not present data for this solver. All experiments were performed on a machine with a 4.0GHz Intel Core i7 CPU and 16GB RAM, running Sketch-1.7.5 with a timeout of 300 seconds (we also report out-of-memory errors as timeouts).

Our evaluation answers the following research questions:

Q1 How does the performance of Unary-p compare to Unary and Binary?

Q2 How does the incremental algorithm compare to the non-incremental one?

Q3 Is Unary-p's performance sensitive to the set of selected prime numbers?

Q4 How many primes are needed by Unary-p to produce correct solutions?

Q5 Does Unary generate larger SAT queries than Unary-p?

#### 8.1 Benchmarks

We perform our evaluation on three families of programs.

Polynomials The first set of benchmarks contains 81 variants of the polynomial synthesis problem presented in Figure 1. The original version of this benchmark appears in the Sketch benchmark suite under the name polynomial.sk. For each benchmark, we generate a random polynomial $f$ and random inputs $\{\vec{x}\}$, and take the set $\{(\vec{x}, f(\vec{x}))\}$ as the specification. Each benchmark in this set has the following parameters: #Ex ∈ {2, 4, 6} is the number of input-output examples in the specification, cbits ∈ {5, 6, 7} denotes the number of bits hole values can use, exIn ∈ {[−10, 10], [−30, 30], [−50, 50]} denotes the range of randomly generated input examples, and coeff ∈ {[−10, 10], [−30, 30], [−50, 50]} denotes the range of randomly generated coefficients in the polynomial $f$.

Invariants The second set of benchmarks contains 46 variants of two invariant generation problems obtained from a public set of programs that require polynomial invariants to be verified [8]. We selected the two programs in which at least one variable could be tracked modularly by our tool (the other programs involved complex array operations or inequality operators) and turned the verification problems into synthesis problems by asking Sketch to find a polynomial equality (over the program variables) that is an invariant for the loop in the program. To control the magnitudes of the inputs, we only require the invariants to hold for a fixed set of input examples.

The first problem, mannadiv, iteratively computes the remainder and the quotient of two numbers given as input. The invariant required to verify mannadiv is a polynomial equality of degree 2 involving 5 variables. The Sketch template required to describe the space of all such polynomial equalities has 32 holes and cannot be handled by any of the Sketch solvers we consider. We therefore simplify the invariant synthesis problems in two ways. In the first variant, we reduce the ranges of the hole values in the templates by considering cbits ∈ {2, 3}. In the second variant, we set cbits ∈ {5, 6, 7}, but reduce the number of missing hole values to 4 (i.e., we provide part of the invariant). Each benchmark takes two random inputs and we consider the following input ranges {[1, 50], [1, 100]}. In total, we have 10 benchmarks for mannadiv.

The second problem, petter, iteratively computes the sum $\sum_{1 \le i \le n} i^5$ for a given input $n$. The invariant required to verify petter is a polynomial equality of degree 6 involving 3 variables. The Sketch template required to describe all such polynomial equalities has 56 holes and cannot be handled by any of the Sketch solvers we consider. We consider the following simplified variants of the problem: (i) petter\_0 computes $\sum_{1 \le i \le n} 1$ and requires a polynomial invariant of degree one, (ii) petter\_x computes $\sum_{1 \le i \le n} x$ for a given input variable $x$ and requires a polynomial invariant of degree two, (iii) petter\_1 computes $\sum_{1 \le i \le n} i$ and requires a polynomial invariant of degree two, and (iv) petter\_10 computes $\sum_{1 \le i \le n} i + 1$ and requires a polynomial invariant of degree two. Each benchmark takes two random inputs and we consider the following input ranges {[1, 10], [1, 100], [1, 1000]}. In total, we have 12 variants of petter, each run for values of cbits ∈ {5, 6, 7}—i.e., a total of 36 benchmarks.

Program Repair The third set of benchmarks contains 54 variants of Sketch problems from the domain of automatic feedback generation for introductory programming assignments [7]. Each benchmark corresponds to an incorrect program submitted by a student, and the goal of the synthesizer is to find a small variation of the program that behaves correctly on a set of test cases. We select the 6/11 benchmarks from the tool Qlose [7] for which (i) our implementation can support all the features in the program, and (ii) our data flow analysis identifies at least one variable that can be tracked modularly. Of the remaining benchmarks, 3/11 do not contain variables that can be tracked modularly, and 2/11 call auxiliary functions that cannot be translated into Sketch. For each program, we consider the original problem and two variants where the integer inputs are multiplied by 10 and 100, respectively. Further, for each program variant, we impose an assertion specifying that the distance between the original program and the repaired program is within a certain bound. We select three different bounds for each program: the minimum cost c, c + 100, and c + 200.

Table 1: Effectiveness of different solvers. SAT (resp. UNSAT) denotes the number of benchmarks for which a solver could find a solution (resp. prove that no solution existed), while TO denotes the number of timeouts.

# 8.2 Performance of Unary-p

Table 1 summarizes our comparison. First, we compare the performance of Unary-p and Unary. We use $\mathbb{P} = \{2, 3, 5, 7, 11, 13, 17\}$, which is enough for Unary-p to always find correct solutions (we verify the correctness of a solution by instantiating the hole values in the original sketch programs). Unary can only solve 69/181 benchmarks, while Unary-p can solve 169/181. Figure 7a shows a scatter plot (log scale) of the solving times for the two techniques: each point below the diagonal line denotes a benchmark on which Unary-p was faster than Unary. Points on the extreme right-hand side of the plot denote timeouts for Unary. When both solvers terminate, Unary-p (avg. 1.7s) is 6.1X (geometric mean) faster than Unary (avg. 25.0s).

Next, we compare the performance of Unary-p and Binary (Figure 7b). On the 64 easier benchmarks that Binary can solve in less than 1 second, Binary (avg. 0.55s) outperforms Unary-p (avg. 2.32s), but Unary-p still has reasonable performance. On the 49 benchmarks that Binary solves in between 1 and 10 seconds, Unary-p (avg. 3.5s) is on average 1.9X faster than Binary (avg. 6.9s). Most interestingly, for the 14 harder benchmarks for which Binary takes more than 10 seconds, Unary-p (avg. 5.7s) is on average 15.9X faster than Binary (avg. 90.9s). Remarkably, Unary-p solved 43 of the benchmarks (in less than 8s each) for which Binary timed out<sup>4</sup>, and Unary-p only timed out for two benchmarks that Binary could solve in less than a second and one benchmark that Binary could solve in 260s. Finally, we would like to highlight that for 41/208 benchmarks, even Unary outperforms Binary. As expected from the discussion throughout the paper, these are benchmarks typically involving complex operations but not involving overly large numbers.

<sup>4</sup> During our experiment, we observed that Binary *incorrectly* reported UNSAT for 10 satisfiable benchmarks. We reported these benchmarks as timeouts and have contacted the authors of Sketch to address the issue.

We can now answer Q1. First, Unary-p consistently outperforms Unary across all benchmarks. Second, Unary-p outperforms Binary on hard-to-solve problems and can solve problems that Binary cannot solve, e.g., Unary-p solved 28/46 invariant problems that Sketch could not solve. Unary-p and Binary have similar performance on easy problems.
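To give an intuition for why the residue-based encoding scales better on large numbers, the following back-of-envelope sketch (our own illustration, not Sketch's actual encoder) compares the indicator bits needed by a one-hot unary encoding against a prime-residue encoding in the style of Unary-p:

```python
import math

# Back-of-envelope model: a one-hot unary encoding of a value in [0, N)
# needs N indicator bits, while a residue encoding needs one one-hot
# group of p bits per prime p (i.e. sum(P) bits in total), yet it still
# distinguishes values up to the product of the primes, by the Chinese
# Remainder Theorem.
PRIMES = [2, 3, 5, 7, 11, 13, 17]

def unary_bits(n_values: int) -> int:
    """Bits for a one-hot unary encoding of a value in [0, n_values)."""
    return n_values

def residue_bits(primes) -> int:
    """Bits for one one-hot residue group per prime."""
    return sum(primes)

covered = math.prod(PRIMES)
print(covered)               # 510510 distinct values covered
print(unary_bits(covered))   # 510510 bits in plain unary
print(residue_bits(PRIMES))  # 58 bits with residues
```

The gap between the two encodings grows multiplicatively with each added prime, which is consistent with the formula-size reductions reported in Section 8.5.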

Comparison to full SMT encoding For completeness, we also compare our approach to a tool that uses SMT solvers to model the entire synthesis problem. We choose the state-of-the-art SMT-based synthesizer Rosette [23] for our comparison. Rosette is a programming language that encodes verification and synthesis constraints written in a domain-specific language into SMT formulas that can be solved using SMT solvers.

We only run Rosette on the set of Polynomials: Rosette does support the theory of integers, but it has no native support for loops, so there is no direct way to encode the Invariants and Program Repair benchmarks. To our knowledge, Rosette provides a way to specify the number k it uses to model integers and reals as k-bit words, but the user has no control over how many bits it uses for unknown holes specifically. Consequently, we evaluate 27 instead of 81 variants of the polynomial synthesis problem on Rosette, i.e., we consider different numbers of cbits.

Fig. 6: Rosette vs Binary

Figure 6 shows the running times (log scale) for Rosette and Binary with cbits=6. Rosette successfully solved 16/27 benchmarks and it terminates quickly (avg. 2.9s) when it can find a solution. However, Rosette times out on 11 benchmarks for which Binary terminates. The timeouts are due to the fact that Rosette employs full SMT encodings that combine multiple theories, while Binary uses a SAT solver that is only modified to accommodate SMT-like integer constraints. Since full SMT encodings are not as general and efficient as the encodings used in Sketch, in the rest of the evaluation we assess the effectiveness of our technique only in comparison with Binary.

Finally, we tried applying our prime-based technique to Rosette and, as expected, the technique is not beneficial due to the binary encoding of numbers in SMT, and causes all benchmarks to timeout. To summarize, (i) SMT solvers cannot efficiently handle the synthesis problems considered in this paper, and (ii) our technique is better suited for SAT solvers than SMT solvers.

#### 8.3 Performance of Incremental Solving

Our implementation of the incremental solver Unary-p-inc first attempts to find a solution with the prime set P = {2, 3, 5, 7}. If the solver returns a correct solution, Unary-p-inc terminates. Otherwise, Unary-p-inc incrementally adds the next prime to P until it finds a correct solution, proves that there is no solution, or times out. Unary-p-inc is 25.2% (geometric mean) slower than Unary-p (Figure 8 (log scale)). Unary-p-inc can solve three benchmarks for which both Unary-p and Binary timed out. To answer Q2, Unary-p-inc and Unary-p have similar performance.

Fig. 7: Performance of Unary, Binary, and Unary-p.

Fig. 8: Unary-p-inc vs Unary-p
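The incremental strategy can be sketched as a simple loop. The helpers `solve_with_primes` and `is_correct` are hypothetical placeholders standing in for the residue-based solving step and the verification step that instantiates hole values in the original sketch:

```python
def incremental_solve(solve_with_primes, is_correct):
    """Grow the prime set until a verified solution is found (or give up)."""
    primes = [2, 3, 5, 7]            # initial set P = {2, 3, 5, 7}
    pending = [11, 13, 17]           # candidate primes, added one at a time
    while True:
        solution = solve_with_primes(primes)
        if solution is not None and is_correct(solution):
            return solution          # correct w.r.t. the concrete semantics
        if not pending:
            return None              # exhausted the budget: no solution found
        primes.append(pending.pop(0))

# Toy usage: this "solver" only succeeds once 11 is in the prime set.
found = incremental_solve(lambda ps: "h=3" if 11 in ps else None,
                          lambda s: True)
print(found)   # h=3
```

The speedups observed for Unary-p-inc come from the cases where an early, small prime set already yields a solution that passes the verification step.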

#### 8.4 Varying the Prime Number Set P

In this experiment, we evaluate how different prime number sets affect Unary-p.

We consider five increasing sets of primes: P5 = {2, 3, 5}, P7 = {2, 3, 5, 7}, P11 = {2, 3, 5, 7, 11}, P13 = {2, 3, 5, 7, 11, 13}, and P17 = {2, 3, 5, 7, 11, 13, 17}. Figure 9a (log scale) shows the running times for all the polynomial benchmarks with cbits=7 (showing all benchmarks would clutter the plot). The points where the lines change from dashed to solid denote the number of primes for which the algorithm starts yielding correct solutions. As expected, a smaller set of primes leads to faster solving times, as the resulting constraints are smaller and fewer bits are needed for encoding intermediate values. The runtime on average grows with the increasing size of the prime set. For example, across all benchmarks, using P17 takes 23% longer on average than using P11. To answer Q3, Unary-p is slower when using increasingly large sets of primes.

In terms of correctness, we find that smaller prime sets often yield incorrect solutions (P5: 37% correct, P7: 70%, P11: 86%, P13: 97%, and P17: 100%) because there is not enough discriminative power with fewer primes and the

Fig. 9: Performance for different sets of prime numbers.

solutions may overfit to the smaller set of intermediate values. It is interesting to note that even prime sets of intermediate size often lead to correct solutions in practice, which explains some of the speedups observed in the incremental synthesis algorithm. To answer Q4, Unary-p is able to synthesize correct solutions even with intermediate-sized sets of primes.
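The loss of discriminative power can be made concrete with a small check (our own illustration): two integers are indistinguishable under the modular semantics exactly when they agree modulo every prime in the set, i.e. when they differ by a multiple of the product of the primes.

```python
import math

def indistinguishable(a: int, b: int, primes) -> bool:
    """True iff a and b agree modulo every prime in the set."""
    return all(a % p == b % p for p in primes)

P5 = [2, 3, 5]
# By the Chinese Remainder Theorem, values collide exactly when they
# differ by a multiple of the product of the primes.
m = math.prod(P5)                               # 30
print(indistinguishable(1, 1 + m, P5))          # True: 1 and 31 collide under P5
print(indistinguishable(1, 31, [2, 3, 5, 7]))   # False: adding 7 separates them
```

A candidate solution whose intermediate values only differ from the correct ones by such collisions will pass the modular check but fail verification, which is why smaller sets produce more incorrect solutions.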

Changing Magnitude of Primes We also evaluate the performance of Unary-p when using primes of different magnitudes. We consider the sets of primes {11, 17, 19, 23}, {31, 41, 47}, and {251, 263}, which define similar integer ranges but pose different trade-offs between the number of primes used and their sizes, e.g., the set {251, 263} only uses two very large primes. Since the different sets cover similar integer ranges, they all produce correct solutions. Figure 9b (log scale) shows the running time of Unary-p for the same benchmarks as Figure 9a. Larger sets of smaller prime values require less time to solve than smaller sets of larger prime values. This result is expected since, in the unary encoding of numbers, representing larger numbers requires more bits.

#### 8.5 Size of SAT Formulas

In this experiment, we compare the sizes of the intermediate SAT formulas generated by Unary-p and Unary. Figure 10a shows a scatter plot (log scale) of the number of clauses of the largest intermediate SAT query generated by the CEGIS algorithm for the two techniques. We only plot the instances in which Unary was able to produce at least one SAT formula. Unary produces SAT formulas that are on average 19.3X larger than those produced by Unary-p. To answer Q5, as predicted by our theory, Unary-p produces significantly smaller SAT queries than Unary.

Performance vs Size of SAT Queries We also evaluate the correlation between synthesis time and size of SAT queries. Figure 10b plots the synthesis times of both solvers against the sizes of the SAT queries. It is clear that the synthesis time increases with larger SAT queries. The plot illustrates how the solving time strongly depends on the size of the generated formulas.

Fig. 10: SAT formula sizes and performance.

# 9 Related Work

Program Sketching Program sketching was designed to automatically synthesize efficient bit-vector manipulations from inefficient iterative implementations [21]. The Sketch tool has since been engineered to support complex language features and operations [19]. Thanks to its simplicity, sketching has found wide adoption in applications such as optimizing database queries [3], automated feedback generation [18], program repair [7], and many others. Our work further extends the capabilities of Sketch in a new direction by leveraging number theory results. In particular, our technique allows Sketch to handle sketches manipulating large integer numbers. To the best of our knowledge, our technique is the first one that can solve many of the benchmarks presented in this paper.

Uses of the Chinese Remainder Theorem The Chinese Remainder Theorem and its derivative corollaries have found wide application in several branches of Computer Science and, in particular, in Cryptography [11, 26].

The idea of using modular arithmetic to abstract integer values has been used in program analysis. Since modular fields are finite, they can be used as an abstract domain for verifying programs manipulating integers [5]—e.g., the abstract domain can track whether a number is even or odd. Our work extends this idea to the domain of program synthesis, which requires us to solve several challenges. First, when used for verifying programs, the modular abstraction overapproximates the set of possible values of the program and does not need to be precise. In particular, Clark et al. [5] allow program operations that are in the IMP language but not in the IMP-MOD language and lose precision when modeling such operations—e.g., after the assignment x = x/2 the value of x mod 2 can be either 0 or 1. Such imprecision is acceptable in program analysis, since the abstraction is used to show that a program does not contain a bug, i.e., the program behaves correctly even in the abstract domain. In our setting, the problem is the opposite: we use the abstraction to simplify the synthesis problem, and we provide a theory for when the modular and integer semantics are equivalent.
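A one-line illustration of this imprecision: the parity of x/2 is not determined by the parity of x, so an abstract domain tracking x mod 2 must answer "0 or 1" after the assignment.

```python
# Both inputs are even (same abstract value: x mod 2 == 0), yet the
# results of x = x / 2 differ modulo 2, so the abstraction cannot pick
# a single parity for x afterwards.
results = {(x // 2) % 2 for x in (6, 8)}   # 6 -> 3 (odd), 8 -> 4 (even)
print(results == {0, 1})   # True
```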

Pruning Spaces in Program Synthesis Many techniques have been proposed to prune the large search space of possible programs [14]. Enumerative synthesis techniques [24, 12, 13, 17] enumerate programs in a search space and avoid enumerating syntactically and semantically equivalent terms. Some synthesizers, such as Synquid [16] and Morpheus [10], use refinement types and first-order formulas over specifications of DSL constructs to refute inconsistent programs. Recently, Wang et al. [25] proposed a technique based on abstraction refinement for iteratively refining abstractions to construct synthesis problems of increasing complexity for incremental search over a large space of programs.

Instead of pruning programs in the syntactic space, our technique uses modular arithmetic to prune the semantic space—i.e., the complexity of verifying the correctness of the synthesized solution—while maintaining the syntactic space of programs. Our approach is related to that of Tiwari et al. [22], who present a technique for component-based synthesis using dual semantics—where syntactic symbols in a language are provided two different semantics to capture different requirements. Our technique is similar in the sense that we also provide an additional semantics based on modular arithmetic. However, we formalize our analysis based on number theory results and develop it in the context of general-purpose Sketch programs that manipulate integer values, unlike Tiwari et al.'s work, which is developed for straight-line programs composed of components.

Synthesis for Large Integer Values Abate et al. propose a modification of the Cegis algorithm for solving syntax-guided synthesis (SyGuS) problems with large constants [1]. SyGuS differs from program sketching in how the synthesis problem is posed and in the type of programs that can be modeled. In particular, in SyGuS one can only describe programs representing SMT formulas, and the logical specification for the problem can only relate the input and output of the program—i.e., there cannot be intermediate assertions within the program. The problem setup and the solving algorithms proposed in this paper are orthogonal to those of Abate et al. First, we focus on program sketching, which is orthogonal to SyGuS as sketching allows for richer and more generic program spaces as well as richer specifications. While it is true that certain synthesis problems can be expressed both as sketches and as SyGuS problems, this is not the case for our benchmark programs, which use loops, arrays, and non-linear integer arithmetic, none of which are supported by SyGuS. Second, our technique is motivated by how Sketch encodes and solves program sketches through SAT solving. While the traditional Sketch encoding can explode for large constants, the same encoding allows Sketch to solve program sketches involving complex arithmetic and complex programming constructs. The algorithm proposed by Abate et al. iteratively builds SMT (not SAT) formulas that are required to be in a decidable logical theory. Such an encoding only works for the restricted programming models used in SyGuS problems.

# References


#### **Modular Relaxed Dependencies in Weak Memory Concurrency**<sup>⋆</sup>

Marco Paviotti<sup>1,2</sup>, Simon Cooksey<sup>2</sup>, Anouk Paradis<sup>3</sup>, Daniel Wright<sup>2</sup>, Scott Owens<sup>2</sup>, and Mark Batty<sup>2</sup>

<sup>1</sup> Imperial College London, United Kingdom m.paviotti@ic.ac.uk <sup>2</sup> University of Kent, Canterbury, United Kingdom {m.paviotti, sjc205, daw29, S.A.Owens, M.J.Batty}@kent.ac.uk <sup>3</sup> ETH Zurich, Switzerland anouk.paradis@polytechnique.org

**Abstract.** We present a denotational semantics for weak memory concurrency that avoids thin-air reads, provides data-race free programs with sequentially consistent semantics (DRF-SC), and supports a compositional refinement relation for validating optimisations. Our semantics identifies false program dependencies that might be removed by compiler optimisation, and leaves in place just the dependencies necessary to rule out thin-air reads. We show that our dependency calculation can be used to rule out thin-air reads in any axiomatic concurrency model, in particular C++. We present a tool that automatically evaluates litmus tests, show that we can augment C++ to fix the thin-air problem, and we prove that our augmentation is compatible with the previously used compilation mappings over key processor architectures. We argue that our dependency calculation offers a practical route to fixing the longstanding problem of thin-air reads in the C++ specification.

**Keywords:** Thin-air problem · Weak memory concurrency · Compiler Optimisations · Denotational Semantics · Compositionality

# **1 Introduction**

It has been a longstanding problem to define the semantics of programming languages with shared memory concurrency in a way that does not allow unwanted behaviours – especially observing *thin-air* values [8,7] – and that does not forbid compiler optimisations that are important in practice, as is the case with Java and Hotspot [30,29]. Recent attempts [16,11,25,15] have abandoned the style of *axiomatic models*, which is the de facto paradigm of industrial specification [8,2,6]. Axiomatic models comprise rules that allow or forbid individual program executions. While it is impossible to solve all of the problems in an

<sup>⋆</sup> This work was funded by EPSRC Grants EP/M017176/1, EP/R020566/1 and EP/S028129/1, the Lloyds Register Foundation, and the Royal Academy of Engineering.

axiomatic setting [7], abandoning it completely casts aside mature tools for automatic evaluation [3], automatic test generation [32], and model checking [23], as well as the hard-won refinements embodied in existing specifications like C++, where problems have been discovered and fixed [8,7,18]. Furthermore, the industrial appetite for fundamental change is limited. In this paper we offer a solution to the thin-air problem that integrates with existing axiomatic models.

The thin-air problem in C++ stems from a failure to account for dependencies [22]: *false dependencies* are those that optimisation might remove, and *real dependencies* must be left in place to forbid unwanted behaviour [7]. A single execution is not sufficient to discern real and false dependencies. A key insight from previous work [14,15] is that event structures [33,34] give us a simultaneous overview of all traces at once, allowing us to check whether a write is sure to happen in every branch of execution. Unfortunately, previous work does not integrate well with axiomatic models, nor lend itself to automatic evaluation.

To address this, we construct a denotational semantics in which the meaning of an entire program is constructed by combining the meanings of its subcomponents via a compositional function over the program text. This approach can be particularly amenable to automatic evaluation, reasoning and compiler certification [19,24], and fits with the prevailing axiomatic approach.

This paper uses this denotational approach to capturing program dependencies to explore the thin-air problem, resulting in a concrete proposal for fixing the thin-air problem in the ISO standard for C++.

*Contributions.* There are two parts to the paper. In the first, we develop a denotational model called "Modular Relaxed Dependencies model" (MRD) and build metatheory around it. The model uses a relatively simple account of synchronisation, but it demonstrates separation between the calculation of dependency and the enforcement of synchronisation. In the second, we evaluate the dependency calculation by combining it with the fully-featured axiomatic models RC11 [18] and IMM [26].

The denotational semantics has the following advantages:


We adopt the dependency calculation from the global semantics of point 4 as the basis of our C++ model, which we call MRD-C11. We establish the C++ DRF-SC property described in the standard [13] (§9.1) and we provide several desirable properties for a solution to the thin-air problem in C++:


#### **1.1 Modular Relaxed Dependency by example**

To simplify things for now, we will attach an Init program to the beginning of each example to initialise all global variables to zero. Doing this makes the semantics non-compositional, but it is a natural starting place and aligns well with previous work in the area. Later, after we have made all of our formal definitions, we will see why the Init program is not necessary.

For now, consider a simple programming language where all values are booleans, registers (ranging over r) are thread-local, and variables (ranging over x, y) are global. Informally, an event structure for a program consists of a directed graph of events. Events represent the global variable reads and writes that occur on all possible paths that the program can take. The structure is built up over the program as follows: each write generates a single event, while each read generates two – one for each possible value that could be read. These read events are put in *conflict* with each other to indicate that they cannot both happen in a single execution; this is indicated with a zig-zag red arrow between the two events. Additionally, the event structure tracks true dependencies via an additional relation which we call *semantic dependencies* (dp). These are yellow arrows from read events to write events.

For example, consider the program

$$(\mathbf{r}\_1 := \mathbf{x}; \; \mathbf{y} := \mathbf{r}\_1) \tag{LB\_1}$$

that reads from a variable x and then writes the result to y. The interpretation of this program is an event structure depicted as follows:

Each event has a unique identifier (the number attached to the box). The straight black arrows represent program order, the curved yellow arrows indicate a causal dependency between the reads and writes, and the red zigzag represents a conflict between two events. If two events are in conflict, then their respective continuations are in conflict too.
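As a concrete (and entirely illustrative) rendering of this picture, the event structure for LB1 can be written down as plain sets. The dataclass and the event identifiers are our own choices for illustration, not the paper's notation:

```python
from dataclasses import dataclass, field

@dataclass
class EventStructure:
    labels: dict                                  # event -> label, e.g. ("R", "x", 0)
    order: set = field(default_factory=set)       # program-order pairs
    conflict: set = field(default_factory=set)    # symmetric conflict pairs
    dp: set = field(default_factory=set)          # semantic dependencies

# r1 := x; y := r1 -- the read of x yields one event per possible boolean
# value, the two reads in conflict with each other; each branch's write
# to y depends on its read.
lb1 = EventStructure(
    labels={2: ("R", "x", 0), 3: ("W", "y", 0),
            4: ("R", "x", 1), 5: ("W", "y", 1)},
    order={(2, 3), (4, 5)},
    conflict={(2, 4), (4, 2)},   # the two reads cannot co-occur
    dp={(2, 3), (4, 5)},         # each write depends on its read
)
```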

If we interpret the program Init; LB1, as below, we obtain an event structure in which the Init event sets the variables to zero.

In the above event structure, we highlight events {1, 2, 3} to identify an execution. The green dotted arrow indicates that event 2 reads its value from event 1; we call this relation *reads-from* (rf). This execution is *complete* as all of its reads read from a write, and it is closed w.r.t. conflict-free program order.

We interpret the following program similarly,

$$(\mathbf{r}\_2 := \mathbf{y}; \; \mathbf{x} := \mathbf{r}\_2) \tag{LB\_2}$$

leading to a symmetrical event structure where the write to x is dependent on the read from y.

The interpretation of Init; (LB1 ∥ LB2) gives the event structure where (LB1) and (LB2) are simply placed alongside one another.

The interpretation of parallel composition is the union of the event structures from LB1 and LB2 without any additional conflict edges. When composing the semantics of two programs in parallel, we add all rf-edges that satisfy a coherence axiom. Here we present one axiom that provides the desirable behaviour in this example (Section 4 provides our model's complete axioms).

(dp ∪ rf) is acyclic

The program Init; (LB1 ∥ LB2) allows executions of the following three shapes.

Note that in this example, we are not allowed to read the value 1 – reading a value that does not appear in the program is one sort of thin-air behaviour, as described by Batty et al. [7]. For example, the execution {1, 4, 5, 8, 9} does not satisfy the coherence axiom, as $4 \xrightarrow{dp} 5 \xrightarrow{rf} 8 \xrightarrow{dp} 9 \xrightarrow{rf} 4$ forms a cycle.
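The coherence axiom can be checked mechanically: collect the dp and rf edges of a candidate execution and test their union for a cycle. A minimal sketch (our own code, using the edge sets of the rejected execution {1, 4, 5, 8, 9} above):

```python
# Depth-first search for a cycle in the union of the dp and rf edge sets.
def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    visiting, done = set(), set()
    def dfs(n):
        if n in visiting:            # back edge: found a cycle
            return True
        if n in done:
            return False
        visiting.add(n)
        if any(dfs(m) for m in graph.get(n, [])):
            return True
        visiting.discard(n)
        done.add(n)
        return False
    return any(dfs(n) for n in list(graph))

dp = {(4, 5), (8, 9)}          # dependency edges in the candidate execution
rf = {(5, 8), (9, 4)}          # reads-from edges
print(has_cycle(dp | rf))      # True: the execution is ruled out
```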

We now substitute (LB2) with the following code snippet

$$(\mathbf{r}\_1 := \mathbf{y}; \; \mathbf{x} := 1) \tag{LB\_3}$$

where the value written to the variable x is a constant. Its generated event structure is depicted as follows

In this program, for each branch, we can reach a write of value 1 to location x. Hence, this write happens no matter which branch is chosen: we say b and d are *independent writes*, and we draw no dependency edges from their preceding reads.

Consider now the program (LB3) in parallel with LB1 introduced earlier in this section. As usual, we interpret the Init program in sequence with (LB1 ∥ LB3) as follows:

The resulting event structure is very similar to that of (LB1 ∥ LB2), but the executions permitted in this event structure are different. The dependency edges calculated when adding the read are preserved, and now executions {1, 2, 3, a, b} and {1, a, b, 4, 5} are allowed. However, this event structure also contains the execution in which d is independent.

In the execution containing $d \xrightarrow{rf} 4 \xrightarrow{dp} 5 \xrightarrow{rf} c$, there is no rf or dp edge between d and c that can create a cycle; hence this is a valid complete execution in which we can observe x = 1, y = 1. Note that Init is irrelevant to the consistency of this execution.

*Modularity.* It is worthwhile underlining the role that modularity plays here. In order to compute the behaviour of (LB1 ∥ LB2) and (LB1 ∥ LB3), we did not have to compute the behaviour of LB1 again. In fact, we computed the semantics of LB1, LB2 and LB3 in isolation and then observed their behaviour under parallel composition.

*Thin-air values.* The program (LB1 ∥ LB3) is a standard example in the weak memory literature called *load buffering*. In the program (LB1 ∥ LB2), if event 5 or 9 were allowed in a complete execution, that would be an undesirable thin-air behaviour: there is no value 1 in the program text, nor does any operation in the program compute the value 1. The program (LB1 ∥ LB3) is similar, but now contains a write of value 1 in the program text, so this is no longer a thin-air value. Note that the execution given for it is not sequentially consistent, but nonetheless a weak memory model needs to allow it so that a compiler can, for example, swap the order of the two commands in LB3, which are completely independent of each other from its perspective.

# **2 Event Structures**

Event structures will form the semantic domain of our denotational semantics in Section 5. Our presentation follows the essential ideas of Winskel [33] and is further influenced by the treatment of shared memory by Jeffrey and Riely [15].

#### **2.1 Background**

A partial order (E, ≤) is a set E equipped with a reflexive, transitive and antisymmetric relation ≤. A well-founded partial order is a partial order that has no infinite decreasing chains of the form $\cdots \ge e\_{i-1} \ge e\_i \ge e\_{i+1} \ge \cdots$.

A *prime event structure* is a triple (E, ≤, #). E is a set of events, ≤ is a well-founded partial order on E, and # is a conflict relation on E. # is binary, symmetric and irreflexive such that, for all c, d, e ∈ E, if c # d ≤ e then c # e. We write Con(E) for the set of *conflict-free* subsets of E, *i.e.* those subsets C ⊆ E for which there are no c, d ∈ C such that c # d.

*Notation.* We use E to range over (prime/labelled/memory) event structures, and also for the event set contained within, when there is no ambiguity.

A *labelled event structure* (E, ≤, #, λ), over a set of labels Σ, is a prime event structure together with a function λ : E → Σ which assigns a label to each event. We make events explicit using the notation {e : σ} for λ(e) = σ. We sometimes avoid using names and just write the label σ when there is no risk of confusion.

Consider the labelled event structure formed by the set {1, 2, 3, 4}, where the order relation is defined such that 1 ≤ 2 ≤ 3 and 1 ≤ 4, the conflict relation is defined such that 2 # 4 and 3 # 4, and the labelling function is defined such that λ(1) = (W x 0), λ(2) = (R x 0), λ(3) = (W y 1) and λ(4) = (R x 1). The event structure is visualised on the left (we elide conflict edges that can be inferred from order).
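The elided conflict edges can be recovered mechanically by closing the conflict relation under the inheritance rule c # d ≤ e ⇒ c # e. A small sketch (our own encoding; the order is given as direct pairs, which suffices for this example):

```python
from itertools import combinations

def close_conflict(conflict, order):
    """Close a symmetric conflict set under  c # d <= e  =>  c # e."""
    conflict = set(conflict)
    changed = True
    while changed:
        changed = False
        for (c, d) in list(conflict):
            for (d2, e) in order:
                if d2 == d and (c, e) not in conflict:
                    conflict |= {(c, e), (e, c)}   # keep the relation symmetric
                    changed = True
    return conflict

def conflict_free(subset, conflict):
    """True iff no two events of the subset are in conflict."""
    return all((a, b) not in conflict for a, b in combinations(subset, 2))

# The example above: 1 <= 2 <= 3 and 1 <= 4, with 2 # 4 given explicitly.
order = {(1, 2), (2, 3), (1, 4)}
conflict = close_conflict({(2, 4), (4, 2)}, order)
print((3, 4) in conflict)   # True: 3 # 4 is inherited from 4 # 2 <= 3
```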

Given labelled event structures E1 and E2, define the *product* labelled event structure E1 × E2 ≜ (E, ≤, #, λ): E is E1 ∪ E2 (assuming E1 and E2 to be disjoint), ≤ is ≤1 ∪ ≤2, # is #1 ∪ #2, and λ is λ1 ∪ λ2.

The *coproduct* labelled event structure E1 + E2 is the same as the product, except that the conflict relation # is #1 ∪ #2 ∪ (E1 × E2) ∪ (E2 × E1). We can use a similar construction for the coproduct of an infinite set of pairwise-disjoint labelled event structures, indexed by I: we take infinite unions of the underlying sets and relations, along with extra conflicts for every pair of indices. Where the Ei are not disjoint, we can make them so by renaming with fresh event identifiers. In particular, we will need the infinite coproduct $\coprod\_{i \in I} E$ with as many copies of E as the cardinality of the set I, and all the events between each pair of copies in conflict. Each of these copies will be referred to as $E^i$.
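Ignoring order and labels for brevity, the two constructions differ only in the cross-side conflicts. A minimal sketch over (events, conflict) pairs, with our own representation:

```python
def product(e1, e2):
    """Union of the components (event sets assumed disjoint)."""
    return e1[0] | e2[0], e1[1] | e2[1]

def coproduct(e1, e2):
    """Like the product, plus a conflict between every cross-side pair."""
    ev, conf = product(e1, e2)
    cross = {(a, b) for a in e1[0] for b in e2[0]}
    return ev, conf | cross | {(b, a) for (a, b) in cross}

E1 = ({"a", "b"}, set())
E2 = ({"c"}, set())
ev, conf = coproduct(E1, E2)
# Every cross-side pair is now in conflict; same-side pairs are untouched.
assert ("a", "c") in conf and ("c", "a") in conf
assert ("a", "b") not in conf
```

The cross-side conflicts are what make the coproduct behave as a choice: no execution can mix events from both components.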

For a labelled event structure E0 and an event e, where e ∉ E0, define the *prefix* labelled event structure, e • E0, as a labelled event structure (E, ≤, #, λ) where E equals E0 ∪ {e}, ≤ equals ≤0 ∪ ({e} × E), and # equals #0.

#### **2.2 The fork-join event structure**

Our language supports parallel composition nested under sequential composition, so we will need to model spawning threads and a subsequent wait for their termination. To support this, we define the *fork-join* composition of two labelled event structures, E1 and E2. First we define the leaves, ↓(E), as the ≤-maximal elements of E. Let I be the set of maximal conflict-free subsets of ↓(E1). Intuitively, each event set in I corresponds to the last events<sup>4</sup> of one way of executing the concurrent threads in E1. We then generate a fresh copy of E2 for each of these executions: $E\_3 = \coprod\_{i \in I} E\_2$.

Now the fork-join composition of E1 and E2 is defined as (E, ≤, #, λ) such that E is E1 ∪ E3, # is #1 ∪ #3, λ is λ1 ∪ λ3, and ≤ is the transitive closure of

$$\le\_1 \cup \le\_3 \cup \bigcup\_{i \in I} \{ (e, e') \mid e \in i \land e' \in E\_2^i \}$$

The set of events, E, is the set E1 plus all the elements from the copies in E3. The order, ≤, is constructed by linking every event in the copy $E\_2^i$ with all the events in the set i, plus the obvious order from E1 and the order in the local copy $E\_2^i$. Finally, the conflict relation is the union of the conflict in E1 and E3.

# **3 Coherent event structure**

The signature of labels, Σ, is defined as follows:

$$\Sigma = (\{\mathbf{R}, \mathbf{W}\} \times \mathcal{X} \times \mathcal{V}) + \{\mathbf{L}\} + \{\mathbf{U}\},$$

where (W x v) ∈ Σ and (R x v) ∈ Σ are the usual write and read operations, and L, U are the lock and unlock operations respectively.

A *coherent event structure* is a tuple (E, S, ⊢, ≤) where E is a labelled event structure and S is a set of *partial executions*, where each execution is a tuple comprising a maximal conflict-free set of events, together with an intra-thread reads-from

<sup>4</sup> We assume that there are no infinite increasing ≤-chains in E1.

relation rfi, an extra-thread reads-from relation rfe, a dependency relation dp, and a *partial order* lk on lock/unlock events. The justification relation, ⊢, is a relation between conflict-free sets and events. The *preserved program order*, ≤X, is the restriction of the program order, ≤, to events on the same variable, and ≤L is the restriction of program order to events related in program order with locks or unlocks. Finally, we define rf to be rfe ∪ rfi, and ≤ to be ≤X ∪ ≤L. For a partial execution, X ∈ S, we denote its components as lkX, rfX and dpX.

Justification, , collects dependency information in the program and is used to calculate dpX. For a conflict-free set C and an event e, we say C *justifies* e or e *depends* on C whenever C e. We collect dependencies between events modularly in order to identify the so-called independent writes which will be introduced shortly.

For a given partial execution, X, we define the order hbX as the reflexive transitive closure of (≤ ∪ lkX). A coherent event structure contains a *data race* if there exists an execution X with two events on the same variable x, at least one of which is a write, that are not ordered by hbX. A coherent event structure is *data-race-free* if it does not contain any data race. A *racy* rfX*-edge* is one where two events w and r are racy and $w \xrightarrow{rf\_e} r$ in X. Note that rfi edges can never be racy. We now define a coherent partial execution.

**Definition 1 (Coherent Partial Execution).** *A partial execution* X *is* coherent *if and only if:*


A *complete execution* X is an execution in which every read event r has a write w that it reads from, i.e. w −rf→X r.

# **4 Weak memory model**

Central to the model is the way it records program dependencies in ⊢ and dp. Justification, ⊢, records the structure of those dependencies in the program that may be influenced by further composition. As we shall see, composing programs may add or remove dependencies from justification: for example, composing a read may make later writes dependent, or the coproduct mechanism, introduced shortly, may remove them. In some parts of the program, e.g. inside locked regions, dependencies do not interact with the context. In this case, we *freeze* the justifications, using them to calculate dp. Following a freeze, the justification relation is redundant and can be forgotten – dp can be used to judge which executions are coherent.

*Freezing.* Here we define a function *freeze* which takes a justification C ⊢ (w : W x v) and gives the corresponding dependency relation (r : R x v) −dp→ (w : W x v) iff r ∈ C. We lift *freeze* to a function on an event structure as follows:

$$\mathit{freeze}(E_1, S_1, \vdash_1, \le_1) \triangleq (E_1, S, \emptyset, \le_1) \tag{1}$$

where S contains all the executions

$$\left(X_1,\ \mathsf{lk}_{X_1},\ (\mathsf{dp}_{X_1} \cup \mathsf{dp}),\ \mathsf{rf}_{X_1}\right)$$

where for each write w<sub>i</sub> ∈ X1 we choose a justification, so that C1 ⊢1 w1, ..., Cn ⊢1 wn covers all writes in X1, and with dp defined as follows:

$$\mathsf{dp} = \bigcup_{i \in \{1, \dots, n\}} \mathit{freeze}(C_i \vdash_1 w_i)$$

X<sup>1</sup> must be a *coherent execution*. We prove that for a coherent execution there always exists a choice of write justifications that freeze into dependencies to form a coherent execution.
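The freeze construction can be sketched in Python (our own encoding, not the paper's: writes are event ids, each mapped to its list of minimal justification sets). One dp relation is produced per choice of a justification for every write, which is exactly the duplication of executions described in the text.

```python
from itertools import product

def freeze_choices(justs):
    """justs: {write: [just_set, ...]} maps each write of an execution to its
    minimal justifications.  Yield one dp relation (set of (r, w) edges) per
    choice of a justification for every write, mirroring freeze."""
    writes = sorted(justs)
    for choice in product(*(justs[w] for w in writes)):
        dp = set()
        for w, C in zip(writes, choice):
            dp |= {(r, w) for r in C}   # each r in C justifies w: r -dp-> w
        yield dp

# r1 := x; r2 := t; if (r1 == 1 or r2 == 1) {y := 1}:
# write 9 has two minimal justifications, {2} (R x 1) and {6} (R t 1),
# so freezing yields two executions, one per dp choice.
print(list(freeze_choices({9: [{2}, {6}]})))   # → [{(2, 9)}, {(6, 9)}]
```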

We illustrate freezing on the program

$$\mathtt{r}_1 := \mathtt{x};\ \mathtt{r}_2 := \mathtt{t};\ \mathtt{if}\,(\mathtt{r}_1 == 1 \lor \mathtt{r}_2 == 1)\,\{\mathtt{y} := 1\}$$

whose event structure is as follows:

The rules later on in this section will provide us with the justifications {(6 : R t 1)} ⊢ (9 : W y 1) and {(2 : R x 1)} ⊢ (9 : W y 1) (but not the *independent justification* ∅ ⊢ (9 : W y 1)). So in this program there are two *minimal* justifications of (9 : W y 1). The result of freezing is to duplicate all partial executions for each choice of write justifications. In this case, we get an execution containing 2 −dp→ 9 and another one containing 6 −dp→ 9.

#### **4.1 Prepending single events**

When prepending loads and stores, we model forwarding optimisations by updating the justification relation: e.g. when prepending a write, (w : W x 0), to an event structure where {(r : R x 0)} ⊢ w′, write forwarding satisfies the read of the justification, leaving an independently justified write, ∅ ⊢ w′.

Forwarding is forbidden if there exists an e in E such that w ≤ e ≤ r, as in the example on the left. In this example we do not forward 1 to 6. The rules of this section give us that {1, 3, 6} ⊢ 9: we have preserved program order over the accesses of x, 1 ≤ 3 ≤ 6, and we do not forward across the intervening read 3.

**Read Semantics** We now define the semantics of read prepending as follows:

$$(r : \mathtt{R}\ x\ v) \bullet (E_1, S_1, \vdash_1, \le_1) \triangleq ((r : \mathtt{R}\ x\ v) \bullet E_1,\ S,\ \vdash,\ \le) \tag{2}$$

where the preserved program order ≤ is built straightforwardly out of ≤1, ordering locks, unlocks and same-location accesses, and S is defined as the set of all (X ∪ {r}, lkX, rfX, dpX), where X is a partial execution of S1, and ⊢ is the smallest relation such that for all C1 ⊢1 e we have

$$(C_1 \setminus \mathrm{LF}) \cup \{r\} \vdash e$$

with LF being the "*Load Forwarded*" set of reads, i.e. the set of reads consecutively following the matching prepended one:

$$\mathrm{LF} = \{ (r' : \mathtt{R}\ x\ v) \in C_1 \mid \nexists e'.\ r \le^{\mathcal{X}} e' \le^{\mathcal{X}} r' \}$$

This allows for load forwarding optimisations and coherence is satisfied by construction.
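A sketch of the LF computation and the resulting justification update, under the assumption (ours, reconstructing the rule above) that forwarded reads are replaced in the justification set by the newly prepended read; `ppo` is the strict preserved program order as a set of pairs:

```python
def lf(C, r, ppo):
    """'Load Forwarded' reads from justification set C: members of C with no
    event strictly between them and the prepended read r in ppo."""
    events = {x for pair in ppo for x in pair}
    def between(a, b):
        return any((a, e) in ppo and (e, b) in ppo for e in events)
    return {rp for rp in C if not between(r, rp)}

def prepend_read_just(C, r, ppo):
    """Updated justification after prepending r: forwarded reads are
    satisfied by r itself, so they are replaced by r."""
    return (C - lf(C, r, ppo)) | {r}

# Prepending read 1: read 3 follows it directly and is forwarded; read 6 is
# separated from 1 by the intervening access 3, so it stays.
ppo = {(1, 3), (3, 6), (1, 6)}
print(prepend_read_just({3, 6}, 1, ppo))   # → {1, 6}
```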

**Write Semantics** The write semantics are then defined as follows:

$$(w : \mathtt{W}\ x\ v) \bullet (E_1, S_1, \vdash_1, \le_1) \triangleq ((w : \mathtt{W}\ x\ v) \bullet E_1,\ S,\ \vdash,\ \le) \tag{3}$$

where ≤ is built as in the read rule and S contains all *coherent* executions of the form,

$$(X \cup \{w\},\ \mathsf{lk}_X,\ (\mathsf{rf}_X \cup \mathsf{rf}_i),\ \mathsf{dp}_X)$$

where X ∈ S1, and w −rfi→ r for any set of matching reads r in E1 such that condition (1.2) of coherence is satisfied. Adding rfi edges leaves condition (1.1) satisfied.

The justification relation ⊢ is the smallest upward-closed relation such that for all C ⊢1 e:


with SF being the *Store Forwarding* set of reads, i.e. the set of reads matching the prepended write that we remove from the justification sets of later events. This is defined as follows:

$$\text{SF} = \{ (r' : \text{R } x \ v) \mid \nexists e, w \leq^{\mathcal{X}} e \leq^{\mathcal{X}} r' \} $$

When prepending a write to an event structure, we also add it to justifications that contain a read of the same variable. Failing to do so would invalidate the DRF-SC property. We provide an example in Section 6.3, but we must first complete the definition of the semantics, in particular how writes are lifted, which we explain in the next section (Section 4.2).

#### **4.2 Coproduct semantics**

The coproduct mechanism is responsible for making writes independent of prior reads if they are sure to happen, regardless of the value read. It produces the independent writes that enabled relaxed behaviour in the example in Section 1.

In the definition of coproduct we use an upward closure of justification to enable the lifting of more dependencies. Whenever C ⊢ e we define ↑(C) as the upward-closed justification set, i.e. D ⊢ e if C ⊢ e and D is a conflict-free, lock-free set with C ⊆ D, such that for all e′ ∈ D, if e″ is an event with e″ ≤ e′ then e″ ∈ D.
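The closure ↑(C) can be sketched as follows (our own encoding: the order is a set of strict pairs, and the closure adds every event ordered below a member of the set):

```python
def up_close(C, order):
    """Close justification set C under ≤-predecessors: whenever b is in the
    set and a ≤ b, add a.  `order` is a set of strict (a, b) pairs."""
    D = set(C)
    changed = True
    while changed:
        changed = False
        for a, b in order:
            if b in D and a not in D:
                D.add(a)
                changed = True
    return D

# With 1 ≤ 2 ≤ 3, closing {3} pulls in both predecessors.
print(up_close({3}, {(1, 2), (2, 3)}))   # → {1, 2, 3}
```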

Now we define the coproduct operation. If E1 is a labelled event structure of the form (r1 : R x v1) • E′1 and, similarly, E2 is of the form (r2 : R x v2) • E′2, the coproduct of event structures is defined as

$$(E\_1, S\_1, \vdash\_1, \le\_1) + (E\_2, S\_2, \vdash\_2, \le\_2) \stackrel{\Delta}{=} (E\_1 + E\_2, S\_1 \cup S\_2, (\vdash\_1 \cup \vdash\_2 \cup \vdash), \le)$$

where, whenever {r1} ∪ C1 ⊢1 (w : W y v) and {r2} ∪ C2 ⊢2 (w′ : W y v), if the following conditions hold we have D ⊢ w and D ⊢ w′:


The example of Section 1 illustrates the application of condition (1) of coproduct. Recall the event structures of (LB1) and (LB3) respectively.

In each case, the event structure is built as the coproduct of the conflicting events. In (LB3), prior to applying coproduct we have {a} ⊢ b and {c} ⊢ d. The writes have the same label for both read values so, taking C1 and C2 to be empty, coproduct makes them independent, adding the independent writes ∅ ⊢ b and ∅ ⊢ d. In contrast, the values of writes 3 and 5 differ in (LB1), so the coproduct has {2} ⊢ 3 and {4} ⊢ 5. When ultimately frozen, the justifications of (LB1) will produce the dependency edges (2, 3) and (4, 5) as described in Section 1.

As for condition (2), if there is an event in the justification set that is ordered in ≤X with the respective top read, then the top read cannot be erased from the justification; doing so would break the ≤X link.

When value sets contain more than two values, we use Σv∈V to denote a *simultaneous coproduct* (rather than the iterated binary sum). More precisely, if we coproduct the event structures E0, E1, ..., En in a pairwise fashion as follows,

$$(\cdots(E_0 + E_1) + \cdots) + E_n$$

we would get liftings that are undesirable. To see this, it suffices to consider the program,

$$\mathtt{if}\,(\mathtt{r} = 3)\,\{\mathtt{x} := 2\}\,\{\mathtt{x} := 1\}$$

where the write of 1 to x is independent for a coproduct over values 1 and 2, but not when considering the event structure following (R x 3).

#### **4.3 Lock semantics**

When prepending a lock, we order the lock before all following events in ≤ and we freeze the justifications into dependencies. Freezing prevents justifications of events after the lock from interacting with events prepended later. This disables optimisations across the lock, e.g. store and load forwarding.

We define the semantics of locks as follows,

$$(l : \mathtt{L}) \bullet (E_1, S_1, \vdash_1, \le_1) \triangleq ((l : \mathtt{L}) \bullet E_1,\ S,\ \emptyset,\ \le) \tag{4}$$

where ≤X remains unchanged, (E1, S′1, ∅, ≤1) = *freeze*(E1, S1, ⊢1, ≤1), and S contains all partial executions of the form

$$(X \cup \{l\},\ (\mathsf{lk}_X \cup \mathsf{lk}),\ \mathsf{dp}_X,\ \mathsf{rf}_X)$$

where X ∈ S′1 and the lock order lk is such that for every lock or unlock event l′ ∈ X, l −lk→ l′. Finally, ≤L is ≤L1 extended with the lock ordered before all events in E1.

The semantics for the unlock is similar.

#### **4.4 Parallel composition**

We define the parallel semantics as follows. Note that this operation freezes the constituent denotations before combining them, erasing their respective justification relations. This choice prevents the optimisation of dependencies across forks and it makes thread inlining optimisations unsound, as they are in the Promising Semantics [16] and the Java memory model [21].

$$(E_1, S_1, \vdash_1, \le_1) \times (E_2, S_2, \vdash_2, \le_2) \triangleq (E_1 \times E_2,\ S,\ \emptyset,\ \le_1 \cup \le_2)$$

where, S are all *coherent* partial executions of the form,

$$\left(X_1 \cup X_2,\ (\mathsf{lk}_{X_1} \cup \mathsf{lk}_{X_2} \cup \mathsf{lk}),\ (\mathsf{dp}_{X_1} \cup \mathsf{dp}_{X_2}),\ (\mathsf{rf}_{X_1} \cup \mathsf{rf}_{X_2} \cup \mathsf{rf}_e)\right)$$

where $X_1 \in S_1^F$, $X_2 \in S_2^F$ and

$$\begin{array}{l} \mathit{freeze}(E_1, S_1, \vdash_1, \le_1) = (E_1, S_1^F, \emptyset, \le_1) \\ \mathit{freeze}(E_2, S_2, \vdash_2, \le_2) = (E_2, S_2^F, \emptyset, \le_2) \end{array}$$

Furthermore, lk is constrained so that (lk<sub>X1</sub> ∪ lk<sub>X2</sub> ∪ lk) is a *total* order over the lock/unlock operations such that no lock/unlock operation is introduced between a lock and the next unlock on the same thread. Finally, we add all (w : W x v) −rfe→ (r : R x v) edges such that the execution satisfies condition (1.1) of coherence<sup>1</sup> and such that w belongs to $S_1^F$ and r belongs to $S_2^F$ or vice versa.
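The constraint on lk can be pictured with a small sketch (our own encoding, not the paper's): each thread's lock/unlock events are chunked into L..U critical sections, and the admissible total orders are exactly the interleavings of whole sections.

```python
from itertools import combinations

def lock_orders(t1, t2):
    """Total orders over the lock/unlock events of two threads that keep each
    thread's order and never place a foreign event inside a critical section.
    Threads are lists of (event_id, 'L'|'U'); we interleave whole L..U blocks."""
    def blocks(t):
        out, cur = [], []
        for e in t:
            cur.append(e)
            if e[1] == 'U':        # an unlock closes the current section
                out.append(cur)
                cur = []
        return out
    b1, b2 = blocks(t1), blocks(t2)
    n, m = len(b1), len(b2)
    for pos in combinations(range(n + m), n):   # slots taken by thread 1
        order, i, j = [], 0, 0
        for s in range(n + m):
            if s in pos:
                order += b1[i]; i += 1
            else:
                order += b2[j]; j += 1
        yield order

# One critical section per thread: only the two non-interleaved orders remain.
t1 = [('l1', 'L'), ('u1', 'U')]
t2 = [('l2', 'L'), ('u2', 'U')]
print(len(list(lock_orders(t1, t2))))   # → 2
```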

#### **4.5 Join Semantics**

We define the join composition as follows:

$$(E\_1, S\_1, \vdash\_1, \le\_1) \star (E\_2, S\_2, \vdash\_2, \le\_2) \stackrel{\Delta}{=} (E\_1 \star E\_2, S, \vdash\_1, \le) \tag{5}$$

where ≤ is built as in the read rule and S are all executions of the form

$$\left(X_1 \cup X_2,\ (\mathsf{lk}_{X_1} \cup \mathsf{lk}_{X_2} \cup \mathsf{lk}),\ (\mathsf{dp}_{X_1} \cup \mathsf{dp}_{X_2}),\ (\mathsf{rf}_{X_1} \cup \mathsf{rf}_{X_2} \cup \mathsf{rf}_i)\right)$$

where X1 ∈ S1 and X2 ∈ S2 with X1 and X2 conflict-free. The lock order lk orders all lock/unlock events of X1 before all lock/unlock events of X2, and w −rfi→ r whenever w ∈ X1 and r ∈ X2 such that the execution is still coherent.

# **5 Language and Semantics**

We consider an imperative language that has sequential and parallel composition, and mutable shared memory.

#### **Definition 2 (Language).**

$$\begin{aligned} B &::= M = M \mid B \land B \mid B \lor B \mid \neg B \qquad\qquad M ::= n \mid \mathtt{r} \\ P &::= \mathtt{skip} \mid \mathtt{r} := \mathtt{x} \mid \mathtt{x} := M \mid P_1 ;\ P_2 \mid P_1 \parallel P_2 \mid \mathtt{if}\,(B)\,\{P_1\}\{P_2\} \\ &\quad\mid \mathtt{while}\,(B)\,\{P\} \mid \mathtt{L} \mid \mathtt{U} \end{aligned}$$

We have standard boolean expressions, B, and expressions, M, which are natural numbers, n, or registers, r. Finally we have the set of command statements, P, where skip is the command that performs no action, r := x reads from a global variable x and stores the value in register r, x := M computes the expression M and stores its value to the global variable x, P1; P2 is sequential composition,

<sup>1</sup> Note that condition (1.2) does not need to be checked.

and P1 ∥ P2 is parallel composition. We have standard conditional statements, while loops, locks and unlocks. Moreover, a program P is *lock-well-formed* <sup>5</sup> if, on every thread, every lock is paired with a following unlock instruction and vice versa, and there is no lock or unlock operation between pairs.

A *register environment*, ρ : R → V, is a function from the set of local registers, R, to the set of values, V. A *continuation* is a function taking a register environment to an event structure, E. We write ∅ as a short-hand for λρ.∅, the continuation returning the empty event structure.

We interpret the syntax defined above into the semantic domain defined in Section 4. In Figure 1, we define ⟦·⟧ as a function which takes a *step-index* n, a register environment ρ, and a continuation κ, and returns a coherent event structure.

The interpretation function ⟦·⟧ is defined first by induction on the step-index and then by induction on the syntax of the program. When n = 1 the interpretation gives the empty event structure (undefined). Otherwise we proceed by induction on the structure of the program. skip is just the continuation applied to the environment. A read is interpreted as a set of conflicting read events, one for each value v, each followed by the continuation applied to the environment where the register is updated with v.

A write is interpreted as a write with a following continuation. We interpret sequencing by interpreting the second program and passing it on to the interpretation of the first as a continuation. Parallel composition is the interpretation of the two programs with empty continuations passed to the × operator. The conditional statement is interpreted as usual. For interpreting the while-loops we use the induction hypothesis on the step-index [9].
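The continuation-passing, step-indexed shape of Figure 1 can be sketched in Python. The encoding is a deliberate simplification of ours, not the paper's domain: denotations are sets of label tuples (one per conflict-free branch) rather than event structures, and parallel composition and locks are omitted.

```python
V = (0, 1)  # toy value domain

def interp(prog, n, rho, k):
    """Step-indexed CPS interpreter in the shape of Fig. 1."""
    if n == 0:
        return set()                      # fuel exhausted: empty denotation
    tag, *args = prog
    if tag == 'skip':
        return k(rho)
    if tag == 'read':                     # r := x: one branch per value read
        r, x = args
        return {(('R', x, v),) + t for v in V for t in k({**rho, r: v})}
    if tag == 'write':                    # x := v (a constant, for brevity)
        x, v = args
        return {(('W', x, v),) + t for t in k(rho)}
    if tag == 'seq':                      # interpret P2 as P1's continuation
        p1, p2 = args
        return interp(p1, n, rho, lambda rho2: interp(p2, n, rho2, k))
    if tag == 'while':                    # unfold, spending one unit of fuel
        b, p = args
        if b(rho):
            return interp(('seq', p, prog), n - 1, rho, k)
        return k(rho)
    raise ValueError(tag)

halt = lambda rho: {()}                   # the empty continuation
p = ('seq', ('read', 'r', 'x'), ('write', 'y', 1))
print(sorted(interp(p, 5, {}, halt)))
```

`r := x; y := 1` yields two conflicting branches, one per value read; a non-terminating loop exhausts its fuel and denotes the empty structure, matching the step-indexed treatment of while.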

When parallel composing two threads, we want to forbid any reordering with events sequenced before or after the composition (as thread inlining would do). To forbid this local reordering we surround this composition with two lock-unlock pairs.

#### **5.1 Compositionality**

We define the language of contexts inductively in the standard way.

#### **Definition 3 (Context).**

$$\begin{aligned} \mathcal{C} &::= [-] \mid P; \mathcal{C} \mid \mathcal{C}; \; P \mid (\mathcal{C} \parallel P) \mid (P \parallel \mathcal{C}) \\ &\mid \text{if } (B) \{\mathcal{C}\} \{P\} \mid \text{if } (B) \{P\} \{\mathcal{C}\} \mid \text{while } (B) \{\mathcal{C}\} \end{aligned}$$

In the base case, the context is a hole, denoted by [−]. The inductive cases follow the structure of the program syntax. In particular, a context can be a program P in sequence with a context, a context in sequence with a program P, and so on. For a context C, we write C[P] for the program obtained by substituting P for every hole in C, defined inductively on C.

<sup>5</sup> Jeffrey and Riely [15] adopt the same restriction. We conjecture that modelling blocking locks [4] would not affect the DRF-SC property.

$$\begin{aligned}
⟦P⟧_{1\,\rho\,\kappa} &= \emptyset \\
⟦\mathtt{skip}⟧_{n\,\rho\,\kappa} &= \kappa(\rho) \\
⟦\mathtt{r} := \mathtt{x}⟧_{n\,\rho\,\kappa} &= \textstyle\sum_{v \in V} (\mathtt{R}\ x\ v) \bullet \kappa(\rho[r \mapsto v]) \\
⟦\mathtt{x} := M⟧_{n\,\rho\,\kappa} &= (\mathtt{W}\ x\ ⟦M⟧_\rho) \bullet \kappa(\rho) \\
⟦P_1 ;\ P_2⟧_{n\,\rho\,\kappa} &= ⟦P_1⟧_{n\,\rho}\,(\lambda\rho'.\ ⟦P_2⟧_{n\,\rho'\,\kappa}) \\
⟦\mathtt{L}⟧_{n\,\rho\,\kappa} &= (\mathtt{L}) \bullet \kappa(\rho) \qquad\qquad ⟦\mathtt{U}⟧_{n\,\rho\,\kappa} = (\mathtt{U}) \bullet \kappa(\rho) \\
⟦P_1 \parallel P_2⟧_{n\,\rho\,\kappa} &= ⟦\mathtt{L}; \mathtt{U}⟧_{n\,\rho\,\kappa'} \quad\text{where } \kappa' = \lambda\rho.\ ((⟦P_1⟧_{n\,\rho\,\emptyset}) \times (⟦P_2⟧_{n\,\rho\,\emptyset})) \star (⟦\mathtt{L}; \mathtt{U}⟧_{n\,\rho\,\kappa}) \\
⟦\mathtt{if}\,(B)\{P_1\}\{P_2\}⟧_{n\,\rho\,\kappa} &= \begin{cases} ⟦P_1⟧_{n\,\rho\,\kappa} & ⟦B⟧_\rho = \mathtt{T} \\ ⟦P_2⟧_{n\,\rho\,\kappa} & ⟦B⟧_\rho = \mathtt{F} \end{cases} \\
⟦\mathtt{while}\,(B)\{P\}⟧_{n\,\rho\,\kappa} &= \begin{cases} ⟦P;\ \mathtt{while}\,(B)\{P\}⟧_{(n-1)\,\rho\,\kappa} & ⟦B⟧_\rho = \mathtt{T} \\ ⟦\mathtt{skip}⟧_{n\,\rho\,\kappa} & ⟦B⟧_\rho = \mathtt{F} \end{cases}
\end{aligned}$$

Fig. 1: Semantic interpretation

The following lemma shows that the semantics preserve context application. This falls out from the fact that the semantic interpretation is compositional, that is, we define every constructor in terms of its subcomponents.

**Lemma 1 (Compositionality).** *For all programs* P1*,* P2*, if* ⟦P1⟧ = ⟦P2⟧ *then for all contexts* C*,* ⟦C[P1]⟧ = ⟦C[P2]⟧*.*

The proof is a straightforward induction on the context C and follows from the fact that the semantics is inductively defined on the program syntax. The attentive reader may note that to prove ⟦P1⟧ = ⟦P2⟧ in the first place we have to assume n, ρ and κ and prove ⟦P1⟧ n ρ κ = ⟦P2⟧ n ρ κ. It is customary in denotational semantics, however, for programs to be denoted by functions that are equal if they agree on all inputs [31].

#### **5.2 Data Race Freedom**

Data race freedom ensures that we forbid optimisations which could lead to unexpected behaviour in programs without data races. We first define the *closed semantics* for a program P: for all n, the semantics of P, namely ⟦P⟧, is ⟦Init(P)⟧ n (λx.0) ∅, where Init(P) is the program that takes the global variables in P and initialises them to 0. We now establish that race-free programs interpreted in the closed semantics have sequentially consistent behaviour.

*DRF semantics.* Rather than proving DRF-SC directly, we prove that race-free programs behave according to an intermediate semantics ⟨⟨·⟩⟩. This semantics differs from ⟦·⟧ in only two ways: program order is used in the calculation of coherence instead of preserved program order, and no dependency edges are recorded (as these are subsumed by program order). More precisely, the semantics is calculated as in Figure 1, but we check that the union of rfe, lk and program order is acyclic.
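The acyclicity check on the union of these relations is an ordinary cycle detection; a minimal sketch (relations encoded as sets of pairs, names ours):

```python
def acyclic(edges):
    """True iff the relation given as (a, b) pairs has no cycle (DFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    state = {}                       # 0 = on current DFS path, 1 = finished
    def dfs(v):
        state[v] = 0
        for w in adj.get(v, ()):
            if state.get(w) == 0 or (w not in state and not dfs(w)):
                return False         # back edge: a cycle exists
        state[v] = 1
        return True
    return all(v in state or dfs(v) for v in adj)

# Program order within each thread is fine; adding cross-thread edges that
# close a loop (as an rfe/lk cycle would) is rejected.
po = {(1, 2), (3, 4)}
print(acyclic(po))                        # → True
print(acyclic(po | {(2, 3), (4, 1)}))     # → False
```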

Note that race-free executions of the intermediate semantics ⟨⟨·⟩⟩ satisfy the constraints of the model of Boehm and Adve [10], and the definition of race is the same between the two models. Boehm and Adve prove that in the absence of races, their model provides sequential consistency.

The DRF-SC theorem is stated as follows.

**Theorem 1.** *For any program* P*, if* P *is data race free then every execution* D *in* ⟦P⟧ *is a sequentially consistent execution,* i.e. D *is in* ⟨⟨P⟩⟩*.*

# **6 Tests and Examples**

In this section, four examples demonstrate aspects of the semantics: the first recognises a false dependency, the second forbids unintended behaviour allowed by Jeffrey and Riely [15], the third motivates the choice to add forwarded writes to justification, and the last shows how we support an optimisation forbidden by Java but performed by the Hotspot compiler.

#### **6.1 LB+ctrl-double**

In the first example, from Batty et al. [7], the compiler collapses conditionals to transform P<sup>1</sup> to P2.

Coproduct ensures that the denotations of P1 and P2 are identical, with the event structure above, together with the justifications ∅ ⊢ b and ∅ ⊢ d. From compositionality (Lemma 1) and equality of the denotations, we have equal behaviour of P1 and P2 in any context, and the optimisation is allowed.

#### **6.2 Jeffrey and Riely's TC7**

The next test is Java TC7. The outcome where r1, r<sup>2</sup> and r<sup>3</sup> all have value 1 is forbidden by Jeffrey and Riely [15, Section 7], but allowed in the Java Causality Test Cases [27].

$$\begin{array}{l|l} T_1 & T_2 \\ \hline \mathtt{r}_1 := \mathtt{z}; & \mathtt{r}_3 := \mathtt{y}; \\ \mathtt{r}_2 := \mathtt{x}; & \mathtt{z} := \mathtt{r}_3; \\ \mathtt{y} := \mathtt{r}_2 & \mathtt{x} := 1 \end{array} \tag{TC7}$$

As noted by Jeffrey and Riely [15], the failure of this test "indicates a failure to validate the reordering of independent reads".

In the event structure of T1 above, the justification relation is constructed according to Section 5. In particular, the rule for prepending reads (Section 4.1) gives us {1, 2} ⊢T1 4 and {1, 3} ⊢T1 5 on the left-hand side, and {6, 7} ⊢T1 9 and {6, 8} ⊢T1 10 on the right. When composing the left and right sides, the coproduct rule (Section 4.2) makes four independent links, namely {2} ⊢T1 4, {3} ⊢T1 5, {7} ⊢T1 9, and {8} ⊢T1 10. This is because, at the top level, for both branches, we can choose a write with the same label that is dependent on the same reads (plus the top ones on z). More precisely, on the left-hand side C1 = {1, 2} is such that C1 ⊢T1 4, and on the right-hand side C2 = {6, 7} is such that C2 ⊢T1 9. When the top events, 1 and 6 respectively, are removed, these contexts become isomorphic (C1 \ {1} ≅ C2 \ {6}). Hence {2} ⊢T1 4 and {7} ⊢T1 9, and similarly {3} ⊢T1 5 and {8} ⊢T1 10.

Now consider the event structure for the thread T2. Here we have two independent writes, namely ∅ ⊢T2 (15 : W x 1) and ∅ ⊢T2 (16 : W x 1), arising in the coproduct from the justifications {11} ⊢T2 (15 : W x 1) and {12} ⊢T2 (16 : W x 1). Notice that by the write rule (equation (3)) we do not add the writes 13 and 14 to the justification sets of any W x 1, and because they write different values to z depending on the value of y, we have the dependencies {11} ⊢T2 13 and {12} ⊢T2 14.

When parallel composing, we connect the rf-edges that respect coherence. Thus we obtain the execution

16 −rf→ 8 −dp→ 10 −rf→ 12 −dp→ 14 −rf→ 6, which is coherent, allowing the outcome with r1, r2 and r3 all 1 as desired.

#### **6.3 Adding writes to justifications**

In the definition of prepending writes (equation (3), condition (2)) we state that, for any given justification, if there is an event in the justification set related via ≤X to the write we are prepending, then that write must be in the justification set as well.

To see why we made this choice consider the following program,

$$\begin{array}{l|l}
\mathtt{x} := 1; & \mathtt{r}_4 := \mathtt{z}; \\
\mathtt{r}_1 := \mathtt{y}; & \mathtt{if}\,(\mathtt{r}_4 = 1)\,\{\mathtt{y} := 1\} \\
\mathtt{if}\,(\mathtt{r}_1 = 0)\,\{ & \\
\quad \mathtt{x} := 0;\ \mathtt{r}_2 := \mathtt{x};\ \mathtt{if}\,(\mathtt{r}_2 = 1)\,\{\mathtt{z} := 1\} & \\
\}\ \mathtt{else}\ \{ & \\
\quad \mathtt{r}_3 := \mathtt{x};\ \mathtt{if}\,(\mathtt{r}_3 = 1)\,\{\mathtt{z} := 1\} & \\
\} &
\end{array}$$

and its associated event structure,

We focus on the interpretation of the left-hand thread. In equation (3), because {7} ⊢ 9 and 3 ≤X 7, the event (3 : W x 0) gets inserted into the justification set, leading to the justification {3, 7} ⊢ 9. On the other branch, up until the coproduct over the read of y, we have {5} ⊢ 8. At this point, the justifications {3, 7} ⊢ 9 and {5} ⊢ 8 are not lifted because 9 requires 3 as well. Event 3 may not be removed because of the condition in the write-prepending rule. Without this condition, 3 would not be necessary to justify 9, yielding the lifting of the link {5} ⊢ 8. This would also cause the execution 0 −rf→ 5 −dp→ 8 −rf→ 11 −dp→ 12 −rf→ 2 to be coherent, due to the lack of a dependency between 2 and 5.

This execution is not sequentially consistent, but under SC, the program is race free. Without writes in justifications, the model would violate the DRF-SC property described in Section 5.2.

#### **6.4 Java memory model, Hotspot.**

Finally, we discuss redundant read-after-read elimination, an optimisation performed by the Hotspot compiler but forbidden by the Java memory model. It is the first optimisation in the following sequence from Ševčík and Aspinall [30, Figure 5], used to demonstrate that the Java memory model is too strict, and unsound with respect to the observable behaviour of Sun's Hotspot compiler.

$$\begin{array}{l} T_3:\ \mathtt{r}_2 := \mathtt{y};\ \mathtt{if}\,(\mathtt{r}_2 = 1)\,\{\mathtt{r}_3 := \mathtt{y};\ \mathtt{x} := \mathtt{r}_3\}\ \mathtt{else}\ \{\mathtt{x} := 1\} \\ \quad\longrightarrow\quad T_2:\ \mathtt{r}_2 := \mathtt{y};\ \mathtt{if}\,(\mathtt{r}_2 = 1)\,\{\mathtt{x} := \mathtt{r}_2\}\ \mathtt{else}\ \{\mathtt{x} := 1\} \\ \quad\longrightarrow\quad T_1:\ \mathtt{x} := 1;\ \mathtt{r}_2 := \mathtt{y} \end{array}$$

Consider the event structures of the unoptimised T<sup>3</sup> and optimised T1.

The optimisation removes the apparently redundant pair of reads (4, 6), then reorders the now-independent write. This redundancy is represented in justification: when prepending the top read of y to the right-hand side of the event structure, the existing justification {6} ⊢ 7 is replaced by {3} ⊢ 7. When coproduct is applied, this matches with the justification {1} ⊢ 2, leading to the independent writes 2 and 7. In a weak memory context, however, a parallel thread could write a value to y between the two reads, thereby changing the value written to x. For this reason, we keep event 4 in the denotation and create the dependency edge 4 −dp→ 5.

Despite exhibiting the same behaviour here, the denotations of T<sup>3</sup> and T<sup>2</sup> do not match. We establish that the optimisation is sound in any context in the next section.

# **7 Refinement**

We have shown in Section 5.1 that our semantics enjoys a compositionality property: if we can prove that two programs have the same semantics (with respect to set-theoretic equality) then they cannot be distinguished by any context. We also explained how equality is too strict, as it does not allow us to relate all programs that ought to be deemed semantically equivalent. Our Java Hotspot compiler example in Section 6 shows that the program T3 is in practice optimised to T2 and then to T1. However, it is clearly not true that ⟦T1⟧ n ρ κ is a subset of ⟦T2⟧ n ρ κ.


To show soundness we define *observational refinement* (⊑obs), which captures the intuitive notion of program equivalence: one program is a permissible optimisation of another if it does not increase the set of observable behaviours, defined here as changes to the values of observed variables. The definition identifies related executions and compares the ordering of observable events, recognising that adding happens-before edges restricts behaviour. We then define a *refinement* relation, ⊑, and show that it is a subset of observational refinement. This is formally stated in the following lemma:

**Lemma 2 (Soundness of Refinement (**⊑ ⊆ ⊑obs**)).** *For all* P1 *and* P2*, if* ⟦P1⟧T n ρ ∅ ⊑ ⟦P2⟧T n ρ ∅ *then* ⟦P1⟧T n ρ ∅ ⊑obs ⟦P2⟧T n ρ ∅*.*

Note that the refinement relation is defined over a tweaked version of the semantics, ⟦·⟧T, a variant of ⟦·⟧ in which the registers are explicit in the event structure.

Finally we show that ⊑ is compositional:

**Theorem 2 (Compositionality of Refinement (**⊑**)).** *For all programs* P1 *and* P2*, and indexes* n*, if for all* ρ*,* ⟦P1⟧T n ρ ∅ ⊑ ⟦P2⟧T n ρ ∅*, then for all contexts* C*,* ρ*,* κ *and* κ′ *such that* κ ⊑ κ′ *we have that* ⟦C[P1]⟧T n ρ κ ⊑ ⟦C[P2]⟧T n ρ κ′*.*

# **8 Showing implementability via** IMM

In this section we show that our calculation of relaxed dependencies can easily be reused to solve the thin-air problem in other state-of-the-art axiomatic models, carrying the advantages of these models over to ours. In particular, we augment the IMM and RC11 models of Podkopaev et al. [26]. We adopt their language, given below. It covers C++ atomics, fences, fetch-and-add and compare-and-swap operations but excludes locks. Note that locks are implementable using compare-and-swap operations.

$$\begin{array}{rcl} M &::=& n \mid \mathtt{r}\\ B &::=& M = M \mid B \land B \mid B \lor B \mid \neg B\\ T &::=& \mathtt{skip} \mid \mathtt{r} :=^{o_R} \mathtt{x} \mid \mathtt{x} :=^{o_W} M \mid T_1 \mathbin{;} T_2\\ &\mid& \mathtt{if}\ (B)\ \{T_1\}\ \{T_2\} \mid \mathtt{while}\ (B)\ \{T\}\\ &\mid& \mathtt{fence}^{o_F} \mid \mathtt{r} := \mathtt{FADD}^{o_R,o_W}_{o_{RMW}}\ \mathtt{x}, M\\ &\mid& \mathtt{r} := \mathtt{CAS}^{o_R,o_W}_{o_{RMW}}\ \mathtt{x}, M, M \end{array}$$

First we provide a model, written (for a program P) as ⟦P⟧MRD+IMM, that combines our relaxed dependencies with the axiomatic model of IMM, here written ⟦P⟧IMM. We will make these definitions precise shortly. We then show that ⟦P⟧MRD+IMM is weaker than ⟦P⟧IMM, making ⟦P⟧MRD+IMM implementable over hardware architectures like x86-TSO, ARMv7, ARMv8 and Power. Secondly, we relax the RC11 axiomatic model by using our relaxed dependency model MRD to create a new model ⟦P⟧MRD-C11, and show this model weaker than the RC11 model. We argue that the mathematical description of ⟦P⟧MRD-C11 is lightweight and close to the C++ standard; it would therefore require minimal work to augment the standard with the ideas presented in this paper.

To prove implementability over hardware architectures we define a *pre-execution* semantics, where the relaxed dependency relation dp is calculated along with the data and control dependencies from IMM. To combine our model with IMM, we redefine the ar relation (we refer the reader to the IMM paper [26] for the details on ar) such that it is parametrised by an arbitrary relation which we put in place of the relation (data ∪ ctrl): ar(data ∪ ctrl) equals the original axiom ar, and ar(dp) is the same axiom where dp is put in place of data ∪ ctrl.

We define the executions in ⟦P⟧MRD+IMM as the maximal conflict-free sets such that ar(dp) is acyclic, and the executions in ⟦P⟧IMM as the maximal conflict-free sets such that ar(data ∪ ctrl) is acyclic.
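The acyclicity checks above can be pictured with a small sketch (our own illustration, not the formal development): an execution is retained only if the chosen dependency relation, unioned with the other ordering relations fed to the axiom, has no cycle. The event names and relations below are invented for illustration.

```python
# Sketch: an "acyclicity of ar(r)"-style check, parametrised by the
# dependency relation r, mirroring ar(data ∪ ctrl) for IMM and ar(dp)
# for MRD+IMM. Relations are adjacency dicts over event names.

def is_acyclic(events, edges):
    """Depth-first search; returns False iff `edges` has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {e: WHITE for e in events}

    def visit(e):
        colour[e] = GREY
        for f in edges.get(e, ()):
            if colour[f] == GREY:                  # back edge: cycle found
                return False
            if colour[f] == WHITE and not visit(f):
                return False
        colour[e] = BLACK
        return True

    return all(visit(e) for e in events if colour[e] == WHITE)

def consistent(events, dep, rest):
    """Keep an execution only if (dep ∪ rest) is acyclic."""
    union = {e: set(dep.get(e, ())) | set(rest.get(e, ())) for e in events}
    return is_acyclic(events, union)

# A load-buffering shape: read→write dependencies in each thread, plus
# reads-from edges closing the loop, make the execution inconsistent.
events = ["Rx", "Wy", "Ry", "Wx"]
dep = {"Rx": ["Wy"], "Ry": ["Wx"]}
rf = {"Wy": ["Ry"], "Wx": ["Rx"]}
assert not consistent(events, dep, rf)   # dependency/rf cycle
assert consistent(events, {}, rf)        # no dependencies: no cycle
```

A weaker dependency relation (such as dp in place of data ∪ ctrl) keeps more executions, which is exactly why the MRD+IMM model below is weaker than IMM.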

#### **8.1 Implementability**

We can now state and prove that the MRD model is implementable over IMM, which gives us that MRD is implementable over x86-TSO, ARMv7, ARMv8, Power and RISC-V by combining our result with the implementability result of IMM.

**Theorem 3 (**MRD+IMM **is weaker than** IMM**).** *For all programs* P *of the* IMM *language,*

> ⟦P⟧MRD+IMM ⊇ ⟦P⟧IMM

# **9 Modular Relaxed Dependencies in RC11:** MRD-C11

We refer to the RC11 [18] model, as specified in Podkopaev et al. [26]. We call this model ⟦P⟧RC11. While ⟦P⟧RC11 forbids thin-air executions, it is not weak enough: it forbids common compiler optimisations by imposing that (po ∪ rf) is acyclic. We relax this condition by replacing po with our relaxed dependency relation dp, this time calculated on our preserved program order relation (≤). We call this model ⟦P⟧MRD-C11. Mathematically, this is done by imposing that (dp ∪ rf) is acyclic.

At this point, we prove the following lemma:

**Lemma 3 (Implementability of** MRD-C11**).** *For all programs* P*,*

$$[\![P]\!]_{\mathsf{MRD\text{-}C11}} \supseteq [\![P]\!]_{\mathsf{RC11}}$$

To show this it suffices to show that dp ⊆ po always holds. This is straightforward by induction on the structure of P, observing that the only place where dependencies go against program order is when hoisting a write in the coproduct case. However, in the same construction we always preserve the dependencies coming from the different branches of the structure, which are, by the inductive hypothesis, always in agreement with program order.

# **9.1** MRD-C11 **is DRF-SC**

We show that MRD-C11 validates the DRF-SC theorem of the C++ standard [13, §6.8.2.1 paragraph 20].

**Theorem 4 (**MRD-C11 **is DRF-SC).** *For a program* P *whose atomic accesses are all SC-ordered, if there are no SC-consistent executions with a race over non-atomics, then the outcomes of* P *under* MRD-C11 *coincide with those under SC.*

*Sketch proof.* In the absence of races and relaxed atomics, the no-thin-air guarantee of RC11 is made redundant by the guarantee of happens-before acyclicity shared by RC11 and MRD-C11. The result follows from this observation, Lemma 3, and Theorem 4 of Lahav et al. [18].

# **10 On the Promising Semantics and** weakestmo

In this section we present examples that differentiate the Promising Semantics and weakestmo from our MRD and MRD-C11 models.

First, we show that MRD correctly forbids the out-of-thin-air behaviour in the litmus test Coh-CYC from Chakraborty and Vafeiadis [11]. The test, given below, differentiates Promising and weakestmo: only the latter avoids the outcome r₁ = 3, r₂ = 2 and r₃ = 1.


MRD correctly forbids this outcome: it identifies a dependency on the left-hand thread from the read of 3 from x to the write y := 1, and on the right-hand thread from the read of 1 from y to the write x := 3. The desired outcome then has a cycle in dependency and reads-from, and it is forbidden.

Chakraborty and Vafeiadis ascribe the behaviour to "a violation of coherence or a circular dependency", and add specific machinery to weakestmo that checks for global coherence violations at each step of program execution. These global checks forbid the unwanted outcome.

The Promising Semantics, on the other hand, can make promises that are not sensitive to coherence order, and therefore allows the above outcome erroneously.

In Coh-CYC, enforcing coherence ordering at each step in weakestmo was enough to forbid the thin-air behaviour, but it is not adequate in all cases. The example below features an outcome that Promising and weakestmo allow, and that MRD-C11 and MRD forbid. It demonstrates that cycles in dependency can arise without violating coherence in weakestmo.

$$\mathtt{z} := 1 \quad \parallel \quad \mathtt{y} := \mathtt{x} \quad \parallel \quad \mathtt{if}\ (\mathtt{z} = 0)\ \{\mathtt{x} := 1\}\ \{\mathtt{r}_0 := \mathtt{y};\ \mathtt{x} := \mathtt{r}_0;\ \mathtt{a} := \mathtt{r}_0\}$$

The program is an adaptation<sup>6</sup> of a Java test, where the unwanted outcome represents a violation of type safety [20]. Observing the thin-air behaviour where a = 1 in the adaptation above is the analogue of the unwanted outcome in the original test. If in the end a = 1, then the second branch of the conditional in the rightmost thread must execute. It contains a read of 1 from y, and a dependent write of x := 1. On the middle thread there is a read of 1 from x, and a dependent write of y := 1. These dependencies form the archetypal thin-air shape in the execution where a = 1. MRD correctly identifies these dependencies, and the outcome is prohibited due to its cycle in reads-from and dependency.
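The cycle just described can be spelled out concretely (a sketch of our own; the event names are invented). Taking the dp edges MRD identifies together with the rf edges of the a = 1 execution, some event reaches itself:

```python
# dp and rf edges of the a = 1 execution (event names are ours):
edges = {
    "Rx1_mid":   ["Wy1_mid"],    # middle thread: dp from read of x to y := 1
    "Wy1_mid":   ["Ry1_right"],  # rf: the write to y is read on the right
    "Ry1_right": ["Wx1_right"],  # right thread: dp from read of y to x := r0
    "Wx1_right": ["Rx1_mid"],    # rf: the write to x is read in the middle
}

def reaches_itself(start, edges):
    """True iff `start` lies on a cycle of `edges`."""
    seen, frontier = set(), list(edges.get(start, []))
    while frontier:
        e = frontier.pop()
        if e == start:
            return True
        if e not in seen:
            seen.add(e)
            frontier.extend(edges.get(e, []))
    return False

assert reaches_itself("Rx1_mid", edges)  # the (dp ∪ rf) cycle forbidding a = 1
```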

The a = 1 outcome is allowed in the Promising Semantics: a promise can be validated against the write of x := 1 in the true branch of the right-hand thread, and later switched to a validation with x := r₀ from the false branch, ignoring the dependency on the read of y.

In the previous example, Coh-CYC, a stepwise global coherence check caused weakestmo to forbid the unwanted behaviour allowed by Promising, but that machinery does not apply here. weakestmo allows the unwanted outcome, and we conjecture that this deficiency stems from the structure of the model. Dependencies are not represented as a relation at the level of the global axiomatic constraint, so one cannot check that they are consistent with the dynamic execution of memory, as represented by the other relations. Adopting a coherence check in the stepwise generation of the event structure mitigates this concern for Coh-CYC, but not for the test above.

In contrast, MRD does represent dependencies as a relation, allowing us to check consistency with the rf relation here. The axiom that requires acyclicity of (dp <sup>∪</sup> rf) forbids the unwanted outcome, as desired.

# **11 Evaluating** MRD-C11 **with the** MRDer **tool**

MRD-C11 is the first weak memory model that solves the thin-air problem for C++ atomics and is accompanied by a tool for automatically evaluating litmus tests. Our tool, MRDer, evaluates litmus tests under the base model, under RC11 augmented with MRD, and under IMM augmented with MRD. It has been used to check the result of every litmus test in this paper, together with many tests from the literature, including the Java Causality Test cases [7,11,15,16,18,25,26,27].

When evaluating whether a particular execution is allowed for a given test, a model that solves the thin-air problem must take other executions of the program into account. For example, the semantics of Pichon-Pharabod et al., having explored one execution path, may ultimately backtrack [25]. Jeffrey and Riely phrase their semantics as a two-player game where at each turn, the player explores all forward executions of the program [15]. At each operational step, the Promising Semantics [16] has to run forwards in a limited local way to validate that promised writes will be reached. The invisible events of Chakraborty et al. [11] are used to similar effect.

<sup>6</sup> James Riely, Alan Jeffrey and Radha Jagadeesan provided the precise example presented here [28]. It is based on Fig. 8 of Lochbihler [20], and its problematic execution under Promising was confirmed with the authors of Promising.

In MRD-C11, it is the calculation of justification that draws in information from other executions. This mechanism is localised: it avoids making choices about the execution that prune behaviours, and it does not require backtracking. MRD-C11 acts in a "bottom-up" fashion, and modularity ensures that justifications drawn from the continuation need not be recalculated. These properties have supported the development of MRDer: automation of the model requires only a single pass across the program text to construct the denotation.

# **12 Discussion**

Four recent papers have presented models that forbid thin-air values and permit previously challenging compiler optimisations. The key insight from these papers is that it is necessary to consider multiple program executions simultaneously. To do this, three of the four [15,25,11] use event structures, while the Promising Semantics [16] is a small-step operational semantics that explores future traces in order to take a step.

Although the Promising Semantics [16] is quite different from MRD, its mechanism for promising focuses on future writes, and MRD has parallels in its calculation of independent writes. Note also that both Promising's certification mechanism and MRD's lifting are thread-local.

The previous event-structure-based models are superficially similar to MRD, but all have a fundamentally different approach from ours: Pichon-Pharabod and Sewell [25] use event structures as the state of a rewriting system; Jeffrey and Riely [14,15] build whole-program event structures and then use a global mechanism to determine which executions are allowed; and Chakraborty et al. [11] transform an event structure using an operational semantics. In contrast, we follow a more traditional approach [33] where our event structures are used as the co-domain of a denotational semantics. Further, Jeffrey and Riely [14,15] and Pichon-Pharabod and Sewell [25] do not cover a significant subset of C++ relaxed concurrency primitives.

MRD does not suffer from known problems with existing models. As noted by Kang et al. [16], the Pichon-Pharabod and Sewell model produces behaviour incompatible with the ARM architecture. The Jeffrey and Riely model forbids the reordering of independent reads, as demonstrated by Java Causality Test 7 (see Section 6.2). The Promising Semantics allows the cyclic coherence ordering of the problematic Coh-CYC example [11]. weakestmo allows the thin-air outcome in the Java-inspired test of Section 10. In all four cases MRD provides the correct behaviour.

MRD is also highly compatible with the existing C++ standard text. The dp relation generated by MRD can be used directly in the axiomatic model to forbid thin-air behaviour. We are working on standards text with the ISO C++ committee based on this work, and have a current working paper with them [5].

The notion in C++ that data-race free programs should not exhibit observable weak behaviours goes back to Adve and Hill [1], and formed the basis of the original proposal for C++ [10]. This was formalised by Batty et al. [8] and adopted into the ISO standard. Despite the pervasiveness of DRF-SC theorems for weak memory models, these have remained whole-program theorems that do not support breaking a program into separate DRF and racy components. Our DRF theorem for our denotational model demonstrates a limited form of modularity that merits further exploration.

Other denotational approaches to relaxed concurrency have not tackled the thin-air problem. Dodds et al. [12] build a denotational model based on an axiomatic model similar to C++. It forms the basis of a sound refinement relation and is used to validate data structures and optimisations. Their context language is too restrictive to support a compositional semantics, and their compromise to disallow thin-air executions forbids important optimisations. Kavanagh and Brookes [17] provide a denotational account of TSO concurrency, but their model is based on pomsets and suffers from the same limitation as axiomatic models [7]: it cannot be made to recognise false dependencies.

*Future Work.* We envisage a generalised theorem that would, on augmentation with MRD, extend an axiomatic DRF-SC proof to a proof that applies to the augmented model.

The ISO have struggled to define memory\_order\_consume [13]. It is intended to provide ordering through dependencies that the compiler will not optimise away. The semantic dependency relation calculated by MRD identifies just these dependencies, and may support a better definition.

Finally, where we have used a global semantics to provide a full C++ model, it would be interesting to extend the denotational semantics to also cover all of C++, thereby allowing reasoning about C++ code in isolation from its context.

# **13 Conclusions**

We have used the relatively recent insight that, to avoid thin-air problems, a semantics should consider some information about what might happen in other program executions. We codify that into a modular notion of justification, leading to a semantic notion of independent writes, and finally of dependency (dp). We demonstrate the effectiveness of these concepts in three ways. One, we define a denotational semantics for a weak memory model, show it supports DRF-SC, and build a compositional refinement relation strong enough to verify difficult optimisations. Two, we show how to use dp with other axiomatic models, supporting the first optimal implementability proof for a thin-air solution via IMM, and showing how to repair the ISO C++ model. Three, we build a tool for executing litmus tests, allowing us to check a large number of examples.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# ARMv8-A system semantics: instruction fetch in relaxed architectures

Ben Simner1, Shaked Flur1∗, Christopher Pulte1∗, Alasdair Armstrong1, Jean Pichon-Pharabod1, Luc Maranget2, and Peter Sewell<sup>1</sup>

> <sup>1</sup> University of Cambridge, UK <sup>2</sup> INRIA Paris, France <sup>∗</sup> These authors contributed equally

Abstract. Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and "user-mode" concurrency, clarifying architectures and bringing them into the scope of programming-language semantics and verification. However, the system semantics, of instruction-fetch and cache maintenance, exceptions and interrupts, and address translation, remains obscure, leaving us without a solid foundation for verification of security-critical systems software.

In this paper we establish a robust model for one aspect of system semantics: instruction fetch and cache maintenance for ARMv8-A. Systems code relies on executing instructions that were written by data writes, e.g. in program loading, dynamic linking, JIT compilation, debugging, and OS configuration, but hardware implementations are often highly optimised, e.g. with instruction caches, linefill buffers, out-of-order fetching, branch prediction, and instruction prefetching, which can affect programmer-observable behaviour. It is essential, both for programming and verification, to abstract from such microarchitectural details as much as possible, but no more. We explore the key architecture design questions with a series of examples, discussed in detail with senior Arm staff; capture the architectural intent in operational and axiomatic semantic models, extending previous work on "user-mode" concurrency; make these models executable as test oracles for small examples; and experimentally validate them against hardware behaviour (finding a bug in one hardware device). We thereby bring these subtle issues into the mathematical domain, clarifying the architecture and enabling future work on system software verification.

# 1 Introduction

Computing relies on the *architectural abstraction*: the specification of an envelope of allowed hardware behaviour that hardware implementations should lie within, and that software should assume. These interfaces, defined by hardware vendors and relatively stable over time, notionally decouple hardware and software development; they are also, in principle, the foundation for software verification. In practice, however, industrial architectures have accumulated great complexity and subtlety: the ARMv8-A and Intel architecture reference manuals are now 7476 and 4922 pages [9,26], and hardware optimisations, including out-of-order and speculative execution, result in surprising and poorly understood programmer-observable behaviour. Architecture specifications have historically also been entirely informal, describing these complex envelopes of allowed behaviour solely in prose and pseudocode. This is problematic in many ways: such specifications do not serve as clear documentation, with the inevitable ambiguity and incompleteness of informal prose leaving major questions unanswered; without a specification that is executable as a test oracle (that can decide whether some observed behaviour is allowed or not), hardware validation relies on test suites that must be manually curated; without an architecturally-complete emulator (that can exhibit all allowed behaviour), it is very hard for software developers to "program to the specification" – they rely on test-and-debug development, and can only test above the hardware implementation(s) they have; and without a mathematically rigorous semantics, formal verification of hardware or software is impossible.

Over the last 10 years, much has been done to put architecture specifications on a more rigorous footing, so that a single specification can serve all those purposes. There are three main problems, two of which are now largely solved.

The first is the instruction-set architecture (ISA): the specification of the sequential behaviour of individual instructions. This is chiefly a problem of scale: modern industrial architectures such as Arm or x86 have large instruction sets, and each instruction involves many details, including its behaviour at different privilege levels, virtual-to-physical address translation, and so on – a single Arm instruction might involve hundreds of auxiliary functions. Recent work by Reid et al. within Arm [40,41,42] transitioned their internal ISA description into a mechanised form, used both for documentation and testing, and with them we automatically translated this into publicly available Sail definitions and thence into theorem-prover definitions [11,10]. Other related work is in §7.

The second is the relaxed-memory concurrent behaviour of "user-mode" operations: memory writes and reads, and the mechanisms that architectures provide to enforce ordering and atomicity (dependencies, memory barriers, load-linked/store-conditional operations, etc.). In 2008, for ARMv7, IBM POWER, and x86, this was poorly understood, and the architects regarded even their own prose specifications as inscrutable. Now, following extensive work by many people [36,37,19,18,22,8,31,45,7,46,48,35,6,2,47,13,1], ARMv8-A has a well-defined and simplified model as part of its specification [9, B2.3], including a prose transcription of a mathematical model [15], and an equivalence proof between operational and axiomatic presentations [36,37]; RISC-V has adopted a similar model [52]; and IBM POWER and x86 have well-established de-facto-standard models. All of these are experimentally validated against hardware, and supported by tools for exhaustively running tests [17,4]. The combination of these models and the ISA semantics above is enough to let one reason about or model-check concurrent algorithms.

That leaves the third part of the problem: the "system" semantics, of instruction-fetch and cache maintenance, exceptions and interrupts, and address translation and TLB (translation lookaside buffer) maintenance. Just as for "user-mode" relaxed memory, these are all areas where microarchitectural optimisations can have surprising programmer-visible effects, especially in the concurrent context. The mechanisms are relied on by all code, but they are explicitly managed only by systems code, in just-in-time (JIT) compilers, dynamic loaders, operating-system (OS) kernels, and hypervisors. This is, of course, exactly the security-critical computing base, currently trusted but not trustworthy, that is especially in need of verification – which requires a precise and well-validated definition of the architectural abstraction. Previous work has scarcely touched on this: none of seL4 [27], CertiKOS [24,23], Komodo [16], or [25,12] addresses realistic architecture concurrency, and they use (at best) idealised models of the sequential systems architecture. The CakeML [51,28] and CompCert [29] verified compilers target only sequential user-mode ISA fragments.

In this paper we focus on one aspect of system semantics: instruction fetch and cache maintenance, for ARMv8-A. The ability to execute code that has previously been written to data memory is fundamental to computing: fine-grained self-modifying code is now rare, and (rightly) deprecated, but program loading, dynamic linking, JIT compilation, debugging, and OS configuration all rely on executing code from data writes. However, because these are relatively infrequent operations, hardware designers have been able to optimise by partially separating the instruction and data paths, e.g. with distinct instruction caching, which by default may not be coherent with data accesses. This can introduce programmer-visible behaviour analogous to that of user-mode relaxed-memory concurrency, and require specific additional synchronisation to correctly pick up code modifications. Exactly what these are is not entirely clear in the current ARMv8-A architecture text, just as pre-2018 user-mode concurrency was not.

Our main contribution is to clarify this situation, developing precise abstractions that bring the instruction-fetch part of ARMv8-A system behaviour into the domain of rigorous semantics. Arm have stated [private communication] that they intend to incorporate a version of this into their architecture. We aim thereby to enable future work on system software verification using the techniques of programming languages research: program analysis, model-checking, program logics, etc. We begin (§2) by recalling the informal architectural guarantees that Arm provide, and the ways in which real-world software systems such as Linux, JavaScript, and WebAssembly change instruction memory. Then:

(1) We explore the fundamental phenomena and architecture design questions with a series of examples (§3). We explore the interactions between instruction fetching, cache maintenance and the 'usual' relaxed memory stores and loads, showing that instruction fetches are more relaxed, and how even fundamental coherence guarantees for data memory do not apply to instruction fetches. Most of these questions arose during the development of our models, in detailed ongoing discussion with the Arm Chief Architect and other Arm staff. They include questions of several different kinds. Six are clear from the Arm prose specification. Of the others: two are not implied by the prose but are natural choices; five involved substantive new choices by Arm that had not previously been considered and/or documented; for two, either choice could be reasonable, and Arm chose the simpler (and weaker) option; and for one, Arm were independently already strengthening the architecture to accommodate existing software.

(2) We give an operational semantics for Arm instruction fetch and icache maintenance (§4). This is in an abstract-microarchitectural style that supports an operational intuition for how hardware actually works, while abstracting from the mass of detail and the microarchitectural variation of actual hardware implementations. We do so by extending the Flat model [37] with simple abstractions of instruction caches and the coherent data cache network, in a way that captures the architectural intent, defining the entire envelope of behaviours that implementations should be allowed to exhibit.

(3) We give a more concise presentation of the model in an axiomatic style (§5), extending the "user-mode" axiomatic model from previous work [37,36,15,9], and intended to be functionally equivalent. We discuss how this too matches the architectural intent.

(4) We validate all this in two ways: by the extensive discussion with Arm staff mentioned above, and by experimental testing of hardware behaviour, on a selection of ARMv8-A cores designed by multiple vendors (§6). We run tests on hardware with a mild extension of the Litmus tool [5,7]. We make the operational model executable as a test oracle by integrating it into the RMEM tool and its web interface [17], introducing optimisations that make it possible to exhaustively execute the examples. We make the axiomatic model executable as a test oracle with a new tool that takes litmus tests and uses a Sail [11] definition of a fragment of the ARMv8-A ISA to generate SMT problems for the model. We then compare hardware and the two models for the handwritten tests (modulo two tests not supported by the axiomatic checker), compare hardware and the operational model on a suite of 1456 tests, automatically generated with an extension of the diy tool [3], and check the operational and axiomatic models against sets of previous non-ifetch tests. In all this data our models are equivalent to each other and consistent with hardware observations, except for one case where our testing uncovered a hardware bug on a Qualcomm device.

Finally, we discuss other related work (§7) and conclude (§8). We do all this for ARMv8-A, but other relaxed architectures, e.g. IBM POWER and RISC-V, face similar issues; our tests and tooling should enable corresponding work there.

The models are too large to include or explain in full here, so we focus on explaining the motivating examples, the main intuition and style of the operational model, in a prose rendering of its executable mathematics, and the definition of the axiomatic model. Appendices provide additional examples, a complete prose description of the operational model, and additional explanation of the axiomatic model. The complete executable mathematics version, the web-interface tool for running it, and our test results are at https://www.cl.cam.ac.uk/~pes20/iflat/.

*Caveats and Limitations* Our executable models are integrated with a substantial fragment of the Sail ARMv8-A ISA (similar to that used for CakeML), but not yet with the full ISA model [11,40,41,42]; this is just a matter of additional engineering. We only handle the 64-bit AArch64 part of ARMv8-A, not AArch32. We do not handle the interaction between instruction fetch and mixed-size accesses, or other variants of the cache maintenance instructions, e.g. those used for interaction with DMA engines, and variants by set or way instead of by virtual address. Finally, the equivalence between our operational and axiomatic models is validated experimentally. A proof of this equivalence is essential in the long term, but would be a major work in itself: the complexity makes mechanisation essential, but the operational model (in all its scale and complexity) has not yet been subject to mechanised proof. Without instruction fetch, a non-mechanised proof was the main result of an entire PhD thesis [36], and we expect the addition of instruction fetch to require global changes to the argument.

# 2 Industry Practice and the Existing ARMv8-A Prose

Computer architecture relies on a host of sophisticated techniques, including buffering, caching, prediction, and pipelining, for performance. For the normal memory reads and writes of "user-mode" concurrency, the programmer-visible relaxed-memory effects largely arise from store buffering and from out-of-order and speculative pipeline behaviour, not from the cache hierarchy (though some IBM POWER phenomena do arise from the interconnect, and from late processing of cache invalidates). All major architectures provide a strong per-location guarantee of *coherence*: for each memory location, different threads cannot observe the writes to that location in different orders. This is implemented in hardware by coherent cache protocols, ensuring (roughly) that each cache line is writable by at most one hardware thread at a time, and by additional machinery restricting store buffer and pipeline behaviour. Then each architecture provides additional synchronisation mechanisms to let the programmer enforce ordering properties involving multiple locations.
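Per-location coherence can be pictured with a small sketch (our own illustration, not from the paper): each thread's observed sequence of values for one location must embed into a single global order of the writes to that location.

```python
# Toy sketch of per-location coherence: different threads may see
# different subsets of the writes to a location, but never in orders
# that contradict one global coherence order.

def is_subsequence(obs, order):
    """True iff `obs` appears, in order, within `order`."""
    it = iter(order)
    return all(any(v == w for w in it) for v in obs)

def coherent(observations, write_order):
    return all(is_subsequence(obs, write_order) for obs in observations)

# Writes to x in global coherence order 1, 2, 3; each thread sees a subset.
assert coherent([[1, 3], [2, 3]], [1, 2, 3])
# A thread observing 2 before 1 would contradict the global order.
assert not coherent([[1, 2], [2, 1]], [1, 2, 3])
```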

At first sight, one might expect instruction fetches to act like other memory reads but, because writes to instruction memory are relatively rare, hardware designers have adopted different caching mechanisms. The Arm architecture carefully does not mandate exactly what these must be, to allow a wide range of possible hardware implementations, but, for example, a high-performance Arm processor might have per-core separate L1 instruction and data caches, above a unified per-core L2 cache and an L3 cache shared between cores. There may also be additional structures, e.g. per-core fetch queues, and caching of decoded micro-operations. This instruction caching is not necessarily coherent with data memory accesses: *"the architecture does not require the hardware to ensure coherency between instruction caches and memory"* [9, B2.4.4 (B2-114)]; instead, programmers must use explicit cache maintenance instructions. The documentation gives a particular sequence of these: *"If software requires coherency between instruction execution and memory, it must manage this coherency using Context* *synchronization events and cache maintenance instructions. The following code sequence can be used to allow a processing element (PE) to execute code that the same PE has written."*

```
; Coherency example for data and instruction accesses [...]
; Enter this code with <Wt> containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.
STR Wt, [Xn] ; Store new instruction
DC CVAU, Xn ; Clean data cache by virtual address (VA) to PoU
DSB ISH ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn ; Invalidate instruction cache by VA to PoU
DSB ISH ; Ensure completion of the invalidations
ISB ; Synchronize the fetched instruction stream
```
At first sight, this may be entirely mysterious. The remainder of the paper establishes precise semantics for each instruction, explaining why each is required, but as a rough intuition:


Some hardware implementations provide extra guarantees, rendering the DC or IC instructions unnecessary. Arm allow software to discover this in an architectural way, by reading the CTR\_EL0 register's DIC and IDC bits. Our modelling handles this, but for brevity we only discuss the weakest case, with CTR\_EL0.DIC=CTR\_EL0.IDC=0, that requires full cache maintenance.
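As an illustrative sketch (ours, not from the paper), the architectural discovery described above amounts to reading two bit fields of CTR\_EL0; the bit positions below (IDC at bit 28, DIC at bit 29) follow the Arm ARM's description of that register, and the sample values are hypothetical.

```python
# Hedged sketch: decode the CTR_EL0 fields that indicate whether the
# DC/IC parts of the synchronisation sequence are required.
# Bit positions per the Arm ARM: IDC is bit [28], DIC is bit [29].

def cache_maintenance_required(ctr_el0: int) -> dict:
    idc = (ctr_el0 >> 28) & 1  # 1: data cache clean (DC) not required
    dic = (ctr_el0 >> 29) & 1  # 1: icache invalidate (IC) not required
    return {"dc_needed": idc == 0, "ic_needed": dic == 0}

# Weakest case discussed in the text: DIC = IDC = 0, full sequence needed.
assert cache_maintenance_required(0) == {"dc_needed": True, "ic_needed": True}
# Strongest case: both bits set, neither DC nor IC needed.
assert cache_maintenance_required(0b11 << 28) == {"dc_needed": False, "ic_needed": False}
```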

Arm make clear that instructions can be prefetched (perhaps speculatively): *"How far ahead of the current point of execution instructions are fetched from is IMPLEMENTATION DEFINED. Such prefetching can be either a fixed or a dynamically varying number of instructions, and can follow any or all possible future execution paths. For all types of memory, the PE might have fetched the instructions from memory at any time since the last Context synchronization event on that PE."*

Concurrent modification and instruction fetch require the same sequence, with an ISB on each thread that executes the new instructions, and the rest of the sequence on the modifying thread [9, B2.2.5 (B2-94)]. Concurrent modification without synchronisation is restricted to particular instructions (B (branch), BL (branch-and-link), BRK (break), SMC, HVC, SVC (secure monitor, hypervisor, and supervisor calls), ISB, and NOP); otherwise there could be *constrained unpredictable behaviour*: *"any behavior that can be achieved by executing any sequence of instructions that can be executed from the same Exception level"*. Concurrent modification of conditional branches is allowed but can result in the old condition with the new target address or vice versa.

All this gives some guidance for programmers, but it leaves the exact semantics of instruction fetch and those cache maintenance instructions unclear, and in practice software typically does not use the above sequence verbatim. For example, it may synchronise a range of addresses at once, looping the DC and IC parts, or the final ISB may be subsumed by instruction synchronisation from exception entry or return. Linux has many places where it modifies code at runtime: in boot-time patching of *alternatives*, modifying kernel code to specialise it to the particular hardware being run on; when the kernel loads code (e.g. when the user calls dlopen); and in the ptrace system call, used e.g. by the GDB debugger to patch arbitrary instructions with breakpoints at runtime. In Google's *Chrome* web browser, the WebAssembly and JavaScript just-in-time (JIT) compilers both write new code during execution and modify existing code at runtime. In JavaScript, this modification happens inside a single thread and so is quite straightforward. The WebAssembly case is more complex, as one thread is modifying the code of another. A software thread can also be moved (by the OS or hypervisor) from one hardware thread to another, perhaps while it is in the middle of some instruction cache maintenance. Moreover, for security reasoning, we have to be able to bound the possible behaviour of arbitrary code.

All this means that we cannot treat the above sequence as a whole, as an opaque black box. Instead, we need a precise semantics for each individual instruction, but the existing prose documentation does not provide that.

The problem we face is to give such a semantics, that correctly defines behaviour in arbitrary concurrent contexts, that captures the Arm architectural intent, that is strong enough for software, and that abstracts from the variety of hardware implementations (e.g. with differing cache structures) that the architecture intends to allow – but which programmers should not have to think about.

# 3 Instruction Fetch Phenomena and Examples

We now describe the main instruction-fetch phenomena and architecture design questions for ARMv8-A, illustrated by handwritten litmus tests, to guide the following model design.

#### 3.1 Instruction-Fetch Atomicity

The first point, as mentioned in §2, is that concurrent modification and fetch is only permitted if the original and modified instructions are in a particular set: various branches, supervisor/hypervisor/secure-monitor calls, the ISB instruction synchronisation barrier, and NOP. Otherwise, the architecture permits *constrained unpredictable* behaviour, meaning that the resulting machine state could be anything that would be reachable by arbitrary instructions at the same exception level. The following W+F test illustrates this.


In this test Thread 0 performs a memory store (with the STR instruction) to the code that Thread 1 is executing, overwriting the ADD X0,X0,#1 instruction with the 32-bit encoding of the SUB X0,X0,#1 instruction. If the fetch were atomic, the outcome of this test would be the result of executing either the ADD or the SUB instruction, but, since at least one of those is not in the set of the 8 atomically-fetchable instructions given previously, Thread 1 has constrained-unpredictable behaviour and the final state is very loosely constrained. Note, however, that this is nonetheless much stronger than the C/C++ whole-program undefined behaviour in the presence of a data race: unlike C/C++, a hardware architecture has to define a useful envelope of behaviour for arbitrary code, to provide guarantees for the rest of the system when one user thread has a race.

Conditional Branches For conditional branches, the Arm architecture provides a specific non-single-copy-atomic fetch guarantee: the execution will be consistent with either the old or new target, and either the old or new condition.

For example, this W+F+branches test can overwrite a B.EQ g with a B.NE h, and end up executing B.NE g or B.EQ h instead of one of those. Our future examples will only modify NOPs and unconditional branch instructions.


#### 3.2 Coherence

Data writes and reads are coherent, in Arm and in other major architectures: in any execution, for each address, the reads of each hardware thread must see a subsequence of the total *coherence order* of all writes to that address. The plain-data CoRR test [46] illustrates one case of this: it is forbidden for a thread to read a new write of x and then the initial state for x. However, instruction fetches are not necessarily coherent: one instruction fetch may be inconsistent with a program-order-previous fetch, and the data and instruction streams can become out-of-sync with each other. We explore three kinds of coherence:


Instruction-to-Instruction Coherence Arm explicitly do not guarantee any consistency between fetches of the same location: fetching an instruction does not mean that a later fetch of that location will not see an older instruction [9, B2.4.4]. This is illustrated by CoFF, like CoRR but with fetches instead of reads.


Here Thread 1 makes two calls to address f (BL is branch-and-link), while Thread 0 overwrites the instruction at that address. The interesting potential execution is that in which the first call to f fetches and executes the newly-written B l1, but the second call fetches and executes the original B l0. We can view such executions as graphs, similar to previous axiomatic-model candidate executions but with new fetch events, one per instruction, and new edges. As usual, we use po and rf edges for the program-order and reads-from relations, together with:


Edges from the initial state are drawn from a small circle. Since we do not modify the code of most locations, we usually omit the fetch events for those instructions, showing only a subgraph of the interesting events, e.g. as on the right above. For Arm, this execution is both architecturally allowed and experimentally observed.
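As a side note (our sketch, not the paper's formalism), the data-coherence condition stated at the start of this subsection can be phrased as a check that each thread's program-order sequence of reads embeds order-preservingly into the coherence order. The CoFF execution above violates the analogous condition for *fetches*, which is exactly what the architecture permits; event names below are invented.

```python
# Sketch: per-location coherence condition for DATA reads.  Each
# thread's program-order sequence of reads of a location must see a
# (possibly repeating) non-decreasing subsequence of the coherence
# order of writes to that location.

def coherent(co, reads_by_thread):
    """co: write ids in coherence order (one location).
    reads_by_thread: per thread, the write ids its reads saw, in po."""
    pos = {w: i for i, w in enumerate(co)}
    return all(
        all(a <= b for a, b in zip(idxs, idxs[1:]))
        for idxs in ([pos[w] for w in seen] for seen in reads_by_thread)
    )

co = ["w_init", "w_new"]                        # coherence order at f
assert coherent(co, [["w_new", "w_new"]])       # allowed for data reads
assert not coherent(co, [["w_new", "w_init"]])  # the forbidden CoRR shape;
# the analogous fetch sequence (CoFF) IS architecturally allowed.
```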

Here, and in future tests, we assume some common code consisting of a function at address f which always has the same shape: a branch that might be overwritten, which selects a block that writes a value to register X10 before returning. This is sometimes duplicated at different addresses (f1, f2, ...) or extended to g, with three cases. We sometimes elide the common code.

Data-to-Instruction Coherence Fetching from a particular write does imply that program-order-later reads from the same address will see that write (or a coherence successor thereof). This is a *data-to-instruction* coherence property, illustrated by CoFR below. Here Thread 1 fetches the newly-written B l1 at f and then, when reading from f with its LDR load instruction, cannot read the original B l0 instruction (it can only read the new B l1).

This is not clear in the existing prose specification, but the architectural intent that emerged during discussion with Arm is that the given execution should be forbidden, reflecting microarchitectural choices that (1) instructions decode in order, so the fetch b must occur before the read d, and (2) fetches that miss in the instruction cache must read from data storage, so the instruction cache cannot be ahead of the available data. This ensures that fetching from a write means that all threads are now guaranteed to read from that write (or another write coherence-after it).

Instruction-to-Data Coherence In the other direction, reading from a particular write to some location does *not* imply that later fetches of that location will see that write (or a coherence successor), as in the following CoRF+ctrl-isb.


Here Thread 1 has a control dependency and an instruction synchronisation barrier (the CBNZ conditional branch, dependent on the value read by its LDR load, and ISB), abbreviated to ctrl+isb, between its load and the fetch from f. If the latter were a data load, this would ensure the two loads are satisfied in order. This is not explicit in the existing prose, but it is what one would expect, and it is observed in practice. Microarchitecturally, it is easily explained by an out-of-date entry for f in the instruction cache of Thread 1: if Thread 1 had previously fetched f (perhaps speculatively), and that instruction cache entry has not been evicted or explicitly invalidated since, then this fetch of f will simply read the old value from the instruction cache without going out to data memory. The ISB ensures that f is freshly fetched, but does not ensure that Thread 1's instruction cache is up-to-date with respect to data memory.

#### 3.3 Instruction Synchronisation

Instruction fetches satisfy few guarantees, so explicit synchronisation must be performed when modifying the instruction stream.

Same-Thread Synchronisation Test SM below shows the simplest self-modifying code case: without additional synchronisation, a write to program memory can be ignored by a program-order-later fetch.


In this execution, the fetch b, fetching the instruction at f, fetches a value from a write coherence-before a, even though b is the fetch of an instruction program-order after a. We illustrate this with an *instruction from-reads* (ifr) edge. This is a derived relation, analogous to the usual *from-reads* (fr) relation, that relates each fetch to all writes that are coherence-after the write it read from; it is defined as ifr = irf−1;co. If the fetch were a data read, this would be a forbidden coherence shape (CoWR). As it is, it is architecturally allowed, as described explicitly by Arm [9, B2.4.4], and it is experimentally observed on all devices we have tested. Microarchitecturally, this too is simply due to fetches from old instruction cache entries.
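The ifr derivation can be made concrete with a small relational sketch (ours, with invented event names): composing the inverse of irf with co yields, for each fetch, exactly the writes that are coherence-after the one it read from, as in the SM execution.

```python
# Sketch: instruction-from-reads as a derived relation, ifr = irf^-1 ; co.
# Relations are sets of pairs; event names are invented for illustration.

def inverse(r):
    return {(b, a) for (a, b) in r}

def seq(r, s):  # relational composition r ; s
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

co  = {("w_old", "w_new")}   # w_old is coherence-before w_new
irf = {("w_old", "f")}       # fetch f read the old instruction

ifr = seq(inverse(irf), co)
assert ifr == {("f", "w_new")}  # f is ifr-related to the newer write
```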

Cache Maintenance As we saw in §2, the Arm architecture provides cache maintenance instructions to synchronise the instruction and data streams: the DC data-cache clean and IC instruction-cache invalidate instructions. To forbid the relaxed outcome of SM, by forcing a fetch of the modified code, the specified sequence of cache maintenance instructions must be inserted, with an ISB.


Now the outcome is forbidden. The cache synchronisation sequence DC CVAU; DSB ISH; IC IVAU; DSB ISH (which we abbreviate to a single cachesync edge) ensures that by the time the ISB executes, the instruction and data memory have been made coherent with each other for f. The ISB then ensures the final fetch of f is ordered after this sequence. The microarchitectural intuition for this was in §2; our §4 operational model will describe the semantics of each instruction.

Cross-Thread Synchronisation We now consider modifying code that can be fetched by other threads, using variants of the standard message-passing shape MP. That checks whether two writes (to different locations) on one thread can be seen out-of-order by two reads on another thread; here we replace one or both of those reads by fetches, and ask what synchronisation is required to ensure that the relaxed outcome is forbidden. Consider first an MP variant where the first write is of a new instruction, and the second is just a simple data memory flag:

This test includes sufficient synchronisation on each thread to enforce thread-local ordering of data accesses: the DMB in Thread 0 ensures the writes a and b propagate to memory in program order, and the control-dependency into an ISB on Thread 1 ensures the read c and the fetch e happen in program order. However, as we saw in §2, this is not enough to synchronise concurrent modification and execution of code in ARMv8-A. Thread 0 needs the entire cache synchronisation sequence (giving test MP.RF+cachesync+ctrl-isb, not shown), not just a DMB, to forbid this outcome.

Another variant of this MP-shape test, where the message passing itself is done using modification of code, gives a much stronger guarantee, as can be seen from the following MP.FR+dmb+fpo-fe test. This is not clear from the architecture manual, but this outcome is already forbidden with only the DMB.

This is for similar reasons to the above CoFR test: since Thread 1 fetched the updated value for f, we know that value must have reached at least the data caches (since that is where the instruction cache reads from) and therefore multi-copy atomicity guarantees that a normal load instruction will observe it.

The final variant of these MP-shaped tests has both Thread 0 writes be of new instructions. This idiom is very common in practice; it is currently how Chrome's WebAssembly JIT synchronises the modified thread with the new code.


Without the full cachesync sequence on Thread 0, this is an allowed outcome. Interestingly, adding the cachesync sequence to Thread 0 (Test MP.FF+cachesync+fpo, not shown) is sufficient to make the outcome forbidden, without an ISB in Thread 1, as the cachesync sequence is intended to make it appear that fetches occur in program order. Microarchitecturally, that could be ensured in two ways: either by actually fetching in-order, or by making the IC instruction not only invalidate all the instruction caches (for this address) but also clean any core's pre-fetch buffer stale entries (for this address). Architecturally, this is not clear in the current prose, but, concurrent with this work, Arm were independently strengthening their definition to make it so.

Incremental Synchronisation The cache synchronisation sequence need not be contiguous, or even all in the same thread. So long as the sequence in its entirety has been performed by the time the fetch happens, then the instruction stream will have been made consistent with the data stream for that address.

This is demonstrated by the following test, where Thread 0 performs a write to f and then only a DC before synchronising with Thread 1, which performs the IC, while Thread 2 observes the modified code. This can happen in practice when a software thread is migrated between hardware threads at runtime, by a hypervisor or OS. Thread 0 and Thread 1 may just represent the runtime scheduling of a single process, beginning execution on hardware Thread 0 but migrated to hardware Thread 1 between the DC and IC instructions. In the graph, the dcsync and icsync represent the DC;DSB ISH and DSB ISH;IC;DSB ISH combinations. The DC does not need a preceding DSB ISH because it is ordered w.r.t. the preceding store to the same cache line.

Here the IC gets broadcast to all threads [9, B2.2.5p3], and so the fact that it happens on a different thread to the DC does not affect the outcome. Similarly, if the DC were to happen on another thread first (to get the test MP.RF+[dc]ic+ctrl-isb, not shown), then it would have the effect of ensuring consistency globally, for all threads.

#### 3.4 Multi-Copy Atomicity

For data accesses, the question of whether they are *multi-copy atomic* is a crucial one for relaxed architectures. IBM POWER, ARMv7, and pre-2018 ARMv8-A are/were non-multi-copy atomic: two writes to different addresses could become visible to distinct other threads in different orders. Post-2018 ARMv8-A and RISC-V are multi-copy atomic (or "other multi-copy-atomic" in Arm terminology) [37,36,9]: the programmer can assume there is a single shared memory, with all relaxed-memory effects due to thread-local out-of-order execution.

However, for fetches, due to the lack of any fetch atomicity guarantee for most instructions (§3.1), and the lack of coherent fetches for the others (§3.2), the question of multi-copy atomicity is not particularly interesting. Tests are either trivially forbidden (by data-to-instruction coherence), or allowed unless the full cache synchronisation sequence is used to forbid them, and (§3.3) that sequence ensures all cores share the same consistent view of memory.

#### 3.5 Strength of the **IC** Instruction

Multiple Points of Unification Cleaning the data cache, using the DC instruction, makes a write visible to instruction memory. It does this by pushing the write past the Point of Unification. However, there may be multiple Points of Unification: one for each core, where its own instruction and data memory become unified, and one for the entire system (or shareability domain) where all the caches unify. Fetching from a write implies that it has reached the closest PoU, but does not imply it has reached any others, even if the write originated from a distant core. Consider the following test: Thread 0 modifies f; Thread 1 fetches the new value and performs just an IC and DSB before signalling Thread 0, which also fetches f. That IC is not strong enough to ensure that the write is pulled into the instruction cache of Thread 0.

This is not clear in the existing prose, but the architectural intent is that it be allowed (i.e., that IC is weak in this respect). We have not so far observed it in practice. The write may have passed the Point of Unification for Thread 1, but not the shared Point of Unification for both threads. In other words, the write might reach Thread 1's instruction cache without being pushed down from Thread 0's data cache. Microarchitecturally this can be explained by *direct data*

*intervention* (DDI), an optimisation allowing cache lines to be migrated directly from one thread's (data) cache to another. The line could be migrated from Thread 0 to Thread 1, then pushed past Thread 1's Point of Unification, making it visible to Thread 1's instruction memory without ever making it visible to Thread 0's own instruction memory. The lack of coherence between instruction and data caches would make this observable, even in multi-copy atomic machines.

Stale Fetches So far, we have only talked about fetching from two distinct writes. But in principle there is no limit to how far back a fetch can read from, given insufficient synchronisation. The MP.RF+dmb+ctrl-isb test (§3.3) required the full cachesync sequence to forbid the given behaviour. Below we give a test, FOW, similar to that MP-shaped test but allowing many consumer threads to independently and simultaneously see different values in their instruction memory, even after invalidating their caches.

This is not clear in the existing architecture text. It is a case where the architecture design is not very constrained. On the one hand, it has not been observed, and it is thought unlikely that hardware will ever exhibit this behaviour: it would require keeping multiple writes in the coherent part of the data caches, rather than a single dirty line, which would require more complex cache coherence protocols. On the other hand, there does not seem to be any benefit to software from forbidding it. Arm therefore prefer the choice that gives a simpler and weaker model (here the two happen to coincide), to make it easier to understand and to provide more flexibility for future microarchitectural optimisations. We therefore design our models to allow the above behaviour.

#### 3.6 Strength of the **DC** Instruction

Instruction Cache Depth Test CoFF (§3.2) showed that fetches can see "old" writes. In principle, there is no limit to the depth of the instruction-cache hierarchy: there could be many values for a single location cached in the instruction memory for each core, even if the data cache has been cleaned. The test below illustrates this, with Thread 1 able to see all three values for g.

This is similar to the preceding FOW case: it is thought unlikely that hardware will exhibit this in practice, but the desire for the simpler and weaker option means the architectural intent is to allow it, and we follow that in our models.

# 4 An Operational Semantics for Instruction Fetch

Previous work on operational models for IBM POWER and Arm "user-mode" concurrency [46,45,22,18,19,37] has shown, surprisingly, that as far as programmer-visible behaviour is concerned, one can abstract from almost all hardware implementation details of data memory (store queues, the cache hierarchy, the cache protocol, etc.). For ARMv8-A, following their 2018 shift to a multi-copy-atomic architecture, one can do so completely: the *Flat* model of [37] has a shared flat memory, with a per-thread out-of-order thread subsystem, modelling pipeline effects, responsible for all observable relaxed behaviour. For instruction-fetch, it is no longer possible to abstract completely from the data and instruction cache hierarchy, but we can still abstract from much of it.

The Flat model is a small-step operational semantics for multi-copy atomic ARMv8-A, including the relaxed behaviours of loads and stores [37]. Its states are abstract machine states consisting of a tree of instructions for each thread, and a flat memory subsystem shared by all threads. Each instruction in each thread corresponds to a sequence of transitions, with some guards and a potential effect on the shared memory state. The Flat model is made executable in our RMEM tool, which can exhaustively interleave transitions to enumerate all the possible behaviours. The tree of instructions for each thread models out-of-order and speculative execution explicitly. Below we show an example for a thread that is executing 10 instruction instances. Some (grey) are finished, no longer subject to restart; others (pink) have run some but perhaps not all of their instruction semantics; instructions are not necessarily atomic. Those with multiple children are branch instructions with multiple potential successors speculated simultaneously.

For each state, the model defines the set of allowed transitions, each of which steps to a new machine state. Transitions correspond to steps of single instructions, and individual instructions may give rise to many. Example transitions include Register Write, Propagate Write to Memory, etc.

iFlat Extension Originally, Flat had a fixed instruction memory, with a single transition that can speculate the address of any program-order successor of any instruction in flight, fetch it from the fixed instruction memory, and decode it. We now remove that fixed instruction memory, so that instructions can be fetched from data writes, and add the additional structures as shown on the right. These are all of unbounded size, as is appropriate for an architecture definition.

Fetch Queues (per-thread) These are ordered buffers of pre-fetched entries, waiting to be decoded and begin execution. Entries are either a fetched 32-bit opcode, or an unfetched request. The fetch queues allow the model to speculate and pre-fetch many instructions ahead of where the thread is currently executing. The model's fetch queues abstract from multiple real-hardware structures: instruction queues, line-fill buffers, loop buffers, and slots objects. We keep a close relation to this underlying microarchitecture by allowing out-of-order fetches, but we believe this is not experimentally observable on real hardware.

Abstract Instruction Caches (per-thread) These are just sets of writes. When the fetch queue requests a new entry, it gets satisfied from the instruction cache, either immediately (a *hit*) or at some later point in time (a *miss*). The instruction cache can contain many possible writes for each location (§3.6), and it can be spontaneously updated with new writes in the system at any time ([9, B2.4.4]). To manage IC instructions, each thread keeps a list of addresses yet to be invalidated by in-flight ICs.

Data Cache (global) Above the single shared flat memory for the entire system, which sufficed for the multi-copy-atomic ARMv8-A data memory, we insert a shared buffer which is just a list of writes; abstracting from the many possible coherent data cache hierarchies. Data reads must be coherent, reading from the most recent write to the same address in the buffer, but instruction fetches are allowed to read from any such write in the buffer (§3.2).
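A minimal sketch of this storage discipline (ours, not the Lem model; identifiers are illustrative): data reads must take the newest same-address write in the buffer, while fetches may be satisfied by any buffered write to that address, or by a stale value still in flat memory (an old instruction-cache entry).

```python
# Sketch of the iFlat storage abstraction: a shared buffer of writes
# above flat memory.  Data reads are coherent (newest same-address
# write); instruction fetches may be satisfied by ANY write to that
# address, old or new.

class Storage:
    def __init__(self, memory):
        self.memory = dict(memory)  # flat memory: addr -> value
        self.buffer = []            # list of (addr, value), oldest first

    def propagate(self, addr, value):
        self.buffer.append((addr, value))

    def data_read(self, addr):
        for a, v in reversed(self.buffer):  # newest buffered write wins
            if a == addr:
                return v
        return self.memory[addr]

    def fetch_candidates(self, addr):
        # buffered writes plus the (possibly stale) flat-memory value
        return [v for a, v in self.buffer if a == addr] + [self.memory[addr]]

s = Storage({"f": "B l0"})
s.propagate("f", "B l1")
assert s.data_read("f") == "B l1"                         # coherent
assert set(s.fetch_candidates("f")) == {"B l0", "B l1"}   # fetch may be stale
```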

Transitions To accommodate instruction fetch and cache maintenance, we introduce new transitions: Fetch Request, Fetch Instruction, Fetch Instruction (Unpredictable), Fetch Instruction (B.cond), Decode Instruction, Begin IC, Propagate IC to Thread, Complete IC, Perform DC, and Update Instruction Cache. We also have to modify some Flat transitions: Commit ISB, Wait for DSB, Commit DSB, Propagate Memory Write, and Satisfy Read from Memory. These transitions define the lifecycle of each instruction: a request gets issued for the fetch, then at some later point the fetch gets satisfied from the instruction cache, the instruction is then decoded (in program-order) and then handed to the existing semantics to be executed. To give a flavour, we show just one, the *Propagate IC to Thread* transition, which is responsible for invalidation of the abstract instruction caches. This is a prose rendering of the rule in our executable mathematical model, which is expressed in the typed functional subset of Lem [32].

Propagate IC to Thread An instruction *i* (with ID *iiid*) in state Wait\_IC*(address, state\_cont)* can do the relevant invalidate for any thread *tid'*, modifying that thread's instruction cache and fetch queue, if there exists a pending entry *(iiid, address)* in that thread's *ic\_writes*. Action:


This rule can be found under the same name in the full prose description, and in the handle\_ic\_ivau and flat\_propagate\_cache\_maintenance functions in machineDefThreadSubsystem.lem and machineDefFlatStorageSubsystem.lem in the executable mathematics. Cache maintenance operations work over entire cache lines, not individual addresses. Each address is associated with at least one cache line for the data (and unified) caches, and one for the instruction caches. The minimum cache line size is the (architected) smallest possible cache line for each of these.
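The bookkeeping in this rule can be sketched as follows (our illustration, not the Lem code; the fetch-queue effect is omitted): each thread holds pending (iiid, address) entries in its ic\_writes list, and propagating an IC to a thread removes the entry and drops same-address entries from that thread's abstract instruction cache.

```python
# Sketch of the Propagate IC to Thread bookkeeping.  Each thread has an
# abstract instruction cache (a set of (addr, write_id) pairs) and a
# list ic_writes of pending IC invalidations.  Names are illustrative.

def propagate_ic(thread, iiid, address):
    """Perform the pending invalidation (iiid, address) on one thread."""
    assert (iiid, address) in thread["ic_writes"]
    thread["ic_writes"].remove((iiid, address))
    thread["icache"] = {(a, w) for (a, w) in thread["icache"] if a != address}

t1 = {"icache": {("f", "w_old"), ("g", "w_g")},
      "ic_writes": [("ic1", "f")]}
propagate_ic(t1, "ic1", "f")
assert t1["icache"] == {("g", "w_g")}  # stale entry for f invalidated
assert t1["ic_writes"] == []           # pending entry discharged
```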

Example This model correctly explains all the behaviours of §3. We illustrate this by revisiting the cache synchronisation explanation of §2, which can now be re-interpreted w.r.t. our precise model, and using this to explain the thread migration case of §3.3. Given the sequence DC Xn; DSB; IC Xn; DSB, the model gives it meaning as follows (omitting uninteresting transitions): First the DC CVAU causes a Perform DC transition. This pushes any write that might have been in the abstract data cache into memory. Now the first DSB's Commit DSB can be taken, allowing Begin IC to happen. This creates entries for each thread, which are discharged by each Propagate IC to Thread (see above). Once all entries are invalidated, a Complete IC can happen. Now, if any thread decodes an instruction for that address, it must have been fetched from the write the DC pushed, or something coherence-after it. If the software thread performing this sequence is interrupted and migrated (by the OS) to a different hardware thread, then, so long as the OS includes the DSB to maintain the thread-local DC ordering, the DC will push the write in an identical way, since it only affects the global abstract data cache. The IC transitions can all be taken, and the sequence continues as before, just on a new hardware thread. So when the second DSB finishes, and the final Commit DSB transition is taken, the effect of the full sequence will be seen system-wide even if the thread was migrated.

# 5 An Axiomatic Semantics for Instruction Fetch

Based on the operational model, we develop an axiomatic semantics, as an extension of the ARMv8 axiomatic reference model [15,37]. Since that model does not have mixed-size support, we do not model the concurrent modification of conditional branches (§3.1), as this would require mixed-size machinery. The existing axiomatic model is a predicate on *candidate executions*, hypothetical complete executions of the given program that satisfy some basic well-formedness conditions, defining the set of *valid* executions to be those satisfying its axioms. Each candidate execution abstractly captures a particular concrete execution of the program in terms of events and relations over them. This model is expressed in the herd language [8,6,4]. The events of these executions are memory reads (the set R), memory writes (W), and memory barrier/fence events (F). The relations are: *program order* (po), capturing the sequencing of events by the same thread in the execution's control-flow unfolding; *reads-from* (rf), relating a write event w with any read event r that reads from it; the *coherence order* (co), recording the execution's sequencing of same-address writes in memory; and *read-modify-write* (rmw), capturing which load/store exclusive instructions form a successful exclusive *pair* in the execution. The derived relation *from-reads* fr = rf−1;co relates a read r with a write w′ if r reads from a write w coherence-before w′. In addition, candidate executions also have relations capturing dependencies between events: address (addr), data (data), and control dependencies (ctrl). The relation loc relates any two read/write events that are to the same memory address. The model also has relations suffixed "i" and "e": rfi/rfe, coi/coe, fri/fre. These are the restrictions of the relations rf, co, and fr to same-thread ("internal") event pairs and different-thread ("external") event pairs, respectively. The model is defined in relational algebra.
In herd, R;S stands for the sequential composition of relations R and S, R−1 for the inverse of relation R, R|S and R&S for the union and intersection of R and S, and [A];R;[B] for the restriction of R to the domain A and range B.

Handling instruction fetch requires extending the notion of candidate execution. We add new events: an *instruction-fetch* (IF) event for each executed instruction; a DC event for each DC CVAU instruction; an IC event for each IC IVAU and IC IALLU instruction. We replace po with *fetch-program-order* (fpo) which orders the IF event of an instruction before any program-order later IF events. We add a relation *same-cache-line* (scl), relating reads, writes, fetches, DC and IC events to addresses in the same cache line. We add an acyclic transitively closed relation wco, which extends co with orderings for cache maintenance (DC or IC) events: it includes an ordering (e, e′) or (e′, e) for any cache maintenance event e and same-cache-line event e′, if e′ is a write or another cache maintenance event; co is then recovered as co = ([W];wco;[W]) & loc. The loc, addr, and ctrl relations are all extended to include DC and IC events. We add a *fetch-to-execute* relation (fe), relating an IF event to any event generated by the execution of that instruction; and an *instruction-read-from* relation (irf), which relates a write to any IF event that fetches from it. Finally, we add a boolean *constrained-unpredictable* (CU) to detect badly behaved programs. Now we derive the following relations: the standard po relation, as po = fe−1;fpo;fe (two events e and e′ are po-related if their fetch events are fpo-related); and *instruction-from-reads* (ifr), the analogue of fr for instruction fetches, relating a fetch to all writes coherence-after the one it fetched from: ifr = irf−1;co.
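The derived relations fr and ifr can be computed mechanically from a candidate execution. A minimal sketch (not the paper's tooling; the events and relations below are hypothetical) in Python:

```python
# Illustrative sketch (not the paper's tooling): relations of a candidate
# execution as sets of event pairs, with fr = rf^-1;co and ifr = irf^-1;co
# derived by relational inverse and sequential composition.

def inverse(r):
    """Inverse of a relation given as a set of pairs."""
    return {(b, a) for (a, b) in r}

def compose(r, s):
    """Sequential composition R;S."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

# Hypothetical execution: same-address writes w0 coherence-before w1; read r0
# reads from w0; instruction fetch i0 fetches from w0.
co  = {("w0", "w1")}
rf  = {("w0", "r0")}
irf = {("w0", "i0")}

fr  = compose(inverse(rf), co)   # r0 read w0, which is coherence-before w1
ifr = compose(inverse(irf), co)  # i0 fetched w0, which is coherence-before w1

print(fr, ifr)  # {('r0', 'w1')} {('i0', 'w1')}
```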

We then make two semantics-preserving rewrites of the existing model to make adding instruction fetches easier (described in the appendix); and make the following changes and additions to the model. The full model is shown in Figure 1, with comments pointing to the relevant locations in the model definition. For lack of space we only describe the main addition, the iseq relation, in detail (including its correspondence with the operational model of §4); for the others we give an overview and refer to the appendix for the full description.

We define the relation iseq, relating some write w to address x to an IC event completing a cache synchronisation sequence (not necessarily on a single thread): w is followed by a same-cache-line DC event, which is in turn followed by a same-cache-line IC event. In operational model terms, this captures traces that propagated w to memory, subsequently performed a same-cache-line DC, and then began an IC (and eagerly propagated the IC to all threads). In any state after this sequence it is guaranteed that w, or a coherence-newer same-address write, is in the instruction cache of all threads: performing the DC has cleared the abstract data cache of writes to x, and the subsequent IC has removed old instructions for location x from the instruction caches, so that any subsequent updates to the instruction caches have been with w, or co-newer writes. Adding ifr;iseq to the *observed-by* relation (obs) (4) relates an instruction fetch *i* to location x to an IC *ic* if: i fetched from a write w to x, some write w′ to x is coherence-after w, and *ic* completes a cache synchronisation sequence (iseq) starting from w′. Then the irreflexive ob axiom requires that i must be ordered-before *ic* (because it would otherwise have fetched w′). We now

```
let iseq = [W];(wco&scl);[DC];            (*1*)
           (wco&scl);[IC]
(* Observed-by *)
let obs = rfe | fr | wco                  (*2*)
        | irf | (ifr;iseq)                (*3, 4*)
(* Fetch-ordered-before *)
let fob = [IF]; fpo; [IF]                 (*5*)
        | [IF]; fe                        (*6*)
        | [ISB]; fe−1; fpo                (*7*)
(* Dependency-ordered-before *)
let dob = addr | data
        | ctrl; [W]
        | (ctrl | (addr; po)); [ISB]
     (* | [ISB]; po; [R] *)               (*8*)
        | addr; po; [W]
        | (addr | data); rfi
(* Atomic-ordered-before *)
let aob = rmw
        | [range(rmw)]; rfi; [A|Q]
(* Barrier-ordered-before *)
let bob = [R|W]; po; [dmb.sy]
        | [dmb.sy]; po; [R|W]
        | [L]; po; [A]
        | [R]; po; [dmb.ld]
        | [dmb.ld]; po; [R|W]
        | [A|Q]; po; [R|W]
        | [W]; po; [dmb.st]
        | [dmb.st]; po; [W]
        | [R|W]; po; [L]
        | [R|W|F|DC|IC]; po; [dsb.ish]    (*9*)
        | [dsb.ish]; po; [R|W|F|DC|IC]    (*10*)
        | [dmb.sy]; po; [DC]              (*11*)
(* Cache-op-ordered-before *)
let cob = [R|W]; (po&scl); [DC]           (*12*)
        | [DC]; (po&scl); [DC]            (*13*)
(* Ordered-before *)
let ob = (obs|fob|dob|aob|bob|cob)+
(* Internal visibility requirement *)
acyclic (po-loc|fr|co|rf) as internal
(* External visibility requirement *)
irreflexive ob as external
(* Atomic *)
empty rmw & (fre; coe) as atomic
(* Constrained unpredictable *)
let cff = ([W];loc;[IF]) \                (*14*)
          ob−1 \ (co;iseq;ob)
cff_bad cff ≡ CU                          (*15*)
```
Fig. 1. Axiomatic model

briefly overview the other changes made to the axiomatic model and their intuition. We include irf in obs (3): for an instruction to be fetched from a write, the write has to have been done before. We add a relation *fetch-ordered-before* (fob) (5-7), which is included in *ordered-before*. The relation fob includes fpo and fe: including fpo (5) requires fetches to be ordered according to their position in the control-flow unfolding of the execution; including the fe (*fetch-to-execute*) relation (6) captures the idea that an instruction must be fetched before it can execute; and fetches program-order-after an ISB happen after the ISB (or else are restarted) (7). For DSB ISH instructions the edge [R|W|F|DC|IC];po;[dsb.ish] is included in ob (9): DSB ISHs are ordered with all program-order-preceding non-fetch events. Symmetrically, all non-IF events are ordered after program-order-preceding dsb.ish events (10). DCs wait for preceding dmb.sy events (11). We include the relation *cache-op-ordered-before* (cob) in ob. This relation orders DC instructions with program-order-previous reads/writes and other DCs to the same cache line (12,13).
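The external axiom's requirement that ob be irreflexive amounts to an acyclicity check on the union of the ordering relations. A small illustrative sketch (not the paper's tooling; the edges are hypothetical):

```python
# Illustrative sketch (not the paper's tooling): the external axiom requires
# ob = (obs|fob|dob|aob|bob|cob)+ to be irreflexive, i.e. the union of the
# underlying ordering relations must be acyclic.

def transitive_closure(r):
    """Transitive closure of a relation given as a set of pairs."""
    closure = set(r)
    while True:
        new = {(a, c) for (a, b) in closure for (b2, c) in closure if b == b2}
        if new <= closure:
            return closure
        closure |= new

def irreflexive(r):
    """True iff no element is related to itself."""
    return all(a != b for (a, b) in r)

# Hypothetical ordering edges forming a cycle e0 -> e1 -> e2 -> e0: a
# candidate execution with such a cycle is forbidden by the external axiom.
ob_edges = {("e0", "e1"), ("e1", "e2"), ("e2", "e0")}
print(irreflexive(transitive_closure(ob_edges)))  # False
```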

Finally, *could-fetch-from* (cff) (14) captures, for each fetch i, the writes it could have fetched from (including the one it did fetch from), which we use to define the *constrained unpredictable* axiom cff\_bad (not given) (15).

# 6 Validation

To gain confidence in the presented models we validated the models against the Arm architectural intent, against each other, and against real hardware.

Validation against the Architecture To ensure our models correctly captured the architectural intent we engaged in detailed discussions with Arm, including the Arm chief architect. These involved inventing litmus tests (including those described in §3 and many others) and discussing what the architecture should allow in each case.

Validating against hardware To run instruction-fetch tests on hardware, we extended the litmus tool [7]. The most significant extension handles code that can be modified, and thus has to be restored between experiments. To that end, copies of the code are executed; those copies reside in mmap'd memory with execute permission granted. Copies are made from "master" copies, in effect C functions whose contents basically consist of gcc extended inline assembly. Of course, such code has to be position independent, and explicit code addresses in test initialisation sections (such as in 0:X1=l in the test of §3.1) are specific to each copy. All the cache maintenance instructions used in our experiments are allowed to execute at exception level 0 (user mode), and therefore no additional privilege is needed to run the tests.

To automatically generate families of interesting instruction-fetch tests, we extended the diy test generation tool [3] to support instruction-fetch reads-from (irf) and instruction-fetch from-reads (ifr) edges, in both internal (same-thread) and external (inter-thread) forms, and the cachesync edge. We used this to generate 1456 tests involving those edges together with po, rf, fr, addr, ctrl, ctrlisb, and dmb.sy. diy does not currently support bare DC or IC instructions, locations which are both fetched and read from, or repeated fetches from the same location.

We then ran the diy-generated test suite on a range of hardware implementations, to collect a substantial sample of actual hardware behaviour.

Correspondence between the models We experimentally test the equivalence of the operational and axiomatic models on the above hand-written and diy-generated tests, checking that the models give the same sets of allowed final states, and that these are consistent with the hardware observations.

Making the models executable as a test oracle To make the operational model executable as a test oracle, capable of computing the set of all allowed executions of a litmus test, we must be able to *exhaustively enumerate* all possible traces. For the model as presented, doing this naively is infeasible: for each instruction it is theoretically possible to speculate any of the 2<sup>64</sup> addresses as potential next address, and the interleaving of the new fetch transitions with others leads to an additional combinatorial explosion.

We address these with two new optimisations. First, we extend the fixed-point optimisation in RMEM (incrementally computing the set of possible branch targets) [37] to keep track not only of indirect branches but also the successors of every program location, and only allow speculating from this set of successors. Additionally, we track during a test which locations were both fetched and modified during the test, and eagerly take fetch and decode transitions for all other locations. As before, the search then runs until the set of branch targets *and* the set of modified program-locations reaches a fixed point. We also take some of the transitions eagerly to reduce the search space, in cases where this cannot remove behaviour: Wait for IC, Complete IC, Fetch Request, and Update Instruction Cache.
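The fixed-point structure of this optimisation can be sketched generically (our own illustration under stated assumptions, not the RMEM code; the successor map below is hypothetical):

```python
# Illustrative sketch (our own illustration, not the RMEM code): the search is
# re-run until the set of discovered successors stops growing. A hypothetical
# successor map stands in for the incrementally discovered branch targets of
# each program location.

def fixed_point(f, x0):
    """Iterate a monotone set-transformer until the set stops growing."""
    x = x0
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical successors of each location, including one back-edge.
successors = {0x100: {0x104}, 0x104: {0x108, 0x100}, 0x108: set()}

# Only locations in this fixed point need be speculated as fetch addresses,
# rather than all 2^64 possible next addresses.
reachable = fixed_point(
    lambda s: s | {t for loc in s for t in successors.get(loc, set())},
    {0x100})
print(sorted(reachable))  # [256, 260, 264]
```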

Making the axiomatic model executable as a test oracle The axiomatic model is expressed in a herd-like form, but the herd tool does not support instruction fetch and cache maintenance instructions. To make the model executable as a test oracle, we built a new tool that takes litmus tests and uses a Sail [11] definition of a fragment of the ARMv8-A ISA to generate SMT problems for the model. Using the Sail instruction semantics, we generate a Sail program that corresponds to each thread within a litmus test. The tool then partially evaluates these programs using the concrete values for addresses and registers specified in the litmus file, while allowing memory values and arbitrary addresses to remain symbolic. Using a Sail to SMT-LIB backend, these are translated into SMT definitions that include all possible behaviours of each thread as satisfiable solutions. The rules for the axiomatic model are then applied as assertions restricting the possible behaviours to just those allowed by the axiomatic model. The tool also derives the addr and data relations, using the syntactic dependencies within the instruction semantics to derive the syntactic dependencies between instructions.

For litmus tests, where we can know up-front which instructions may be modified, we would like to avoid generating IF events for instructions that cannot be modified. If we naively removed certain IF events, however, we would break the correspondence between po and fe−1;fpo;fe. This can be worked around by ensuring that every modifiable instruction generates an event that appears in po, allowing fpo between the modifiable instructions to instead be derived as fe;po;fe−1. Branches emit a special branch address announce event for this purpose, which is also used to derive the ctrl relation. The fob relation can then be modified, replacing [ISB];fe−1;fpo with [ISB];po;fe−1 and adding [ISB];po. The second change ensures that all the transitive edges generated by [ISB];fe−1;fpo followed by [IF];fe remain within fob and hence ob.

A limitation of this approach is that it cannot support cases where two threads both attempt to execute the same possibly-modified instruction, as in the SM.F+ic and FOW tests.

Validation results First, to check for regressions, we ran the operational model on all the 8950 non-mixed-size tests used for developing the original Flat model (without instruction fetch or cache maintenance). The results are identical, except for 23 tests which did not terminate within two hours. We used a 160 hardware-thread POWER9 server to run the tests.

We have also run the axiomatic model on the 90 basic two-thread tests that do not use Arm release/acquire instructions (not supported by the ISA semantics used for this); the results are all as they should be. This takes around 30 minutes on 8 cores of a Xeon Gold 6140.

Then, for the key handwritten tests mentioned in this paper, together with some others (that have also been discussed with Arm), we ran them on various hardware implementations and in the operational and axiomatic models. The models' results are identical to the Arm architectural intent in all cases, except for two tests which are not currently supported by the axiomatic checker.


[The hardware observations are the sum of testing seven devices: a Snapdragon 810 (4x Arm A53 + 4x Arm A57 cores), Tegra K1 (2x NVIDIA Denver cores), Snapdragon 820 (4x Qualcomm Kryo cores), Exynos 8895 (4x Arm A53 + 4x Samsung Mongoose 2 cores), Snapdragon 425 (4x Arm A53), Amlogic 905 (4x Arm A53 cores), and Amlogic 922X (4x Arm A73 + 2x Arm A53 cores). U: allowed but unobserved. F: forbidden but observed.]

Our testing revealed a hardware bug in a Snapdragon 820 (4 Qualcomm Kryo cores). A version of the first cross-thread synchronisation test of §3.3 but with the full cache synchronisation (MP.RF+cachesync+ctrl-isb) exhibited an illegal outcome in 84/1.1G runs (not shown in the table), which we have reported. We have also seen an anomaly for MP.FF+cachesync+fpo, currently under investigation by Arm. Apart from these, the hardware observations are all allowed by the models. As usual, specific hardware implementations are sometimes stronger.

Finally, we ran the 1456 new instruction-fetch diy tests on a variety of hardware, for around 10M iterations each, and in the operational model. The model is sound with respect to the observed hardware behaviour except for that same Snapdragon 820 device.

# 7 Related Work

To the best of our knowledge, no previous work establishes well-validated rigorous semantics for any systems aspects of any current production architecture, in a realistic concurrent setting.

The closest is Raad et al.'s work on non-volatile memory, which models the required cache maintenance for persistent storage in ARMv8-A [39], as an extension to the ARMv8-A axiomatic model, and for Intel x86 [38] as an operational model, but neither is validated against hardware. In the sequential case, Myreen's JIT compiler verification [33] models x86 icache behaviour with an abstract cache that can be arbitrarily updated, cleared on a jmp. For address translation, the authoritative Arm-internal ASL model [40,41,42], and the Sail model derived from it [11], cover this and other features sufficient to boot an OS (Linux), as do the handwritten Sail models for RISC-V (Linux and FreeBSD) and MIPS/CHERI-MIPS (FreeBSD, CheriBSD), but without any cache effects. Goel et al. [21,20] describe an ACL2 model for much of x86 that covers address translation; and the Forvis [34] and RISCV-PLV [14] Haskell RISC-V ISA models are also complete enough to boot Linux. Syeda and Klein [49,50] provide a somewhat idealised model for ARMv7 address translation and TLB maintenance. Komodo [16] uses a handwritten model for a small part of ARMv7, as do Guanciale et al. [25,12]. Romanescu et al. [44,43] do discuss address translation in the concurrent setting, but with respect to idealised models. Lustig et al. [30] describe a concurrent model for address translation based on the Intel Sandy Bridge microarchitecture, combined with a synopsis of some of the relevant Linux code, but not an architectural semantics for machine-code programs.

# 8 Conclusion

The mainstream architectures are the most important programming languages used in practice, and their systems aspects are fundamental to the security (or lack thereof) of our computing infrastructure. We have established a robust semantics for one of those systems aspects, soundly abstracting the hardware complexities to a manageable model that captures the architectural intent. This enables future work on reasoning, model-checking, and verification for real systems code.

Acknowledgements This work would not have been possible without generous technical assistance from Arm. We thank Richard Grisenthwaite, Will Deacon, Ian Caulfield, and Dave Martin for this. We also thank Hans Boehm, Stephen Kell, Jaroslav Ševčík, Ben Titzer, and Andrew Turner, for discussions of how instruction cache maintenance is used in practice, and Alastair Reid for comments on a draft. This work was partially supported by EPSRC grant EP/K008528/1 (REMS), ERC Advanced Grant 789108 (ELVER), an ARM iCASE award, and ARM donation funding. This work is part of the CIFV project sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8650-18-C-7809. The views, opinions, and/or findings contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Department of Defense or the U.S. Government.

# References



Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Higher-Ranked Annotation Polymorphic Dependency Analysis**

Fabian Thorand and Jurriaan Hage

Dept. of Information and Computing Sciences, Utrecht University, The Netherlands f.thorand@gmail.com, j.hage@uu.nl

**Abstract.** The precision of a static analysis can be improved by increasing the context-sensitivity of the analysis. In a type-based formulation of static analysis for functional languages this can be achieved by, e.g., introducing let-polyvariance or subtyping. In this paper we go one step further by defining a higher-ranked polyvariant type system so that even properties of lambda-bound identifiers can be generalized over. We do this for dependency analysis, a generic analysis that can be instantiated to a range of different analyses that in this way all can profit.

We prove that our analysis is sound with respect to a call-by-name semantics and that it satisfies a so-called noninterference property. We provide a type reconstruction algorithm that we have proven to be terminating, and sound and complete with respect to its declarative specification. Our principled description can serve as a blueprint for making other analyses higher-ranked.

# **1 Introduction**

The typical compiler for a statically typed functional language will perform a number of analyses for validation, optimisation, or both (e.g., strictness analysis, control-flow analysis, and binding time analysis). These analyses can be specified as a type-based static analysis so that vocabulary, implementation and concepts from the world of type systems can be reused in this setting [19,24]. In that setting the analysis properties are taken from a language of annotations which adorn the types computed for the program during type inference: the analysis is specified as an annotated type system, and the payload of the analysis corresponds to the annotations computed for a given program.

Consider for example binding-time analysis [5,7]. In this case, we have a two-value lattice of annotations containing S for static and D for dynamic (where ⊥ = S ⊑ D = ⊤, so that whenever an expression is annotated with S, it can be soundly changed to D, because D is a strictly weaker property). An expression that is known to be static may be evaluated at compile time, because the analysis has determined that all the values that determine its outcome are in fact available at compile time; all other expressions are annotated with D, and must be evaluated at run time. The goal of binding-time analysis is then to (soundly) assign S to as many expressions as possible.

Static analyses may differ in precision, e.g., a monovariant binding-time analysis lacks context-sensitivity for let-bound identifiers (although some of it can be recovered with subtyping). Assuming *id* to be the identity function, if in the program

**let** *id x* = *x* **in** . . *id s* . . *id d* . .

the subexpression *s* is a statically known integer, which we denote as *s* : int⟨S⟩, and *d* : int⟨D⟩ a dynamic integer, then for *id* we arrive at int⟨D⟩ → int⟨D⟩, so that the property found for *id s* is that it is a dynamic integer. Clearly, however, if the value of *s* is known statically, then so is that of *id s*! The fact that values with different properties flow to a function, forcing us to be (overly) pessimistic for some of them, is a phenomenon sometimes called *poisoning* [28]. Context-sensitivity reduces poisoning; it can be achieved by making the analysis *polyvariant*. In that case, our type for *id* may become ∀β.int⟨β⟩ → int⟨β⟩, so that for the first call to *id* we may instantiate β with S and for the second choose D, essentially mimicking the polymorphic lambda calculus at the level of annotations.
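The difference between the monovariant and polyvariant analysis of *id* can be illustrated concretely. In this sketch (our own encoding, not the paper's algorithm), annotations are the strings "S" and "D":

```python
# Illustrative sketch (our own encoding, not the paper's algorithm):
# annotations are "S" (static) and "D" (dynamic); join is the least upper
# bound in the two-point lattice with S below D.

def join(a, b):
    """Least upper bound: the result is dynamic if either input is."""
    return "D" if "D" in (a, b) else "S"

# The two call sites of id: `id s` with a static int, `id d` with a dynamic one.
call_sites = ["S", "D"]

# Monovariant: one annotation for id, joined over all call sites (poisoning).
mono = "S"
for ann in call_sites:
    mono = join(mono, ann)
mono_results = [mono for _ in call_sites]   # both results poisoned to "D"

# Polyvariant: id : forall b. int<b> -> int<b>, with b instantiated per call.
poly_results = [ann for ann in call_sites]  # "S" for id s, "D" for id d

print(mono_results, poly_results)  # ['D', 'D'] ['S', 'D']
```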

But what about a function like

*foo* = λ*f* .(*f d*, *f s*)

in which we have two calls to a lambda-bound function argument *f* ? Can we treat these context-sensitively as well, so that we can have the most precise types for both calls, independent of each other? The answer is: yes, we can.

Independence can be achieved by inferring for *foo* a type that associates with *f* an annotation polymorphic type,

$$
\forall \beta_1.\ (\forall \beta_0.\ \text{int}\langle \beta_0 \rangle \to \text{int}\langle \beta_1\ \beta_0 \rangle)
$$

Here, β₀ ranges over simple annotations (such as S and D), and β₁ ranges over annotation-level functions (in the terminology of this paper, these annotations are higher-sorted; see section 3). The annotation variable β₀ is a placeholder for the analysis property of the actual argument to *f*, while β₁ represents how that property propagates to the value returned by *f*. If the identity function ∀β.int⟨β⟩ → int⟨β⟩ is passed to *foo*, a pair with annotated type int⟨D⟩ × int⟨S⟩ will be returned: the types of *f d* and *f s* can be determined independently of each other, because the choice for β₀ can be made separately for each call. The "price" we pay is that we have to know how the annotations on the values returned by *f* can be derived from the annotations on the arguments. This is exactly what β₁ represents.
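The role of a higher-sorted annotation as a function on annotations can be made concrete. In this sketch (our own encoding, not the paper's implementation), annotations are strings and β₁ is an ordinary function:

```python
# Illustrative sketch (our own encoding, not the paper's implementation):
# annotations are strings, and a higher-sorted annotation such as beta_1 is an
# ordinary function from annotations to annotations.
S, D = "S", "D"

# id : forall b. int<b> -> int<b>; its annotation transformer is the identity.
id_annotation = lambda b0: b0

def foo(beta1):
    """Annotation-level reading of foo = \\f. (f d, f s): apply the argument's
    annotation transformer beta1 to each call's argument annotation."""
    return (beta1(D), beta1(S))

# Passing id yields int<D> x int<S>: the two calls are analysed independently.
print(foo(id_annotation))  # ('D', 'S')
```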

If β₀ or β₁ ranged over (annotated) types, then the underlying language itself would be higher-ranked, and inference in that case is known to be undecidable [14]. However, as we show in this paper, if they range only over annotations (even higher-sorted ones), then inference remains decidable. Why is that? Intuitively, this is because the underlying types provide structure to the analysis inference algorithm, while a higher-ranked polymorphic type system does not have this advantage.

In which situations can we expect to benefit from higher-ranked polyvariance? Generally speaking, this is when we have functions of order 2 and higher, functions that often show up in idiomatic functional code.

Languages like Haskell do support higher-rank types [13]. Decidability is not problematic then, because the compiler expects the programmer to provide the higher-rank type signatures where necessary, and the compiler only needs to verify that the provided types are consistent: type checking *is* decidable. In our situation this is typically not acceptable: we cannot expect programmers to provide explicit control-flow [12] or binding-time information. So we have to insist on full inference of analysis information, and this paper shows how this can be done for dependency analysis [1].

Dependency analysis is in fact a family of analyses; instances include binding-time analysis, exception analysis, secure information flow analysis and static slicing. The precision of our higher-ranked polyvariant annotated type system for dependency analysis thereby carries over immediately to these instances, and metatheoretical properties we prove, like a noninterference theorem [8], need to be proven only once.

In summary, this paper offers the following *contributions*. We (1) define a higher-ranked annotation polymorphic type system for a generic dependency analysis (section 4) for a call-by-name language that takes its annotations from a simply typed lambda calculus enriched with lattice operations (section 3). The analysis also supports polyvariant recursion [10] to improve precision for certain recursive functions. Due to the principled way in which the analysis is set up, it can serve as a blueprint for giving other analyses the same treatment. We (2) prove our system sound with respect to a call-by-name operational semantics. We also formulate and prove a noninterference theorem for our system (section 5). We (3) give a type reconstruction algorithm that is sound and complete with respect to the type system (section 6) and provide a prototype implementation (section 7). For reasons of space we omit many details that are available in a separate document [26].

# **2 Intuition and motivation**

Before we go on to the technical details of this paper, we want to elaborate upon our intuitive description from the introduction. We do this by means of a few small examples, keeping the discussion informal. Formally discussed examples, as generated by our implementation, become big and hard to read pretty quickly; these can be found in section 7.

We start with a few examples in which binding-time analysis is the dependency analysis instance, followed by a few examples that use security flow analysis; our implementation supports both instances. We note that our implementation supports a few more language constructs than the formal specification given in this paper, giving us a bit more flexibility. Neither, however, supports polymorphism at the type level. This substantially simplifies the technicalities.

For the following example

$$foo : ((\text{int} \rightarrow \text{int}) \rightarrow \text{int}) \rightarrow \text{int} \times \text{int}$$
$$foo = \lambda f : (\text{int} \rightarrow \text{int}) \rightarrow \text{int}.\ (f\ (\lambda x : \text{int}.\ x),\ f\ (\lambda x : \text{int}.\ 0))$$

our analysis can derive a higher-ranked polyvariant type for *f* ,

$$
\forall \beta_1.\ (\forall \beta_2.\ \text{int}\langle \beta_2 \rangle \to \text{int}\langle \beta_1\ \beta_2 \rangle) \to \text{int}\langle \beta_3\ \beta_2\ \beta_1 \rangle
$$

where β₁ and β₂ can be instantiated independently for each of the two calls to *f* in *foo*, and β₃ is universally bound by *foo* and represents how the argument *f* uses its function argument.

Since the argument to *f* is itself a function, the information that flows out of, say, the first call to *f* can be independent of the analysis of the function that flows into the second call (and vice versa), thereby avoiding unnecessary poisoning. This means that the binding-time of, say, the second component of the pair depends only on *f* and the function λ*x* : int.0, irrespective of *f* also receiving λ*x* : int.*x* as argument to compute the first component.

For the next example, let us consider security flow analysis, in which we have annotations L and H that designate values (call these L-values and H-values) of low and high confidentiality, respectively. An important scenario where additional precision can be achieved is when analyzing Haskell code in which type classes have been desugared to a dictionary-passing functional core. A function like

$$g \ x \ y = (x + y, y + y)$$

is then transformed into something like *g* (+) *x y* = (*x* + *y*, *y* + *y*). Now, consider the case that we pass an H-value to *x* and an L-value to *y*; the operator (+) produces an L-value if and only if both arguments are L-values. Without higher-ranked annotations, the annotation on the first argument to (+) has to be consistent with all uses of (+). Because *x* is an H-value, the same then also holds for the second call to (+), leading to a pair of values of which both components are H-values. With higher-ranked annotations, we can instantiate the two uses independently, and the second component of the pair is analyzed to produce an L-value. Functions in Haskell that use type classes are extremely common, so this added precision matters in practice.

# **3 The λ⊔-calculus**

An essential ingredient of our annotated type system is the language of annotations that we use to decorate our types and to represent the dependencies resulting from evaluating an expression. Indeed, the fact that annotations are themselves "programs" in a lambda calculus is what allows us to make our analysis a higher-ranked polyvariant one. For the purpose of this paper, we generalize the λ∪-calculus of [16] to the λ⊔-calculus (λ⊔ for short), a simply typed lambda calculus extended with a lattice structure.

The syntax of λ⊔ is given in figure 1; from now on, we refer to its types exclusively as *sorts*. Here, κ ranges over sorts, β over annotation variables, and ξ over annotation terms.

```
κ ∈ AnnSort ::= ⋆                (base sort)
              | κ₁ ⇒ κ₂          (function sort)
β ∈ AnnVar                       (annotation variables)
ξ ∈ AnnTm   ::= β                (variable)
              | λβ :: κ. ξ       (abstraction)
              | ξ₁ ξ₂            (application)
              | ℓ                (lattice value, ℓ ∈ L)
              | ξ₁ ⊔ ξ₂          (lattice join operation)
```

Fig. 1: The syntax of the λ⊔-calculus, sorts and annotations

In order to avoid confusion with the field of (algebraic) effects, we refer to terms of λ⊔ as *dependency terms* or *dependency annotations*. Terms are either of base sort ⋆, representing values in the underlying lattice L, or of function sort κ₁ ⇒ κ₂.

On the term level, we allow arbitrary elements of the underlying lattice and binary joins, in addition to the usual variables, function applications and lambda abstractions. Lattice elements are assumed to be taken from a *bounded join-semilattice* 𝓛: an algebraic structure consisting of an underlying set L, an associative, commutative and idempotent binary operation ⊔, called *join* (we usually write ℓ ∈ 𝓛 for ℓ ∈ L), and a least element ⊥.
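For concreteness, the bounded join-semilattice interface can be sketched in Python (illustrative only; the class and names are ours, not part of the formalism). The powerset of any set of atoms is one standard instance, with union as join and the empty set as ⊥:

```python
# A bounded join-semilattice: an associative, commutative and
# idempotent `join` with a least element `bot`.
class PowersetLattice:
    def __init__(self, atoms):
        self.atoms = frozenset(atoms)
        self.bot = frozenset()      # ⊥: the empty set of atoms

    def join(self, a, b):
        return a | b                # ⊔: set union

lat = PowersetLattice({"alice", "bob"})
a = frozenset({"alice"})
b = frozenset({"bob"})
```

The semilattice laws fall out of the properties of set union: join is idempotent, commutative, and has ⊥ as its neutral element.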

The sorting rules of λ⊔ are straightforward (see [26]). Values of the underlying lattice are always of sort ★, and the join operator is defined on arbitrary terms of the same sort:

$$\frac{\Sigma \vdash_{\mathsf{s}} \xi_1 : \kappa \qquad \Sigma \vdash_{\mathsf{s}} \xi_2 : \kappa}{\Sigma \vdash_{\mathsf{s}} \xi_1 \sqcup \xi_2 : \kappa}\ [\mathsf{S\text{-}Join}]$$

The sorting rule uses *sort environments* denoted by the letter Σ that map annotation variables β to sorts κ. We denote the set of sort environments by **SortEnv**. More precisely, a *sort environment* or *sort context* Σ is a finite list of bindings from annotation variables β to sorts κ. The empty context is written as ∅ (in code as []), and the context Σ extended with the binding of the variable

$$\begin{aligned}
V_{\star} &= \mathcal{L} \\
V_{\kappa_1 \Rightarrow \kappa_2} &= \{ f : V_{\kappa_1} \to V_{\kappa_2} \mid f \text{ monotone} \} \\
\rho &: \mathbf{AnnVar} \to_{\mathrm{fin}} \bigcup \{ V_{\kappa} \mid \kappa \in \mathbf{AnnSort} \}
\end{aligned}$$

$$\begin{aligned}
[\![\beta]\!]_{\rho} &= \rho(\beta) \\
[\![\lambda\beta :: \kappa_1.\,\xi]\!]_{\rho} &= \lambda v \in V_{\kappa_1}.\ [\![\xi]\!]_{\rho[\beta \mapsto v]} \\
[\![\xi_1\ \xi_2]\!]_{\rho} &= [\![\xi_1]\!]_{\rho}\,([\![\xi_2]\!]_{\rho}) \\
[\![\ell]\!]_{\rho} &= \ell \\
[\![\xi_1 \sqcup \xi_2]\!]_{\rho} &= [\![\xi_1]\!]_{\rho} \sqcup [\![\xi_2]\!]_{\rho}
\end{aligned}$$

Fig. 2: The semantics of the λ⊔-calculus

β to the sort κ is written Σ, β : κ. We denote the set of annotation variables in the context Σ by dom(Σ). When we write Σ(β) = κ, this means that β ∈ dom(Σ) and the rightmost occurrence of β binds it to κ. Moreover, Σ \ B, where B ⊆ **AnnVar**, denotes the context Σ with all bindings of annotation variables in B removed. In the remainder of this paper, we overload this notation for the other kinds of environments we shall need, including type environments and annotated type environments.
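The sorting judgment Σ ⊢s ξ : κ, with the [S-Join] rule above and the rightmost-binding lookup just described, can be sketched as a checker (Python for illustration; the tuple encoding and names are ours):

```python
STAR = "star"
def arr(k1, k2): return ("arr", k1, k2)

def var(b): return ("var", b)
def lam(b, k, t): return ("lam", b, k, t)
def app(t, u): return ("app", t, u)
def con(l): return ("con", l)
def join_tm(t, u): return ("join", t, u)

def sort_of(env, t):
    """Return the sort of t under env (a list of (β, κ) pairs), or None."""
    tag = t[0]
    if tag == "var":
        for b, k in reversed(env):      # rightmost occurrence binds
            if b == t[1]:
                return k
        return None
    if tag == "lam":                    # λβ :: κ. ξ has sort κ ⇒ κ'
        body = sort_of(env + [(t[1], t[2])], t[3])
        return arr(t[2], body) if body is not None else None
    if tag == "app":
        kf, ka = sort_of(env, t[1]), sort_of(env, t[2])
        if kf is not None and kf[0] == "arr" and kf[1] == ka:
            return kf[2]
        return None
    if tag == "con":                    # lattice values have sort ★
        return STAR
    if tag == "join":                   # [S-Join]: both sides share κ
        k1, k2 = sort_of(env, t[1]), sort_of(env, t[2])
        return k1 if k1 is not None and k1 == k2 else None
```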

The λ⊔-calculus enjoys a number of properties, many of which are what one might expect; we have put these and their proofs in [26].

A *substitution* is a map from variables to terms usually denoted by the letter θ. The application of a substitution θ to a term ξ is written θξ and replaces all free variables in ξ that are also in the domain of θ with the corresponding terms they are mapped to. A concrete substitution replacing the variables β1,...,β<sup>n</sup> with terms ξ1,...,ξ<sup>n</sup> is written [ξ1/β1,...,ξn/βn].
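A substitution as just defined can be sketched as follows (Python for illustration; the tuple encoding is ours, and the sketch assumes binders have been renamed apart so no capture occurs):

```python
def var(b): return ("var", b)
def lam(b, k, t): return ("lam", b, k, t)
def app(t, u): return ("app", t, u)
def con(l): return ("con", l)
def join_tm(t, u): return ("join", t, u)

def subst(theta, t):
    """θξ: replace free variables of ξ in the domain of θ."""
    tag = t[0]
    if tag == "var":
        return theta.get(t[1], t)
    if tag == "lam":
        # The binder shadows any binding of the same name in θ.
        inner = {b: s for b, s in theta.items() if b != t[1]}
        return ("lam", t[1], t[2], subst(inner, t[3]))
    if tag in ("app", "join"):
        return (tag, subst(theta, t[1]), subst(theta, t[2]))
    return t  # lattice constants are unchanged
```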

Assuming the usual definitions for the pointwise extension of a lattice L and for monotone (order-preserving) functions between lattices, Figure 2 shows the denotational semantics of λ⊔, where we employ the pointwise lifting of ⊔ to functions to give semantics to the join of λ⊔. The universe V_κ denotes the lattice represented by the sort κ. The base sort ★ represents the underlying lattice L, and the function sort κ₁ ⇒ κ₂ represents the lattice constructed by pointwise extension of the lattice V_κ₂, restricted to monotone functions.

The denotation function ⟦·⟧_ρ is parameterized by an environment ρ of the given type that provides the values of variables. The denotation of a lambda term is simply an element of the corresponding function space. Applications are therefore mapped directly to function application in the metatheory. This is unlike the λ∪-calculus of [16], where lambda terms are mapped to singleton sets of functions and function application is defined in terms of the union of the results of individually applying each function. The crucial difference is that we have offloaded this complexity into the definition of the pointwise extension of lattices. It is therefore important to note that the join operator used in the denotation of a term ξ₁ ⊔ ξ₂ depends on the sort κ of this term and belongs to the lattice V_κ.
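The denotation of figure 2 can be sketched over the two-point lattice L ⊑ H, with function-sort values represented as closures and ⊔ lifted pointwise (Python for illustration; the encoding and names are ours):

```python
L, H = "L", "H"

def join_level(a, b):
    return L if (a, b) == (L, L) else H

def join_val(v, w):
    # Pointwise lifting of ⊔ to the function sorts.
    if callable(v) and callable(w):
        return lambda x: join_val(v(x), w(x))
    return join_level(v, w)

def var(b): return ("var", b)
def lam(b, t): return ("lam", b, t)
def app(t, u): return ("app", t, u)
def con(l): return ("con", l)
def join_tm(t, u): return ("join", t, u)

def eval_tm(env, t):
    """⟦t⟧_ρ where env plays the role of ρ."""
    tag = t[0]
    if tag == "var":
        return env[t[1]]
    if tag == "lam":
        return lambda v, b=t[1], body=t[2], e=dict(env): \
            eval_tm({**e, b: v}, body)
    if tag == "app":
        return eval_tm(env, t[1])(eval_tm(env, t[2]))
    if tag == "con":
        return t[1]
    if tag == "join":
        return join_val(eval_tm(env, t[1]), eval_tm(env, t[2]))
```

For instance, (λβ. β ⊔ L) H denotes H ⊔ L = H, and the join of two closures is itself a closure that joins the results pointwise.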

An environment ρ : **AnnVar** →fin ⋃ {V_κ | κ ∈ **AnnSort**} and a sort environment Σ are *compatible* if dom(Σ) = dom(ρ) and for all β ∈ dom(Σ) we have ρ(β) ∈ V_Σ(β). Given two dependency terms ξ₁ and ξ₂ and a sort κ such that Σ ⊢s ξ₁ : κ and Σ ⊢s ξ₂ : κ, we say that ξ₂ *subsumes* ξ₁ under the environment Σ, written Σ ⊢ ξ₁ ⊑ ξ₂, if for all environments ρ compatible with Σ, we have ⟦ξ₁⟧_ρ ⊑ ⟦ξ₂⟧_ρ. They are *semantically equal* under Σ, written Σ ⊢ ξ₁ ≡ ξ₂, if for all environments ρ compatible with Σ, we have ⟦ξ₁⟧_ρ = ⟦ξ₂⟧_ρ.

# **4 The declarative type system**

The types and syntax of our source language are given in figure 3. The types of our source language consist of a unit type, and product, sum and function types. As mentioned earlier, let-polymorphism at the type level is not part of the


Fig. 3: The types and terms of the source language

type system. The language itself is then hardly surprising and includes variables, a unit constant, lambda abstraction, function application, projection functions for product types, sum constructors, a sum eliminator (case), fixpoints, seq for explicitly forcing evaluation in our call-by-name language, and, finally, a special operation ann_ℓ(*t*) that raises the annotation level of *t* to ℓ. We omit the underlying type system for the source language since it consists mostly of the standard rules (see [26]). A notable exception is the rule for ann_ℓ(*t*). Such an explicitly annotated term has the same underlying type as *t*:

$$\frac{\Gamma \vdash t : \tau}{\Gamma \vdash \mathrm{ann}_{\ell}(t) : \tau}\ [\text{U-Ann}]$$

The annotation ℓ imposed on t only becomes relevant in the annotated type system that we discuss next. In the following, we assume the usual definition for computing the set of free term variables of a term, ftv(t).
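For concreteness, ftv can be sketched over a tuple encoding of the source terms of figure 3 (Python for illustration; the tags are ours, and type indices on the injections are omitted):

```python
def ftv(t):
    """Free term variables ftv(t) of a source term."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "unit":
        return set()
    if tag in ("lam", "fix"):            # ("lam", x, ty, body), ("fix", x, ty, body)
        return ftv(t[3]) - {t[1]}
    if tag == "case":                    # ("case", t0, x, t1, y, t2)
        return ftv(t[1]) | (ftv(t[3]) - {t[2]}) | (ftv(t[5]) - {t[4]})
    if tag in ("app", "pair", "seq"):
        return ftv(t[1]) | ftv(t[2])
    if tag in ("proj1", "proj2", "inl", "inr"):
        return ftv(t[1])
    if tag == "ann":                     # ("ann", level, t)
        return ftv(t[2])
    raise ValueError("unknown term: " + tag)
```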

**The annotated type system** The source language is simply a desugared variant of the functional language a programmer deals with. The target language has the same structure, but adds dependency annotations to the source syntax. These annotations are the payload of the dependency analysis and are computed by the algorithm given in section 6, so that the analysis results can be employed in the back-end of a compiler. In other words, the algorithm *elaborates* a source level term into a target term.

The syntax of the target language is shown in figure 4. *Annotated types* of the target language are denoted by τ and *annotated terms* by t. The annotations that we put on compound types, as well as on their components, are not there just for uniformity. Because of our non-strict semantics and the


Fig. 4: The annotated types and terms of the target language

presence of seq, we can observe the effects on a pair constructor independently of its values, so we have separate annotations to represent these.

On the type level, there is an additional construct <sup>∀</sup><sup>β</sup> :: κ.τ quantifying over an annotation variable β of sort κ. Furthermore, the recursive occurrences in the sum, product and arrow types now each carry an annotation. On the term level, the explicit type annotations of lambda expressions and fixpoints are now annotated types and also include a dependency annotation. Moreover, dependency abstraction and application have been added to reflect the quantification of dependency variables on the type level. We denote the set of free (term) variables in a target term *<sup>t</sup>* by ftv(*t*).

The formal definition of well-formedness for annotated types can be found in [26]. Informally, a type is well-formed only if all annotations are of sort ★ and all annotation variables that are used have previously been bound.

Below, we assume the unsurprising recursive definitions for computing the underlying terms *t* and underlying types τ that correspond to annotated terms *<sup>t</sup>* and annotated types <sup>τ</sup>. We also straightforwardly extend the definition of free annotation variables to annotated types, and denote these by fav(τ).

**Subtyping** To define subtyping we need an auxiliary relation that says when two annotated types τ₁ and τ₂ *have the same shape*. The unsurprising formal definition is in [26]; essentially, they must have the same syntactic structure and, in the forall case, quantify over the same annotation variable. It is easy to prove that if two types have the same shape, then they have the same underlying type. The converse does not hold: the annotated types ∀β₁.∀β₂. int⟨β₁⟩ → int⟨β₁ ⊔ β₂⟩ and ∀β₁. int⟨β₁⟩ → int⟨β₁⟩ have the same underlying type, int → int, but do not have the same shape.
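The shape check can be sketched as follows (Python for illustration; the tuple encoding of annotated types is ours): annotations are ignored, while quantifiers must agree on both the variable and its sort.

```python
def same_shape(t1, t2):
    """Structural equality of annotated types, ignoring annotations."""
    if t1[0] != t2[0]:
        return False
    tag = t1[0]
    if tag == "unit":
        return True
    if tag in ("prod", "sum", "arr"):
        # ("prod", left, xi1, right, xi2): the ξ's at indices 2 and 4
        # are exactly what "shape" abstracts away from.
        return same_shape(t1[1], t2[1]) and same_shape(t1[3], t2[3])
    if tag == "forall":
        # ("forall", beta, kappa, body): same variable, same sort.
        return t1[1] == t2[1] and t1[2] == t2[2] and same_shape(t1[3], t2[3])
    return False
```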

Figure 5 shows the rules defining the subtyping relation on annotated types of the same shape, that allows us to weaken the annotations on a type to a less demanding one. Intuitively, a type <sup>τ</sup><sup>1</sup> is a subtype of <sup>τ</sup><sup>2</sup> under a sort environment

$$\frac{}{\Sigma \vdash \widehat{\tau} \sqsubseteq \widehat{\tau}}\ [\text{Sub-Refl}]
\qquad
\frac{\Sigma \vdash \widehat{\tau}_1 \sqsubseteq \widehat{\tau}_2 \qquad \Sigma \vdash \widehat{\tau}_2 \sqsubseteq \widehat{\tau}_3}{\Sigma \vdash \widehat{\tau}_1 \sqsubseteq \widehat{\tau}_3}\ [\text{Sub-Trans}]$$

$$\frac{\Sigma, \beta :: \kappa \vdash \widehat{\tau}_1 \sqsubseteq \widehat{\tau}_2}{\Sigma \vdash \forall\beta :: \kappa.\,\widehat{\tau}_1 \sqsubseteq \forall\beta :: \kappa.\,\widehat{\tau}_2}\ [\text{Sub-Forall}]$$

$$\frac{\Sigma \vdash \xi_1 \sqsubseteq \xi'_1 \qquad \Sigma \vdash \widehat{\tau}_1 \sqsubseteq \widehat{\tau}'_1 \qquad \Sigma \vdash \xi_2 \sqsubseteq \xi'_2 \qquad \Sigma \vdash \widehat{\tau}_2 \sqsubseteq \widehat{\tau}'_2}{\Sigma \vdash \widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle \sqsubseteq \widehat{\tau}'_1\langle\xi'_1\rangle \times \widehat{\tau}'_2\langle\xi'_2\rangle}\ [\text{Sub-Prod}]$$

$$\frac{\Sigma \vdash \xi'_1 \sqsubseteq \xi_1 \qquad \Sigma \vdash \widehat{\tau}'_1 \sqsubseteq \widehat{\tau}_1 \qquad \Sigma \vdash \xi_2 \sqsubseteq \xi'_2 \qquad \Sigma \vdash \widehat{\tau}_2 \sqsubseteq \widehat{\tau}'_2}{\Sigma \vdash \widehat{\tau}_1\langle\xi_1\rangle \to \widehat{\tau}_2\langle\xi_2\rangle \sqsubseteq \widehat{\tau}'_1\langle\xi'_1\rangle \to \widehat{\tau}'_2\langle\xi'_2\rangle}\ [\text{Sub-Arr}]$$

Fig. 5: Subtyping relation (Σ ⊢ τ₁ ⊑ τ₂); [Sub-Sum] is like [Sub-Prod]

Σ, written Σ ⊢ τ₁ ⊑ τ₂, if a value of type τ₁ can be used wherever a value of type τ₂ is required. The subtyping relation only relates the annotations inside the types, using the subsumption relation Σ ⊢ ξ₁ ⊑ ξ₂ between dependency terms. Moreover, the subtyping relation implicitly demands that both types are well-formed under the environment. The [Sub-Forall] rule requires that the quantified variable has the same name in both types. This is not a restriction, as we can simply rename the variables in one or both of the types to make them match, preventing unintentional capture of previously free variables. Note that [Sub-Arr] is contravariant in argument positions. We omitted [Sub-Sum], which can be derived from [Sub-Prod] by replacing × with +.

**The annotated type rules** An *annotated type environment* <sup>Γ</sup> is defined analogously to sort environments, but instead maps term variables x to pairs of an annotated type <sup>τ</sup> and a dependency term <sup>ξ</sup>. We extend the definition of the set of free annotation variables to annotated environments by taking the union of the free annotation variables of all annotated types and dependency terms occurring in the environment, denoted by fav(Γ). We denote the set of annotated type environments by **AnnTyEnv**.

We now have all the definitions in place to define the declarative annotated type system shown in figure 6. It consists of judgments of the form Σ | Γ ⊢te t : τ & ξ, expressing that under the sort environment Σ and the annotated type environment Γ, the annotated term t has the annotated type τ and the dependency term ξ. The dependency term in this context is also called the *dependency* of t.¹ It is implicitly assumed that every type τ is well-formed under Σ, i.e. Σ ⊢wft τ, and that the resulting dependency annotation ξ is of sort ★, i.e. Σ ⊢s ξ : ★.

We now discuss some of the more interesting rules of figure 6. In [T-Var], both the annotated type and the dependency annotation are looked up in the environment. The dependency annotation of the unit value defaults to the least annotation in [T-Unit]. While we could admit an arbitrary dependency annotation here, the same can be achieved by using the subtyping rule [T-Sub]. We employ this principle more often, e.g., in [T-Abs], and [T-Pair]. This essentially means that the context in which such a term is used completely determines the annotation.

The rule [T-App] may seem overly restrictive by requiring that the types and dependency annotations of the arguments match, and that the dependency annotations of the return value and the function itself are the same. However, in combination with the subtyping rule [T-Sub], this effectively does not restrict the analysis in any way. We see the same happening in other rules, such as [T-Case] and [T-Proj]. Note that the dependency annotation of the argument does not play a role in the resulting dependency annotation of the application. This is because we are dealing with a call-by-name semantics, which means that the argument is not necessarily evaluated before the function call. This does not mean that the dependency annotations of arguments are ignored completely: if the body of a function makes use of an argument, the type system makes sure that its dependency annotation is also incorporated into the result.

When constructing a pair (rule [T-Pair]), the dependency annotations of the components are stored in the type while the pair itself is assigned the least dependency annotation. When accessing a component of a pair (rule [T-Proj]), we require that the dependency annotation of the pair matches the dependency annotation of the projected component. Again, this is no restriction due to the subtyping rule.

In [T-Inl/Inr], the argument to the injection constructor only determines the type and annotation of one component of the sum type, while the other component can be chosen arbitrarily as long as the underlying type matches the annotation on the constructor. The destruction of sum types happens in a case statement handled by rule [T-Case]. Again, to keep the rule simple, and without loss of precision due to judicious use of rule [T-Sub], we demand that the types of both branches match, and that additionally the dependency annotations of both branches and the scrutinee are equal.

The annotation rule [T-Ann] requires that the dependency annotation of the term being annotated is at least as large as the lattice element ℓ. In the fixpoint rule, [T-Fix], not only the types but also the dependency annotations of the term itself and the bound variable must match. Note that this rule also

<sup>1</sup> Following the literature of type and effect systems we would much like to use the term "effect" at this point, but decided to use a different term to avoid confusion with the literature on effect handlers.

$$\frac{\Gamma(x) = \widehat{\tau}\ \&\ \xi}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} x : \widehat{\tau}\ \&\ \xi}\ [\text{T-Var}]
\qquad
\frac{}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} () : \widehat{\mathtt{unit}}\ \&\ \bot}\ [\text{T-Unit}]$$

$$\frac{\Sigma \mid \Gamma, x : \widehat{\tau}_1\ \&\ \xi_1 \vdash_{\mathsf{te}} t : \widehat{\tau}_2\ \&\ \xi_2}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \lambda x : \widehat{\tau}_1\ \&\ \xi_1.\ t : \widehat{\tau}_1\langle\xi_1\rangle \to \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \bot}\ [\text{T-Abs}]$$

$$\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t_1 : \widehat{\tau}_1\langle\xi_1\rangle \to \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \xi_2 \qquad \Sigma \mid \Gamma \vdash_{\mathsf{te}} t_2 : \widehat{\tau}_1\ \&\ \xi_1}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t_1\ t_2 : \widehat{\tau}_2\ \&\ \xi_2}\ [\text{T-App}]$$

$$\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t_1 : \widehat{\tau}_1\ \&\ \xi_1 \qquad \Sigma \mid \Gamma \vdash_{\mathsf{te}} t_2 : \widehat{\tau}_2\ \&\ \xi_2}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} (t_1, t_2) : \widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \bot}\ [\text{T-Pair}]
\qquad
\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \xi_i}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mathrm{proj}_i(t) : \widehat{\tau}_i\ \&\ \xi_i}\ [\text{T-Proj}]$$

$$\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}_1\ \&\ \xi_1}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mathrm{inl}_{\tau_2}(t) : \widehat{\tau}_1\langle\xi_1\rangle + \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \bot}\ [\text{T-Inl}]
\qquad
\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}_2\ \&\ \xi_2}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mathrm{inr}_{\tau_1}(t) : \widehat{\tau}_1\langle\xi_1\rangle + \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \bot}\ [\text{T-Inr}]$$

$$\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}_1\langle\xi_1\rangle + \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \xi \qquad \Sigma \mid \Gamma, x : \widehat{\tau}_1\ \&\ \xi_1 \vdash_{\mathsf{te}} t_1 : \widehat{\tau}\ \&\ \xi \qquad \Sigma \mid \Gamma, y : \widehat{\tau}_2\ \&\ \xi_2 \vdash_{\mathsf{te}} t_2 : \widehat{\tau}\ \&\ \xi}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mathbf{case}\ t\ \mathbf{of}\ \{\mathrm{inl}(x) \to t_1;\ \mathrm{inr}(y) \to t_2\} : \widehat{\tau}\ \&\ \xi}\ [\text{T-Case}]$$

$$\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}\ \&\ \xi \qquad \Sigma \vdash \ell \sqsubseteq \xi}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mathrm{ann}_{\ell}(t) : \widehat{\tau}\ \&\ \xi}\ [\text{T-Ann}]
\qquad
\frac{\Sigma \mid \Gamma, x : \widehat{\tau}\ \&\ \xi \vdash_{\mathsf{te}} t : \widehat{\tau}\ \&\ \xi}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mu x : \widehat{\tau}\ \&\ \xi.\ t : \widehat{\tau}\ \&\ \xi}\ [\text{T-Fix}]$$

$$\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t_1 : \widehat{\tau}_1\ \&\ \xi \qquad \Sigma \mid \Gamma \vdash_{\mathsf{te}} t_2 : \widehat{\tau}_2\ \&\ \xi}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \mathrm{seq}\ t_1\ t_2 : \widehat{\tau}_2\ \&\ \xi}\ [\text{T-Seq}]
\qquad
\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}\ \&\ \xi \qquad \Sigma \vdash \widehat{\tau} \sqsubseteq \widehat{\tau}' \qquad \Sigma \vdash \xi \sqsubseteq \xi'}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}'\ \&\ \xi'}\ [\text{T-Sub}]$$

$$\frac{\Sigma, \beta : \kappa \mid \Gamma \vdash_{\mathsf{te}} t : \widehat{\tau}\ \&\ \xi \qquad \beta \notin \mathrm{fav}(\Gamma) \cup \mathrm{fav}(\xi)}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} \Lambda\beta :: \kappa.\ t : \forall\beta :: \kappa.\ \widehat{\tau}\ \&\ \xi}\ [\text{T-AnnAbs}]
\qquad
\frac{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t : \forall\beta :: \kappa.\ \widehat{\tau}\ \&\ \xi \qquad \Sigma \vdash_{\mathsf{s}} \xi' : \kappa}{\Sigma \mid \Gamma \vdash_{\mathsf{te}} t\ \xi' : [\xi'/\beta]\widehat{\tau}\ \&\ \xi}\ [\text{T-AnnApp}]$$

Fig. 6: Declarative annotated type system (Σ | Γ ⊢te t : τ & ξ)

$$\begin{aligned}
v' \in \mathbf{Nf}' &::= \lambda x : \widehat{\tau}\ \&\ \xi.\ t \mid \Lambda\beta :: \kappa.\ t \mid () \mid \mathrm{inl}_{\tau}(t) \mid \mathrm{inr}_{\tau}(t) \mid (t_1, t_2) \\
v \in \mathbf{Nf} &::= v' \mid \mathrm{ann}_{\ell}(v')
\end{aligned}$$

Fig. 7: Values in the target language

admits polyvariant recursion [23], since quantification can occur anywhere in an annotated type. Since seq *t*<sup>1</sup> *t*<sup>2</sup> forces the evaluation of its first argument, it requires that *t*1's dependency annotation is part of the final result. This is justified, because the result depends on the termination behavior of *t*1.

The subtyping rule [T-Sub] allows us to weaken the annotations nested inside a type through the subtyping relation (see figure 5), as well as the dependency annotations itself through the subsumption relation. The rule [T-AnnAbs] introduces an annotation variable β of sort κ in the body t of the abstraction. The second premise ensures that the annotation variable does not escape its scope determined by the quantification on the type level. The annotation application rule [T-AnnApp] allows the instantiation of an annotation variable with an arbitrary well-sorted dependency term.

# **5 Metatheory**

In this section we develop a noninterference proof for our declarative type system, based on a small-step operational call-by-name semantics for the target language.

Figure 7 defines the values of the target language, i.e. those terms that cannot be further evaluated. Apart from a technicality related to annotations, they correspond exactly to the weak head normal forms of terms. The distinction **Nf**′ ⊂ **Nf** is made to ensure that there is at most one annotation at top level.

The semantics itself is largely straightforward, except for the handling of annotations. These are moved just as far outwards as necessary in order to reach a normal form, thereby computing the least "permission" an evaluator must possess for computing a certain output. Figure 8 shows two rules: a lifting rule (for applications) and the rule for merging adjacent annotations (see the supplemental material for the others).

In the remainder of this section we state the standard *progress* and *subject reduction* theorems that ensure that our small-step semantics is compatible with

$$\frac{v' \in \mathbf{Nf}'}{(\mathrm{ann}_{\ell}(v'))\ t_2 \to \mathrm{ann}_{\ell}(v'\ t_2)}\ [\text{E-LiftApp}]$$

$$\frac{v' \in \mathbf{Nf}'}{\mathrm{ann}_{\ell_1}(\mathrm{ann}_{\ell_2}(v')) \to \mathrm{ann}_{\ell_1 \sqcup \ell_2}(v')}\ [\text{E-JoinAnn}]$$

Fig. 8: Small-step semantics (*t* → *t′*) (excerpt)

the annotated type system. The following progress theorem states that any well-typed term either is a normal form or can take an evaluation step.

**Theorem 1 (Progress).** *If* ∅ | ∅ ⊢te *t* : τ & ξ*, then either t* ∈ **Nf** *or there is a t′ such that t* → *t′.*

The subject reduction property says that the reduction of a well-typed term results in a term of the same type.

**Theorem 2 (Subject Reduction).** *If* ∅ | ∅ ⊢te *t* : τ & ξ *and there is a t′ such that t* → *t′, then* ∅ | ∅ ⊢te *t′* : τ & ξ*.*

As expected, subject reduction extends naturally to a sequence of reductions by induction on the length of the reduction sequence:

**Corollary 1.** *If we have* ∅ | ∅ ⊢te *t* : τ & ξ *and t* →∗ *v, then* ∅ | ∅ ⊢te *v* : τ & ξ*.*

where, as usual, we write *t* →∗ *v* if there is a finite sequence of terms (tᵢ)₀≤ᵢ≤ₙ with t₀ = t and tₙ = v ∈ **Nf**, and reductions (tᵢ → tᵢ₊₁)₀≤ᵢ<ₙ between them. If there is no such sequence, this is denoted by t ⇑ and t is said to *diverge*.

Finally, if a term evaluates to an annotated value, this annotation is compatible with the dependency annotation that has been assigned to the term:

**Theorem 3 (Semantic Soundness).** *If we have* ∅ | ∅ ⊢te *t* : τ & ξ *and t* →∗ ann_ℓ(*v′*)*, then* ∅ ⊢ ℓ ⊑ ξ*.*

**The noninterference property** An important theorem for the safety of program transformations and optimizations using the results of dependency analysis is *noninterference*. It guarantees that if there is a target term *t* depending on some variable *x* such that ∅ | *x* : τ′ & ξ′ ⊢te *t* : τ & ξ holds and the dependency annotation ξ′ of the variable is not encompassed by the resulting dependency annotation ξ (i.e. ∅ ⊬ ξ′ ⊑ ξ), then *t* will always evaluate to the same normal form, regardless of the value of *x*.

Since we are in a non-strict setting, our noninterference property only applies to the topmost constructors of values. This is because the dependency annotations derived in the annotated type system only provide information about the evaluation to weak head normal form. Nested terms might possess lower as well as higher classifications. In particular, the subterms with greater dependency annotations than their enclosing constructors prevent us from making a more general statement because those can still depend on the context whereas the toplevel constructor cannot. In the noninterference theorem presented for the SLam calculus, this problem is circumvented by restricting the statement to so called *transparent* types, where the annotations of nested components are decreasing when moving further inward [9].

In the following we consider two normal forms v₁, v₂ ∈ **Nf** to be *similar*, denoted v₁ ∼ v₂, if their top-level constructors (and annotations, if present) match (see the supplemental material for the unsurprising definition of ∼). So, v₁ ∼ v₂ implies that these two values are indistinguishable without further evaluation, which is the property guaranteed by the noninterference theorem.

**Theorem 4 (Noninterference).** *Let t be a target term such that* ∅ | *x* : τ′ & ξ′ ⊢te *t* : τ & ξ *and* ∅ ⊬ ξ′ ⊑ ξ*. Let v be a value.*

*If there is a t₁ with* ∅ | ∅ ⊢te *t₁* : τ′ & ξ′ *such that* [*t₁* / *x*]*t* →∗ *v, then there is a t′ such that for all t₂ with* ∅ | ∅ ⊢te *t₂* : τ′ & ξ′ *we have* [*t₂* / *x*]*t* →∗ [*t₂* / *x*]*t′ and* [*t₁* / *x*]*t′* ∼ [*t₂* / *x*]*t′.*

The noninterference proofs crucially rely on the fact that the source term is well-typed, and on the additional assumption ∅ ⊬ ξ′ ⊑ ξ stating that the dependency annotation of the variable in the context is not encompassed by the dependency annotation of the term being evaluated.

By introducing the restriction to transparent types, we can recover the notion of noninterference used for the SLam calculus. For example, if we have a transparent type τ₁⟨ξ₁⟩ × τ₂⟨ξ₂⟩ & ξ (i.e. ∅ ⊢ ξ₁ ⊑ ξ and ∅ ⊢ ξ₂ ⊑ ξ) and ∅ ⊬ ξ′ ⊑ ξ holds, then we also know ∅ ⊬ ξ′ ⊑ ξ₁ and ∅ ⊬ ξ′ ⊑ ξ₂. Otherwise, we would get ∅ ⊢ ξ′ ⊑ ξ by transitivity, contradicting the assumption. This means all prerequisites of the noninterference theorem are still fulfilled.

Hence, it is possible in these cases to apply the noninterference theorem to the nested (possibly unevaluated) subterms of a constructor in weak head normal form. As in the work of [1], our noninterference theorem is restricted to deal with terms depending on exactly one variable.

# **6 The type reconstruction algorithm**

**Modularity considerations** When designing the type reconstruction algorithm we have two goals: it should be a conservative extension of the underlying type system, and types assigned by the analysis should be as general as possible. Concretely, a function's type must be general enough to be able to adapt to arguments with arbitrary annotations. These two goals give rise to the notion of *fully flexible* and *fully parametric* types defined by [12]. [16] calls these types *conservative* and *pattern* types respectively. Informally, an annotated type is a pattern type if it can be instantiated to any conservative type of the same shape and a conservative type is an analysis of an expression that is able to cope with any arguments it might depend on. These types are conservative in the sense that they make the least assumptions about their arguments and therefore are a conservative estimate compared to other typings with fewer degrees of freedom.

For a pattern type to be instantiable to any conservative type, we first need to make sure that all dependency annotations occurring in it can be instantiated to the corresponding dependency terms in a matching conservative type. This leads to the following definition of a *pattern* in the λ⊔-calculus. It is based on the similar definition by [16], which in turn is a special case of a pattern in higher-order unification theory [4,21]. A λ⊔-term is a *pattern* if it is of the form f β₁ ⋯ βₙ, where f is a free variable and β₁,...,βₙ are distinct bound variables. A unification problem of the form ∀β₁ ⋯ βₙ. f β₁ ⋯ βₙ = ξ, where the left-hand side is a pattern, is called *pattern unification*. Such a problem has a unique most general solution, namely the substitution [f ↦ λβ₁. ⋯ λβₙ. ξ] [4].
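The most general solution of a pattern unification problem is straightforward to compute: wrap the right-hand side in one abstraction per bound variable. A Python sketch (the tuple encoding and names are ours):

```python
def var(b): return ("var", b)
def lam(b, t): return ("lam", b, t)
def join_tm(t, u): return ("join", t, u)

def solve_pattern(binders, xi):
    """Most general solution of ∀β1…βn. f β1 … βn = ξ, i.e. the term
    λβ1.…λβn.ξ that f is mapped to (sorts omitted for brevity)."""
    sol = xi
    for b in reversed(binders):   # innermost binder wraps first
        sol = lam(b, sol)
    return sol
```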

$$\frac{\beta \text{ fresh}}{\langle \alpha_i :: \kappa_{\alpha_i} \rangle \vdash_{\mathsf{p}} \widehat{\mathtt{unit}}\ \&\ \beta\ \langle \alpha_i \rangle \leadsto \beta :: \langle \kappa_{\alpha_i} \rangle \Rightarrow \star}\ [\text{P-Unit}]$$

$$\frac{\langle \alpha_i :: \kappa_{\alpha_i} \rangle \vdash_{\mathsf{p}} \widehat{\tau}_1\ \&\ \xi_1 \leadsto \langle \beta_j :: \kappa_{\beta_j} \rangle \qquad \langle \alpha_i :: \kappa_{\alpha_i} \rangle \vdash_{\mathsf{p}} \widehat{\tau}_2\ \&\ \xi_2 \leadsto \langle \gamma_k :: \kappa_{\gamma_k} \rangle}{\langle \alpha_i :: \kappa_{\alpha_i} \rangle \vdash_{\mathsf{p}} \widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \beta\ \langle \alpha_i \rangle \leadsto \beta :: \langle \kappa_{\alpha_i} \rangle \Rightarrow \star,\ \langle \beta_j :: \kappa_{\beta_j} \rangle,\ \langle \gamma_k :: \kappa_{\gamma_k} \rangle}\ [\text{P-Prod}]$$

$$\frac{\emptyset \vdash_{\mathsf{p}} \widehat{\tau}_1\ \&\ \xi_1 \leadsto \langle \beta_j :: \kappa_{\beta_j} \rangle \qquad \langle \alpha_i :: \kappa_{\alpha_i},\ \beta_j :: \kappa_{\beta_j} \rangle \vdash_{\mathsf{p}} \widehat{\tau}_2\ \&\ \xi_2 \leadsto \langle \gamma_k :: \kappa_{\gamma_k} \rangle}{\langle \alpha_i :: \kappa_{\alpha_i} \rangle \vdash_{\mathsf{p}} \forall \beta_j :: \kappa_{\beta_j}.\ \widehat{\tau}_1\langle\xi_1\rangle \to \widehat{\tau}_2\langle\xi_2\rangle\ \&\ \beta\ \langle \alpha_i \rangle \leadsto \beta :: \langle \kappa_{\alpha_i} \rangle \Rightarrow \star,\ \langle \gamma_k :: \kappa_{\gamma_k} \rangle}\ [\text{P-Arr}]$$

Fig. 9: Pattern types (Σ ⊢p τ & ξ ⇝ Σ′), where β ∉ ⟨αᵢ, βⱼ, γₖ⟩, and [P-Sum] is like [P-Prod]

The definition of a pattern is then extended to annotated types using the rules from figure 9. Our definition is more precise than the one from previous work in that it makes explicit which variables are expected to be bound and which are free. We require that all variables with different names in the definition of these rules are distinct from each other.

An annotated type and dependency annotation pair τ & ξ is a *pattern type* under the sort environment Σ if the judgment Σ ⊢p τ & ξ ⇝ Σ′ holds for some Σ′. We call the variables in Σ *argument variables* and the variables in Σ′ *pattern variables*.

*Example 1.* A simple pattern type with the pattern variables β :: ★ ⇒ ★ and β′ :: ★ ⇒ ★ ⇒ ★ is

$$
\forall \beta_1 :: \star.\ \widehat{\mathtt{unit}}\langle\beta_1\rangle \to (\forall \beta_2 :: \star.\ \widehat{\mathtt{unit}}\langle\beta_2\rangle \to \widehat{\mathtt{unit}}\langle\beta'\ \beta_1\ \beta_2\rangle)\langle\beta\ \beta_1\rangle
$$

Note that since β<sup>1</sup> is quantified on the function arrow chain, it is passed on to the second function arrow. However, it is not propagated into the second argument. In general, annotations on the return type may depend on the annotations of all previous arguments while annotations of the arguments may not. This prevents any dependency between the annotations of arguments and guarantees that they are as permissive as possible. This is also why pattern variables in a covariant position are passed on to the next higher level while pattern variables in arguments are quantified in the enclosing function arrow. This allows the caller of a function to instantiate the dependency annotations of the parameters to the actual arguments.

As we stated earlier, a conservative function type makes the least assumptions over its arguments. Formally, this means that arguments of conservative functions are pattern types. We will later see that a pattern type can be instantiated to any conservative type of the same shape. On the other hand, non-functional conservative types are not constrained in their annotations. These characteristics are captured by the following definition based on *conservative types* [16] and *fully flexible types* [12].

An annotated type $\widehat{\tau}$ is *conservative* if all its argument types are pattern types, while the annotations in covariant positions are unconstrained.

$$
\dfrac{\beta \text{ fresh}}{\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \mathtt{unit} : \widehat{\mathtt{unit}} \;\&\; \beta\;\overline{\alpha_i} \;\rhd\; \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star} \;\text{[C-Unit]}
$$

$$
\dfrac{\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_1 : \widehat{\tau}_1 \;\&\; \xi_1 \;\rhd\; \overline{\beta_j :: \kappa_{\beta_j}} \qquad \overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_2 : \widehat{\tau}_2 \;\&\; \xi_2 \;\rhd\; \overline{\gamma_k :: \kappa_{\gamma_k}}}{\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_1 \times \tau_2 : \widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle \;\&\; \beta\;\overline{\alpha_i} \;\rhd\; \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star,\; \overline{\beta_j :: \kappa_{\beta_j}},\; \overline{\gamma_k :: \kappa_{\gamma_k}}} \;\text{[C-Prod]}
$$

$$
\dfrac{\emptyset \vdash_c \tau_1 : \widehat{\tau}_1 \;\&\; \xi_1 \;\rhd\; \overline{\beta_j :: \kappa_{\beta_j}} \qquad \overline{\alpha_i :: \kappa_{\alpha_i}},\, \overline{\beta_j :: \kappa_{\beta_j}} \vdash_c \tau_2 : \widehat{\tau}_2 \;\&\; \xi_2 \;\rhd\; \overline{\gamma_k :: \kappa_{\gamma_k}}}{\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_1 \to \tau_2 : \forall \overline{\beta_j :: \kappa_{\beta_j}}.\, \widehat{\tau}_1\langle\xi_1\rangle \to \widehat{\tau}_2\langle\xi_2\rangle \;\&\; \beta\;\overline{\alpha_i} \;\rhd\; \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star,\; \overline{\gamma_k :: \kappa_{\gamma_k}}} \;\text{[C-Arr]}
$$

Fig. 10: Type completion ($\Sigma \vdash_c \tau : \widehat{\tau}\,\&\,\xi \rhd \Sigma'$), all $\beta$ fresh, [C-Sum] is like [C-Prod]


Moreover, an annotated type and dependency pair $\widehat{\tau}\,\&\,\xi$ is *conservative* if $\widehat{\tau}$ is conservative, and an annotated type environment $\widehat{\Gamma}$ is *conservative* if for all $x \in \mathrm{dom}(\widehat{\Gamma})$, $\widehat{\Gamma}(x)$ is conservative.

The following type signature for the function f is a conservative type that takes the function type from example 1 as an argument.

$$
\begin{array}{l}
f : \forall \beta :: \star \Rightarrow \star.\; \forall \beta' :: \star \Rightarrow \star \Rightarrow \star.\; \forall \beta_3 :: \star.\\
\quad (\forall \beta_1 :: \star.\; \widehat{\mathtt{unit}}\langle\beta_1\rangle \to (\forall \beta_2 :: \star.\; \widehat{\mathtt{unit}}\langle\beta_2\rangle \to \widehat{\mathtt{unit}}\langle\beta'\ \beta_1\ \beta_2\rangle)\langle\beta\ \beta_1\rangle)\langle\beta_3\rangle\\
\quad \to \widehat{\mathtt{unit}}\langle\beta_3 \sqcup \beta\ \bot \sqcup \beta'\ \bot\ \bot\rangle \;\&\; \bot
\end{array}
$$

Note that the pattern variables of the argument have been bound in the top-level function type. This allows callers of f to instantiate these patterns.

We can extend the previous definition of pattern types to the type completion relation shown in figure 10. It relates every underlying type $\tau$ with a pattern type $\widehat{\tau}$ such that $\widehat{\tau}$ erases to $\tau$. It is defined through judgments $\Sigma \vdash_c \tau : \widehat{\tau}\,\&\,\xi \rhd \Sigma'$ with the meaning that under the sort environment $\Sigma$, $\tau$ is completed to the annotated type $\widehat{\tau}$ and the dependency annotation $\xi$ containing the pattern variables $\Sigma'$. The completion relation can also be interpreted as a function taking $\Sigma$ and $\tau$ as arguments and returning $\widehat{\tau}$, $\xi$ and $\Sigma'$.

Lastly, we revisit the examples from the previous sections and show how a pattern type can be mechanically derived from an underlying type.

In example 1 we presented a pattern type for the underlying type unit → unit → unit. Using the type completion relation, we can derive the pattern type,

$$
\forall \beta_1 :: \star.\; \widehat{\mathtt{unit}}\langle\beta_1\rangle \to (\forall \beta_2 :: \star.\; \widehat{\mathtt{unit}}\langle\beta_2\rangle \to \widehat{\mathtt{unit}}\langle\beta'\ \beta_1\ \beta_2\rangle)\langle\beta\ \beta_1\rangle \;\&\; \beta_3
$$

without having to guess. This is because the components $\widehat{\tau}$, $\xi$ and $\Sigma'$ in a judgment $\Sigma \vdash_c \tau : \widehat{\tau}\,\&\,\xi \rhd \Sigma'$ are uniquely determined by $\Sigma$ and $\tau$ from the syntax alone. The resulting pattern type contains three pattern variables, $\beta :: \star \Rightarrow \star$, $\beta' :: \star \Rightarrow \star \Rightarrow \star$ and $\beta_3 :: \star$. If the initial sort environment is empty, these are also the only free variables of the pattern type.

Based on the type completion relation we can define least type completions. These are conservative types that are subtypes of all other conservative types of the same shape. Therefore, all annotations occurring in positive positions on the top level function arrow chain must also be least. We do not need to consider arguments here because those are by definition equal up to alpha-conversion due to being pattern types. We define the *least annotation term* of sort κ as

$$
\begin{aligned}
\bot_{\star} &= \bot\\
\bot_{\kappa_1 \Rightarrow \kappa_2} &= \lambda \beta :: \kappa_1 . \bot_{\kappa_2}
\end{aligned}
$$

These least annotation terms correspond to the least elements of our bounded lattice for a given sort κ. This in turn leads us to the definition of the least completion of type τ (see figure 10) by substituting all free variables in the completion with the least annotation of the corresponding sort, i.e.

$$
\bot_{\tau} = [\overline{\bot_{\kappa_i} / \beta_i}]\,\widehat{\tau} \quad \text{for } \emptyset \vdash_c \tau : \widehat{\tau} \;\&\; \xi \;\rhd\; \overline{\beta_i :: \kappa_i}
$$
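These definitions translate directly into a short Haskell sketch. The datatypes below are simplified stand-ins chosen for illustration, not the prototype's actual representation:

```haskell
-- Sorts of annotations and a minimal term language for them; 'Lam' stores
-- only the binder's sort, which is all that botOf needs.
data Sort = Star | Sort :=> Sort deriving (Eq, Show)

data AnnTm = Bot            -- the least lattice element
           | Lam Sort AnnTm -- a lambda returning another annotation term
           deriving (Eq, Show)

-- The least annotation term of a sort: bottom at the base sort, and at a
-- higher sort a lambda that ignores its argument and returns the least
-- term of the result sort.
botOf :: Sort -> AnnTm
botOf Star        = Bot
botOf (k1 :=> k2) = Lam k1 (botOf k2)
```

For instance, `botOf (Star :=> (Star :=> Star))` yields `Lam Star (Lam Star Bot)`, the least term of sort ⋆ ⇒ ⋆ ⇒ ⋆.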

**The algorithm** We can now move on to the type reconstruction algorithm that performs the actual analysis. At its core lies algorithm R, shown in figure 11. The input of the algorithm is a triple $(\Gamma, \Sigma, t)$ consisting of a well-typed source term $t$, an annotated type environment $\Gamma$ providing the types and dependency annotations of the free term variables in $t$, and a sort environment $\Sigma$ mapping each free annotation variable in scope to its sort. It returns a triple $\widehat{t} : \widehat{\tau}\,\&\,\xi$ consisting of an elaborated term $\widehat{t}$ in the target language (which erases to the source term $t$), an annotated type $\widehat{\tau}$ and a dependency annotation $\xi$ such that $\Sigma \mid \Gamma \vdash_{te} \widehat{t} : \widehat{\tau}\,\&\,\xi$ holds. In the definition of R, to avoid clutter, we write $\Gamma$ instead of $\widehat{\Gamma}$ because we are only dealing with one kind of type environment.

The algorithm relies on the invariant that all types in the type environment and the inferred type must be conservative. In the version of [16], all inferred dependency annotations (including those nested as annotations in types) had to be canonically ordered as well. But it turned out that this canonically ordered form was not sufficient for deciding semantic equality, so we lifted this requirement. We still mark the places in the algorithm where canonicalization would have occurred with $\lceil\cdot\rceil$, but the actual result of this operation does not matter as long as the dependency terms remain equivalent.

The algorithm for computing the least upper bound of types ($\sqcup$ in figure 12) requires that both types are conservative, have the same shape, and use the same names for bound variables. The latter can be ensured by $\alpha$-conversion, while the former two requirements are fulfilled by how this function is used in R.

The restriction to conservative types allows us to ignore function arguments, because these are always required to be pattern types, which are unique up to $\alpha$-equivalence. This alleviates the need for computing a corresponding greatest lower bound of types, because the algorithm only traverses covariant positions.

# <sup>R</sup> : **AnnTyEnv** <sup>×</sup> **SortEnv** <sup>×</sup> **Tm** <sup>→</sup> **Tm** <sup>×</sup> **Ty** <sup>×</sup> **AnnTm**

```
R(Γ; Σ; x) = x : Γ(x)
R(Γ; Σ; ()) = () : unit & ⊥
R(Γ; Σ; annℓ(t)) =
  let t′ : τ & ξ = R(Γ; Σ; t)
  in annℓ(t′) : τ & ⌈ξ ⊔ ℓ⌉Σ
R(Γ; Σ; seq t1 t2) =
  let t′1 : τ1 & ξ1 = R(Γ; Σ; t1)
      t′2 : τ2 & ξ2 = R(Γ; Σ; t2)
  in seq t′1 t′2 : τ2 & ⌈ξ1 ⊔ ξ2⌉Σ
R(Γ; Σ; (t1, t2)) =
  let t′1 : τ1 & ξ1 = R(Γ; Σ; t1)
      t′2 : τ2 & ξ2 = R(Γ; Σ; t2)
  in (t′1, t′2) : τ1⟨ξ1⟩ × τ2⟨ξ2⟩ & ⊥
R(Γ; Σ; inlτ2(t)) =
  let t′ : τ1 & ξ1 = R(Γ; Σ; t)
  in inlτ2(t′) : τ1⟨ξ1⟩ + ⊥τ2⟨⊥⟩ & ⊥
R(Γ; Σ; inrτ1(t)) =
  let t′ : τ2 & ξ2 = R(Γ; Σ; t)
  in inrτ1(t′) : ⊥τ1⟨⊥⟩ + τ2⟨ξ2⟩ & ⊥
R(Γ; Σ; case t1 of {inl(x) → t2; inr(y) → t3}) =
  let t′1 : τ⟨ξ⟩ + τ′⟨ξ′⟩ & ξ1 = R(Γ; Σ; t1)
      t′2 : τ2 & ξ2 = R(Γ, x : τ & ξ; Σ; t2)
      t′3 : τ3 & ξ3 = R(Γ, y : τ′ & ξ′; Σ; t3)
  in case t′1 of {inl(x) → t′2; inr(y) → t′3}
       : ⌈τ2 ⊔ τ3⌉Σ & ⌈ξ1 ⊔ ξ2 ⊔ ξ3⌉Σ
R(Γ; Σ; proji(t)) =
  let t′ : τ1⟨ξ1⟩ × τ2⟨ξ2⟩ & ξ = R(Γ; Σ; t)
  in proji(t′) : τi & ⌈ξ ⊔ ξi⌉Σ
R(Γ; Σ; λx : τ1.t) =
  let τ̂1 & β ▷ βi :: κi = C([ ]; τ1)
      Γ′ = Γ, x : τ̂1 & β
      Σ′ = Σ, βi :: κi
      t′ : τ2 & ξ2 = R(Γ′; Σ′; t)
  in Λβi :: κi.λx : τ̂1 & β.t′
       : ∀βi :: κi.τ̂1⟨β⟩ → τ2⟨ξ2⟩ & ⊥
R(Γ; Σ; t1 t2) =
  let t′1 : τ1 & ξ1 = R(Γ; Σ; t1)
      t′2 : τ2 & ξ2 = R(Γ; Σ; t2)
      τ′2⟨β⟩ → τ⟨ξ⟩ ▷ βi = I(τ1)
      θ = [β ↦ ξ2] ∘ M([ ]; τ′2; τ2)
  in t′1 ⟨θβi⟩ t′2 : ⌈θτ⌉Σ & ξ1 ⊔ ⌈θξ⌉Σ
R(Γ; Σ; μx : τ.t) =
  do i ← 0
     τ0 & ξ0 ← ⊥τ & ⊥
     repeat ti+1 : τi+1 & ξi+1 ← R(Γ, x : τi & ξi; Σ; t)
            i ← i + 1
     until (τi−1 ≡ τi ∧ ξi−1 ≡ ξi)
     return (μx : τi & ξi.ti) : τi & ξi
```
Fig. 11: Type reconstruction algorithm (R)

The handling of λ-abstractions uses the type completion algorithm C of figure 12, which defers its work to the type completion relation defined earlier and which can be interpreted in a functional way (see figure 10). The underlying type of the function argument is completed to a pattern type. The function body is analyzed in the presence of the newly introduced pattern variables. Note that this pattern type is also conservative, thereby preserving the invariant that the context only holds conservative types. The inferred annotated type of the lambda abstraction universally quantifies over all pattern variables, and the quantification is reflected on the term level through annotation abstractions $\Lambda\beta :: \kappa.\widehat{t}$.

In order to analyze function applications, we need two more auxiliary algorithms. The first one is the instantiation procedure I (see figure 12) which instantiates all top-level quantifiers with fresh annotation variables. The second is the matching algorithm M (see figure 12) which instantiates a pattern type

$$
\begin{array}{l}
\sqcup : \textbf{Ty} \times \textbf{Ty} \to \textbf{Ty}\\
\mathtt{unit} \sqcup \mathtt{unit} = \mathtt{unit}\\
(\widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle) \sqcup (\widehat{\tau}_1'\langle\xi_1'\rangle \times \widehat{\tau}_2'\langle\xi_2'\rangle) = (\widehat{\tau}_1 \sqcup \widehat{\tau}_1')\langle\xi_1 \sqcup \xi_1'\rangle \times (\widehat{\tau}_2 \sqcup \widehat{\tau}_2')\langle\xi_2 \sqcup \xi_2'\rangle\\
(\widehat{\tau}_1\langle\beta\rangle \to \widehat{\tau}_2\langle\xi_2\rangle) \sqcup (\widehat{\tau}_1\langle\beta\rangle \to \widehat{\tau}_2'\langle\xi_2'\rangle) = \widehat{\tau}_1\langle\beta\rangle \to (\widehat{\tau}_2 \sqcup \widehat{\tau}_2')\langle\xi_2 \sqcup \xi_2'\rangle\\
(\forall \beta :: \kappa.\widehat{\tau}) \sqcup (\forall \beta :: \kappa.\widehat{\tau}') = \forall \beta :: \kappa.\,(\widehat{\tau} \sqcup \widehat{\tau}')
\end{array}
$$

$$
\begin{array}{l}
C : \textbf{SortEnv} \times \textbf{Ty} \to \textbf{Ty} \times \textbf{AnnTm} \times \textbf{SortEnv}\\
C(\Sigma; \tau) = \widehat{\tau} \,\&\, \xi \rhd \overline{\beta_i :: \kappa_i} \quad \textbf{where } \Sigma \vdash_c \tau : \widehat{\tau} \,\&\, \xi \rhd \overline{\beta_i :: \kappa_i}
\end{array}
$$

$$
\begin{array}{l}
I : \textbf{Ty} \to \textbf{Ty} \times \textbf{SortEnv}\\
I(\forall \beta :: \kappa.\widehat{\tau}) = \textbf{let } \widehat{\tau}' \rhd \Sigma = I(\widehat{\tau}) \textbf{ in } [\beta \mapsto \beta'](\widehat{\tau}') \rhd \beta' :: \kappa, \Sigma \quad \textbf{where } \beta' \text{ fresh}\\
I(\widehat{\tau}) = \widehat{\tau} \rhd [\,]
\end{array}
$$

$$
\begin{array}{l}
M : \textbf{SortEnv} \times \textbf{Ty} \times \textbf{Ty} \to \textbf{AnnSubst}\\
M(\Sigma; \mathtt{unit}; \mathtt{unit}) = [\,]\\
M(\Sigma; \widehat{\tau}_1'\langle\beta\ \overline{\beta_i}\rangle \times \widehat{\tau}_2'\langle\beta'\ \overline{\beta_i}\rangle; \widehat{\tau}_1\langle\xi_1\rangle \times \widehat{\tau}_2\langle\xi_2\rangle) =\\
\quad [\beta \mapsto \lambda\overline{\beta_i :: \Sigma(\beta_i)}.\xi_1,\ \beta' \mapsto \lambda\overline{\beta_i :: \Sigma(\beta_i)}.\xi_2\,] \circ M(\Sigma; \widehat{\tau}_1'; \widehat{\tau}_1) \circ M(\Sigma; \widehat{\tau}_2'; \widehat{\tau}_2)\\
M(\Sigma; \widehat{\tau}_1\langle\beta\rangle \to \widehat{\tau}_2'\langle\beta'\ \overline{\beta_i}\rangle; \widehat{\tau}_1\langle\beta\rangle \to \widehat{\tau}_2\langle\xi\rangle) = [\beta' \mapsto \lambda\overline{\beta_i :: \Sigma(\beta_i)}.\xi\,] \circ M(\Sigma; \widehat{\tau}_2'; \widehat{\tau}_2)\\
M(\Sigma; \forall \beta :: \kappa.\widehat{\tau}'; \forall \beta :: \kappa.\widehat{\tau}) = M(\Sigma, \beta :: \kappa; \widehat{\tau}'; \widehat{\tau})
\end{array}
$$

Fig. 12: Least upper bound of types ($\sqcup$), completion (C), instantiation (I), and matching (M). Rules for $\cdot + \cdot$ in $\sqcup$ and M are like those for $\cdot \times \cdot$.

with a conservative type of the same shape. It returns a substitution obtained by performing pattern unification on corresponding annotations.

**Soundness and Completeness** An annotated type environment $\Gamma$ is well-formed under an environment $\Sigma$ if $\Gamma$ is conservative and for all bindings $x : \widehat{\tau}\,\&\,\xi$ in $\Gamma$ we have $\Sigma \vdash_{wft} \widehat{\tau}$ and $\Sigma \vdash_s \xi : \star$.

In order to demonstrate the correctness of the reconstruction algorithm presented in this section we have to show that for every well-typed underlying term, it produces an analysis (i.e. annotated types and dependency annotations) that can be derived in the annotated type system (see figure 6). That is to say, algorithm R is sound w.r.t. the annotated type system.

**Theorem 5.** *Let t be a source term,* Σ *a sort environment and* Γ *an annotated type environment well-formed under* Σ *such that* R(Γ; Σ; *t*) = t̂ : τ̂ & ξ *for some* t̂*,* τ̂ *and* ξ*.*

*Then,* Σ | Γ ⊢te t̂ : τ̂ & ξ*,* Σ ⊢wft τ̂*,* Σ ⊢s ξ : ⋆*, and* τ̂ *is conservative.*

The next step is to show that our analysis succeeds in deriving an annotated type and dependency annotation for any well-typed source term: it is *complete*. The crucial part here is the termination of the fixpoint iteration. In order to show the convergence of the fixpoint iteration, we start by defining an equivalence relation on annotated type and dependency pairs.

Our type reconstruction algorithm handles polymorphic recursion through Kleene-Mycroft iteration. Such an algorithm is based on fixpoint iteration and needs a way to decide whether two dependency terms are equal according to the denotational semantics of the annotation language.
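The iteration scheme behind the μ-case of figure 11 can be sketched generically in Haskell. This is a sketch under the assumption that the step function is monotone and its chain of results is finite, so the loop terminates:

```haskell
-- Kleene-style fixpoint iteration: start from a least approximation and
-- re-apply the step function until two consecutive results coincide.
kleeneIter :: Eq a => (a -> a) -> a -> a
kleeneIter step bottom = go bottom
  where
    go x =
      let x' = step x
      in if x' == x then x else go x'
```

In R, `step` corresponds to re-analyzing the body of the μ-binding with the current approximation for the bound variable in the environment, and the equality test is the semantic equivalence check discussed next.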

A straightforward way to decide semantic equivalence is to enumerate all possible environments and compare the denotations of the two terms in all of these (possibly after some semantics preserving normalization). This only works if the dependency lattice L is finite.
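For a small lattice this enumeration is easy to realize. The sketch below is illustrative only: it assumes a hypothetical two-point binding-time lattice **S** ⊑ **D** and restricts dependency terms to first-order joins, whereas the actual annotation language is a full simply typed lambda calculus:

```haskell
-- The two-point binding-time lattice: S (static) below D (dynamic).
data L = S | D deriving (Eq, Show)

lub :: L -> L -> L
lub S S = S
lub _ _ = D

-- First-order dependency terms: literals, variables and joins.
data Dep = Lit L | Var String | Join Dep Dep deriving Show

denote :: [(String, L)] -> Dep -> L
denote _   (Lit l)    = l
denote env (Var x)    = maybe (error ("unbound " ++ x)) id (lookup x env)
denote env (Join a b) = lub (denote env a) (denote env b)

-- All assignments of lattice values to the given free variables.
envs :: [String] -> [[(String, L)]]
envs = mapM (\v -> [(v, S), (v, D)])

-- Two terms are semantically equal iff they denote the same lattice
-- element under every environment.
eqSem :: [String] -> Dep -> Dep -> Bool
eqSem vs t1 t2 = all (\env -> denote env t1 == denote env t2) (envs vs)
```

For example, `eqSem ["x"] (Join (Var "x") (Lit S)) (Var "x")` holds because **S** is the identity of the join, while joining with **D** is not semantically equal to the variable alone.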

For some analyses, e.g., the set of all program locations in a slicing analysis, L = V is finite but large, and deciding equality in this fashion becomes impractical. To alleviate this problem, our prototype implementation applies a partial canonicalization procedure which, while not complete, can serve as an approximation of equality: if two canonicalized dependency terms become syntactically equal, then we can be assured that they are semantically equal, but if they are not we can still apply the above procedure to the canonicalized dependency terms. We omit formal details from the paper.

We can now state our completeness results for the type reconstruction algorithm. Here, we write Γ <sup>t</sup> *t* : τ to say that term *t* has type τ under the environment Γ in the underlying type system.

**Theorem 6 (Completeness).** *Given a source term* t*, a sort environment* Σ*, an annotated type environment* Γ *well-formed under* Σ*, and an underlying type* τ *such that* Γ ⊢t *t* : τ*, there are* t̂*,* τ̂ *and* ξ *such that* R(Γ; Σ; *t*) = t̂ : τ̂ & ξ*, where* τ̂ *erases to* τ *and* t̂ *erases to* t*.*

As a corollary of the foregoing theorems, our analysis is a conservative extension of the underlying type system.

**Corollary 2 (Conservative Extension).** *Let* t *be a source term,* τ *a type and* Γ *a type environment such that* Γ ⊢t *t* : τ*. Then there are* Σ*,* Γ̂*, t̂,* τ̂*,* ξ *such that* Σ | Γ̂ ⊢te t̂ : τ̂ & ξ*, where* t̂ *erases to* t*,* τ̂ *to* τ *and* Γ̂ *to* Γ*.*

# **7 Implementation and Examples**

Beyond the definition of the annotated system and the development of the associated algorithm and meta-theory, we also have a REPL prototype implementation of our analysis in Haskell. Compared to the annotated type system in the paper, the prototype provides support for booleans and integers, including literals and conditionals **if** *c* **then** *t₁* **else** *t₂*, for which the type rules can be straightforwardly derived. Concrete lattice implementations are provided only for binding-time analysis and security analysis, but the reconstruction algorithm abstracts away from the choice of a particular lattice, so it is easy to add new instances. The implementation is available at http://www.staff.science.uu.nl/~hage0101/prototype-hrp.zip. Below we walk through a few examples, taking advantage of the slightly extended source language that our implementation supports. More (detailed) examples are discussed in [26].

**Construction and Elimination** Whenever something is constructed, be it a product, a sum or a lambda abstraction, the outermost dependency annotation is ⊥. This is because the analysis aims to produce the best possible and thereby least annotations for a given source program.

Consider the case of binding-time analysis, and suppose we have a variable of function type $f : \forall \beta :: \star.\ \widehat{\mathtt{int}}\langle\beta\rangle \to \widehat{\mathtt{int}}\langle\beta\rangle \,\&\, \mathbf{D}$. We can see that it preserves the annotation of its argument, i.e. if we apply *f* to a static value, the return annotation is also instantiated to be static. The function itself, however, is dynamic, and therefore the whole result of the function application must also be dynamic, because we cannot know which particular function has been assigned to *f*.

Elimination always introduces a dependency in the program, and this can uncover subtleties arising when functions differ only in their termination behavior. For example, compare λ*p* : int×int.*p* with λ*p* : int×int.(proj1(*p*), proj2(*p*)). In a call-by-value language, these two functions would be (extensionally) equivalent. However, with non-strict evaluation, *p* might be a non-terminating computation. In that case, applying the former function would diverge, while the latter function at least produces the pair constructor. This is also reflected in the annotated types that are inferred. For the former, we get

$$\forall \beta\_0, \beta\_1, \beta\_2 :: \star. (\text{int}\langle \beta\_0 \rangle \times \text{int}\langle \beta\_1 \rangle) \langle \beta\_2 \rangle \rightarrow (\text{int}\langle \beta\_0 \rangle \times \text{int}\langle \beta\_1 \rangle) \langle \beta\_2 \rangle \& \mathbf{S}, \text{and}$$

$$\forall \beta\_0, \beta\_1, \beta\_2 :: \star \text{.} \langle \text{int} \langle \beta\_0 \rangle \times \text{int} \langle \beta\_1 \rangle \rangle \langle \beta\_2 \rangle \to \langle \text{int} \langle \beta\_0 \sqcup \beta\_2 \rangle \times \text{int} \langle \beta\_1 \sqcup \beta\_2 \rangle \rangle \langle \mathbf{S} \rangle \text{ \& } \mathbf{S}$$

for the latter. In particular, the annotation of the product in the second type signature is **S**. Therefore, it cannot depend on the input of the function.
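The termination difference is directly observable in Haskell, independently of the analysis (a small illustration):

```haskell
-- idPair returns its argument unevaluated; etaPair rebuilds the pair,
-- so the pair constructor is available before the argument is forced.
idPair :: (Int, Int) -> (Int, Int)
idPair p = p

etaPair :: (Int, Int) -> (Int, Int)
etaPair p = (fst p, snd p)

-- Matching the result of etaPair against a pair pattern succeeds even for
-- an undefined argument, because only the constructor is demanded; doing
-- the same with idPair would instead hit the error in the argument.
reachesConstructor :: Bool
reachesConstructor = case etaPair undefined of (_, _) -> True
```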

**Polymorphic Recursion** One class of functions where the analysis benefits from polymorphic recursion are those that permute their arguments on recursive calls. Our example is a slightly modified version of an example from [5]:

$$
\mu f : \mathtt{bool} \to \mathtt{bool} \to \mathtt{bool}.\,\lambda x : \mathtt{bool}.\,\lambda y : \mathtt{bool}.\ \mathbf{if}\ x\ \mathbf{then}\ \mathit{true}\ \mathbf{else}\ f\ y\ x
$$

In an analysis with monomorphic recursion, the analysis assigns the same annotation to both parameters, large enough to accommodate both arguments. This is due to the permutation of the arguments in the else branch. An analysis with polymorphic recursion is allowed to use a different instantiation for f in that case. Our algorithm hence infers the following most general type.

$$
\forall \beta_1 :: \star.\; \widehat{\mathtt{bool}}\langle\beta_1\rangle \to (\forall \beta_2 :: \star.\; \widehat{\mathtt{bool}}\langle\beta_2\rangle \to \widehat{\mathtt{bool}}\langle\beta_1 \sqcup \beta_2\rangle)\langle\bot\rangle \;\&\; \bot
$$

We see that the result of the function indeed depends on the annotations of both arguments, as both end up in the condition of the if-expression at some point. Yet, both arguments are completely unrestricted, and unrelated in their annotations. In contrast, a type system with monomorphic recursion would only admit a weaker type, possibly similar to

$$
\forall \beta\_1 :: \star. \mathsf{bool} \langle \beta\_1 \rangle \to (\mathsf{bool} \langle \beta\_1 \rangle \to \mathsf{bool} \langle \beta\_1 \rangle) \langle \bot \rangle \&\perp.
$$

A real world example of this kind is Euclid's algorithm for computing the greatest common divisor (see [26]).
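At the term level, the permuting function above is ordinary Haskell; only its analysis needs polymorphic recursion:

```haskell
-- On the recursive call the parameters swap places; a monovariant
-- analysis must therefore merge the annotations of both parameters.
permOr :: Bool -> Bool -> Bool
permOr x y = if x then True else permOr y x
```

Note that, like the original term, `permOr False False` diverges; on all other inputs the function behaves as disjunction.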

**Higher-Ranked Polyvariance** This section discusses several examples for the dependency analysis instance of binding-time analysis, comparing our outcomes with a let-polyvariant analysis [29].

A simple example to start with is a function that applies a function to both components of a pair<sup>2</sup>

*both* : (int → int) → int × int → int × int
*both* = λ*f* : int → int.λ*p* : int × int.(*f* (proj1(*p*)), *f* (proj2(*p*)))

Suppose in the context of binding-time analysis that *both* is used to apply a statically known function to a pair whose first component is always computable at compile time, but whose second component is dynamic. For simplicity's sake, the function is the identity on integers.

*id* : int → int
*id* = λ*x* : int.*x*

A non-higher-ranked analysis would assign types to *both* and *id* in which the annotation on the function argument to *both* must be large enough to accommodate both components of the pair as input. If we consider the call *both id p* for some pair *p* : int⟨**S**⟩ × int⟨**D**⟩ & **S**, the whole call then has the type int⟨**D**⟩ × int⟨**D**⟩.
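The underlying terms of the example are plain Haskell (using `fst`/`snd` for proj1/proj2); only their annotations are at issue:

```haskell
-- both applies a function to both components of a pair.
both :: (Int -> Int) -> (Int, Int) -> (Int, Int)
both f p = (f (fst p), f (snd p))

-- The identity on integers, as in the example.
idInt :: Int -> Int
idInt x = x
```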

Our higher-ranked analysis infers the following conservative types for *id* and *both*.

$$
\begin{array}{l}
\mathit{id} : \forall \beta :: \star.\; \widehat{\mathtt{int}}\langle\beta\rangle \to \widehat{\mathtt{int}}\langle\beta\rangle \;\&\; \bot\\
\mathit{id} = \Lambda\beta :: \star.\,\lambda x : \widehat{\mathtt{int}} \,\&\, \beta.\,x\\[1ex]
\mathit{both} : \forall \beta_1 :: \star.\; \forall \beta_2 :: \star \Rightarrow \star.\; (\forall \beta :: \star.\; \widehat{\mathtt{int}}\langle\beta\rangle \to \widehat{\mathtt{int}}\langle\beta_2\ \beta\rangle)\langle\beta_1\rangle\\
\quad \to (\forall \beta_3, \beta_4, \beta_5 :: \star.\; (\widehat{\mathtt{int}}\langle\beta_3\rangle \times \widehat{\mathtt{int}}\langle\beta_4\rangle)\langle\beta_5\rangle\\
\quad\quad \to (\widehat{\mathtt{int}}\langle\beta_2\ (\beta_3 \sqcup \beta_5) \sqcup \beta_1\rangle \times \widehat{\mathtt{int}}\langle\beta_2\ (\beta_4 \sqcup \beta_5) \sqcup \beta_1\rangle)\langle\mathbf{S}\rangle) \;\&\; \mathbf{S}\\[1ex]
\mathit{both} = \Lambda\beta_1 :: \star.\,\Lambda\beta_2 :: \star \Rightarrow \star.\,\lambda f : (\forall \beta :: \star.\; \widehat{\mathtt{int}}\langle\beta\rangle \to \widehat{\mathtt{int}}\langle\beta_2\ \beta\rangle).\\
\quad \Lambda\beta_3 :: \star.\,\Lambda\beta_4 :: \star.\,\Lambda\beta_5 :: \star.\,\lambda p : \widehat{\mathtt{int}}\langle\beta_3\rangle \times \widehat{\mathtt{int}}\langle\beta_4\rangle.\\
\quad (f\ \langle\beta_3 \sqcup \beta_5\rangle\ (\mathtt{proj}_1(p)),\ f\ \langle\beta_4 \sqcup \beta_5\rangle\ (\mathtt{proj}_2(p)))
\end{array}
$$

In case of *both*, the function parameter f can be instantiated separately for each component because our analysis assigns it a type that universally quantifies over

<sup>2</sup> NB. both is a simplified instance of a traversal <sup>∀</sup><sup>f</sup> .Applicative f <sup>⇒</sup> (Int <sup>→</sup> f Int) <sup>→</sup> (Int,Int) → f (Int,Int), in order to fit the restrictions of the source language [6,15].

the annotation of its argument. It is evident from the type signature that the components of the resulting pair only depend on the corresponding components of the input pair, and the function and the input pair itself. They do not depend on the respective other component of the input.

If we again consider the call *both id p*, we obtain β₂ = λβ :: ⋆.β, β₁ = β₃ = β₅ = **S** and β₄ = **D** through pattern unification. Normalization of the resulting dependency terms results in the expected return type int⟨**S**⟩ × int⟨**D**⟩.

The generality provided by the higher-ranked analysis extends to an arbitrarily deep nesting of function arrows. The following example demonstrates this for two levels of arrows. Functions with more than two levels of arrows can arise directly in actual programs, but even more so in desugared code, e.g., when type classes in Haskell are implemented via explicit dictionary passing. Due to limitations of our source language, the examples are syntactically heavily restricted.

Consider the following function that takes a function argument which again requires a function.

 $foo : ((\text{int} \rightarrow \text{int}) \rightarrow \text{int}) \rightarrow \text{int} \times \text{int}$   $foo = \lambda f : (\text{int} \rightarrow \text{int}) \rightarrow \text{int}.(f \ (\lambda x : \text{int}.x), f \ (\lambda x : \text{int}.0))$ 

The higher-ranked analysis infers the following type and target term (where we omitted the type in the argument of the lambda term because it essentially repeats what is already visible in the top level type signature).

$$
\begin{array}{l}
\mathit{foo} : \forall \beta_4 :: \star.\; \forall \beta_3 :: \star \Rightarrow (\star \Rightarrow \star) \Rightarrow \star.\\
\quad (\forall \beta_2 :: \star.\; \forall \beta_1 :: \star \Rightarrow \star.\; (\forall \beta_0 :: \star.\; \widehat{\mathtt{int}}\langle\beta_0\rangle \to \widehat{\mathtt{int}}\langle\beta_1\ \beta_0\rangle)\langle\beta_2\rangle\\
\quad\quad \to \widehat{\mathtt{int}}\langle\beta_3\ \beta_2\ \beta_1\rangle)\langle\beta_4\rangle\\
\quad \to (\widehat{\mathtt{int}}\langle\beta_3\ \mathbf{S}\ (\lambda\beta_5 :: \star.\beta_5) \sqcup \beta_4\rangle \times \widehat{\mathtt{int}}\langle\beta_3\ \mathbf{S}\ (\lambda\beta_6 :: \star.\mathbf{S}) \sqcup \beta_4\rangle)\langle\mathbf{S}\rangle \;\&\; \mathbf{S}\\[1ex]
\mathit{foo} = \Lambda\beta_4 :: \star.\,\Lambda\beta_3 :: \star \Rightarrow (\star \Rightarrow \star) \Rightarrow \star.\,\lambda f : \cdots.\\
\quad (f\ \langle\mathbf{S}\rangle\ \langle\lambda\beta_0 :: \star.\beta_0\rangle\ (\Lambda\beta_5 :: \star.\,\lambda x : \widehat{\mathtt{int}} \,\&\, \beta_5.\,x),\\
\quad\ f\ \langle\mathbf{S}\rangle\ \langle\lambda\beta_0 :: \star.\mathbf{S}\rangle\ (\Lambda\beta_6 :: \star.\,\lambda x : \widehat{\mathtt{int}} \,\&\, \beta_6.\,0))
\end{array}
$$

Since the type of f is a pattern type, the argument to f is also a pattern type by definition. Therefore, the analysis of f depends on the analysis of the function passed to it. This gives rise to the *higher-order effect operator* β₃ [12]. Thus, f can be applied to any function with a conservative type of the right shape. As our algorithm always infers conservative types, the type of f is as general as possible. This is reflected in the body of the lambda, where in both cases f is instantiated with the dependency annotation corresponding to the function passed to it. The result of this instantiation can be observed in the returned product type, where β₃ is applied to the effect operators λβ₀ :: ⋆.β₀ and λβ₀ :: ⋆.**S** corresponding to the respective functions used as arguments to f.

Only when we finally apply *foo* can the resulting annotations be evaluated.

$$
\begin{array}{l}
\mathit{bar} : \forall \alpha_2 :: \star.\; \forall \alpha_1 :: \star \Rightarrow \star.\; (\forall \alpha_0 :: \star.\; \widehat{\mathtt{int}}\langle\alpha_0\rangle \to \widehat{\mathtt{int}}\langle\alpha_1\ \alpha_0\rangle)\langle\alpha_2\rangle\\
\quad \to \widehat{\mathtt{int}}\langle\alpha_1\ \mathbf{D} \sqcup \alpha_2\rangle \;\&\; \mathbf{S}\\
\mathit{bar} = \Lambda\alpha_2 :: \star.\,\Lambda\alpha_1 :: \star \Rightarrow \star.\,\lambda f : \cdots.\,f\ \langle\mathbf{D}\rangle\ (\mathtt{ann}_{\mathbf{D}}(0))
\end{array}
$$

For *bar* we obtain *foo bar* : int⟨**D**⟩ × int⟨**S**⟩ & **S**. In this case, β₃ = λβ₂ :: ⋆.λβ₁ :: ⋆ ⇒ ⋆.β₁ **D** ⊔ β₂, because *bar* applies its argument to a value with dynamic binding time. This causes the first component of the returned pair to be deemed dynamic as well. On the other hand, in the second component *bar* is applied to a constant function. Thus, regardless of the argument's dynamic binding time, the resulting binding time is static. In a rank-1 system we would get int⟨**D**⟩ × int⟨**D**⟩ instead of int⟨**D**⟩ × int⟨**S**⟩.
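Stripped of all annotations, foo and bar are again plain Haskell terms:

```haskell
-- foo applies its functional argument to two different functions: the
-- identity and a constant function.
foo :: ((Int -> Int) -> Int) -> (Int, Int)
foo f = (f (\x -> x), f (\_ -> 0))

-- bar applies its argument to a concrete value (a dynamic one in the
-- binding-time example of the text).
bar :: (Int -> Int) -> Int
bar g = g 0
```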

# **8 Related Work**

The basis for most type systems of functional programming languages is the Hindley-Milner type system [22]. Our algorithm *R* strongly resembles the well-known type inference algorithm for the Hindley-Milner type system, *Algorithm W* [3], a distinct advantage of our approach. The idea to define an annotated type system as a means to design static analyses for higher-order languages is attributed to [19]. The major technical difference compared to a let-polyvariant analysis is that our annotations form a simply typed lambda calculus.

Full reconstruction for a higher-ranked polyvariant annotated type system was first considered by [12] in the context of a control-flow analysis. However, we found that the (constraint-based) algorithm as presented in [12] generates constraints free of cycles. Therefore, it cannot faithfully reflect the constraints necessary for the fixpoint combinator. The algorithm incorrectly concludes for the following example that only the first and third 'False' term flow into the condition *x* , but not the second one.

(*fix* (λ*f* . λ*x* . λ*y*. λ*z* . **if** *x* **then** *True* **else** *f z x y*)) *False False False*

We reproduced this mistake with their implementation and verified that the mistake was not a simple bug in that implementation.

Closest to our formulation is the (unpublished) work of [16], which deals with exception analysis and uses a simply typed lambda-calculus with sets to represent annotations. We have chosen a more modular approach in which we offload much of the complexity of dealing with lattice values to the lattice. In [16], terms from the simply typed lambda-calculus with sets are canonicalized and then checked for alpha equivalence during Kleene-Mycroft iteration. We found, however, that two terms can have different canonical forms even though they are semantically equivalent. This causes Koot's reconstruction algorithm to diverge on a particular class of programs, because the inferred annotations continue to grow. The simplest such program we found is the following.

μ*f* : (unit → unit) → unit → unit. λ*g* : unit → unit. λ*x* : unit. *g* (*f g x*)

Our solution is to apply canonicalization to simplify terms as much as possible, and then compare the outcomes for all possible inputs.
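As an illustration of comparing outcomes on all inputs, the following sketch (our own illustration, not the authors' algorithm) decides semantic equality of first-order binding-time functions over the two-point lattice by exhaustive enumeration; the names `join` and `semantically_equal` are hypothetical.

```python
from itertools import product

# Two-point binding-time lattice: S (static) below D (dynamic).
S, D = "S", "D"
LATTICE = [S, D]

def join(a, b):
    """Least upper bound: D absorbs S."""
    return D if D in (a, b) else S

# Two annotation functions that are syntactically different but
# semantically equal: \b. b  versus  \b. b `join` S.
f = lambda b: b
g = lambda b: join(b, S)

def semantically_equal(f, g, arity=1):
    """Compare two annotation functions on every tuple of lattice inputs."""
    return all(f(*args) == g(*args)
               for args in product(LATTICE, repeat=arity))

print(semantically_equal(f, g))  # True: joining with S is the identity
```

Because the lattice and the simple types of annotations are finite, this comparison terminates, sidestepping the divergence caused by ever-growing canonical forms.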

The *Dependency Core Calculus* (DCC) was introduced by [1] as a unifying framework for dependency analyses. Instances include binding-time analysis (see, e.g., [29]), exception analysis [17,16], secure information flow analysis [9] and static slicing [27]; each such instance can be mapped to DCC. This allowed the authors to compare different dependency analyses, uncover problems with existing instance analyses, and simplify proofs of noninterference [8,20]. The instance analyses in [1] were defined as a monovariant type and effect system with subtyping, for a monomorphic call-by-name language. An implicit, let-polymorphic implementation of DCC, FlowCaml, was developed by [25]. It is not higher-ranked.

The difference between DCC and our analysis is to a large extent one of focus: DCC is a calculus defined in such a way that any calculus that elaborates to it inherits the noninterference property and any other properties proven for DCC. Our analysis, on the other hand, is meant to be implemented in a compiler (with the added precision), and that implementation (and its associated meta-theory) can then be reused inside the compiler for a variety of analyses. Comparable to DCC, we have proven a noninterference property for our generic higher-rank polyvariant dependency analysis, so that all its instances inherit it.

The Haskell community supports an implementation of DCC in which the (security) annotations are lifted to the Haskell type level [2]. Since the GHC compiler supports higher-rank types, code written with this library can in fact model higher-rank security flows. Because full reconstruction for higher-rank types is in general undecidable [14], the programmer must however provide explicit type information. In [18], the authors introduce dependent flow types, which allow them to express a large variety of security policies. An essential difference with our work is that our approach is fully automated.

Early on in our research, we observed that the approach of [11] may lead to similar precision gains as higher-ranked annotations do. Since they deal with a different analysis, a direct comparison is impossible to make at this time.

# **9 Conclusion and Future Work**

We have defined a higher-rank annotation polymorphic type system for a generic dependency analysis, established its soundness and provided a sound and complete reconstruction algorithm. Examples show that we can achieve higher precision than plain let-polyvariance. The analysis we have defined is for a call-by-name language. We expect the results to hold as well for a lazy language, but chose call-by-name to reduce bookkeeping in the proofs. We also believe the analysis can be adapted relatively easily to one for a call-by-value language, by letting the annotation on the argument flow into the effect of the call. However, we would need to re-examine the metatheory.

In future work we want to consider whether we can further refine the canonicalization of λ terms so that syntactic equality up to alpha-equivalence can completely replace our current approach.

**Acknowledgments** We acknowledge the contributions of Ruud Koot in unpublished work that made this work possible.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **ConSORT: Context- and Flow-Sensitive Ownership Refinement Types for Imperative Programs**

John Toman1, Ren Siqi1, Kohei Suenaga<sup>1</sup> , Atsushi Igarashi<sup>1</sup> , and Naoki Kobayashi<sup>2</sup>

<sup>1</sup> Kyoto University, Kyoto, Japan, {jtoman,shiki,ksuenaga,igarashi}@fos.kuis.kyoto-u.ac.jp <sup>2</sup> The University of Tokyo, Tokyo, Japan, koba@is.s.u-tokyo.ac.jp

**Abstract.** We present ConSORT, a type system for safety verification in the presence of mutability and aliasing. Mutability requires strong updates to model changing invariants during program execution, but aliasing between pointers makes it difficult to determine which invariants must be updated in response to mutation. Our type system addresses this difficulty with a novel combination of refinement types and fractional ownership types. Fractional ownership types provide flow-sensitive and precise aliasing information for reference variables. ConSORT interprets this ownership information to soundly handle strong updates of potentially aliased references. We have proved ConSORT sound and implemented a prototype, fully automated inference tool. We evaluated our tool and found it verifies non-trivial programs including data structure implementations.

**Keywords:** refinement types, mutable references, aliasing, strong updates, fractional ownerships, program verification, type systems

# **1 Introduction**

Driven by the increasing power of automated theorem provers and recent high-profile software failures, fully automated program verification has seen a surge of interest in recent years [5, 10, 15, 29, 38, 66]. In particular, refinement types [9, 21, 24, 65], which refine base types with logical predicates, have been shown to be a practical approach to program verification that is amenable to (sometimes full) automation [47, 61, 62, 63]. Despite promising advances [26, 32, 46], the sound and precise application of refinement types (and program verification in general) in settings with mutability and aliasing (e.g., Java, Ruby, etc.) remains difficult.

One of the major challenges is how to precisely and soundly support strong updates for the invariants on memory cells. In a setting with mutability, a single invariant may not necessarily hold throughout the lifetime of a memory cell; while the program mutates the memory the invariant may change or evolve. To model these changes, a program verifier must support different, incompatible invariants which hold at different points during program execution. Further, precise program verification requires supporting different invariants on distinct pieces of memory.

```
1 mk(n) { mkref n }
3 let p = mk(3) in
4 let q = mk(5) in
5 p := *p + 1;
6 q := *q + 1;
7 assert(*p = 4);
```
**Fig. 1.** Example demonstrating the difficulty of effecting strong updates in the presence of aliasing. The function mk is bound in the program from lines 3 to 7; its body is given within the braces.

```
1  loop(a, b) {
2    let aold = *a in
3    b := *b + 1;
4    a := *a + 1;
5    assert(*a = aold + 1);
6    if ⋆ then
7      loop(b, mkref ⋆)
8    else
9      loop(b, a)
10 }
11 loop(mkref ⋆, mkref ⋆)
```
**Fig. 2.** Example with non-trivial aliasing behavior.

One solution is to use refinement types on the static program names (i.e., variables) which point to a memory location. This approach can model evolving invariants while tracking distinct invariants for each memory cell. For example, consider the (contrived) example in Figure 1. This program is written in an ML-like language with mutable references; references are updated with := and allocated with **mkref**. Variable p can initially be given the type {ν : **int** | ν = 3} **ref**, indicating it is a reference to the integer 3. Similarly, q can be given the type {ν : **int** | ν = 5} **ref**. We can model the mutation of p's memory on line 5 by strongly updating p's type to {ν : **int** | ν = 4} **ref**.

Unfortunately, the precise application of this technique is confounded by the existence of unrestricted aliasing. In general, updating just the type of the mutated reference is insufficient: due to aliasing, other variables may point to the mutated memory and their refinements must be updated as well. However, in the presence of conditional, may aliasing, it is impossible to strongly update the refinements on all possible aliases; given the static uncertainty about whether a variable points to the mutated memory, that variable's refinement may only be weakly updated. For example, suppose we used a simple alias analysis that imprecisely (but soundly) concluded all references allocated at the same program point might alias. Variables p and q share the allocation site on line 1, so on line 5 we would have to weakly update q's type to {ν : **int** | ν = 4 ∨ ν = 5}, indicating it may hold either 4 or 5. Under this same imprecise aliasing assumption, we would also have to weakly update p's type on line 6, preventing the verification of the example program.
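To see concretely how spurious weak updates lose precision, here is a small sketch (our own illustration, not ConSORT's machinery) that models a refinement as the set of values a variable may hold, replaying Figure 1 under the imprecise "same allocation site" aliasing assumption; `strong_update` and `weak_update` are hypothetical names.

```python
# Model a refinement as the set of values a reference may hold.
# A strong update replaces the set; a weak update (applied when the
# variable *might* alias the mutated cell) can only grow it.

def strong_update(env, var, values):
    env = dict(env)
    env[var] = set(values)
    return env

def weak_update(env, var, values):
    env = dict(env)
    env[var] = env[var] | set(values)
    return env

# p and q as in Figure 1, assuming (imprecisely) that references
# sharing an allocation site may alias.
env = {"p": {3}, "q": {5}}
env = strong_update(env, "p", {4})                       # line 5: p := *p + 1
env = weak_update(env, "q", {4})                         # q might alias p
env = weak_update(env, "p", {v + 1 for v in env["q"]})   # line 6: q := *q + 1
print(env["p"])  # {4, 5, 6}: too weak to prove assert(*p = 4)
```

The final refinement on p admits values other than 4, so the assertion on line 7 cannot be verified under this aliasing assumption.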

Given the precision loss associated with weak updates, it is critical that verification techniques built upon refinement types use precise aliasing information and avoid spuriously applied weak updates. Although it is relatively simple to conclude that p and q do not alias in Figure 1, consider the example in Figure 2. (In this example, ⋆ represents non-deterministic values.) Verifying this program requires proving a and b never alias at the writes on lines 3 and 4. In fact, a and b may point to the same memory location, but only in different invocations of loop; this pattern may confound even sophisticated symbolic alias analyses. Additionally, a and b share an allocation site on line 7, so an approach based on the simple alias analysis described above will also fail on this example. This must-not alias proof obligation can be discharged with existing techniques [53, 54], but requires an expensive, on-demand, interprocedural, flow-sensitive alias analysis.

This paper presents ConSORT (CONtext Sensitive Ownership Refinement Types), a type system for the automated verification of program safety in imperative languages with mutability and aliasing. ConSORT is built upon the novel combination of refinement types and fractional ownership types [55, 56]. Fractional ownership types extend pointer types with a rational number in the range [0, 1] called an ownership. These ownerships encapsulate the permission of the reference; only references with ownership 1 may be used for mutation. Fractional ownership types also obey the following key invariant: any references with a mutable alias must have ownership 0. Thus, any reference with non-zero ownership cannot be an alias of a reference with ownership 1. In other words, ownerships encode precise aliasing information in the form of must-not aliasing relationships.

To understand the benefit of this approach, let us return to Figure 1. As mk returns a freshly allocated reference with no aliases, its type indicates it returns a reference with ownership 1. Thus, our type system can initially give p and q the types {ν : **int** | ν = 3} **ref**<sup>1</sup> and {ν : **int** | ν = 5} **ref**<sup>1</sup> respectively. The ownership 1 on the reference type constructor **ref** indicates both pointers hold "exclusive" ownership of the pointed-to cell; by the invariant of fractional ownership types, p and q must not alias. The types of both references can be strongly updated without requiring spurious weak updates. As a result, at the assertion statement on line 7, p has type {ν : **int** | ν = 4} **ref**<sup>1</sup>, expressing the required invariant.

Our type system can also verify the example in Figure 2 without expensive side analyses. As a and b are both mutated, they must both have ownership 1; i.e., they cannot alias. This pre-condition is satisfied by all invocations of loop; on line 7, b has ownership 1 (from the argument type), and the newly allocated reference must also have ownership 1. Similarly, both arguments on line 9 have ownership 1 (from the assumed ownership on the argument types).

Ownerships behave linearly; they cannot be duplicated, only split when aliases are created. This linear behavior preserves the critical ownership invariant. For example, if we replace line 9 in Figure 2 with loop(b,b), the program becomes ill-typed; there is no way to divide b's ownership of 1 into two ownerships of 1.
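The linear splitting of ownerships can be sketched with exact rational arithmetic (our own illustration; `split` and `OwnershipError` are hypothetical names, not part of ConSORT).

```python
from fractions import Fraction

class OwnershipError(Exception):
    pass

def split(owned, give):
    """Split an ownership linearly: hand out `give`, keep the rest.
    The two pieces always sum to the original; nothing is duplicated."""
    give = Fraction(give)
    if not (0 <= give <= owned):
        raise OwnershipError("cannot give away more ownership than is held")
    return owned - give, give

full = Fraction(1)
keep, give = split(full, Fraction(1, 2))  # alias creation: 1 -> 1/2 + 1/2
assert keep + give == full

# loop(b, b) would require two ownerships of 1 carved out of a single
# ownership of 1 -- impossible under linearity:
try:
    split(Fraction(1), Fraction(2))  # needs 1 + 1 from a total of 1
except OwnershipError as e:
    print("ill-typed:", e)
```

Splitting 1 into 1/2 + 1/2 models creating a read-only alias; recombining the halves restores write permission.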

Ownerships also obviate the need to update the refinement information of aliases at mutation. ConSORT ensures that only the trivial refinement (which carries no information) is used in reference types with ownership 0, i.e., mutably-aliased references. When memory is mutated through a reference with ownership 1, ConSORT simply updates the refinement of the mutated reference variable. From the soundness of ownership types, all aliases have ownership 0 and must therefore contain only the trivial refinement. Thus, the types of all aliases already soundly describe all possible contents.<sup>3</sup>

ConSORT is also context-sensitive, and can use different summaries of function behavior at different points in the program. For example, consider the variant

<sup>3</sup> This assumption holds only if updates do not change simple types, a condition our type-system enforces.

```
1 get(p) { *p }
3 let p = mkref 3 in
4 let q = mkref 5 in
5 p := get(p) + 1;
6 q := get(q) + 1;
7 assert(*p = 4);
8 assert(*q = 6);
```
**Fig. 3.** Example of context-sensitivity

of Figure 1 shown in Figure 3. The function get returns the contents of its argument, and is called on lines 5 and 6. To precisely verify this program, on line 5 get must be typed as a function that takes a reference to 3 and returns 3. Similarly, on line 6 get must be typed as a function that takes a reference to 5 and returns 5. Our type system can give get a function type that distinguishes between these two calling contexts and selects the appropriate summary of get's behavior.

We have formalized ConSORT as a type system for a small imperative calculus and proved the system is sound: i.e., a well-typed program never encounters assertion failures during execution. We have implemented a prototype type inference tool targeting this imperative language and found it can automatically verify several non-trivial programs, including sorted lists and an array list data structure.

The rest of this paper is organized as follows. Section 2 defines the imperative language targeted by ConSORT and its semantics. Section 3 defines our type system and states our soundness theorem. Section 4 sketches our implementation's inference algorithm and its current limitations. Section 5 describes an evaluation of our prototype, Section 6 outlines related work, and Section 7 concludes.

# **2 Target Language**

This section describes a simple imperative language with mutable references and first-order, recursive functions.

#### **2.1 Syntax**

We assume a set of variables, ranged over by x, y, z, . . . , a set of function names, ranged over by f, and a set of labels, ranged over by ℓ<sub>1</sub>, ℓ<sub>2</sub>, . . . . The grammar of the language is as follows.

$$\begin{array}{rl} d & ::= f \mapsto (x_1, \ldots, x_n)e \\ e & ::= x \mid \textbf{let}\, x = y \,\textbf{in}\, e \mid \textbf{let}\, x = n \,\textbf{in}\, e \mid \textbf{ifz}\, x \,\textbf{then}\, e_1 \,\textbf{else}\, e_2 \\ & \mid \ \textbf{let}\, x = \textbf{mkref}\, y \,\textbf{in}\, e \mid \textbf{let}\, x = {*}y \,\textbf{in}\, e \mid \textbf{let}\, x = f^\ell(y_1, \ldots, y_n) \,\textbf{in}\, e \\ & \mid \ x := y \,; e \mid \textbf{alias}(x = y) \,; e \mid \textbf{alias}(x = {*}y) \,; e \mid \textbf{assert}(\varphi) \,; e \mid e_1 \,; e_2 \\ P & ::= \langle \{d_1, \ldots, d_n\}, e \rangle \end{array}$$

ϕ stands for a formula in propositional first-order logic over variables, integers and contexts; we discuss these formulas later in Section 3.1.

Variables are introduced by function parameters or let bindings. Like ML, the variable bindings introduced by let expressions and parameters are immutable. Mutable variable declarations such as int x = 1; in C are achieved in our language with:

```
let y = 1 in (let x = mkref y in ...) .
```
As a convenience, we assume all variable names introduced with let bindings and function parameters are distinct.

Unlike ML (and like C or Java) we do not allow general expressions on the right hand side of let bindings. The simplest right hand forms are a variable y or an integer literal n. **mkref** y creates a reference cell with value y, and ∗y accesses the contents of reference y. For simplicity, we do not include an explicit null value; an extension to support null is discussed in Section 4. Function calls must occur on the right hand side of a variable binding and take the form f<sup>ℓ</sup>(x1, ..., xn), where x1, ..., xn are distinct variables and ℓ is a (unique) label. These labels are used to make our type system context-sensitive as discussed in Section 3.3.

The base case for expressions is a single variable. If a variable expression is executed in the tail position of a function, the value of that variable is the return value of the function; otherwise the value is ignored.

The only intraprocedural control-flow operations in our language are if statements. **ifz** checks whether the condition variable x equals zero and chooses the corresponding branch. Loops can be implemented with recursive functions and we do not include them explicitly in our formalism.

Our grammar requires that the side-effecting, result-free statements **assert**(ϕ), **alias**(x = y), **alias**(x = ∗y) and assignment x := y are followed by a continuation expression. We impose this requirement for technical reasons to ease our formal presentation; it does not reduce expressiveness, as dummy continuations can be inserted as needed. The **assert**(ϕ) ; e form executes e if the predicate ϕ holds in the current state and aborts the program otherwise. **alias**(x = y) ; e and **alias**(x = ∗y) ; e assert a must-aliasing relationship between x and y (resp. x and ∗y) and then execute e. **alias** statements are effectively annotations that our type system exploits to gain added precision. x := y ; e updates the contents of the memory cell pointed to by x with the value of y. In addition to the above continuations, our language supports general sequencing with e1 ; e2.

A program is a pair ⟨D, e⟩, where D = {d1, ... , dn} is a set of first-order, mutually recursive function definitions, and e is the program entry point. A function definition d maps the function name to a tuple of argument names x1, ... , xn that are bound within the function body e.

Paper Syntax. In the remainder of the paper, we will write programs that are technically illegal according to our grammar, but can be easily "de-sugared" into an equivalent, valid program. For example, we will write

**let** x = **mkref** 4 **in assert**(\*x = 4)

as syntactic sugar for:

**let** f=4 **in let** x = **mkref** f **in let** tmp = \*x **in assert**(tmp = 4); **let** dummy = 0 **in** dummy

#### **2.2 Operational Semantics**

We now introduce the operational semantics for our language. We assume a finite domain of heap addresses **Addr**; we denote an arbitrary address by a.


**Fig. 4.** Transition Rules (1).

A runtime state is represented by a configuration ⟨H, R, F, e⟩, which consists of a heap, register file, stack, and currently reducing expression respectively. The register file maps variables to runtime values v, which are either integers n or addresses a. The heap maps a finite subset of addresses to runtime values. The runtime stack represents pending function calls as a sequence of return contexts, which we describe below. While the final configuration component is an expression, the rewriting rules are defined in terms of E[e], where E is an evaluation context and e a redex, as is standard. The grammar for evaluation contexts is defined by: E ::= E ; e | [].

Our operational semantics is given in Figures 4 and 5. We write dom(H) to indicate the domain of a function and H{a ↦ v} where a ∉ dom(H) to denote a map which takes all addresses in dom(H) to their values in H and which additionally takes a to v. We write H{a ← v} where a ∈ dom(H) to denote a map equivalent to H except that a takes value v. We use similar notation for dom(R) and R{x ↦ v}. We also write ∅ for the empty register file and heap. The step relation −→<sub>D</sub> is parameterized by a set of function definitions D; a program ⟨D, e⟩ is executed by stepping the initial configuration ⟨∅, ∅, ·, e⟩ according to −→<sub>D</sub>. The semantics is mostly standard; we highlight some important points below.
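The two map notations can be mirrored in a short sketch (our own illustration, with Python dictionaries standing in for heaps; `extend` and `update` are hypothetical names, not part of the formalism).

```python
# H{a -> v} extends the heap at a fresh address; H{a <- v} overwrites
# an already-allocated one. Keeping the two operations distinct catches
# accidental re-allocation or writes to unallocated memory.

def extend(H, a, v):
    """H{a -> v}: requires a not in dom(H)."""
    assert a not in H, "extension requires a fresh address"
    return {**H, a: v}

def update(H, a, v):
    """H{a <- v}: requires a in dom(H)."""
    assert a in H, "update requires an allocated address"
    return {**H, a: v}

H = extend({}, "a0", 3)   # mkref 3 allocates a fresh cell
H = update(H, "a0", 4)    # an assignment overwrites it (R-Assign)
print(H)  # {'a0': 4}
```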

Return contexts F take the form E[**let** y = [] **in** e]. A return context represents a pending function call with label ℓ, and indicates that y should be bound to the return value of the callee during the execution of e within the larger execution context E. The call stack F is a sequence of these contexts, with the first such return context representing the most recent function call. The stack grows at function calls as described by rule R-Call. For a call E[**let** x = f<sup>ℓ</sup>(y1, ..., yn) **in** e] where f is defined as (x1, ..., xn)e′, the return context E[**let** x = [] **in** e] is

$$\dfrac{f \mapsto (x_1, \ldots, x_n)e' \in D}{\langle H, R, F, E[\textbf{let}\, x = f^\ell(y_1, \ldots, y_n) \,\textbf{in}\, e] \rangle \longrightarrow_D \langle H, R, E[\textbf{let}\, x = [\,] \,\textbf{in}\, e] : F, [y_1/x_1] \cdots [y_n/x_n]e' \rangle} \;\text{(R-Call)}$$

$$\dfrac{R(x) = a \quad a \in dom(H)}{\langle H, R, F, E[x := y \,; e] \rangle \longrightarrow_D \langle H\{a \leftarrow R(y)\}, R, F, E[e] \rangle} \;\text{(R-Assign)}$$

$$\dfrac{R(x) = R(y)}{\langle H, R, F, E[\textbf{alias}(x = y) \,; e] \rangle \longrightarrow_D \langle H, R, F, E[e] \rangle} \;\text{(R-Alias)}$$

$$\dfrac{R(y) = a \quad H(a) = R(x)}{\langle H, R, F, E[\textbf{alias}(x = {*}y) \,; e] \rangle \longrightarrow_D \langle H, R, F, E[e] \rangle} \;\text{(R-AliasPtr)}$$

$$\dfrac{R(x) \neq R(y)}{\langle H, R, F, E[\textbf{alias}(x = y) \,; e] \rangle \longrightarrow_D \textbf{AliasFail}} \;\text{(R-AliasFail)}$$

$$\dfrac{R(x) \neq H(R(y))}{\langle H, R, F, E[\textbf{alias}(x = {*}y) \,; e] \rangle \longrightarrow_D \textbf{AliasFail}} \;\text{(R-AliasPtrFail)}$$

$$\dfrac{\models [R]\, \varphi}{\langle H, R, F, E[\textbf{assert}(\varphi) \,; e] \rangle \longrightarrow_D \langle H, R, F, E[e] \rangle} \;\text{(R-Assert)}$$

$$\dfrac{\not\models [R]\, \varphi}{\langle H, R, F, E[\textbf{assert}(\varphi) \,; e] \rangle \longrightarrow_D \textbf{AssertFail}} \;\text{(R-AssertFail)}$$

**Fig. 5.** Transition Rules (2).

prepended onto the stack of the input configuration. The substitution of the actual arguments for the formal parameters in e′, denoted by [y1/x1] ··· [yn/xn]e′, becomes the currently reducing expression in the output configuration. Function returns are handled by R-Var. Our semantics returns values by name; when the currently executing function fully reduces to a single variable x, x is substituted into the return context on the top of the stack, denoted by E[**let** y = [] **in** e][x].

In the rule R-Assert we write |= [R] ϕ to mean that the formula yielded by substituting the concrete values in R for the variables in ϕ is valid within some chosen logic (see Section 3.1); in R-AssertFail we write ⊭ [R] ϕ when the formula is not valid. The substitution operation [R] ϕ is defined inductively as [∅] ϕ = ϕ, [R{x ↦ n}] ϕ = [R] [n/x]ϕ, [R{x ↦ a}] ϕ = [R] ϕ. In the case of an assertion failure, the semantics steps to a distinguished configuration **AssertFail**. The goal of our type system is to show that no execution of a well-typed program may reach this configuration. The **alias** form checks whether the two references actually alias; i.e., if the must-alias assertion provided by the programmer is correct. If not, our semantics steps to the distinguished **AliasFail** configuration. Our type system does not guarantee that **AliasFail** is unreachable; aliasing assertions are effectively trusted annotations that are assumed to hold.
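A minimal sketch of the assert rules (our own illustration): formulas are modeled as Python expressions and validity is checked by direct evaluation rather than an external decision procedure; `subst` and `step_assert` are hypothetical names.

```python
# [R]phi keeps only integer register bindings, mirroring
# [R{x -> n}]phi = [R][n/x]phi and [R{x -> a}]phi = [R]phi
# (address bindings do not appear in formulas).

def subst(R):
    """Build the evaluation environment corresponding to [R]phi."""
    return {x: v for x, v in R.items() if isinstance(v, int)}

def step_assert(R, phi):
    """Return 'R-Assert' if [R]phi is valid, else 'R-AssertFail'."""
    return "R-Assert" if eval(phi, {}, subst(R)) else "R-AssertFail"

R = {"x": 4, "p": "addr0"}       # x holds an integer, p an address
print(step_assert(R, "x == 4"))  # R-Assert
print(step_assert(R, "x == 5"))  # R-AssertFail
```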

In order to avoid duplicate variable names in our register file due to recursive functions, we refresh the bound variable x in a let expression to a fresh x′. Take the expression **let** x = y **in** e as an example; we substitute a fresh variable x′ for x in e, then bind x′ to the value of variable y. We assume this refreshing of variables preserves our assumption that all variable bindings introduced with let and function parameters are unique, i.e., x′ does not overlap with variable names that occur in the program.

$$\begin{array}{ll} \text{Types} & \tau ::= \{\nu : \textbf{int} \mid \varphi\} \mid \tau\ \textbf{ref}^{\,r} \\ \text{Ownership} & r \in [0, 1] \\ \text{Refinements} & \varphi ::= \varphi_1 \vee \varphi_2 \mid \neg\varphi \mid \top \mid \phi(\hat{v}_1, \ldots, \hat{v}_n) \mid \hat{v}_1 = \hat{v}_2 \mid CP \\ \text{Ref. Values} & \hat{v} ::= x \mid n \mid \nu \\ \text{Function Types} & \sigma ::= \forall\lambda.\ \langle x_1 : \tau_1, \ldots, x_n : \tau_n \rangle \rightarrow \langle x_1 : \tau'_1, \ldots, x_n : \tau'_n \mid \tau \rangle \\ \text{Context Variables} & \lambda \in \textbf{CVar} \\ \text{Concrete Context} & \mathcal{C} ::= \ell : \mathcal{C} \mid \epsilon \\ \text{Pred. Context} & C ::= \ell : C \mid \lambda \mid \epsilon \\ \text{Context Query} & CP ::= \mathcal{C} \sqsubseteq C \\ \text{Typing Context} & L ::= \lambda \mid \mathcal{C} \end{array}$$

**Fig. 6.** Syntax of types, refinements, and contexts.

# **3 Typing**

We now introduce a fractional ownership refinement type system that guarantees well-typed programs do not encounter assertion failures.

#### **3.1 Types and Contexts**

The syntax of types is given in Figure 6. Our type system has two type constructors: references and integers. τ **ref** <sup>r</sup> is the type of a (non-null) reference to a value of type τ . r is an ownership which is a rational number in the range [0, 1]. An ownership of 0 indicates a reference that cannot be written, and for which there may exist a mutable alias. By contrast, 1 indicates a pointer with exclusive ownership that can be read and written. Reference types with ownership values between these two extremes indicate a pointer that is readable but not writable, and for which no mutable aliases exist. ConSORT ensures that these invariants hold while aliases are created and destroyed during execution.

Integers are refined with a predicate ϕ. The language of predicates is built using the standard logical connectives of first-order logic, with (in)equality between variables and integers, and atomic predicate symbols φ as the basic atoms. We include a special "value" variable ν representing the value being refined by the predicate. For simplicity, we omit the connectives ϕ<sup>1</sup> ∧ ϕ<sup>2</sup> and ϕ<sup>1</sup> =⇒ ϕ2; they can be written as derived forms using the given connectives. We do not fix a particular theory from which φ are drawn, provided a sound (but not necessarily complete) decision procedure exists. CP are context predicates, which are used for context sensitivity as explained below.

Example 1. {ν : **int** | ν > 0} is the type of strictly positive integers. The type of immutable references to integers exactly equal to 3 can be expressed by {ν : **int** | ν = 3} **ref**<sup>0.5</sup>.

As is standard, we denote a type environment with Γ, which is a finite map from variable names to type τ . We write Γ[x : τ ] to denote a type environment Γ such that Γ(x ) = τ where x ∈ dom(Γ), Γ, x : τ to indicate the extension of Γ with the type binding x : τ , and Γ[x ← τ ] to indicate the type environment Γ with the binding of x updated to τ . We write the empty environment as •. The treatment of type environments as mappings instead of sequences in a dependent type system is somewhat non-standard. The standard formulation based on ordered sequences of bindings and its corresponding well-formedness condition did not easily admit variables with mutually dependent refinements as introduced by our function types (see below). We therefore use an unordered environment and relax well-formedness to ignore variable binding order.

Function Types, Contexts, and Context Polymorphism. Our type system achieves context sensitivity by allowing function types to depend on where a function is called, i.e., the execution context of the function invocation. Our system represents a concrete execution context with strings of call site labels (or just "call strings"), defined by 𝒞 ::= ε | ℓ : 𝒞. As is standard (e.g., [49, 50]), the string ℓ : 𝒞 abstracts an execution context where the most recent, active function call occurred at call site ℓ, which itself was executed in a context abstracted by 𝒞; ε is the context under which program execution begins. Context variables, drawn from a finite domain **CVar** and ranged over by λ1, λ2, ..., represent arbitrary, unknown contexts.
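The call-string abstraction can be sketched with tuples (our own illustrative encoding, not the paper's formalism), keeping the most recent call site first; `push` and `has_prefix` are hypothetical names.

```python
# A context l : C becomes (l,) + C; epsilon is the empty tuple.

EPSILON = ()

def push(label, ctx):
    """Enter a call at site `label` from within context `ctx`."""
    return (label,) + ctx

def has_prefix(ctx, prefix):
    """The prefix test underlying context query predicates."""
    return ctx[:len(prefix)] == prefix

c = push("l2", push("l1", EPSILON))  # called at l1, then at l2
print(has_prefix(c, ("l2",)))        # True: most recent call site is l2
print(has_prefix(c, ("l1",)))        # False: l1 is not the most recent site
```

Because the most recent call site sits at the front, a prefix query answers "was the current function entered from this chain of call sites?", which is exactly what a context query predicate asks of the context bound to λ.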

A function type takes the form ∀λ. ⟨x1 : τ1, ..., xn : τn⟩ → ⟨x1 : τ′<sub>1</sub>, ..., xn : τ′<sub>n</sub> | τ⟩. The arguments of a function are an n-ary tuple of types τi. To model side-effects on arguments, the function type includes the same number of output types τ′<sub>i</sub>. In addition, function types have a direct return type τ. The argument and output types are given names: refinements within the function type may refer to these names. Function types in our language are context polymorphic, expressed by universal quantification "∀λ." over a context variable. Intuitively, this context variable represents the many different execution contexts under which a function may be called.

Argument and return types may depend on this context variable by including context query predicates in their refinements. A context query predicate CP usually takes the form 𝒞 ⊑ λ, and is true iff 𝒞 is a prefix of the concrete context represented by λ. Intuitively, a refinement 𝒞 ⊑ λ =⇒ ϕ states that ϕ holds in any concrete execution context with prefix 𝒞, and provides no information in any other context. In full generality, a context query predicate may be of the form 𝒞<sub>1</sub> ⊑ 𝒞<sub>2</sub> or 𝒞 ⊑ ℓ1 : ... : ℓn : λ; these forms may be immediately simplified to ⊤, ⊥, or 𝒞′ ⊑ λ.

Example 2. The type {ν : **int** | (ℓ1 ⊑ λ =⇒ ν = 3) ∧ (ℓ2 ⊑ λ =⇒ ν = 5)} represents an integer that is 3 if the most recent active function call site is ℓ1, 5 if the most recent call site is ℓ2, and is otherwise unconstrained. This type may be used for the argument of f in, e.g., f<sup>ℓ1</sup>(3) + f<sup>ℓ2</sup>(5).

As types in our type system may contain context variables, our typing judgment (introduced below) includes a typing context L, which is either a single context variable λ or a concrete context 𝒞. This typing context represents the assumptions about the execution context of the term being typed. If the typing context is a context variable λ, then no assumptions are made about the execution context of the term, although types may depend upon λ with context query predicates. Accordingly, function bodies are typed under the context variable universally quantified over in the corresponding function type; i.e., no assumptions are made about the exact execution context of the function body. As in parametric polymorphism, consistent substitution of a concrete context 𝒞 for a context variable λ in a typing derivation yields a valid type derivation under the concrete context 𝒞.

Remark 1. The context-sensitivity scheme described here corresponds to the standard CFA approach [50] without a priori call-string limiting. We chose this scheme because it can be easily encoded with equality over integer variables (see Section 4), but in principle another context-sensitivity strategy could be used instead. The important feature of our type system is the inclusion of predicates over contexts, not the specific choice for these predicates.

Function type environments are denoted with Θ and are finite maps from function names (f ) to function types (σ).

Well Formedness. We impose two well-formedness conditions on types: ownership well-formedness and refinement well-formedness. The ownership condition is purely syntactic: τ is ownership well-formed if τ = τ′ **ref** <sup>0</sup> implies τ′ = ⊤<sup>n</sup> for some n. ⊤<sup>i</sup> is the "maximal" type of a chain of i references, and is defined inductively as ⊤<sup>0</sup> = {ν : **int** | ⊤}, ⊤<sup>i</sup> = ⊤<sup>i−1</sup> **ref** <sup>0</sup>.
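Since the condition is purely syntactic, it can be phrased as a small recursive check. The Python sketch below uses our own encoding of types, tuples `('int', phi)` and `('ref', contents, r)`, and is purely illustrative:

```python
# Illustrative sketch of ownership well-formedness (our own encoding).
# A type is ownership well-formed when everything under a 0-ownership
# reference is maximal: a chain of 0-ownership refs ending in the
# trivially refined integer type.

TOP = True  # the trivial refinement, standing in for ⊤

def maximal(t):
    """Is t the maximal type ⊤^i for some chain length i?"""
    if t[0] == 'int':
        return t[1] is TOP
    _, contents, r = t
    return r == 0 and maximal(contents)

def ownership_wf(t):
    """Check: every ref^0 in t has maximal contents."""
    if t[0] == 'int':
        return True
    _, contents, r = t
    if r == 0 and not maximal(contents):
        return False
    return ownership_wf(contents)
```

For example, `('ref', ('int', TOP), 0)` (that is, ⊤¹) is well-formed, while a 0-ownership reference to `{ν : int | ν > 0}` is rejected.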

The ownership well-formedness condition ensures that aliases introduced via heap writes do not violate the invariant of ownership types and that refinements are consistent with updates performed through mutable aliases. Recall our ownership type invariant ensures all aliases of a mutable reference have 0 ownership. Any mutations through that mutable alias will therefore be consistent with the "no information" refinement required by this well-formedness condition.

Refinement well-formedness, denoted L | Γ ⊢<sub>WF</sub> ϕ, ensures that free program variables in refinement ϕ are bound in a type environment Γ and have integer type. It also requires that, for a typing context L = λ, only context query predicates over λ are used (no such predicates may be used if L is a concrete context). Notice this condition forbids refinements that refer to references. Although ownership information can signal when refinements on a mutably-aliased reference must be discarded, our current formulation provides no such information for refinements that mention mutably-aliased references. We therefore conservatively reject such refinements at the cost of some expressiveness in our type system.

We write L | Γ ⊢<sub>WF</sub> τ to indicate a well-formed type where all refinements are well-formed with respect to L and Γ. We write L ⊢<sub>WF</sub> Γ for a type environment where all types are well-formed. A function environment is well-formed (written ⊢<sub>WF</sub> Θ) if, for every σ in Θ, the argument, result, and output types are well-formed with respect to each other and the context variable quantified over in σ. As the formal definition of refinement well-formedness is fairly standard, we omit it for space reasons (the full definition may be found in the full version [60]).

#### **3.2 Intraprocedural Type System**

We now introduce the type system for the intraprocedural fragment of our language. Accordingly, this section focuses on the interplay of mutability and

$$\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1 + \tau_2] \vdash x : \tau_1 \Rightarrow \Gamma[x \leftarrow \tau_2] \tag{T-Var}$$

$$\frac{\Theta \mid \mathcal{L} \mid \Gamma[y \leftarrow \tau_1 \wedge_y y =_{\tau_1} x],\ x : (\tau_2 \wedge_x x =_{\tau_2} y) \vdash e : \tau \Rightarrow \Gamma' \qquad x \notin dom(\Gamma')}{\Theta \mid \mathcal{L} \mid \Gamma[y : \tau_1 + \tau_2] \vdash \mathbf{let}\ x = y\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'} \tag{T-Let}$$

$$\frac{\Theta \mid \mathcal{L} \mid \Gamma,\ x:\{\nu:\mathbf{int} \mid \nu = n\} \vdash e:\tau \Rightarrow \Gamma' \qquad x \notin dom(\Gamma')}{\Theta \mid \mathcal{L} \mid \Gamma \vdash \mathbf{let}\, x = n \,\mathbf{in}\, e:\tau \Rightarrow \Gamma'} \tag{T-LetInt}$$

$$\frac{\begin{array}{c} \Theta \mid \mathcal{L} \mid \Gamma[x \leftarrow \{\nu : \mathbf{int} \mid \varphi \wedge \nu = 0\}] \vdash e_1 : \tau \Rightarrow \Gamma' \\ \Theta \mid \mathcal{L} \mid \Gamma[x \leftarrow \{\nu : \mathbf{int} \mid \varphi \wedge \nu \neq 0\}] \vdash e_2 : \tau \Rightarrow \Gamma' \end{array}}{\Theta \mid \mathcal{L} \mid \Gamma[x : \{\nu : \mathbf{int} \mid \varphi\}] \vdash \mathbf{ifz}\ x\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2 : \tau \Rightarrow \Gamma'} \tag{T-If}$$

$$\frac{\Theta \mid \mathcal{L} \mid \Gamma[y \leftarrow \tau_1],\ x : (\tau_2 \wedge_x x =_{\tau_2} y)\,\mathbf{ref}^1 \vdash e : \tau \Rightarrow \Gamma' \qquad x \notin dom(\Gamma')}{\Theta \mid \mathcal{L} \mid \Gamma[y : \tau_1 + \tau_2] \vdash \mathbf{let}\ x = \mathbf{mkref}\ y\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'} \tag{T-MkRef}$$

$$\frac{\Theta \mid \mathcal{L} \mid \Gamma \vdash e_1 : \tau' \Rightarrow \Gamma' \qquad \Theta \mid \mathcal{L} \mid \Gamma' \vdash e_2 : \tau'' \Rightarrow \Gamma''}{\Theta \mid \mathcal{L} \mid \Gamma \vdash e_1 ;\ e_2 : \tau'' \Rightarrow \Gamma''} \tag{T-Seq}$$

$$\frac{\begin{array}{c} \tau' = \begin{cases} \tau_1 \wedge_y y =_{\tau_1} x & r > 0 \\ \tau_1 & r = 0 \end{cases} \\ \Theta \mid \mathcal{L} \mid \Gamma[y \leftarrow \tau'\,\mathbf{ref}^r],\ x : \tau_2 \vdash e : \tau \Rightarrow \Gamma' \qquad x \notin dom(\Gamma') \end{array}}{\Theta \mid \mathcal{L} \mid \Gamma[y : (\tau_1 + \tau_2)\,\mathbf{ref}^r] \vdash \mathbf{let}\ x = {*}y\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'} \tag{T-Deref}$$

$$\frac{\Gamma \models \varphi \qquad \epsilon \mid \Gamma \vdash_{WF} \varphi \qquad \Theta \mid \mathcal{L} \mid \Gamma \vdash e : \tau \Rightarrow \Gamma'}{\Theta \mid \mathcal{L} \mid \Gamma \vdash \mathbf{assert}(\varphi);\ e : \tau \Rightarrow \Gamma'} \tag{T-Assert}$$

**Fig. 7.** Expression typing rules.

refinement types. The typing rules are given in Figures 7 and 8. A typing judgment takes the form Θ | L | Γ ⊢ e : τ ⇒ Γ′, which indicates that e is well-typed under function type environment Θ, typing context L, and type environment Γ, evaluates to a value of type τ, and modifies the input environment according to Γ′. Any valid typing derivation must have L ⊢<sub>WF</sub> Γ, L ⊢<sub>WF</sub> Γ′, and L | Γ′ ⊢<sub>WF</sub> τ; i.e., the input and output type environments and the result type must be well-formed.

The typing rules in Figure 7 handle the relatively standard features of our language. The rule T-Seq for sequential composition is fairly straightforward, except that the output type environment of e<sub>1</sub> is the input type environment of e<sub>2</sub>. T-LetInt is also straightforward; since x is bound to a constant, it is given type {ν : **int** | ν = n} to indicate x is exactly n. The output type environment Γ′ cannot mention x (expressed with x ∉ dom(Γ′)), to prevent x from escaping its scope. This requirement can be met by applying the subtyping rule (see below) to weaken refinements so they no longer mention x. As in other refinement type systems [47], this requirement is critical for ensuring soundness.

Rule T-Let is crucial to understanding our ownership type system. The body of the let expression e is typechecked under a type environment where the type of y in Γ is linearly split into two types: τ<sub>1</sub> for y and τ<sub>2</sub> for the newly created binding x. This splitting is expressed using the + operator. If y is a reference type, the split operation distributes some portion of y's ownership information to its new alias x. The split operation also distributes refinement information between the two types. For example, the type {ν : **int** | ν > 0} **ref** <sup>1</sup> can be split into (1) {ν : **int** | ν > 0} **ref** <sup>r</sup> and {ν : **int** | ν > 0} **ref** <sup>1−r</sup> (for r ∈ (0, 1)),

i.e., two immutable references with non-trivial refinement information, or (2) {ν : **int** | ν > 0} **ref** <sup>1</sup> and {ν : **int** | ⊤} **ref** <sup>0</sup>, where one of the aliases is mutable and the other provides no refinement information. How a type is split depends on the usage of x and y in e. Formally, we define the type addition operator as the least commutative partial operation that satisfies the following rules:

$$\begin{aligned} \{\nu : \mathbf{int} \mid \varphi_1\} + \{\nu : \mathbf{int} \mid \varphi_2\} &= \{\nu : \mathbf{int} \mid \varphi_1 \wedge \varphi_2\} && \text{(TAdd-Int)}\\ \tau_1\, \mathbf{ref}^{r_1} + \tau_2\, \mathbf{ref}^{r_2} &= (\tau_1 + \tau_2)\, \mathbf{ref}^{r_1 + r_2} && \text{(TAdd-Ref)} \end{aligned}$$

Viewed another way, type addition describes how to combine two types for the same value such that the combination soundly incorporates all information from the two original types. Critically, the type addition operation cannot create or destroy ownership and refinement information, only combine it or divide it between types. Although not explicit in the rules, by ownership well-formedness, if the entirety of a reference's ownership is transferred to another type during a split, all refinements in the remaining type must be ⊤.
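As an illustration, the type addition operator admits a direct recursive implementation. The sketch below uses our own encoding (refinements as sets of conjuncts, ownerships as rationals) and additionally rejects sums whose ownership exceeds 1, since ownerships range over [0, 1]; it is not part of the formal system:

```python
# Illustrative sketch of the type addition operator (+).
# Types: ('int', refinement_set) or ('ref', contents, ownership).
# Conjunction of refinements is modeled as set union.
from fractions import Fraction

def add(t1, t2):
    """Least commutative partial operation combining two types for the
    same value. Returns None where the sum is undefined."""
    if t1[0] == 'int' and t2[0] == 'int':
        return ('int', t1[1] | t2[1])            # TAdd-Int: conjoin
    if t1[0] == 'ref' and t2[0] == 'ref':
        inner = add(t1[1], t2[1])
        if inner is None:
            return None
        r = Fraction(t1[2]) + Fraction(t2[2])
        if r > 1:
            return None                          # ownership past 1
        return ('ref', inner, r)                 # TAdd-Ref
    return None                                  # shape mismatch
```

Summing the two halves of split (1) above recovers the original type: both the refinement and the full ownership 1 are restored.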

The additional conjuncts ∧<sub>y</sub> y =<sub>τ₁</sub> x and ∧<sub>x</sub> x =<sub>τ₂</sub> y express the equality of x and y as refinements. We use the strengthening operation τ ∧<sub>x</sub> ϕ and the typed equality proposition x =<sub>τ</sub> y, defined respectively as:

$$\begin{aligned} \{\nu: \mathsf{int} \mid \varphi\} \wedge\_y \varphi' &= \{\nu: \mathsf{int} \mid \varphi \wedge \left[\nu/y\right] \varphi'\} & & (x =\_{\{\nu: \mathsf{int} \mid \varphi\}} y) = (x = y) \\ \tau \,\mathsf{ref}^r \wedge\_y \varphi' &= \tau \,\mathsf{ref}^r & & (x =\_{\tau \,\mathsf{ref}^r} y) = \top \end{aligned}$$

We do not track equality between references or between the contents of aliased reference cells as doing so would violate our refinement well-formedness condition. These operations are also used in other rules that can introduce equality.

Rule T-MkRef is very similar to T-Let, except that x is given a reference type with ownership 1 pointing to τ<sub>2</sub>, which is obtained by splitting the type of y. In T-Deref, the content type of y is split and distributed to x. The strengthening is conditionally applied depending on the ownership of the dereferenced pointer; in particular, if r = 0, the content type must be a maximal type ⊤<sup>i</sup>.

Our type system also tracks path information; in the T-If rule, we update the refinement on the condition variable within the respective branches to indicate whether the variable must be zero. By requiring both branches to produce the same output type environment, we guarantee that these conflicting refinements are rectified within the type derivations of the two branches.

The typing rule for assert statements has the precondition Γ ⊨ ϕ, which is defined to be ⊨ ⟦Γ⟧ ⟹ ϕ; i.e., the logical formula ⟦Γ⟧ ⟹ ϕ is valid in the chosen theory. ⟦Γ⟧ lifts the refinements on the integer-valued variables in Γ into a proposition in the logic used for verification. This denotation operation is defined as:

$$\begin{aligned} [\![\,\bullet\,]\!] &= \top & [\![\,\{\nu:\mathbf{int} \mid \varphi\}\,]\!]_y &= [y/\nu]\,\varphi \\ [\![\,\Gamma,\ x:\tau\,]\!] &= [\![\,\Gamma\,]\!] \wedge [\![\,\tau\,]\!]_x & [\![\,\tau'\ \mathbf{ref}^r\,]\!]_y &= \top \end{aligned}$$

If the formula ⟦Γ⟧ ⟹ ϕ is valid, then in any context and under any valuation of program variables that satisfies the refinements in ⟦Γ⟧, the predicate ϕ must be true and the assertion cannot fail. This intuition forms the foundation of our soundness claim (Section 3.4).
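The denotation admits a one-line reading: conjoin the refinement of every integer-typed variable, with ν replaced by the variable's name; reference types contribute ⊤. The Python sketch below models refinements as plain strings purely for exposition (the representation is our own, not the tool's):

```python
# Illustrative sketch of the environment denotation [[Γ]].
# Types: ('int', refinement_string) or ('ref', contents, ownership);
# the bound value variable is spelled 'nu' inside refinement strings.

def denote_type(t, y):
    """[[τ]]_y: substitute y for ν in an int refinement; refs give ⊤."""
    if t[0] == 'int':
        return t[1].replace('nu', y)   # [y/ν]φ
    return 'true'                      # reference types contribute ⊤

def denote_env(gamma):
    """[[Γ]]: conjunction over all bindings (⊤ for the empty env)."""
    parts = [denote_type(t, x) for x, t in gamma.items()]
    return ' and '.join(parts) if parts else 'true'
```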

696 J. Toman et al.

$$\frac{\Theta \mid \mathcal{L} \mid \Gamma[x \leftarrow \tau_1][y \leftarrow (\tau_2 \wedge_y y =_{\tau_2} x)\,\mathbf{ref}^1] \vdash e : \tau \Rightarrow \Gamma'}{\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1 + \tau_2][y : \tau'\,\mathbf{ref}^1] \vdash y := x;\ e : \tau \Rightarrow \Gamma'} \tag{T-Assign}$$

$$\frac{\begin{array}{c} (\tau_1\,\mathbf{ref}^{r_1} + \tau_2\,\mathbf{ref}^{r_2}) \approx (\tau_1'\,\mathbf{ref}^{r_1'} + \tau_2'\,\mathbf{ref}^{r_2'}) \\ \Theta \mid \mathcal{L} \mid \Gamma[x \leftarrow \tau_1'\,\mathbf{ref}^{r_1'}][y \leftarrow \tau_2'\,\mathbf{ref}^{r_2'}] \vdash e : \tau \Rightarrow \Gamma' \end{array}}{\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1\,\mathbf{ref}^{r_1}][y : \tau_2\,\mathbf{ref}^{r_2}] \vdash \mathbf{alias}(x = y);\ e : \tau \Rightarrow \Gamma'} \tag{T-Alias}$$

$$\frac{\begin{array}{c} (\tau_1\,\mathbf{ref}^{r_1} + \tau_2\,\mathbf{ref}^{r_2}) \approx (\tau_1'\,\mathbf{ref}^{r_1'} + \tau_2'\,\mathbf{ref}^{r_2'}) \\ \Theta \mid \mathcal{L} \mid \Gamma[x \leftarrow \tau_1'\,\mathbf{ref}^{r_1'}][y \leftarrow (\tau_2'\,\mathbf{ref}^{r_2'})\,\mathbf{ref}^{r}] \vdash e : \tau \Rightarrow \Gamma' \end{array}}{\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1\,\mathbf{ref}^{r_1}][y : (\tau_2\,\mathbf{ref}^{r_2})\,\mathbf{ref}^{r}] \vdash \mathbf{alias}(x = {*}y);\ e : \tau \Rightarrow \Gamma'} \tag{T-AliasPtr}$$

$$\frac{\Gamma \le \Gamma'' \qquad \Theta \mid \mathcal{L} \mid \Gamma'' \vdash e : \tau \Rightarrow \Gamma''' \qquad \Gamma''', \tau \le \Gamma', \tau'}{\Theta \mid \mathcal{L} \mid \Gamma \vdash e : \tau' \Rightarrow \Gamma'} \tag{T-Sub}$$

τ<sub>1</sub> ≈ τ<sub>2</sub> iff • ⊢ τ<sub>1</sub> ≤ τ<sub>2</sub> and • ⊢ τ<sub>2</sub> ≤ τ<sub>1</sub>.

**Fig. 8.** Pointer manipulation and subtyping

$$\frac{\Gamma \models \varphi_1 \implies \varphi_2}{\Gamma \vdash \{\nu : \mathbf{int} \mid \varphi_1\} \le \{\nu : \mathbf{int} \mid \varphi_2\}} \tag{S-Int}$$

$$\frac{r_1 \ge r_2 \qquad \Gamma \vdash \tau_1 \le \tau_2}{\Gamma \vdash \tau_1\,\mathbf{ref}^{r_1} \le \tau_2\,\mathbf{ref}^{r_2}} \tag{S-Ref}$$

$$\frac{\forall x \in dom(\Gamma').\ \Gamma \vdash \Gamma(x) \le \Gamma'(x)}{\Gamma \le \Gamma'} \tag{S-TyEnv}$$

$$\frac{\Gamma,\ x : \tau \le \Gamma',\ x : \tau' \qquad x \notin dom(\Gamma)}{\Gamma, \tau \le \Gamma', \tau'} \tag{S-Res}$$

**Fig. 9.** Subtyping rules.

Destructive Updates, Aliasing, and Subtyping. We now discuss the handling of assignment, aliasing annotations, and subtyping as described in Figure 8. Although apparently unrelated, all three concern updating the refinements of (potentially) aliased reference cells.

Like the binding forms discussed above, T-Assign splits the assigned value's type into two types via the type addition operator, and distributes these types between the right-hand side of the assignment and the mutated reference contents. Refinement information in the fresh contents may be inconsistent with any previous refinement information; only the shapes must be the same. In a system with unrestricted aliasing, this typing rule would be unsound, as it would admit writes that are inconsistent with refinements on aliases of the left-hand side. However, the assignment rule requires that the updated reference has an ownership of 1. By the ownership type invariant, all aliases of the updated reference have 0 ownership, and by ownership well-formedness may only contain the trivial refinement ⊤.

Example 3. We can type the program as follows:

```
let x = mkref 5 in // x : {ν : int | ν = 5} ref 1
let y=x in // x : ⊤1, y : {ν : int | ν = 5} ref 1
 y := 4; assert(*y = 4) // x : ⊤1, y : {ν : int | ν = 4} ref 1
```
In this and later examples, we include type annotations within comments. We stress that these annotations are for expository purposes only; our tool can infer these types automatically with no manual annotations.

As described thus far, the type system is quite strict: if ownership has been completely transferred from one reference to another, the refinement information found in the original reference is effectively useless. Additionally, once a mutable pointer has been split through an assignment or let expression, there is no way to recover mutability. The typing rules for must-alias assertions, T-Alias and T-AliasPtr, overcome this restriction by exploiting must-aliasing information to "shuffle" or redistribute ownerships and refinements between two aliased pointers. Each rule assigns two fresh types τ′<sub>1</sub> **ref** <sup>r′<sub>1</sub></sup> and τ′<sub>2</sub> **ref** <sup>r′<sub>2</sub></sup> to the two operand pointers. The choice of τ′<sub>1</sub>, r′<sub>1</sub>, τ′<sub>2</sub>, and r′<sub>2</sub> is left open, provided that the sum of the new types, (τ′<sub>1</sub> **ref** <sup>r′<sub>1</sub></sup>) + (τ′<sub>2</sub> **ref** <sup>r′<sub>2</sub></sup>), is equivalent (denoted ≈) to the sum of the original types. Formally, ≈ is defined as in Figure 8; it implies that any refinements in the two types must be logically equivalent and that ownerships must be equal. This redistribution is sound precisely because the two references are assumed to alias; the total ownership for the single memory cell pointed to by both references cannot be increased by this shuffling. Further, any refinements that hold for the contents of one reference must necessarily hold for the contents of the other, and vice versa.
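The admissibility check behind ≈ can be pictured as comparing two sums. In the following Python sketch, a flat reference type is modeled as a (refinement set, ownership) pair of our own devising; a shuffle is accepted exactly when the pointwise sum of the two new types equals the sum of the old ones (conjunction of refinements, addition of ownerships):

```python
# Illustrative sketch: validating the shuffle at alias(x = y).
# A flat reference type is (refinement_set, ownership).
from fractions import Fraction

def sum_types(t1, t2):
    """Sum two flat ref types: conjoin refinements, add ownerships."""
    (p1, r1), (p2, r2) = t1, t2
    return (frozenset(p1) | frozenset(p2), Fraction(r1) + Fraction(r2))

def shuffle_ok(old_x, old_y, new_x, new_y):
    """The redistribution is admissible iff the sums are equal (≈)."""
    return sum_types(old_x, old_y) == sum_types(new_x, new_y)
```

For instance, the shuffle in Example 4 below, turning ⊤ with ownership 0 plus {ν = 4} with ownership 1 into two {ν = 4} halves of ownership 0.5, is accepted, while duplicating full ownership on both sides is not.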

Example 4 (Shuffling ownerships and refinements). Let ϕ<sub>=n</sub> be ν = n.

```
let x = mkref 5 in    // x : {ν : int | ϕ=5} ref 1
let y = x in          // x : ⊤1, y : {ν : int | ϕ=5} ref 1
 y := 4; alias(x = y) // x : {ν : int | ϕ=4} ref 0.5, y : {ν : int | ϕ=4} ref 0.5
```

The final type assignment for x and y is justified by

$$\begin{split} \top^1 + \{\nu : \mathbf{int} \mid \varphi_{=4}\}\,\mathbf{ref}^1 &= \{\nu : \mathbf{int} \mid \top \wedge \varphi_{=4}\}\,\mathbf{ref}^1 \approx \{\nu : \mathbf{int} \mid \varphi_{=4} \wedge \varphi_{=4}\}\,\mathbf{ref}^1 \\ &= \{\nu : \mathbf{int} \mid \varphi_{=4}\}\,\mathbf{ref}^{0.5} + \{\nu : \mathbf{int} \mid \varphi_{=4}\}\,\mathbf{ref}^{0.5}. \end{split}$$

The aliasing rules give fine-grained control over ownership information. This flexibility allows mutation through two or more aliased references within the same scope. Provided sufficient aliasing annotations, the type system may shuffle ownerships between one or more live references, enabling and disabling mutability as required. Although the reliance on these annotations appears to decrease the practicality of our type system, we expect these aliasing annotations can be inserted by a conservative must-aliasing analysis. Further, empirical experience from our prior work [56] indicates that only a small number of annotations are required for larger programs.

Example 5 (Shuffling Mutability). Let ϕ<sub>=n</sub> again be ν = n. The following program uses two live, aliased references to mutate the same memory location:

```
let x = mkref 0 in
let y=x in // x : {ν : int | ϕ=0} ref 1, y : ⊤1
 x := 1; alias(x = y); // x : ⊤1, y : {ν : int | ϕ=1} ref 1
 y := 2; alias(x = y); // x : {ν : int | ϕ=2} ref 0.5, y : {ν : int | ϕ=2} ref 0.5
 assert(*x = 2)
```

$$\frac{\begin{array}{c} \Theta(f) = \forall\lambda.\ \langle x_1 : \tau_1, \ldots, x_n : \tau_n\rangle \to \langle x_1 : \tau_1', \ldots, x_n : \tau_n' \mid \tau\rangle \\ \sigma_\alpha = [\ell : \mathcal{L}/\lambda] \qquad \sigma_x = [y_1/x_1]\cdots[y_n/x_n] \\ \Theta \mid \mathcal{L} \mid \Gamma[y_i \leftarrow \sigma_\alpha\,\sigma_x\,\tau_i'],\ x : \sigma_\alpha\,\sigma_x\,\tau \vdash e : \tau' \Rightarrow \Gamma' \qquad x \notin dom(\Gamma') \end{array}}{\Theta \mid \mathcal{L} \mid \Gamma[y_i : \sigma_\alpha\,\sigma_x\,\tau_i] \vdash \mathbf{let}\ x = f^{\ell}(y_1,\ldots,y_n)\ \mathbf{in}\ e : \tau' \Rightarrow \Gamma'} \tag{T-Call}$$

$$\frac{\begin{array}{c} \Theta(f) = \forall\lambda.\ \langle x_1 : \tau_1, \ldots, x_n : \tau_n\rangle \to \langle x_1 : \tau_1', \ldots, x_n : \tau_n' \mid \tau\rangle \\ \Theta \mid \lambda \mid x_1 : \tau_1, \ldots, x_n : \tau_n \vdash e : \tau \Rightarrow x_1 : \tau_1', \ldots, x_n : \tau_n' \end{array}}{\Theta \vdash f \mapsto (x_1, \ldots, x_n)\,e} \tag{T-FunDef}$$

$$\frac{\forall f \mapsto (x_1, \ldots, x_n)\,e \in D.\ \Theta \vdash f \mapsto (x_1, \ldots, x_n)\,e \qquad dom(D) = dom(\Theta)}{\Theta \vdash D} \qquad \frac{\Theta \vdash D \qquad \vdash_{WF} \Theta \qquad \Theta \mid \epsilon \mid \bullet \vdash e : \tau \Rightarrow \Gamma}{\vdash \langle D, e\rangle} \tag{T-Prog}$$

**Fig. 10.** Program typing rules

After the first aliasing statement the type system shuffles the (exclusive) mutability between x and y to enable the write to y. After the second aliasing statement the ownership in y is split with x ; note that transferring all ownership from y to x would also yield a valid typing.

Finally, we describe the subtyping rule. The rules for subtyping types and environments are shown in Figure 9. For integer types, the rules require the refinement of a supertype is a logical consequence of the subtype's refinement conjoined with the lifting of Γ. The subtype rule for references is covariant in the type of reference contents. It is widely known that in a language with unrestricted aliasing and mutable references such a rule is unsound: after a write into the coerced pointer, reads from an alias may yield a value disallowed by the alias' type [43]. However, as in the assign case, ownership types prevent unsoundness; a write to the coerced pointer requires the pointer to have ownership 1, which guarantees any aliased pointers have the maximal type and provide no information about their contents beyond simple types.

#### **3.3 Interprocedural Fragment and Context-Sensitivity**

We now turn to a discussion of the interprocedural fragment of our language, and how our type system propagates context information. The remaining typing rules for our language are shown in Figure 10. These rules concern the typing of function calls, function bodies, and entire programs.

We first explain the T-Call rule. The rule uses two substitution maps. σ<sub>x</sub> translates between the parameter names used in the function type and the actual argument names at the call site. σ<sub>α</sub> instantiates all occurrences of λ in the callee type with ℓ : L, where ℓ is the label of the call site and L the typing context of the call. The types of the arguments y<sub>i</sub> are required to match the parameter types (post substitution). The body of the let binding is then checked with the argument types updated to reflect the changes made by the function call (again, post substitution). This update is well-defined because we require all function arguments to be distinct, as described in Section 2.1. Intuitively, the substitution σ<sub>α</sub> incrementally refines the behavior of the callee function with partial context information. If L is itself a context variable λ′, this substitution effectively transforms any context prefix queries over λ in the argument/return/output types into queries over ℓ : λ′ (and hence, after simplification, into queries over λ′). In other words, while the exact concrete execution context of the callee is unknown, the context must at least begin with ℓ, which can potentially rule out certain behaviors.
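The effect of σ<sub>α</sub> on a context-dependent refinement can be sketched in a few lines. Below we model a refinement as a list of guarded clauses (prefix, fact), read as "prefix ⪯ λ ⟹ fact"; instantiating λ with ℓ : λ′ peels ℓ off each guard (the representation and names are ours, for exposition only):

```python
# Illustrative sketch of σ_α = [ℓ : L / λ] on guarded clauses.
# A clause (prefix, fact) means "prefix ⪯ λ ⟹ fact".

def instantiate(clauses, site):
    """Substitute λ := site : λ'. Guards starting with a different
    label become vacuous (⊥ ⟹ ...) and are dropped; the rest become
    residual queries over λ'."""
    out = []
    for prefix, fact in clauses:
        if not prefix:
            out.append(((), fact))            # unconditional fact
        elif prefix[0] == site:
            out.append((prefix[1:], fact))    # residual query over λ'
        # otherwise the guard is false in every context: clause drops
    return out
```

This mirrors the derivation in Example 7 below, where the query ℓ₃ℓ₁ ⪯ ℓ₃ : λ simplifies to ℓ₁ ⪯ λ.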

Rule T-FunDef type checks a function definition f ↦ (x<sub>1</sub>, …, x<sub>n</sub>) e against the function type given in Θ. As a convenience, we assume that the parameter names in the function type match the formal parameters of the function definition. The rule checks that, under an initial environment given by the argument types, the function body produces a value of the return type and transforms the arguments according to the output types. As mentioned above, functions may be executed under many different contexts, so type checking the function body is performed under the context variable λ that occurs in the function type.

Finally, the rule for typing programs (T-Prog) checks that all function definitions are well-typed under a well-formed function type environment, and that the entry point e is well-typed in an empty type environment under the typing context ϵ, i.e., the initial (empty) context.

Example 6 (1-CFA). Recall the program in Figure 3 in Section 1; assume the function calls are labeled as follows:

p := get<sup>ℓ₁</sup>(p) + 1; // ... q := get<sup>ℓ₂</sup>(q) + 1;

Taking τ<sup>p</sup> to be the type shown in Example 2:

$$\{\nu : \mathsf{int} \mid (\ell\_1 \preceq \lambda \implies \nu = 3) \land (\ell\_2 \preceq \lambda \implies \nu = 5)\}$$

we can give get the type ∀λ. ⟨z : τ<sub>p</sub> **ref** <sup>1</sup>⟩ → ⟨z : τ<sub>p</sub> **ref** <sup>1</sup> | τ<sub>p</sub>⟩.

Example 7 (2-CFA). To see how context information propagates across multiple calls, consider the following change to the code considered in Example 6:

```
get_real(z) { *z }
get(z) { get_real^ℓ3(z) }
```
The type of get remains as in Example 6, and taking τ to be

$$\{\nu : \mathsf{int} \mid (\ell\_3 \ell\_1 \preceq \lambda' \implies \nu = 3) \land (\ell\_3 \ell\_2 \preceq \lambda' \implies \nu = 5)\}$$

the type of get\_real is: ∀λ′. ⟨z : τ **ref** <sup>1</sup>⟩ → ⟨z : τ **ref** <sup>1</sup> | τ⟩.

We focus on the typing of the call to get\_real in get; it is typed under the context variable λ and a type environment where the argument z is given the type τ<sub>p</sub> **ref** <sup>1</sup> from Example 6.

Applying the substitution [ℓ₃ : λ/λ′] to the argument type of get\_real yields:

$$\{\nu : \mathsf{int} \mid (\ell\_3 \ell\_1 \preceq \ell\_3 : \lambda \implies \nu = 3) \land (\ell\_3 \ell\_2 \preceq \ell\_3 : \lambda \implies \nu = 5)\} \,\mathsf{ref}^1 \approx$$

$$\{\nu : \mathsf{int} \mid (\ell\_1 \preceq \lambda \implies \nu = 3) \land (\ell\_2 \preceq \lambda \implies \nu = 5)\} \,\mathsf{ref}^1$$

which is exactly the type of p. A similar derivation applies to the return type of get\_real and thus get.

# **3.4 Soundness**

We have proven that any program that type checks according to the rules above will never experience an assertion failure. We formalize this claim with the following soundness theorem.

**Theorem 1 (Soundness).** If ⊢ ⟨D, e⟩, then ∅, ∅, ·, e −→<sup>∗</sup><sub>D</sub> **AssertFail** is impossible; i.e., a well-typed program never fails an assertion.

Further, any well-typed program either diverges, halts in the configuration **AliasFail**, or halts in a configuration H , R, ·, x for some H , R and x , i.e., evaluation does not get stuck.

Proof (Sketch). By standard progress and preservation lemmas; the full proof has been omitted for space reasons and can be found in the full version [60].

# **4 Inference and Extensions**

We now briefly describe the inference algorithm implemented in our tool ConSORT. We sketch some implemented extensions needed to type more interesting programs, and close with a discussion of the current limitations of our prototype.

# **4.1 Inference**

Our tool first runs a standard, simple type inference algorithm to generate type templates for every function parameter type, return type, and for every live variable at each program point. For a variable x of simple type τ<sub>S</sub> ::= **int** | τ<sub>S</sub> **ref** at program point p, ConSORT generates a type template ⟦τ<sub>S</sub>⟧<sub>x,0,p</sub> as follows:

$$[\![\,\mathbf{int}\,]\!]_{x,n,p} = \{\nu:\mathbf{int} \mid \varphi_{x,n,p}(\nu; \mathbf{FV}_p)\} \qquad [\![\,\tau_S\ \mathbf{ref}\,]\!]_{x,n,p} = [\![\,\tau_S\,]\!]_{x,n+1,p}\ \mathbf{ref}^{\,r_{x,n,p}}$$

ϕ<sub>x,n,p</sub>(ν; **FV**<sub>p</sub>) denotes a fresh relation symbol applied to ν and the free variables of simple type **int** at program point p (denoted **FV**<sub>p</sub>). r<sub>x,n,p</sub> is a fresh ownership variable. For each function f, there are two synthetic program points, f<sup>b</sup> and f<sup>e</sup>, for the beginning and end of the function respectively. At both points, ConSORT generates a type template for each argument, where **FV**<sub>f<sup>b</sup></sub> and **FV**<sub>f<sup>e</sup></sub> are the names of the integer-typed parameters. At f<sup>e</sup>, ConSORT also generates a type template for the return value. We write Γ<sup>p</sup> to indicate the type environment at point p, where every variable is mapped to its corresponding type template. ⟦Γ<sup>p</sup>⟧ is thus equivalent to ⋀<sub>x∈**FV**<sub>p</sub></sub> ϕ<sub>x,0,p</sub>(x; **FV**<sub>p</sub>).

When generating these type templates, our implementation also generates ownership well-formedness constraints. Specifically, for a type template of the form {ν : **int** | ϕ<sub>x,n+1,p</sub>(ν; **FV**<sub>p</sub>)} **ref** <sup>r<sub>x,n,p</sub></sup>, ConSORT emits the constraint r<sub>x,n,p</sub> = 0 ⟹ ϕ<sub>x,n+1,p</sub>(ν; **FV**<sub>p</sub>), and for a type template (τ **ref** <sup>r<sub>x,n+1,p</sub></sup>) **ref** <sup>r<sub>x,n,p</sub></sup>, ConSORT emits the constraint r<sub>x,n,p</sub> = 0 ⟹ r<sub>x,n+1,p</sub> = 0.
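Template generation for a chain of references, together with these well-formedness constraints, can be sketched as follows. The naming scheme for relation symbols and ownership variables is our own, chosen only to mirror the subscripts above:

```python
# Illustrative sketch of type template generation with ownership
# well-formedness constraints, for a simple type that is a chain of
# `depth` refs over int at variable x and program point p.

def template(x, p, depth):
    """Return (relation symbols, ownership variables, constraints)."""
    phis = [f'phi_{x}_{n}_{p}' for n in range(depth + 1)]
    rs = [f'r_{x}_{n}_{p}' for n in range(depth)]
    constraints = []
    for n, r in enumerate(rs):
        # ownership 0 forces the refinement one level down to be trivial
        constraints.append(f'{r} = 0 ==> {phis[n + 1]}')
        if n + 1 < len(rs):
            # ownership 0 also forces the next ownership down to 0
            constraints.append(f'{r} = 0 ==> {rs[n + 1]} = 0')
    return phis, rs, constraints
```

For `depth = 2` (an **int ref ref**), this yields three relation symbols, two ownership variables, and three constraints relating adjacent levels.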

ConSORT then walks the program, generating constraints between relation symbols and ownership variables according to the typing rules. These constraints take three forms: ownership constraints, subtyping constraints, and assertion constraints. Ownership constraints are simple linear (in)equalities over ownership variables and constants, according to conditions imposed by the typing rules. For example, if variable x has the type template τ **ref** <sup>r<sub>x,0,p</sub></sup> for the expression x := y; e at point p, ConSORT generates the constraint r<sub>x,0,p</sub> = 1.

ConSORT emits subtyping constraints between the relation symbols at related program points according to the rules of the type system. For example, for the term **let** x = y **in** e at program point p (where e is at program point p′, and x has simple type **int ref**), ConSORT generates the following subtyping constraint:

$$[\![\,\Gamma^p\,]\!] \wedge \varphi_{y,1,p}(\nu; \mathbf{FV}_p) \implies \varphi_{y,1,p'}(\nu; \mathbf{FV}_{p'}) \wedge \varphi_{x,1,p'}(\nu; \mathbf{FV}_{p'})$$

in addition to the ownership constraint r_{y,0,p} = r_{y,0,p′} + r_{x,0,p′}.

Finally, for each **assert**(ϕ) in the program at point p, ConSORT emits an assertion constraint of the form ⟦Γ^p⟧ ⟹ ϕ, which requires that the refinements on the integer-typed variables in scope are sufficient to prove ϕ.
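As a concrete instance (a hypothetical program point p whose only integer variable in scope is x), the statement **assert**(x > 0) would yield:

$$\varphi_{x,0,p}(x; \mathbf{FV}_p) \implies x > 0$$

since here ⟦Γ^p⟧ consists of the single conjunct ϕ_{x,0,p}(x; **FV**_p) with **FV**_p = {x}.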

Encoding Context Sensitivity. To make inference tractable, we require the user to fix a priori the maximum length of prefix queries to a constant k (this choice is easily controlled with a command-line parameter to our tool). We supplement the arguments of every predicate application with a set of integer context variables c_1, ..., c_k; these variables do not overlap with any program variables.

ConSORT uses these variables to infer context-sensitive refinements as follows. Consider a function call **let** x = f^ℓ(y_1, ..., y_n) **in** e at point p, where e is at point p′. ConSORT generates the following constraints for a refinement ϕ_{y_i,n,p}(ν, c_1, ..., c_k; **FV**_p) which occurs in the type template of y_i:

$$\begin{aligned} &\varphi_{y_i,n,p}(\nu,c_1,\ldots,c_k; \mathbf{FV}_p) \implies \sigma_x\, \varphi_{x_i,n,f^b}(\nu,\ell,c_1,\ldots,c_{k-1}; \mathbf{FV}_{f^b})\\ &\sigma_x\, \varphi_{x_i,n,f^e}(\nu,\ell,c_1,\ldots,c_{k-1}; \mathbf{FV}_{f^e}) \implies \varphi_{y_i,n,p'}(\nu,c_1,\ldots,c_k; \mathbf{FV}_{p'})\\ &\sigma_x = [y_1/x_1]\cdots[y_n/x_n] \end{aligned}$$

Effectively, we have encoded a context ℓ_1 ⋯ ℓ_k λ as ⋀_{0 < i ≤ k} c_i = ℓ_i. In the above, the shift from c_1, ..., c_k to ℓ, c_1, ..., c_{k−1} plays the role of σ_α in the T-Call rule. The above constraint serves to determine the value of c_1 within the body of the function f. If f calls another function g, the above rule propagates this value of c_1 to c_2 within g, and so on. The solver may then instantiate relation symbols with predicates that are conditional on the values of the c_i.
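For concreteness (our own worked instance, not from the original text), take k = 1 and a call to f labelled ℓ: the first constraint above degenerates to

$$\varphi_{y_i,n,p}(\nu, c_1; \mathbf{FV}_p) \implies \sigma_x\, \varphi_{x_i,n,f^b}(\nu, \ell; \mathbf{FV}_{f^b})$$

so the context-variable position inside f's body is instantiated with the call-site label ℓ, allowing the solver to pick predicates that case-split on that value.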

Solving Constraints. The results of the above process are two systems of constraints: real arithmetic constraints over ownership variables, and constrained Horn clauses (CHC) over the refinement relations. Under certain assumptions about the simple types in a program, the size of the ownership and subtyping constraints is polynomial in the size of the program. These systems are not independent; the relation constraints may mention the values of ownership variables due to the well-formedness constraints described above. The ownership constraints are first solved with Z3 [16]. These constraints are non-linear, but Z3 appears particularly well-engineered to quickly find solutions for the instances generated by ConSORT. We constrain Z3 to maximize the number of non-zero ownership variables, ensuring that as few refinements as possible are forced to be trivial by ownership well-formedness.
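The "maximise non-zero ownerships" preference can be illustrated with a toy example of our own (ConSORT's actual pipeline uses Z3, not this brute-force search): among the assignments satisfying a small constraint system, prefer the one with the most non-zero variables.

```python
# Toy illustration of the preference for non-zero ownership variables.
# Constraints (hypothetical): a write through r1 requires r1 = 1, and r1 is
# split into r2 + r3 (e.g., by a let binding that aliases the reference).
from fractions import Fraction
from itertools import product

candidates = [Fraction(0), Fraction(1, 2), Fraction(1)]  # coarse grid suffices here

def best_assignment():
    best = None
    for r1, r2, r3 in product(candidates, repeat=3):
        if r1 == 1 and r1 == r2 + r3:                    # the constraint system
            nonzero = sum(r != 0 for r in (r1, r2, r3))  # objective to maximise
            if best is None or nonzero > best[0]:
                best = (nonzero, (r1, r2, r3))
    return best[1]
```

On this toy system the preferred assignment is r1 = 1, r2 = r3 = 1/2, leaving both aliases with non-zero ownership so that neither refinement is trivialised by well-formedness.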

The values of the ownership variables inferred by Z3 are then substituted into the constrained Horn clauses, and the resulting system is checked for satisfiability with an off-the-shelf CHC solver. Our implementation generates constraints in the industry-standard SMT-Lib2 format [8]; any solver that accepts this format can be used as a backend for ConSORT. Our implementation currently supports Spacer [37] (part of the Z3 solver [16]), HoICE [13], and Eldarica [48]; adding a new backend requires only a handful of lines of glue code. We found that different solvers are better tuned to different problems; we therefore also implemented a parallel mode which runs all supported solvers in parallel and uses the first available result.
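A minimal sketch of such an emitter (ours, hypothetical; not ConSORT's code) shows the shape of the generated scripts: CHCs are quantified implications in the SMT-Lib2 HORN fragment, which Spacer, HoICE, and Eldarica all accept.

```python
# Print a CHC system in the SMT-LIB2 HORN fragment: declared uninterpreted
# predicates plus universally quantified Horn implications, then (check-sat).
def emit_chc(relations, clauses):
    lines = ["(set-logic HORN)"]
    for name, arity in relations.items():
        lines.append(f"(declare-fun {name} ({' '.join(['Int'] * arity)}) Bool)")
    for bound_vars, body, head in clauses:
        binders = " ".join(f"({v} Int)" for v in bound_vars)
        lines.append(f"(assert (forall ({binders}) (=> {body} {head})))")
    lines.append("(check-sat)")
    return "\n".join(lines)

script = emit_chc(
    {"phi_x_0_p": 1},
    [(["v"], "(= v 0)", "(phi_x_0_p v)"),   # x initialised to 0
     (["v"], "(phi_x_0_p v)", "(>= v 0)")]) # an assertion constraint
```

Satisfiability of the resulting script means a model (an interpretation of each `phi`) exists, i.e., the refinement inference succeeds.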

#### **4.2 Extensions**

Primitive Operations. As defined in Section 2, our language can compare integers with zero and can load and store them from memory, but it can perform no meaningful computation over these numbers. To promote the flexibility of our type system and simplify our soundness statement, we do not fix a set of primitive operations and their static semantics. Instead, we assume that any primitive operations used in a program are given sound function types in Θ. For example, under the assumption that + has its usual semantics and the underlying logic supports +, we can give + the type ∀λ. x : {ν : **int** | ⊤}, y : {ν : **int** | ⊤} → x : {ν : **int** | ⊤}, y : {ν : **int** | ⊤} | {ν : **int** | ν = x + y}. Interactions with a nondeterministic environment or unknown program inputs can then be modeled with a primitive that returns integers refined with ⊤.

Dependent Tuples. Our implementation supports types of the form (x_1 : τ_1, ..., x_n : τ_n), where x_i can appear within τ_j (j ≠ i) if τ_i is an integer type. For example, (x : {ν : **int** | ⊤}, y : {ν : **int** | ν > x}) is the type of tuples whose second element is strictly greater than the first. We also extend the language with tuple constructors as a new value form, and with let bindings that take tuple patterns on the left-hand side.

The extension to type checking is relatively straightforward; the only significant extensions are to the subtyping rules. Specifically, the subtyping check for a tuple element x_i : τ_i is performed in a type environment elaborated with the types and names of the other tuple elements. The extension to type inference is also straightforward: the arguments of a predicate symbol include any enclosing dependent tuple names, and the environment in subtyping constraints is likewise extended.

Recursive Types. Our language also supports some unbounded heap structures via recursive reference types. To keep inference tractable, we forbid nested recursive types and multiple occurrences of the recursive type variable, and we additionally fix the shape of refinements that occur within a recursive type. For recursive refinements that fit these restrictions, our approach to refinements is broadly similar to that of [35], and we use the ownership scheme of [56] for handling ownership. We first use simple type inference to infer the shape of the recursive types and automatically insert fold/unfold annotations into the source program. As in [35], the refinements within an unfolding of a recursive type may refer to dependent tuple names bound by the enclosing type. These recursive types can express, e.g., the invariants of a mutable, sorted list. As in [56], recursive types are unfolded once before assigning ownership variables; further unfoldings copy existing ownership variables.

As in Java or C++, our language does not support sum types, so any instantiation of a recursive type must use a null pointer. Our implementation supports an **ifnull** construct in addition to a distinguished **null** constant. Our implementation allows any refinement to hold for the null constant, including ⊥. Currently, our implementation does not detect null pointer dereferences, and all soundness guarantees are made modulo freedom from null dereferences. As ⟦Γ⟧ omits refinements under reference types, null pointer refinements do not affect the verification of programs without null pointer dereferences.

Arrays. Our implementation supports arrays of integers. Each array is given an ownership describing the ownership of memory allocated for the entire array. The array type contains two refinements: the first refines the length of the array itself, and the second refines the entire array contents. The content refinement may refer to a symbolic index variable for precise, per-index refinements. At reads and writes to the array, ConSORT instantiates the refinement's symbolic index variable with the concrete index used at the read/write.

As in [56], our restriction to arrays of integers stems from the difficulty of ownership inference. Soundly handling pointer arrays requires index-wise tracking of ownerships which significantly complicates automated inference. We leave supporting arrays of pointers to future work.

#### **4.3 Limitations**

Our current approach is not complete; there are safe programs that will be rejected by our type system. As mentioned in Section 3.1, our well-formedness condition forbids refinements that refer to memory locations. As a result, ConSORT cannot in general express, e.g., that the contents of two references are equal. Further, due to our reliance on automated theorem provers we are restricted to logics with sound but potentially incomplete decision procedures. ConSORT also does not support conditional or context-sensitive ownerships, and therefore cannot precisely handle conditional mutation or aliasing.

# **5 Experiments**

We now present the results of preliminary experiments performed with the implementation described in Section 4. The goal of these experiments was to answer the following questions: i) is the type system (and the extensions of Section 4) expressive enough to type and verify non-trivial programs? and ii) is type inference feasible?

**Table 1.** Description of the benchmark suite adapted from JayHorn. **Java** are programs that test Java-specific features. **Inc** are tests that cannot be handled by ConSORT, e.g., null checking. **Bug** includes a "safe" program we discovered was actually incorrect.

To answer these questions, we evaluated our prototype implementation on two sets of benchmarks.<sup>4</sup> The first set is adapted from JayHorn [32, 33], a verification tool for Java. This test suite contains a combination of 82 safe and unsafe programs written in Java. We chose this benchmark suite because, like ConSORT, JayHorn is concerned with the automated verification of programs in a language with mutable, aliased memory cells. Further, although some of the benchmark programs test Java-specific features, most could be adapted to our low-level language. The tests we could adapt provide a comparison with existing state-of-the-art verification techniques. A detailed breakdown of the adapted benchmark suite can be found in Table 1.

Remark 2. The original JayHorn paper includes two additional benchmark sets, Mine Pump and CBMC. Both our tool and recent JayHorn versions time out on the Mine Pump benchmark. Further, the CBMC tests were either subsumed by our own test programs, tested Java specific features, or tested program synthesis functionality. We therefore omitted both of these benchmarks from our evaluation.

The second benchmark set consists of data structure implementations and microbenchmarks written directly in our low-level imperative language. We developed this suite to test the expressive power of our type system and inference. The programs included in this suite are:


<sup>4</sup> Our experiments and the ConSORT source code are available at https://www.fos.kuis.kyoto-u.ac.jp/projects/consort/.


**Table 2.** Comparison of ConSORT to JayHorn on the benchmark set of [32] (top) and our custom benchmark suite (bottom). T/O indicates a time out.

We introduced unsafe mutations to these programs to check our tool for unsoundness and translated these programs into Java for further comparison with JayHorn.

Both our benchmarks and JayHorn's require a small number of trivially identified alias annotations. The adapted JayHorn benchmarks contain a total of 6 annotations; the most for any individual test was 3. The number of annotations required for our benchmark suite is shown in column **Ann.** of Table 2.

We first ran ConSORT on each program in our benchmark suite and ran version 0.7 of JayHorn on the corresponding Java version. We recorded the final verification result for both our tool and JayHorn. We also collected the end-to-end runtime of ConSORT for each test; we do not give a performance comparison with JayHorn given the many differences in the target languages. For the JayHorn suite, we first ran our tool on the adapted version of each test program and ran JayHorn on the original Java version. We did not collect runtime information for this set of experiments because our goal is a comparison of tool precision, not performance. All tests were run on a machine with 16 GB RAM and 4 Intel i5 CPUs at 2 GHz, with a timeout of 60 seconds (the same timeout used in [32]). We used ConSORT's parallel backend (Section 4) with Z3 version 4.8.4, HoICE version 1.8.1, and Eldarica version 2.0.1, and we used JayHorn's Eldarica backend.

#### **5.1 Results**

The results of our experiments are shown in Table 2. On the JayHorn benchmark suite ConSORT performs competitively with JayHorn, correctly identifying 29 of the 32 safe programs as such. For all 3 tests on which ConSORT timed out after 60 seconds, JayHorn also timed out (column T/O). For the unsafe programs, ConSORT correctly identified all programs as unsafe within 60 seconds; JayHorn answered Unknown for 7 tests (column Imp.).

On our own benchmark set, ConSORT correctly verifies all safe versions of the programs within 60 seconds. For the unsafe variants, ConSORT was able to quickly and definitively determine these programs unsafe. JayHorn times out on all tests except for **Shuffle** and **ShuffleBUG** (column **JH**). We investigated the cause of time outs and discovered that after verification failed with an unbounded heap model, JayHorn attempts verification on increasingly larger bounded heaps. In every case, JayHorn exceeded the 60 second timeout before reaching a preconfigured limit on the heap bound. This result suggests JayHorn struggles in the presence of per-object invariants and unbounded allocations; the only two tests JayHorn successfully analyzed contain just a single object allocation.

We do not believe this failure is indicative of a shortcoming in JayHorn's implementation; rather, it stems from the fundamental limitations of JayHorn's memory representation. Like many verification tools (see Section 6), JayHorn uses a single, unchanging invariant for every object allocated at the same syntactic location; effectively, all objects allocated at the same location are assumed to alias with one another. This representation cannot, in general, handle programs with different invariants for distinct objects that evolve over time. We hypothesize that other tools adopting a similar approach will exhibit the same difficulty.

# **6 Related Work**

The difficulty of handling programs with mutable references and aliasing has been well studied. Like JayHorn, many approaches model the heap explicitly at verification time, approximating concrete heap locations with allocation site labels [14, 20, 32, 33, 46]; each abstract location is also associated with a refinement. As abstract locations summarize many concrete locations, this approach does not in general admit strong updates and flow-sensitivity; in particular, the refinement associated with an abstract location is fixed for the lifetime of the program. The techniques cited above include various workarounds for this limitation. For example, [14, 46] temporarily allow breaking these invariants through a distinguished program name, as long as the abstract location is not accessed through another name. The programmer must then eventually bring the invariant back in sync with the summary location. As a result, these systems ultimately cannot precisely handle programs that require evolving invariants on mutable memory.

A similar approach was taken in CQual [23] by Aiken et al. [2]. They used an explicit restrict binding for pointers. Strong updates are permitted through pointers bound with restrict, but the program is forbidden from using any pointers which share an allocation site while the restrict binding is live.

A related technique used in the field of object-oriented verification is to declare object invariants at the class level and allow these invariants on object fields to be broken during a limited period of time [7, 22]. In particular, the work on Spec# [7] uses an ownership system which tracks whether object a owns object b; like ConSORT's ownership system, these ownerships contain the effects of mutation. However, Spec#'s ownership is quite strict and does not admit references to b outside of the owning object a.

Viper [30, 42] (and its related projects [31, 39]) uses access annotations (expressed as permission predicates) to explicitly transfer access/mutation permissions for references between static program names. Like ConSORT, permissions may be fractionally transferred, allowing temporary shared, immutable access to a mutable memory cell. However, while ConSORT automatically infers many ownership transfers, Viper requires extensive annotations for each transfer.

F\*, a dependently typed dialect of ML, includes an update/select theory of heaps and requires explicit annotations summarizing the heap effects of a method [44, 57, 58]. This approach enables modular reasoning and precise specification of pre- and post-conditions with respect to the heap, but precludes full automation.

The work on rely–guarantee reference types by Gordon et al. [26, 27] uses refinement types in a language with mutable references and aliasing. Their approach extends reference types with rely/guarantee predicates; the rely predicate describes possible mutations via aliases, and the guarantee predicate describes the admissible mutations through the current reference. If two references may alias, then the guarantee predicate of each reference must imply the rely predicate of the other. This invariant is maintained with a splitting operation that is similar to our + operator. Further, their type system allows strong updates to reference refinements provided the new refinements are preserved by the rely predicate. Thus, rely–guarantee refinements support multiple mutable, aliased references with non-trivial refinement information. Unfortunately, this expressiveness comes at the cost of automated inference and verification; an embedding of this system into Liquid Haskell [63] described in [27] was forced to sacrifice strong updates.

Work by Degen et al. [17] introduced linear state annotations to Java. To effect strong updates in the presence of aliasing, like ConSORT, their system requires that annotated memory locations be mutated only through a distinguished reference. Further, all aliases of this mutable reference give no information about the state of the object, much like our 0-ownership pointers. However, their system cannot handle multiple immutable aliases with non-trivial annotation information; only the mutable reference may carry non-trivial annotations.

The fractional ownerships in ConSORT and their counterparts in [55, 56] have a clear relation to linear type systems. Many authors have explored the use of linear type systems to reason in contexts with aliased mutable references [18, 19, 52], and in particular with the goal of supporting strong updates [1]. A closely related approach is RustHorn by Matsushita et al. [40]. Much like ConSORT, RustHorn uses CHC and linear aliasing information for the sound and—unlike ConSORT—complete verification of programs with aliasing and mutability. However, their approach depends on Rust's strict borrowing discipline, and cannot handle programs where multiple aliased references are used in the same lexical region. In contrast, ConSORT supports fine-grained, per-statement changes in mutability and even further control with **alias** annotations, which allows it to verify larger classes of programs.

The ownerships of ConSORT also have a connection to separation logic [45]; the separating conjunction isolates write effects to local subheaps, while ConSORT's ownership system isolates effects to local updates of pointer types. Other researchers have used separation logic to precisely support strong updates of abstract state. For example, in work by Kloos et al. [36], resources are associated with static, abstract names; each resource (represented by its static name) may be owned (and thus mutated) by exactly one thread. Unlike ConSORT, their ownership system forbids even temporary immutable, shared ownership, or transferring ownership at arbitrary program points. An approach proposed by Bakst and Jhala [4] uses a similar technique, combining separation logic with refinement types. Their approach gives allocated memory cells abstract names and associates these names with refinements in an abstract heap. Like the approach of Kloos et al. and ConSORT's ownership-1 pointers, they ensure these abstract locations are distinct in all concrete heaps, enabling sound strong updates.

The idea of using a rational number to express permission to access a reference dates back to the type system of fractional permissions by Boyland [12]. His work used fractional permissions to verify race freedom of concurrent programs without a may-alias analysis. Later, Terauchi [59] proposed a type-inference algorithm that reduces typing constraints to a set of linear inequalities over rational numbers. Boyland's idea also inspired a variant of separation logic for a concurrent programming language [11] to express the sharing of read permissions among several threads. Our previous work [55, 56], inspired by [11, 59], proposed methods for type-based verification of resource-leak freedom, in which a rational number expresses an obligation to deallocate a certain resource, not just a permission.

The issue of context-sensitivity (sometimes called polyvariance) is well-studied in the field of abstract interpretation (e.g., [28, 34, 41, 50, 51], see [25] for a recent survey). Polyvariance has also been used in type systems to assign different behaviors to the same function depending on its call site [3, 6, 64]. In the area of refinement type systems, Zhu and Jagannathan developed a context-sensitive dependent type system for a functional language [67] that indexed function types by unique labels attached to call-sites. Our context-sensitivity approach was inspired by this work. In fact, we could have formalized context-polymorphism within the framework of full dependent types, but chose the current presentation for simplicity.

# **7 Conclusion**

We presented ConSORT, a novel type system for safety verification of imperative programs with mutability and aliasing. ConSORT is built upon the novel combination of fractional ownership types and refinement types. Ownership types flow-sensitively and precisely track the existence of mutable aliases. ConSORT admits sound strong updates by discarding refinement information on mutably aliased references as indicated by ownership types. Our type system is amenable to automatic type inference; we have implemented a prototype of this inference tool and found that it can verify several non-trivial programs and outperforms a state-of-the-art program verifier. As an area of future work, we plan to investigate using fractional ownership types to soundly allow refinements that mention memory locations.

Acknowledgments. The authors would like to thank the reviewers for their thoughtful feedback and suggestions, and Yosuke Fukuda and Alex Potanin for their feedback on early drafts. This work was supported in part by JSPS KAKENHI grant numbers JP15H05706 and JP19H04084, and in part by the JST ERATO MMSD Project.

# **Bibliography**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Mixed Sessions**

#### Vasco T. Vasconcelos , Filipe Casal , Bernardo Almeida , and Andreia Mordido

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal

**Abstract.** Session types describe patterns of interaction on communicating channels. Traditional session types include a form of choice whereby servers offer a collection of options, of which each client picks exactly one. This sort of choice constitutes a particular case of separated choice: offering on one side, selecting on the other. We introduce mixed choices in the context of session types and argue that they increase the flexibility of program development at the same time that they reduce the number of synchronisation primitives to exactly one. We present a type system incorporating subtyping and prove preservation and absence of runtime errors for well-typed processes. We further show that classical (conventional) sessions can be faithfully and tightly embedded in mixed choices. Finally, we discuss algorithmic type checking and a runtime system built on top of a conventional (choice-less) message-passing architecture.

**Keywords:** Type Systems · Session Types · Mixed Choice.

# **1 Introduction**

Session types provide for describing series of continuous interactions on communication channels [16,19,43,45,49]. When used in type systems for programming languages, session type systems statically verify that programs follow protocols, and hence that they do not engage in communication mismatches.

In order to motivate mixed sessions, suppose that we want to describe a process that asks for a fixed but unbounded number of integer values from some producer. The consumer may be in two states: happy with the values received so far, or ready to ask the producer for a new value. In the former case it must notify the producer so that the latter may stop sending numbers. In the latter case, the consumer must ask the producer for another integer, after which it "goes back to the beginning". Using classical sessions, and looking from the consumer side, the communication channel can be described by a (recursive) session type T of the form

```
⊕{enough: end, more: ?int.T}
```
where ⊕ denotes internal choice (the consumer decides), the two branches in the choice are labelled with enough and more, type **end** denotes a channel on which no further interaction is possible, and ?**int** denotes the reception of an integer value. Reception is a prefix to a type; the continuation is T (in this case, the "goes back to the beginning" part). The code for the consumer (and the producer as well) is unnecessarily complex, featuring parts that exchange messages in both directions: enough and more selections from the consumer to the producer, and **int** messages from the producer to the consumer. In particular, the consumer must first select option more (outgoing) and then receive an integer (incoming).

Using mixed sessions one can invert the direction of the more selection and write the type of the channel (again as seen from the side of the consumer) as

```
⊕{enough!unit.end, more?int.T}
```
The changes seem merely cosmetic, but label/polarity pairs (a polarity is ! or ?) are now indivisible and constitute the keys of the choice type when seen as a map. The integer value is piggybacked on top of the more selection. As a result, the classical session primitive operations of selection and branching (that is, internal and external choice) and of communication (output and input) become a single one: the mixed session. The producer can be safely written as

```
p (enough?z.0 + more!n.produce!(p, n+1))
```
offering a choice on channel end p featuring mixed branches with labels enough? and more!, where **0** denotes the terminated process and produce!(p, n+1) a recursive call to the producer. The example is further developed in Section 2.

Mixed sessions build on Vasconcelos' presentation of session types, which we call classical sessions [43], by adapting choice and input/output as needed but keeping everything else unchanged as much as possible. The result is a language with


The rest of the paper is organised as follows: the next section shows mixed sessions in action; Section 3 introduces the technical development of the language, and Section 4 proves the main results (preservation and absence of runtime errors for typable processes). Then Section 5 presents the embedding and the correspondence proofs, Section 6 discusses implementation details, and Section 7 explores related work. Section 8 concludes the paper.

# **2 There is Room for Mixed Sessions**

This section introduces the main ideas of mixed sessions via examples. We address mixed choices, duplicated labels in choices, and unrestricted output, in this order.

#### **2.1 Mixed Choices**

Consider the producer-consumer problem where the producer produces only insofar as requested by the consumer. Here is the code for a producer that writes on channel end x the numbers starting from n.

```
def produce(x, n) =
  lin x (enough?z.0 +
         more!n.produce!(x, n+1))
```
Syntax qx(M+N) introduces a choice between M and N on channel end x. Qualifier q is either **un** or **lin** and controls whether the process is persistent (remains after reduction) or ephemeral (is consumed in the reduction process). Each branch in a choice is composed of a label (enough or more), a polarity mark (input ? or output !), a variable or a value (z or n), and a continuation process (after the dot). The terminated process is represented by **0**; the notation **def** introduces a recursive process. The **def** syntax and its encoding in the base language come from the Pict programming language [36] and were taken up by SePi [12].

A consumer that requests n integer values on channel end y can be written as follows, where () represents the only value of type **unit**.

```
def consume(y, n) =
  if n == 0
  then lin y (enough!().0)
  else lin y (more?z.consume!(y, n-1))
```
Suppose that x and y are two ends of the same channel. When the choices on x and on y get together, a pair of matching label/polarity pairs is selected and a value is transmitted from the output branch to the input continuation.
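This matching step can be modelled with a small executable sketch of our own (not the paper's formal reduction semantics): a branch (l, !) at one end matches (l, ?) at the other, and reduction picks one matching pair.

```python
# Model each choice as a set of (label, polarity) branch keys; return every
# pair of branches that could synchronise across the two channel ends.
def matching_pairs(choice_x, choice_y):
    flip = {"!": "?", "?": "!"}
    return [((lbl, pol), (lbl, flip[pol]))
            for (lbl, pol) in sorted(choice_x)
            if (lbl, flip[pol]) in choice_y]

# branches offered by produce, and by a consumer asking for one more value
produce_branches = {("enough", "?"), ("more", "!")}
consume_branches = {("more", "?")}
```

Here only the more branches can synchronise, so reduction sends the integer piggybacked on more! to the more? continuation.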

Types for the two channel ends ensure that choice synchronisation succeeds. The type of x is **rec** a. **lin** &{enough?**unit**.**end**, more!**int**.a}, where the qualifier **lin** says that the channel end must be used in exactly one process, & denotes external choice, and each branch is composed of a label, a polarity mark, the type of the communication, and that of the continuation. The type **end** states that no further interaction is possible at the channel, and **rec** introduces a recursive type. The type of y is obtained from that of x by inverting views (⊕ and &) and polarities (! and ?), yielding **rec** b. **lin** ⊕{enough!**unit**.**end**, more?**int**.b}. The choice at x in the produce process contains all the branches in the type, and so we select the external choice view & for x. The choices at y contain only part of the branches, hence the internal choice view ⊕. This type discipline ensures that processes do not engage in runtime errors when trying to find a match for two choices at the two ends of a given channel.
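The view-and-polarity inversion can be sketched as a small duality function; this encoding of types as nested tuples is our own illustration, not the paper's artifact.

```python
# Session types as nested tuples: "end"/type variables are strings,
# ("rec", var, body) binds a recursive type, and a choice is
# (view, {(label, polarity): (payload, continuation)}) with view "+" or "&".
def dual(t):
    if isinstance(t, str):                  # "end" or a type variable
        return t
    if t[0] == "rec":
        return ("rec", t[1], dual(t[2]))
    view, branches = t
    flip_view = {"+": "&", "&": "+"}[view]  # invert the view
    flip_pol = {"!": "?", "?": "!"}         # invert each branch's polarity
    return (flip_view,
            {(lbl, flip_pol[pol]): (payload, dual(cont))
             for (lbl, pol), (payload, cont) in branches.items()})

# the type of x from the text, with "&" for external choice
ty_x = ("rec", "a", ("&", {("enough", "?"): ("unit", "end"),
                           ("more", "!"): ("int", "a")}))
ty_y = dual(ty_x)
```

Applying `dual` to x's type yields y's type (up to the name of the recursion variable), with ⊕ in place of & and each ! exchanged with ?.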

A few type and process abbreviations simplify coding: i) the **lin** qualifier may be omitted; ii) the terminated process **0**, together with the trailing dot, may be omitted; iii) the terminated type **end**, together with the trailing dot, may be omitted; and iv) we introduce wildcards (_) in variable binding positions (in input branches).

#### **2.2 Duplicated Labels in Choices for Types and for Processes**

Classical session types require distinct identifiers to label distinct branches. Mixed sessions relax this restriction by allowing duplicated labels whenever they are paired with distinct polarities. The next example describes two processes, countDown and collect, that bidirectionally exchange a fixed number of msg-labelled messages. The number of messages that flow in each direction is not fixed a priori but is instead decided by the non-deterministic operational semantics. The type that describes the channel, as seen by process countDown, is **rec** a.⊕{msg!**unit**.a, msg?**unit**.a, done!**unit**}, where one can see the msg label in two distinct branches, but with different polarities.

Process countDown features a parameter n that controls the number of messages exchanged (sent or received). The end of the interaction (when n reaches 0) is signalled by a done message.

```
countDown : (rec a.⊕{msg!unit.a, msg?unit.a, done!unit}, int)
def countDown (x, n) =
  if n == 0
  then x (done!())
  else x (msg!().countDown!(x, n-1) +
          msg?.countDown!(x, n-1))
```
Process collect sees the channel from the dual viewpoint, obtained by exchanging ? with ! and ⊕ with &. Parameter n in this case denotes the number of messages received. When done, the process writes the result on channel end r, global to the collect process.

```
collect : (rec b.&{msg!unit.b, msg?unit.b, done?unit}, int)
def collect (y, n) =
  y (msg!().collect!(y, n+1) +
     msg?.collect!(y, n) +
     done?.r (result!n))
```
Mixed sessions allow for duplicated message-polarity pairs, permitting a new form of non-determinism that uses exclusively linear channels. A process of the form (νxy)P declares a channel with end points x and y to be used in process P. The process

```
(νxy) (
  x (msg!()) |
  y (msg?.z (m!true) + msg?.z (m!false))
)
```
featuring two linear choices may reduce to z (m!**true**) or to z (m!**false**). Non-determinism in the π-calculus without choice (that of Functions as Processes [27,29], for example) can only be achieved by introducing race conditions on un channels. For example, the π-calculus process

(νxy)(x!() | y?.z!**true** | y?.z!**false**)

reduces either to z!**true** | (νxy)(y?.z!**false**) or to z!**false** | (νxy)(y?.z!**true**), leaving for the runtime the garbage collection of the inert residuals. Also note that in this case channel y cannot remain linear.

Duplicated message-polarities in choices lead to elegant and concise code. A random number generator with a given number n of bits can be written with two processes. The first process sends n messages on channel end x. The contents of the messages are irrelevant (we use value () of type **unit**); what is important is that n more messages are sent, followed by a done message, followed by silence.

```
write : (rec a.⊕{done!unit, more!unit.a}, int)
def write (x, n) =
  if n == 0
  then x (done!())
  else x (more!().write!(x, n-1))
```
The reader process reads the more messages in two distinct branches and interprets messages received on one branch as bit 0, and on the other as 1. Upon the reception of a done message, the accumulated random number is conveyed on channel end r, a variable global to the read process.

```
read : (rec b.&{done?unit, more?unit.b}, int)
def read (y, n) =
  y (done?.r (result!n) +
     more?.read!(y, 2*n) +
     more?.read!(y, 2*n+1)
  )
```
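Outside the calculus, the interplay between the two processes can be simulated directly. The Python sketch below (the encoding and function names are ours, not part of mixed sessions) models each more message as a nondeterministic pick between the two duplicated more? branches, accumulating an n-bit random number.

```python
import random

def write(n):
    """The write process as a stream of labels: n 'more' messages, then 'done'."""
    for _ in range(n):
        yield "more"
    yield "done"

def read(messages):
    """The read process: each 'more' nondeterministically selects one of the two
    duplicated more? branches (bit 0 or bit 1); 'done' delivers the result."""
    acc = 0
    for label in messages:
        if label == "done":
            return acc
        acc = 2 * acc + random.choice([0, 1])  # the two more? branches

result = read(write(8))
assert 0 <= result < 2 ** 8  # an 8-bit random number
```

The nondeterminism lives entirely in which of the two duplicated branches synchronises at each step, mirroring the linear-channel nondeterminism of the calculus.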
Notice that mixed sessions allow duplicated label-polarity pairs in processes but not in types. This point is further discussed in Section 3. Also note that duplicated message labels could be easily added to traditional session types.

#### **2.3 Unrestricted Output**

Mixed sessions allow for replicated output processes. The original version of the π-calculus [30,31] features recursion on arbitrary processes. Subsequent versions [29] introduce replication, but restricted to input processes. When compared to languages with unrestricted input only, unrestricted output allows for more concise programs and fewer message exchanges for the same effect. Here is a process (call it P) containing a pair of processes that exchange msg-labelled messages ad aeternum,

```
(νxy) (un y (msg!()) | un x (msg?))
```
where x is of type **rec** a.**un** &{msg?**unit**.a}. The un prefix denotes replication: an un choice survives reduction. Because neither of the two sub-processes features a continuation, P reduces to P in one step. The behaviour of **un** y (msg!()) can be mimicked by a process without output replication, namely,

(νwz)(w (!()) | **un** z (?.y (msg!().w (!()))))


Fig. 1: The syntax of processes

Even if unrestricted output can be simulated with unrestricted input, the encoding requires one extra channel (wz) and an extra message exchange (on channel wz) in order to reestablish the output on channel end y.

It is a fact that unrestricted output can be added to any flavour of the π-calculus (session-typed or not). In the case of mixed sessions it arises naturally: there is only one communication primitive—choice—and this can be classified as lin or un. If an un choice happens to behave in "output mode", then we have an un output. It is not obvious how to design the language of mixed choices without allowing unrestricted output, while still allowing unrestricted input (which is mandatory for unbounded behaviour).

# **3 The Syntax and Semantics of Mixed Sessions**

This section introduces the syntax and the semantics of mixed sessions. Inspired by Vasconcelos' formulation of session types for the π-calculus [43,45], mixed sessions replace input and output, selection and branching (internal and external choice), with a single construct which we call choice.

#### **3.1 Syntax**

Figure 1 presents the syntax of values and processes. Let x, y, z range over a (countable) set of variables, and let l range over a set of labels. Metavariable v ranges over values. Following the tradition of the π-calculus, set up by Milner et al. [30,31], variables are used both as placeholders for incoming values in communication and for channels. Linearity constraints, central to session types but absent in the π-calculus, dictate that the two ends of a channel must be syntactically distinguished; we use one variable for each end [43]. Different primitive values can be used. Here, we pick the boolean values (so that we may have a conditional process), and unit that plays its role in the embedding of classical session types (Section 5).

Metavariables P and Q range over processes. Choices are processes of the form q x ∑i∈I Mi, offering a choice of Mi alternatives on channel end x. Qualifier q describes how the choice behaves with respect to reduction. If q is lin, then the choice is consumed in reduction; otherwise q must be un, and in this case the choice persists after reduction. The type system in Figure 8 rejects nullary (empty) choices. There are two forms of branches: output l!v.P and input l?x.P. An output branch sends value v and continues as P. An input branch receives a value and continues as P with the value replacing variable x. The type system in Figure 8 makes sure that value v in l?v.P is a variable.

The remaining process constructors are standard in the π-calculus. Processes of the form P | Q denote the parallel composition of processes P and Q. Scope restriction (νxy)P binds together the two channel ends x and y of a same channel in process P. The conditional process if v then P else Q behaves as process P if v is true and as process Q otherwise. Since we do not have nullary choices, we include **0**—called inaction—as primitive to denote the terminated process.

#### **3.2 Operational Semantics**

The variable bindings in the language are as follows: variables x and y are bound in P in a process of the form (νxy)P; variable x is bound in P in a branch of the form l?x.P. The sets of bound and free variables, as well as substitution, P[v/x], are defined accordingly. We work up to alpha-conversion and follow Barendregt's variable convention, whereby all variables in binding occurrences in any mathematical context are pairwise distinct and distinct from the free variables [2].

Figure 2 summarises the operational semantics of mixed sessions. Following the tradition of the π-calculus, a binary relation on processes—structural congruence—rearranges processes when preparing for reduction. Such an arrangement reduces the number of rules included in the operational semantics. Structural congruence was introduced by Milner [27,29]. It is defined as the least congruence relation closed under the axioms in Figure 2. The first three rules state that parallel composition is commutative, associative, and takes inaction as the neutral element. The fourth rule is commonly known as scope extrusion [30,31] and allows extending the scope of channel ends x, y to process Q. The side-condition

*Structural congruence*, P ≡ P

$$\begin{aligned} P \mid Q \equiv Q \mid P & \quad (P \mid Q) \mid R \equiv P \mid (Q \mid R) & P \mid \mathbf{0} \equiv P\\ (\nu xy)P \mid Q \equiv (\nu xy)(P \mid Q) & \quad (\nu xy)\mathbf{0} \equiv \mathbf{0} & \quad (\nu wx)(\nu yz)P \equiv (\nu yz)(\nu wx)P \end{aligned}$$

*Reduction*, P → P

$$\begin{gathered}
\text{if }\mathtt{true}\text{ then } P \text{ else } Q \to P \qquad \text{if }\mathtt{false}\text{ then } P \text{ else } Q \to Q \qquad \text{[R-IfT] [R-IfF]}\\[4pt]
(\nu xy)(\mathsf{lin}\,x\,(M + l!v.P + M') \mid \mathsf{lin}\,y\,(N + l?z.Q + N') \mid R) \to (\nu xy)(P \mid Q[v/z] \mid R) \quad \text{[R-LinLin]}\\[4pt]
(\nu xy)(\mathsf{lin}\,x\,(M + l!v.P + M') \mid \mathsf{un}\,y\,(N + l?z.Q + N') \mid R) \to {}\\
\qquad (\nu xy)(P \mid Q[v/z] \mid \mathsf{un}\,y\,(N + l?z.Q + N') \mid R) \quad \text{[R-LinUn]}\\[4pt]
(\nu xy)(\mathsf{un}\,x\,(M + l!v.P + M') \mid \mathsf{lin}\,y\,(N + l?z.Q + N') \mid R) \to {}\\
\qquad (\nu xy)(P \mid Q[v/z] \mid \mathsf{un}\,x\,(M + l!v.P + M') \mid R) \quad \text{[R-UnLin]}\\[4pt]
(\nu xy)(\mathsf{un}\,x\,(M + l!v.P + M') \mid \mathsf{un}\,y\,(N + l?z.Q + N') \mid R) \to {}\\
\qquad (\nu xy)(P \mid Q[v/z] \mid \mathsf{un}\,x\,(M + l!v.P + M') \mid \mathsf{un}\,y\,(N + l?z.Q + N') \mid R) \quad \text{[R-UnUn]}
\end{gathered}$$

Fig. 2: Operational semantics

"x and y not free in Q" is redundant in face of the Barendregt convention. The fifth rule allows collecting channel bindings no longer in use, and the last rule allows for rearranging the order of channel bindings in a process.

Reduction includes six axioms, two for the destruction of boolean values (via a conditional process), and four for communication. The axioms for communication take processes of a similar nature. The scope restriction (νxy) identifies the two ends of the channel engaged in communication. Under the scope of the channel one finds three processes: the first contains an output process on channel end x, the second contains an input process on channel end y, and the third (R) is an arbitrary process that may contain other references to x and y (the witness process). Communication proceeds by identifying a pair of compatible branches, namely l ! v.P and l ?z.Q. The result contains the continuation process P and the continuation process Q with occurrences of the bound variable z replaced by value v (together with the witness process). The four axioms differ in the treatment of the process qualifiers: lin (ephemeral) and un (persistent). Ephemeral processes are consumed in reduction, persistent processes remain in the contractum.

Choices apart, rules [R-LinLin] and [R-LinUn] are already present in the works of Milner and Vasconcelos [29,43]. Rules [R-UnLin] and [R-UnUn] are absent there on the grounds of economy: replicated output can be simulated with a new channel and a replicated input. In mixed choices these rules cannot be

```
T ::=                   Types:
  q ♯ {Ui}i∈I             choice
  end                     termination
  unit | bool             unit and boolean
  μa.T                    recursive type
  a                       type variable
U ::=                   Branches:
  l★T.T                   branch
★ ::=                   Polarities:
  ! | ?                   output and input
♯ ::=                   Views:
  ⊕ | &                   internal and external
Γ ::=                   Contexts:
  ·                       empty
  Γ, x: T                 entry
```

Fig. 3: The syntax of types

omitted, for there is no distinction between input and output: choice is the only (symmetrical) communication primitive.

We have designed mixed choices in such a way that labels may be duplicated in choices; moreover, label-polarity pairs may also be duplicated. This allows for non-determinism in a linear context. For example, the process

$$(\nu xy)(\mathsf{lin}\,x\,(l!\mathtt{true}.\mathbf{0} + l!\mathtt{false}.\mathbf{0}) \mid \mathsf{lin}\,y\,(l?z.\,\mathsf{lin}\,w\,(m!z.\mathbf{0})))$$

reduces in one step to either lin w(m! true.**0**) or lin w(m! false.**0**).

The examples in Section 2 take advantage of a def notation, a derived process construct inspired by the SePi [12] and Pict [36] languages. A process of the form def x(z) = P in Q is understood as

$$(\nu xy)(\mathsf{un}\,y\,(l?z.P) \mid Q)$$

and calls to the recursive procedure, of the form x!v, are interpreted as lin x (l!v), for an arbitrarily chosen label l. The derived syntax hides channel end y and simplifies the syntax of calls to the procedure. Procedures with more than one parameter require tuple passing, a notion that is not primitive to mixed sessions. Fortunately, tuple passing is easy to encode; see Vasconcelos [43].

#### **3.3 Typing**

Figure 3 summarises the syntax of types. We rely on an extra set, that of type variables, a, b, . . . Types describe values, including boolean and unit values, and

#### 724 V. T. Vasconcelos et al.

*Branch subtyping*, U <: U

$$\frac{S\_2 <: S\_1 \quad T\_1 <: T\_2}{l^! S\_1. T\_1 <: l^! S\_2. T\_2} \qquad \frac{S\_1 <: S\_2 \quad T\_1 <: T\_2}{l^? S\_1. T\_1 <: l^? S\_2. T\_2}$$

*Subtyping*, T <: T

$$\begin{gathered}
\overline{\mathtt{end} <: \mathtt{end}} \qquad \overline{\mathtt{unit} <: \mathtt{unit}} \qquad \overline{\mathtt{bool} <: \mathtt{bool}} \qquad \frac{S[\mu a.S/a] <: T}{\mu a.S <: T} \qquad \frac{S <: T[\mu a.T/a]}{S <: \mu a.T}\\[4pt]
\frac{J \subseteq I \qquad U_j <: U_j' \;\;(j \in J)}{q \oplus \{U_i\}_{i\in I} <: q \oplus \{U_j'\}_{j\in J}} \qquad \frac{I \subseteq J \qquad U_i <: U_i' \;\;(i \in I)}{q\,\&\,\{U_i\}_{i\in I} <: q\,\&\,\{U_j'\}_{j\in J}}
\end{gathered}$$

Fig. 4: Coinductive subtyping rules

channel ends. A type of the form q♯{Ui}i∈I denotes a channel end. Qualifier q states the number of processes that may contain references to the channel end: exactly one for lin, zero or more for un. View ♯ distinguishes internal (⊕) from external (&) choice. This distinction is not present in processes but is of paramount importance for typing purposes, as we shall see. The branches are either of output (l!S.T) or of input (l?S.T) nature. In either case, S denotes the object of communication and T describes the subsequent behaviour of the channel end. Type end denotes a channel end on which no more interaction is possible. Types μa.T and a cater for recursive types.

Types are subject to a few syntactic restrictions: i) choices must have at least one branch; ii) label-polarity pairs (l★) are pairwise distinct in the branches of a choice type (unlike in processes); iii) recursive types are assumed contractive (that is, containing no subterm of the form μa1 . . . μan.a1). New variables, new bindings: type variable a is bound in T in type μa.T. Again, the definitions of bound and free variables, as well as that of substitution, S[T/a], are defined accordingly.

Mixed sessions come equipped with a notion of subtyping. Figure 4 introduces the rules that allow determining whether a given type is a subtype of another. The rules must be read coinductively. Base types (end, unit, bool) are subtypes of themselves. The rules for recursive types are standard. Subtyping behaves differently in the presence of external or internal choice. For internal choice we require the branches in the subtype to contain those in the supertype: exercising fewer options cannot cause difficulties on the receiving side. For external choice we require the opposite: here offering more choices cannot cause runtime errors. For branches we distinguish output from input: output is contravariant on the contents of the message, input is covariant. In either case, the continuation is covariant. Choices, input/output, and recursive types receive no different treatment than those in classical sessions [15]. We can easily show that the <: relation is a preorder. Notation S ≡ T abbreviates S <: T and T <: S.
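The coinductive reading of Figure 4 can be turned into an algorithm in the style of Gay and Hole: carry a set of assumed pairs and succeed when a pair repeats. The Python sketch below uses a tuple encoding of types that is ours; for width subtyping, the subtype of an internal choice may offer more branches (e.g., lin⊕{l?bool.end, m!unit.end} <: lin⊕{l?bool.end}, used in Section 3.3), and the subtype of an external choice fewer.

```python
# Types of Figure 3, tuple-encoded (encoding ours): ("end",), ("unit",), ("bool",),
# ("var", a), ("rec", a, T), and ("choice", q, view, branches), where view is
# "+" for internal (⊕) or "&" for external choice, and branches is a tuple of
# (label, polarity, S, T) with polarity "!" or "?".

def subst(t, a, s):
    """t[s/a]; bound variables assumed pairwise distinct (Barendregt convention)."""
    if t[0] == "var":
        return s if t[1] == a else t
    if t[0] == "rec":
        return t if t[1] == a else ("rec", t[1], subst(t[2], a, s))
    if t[0] == "choice":
        _, q, view, bs = t
        return ("choice", q, view,
                tuple((l, p, subst(S, a, s), subst(T, a, s)) for l, p, S, T in bs))
    return t

def unfold(t):
    """mu a.T becomes T[mu a.T/a]; other types are returned unchanged."""
    return subst(t[2], t[1], t) if t[0] == "rec" else t

def branch_sub(b1, b2, seen):
    l1, p1, S1, T1 = b1
    l2, p2, S2, T2 = b2
    if (l1, p1) != (l2, p2):
        return False
    if p1 == "!":  # output: contravariant in the message, covariant in the rest
        return subtype(S2, S1, seen) and subtype(T1, T2, seen)
    return subtype(S1, S2, seen) and subtype(T1, T2, seen)  # input: covariant

def subtype(s, t, seen=frozenset()):
    """Coinductive s <: t with an assumption set of already-visited pairs."""
    if (s, t) in seen:
        return True
    seen = seen | {(s, t)}
    s, t = unfold(s), unfold(t)
    if s[0] != "choice" or t[0] != "choice":
        return s == t  # end, unit, bool are only subtypes of themselves
    _, q1, v1, bs1 = s
    _, q2, v2, bs2 = t
    if q1 != q2 or v1 != v2:
        return False
    if v1 == "+":  # internal: the subtype offers at least the supertype's branches
        return all(any(branch_sub(b1, b2, seen) for b1 in bs1) for b2 in bs2)
    return all(any(branch_sub(b1, b2, seen) for b2 in bs2) for b1 in bs1)

small = ("choice", "lin", "+", (("l", "?", ("bool",), ("end",)),))
big = ("choice", "lin", "+", (("l", "?", ("bool",), ("end",)),
                              ("m", "!", ("unit",), ("end",))))
assert subtype(big, small) and not subtype(small, big)
```

Termination rests on contractivity: for regular types only finitely many pairs can arise, so every cycle eventually hits the assumption set.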

Duality is a notion central to session types. In order for channel communication to proceed smoothly, the two channel ends must be compatible: if one end says input, the other must say output; if one end says external choice, the

*Polarity duality and view duality*, ⊥ and ⊥

$$! \perp \,? \qquad ? \perp\, ! \qquad \oplus \perp \& \qquad \& \perp \oplus$$

*Type duality*, T ⊥ T

$$\begin{gathered}
\overline{\mathtt{end} \perp \mathtt{end}} \qquad \frac{\sharp \perp \sharp' \qquad \star_i \perp \bullet_i \qquad S_i \equiv S_i' \qquad T_i \perp T_i' \;\;(i \in I)}{q\,\sharp\{l_i^{\star_i} S_i.T_i\}_{i\in I} \perp q\,\sharp'\{l_i^{\bullet_i} S_i'.T_i'\}_{i\in I}}\\[4pt]
\frac{S[\mu a.S/a] \perp T}{\mu a.S \perp T} \qquad \frac{S \perp T[\mu a.T/a]}{S \perp \mu a.T}
\end{gathered}$$

Fig. 5: Coinductive type duality rules

un *and* lin *predicates*, un(T), lin(T)

$$\overline{\mathsf{un}(\mathtt{end})} \qquad \overline{\mathsf{un}(\mathtt{unit})} \qquad \overline{\mathsf{un}(\mathtt{bool})} \qquad \overline{\mathsf{un}(\mathsf{un}\,\sharp\{U_i\}_{i\in I})} \qquad \frac{\mathsf{un}(T)}{\mathsf{un}(\mu a.T)} \qquad \overline{\mathsf{lin}(T)}$$

Fig. 6: The un and lin predicates on types

other must say internal choice. In the presence of recursive types, the problem of building the dual of a given type has been elusive, as works by Bernardi and Hennessy, Bono and Padovani, and Lindley and Morris show [5,7,25]. Here we eschew the problem by working with a duality relation, as in Gay and Hole [15].

The rules in Figure 5 define what we mean for two types to be dual. This is the coinductive definition of Gay and Hole in rule format (and adapted to choice). Duality is defined for session types only. Type end is the dual of itself. The rule for choice types requires dual views (& is the dual of ⊕, and vice-versa) and dual polarities (? is the dual of !, and vice-versa). Furthermore, the objects of communication must be equivalent (Si ≡ S′i) and the continuations must be dual again (Ti ⊥ T′i). The rules in the second line handle recursion in the exact same way as in type equivalence. As an example, we can easily show that

μa.lin⊕{l?bool.lin&{m!unit.a}} ⊥ lin&{l!bool.μb.lin⊕{m?unit.lin&{l!bool.b}}}

It can be shown that ⊥ is an involution, that is, if R ⊥ S and S ⊥ T, then R ≡ T.
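The duality rules of Figure 5 also admit an algorithmic reading with an assumption set. The sketch below (the tuple encoding of types is ours) approximates the equivalence S ≡ S′ on message types by syntactic equality and assumes branches listed in matching order; this is enough to check the example above.

```python
# Types as in Figure 3, tuple-encoded (encoding ours); see Figure 5 for the rules.
DUAL_VIEW = {"+": "&", "&": "+"}   # "+" stands for internal choice (⊕)
DUAL_POL = {"!": "?", "?": "!"}

def subst(t, a, s):
    """t[s/a]; bound variables assumed pairwise distinct."""
    if t[0] == "var":
        return s if t[1] == a else t
    if t[0] == "rec":
        return t if t[1] == a else ("rec", t[1], subst(t[2], a, s))
    if t[0] == "choice":
        _, q, view, bs = t
        return ("choice", q, view,
                tuple((l, p, subst(S, a, s), subst(T, a, s)) for l, p, S, T in bs))
    return t

def unfold(t):
    return subst(t[2], t[1], t) if t[0] == "rec" else t

def dual(t1, t2, seen=frozenset()):
    """Coinductive duality t1 ⊥ t2; repeated pairs are assumed to hold."""
    if (t1, t2) in seen:
        return True
    seen = seen | {(t1, t2)}
    t1, t2 = unfold(t1), unfold(t2)
    if (t1, t2) == (("end",), ("end",)):
        return True
    if t1[0] != "choice" or t2[0] != "choice":
        return False  # duality is defined for session types only
    _, q1, v1, bs1 = t1
    _, q2, v2, bs2 = t2
    if q1 != q2 or DUAL_VIEW[v1] != v2 or len(bs1) != len(bs2):
        return False
    return all(l1 == l2 and DUAL_POL[p1] == p2 and S1 == S2 and dual(T1, T2, seen)
               for (l1, p1, S1, T1), (l2, p2, S2, T2) in zip(bs1, bs2))

# mu a.lin⊕{l?bool.lin&{m!unit.a}} ⊥ lin&{l!bool.mu b.lin⊕{m?unit.lin&{l!bool.b}}}
S = ("rec", "a", ("choice", "lin", "+",
     (("l", "?", ("bool",), ("choice", "lin", "&",
       (("m", "!", ("unit",), ("var", "a")),))),)))
T = ("choice", "lin", "&",
     (("l", "!", ("bool",), ("rec", "b", ("choice", "lin", "+",
       (("m", "?", ("unit",), ("choice", "lin", "&",
         (("l", "!", ("bool",), ("var", "b")),))),)))),))
assert dual(S, T)
```

Note how the two recursive types unfold at different points, yet the check succeeds once the pair of continuations repeats.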

The meaning of the un and lin predicates is defined by the rules in Figure 6. Basic types—unit, bool, end—are unrestricted; un-annotated choices are unrestricted; μa.T is unrestricted if T is. Contractivity ensures that the predicate is total. All types are lin, meaning that both lin and non-lin types may be used in linear contexts.
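The two predicates transcribe directly (the tuple encoding of types is ours):

```python
def un(t):
    """The un predicate of Figure 6 on tuple-encoded types."""
    if t[0] in ("end", "unit", "bool"):
        return True
    if t[0] == "choice":          # ("choice", qualifier, view, branches)
        return t[1] == "un"
    if t[0] == "rec":             # contractivity guarantees termination
        return un(t[2])
    return False                  # a bare type variable is not un

def lin(t):
    """Every type may be used in a linear context."""
    return True

assert un(("rec", "a", ("choice", "un", "&", ())))
assert not un(("choice", "lin", "+", ()))
```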

Before presenting the type system, we need to introduce two notions that manipulate typing contexts. The rules in Figure 7 define the meaning of context split and context update. These two relations are taken verbatim from Vasconcelos [43]; context split is originally from Walker [48] (cf. Kobayashi et al. [22,23]). Context split is used when type checking processes with two sub-processes. In

*Context split*, Γ = Γ ◦ Γ

$$\begin{gathered}
\overline{\cdot = \cdot \circ \cdot} \qquad \frac{\Gamma = \Gamma_1 \circ \Gamma_2 \qquad \mathsf{un}(T)}{\Gamma, x\colon T = (\Gamma_1, x\colon T) \circ (\Gamma_2, x\colon T)}\\[4pt]
\frac{\Gamma = \Gamma_1 \circ \Gamma_2}{\Gamma, x\colon \mathsf{lin}\,\sharp\{U_i\}_{i\in I} = (\Gamma_1, x\colon \mathsf{lin}\,\sharp\{U_i\}_{i\in I}) \circ \Gamma_2} \qquad
\frac{\Gamma = \Gamma_1 \circ \Gamma_2}{\Gamma, x\colon \mathsf{lin}\,\sharp\{U_i\}_{i\in I} = \Gamma_1 \circ (\Gamma_2, x\colon \mathsf{lin}\,\sharp\{U_i\}_{i\in I})}
\end{gathered}$$

*Context update*, Γ + x: T = Γ

$$\frac{x \colon U \notin \Gamma}{\Gamma + x \colon T = \Gamma, x \colon T} \qquad \frac{\mathfrak{un}(T) \qquad T \equiv U}{(\Gamma, x \colon T) + x \colon U = (\Gamma, x \colon T)}$$

Fig. 7: Inductive context split and context update rules

this case we split the context in two, by copying unrestricted entries to both contexts and linear entries to one only. Context update is used to add to a given context an entry representing the continuation (after a choice operation) of a channel. If the variable in the entry is not in the context, then we add the entry to the context. Otherwise we require the entry to be present in the context and the type to be unrestricted.
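The two relations can be sketched in Python (contexts as dictionaries, with an is_un oracle standing in for the un predicate of Figure 6; the encoding is ours). Context split is a relation, so the function returns every admissible split; context update is a partial function.

```python
def splits(ctx, is_un):
    """All splits ctx = ctx1 ◦ ctx2: unrestricted entries are copied to both
    halves, each linear entry goes to exactly one half."""
    results = [({}, {})]
    for x, t in ctx.items():
        if is_un(t):
            results = [({**g1, x: t}, {**g2, x: t}) for g1, g2 in results]
        else:
            results = [r for g1, g2 in results
                       for r in (({**g1, x: t}, g2), (g1, {**g2, x: t}))]
    return results

def update(ctx, x, t, is_un):
    """ctx + x: t — add the entry if x is absent; otherwise require the existing
    type to be unrestricted and equivalent (syntactic equality stands in for ≡)."""
    if x not in ctx:
        return {**ctx, x: t}
    if is_un(ctx[x]) and ctx[x] == t:
        return ctx
    raise TypeError(f"cannot update entry for {x}")

is_un = lambda t: not t.startswith("lin")   # toy predicate on string-typed entries
ctx = {"x": "lin&{l?unit.end}", "b": "bool"}
assert len(splits(ctx, is_un)) == 2          # the linear entry goes left or right
assert update(ctx, "y", "end", is_un)["y"] == "end"
```

With k linear entries a context admits 2^k splits, which is why [T-Par] below is a rule about the existence of a suitable split rather than a function.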

The rules in Figure 8 introduce the typing system for mixed sessions. Here the un and lin predicates on types are pointwise extended to typing contexts. Notice that all contexts are linear and only some contexts are unrestricted. We require all instances of the axioms to be built from unrestricted contexts, thus ensuring that linear resources (channel ends) are fully consumed in typing derivations.

The typing rules for values should be straightforward: constants have their own types, the type for a variable is read from the context, and [T-Sub] is the subsumption rule, allowing a type to be replaced by a supertype.

The rules for branches—[T-Out] and [T-In]—follow those for output and input in classical session types. To type an output branch we split the context in two: one part for the value, the other for the continuation process. To type an input branch we add an entry with the bound variable x to the context under which we type the continuation process. Rule [T-In] rejects branches of the form l?v.P when v is not a variable. The continuation type T is not used in either rule; instead it is incorporated in the type for the channel in Γ (cf. rule [T-Choice] below).

The rules for inaction, parallel composition, and conditional are from Vasconcelos [43]. That for scope restriction is adapted from Gay and Hole [15]. Rule [T-Inact] follows the general pattern for axioms, requiring a un context. Rule [T-Par] splits the context in two, providing each subprocess with one part. Rule [T-If] splits the context and uses one part to type guard v. Because v is unrestricted, we know that Γ<sup>1</sup> contains exactly the un entries in Γ<sup>1</sup> ◦ Γ<sup>2</sup> and that Γ<sup>2</sup> is equal to Γ<sup>1</sup> ◦ Γ2. Context Γ<sup>2</sup> is used to type both branches of the conditional, for only one of them will ever execute. Rule [T-Res] introduces in the typing context entries for the two channel ends, x and y, at dual types.

*Typing rules for values*, Γ ⊢ v : T

$$\frac{\mathsf{un}(\Gamma)}{\Gamma \vdash ()\colon \mathtt{unit}} \qquad \frac{\mathsf{un}(\Gamma)}{\Gamma \vdash \mathtt{true}, \mathtt{false}\colon \mathtt{bool}} \qquad \frac{\mathsf{un}(\Gamma_1, \Gamma_2)}{\Gamma_1, x\colon T, \Gamma_2 \vdash x\colon T} \qquad \frac{\Gamma \vdash v\colon S \qquad S <: T}{\Gamma \vdash v\colon T}$$

*Typing rules for branches*, Γ ⊢ M : U

$$\frac{\Gamma_1 \vdash v\colon S \qquad \Gamma_2 \vdash P}{\Gamma_1 \circ \Gamma_2 \vdash l\,!\,v.P\colon l\,!\,S.T} \qquad \frac{\Gamma, x\colon S \vdash P}{\Gamma \vdash l\,?\,x.P\colon l\,?\,S.T} \qquad \text{[T-Out] [T-In]}$$

*Typing rules for processes*, Γ ⊢ P

$$\begin{gathered}
\frac{\mathsf{un}(\Gamma)}{\Gamma \vdash \mathbf{0}} \qquad \frac{\Gamma_1 \vdash P \qquad \Gamma_2 \vdash Q}{\Gamma_1 \circ \Gamma_2 \vdash P \mid Q} \qquad \text{[T-Inact] [T-Par]}\\[4pt]
\frac{\Gamma_1 \vdash v\colon \mathtt{bool} \qquad \Gamma_2 \vdash P \qquad \Gamma_2 \vdash Q}{\Gamma_1 \circ \Gamma_2 \vdash \mathtt{if}\ v\ \mathtt{then}\ P\ \mathtt{else}\ Q} \qquad \frac{\Gamma, x\colon T, y\colon S \vdash P \qquad T \perp S}{\Gamma \vdash (\nu xy)P} \qquad \text{[T-If] [T-Res]}\\[4pt]
\frac{q_1(\Gamma_1 \circ \Gamma_2) \qquad \Gamma_1 \vdash x\colon q_2\,\sharp\{l_i^{\star_i} S_i.T_i\}_{i\in I} \qquad \Gamma_2 + x\colon T_j \vdash l_j^{\star_j} v_j.P_j\colon l_j^{\star_j} S_j.T_j \;\;(j \in J) \qquad \{l_j^{\star_j}\}_{j\in J} = \{l_i^{\star_i}\}_{i\in I}}{\Gamma_1 \circ \Gamma_2 \vdash q_1\, x \sum_{j\in J} l_j^{\star_j} v_j.P_j} \quad \text{[T-Choice]}
\end{gathered}$$

Fig. 8: Inductive typing rules

The rule for choice is new. The incoming context is split in two: one part for the subject x of the choice, the other for the various branches in the choice. The qualifier of the process, q1, dictates the nature of the incoming context: un or lin. This allows a linear choice to contain channels of an arbitrary nature, but limits unrestricted choices to unrestricted channels only (for one cannot predict how many times such choices will be exercised). The second premise extracts a type q2 ♯{l★i Si.Ti}i∈I for x. The third premise types each branch: type Sj is used to type value vj in the branch and type Tj is used to type the corresponding continuation. The rule updates context Γ2 with the continuation type of x: if q2 is lin, then x is not in Γ2 and the update operation simply adds the entry to the context. If, on the other hand, q2 is un, then x is in Γ2 and the context update operation (together with rule [T-Sub]) insists that type Tj is a subtype of un ♯{l★i Si.Ti}i∈I, meaning that Tj is a recursive type.

The last premise of rule [T-Choice] insists that the set of labels in the choice type coincides with that in the choice process. That does not mean that the label-polarity pairs are in a one-to-one correspondence: label-polarity pairs are pairwise distinct in types (see the syntactic restrictions in Section 3.3), but not in processes. For example, process lin x (l?y.**0** + l?z.**0**) can be typed against context x: lin⊕{l?bool.end}. From the fact that the two sets must coincide it does not follow that the label-polarity pairs in the context must coincide with those in the process. Taking advantage of subtyping, the above process can still be typed against context x: lin⊕{l?bool.end, m!unit.end}, because lin⊕{l?bool.end, m!unit.end} <: lin⊕{l?bool.end}. The opposite phenomenon happens with external choice, where one may remove branches by virtue of subtyping.

We complete this section by discussing examples that illustrate options taken in the typing system (we postpone the formal justification to Section 4). Suppose we allow empty choices in the syntax of types. Then the process

$$(\nu xy)(x() \mid y())$$

would be typable by taking x: ⊕{}, y: &{}, yet the process would not reduce. We could add an extra reduction rule to that effect,

$$(\nu xy)(x() \mid y() \mid R) \to (\nu xy)R$$

which would satisfy preservation (Theorem 2). We decided not to include it in our reduction rules as we did not want the extra complexity. Including the rule also does not bring any apparent benefit.

The syntax of processes places no restrictions on the label-polarity pairs in choices; yet that of types does. What if we relax the restriction that label-polarity pairs in choice types must be pairwise distinct? Then process

$$(\nu xy)(x\,(l!\mathtt{true} + l!\,()) \mid y\,(l?z.\,\mathtt{if}\ z\ \mathtt{then}\ \mathbf{0}\ \mathtt{else}\ \mathbf{0}))$$

could be typed under context x: &{l!bool, l!unit}, y: ⊕{l?bool, l?unit}, yet the process might reduce to if () then **0** else **0**, which is a runtime error.

# **4 Well-typed Mixed Sessions Do Not Lead to Runtime Errors**

This section introduces the main results of mixed choices: absence of runtime errors and preservation, both for well-typed processes.

We say that a process is a runtime error if it is structurally congruent to:

**–** a process of the form

$$(\nu x_1 y_1) \dots (\nu x_n y_n)(\nu xy)(q\,x \sum_{i\in I} l_i^{\star_i} v_i.P_i \mid q'\,y \sum_{j\in J} l_j^{\star_j} w_j.Q_j \mid R)$$

where {l•i}i∈I ∩ {l★j}j∈J = ∅, with each •i obtained by dualising ★i, or

**–** a process of the form qz(M + l ?v.P + N) and v is not a variable, or

**–** a process of the form if v then P else Q and v is neither true nor false.

Examples of processes which are runtime errors include:

$$\begin{aligned} &(\nu xy)(\mathsf{lin}\,x\,(l!\mathtt{true}.\mathbf{0}) \mid \mathsf{lin}\,y\,(l!\mathtt{true}.\mathbf{0}))\\ &(\nu xy)(\mathsf{un}\,x\,(l!\mathtt{true}.\mathbf{0}) \mid \mathsf{lin}\,y\,(m?z.\mathbf{0}))\\ &\mathsf{un}\,x\,(l?\mathtt{false}.\mathbf{0})\\ &\mathtt{if}\ ()\ \mathtt{then}\ \mathbf{0}\ \mathtt{else}\ \mathbf{0} \end{aligned}$$

Notice that processes of the form (νxy) lin x ∑i∈I Mi cannot be classified as runtime errors, for they may be typed. Just think of (νxy) lin x (l?z.lin y (l!true.**0**)), typable under the empty context. Unlike the interpretations of session types in linear logic by Caires, Pfenning, and Wadler [8,14,46,47], typable mixed session processes can easily deadlock. Similarly, processes with more than one lin choice on the same channel end can be typed. For example, process lin x (l!true.**0**) | lin x (l?z.**0**) can be typed under context x: μa.un⊕{l!bool.a, l?bool.a}. Recall the relationship between qualifiers in processes (q1) and those in types (q2) in the discussion of the rules for choice in Section 3.

**Theorem 1 (Well-typed processes are not runtime errors).** If · ⊢ P, then P is not a runtime error.

Proof. Towards a contradiction, assume that · ⊢ P and that P is

$$(\nu x_1 y_1) \dots (\nu x_n y_n)(q_1\, x_n \sum_{i\in I} l_i^{\star_i} v_i.P_i \mid q_2\, y_n \sum_{j\in J} l_j^{\star_j} w_j.Q_j \mid R)$$

and {l•i}i∈I ∩ {l★j}j∈J = ∅ with ★i ⊥ •i. From the typing derivation for P, using [T-Par] and [T-Res], we obtain a context Γ = Γ1 ◦ Γ2 ◦ Γ3 = x1: T1, y1: S1, ..., xn: Tn, yn: Sn with Ti ⊥ Si for all i = 1, ..., n, and that Γ1 ⊢ q1 xn ∑i∈I l★i vi.Pi and Γ2 ⊢ q2 yn ∑j∈J l★j wj.Qj and Γ3 ⊢ R. Without loss of generality, given that xn and yn have dual types, and from the premises of rule [T-Choice], assume that Γ′1 ⊢ xn : q′1 &{l★k T′k.T″k}k∈K and Γ′2 ⊢ yn : q′2 ⊕{l•k S′k.S″k}k∈K, with {l★i}i∈I = {l★k}k∈K and {l★j}j∈J ⊆ {l•k}k∈K, where ★k ⊥ •k. This also implies that {l•i}i∈I = {l•k}k∈K. Thus, a label l★j from q2 yn ∑j∈J l★j wj.Qj belongs to the set of labels {l•i}i∈I: l★j ∈ {l•k}k∈K = {l•i}i∈I, contradicting {l•i}i∈I ∩ {l★j}j∈J = ∅.

When P is qz(M + l?v.P′ + N) and v is not a variable, the contradiction is with rule [T-In], which can only be applied when the value in an input branch is a variable.

When P is if v then P else Q and v is not a boolean value, the contradiction immediately arises with rule [T-If]. 

In order to prepare for the preservation result we introduce a few lemmas.

# **Lemma 1 (Unrestricted weakening).** If Γ ⊢ P and un(T), then Γ, x: T ⊢ P.

Proof. The proof goes by mutual induction on the rules for branches and processes, but we first need the result for the value typing rules: if Γ ⊢ v : S and un(R), then Γ, x: R ⊢ v : S. This follows by a simple case inspection of the rules [T-Unit], [T-True], [T-False], [T-Var], taking into consideration that un(R). For the rule [T-Sub], use the induction hypothesis to obtain Γ, x: R ⊢ v : S and conclude, using [T-Sub], that Γ, x: R ⊢ v : T.

For the branch and process typing rules we detail the proof when the last rule is [T-Out]. Using the result for typing values, we obtain Γ1, x: R ⊢ v : S, and the induction hypothesis for processes gives Γ2, x: R ⊢ P. Using the un context split property, taking into account that un(R), we conclude that Γ1 ◦ Γ2, x: R ⊢ l!v.P : l!S.T.

For the process rule [T-Inact], the result is a simple consequence of un(T). For the other rules, the result follows by the induction hypothesis for processes and branches, together with the result for values. We detail the proof for rule [T-If]. Using the result for typing values, we know that Γ1, x: T ⊢ v : bool. By the induction hypothesis we also obtain Γ2, x: T ⊢ P and Γ2, x: T ⊢ Q. Using the un context split property, we conclude Γ1 ◦ Γ2, x: T ⊢ if v then P else Q.

# **Lemma 2 (Preservation for** ≡**).** If Γ ⊢ P and P ≡ Q, then Γ ⊢ Q.

Proof. As in Vasconcelos [43, Lemma 7.4] since we share the structural congruence axioms. 

**Lemma 3 (Substitution).** If Γ1 ⊢ v : T and Γ2, x: T ⊢ P and Γ = Γ1 ◦ Γ2, then Γ ⊢ P[v/x].

Proof. The proof follows by mutual induction on the rules for processes and branches. 

**Theorem 2 (Preservation).** If Γ ⊢ P and P → Q, then Γ ⊢ Q.

Proof. The proof is by rule induction on the reduction relation, making use of the weakening and substitution lemmas and of preservation for structural congruence. We sketch the cases for [R-LinLin] and [R-LinUn].

When reduction ends with rule [R-LinLin], we know that rule [T-Res] introduces x: X, y: Y with X ⊥ Y in the context Γ. From there, with applications of [T-Par] and [T-Choice], Γ = Γ1 ◦ Γ2 ◦ Γ3 with Γ1 ⊢ lin x(M + l!v.P + M′), Γ2 ⊢ lin y(N + l?z.Q + N′), and Γ3 ⊢ R. Furthermore, Γ1 = Γ1′ ◦ Γ1″ with lin(Γ1), Γ1′ ⊢ x: lin⊕{M, l!S.T, M′}, and Γ1″, x: T ⊢ l!v.P : l!S.T. From the [T-Out] rule, Γv ⊢ v : S and Γ4 ⊢ P. For the y side, Γ2′ ⊢ y: lin&{N, l?U.V, N′} and Γ2″, y: Y ⊢ l?z.Q : l?U.V. From the [T-In] rule, Γz, y: V, z: U ⊢ Q. We also have S ≡ U from the duality of x and y. Using the substitution Lemma 3, Γz ◦ Γv, y: V ⊢ Q[v/z]. Using [T-Par] with the remaining contexts and [T-Res] types the conclusion of [R-LinLin].

When reduction ends with rule [R-LinUn], we know that rule [T-Res] introduces x: X, y: Y with X ⊥ Y in the context Γ. From there, with applications of [T-Par] and [T-Choice], Γ = Γ1 ◦ Γ2 ◦ Γ3 with Γ1 ⊢ lin x(M + l!v.P + M′), Γ2 ⊢ un y(N + l?z.Q + N′), and Γ3 ⊢ R. Furthermore, Γ1 = Γ1′ ◦ Γ1″ with lin(Γ1) and Γ1′ ⊢ x: un⊕{M, l!S.T, M′}. Here x is un since x and y are dual. We also have Γ1″, x: T ⊢ l!v.P : l!S.T, from which follow Γ4 ⊢ v : S and Γ5 ⊢ P by rule [T-Out]. For the y side, Γ2′ ⊢ y: un&{N, l?U.V, N′} and Γ2″, y: Y ⊢ l?z.Q : l?U.V, which gives Γ6, y: V, z: U ⊢ Q by [T-In].

Types S and U are equivalent due to the duality of x and y, and so Γ6, y: V, z: S ⊢ Q. Using the substitution Lemma 3, Γ6 ◦ Γ4, y: V ⊢ Q[v/z]. From Γ5 we also type the process P. Using [T-Par] with the remaining contexts and [T-Res] types the conclusion of [R-LinUn].

# **5 Classical Sessions Were Mixed All Along**

This section introduces the syntax and semantics of classical session types and shows that the language of classical sessions can be embedded in that of mixed sessions.

The syntax and semantics of classical session types are in Figure 9; we follow Vasconcelos [43]. The syntax and the rules for the various judgements extend those of Figures 1 to 8, from which we remove choice, both from the grammar productions (for processes and types) and from the various judgements (operational semantics, subtyping, duality, and typing). Concerning the syntax of processes, the choice construct of Figure 1 is replaced by new process constructors: output, linear (lin) and replicated (un) input, selection (internal choice), and branching (external choice). The four reduction axioms in Figure 2 that pertain to choice ([R-LinLin], [R-LinUn], [R-UnLin], [R-UnUn]) are replaced by the three axioms in Figure 9. Rule [R-LinCom] describes the interaction of output against ephemeral input, rule [R-UnCom] that of output against replicated input, and rule [R-Case] selects a label in the menu at the other channel end.

The syntax of types features new constructs—linear or unrestricted input and output, and linear or unrestricted external and internal choice—replacing the choice construct in Figure 3. The subtyping rules for the new type constructors are taken from Gay and Hole [15]. Type duality is such that the objects of communication must be equivalent and the continuations (both in communication and in choice) must again be dual. We omit the symmetric duality rules for q!S.S′ ⊥ q?T.T′ and q&{li: Si}i∈I ⊥ q⊕{li: Ti}i∈I. The new duality rules are adapted from the coinductive definition of Gay and Hole [15]. The un predicate on types insists that un-annotated types are unrestricted: un(un S.T) and un(un{li: Ti}). The typing rule for choice in Figure 8 is replaced by the four rules in Figure 9; these are taken verbatim from Vasconcelos [43].

The embedding of classical session types in mixed sessions is defined in Figure 10. It consists of two maps, one for processes, the other for types. These maps act as homomorphisms on all process and type constructors not explicitly shown; for example, ⟦P | Q⟧ = ⟦P⟧ | ⟦Q⟧. We distinguish one label, msg, and use it to encode input and output (both processes and types). Input and output processes are encoded as choices with a single msg-labelled branch. The output process is qualified as lin (it does not survive reduction) and the input process reads its qualifier q from the incoming process. Choice processes in classical sessions are encoded as choices in mixed sessions. The value transmitted on the mixed session is irrelevant: we pick () of type unit for the output side and a fresh variable yi on the input side. Both types are linear.

Input and output types are translated into choice types. For output we pick an internal choice (⊕), and conversely an external choice (&) for input. The label in the only branch is msg, so as to match our pick for processes, and the qualifier is read from the incoming type. For classical choices, we read the qualifier and the view from the incoming type. The type of the communication in the branches of the mixed choice is unit, again so that it matches our pick for processes.
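As an illustration, the type translation can be transcribed as a recursive function over a small abstract syntax. The following Python sketch is our own (the constructor names `Comm`, `Choice`, and `Mixed`, and the string encodings of qualifiers, polarities, and views are assumptions, not the paper's notation); it mirrors the four type equations of Figure 10 and is homomorphic elsewhere.

```python
from dataclasses import dataclass

@dataclass
class Comm:        # classical q!S.T (pol "!") or q?S.T (pol "?")
    q: str; pol: str; payload: object; cont: object

@dataclass
class Choice:      # classical q⊕{li: Ti} (view "+") or q&{li: Ti} (view "&")
    q: str; view: str; branches: dict

@dataclass
class Mixed:       # mixed choice type: q ⊕/& {li ⋆ Si.Ti}
    q: str; view: str; branches: dict   # label -> (polarity, payload, cont)

def translate(t):
    """The type translation of Figure 10; homomorphic on base types."""
    if isinstance(t, Comm):
        view = "+" if t.pol == "!" else "&"   # output -> internal, input -> external
        return Mixed(t.q, view,
                     {"msg": (t.pol, translate(t.payload), translate(t.cont))})
    if isinstance(t, Choice):
        pol = "!" if t.view == "+" else "?"   # selection sends, branching receives
        return Mixed(t.q, t.view,
                     {l: (pol, "unit", translate(c)) for l, c in t.branches.items()})
    return t                                   # end, unit, bool, ...
```

For instance, the classical type lin!bool.end becomes the mixed type lin⊕{msg! bool.end}, a choice with the single msg-labelled branch.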

Typing correspondence says that the embedding preserves typability.

*Classical syntactic forms*

$$\begin{aligned} P ::= {} & \dots && \text{Processes:}\\ & \mid x!v.P && \text{output}\\ & \mid q\,x?y.P && \text{input}\\ & \mid x \lhd l.P && \text{selection}\\ & \mid x \rhd \{l_i\colon P_i\}_{i \in I} && \text{branching}\\ T ::= {} & \dots && \text{Types:}\\ & \mid q!T.T \mid q?T.T && \text{communication}\\ & \mid q{\oplus}\{l_i\colon T_i\}_{i \in I} \mid q\&\{l_i\colon T_i\}_{i \in I} && \text{choice} \end{aligned}$$

*Classical reduction rules*, P → P (plus [R-Res], [R-Par], [R-Struct] from Figure 2)

$$\begin{aligned} (\nu xy)(x!v.P \mid \text{lin } y?z.Q \mid R) &\to (\nu xy)(P \mid Q[v/z] \mid R) && \text{[R-LinCom]}\\ (\nu xy)(x!v.P \mid \text{un } y?z.Q \mid R) &\to (\nu xy)(P \mid Q[v/z] \mid \text{un } y?z.Q \mid R) && \text{[R-UnCom]} \end{aligned}$$

$$\frac{j \in I}{(\nu xy)(x \lhd l_j.P \mid y \rhd \{l_i\colon Q_i\}_{i \in I} \mid R) \to (\nu xy)(P \mid Q_j \mid R)} \qquad \text{[R-Case]}$$

*Classical subtyping rules*, T <: T

$$\frac{T <: S \qquad S' <: T'}{q!S.S' <: q!T.T'} \qquad \frac{S <: T \qquad S' <: T'}{q?S.S' <: q?T.T'}$$

$$\frac{I \subseteq J \qquad S_i <: T_i \quad \forall i \in I}{q{\oplus}\{l_i\colon S_i\}_{i \in I} <: q{\oplus}\{l_j\colon T_j\}_{j \in J}} \qquad \frac{J \subseteq I \qquad S_j <: T_j \quad \forall j \in J}{q\&\{l_i\colon S_i\}_{i \in I} <: q\&\{l_j\colon T_j\}_{j \in J}}$$

*Classical type duality rules*, T ⊥ T

$$\frac{S \equiv T \qquad S' \perp T'}{q?S.S' \perp q!T.T'} \qquad \frac{S_i \perp T_i \quad \forall i \in I}{q{\oplus}\{l_i\colon S_i\}_{i \in I} \perp q\&\{l_i\colon T_i\}_{i \in I}}$$

*Classical typing rules*, Γ P

$$\frac{\Gamma_1 \vdash x\colon q!T.U \qquad \Gamma_2 \vdash v\colon T \qquad \Gamma_3 + x\colon U \vdash P}{\Gamma_1 \circ \Gamma_2 \circ \Gamma_3 \vdash x!v.P} \qquad \text{[T-TOut]}$$

$$\frac{q_1(\Gamma_1 \circ \Gamma_2) \qquad \Gamma_1 \vdash x\colon q_2?T.U \qquad (\Gamma_2 + x\colon U), y\colon T \vdash P}{\Gamma_1 \circ \Gamma_2 \vdash q_1\,x?y.P} \qquad \text{[T-TIn]}$$

$$\frac{\Gamma_1 \vdash x\colon q\&\{l_i\colon T_i\}_{i \in I} \qquad \Gamma_2 + x\colon T_i \vdash P_i \quad \forall i \in I}{\Gamma_1 \circ \Gamma_2 \vdash x \rhd \{l_i\colon P_i\}_{i \in I}} \qquad \text{[T-Branch]}$$

$$\frac{\Gamma_1 \vdash x\colon q{\oplus}\{l_i\colon T_i\}_{i \in I} \qquad \Gamma_2 + x\colon T_j \vdash P \qquad j \in I}{\Gamma_1 \circ \Gamma_2 \vdash x \lhd l_j.P} \qquad \text{[T-Sel]}$$

Fig. 9: Classical session types

#### **Theorem 3 (Typing correspondence).**

1. If Γ ⊢ v : T, then ⟦Γ⟧ ⊢ v : ⟦T⟧.
2. If Γ ⊢ P, then ⟦Γ⟧ ⊢ ⟦P⟧.


*Process translation*

$$\begin{aligned} [\![x!v.P]\!] &= \text{lin } x\,\{\mathsf{msg}!v.[\![P]\!]\}\\ [\![q\,x?y.P]\!] &= q\,x\,\{\mathsf{msg}?y.[\![P]\!]\}\\ [\![x \lhd l.P]\!] &= \text{lin } x\,\{l!().[\![P]\!]\}\\ [\![x \rhd \{l_i\colon P_i\}_{i \in I}]\!] &= \text{lin } x\,\{l_i?y_i.[\![P_i]\!]\}_{i \in I} \quad (y_i \notin \mathsf{fv}(P_i)) \end{aligned}$$

(Homomorphic for **0**, P | Q, (νxy)P, and if v then P else Q)

*Type translation*

$$\begin{aligned} [\![q!S.T]\!] &= q{\oplus}\{\mathsf{msg}![\![S]\!].[\![T]\!]\}\\ [\![q?S.T]\!] &= q\&\{\mathsf{msg}?[\![S]\!].[\![T]\!]\}\\ [\![q{\oplus}\{l_i\colon T_i\}_{i \in I}]\!] &= q{\oplus}\{l_i!\mathtt{unit}.[\![T_i]\!]\}_{i \in I}\\ [\![q\&\{l_i\colon T_i\}_{i \in I}]\!] &= q\&\{l_i?\mathtt{unit}.[\![T_i]\!]\}_{i \in I} \end{aligned}$$

(Homomorphic for end, unit, bool, μa.T, and a)

Fig. 10: Embedding classical session types

Proof. 1. A straightforward rule induction on the hypothesis.

2. By rule induction on the hypothesis. We sketch a few cases.

When the derivation ends with [T-TIn], we use item 1, the induction hypothesis, the fact that q1(Γ1 ◦ Γ2) implies q1(⟦Γ1⟧ ◦ ⟦Γ2⟧), and that ⟦(Γ2 + x: U), y: T⟧ = (⟦Γ2⟧, y: ⟦T⟧) + x: ⟦U⟧ because x and y are distinct variables.

When the derivation ends with [T-Branch], we obtain (⟦Γ2⟧ + x: ⟦Ti⟧), yi: unit ⊢ ⟦Pi⟧ from the induction hypothesis ⟦Γ2⟧ + x: ⟦Ti⟧ ⊢ ⟦Pi⟧ using weakening (Lemma 1).

We complete this section by proving that the classical-to-mixed translation meets Gorla's criteria for a good encoding [17]. The five criteria proposed by Gorla ensure that the encoding is meaningful: two are syntactic and three are semantic.

Let C range over classical processes and M over mixed choice processes. The map ⟦·⟧ : C → M described in Figure 10 is a translation from classical processes to mixed choice processes. To be in line with the criteria, we add the process ✓, representing successful termination, to the syntax of both the source and the target languages. We denote by ⇒ the reflexive and transitive closure of the reduction relations → of both languages. We sometimes use subscript M for the reduction of mixed choice processes and subscript C for that of classical processes, even when it is clear from context.

We say that a process P does not reduce, written P ↛, when it cannot make any reduction step. A process diverges, P →ω, when P can do an infinite number of reductions. A process is successful, P ⇓, if P reduces to a process in parallel with the success process ✓, that is, P ⇒ P′ | ✓. Gorla's criteria view calculi as triples (P, →, ≍), where P is a set of processes, → a reduction relation (the operational semantics), and ≍ a behavioral equivalence on processes.

The behavioral equivalence ≍ we use for mixed sessions coincides with structural congruence ≡.

The first criterion states that the translation is compositional. For this purpose, we define a context C([·]1; … ; [·]k) as a classical process with k holes.

**Theorem 4 (Compositionality).** The translation ⟦·⟧ : C → M is compositional, i.e., for every k-ary operator op of C and for every subset N of channel ends, there exists a k-ary context C^N_op([·]1; … ; [·]k) such that, for all P1, …, Pk with ∪_{i=1}^{k} fv(Pi) = N, we have ⟦op(P1, …, Pk)⟧ = C^N_op(⟦P1⟧; … ; ⟦Pk⟧).

Proof. The translation of a process is defined in terms of the translations of its subterms; see Figure 10.

Following the ideas of Peters et al. [34], the translation from classical to mixed sessions can be enriched with a renaming policy ϕ⟦·⟧, a map from channel ends to sequences of channel ends. The following theorem states that the proposed translation is name invariant.

**Theorem 5 (Name invariance).** The translation ⟦·⟧ : C → M is name invariant, i.e., for every classical process P and substitution σ,

$$[\![P\sigma]\!] \begin{cases} = [\![P]\!]\sigma' & \text{if } \sigma \text{ is injective}\\ \asymp [\![P]\!]\sigma' & \text{otherwise} \end{cases}$$

where σ′ is such that ϕ⟦·⟧(σ(x)) = σ′(ϕ⟦·⟧(x)), for every channel end x.

Proof. The translation transforms each channel end (x in Figure 10) into itself; thus, any substitution is preserved. See Figure 10.

Operational correspondence states that the embedding preserves and reflects reduction. In our case the embedding is quite tight: one reduction step in classical sessions corresponds to one reduction step in mixed sessions. There is no runtime penalty in running classical sessions on a mixed sessions machine. Further notice that we do not rely on any equivalence relation on mixed sessions to establish the result: mixed-sessions images leave no "junk" in the process of simulating classical sessions.

**Theorem 6 (Operational correspondence).** Let P, P′ be classical session processes and Q a mixed session process.

1. If P → P′, then ⟦P⟧ → ⟦P′⟧.
2. If ⟦P⟧ → Q, then P → P′ and ⟦P′⟧ = Q, for some P′.

Proof. Straightforward rule induction on the hypotheses, relying on the fact that ⟦P[v/x]⟧ = ⟦P⟧[v/x] and that yi ∉ fv(Pi) in the translation of x ▷ {li: Pi}i∈I.

The following theorems concern the finite and infinite behavior of classical session processes and their corresponding translations.

**Theorem 7 (Divergence reflection).** The translation ⟦·⟧ : C → M reflects divergence, i.e., if ⟦P⟧ →ω_M then P →ω_C, for every process P ∈ C.

Proof. Corollary of Theorem 6. 

**Theorem 8 (Success sensitivity).** The translation ⟦·⟧ : C → M is success sensitive, i.e., P ⇓C iff ⟦P⟧ ⇓M, for every process P ∈ C.

Proof. Corollary of Theorem 6. 

# **6 What is in the Way of a Compiler?**

This section discusses algorithmic type checking and the implementation of choice in message passing architectures.

We start with type checking and then move to the runtime system. Gay and Hole present an algorithmic subtyping system for classical sessions [15]; algorithmic subtyping for mixed sessions can be obtained by adapting the rules in Figure 4 along the same lines. [T-Sub] is the only non-syntax-directed rule in Figure 8. We delete this rule and distribute subtype checking among all rules that use sequents Γ ⊢ v : T in their premises, as usual. Most of the rules include a non-deterministic context split operation. Take rule [T-Par], for example. Rather than guessing the right split, we give the whole incoming context to process P and later reclaim the unused part; this outgoing context is then passed to process Q. The outgoing context of the parallel composition P | Q is that of Q. See, e.g., Vasconcelos or Walker for details [43,48]. Rule [T-Res] requires guessing the types of the two channel ends, one dual to the other. Rather than guessing the type of channel end x, we require the help of the programmer by working with an explicitly typed syntax—(νxy : T)P—as in Franco and Vasconcelos [12,43], where T is the type of channel end x. For the type of channel end y, rather than guessing, we build it from type T; cf. [4,5,7,25].
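The context-threading discipline for [T-Par] can be sketched in a few lines. This is our own illustration, not the paper's formal algorithm: types are plain strings, a type counts as linear when it starts with "lin", and the helper names (`use`, `check_par`) are invented.

```python
# Algorithmic context threading for [T-Par]: instead of guessing a
# split Γ = Γ1 ◦ Γ2, hand the whole context to P and pass the
# residual — the part P did not consume — on to Q.

def use(ctx, x):
    """Type variable x, consuming its entry when it is linear."""
    t = ctx[x]
    rest = {k: v for k, v in ctx.items() if k != x} if t.startswith("lin") else ctx
    return t, rest

def check_par(ctx, check_p, check_q):
    """The outgoing context of P | Q is the outgoing context of Q."""
    residual = check_p(ctx)   # P receives the whole incoming context
    return check_q(residual)  # Q receives whatever P left unused

# Example: P uses the linear x, Q uses the unrestricted y.
ctx = {"x": "lin !bool.end", "y": "un ?bool.end"}
out = check_par(ctx,
                lambda c: use(c, "x")[1],
                lambda c: use(c, "y")[1])
# the linear x has been consumed; the unrestricted y remains
```

The same threading applies to every rule with a context split, so the algorithm never backtracks over splits.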

Running mixed sessions on a message passing architecture need not be expensive. Take one of the communication axioms in Figure 2. We set up a broker process that receives the label-polarity pairs of both processes ({li⋆}i∈I and {lj⋆}j∈J), decides on a matching pair (guaranteed to exist for typed processes), and communicates the result back to the two processes. The processes then exchange the appropriate value and proceed. If the broker is an independent process, then we exchange five messages per choice synchronisation. This basic broker is instantiated for the two processes P = lin x(l1?z.P1 + l2!v2.P2 + l3!v3.P3) and Q = lin y(l1!v1.Q1 + l3?w.Q3) in Figure 11a.
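The five-message count of the basic broker can be simulated directly. The sketch below is our own (the set-of-pairs representation and the function names are assumptions, not code from the paper); it matches the P and Q of the example above.

```python
# Basic broker for one choice synchronisation: both processes send
# their label-polarity pairs (2 messages), the broker picks a matching
# pair — guaranteed to exist for typed processes — and replies to both
# sides (2 messages), after which the value flows on the chosen branch
# (1 message): five messages in total.

def dual(pol):
    return "?" if pol == "!" else "!"

def broker(offers_p, offers_q):
    """offers: sets of (label, polarity) pairs; pick a matching pair."""
    for label, pol in sorted(offers_p):
        if (label, dual(pol)) in offers_q:
            return label, pol
    raise RuntimeError("no matching pair: the processes are not typable")

messages = 0
offers_p = {("l1", "?"), ("l2", "!"), ("l3", "!")}   # P's branches
offers_q = {("l1", "!"), ("l3", "?")}                # Q's branches
messages += 2                  # both sides send their offers to the broker
label, pol = broker(offers_p, offers_q)
messages += 2                  # the broker announces its decision
messages += 1                  # the value is exchanged on the chosen branch
# messages == 5
```

Note that both l1 and l3 match here; an actual implementation is free to pick either, mirroring the non-determinism of the reduction rules.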

We can do better by piggybacking the values of the output branches together with the label-polarity pairs. The broker then passes its decision to the input side in the form of a label-polarity-value triple, saving one message, as showcased in Figure 11b.

Fig. 11: Broker is an independent process

(a) P is the broker

(b) Q is the broker

Fig. 12: Broker is P or Q

Finally, we observe that the broker need not be an independent process; it can be located at one of the choice processes. This reduces the number of messages to two in the general case, as described in Figures 12a and 12b, where either P or Q is the broker. Even if the value was already sent by Q in the case where P is the broker, P must still let Q know which choice was taken, so that Q may proceed with the appropriate branch.

In particular cases, however, one message may be enough. Take, for instance, a process P = un x(l1!v1.P′ + l2!v2.P′). Independently of which branch is taken, the process proceeds as P′. Thus, if the broker is located in a process Q, then P need not be informed of the selected choice. The same is true for classical sessions, where selection translates to a choice with a single output branch.

Besides the number of messages exchanged, two further aspects deserve discussion when implementing mixed sessions on a message passing architecture.

The first concerns the type of broker used, and which values a choice reveals to the other party. In the case of the basic broker, only the value of the chosen option is revealed, and never to the broker itself. When we piggyback the values with the second type of broker, however, all values in the choice branches are revealed to the broker, even those that end up unused. This is even more striking when one of the processes is the broker: the other party has access to all the possible values, independently of the choice that is taken.

The second aspect also concerns the values themselves: in order to be offered in a choice, values must be computed a priori, even if the branch carrying them is not taken.

Regarding the privacy of values, we can choose the type of broker depending on how much we are willing to reveal to the other party. To avoid computing values before a branch is chosen, however, one should instead use classical sessions.

# **7 Related Work**

*The origin of choice* Free (completely unrestricted) choice is central to process algebras, including BPA and CCS [3,26]. Here we usually find processes of the form P + Q, where P and Q are arbitrary processes. Free choice is also present in the very first proposal of the π-calculus [30,31], even if Milner later uses guarded choice [28]. Sangiorgi and Walker's book builds on the π-calculus with guarded (mixed) choice [38]. Guarded choices in all preceding proposals operate on possibly distinct channels—x!true.P + y?z.Q—whereas choices in mixed sessions run on a common channel—x(l!true.P + m?y.Q). Kouzapas and Yoshida introduce the notion of mixed session in the context of multiparty session types [24]. Multiparty session types are projected into binary session types, hence the authors also consider mixed choices for binary sessions. Their language is not as concise as the one we present, probably because it is designed to match projection from multiparty types.

Labelled choices were introduced into the theory of session types by Honda et al. [18,19,41], where one finds primitives for value passing—x!true.P and x?y.Q—and, separately, for choice in the form of labelled selection—x ◁ l.P—and branching—x ▷ {li: Pi}i∈I—see Section 5. Coalescing label selection with output and branching with input was proposed by Vasconcelos [44] (and later used by Sangiorgi [37]) as a means to describe concurrent objects. Demangeon and Honda use a similar language to study embeddings of calculi for functions and for session-based communication [9]. All these languages offer only separated (unmixed) choices, and only on the input side.

*Mixed choices in the Singularity operating system* Concrete syntax apart, the language of linear mixed choices is quite similar to that of channel contracts in Sing# [10]. Rather than explicit recursive types, Sing# contracts use named states (akin to typestates [40]), providing for more legible contracts. In Sing#, each state in a contract corresponds to a mixed session lin&{li⋆ Si.Ti}i∈I (contracts are always written from the consumer side), where each li denotes a message tag, ⋆ the message direction (! or ?), Si the type of the value in the message, and Ti the next state.

Stengel and Bultan showed that processes that follow Sing# contracts can engage in communication errors [39]. They further provide a realizability condition for contracts that essentially rules out mixed choices. Bono and Padovani present a calculus and a type system that model Sing# [6,7]. The type system ensures that well-typed processes are exempt from communication errors, but the language of types excludes mixed choices. So it seems that Sing#-like languages only function properly under separated choice, yet our results survive under mixed choices. Contradiction? No! Sing# features an asynchronous (buffered) semantics, whereas mixed sessions run under a synchronous semantics. The operational semantics makes all the difference in this case.

*Synchronicity, asynchronicity, and choice* Pierce and Turner identified the problem: "In an asynchronous language guarded choice should be restricted still further since an asynchronous output in a choice is sensitive to buffering" [36], and Peters et al. state that "a discussion on synchrony versus asynchrony cannot be separated from a discussion on choice" [34,35]. Based on classical sessions, mixed sessions are naturally synchronous. The naive introduction of an asynchronous semantics would ruin the main results of the language (see Section 4). Asynchronous semantics are known to be compatible with classical sessions; see Honda et al. [20,21] for multiparty asynchronous session types, and Fowler et al. [11] and Gay and Vasconcelos [16] for two examples of functional languages with session types and asynchronous semantics. One may thus ask whether a language can be designed where mixed choices are handled synchronously and separated choices asynchronously: a type-guided operational semantics, asynchronous by default, reverting to a synchronous semantics in the presence of mixed choices.

*Separation results* Palamidessi shows that the π-calculus with mixed choice is more expressive than its subset with separated choice [32]. Gorla provides a simpler proof of the same result [17], and Peters and Nestmann analyse the problem from the perspective of breaking initial symmetries in separated-choice processes [33]. Unlike those of the π-calculus with separated choice, the mixed choices studied here operate on the same channel and are guided by types. It would be interesting to look into separation results for classical sessions and mixed sessions: are mixed sessions more expressive than classical sessions under some widely accepted criteria (those of Gorla [17], for example)?

*The origin of mixed sessions* Mixed sessions dawned on us when looking into an algorithm to decide the equivalence of context-free session types [1,42]. The algorithm translates types into (simple) context-free grammars. The decision procedure runs on arbitrary simple grammars: the right-hand sides of grammar productions may start with a label-output or a label-input pair for the same non-terminal symbol on the left of the production. We then decided to explore mixed sessions and picked the simplest possible language for the purpose: the π-calculus. It would be interesting to look into mixed context-free session types, given that decidability of type equivalence is guaranteed.

# **8 Conclusion**

We introduce mixed sessions: session types with mixed choice. Classical session types feature separated choice; in fact all the proposals in the literature we are aware of provide for choice on the input side only, even if we can easily think of choice on the output side. Mixed sessions increase flexibility in programming and are easily realisable in conventional message passing architectures.

Mixed choices come with a type system featuring subtyping. Typability is preserved by reduction, and well-typed programs are exempt from runtime errors. We provide suggestions on how to derive a type checking procedure, even if we do not formalise it. Classical session types are a particular case of mixed sessions: we provide an encoding and show typing and operational correspondences.

We leave open the problem of a typed separation result (or a proof of inseparability) between classical sessions and mixed sessions. An interesting avenue for further development is a hybrid type-guided semantics, asynchronous by default, that reverts to synchronous in the presence of an output choice.

Acknowledgements We thank Simon Gay, Uwe Nestmann, Kirstin Peters, and Peter Thiemann for comments and discussions. This work was supported by FCT through the LASIGE Research Unit, ref. UIDB/00408/2020, and by Cost Action CA15123 EUTypes.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Higher-Order Spreadsheets with Spilled Arrays**

Jack Williams<sup>1</sup> , Nima Joharizadeh<sup>2</sup> , Andrew D. Gordon1,<sup>3</sup> , and Advait Sarkar1,<sup>4</sup>

> <sup>1</sup> Microsoft Research, Cambridge, UK {t-jowil,adg,advait}@microsoft.com <sup>2</sup> University of California, Davis, USA johari@ucdavis.edu <sup>3</sup> University of Edinburgh, Edinburgh, UK <sup>4</sup> University of Cambridge, Cambridge, UK

**Abstract.** We develop a theory for two recently-proposed spreadsheet mechanisms: *gridlets* allow for abstraction and reuse in spreadsheets, and build on *spilled arrays*, where an array value spills out of one cell into nearby cells. We present the first formal calculus of spreadsheets with spilled arrays. Since spilled arrays may collide, the semantics of spilling is an iterative process to determine which arrays spill successfully and which do not. Our first theorem is that this process converges deterministically. To model gridlets, we propose the *grid calculus*, a higher-order extension of our calculus of spilled arrays with primitives to treat spreadsheets as values. We define a semantics of gridlets as formulas in the grid calculus. Our second theorem shows the correctness of a remarkably direct encoding of the Abadi and Cardelli object calculus into the grid calculus. This result is the first rigorous analogy between spreadsheets and objects; it substantiates the intuition that gridlets are an object-oriented counterpart to functional programming extensions to spreadsheets, such as sheet-defined functions.

# **1 Introduction**

Many spreadsheets contain repeated regions that share the same formatting and formulas, perhaps with minor variations. The typical method for generating each variation is to apply the operations *copy-paste-modify*. That is, the user copies the region they intend to repeat, pastes it into a new location, and makes local modifications to the newly pasted region such as altering data values, formatting, or formulas. A common problem associated with *copy-paste-modify* is that updates to a source region will not propagate to a modified copy. A user must modify each copy manually—a process that is tedious and error-prone.

*Gridlets* [12] are a high-level abstraction for re-use in spreadsheets based on the principle of *live copy-paste-modify*: a pasted region of a spreadsheet can be locally modified without severing the link to the source region. Changes to the source region propagate to the copy.

The *central idea of this paper* is that we can implement gridlets using a formula operator G. If a cell a contains the formula

$$\mathbf{G}(r, a_1, F_1, \dots, a_n, F_n)$$

then the behaviour is to copy range r, modify cells a<sub>i</sub> with formulas F<sub>i</sub>, and paste the computed array in cell a, where its elements may be displayed in the cells below and to the right.

Consider the following example:



*[Figure: source sheet and evaluated sheet]*

The table computes and displays a Pythagorean triple, with intermediate calculation spread across many cells. To reuse the table a user creates a gridlet by inserting<sup>5</sup> a G formula in cell A6 as follows.


*[Figure: source sheet and evaluated sheet]*

The formula in A6 is interpreted as: *compute the source range A1:C4 with B2 bound to 7, and B3 bound to 24*. The result of the formula is an array corresponding to the computed range, which then displays in the grid, emulating a *paste* action. A consequence of this design is that this single formula controls the content of a range of cells, below and to the right; we say that it *spills* into these cells.
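The behaviour of G can be sketched as plain copy-modify-evaluate over a dictionary of formulas. The following is a minimal illustration in Python, not the paper's formalism; the helper names (`evaluate`, `gridlet`) and the encoding of formulas as functions are our own assumptions.

```python
# Minimal sketch of the gridlet principle: copy the source cells, rebind
# some of them, and evaluate the copy. A "sheet" maps cell names to
# formulas; a formula is a function from a cell-lookup to a value.

def evaluate(sheet, cell):
    """Dereference a cell: evaluate its formula, or return None (blank)."""
    f = sheet.get(cell)
    return None if f is None else f(lambda a: evaluate(sheet, a))

def gridlet(sheet, source_cells, overrides):
    """Live copy-paste-modify as a formula: evaluate the source region
    with some cells rebound, returning the computed values as a dict."""
    copy = {a: sheet[a] for a in source_cells if a in sheet}
    # The double lambda forces early binding of v inside the comprehension.
    copy.update({a: (lambda v: (lambda get: v))(v) for a, v in overrides.items()})
    return {a: evaluate(copy, a) for a in source_cells}

# Pythagorean-triple table from the running example: B2 and B3 hold the
# legs, B4 the hypotenuse.
sheet = {
    "B2": lambda get: 3,
    "B3": lambda get: 4,
    "B4": lambda get: (get("B2") ** 2 + get("B3") ** 2) ** 0.5,
}
# Analogue of G(B2:B4, B2, 7, B3, 24): reuse the table with legs 7 and 24.
result = gridlet(sheet, ["B2", "B3", "B4"], {"B2": 7, "B3": 24})
```

Because the copy retains the source formula for B4, a later edit to the source formula would propagate to every gridlet that reuses it, which is the point of *live* copy-paste-modify.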

Our overall goal is to explain the semantics of the gridlet operator G using array spilling. Spilling is not new in spreadsheets: both Microsoft Excel and Google Sheets allow a cell to contain a formula that computes an array, and whose computed value then spills into vacant cells below and to the right. While there is a practical precedent for spilling in spreadsheets, there is no corresponding formal precedent from which to derive a semantics for G. This paper therefore proceeds in two parts.

<sup>5</sup> The user may enter this formula either directly, or indirectly via some grid-based interface [12]; details of the user experience are beyond the scope of this paper.

First, we make sense of array spilling and its subtleties. Two formulas spilling into the same cell, or colliding, is one problem. Another problem is a formula spilling into an area on which it depends, triggering a *spill cycle*. Both problems make preserving determinism and acyclicity of spreadsheet evaluation a challenge. We give a semantics of spilling that exploits iteration to determine which arrays spill successfully, and which do not. Our solution ensures that there is at most one array that spills into any address, and that the iteration converges.

Second, we develop three new spreadsheet primitives that implement G when paired with spilled arrays. We present a higher-order spreadsheet calculus, the *grid calculus*, that admits sheets as first-class values and provides operations that manipulate sheet-values. Previous work has drawn connections between spreadsheets and object-oriented programming [5,8,9,15,17], but we give the first direct correspondence by showing that the Abadi and Cardelli object calculus [1] can be embedded in the grid calculus. Our translation constitutes a precise analogy between objects and sheets, and between methods and cells.

In our semantics for gridlets, we make three distinct technical contributions:

- the *spill calculus*, the first formal calculus of spreadsheets with spilled arrays, together with a convergence theorem for spill iteration (Theorem 1);
- the *grid calculus*, a higher-order spreadsheet calculus with sheets as first-class values, in which gridlets are definable as formulas;
- a direct encoding of the Abadi and Cardelli object calculus into the grid calculus, with a correctness theorem.

# **2 Challenges of Spilling**

In this section we describe the challenges of implementing spilled arrays. We describe core design principles for spreadsheet implementations and then illustrate how spilled arrays challenge these principles.

#### **2.1 Design Principles for Spreadsheet Evaluation**

Spreadsheet implementations rely on the following two properties to be predictable and efficient.

- *Determinism:* evaluating the same sheet always produces the same grid, so recalculation never surprises the user.
- *Acyclicity:* no cell's value may depend on the cell itself; circular dependencies are reported as errors rather than evaluated.

Both properties are satisfied by standard spreadsheet implementations, if we exclude a few nondeterministic worksheet functions such as RAND. Throughout this work we consider only deterministic worksheet functions. Given this assumption, spreadsheet formulas constitute a purely functional language, and so evaluation is deterministic. Cell evaluation tracks a *calculating* state for every cell and raises a circularity violation for any cell that depends on its own value.

Spilled arrays pose a challenge for preserving determinism and acyclicity which we illustrate with examples. For the remainder of our technical developments we drop the leading = from formulas. We begin with core terminology.

**Arrays** Spreadsheet arrays are finite, non-empty, two-dimensional matrices that use *one-based* indexing. We denote an (m, n) array literal as

$$\{V_{1,1}, \dots, V_{1,n}; \; \dots; \; V_{m,1}, \dots, V_{m,n}\}$$

where (,) delimits the n columns and (;) delimits the m rows. We use V to range over values, which are described in Section 3.


A cell whose formula evaluates to an array is a *spill root*. The root's array occupies a rectangular *spill area* anchored at the root: the root *(i, j)-spills* into the cell offset from it by (i, j), and each cell in the spill area is a *spill target*. Consider the following example:


*[Figure: source sheet and evaluated sheet]*

Address A1 evaluates to a (1, 2) array and is a spill root with spill area {A1, B1}. Address A1 (1, 1)-spills into A1, and (1, 2)-spills into B1.

# **2.2 Spill Collisions**

Spill collisions can be *static* or *dynamic*, and may interfere with determinism.

*Static Collision* Every cell in a spill area should be *blank* except for the spill root; a blank cell has no formula. A static collision occurs when a spill root spills into another non-blank cell, and we say the non-blank cell is an *obstruction*. The choice to read the value from the obstruction or the spilled value violates determinism. We adopt a simple mechanism used by Excel and Sheets to resolve static spill collisions: the root evaluates to an error value, not an array, and spills nowhere. The ambiguity between reading the obstructing cell's value and the root's spilled value is resolved by preventing the root from spilling—we always read the value from the obstructing cell. Consider the following example:


*[Figure: source sheet and evaluated sheet]*

The address B1 obstructs spill root A1 and consequently address A1 evaluates to an error value, address B1 evaluates to 40, and address B2 evaluates to 42.
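The static-collision check can be made concrete with a small sketch, assuming a representation where addresses are (row, col) pairs and a sheet is a dict of non-blank cells; the helper names are ours, not the paper's.

```python
# Sketch of static-collision handling: a root may spill only when every
# other cell of its spill area is blank (bound to no formula); otherwise
# the root evaluates to an error and spills nowhere, as in Excel/Sheets.

def spill_area(root, rows, cols):
    """Cells covered by an array of size (rows, cols) anchored at root."""
    r0, c0 = root
    return {(r0 + i, c0 + j) for i in range(rows) for j in range(cols)}

def static_collision(sheet, root, rows, cols):
    """True iff a non-blank cell obstructs the root's spill area."""
    return any(cell != root and cell in sheet
               for cell in spill_area(root, rows, cols))

# A1 holds a (1, 2) array but B1 holds the formula 40: B1 obstructs A1,
# so A1 would evaluate to an error value instead of spilling.
sheet = {(1, 1): "={41, 42}", (1, 2): "=40"}
obstructed = static_collision(sheet, (1, 1), 1, 2)
```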

*Dynamic Collisions* A dynamic collision occurs when a blank cell is a spill target for two distinct spill roots. Dynamic collisions can be resolved in different ways.


Consider the following example that uses the single-spill approach:


*[Figure: source sheet, and the two possible evaluated sheets: one where root A2 wins and one where root B1 wins]*

Addresses A2 and B1 are spill roots: the former evaluates to an array of size (1, 2) while the latter evaluates to an array of size (2, 1). The value of address A1 depends on which of the colliding spill roots A2 and B1 is permitted to spill. Arbitrarily selecting which root is permitted to spill violates deterministic evaluation. Sheets and Excel resolve collisions using an ordering that prefers newer formulas. While consecutive evaluations of the same spreadsheet will produce the same result, two syntactically identical spreadsheets constructed in different ways can produce different results. In Section 4 we give a deterministic semantics for spilling that uses a total ordering on addresses to select a single root from a set of colliding roots.

#### **2.3 Spill Cycles**

A *spill cycle* occurs when the value of a spill root depends on an address in its spill area. Spill cycles violate acyclicity and subtly differ from cell cycles. A cell cycle occurs when the value of a formula in a cell depends on the value of the cell itself. We know that it is never legal for a cell to read its own value and therefore it is possible to eagerly detect cell cycles during evaluation of a cell. In contrast, a spill cycle only occurs if the cell evaluates to an array that is spilled into a range the cell depends on, so it is not possible to detect the cycle until the cell has been evaluated.

We can thus proactively detect cell cycles, but only retroactively detect spill cycles. To see why, let us consider the following example, wherein we assume a conditional operator IF that is lazy in its second and third arguments, a function INC that maps over an array, incrementing every number and converting the blank value - to 0, where - is the value read from a blank cell, and, for concreteness, a sheet such as [A1 → 42, B1 → IF(A1 = 42, SUM(B2:B3), INC(B2:B3))].

The evaluation of address B1 returns the sum of the range B2:B3. While the value of B1 depends on the values in the range B2:B3, the sum returns a scalar and therefore no spilling is required.

Consider the case where the value in A1 is changed to 43. The address B1 will evaluate the formula INC(B2:B3), first by dereferencing the range B2:B3 to yield {-; -}, and then by applying INC to yield {0; 0}. The array {0; 0} will attempt to spill into the range B1:B2, which overlaps the range B2:B3 just read by the formula. The attempt to spill induces a spill cycle; there is no consistent value that can be assigned to the addresses B1, B2, and B3.

In Section 4 we give a semantics for spilling that uses dynamic dependency tracking to ensure that no spill root depends on its own spill area.
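The retroactive check can be sketched in a few lines, assuming each evaluated root carries the dependency set recorded during its evaluation; the function name is ours, not the paper's.

```python
# Sketch of retroactive spill-cycle detection: after a root's formula has
# been evaluated, the root may spill only if its spill area is disjoint
# from the dependency set recorded during evaluation.

def spill_cycle(area, deps):
    """True iff the root's array would spill onto an address it read from."""
    return not area.isdisjoint(deps)

# B1's formula INC(B2:B3) reads B2 and B3 and yields a (2, 1) array whose
# spill area is {B1, B2}: the overlap at B2 is a spill cycle.
would_cycle = spill_cycle({"B1", "B2"}, {"B2", "B3"})
```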

# **3 Core Calculus for Spreadsheets**

In this section we present a core calculus for spreadsheets that serves as the foundation of our technical developments.

#### **3.1 Syntax**

Figure 1 presents the syntax of the core calculus. Let a and b range over A1-style addresses, written Nm, composed from a column name N and row index m. A column name is a base-26 numeral written using the symbols A..Z. A row index is a decimal numeral written as usual. Let m and n range over positive natural numbers which we typically use to denote row or array indices. We assume a locale in which rows are numbered from top to bottom, and columns from left to right, so that A1 is the top-left cell of the sheet. We use the terms *address* and *cell* interchangeably. Let r range over *ranges* that are pairs of addresses that denote a rectangular region of a grid. Modern spreadsheet systems do not restrict which corners of a rectangle are denoted by a range but will automatically normalise the range to represent the top-left and bottom-right corners. We implicitly assume that all ranges are written in the normalised form such that range B1:A2 does not occur; instead, the range is denoted A1:B2.

**Fig. 1.** Syntax for Core Calculus

A value V is either the blank value -, a constant c, an error ERR, or a two-dimensional array {V<sub>i,j</sub> <sup>i∈1..m, j∈1..n</sup>}. We write {V<sub>i,j</sub> <sup>i∈1..m, j∈1..n</sup>} as short for the array literal {V<sub>1,1</sub>,...,V<sub>1,n</sub>; ... ; V<sub>m,1</sub>,...,V<sub>m,n</sub>}.

Let F range over formulas. A formula is either a value V , a range r, or a function application f(F1,...,Fn), where f ranges over names of pre-defined worksheet functions such as SUM or PRODUCT.

Let S range over sheets, where a sheet is a partial function from addresses to formulas that has finite domain. We write [] to denote the empty map, and we write S[a → F] to denote the extension of S to map address a to formula F, potentially shadowing an existing mapping. We do not model the maximum numbers of rows or columns imposed by some implementations. Each finite S represents an unbounded sheet that is almost everywhere blank: we say a cell a is blank to mean that a is not in the domain of S.

Let γ range over grids, where a grid is a partial function from addresses to values that has finite domain. A grid can be viewed as a function that assigns values to addresses, obtained by evaluating a sheet.

#### **3.2 Operational Semantics**

Figure 2 presents the operational semantics of the core calculus. Auxiliary definitions are present at the top of Figure 2.

```
size(N1m1:N2m2) = (m2 − m1 + 1, N2 − N1 + 1)
Nm + (i, j) = (N + j − 1)(m + i − 1)

Formula evaluation: S ⊢ F ⇓ V

               S ⊢ Fi ⇓ Vi (∀i ∈ 1..n)   ⟦f⟧(V1,...,Vn) = V        S ⊢ a ! V
  S ⊢ V ⇓ V    ───────────────────────────────────────────        ───────────
               S ⊢ f(F1,...,Fn) ⇓ V                                S ⊢ a:a ⇓ V

  a1 ≠ a2   size(a1:a2) = (m, n)   ∀i ∈ 1..m, j ∈ 1..n. S ⊢ a1 + (i, j) ! Vi,j
  ────────────────────────────────────────────────────────────────────────────
  S ⊢ a1:a2 ⇓ {Vi,j i∈1..m,j∈1..n}

Address dereferencing: S ⊢ a ! V

  S(a) = F   S ⊢ F ⇓ V        a ∉ dom(S)
  ─────────────────────       ───────────
  S ⊢ a ! V                   S ⊢ a ! -

Sheet evaluation: S ⇓ γ

  S ⇓ γ  ≝  ∀a ∈ dom(S). S ⊢ a ! γ(a)
```

**Fig. 2.** Semantics for Core Calculus

*Formula Evaluation* The relation S ⊢ F ⇓ V means that in sheet S, formula F evaluates to value V. A value V evaluates to itself. A function application f(F1,...,Fn) evaluates to V if the result of applying ⟦f⟧ to the evaluated arguments is V, where ⟦f⟧, the underlying semantics of f, is a total function on values. A single cell range a:a evaluates to V if address a dereferences to V. A multiple cell range a1:a2 evaluates to an array of the same dimensions, where each value in the array is obtained by dereferencing the corresponding single cell within the range. We write size(a1:a2) to denote the operation that returns the dimensions of a range, written (m, n), where m is the number of rows, and n is the number of columns. We write a + (i, j) to denote the address offset below and to the right of a by i−1 rows and j−1 columns. For example, a + (1, 1) maps to a, and a + (1, 2) maps to the address immediately to the right of a. Both size(a1:a2) and a + (i, j) are defined in Figure 2.
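The two auxiliary operations can be made concrete. The sketch below implements size(a1:a2) and a + (i, j) for A1-style addresses, assuming bijective base-26 column names (A=1, ..., Z=26, AA=27); the helper names are ours.

```python
# Sketch of size(a1:a2) = (m2 − m1 + 1, N2 − N1 + 1) and
# Nm + (i, j) = (N + j − 1)(m + i − 1) for A1-style addresses.
import re

def col_to_int(name):
    """Bijective base-26 column name to integer: A=1, ..., Z=26, AA=27."""
    n = 0
    for ch in name:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def int_to_col(n):
    """Inverse of col_to_int."""
    s = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        s = chr(ord("A") + r) + s
    return s

def parse(addr):
    """Split an address Nm into (column number, row index)."""
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", addr)
    return col_to_int(m.group(1)), int(m.group(2))

def size(a1, a2):
    """Dimensions (rows, cols) of the normalised range a1:a2."""
    (n1, m1), (n2, m2) = parse(a1), parse(a2)
    return (m2 - m1 + 1, n2 - n1 + 1)

def offset(a, i, j):
    """a + (i, j): the cell i−1 rows below and j−1 columns right of a."""
    n, m = parse(a)
    return int_to_col(n + j - 1) + str(m + i - 1)
```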

*Address Dereferencing* The relation S ⊢ a ! V means that in sheet S, address a dereferences to V. If address a maps to formula F in sheet S, then dereferencing a returns V when F evaluates to V. If address a is not in the domain of S then dereferencing a returns the blank value -. We make range evaluation and address dereferencing distinct relations to aid our presentation in Section 4.

*Sheet Evaluation* The relation S ⇓ γ means that sheet S evaluates to grid γ; the relation is defined by point-wise dereferencing of every address in the sheet. Recall the spreadsheet design principles of determinism and acyclicity from Section 2.1. The relations of our semantics are partial functions (as stated in Appendix A of the extended version [21]). As for acyclicity, if there is a cycle where S(a) = F and evaluation of formula F must dereference cell a, then we cannot derive S ⊢ F ⇓ V for any V. Although our calculus could be modified to model a detection mechanism for cell cycles, we omit any such mechanism for the sake of simplicity.

```
Formula          F ::= ··· | a#                          (postfix operator)
Dependency set   D ::= {a1, ..., an}
Grid             γ ::= [ai → (Vi#, Vi!, Di) i∈1..n]      (ai distinct)
Spill permit     p ::= - | ×
Spill oracle     ω ::= [ai → (mi, ni, pi) i∈1..n]        (ai distinct)
```
**Fig. 3.** Syntax for Spill Calculus (Extends and modifies Figure 1)

# **4 Spill Calculus: Core Calculus with Spilled Arrays**

The spill calculus, presented in this section, is the first formalism to explain the semantics of arrays that spill out of cells in spreadsheets. The spill calculus and its convergence, Theorem 1, is our first main technical contribution.

#### **4.1 Syntax**

Figure 3 presents the extensions and modifications to the syntax of Figure 1; we omit syntax classes that remain unchanged.

Let F range over formulas, extended to include the postfix root operator a#. The root operator a# evaluates to an array if address a is a *spill root*. Accessing an array via the root operator instead of a fixed-size range is more robust to future edits. For example, consider the sheet [A1 → F, B1 → SUM(A1:A10)] where formula F evaluates to a (10, 1) array. If the user modifies F such that the formula evaluates to an array of size (11, 1) then the summation in B1 still applies only to the first ten elements that spill from A1, even if the user intends to sum the whole array. The root operator allows a more robust formulation: [A1 → F, B1 → SUM(A1#)]. The summation in B1 applies to the entire array that spills from A1, regardless of its size. Section 4.3 shows the full semantics of the root operator.

Let D range over dependency sets, which denote a set of addresses that a formula bound to an address depends on.

Let γ range over grids, which now map addresses to tuples of the form (V<sup>#</sup>, V<sup>!</sup>, D). If γ(a) = (V<sup>#</sup>, V<sup>!</sup>, D) then V<sup>#</sup> is the pre-spill value obtained by applying the root operator # to a, while V<sup>!</sup> is the post-spill value obtained by evaluating a, and D is the dependency set required to dereference a. Each dereferenced address has both a pre-spill and a post-spill value, even if the cell content does not spill. If the pre-spill value is not an array, it cannot spill, and the post-spill value equals the pre-spill value.

Let p range over spill permits, where - denotes that a root is permitted to spill and × denotes that it is not.

Let ω range over spill oracles, which map addresses to tuples of the form (m, n, p). A spill oracle governs how arrays spill in a sheet.

- If p = - then the contents of a can spill with no obstruction.
- If p = × then a cannot spill because either a formula obstructs the spill area, or another spill root will spill into the area.

Oracles track the size of each spilled array so we can find the spill root a of any spill target, and hence obtain the value for a spill target by dereferencing a.

#### **4.2 Spill Oracles and Iteration**

As discussed in Section 2.2, spill collisions have the potential to introduce nondeterminism if not handled appropriately. Our solution is to evaluate a sheet in a series of rounds, each determined by a *spill oracle*. Given a sheet, a grid is induced by evaluating the sheet and using the oracle to deterministically predict how each root spills. We then compare the oracle against the induced grid: a discrepancy could be a new spill root the oracle missed, or an existing spill root with dimensions differing from those the oracle predicted. If any discrepancies are found we compute a new oracle, and start a new round. Iteration halts when the oracle is *consistent* with the induced grid. The notion of a consistent oracle is defined in Section 4.4. We can view the iteration as a sequence of n oracles where only the final oracle is consistent:

$$[] = \omega_1 \longrightarrow \omega_2 \longrightarrow \cdots \longrightarrow \omega_n \text{ and } \omega_n \text{ is consistent}$$

Consider the example in Figure 4, where the sheet is S = [A1 → {7; 8}, B1 → IF(A2 = 8, {9; 10}, {100})]. At the top we show the bindings of the sheet; at the bottom we show the oracle and induced grid for each round of spilling.

We define the initial spill oracle as ω<sub>1</sub> = []; the empty oracle anticipates no spill roots and therefore no roots are permitted to spill in the first round. The array in A1 remains collapsed and B1 evaluates using the false branch. Once the sheet has been fully evaluated we determine that ω<sub>1</sub> was not a consistent prediction because there is an array in A1 with no corresponding entry in ω<sub>1</sub>. We compute a new oracle that determines that A1 is allowed to spill because the area is blank. We define the new oracle as ω<sub>2</sub> = [A1 → (2, 1, -)].

In the second round the root A1 is permitted to spill by the oracle and as a consequence B1 now evaluates to the array {9; 10}—this array is not anticipated by the oracle and remains collapsed. Once the sheet has been fully evaluated we determine that ω<sub>2</sub> was not a consistent prediction because there is an array in B1 with no corresponding entry in ω<sub>2</sub>. We compute a new oracle that determines that B1 is allowed to spill because the area is blank in the grid induced by ω<sub>2</sub>. We define the third oracle as ω<sub>3</sub> = [A1 → (2, 1, -), B1 → (2, 1, -)].

In the third and final round the root A1 is permitted to spill by the oracle and B1 evaluates to the array {9; 10}. This time the oracle anticipates the root in B1 and permits the array to spill. Once the sheet has been fully evaluated we determine that ω<sub>3</sub> is a consistent prediction because the spill roots A1 and B1 are contained in the oracle. The iteration is the sequence of three oracles:

> [] −→ [A1 → (2, 1, -)] −→ [A1 → (2, 1, -), B1 → (2, 1, -)]
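The iteration described above can be sketched as a small fixed-point loop. This is our own framing, not the paper's definitions: `evaluate`, `consistent`, and `refine` are assumed interfaces standing in for the relations S, ω ⇓ γ, γ |= ω, and refine(S, ω, γ) of Sections 4.3 and 4.4, and the stubs below abstract the grid to its set of spill roots.

```python
# Schematic sketch of spill iteration: evaluate under the current oracle,
# stop when the induced grid is consistent, otherwise refine and repeat.

def spill_fixpoint(sheet, evaluate, consistent, refine):
    oracle = {}                       # omega_1 = []: anticipate no roots
    while True:
        grid = evaluate(sheet, oracle)
        if consistent(grid, oracle):  # omega_n is consistent: halt
            return oracle, grid
        oracle = refine(sheet, oracle, grid)

# Stub instances mimicking the Figure 4 trace: A1 is always a root, and B1
# becomes a root only once the oracle lets A1 spill.
def evaluate(sheet, oracle):
    roots = {"A1"}
    if "A1" in oracle:
        roots.add("B1")
    return roots

def consistent(grid, oracle):
    return grid == set(oracle)

def refine(sheet, oracle, grid):
    return {a: (2, 1, True) for a in grid}

final_oracle, final_grid = spill_fixpoint({}, evaluate, consistent, refine)
```

With these stubs the loop takes three rounds, mirroring the oracle sequence [] −→ [A1 → ...] −→ [A1 → ..., B1 → ...].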

*Spill Rejection* Spill oracles explicitly track the anticipated size of the array to ensure that spill rejections based on incorrect dimensions can be corrected. Consider the following example:


After the first round using an empty spill oracle there are three spill roots: A3 = {1, 2, 3}, B1 = {10; 20; 30}, and C1 = {1; 2}. There is sufficient space to spill C1 but only space to spill one of A3 and B1; the decision is resolved using the total ordering on addresses. Suppose that we allow A3 to spill such that the new oracle is: [A3 → (1, 3, -), B1 → (3, 1, ×), C1 → (2, 1, -)].

After the second round we find that address B1 returns an array of a smaller size because the root C1 spills into C2. Previously we thought B1 was too big to spill but with the new oracle we find there is now sufficient room; by explicitly recording the anticipated size it is possible to identify cases that require further refinement. We compute the new oracle [A3 → (1, 3, -), B1 → (2, 1, -), C1 → (2, 1, -)], which is consistent.

An interesting limitation arises if the total ordering places B1 before A3, which we discuss in Section 4.6.

#### **4.3 Operational Semantics**

Figure 5 presents the operational semantics for the spill calculus. The key additions to the relations for formula evaluation and address dereferencing are an oracle ω that is part of the context, and a dependency set D that is part of the output. We discuss each relation in turn and focus on the extensions and modifications from Figure 2. Auxiliary definitions are present at the top of Figure 5.

```
owners(ω, a) = {(ar, i, j) | ω(ar) = (m, n, -) and ar + (i, j) = a and (i, j) ≤ (m, n)}
area(a, m, n) = {a + (i, j) | i ∈ 1..m, j ∈ 1..n}
size(V) = (m, n) if V = {Vi,j i∈1..m,j∈1..n}, and ⊥ otherwise

Formula evaluation: S, ω ⊢ F ⇓ V, D

                      S, ω ⊢ Fi ⇓ Vi, Di (∀i ∈ 1..n)   ⟦f⟧(V1,...,Vn) = V
  S, ω ⊢ V ⇓ V, ∅     ───────────────────────────────────────────────────
                      S, ω ⊢ f(F1,...,Fn) ⇓ V, D1 ∪ ··· ∪ Dn

  S, ω ⊢ a ! V#, V!, D          S, ω ⊢ a ! V#, V!, D
  ─────────────────────────     ─────────────────────────
  S, ω ⊢ a# ⇓ V#, D ∪ {a}       S, ω ⊢ a:a ⇓ V!, D ∪ {a}

  a1 ≠ a2   size(a1:a2) = (m, n)
  ∀i ∈ 1..m, j ∈ 1..n. S, ω ⊢ a1 + (i, j) ! V#i,j, V!i,j, Di,j
  ─────────────────────────────────────────────────────────────────
  S, ω ⊢ a1:a2 ⇓ {V!i,j i∈1..m,j∈1..n}, ⋃i,j (Di,j ∪ {a1 + (i, j)})

Address dereferencing: S, ω ⊢ a ! V#, V!, D

  owners(ω, a) = ∅   a ∉ dom(ω)   S(a) = F   S, ω ⊢ F ⇓ V, D
  ───────────────────────────────────────────────────────────  (1)
  S, ω ⊢ a ! V, V, D

  owners(ω, a) = ∅   a ∉ dom(ω)   a ∉ dom(S)
  ──────────────────────────────────────────  (2)
  S, ω ⊢ a ! -, -, ∅

  owners(ω, a) = ∅   ω(a) = (m, n, ×)   S(a) = F   S, ω ⊢ F ⇓ V, D
  ────────────────────────────────────────────────────────────────  (3)
  S, ω ⊢ a ! V, ERR, D

  (ar, i, j) ∈ owners(ω, a)   ω(ar) = (m, n, -)   S(ar) = F
  S, ω\ar ⊢ F ⇓ V, D   size(V) = (m, n)   area(ar, m, n) ∩ D = ∅
  ───────────────────────────────────────────────────────────────  (4)
  S, ω ⊢ a ! (a = ar ? V : -), Vi,j, D

  (ar, i, j) ∈ owners(ω, a)   ω(ar) = (m, n, -)   S(ar) = F
  S, ω\ar ⊢ F ⇓ V, D   size(V) ≠ (m, n)
  ───────────────────────────────────────────────────────────────  (5)
  S, ω ⊢ a ! (a = ar ? V : -), (a = ar ? V : -), (a = ar ? D : ∅)

Sheet evaluation: S, ω ⇓ γ

  S, ω ⇓ γ  ≝  ∀a ∈ dom(S). S, ω ⊢ a ! γ(a)
```

**Fig. 5.** Semantics for Spill Calculus (extends and modifies Figure 2)

*Formula Evaluation:* S, ω ⊢ F ⇓ V, D The spill oracle ω is not inspected by the relation but is threaded through the definition. Dependency set D denotes the transitive dependencies required to evaluate F. Evaluating a value or function application is as before, except we additionally compute the dependencies of the formula. The dependency set required to evaluate a value is ∅. The dependency set required to evaluate a function application is the union of the dependencies of the arguments. Evaluating a root operation a# dereferences a and returns the pre-spill value V<sup>#</sup>. The dependency set required to evaluate a root operation a# is the dependency set required to dereference a and the address a itself. Evaluating a single cell range a:a dereferences a and returns the post-spill value V<sup>!</sup>. The dependency set required to evaluate a single cell range a:a is the dependency set required to dereference a and the address a itself. Evaluating a multiple cell range a1:a2 returns an array of the same dimensions, where each value in the array is obtained by dereferencing the corresponding single cell and extracting the post-spill value. The dependency set required to evaluate a multiple cell range is the dependency set required to dereference every address in the range, together with the addresses of the range itself.

*Address dereferencing* The relation S, ω ⊢ a ! V<sup>#</sup>, V<sup>!</sup>, D means that in sheet S with oracle ω, address a dereferences to pre-spill value V<sup>#</sup> and post-spill value V<sup>!</sup>, and depends upon the addresses in D. Five rules govern address dereferencing, based on the spill oracle ω and the *owners* set owners(ω, a).

The set owners(ω, a) is key to the operational semantics and denotes the set of owners for address a. If a tuple (a<sub>r</sub>, i, j) is in the set owners(ω, a), we say a<sub>r</sub> *owns* a, meaning that a<sub>r</sub> is a spill root that we expect to spill into address a, and that a is offset from a<sub>r</sub> by i−1 rows and j−1 columns. Hence, to dereference a we must first compute the root a<sub>r</sub> and extract the (i, j)th spilled value from the root array. Our definition allows an address to own itself, denoted (a, 1, 1) ∈ owners(ω, a), and does not preclude an address having multiple owners, violating the *single-spill policy*. We enforce the single-spill policy in our technical results using an additional well-formedness condition on oracles, defined in Section 4.5.

Rule (1) applies when the address has no owner, the address is not a spill root, and the address has a formula binding in S. The pre-spill and post-spill values are the value obtained by evaluating the bound formula.

Rule (2) applies when the address has no owner, the address is not a spill root, and the address has no formula binding in S. The pre-spill and post-spill values are the blank value and the dependency set is empty. Rules (1) and (2) correspond to the address dereferencing behaviour described in the core calculus (Section 3) which is lifted to the new relation.

Rule (3) applies when the address is a spill root and the root is not permitted to spill. The pre-spill value is the value obtained by evaluating the bound formula; the post-spill value is an error value. If the address has no bound formula then the relation is undefined.

Rules (4) and (5) apply when an address with an *owner* is dereferenced. The owner a<sub>r</sub> is omitted from the spill oracle before evaluating the associated formula, denoted by S, ω\a<sub>r</sub> ⊢ F ⇓ V, D. This prevents cycles when the oracle incorrectly expects the root to spill, but the root does not, and instead depends on the expected spill area. For example, suppose B1 = SUM(B2:B3) and ω = [B1 → (3, 1, -)]. The address B1 owns B2 according to ω, therefore dereferencing address B2 requires dereferencing B1, which in turn depends on B2. If we did not remove B1 from ω when evaluating the formula bound to B1 we would create a cycle. We remove B1 from ω so that when formula SUM(B2:B3) dereferences B2 a blank value is returned. Genuine spill cycles are detected post-dereferencing using the dependency set.

Rule (4) applies when the address has an owner and the formula bound to the owner evaluates to an array of the expected size according to ω. This rule is only defined when the intersection of the spill root's dependencies and its spill area is empty, preventing spill cycles. The pre-spill value is obtained using the conditional operator a = a<sub>r</sub> ? V : -. When the dereferenced cell is the root then the value is the root array, otherwise the value is blank. The post-spill value is obtained by indexing into the root array at the (i, j)th position.

Rule (5) applies when the address has an owner and the formula bound to the owner *does not* evaluate to an array of the expected size according to ω. In this case there is no attempt to spill as the oracle is incorrect. When the dereferenced address is the root then the pre-spill and post-spill values are obtained from the formula, otherwise the pre-spill and post-spill values are blank.

*Sheet evaluation:* S, ω ⇓ γ Sheet evaluation in the spill calculus accepts a spill oracle, but is otherwise unchanged from sheet evaluation in the core calculus. The computed grid only contains the value of addresses with a bound formula, and does not include the value of any blank cells that are in a spill area. In contrast, a spreadsheet application would display the value for all addresses, including those within a spill area. Obtaining this view can be done by dereferencing every address in the viewport using the sheet and oracle.

#### **4.4 Oracle Refinement**

We have shown how to compute a grid given a sheet and oracle, but we have not considered the accuracy of the predictions provided by the oracle. In Section 4.2 we informally describe an iterative process to refine an oracle from a computed grid; in this section we give the precise semantics of oracle refinement. Figure 6 presents the full definition of oracle refinement.

*Consistency* The relation γ |= ω states that grid γ is consistent with oracle ω. A grid is consistent if every address is consistent, written γ |=<sub>a</sub> ω. An address a is consistent in γ and ω if, and only if, the grid and oracle agree on the size of the value at address a. Consistency tells us that the oracle has correctly predicted the location and size of every spill root in the grid, and has not predicted any spurious roots.

*Refinement* The function refine(S, ω, γ) takes an inconsistent oracle and returns a new oracle refined using the computed grid. The function is defined as follows. First, keep the subset $\omega_{\text{ok}}$ of ω that is consistent with γ. Second, collect the remaining unresolved spill roots in γ, denoted $\gamma_r$. Finally, recursively select the smallest address in $\gamma_r$ according to a total order on addresses, determining whether the root is permitted to spill and adding the permit to the accumulating

$$\begin{array}{l} \gamma \vDash_a \omega \ \stackrel{\mathsf{def}}{=}\ \forall m, n, p.\ (\omega(a) = (m, n, p)) \leftrightarrow \exists V^{\#}, V^{!}, \mathcal{D}.\ (\gamma(a) = (V^{\#}, V^{!}, \mathcal{D}) \land \mathbf{size}(V^{\#}) = (m, n)) \\ \gamma \vDash \omega \ \stackrel{\mathsf{def}}{=}\ \forall a.\ \gamma \vDash_a \omega \end{array}$$

$$\mathsf{refine}(\mathcal{S}, \omega, \gamma) = \mathsf{decide}(\mathcal{S}, \omega_{\text{ok}}, \gamma_r) \text{ where}$$

$$\begin{array}{l} \omega_{\text{ok}} = \{ a \mapsto (m, n, p) \in \omega \mid \gamma \vDash_a \omega \} \\ \gamma_r = \{ a \mapsto (V^{\#}, V^{!}, \mathcal{D}) \in \gamma \mid \exists m, n.\ \mathbf{size}(V^{\#}) = (m, n) \text{ and } a \notin \mathrm{dom}(\omega_{\text{ok}}) \} \end{array}$$

$$\begin{array}{l}
\mathsf{decide}(\mathcal{S}, \omega, []) = \omega \\
\mathsf{decide}(\mathcal{S}, \omega, \gamma[a \mapsto (V^{\#}, V^{!}, \mathcal{D})]) = \mathsf{decide}(\mathcal{S}, \omega[a \mapsto (m, n, p)], \gamma) \\
\quad \text{where } a \text{ is the least element in } \mathrm{dom}(\gamma) \text{ and } \mathbf{size}(V^{\#}) = (m, n) \\
\quad p = \begin{cases} \checkmark & \text{if } \forall a_t \in \mathbf{area}(a, m, n).\ a \neq a_t \Rightarrow a_t \notin \mathrm{dom}(\mathcal{S}) \text{ and } \mathbf{owners}(\omega, a_t) = \emptyset \\ \times & \text{otherwise} \end{cases}
\end{array}$$

*Spill iteration:* $\omega \longrightarrow_{\mathcal{S}} \omega'$ *Final oracle:* $\mathcal{S} \vdash \omega \text{ final}$

$$\frac{\mathcal{S}, \omega \Downarrow \gamma \quad \gamma \not\vDash \omega \quad \mathsf{refine}(\mathcal{S}, \omega, \gamma) = \omega'}{\omega \longrightarrow_{\mathcal{S}} \omega'} \qquad \frac{\mathcal{S}, \omega \Downarrow \gamma \quad \gamma \vDash \omega}{\mathcal{S} \vdash \omega \text{ final}}$$

Final sheet evaluation: S ⇓ γ

$$\mathcal{S} \Downarrow \gamma \ \stackrel{\mathsf{def}}{=}\ [] \longrightarrow^{*}_{\mathcal{S}} \omega \text{ and } \mathcal{S} \vdash \omega \text{ final and } \mathcal{S}, \omega \Downarrow \gamma$$

oracle. A root is permitted to spill if the potential spill area is blank (excluding the root itself) and each address in the spill area has no owner, thereby preserving the single-spill policy.
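Under simplifying assumptions (addresses as (column, row) pairs, oracles and sheets as dicts, and string permits in place of ✓/×), the decide step can be sketched in Python as follows; the helper names mirror the paper's notation informally.

```python
def area(a, m, n):
    """Cells covered by an m x n spill area rooted at address a."""
    c, r = a
    return {(c + j, r + i) for i in range(m) for j in range(n)}

def owners(oracle, a):
    """Roots whose predicted spill area covers address a."""
    return {root for root, (m, n, _p) in oracle.items()
            if a in area(root, m, n)}

def decide(sheet, oracle, unresolved):
    """Assign a permit to each unresolved root, in a fixed total order.

    `unresolved` maps each root to the size (m, n) of its array."""
    oracle = dict(oracle)
    for a in sorted(unresolved):
        m, n = unresolved[a]
        spill_area = area(a, m, n) - {a}
        # Permit only if the spill area is blank and has no owner,
        # preserving the single-spill policy.
        ok = all(t not in sheet and not owners(oracle, t)
                 for t in spill_area)
        oracle[a] = (m, n, "permit" if ok else "deny")
    return oracle
```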

*Spill iteration* The relation $\omega \longrightarrow_{\mathcal{S}} \omega'$ denotes a single iteration of oracle refinement. When a computed grid is not consistent with the spill oracle that induced it, written $\gamma \not\vDash \omega$, a new oracle is produced using the function refine(S, ω, γ). We write $\longrightarrow^{*}_{\mathcal{S}}$ for the reflexive, transitive closure of $\longrightarrow_{\mathcal{S}}$.

*Final oracle* The relation $\mathcal{S} \vdash \omega \text{ final}$ states that oracle ω is final for sheet S; it holds when the grid induced by ω is consistent with ω.

*Final sheet evaluation* The relation S ⇓ γ denotes the evaluation of sheet S to grid γ which implicitly refines an oracle to a final state. The process starts with an empty oracle [] and iterates until a final oracle is found.
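The overall refinement loop of Figure 6 is a fixpoint iteration. The Python sketch below is a schematic skeleton only, with `evaluate`, `refine`, and `consistent` passed in as hooks rather than the paper's definitions; the iteration bound guards against the divergent sheets discussed in Section 4.5.

```python
def evaluate_to_fixpoint(sheet, evaluate, refine, consistent, max_iters=100):
    """Iterate oracle refinement until the induced grid is consistent."""
    oracle = {}                       # start with the empty oracle []
    for _ in range(max_iters):
        grid = evaluate(sheet, oracle)
        if consistent(grid, oracle):
            return grid, oracle       # oracle is final for this sheet
        oracle = refine(sheet, oracle, grid)
    raise RuntimeError("spill iteration did not converge")
```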

#### **4.5 Technical Results**

This section presents the main technical result of the spill calculus: that iteration of oracle refinement converges for well-behaved sheets. We begin with preliminary definitions and results.

To avoid ambiguous evaluation every spill area must be disjoint and unobstructed; an oracle is *well-formed* if it predicts non-blank spill roots, and predicts disjoint and unobstructed spill areas, defined below:

**Definition 1 (Well-formed oracle).** *We write* $\mathcal{S} \vdash \omega \text{ wf}$ *if oracle* ω *is well-formed for sheet* S*. An oracle* ω *is well-formed if for all addresses* a *the following conditions are satisfied:*


The definition of oracle refinement in Figure 6 preserves well-formedness.

**Lemma 1.** *If* $\mathcal{S} \vdash \omega \text{ wf}$ *and* $\mathcal{S}, \omega \Downarrow \gamma$ *then* $\mathcal{S} \vdash \mathsf{refine}(\mathcal{S}, \omega, \gamma) \text{ wf}$*.*

Producing well-formed oracles alone is insufficient to guarantee convergence. Oracle refinement would never reach a consistent state if the predicted spill areas were incorrectly sized.

The definition of oracle refinement in Figure 6 predicts spill areas that are correctly sized with respect to the current grid.

**Lemma 2.** *If* $\mathcal{S} \vdash \omega \text{ wf}$ *and* $\mathcal{S}, \omega \Downarrow \gamma$ *then* $\gamma \vDash \mathsf{refine}(\mathcal{S}, \omega, \gamma)$*.*

Predicting correctly sized spill areas is also insufficient to guarantee convergence. Oracle refinement would never reach a consistent state if it oscillates between permitting and rejecting the same root to spill. Consider the sheet:

$$\text{Let } \mathcal{S} \stackrel{\text{def}}{=} [\text{A1} \mapsto \{1; 2\},\ \text{B1} \mapsto \mathsf{IF}(\text{A2} = 2, \{3; 4\}, 0)]$$

Spill iteration would continue indefinitely if refinement cycled between the following two well-formed and correctly sized oracles:

> $[\text{A1} \mapsto (2, 1, \checkmark)] \longrightarrow [\text{A1} \mapsto (2, 1, \times),\ \text{B1} \mapsto (2, 1, \checkmark)] \longrightarrow \cdots$

To avoid oscillating spill iteration the process of oracle refinement should be *permit preserving*, defined below:

**Definition 2 (Permit preserving extension).** *We write* $\gamma \vdash \omega \lesssim \omega'$ *if oracle* ω′ *is a permit preserving extension of* ω *in context* γ*. Defined as:*

$$\gamma \vdash \omega \lesssim \omega' \stackrel{\text{def}}{=} \forall a, m, n, p.\ (\gamma \vDash_a \omega \land \omega(a) = (m, n, p)) \Rightarrow \omega'(a) = (m, n, p).$$

The definition of oracle refinement in Figure 6 is permit preserving.

**Lemma 3.** *If* $\mathcal{S} \vdash \omega \text{ wf}$ *and* $\mathcal{S}, \omega \Downarrow \gamma$ *then* $\gamma \vdash \omega \lesssim \mathsf{refine}(\mathcal{S}, \omega, \gamma)$*.*

Spill iteration should be a converging iteration but this cannot be guaranteed in general; at any given step in the iteration a sheet can fail to evaluate to a grid. This can happen because the sheet contains a cell cycle, spill cycle, or diverging grid calculus term. Instead, we only expect that if the sheet is free from these divergent scenarios then spill iteration must converge. To allow us to dissect different forms of divergence and focus on spill iteration we only consider *acyclic* sheets, defined below:

**Definition 3 (Acyclic).** *A sheet* S *is acyclic if for all* ω *such that* $\mathcal{S} \vdash \omega \text{ wf}$*, there exists some* γ *such that* $\mathcal{S}, \omega \Downarrow \gamma$*.*

For instance, none of the following sheets are acyclic: [A1 → A1] has a cell cycle, [A1 → B1 : C1] has a spill cycle, and [A1 → Ω] has a formula Ω that diverges. Divergent terms are not encodable in the spill calculus but are encodable in the grid calculus, as we show in Section 6.1. An alternative approach would be to explicitly model divergence in our semantics of sheet evaluation and show that iteration converges or the sheet diverges. We choose not to pursue this approach to improve the clarity of our operational semantics, but note that our semantics can be extended to model cycles.

For any acyclic sheet, spill iteration will converge to a final spill oracle.

**Theorem 1 (Convergence).** *For all acyclic* S *and* ω *such that* $\mathcal{S} \vdash \omega \text{ wf}$*, there exists an oracle* ω′ *such that* $\omega \longrightarrow^{*}_{\mathcal{S}} \omega'$ *and* $\mathcal{S} \vdash \omega' \text{ final}$*.*

*Proof.* (Sketch—see Appendix B of the extended version [21] for the full proof.) The value of any address with a binding is a function of its dependencies and the oracle prediction for that address. We inductively define an address as *fixed* if the oracle prediction is consistent for the address, and every address in the spill-dependency set (defined in [21]) is fixed. Lemma 3 states that correct predictions are always preserved, therefore a fixed address remains fixed through iteration and its value remains invariant. The dependency graph of the sheet is acyclic, therefore if there is a non-fixed address then there must be a non-fixed address with no dependencies but an inconsistent oracle prediction—we call this a *non-fixed source*. Lemma 2 states that every new oracle correctly predicts the size with respect to the previous grid, therefore any non-fixed sources will be fixed in the new oracle. We conclude by observing that the number of fixed addresses in the sheet strictly increases at each step, and when every address is fixed the oracle is final.

#### **4.6 Limitations and Differences with Real Systems**

Permit preservation requires that if the size of an array does not change then the permit (which may be ×) is preserved—this property is crucial for our proof of convergence.

Real spreadsheet systems such as Sheets and Excel do not guarantee permit preservation. A root a that is prevented from spilling using a permit × can later be permitted to spill, even if the size of the associated array does not change. This particular interaction arises when a root that was previously preventing a from spilling changes dimension, freeing a previously occupied spill area. Permitting roots to spill into newly freed regions of the grid is desirable from a user perspective because it reflects the visual aspect of spreadsheet programming where an array will spill into any unoccupied cells.

A limitation of our formalism, if implemented directly, is that there exist some spreadsheets that when evaluated will prevent an array from spilling, despite the potential spill area being blank. Consider the sheet:

$$[\text{A3} \mapsto \{1, 2, 3\},\ \text{C1} \mapsto \mathsf{IF}(\mathsf{ISERROR}(\text{A3}), 0, \{4; 5; 6\})]$$

When the total ordering used by oracle refinement orders A3 before C1, the behaviour is as expected: A3 spills to the right and C1 evaluates to an error value. When the total ordering orders C1 before A3, the behaviour appears peculiar: A3 evaluates to an error value and C1 evaluates to 0. The root A3 is prevented from spilling despite there apparently being room in the grid! The issue is that the array in A3 never changes size, therefore the permit × assigned to the root is preserved, despite root C1 relinquishing the spill area on subsequent spill iterations.

The fundamental problem is one of constraint satisfaction. We would like to find a well-formed oracle that maximizes the number of roots that can spill in a deterministic manner. The total order on addresses ensures determinism but restricts the solution space. Our approach could be modified to deterministically permute the ordering until an optimal solution is found, however such a method would be prohibitively expensive.

Both Sheets and Excel find the best solution to our example sheet. We expect their implementations do not permute a total order on addresses, but implement a more efficient algorithm that runs for a bounded time. Finding a more efficient algorithm that is guaranteed to terminate remains an open challenge.

The limitation we present in our formalism only arises when a spreadsheet includes dynamic spill collisions and conditional spilling. We anticipate that this is a rare use case for spilled arrays, and does not arise when using spilled arrays to implement gridlets for live copy-paste-modify.

# **5 Grid Calculus: Spill Calculus with Sheets as Values**

In this section we present the grid calculus: a higher-order spreadsheet calculus with sheets as values. The grid calculus extends the spill calculus of Section 4.

#### **5.1 Extending Spreadsheets with Gridlets**

The gridlet concept [12] has been proposed but not implemented. Our observation is that spilling a range reference acts much like copy-paste, but lacks local modification. We propose to implement gridlets using spilled arrays, by extending the spill calculus with primitives that implement first-class grid modification.


(Figure: a gridlet example showing the source range A1:C4 and the gridlet invocation in A6.)

Revisiting the example from the introduction, there are four key interactions happening in the invocation of a gridlet.

First, select the content in the grid that is to be modified. Second, apply the selected modifications or updates. Third, calculate the grid using the modified content. Fourth and finally, project the calculated content into the grid.

Spreadsheets with spilled arrays support the final step but lack the capabilities to support the first three. We add these capabilities using four new constructs.

1. First-class sheet values ⟨S⟩.
2. An operator GRID that evaluates to the current sheet.
3. An operator UPDATE that binds a formula in a sheet-value.
4. An operator VIEW that evaluates a given range in a sheet-value to an array.

Using these constructs we can implement gridlets, for example:

G(A1:C4, B2, 7, B3, 24) def = VIEW(UPDATE(UPDATE(GRID, B2, 7), B3, 24), A1:C4)

Formatting is a core feature of Gridlets, but we omit formatting from the grid calculus for clarity, on the basis that it would be a straightforward addition. We now describe the details of the grid calculus.

#### **5.2 Syntax and Operational Semantics**

Figure 7 presents the syntax and operational semantics for the grid calculus. The grid calculus does not require modification of existing rules; we only add formula evaluation rules for the new constructs, and evaluation relations for *views*.

*Syntax* Let x range over formula identifiers. Formulas F may additionally be identifiers x; LET(x, F1, F2), which binds the result of evaluating F1 to x in F2; GRID, which captures the current sheet; UPDATE(F1, a, F2), which updates a formula binding in a sheet-value; and VIEW(F, r), which extracts a dereferenced range from a sheet-value. Values V may additionally be a sheet-value ⟨S⟩. Let V range over views; a view is a sheet paired with a range, denoted (S, r). A view range r delimits the addresses to be computed in sheet S.

$$\begin{array}{ll}
\text{Identifier} & x \in \mathit{Ident} \\
\text{Formula} & F ::= \cdots \mid x \mid \mathsf{LET}(x, F_1, F_2) \mid \mathsf{GRID} \mid \mathsf{UPDATE}(F_1, a, F_2) \mid \mathsf{VIEW}(F, r) \\
\text{Value} & V ::= \cdots \mid \langle\mathcal{S}\rangle \\
\text{View} & \mathcal{V} ::= (\mathcal{S}, r)
\end{array}$$

*Formula evaluation:* $\mathcal{S}, \omega \vdash F \Downarrow V, \mathcal{D}$

$$\frac{\mathcal{S}, \omega \vdash F_1 \Downarrow V_1, \mathcal{D}_1 \quad \mathcal{S}, \omega \vdash F_2[x := V_1] \Downarrow V_2, \mathcal{D}_2}{\mathcal{S}, \omega \vdash \mathsf{LET}(x, F_1, F_2) \Downarrow V_2, \mathcal{D}_1 \cup \mathcal{D}_2} \qquad \frac{}{\mathcal{S}, \omega \vdash \mathsf{GRID} \Downarrow \langle\mathcal{S}\rangle, \emptyset}$$

$$\frac{\mathcal{S}, \omega \vdash F_1 \Downarrow \langle\mathcal{S}_1\rangle, \mathcal{D}_1}{\mathcal{S}, \omega \vdash \mathsf{UPDATE}(F_1, a, F_2) \Downarrow \langle\mathcal{S}_1[a \mapsto F_2]\rangle, \mathcal{D}_1} \qquad \frac{\mathcal{S}, \omega \vdash F \Downarrow \langle\mathcal{S}_1\rangle, \mathcal{D} \quad (\mathcal{S}_1, r) \Downarrow V}{\mathcal{S}, \omega \vdash \mathsf{VIEW}(F, r) \Downarrow V, \mathcal{D}}$$

*View evaluation:* $\mathcal{V}, \omega \Downarrow \gamma$

$$(\mathcal{S}, r), \omega \Downarrow \gamma \ \stackrel{\mathsf{def}}{=}\ \forall a \in \mathrm{dom}(\mathcal{S}) \cap \mathbf{area}(r).\ \mathcal{S}, \omega \vdash a \Downarrow \gamma(a)$$

*Spill iteration:* $\omega \longrightarrow_{\mathcal{V}} \omega'$ *Final oracle:* $\mathcal{V} \vdash \omega \text{ final}$

$$\frac{(\mathcal{S}, r), \omega \Downarrow \gamma \quad \gamma \not\vDash \omega \quad \mathsf{refine}(\mathcal{S}, \omega, \gamma) = \omega'}{\omega \longrightarrow_{(\mathcal{S}, r)} \omega'} \qquad \frac{\mathcal{V}, \omega \Downarrow \gamma \quad \gamma \vDash \omega}{\mathcal{V} \vdash \omega \text{ final}}$$

*Final view evaluation:* $\mathcal{V} \Downarrow V$

$$(\mathcal{S}, r) \Downarrow V \ \stackrel{\mathsf{def}}{=}\ [] \longrightarrow^{*}_{(\mathcal{S}, r)} \omega \text{ and } (\mathcal{S}, r) \vdash \omega \text{ final and } \mathcal{S}, \omega \vdash r \Downarrow V, \mathcal{D}$$

**Fig. 7.** Syntax and Operational Semantics for the Grid Calculus (Extends Figures 3–6)

*Formula evaluation:* $\mathcal{S}, \omega \vdash F \Downarrow V, \mathcal{D}$ A formula LET(x, F1, F2) evaluates in the standard way. A formula GRID evaluates to a sheet-value that captures the current sheet. A formula UPDATE(F1, a, F2) updates a formula binding in a sheet-value. If evaluating formula F1 produces sheet-value ⟨S1⟩ then UPDATE(F1, a, F2) evaluates to the sheet-value where a is bound to F2 in S1, denoted ⟨S1[a ↦ F2]⟩. A formula VIEW(F, r) evaluates a sheet-value and extracts a range. If evaluating formula F produces sheet-value ⟨S1⟩ then VIEW(F, r) evaluates to the value obtained by evaluating the view (S1, r). View evaluation is defined in Figure 7 and we describe its semantics at the end of the section. Here we address a subtle property of VIEW: evaluating a view (S, r) adds no dependencies to the containing formula. Dependency tracking in our semantics is used to prevent spill cycles and captures dependence between *values* of addresses: the value of a spill root should not depend on the value of an address in the spill area. In contrast, sheet-values depend on the *formula* of an address in the containing sheet, but not the value of an address in the containing sheet. For example:

$$\text{Let } \mathcal{S} \stackrel{\text{def}}{=} [\text{A1} \mapsto \mathsf{VIEW}(\mathsf{UPDATE}(\mathsf{GRID}, \text{A1}, 10), \text{A2}),\ \text{A2} \mapsto \text{A1}]$$

Sheet S evaluates to grid [A1 → 10, A2 → 10]. What are the dependencies of each address? The value of A2 in the grid depends on the value of A1 in the grid. In contrast, the value of A1 in the grid does not depend on the value of A2 in the grid. This is because evaluating the formula in A1 constructs a private grid from which the value of A2 is obtained. However, A1 does depend on the formula of A2 in the containing grid. Our semantics only considers value dependence, therefore the dependency set of A1 is ∅—the address has no dependence on values in the containing grid.

Formula dependence is vital for efficient recalculation, though we do not model that in our semantics and only use dependency tracking to prevent spill cycles. If an address depends on the value of another address bound in a sheet, then it also depends on the formula of that address. The converse is not true in the presence of sheet-values.

*View evaluation:* $\mathcal{V}, \omega \Downarrow \gamma$ Evaluation of view (S, r) with oracle ω is defined similarly to evaluation of sheets, except that the induced grid γ is limited to the sheet bindings that intersect the range r. Limiting the induced grid has two key consequences. First, we only evaluate the bindings in S required to evaluate the bindings in r. Second, only roots within range r are permitted to spill; any root outside r remains an address containing a collapsed array. There is a difference between an address that holds a collapsed array and a root that is prevented from spilling an array by permit ×. The former has a pre-spill and post-spill value that is an array; the latter has a pre-spill value that is an array and a post-spill value that is an error.

*Spill iteration:* $\omega \longrightarrow_{\mathcal{V}} \omega'$ The definition of spill iteration for views is the same as spill iteration for sheets, except that we use view evaluation rather than sheet evaluation.

*Final oracle:* $\mathcal{V} \vdash \omega \text{ final}$ The definition of a final oracle for views is the same as a final oracle for sheets, except that we use view evaluation rather than sheet evaluation.

*Final view evaluation:* V ⇓ V Evaluating a view (S, r) computes a final oracle for the view and then evaluates range r in the context of sheet S. Final view evaluation will evaluate range r, rather than extracting values from an induced grid, because viewing a range should sample all values in the range—including blank cells. If we extract values from the induced grid we can only obtain the values for addresses with a binding in r.

#### **5.3 Formulas for Gridlets**

We can encode the G operator using primitives from the grid calculus.

$$\begin{aligned}
[\![\mathsf{G}(r, a_1, V_1, \ldots, a_n, V_n)]\!] &= \mathsf{VIEW}([\![(a_1, V_1, \ldots, a_n, V_n)]\!], r) \\
[\![(a_1, V_1)]\!] &= \mathsf{UPDATE}(\mathsf{GRID}, a_1, V_1) \\
[\![(a_1, V_1, \ldots, a_{n+1}, V_{n+1})]\!] &= \mathsf{UPDATE}([\![(a_1, V_1, \ldots, a_n, V_n)]\!], a_{n+1}, V_{n+1})
\end{aligned}$$

The G operator translates to the VIEW operator, and any bindings translate to a sequence of UPDATE operations. The initial sheet-value is obtained from the context using the GRID operator.
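The desugaring can be phrased mechanically. In the Python sketch below, grid calculus terms are represented as nested tuples; the constructors and the `gridlet` helper are illustrative names, not part of the calculus.

```python
GRID = ("GRID",)

def update(sheet_term, addr, formula):
    return ("UPDATE", sheet_term, addr, formula)

def view(sheet_term, rng):
    return ("VIEW", sheet_term, rng)

def gridlet(rng, *bindings):
    """Desugar G(rng, a1, F1, ..., an, Fn) into nested UPDATEs and a VIEW.

    `bindings` alternates addresses and formulas: (a1, F1, a2, F2, ...)."""
    term = GRID
    for addr, formula in zip(bindings[::2], bindings[1::2]):
        term = update(term, addr, formula)
    return view(term, rng)
```

For instance, `gridlet("A1:C4", "B2", 7, "B3", 24)` builds the same term as the G(A1:C4, B2, 7, B3, 24) example above.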

The translation illustrates that G is not higher-order because every application returns the value obtained by evaluating a view on a sheet-value. A language that only provides G does not permit sheet-values to escape and be manipulated by formulas. This is acceptable when emulating copy-paste because a copy is always taken with respect to the top-level sheet, however this does limit the usefulness of G as an implementation construct. This limitation motivates the design of the grid calculus; as we show in the next section, the grid calculus is capable of encoding other language features.

# **6 Encoding Objects, Lambdas, and Functions**

In this section we give three encodings that target the grid calculus: objects, lambdas, and sheet-defined functions.

#### **6.1 Encoding the Abadi and Cardelli Object Calculus**

We introduce the grid calculus to implement gridlets and the concept of live copy-paste. Perhaps surprisingly, the grid calculus can encode object-oriented programming, in particular the untyped object calculus of Abadi and Cardelli [1]. Their calculus is a tiny object-based programming language, akin to a prototypebased language such as Self [6], but capable of representing class-based objectoriented programming via encodings.

We draw a precise analogy between spreadsheets and objects. A sheet is like an object. A cell is like a method name. A formula in a cell is like a method implementation. The GRID operator is like the this keyword. Formula update is like method update.

We assume an isomorphism between method names and cell addresses a, and use the same names in both the object calculus and the grid calculus. We define the translation of object calculus terms to grid calculus formulas, denoted [[b]], as follows:

$$\begin{aligned}
[\![x]\!] &= x \\
[\![[\ell_i = \varsigma(x_i)b_i{}^{\,i \in 0..n}]]\!] &= \langle[\ell_i \mapsto [\![\varsigma(x_i)b_i]\!]{}^{\,i \in 0..n}]\rangle \\
[\![b.\ell]\!] &= \mathsf{VIEW}([\![b]\!], \ell) \\
[\![b_1.\ell \leftarrow \varsigma(x)b_2]\!] &= \mathsf{UPDATE}([\![b_1]\!], \ell, [\![\varsigma(x)b_2]\!]) \\
[\![\varsigma(x)b]\!] &= \mathsf{LET}(x, \mathsf{GRID}, [\![b]\!])
\end{aligned}$$

The translation makes our analogy concrete. We use the LET formula to lexically capture *self* identifiers. The grid calculus allows the construction of diverging formulas, as discussed in Section 4.5. We demonstrate this using a diverging object calculus term.

$$\Omega = [\![[\text{A1} = \varsigma(x)x.\text{A1}].\text{A1}]\!] = \mathsf{VIEW}(\langle[\text{A1} \mapsto \mathsf{LET}(x, \mathsf{GRID}, \mathsf{VIEW}(x, \text{A1}))]\rangle, \text{A1})$$

The operational semantics are preserved by the translation. We assume a big-step relation for object calculus terms, denoted b ⇓ o. The proof is in Appendix C of the extended version [21].

**Theorem 2.** *If* b *is closed and* b ⇓ o *then* $[], [] \vdash [\![b]\!] \Downarrow [\![o]\!], \emptyset$*.*

#### **6.2 Encoding the Lambda Calculus**

We give an encoding of the lambda calculus that is inspired by the object calculus embedding of the lambda calculus. We use ARG1 to hold the argument and VAL1 to hold the result of a lambda. In spreadsheet languages both ARG1 and VAL1 are legal cell addresses; for example, address ARG1 denotes the cell at column 1151 and row 1.

$$\begin{aligned}
[\![x]\!] &= x \\
[\![\lambda x.M]\!] &= \mathsf{UPDATE}(\mathsf{GRID}, \text{VAL1}, \mathsf{LET}(x, \mathsf{VIEW}(\mathsf{GRID}, \text{ARG1}), [\![M]\!])) \\
[\![M\ N]\!] &= \mathsf{VIEW}(\mathsf{UPDATE}([\![M]\!], \text{ARG1}, [\![N]\!]), \text{VAL1})
\end{aligned}$$
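To make the translation concrete, the following Python sketch applies it to a small lambda AST. The tuple term representation and the `translate` name are our own conventions; the operator structure follows the equations above.

```python
def translate(term):
    """Translate a lambda term into a grid calculus term (as tuples).

    Lambda terms: ("var", x), ("lam", x, body), ("app", fn, arg)."""
    tag = term[0]
    if tag == "var":
        return ("ident", term[1])
    if tag == "lam":
        x, body = term[1], term[2]
        # Bind the argument cell ARG1 to x, then store the body in VAL1.
        inner = ("LET", x, ("VIEW", ("GRID",), "ARG1"), translate(body))
        return ("UPDATE", ("GRID",), "VAL1", inner)
    if tag == "app":
        fn, arg = term[1], term[2]
        # Place the argument in ARG1, then view the result cell VAL1.
        return ("VIEW", ("UPDATE", translate(fn), "ARG1", translate(arg)), "VAL1")
    raise ValueError(f"unknown term: {tag}")
```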

#### **6.3 Encoding Sheet-Defined Functions**

A sheet-defined function [14, 17, 19, 20] is a mechanism for a user to author a function using a region of a spreadsheet. We can model a sheet-defined function f as a triple (S,(a0,...,an), r) that consists of the moat or sheet-bindings for the function, the addresses from the moat that denote arguments, and the range from the moat that denotes the result. The application f(V0,...,Vn) can be encoded in the grid calculus as follows, where f = (S,(a0,...,an), r):

$$\begin{aligned}
[\![f(V_0, \ldots, V_n)]\!] &= \mathsf{VIEW}([\![(V_0, \ldots, V_n)]\!], r) \\
[\![()]\!] &= \langle\mathcal{S}\rangle \\
[\![(V_0, \ldots, V_{n'+1})]\!] &= \mathsf{UPDATE}([\![(V_0, \ldots, V_{n'})]\!], a_{n'+1}, V_{n'+1})
\end{aligned}$$
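The application step can be sketched as a fold of UPDATE over the argument cells, mirroring the equations above. The tuple encoding of terms and the `apply_sdf` name are illustrative assumptions.

```python
def apply_sdf(sheet_literal, arg_addrs, result_range, values):
    """Encode f(V0, ..., Vn) for f = (S, (a0, ..., an), r).

    Start from the literal sheet-value <S>, update each argument
    cell with its value, then view the result range."""
    term = ("SHEET", sheet_literal)
    for addr, val in zip(arg_addrs, values):
        term = ("UPDATE", term, addr, val)
    return ("VIEW", term, result_range)
```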

# **7 Related Work**

*Formal Semantics of Spreadsheets.* Our core calculus is similar to previous formalisms for spreadsheets. Several previous works [3, 7, 14, 19] offer formal semantics for spreadsheet fragments. Mokhov et al. [16] capture the logic of recalculating dependent cells. Finally, Bock et al. [4] provide a cost semantics for the evaluation of spreadsheet formulas.

*Spilling.* Major spreadsheet implementations like Sheets<sup>6</sup> and Excel<sup>7</sup> implement spilled arrays [11], but do not document the details of the implementation. The authors of [17] propose a spilling-like mechanism that allows matrix values in cells to spread across a predefined range—this is closely related to *"Ctrl+Shift+Enter" formulas*<sup>8</sup> in Excel. The proposal in [17] is significantly simpler than spilled arrays because the dimension of the spilled area is fixed and declared ahead of time. Sarkar et al. [18] note that spilled arrays violate Kay's *value principle* [13] because a user is unable to edit constituent cells, except for the spill root.

*Extending the Spreadsheet Paradigm.* Clack and Braine [8] propose a spreadsheet based on a combination of functional and object-oriented programming. Their integration is different from our analogy: in their system, a class is a collection of parameterised worksheets, and a parameterised worksheet corresponds to a method. In gridlets, the grid corresponds to an object and cells on the grid correspond to methods of the object.

*Similarity Inheritance in Forms/3.* Forms/3 [5] is a visual programming language that borrows the key concept of cell from spreadsheets. Instead of a tabular sheet, cells in Forms/3 are arranged on a *form*: a canvas with no structure. Forms/3 explored an abstraction model called "similarity inheritance" through which a form may borrow cells from another form and optionally modify attributes of certain cells. This resembles substitution in gridlets, however reusing a portion of the tabular grid and spilling into adjacent cells are primary to gridlets, whereas such notions are absent from Forms/3.

*Sheet-defined Functions.* Sheet-defined functions [17] (SDFs) allow the user to reuse logic defined using formulas in the grid. The user nominates input cells, an output cell, and gives the function a name. When the function is called, a virtual copy of the workbook is instantiated. Arguments to the function are placed in the input cells, the virtual workbook is calculated, and the result from the output cell is returned.

Elastic SDFs [14] generalize SDFs to handle input arrays of arbitrary size. In [4], the authors provide a precise semantics for SDFs, closures and array formulas, but not for spilling. Gridlets are more general than SDFs as each Gridlet invocation can have a unique set of local substitutions, whereas all calls to an SDF share the same arguments, giving greater flexibility to the user.

*Error prevention and Error detection.* Abraham and Erwig propose type systems for error detection [3] and automatic model inference [2]. Abraham and Erwig [3] provide an operational semantics for sheets that is similar to the core calculus in Section 3, but they do not give a semantics for spilled arrays.

Gencel [10] is a typed "template language" that describes the layout of a desired worksheet along with a set of customized update operations that are specific

<sup>6</sup> https://support.google.com/docs/answer/6208276?hl=en

<sup>7</sup> https://aka.ms/excel-dynamic-arrays

<sup>8</sup> https://aka.ms/excel-cse-formulas

to the particular template. The type system guarantees that the restricted set of update operations keeps the desired worksheet free from omission, reference, and type errors.

Cheng and Rival [7] use abstract interpretation to detect formula errors due to mismatch in type. Their technique also incorporates analysis of associated programs, such as VBA scripts, along with formulas on the grid.

# **8 Conclusion**

Repetition is common in programming—spreadsheets are no different. The distinguishing property of spreadsheets is that reuse includes formatting and layout, and is not limited to formula logic. Gridlets [12] are a high-level re-use abstraction for spreadsheets. In this work we give the first semantics of gridlets as a formula. Our approach comes in two stages.

First, we make sense of spilled arrays, a feature that is available in major spreadsheet implementations but not previously formalised. The concept is deceptively simple: it belies the many subtleties involved in implementing spilled arrays. We present the spill calculus as a concise description of spilling in spreadsheets.

Second, we extend the spill calculus with the tools to implement gridlets. The grid calculus introduces the concept of first-class sheet values, and describes the semantics of three higher-order operators that emulate *copy-paste-modify*. The composition of these operators gives the semantics for gridlet operator G.

Spreadsheet programming bears a resemblance to object-oriented programming, alluded to often in the literature. We show that the resemblance runs deep by giving an encoding of the object calculus into the grid calculus, with a direct parallel between objects and sheets.

#### **Acknowledgements**

Thank you to the Microsoft Excel team for hosting the second author during his research internship at Microsoft's Redmond campus. Thank you to Tony Hoare, Simon Peyton Jones, Ben Zorn, and members of the Microsoft Excel team for their feedback and assistance with this work.

> Cambridge, UK, and Davis, California, USA Spreadsheet Day, October 17, 2019

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
